bear in mind that PD is single threaded (more precisely, it only has one dsp thread), i.e. it’s on one core. so it doesn’t take much to overload dsp on an rPI.
(top can show per-core load too, press 1 while it’s running… I’m guessing your 25% is the one-line summary, which on a four-core Pi could mean one core is maxed out)
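if you’d rather check per-core load from a script than from top, here is a rough Python sketch. it assumes a Linux /proc/stat in the usual layout (user, nice, system, idle, iowait, …); the function names are just made up for illustration, nothing here comes from PD itself.

```python
import time


def read_cpu_times(stat_text):
    """Parse /proc/stat text into {core: (busy_jiffies, total_jiffies)}."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; the bare "cpu" line is the summary.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Use the first 8 counters only; guest time is already counted in user/nice.
        values = [int(v) for v in fields[1:9]]
        idle = sum(values[3:5])  # idle + iowait
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def per_core_load(before, after):
    """Return {core: percent busy} between two read_cpu_times() snapshots."""
    loads = {}
    for core, (busy1, total1) in after.items():
        busy0, total0 = before.get(core, (0, 0))
        elapsed = total1 - total0
        loads[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return loads


def sample_per_core_load(interval_s=1.0, path="/proc/stat"):
    """Take two snapshots interval_s apart and return the per-core load."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return per_core_load(before, after)
```

calling `sample_per_core_load()` on the Pi while the patch is running should show whether one core is pinned while the others sit mostly idle, which is what a low overall figure can hide.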
changing the governor usually makes quite a difference, perhaps you already had it set somehow?
(you can cat the same file to see if it’s set to performance or ondemand)
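for reference, a small Python sketch that reads the governor for every core in one go. I’m assuming the file in question is the usual Linux cpufreq one, /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor; if your image keeps it somewhere else, pass a different base path. `read_governors` is just an illustrative name.

```python
from pathlib import Path

# Assumed location of the cpufreq files on a standard Linux image.
CPU_SYSFS = Path("/sys/devices/system/cpu")


def read_governors(base=CPU_SYSFS):
    """Return {core_name: governor} for every core that exposes cpufreq."""
    governors = {}
    for gov_file in sorted(Path(base).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        core = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[core] = gov_file.read_text().strip()
    return governors
```

on a Pi you’d expect something like every core reporting the same value, either performance or ondemand. reading needs no special rights; changing the governor normally needs root.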
the pi3b+ adds about 15% more cpu, so it might be that you’re right on the threshold.
unfortunately audio glitches are a bit ‘digital’ (on/off)… it doesn’t matter how late you are with your dsp processing, you get a glitch. also this applies to every buffer, so any ‘extra’ load can occasionally tip you over into a glitch, so you might need just 1% more cpu and it would work.
to fix it, you could also try a larger audio buffer. this is always worth doing, as it lets you see how far off you are, and also whether the glitch is an underrun or ‘a feature’ of the patch code.
another option is to consider using pd~, which basically launches another pd (and consequently gives you another dsp thread on a different core). it adds a little bit of latency (I believe it’ll be one audio buffer of latency, but I’ve not tested it yet).
my plan is to use this with Orac, by (optionally) putting each Orac chain on a different core where the platform supports it. this would allow it to scale better.
(not got around to it yet, since Organelle/Bela are single core, so really it’s only the rPI… but the ‘design’ is in my head, including dealing with the issues of a different process space etc.)
obviously, given the rPI’s pretty limited resources, moving from one core to multi-core opens up a considerable amount of dsp (I’d say 2-3x, as you still need a bit left over for running the OS, externals’ IO threads etc., which are currently running on those cores).