After trying out Piper, the voice synthesis software, I should say that it initialises and synthesises a short text in ~0.3 seconds on CPU alone, while a pre-initialised Coqui TTS server using CUDA (NVIDIA's GPU API) synthesised the same text in ~0.36 seconds. The voice quality is not too different, at least on my crappy laptop's speakers.
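
For anyone who wants to check a number like this on their own machine, here is a rough sketch of one way to time it. It is not how the figures above were measured. It assumes the `piper` command-line tool is on your PATH and that you have a voice model downloaded; the helper names `time_command` and `piper_command` and the file names in the usage comment are made up for illustration. The measurement is wall-clock time for the whole process, so it includes start-up and model loading as well as synthesis.

```python
import subprocess
import time


def time_command(cmd, text):
    """Run cmd with text on stdin and return elapsed wall-clock seconds.

    The figure includes process start-up, so for a CLI synthesiser it
    covers initialisation plus synthesis, not synthesis alone.
    """
    start = time.perf_counter()
    subprocess.run(
        cmd,
        input=text.encode("utf-8"),
        check=True,                      # raise if the command fails
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def piper_command(model_path, wav_path):
    """Build a Piper CLI invocation (hypothetical helper; paths are yours to supply)."""
    return ["piper", "--model", model_path, "--output_file", wav_path]


# Example usage (assumes a local model file; the names are placeholders):
# seconds = time_command(piper_command("voice.onnx", "out.wav"), "Hello there.")
# print(f"{seconds:.2f} s")
```

Running it a few times and taking the median should give a steadier figure than a single run, since the first call is often slower while files are still being cached.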