I just found this. Main page [https://k2-fsa.github.io/sherpa/onnx/index.html]
This is huge! As a German, I use Thorsten medium
[https://huggingface.co/csukuangfj/sherpa-onnx-apk/resolve/main/tts-engine-new/1.10.26/sherpa-onnx-1.10.26-arm64-v8a-de-tts-engine-vits-piper-de_DE-thorsten-medium.apk]
as he simply made the best dataset. Mixing English with German, speaking
numbers, single letters, pausing without a “.” but just a linebreak, all those
can be essential. And… it is nearly perfect! And all local! This is crazy!
eSpeak can finally go to rest!
Um ok but is this really a big deal? Good sounding TTS is nice but crappy sounding TTS is good enough for most purposes and is fairly easy. Speech to text is way harder.
Speech to text exists with FUTO keyboard and whisper.
eSpeak is the only TTS for German, and it is 32-bit and sounds awful. It is quite embarrassing to use that for navigating with other people used to… modern voices.
For anyone that is dyslexic and privacy conscious it’s a pretty big deal.
It's also nice for anyone that is only privacy conscious.