I am looking for a voice assistant that is as private as possible. It's a NEED, not a want. I am aware of Mycroft, which has discontinued service. I am aware of Rhasspy, but it seems like a lot to set up and maintain.
I’m looking more for a way to privatize something mainstream on mobile so I always have it with me, not a standalone at-home device. I’m talking Alexa, Siri, Google Assistant, etc. For to-do lists, reminders, looking up questions on the web. The usual use cases. In what ways can one make their assistant a little more privacy friendly, short of not using one at all? Which assistant is best? Thanks.
Edit: I’ve considered putting one behind a VPN, or behind something like Pi-hole, to block said app or device from calling home.
Check out OpenWhisper and Wyoming
Can you use, say, FUTO Voice Input with OpenAI’s API? Why does one need to use Whisper? And Wyoming seems to be an integration with Home Assistant.
You don’t need to use Whisper, I got some names mixed up. I was thinking of wyoming-faster-whisper which uses the FOSS speech to text system faster-whisper, but there are others that can be used.
Edited my original comment to fix that.
Wyoming is a protocol for voice assistants.
It ties together:
- speech recognition services (faster-whisper, vosk, whisper.cpp, OpenAI’s Whisper API)
- text to speech services (piper)
- wake word detection services (openWakeWord, snowboy, porcupine1)
- intent handling services
- intent recognition services
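To make that division of labour concrete, here is a minimal sketch of how those services chain together during one voice interaction. It does not use the Wyoming library or its wire format; every name in it (`Pipeline`, `run`, the stub stages) is hypothetical and only illustrates the data flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical stand-in for the services a voice pipeline ties together."""
    detect_wake_word: Callable[[bytes], bool]          # e.g. openWakeWord
    transcribe: Callable[[bytes], str]                 # e.g. faster-whisper
    recognize_intent: Callable[[str], Optional[str]]
    handle_intent: Callable[[Optional[str], str], str]
    synthesize: Callable[[str], bytes]                 # e.g. piper

    def run(self, audio: bytes) -> Optional[bytes]:
        # Nothing past this point runs unless the wake word was heard.
        if not self.detect_wake_word(audio):
            return None
        text = self.transcribe(audio)
        intent = self.recognize_intent(text)
        reply = self.handle_intent(intent, text)
        return self.synthesize(reply)


# Stub stages so the sketch runs without any models installed.
demo = Pipeline(
    detect_wake_word=lambda audio: audio.startswith(b"hey"),
    transcribe=lambda audio: audio.decode()[4:],
    recognize_intent=lambda text: "timer" if "timer" in text else None,
    handle_intent=lambda intent, text: (
        f"handled {intent}" if intent else f"no intent for: {text}"
    ),
    synthesize=lambda reply: reply.encode(),
)

print(demo.run(b"hey set a timer"))
print(demo.run(b"set a timer"))
```

Because each stage is just a callable here, swapping one speech recognizer or wake word engine for another does not touch the rest of the chain, which is roughly the point of having a shared protocol.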
Home Assistant can interact with that protocol. I think the addons run servers for various components used by the wyoming protocol server that the integration can use, but I run it separate from Home Assistant, so idk.
Not sure what FUTO is capable of, but you can use anything that can communicate with a Wyoming server. I’m willing to wager FUTO can, but idk.
OpenAI’s ChatGPT API and LLMs are orthogonal to this, but could probably be used as an intent handler or as the fallback when no other intent was recognized. So I’m pretty sure you could hook up a response from OpenAI or any other LLM API, but I haven’t tried setting that up for myself yet. wyoming-handle-external lets you pipe the input text to the stdin of whatever program you give it and responds with the program’s stdout, so you could definitely use this to pass it to OpenAI or Ollama.
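A rough sketch of what such a stdin-to-stdout program could look like, forwarding the recognized text to a local Ollama server. The assumptions here are mine, not something from this thread: Ollama listening on its default port, a model called `llama3` already pulled, and the helper names `build_request` and `extract_reply`, which are made up for illustration.

```python
import json
import sys
import urllib.request

# Assumptions: a local Ollama server on its default port and a pulled "llama3" model.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3"


def build_request(text: str) -> urllib.request.Request:
    """Wrap the recognized text in a non-streaming Ollama generate request."""
    body = json.dumps({"model": MODEL, "prompt": text, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def extract_reply(raw: bytes) -> str:
    """Pull the generated text out of the JSON response body."""
    return json.loads(raw).get("response", "").strip()


def main() -> None:
    # The recognized text arrives on stdin; whatever is printed becomes the reply.
    text = sys.stdin.read().strip()
    if not text:
        return
    with urllib.request.urlopen(build_request(text), timeout=60) as resp:
        print(extract_reply(resp.read()))


if __name__ == "__main__":
    main()
```

Since the request never leaves localhost, the text stays on your own machine. Pointing `OLLAMA_URL` at a hosted API instead would change that.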
IIRC, Whisper interfaces with OpenAI
Leon looks promising. I need something online though, for internet-based questions and answers. I’m not sure if Leon is internet capable or has enough features for daily use. That’s why I wondered if it would be somewhat easier to privatize a mainstream voice assistant. I know there are drawbacks either way with what I need to do.
Home Assistant has a built-in voice assistant function that can be as simple or robust as you need it to be. The whole thing can be set up fully locally, and mine runs easily on an old micro-PC I got for $100. I had it running on a Pi 3B originally, but the STT and TTS would take 10+ seconds to process, which was too long.
Out of the box it controls local devices, does to-do lists, controls media, and sets timers. Setting reminders doesn’t work out of the box, but can be set up with some great community templates. Services that require web content like “tell me the news” or “what’s the weather in Seattle” either need to be set up with custom commands that have access to the info you want, or need to go through an LLM.
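The "custom command first, LLM otherwise" idea can be sketched roughly like this. It is not how Home Assistant defines custom sentences; the patterns, the placeholder lookups `weather` and `news`, and the `answer` function are all hypothetical and only show the routing logic.

```python
import re
from typing import Callable, Optional


# Hypothetical lookups; real ones would query a weather or news source.
def weather(city: str) -> str:
    return f"(weather lookup for {city} goes here)"


def news(_: str = "") -> str:
    return "(news headlines go here)"


# Each custom command pairs a sentence pattern with the function that answers it.
COMMANDS: list[tuple[re.Pattern, Callable[[str], str]]] = [
    (re.compile(r"what['’]?s the weather in (?P<arg>.+)", re.I), weather),
    (re.compile(r"tell me the news(?P<arg>)", re.I), news),
]


def answer(text: str, llm: Optional[Callable[[str], str]] = None) -> str:
    """Try the custom commands first; fall back to an LLM if one is configured."""
    cleaned = text.strip().rstrip("?.!")
    for pattern, handler in COMMANDS:
        match = pattern.fullmatch(cleaned)
        if match:
            return handler(match.group("arg"))
    if llm is not None:
        return llm(text)
    return "Sorry, I can't help with that."


print(answer("What's the weather in Seattle?"))
print(answer("Who wrote Dune?", llm=lambda t: f"(LLM answer to: {t})"))
```

Keeping the fallback optional means the assistant still works fully offline for the commands you defined, and only unmatched questions ever reach a model.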
Luckily, over the past few months the Open Home Foundation has added integrations for LLMs. Both local and web-based models (ChatGPT, Gemini, etc.) are possible, so you can have it run queries through models running on a local GPU. This is currently fairly bleeding edge though, and I haven’t tried running a local LLM myself yet, so I can’t speak to its complexity.
More on that here: https://www.home-assistant.io/blog/2024/06/07/ai-agents-for-the-smart-home/
I really love the idea of setting this up. I keep up fairly well with these types of projects. I just lack the time to implement this myself, which is why I was mentioning a mainstream service and finding a way to privatize it. I plan to use the assistant for work. I work long hours. By the time I am done for the day I feel cooked. I need a relatively plug-and-play system. How could I privatize, say, an Echo Dot? Or some other mainstream voice assistant?
Here’s what I run; this is all 100% local. The most time I spent on this project was actually on getting the wake word recognition (which is another fairly new function in HA) set up on these old teleconferencing devices: https://drive.google.com/file/d/1e2T1ibNw5GeIOUA1eqQbjwp1s2g5h5XN/view