Real-time speech-to-speech translation
Has anyone used a free, offline, open-source, real-time speech-to-speech translation app on under-powered devices (i.e., older smartphones)? There are a few libraries that purportedly can do, or help with, local speech-to-speech:
- https://github.com/ictnlp/StreamSpeech
- https://github.com/k2-fsa/sherpa-onnx
- https://github.com/openai/whisper
I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Although real-time would be great, a short delay would work.
RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.
Any suggestions?
The combination of things you are asking for is a challenge, especially offline use on under-powered devices. The under-powered device part in particular makes it a next-to-impossible ask, I think. For accurate speech-to-text and accurate translation you likely need to run two models: a good speech-to-text model supporting both languages, and an LLM trained reasonably well in both languages, ideally trained to do translation. There are other options, but the results will likely be much cruder.
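To make the "chain of models" idea concrete, here is a minimal sketch of how the stages could be wired together. Everything in it is an assumption for illustration: the `SpeechToSpeech` class and the `fake_*` functions are hypothetical stand-ins, not the API of Whisper, sherpa-onnx, or any real app. In practice each stub would be replaced by a real speech-to-text, translation, and TTS engine.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage is a plain callable so real engines can be swapped in later.
Transcriber = Callable[[bytes], str]          # audio -> source-language text
Translator = Callable[[str, str, str], str]   # (text, src, dst) -> translated text
Synthesizer = Callable[[str, str], bytes]     # (text, lang) -> audio


@dataclass
class SpeechToSpeech:
    """Hypothetical cascade: speech-to-text, then translation, then TTS."""
    transcribe: Transcriber
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes, src: str, dst: str) -> bytes:
        text = self.transcribe(audio)
        translated = self.translate(text, src, dst)
        return self.synthesize(translated, dst)


# Stand-ins that only exist so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, src: str, dst: str) -> str:
    phrasebook = {("en", "ko", "hello"): "안녕하세요"}
    return phrasebook.get((src, dst, text.lower()), text)


def fake_synthesize(text: str, lang: str) -> bytes:
    return f"[{lang}] {text}".encode("utf-8")


pipeline = SpeechToSpeech(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline.run(b"Hello", "en", "ko")
print(result.decode("utf-8"))
```

Each stage adds its own latency and memory cost, which is a big part of why this is hard on an older phone.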
As far as RTranslator goes, I did a quick Google search myself and came across their repo. The online remark makes me think you used v1.0; v2.0 does everything on your phone, according to their GitHub anyway. They also make it clear there are minimum hardware requirements for it to have any reasonable chance of working.
Edit:
I might not be entirely right. I also came across LibreTranslate, which for translation uses an offline library called argos-translate. It still uses neural machine translation but might have slightly lower hardware requirements? Not entirely sure about that last bit, but figured I'd throw it in there.
2.1.1 is installed. It looks like there's no TTS for Korean or for numerous other languages. The WalkieTalkie mode translates English into Korean text just fine, meaning it is not quite speech-to-speech yet.
Unless I'm missing a TTS engine for Korean? If that's the case, the software should detect the desired output language and prompt the user to install a TTS engine.
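Roughly the kind of check I have in mind, as a sketch only. The `find_voice` function and the list of installed voice tags are hypothetical; this is not how RTranslator or any platform TTS API actually works, just the logic of "look for a matching voice, otherwise ask the user to install one".

```python
from typing import Optional, Sequence, Tuple


def find_voice(target_lang: str, installed: Sequence[str]) -> Tuple[Optional[str], str]:
    """Return (voice_tag, message); voice_tag is None if nothing matches.

    Tags are assumed to look like "ko-KR" or "en-US". Matching is done on
    the primary language part only, so "ko" matches "ko-KR".
    """
    wanted = target_lang.lower().split("-")[0]
    for tag in installed:
        if tag.lower().split("-")[0] == wanted:
            return tag, f"Using voice {tag}"
    return None, f"No TTS voice installed for '{target_lang}'; please install one."


voice, message = find_voice("ko", ["en-US", "ja-JP"])
print(message)
```

With a Korean voice in the list, the same call would return its tag instead of the install prompt.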
Yeah, I don't know. It is the only app I came across that sort of seems to check your boxes. As for why there aren't many (any) others, that's because it is not a problem with an easy solution.
This is almost certainly not what you're looking for, but the translation app on Apple Watch has passable translation and can work offline with a number of commonly spoken languages. I just tested it with English and Japanese, and my pre-K understanding of the language was perfectly satisfied with it. The app works much the same on iPhone as well, but with more features.
The thing you'll have a hard time with is the "real time" part. You'll likely never have translation software that can talk back at the speed you'd see in, for instance, a dub over a speaker in a finished video, at least for languages with differing sentence structure. In those cases, the entire idea of the sentence generally needs to come through before the software can reliably start translating.
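A rough sketch of what "wait for the whole sentence" looks like in code, assuming the speech recognizer hands over words one at a time with punctuation attached (many recognizers do not, so treat that as a simplifying assumption). The function name `sentences_from_stream` is made up for this example.

```python
from typing import Iterable, Iterator, List

SENTENCE_END = (".", "?", "!")


def sentences_from_stream(words: Iterable[str]) -> Iterator[str]:
    """Buffer incoming words and release them only at a sentence boundary."""
    buffer: List[str] = []
    for word in words:
        buffer.append(word)
        if word.endswith(SENTENCE_END):
            yield " ".join(buffer)
            buffer = []
    if buffer:  # flush whatever is left when the stream ends
        yield " ".join(buffer)


stream = ["I", "went", "to", "the", "store.", "Did", "you", "eat?", "Not", "yet"]
chunks = list(sentences_from_stream(stream))
print(chunks)
```

The delay comes from the buffering itself: nothing is handed to the translator until the last word of the sentence arrives. For English to Korean this matters a lot, since Korean usually puts the verb at the end of the sentence.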
Did you see the video on StreamSpeech's GitHub?
I don't know the hardware backing that demo, but it seems quite close to real-time.
I've seen many broken links, and this one isn't terribly impressive. :P
Sorry. Try this (audio starts about 10 seconds in):
https://github.com/ictnlp/StreamSpeech?tab=readme-ov-file#gui-demo
Well, color me impressed.