Spark in me

Telegram Voice-to-Text

I tested the Telegram Premium STT feature and compared it with our @silero_audio_bot. It is surprisingly decent. I have no idea which engine they use, but:

- It seems to support at least 2 languages (I tried speaking Russian, English, German and Spanish; it picked up only Russian and English);

- The pipeline seems to be a language classifier followed by STT;

- It works only with voice recordings, not audio files in general, i.e. it avoids the huge pain in the ass we had to endure parsing audio and checking MIME types vs. extensions vs. the actual codecs used;

- It is 2-3x slower than our bot on average (our bot processed a 30s file in 4-5s, theirs took 10-12s), but it also supports some form of hash-based caching (the same message is processed instantly the second time);

- It boasts some recasing and repunctuation model, but in anecdotal tests it performed worse than ours, probably due to a lack of polish in their pipeline;

- As for quality, it is subjective. I ran some anecdotal tests on funny / difficult / purposefully misleading or made-up phrases, and it is decent, though I believe that our models are still better;

- Yeah ... and the elephant in the room: it has to be manually triggered on each message, and it is hidden behind a paywall for Premium users.
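
To illustrate the caching point above: I do not know how Telegram actually implements it, but the "same message is processed instantly" behavior is what you would get from a cache keyed by a hash of the audio. Below is a minimal sketch under that assumption. `TranscriptCache` and `fake_stt` are hypothetical names, and `fake_stt` is just a stand-in for a real (slow) STT engine.

```python
import hashlib
from typing import Callable, Dict


class TranscriptCache:
    """Cache transcripts keyed by the SHA-256 digest of the raw audio bytes."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe      # the slow STT call (assumed interface)
        self._store: Dict[str, str] = {}   # digest -> transcript
        self.misses = 0                    # how many times STT actually ran

    def get(self, audio: bytes) -> str:
        key = hashlib.sha256(audio).hexdigest()
        if key not in self._store:
            # First time we see this audio: run STT and remember the result.
            self.misses += 1
            self._store[key] = self._transcribe(audio)
        return self._store[key]


def fake_stt(audio: bytes) -> str:
    # Hypothetical stand-in for a real STT engine.
    return f"<transcript of {len(audio)} bytes>"


cache = TranscriptCache(fake_stt)
first = cache.get(b"voice-1")    # runs STT
second = cache.get(b"voice-1")   # served from the cache
```

A real service could just as well key the cache on a message or file ID instead of the content hash; from the outside the two are indistinguishable.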