2021-04-12 20:19:54
A Streaming Interface for Silero Models EE
We have created a gRPC-based streaming interface for our EE models, built on top of silero-vad.
Not sure if we are going to make any of this public, but writing an interface that adds value (as opposed to just having it) is difficult.
Key features:
- Unlike Google, we do not rescore full results at the end of an utterance / sentence => all results are kind of "final";
- Therefore "early" partial responses are a separate feature (e.g. 2 seconds after the start of an utterance);
- Automatic handling of speech that is too long (e.g. 7 seconds or longer) - we have some hacks ensuring we do not cut words in the middle;
- Threading and multiprocessing;
- We had to create fast / efficient versions of silero-vad (10k or 100k params) to be included in the gRPC server;
- The service also proxies VAD responses, which may be useful downstream.
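
To illustrate the "do not cut words in the middle" point, here is a minimal sketch of one way it could be done. This is not our actual code: the function name, the 100 ms frame size and the 2-second search window are all made up for the example. The idea is to take per-frame speech probabilities from a VAD and, once an utterance exceeds the limit, cut at the quietest frame shortly before the limit instead of cutting exactly at it.

```python
from typing import List


def split_points(speech_probs: List[float],
                 frame_ms: int = 100,
                 max_len_s: float = 7.0,
                 search_window_s: float = 2.0) -> List[int]:
    """Return frame indices at which to cut an overlong utterance.

    speech_probs are hypothetical per-frame VAD outputs in [0, 1].
    Each cut is placed at the least speech-like frame inside the
    search window that ends at the length limit.
    """
    max_frames = int(max_len_s * 1000 / frame_ms)
    window = int(search_window_s * 1000 / frame_ms)
    if max_frames < 2 or window < 1:
        raise ValueError("limit and search window must span at least a few frames")

    cuts: List[int] = []
    start = 0
    while len(speech_probs) - start > max_frames:
        hi = start + max_frames
        # Never allow a cut at `start` itself, so every chunk is non-empty.
        lo = max(start + 1, hi - window)
        cut = min(range(lo, hi), key=lambda i: speech_probs[i])
        cuts.append(cut)
        start = cut
    return cuts
```

For example, a 10-second stream of confident speech with a single dip at frame 60 would be cut at that dip, and anything shorter than the limit is left alone.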
Hopefully, since real people do not speak at the same time, this should increase hardware utilization efficiency about 2x compared to a plain HTTP interface in the case of phone calls.
In the future we will also calculate the sizing of our system using the streaming interface, i.e. how many real conversations each given sizing can really handle.
An educated guess - if we can handle 20 queries per second (i.e. 10 queries per 500 ms) with ~40 RTC, I suppose that would mean about 40 conversations.
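
One possible way to arrive at a number like this, written out as a tiny calculation. The assumptions here are mine and only illustrative: one active speaker per conversation at any moment, and one query roughly every 2 seconds of speech (the 2 seconds is borrowed from the partial-response example above, not a measured value).

```python
def estimated_conversations(queries_per_second: float,
                            seconds_per_query: float = 2.0,
                            active_speakers_per_call: float = 1.0) -> float:
    """Rough sizing: how many calls fit into a given query throughput.

    Both defaults are assumptions for illustration, not measurements.
    """
    # Queries per second generated by a single conversation.
    per_call_qps = active_speakers_per_call / seconds_per_query
    return queries_per_second / per_call_qps


print(estimated_conversations(20))
```

If both parties in a call talked over each other all the time (2 active speakers), the same throughput would cover only half as many conversations.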
Alexander, 17:19