According to the developers, the model runs in real time and does not require expensive hardware:
- on an RTX 3060 graphics card, speech is synthesized with a latency of about 0.5 seconds;
- on an RTX 4080 graphics card, the latency is about 0.2 seconds.
The developers are confident that the open ecosystem around Kani TTS will contribute to the rapid development of new services in the Kyrgyz language, including voice interfaces for government agencies and localized solutions for businesses.
The open model is already available for testing and implementation:
- test it;
- download.
The project was developed by Ulanbek Abdurazak, Denis Pavlov, and Nursultan Bakashov.