Kyrgyz Speech Synthesis Model Kani TTS 2 Ranks Among the Top TTS Models on Hugging Face

Евгения Комарова Local news / Exclusive

The Kyrgyz startup NineNineSix has once again made its mark on the international technology stage, according to the High Technology Park (HTP) of Kyrgyzstan.

The startup recently released Kani TTS 2, an updated version of its speech synthesis model. It has already secured one of the top spots among text-to-speech (TTS) models on Hugging Face, the largest global platform for artificial intelligence.

Kani TTS 2 is a significant step up from its predecessor: it can generate up to 40 seconds of speech in a single pass, more than twice what the first model could produce.

According to HTP representatives, reaching such a position in the Hugging Face TTS rankings is a rare and significant achievement for an open model from Kyrgyzstan.

About the NineNineSix Team

NineNineSix is a group of Kyrgyz developers specializing in artificial intelligence technologies and language solutions.

Before this, the team developed the first version of Kani TTS and created the AI assistant AkylAi and a smart speaker, which together became the first artificial intelligence to speak the Kyrgyz language.

Voices for Low-Resource Languages

Many major AI companies focus on English and other widely spoken languages, leaving low-resource languages underserved. NineNineSix has chosen a different path.

Kani TTS 2 supports Kyrgyz, English, and Spanish, and its architecture allows it to be trained for other languages, accents, and dialects.

A key feature of the project is that the team has released the complete pre-training code, enabling research groups in other countries to build their own voice models based on Kani TTS 2.

Nursultan Bakashov, co-founder of nineninesix.ai, noted: “Kani TTS 2 is the next step after our first version: we made speech generation more stable and taught the model to handle long segments. Our goal is to create compact and open models that are easier to launch and adapt for various languages and accents, including low-resource ones. We want to demonstrate that world-class technologies can develop in Kyrgyzstan, which is why we opened both the model weights and the entire pre-training code so that any team can train TTS from scratch for their language.”

Key highlights of Kani TTS 2:

* Speech generation of up to 40 seconds in a single pass;

* Zero-shot voice cloning from a short audio sample;

* Fully open architecture and training code;

* A place among the top 3 TTS models on Hugging Face.

According to the HTP, the model has about 400 million parameters, was pre-trained on roughly 10,000 hours of speech data, and can run on a GPU with 3 GB of video memory, making it practical for both local and server deployments.

The HTP emphasized that Kani TTS 2 is more than just another AI model: it confirms that Kyrgyz specialists can develop world-class technologies and compete in the global artificial intelligence market. NineNineSix shows that Kyrgyzstan can not only consume advanced AI solutions but also create them.
