
How Smallest.ai Is Leveraging Small Models To Fix Voice AI’s Latency Problem

Voice AI seems to be echoing louder by the day, emerging as a core business interface, but it still falls flat when it comes to real-time conversations.

When Akshat Mandloi and Sudarshan Kamath began probing this rather silent zone, they heard whispers of opportunity for their brainchild, Smallest.ai. They were not the only ones: a host of startups have been busy trying to resolve the perpetual challenges of latency, scale, accuracy, and deployment.

Plugging these gaps has become the need of the hour as voice AI turns into an interface for India’s digital economy, which is likely to reach $1 Tn by 2030 and make up a fifth of the country’s GDP.

After close to a decade of building AI systems that run in production, Mandloi was miffed by the industry’s obsession with large models. “Why do we need such large models to solve everything? To solve very specific business use cases, you don’t need very large models,” he argued.

Mapping The Silent Zones In Voice AI

Friends since their days in college, Mandloi and Kamath wanted to build something of their own. The opportunity beckoned to them when they began closely tracking voice as a modality.

Voice was an inevitable next step for the AI landscape. But most speech models, Mandloi noticed, were being built for content creation, not for conversations. “Not a lot of players were focusing on real-time conversations. That’s where we saw the gap.”

Latency stood out as the core problem as they started exploring the issue. In live conversations, even small delays distort the experience. Smaller, tightly optimised models, they believed, could outperform their larger peers where responsiveness matters most.
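The latency argument comes down to simple arithmetic: in a cascaded voice pipeline, each stage's delay adds up before the user hears anything. A minimal sketch, using hypothetical stage timings (these are illustrative assumptions, not Smallest.ai's figures):

```python
# Illustrative latency budget for a cascaded voice pipeline.
# The per-stage timings below are hypothetical examples, not
# measurements from Smallest.ai or any real system.

STAGE_LATENCY_MS = {
    "speech_to_text": 150,     # transcribe the user's utterance
    "language_model": 400,     # generate a response
    "text_to_speech": 120,     # synthesise the reply audio
    "network_round_trip": 80,  # transport overhead
}

def total_latency_ms(stages: dict) -> int:
    """End-to-end delay is roughly the sum of the sequential stage delays."""
    return sum(stages.values())

if __name__ == "__main__":
    total = total_latency_ms(STAGE_LATENCY_MS)
    print(f"End-to-end: {total} ms")
```

Human turn-taking gaps in conversation are typically only a few hundred milliseconds, so a budget like the one above already feels sluggish; the language-model stage dominates, which is why shrinking that model is the biggest single lever on responsiveness.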

Would small really be beautiful? The duo zeroed in on the concept of small models when Smallest.ai was rolled out in 2023.

Switching Between Text And Speech 

As bootstrapped startup founders, Mandloi and Kamath worked out of bedrooms and cafes, using their savings to build their first proprietary text-to-speech model. 

The breakthrough came mid-2024, when Smallest.ai shared a demo on LinkedIn. It soon went viral.

That response validated their approach. Since then, Smallest.ai has launched multiple versions of its speech stack and is now on its third-generation text-to-speech model. The system supports emotional context, breathing patterns, and multilingual output across English, Spanish, major European languages, and the top Indian languages.

The company gradually expanded beyond text-to-speech into speech-to-text, memory layers, intelligence layers, and speech-to-speech pipelines.
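Conceptually, those layers compose into a speech-to-speech loop: audio in, transcription, a response informed by conversational memory, and audio out. A generic sketch of that composition follows; the function names and placeholder bodies are illustrative, not Smallest.ai's actual (proprietary) stack:

```python
# A minimal sketch of a speech-to-speech pipeline with a memory layer.
# All function bodies are placeholders for illustration only; a real
# system would run streaming ASR, an SLM, and a TTS model here.

def speech_to_text(audio: bytes) -> str:
    # Placeholder: pretend the audio bytes are already the transcript.
    return audio.decode("utf-8")

def respond(text: str, memory: list) -> str:
    # Placeholder "intelligence layer": record the turn in memory
    # and produce a context-aware reply.
    memory.append(text)
    return f"Understood ({len(memory)} turns so far): {text}"

def text_to_speech(text: str) -> bytes:
    # Placeholder: encode the reply instead of synthesising audio.
    return text.encode("utf-8")

def speech_to_speech(audio: bytes, memory: list) -> bytes:
    # The pipeline chains the three layers; latency accumulates at
    # each hop, which is why every stage must stay small and fast.
    return text_to_speech(respond(speech_to_text(audio), memory))
```

The design point is that memory persists across calls while each model stage stays stateless, so individual stages can be swapped or optimised independently.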

“The final goal is how can we make conversations much more natural from a business perspective,” Mandloi said.

Smallest.ai bets on smaller models, faster conversations, and voice systems built for how people actually speak. Small Language Models (SLMs), according to the company, offer superior speed, lower costs, and reduced memory usage, delivering efficient, reliable enterprise performance comparable to large language models (LLMs) at a fraction of the resources.

Mandloi explained that their technology is almost entirely proprietary, distinguishing it significantly from existing open-source solutions. He emphasised that the company conducts fundamental research rather than simply adopting off-the-shelf models. This involves training models from scratch and heavily tuning specific architectural layers.

According to him, these architectural changes are driven by two primary objectives. First, speed, where the model is optimised to process responses faster. Second, conversational flow, which involves adjusting the layers to make interactions feel more natural and human-like.

Mandloi clarified their approach to infrastructure. While they currently work closely with AWS (Amazon Web Services) to host their solutions, he noted that the system is designed to be infra-agnostic, meaning it is not strictly tied to a single provider and can function across different platforms.

With a horizontal platform, Smallest.ai focussed on banking, financial services, and insurance (BFSI) first. “People always want to talk. Texting is not a natural modality in case of BFSI,” Mandloi explained.

In his view, only a small percentage of users are truly comfortable typing. Most prefer to explain problems verbally in their native language and expect resolution the same way. That makes BFSI a natural early market, even though enterprise adoption cycles tend to be slower.

A year and a half later, Smallest.ai counts a host of publicly listed banks and fintech companies in India and the US among its clients. The founders, however, declined to disclose the names in an interaction with Inc42.

Winning The Confidence Of Enterprises 

Businesses often shy away from AI, not because they doubt whether it works, but because of concerns around security, deployment flexibility, and expectation-setting, according to Mandloi.

“Sometimes we have to sit down with our customers and explain to them that it can’t solve everything for you,” he said.

The company takes a phased approach, starting with narrow use cases and expanding over time as confidence builds. “Security has been designed into the product from day one,” Mandloi said.

While Smallest.ai works closely with Amazon Web Services (AWS), its stack is cloud-agnostic and can also be deployed on customer infrastructure. “If you want a bit more security, you can get it deployed on your cloud in a seamless, smoother manner,” Mandloi noted.

This flexibility, according to him, plays a critical role in building trust with regulated enterprises.

Making A Noise In A Crowded Market

The global voice AI space is becoming increasingly crowded, with players like ElevenLabs and Cartesia raising large rounds and expanding across geographies.

Mandloi does not see this as a threat. “It also gives validation that we are in the right market,” he said, adding that competition helps the team stay focussed. “It’s good to have competition. It just keeps you pushing.”

While inspired by academic research, Mandloi says Smallest.ai’s models are fundamentally proprietary. “Whatever is out there in the open source, we have tuned and changed a lot of layers,” he said.

The company focuses on reworking architectures to improve speed, realism, and conversational flow rather than chasing model size. “We do a lot of research on how we can make the models faster, how we can make it more conversational.” 

Smallest.ai raised $8 Mn (around INR 70.9 Cr) last year in a seed funding round, led by Sierra Ventures. The San Francisco-based startup had earlier raised about $1.7 Mn from investors like 3one4 Capital, Upsparks Capital, and DeVC.

The company’s revenue model combines annual licensing for enterprises with SaaS subscriptions, both layered on a usage-based fee structure. Smallest.ai reported 300% revenue growth in the US and 150% year-on-year growth in India, driven by rising demand for scalable voice automation.

The founders declined to divulge more about the company’s financials.

Gearing Up For The Future  

Talent in speech research is scarce worldwide. Mandloi described hiring as “a needle in a haystack” problem. Public visibility around Smallest.ai has helped attract attention, but hiring remains highly selective.

The founder is candid about the Indian ecosystem. “Risk capital is a bit less in India as compared to what’s available in the US,” he said. While Smallest.ai’s fundraising worked out due to timing and traction, he believes raising money for deep-tech bets remains harder in India.

For other AI founders, his advice is that speed matters more than perfection. “Get to your customers as fast as possible,” he said, adding that distribution is now harder than building. For globally relevant products, Mandloi recommends testing early in the US market and working backwards.

Edited by: Kumar Chatterjee

The post How Smallest.ai Is Leveraging Small Models To Fix Voice AI’s Latency Problem appeared first on Inc42 Media.

