Sarvam AI Launches Vision & Bulbul V3, India-Built Models Challenge ChatGPT, Gemini, Claude in OCR and Voice

Sarvam AI from Bengaluru introduces Sarvam Vision OCR and Bulbul V3 TTS, highlighting strong performance on Indian language documents and local voices. Benchmarks show competitive accuracy and natural speech, reinforcing India-specific AI development for broader deployment.

Indian startup Sarvam AI is drawing global interest after releasing two new models, Sarvam Vision and Bulbul V3, that perform strongly on India-focused tasks such as reading regional language documents and generating natural Indic voices. On benchmarks and in real-world feedback from developers and users, the models challenge tools like ChatGPT, Gemini, Anthropic's Claude and ElevenLabs.

For years, core AI development has been dominated by labs in the United States and China, while India has mostly featured as a services hub. Sarvam AI, based in Bengaluru, is trying to change that story by building what the company calls "sovereign AI": foundational models developed fully in India for Indian use cases and broader global deployment.

Sarvam Vision OCR benchmarks and document performance

The OCR model Sarvam Vision is at the centre of this shift. According to Sarvam AI, the system is outperforming well-known models such as Gemini 3 Pro, DeepSeek OCR v2 and ChatGPT on tests focused on extracting text from complex documents, especially when those documents contain Indian languages, intricate formatting or dense technical content that often defeats older OCR engines.

Sarvam AI co-founder Pratyush Kumar highlighted recent results on X, noting that Sarvam Vision reached an accuracy of 84.3 percent on the olmOCR-Bench benchmark, scoring above Gemini 3 Pro and DeepSeek OCR v2, while ChatGPT’s performance on that test was much lower, underlining how a focused India-built model can rival larger international systems.

Performance on another benchmark, OmniDocBench v1.5, also helped Sarvam Vision gain attention. Sarvam AI reports an overall score of 93.28 percent there, with especially high numbers on pages containing complex layouts, technical tables and mathematical expressions, where traditional OCR systems usually struggle because of irregular structures, overlapping elements and heavy use of symbols.

Sarvam AI market reaction and expert views

These benchmark figures have shifted how some commentators look at Sarvam AI. The company was previously questioned for working on Indic models instead of broad multilingual systems, yet the strong OCR and speech results are now seen as proof that focusing on Indian requirements can create value where large global labs have not prioritised support for local languages and formats.

Tech commentator Deedy Das, who earlier doubted this path, openly revised that view. "I was wrong about Sarvam. When I wrote about them a year ago, I felt like the direction to train small Indic language models was wrong. But boy, have they turned it around," he wrote. "They have the best text-to-speech, speech-to-text, and OCR models for Indic languages, and that's actually really valuable. The pricing is very reasonable."

Users testing Sarvam Vision have also begun sharing quick reactions online, often highlighting the benefit for Indian scripts. One user reported using the product recently and wrote, "I used this a couple of days ago! Oh man wow." Such posts reflect early enthusiasm from developers and businesses that need reliable tools for Indian language workflows.

Sarvam AI Bulbul V3 Indic voice model

Alongside the OCR model, Sarvam AI has launched Bulbul V3, a new version of its text-to-speech system focused on Indic languages. Bulbul V3 is designed to convert written text into audio that sounds natural and consistent, competing with offerings from ElevenLabs, which many teams previously used for voice applications despite cost and language limitations.

Sarvam AI set out the goals for this release in a detailed blog post. "Today we're releasing Bulbul V3, our most capable text-to-speech model designed to deliver natural, expressive and production-ready voices for Indian languages," the company wrote. "Bulbul V3 minimizes failure modes, delivering content-accurate, stable speech across the inputs that matter for India-specific use cases."

Bulbul V3 currently supports more than 35 voices across 11 Indian languages, and Sarvam AI states that it intends to reach 22 languages in total, which would give product teams and content creators in India a wide set of options for localised audio, from customer support bots to learning tools and entertainment projects targeting regional audiences.

Sarvam AI enterprise adoption and Indic use cases

Developers building products for Indian farmers and rural users are among those adopting Bulbul. Pratik Desai, founder of KissanAI, explained how the system fits into that platform’s work serving Indic speakers. "We use Bulbul as our go-to tts model for our Indic use cases, and they have just gotten better with each release. Meanwhile, ElevenLabs cost never made sense for Indic or any other languages."

Together, Sarvam Vision and Bulbul V3 show how Sarvam AI is using India-specific benchmarks and practical deployments to move from scepticism to recognition, positioning its models as tools that handle Indian languages, document formats and voice needs with a level of accuracy and cost that appeals to businesses, developers and users across the country.

FAQs
What are Sarvam Vision and Bulbul V3, and what do they aim to do?
Sarvam Vision is an OCR model focused on Indian languages and complex documents, while Bulbul V3 is a text-to-speech system for Indic languages with multiple voices.
How does Sarvam Vision perform on benchmarks like olmOCR-Bench and OmniDocBench?
It achieved 84.3% accuracy on olmOCR-Bench, outperforming Gemini 3 Pro and DeepSeek OCR v2, and scored 93.28% on OmniDocBench v1.5, especially excelling with complex layouts.
What does Sarvam AI mean by a sovereign AI, and why is it significant?
The company aims to develop foundational models fully in India for Indian use cases and broader global deployment, moving beyond relying on foreign labs.
What has been the reaction from experts and the market to Sarvam AI’s results?
Initial scepticism about the focus on Indic models has eased as the OCR and speech results demonstrated their value. Commentator Deedy Das, for example, praised the company's text-to-speech, speech-to-text and OCR models for Indic languages and called the pricing very reasonable.
Who are early adopters of Bulbul V3 and what are the typical use cases?
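KissanAI, which builds products for Indian farmers and rural users, uses Bulbul as its go-to text-to-speech model for Indic use cases and cites cost as a reason for choosing it over ElevenLabs. Other applications mentioned for Bulbul V3 include customer support bots, learning tools and entertainment projects for regional audiences.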
GoodReturns Finance
