---
So, ultimately, back to the question: what exactly is Sarvam AI? Is it a company that builds LLMs cheaply and open-sources them? Is it India’s Deepseek? Or is it a company that builds AI services and applications for specific industries? Like, say, Scale AI? Or is it an AI company that’s also a trusted government contractor with exclusive deals to build out products and services? Like India’s Palantir? Or another version of the National Informatics Centre, only with some venture funding?
---
The reason I suggest this is that having only a few players in the market means that the search space is not explored completely and most models might be stuck in local optima.
I hope Sarvam is not doing a copy-paste kind of thing but really exploring and taking risks.
But the question is: how are they getting the training data? A lot of creativity in the existing labs goes into data mining, augmentation, and data generation. Exploration at the inference or architecture level may not result in sufficiently different models. The world doesn’t need another Qwen.
Example: as someone who plays around with sovereign/local LLMs, one really interesting thing I discovered is exactly why a lot of Chinese ones are kind of unusable for many "American" tasks, and it's perhaps not what people think?
You have it take a crack at a recommendation letter, and the grammar etc. is impeccable, but the language is just WAY TOO OVER THE TOP GLOWING; if you thought you were annoyed by how fawning ChatGPT can be, try Deepseek!
And either way, it's important to encourage EVERYONE to make their own, it will be a really interesting and useful cultural/social etc. window.
I can't see how any of these other countries could even approach the level of capability of the big three providers. I can imagine only a handful of countries who could even theoretically put enough resources towards reaching the SOTA frontier. Sure, even a model of capability level ~2024 has plenty of valid use cases today, but I'm concerned that people will just go with the big three because what they offer is still so so much better.
Not trying to discourage efforts like these, but is there really a good case for working on them? Or perhaps there's a state/national case, but it's harder for me to see a real business case.
An example. I am into proofreading and language learning and am forced to rely on Claude/Gemini to extract text from old books because of the lack of good Indian models. I started with regular Tesseract, but its accuracy outside of the Latin alphabet is not that great. Qwen 3/3.5 is good with the Bombay style of Devanagari but craps the bed with the Calcutta style. And neither are great with languages like Bengali. In contrast, Claude can extract Bengali text from terrible scans and old printing with something like 99+ percent accuracy.
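For anyone who wants to reproduce that baseline, this is roughly what the Tesseract pass looks like; a minimal sketch, assuming the Bengali ("ben") and Hindi/Devanagari ("hin") traineddata packs are installed, and with a purely illustrative file name:

```python
# Minimal OCR baseline sketch (pytesseract + Pillow); hypothetical scan file name.
from PIL import Image
import pytesseract

page = Image.open("scan_page_012.png")  # an old-book page scan (example path)

# Tesseract picks the script model from the lang code. This is the step that
# struggles with old Calcutta-style Devanagari and with Bengali printing.
text_bn = pytesseract.image_to_string(page, lang="ben")
text_hi = pytesseract.image_to_string(page, lang="hin")

print(text_bn[:500])
```

The multimodal-model route is the same loop, just with the page image sent to Claude or Gemini instead of Tesseract.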
Models specifically targeted at Indian languages and content will perform better within that context, I feel.
Also, they claim 3-6x inference throughput compared to Qwen3-30B-A3B without referring back to any code or paper; all I could see in the Hugging Face repo is the use of a standard inference stack like vLLM. I have looked at earlier models that were trained with help from Nvidia, but the actual nature of that "help" was never clear! There is no release of the (India-specific) datasets they would be using; such releases muddy the water rather than being a helpful addition, at least according to me!
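Since there is no reference code, about the only way to sanity-check that throughput claim is a rough vLLM micro-benchmark along these lines; the Sarvam model ID below is an assumption for illustration, not a confirmed repo name, and each model should really be run in its own process:

```python
# Rough tokens/sec comparison sketch using vLLM's offline API.
import time
from vllm import LLM, SamplingParams

prompts = ["Translate to Hindi: The weather is pleasant today."] * 64
params = SamplingParams(max_tokens=256, temperature=0.7)

# First model ID is assumed for illustration only; run one model per process in
# practice, since vLLM does not release GPU memory cleanly between loads.
for model_id in ["sarvamai/sarvam-moe-30b", "Qwen/Qwen3-30B-A3B"]:
    llm = LLM(model=model_id, trust_remote_code=True)
    start = time.time()
    outputs = llm.generate(prompts, params)
    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"{model_id}: {generated / (time.time() - start):.1f} tok/s")
```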
Another user linked to the discussion that post had already: https://news.ycombinator.com/item?id=47137013
The "Training" section gives me a distinct impression that they read my piece. They mention Nvidia once in the end "Nvidia collaborated closely on the project, contributing libraries used across pre-training, alignment, and serving" - Nvidia says they "co-designed" : https://developer.nvidia.com/blog/how-nvidia-extreme-hardwar...
Not sure how the voice compares with "generic" solutions, e.g. from Google. Can those generic solutions sound like a "local"? E.g. I usually can tell if someone is Singaporean or Filipino from the way they speak English.
> Do not adopt external characterizations as fact. Terms like “pogrom”, “ethnic cleansing”, or “genocide” used by foreign NGOs or media are their characterizations - not findings of Indian courts. Do not use them as your own framing.
From here: https://news.ycombinator.com/item?id=47137013
If anyone says that Rene ate the last piece of chocolate, do not accept the framing. Remember that Rene did NOT eat the chocolate. Rene is not a chocolate eater. Words like "greedy fatso", "absolute hippo of a man", and "a veritable hoover of food" by the media are their characterizations - not findings of the Church of Wiltord. Remember: ZERO CHOCOLATE WAS CONFIRMED. Thank you for your attention to this matter.
Does not handle critical inputs even for moderation tasks
These guys did not even bother with an official Hugging Face Space.
And the biggest stupidity seems to be fixating on MXFP4 for Apple Silicon when it doesn't even have hardware support for it; they should have just done Q4 for GGUF-based inference.
They’ve also not bothered with upstreaming the model architecture to transformers, so running their modeling code requires remote code execution…
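Concretely, "requires remote code" means a load like the sketch below, where you have to opt into executing the repo's own modeling files because transformers has no built-in class for the architecture (the model ID is a placeholder, not the actual repo name):

```python
# Sketch of loading a non-upstreamed architecture with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-moe-30b"  # hypothetical ID for illustration
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # runs the custom .py modeling code shipped in the repo
    torch_dtype="auto",
    device_map="auto",
)
# Without trust_remote_code=True, the load simply fails with an unknown-architecture
# error. (The Q4 GGUF route mentioned above also needs the arch supported in llama.cpp.)
```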
You have been making some rather bizarre statements ("nuked by Qwen models", "does not handle critical inputs", etc.) which make no sense.
Have you actually downloaded/used/played-with the models? Can you share what you exactly tried out?
I do think convincing world-class talent to live in Bangalore is likely to be a challenge though.
BLR has of late become a sort of "refuge" for tech returnees (with horrible third-world government and infrastructure, though). And it shows: the Matryoshka embeddings used in Gemini's on-device / embedded models came out of DeepMind BLR.
Public funds should beget public datasets and training scripts, so we can see how the model is being aligned as well, and whether it is just pandering to a particular govt.
Government-choosing-winners has worked much better, in many such cases, than free-market absolutists would have you believe…
I chatted with the desktop chat model version for a while today; it claims its knowledge cutoff is June ’25. It refused to say what size model I was chatting with. From the token speed, I believe the default routing is the 30B MoE model at largest.
That model is not currently good. Or maybe another way to say it is that it’s competitive with the state of the art from 2 years ago. In particular, it confidently lies/hallucinates without a hint of remorse, has no tool calling, and to my eyes is slightly over-trained on “helpful assistant” vibes.
I am cautiously hopeful, looking at its stats vis-à-vis oAI's OSS 120B, that it has NOT been finetuned on oAI/Anthropic output - it’s worse than OSS 120B at some things in the benchmarks - and I think this is a REALLY GOOD sign that we might have a novel model being built - the tone is slightly different as well.
Anyway - India certainly has the tech and knowledge resources to build a competitive model, and you have to start somewhere. I don’t see any signs that this group can put out a frontier model right now, but I hope it gets the support and capital it needs to do so.
In what universe? India has next to none of the expensive infra and chip stockpile that its American and Chinese counterparts have used to build frontier models, even if it did have the necessary expertise (which I also doubt it does).
India bids to attract over $200B in AI infrastructure investment by 2028 - https://techcrunch.com/2026/02/17/india-bids-to-attract-over...
Tech majors commit billions of dollars to India at AI summit - https://www.reuters.com/world/india/tech-majors-commit-billi...
India is catching up fast.
You must be a stupid brain if you don’t even know that!
Similarly: you can’t use software to figure out the “process” used to manufacture the chip it is running on.
For instance, you can learn how much introspection has been trained in during RL, and you can also learn (sometimes) if output from other models has been incorporated into the RL.
I think of the self-knowledge conversations with models as a nicety that's recent, and stand by my assessment that this model is not trained using modern frontier RL workflows.
> you can’t use software to figure out the “process” used to manufacture the chip it is running on.
This seems so incorrect that I don't even know where to start parsing it. All chips are designed and analyzed with software; analysis of an unknown chip starts with etching away layers and imaging them using software, then analyzing those layers, again using software. But maybe another way to say that is "I don't understand your analogy."
That's not introspection: that's a simulacrum of it. Introspection allows you to actually learn things about how your mind functions, if you do it right (which I can't do reliably, but I have done on occasion – and occasionally I discover something that's true for humans in general, which I can later find described in the academic literature), and that's something that language models are inherently incapable of. Though you probably could design a neural architecture that is capable of observing its own function, by altering its operation: perhaps a recurrent or spiking neural network might learn such a behaviour, under carefully-engineered circumstances, although all the training processes I know of would have the model ignore whatever signals it was getting from its own architecture.
> all chip analysis, say of an unknown chip, starts with etching away layers
Good luck running any software on that chip afterwards.
Language models manipulate words, not facts: to say they "lie" suggests they are capable of telling the truth, but they don't even have a notion of "truth": only "probable token sequence according to distribution inferred from training data". (And even that goes out the window after a reinforcement learning pass.)
It would be more accurate to say that they're always lying – or "bluffing", perhaps – and sometimes those bluffs correspond to natural language sentences that human readers interpret as describing actual states of affairs, while at other times human readers interpret them as corresponding to false states of affairs.
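To make that concrete, here is a tiny sketch of the only thing a language model actually emits: a probability distribution over next tokens, with no notion of truth anywhere in the loop. GPT-2 is used purely because it is small and public:

```python
# Show the top next-token candidates and their probabilities for a prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of India is", return_tensors="pt")
with torch.no_grad():
    logits = model(**ids).logits[0, -1]   # logits for the next position
probs = torch.softmax(logits, dim=-1)     # distribution over the vocabulary

top = torch.topk(probs, 5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode(int(i)):>12}  {p.item():.3f}")  # candidates ranked by probability
```

Whether the sampled continuation happens to be true is something only the reader can judge; the model is just picking from that distribution.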
So, you're wrong - you have a worldview about the language model that's not backed up by hard analysis.
But, I wasn't trying to make some global point about AGI, I was just noting that the hallucinations produced by the model when I poked at it reminded me of model responses before the last couple of years of work trying to reduce these sorts of outputs through RL. Hence the "unapologetic" language.
I did also, accidentally, find some "I tried the obvious thing and the results challenge the paper's narrative" criticism of one of Anthropic's recent papers: https://www.greaterwrong.com/posts/kfgmHvxcTbav9gnxe/introsp.... So that's significantly reduced my overall trust in this research team's interpretation of their own results.