Today marks a defining moment for AI in Northeast India. After years of research, corpus engineering, and relentless iteration, we’re releasing Kren-M™, the first foundational large language model built for Northeast Indian languages.
Not a generic multilingual model with Khasi added as an afterthought. Purpose-built for Khasi and English, through a custom tokenizer, continued pre-training, and instruction tuning on a Gemma-2-2B base, with the infrastructure in place to expand across Garo, Mizo, Assamese, Meitei, Nagamese, Kokborok, and beyond.
This is Northeast India’s entry into the global AI conversation, on our own terms.
Why We Built This
The story of Kren-M starts with a simple, frustrating reality: AI doesn’t speak our languages.
Every major language model treats Khasi, Garo, and other Northeast languages as afterthoughts, if they’re included at all. Tokenizers shred our words into meaningless fragments. Models auto-translate when they shouldn’t. Bilingual conversations break down. Cultural context disappears.
For years, we watched global tech companies pour billions into models that work beautifully for English, Mandarin, and Spanish, but leave 45 million people in Northeast India behind.
So we decided to stop waiting.
At MWire Labs, we set out to prove that high-quality foundational models can be built for low-resource languages when the work is driven by people who actually speak them, understand them, and live in the communities they serve.
What Makes Kren-M Different
Kren-M isn’t just another multilingual model with Khasi bolted on. It’s a purpose-built bilingual system designed to handle the linguistic reality of Northeast India.
Custom Tokenization
We trained a specialized tokenizer and added 2,135 Khasi and Garo tokens to the base Gemma-2-2B vocabulary. The result? Khasi sentences now take 30 to 36% fewer tokens than with the original Gemma-2-2B tokenizer. The model sees complete morphemes instead of broken fragments, which translates directly into better fluency and understanding.
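To make the token-count claim concrete, here is a toy sketch of why adding whole-word entries to a vocabulary shortens sequences. It uses greedy longest-match segmentation as a stand-in; Gemma's actual SentencePiece tokenizer segments text differently, and both vocabularies below are hypothetical, chosen only for illustration. "Fewer tokens" is computed as the relative drop in token count.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Segment each whitespace-separated word by greedy longest match.

    A simplified stand-in for a subword tokenizer: at each position, take the
    longest vocabulary entry that matches, or fall back to a single character.
    """
    max_len = max(1, max((len(piece) for piece in vocab), default=1))
    tokens: list[str] = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Try the longest candidate first; j == i + 1 is the one-character fallback.
            for j in range(min(len(word), i + max_len), i, -1):
                piece = word[i:j]
                if piece in vocab or j == i + 1:
                    tokens.append(piece)
                    i = j
                    break
    return tokens


def percent_fewer_tokens(base_count: int, extended_count: int) -> float:
    """Relative reduction in token count, as a percentage of the base count."""
    return 100.0 * (base_count - extended_count) / base_count


# Hypothetical vocabularies: a base vocabulary with only short fragments,
# and the same vocabulary extended with two whole Khasi words.
BASE_VOCAB = {"kh", "u", "bl", "ei", "sh", "ib", "un"}
EXTENDED_VOCAB = BASE_VOCAB | {"khublei", "shibun"}

sentence = "khublei shibun"  # a common Khasi phrase
base_tokens = greedy_tokenize(sentence, BASE_VOCAB)
extended_tokens = greedy_tokenize(sentence, EXTENDED_VOCAB)
print(base_tokens)      # ['kh', 'u', 'bl', 'ei', 'sh', 'ib', 'un']
print(extended_tokens)  # ['khublei', 'shibun']
print(f"{percent_fewer_tokens(len(base_tokens), len(extended_tokens)):.1f}% fewer tokens")
# 71.4% fewer tokens
```

On this tiny, deliberately favorable example the reduction is much larger than the reported 30 to 36%; the point is the mechanism, not the magnitude.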
Clean, Proprietary Corpus
We curated 5.43 million hand-cleaned Khasi sentences, the largest Khasi text dataset ever assembled. Every sentence was manually reviewed. We removed HTML artifacts, code-mixing noise, auto-generated spam, and verse citations. What remains is production-grade training data that reflects how Khasi is actually used.
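The corpus was reviewed by hand; the sketch below only illustrates the kind of automated pre-filter that could shrink that workload for two of the listed noise types (HTML artifacts and verse citations), plus exact-duplicate and fragment removal. The regular expressions, the `min_words` threshold, and the function names are hypothetical, not MWire Labs' pipeline. Code-mixing and spam detection are left out because they need language-specific rules or models, and the sample lines are English placeholders.

```python
import html
import re

# Hypothetical patterns, chosen for illustration only.
TAG_RE = re.compile(r"<[^>]+>")  # HTML/XML tags such as <p> or </div>
# Scripture-style references such as "(John 3:16)" or "1 Cor. 13:4-7".
VERSE_RE = re.compile(r"\(?\b[1-3]?\s?[A-Z][a-z]+\.?\s+\d+:\d+(?:-\d+)?\)?")
SPACE_RE = re.compile(r"\s+")


def clean_sentence(raw: str) -> str:
    """Strip tags, decode HTML entities, drop verse references, normalize spaces."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    text = VERSE_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def clean_corpus(lines: list[str], min_words: int = 3) -> list[str]:
    """Clean each line, then drop short fragments and case-insensitive duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in lines:
        sentence = clean_sentence(raw)
        if len(sentence.split()) < min_words:
            continue  # too short to be a useful training sentence
        key = sentence.casefold()
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        kept.append(sentence)
    return kept


raw_lines = [
    "<p>This is a sample line.</p>",
    "this is a SAMPLE line.",
    "Another sample line here (John 3:16)",
    "ok",
    "<div>Tom &amp; Jerry went home.</div>",
]
print(clean_corpus(raw_lines))
# ['This is a sample line.', 'Another sample line here', 'Tom & Jerry went home.']
```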
Bilingual Stability
Kren-M doesn’t auto-translate. It doesn’t echo instructions in the wrong language. It understands when to respond in Khasi, when to respond in English, and how to handle natural code-switching without breaking conversational flow. This came from response-aware supervised fine-tuning with over 33,000 carefully constructed examples.
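The post does not define "response-aware". One common approach in supervised fine-tuning is to compute the loss only on response tokens, so the gradient signal comes entirely from the target replies (including their choice of language) rather than from reproducing the prompt. The sketch below assumes that interpretation. It masks prompt positions with -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss` and the label value Hugging Face `transformers` models skip when computing loss, then averages negative log-likelihood over the unmasked positions in plain Python. The token IDs and log-probabilities are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_example(prompt_ids: list[int], response_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prompt and response; mask prompt positions out of the labels."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


def masked_mean_nll(label_logprobs: list[float], labels: list[int]) -> float:
    """Mean negative log-likelihood over positions whose label is not masked.

    label_logprobs[k] is the log-probability the model assigned to labels[k].
    """
    kept = [-lp for lp, label in zip(label_logprobs, labels) if label != IGNORE_INDEX]
    if not kept:
        raise ValueError("every position is masked; nothing to train on")
    return sum(kept) / len(kept)


# Hypothetical token IDs: three prompt tokens followed by a two-token reply.
input_ids, labels = build_example([11, 12, 13], [21, 22])
print(input_ids)  # [11, 12, 13, 21, 22]
print(labels)     # [-100, -100, -100, 21, 22]

# Made-up per-position log-probabilities; the prompt values do not affect the loss.
print(masked_mean_nll([-9.0, -9.0, -9.0, -0.5, -1.5], labels))  # 1.0
```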
Production-Ready Performance
- 2.6 billion parameters (Gemma-2-2B base architecture)
- 45.5% reduction in validation loss after continued pre-training
- Runs in 6GB of VRAM, so it is deployable on standard hardware
- Supports chat, translation, summarization, and domain-specific tasks
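A back-of-the-envelope check of the 6GB figure: the weights alone need parameters × bytes per parameter. The post does not state the precision used, so this sketch assumes 16-bit weights (bf16 or fp16, 2 bytes each) and shows 8-bit and 4-bit quantized sizes for comparison; activations and the KV cache need memory on top, so these are lower bounds. It also shows the relative-reduction formula, assuming the 45.5% loss figure is a relative drop; the loss values in the example are placeholders, not Kren-M results.

```python
GIB = 2**30  # bytes per gibibyte


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB (excludes activations/KV cache)."""
    return num_params * bytes_per_param / GIB


def relative_reduction_pct(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`, relative to `before`."""
    return 100.0 * (before - after) / before


PARAMS = 2.6e9  # parameter count stated for Kren-M
for label, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label}: {weight_memory_gib(PARAMS, nbytes):.2f} GiB")
# 16-bit: 4.84 GiB
# 8-bit: 2.42 GiB
# 4-bit: 1.21 GiB

# Placeholder losses, not reported results: halving the loss is a 50% reduction.
print(relative_reduction_pct(2.0, 1.0))  # 50.0
```

At 16-bit the weights come to 5.2 GB (about 4.84 GiB), which leaves some headroom under 6GB for activations and the KV cache during inference.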
Built in Shillong, for Northeast India
Here’s what sets MWire Labs apart: we are rooted here, solving a problem we understand firsthand.
Our team is based in Shillong. We speak these languages. We grew up hearing them. We know the dialectal variations, the morphological quirks, the cultural contexts that can’t be captured in a dataset scraped from the web.
When we say Kren-M “sounds like home,” we mean it literally. Every design decision, from tokenizer training to corpus filtering to instruction tuning, was informed by lived experience with these languages.
This model was built entirely in-house. No outside funding rounds. No imported talent. No compromise on local understanding.
Real-World Impact
Kren-M enables use cases that were impossible before:
Government Services
Local-language chatbots for citizen services, automated Khasi translation for policy documents, voice assistants for public helplines.
Enterprise & Business
Customer support automation, bilingual call center agents, domain-specific chatbots for tourism, agriculture, and healthcare.
Education & Preservation
Language learning tools, cultural documentation, digital archiving of oral traditions.
Research & Development
A foundation for specialized models in legal, medical, and technical domains.
For the first time, organizations in Meghalaya and across Northeast India can deploy AI that actually understands their communities.
Open Collaboration
We believe AI for Northeast India should be built with Northeast India, not for it.
That’s why we’re releasing Kren-M as an open model and sharing our research openly:
- Model: huggingface.co/MWirelabs/Kren-M
- Technical Documentation: mwirelabs.com/models/kren-m
- Research Paper: researchsquare.com/article/rs-8144118/v1
If you’re working on regional NLP, building language technology, or deploying AI in Northeast India, let’s collaborate.
What’s Next: Kren-NE
Kren-M is just the beginning.
In early 2026, we’ll release Kren-NE, a larger multilingual model (Gemma-2-9B base) covering:
- Khasi
- Garo
- Mizo
- Assamese
- Meitei
- Nagamese
- Kokborok
- Nyishi
All built with the same rigorous approach: custom tokenization, clean corpora, task-aware fine-tuning, and deep cultural understanding.
Our goal is to establish Northeast India as a hub for inclusive AI development, proving that world-class language models can be built anywhere, for any language, when the work is driven by local expertise.
About MWire Labs
MWire Labs is the AI research division of MWire Consulting, a Shillong-based firm that has been delivering enterprise IT solutions across Northeast India since 2017. Through MWire Consulting, we’ve deployed systems for government departments that serve more than 8 million citizens, built survey platforms for tribal consultations, and developed chatbots for public services.
MWire Labs was founded on the belief that the deepest expertise for building AI in Northeast Indian languages doesn’t exist in Bangalore or Silicon Valley; it exists right here, in the hills, among the people who speak these languages every day.
We’re building the future of regional AI, one model, one language, one community at a time.
Connect with us:
🌐 mwirelabs.com
🤗 huggingface.co/MWirelabs
📧 Contact: connect@mwirelabs.com
Join the conversation:
Are you a researcher, developer, or language community member interested in advancing AI for Northeast India? We’re always looking for collaborators. Reach out and let’s build together.
Kren-M is the first foundational AI model for Northeast Indian languages, developed entirely in Shillong by MWire Labs. The model is available for research, development, and commercial deployment.
