Our Foundation Models
We create and open-source foundation models designed to support research, education, and practical applications.
Kren-M
Generative Model
Kren-M is a bilingual (Khasi–English) language model developed through extensive continued pre-training and supervised fine-tuning of Gemma 2 (2B). It is designed specifically for Khasi, a low-resource Austroasiatic language spoken in Meghalaya, Northeast India, while retaining the English fluency of its base model.
~3B params
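As a generative checkpoint, Kren-M should be usable through the standard Hugging Face transformers causal-LM API. The sketch below is illustrative only: the repo id in the example call is a placeholder, not the published checkpoint path.

```python
def generate(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a Khasi or English continuation with a causal-LM checkpoint.

    Sketch only: the model_id is assumed to be a published Hugging Face
    checkpoint; "example-org/kren-m" in the call below is a placeholder.
    """
    # Lazy import so the helper can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (requires the real checkpoint id and a transformers install):
# generate("example-org/kren-m", "Meghalaya is known for")
```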
NE-BERT
Encoder Model
NE-BERT is a state-of-the-art open-source encoder model for nine Northeast Indian languages, built on ModernBERT for superior speed and accuracy in low-resource NLP.
~149M params
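Encoder models like NE-BERT are typically exercised through masked-token prediction. A minimal fill-mask sketch using the transformers pipeline API, assuming a hypothetical checkpoint id:

```python
def fill_mask(model_id: str, masked_sentence: str):
    """Rank candidate tokens for a masked slot with an encoder checkpoint.

    Sketch only: "example-org/ne-bert" below is a placeholder id, and the
    exact mask token (e.g. [MASK]) depends on the model's tokenizer.
    """
    # Lazy import so the helper can be defined without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    # Returns a list of candidate fills with scores, best first.
    return unmasker(masked_sentence)


# Example call (requires the real checkpoint id and a transformers install):
# fill_mask("example-org/ne-bert", "Guwahati is a city in [MASK].")
```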
Northeast India NLP
Multilingual NLP models and tokenizers for underrepresented languages of Northeast India, built for civic use and reproducibility.
Assamese RoBERTa
Language Model
Assamese RoBERTa is a custom monolingual RoBERTa-Base model pre-trained from scratch on the Assamese language.
~110M params
Meitei RoBERTa
Language Model
The Meitei-RoBERTa-Base model is a high-performance, monolingual transformer encoder pre-trained from scratch on the entire Meitei Monolingual Corpus.
~110M params
KhasiBERT
Language Model
Foundational Khasi model trained on ~3.6M sentences. Useful for translation, summarization, and low-resource NLP research.
~110M params
Mizo-RoBERTa
Language Model
Mizo-RoBERTa is a transformer-based language model for Mizo. Built on the RoBERTa architecture and trained on a large-scale curated corpus, this model provides state-of-the-art language understanding capabilities for Mizo NLP applications.
~110M params
Let's Build Together
Are you a researcher, developer, or part of a language community in Northeast India? We are always looking for partners to collaborate on new datasets, fine-tune models, and advance the state of regional AI.