45 Million Speakers • 21 Languages • 8 States

Your Language.
Your Legacy.

AI doesn't understand Northeast India's languages. You can change that. Contribute in 2 minutes.

21 Languages • 2 min to contribute

Quick Contribute

Enter your name, select your language, and choose a contribution type: text or voice. You can also opt in to stay connected for future projects.

No login required • Open data • Contributor credited

Why This Matters

45 million people speak Northeast India's languages. But AI doesn't understand them. Yet.

Invisible Languages

Mainstream tech barely serves these languages. Google Translate only recently added Khasi, Siri can't understand Garo, and ChatGPT struggles with Mizo. Without data, these languages stay invisible to technology.

The Data Gap

English: ~trillions of tokens of training data
Northeast Indian languages: ~millions of tokens

Big tech largely ignores low-resource languages. But WE can build the datasets they need.

Community Power

You don't need to be a linguist. Every sentence you write and every voice clip you record helps train smarter AI for YOUR linguistic heritage.

What We're Building

Your contributions power real AI applications for education, government, and daily life

Translation Models

Bidirectional translation between English and indigenous languages for schools and government.

Voice Recognition

Speech-to-text systems that understand regional accents and dialects.

Open Datasets

High-quality, freely available corpora for researchers and developers worldwide.

Languages We're Preserving

21 languages across 8 Northeast states

Adi • Text • Tibeto-Burman
Angami • Text • Tibeto-Burman
Ao • Text • Tibeto-Burman
Assamese (অসমীয়া) • Text + Voice • Indo-Aryan
Bhutia • Text • Tibeto-Burman
Bodo • Text • Tibeto-Burman
Garo • Text + Voice • Tibeto-Burman
Hmar • Text • Tibeto-Burman
Karbi • Text • Tibeto-Burman
Khasi (Ka Ktien Khasi) • Text + Voice • Austroasiatic
Kokborok • Text • Tibeto-Burman
Lepcha • Text • Tibeto-Burman
Limbu • Text • Tibeto-Burman
Meitei (Manipuri, ꯃꯩꯇꯩꯂꯣꯟ) • Text • Tibeto-Burman
Mizo (Mizo ṭawng) • Text • Tibeto-Burman
Nagamese • Text • Indo-Aryan creole
Nyishi • Text • Tibeto-Burman
Others • Text • Mixed/Unclassified
Pnar • Text • Austroasiatic
Tangkhul • Text • Tibeto-Burman
Thadou • Text • Tibeto-Burman
Wancho • Text • Tibeto-Burman
War • Text • Austroasiatic

Part of the Community

We contribute to efforts in linguistic research, dataset creation, and endangered-language documentation.
