Research & Publications

Our work focuses on advancing the state of the art in AI for the languages, cultures, and knowledge systems of Northeast India.

KhasiBERT: A Foundational Transformer Language Model for the Khasi Language

Architecture: RoBERTa-base
Parameters: ~110M
Corpus Size: 3.6M sentences (63M tokens)
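
A minimal sketch of how a RoBERTa-style checkpoint like KhasiBERT could be queried for masked-token prediction, assuming the Hugging Face `transformers` library; the Hub ID `mwirelabs/khasibert` and the sample sentence are hypothetical placeholders, not the published release path.

```python
# Sketch only: the Hub ID below is hypothetical, for illustration.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_id = "mwirelabs/khasibert"  # hypothetical Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill-mask inference: ask the model to complete a masked Khasi sentence.
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for candidate in fill(f"Khublei {tokenizer.mask_token}."):
    print(candidate["token_str"], round(candidate["score"], 3))
```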

Why Multilingual Transformers Fail for Khasi: A Linguistic Analysis of Low-Resource Austroasiatic AI Gaps

Multilingual models like mBERT and XLM-R often fail on typologically distinct, low-resource languages such as Khasi: their subword vocabularies, trained overwhelmingly on high-resource languages, over-fragment Khasi words (tokenization bias), and Khasi's Austroasiatic grammar diverges from the structures that dominate pretraining (structural divergence), producing unreliable predictions.
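
The tokenization bias described above can be observed directly. The sketch below, assuming the Hugging Face `transformers` library, counts how many subword pieces XLM-R's tokenizer produces for a short Khasi phrase; the sample phrase and the fertility threshold are illustrative choices, not figures from the paper.

```python
from transformers import AutoTokenizer

# XLM-R's SentencePiece vocabulary was trained with little or no Khasi data.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

text = "Khublei shibun"  # Khasi: "thank you very much" (illustrative example)
pieces = tokenizer.tokenize(text)

# A high pieces-per-word ratio ("fertility") signals poor vocabulary coverage:
# each word splits into several rare subwords, degrading the representations
# the model can build for the language.
fertility = len(pieces) / len(text.split())
print(pieces)
print(f"{len(pieces)} subword pieces for {len(text.split())} words "
      f"(fertility = {fertility:.2f})")
```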

Join Us in Building Inclusive AI

MWirelabs invites researchers, educators, and developers to collaborate in shaping technology that reflects the world’s diversity.