Research & Publications
Our work focuses on advancing the state of AI for the languages, cultures, and knowledge systems of Northeast India.
KhasiBERT: A Foundational Transformer Language Model for the Khasi Language
Architecture: RoBERTa-base
Parameters: ~110M
Corpus size: 3.6M sentences (63M tokens)
Why Multilingual Transformers Fail for Khasi: A Linguistic Analysis of Low-Resource Austroasiatic AI Gaps
Multilingual models like mBERT and XLM-R often fail on typologically distinct, low-resource languages such as Khasi, producing unreliable predictions due to tokenization bias and structural divergence.
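One way to quantify the tokenization bias described above is subword "fertility": the average number of subword pieces a tokenizer produces per word. A minimal, self-contained sketch of the metric follows; the two tokenizers and the sample sentence are toy stand-ins for illustration only, not the real mBERT/XLM-R vocabularies or actual Khasi data.

```python
def fertility(tokenize, text):
    """Average number of subword tokens produced per whitespace-delimited word."""
    words = text.split()
    tokens = [t for w in words for t in tokenize(w)]
    return len(tokens) / len(words)

# Toy "multilingual" tokenizer: words outside its (English-centric)
# vocabulary fall back to single-character pieces.
VOCAB = {"the", "model", "language"}
def multilingual_tok(word):
    return [word] if word.lower() in VOCAB else list(word)

# Toy "monolingual" tokenizer: keeps whole words, as if its vocabulary
# were trained on in-language text.
def monolingual_tok(word):
    return [word]

# Stand-in sentence of out-of-vocabulary words; a real evaluation would
# use Khasi text and the actual mBERT/XLM-R tokenizers.
sample = "unknown tokens fragment here"
print(fertility(multilingual_tok, sample))  # high fertility: heavy fragmentation
print(fertility(monolingual_tok, sample))   # fertility of 1.0: no fragmentation
```

A large fertility gap like this correlates with degraded downstream accuracy, since the model must reassemble meaning from many short, low-information pieces.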
Other Works & Technical Notes
Join Us in Building Inclusive AI
MWirelabs invites researchers, educators, and developers to collaborate in shaping technology that reflects the world’s diversity.


