Low-Resource/Indigenous/Endangered Languages
“Particular challenges that confront linguists documenting endangered, minority, and Indigenous languages include limited resources, diversity or lack of standardized forms within a language (e.g., various writing systems are in use), and a general lack of online presence.”
- Group finding
Low-resource languages (languages with limited digital content, linguistic resources, and research attention) should not be overlooked in the age of AI. While the development of LLMs has led to remarkable advancements in natural language processing, addressing the challenges and opportunities of low-resource languages remains a critical and complex endeavor.
Liana Mankatah, Berk Ugurdag, Tomohiro Nozaki, Alazar Teffra, Tricia Estrada, Max Russo, and Zachary deMello first researched five low-resource languages: Siberian Yupik, Southern Kurdish, Ryukyuan, Assyrian, and Nahuatl. They then designed their own language technology product for speakers of these languages!
Cite this project as: Mankatah, L., Ugurdag, B., Nozaki, T., Teffra, A., Estrada, T., Russo, M., & deMello, Z. Low-Resource/Indigenous/Endangered Languages in the Age of AI. Under the supervision of professor Lara Bryfonski and teaching assistant Yuko Hirasawa. LING 1000: Introduction to Language, Georgetown University, Spring 2024.
Click on the links below to see other students' work on this topic:
Ahn, H., Goodwin, N., Ke, A., & Roder, W. Low Resource / Endangered Languages and AI. Under the supervision of primary instructors Travis Richardson and Xiang Li. LING 1000: Introduction to Language, Georgetown University, Fall 2023.
Alfonso Alcaide, N., Jobson, G., & Chen Wu, C. Low-Resource Languages in the Age of AI. Under the supervision of professor Tan (Laura) Yi and teaching assistant Josh Linden. LING 1000: Introduction to Language, Georgetown University, Fall 2023.
Larson, A., Murphy, K., & Kamath, T. Low Resource Languages in the Age of AI. Under the supervision of professor Danni Shi and teaching assistant Ping Hei Yeung. LING 1000: Introduction to Language, Georgetown University, Fall 2023.
For further information, we direct you to the following resources:
Digital Endangered Languages and Musics Archives Network (DELAMAN)
For more information about how speakers of low-resource languages turn to language technology for archiving their languages
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP) - ACL Anthology (2023)
A treasure trove of information about the study and revitalization of endangered languages through the use of computational linguistic tools
Additional Resources:
Ahmadi, S., Azin, Z., Belelli, S., & Anastasopoulos, A. (2023). Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki. In Proceedings of the Second Workshop on NLP Applications to Field Linguistics, pages 52–63, Dubrovnik, Croatia. Association for Computational Linguistics.
Blasi, D., Anastasopoulos, A., & Neubig, G. (2022). Systematic Inequalities in Language Technology Performance across the World’s Languages. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5486–5505, Dublin, Ireland. Association for Computational Linguistics.