Multilinguality and LLMs
"With Tajik, there appeared to be some inconsistencies in output translation, seemingly due to the influence of Iranian sources that were being drawn upon."
- Group finding
Large Language Models (LLMs) represent a breakthrough in artificial intelligence and natural language processing. These models are trained on massive amounts of text data, enabling them to acquire a deep understanding of linguistic structures, semantics, and context. In the era of LLMs, however, the spotlight has often been on major languages, such as English, that have abundant resources and data for training these models.
Multilinguality is the ability of LLMs to operate across multiple languages, leveraging their understanding of linguistic structures and patterns. While many LLMs were initially developed for high-resource languages, there is growing recognition of the importance of extending their benefits to low-resource languages, which have historically received less attention because of data scarcity and resource limitations.
In order to add to the diversity of LLMs, Sophie Mayle, Yvette Tseng, and Sarah Kennedy researched how such models handled French (a high-resource language) and Tajik (a low-resource language) input.
Cite this project as: Mayle, S., Tseng, Y., & Kennedy, S. Multilinguality and LLMs. Under the supervision of Professor Yi (Laura) Tan and teaching assistants Josh Linden. LING 1000: Introduction to Language, Georgetown University. Fall 2023.
Open-source LLM translators employed in this project included:
For further information, we direct you to the following resources:
Liu, Y., Ye, H., Weissweiler, L., Wicke, P., Pei, R., Zangenfeind, R., & Schütze, H. (2023). A Crosslingual Investigation of Conceptualization in 1335 Languages. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12969–13000, Toronto, Canada. Association for Computational Linguistics.
Ruder, S. (2022, November 14). The State of Multilingual AI.