Repository of Student Work
Final projects for the 2023-2024 academic year (Fall and Spring semesters) have now been uploaded to the repository. Using the drop-down menus below, you can learn a bit more about each of the six topics students could choose from for their semester-long research project on AI and linguistics issues.
Background Information
Before embarking on one of the six topics below, all students were first required to acquaint themselves with the basics of computational linguistics. Check out the "Background Information" tab above for the resources we shared with our students. They will most certainly serve as informative baselines for you and your students!
Ethical Concerns and Fairness Issues in AI/LLMs
AI tools such as ChatGPT can generate information because they are built on Large Language Models (LLMs), which are trained on large corpora of natural texts produced by everyday users of language. What ethical concerns are there about using others' work to train LLMs?
Cross-Cultural Considerations and Analytics in NLP Models
Natural Language Processing (NLP) tools have sometimes been flagged for using racial slurs, offensive language, and otherwise perpetuating bias in the responses they generate to user inquiries. What biases did our students discover, and what were their ideas for rectifying the issue?
Linguistic Analysis of AI-Generated Content: Phonetics/Phonology
Speech-to-text and text-to-speech technologies are among the most innovative and sought-after uses of artificial intelligence. As such, our students used their newly acquired skills and understanding of phonetics (the study of how sounds are produced and interpreted) and phonology (the study of how languages organize their sounds into systems) to judge the accuracy of popular text-to-speech technology.
Linguistic Analysis of Machine Translation Quality
Machine Translation (MT) - a tool with which most of us will be intimately familiar (thanks, Google Translate!) - takes input in one language and produces the equivalent in another language. There are scores of challenges MT tools must face in providing accurate translations, and our students put their newly acquired knowledge and understanding of semantics and syntax to good use in this project.
Multilinguality and Large Language Models (LLMs)
A core tenet of linguistics is that all languages - and varieties of language - are equally complex, culturally rich and important, and capable of communicating the intricacies of human thought. This belief, unfortunately, does not ensure that the world's languages are equally represented online. In fact, most languages are designated as low-resource, meaning that Large Language Models simply don't have sufficient training data to allow for accurate Machine Translation. What are some computational linguistic workarounds for this problem? Our students explored the issue!
Low-Resource/Indigenous/Endangered Languages in the AI Era
The ability of grassroots, often Indigenous, language activists to control, document, record, safely store, and manage recorded language is referred to as data sovereignty, and it is a paramount concern in contemporary efforts to preserve and revitalize the languages of Indigenous and minoritized peoples. Our students researched how archivists and speakers of five such languages are turning to AI for help in their efforts.