Cross-cultural Considerations and Analytics in NLP Models
“Treat AI as a mouthpiece for its corpus, not a thinking thing.”
- Group takeaway
The rapid advancement of Natural Language Processing (NLP) has produced increasingly sophisticated models capable of understanding and generating human language. However, these models often reflect the biases present in their training data, and in doing so they can perpetuate and amplify cultural biases. For instance, as the following student researchers discovered, text-generating programs often produce language that can be heard as racist, xenophobic, or transphobic.
Cite this project as: Bhowmik, R., Brown, M., Burke, S., Gaffigan, M., Ilitch, T., Jergins, A., & Landegger, L. Dogwhistles: Coded Rhetoric and Language Models. Under the supervision of Professor Lara Bryfonski and teaching assistant Xiang Li. LING 1000: Introduction to Linguistics, Georgetown University. Spring 2024.
Click on the links below to see other students' work on this topic:
Hemcher, A., & Barlow, O. Cross Cultural Considerations of LLM and NLP Development and Use. Under the supervision of Professor Yi (Laura) Tan and teaching assistant Josh Linden. LING 1000: Introduction to Linguistics, Georgetown University. Fall 2023.
Maloney, E., Sarafoglu, S., & Jaworski, H. Cross-Cultural Considerations and Analytics in NLP Models. Under the supervision of Professor Sue Lorenson and teaching assistants Lauren Levine and Evelyn Diaz-Iturriaga. LING 1000: Introduction to Linguistics, Georgetown University. Fall 2023.
For further information, we direct you to the following resources:
Arora, A., Kaffee, L., & Augenstein, I. (2023). Probing Pre-Trained Language Models for Cross-Cultural Differences in Values. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 114–130, Dubrovnik, Croatia. Association for Computational Linguistics.
Cao, Y., Zhou, L., Lee, S., Cabello, L., Chen, M., & Hershcovich, D. (2023). Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 53–67, Dubrovnik, Croatia. Association for Computational Linguistics.
Flavelle, D., & Lachler, J. (2023). Strengthening Relationships Between Indigenous Communities, Documentary Linguists, and Computational Linguists in the Era of NLP-Assisted Language Revitalization. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 25–34, Dubrovnik, Croatia. Association for Computational Linguistics.
Mendelsohn, J., Le Bras, R., Choi, Y., & Sap, M. (2023). From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15162–15180, Toronto, Canada. Association for Computational Linguistics.