Tell me a story and I’ll tell you where you’re from: dialect recognition using machine learning algorithms
Jerid Francom, Romance Languages
Friday 3/20 at 3pm
Greene Hall 528
Linguistic variation is a pervasive characteristic of languages. It occurs at all linguistic levels and is predicted by socio-demographic variables, time period, and geographical and political boundaries. Understanding how languages and language varieties differ has attracted much attention from public and academic communities, and for good reason: it reflects the unique ways in which humans interface with the world and holds the key to understanding language’s place in cognition.
In this talk I explore language variation through a lesser-traveled path: machine learning. Focusing on Spanish-language variation, I present results from a series of text classification tasks. These results suggest that variation among the Argentine, Mexican, and Spanish dialects represented in the ACTIV-ES Spanish-language corpus can be modeled and used to accurately predict where a speaker is from based on word choice alone.
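
For readers unfamiliar with text classification, here is a minimal sketch of the general idea of predicting a dialect label from word choice alone. It is not the method used in the talk, which the abstract does not specify: the choice of a bag-of-words Naive Bayes classifier with add-one smoothing is an assumption, the function names (`tokenize`, `train`, `predict`) are hypothetical, and the three training sentences are invented toy examples, not drawn from the ACTIV-ES corpus.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; real work would need a proper tokenizer.
    return text.lower().split()

def train(samples):
    """Count words per dialect label. `samples` is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of training texts
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy sentences built around well-known regional vocabulary
# (assumption: illustrative only, not corpus data).
TOY_SAMPLES = [
    ("che vos tenés que tomar el colectivo", "Argentina"),
    ("órale wey ya viene el camión", "Mexico"),
    ("vale tío vosotros cogéis el autobús", "Spain"),
]

model = train(TOY_SAMPLES)
guess = predict(model, "vos toma el colectivo")
```

With training data this small the sketch only illustrates the mechanics; any claim about accuracy would require a real corpus and held-out test data.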