How AI and 40 years of recordings can help detect the subtle forces that shape how we speak

The Philadelphia Neighborhood Corpus collected more than 400 audio recordings to study spoken language. AI is helping researchers understand the data.

A person holds one old cassette tape out of many. (Bigstock/Tik.Tak)


This story is from The Pulse, a weekly health and science podcast.

Find it on Apple Podcasts, Spotify, or wherever you get your podcasts.


Joshua Plotkin, a biology professor at the University of Pennsylvania, uses mathematics to study the forces that drive organisms like viruses to evolve. 

But now, with the help of an unlikely tool, he wants to use similar mathematical models to shed light on something equally dynamic: language.

“At the beginning, when evolutionary biology really got going with Darwin, it was actually really closely intercalated with language studies and cultural evolution,” said Plotkin. 

Vast amounts of textual data found within ancient libraries and archives gave linguists a head start over budding biologists in developing their own theories of change.

Without a similarly large fossil record, Charles Darwin cited the work of linguists to lend credence to his ideas on survival, competition, and fitness within the animal kingdom. 

But today, thanks to technological advancements like DNA analysis, Plotkin says much more is known about how organisms change than how cultural markers like language do — despite the existence of Google Books and social media platforms. 

Since antiquity, linguists have used written texts to show how words and grammar have transformed over time and trace the impact of inventions, migrations, and cultural shifts. 

Plotkin wanted to apply the math he used to study genes to digitized textual archives, hoping to uncover previously hidden patterns in language.

But when Plotkin shared his ideas with his linguist friends, they were skeptical. 

Not of his math, but of the data he wanted to use.

“They were kind of like, ‘That’s not really language,’” Plotkin said. 

Real language, they argued, wasn’t the polished prose found in books or even the casual scrawlings posted on someone’s Facebook wall. 

It was the everyday, fast-paced, messy conversations that people have in real-life interactions.

Spoken language is where subtle changes take root and spread from person to person, only to show up much later in written texts. 

But spoken language is also more difficult to study with traditional methods than written texts. Shifts in pronunciation and word usage happen in real time and are hard to capture and quantify.

“And I thought, if we just had some data set of how spoken language has changed over time. That would be like, perfect — a goldmine of really studying a real, important cultural change,” Plotkin said. 

Then his linguist friends told him that just such a treasure trove of data existed, and that it was just a short walk from his office.

 

The goldmine 

 

That goldmine is called the Philadelphia Neighborhood Corpus, a collection of more than 400 recordings of everyday Philadelphians, spanning from 1972 to 2012. 

The project, spearheaded by pioneering linguist William Labov, is an archive of the city’s linguistic evolution, preserved not in written form but in raw, recorded speech.

Meredith Tamminga, a linguist and one of Labov’s former students, now oversees the collection, which for years was stored in its physical form inside her mentor’s former lab in an old Victorian house on campus.

“I just lived in fear of the building burning down,” she said. “Because it was the only copies of all of these recordings from this epic span in the history of sociolinguistics.”


Today, the entire collection has been digitized and imported into a computer program.

Labov’s approach to data collection revolutionized the study of linguistics and birthed a new subfield: sociolinguistics. 

Earlier researchers gathered data by asking people who were seen as representative of certain populations to recite word lists, then writing down their pronunciations phonetically. Labov instead believed in recording natural conversations.

Armed with a tape recorder, he aimed to capture people’s everyday speech without the self-consciousness that comes with being observed.

He later analyzed this speech by measuring the sound waves people produced as they spoke, using those measurements to calculate the exact location of a speaker’s tongue.

He theorized that by collecting demographic data as well, he could identify the forces that shaped variations in human speech and use math to better understand them. 

His method worked. Labov found that people’s pronunciations varied based on social factors like class, and he was able to quantify how much those factors pushed people’s tongues in one way or the other.

For decades, Labov sent students into closely packed Philadelphia neighborhoods, hoping to find fertile ground for interaction and language change. 

“This class is really like no other class that I’ve ever taken,” Tamminga said. 

He provided students with strict instructions on how to introduce themselves and conduct interviews without letting on that they were interested in how subjects spoke. 

“One thing that Bill would sort of teach you to do was not to go out there in your Penn sweatshirt,” said Tamminga. “‘Try to dress like a normal person,’” he would tell his students.

Students asked residents personal questions about their lives and experiences. The goal was to get them talking — and talking naturally. These recordings, rich with unguarded speech, provide a window into how language evolves in real time.

 

Using new tech for old recordings  

 

Joshua Plotkin, the biology professor, was fascinated by what he found in these recordings. Not only had Labov and his team shown that certain vowels in Philadelphia had shifted dramatically over time — think “water” becoming “wooder” and back again — but Plotkin noticed something even more intriguing: speakers would often change how they pronounced the same word within a single conversation.

For example, in one recording, a woman says the word “work” twice — once with one pronunciation, and then differently just a few moments later. 

Plotkin had seen this kind of variation again and again in these recordings and started to suspect that, because this change was happening on such a short time scale, something other than social factors was at play.

“[Within a conversation,] there’s going to be sloppiness, and there might just be a chance that I mispronounce the vowel. But if I see a systematic thing happening, then that makes me start to think, well, maybe there’s something forcing this vowel to be produced in a certain way, or forcing it to change over time,” said Plotkin. 

He hypothesized that these shifts in pronunciation could be a subconscious effort to avoid miscommunication. Maybe, without even realizing it, the speaker was adjusting her pronunciation to make herself clearer, ensuring that her listener didn’t mistake “work” for a similar-sounding word like “walk.”

To test this theory, Plotkin turned to ChatGPT.

“Many linguists have thought [large language models] are going to be completely useless in the study of language. Like, why would you study text generated by an LLM? When you could study speech generated by humans?” he said.

But Plotkin was interested in taking the math under the hood of ChatGPT, which works like a highly advanced autocorrect machine, and applying it to the vast amounts of linguistic data held within the Philadelphia Neighborhood Corpus.

“They have 400 different recordings of different speakers encompassing on the order of 2 million words uttered,” said Plotkin. 

For each of those 2 million words, ChatGPT lets him quickly calculate how likely every possible word would be in the same context.

In the case of the woman’s pronunciation of “work,” the large language model suggested that “walk” was the most plausible alternative. The speaker may have intuitively adjusted her pronunciation to prevent confusion, unconsciously modulating her speech to ensure clarity.
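For readers curious what “calculating the likelihood of every possible word in a given context” can look like in practice, here is a minimal sketch in Python. It is not Plotkin’s code and it does not use ChatGPT. It swaps in a far simpler stand-in, a smoothed bigram model that looks only at the single preceding word, and the tiny transcript, the function names, and the choice of smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Invented toy transcript standing in for transcribed speech (an assumption,
# not data from the Philadelphia Neighborhood Corpus).
TOY_TRANSCRIPT = (
    "i like to work . i like to walk . i have to work . "
    "we like to walk home . they go to work ."
).split()


def train_bigram_counts(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev_word, next_word in zip(tokens, tokens[1:]):
        counts[prev_word][next_word] += 1
    return counts


def next_word_probabilities(counts, vocab, prev_word):
    """Add-one smoothed P(word | prev_word) for every word in the vocabulary."""
    following = counts[prev_word]
    total = sum(following.values()) + len(vocab)
    return {word: (following[word] + 1) / total for word in vocab}


def ranked_alternatives(probs, spoken_word):
    """Words other than the one spoken, ordered from most to least likely."""
    others = [(w, p) for w, p in probs.items() if w != spoken_word]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


counts = train_bigram_counts(TOY_TRANSCRIPT)
vocab = sorted(set(TOY_TRANSCRIPT))

# Context: the speaker has just said "to" and then says "work".
probs = next_word_probabilities(counts, vocab, "to")
top_alternatives = ranked_alternatives(probs, "work")[:3]

for word, prob in top_alternatives:
    print(f"{word}: {prob:.3f}")
```

In this toy transcript, the word that competes most strongly with “work” after “to” is “walk,” simply because the invented sentences were written that way. A large language model does the same kind of bookkeeping with a much richer notion of context, which is what makes it practical to ask the question for every word in a large corpus.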

Plotkin is still in the early stages of his research, but with access to the context-rich Philadelphia Neighborhood Corpus and the analytical power of ChatGPT, he’s optimistic. 

If he and his team can identify these patterns in millions of instances, it could reveal new insights into the subtle forces that shape how we speak, and how those shifts spread across a population.
