The Internet is a realm of anonymity. But even the most careful programmers — including many cybercriminals — have discernible quirks. Researchers at Drexel University have now figured out how to tap into them.
“I found the stylistic fingerprints in coding style, which make programmers quite identifiable,” said Aylin Caliskan-Islam, a graduate student in Drexel’s Privacy, Security and Automation Lab, and lead researcher on the project.
The method analyzes a piece of code for a variety of stylistic features, including how coders use spaces, name their variables, and structure their programs, then tries to match it to code samples it has already seen. By examining the code's underlying “grammar” and applying a machine learning technique, the team has achieved unprecedented accuracy, Caliskan-Islam said.
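To make the idea concrete, here is a minimal sketch in Python of style-based matching. It is not the Drexel team's method: the article does not name their features or their machine learning technique, so the five features below, the `style_features` and `closest_author` helpers, and the nearest-neighbor comparison are all illustrative assumptions, and the two "authors" are invented.

```python
import math
import re

# Words treated as language keywords rather than programmer-chosen names (illustrative list).
KEYWORDS = {"def", "return", "for", "in", "if", "else", "while", "import"}

def style_features(source):
    """Turn source code into a small vector of layout and naming habits (hypothetical features)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    names = [w for w in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source) if w not in KEYWORDS]
    n_lines = max(len(lines), 1)
    n_names = max(len(names), 1)
    spaced_ops = len(re.findall(r" [=+\-*/<>] ", source))
    all_ops = max(len(re.findall(r"[=+\-*/<>]", source)), 1)
    return [
        sum(ln.startswith("\t") for ln in lines) / n_lines,               # share of tab-indented lines
        spaced_ops / all_ops,                                             # share of operators padded with spaces
        sum("_" in w for w in names) / n_names,                           # share of snake_case names
        sum(bool(re.search(r"[a-z][A-Z]", w)) for w in names) / n_names,  # share of camelCase names
        min(sum(len(w) for w in names) / n_names / 20.0, 1.0),            # average name length, scaled to 0..1
    ]

def closest_author(known_samples, anonymous_code):
    """Return the known author whose sample is nearest in style (simple nearest-neighbor stand-in)."""
    target = style_features(anonymous_code)
    return min(known_samples, key=lambda a: math.dist(style_features(known_samples[a]), target))

# Two invented authors with different habits: spaces and snake_case vs. tabs and camelCase.
known = {
    "alice": "def total_sum(item_list):\n    run_total = 0\n    for item_val in item_list:\n"
             "        run_total = run_total + item_val\n    return run_total\n",
    "bob": "def totalSum(itemList):\n\trunTotal=0\n\tfor itemVal in itemList:\n"
           "\t\trunTotal=runTotal+itemVal\n\treturn runTotal\n",
}
anonymous = ("def maxValue(dataList):\n\tbestSoFar=dataList[0]\n\tfor curItem in dataList:\n"
             "\t\tif curItem>bestSoFar:\n\t\t\tbestSoFar=curItem\n\treturn bestSoFar\n")

print(closest_author(known, anonymous))  # prints "bob"
```

The anonymous sample does something different from either known sample, yet its tabs, unpadded operators, and camelCase names place it next to the second author. A real system would need far richer features and many more samples per programmer than this toy comparison.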
“I’m able to de-anonymize 250 programmers with 95 percent accuracy,” said Caliskan-Islam, who also worked with colleagues at the Army Research Laboratory and Princeton University, among others.
Previous efforts have been limited to picking out an author from just 50 options.
The technique can also rule out every known candidate when a piece of anonymous code was written by someone completely new to the system, a more realistic scenario for detectives working to track down the creator of malware.
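One simple way to picture this "none of the above" case is a cutoff on how close a match must be. The Python sketch below is an assumption for illustration only: the article does not describe how the researchers reject unknown authors, and the `attribute_or_reject` helper, the style vectors, and the 0.5 cutoff are all hypothetical.

```python
import math

def attribute_or_reject(known_profiles, sample_vector, max_distance):
    """Name the nearest known author, or return None if even the nearest is too far away."""
    best_author, best_dist = None, math.inf
    for author, profile in known_profiles.items():
        d = math.dist(profile, sample_vector)
        if d < best_dist:
            best_author, best_dist = author, d
    # Hypothetical rule: a weak best match means the author is probably not in the database.
    return best_author if best_dist <= max_distance else None

# Invented style vectors for two known programmers.
profiles = {"alice": [0.0, 1.0, 1.0], "bob": [0.8, 0.0, 0.0]}

print(attribute_or_reject(profiles, [0.75, 0.1, 0.0], 0.5))  # close to bob's profile
print(attribute_or_reject(profiles, [0.4, 0.5, 0.5], 0.5))   # far from both, so None
```

Choosing the cutoff is the hard part in practice: too strict and known authors are missed, too loose and strangers are pinned on someone in the database.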
In addition to aiding law enforcement, the tool could be used to settle copyright and plagiarism disputes.