A new AI chatbot might do your homework for you. But it’s still not an A+ student

Emma Bowman

December 19, 2022

Enter a prompt into ChatGPT, and it becomes your very own virtual assistant. (OpenAI/Screenshot by NPR)

Why do your homework when a chatbot can do it for you? A new artificial intelligence tool called ChatGPT has thrilled the Internet with its superhuman abilities to solve math problems, churn out college essays and write research papers.

Since the developer OpenAI released the text-based system to the public last month, some educators have been sounding the alarm about the potential that such AI systems have to transform academia, for better and worse.

“AI has basically ruined homework,” said Ethan Mollick, a professor at the University of Pennsylvania’s Wharton School of Business, on Twitter.

The tool has been an instant hit among many of his students, he told NPR in an interview on Morning Edition, and its most immediately obvious use is as a way to cheat by plagiarizing the AI-written work.

Academic fraud aside, Mollick also sees its benefits as a learning companion.

He’s used it as his own teacher’s assistant, for help with crafting a syllabus, a lecture, an assignment and a grading rubric for MBA students.

“You can paste in entire academic papers and ask it to summarize it. You can ask it to find an error in your code and correct it and tell you why you got it wrong,” he said. “It’s this multiplier of ability, that I think we are not quite getting our heads around, that is absolutely stunning.”

A convincing — yet untrustworthy — bot

But the superhuman virtual assistant — like any emerging AI tech — has its limitations. ChatGPT was created by humans, after all. OpenAI has trained the tool using a large dataset of real human conversations.

“The best way to think about this is you are chatting with an omniscient, eager-to-please intern who sometimes lies to you,” Mollick said.

It lies with confidence, too. Its tone is authoritative, yet there have been instances in which ChatGPT won’t tell you when it doesn’t have the answer.

That’s what Teresa Kubacka, a data scientist based in Zurich, Switzerland, found when she experimented with the language model. Kubacka, who studied physics for her Ph.D., tested the tool by asking it about a made-up physical phenomenon.

“I deliberately asked it about something that I thought that I know doesn’t exist so that they can judge whether it actually also has the notion of what exists and what doesn’t exist,” she said.

ChatGPT produced an answer so specific and plausible sounding, backed with citations, she said, that she had to investigate whether the fake phenomenon, “a cycloidal inverted electromagnon,” was actually real.

When she looked closer, she found that the alleged source material was also bogus. The citations listed the names of well-known physics experts, but the titles of the publications they supposedly authored were nonexistent, she said.

“This is where it becomes kind of dangerous,” Kubacka said. “The moment that you cannot trust the references, it also kind of erodes the trust in citing science whatsoever.”

Scientists call these fake generations “hallucinations.”

“There are still many cases where you ask it a question and it’ll give you a very impressive-sounding answer that’s just dead wrong,” said Oren Etzioni, the founding CEO of the Allen Institute for AI, who ran the research nonprofit until recently. “And, of course, that’s a problem if you don’t carefully verify or corroborate its facts.”

An opportunity to scrutinize AI language tools

Users experimenting with the free preview of the chatbot are warned before testing the tool that ChatGPT “may occasionally generate incorrect or misleading information,” harmful instructions or biased content.

Sam Altman, OpenAI’s CEO, said earlier this month it would be a mistake to rely on the tool for anything “important” in its current iteration. “It’s a preview of progress,” he tweeted.

The failings of another AI language model unveiled by Meta last month led to its shutdown. The company withdrew its demo for Galactica, a tool designed to help scientists, just three days after it encouraged the public to test it out, following criticism that it spewed biased and nonsensical text.

Similarly, Etzioni says ChatGPT doesn’t produce good science. For all its flaws, though, he sees ChatGPT’s public debut as a positive. He sees this as a moment for peer review.

“ChatGPT is just a few days old, I like to say,” said Etzioni, who remains at the AI institute as a board member and advisor. It’s “giving us a chance to understand what [it] can and cannot do and to begin in earnest the conversation of ‘What are we going to do about it?’ ”

The alternative, which he describes as “security by obscurity,” won’t help improve fallible AI, he said. “What if we hide the problems? Will that be a recipe for solving them? Typically — not in the world of software — that has not worked out.”