ChatGPT and other artificial intelligence programs have taken higher education by storm, and students and professors are giddy with the classroom possibilities. A new study shows that in the short term, AI may do scholars more harm than good. It also concludes that because of the sudden ubiquity of the technology, classrooms need more AI chatbots, not fewer, so their impact may be better understood.
HEC Paris professor Brian Hill has written an academic paper based on a study he conducted in a first-year master’s-level class in behavioral economics. The study reveals a reduction in performance when students use AI chatbots to help with their assignments — and suggests that without a better understanding of the tech involved, the professionals of tomorrow may do a significantly worse job when aided by AI than they do working without it.
“Our classroom experiment suggests that there may be situations in which the professionals of tomorrow do a considerably worse job when aided than when working alone — perhaps due to biases that have been long understood, perhaps due to some that remain to be further explored,” Hill says.
CLASSROOM EXPERIMENT REVEALS CLASSROOM CHALLENGES AHEAD
Hill, a professor of economics and decision sciences and research director at the French National Centre for Scientific Research, examined the use of ChatGPT in a class assignment in which 49 HEC Paris students participated.
Each student was randomly assigned two out of 14 case studies for their course. For one, they were asked a question and had to produce an answer from scratch. For the other, students were provided with a response to the relevant question, asked to evaluate how well it fared, and asked to correct it if necessary. Students were not told whether the response was from ChatGPT or another student. The final answers provided by students in both the answer task and the correction task were marked using the same grading scheme. What counted was that, via correction or alone, they provided a full answer.
The result: The average grade on the “correct” task was 28% lower than the average grade on the “answer” task, where students wrote their own answers from scratch, with students dropping 23 marks out of 100 on average when they corrected a provided response.
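The two figures describe the same gap in different ways: 23 marks is the absolute drop, while 28% is that drop relative to the answer-task average. As a rough illustration only, the short Python sketch below backs out the averages those two rounded figures would imply. The variable names are hypothetical, the implied averages are not reported in the article, and the calculation assumes the 28% is simply the 23-mark drop divided by the answer-task average.

```python
# Figures reported in the article (both rounded).
point_drop = 23        # marks lost out of 100 on the "correct" task
relative_drop = 0.28   # the same gap as a share of the "answer"-task average

# Assumption (not stated in the article): relative_drop = point_drop / answer_avg,
# so the two averages can be backed out approximately.
implied_answer_avg = point_drop / relative_drop
implied_correct_avg = implied_answer_avg - point_drop

print(f"Implied answer-task average:  {implied_answer_avg:.1f} / 100")
print(f"Implied correct-task average: {implied_correct_avg:.1f} / 100")
```

Because both inputs are rounded, the implied averages are ballpark values, not results from the study.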
STUDY ARGUES FOR MORE CHATBOTS IN THE CLASSROOM, NOT LESS
In short, students scored on average 23 marks out of 100 higher (a 28% gap) when answering alone than when starting from a provided, ChatGPT-generated response, a significant reduction in performance for the correction task. Hill notes that one possible explanation for this discrepancy could be confirmation bias.
The study suggests the need for more research into performance at the human-AI chatbot interface, Hill says — and it argues for “more, rather than less, chatbots in the classroom,” he adds. “One of the skills of the future, that we will need to learn to teach today, is how to ensure that they actually help.
“While the answer task is arguably representative of traditional work practices, the correct task may correspond more closely to many jobs in the future. If AI tools become as ubiquitous as many predict, the human role will be to evaluate and correct the output of an AI — precisely as asked of students in this task.”