The whole experiment started as a dinner table conversation. Christian Terwiesch’s older child was interested in emerging AI tools that recognize, process, and create images. The younger was interested in the code generation capabilities of OpenAI’s remarkable new chatbot, ChatGPT.
“Dad, you teach at a university,” one of his kids suggested. “See how this thing would do on your exam.”
Terwiesch, the Andrew M. Heller professor of operations at University of Pennsylvania’s Wharton School, did not expect much at that point. The family sat at a computer and pasted into the chatbot’s prompt, with no editing, the first question from Terwiesch’s final exam for his Wharton Operations Management course.
“I was just blown away. The answer was correct, and it was really well-worded and well-reasoned,” Terwiesch tells Poets&Quants. “I said, ‘Wow, this machine is awesome.’”
‘WOULD CHATGPT GET A WHARTON MBA?’
According to “Would ChatGPT Get a Wharton MBA?”, a white paper Terwiesch published on January 17, he would have given the chatbot an A+ on that first question, which asked students to identify the bottleneck in a seven-part process in an iron-ore refinery.
ChatGPT, which stands for Chat Generative Pre-trained Transformer, earned another A+ on the second question on inventory turns and working capital requirements. Terwiesch thought he was sitting next to some kind of wonder machine.
Then it “came back down to Earth,” he says. On the third question, which was more complex, the computer chatbot couldn’t do the math.
“In a sense, that is an irony, because it’s good at reasoning, telling stories, humor – all really human things,” Terwiesch says. “The things that humans have always been bad at – like mathematics – we kind of thought a computer would be good at.”
In the end, the chatbot scored well on three of the five questions; Terwiesch says he would have given it a final grade of B or B-. Not too shabby for a fledgling bot taking its first final at one of the most prestigious business schools in the world.
TO BAN OR NOT TO BAN
Soon after its launch in November, ChatGPT went viral. It dispensed relationship advice, composed original rap lyrics about Elon Musk (an OpenAI cofounder), and demonstrated an ability to write, debug and explain computer code. It also triggered a debate that weighed the possibilities of its astonishing capabilities against the potential for people to abuse them.
This month, the Stanford Daily reported that 17% of surveyed Stanford University students said they used the chatbot in their final exams and assignments. Of those, 7.3% reported that they’d submitted written material from ChatGPT with edits, and 5.5% reported submitting ChatGPT material with no edits whatsoever. New York City’s public schools blocked ChatGPT access on school computers and networks out of fear of cheating.
Terwiesch, who serves as chair of Wharton’s Operations, Information, and Decisions Department and co-director of Penn’s Mack Institute for Innovation Management, makes clear that plugging his final into the chatbot was not a careful research study. It was simply an experiment spurred by his dinner table discussion. But the results and the implications he wrote about afterwards set off their own flurry of headlines, reigniting a debate about what business schools should do with this wondrous new technology: ban it or use it.
Terwiesch hopes the conversation will be a bit more nuanced.
These conversations, happening now at business schools around the world, echo those from nine years earlier. That’s when MOOCs (Massive Open Online Courses) first exploded onto the scene and, some believed, would sound the death knell for all but the top-ranked MBA programs. Instead of shunning the new technology, Terwiesch was among the first business school faculty to make his MBA coursework available to the general public. Contrary to what some thought, video didn’t kill the classroom star.
Poets&Quants had the chance to talk with Terwiesch at length this week about his white paper and the implications of ChatGPT on business education. The conversation has been edited for length and clarity.
You explain in your white paper how ChatGPT scored A+ grades on the first two questions of the exam, but then did worse on the third. Can you explain where it tripped up?
It messed up in two ways: First, it really got lost in the question, but I give it great credit that it recovered when I gave it human hints. So, in addition to the questions with the command line, you can add comments that build upon and add to the initial question. It really took those into account and changed its quote unquote “thought process.”
Second, it failed bitterly at the math. I’m not an AI expert, but it’s my understanding that this has to do with the nature of how the chatbot works. It’s basically a prediction machine that makes probabilistic statements based on the patterns of characters and words it has seen in the question. That works well with language. I think the earlier versions of the technology were trained on Shakespeare texts. If you’ve read all of Shakespeare’s work over and over again, you kind of get the rhythm of it and can extrapolate.
Math doesn’t work this way. It works for 2 + 2 = 4 because that statement is found often enough on the internet that it’s able to make a prediction. But as you get into bigger numbers – even for multiplication tables or for problems a fourth grader could do in five minutes with a pen and paper – there’s just not enough evidence available that the machine has seen on the internet. So, those simple things it really struggles on.
Is it conceivable that it will get better at those kinds of things over time?
I’m not a technology expert, but my gut feel is they will add another layer of software on top of that. For example, when the machine needs to generate its next word, it will use its current prediction method. When it needs to make a mathematical statement, it will reach out to a more traditional calculator and outsource that part.
When you get to five- or six-digit numbers, though, you can’t have it memorizing solutions. That’s like a kid memorizing the multiplication table as opposed to being able to do the multiplication cognitively.
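The layering Terwiesch describes can be pictured with a short sketch. It is only an illustration under stated assumptions, not how ChatGPT or any real product is built: the function names `calculate`, `predict_text`, and `answer` are hypothetical, and `predict_text` is a placeholder standing in for the language model. The idea is simply that arithmetic is handed to an exact calculator while everything else goes to the word predictor.

```python
import ast
import operator

# Hypothetical illustration only: none of these names come from the article
# or from any real chatbot API.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression: str):
    """Exactly evaluate +, -, *, / over numeric literals, without eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("not a plain arithmetic expression")

    return walk(ast.parse(expression, mode="eval"))


def predict_text(prompt: str) -> str:
    """Placeholder for the language model's word-by-word prediction."""
    return f"[model-generated reply to: {prompt}]"


def answer(prompt: str) -> str:
    """Send pure arithmetic to the calculator, everything else to the model."""
    try:
        return str(calculate(prompt))
    except (ValueError, SyntaxError):
        return predict_text(prompt)


print(answer("48213 * 7716"))
print(answer("Identify the bottleneck in the refinery process"))
```

In this sketch the multiplication is computed rather than recalled, which is the difference Terwiesch draws between memorizing a multiplication table and actually doing the multiplication.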
What was your initial reaction after realizing ChatGPT scored a B to B- on your final exam? Was it interesting, alarming, what?
Initially, I was just like, “Wow. We’re living in exciting times.” You know, I took a class on neural networks when I went to college 30-plus years ago, and these things were so lame then. Seeing it now is just unbelievable. You cannot even imagine how the world will look in 10 years. So my first reaction was just being in awe.
I think there are two types of alarms here. One is the type of alarm that this thing is going to pass my tests and many other tests – AP exams, college entrance exams, homework assignments, etc. – so, we have to rethink education a bit. Then there’s the other alarm, the one raised by people like Sam Harris: Is AI taking over the world? Does this machine have consciousness? I think both of those are interesting.
On the educational front, I think every student in my class is welcome to use it for case discussion and preparing for class. If anything, I think it would add to the dynamics and the action of an MBA classroom where people get outside advice. I mean, it’s how the business world works. You have two people bargaining with each other: On one side is advice, maybe by McKinsey, and the other one is an investment banker from Goldman Sachs. Being able to critically look at what advisors tell you and realize when the advisor is wrong, and potentially reconciling conflicting opinions, I think that makes the MBA classroom a richer place that is closer to the real world.
I do think we have to revisit testing. But I think for classroom discussion, this is going to be great. I mean, again, we educate MBAs to be leaders in a moving, dynamic, complex world and with this technology, this world just moves a little faster, becomes a little more dynamic. If anything, the demand for MBA students has only gone up.