
'The results astonished me': what happens when ChatGPT takes a university exam?

'In the GenAI era, the only way that we can guarantee academic integrity, and ensure the person being given a qualification actually knows something about the subject, is through supervised, in-person assessment.'

Analysis: It's clear that GenAI tools can already effectively deal with and solve many of the topics taught to university students

As a lecturer in UCC's School of Engineering, I decided to ask ChatGPT to attempt some of the exam questions I had previously given to my undergraduate students. I got ChatGPT and DeepSeek to attempt my Electrical Power Engineering summer 2024 exam paper and compared their performance with that of our undergraduate students.

I fed all the exam paper questions to ChatGPT with no additional information or prompts. I then graded ChatGPT’s output, carefully following the marking scheme I used to mark the 61 undergraduate students that sat this exam in May 2024.

The results astonished me. GPT-4o mini, the most widely used free version of ChatGPT, did not just pass the exam: it scored 90%, while DeepSeek-R1 did even better with 94%. Both produced the answers to each exam question in seconds, without seeing any of the lectures or tutorial materials that my undergraduate students received.


From RTÉ Radio 1's Brendan O'Connor Show, tech journalist, and host of the For Tech's Sake podcast, Elaine Burke, reviews ChatGPT4o

For comparison, the average performance across the 61 UCC students that sat this exam in May 2024 was 63.8%. It is worth bearing in mind that these are third-level students, who all required higher-level maths and a minimum of 520 Leaving Cert points just to enter the UCC Engineering programme.

Like many of us, I have been a casual user of Generative AI tools such as ChatGPT for the last couple of years and was aware of the vulnerability of certain university assessment methods to GenAI. I would be wary of asking students to write text on a reasonably well-known topic in a non-supervised environment, as there is a risk that at least some students will delegate this task to ChatGPT. But I had no idea how effective GenAIs already are at some of the common engineering problems that are taught to university students.

I then tested ChatGPT and DeepSeek on more advanced and specialised engineering topics (for example, Masters-level exam papers) and the results were mixed. Strange errors and logical inconsistencies emerged when they were presented with more challenging engineering problems. It is important to stress that GenAIs cannot be trusted to give reliable responses to maths and engineering problems and are absolutely not a substitute for a qualified expert if the output is important. Despite this, their performance on university exams is impressive.


From RTÉ Radio 1's Drivetime, why the new DeepSeek AI app is a game-changer

A recent study at the École Polytechnique Fédérale de Lausanne in Switzerland attempted to answer the question 'could ChatGPT get an engineering degree?'. The answer, amazingly, was 'yes'. It demonstrated that AI assistants, such as ChatGPT, "answered at least 65.8% of examination questions correctly across 50 diverse courses in the technical and natural sciences". Others have demonstrated GenAIs passing key professional examinations in the fields of law and medicine.

The implications of all this for how we teach in universities are huge. In the short term, we need to be extremely careful with the type of student assessments that we use. Traditionally, the most important assessment method in universities has been the formal end-of-semester exam. Students are crammed into an exam hall and provide answers, via pen and paper, to a printed exam. These formal exams are strictly invigilated so there is no question of cheating.

However, formal exams are often seen as old-fashioned. Many argue that traditional end-of-semester exams place too much stress on students, and that the resulting grades can reward a student’s "exam technique", rather than their actual ability in the topic being tested. Accordingly, universities have been encouraging more continuous assessment, such as multiple short assignments and online tests spread out over the duration of the university module, rather than one big written exam at the end. There has also been a shift towards online teaching and learning, a trend accelerated by Covid-19.


Going online has many benefits, allowing students greater flexibility and enabling university programmes to be offered long-distance, but many forms of online assessment now look extremely vulnerable due to rapid advances in GenAI. The integrity of university programmes that are completely online/remote is in serious doubt.

ChatGPT, DeepSeek and others can make convincing attempts at a whole range of assessments, be it writing text for a project report or essay, coding, or solving mathematical problems. In the GenAI era, the only way that we can guarantee academic integrity, and ensure the person being given a qualification actually knows something about the subject, is through supervised, in-person assessment. Our old friend, the traditional pen-and-paper formal exam, is making a massive comeback.


Attempting to police or ban the use of GenAI is futile. Some of my academic colleagues are old enough to remember students being prevented from taking calculators into exams on the basis that they needed to know the multiplication tables by heart. GenAI is here to stay, with newer, more capable tools being rolled out every few weeks.

We should encourage students to use these tools judiciously and responsibly, and to be transparent about using them. We need to continue to teach the fundamentals properly and instil the critical thinking skills that allow students to know whether to trust a GenAI for a given task.

And just in case you were wondering: no, I didn’t use ChatGPT to write this article.



The views expressed here are those of the author and do not represent or reflect the views of RTÉ