A recent study from Harvard University caught my attention: it provides perhaps the most rigorous evaluation of GPT-4-based tutoring to date, and what makes it significant is not merely its findings but what it measured against. In a randomised controlled trial with 194 undergraduate physics students, a carefully designed AI tutor outperformed in-class active learning, the kind of research-informed pedagogy that decades of evidence show substantially beats traditional lectures.
This was not a comparison against passive instruction or weak teaching (as in many studies), but against well-implemented active learning delivered by highly rated instructors in a course specifically designed around pedagogical best practices. The AI tutor produced median learning gains more than double those of the classroom group.
...
[...] this study took place in a context where students had access to expert instruction, course staff, and peer collaboration. The AI tutor supplemented human teaching; it did not replace it. The proper comparison is not “AI versus teachers” but rather “AI-supported instruction versus conventional instruction.”
...
[...] the Harvard study is not an isolated finding. ASSISTments, a mathematics tutoring platform evaluated in two large-scale randomised controlled trials involving thousands of students, achieved effect sizes of 0.18 to 0.29 standard deviations on standardised tests, with the largest gains for struggling students; it earned the highest evidence rating, ESSA Tier 1, at a cost of less than £100 per student. Carnegie Learning’s MATHia, tested with over 18,000 students across 147 schools, produced effect sizes ranging from 0.21 to 0.38 standard deviations.
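For readers less steeped in the education literature: these effect sizes are standardised mean differences (Cohen’s d), the gap between treatment and control group means divided by their pooled standard deviation. A minimal sketch of the arithmetic, using invented scores chosen to land near the reported 0.2 to 0.3 range, not data from either study:

```python
import statistics

def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardised mean difference: (mean_t - mean_c) / pooled SD."""
    n_t, n_c = len(treatment), len(control)
    mean_t, mean_c = statistics.fmean(treatment), statistics.fmean(control)
    var_t, var_c = statistics.variance(treatment), statistics.variance(control)
    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (mean_t - mean_c) / pooled_sd

# Invented post-test scores (percent correct), not data from ASSISTments
# or MATHia; chosen so d lands near the reported 0.2-0.3 range.
control = [62, 58, 71, 65, 60, 68, 55, 63]
treatment = [64, 59, 72, 67, 61, 69, 56, 65]
print(f"d = {cohens_d(treatment, control):.2f}")  # d = 0.26
```

Per student, a quarter of a standard deviation is modest; at the scale of thousands of students, and at under £100 a head, it is a bargain.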
...
Yet there is a troubling paradox at the heart of AI tutoring. The very same technology that can produce effect sizes above 0.7 standard deviations can also make students demonstrably worse at learning. And I would argue that the harmful version is the one most students are using today.
...
Recent research is beginning to quantify what teachers have long suspected: when AI does the thinking, students stop doing it themselves. In a 2025 mixed-methods study published in Societies, Michael Gerlich found that frequent AI tool use was strongly negatively correlated with critical thinking ability, largely because of a mechanism known as cognitive offloading. The more participants relied on AI to remember, decide, or explain, the less capable they became of reasoning independently. Younger users (those most immersed in generative tools) showed the greatest decline. Gerlich describes this as a kind of “cognitive dependence”: efficiency rising as understanding falls. It’s precisely this frictionless fluency that creates the illusion of learning: the sense that one is mastering material when, in fact, the machine is doing the mastery.
...
The distinction between AI systems that enhance learning and those that destroy it is not about the underlying technology; GPT-4 powered both the highly effective Harvard tutor and the ineffective tools students use to avoid thinking. The difference lies entirely in design. The Harvard system was engineered to resist the natural tendency of LLMs to be maximally helpful. It was constrained to scaffold rather than solve, to prompt retrieval rather than provide answers, to increase rather than eliminate cognitive load at the right moments.
ChatGPT, by contrast, is optimised for frictionless task completion. It will happily write your essay, solve your equation, explain the concept you should be puzzling through yourself. It is designed to be helpful, not to promote learning, and those are fundamentally different objectives.
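The difference is easy to sketch in code. Below is a minimal illustration of a scaffold-not-solve constraint, assuming the OpenAI chat completions API; the prompt wording is my own invention for illustration, not the Harvard tutor’s actual instructions, and a real system would layer worked-problem context, answer checking, and progress tracking on top.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative constraints only; the point is that the pedagogy lives in
# restrictions on the model's default helpfulness, not in the model itself.
TUTOR_SYSTEM_PROMPT = """You are a physics tutor. Never state the final \
answer or complete a derivation for the student. Instead:
- Ask one guiding question at a time.
- Prompt the student to recall the relevant principle before applying it.
- If the student is stuck, reveal only the next smallest step.
- If the student asks for the answer outright, decline and redirect."""

def tutor_turn(history: list[dict]) -> str:
    """Run one turn of the tutoring dialogue under the scaffolding prompt."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": TUTOR_SYSTEM_PROMPT}, *history],
    )
    return response.choices[0].message.content

history = [{"role": "user", "content": "Just give me the answer to part (c)."}]
print(tutor_turn(history))
```

The same model with the system prompt deleted is, functionally, the homework-completion tool described above.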
...
But these flaws are not going to stay flaws forever. [...]
...
Unlike improvements in human tutoring, which spread slowly through training programmes and institutional change, improvements to AI tutors propagate across millions of students at once. A breakthrough discovered whilst tutoring one student in Singapore becomes available to every student everywhere within hours.
We may currently be in the flat part of the curve, where AI tutors still lag behind skilled human tutors in various dimensions. But the trajectory is clear, and the mechanisms are in place. The question is not whether AI will eventually provide more effective instruction than human tutors (the positive feedback loops make this inevitable) but how quickly we reach the knee in the curve, and whether we possess the wisdom to deploy this capability well.
Really interesting. What I find the most compelling is the discussion of the different sorts of AI tool use — offloading your thinking vs. having the AI help you think (primarily, as they note, by collecting facts for you, not thoughts), but I also thought this was spot-on:
A really, really uncomfortable truth about good teaching is that it doesn’t scale very well. Teacher expertise is astonishingly complex, tacit, and context-bound. It is learned slowly, through years of accumulated pattern recognition: seeing what a hundred different misunderstandings of the same idea look like, sensing when a student is confused but silent, knowing when to intervene and when to let them struggle. These are not algorithmic judgements but deeply embodied ones, the result of thousands of micro-interactions in real classrooms. That kind of expertise doesn’t transfer easily; it can’t simply be written down in a manual or captured in a training video.
This is why education systems rarely improve faster than their capacity to grow and retain great teachers. Pedagogical excellence replicates poorly because it resides not in tools or curricula but in people, and people burn out, move on, or are asked to do too much. Every nation that has tried to systematise good teaching eventually runs into the same constraint: human expertise does not compound exponentially.
This doesn't surprise me and I'm excited to read more about this. Education is one place where I think AI has the potential to be revolutionary, and to bring higher quality education to groups that historically have the least access to it. Of course it can also be done very wrong, but I'm optimistic.