I suspect that the most common way teachers detect AI use at the middle and high school level is that the writing is at a level well above what the student is actually capable of producing. It's easy enough to ask a student the meaning of a word they used, or to have them summarize their own work, and fail them if they can't.
That leaves successful cheating only to students who are at least smart enough to edit their "work" into something they are actually capable of writing.
College is likely trickier because professors tend to be less familiar with the capabilities of individual students. But college essay prompts are more specific, and I suspect the vague answers that LLMs tend to give won't be adequate for longer papers. Or perhaps the student would need to prompt body paragraph by body paragraph? That would at least require understanding essay construction.
Someone I follow on social media is dealing with a false accusation of AI use in grad school, based on a "60% of this essay is AI" result. I think grad students are more likely to get falsely flagged, personally, because they are more likely to use the kind of language that a lot of the detectors flag. (Neurodivergent people are likely to get flagged too.)
One of the more common strategies I've seen, and what saved this student, was working in Google docs and being able to pull detailed timelines of drafts and edits. That only works so far though.
My high school sophomore daughter is taking a college English class outside of school and has had a similar experience. I literally watched her write an essay (in the form of reviewing her drafts at many stages of writing) and know she didn't use any AI. Her teacher, however, gave her a zero and said her essay was flagged by an AI checker as having a high percentage chance of being written by AI. My daughter then ran the essay through every AI checker she could find and saw a wide variety of high and low scores. When she talked to the teacher about this, the teacher's response was only, "You can tell me that you are not using AI to write your essay, but how do I know that?" She went on to tell her that she should use AI checkers on future assignments to make sure her submissions don't appear AI-assisted. So my daughter's writing style (for this class at least) is now adapting to the fickle selective pressures of a gauntlet of AI checkers.
Just pepper it with spelling errors and incorrect punctuation! That's a surefire way of knowing a human wrote it! (That's the lesson we should be learning from this right?)
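Honestly, I was very tempted to call the teacher myself and ask if that's what she was expecting.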
Some of the training data LLMs are most heavily tuned on is the corpus of published works written by skilled authors, whether academic papers, novels, or nonfiction books. As something that more or less just predicts the statistical likelihood of one word following the chain of prior output words, like tapping the word suggestions on a phone keyboard, the biggest distinguishing mark of LLM output is that it's statistically likely to sound like quality writing from a professional.
Thus, this is literally punishing someone for being too good at writing.
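To make the "phone keyboard" analogy concrete, here's a toy sketch of "pick the statistically likeliest next word". It's a crude bigram counter, nothing like a real model's scale or architecture, but the same basic idea:

```python
from collections import Counter, defaultdict

# Toy bigram "language model": count which word follows which in a
# tiny corpus, then always suggest the statistically likeliest next
# word, the same idea as tapping a phone keyboard's suggestion.
corpus = "the cat sat on the mat and the cat slept".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str | None:
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # -> "cat" ("cat" followed "the" twice, "mat" once)
```

A real LLM conditions on the whole preceding context rather than one word, but the output is still whatever continuation is most probable given the polished prose it was trained on.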
My condolences, this sounds incredibly frustrating. People talk about how being successful at school is different from being successful at real-life applications of that knowledge, and this seems to be another horrible example in support of that statement.
The college I'm attending uses "Turnitin" for scanning papers, and I don't believe it uses AI - it's been around for a very long time. I know that it has a catalog of all papers scanned through it, because it will tell you if a section is similar to another paper, and say it came from "Blah Blah University." It also includes various webpages, such as .gov and .edu hosted public articles.
But some instructors put very strict guidelines on it - "Your paper can't have more than 20% similarity. So don't restate the questions." Even when I submit papers that should, in theory, return a 0% similarity, two things are almost always flagged - the page numbers in the header of each page, and one or more of my references (in the reference section, not in-text citations). So I guess the only way to avoid that is to break the writing convention (MLA or APA) that requires page numbers, and only cite sources obscure enough that no one else has cited them.
This also encourages rewording cited material, which can be considered plagiarism in many cases, even if you cite the author.
... someone could just have an AI rewrite it until it passed the AI checkers. Heck, they could figure out what the checkers are looking for and just have it written to pass.
Would it not be smarter to pull her aside and quiz her on her own report? If she actually wrote it then it should be easy to recall the info.
"what the checkers are looking for"
The issue is that this is, in general, impossible to predict, not least because there are a bunch of different checkers out there and no outsider knows their inner workings (justifiably so).
For many of the same reasons, OpenAI literally cancelled their own "AI text detection" tool. It's not possible to detect AI text verifiably and reliably enough to justify the potential fallout of a false positive. But a bunch of companies offering a subscription, and a supposedly easy fix, to people who don't know how LLMs work won't care about that.
My condolences to your friend, none of those AI detectors are anywhere near good enough to be worth even consulting.
I've anecdotally heard that many teachers are now requiring multiple drafts and change history for submitted documents. I've also heard of them collecting the assignment and then asking in class for a handwritten one-paragraph summary of the work handed in.
Drafts and change history sound awful for the way I write papers. I have never been able to successfully do the outline-then-2-3-drafts process that teachers seem to want. I hope the teachers are at least flexible about what a “draft” consists of.
I'm with you, I had a bad habit in school of figuring out my thesis only when I was 75% of the way through the paper. I would write a sad little intro paragraph to start with and then go back and write a real one right before my conclusion.
Google docs history would probably be fine for me though, since I do a lot of moving around sentences and rewording things as I write a paragraph, in a way that would very obviously not be me retyping an LLM essay.
That's pretty typical honestly, and it's something I'd often recommend to my students. There's a false belief that writing is a linear process: you start with your thesis, then you write your supporting paragraphs, then your conclusion. But it's really a lot messier than that.
My students would do their papers in 3 drafts. The first draft is about 50% of the overall word count (and just gets graded on a completion basis). Then we'd meet individually to talk about plans for the rest of the paper: what's working so far and what could be strengthened. It was entirely up to them what that 50% looked like, so often I'd get a sentence or two of an "intro" and then body paragraphs without any connective tissue; in our meetings we'd discuss organizational strategies and how they could reorder things for the strongest logical arguments.
The 2nd draft would be 75% of the paper, and I would require at the very least a thesis statement to be present at this point. They'd do a peer review so they could see how a neutral audience was responding to their arguments, whether there were any gaps in their logic, etc. This also gets graded on completion.
3rd draft is the graded draft. I would give a lot of comments throughout the paper and then overall feedback at the end. At that point, students could revise and resubmit for a new grade if they chose.
I mean, for the sake of validating whether you wrote it yourself, as long as there isn't a gap between two states that goes from "this is a prompt" to "this is ChatGPT's reply", it's sufficient.
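You'd hope so, but I've seen stupid requirements from university professors before...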
Any AI detector used in such a manner needs to be field tested, at least to a basic degree, by the institution using it. That would weed out almost all of them and cast sufficient doubt on any that remain. Plus, it would show the people doing the testing just how difficult the task they're demanding actually is.
A very basic field test could look like this: feed it essays that are definitely not AI-written. These could be archived essays from a few years ago, or essays written during a proctored exam, whatever. Then get some students to supply "authentic" cheating attempts using AI tools, i.e. have them make it look natural. See what the tool says about both groups of works.
Since we're leveling grave accusations at students here (expulsion being on the table), take the results from above, and shoot for 99% statistical confidence. 95% if the intended consequences are less severe (e.g. only down-grading the work in question).
I'd be surprised if you could clear even 80% with any tool currently on the market.
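To make that concrete, here's a minimal sketch of how such a field-test result might be scored. The flag counts below are made-up placeholders, not results from any real detector:

```python
import math

def wilson_interval(flagged: int, total: int, z: float = 2.576) -> tuple[float, float]:
    # Wilson score interval for a binomial proportion;
    # z = 2.576 corresponds to roughly 99% confidence.
    p = flagged / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return centre - half, centre + half

# Hypothetical field-test outcome (placeholder numbers, not real data):
# the detector flagged 14 of 200 known-human essays as "AI".
low, high = wilson_interval(14, 200)
print(f"observed false-positive rate: {14 / 200:.1%}")
print(f"~99% confidence interval: {low:.1%} to {high:.1%}")
```

Even in this fairly generous scenario, the 99% interval on the false-positive rate runs from roughly 4% to 13%, nowhere near the certainty you'd want before handing out zeros or expulsions.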
Barring that kind of statistical performance, the only other evidence I'd accept that "this was AI, actually" would be the AI company supplying the time the piece was generated, along with the user that generated it. And no, asking ChatGPT when it generated a certain piece is not sufficient; the LLM itself does not have the information required, but could easily hallucinate it. If OpenAI offered a "proctor" API for exactly this purpose, and OpenAI specifically said that this is what the API does, then go for it. AFAICT, no one does this currently. But they surely have the records that would make it possible.
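For illustration only, here is roughly what such a record could look like. To be clear, this is pure invention; no provider exposes anything like it today, and every name here is made up:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Entirely hypothetical: a sketch of the record a provider-side
# "proctor" lookup might return IF a vendor ever offered one.
# No such endpoint exists; none of these names are a real API.
@dataclass
class GenerationRecord:
    account_id: str         # who requested the generation
    generated_at: datetime  # server-side timestamp, not the model's own claim
    output_hash: str        # hash of the generated text, for matching essays

def proctor_lookup(essay_hash: str) -> Optional[GenerationRecord]:
    """Imaginary endpoint: return a record if this exact text was ever
    generated by the provider, else None. Only the provider's own logs
    could answer this; asking the LLM itself just invites hallucination."""
    raise NotImplementedError("no AI vendor currently offers this")
```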
"That leaves successful cheating only to students who are at least smart enough to edit their 'work' into something they are actually capable of writing."
You can also tell ChatGPT to be dumber, or give it a sample of your writing to base the essay's style and grammar on.
I’m not teaching at the college level, so for me, AI detection is generally pretty easy because a lot of the time it’s very obvious. I don’t use any automated detectors, which I think are pretty much garbage.
The main tells for me:
- It’s written in a completely different voice from the student’s usual writing.
- The font is different from the default, or suddenly changes mid-text.
- It’s too perfect.
- It uses vocabulary and sentence structure that far exceeds the student’s writing abilities.
- It quotes from parts of the text that were not included in the assigned excerpts.
- It outright hallucinates inaccurate information (usually the case when it’s applied to less well-known texts).
- It suddenly appears as a single blob in the document’s revision history.
- The student can’t answer questions about their writing.
My students are at an age where they’re not very savvy about utilizing it, and the gap between their expected language skills and the level of language that AI pumps out is quite large. I can see how it gets much harder to detect as students get older and wiser to workarounds, alongside the language gap narrowing.
That said, I have no doubt that some of my savvier students are using AI and “getting away with it.”
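Why?

Writing will always have an analog loophole.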
Instead of copy/pasting an AI-generated text into an assignment, you can have it generate an essay in another tab or on another device, and then proceed to write THAT essay yourself. Do that, and you have circumvented every single tell I mentioned above.
This takes more work: you’re not so much copying the essay word for word as you are using it as a “first draft” that you’re reconfiguring into your own words to make it less obvious, but that’s still much easier than writing the essay from scratch. Nevertheless, if I check revision history, I’ll see writing over time. I’ll see words and sentence structures that the writer would be using. They’ll also have to read it closely enough in their rewrite that they’ll catch rogue quotes or hallucinations.
If this is done, it’s pretty much indistinguishable from a standard academic essay, in big part because academic essay writing is so formulaic.
Do I have students who are doing this? Almost certainly. Do I have any way of detecting it or countering it? Nope. So, when I say I’m catching students using AI, what I’m really doing is catching the obvious, low-effort instances.
The open question for me is whether using the “analog loophole” is a circumvention or just a foundational skill for the future. Plenty of adults I know now use AI regularly to draft emails and whatnot, with them doing an edit/rewrite of the text to make it sound less like AI. This is exactly what some of my students are likely doing as well. Who am I to tell them they shouldn’t when that is now how so many adults do their jobs daily? The administrators at my school recently told us they use AI “multiple times a day, every single day.”
Is this simply the new “way” of writing? Will generating your own text word by word be seen as too cumbersome for the future? We don’t have any problem with turning to calculators and spreadsheets for more than simple arithmetic (and, even then, many still do it for simple arithmetic). Will text generation be looked at similarly, where you know what you want to say but you rely on AI to actually say most of it for you in the first place?
Is this a new, convenient “boxed cake mix” that makes writing from scratch seem unnecessarily difficult and old-fashioned in comparison?
Universities have "gotten away with" using prof (or TA) to student ratios that make it all but impossible for the teacher/grader to do anything more than lightly skim essays looking for key words and basic reasoning.
My hot take: "AI exposes educational bankruptcy of current educational system"
*This really only applies to non-STEM undergrad classes... STEM classes have the benefit of objectively right/wrong answers to questions (and zero essays).
This is a study done at one school. Before drawing broader conclusions, we would have to know how rigorous the essay graders are at that school.
(Admittedly, the headline I wrote does invite wider speculation.)
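Clearly, the solution is to use another AI model to lie about whether an essay is written by Generative AI or not.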
Researchers have discovered that even seasoned exam graders may find it difficult to identify responses produced by Artificial Intelligence (AI). This study, carried out at the University of Reading in the UK, is part of an initiative by university administrators to assess the risks and benefits of AI in research, teaching, learning, and assessment. As a consequence of their findings, updated guidelines have been distributed to faculty and students.
Here is the abstract:
The recent rise in artificial intelligence systems, such as ChatGPT, poses a fundamental problem for the educational sector. In universities and schools, many forms of assessment, such as coursework, are completed without invigilation. Therefore, students could hand in work as their own which is in fact completed by AI. Since the COVID pandemic, the sector has additionally accelerated its reliance on unsupervised ‘take home exams’. If students cheat using AI and this is undetected, the integrity of the way in which students are assessed is threatened. We report a rigorous, blind study in which we injected 100% AI written submissions into the examinations system in five undergraduate modules, across all years of study, for a BSc degree in Psychology at a reputable UK university. We found that 94% of our AI submissions were undetected. The grades awarded to our AI submissions were on average half a grade boundary higher than that achieved by real students. Across modules there was an 83.4% chance that the AI submissions on a module would outperform a random selection of the same number of real student submissions.
I do not envy academics right now. Scientific researchers too; I wonder how many reports have been published with fake data generated by AI? It's bad enough we have bots "contributing" to wikis, at times writing entire articles about made-up things.
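shrug save your drafts. all of them, with unique time-stamp names. boom, exculpatory evidence.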
Honestly, why even drafts? We have autosave in every major document-writing programme; both Word and Google Docs keep a history of every line you write. Heck, turn on track changes if you really wanna drive the point home.
At this point, there is no material difference between a half-decently prompt-engineered AI-generated essay and many of the types of essays students have been submitting to their teachers for generations. What's particularly problematic about "AI detectors" is that, in many cases, it's actually good for students to write in a relatively formulaic way, especially when they're starting out with formal argumentative essay writing.
I honestly can't understand why instructors haven't mostly pivoted away from take-home essays, except maybe for practice (or for truly rigorous courses where AI doesn't help much and students are motivated to actually learn), toward reading quizzes, in-class essays, and oral exams. This is really the only route forward now, and it's ridiculous how many people's academic careers have been essentially ruined by "AI detectors" and kangaroo academic courts.
I mean all those are somewhat orthogonal to what most people actually do. Being able to take time, to research, and to compile long form text beyond what can be done in an hour is not perfectly represented by an in-class exam.
You're right, and I think some kind of mix is required, such as a long-form essay as well as an oral exam/discussion to see how much the student can synthesize their essay. If there seems to be a marked difference between their comprehension and what they wrote, it's a red flag. It won't stop some cheaters, but if the student uses AI to help with their essay, but then studies it, learns it, and can discuss it intelligently and with critical thinking, then maybe no harm no foul? The end goal may well have been achieved.
I think this is doable in some classes, but not for all the courses and situations where instructors are the most concerned about AI essays. Essays are used heavily in humanities courses for very obvious reasons and cannot always be replaced by the options you describe, and even STEM graduates need to write papers for journals and theses/dissertations. Without courses to teach at least the basics of writing and editing something that would take more time to write than an in-class session, many students will not have the opportunity to gain these skills to use when they need them.
Probably because it's simply not possible to produce a 10-12 page essay entirely in a classroom setting, and such essays are a necessary part of upper-level literature courses and a number of other humanities disciplines.
Some solutions could be to allow AI and raise grading standards for advanced courses, to have a really long in-class essay session once or twice a semester, or to have an oral exam explaining and defending your take-home essay that makes up the majority of your grade for the assignment. In any case, doing things as normal and trying to catch students with AI checkers isn't a viable option.
The only one of those that's a realistic option is an oral exam, which some disciplines already do, but oral exams are time-consuming to schedule, so they may not be feasible for many instructors who are already overloaded with their courses.
I'm in favor of overhauling our assessment methods and higher ed in general, but just saying "do all writing in class" isn't viable for writing-heavy classes.
Not related to the study, but I'm curious whether you could compare a student's coursework with their in-person exam essays to see how their essay content and writing style change, then use that as evidence that they cheated in their coursework.
Maybe use automated sentiment analysis or some other tool (not AI) to provide repeatable concrete data about the writing style?
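For example (a rough sketch only; the word list, sample texts, and any decision threshold are all assumptions): compare function-word frequencies, a classic stylometry signal, between the coursework and the exam essay:

```python
import math
import re
from collections import Counter

# Function words are a classic stylometric fingerprint: frequent,
# topic-independent, and hard to imitate consistently.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "was", "on", "with", "as", "but"]

def profile(text: str) -> list[float]:
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def style_distance(a: str, b: str) -> float:
    # Euclidean distance between frequency profiles; larger means
    # the two texts look less alike stylistically.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(profile(a), profile(b))))

# Placeholder texts; real comparisons need long samples, plus a baseline
# of how much one student's style varies between their own genuine texts
# before any gap could be treated as evidence of anything.
coursework = "The results of the experiment suggest that the effect is robust."
exam_essay = "In the exam I argued that the evidence for the effect was weak."
print(f"style distance: {style_distance(coursework, exam_essay):.4f}")
```

The catch is the baseline: people legitimately write differently under exam pressure than at home, so a raw distance number on its own proves nothing.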