13 votes

Flawed algorithms are grading millions of students’ essays

5 comments

  1. kfwyre

    I'm not surprised they're doing machine grading of essays. The testing companies didn't exactly do a great job with human grading prior to this:

    Test-scoring companies make their money by hiring a temporary workforce each spring, people willing to work for low wages (generally $11 to $13 an hour), no benefits, and no hope of long-term employment—not exactly the most attractive conditions for trained and licensed educators.

    Scoring is particularly rushed when scorers are paid by piece-rate, as is the case when you are scoring from home, where a growing part of the industry’s work is done. At 30 to 70 cents per paper, depending on the test, the incentive, especially for a home worker, is to score as quickly as possible in order to earn any money: at 30 cents per paper, you have to score forty papers an hour to make $12 an hour, and test scoring requires a lot of mental breaks.
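
    To put the quoted piece rate in perspective, the arithmetic is stark:

    $$\frac{\$12/\text{hour}}{\$0.30/\text{paper}} = 40~\text{papers per hour} \quad\Rightarrow\quad \frac{3600~\text{s}}{40} = 90~\text{seconds per essay}$$

    That's one essay read, judged, and scored every minute and a half, sustained for an entire shift -- and any mental break is unpaid time.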

    Here's another scorer's experience, this one from 2015:

    Because there was very little supervision and hundreds of temps, there were scorers who came to work and clocked in, left the building, and came back to clock out at the end of the shift, never scoring one test for the day, and still received paychecks. Some of the scorers would have cocktails at the bar (located in the same plaza as the grading site) during lunch or on break, and then return to scoring on the computer; others showed up "functionally" intoxicated. Some temps fell asleep while scoring or put their head down on the keyboard and went to sleep. Some of these people possibly graded a portion of your child's standardized test.

    Most people I talk to about standardized testing, even some other teachers, are surprised to find out that it is driven by for-profit companies. These are not the ones we should be entrusting to dictate anything in education, which should, in principle as well as in execution, be insulated from any sort of market forces or profit motive.

    Unfortunately, that's not the case in the United States, and testing companies will do what they can to maximize profits first and foremost -- students be damned. Machine grading is just another step down that road, likely because it's even cheaper and easier than hiring temps.

    If I come across as defeated and this comment comes across as little more than a pessimistic, status-quo-accepting "well, what do you expect?", that's because I am, and it is. I've made no secret here on Tildes about being burnt out in my career, and a big part of that is the ideals of education are in complete conflict with the execution. I no longer go into work to teach every day but to gather data, analyze data, and prepare students for data-gathering moments (as well as a thousand other auxiliary things that have been compressed into my role). Everything is standardized and scripted and metricked, and even though that sounds nice and objective, it's actually constantly shifting to keep us feeling inadequate and unsuccessful. The goalposts are always moving, and never in our favor.

    I have never once had a year in my career where the message from the data has been "great job -- keep doing what you're doing!" I don't know any other teacher who has. If we ever get any sort of commendation, it's always tied to a "but..." Listen closely in any staff meeting in any school across America and you'll hear the distinctive sound of the other shoe dropping. Keep listening year after year and you'll either hear the same shoe being picked up to be dropped again, or you'll hear someone bringing a brand new shoe to the room -- one that's sure to fall.

    There's always some new initiative, some new weak point, some new rabbit to chase. Most often these cost money, and most often that money goes to a company who is not held accountable via test scores or any other means. If a company's solution works and our test scores rise, the company gets our money. If the company's solution doesn't work and our test scores don't rise, they still get our money! We're the ones left holding the bag, which makes our hands all the more convenient to slap.

    As we know from articles like this one, so many of the very real, very damning conclusions being drawn in education are based on junk data and premises anyway. Educational policy and prevailing mindsets treat numbers as sacred simply because they're numbers, without interrogating any of the processes that went into producing them. They insist on suffocating accountability measures for schools while applying no accountability at all to the companies operating alongside education, whose profit motives give them a direct and vested interest in education's continued failure rather than its success.

    9 votes
  2. Akir

    I can't believe how completely inept and idiotic our leaders in public education can be. Computers are terrible at understanding human language. Even the best algorithms aren't perfect, and they usually only work well within a very narrow context. Yet here we are, ruining children's futures because extremely gullible people are willing to spend hundreds of thousands of dollars on private companies offering their "solution" instead of spending a little more to hire people to do the work.

    10 votes
  3. [3]
    vakieh

    Why is 'susceptible to human bias' a reason against AI rather than against human marking? 'Cause I didn't want to be the one to tell you this, but the alternative is really susceptible to human bias, being human and all...

    1. [2]
      imperialismus

      I think the greater issue is that natural language processing simply isn’t sophisticated enough at present to understand a free-form essay. Not even close. The fact that you can feed in nonsense and get a good score demonstrates this clearly. A computer algorithm today simply cannot judge whether a longer text addresses a topic coherently and rationally, formulates good arguments, or uses legitimate sources conscientiously. Instead, such systems rely on proxies like sentence length and vocabulary. Human graders aren’t nearly as easily fooled by complete nonsense.
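
      To make the proxy problem concrete, here’s a minimal, purely illustrative sketch (a hypothetical scoring function, not any vendor’s actual model) of a grader that rewards surface features and therefore ranks fluent gibberish above plain, coherent prose:

      ```python
      # Toy illustration only (not any vendor's actual model): a "grader"
      # that scores essays purely on surface proxies -- average sentence
      # length and vocabulary variety -- and so cannot tell sense from nonsense.

      def proxy_score(essay: str) -> float:
          # Crude sentence split; real systems are fancier, but the proxies
          # they lean on are similar in spirit.
          sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()]
          words = essay.split()
          if not sentences or not words:
              return 0.0
          avg_sentence_len = len(words) / len(sentences)                 # "complexity" proxy
          vocab_richness = len({w.lower() for w in words}) / len(words)  # "vocabulary" proxy
          return avg_sentence_len * vocab_richness

      coherent = "Tests measure learning. They are imperfect. Teachers know this."
      nonsense = ("Perspicacious albatrosses deliberate quixotic thermodynamics "
                  "notwithstanding lugubrious parliaments of unctuous trapezoids.")

      print(proxy_score(coherent))  # ~3.0: short, plain, coherent sentences
      print(proxy_score(nonsense))  # ~11.0: one long, lexically rich gibberish sentence
      ```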

      When it comes to replicating human bias inherent in the training material, that’s not really a point in favor of humans so much as a demonstration that AI fails to improve on them. But there is a greater risk: people tend to assume that an algorithm must be objective (after all, a computer has no feelings) and therefore that any bias it shows must reflect some sort of ground truth, when in fact the algorithm may simply be replicating bias found in its training data, which was prepared by humans. Just another example of lies, damned lies, and statistics.
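
      As a toy illustration of that laundering effect, with entirely made-up numbers: train even a trivial “model” on labels from graders who systematically under-score essays containing a dialect marker, and it reproduces the penalty as a neutral-looking prediction:

      ```python
      # Entirely made-up numbers: if human graders systematically under-scored
      # essays containing a dialect marker, a model trained on those labels
      # learns the same penalty and reports it as a neutral "prediction".

      from collections import defaultdict

      # (group, human-assigned score) -- biased training labels
      training = [("standard", 5), ("standard", 4), ("standard", 5),
                  ("dialect", 2), ("dialect", 3), ("dialect", 2)]

      # A trivially simple "model": predict the mean score seen per group.
      scores_by_group = defaultdict(list)
      for group, score in training:
          scores_by_group[group].append(score)

      model = {g: sum(s) / len(s) for g, s in scores_by_group.items()}
      print(model)  # {'standard': ~4.67, 'dialect': ~2.33} -- the human bias, laundered
      ```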

      12 votes
      1. Emerald_Knight

        A solid example of this is the time image-recognition software flagged black people as gorillas. The training data didn't adequately represent people of different races, so the machine-learning model failed to handle skin color correctly. Human beings create the training data, so naturally that data is subject to human bias.

        6 votes