13 votes

Teacher effects on student achievement and height: A cautionary tale

6 comments

  1. [5]
    kfwyre
    • Exemplary

    Thanks for posting this. I actually thought we were out of the woods with value-added modeling for teachers, but this and some searching I just did helped me realize that it's still in effect in some places. It's sad that this is still persisting.

    For those unfamiliar with the terminology, value-added measurement is the belief that you can distill an individual teacher's effect on student achievement down into a single score. This score, allegedly, shows you which teachers are effective and which are ineffective. Modern American educational policy reform treats "ineffective teachers" as a bugbear and cites them as the primary issue impacting American students today. If you'd like to read one of my lengthy rants specifically on this topic, here you go.

    One of my favorite (and, alarmingly, still relevant) dives into value-added modeling is here (web archive link, because the original was loading inconsistently for me). In 2012, New York released value-added data for its teachers, collected from 2007 to 2010. The author performs some sanity checks on the data to see whether the measures actually do what they're supposed to do and meaningfully capture teacher quality. For example, he supposes that an effective teacher is likely effective across multiple subject areas, so he plots the data for all teachers who taught ELA and math, comparing their ELA value-added score to their math value-added score. Presumably we should see some correlation, with the most ineffective teachers being ineffective in both and the most effective being effective in both, but there is none. The points on his graph make a better rectangle than a line.

    Of course, maybe the change in subject matter is more significant than we thought. Maybe it's very hard to be effective in two different subject areas, so, what about teachers who taught the same subject? He then graphs the scores of teachers who taught the same subject, but in two different grade levels. Presumably here, the value-added scores would more closely align, as it makes sense that a teacher would be comparably effective in their 4th grade math classes as they are with their 5th grade math classes. Again, there's no correlation. Another rectangle.
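
    If you want to try that kind of sanity check yourself, here's a rough Python sketch. The file name and column names are hypothetical, and the released data would need to be reshaped into one row per teacher first; it's meant to show the shape of the check, not reproduce his exact analysis.

    ```python
    # Sanity check sketch: do a teacher's ELA and math value-added scores agree?
    # "nyc_value_added_2007_2010.csv", "va_ela", and "va_math" are hypothetical names.
    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy.stats import pearsonr

    scores = pd.read_csv("nyc_value_added_2007_2010.csv")
    both = scores.dropna(subset=["va_ela", "va_math"])  # teachers scored in both subjects

    r, p = pearsonr(both["va_ela"], both["va_math"])
    print(f"Pearson r = {r:.2f} (p = {p:.3g})")  # a meaningful measure should agree with itself across subjects

    both.plot.scatter(x="va_ela", y="va_math", alpha=0.3,
                      title=f"ELA vs. math value-added (r = {r:.2f})")
    plt.show()  # a rectangle-shaped cloud instead of a diagonal band suggests mostly noise
    ```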

    Value-added modeling is yet another attempt to apply data as a form of accountability to education, but the primary problem with this method is that accountability is seemingly never applied to the data itself. Anyone even mildly familiar with data analysis knows the "garbage in, garbage out" concept, but educational policy seems to have run with "garbage in, gospel out." It doesn't matter how the numbers are arrived at or whether they're precise, accurate, or even representative: what matters is that they're numbers, which feel like cold hard facts.

    The easiest way to see this on the national stage is the fact that 21 states have adopted machine-graded essays for standardized testing. Here's a choice quote from another article on the subject:

    One year, she says, a student who wrote a whole page of the letter "b" ended up with a good score. Other students have figured out that they could do well writing one really good paragraph and just copying that four times to make a five-paragraph essay that scores well. Others have pulled one over on the computer by padding their essays with long quotes from the text they're supposed to analyze, or from the question they're supposed to answer.

    I wonder what their teachers' value-added scores were, and whether their pay was impacted? Surely these students had effective teachers, right? Well, there's a surprising answer for that too, from a senior researcher at the Educational Testing Service:

    "If someone is smart enough to pay attention to all the things that an automated system pays attention to, and to incorporate them in their writing, that's no longer gaming, that's good writing," he says. "So you kind of do want to give them a good grade."

    For millions of people out there, the student achievement data driving many influential educational decisions and conclusions, from individual students all the way up to entire states, is in the hands of shitty algorithms, not humans, overseen by companies that value themselves, not students. Does this seem like a structure that benefits student achievement, or does it seem instead like something that benefits the bottom line of testing companies?

    To get a sense of what accountability measures feel like on the ground, consider Indiana, which recently changed the test it uses for its statewide standardized assessments. The state's passing rates dropped precipitously on the new exam, and the State Superintendent and Governor both asked for "hold harmless" legislation in response.

    Earlier this week McCormick, along with Gov. Eric Holcomb, and a string of lawmakers released statements calling on legislators to pass a “hold harmless” exemption, which would protect educators and schools from the negative effects of low scores. State test results can impact a school’s A-F grade and decisions about teacher pay.

    Let's look at the language here. "Hold harmless". "Protect educators and schools from the negative effects of low scores." This news article unintentionally captures how data is used in almost purely punitive ways in contemporary education. Not only does a purely punitive environment perpetuate a toxic culture where everyone feels under the gun, but it's also antithetical to pedagogical frameworks themselves. If a child is failing, it means they need more support. But if a teacher, school, or district is failing, it means they need less? How does that make any sense?

    Even setting the language aside, the fact that they have to ask for legislation to avoid damage from test scores means they're abiding by the "garbage in, gospel out" principle. Did the drastic change in test scores reflect the switch to a completely new testing system, or did it instead reflect a sudden, statewide onset of ignorance, whereby students knew less and teachers magically became less effective overnight? The latter is completely nonsensical, but it's how the data is being treated -- as if there's some magical truth to it.

    In fact, even if there is truth to be found in the numbers, there's still little comfort, because the disparity between the old test's data and the new one's tells us that at least one data set is lying. Either the old test was inaccurately measuring student achievement, or the new one is. The two can't coexist with different truths; if they somehow did, it would mean the two tests are measuring completely different things, which defeats the purported point of standardized testing in the first place. There's no way to look at this that doesn't produce the conclusion that some, maybe all, of this data is garbage.

    And if the data is garbage, then the analysis is garbage. And if the analysis is garbage, then the conclusions we draw from it are garbage. Unfortunately, this seems to go completely uncontested in educational discourse, so instead we have to do this ridiculous dance around numbers like they're unassailable and valid. Garbage in, gospel out. It's been this way for 20 years now.

    12 votes
    1. [3]
      skybrian

      And so, we're back to square one? We've tried hard and unfortunately don't know any objective, systematic way to tell good teachers from bad? I guess, just use good judgement? With all the bias that might cause?

      I'm wondering what you think of another article.

      4 votes
      1. [2]
        kfwyre
        • Exemplary

        Ultimately I believe assessing teacher effectiveness to be a red herring. If we could adequately fill all teaching positions, and if teachers stayed in those positions, then focusing on teacher quality might be a primary concern and an important lever for improvement, but it feels completely disingenuous to prioritize teacher quality above all else when we have widespread teacher shortages and high attrition rates.

        I started teaching in a low-income district that had 70% staff turnover within the first five years of employment. SEVENTY percent. It was a revolving door. I achieved colloquial "veteran" status at my school in my second year simply because I didn't leave. There was nothing special about my school. It was a standard, inner-city public school, and this issue exists in pretty much every urban area nationwide. Sure, they were continually gathering data on our effectiveness, but is that the most important data point to consider when the teachers themselves are changing year after year?

        Furthermore, I firmly believe that teacher effectiveness is not an individualizable trait. At the school I started working at, I spent over $2000 of my own money on supplies my first year. I got DonorsChoose grants for essential classroom items. Most days the one copier we had in the school was broken, and I had no technology for students, so we did a lot of work on whiteboards and they copied things down onto notebook paper. My largest class had 44 students in it. I didn't have enough desks for all of them because I literally couldn't fit that many in the room, but enough students were absent on any given day that everyone usually had a seat.

        I was, ostensibly, not a good teacher in my time there, but can that be attributed to me? What about my district, which failed to provide adequate resources and instead spent most of their money on consultants and middle-management? What about my principal, who failed to address scheduling concerns that caused my largest class to balloon to 44 kids? What about the students themselves? Don't they have some ownership in their education too?

        In my second year at the school, our principal poached a great teacher from a better, more suburban school. She had won many awards and was a local legend. She'd taught for 15+ years, and he was able to convince her to give up her position at the school she loved because she'd be able to have "such a big impact" at our school, which was doing considerably worse than hers.

        I remember seeing her in tears in week two. She didn't bring greatness to our school. Instead, it broke her. She became a terrible teacher on account of the situation she was in. To me, teacher quality is more reflective of environment than it is a producer of it. Teachers who have resources, are supported, and are enabled to do their best work will, in aggregate, be of better quality than those who have no resources, aren't supported, and are left to produce their worst work.

        Now, I will clarify that there are some genuinely awful teachers out there. I've worked with some. A teacher who completely fails to do their job should lose it, and even in places with strong union contracts that's possible, but it's a bit of a process. These teachers are few and far between, however. They're outliers. I've worked with hundreds of teachers in my career and I can count the genuinely awful do-nothings on one hand, and identifying them hasn't needed a judgment call because they've failed to even meet the most basic requirements of the job.

        There's a widespread belief that we as teachers, particularly those of us in unions, want to close ranks and protect them, but believe me, nobody wants them gone more than us. A deadweight teacher who is unwilling to do their part makes things worse for everyone. In a class with no instruction, half the students get upset that they aren't learning anything and the other half learn how fun it is to slack off and waste time. It kills students' trust in the educational system and reinforces negative skills and behaviors. This makes things more difficult for the other teachers who have those students. Not only that, but it's a bit of a slap in the face to us. We're out there busting our asses to make sure these kids learn and someone else is sitting behind their desk showing movies every day? Hell no. The only sympathy I have for teachers like that is that I think many of them once cared but were broken by the educational system and couldn't leave. Teaching is a dead-end career that's hard to get out of.

        I haven't read the other article you posted, but I'll give my thoughts there once I've had time to digest it.

        9 votes
        1. skybrian

          It occurred to me that an unfortunate downside of not being able to figure out which teachers are effective is that it might also be difficult to justify (to the skeptic) that paying more will help. Turnover certainly seems bad, but on the other hand, if you can't show that students do better with experienced teachers, then some might ask: why pay more?

          (And more generally, so many different attempts at improving education have been tried.)

          1 vote
    2. arghdos

      "If someone is smart enough to pay attention to all the things that an automated system pays attention to, and to incorporate them in their writing, that's no longer gaming, that's good writing," he says. "So you kind of do want to give them a good grade."

      That's insane. Maybe you give them a good grade in their programming and algorithms classes for pulling something like that, but for an essay???

      3 votes
  2. skybrian

    From the abstract:

    Estimates of teacher “value-added” suggest teachers vary substantially in their ability to promote student learning. Prompted by this finding, many states and school districts have adopted value-added measures as indicators of teacher job performance. In this paper, we conduct a new test of the validity of value-added models. Using administrative student data from New York City, we apply commonly estimated value-added models to an outcome teachers cannot plausibly affect: student height. We find the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement, raising obvious questions about validity. Subsequent analysis finds these “effects” are largely spurious variation (noise), rather than bias resulting from sorting on unobserved factors related to achievement. Given the difficulty of differentiating signal from noise in real-world teacher effect estimates, this paper serves as a cautionary tale for their use in practice.
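
    To picture the test the abstract describes: estimate a "teacher effect" for a placebo outcome (height) the same way you would for math, and compare the spread. Here's a rough Python sketch of that idea; the data file and column names are hypothetical, and the paper's actual value-added models also condition on prior achievement and other covariates that this toy version omits.

    ```python
    # Toy placebo check: naive "teacher effects" on math scores vs. on height.
    # "nyc_students.csv" and its columns ("teacher_id", "grade", "year",
    # "math_score", "height") are hypothetical stand-ins for student-level data.
    import pandas as pd

    def naive_teacher_effects(df: pd.DataFrame, outcome: str) -> pd.Series:
        # Residualize the outcome against grade-by-year means, then average by teacher.
        resid = df[outcome] - df.groupby(["grade", "year"])[outcome].transform("mean")
        return resid.groupby(df["teacher_id"]).mean()

    students = pd.read_csv("nyc_students.csv")

    for outcome in ["math_score", "height"]:
        effects = naive_teacher_effects(students, outcome)
        print(f"SD of teacher 'effects' on {outcome}: {effects.std():.3f}")

    # If the spread of "effects" on height is nearly as large as the spread for
    # math, the estimator is mostly capturing noise, since teachers can't
    # plausibly make their students taller.
    ```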

    6 votes