Value-added modeling (VAM) is touted as a way to evaluate schools and teachers scientifically and fairly. Value-added models appear to address some concerns with comparing differences in students' growth; however, there are serious methodological problems with the reliability and validity of these measures (Amrein-Beardsley, 2008). John Ewing, president of Math for America, recently wrote of his concern that the mathematics of value-added modeling is being used as a rhetorical weapon to convince others of the objectivity and value of its findings (Ewing, 2011).
VAM sounds deceptively scientific, simple, and straightforward: compare the results of different treatments to see which is most effective, or which adds the most value to the variable in question. Yet even when the variable of study is a drug treatment, something clearly controllable and measurable, many factors can affect the results, such as the commonly recognized placebo effect, in which participants improve even when they receive no treatment at all.
When it comes to student learning, the variables and treatments are so numerous, nuanced, intricate, interconnected, complicated, and uncontrollable that it should be obvious that drawing conclusions from them is unreasonable. Some problems include the contamination, corruption, and limitations of the data. For example, some classrooms experience 100% turnover of students in a single academic year. Even in classrooms with less turnover, the issues that arise with the coming and going of students are significant.
Other variables, such as class size, learning resources, school conditions, personal safety, home conditions, and social environments, can make a big difference in student learning. For example, students in a small class with adequate resources in a high-achieving school, whose needs are met and who are cared for by families, communities, and peers with high expectations, will likely do better than those who lack these things, even with the same teacher. Is the teacher to be blamed?
Even if the tests were fair, the question remains of how to evaluate those teachers who do not teach the subjects tested, which is most teachers in the upper grades. It is rare, even in elementary schools, for one teacher to teach all subjects. Who is to blame in this situation? What about teachers whose test scores say they are not doing a good job, when our experience and common sense tell us they are? The harm done to individual teachers is evident, but what about schools, programs, and teaching methodologies that are unfairly rated low based on poorly designed studies? We are currently making policy decisions based on these limited and misapplied approaches, such as VAM, that will influence education for years to come.
Leading educational researchers in the United States collaborated on a paper for the Economic Policy Institute in Washington, D.C., entitled Problems with the Use of Student Test Scores to Evaluate Teachers (Baker, Barton, Darling-Hammond, Haertel, Ladd, Linn, Ravitch, Rothstein, Shavelson, & Shepard, 2010). In that paper, they reported on research studies using VAM. Several studies found that teachers' effectiveness ratings in one year varied dramatically the following year. In one study, of teachers ranked in the top 20% of effectiveness in the first year, fewer than a third remained in that group the second year, and a third moved to the bottom 40%. Another study found only a 4-16% prediction rate for teacher effectiveness ratings from one year to the next. We would not expect the effectiveness of a teacher to vary so much from year to year, which strongly suggests the tests are not measuring teacher effectiveness (Baker, et al., 2010).
When it comes to using these tests and value-added models for teacher accountability, it is safe to say, “The research base is currently insufficient to support the use of VAM for high stakes decisions” (McCaffrey, Koretz, Lockwood, & Hamilton, 2003, xx). Not only are there serious problems with the methodology of VAM, there are significant concerns about the high-stakes tests used to determine effectiveness. Though scholars acknowledged that VAM approaches are fairer and stronger than previous ones, they stated,
Nonetheless, there is broad agreement among statisticians, psychometricians, and economists that student test scores alone are not sufficiently reliable and valid indicators of teacher effectiveness to be used in high-stakes personnel decisions, even when the most sophisticated statistical applications such as value-added modeling are employed. (Baker, et al., 2010, p. 2)
The National Academy of Sciences National Research Council Board on Testing and Assessment stated, “VAM estimates of teacher effectiveness should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable” (cited in Baker, et al., 2010, p. 2).