New teacher evaluation systems show substantial promise, as NCTQ will highlight in an upcoming report. They enable school district leaders to better differentiate among teachers and drive a more rapid rate of improvement. But as with any systemic effort, we need to watch for and guard against unintended consequences.
On the lookout for such consequences, Brian Stacy, Cassandra Guarino, and Jeffrey Wooldridge (of Michigan State University and the University of California at Riverside) find that teachers of initially lower-performing students receive less precise and less stable value-added estimates than teachers of initially higher-performing students. In other words, it is harder to assess the true impact on learning for teachers who happen to work in more challenging classrooms.
To counterbalance this effect, the researchers suggest a few changes, including acknowledging and reporting that the level of confidence in a teacher's value-added estimate varies with the students she teaches. Computer adaptive testing (tests that adjust the difficulty of test items to a student's demonstrated ability) may also help. By not swamping a student with overly challenging test items, schools may be able to decrease student guessing and non-responses, yielding more accurate test results.
All measures of teacher impact, not just value-added, come under scrutiny, including the old tried-and-true measure: classroom observation. For example, past research has found that observation scores also tend to be lower for teachers of lower-performing students, potentially indicating bias.
Does this mean that all attempts to evaluate teachers should be jettisoned? No, but it does mean mitigating the risks with reasonable solutions such as those proposed here and (shall we say it once again?) always using multiple measures to assess teacher impact.