I got this interesting e-mail from Vlad Kogan, who preceded me as an education reporter here. He wrote back to both Tyler Cramer and Jill Kerper Mora, who had a blog-off earlier this week in Schooled about tying teacher evaluation (and possibly pay) to test scores. Check out his response.
I found the discussion of the issue very interesting. Unfortunately, I think that the guest bloggers were talking past each other on several issues.
For example, Jill points out that different teachers may start out with a different “endowment” of students in their classrooms, which might in turn affect their ability to raise test scores. Yet Tyler points out that this problem can be overcome by comparing similarly situated teachers, in other words, teachers with similar “endowments” of students. This is a statistical process known as matching.
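To make the idea of matching concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the teacher names, the `prior_avg` and `gain` numbers, the `match_teachers` helper and its `caliper` threshold are all made up, and real matching would use many more characteristics than one prior-year average.

```python
# Hypothetical teachers: prior_avg is the class's average score last year,
# gain is this year's average improvement. All numbers are invented.
teachers = [
    {"name": "A", "prior_avg": 50, "gain": 4},
    {"name": "B", "prior_avg": 52, "gain": 7},
    {"name": "C", "prior_avg": 70, "gain": 5},
    {"name": "D", "prior_avg": 73, "gain": 3},
    {"name": "E", "prior_avg": 90, "gain": 6},
]

def match_teachers(teachers, caliper=5.0):
    """Greedily pair teachers whose classes started within `caliper` points."""
    ordered = sorted(teachers, key=lambda t: t["prior_avg"])
    pairs = []
    i = 0
    while i < len(ordered) - 1:
        a, b = ordered[i], ordered[i + 1]
        if b["prior_avg"] - a["prior_avg"] <= caliper:
            # Compare gains only within a pair of similar classrooms.
            pairs.append((a["name"], b["name"], b["gain"] - a["gain"]))
            i += 2
        else:
            # No close enough peer: leave this teacher unmatched.
            i += 1
    return pairs

pairs = match_teachers(teachers)
print(pairs)  # [('A', 'B', 3), ('C', 'D', -2)]
```

Teacher E has no peer with a similar starting point, so E drops out of the comparison entirely. That is one practical cost of matching: some teachers simply cannot be compared.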
Jill points out that individual student scores likely vary from year to year. One could argue, though, that this random noise should largely average out when you compare large enough groups of students instead of following a single student over time.
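A small simulation can illustrate the averaging argument. This is only a sketch under stated assumptions: the noise is assumed to be independent and normally distributed with a standard deviation of 10 points, and the function name `spread_of_class_means` and the class size of 25 are hypothetical.

```python
import random
import statistics

def spread_of_class_means(class_size, noise_sd=10.0, trials=2000, seed=0):
    """How much the class-average noise varies across simulated classes."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Pure year-to-year noise for each student, no real teacher effect.
        noise = [rng.gauss(0.0, noise_sd) for _ in range(class_size)]
        means.append(statistics.fmean(noise))
    return statistics.pstdev(means)

single = spread_of_class_means(1)      # one student: spread near 10 points
classroom = spread_of_class_means(25)  # 25 students: spread near 10 / 5 = 2 points
print(round(single, 1), round(classroom, 1))
```

Under these assumptions the noise in a class average shrinks with the square root of the class size. It shrinks, but it does not vanish, and the argument only works if the noise really is independent across students.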
In the end, linking test scores to pay would require some very sophisticated statistical methods to make teachers comparable and to overcome some of the inherent validity and reliability problems of test-based measurement. Yet the more sophisticated the statistical methods, the stronger the technical assumptions they require.
For example, the matching method Tyler describes relies on what statisticians call the “selection on observables” assumption: we have to assume that two teachers who are, as Tyler puts it, “similarly situated” based on things we can observe are also similarly situated on things we can’t observe or measure but which may also affect test performance.
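Here is a rough sketch of what goes wrong when that assumption fails. It is a made-up simulation, not real data: two teachers are given exactly the same true effect (5 points), their students look identical on everything observed, and the only difference is a hypothetical unobserved factor (call it home support) whose average differs by `hidden_gap`. The function name and all the coefficients are assumptions.

```python
import random
import statistics

def simulated_gap(hidden_gap, n_students=5000, seed=1):
    """Difference in average gains between two equally effective teachers."""
    rng = random.Random(seed)
    true_teacher_effect = 5.0  # identical for both teachers by construction

    def average_gain(hidden_mean):
        gains = []
        for _ in range(n_students):
            hidden = rng.gauss(hidden_mean, 1.0)  # factor we cannot observe
            gains.append(true_teacher_effect + 2.0 * hidden + rng.gauss(0.0, 3.0))
        return statistics.fmean(gains)

    return average_gain(hidden_gap) - average_gain(0.0)

no_hidden_difference = simulated_gap(0.0)  # close to 0: the comparison is fair
hidden_difference = simulated_gap(1.0)     # close to 2: looks like a better teacher
print(round(no_hidden_difference, 1), round(hidden_difference, 1))
```

In this sketch the gap of roughly 2 points would be credited to the teacher even though both teachers are equally effective. Unlike random noise, this kind of bias does not shrink as the number of students grows.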
Even the most basic regression models, which try to measure the effect of some treatment while “holding constant” other variables, carry implicit assumptions about the functional form of the relationship, the distribution of the variables, correlation (or lack thereof) in the error terms, and so on. If these assumptions are violated, the results we get are inaccurate and biased in one way or another. The problem with all of these assumptions is that they are, in fact, assumptions: things we often can’t test and must take on faith.
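The functional-form point can be shown with a toy example. The data below are invented: the true relationship is assumed to be curved (`y = x * x`), and a straight line is fitted to it with ordinary least squares using a hypothetical `fit_line` helper.

```python
xs = list(range(11))
ys = [x * x for x in xs]  # the true relationship is curved, not linear

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(slope, intercept)                          # 10.0 -15.0
print(residuals[0], residuals[5], residuals[10])  # 15.0 -10.0 15.0
```

The straight line overpredicts in the middle of the range and underpredicts at both ends. The errors are systematic rather than random, which is what a wrong assumption about functional form looks like.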
The public policy question is whether we think it’s fair to tie the compensation of real people to arbitrary statistical assumptions. If the assumptions are right, we may get improvement in the quality of education. If the assumptions are wrong, we’re going to have an unfair system of teacher pay. It seems like it’s up to the political process to decide whether the tradeoff is worth it, though we first have to make sure that voters and — more importantly — policymakers understand the tradeoffs involved. It’s not clear that they currently do.