Dear Nicholas Kristof:
Not you too? I have always looked to you as a defender of just causes, a voice of reason in times of crisis. I agreed with your passionate opening in your New York Times column "Students Over Unions" (September 12, 2012), noting the role of poverty as a factor in “the most important civil rights battleground” and that “the most crucial struggle against poverty is the one fought in schools.”
In adding your opinion to the Chicago teachers’ strike, you observed that today’s inner-city schools “echo the ‘separate but equal’ system of the early 1950s. In the Chicago Public Schools where teachers are now on strike, 86 percent of children are black or Hispanic, and 87 percent come from low-income families.”
In this opinion piece, you also made the good points that I look for in your columns:
- The single most important step we could take has nothing to do with unions and everything to do with providing early-childhood education to at-risk kids.
- Teachers need to be much better paid to attract the best college graduates to the nation’s worst schools.
However, you lost me at “How does one figure out who is a weak teacher?”
Your solution is to have schools look at value-added measurements (VAM) using test data. You suggest that researchers are improving the use of VAM and that, “with three years of data, it’s usually possible to tell which teachers are failing.”
Before you put your faith in VAM, you might have perused John Ewing’s article “Mathematical Intimidation: Driven by the Data” in the publication Notices of the American Mathematical Society.
Ewing’s thesis in the article addresses a common misuse of mathematics that “is simpler, more pervasive, and (alas) more insidious: mathematics employed as a rhetorical weapon—an intellectual credential to convince the public that an idea or a process is ‘objective’ and hence better than other competing ideas or processes.”
As the president of the organization Math for America, Ewing disputes the use of tests to evaluate teachers, schools, or programs, and he lists four of the most important problems:
1. Influences. Test scores are affected by many factors, including the incoming levels of achievement, the influence of previous teachers, the attitudes of peers, and parental support. One cannot immediately separate the influence of a particular teacher or program among all those variables.
2. Polls. Like polls, tests are only samples. They cover only a small selection of material from a larger domain. A student’s score is meant to represent how much has been learned on all material, but tests (like polls) can be misleading.
3. Intangibles. Tests (especially multiple-choice tests) measure the learning of facts and procedures rather than the many other goals of teaching. Attitude, engagement, and the ability to learn further on one’s own are difficult to measure with tests. In some cases, these “intangible” goals may be more important than those measured by tests.
4. Inflation. Test scores can be increased without increasing student learning. This assertion has been convincingly demonstrated, but it is widely ignored by many in the education establishment. In fact, the assertion should not be surprising. Every teacher knows that providing strategies for test-taking can improve student performance and that narrowing the curriculum to conform precisely to the test (“teaching to the test”) can have an even greater effect. The evidence shows that these effects can be substantial: One can dramatically increase test scores while at the same time actually decreasing student learning. “Test scores” are not the same as “student achievement”.
In pointing out the flaws of VAM in testing, Ewing concludes:
“Of course we should hold teachers accountable, but this does not mean we have to pretend that mathematical models can do something they cannot. Of course we should rid our schools of incompetent teachers, but value-added models are an exceedingly blunt tool for this purpose. In any case, we ought to expect more from our teachers than what value-added attempts to measure.”
Ultimately, Ewing determines that the tool used to measure teacher performance, data from a single metric, is fundamentally flawed. I ask you to consider: what other profession evaluates performance on a single metric?
Evaluate performers in any other profession and note the number of metrics used to determine success. Athletes have pre-season games, regular-season games, and playoffs, all of which provide important data for determining improvement over time. Multiple industries release profit statements quarterly while parsing the tremendous amount of targeted consumer data now available. Lawyers, doctors, and other professionals are ranked not by single cases, but by professional performance accrued case by case. Government agencies use multiple measurements to determine progress in various sectors (employment, demographics, investments, etc.) and provide monthly reports; even the presidential race has a primary before the election. Yet there are those who would want teachers to be evaluated using the metric of a single test, taken one day out of one school year.
The single-metric test is given state by state to measure growth in skills and subject-area content in reading, writing, math, and science. Elementary school teachers and high school teachers in these subject areas receive the most scrutiny. Many state standardized tests are given at specific grade levels. In other words, in my state of Connecticut, teachers in 5th, 8th, or 10th grade who teach one of the “core” classes carry a different evaluation burden; their test results are widely publicized in rankings of their school against other schools. Elective teachers (art, PE, music, foreign language) or “off-year testing” teachers do not receive the same level of examination by the public.
However, I do not advocate increasing tests at every grade level or in every subject in order to even the playing field. You write that tests and VAM “are stirring skepticism and anger among teachers” because the evaluation system is being created by those who do not have authentic or extensive classroom experience. Instead, the evaluation system is being handed over in large part to the testing industry, and that testing industry lives in an incestuous relationship with publishing and educational “support” developers. The testing industry proclaims a school system’s success by evaluating data from a test, but within that same industry are multiple businesses that profit from a school system’s need to purchase the programs and materials they promote as necessary to pass the standardized test. “Failing the standardized tests? You need our reading/writing/math/science program!”
Before you hop onto the bandwagon with those advocating a one-test metric, consider how this opinion piece, “Students Over Unions,” differs in both research and sentiment from your columns that bring national attention to the poor and the disenfranchised. So different was the tone of this piece as to have caught the attention of several other writers who called you out specifically: Sarah Jaffe from Truthout in “Five So-Called Liberal Pundits That Are Attacking Teachers,” and Education Week contributor Larry Ferlazzo on his blog in “When Bad Ideas Happen to Good Columnists,” to name two. Valerie Strauss of the Washington Post, in “Why Rahm Emanuel and The New York Times Are Wrong about Teacher Evaluation,” also found and incorporated the Ewing argument in explaining your paper’s lack of support for the union. You even got into a Twitter-tiff with education advocate Diane Ravitch (http://Twitter.com/DianeRavitch).
Ultimately, I am confident that you would not want one column, specifically this one column, to be used to define you for an entire year. You would not want one metric to measure your success as a writer for the New York Times. You would not want one single opinion piece to be used as the measurement for evaluating your annual performance.
Well, neither do teachers.