Making the Grades: My Misadventures in the Standardized Testing Industry
Todd Farley (2009, PoliPoint Press)
I love kitchen confidentials and attempted-sports-career memoirs; all the way back to the days of Nellie Bly, some of the best narrative nonfiction stems from a writer’s adopting a way of life, or a profession, and writing about it. Todd Farley has come up with a quirky job that would never have occurred to me: for over a decade, he read and scored the essay sections of standardized tests.
It wasn’t quite intentional: it’s just what Farley did for a living while waiting to be a writer. He had moved to Iowa City, so that if he succeeded in getting into the famous Writers’ Workshop, he’d qualify for in-state tuition; and the salt mines of the test-scoring business were the best-paying temporary work going. He hung with it long enough to become a consultant and trainer, eventually spending three years at the ETS in Princeton (an obscenely high-paying job, but hardly less boring than all the others).
Assessing students based on answers marked in bubbles A through E has always been fraught with difficulty and hidden error. The addition of essay questions to standardized tests must have been intended as a reply to critics of multiple-choice tests, but it has really just created a whole new set of problems. The first of these is the sheer volume of writing that has to be read: “The project was a war of attrition, but eventually we won, each of the 100,000 essays getting scored by two different people over the course of four weeks.” That’s a room full of one hundred people (minus those who couldn’t stand the tedium and quit), each reading an essay every two minutes and slapping down a number.
Second, those numbers have to agree with one another. “The issue ... is not whether or not you appreciate or comprehend an essay; the issue is whether or not you can formulate exactly the same opinion about it as do all the people sitting around you.” Each day’s work begins with sample responses provided to the scorers, with the intended correct scores, and a set of standards, known as the rubric, on which they are allegedly (and ‘holistically’) based. This is followed by a roomful of argument: “How can they give that a four??” The trainer says something like this: “It’s a 4. The range-finding committee says it’s a 4, so it’s a 4. Makes sense.” No, it doesn’t, but that’s how it works. (And if it doesn’t, the supervisors will find ways to nudge the numbers into shape, individual results be damned.)
Third, the scorers themselves are a tremendously variable lot. Some of Farley’s funniest scenes are character sketches of his fellows in the trenches, who are there, at best, for the same mercenary reasons he is; those with any actual prospects usually move on quickly, leaving a disturbing remnant of genuine unemployables. As Farley rises through the ranks to group supervisor and then to trainer, he meets more and more people like the guy who gave all ‘twos’, or the man who believed that the essays he was reading were a psychological test being administered to him, not to mention the people who barely speak English. The work is just so tedious and annoying that you can’t count on getting reasonably sane, smart people to do it.
Farley disclaims any real interest in education as such, so he comes fairly slowly to some of the issues raised by his work. When the ETS adds an essay section to the SAT, he’s still pretty naive: “I imagined, given the enormity and importance of that test, there had to be some cadre of teaching professionals reading the responses.” Not so fast there, Tonto: it’s the same cast of castoffs you’ve been training and working with.
When a scale has four or five possible scores, there’s (at least) a ten to twenty per cent chance that another scorer would have picked a different number; and the chances must get even higher out at the thin end of the wedge, because if a student happens to write a brilliant and erudite response, it’s not unlikely to be over the grader’s head. If the rubric says to look for ‘kind’ in the answer, ‘benevolent’ might be in line for a zero.
Making the Grades is funny, but less so the more you think about it. The whole testing and scoring enterprise looks more and more fraudulent, not because of some evil genius somewhere, but because the System (including, in particular, No Child Left Behind) has generated a demand for Numbers, any Numbers; the evidence is that the numbers are at least partly unmoored from meaning, even in the superficial sense. Yet they have all kinds of consequences in the real world, from school budgets to college admissions. In his epilogue, Farley recommends that we look much more skeptically at all such numbers: “My default position about any test results getting returned to students, teachers, or schools is ‘I don’t believe.’”
On the deeper level, there’s this, garnered from Derrick Z. Jackson’s* appreciation of the late Gerald Bracey, a former analyst for the National Education Association, whom Jackson quotes saying this: “What say we take a moment to consider a few of the personal qualities that standardized tests do not measure: creativity, critical thinking, resilience, motivation, persistence, humor, reliability, enthusiasm, civic-mindedness, self-awareness, self-discipline, empathy, leadership, and compassion.”
What are we paying for? What do we want?
Email, November 2009