Can Robots Grade Essays As Well as Humans?
By Jennifer Roland
Essays have long been considered the gold standard for measuring students’ understanding of a subject. But because multiple-choice tests can be graded by machine, which makes them easy and relatively inexpensive to administer, these sub-standard assessments are what schools primarily use for standardized testing.
But is it possible to combine the more nuanced understanding that essays give us with the ease of automated grading? The Hewlett Foundation would like to find out. Last month, the organization announced a $100,000 prize for the software designer who creates the best automated essay grader. The goal, the foundation says, is to find an efficient tool that will encourage schools to use essays more regularly in assessments.
“[High cost and slow turnaround] typically mean that many school systems exclude essays in favor of multiple-choice questions, which are less able to assess students’ critical reasoning and writing skills,” the Hewlett Foundation wrote in its announcement. In the same release, Barbara Chow said, “Rapid and accurate automated essay scoring will encourage states to include more writing in their state assessments. And the more we can use essays to assess what students have learned, the greater the likelihood they’ll master important academic content, critical thinking, and effective communication.”
But there is valid skepticism about robo-graders. Les Perelman, director of the Writing Across the Curriculum Program at MIT and a noted critic of these tools, believes automated essay graders can measure only the superficial elements of an essay, and that they do a bad job of that, too.
Automated essay graders favor the standard five-paragraph essay format. They judge the maturity of a student’s writing by counting the letters in each word and the words in each sentence. They treat the use of quotations, especially in the concluding paragraph, as a marker of an effective argument. And they check grammar and spelling. None of these elements is the hallmark of good writing, Perelman says. In fact, most college writing instructors use their first-year writing courses to “deprogram students from writing five-paragraph essays,” he says.
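To make Perelman’s description concrete, here is a minimal, hypothetical Python sketch that extracts the kinds of surface features he lists: paragraph count, average letters per word, average words per sentence, and whether the final paragraph contains a quotation. The function name `surface_features` and its naive splitting rules are illustrative assumptions; this is not how SA Grader or any particular commercial scorer works, and it computes features only, with no scoring model attached.

```python
import re
from dataclasses import dataclass

# Hypothetical illustration only: not any vendor's actual algorithm.

# A sentence ends at ., !, or ? (plus any closing quotes) followed by
# whitespace or the end of the text. Deliberately naive.
SENTENCE_END = re.compile(r"[.!?]+[\"'\u201d]*(?:\s+|$)")
# A word is a run of ASCII letters, optionally joined by apostrophes (e.g. don't).
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


@dataclass
class SurfaceFeatures:
    paragraphs: int
    avg_letters_per_word: float
    avg_words_per_sentence: float
    quote_in_conclusion: bool


def surface_features(essay: str) -> SurfaceFeatures:
    text = essay.strip()
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    words = WORD.findall(text)
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    letters = sum(len(w.replace("'", "")) for w in words)
    conclusion = paragraphs[-1] if paragraphs else ""
    return SurfaceFeatures(
        paragraphs=len(paragraphs),
        avg_letters_per_word=letters / len(words) if words else 0.0,
        avg_words_per_sentence=len(words) / len(sentences) if sentences else 0.0,
        # Straight or curly opening double quote anywhere in the last paragraph.
        quote_in_conclusion='"' in conclusion or "\u201c" in conclusion,
    )
```

Calling `surface_features` on a short two-paragraph essay returns a paragraph count, average word and sentence lengths, and a quotation flag. None of those numbers says whether the argument is sound, which is the heart of Perelman’s objection.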
But some teachers use these kinds of tools and appreciate their ability to grade papers quickly and to help students become more comfortable revising their work before submitting it for a final grade.
Joe Swope, an A.P. and community college psychology teacher, uses a program called SA Grader to grade his students’ written work. He says his students’ A.P. psychology test scores have improved and that the automated grader has made him a “better teacher because it has freed me up to talk to students.”
“It teaches kids that writing is a process,” Swope says, and that lesson gives many of his students a love-hate relationship with writing. At first, they don’t like writing and revising; they’d rather just submit the paper and be done with it. But as the course progresses, they come to like the instant feedback. And, he says, it “helps prepare kids for the rigorous process of the A.P. test.”
His students complete one essay per week, and they can submit and resubmit it as many times as necessary to get a grade they’re happy with. The program grades the essays immediately, so students can see their weaknesses while the material is fresh in their minds. Before, “by the time I read the papers and wrote comments and handed them back,” he says, it was often too late for students to benefit from the feedback.
One criticism Swope has encountered is that automated graders are for lazy teachers. But “just because the software is grading doesn’t mean I don’t read student work,” he says. He tracks student performance and reviews student essays so he can “find trouble spots and start a dialogue that’s more than just comments in the margin.”
Swope recommends products such as SA Grader for science and social science courses, where the kind of knowledge communicated in essays is easily measured by an automated process. “SA Grader would not be a good fit for, say, an English class,” he says. Like the A.P. test, it doesn’t grade writing style; “it is content-driven.”
But Perelman believes there is no discipline in which an automated essay grader is useful. “Technology can be used well in a lot of really useful ways,” he says, but it’s a mistake to think it can grade an essay. He says Microsoft Word checks spelling and grammar much more accurately than any essay scorer on the market. Essays should be graded by a human, he says, because that’s the only way to assess the nuance, structure and content of student writing.
The Hewlett Foundation’s competition is supported by the two main testing consortia, the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium (SBAC), which received $365 million from the U.S. Department of Education to develop new assessments. It will be conducted in two phases: the first will analyze existing vendors’ work, and the second will open the competition to the public.