Can Robots Grade Essays As Well as Humans?

February 13, 2012

By Jennifer Roland

Essays have long been considered the gold standard for measuring students’ understanding of a subject. But because multiple-choice tests can be graded by machines, making them easy and relatively inexpensive to administer, these sub-standard assessments are primarily what schools use for standardized testing.

But is it possible to combine the more nuanced understanding that essays give us with the ease of automated grading? The Hewlett Foundation would like to find out. Last month, the organization announced a $100,000 prize that will go to the software designer who can best create an automated essay grader. The goal, they say, is to find an efficient tool that will encourage schools to use essays more regularly for assessments.

“[High cost and slow turnaround] typically mean that many school systems exclude essays in favor of multiple-choice questions, which are less able to assess students’ critical reasoning and writing skills,” the Hewlett Foundation wrote in its announcement. “Rapid and accurate automated essay scoring will encourage states to include more writing in their state assessments. And the more we can use essays to assess what students have learned, the greater the likelihood they’ll master important academic content, critical thinking, and effective communication,” said Barbara Chow in the release.

But there’s valid skepticism around robo-graders. Les Perelman, director of the Writing Across the Curriculum Program at MIT, who’s a noted critic of these types of tools, believes automated essay graders are incapable of measuring anything but superficial elements of an essay — and they do a bad job of that, too.

Automated essay graders prefer a standard five-paragraph essay format. They assess the maturity of vocabulary by counting the number of letters in the words and the number of words in each sentence. They look for the use of quotations, especially in the concluding paragraph, as a marker of an effective argument. And they check grammar and spelling. These elements are not the hallmarks of good writing, Perelman says. In fact, most college writing instructors use their first-year writing courses to “deprogram students from writing five-paragraph essays,” he says.
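To make the criticism concrete, here is a rough sketch, in Python, of the kind of surface-feature checklist Perelman describes. The feature names, weights, and scale are invented purely for illustration; no commercial grader publishes its formula, and this is not the algorithm behind any particular product.

    # A toy sketch of surface-feature essay scoring, for illustration only.
    # The features mirror the ones Perelman criticizes: word length, sentence
    # length, five-paragraph structure, and quotation use. The weights and the
    # rough 0-13 scale are arbitrary assumptions, not any vendor's real model.
    import re

    def surface_features(essay):
        paragraphs = [p for p in essay.split("\n\n") if p.strip()]
        sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
        words = re.findall(r"[A-Za-z']+", essay)
        return {
            # "vocabulary maturity" proxied by average word length
            "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
            # sentence complexity proxied by words per sentence
            "avg_sentence_length": len(words) / max(len(sentences), 1),
            # reward the conventional five-paragraph format
            "is_five_paragraphs": len(paragraphs) == 5,
            # quotation marks treated as a marker of supporting evidence
            "uses_quotations": '"' in essay,
        }

    def toy_score(essay):
        f = surface_features(essay)
        score = min(f["avg_word_length"], 6.0)            # longer words score higher
        score += min(f["avg_sentence_length"] / 5, 4.0)   # longer sentences score higher
        score += 2.0 if f["is_five_paragraphs"] else 0.0  # bonus for five paragraphs
        score += 1.0 if f["uses_quotations"] else 0.0     # bonus for any quotation
        return round(score, 1)

    if __name__ == "__main__":
        sample = "\n\n".join(['"Brevity," the author notes, "is underrated."'] * 5)
        print(surface_features(sample))
        print(toy_score(sample))

Nothing in a checklist like this requires the software to follow the essay’s argument, which is the core of Perelman’s objection.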

But there are teachers who use these kinds of tools and appreciate their ability to grade papers quickly and to help students become more comfortable with revising their work before submitting it for a final grade.

Joe Swope, an A.P. and community college psychology teacher, uses the program SAGrader to grade his students’ written work. He says his students’ A.P. psychology test scores have improved, and the automated grader has made him a “better teacher because it has freed me up to talk to students.”

“It teaches kids that writing is a process,” Swope says, and that process creates a love-hate relationship with writing for many of the kids he teaches. At first, they don’t like writing and revising – they’d rather just submit the paper and be done with it. But as the course progresses, they come to like the instant feedback. And, he says, it “helps prepare kids for the rigorous process of the A.P. test.”

His students complete one essay per week, and they can submit and resubmit as many times as necessary to get the grade they’re happy with. The program grades the essays immediately, which allows them to see their weaknesses while the subject matter is fresh in their minds. “By the time I read the papers and wrote comments and handed them back,” he says, it was often too late for students to benefit from the feedback.

One of the criticisms Swope has encountered is that automated graders are for lazy teachers. But “just because the software is grading doesn’t mean I don’t read student work,” he says. He tracks student performance and assesses student essays so he can “find trouble spots and start a dialogue that’s more than just comments in the margin.”

Swope recommends products such as SAGrader for science and social science fields, where the type of knowledge communicated in essays is easily measured by an automated process. “SAGrader would not be a good fit for, say, an English class.” Like the A.P. test, he continues, it doesn’t grade writing style; “it is content-driven.”

But Perelman believes there is no discipline in which an automated essay grader is useful. “Technology can be used well in a lot of really useful ways,” he says, but it’s a mistake to think it can be used to grade an essay. Instead of relying on automated grading tools, he says Microsoft Word can be used to assess spelling and grammar much more accurately than any essay scorer on the market. Essays should be graded by a human — that’s the only way to assess the nuance, structure and content of student writing, he says.

The Hewlett Foundation’s competition will be conducted in two phases: the first will analyze existing vendors’ work, and the second will open the competition to the public. It is supported by the two main testing consortia, the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium (SBAC), which received $365 million from the U.S. Department of Education to develop new assessments.

Comments

  • Science Teacher

    One note I would leave for the eyes of Les Perelman, director of the Writing Across the Curriculum Program at MIT, is the sad reality that human grading is not necessarily any better than the software versions. The ironic thing is that we’ve come to believe that the English teacher especially, and no offense intended, is a perfect grader of written work. The fact is that there are many English teachers out there teaching rules of grammar that don’t exist, requiring formats that are out of fashion, and utilizing techniques for evaluation that fail to fairly evaluate the writing at all. There is blatant bias run amok. I once oversaw a blind experiment where teachers were given a pile of papers to grade, written by students in a class they had been teaching for six months. The cover pages with names were removed and the papers were coded. Interestingly enough, students who had been receiving straight As on papers were suddenly not so perfect, and likewise B and C students earned a few As. How interesting. This proved to me that grading of essays should always be blind. Because human beings are inherently flawed, we succumb to errors both in judgment and in principle. A machine does not. Therefore, in fact, the machines may do no better nor worse than a human in any given instance. The belief that a human is necessarily going to do a better job than a computer is just plain erroneous.

  • http://twitter.com/jeroenjeremy Jeroen Fransen

    Grading essays automatically sounds like a silver bullet, but does it solve the problem it set out to solve?
    In this article the initial angle is that essays are the best way to see whether a student understands a subject. I would call this content evaluation. What most automatic essay scoring services do, however, is help teachers evaluate students’ writing skills, an important skill in its own right. Let’s not confuse the two.

    • Colin Monaghan

      You bring up an important distinction, Jeroen. Most automated essay scoring engines are based on statistical models that evaluate students’ writing skills. This seems to be what Hewlett is looking for in their contest, but I could be wrong.

      The program covered in the story, SAGrader, takes a fundamentally different approach and compares student work to a semantic network of content knowledge. This particular program is designed to help students understand content through writing and revision. So it’s really evaluating students’ ability to correctly explain and link concepts in their writing.

      Content knowledge and expressive skills can be linked, but it’s important to understand the difference when it comes to evaluation.

  • anne mareck

    It is useful here to consider Walter Ong’s essay, “Writing Is a Technology That Restructures Thought.” An essay is a creative act, the formulation of a unique way of thinking about something. The clear meaning in a well-written essay is itself the product of engaging in the often arduous and recursive act of writing. Each writer brings to the task their own unique world experience, their own unique value systems, their own unique voice, their own unique anecdotes. These social “ways of knowing” are woven into the essay’s fabric, integrated with other evidence in myriad ways. The *reason* writing is so difficult is that it can’t really be taught. We can offer insight into structure, audience awareness, the use of logic and emotion, and the notion of the writer’s credibility. We can provide insight into idea development and integration of support material. But the actual essay that is written is, in its own way, a unique work of art. A teacher who is accustomed to reading student essays is also accustomed to deciphering the intended meaning while mentally comparing the structure, content, and critical reasoning against the given standards for a given course. We also take into account such things as second-language variations. To program a machine to “grade” essays seems like another step into reductive thinking.