Key Takeaways
- With AI use common among students, some teachers are now using AI tools to grade assignments.
- A University of Georgia study found AI grading systems are highly inaccurate compared to human graders.
- AI achieved only 33.5% accuracy when it devised its own grading criteria, and just over 50% accuracy when using a human-provided rubric.
- Researchers found AI takes shortcuts and lacks the deeper logical reasoning needed for proper grading.
- The study highlights significant concerns about relying on current AI technology for evaluating student work.
Many teachers are noticing the growing impact of artificial intelligence on students, from attention spans to the potential for cheating.
As AI tools like ChatGPT become widespread, student usage is soaring. One study indicated that a vast majority of university students incorporate AI into their academic work.
This trend has led some educators to counter AI use by employing AI chatbots to grade student submissions. As detailed in a report by Yahoo News, some teachers inform students that if they use AI to write assignments, AI will be used to grade them.
Conversely, other educators are embracing AI more positively, using it to personalize learning experiences or even requiring students to interact with AI as part of assignments.
However, even though it could save teachers time, AI may not be well suited to grading complex student work. Research from the University of Georgia's School of Computing investigated just how effective AI is at this task.
In the study, researchers asked an AI model, Mixtral, to grade middle school homework responses. When the AI had to create its own grading criteria, it agreed with a human grader a mere 33.5 percent of the time. Even when researchers supplied a detailed rubric written by humans, its accuracy improved only to just over 50 percent.
The researchers noted that while AI can score quickly, it often relies on shortcuts and doesn’t engage in the deeper reasoning expected in human evaluation.
Xiaoming Zhai, one of the researchers, explained that AI might incorrectly infer understanding from the mere presence of keywords, missing nuances that a human grader would catch by analyzing the student's actual reasoning in the writing.
While better rubrics might help bridge the gap somewhat, the study suggests a significant difference remains between AI and human grading capabilities.
This inaccuracy raises questions about the fairness and reliability of using AI for high-stakes tasks like grading, where errors can directly impact students.
The findings underscore that current AI technology struggles to replicate the nuanced judgment and understanding essential for effective teaching and evaluation.