One promising application of Machine Learning in education is marking support. Colleagues in Jisc’s National Centre for AI have identified several products that implement broadly the same process: a program “watches” a human marking assessments, learns in real time, and makes suggestions that help maintain, or even improve, the quality and consistency of marking. This seems an attractive human/machine collaboration, with each partner doing what it does best.
The approach actually involves two stages of Machine Learning (ML):
- the first stage is trained by the vendor to extract sufficient information from students’ submissions to be able to recognise similar sections in different submissions. This may involve domain-specific knowledge, for example to interpret hand-written mathematical or scientific equations, but is likely to be the same for every purchaser of the system;
- the second stage of training is performed by a human marker as they work on each assessment: typically by highlighting parts of the submission, marking them as positive (a relevant step in a calculation, for example) or negative, and attaching feedback. The ML can then apply its pre-learned idea of “similarity” to point out when another submission contains a similar point and suggest attaching the same mark and comment. The human can agree or disagree with the suggestion, either way giving the ML more information to learn from about that particular assessment.
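The two stages above can be sketched in code. This is a minimal illustration, not any particular product's design: the bag-of-words `embed` function stands in for the vendor-trained first stage, and the similarity threshold, class names and data structures are all assumptions made for the example.

```python
# Hedged sketch of the two-stage marking assistant. embed() is a toy stand-in
# for the vendor-trained first stage; MarkingAssistant is the second stage,
# trained live by the marker's highlights. Names and threshold are illustrative.
from __future__ import annotations
from collections import Counter
from dataclasses import dataclass
from math import sqrt


def embed(text: str) -> Counter:
    """Stage 1 stand-in: turn a highlighted span into a comparable representation."""
    return Counter(text.lower().split())


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Annotation:
    vector: Counter
    positive: bool   # a relevant step, or an error
    feedback: str


class MarkingAssistant:
    """Stage 2: learns from the marker's highlights, then suggests matches."""

    def __init__(self, threshold: float = 0.6):
        self.annotations: list[Annotation] = []
        self.threshold = threshold

    def learn(self, span: str, positive: bool, feedback: str) -> None:
        """The marker highlights a span and attaches a mark and comment."""
        self.annotations.append(Annotation(embed(span), positive, feedback))

    def suggest(self, span: str) -> Annotation | None:
        """Return the best-matching prior annotation, if any is similar enough."""
        best, best_score = None, self.threshold
        for ann in self.annotations:
            score = similarity(embed(span), ann.vector)
            if score >= best_score:
                best, best_score = ann, score
        return best
```

In use, the marker's first annotation seeds the assistant, which can then propose the same mark and comment when a later submission contains a similar span; the marker's agreement or disagreement would feed back in as further `learn` calls.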
This combination of human and machine offers advantages for both markers and students. Once the machine is making appropriate suggestions for points that appear in most submissions, the marker can quickly approve those. This lets the human focus on less common insights or misunderstandings, with more time to provide relevant feedback on those. Students should get more consistent marks and better feedback. Furthermore, most systems record the structure of feedback as well as the content, so markers can review how often each piece of feedback was referenced and, for example, expand those relating to common misunderstandings. All students benefit from this enhanced feedback, not just those marked after the need for it was noticed.
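The review step described above can be as simple as counting how often each comment was attached across submissions; the flat-list data shape here is an illustrative assumption.

```python
# Hedged sketch: rank feedback comments by frequency so a marker can spot
# common misunderstandings and expand the corresponding feedback.
from collections import Counter


def feedback_usage(applied_feedback):
    """Return (comment, count) pairs, most frequently attached first."""
    return Counter(applied_feedback).most_common()
```

A marker reviewing the top entries could then expand the wording of the comments that students hit most often.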
In terms of AI regulation, this two-stage collaborative process has several attractions. The marker remains very much a human-in-the-loop, with both marks and feedback individually approved. The link between the human’s actions and the machine’s interpretation of them is quick and direct: well suited for what is referred to as “human oversight” and “correction”. Both are provided by humans who are experts in the domain where the AI is operating, rather than in AI itself: a feature that insights from safety-critical systems suggest is desirable.
The process should also provide clear signals (through rejected suggestions) when either stage of the ML isn’t working: either the marker isn’t highlighting enough of each submission for the ML to recognise common features, or the ML’s original training isn’t extracting sufficient meaning from the submissions. The draft EU AI Act concentrates on information flowing from providers of AI systems to their users, but here there seems to be value in the provider inviting reports in the other direction (“your system isn’t performing well in these circumstances”) and responding either by supporting users with better instructions or by improving the performance of the first-stage ML.
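One way such a signal could be surfaced is by tracking accepted and rejected suggestions and flagging an assessment whose rejection rate suggests the ML is struggling. This is a sketch only; the threshold and the minimum sample size are illustrative assumptions, not figures from any product or from the draft Act.

```python
# Hedged sketch of a "report in the other direction": flag an assessment for
# the provider when enough suggestions have been seen and too many were
# rejected. Threshold and minimum sample size are illustrative assumptions.
def rejection_rate(accepted: int, rejected: int) -> float:
    """Fraction of the marker's decisions that rejected a suggestion."""
    total = accepted + rejected
    return rejected / total if total else 0.0


def flag_for_provider(accepted: int, rejected: int,
                      threshold: float = 0.3, min_samples: int = 20) -> bool:
    """True when the sample is large enough and the rejection rate is high."""
    return (accepted + rejected) >= min_samples and \
        rejection_rate(accepted, rejected) > threshold
```

The minimum-sample guard matters: early in marking, before the second stage has learned much, a high rejection rate is expected and shouldn't trigger a report.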