Challenge
A leading provider of language testing and educational services recognised the need to automate the creation of test content and the correction and marking of these tests.
Currently, correcting and grading texts takes up a lot of teachers’ time – not only because of the manual effort involved, but also because teachers formulate individual suggestions for improvement for each text. Another problem is that every assessment is subjective and can vary depending on the examiner. However, students who have to prove a certain language level to an authority or educational institution have an interest in a high degree of objectivity and a quick evaluation of their tests.
There is also a second challenge: the manual creation of test content is repetitive and must be carried out at short intervals. At the same time, there is a growing need for customised text tasks that address different language levels and text forms. Taking specific EU requirements into account increases the complexity and time required for this process.
The challenge for statworx was to prove with a proof of concept (PoC) that these processes can be automated and standardised, and thereby to show that both the efficiency and the consistency of the assessments can be increased.
Approach
To address these challenges, statworx implemented two use cases as part of the PoC: “Author AI” and “Rater AI”. The team developed a fully functional AI backend and an intuitive user frontend to interact with it.
Use case 1: Automated text evaluation (Rater AI)
The first use case focussed on implementing the Rater AI, which evaluates texts at German B1 level against defined criteria and provides feedback for improvement. The criteria include:
- Content appropriateness: Are all key questions answered?
- Linguistic appropriateness: What is the quality of the sentences and linguistic expression?
- Formal correctness: Are the spelling and grammar correct?
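A rubric-based evaluation of this kind can be sketched as a structured prompt to a large language model. The criterion keys, the scoring scale and the `build_rater_prompt` helper below are illustrative assumptions for this sketch, not the actual statworx implementation:

```python
# Illustrative sketch of a rubric-based rater prompt. The rubric mirrors
# the three criteria above; all names and the 1-5 scale are assumptions.
CRITERIA = {
    "content": "Content appropriateness: are all key questions answered?",
    "language": "Linguistic appropriateness: quality of sentences and expression",
    "form": "Formal correctness: spelling and grammar",
}

def build_rater_prompt(student_text: str, level: str = "B1") -> str:
    """Assemble a prompt asking the model to score a text per criterion."""
    rubric = "\n".join(f"- {key}: {desc}" for key, desc in CRITERIA.items())
    return (
        f"You are an examiner for German at CEFR level {level}.\n"
        "Rate the following text on each criterion from 1 to 5 and give "
        "one concrete suggestion for improvement per criterion.\n\n"
        f"Criteria:\n{rubric}\n\nText:\n{student_text}"
    )
```

Keeping the rubric in one data structure means the same criteria can drive both the prompt and the downstream parsing of the model's per-criterion scores.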
Use case 2: Automated creation of text tasks (Author AI)
The second use case is aimed at automating the creation of tasks using the Author AI. Depending on the requirements and the prompt, this AI can generate texts at a specific language level and in a specific text form. For example:
- Creating a text task
- Creating a cloze text
- Creating solution options for the text
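To illustrate the cloze-text task format, here is a minimal stand-alone sketch that blanks out every n-th word and collects the removed words as solution options. The actual Author AI generates its texts with an AI model; this helper only demonstrates the structure of the resulting task:

```python
def make_cloze(text: str, every: int = 4, blank: str = "____") -> tuple[str, list[str]]:
    """Blank out every n-th word; return the cloze text and the solutions.

    Illustrative sketch only: a production system would choose gaps by
    linguistic criteria (e.g. word class), not by fixed position.
    """
    words = text.split()
    solutions = []
    for i in range(every - 1, len(words), every):
        solutions.append(words[i])  # keep the removed word as a solution option
        words[i] = blank
    return " ".join(words), solutions

cloze, solutions = make_cloze("Am Montag fahre ich mit dem Zug nach Berlin")
print(cloze)      # → Am Montag fahre ____ mit dem Zug ____ Berlin
print(solutions)  # → ['ich', 'nach']
```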
Both concepts and implementations now serve as the foundation for further use cases.
Result
These two use cases impressively demonstrate how AI technologies can improve efficiency and quality in the education sector. The automation and standardisation of assessment and text creation processes not only reduces the workload for teachers, but also sustainably optimises the learning experience for students.
Rater AI: increased efficiency and consistency
The implementation of the Rater AI led to a significant increase in efficiency and consistency in text assessment. Through extensive testing, the team was able to show that the AI assesses texts more robustly and consistently than human examiners. This was confirmed by measuring inter-rater agreement with Cohen’s kappa coefficient.
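Cohen’s kappa quantifies agreement between two raters beyond chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected from each rater’s label frequencies. A minimal self-contained computation; the pass/fail grades below are made up for illustration:

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical grades from an AI rater and a human examiner:
ai    = ["pass", "pass", "fail", "pass", "fail", "pass"]
human = ["pass", "pass", "fail", "fail", "fail", "pass"]
print(round(cohens_kappa(ai, human), 3))  # → 0.667
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, which makes it a natural metric for comparing AI–human against human–human rating consistency.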
Author AI: potential for automation
The Author AI shows great potential for automating the creation of text tasks. The steps for creating tasks, cloze texts and solution options have all been implemented successfully.