In a previous post, I discussed Sadler’s (2005) cautionary observation about using aggregated grades as a proxy for outcome attainment. Careful design, including signalling which components of each outcome have influenced the grade in each assessment, mitigates the concern, but this practice has not yet been assimilated into mainstream assessment design. In coming posts, we will outline a pragmatic approach that upskills course designers and developers to achieve this very alignment; in the meantime, the issue remains.
Secure vs non-secure
Many courses are currently being developed with a mix of secure (invigilated) and non-secure (non-invigilated) assessments. With the ever-increasing certainty that gen-AI is becoming a natural part of the educational landscape, how we aggregate the non-secure and secure assessment grades to arrive at a final evaluation of a student’s performance matters a great deal. The problem is no longer one we could largely mitigate through a strong academic integrity policy. Now we must assume that students will use generative AI to augment their learning, which means the non-secure assessments are going to be influenced by it.
Grade inflation
If these assessments are influenced by AI, the responses are likely to improve. Unless we alter the marking guidelines and/or rubrics to even out the improvement, we will see grade inflation. So, consider a ratio of secure to non-secure assessment of 60/40. Until now, we have had a relatively even playing field between the secure and non-secure levels of difficulty, which has allowed the scores of the two assessment types to be aggregated to represent a range of outcomes from fail to high distinction (note: the aggregation has numerous flaws that we discuss elsewhere). However, if students enter the secure assessments with an inflated grade, the secure assessments will need to be made significantly harder to maintain the typical range of the distribution. This means the majority of questions will need to be pitched at the distinction (D) and high distinction (HD) levels, making them inaccessible to most learners.
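To make the arithmetic concrete, here is a minimal sketch of the effect, assuming the 60/40 secure/non-secure weighting above. The specific marks (a borderline-pass student scoring 50, with AI-assisted work inflating the non-secure mark to 80) are illustrative assumptions of mine, not figures from any real course.

```python
# Illustrative only: how inflation of the non-secure mark shifts the weighted
# aggregate, and how far the secure mark must fall to offset it.
# The 60/40 weighting is from the example above; the marks are invented.

SECURE_WEIGHT = 0.6
NON_SECURE_WEIGHT = 0.4

def aggregate(secure_mark: float, non_secure_mark: float) -> float:
    """Weighted final grade out of 100."""
    return SECURE_WEIGHT * secure_mark + NON_SECURE_WEIGHT * non_secure_mark

# A borderline-pass student before gen-AI: roughly 50 on both components.
before = aggregate(50, 50)   # 50.0 -> a pass

# The same student with the non-secure mark inflated to, say, 80.
after = aggregate(50, 80)    # 62.0 -> drifts up the grade bands

# To hold the aggregate at 50, the secure mark would need to drop to:
required_secure = (50 - NON_SECURE_WEIGHT * 80) / SECURE_WEIGHT   # 30.0

print(before, after, required_secure)
```

In other words, holding the final distribution steady would push a borderline-pass student from around 50 down to around 30 on the secure component, which is exactly the "make the secure assessments significantly harder" move described above.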
It is important to remember that the Intended Learning Outcome should be pitched at the pass level. After all, it describes what we are saying the learner can do if they ‘pass’ the outcome. By restricting the secure assessments to questions pitched at the D or HD levels, the assessments are no longer validly or fairly testing the outcomes. They are testing a higher level of the outcome, which technically makes it a different outcome, and which therefore means the teaching and learning activities need to be pitched at that level too.
Robbing Peter to pay Paul
I’m not sure that is ethical, and it is certainly not pedagogically sound: a design in which the majority of learners fail the secure assessments simply to guarantee the final distribution. To compensate, some course developers may panic and keep the same difficulty level but apply a hurdle to all secure assessments; I’m not sure that is the answer either. It is certainly complex, and certainly pressing.
In the next post, I will discuss how the marking guidelines and/or rubrics can be adapted to compensate for the use of gen-AI in the assessments.
References
Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher education. Assessment & Evaluation in Higher Education, 30(2), 175–194. https://doi.org/10.1080/0260293042000264262
I’m Paul Moss. I manage educational design at the University of Adelaide. Follow me on Twitter @edmerger