An Investigation into the Self-Consistency of Raters
The Case of a High-Stakes National Level Examination in Pakistan
Scoring of essays is a notoriously difficult task, since raters bring their own subjective and idiosyncratic criteria, which may cause discrepancies in ratings. This variation may occur across raters (inter-rater reliability) or within a single rater (intra-rater reliability). Both variations are problematic and constitute measurement error. They are also complementary, two sides of the same coin: if raters are not self-consistent, they cannot be expected to be consistent with each other (Cho, 1999; Douglas, 2011). Hence testing organisations and language testers adopt various procedures, including the provision of rating scales, training, monitoring of examiners, post-rating adjustment of scores, and calculation of inter-rater and intra-rater reliability, to control this measurement error. However, compared to the reasonably large number of inter-rater reliability studies, intra-rater reliability studies are scarce (Barkaoui, 2010; Cho, 1999; Jonsson & Svingby, 2007). This research investigates the intra-rater reliability of a group of raters (n = 94) who evaluated a set of essays (n = 25) twice (referred to as T1 and T2 respectively), after a gap of a few weeks, on a national-level high-stakes examination in Pakistan. To replicate actual practice, the raters were not provided with any rubric. Comparison of each individual rater's scores on T1 and T2, calculated with Cronbach's alpha, shows that most of the raters are highly self-consistent.
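As an illustration of the reliability statistic mentioned in the abstract, the following sketch computes Cronbach's alpha for one rater's two scoring occasions (T1 and T2 treated as two "items"). This is not the study's actual analysis; the scores below are hypothetical and serve only to show the calculation.

```python
from statistics import variance

def cronbach_alpha(t1, t2):
    """Cronbach's alpha for two rating occasions (k = 2).

    alpha = k/(k-1) * (1 - sum of per-occasion variances / variance of summed scores)
    """
    k = 2
    item_vars = variance(t1) + variance(t2)                 # per-occasion sample variances
    total_var = variance([a + b for a, b in zip(t1, t2)])   # variance of the summed scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical data: one rater's scores on five essays at T1 and again at T2
t1 = [12, 15, 9, 18, 14]
t2 = [13, 14, 10, 17, 15]
alpha = cronbach_alpha(t1, t2)
```

A value of alpha close to 1 indicates a highly self-consistent rater; in the study, this coefficient would be computed for each of the 94 raters over the 25 essays.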
Copyright (c) 2022 Kashmir Journal of Language Research
This work is licensed under a Creative Commons Attribution 4.0 International License.