An Investigation into the Self-Consistency of Raters

The Case of a High-Stakes National Level Examination in Pakistan


  • Athar Munir Department of English, University of Azad Jammu & Kashmir, Muzaffarabad
  • Nadeem Haider Bukhari Department of English, University of Azad Jammu & Kashmir, Muzaffarabad


Scoring of essays is a notoriously difficult task, since raters bring their own subjective and idiosyncratic criteria, which may cause discrepancies in ratings. This variation may occur across raters (inter-rater reliability) or within a single rater (intra-rater reliability). Both variations are problematic and constitute measurement error. They are also complementary, two sides of the same coin, since raters who are not self-consistent cannot be expected to be consistent with each other (Cho, 1999; Douglas, 2011). Hence testing organisations and language testers adopt various procedures, including the provision of rating scales, rater training, monitoring of examiners, post-rating adjustment of scores, and calculation of inter-rater and intra-rater reliability, to control this measurement error. However, compared to the reasonably large number of inter-rater reliability studies, intra-rater reliability studies are scarce (Barkaoui, 2010; Cho, 1999; Jonsson & Svingby, 2007). This research investigates the intra-rater reliability of a group of raters (n = 94) who evaluated a set of essays (n = 25) twice (referred to as T1 and T2 respectively), after a gap of a few weeks, in a national-level high-stakes examination in Pakistan. The raters were not provided with any rubric, in order to replicate actual practice. Comparison of each individual rater's scores on T1 and T2, calculated by Cronbach's alpha, shows that most of the raters are highly self-consistent.
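The per-rater statistic used above can be sketched as follows. For two rating occasions (k = 2), Cronbach's alpha reduces to a simple variance ratio over the two score lists. This is a minimal illustration only, assuming standard plug-in population variances; the scores below are hypothetical and are not the study's data.

```python
# Minimal sketch (not the study's actual analysis) of Cronbach's alpha
# for one rater who scored the same essays on two occasions, T1 and T2.
from statistics import pvariance

def cronbach_alpha(t1, t2):
    """Cronbach's alpha for k = 2 occasions:
    alpha = (k / (k - 1)) * (1 - sum of occasion variances / variance of totals)
    """
    k = 2
    occasion_var = pvariance(t1) + pvariance(t2)            # sum of per-occasion variances
    total_var = pvariance([a + b for a, b in zip(t1, t2)])  # variance of summed scores
    return (k / (k - 1)) * (1 - occasion_var / total_var)

# Hypothetical essay scores for one rater on two occasions.
t1 = [12, 15, 9, 18, 14, 11, 16, 13, 10, 17]
t2 = [13, 14, 10, 18, 15, 11, 15, 12, 10, 16]
print(round(cronbach_alpha(t1, t2), 3))  # close to 1 when the rater is self-consistent
```

In the study's design this computation would be repeated once per rater (94 alphas over 25 essay pairs), and each alpha read as that rater's self-consistency.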