An Investigation into the Self-Consistency of Raters
The Case of a High-Stakes National Level Examination in Pakistan
Abstract
Scoring of essays is a notoriously difficult task since raters bring their own subjective and idiosyncratic criteria, which may cause discrepancy in ratings. This variation may occur across raters (inter-rater reliability) or within a single rater (intra-rater reliability). Both of these variations are problematic and constitute measurement error. They are also complementary, two sides of the same coin, since raters who are not self-consistent cannot be expected to be consistent with each other (Cho, 1999; Douglas, 2011). Hence testing organisations and language testers adopt various procedures, including the provision of rating scales, training, monitoring of examiners, post-rating adjustment of scores, and calculation of inter-rater and intra-rater reliability, to control this measurement error. However, compared to a reasonably large number of inter-rater reliability studies, there is a scarcity of intra-rater reliability studies (Barkaoui, 2010; Cho, 1999; Jonsson & Svingby, 2007). This research investigates the intra-rater reliability of a group of raters (n = 94) who evaluated a set of essays (n = 25) twice (referred to as T1 and T2 respectively), after a gap of a few weeks, on a national-level high-stakes examination in Pakistan. The raters were not provided with any rubric, in order to replicate actual practice. Comparison of each individual rater's scores on T1 and T2, calculated with Cronbach's alpha, shows that most of the raters are highly self-consistent.
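As a rough illustration of the analysis described in the abstract, the sketch below shows how a per-rater Cronbach's alpha could be computed when each rater scores the same essay set on two occasions (T1 and T2). The data and function name are hypothetical and not taken from the study; this is only a minimal sketch of the standard alpha formula applied to a two-occasion score matrix.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (essays x occasions) score matrix (hypothetical helper)."""
    k = scores.shape[1]                          # number of scoring occasions (here 2: T1, T2)
    item_vars = scores.var(axis=0, ddof=1)       # variance of each occasion across essays
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical example: one rater's marks for 25 essays at T1 and T2 (not study data)
rng = np.random.default_rng(0)
t1 = rng.integers(5, 20, size=25).astype(float)
t2 = t1 + rng.normal(0, 1.5, size=25)            # second scoring drifts slightly
alpha = cronbach_alpha(np.column_stack([t1, t2]))
print(f"Intra-rater alpha for this rater: {alpha:.2f}")
```

Repeating this calculation for each of the 94 raters would yield a distribution of intra-rater alpha values, which is the kind of comparison the abstract refers to.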
License
Copyright (c) 2022 Kashmir Journal of Language Research
This work is licensed under a Creative Commons Attribution 4.0 International License.