Measuring Rater Reliability on a Special Education Observation Tool

This study used generalizability theory to estimate the reliability of scores from the Recognizing Effective Special Education Teachers (RESET) observation tool, which is designed to evaluate special education teacher effectiveness. At the time of this study, the RESET tool used three evidence-based instructional practices (direct, explicit instruction; whole-group instruction; and discrete trial teaching) as the basis for special education teacher evaluation. Five raters participated in two sessions to evaluate video of special education classroom instruction, collected over two school years via the Teachscape 360-degree video system. Rater data were analyzed in a two-facet, "partially" nested design in which occasions (o) were nested within teachers (t), written o:t, and crossed with raters (r), written (o:t) × r. The results align with similar studies that found multiple observations and multiple raters to be critical for achieving acceptable levels of score reliability. Future reliability and validity studies on the RESET tool should consider how feasible such designs are in practice, and further work is needed to address the lack of research on rater reliability in special education teacher evaluation.
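To illustrate how a design of this kind yields reliability estimates, the sketch below computes the generalizability (relative) and dependability (absolute) coefficients for an (o:t) × r design using the standard generalizability theory formulas, with teachers as the object of measurement. The function name, parameter names, and variance component values are hypothetical and chosen only for illustration; they are not results from this study.

```python
def g_coefficients(var_t, var_r, var_o_in_t, var_tr, var_resid, n_o, n_r):
    """Return (relative G coefficient, absolute Phi coefficient) for an
    (o:t) x r design with teachers (t) as the object of measurement.

    var_t      -- teacher variance (universe-score variance)
    var_r      -- rater variance
    var_o_in_t -- variance of occasions nested within teachers
    var_tr     -- teacher-by-rater interaction variance
    var_resid  -- (o:t) x r interaction confounded with residual error
    n_o, n_r   -- number of occasions per teacher and number of raters
    """
    # Relative error: every component that interacts with teachers,
    # each divided by the number of conditions it is averaged over.
    rel_error = var_o_in_t / n_o + var_tr / n_r + var_resid / (n_o * n_r)
    # Absolute error also includes the rater main effect.
    abs_error = rel_error + var_r / n_r
    g = var_t / (var_t + rel_error)
    phi = var_t / (var_t + abs_error)
    return g, phi


# Hypothetical variance components, for illustration only.
components = dict(var_t=0.5, var_r=0.1, var_o_in_t=0.2, var_tr=0.1, var_resid=0.4)

for n_o, n_r in [(1, 1), (2, 2), (4, 3)]:
    g, phi = g_coefficients(**components, n_o=n_o, n_r=n_r)
    print(f"occasions={n_o}, raters={n_r}: G={g:.3f}, Phi={phi:.3f}")
```

Under these assumed values, both coefficients rise as occasions and raters are added, which is the pattern the abstract describes: more observations and more raters reduce the error terms and improve score reliability.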