Evaluating Reliability in Nested Group Settings.
Authors
Zach Solan
Abstract
The foundation of decision-making relies on assessing the reliability of different sources of information. Therefore, the methods used to evaluate inter-rater reliability (IRR) are of significant importance. Many approaches aim to calculate IRR based on the agreement among raters, including Cohen’s Kappa, Fleiss’ Kappa, the Intraclass Correlation Coefficient (ICC), Krippendorff’s Alpha, and Bland-Altman analysis. These methods generally assume that the observed agreement and the agreement expected by chance can be derived directly in a statistical manner. However, in certain cases, such as when informants are organized into nested groups, deriving the observed agreement relative to chance probability is not straightforward. The main challenge arises in scenarios such as self-reported assessments by parents, where each nested group of paired informants reports on a different subject, each exhibiting different symptoms. This may result in varying levels of chance probability for each item and for each group. The objective of this study was to present a novel approach capable of addressing any scenario, including nested groups, through the application of a mixed-effects model. This model assesses IRR by using the area generated by the random effects of individual curves, accounting for all sources of randomness. We demonstrate the power of the new approach for estimating IRR both in a simulation and by applying it to a publicly available dataset. This approach can also be used to assess the IRR of aggregated indices with induced bias, resulting from various combinations of items with different chance probabilities.
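To make the variance-component idea behind mixed-model reliability estimates concrete, the sketch below illustrates one classical route to an IRR-like quantity: an ICC computed from the random-effect and residual variances of a mixed-effects model. This is not the paper's proposed method (which works with the area generated by the random effects of individual curves); the simulated subject/rater structure, noise levels, and variable names are illustrative assumptions only.

```python
# Illustrative sketch (not the paper's method): deriving an ICC-style
# reliability estimate from the variance components of a mixed-effects model.
# Simulated data: each subject is rated by two raters with added noise.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, n_raters = 100, 2
subject_effect = rng.normal(0.0, 1.0, n_subjects)   # between-subject variability

rows = []
for s in range(n_subjects):
    for _ in range(n_raters):
        # Each rating = subject's true level + rater-specific noise (assumed SD = 0.5)
        rows.append({"subject": s,
                     "score": subject_effect[s] + rng.normal(0.0, 0.5)})
data = pd.DataFrame(rows)

# Random intercept per subject; the residual variance captures rater disagreement.
result = smf.mixedlm("score ~ 1", data, groups=data["subject"]).fit()
var_subject = result.cov_re.iloc[0, 0]   # between-subject (random-intercept) variance
var_resid = result.scale                 # within-subject (rater) variance
icc = var_subject / (var_subject + var_resid)
print(f"ICC-style reliability estimate: {icc:.2f}")
```

With the simulation parameters above, the estimate should fall near 0.8, since the true between-subject variance (1.0) dominates the rater noise variance (0.25).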