Residual-Based Person Fit Statistics over Test Sections

Rashid Almehrizi

Abstract


Most tests are composed of multiple sections, each containing a group of items. Sections may reflect different item formats, content categories, competencies, difficulty levels, test dimensions, testlets, or interpretive exercise items. Students can show unexpected and unacceptable response patterns across these sections. Person fit analysis at the item level cannot detect aberrant responding across test sections. This study proposes a residual-based person fit statistic over test sections under a dichotomous IRT model. The paper demonstrates the new section-level person fit statistic and investigates its distributional properties and its power to detect aberrant person responses, in comparison with Wright's between person fit statistic. The proposed section-level person fit statistic shows superior distributional properties with both true and real ability and item parameters. Its performance is also examined with real data.
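
To make the idea of a section-level residual concrete, the following is a minimal Python sketch of a between-section mean-square statistic, in the form commonly attributed to Wright's between fit: for each section, the squared difference between the observed and expected section score is divided by the model variance of that score, and the results are averaged over the number of sections minus one. This is not the statistic proposed in the paper, whose formula is not given in this abstract. The function names, the logistic item response function, and the item parameters are illustrative assumptions.

```python
import math


def prob_correct(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under a dichotomous logistic IRT model.

    theta: person ability; a, b, c: discrimination, difficulty, and
    lower-asymptote parameters (hypothetical values chosen by the caller).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def between_section_fit(responses, probs, sections):
    """Between-section mean-square residual for one person.

    responses: 0/1 item scores
    probs:     model probabilities of a correct response, one per item
    sections:  section label for each item
    """
    labels = sorted(set(sections))
    if len(labels) < 2:
        raise ValueError("at least two sections are required")

    total = 0.0
    for label in labels:
        idx = [i for i, s in enumerate(sections) if s == label]
        observed = sum(responses[i] for i in idx)   # observed section score
        expected = sum(probs[i] for i in idx)       # model-expected section score
        variance = sum(probs[i] * (1.0 - probs[i]) for i in idx)
        total += (observed - expected) ** 2 / variance
    return total / (len(labels) - 1)


# Illustration: eight items of equal difficulty, two sections of four items.
probs = [prob_correct(theta=0.0) for _ in range(8)]
sections = ["A"] * 4 + ["B"] * 4
aberrant = [1, 1, 1, 1, 0, 0, 0, 0]    # all correct in A, all wrong in B
consistent = [1, 0, 1, 0, 1, 0, 1, 0]  # same performance in both sections
ub_aberrant = between_section_fit(aberrant, probs, sections)
ub_consistent = between_section_fit(consistent, probs, sections)
```

In this toy case the person whose performance differs sharply between sections receives a much larger value than the person who performs the same in both sections, which is the kind of pattern an item-level statistic can miss.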

Keywords


Person fit, section-level, residual approach, dichotomous IRT models.



DOI: http://dx.doi.org/10.24200/jeps.vol13iss4pp687-702





Copyright (c) 2019 Rashid Almehrizi

JEPS 2017-CC BY-ND

This journal and its content are licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license.
