Effectiveness of Person Fit Indices in Item Response Models with Different Degrees of Item Local Dependence

Yaqoub Z. Al Shaqsy, Yousef A. Abu Shindi, Rashid S. Almehrizi

Abstract


This study examined the effectiveness of three person fit indices (Wright's weighted index, the Drasgow index, and Almehrizi's weighted index) in item response models under different degrees of item local dependence (0.0, 0.3, 0.6, and 0.9), using simulated item parameters. Item responses were simulated on a 60-item test for 40 samples of 10,000 subjects each (400,000 subjects in total). Item discrimination parameters ranged from 0.19 to 1.79 and item difficulty parameters ranged from -2 to +2. At each level of local dependence, 20% of the test items were manipulated to exhibit local dependence. Student ability was generated from a standard normal distribution. The assumptions of item response theory were checked in all data sets: unidimensionality through exploratory factor analysis and residual analysis with the NOHARM program, and local independence through the Q3 index. For all three person fit indices, the percentage of non-conforming persons increased as the degree of item local dependence increased. The percentage of non-conforming persons was also larger with Wright's weighted index than with the Drasgow index or Almehrizi's weighted index. The distributional properties of the three indices were relatively consistent, and those of the Drasgow index and Almehrizi's weighted index were very similar. The agreement index was also larger between Wright's weighted index and the Drasgow index.
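The simulation design summarized above can be illustrated with a minimal Python sketch. It is not the authors' procedure, and several choices are assumptions made only for illustration: a two-parameter logistic (2PL) model, uniform sampling of the item parameters within the reported ranges, a reduced sample of 1,000 persons, the standardized log-likelihood statistic lz of Drasgow, Levine, and Williams (1985) as the "Drasgow index", evaluation at the true ability rather than an estimate, and a one-sided cut-off of -1.645. No local dependence is induced here, so the sketch corresponds only to the 0.0 condition.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response (model choice is an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(thetas, a_params, b_params, rng):
    """Draw locally independent 0/1 responses for every person and item."""
    return [
        [1 if rng.random() < p_correct(t, a, b) else 0
         for a, b in zip(a_params, b_params)]
        for t in thetas
    ]


def lz_index(responses, theta, a_params, b_params):
    """Standardized log-likelihood person fit statistic (lz)."""
    l0 = expected = variance = 0.0
    for u, a, b in zip(responses, a_params, b_params):
        p = p_correct(theta, a, b)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)   # observed log-likelihood
        expected += p * math.log(p) + q * math.log(q)   # its expectation
        variance += p * q * math.log(p / q) ** 2        # its variance
    return (l0 - expected) / math.sqrt(variance)


rng = random.Random(0)
n_items, n_persons = 60, 1000  # 60 items as in the abstract; sample size reduced

# Uniform sampling within the reported ranges is an assumption.
a_params = [rng.uniform(0.19, 1.79) for _ in range(n_items)]
b_params = [rng.uniform(-2.0, 2.0) for _ in range(n_items)]
thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]  # standard normal ability

data = simulate_responses(thetas, a_params, b_params, rng)
lz = [lz_index(row, t, a_params, b_params) for row, t in zip(data, thetas)]

# Share of persons flagged as non-conforming under the assumed cut-off.
flagged = sum(1 for v in lz if v < -1.645) / n_persons
print(f"Share flagged as non-conforming: {flagged:.3f}")
```

Large negative lz values indicate response patterns that are unlikely under the model. Repeating the run with responses generated under increasing local dependence would show how the flagged share changes, which is the comparison the study reports.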

Keywords


Item response theory, person fit index, item local dependence.


DOI: http://dx.doi.org/10.24200/jeps.vol14iss1pp41-53





Copyright (c) 2020 Yaqoub Z. Al Shaqsy, Yousef A. Abu Shindi, Rashid S. Almehrizi

JEPS 2017-CC BY-ND

This journal and its content are licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license.
