Comparing Science and Engineering Students Using the Force Concept Inventory in Introductory Physics Courses

This work analyzes and compares the performances of two groups of students on the understanding of force and motion concepts using the Force Concept Inventory (FCI). The FCI test poses questions on basic Newtonian concepts where the answers include the correct response and commonly misconceived alternatives. The FCI test was administered twice, as pre- and post-tests, in two introductory calculus-based physics courses offered at the Sultan Qaboos University (SQU) in Oman in the Spring 2017 and Spring 2018 semesters: one for students mainly from the Colleges of Science, Education and Agriculture, and one for students from the College of Engineering. These courses cover the traditional first-year level kinematics and dynamics in translational and rotational motions based on the same syllabus and the same textbook. Hake's normalized gain, defined as the change in class averages divided by the maximum possible increase, was used to compare the students' performances. The normalized gains for both groups of students were in the low gain category. Female students in both courses performed better on the FCI in general, but the difference was only statistically significant in the course offered to science students.


Introduction
Concept inventories are research-based assessment techniques that study students' understanding of certain physics concepts. They are designed mainly to measure the effectiveness of teaching by determining students' performances in the given course. The Force Concept Inventory (FCI) [1] is a multiple-choice test that measures the students' understanding of Newton's concepts and related kinematics and, in addition, identifies the misconceptions of the students in these topics. The FCI is one of the most reliable tests that has been implemented in physics education and is used in educational institutions all over the world. The questions in the FCI involve concepts only and avoid problem solving.
In this work we used the original version of the FCI [1], consisting of 29 multiple-choice questions that are classified into six conceptual dimensions: kinematics, Newton's first, second and third laws, the superposition principle and kinds of forces. The questions are intentionally designed in such a way that, while the correct answer shows a certain Newtonian concept, the remaining four answers give some information on the common-sense alternatives or misconceptions, and the FCI was designed to examine each one of them [2]. Martin-Blas et al. [3] and Bani-Salameh [4][5] analyzed their data to identify dominant misconceptions of their students. Caballero et al. [6] compared two different curricula at Georgia Institute of Technology in the USA using the FCI and found that post-instruction FCI averages were significantly higher for their traditional curriculum than for the Matter & Interactions (M&I) curriculum, where both were taught with similar interactive pedagogy.
We administered the FCI to students at Sultan Qaboos University who took two calculus-based physics courses, namely, Physics 1 (PHYS2101) and Physics for Engineers 1 (PHYS2107). PHYS2101 is offered to students from the Colleges of Science, Education and Agriculture. Students from other colleges are also enrolled in PHYS2101 in order to be able to major in STEM (Science, Technology, Engineering, and Mathematics) subjects. PHYS2107 is offered to students from the College of Engineering. Each course is coordinated by one of the instructors who are involved in teaching the courses. These courses cover the traditional first-year level kinematics and dynamics in translational and rotational motion. In both courses, the same textbook by Walker, Halliday and Resnick, "Principles of Physics", 10th Edition, is used. Each course typically has 250-300 students, divided into 3-4 sections.
These sections are taught by several instructors from the Physics Department in two time slots per week, each a lecture of one hour and twenty minutes conducted in a large lecture hall. In addition, each student attends a 3 h laboratory session and a 3 h tutorial session in alternating weeks, in groups of at most 60 students. Attendance at lectures, labs and tutorials is mandatory. Tutorials focus on problem solving and are conducted by two instructors, with one-to-one discussions with the students. The total period of instruction is 15 weeks per semester.
We present the FCI results for the Spring 2017 and Spring 2018 semesters. We calculated the normalized gain [7], defined as the change in class averages divided by the maximum possible increase, for the two groups of students and compared it with the results obtained in other institutions in the Gulf region. For example, in the UAE, Hitt et al. [8] obtained an average normalized gain on the FCI of 0.16 ± 0.01 over several semesters of traditional instruction, and in Saudi Arabia a normalized gain of 0.10 was found (more details are given in [4]).
We compared the FCI performances between genders by using the normalized gains. The results obtained for the gender difference in the PHYS2101 and PHYS2107 groups are not in line with most of the studies performed using the FCI. Normandeau et al. [9] reported the presence of gender disparity on the FCI in a sample of Canadian undergraduate students. Madsen et al. [10] reviewed the literature and reported that a gender gap exists in FCI scores and gains in favor of males across several institutions from the USA and UK. Docktor and Heller [11] from the University of Minnesota, USA, found a significant gender gap in pre-test FCI scores that remained post-instruction, wherein male students outperformed females. Bates et al. [12] investigated the gender gap in conceptual understanding of Newtonian mechanics in three UK universities and found a statistically significant gender gap, with males outperforming females. Bani-Salameh et al. [13] in Saudi Arabia measured the performance gap between male and female college students with the FCI, showing a gender gap in favor of male students.
Further studies compare the performances of male and female students on in-class exams with those on the FCI. When examining the final exam scores of male and female students in our two courses, we found that female students outperform the males. In contrast, Docktor and Heller [11] found that the male final exam scores over 15 semesters were higher by 3.9%, while Bates et al. [12] found no statistically significant difference in final exam scores between men and women at three universities for a single semester. Research in a Canadian university shows that there is no apparent gender bias in the in-course assessments [9].
We organize the paper as follows: In Section 2 we introduce the students' background. Section 3 deals with details of the method used. Analysis of the FCI results is covered in Section 4. Section 5 presents the discussions and conclusions.

Students' background
Students taking the PHYS2101 and PHYS2107 courses come to the University from high schools where the education is mostly in Arabic with some English background. They enroll in different colleges, each having different entry requirements. The PHYS2101 group consists of students from the Colleges of Science (60%), Agriculture (30%) and Education (10%). At times, students from other colleges (about 1%) also enroll in PHYS2101. Before admittance to the university, the students who enroll in PHYS2101 should satisfy the following requirements. The students of the Colleges of Science and of Education are supposed to have studied four subjects (mathematics, chemistry, physics and biology) in high school and to have obtained a minimum of 65% grade in any of these three subjects. The students of the College of Agriculture have to score at least a 65% grade in mathematics and also to study and pass with at least a 65% grade any two of the courses chemistry, physics and biology.
On the other hand, the students from the College of Engineering register for PHYS2107. The admission criteria of the college require a minimum of a 65% grade in mathematics, physics and chemistry.
In addition, at least a 65% grade in basic English language in high school is a must for all students who are admitted to the above-mentioned colleges. The number of female and male students admitted to the colleges is equal, except for the College of Engineering.
Most students go through a one-year foundation program taking Mathematics, Information Technology and English language and then they register either in PHYS2101 or PHYS2107 courses depending on their colleges. The medium of instruction in these courses is English. Despite an intensive one-year training in the foundation program, some students still struggle in understanding the language of the textbook and terminology.
In general, the proportion of male students is higher in PHYS2107 than in PHYS2101. The male/female ratio in the randomly sampled PHYS2101 group was about 1:3, and in the PHYS2107 group it was about 4:1.

Method
The FCI test was given twice to the same students in the Spring 2017 and Spring 2018 semesters; the pre-test was administered at the beginning of the semester, during the first week, and the post-test towards the end of the semester, after Newton's force and motion concepts had been covered. A total of 249 students from randomly selected sections of the PHYS2101 groups, of whom 24.5% were male and 75.5% were female, took the pre- and post-tests over two semesters of traditional instruction. The PHYS2107 group consisted of 218 students, of whom 81.2% were male and 18.8% were female. These sample sizes represented 44.9% of the total population of the PHYS2101 groups and 39.4% of the PHYS2107 groups, respectively. The reported statistics are for matched individual students who took part in both the pre- and post-tests.
The FCI test has usually been conducted for between 30 and 45 minutes in many institutions. We gave our students 45 minutes to finish the test because PHYS2101 and PHYS2107 are two of the first science courses that are taught in English, and for many of our students language proficiency is still developing. Our students had no prior information about the test dates, nor were they introduced to the FCI test in their mother language. Students were informed that the FCI results would not affect their final grade in these two courses, and no alternative incentives were given for taking the FCI.
Table 1 shows the FCI pre- and post-test mean values and standard deviations (S) in percentages, the number of students N and the t-test results for the PHYS2101 and PHYS2107 groups. The low mean values in the pre-test for both groups of students reflect poor knowledge and misconceptions in physics prior to any instruction; this is particularly true for the students of the College of Agriculture, since physics is not a compulsory admission requirement for them. The mean percentage of correct answers in the post-test is 32.7% for the PHYS2101 group and 36.9% for the PHYS2107 group.

Performance of student groups
A t-test with unequal variances for two independent groups was performed on the pre- and post-test scores at a significance level of 0.05. The results are shown in Table 1. The difference in the averages is greater for PHYS2107; the differences are statistically significant for both groups, as verified by the t-tests. The results can be compared with similar data from different institutions in a number of countries. Pre- and post-test FCI scores range from 27%-52% and 48%-77%, respectively, in the USA [1], depending on the methods of instruction [7]. Reported pre- and post-test scores were 28% and 69% in Finland [14][15] and 22% and 30.4% in Saudi Arabia [4], respectively. We find that our post-test results for the PHYS2101 and PHYS2107 groups are within the range reported by [3] for the Forestry Engineering School (EUITF) in Spain and by Bani-Salameh [4] in Saudi Arabia.
Figure 1 (a) and (b) shows the overall performance in terms of the number of correct answers (out of a maximum of 29) by the students in the pre- and post-tests in PHYS2101 and PHYS2107. We normalized the frequencies to the number of students per group. The highest scores obtained by the students in the post-test are 22 in PHYS2101 and 25 in PHYS2107. There is a clear shift towards the right, indicating an increase in the number of correct answers in the post-test in both groups. Similar analysis has been done by [3] and [4].
Figure 2 shows the percentage of students who answered a particular question correctly versus the question number for the pre-test (grey) and post-test (blue) for both PHYS2101 (a) and PHYS2107 (b). In the pre-test, less than 20% of students in each group answered questions 2, 8, 9, 13, 18, 24 and 26 correctly. In the post-test, questions 6, 9, 16, 20, 22, 24 and 26 were still answered correctly by less than 20% of PHYS2101 students. Question 26, which concerns Newton's first law, was also answered correctly by less than 20% of PHYS2107 students. Figure 2 also clearly shows that a few questions, namely questions 1, 4, 10, 12 and 16, were answered correctly by more than 50% of both groups of students in the post-test. In particular, more than 70% of students correctly answered question 12, which deals with Newton's third law of motion.
Each question is analyzed further in the gain per question section below.
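The unequal-variance (Welch) t-test mentioned at the start of this section can be sketched from summary statistics alone. This is a minimal illustration, assuming the test compares pre- and post-test means within one course as two independent samples; the function name `welch_t` is hypothetical, and all inputs other than the 32.7% post-test mean are placeholders rather than values from Table 1.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Inputs are summary statistics (mean, sample standard deviation,
    sample size) for two independent groups with unequal variances.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Post-test mean (32.7%) and N = 249 are quoted in the text for PHYS2101;
# the pre-test mean and both standard deviations are PLACEHOLDERS.
t_stat, dof = welch_t(32.7, 12.0, 249, 25.0, 10.0, 249)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The resulting t value would then be compared with the critical value at the 0.05 significance level for the computed degrees of freedom.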

Normalized gain per group
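To measure the improvement in students' conceptual understanding, Hake's normalized gain, 〈g〉 [7], [16], is extensively used in many studies and is given by
\[ \langle g \rangle = \frac{\langle A_{\mathrm{post}} \rangle - \langle A_{\mathrm{pre}} \rangle}{100\% - \langle A_{\mathrm{pre}} \rangle}, \]
where 〈A_pre〉 and 〈A_post〉 are the pre-test and post-test class averages in percent. The normalized gain is considered high for 〈g〉 ≥ 0.7, medium for 0.3 < 〈g〉 < 0.7 and low for 〈g〉 ≤ 0.3.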
We used the pooled standard error of 〈g〉, expressed in terms of the standard deviations of both the pre- and post-test data [17], in which the numbers of students in the pre- and post-tests are the same since we have matched data sets. Table 2 shows the average normalized gain for each group of students; both 〈g〉 values remain in the low gain category. Although 〈g〉 of the PHYS2101 group is slightly lower than that of the PHYS2107 group, there is no statistically significant difference between them (p = 0.28) at the 95% confidence level.
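The normalized gain and its three categories, as defined in this section, can be sketched as follows. The function names are hypothetical, and the pre-test average in the example is a placeholder, since the Table 1 values are not reproduced in the text; only the 32.7% post-test average comes from the paper.

```python
def normalized_gain(pre_avg, post_avg, max_score=100.0):
    """Hake's normalized gain: actual gain divided by maximum possible gain.

    pre_avg and post_avg are class averages on the same scale as max_score
    (percent by default).
    """
    if pre_avg >= max_score:
        raise ValueError("pre-test average leaves no room for improvement")
    return (post_avg - pre_avg) / (max_score - pre_avg)

def gain_category(g):
    """Classify a gain using the thresholds quoted in the text."""
    if g >= 0.7:
        return "high"
    if g > 0.3:
        return "medium"
    return "low"

# 32.7 is the PHYS2101 post-test average from the text; 25.0 is a PLACEHOLDER.
g = normalized_gain(pre_avg=25.0, post_avg=32.7)
print(f"<g> = {g:.2f} ({gain_category(g)})")
```

Dividing by the maximum possible increase makes gains comparable between classes that start from different pre-test levels.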

Normalized gain per question
In order to have a better understanding of students' performances on the FCI questions for both groups, we calculated the gain per question [6], which is basically Hake's gain for a single question. The histograms of the gain per question are shown in Figure 3. The highest item gain obtained is 0.55 on question 12 for PHYS2101 and 0.46 on question 4 for PHYS2107. Gains above 0.3 are achieved on questions 2, 11, 12 for PHYS2101 and questions 4 and 12 for PHYS2107. Question 2 related to Newton's third law and question 12 dealt with kinds of forces. As can also be noticed, the gains are negative for questions 7, 15, 22 and 26 for both groups of students. These questions are on "kinds of forces" and "Newton's first law".
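A gain per question can be computed from matched response data in the same way as the class-level gain. The sketch below assumes each student's answers are stored as a row of 0/1 flags (1 = correct), which is an illustrative data layout and not necessarily the one used by the authors; the function names and the tiny data set are hypothetical.

```python
def fraction_correct(responses):
    """Fraction of students answering each question correctly.

    responses: list of rows, one per student, each a list of 0/1 flags.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[q] for row in responses) / n_students
            for q in range(n_items)]

def item_gains(pre_responses, post_responses):
    """Hake-style gain for every question; None if all were correct in the pre-test."""
    pre = fraction_correct(pre_responses)
    post = fraction_correct(post_responses)
    return [None if p == 1.0 else (q - p) / (1.0 - p)
            for p, q in zip(pre, post)]

# Hypothetical matched data: 4 students, 2 questions.
pre = [[1, 0], [0, 0], [0, 0], [0, 1]]
post = [[1, 0], [1, 0], [0, 0], [1, 0]]
print(item_gains(pre, post))
```

A negative item gain, as reported for questions 7, 15, 22 and 26, arises whenever fewer students answer a question correctly in the post-test than in the pre-test.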

Normalized gain per category
We calculated the students' performances on the groups of questions that cover certain FCI force and motion concepts per category, these being kinematics, Newton's first, second and third laws, superposition and kinds of forces, as listed in [1]. The "Kinematics" category includes the questions on position, velocity, and acceleration and their relationships, excluding dynamical effects. Questions in the "Newton's first law" category concern the relationship between motion and applied forces. The "Newton's second law" category deals with the questions on contact forces and resolving unknown forces, and the "Newton's third law" category covers the questions related to action-reaction principles and contact forces. The "Superposition" category includes the questions in which the direction and relative strength of forces acting on a body or set of bodies are represented by diagrams (i.e., force-body diagrams). The "Kinds of Force" category deals with the questions involving solid and fluid contacts and gravitation.
Figure 4 shows the average normalized gain in each FCI concept category for the PHYS2101 (orange) and PHYS2107 (green) groups. Each category consists of more than one question from the FCI. The highest normalized gain is obtained in the "Newton's third law" category for both groups; it is in the medium gain region for the PHYS2101 group. In the "Kinematics" and "Newton's second law" categories, the average normalized gains are less than the overall normalized gain shown in Table 2. The lowest average normalized gain is observed in the "Newton's second law" category for both groups of students, where the gain is negative or very low. This shows a misconception that persists despite instruction.

Normalized gain per gender
We also compared the FCI performances between genders. It should be noted that more than 75% of registered students in PHYS2101 are female, while the majority of PHYS2107 students are male. Table 3 (see also Figure 5) shows the FCI pre- and post-test scores for male and female students in both groups. Our results point to a small gender gap: the average pre-test gender gap is 2.3% (favoring males) for PHYS2101 and 0.14% (favoring females) for PHYS2107. On the other hand, the gender gap in the post-test is in favor of females for both groups, at 3.7% for the PHYS2101 group and 2.7% for the PHYS2107 group. The gender gap increased from pre- to post-tests, with an absolute increase favoring females of about 1.4% for PHYS2101 and 2.8% for PHYS2107. This is in contrast to the usual gender gap on the post-test for the FCI, which ranges from 1.5% to 24.6% [10], favoring the male students.
Table 3. Average pre- and post-instruction FCI scores for female and male students in PHYS2101 and PHYS2107 groups. N_i is the number of students and S_i the standard deviation (i = pre, post).
The normalized gain 〈g〉 per gender for each group is also calculated. As shown in Table 4, it is larger for females in both groups. In PHYS2101 it is 0.15 ± 0.01 for females compared to 0.07 ± 0.02 for males, while in PHYS2107 〈g〉 is 0.18 ± 0.03 and 0.14 ± 0.01 for females and males, respectively. The difference in the normalized gain between females and males in PHYS2101 is 0.08 ± 0.02 and is statistically significant (p < 10⁻³, at the 95% confidence level), indicating that the females are performing better than the males. On the other hand, the difference in the normalized gain between females and males in the PHYS2107 group is 0.04 ± 0.01 and is not statistically significant (p = 0.26).
We also examined the performances of male and female students in the final exams of the two courses. When the average scores are compared, female students outperformed the males by 11% in PHYS2101 and 15% in PHYS2107.
In addition, as can be seen from Figure 6, 27% of female and 4% of male students scored 70 and above out of 100 in PHYS2101, while in PHYS2107 these numbers correspond to 53% of females and 16% of males. Furthermore, the General Secondary final exam certificates in Oman (Higher Education Admission Statistics for the academic year 2015/2016) indicate that, on average, females score better than males.
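The comparison of normalized gains between genders reported in this section can be illustrated with a simple two-sided z-test on the difference of two gains, with the standard errors combined in quadrature. This is an assumption made for illustration: the paper does not state which test produced its p-values, so the sketch is not expected to reproduce them exactly. The function name `gain_difference` is hypothetical; the input values are the PHYS2101 gains quoted above.

```python
import math

def gain_difference(g1, se1, g2, se2):
    """Difference of two independent gains with a normal-approximation p-value.

    Returns (difference, combined standard error, z score, two-sided p-value).
    """
    diff = g1 - g2
    se = math.sqrt(se1 ** 2 + se2 ** 2)   # errors added in quadrature
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal distribution
    return diff, se, z, p

# PHYS2101 values quoted in the text: females 0.15 +/- 0.01, males 0.07 +/- 0.02.
diff, se, z, p = gain_difference(0.15, 0.01, 0.07, 0.02)
print(f"difference = {diff:.2f} +/- {se:.2f}, z = {z:.2f}, p = {p:.1e}")
```

For the PHYS2101 values, this approximation gives a difference of 0.08 ± 0.02 and a p-value below 10⁻³, in line with the significance reported in the text.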

Discussion and conclusion
The FCI test was administered to two groups of students enrolled in introductory physics courses at SQU, Oman. Hake's normalized gain remained in the low range for both groups of students. This indicates that there is not much improvement in the students' conceptual understanding of Newton's force concept and the related topics of mechanics. From the analysis of each question in the FCI and of the categories, we noted that the students have difficulties in understanding and interpreting questions that involve diagrams and graphs. We observed a significant gender gap in FCI normalized gains in PHYS2101, with females achieving better gains than males, but no significant difference in PHYS2107. Moreover, according to the final exam scores in these courses, female students outperformed male students, supporting our findings in the FCI test results. This could be because females tend to have better study habits and are accommodated in better living conditions, as they are hosted at the university hostels.

Conflict of interest
The authors declare no conflict of interest.