Analyzing the Reliability and Content Validity of Basic Education Certificate Examination Mathematics Objective Test Items From 2018-2020

 

Batubo, Love Obarasua, and Dr. Ambrose Amadioha

Department of Educational Psychology, Guidance and Counselling, Faculty of Education, University of Port Harcourt, Nigeria.

 

 

Abstract

The study assessed the reliability coefficient and content validity of the Mathematics objective test items in the Basic Education Certificate Examination (BECE) for the 2018 to 2020 academic sessions in Rivers State. Two research questions guided the study, and evaluation and descriptive survey designs were employed. To determine the reliability coefficient and content validity of the test items, a sample of 1500 marked JSS3 students’ scripts was randomly drawn from six Local Government Areas using a multistage sampling approach that combined purposive and proportionate techniques, out of a population of 19056 marked JSS3 students’ scripts in the 105 public secondary schools in the six Local Government Areas in Rivers State. Past Mathematics objective test question papers and the JSS1 to JSS3 curriculum and syllabus were used as the instruments for data collection. R software was used to calculate the reliability coefficient, while percentages, frequencies and the scheme of work were used to ascertain content validity. The findings indicated that the content areas of the objective test items were not represented in adequate proportion, that the coefficient of internal consistency was high, and that the 2018 BECE had the highest reliability coefficient. It was recommended, among others, that experts in Measurement and Evaluation should be involved in setting the examination objective test items to ensure adequate coverage of the scheme of work according to its weights. This would improve students’ academic performance in Mathematics in the Basic Education Certificate Examination in Rivers State.

Keywords: Test characteristics, reliability coefficient, content validity.

 

Introduction

A test means different things to different people, depending on the use to which it is put. Ukwuije (2009) defined a test as a series of questions given to testees or examinees to be answered. A test is an instrument for gathering the information needed for effective evaluation. Onunkwo (2002) defined a test as an instrument which can be utilized in detecting some qualities, traits, characteristics or attributes possessed by a person, an object or a thing. To Orluwene (2012), a test is an instrument used to determine the relative presence or absence of the trait being measured. This implies that when a person is tested and the performance is high according to the given criterion, the person possesses the trait measured; if, on the other hand, the performance is low, the trait is absent. In Gregory (2006), a test is a standardized procedure for sampling behavior and describing it with categories or scores. To Linn, Miller and Gronlund (2005), a test is a particular type of assessment that typically consists of a set of questions administered during a fixed period of time under reasonably comparable conditions for all students. Iweka (2014) defined a test as an instrument that is used to measure, as accurately as possible, the trait, character, personality or behavior for which it is designed. This implies that different tests are constructed for different purposes. Opara (2014) sees a test as an instrument or procedure designed to measure the knowledge, intelligence, ability, trait, skill, aptitude, interest or attitude that an individual or thing exhibits or possesses. This implies that testing does not mean measuring only the cognitive domain but also the affective and psychomotor domains. A test is a measurement device or technique used to quantify behavior or to aid in the understanding and prediction of behavior.
In other words, any instrument used in detecting certain traits or skills in the cognitive, affective or psychomotor domain is known as a test. Kpolovie (2014) described a test as the presentation of a standard set of questions which qualifies as a consistent and valid instrument for gathering information for the effective assessment of the examinee’s cognitive, affective and psychomotor or psycho-productive traits. Hence, the researcher defines a test as an instrument used to measure the extent to which the testee has acquired, and is able to utilize, the learned instruction.

A test may be administered on paper, orally, on a computer, or in a restricted area where the testee is required to perform skills physically. Test administration can be formal or informal. Informal testing is done at home by parents or guardians to assess a student’s level of comprehension or ability. Formal testing is done in school or any other learning environment, where marks or scores are usually given and may be referred to a criterion or a norm. A norm-referenced test allows comparisons of students’ performance. According to Onunkwo (2002), norm-referenced tests determine how each student performs in comparison with the other members of the class. A criterion-referenced test, also known as a mastery or competency test, determines whether students can reach a certain performance level but does not compare their achievement with that of other students. A test may be a teacher-made test, which is constructed by a classroom teacher rather than by experts and is meant to measure the extent of achievement of a certain class based on some specific objectives. Such a non-standardized test is mainly used to inspire students to study and to give them feedback, and it may be given weekly or twice a term or semester. A standardized test is constructed by experts and subject specialists and is administered under a uniform set of conditions based on a normative sample. The BECE is thus meant to be a standardized test.

There are different types of test, such as achievement tests, aptitude tests, intelligence tests, attitude tests, personality tests, and learning ability or competency tests. An achievement test is designed to measure the degree of attainment of educational objectives in a content area, a subject or a series of subjects. Examples are classroom tests, the Basic Education Certificate Examination (BECE) and the Senior Secondary Certificate Examination (SSCE).

The Basic Education Certificate Examination (BECE) is a mandatory examination for students in the ninth year of basic education, which is the third year of junior secondary school. The BECE is an external examination conducted by the Ministry of Education of each state, normally in June/July each year, although the exact date may differ from state to state. It is taken by students in their final year of junior secondary education who wish to proceed to senior secondary school or a technical college of their choice. Mathematics is one of the school subjects offered and tested at the basic level, and research by other scholars shows that the results have not been good. Poor performance and failure have been attributed to many factors, ranging from lack of facilities and lack of teachers to students’ phobia, among others. It is worthy of note that students’ performance depends not only on students’ ability level but also on the attributes inherent in the test items (Moyinoluwa, 2015). These test attributes include reliability, validity and item characteristics, among others. Mathematics is a school subject that requires attention because it is important in everyday life: it is a fundamental discipline for scientific and technological development and an essential requirement for intellectual endeavor and for the advancement of man in coping with the challenges of life. Mathematics is thus the language of science and the foundation of numeracy, on which part of literacy depends. One who is deficient in mathematical concepts is partly of no use to self and to the larger society, because numerical abilities are pivotal in different aspects of life, as seen in commerce, education, transport, housing, health, communication and even politics.

According to Faje, Sule and Collins, as cited in Aminu (2005), there is a general consensus that economic development, viability and stability in the 21st century are scientifically and technologically based. This indicates that the economic prosperity of a nation depends largely on scientific and technological development, which cannot be attained without sound, effective and strong mathematics education. Hence mathematics can be seen to have multidimensional, global and indisputable relevance.

According to Soyemi (1999), mathematics opens the mind to analytical thinking, logical reasoning, the aptitude for innovative ideas, deep focus, and clarity and precision of thought. It is where all scientific and technological research finds its bearings. Studying mathematics will, in no small measure, equip students to live well in our modern age of science and technology (FRN, 2004). Because mathematics is a very important subject that occupies a central position in academics, every student is expected to perform well and pass it in both internal and external examinations. Hence the researcher seeks to assess the reliability and content validity of the Basic Education Certificate Examination mathematics objective test items from 2018 to 2020.

Tests consist of different items which may not be of the same quality: some may be flawed while others are not, and some may be easy while others are difficult. According to Wiersma and Jurs (1990), easy items that almost every student answers correctly may end up boosting students’ morale without distinguishing the students based on their ability levels. In the same way, a very difficult item that no student answers correctly contributes nothing to distinguishing the students based on their ability levels. If a norm-referenced test fails to distinguish the performance of students based on their ability levels, its aim is defeated and the test is not of good quality. It is like a house built with poor-quality blocks, which will not be as strong as one built with good-quality blocks. This means that a combination of good items leads to a high-quality test, and a combination of poor items leads to a poor one.

Oshkosh (2005) considers test analysis probably the most important tool for increasing test effectiveness. Test item characteristics provide information on how each of the test items contributes to the quality of the total test. They are also referred to as item statistics, because an item statistic is a summary description of test takers’ performance on a particular test item (Wiersma and Jurs in Orluwene, 2017). A good test should possess the following characteristics:

1. Validity: The first important characteristic of a good test is validity. The test must really measure what it has been designed to measure. Validity is often assessed by exploring how the test scores correspond to some criterion, that is, some behavior, personal accomplishment or characteristic that reflects the attribute the test is designed to gauge. Assessing the validity of any test requires careful selection of an appropriate criterion measure, and reasonable people may disagree as to which criterion measure is best. This is equally true of intelligence tests. Reasonable people may disagree as to whether the best criterion measure of intelligence is school grades, teacher ratings or some other measure.

If we are to check on the validity of a test, we must settle on one or more criterion measures of the attribute that the test is designed to measure. Once the criterion measures have been identified, people’s scores on those measures can be compared with their scores on the test, and the degree of correspondence can be examined for what it tells us about the validity of the test.

2. Reliability: A good test should be highly reliable. This means that the test should give similar results even when different testers administer it, different people score it, different forms of the test are given, or the same person takes the test at two or more different times. Reliability is usually checked by comparing different sets of scores.

In actual practice, psychological tests are never perfectly reliable. One reason is that changes do occur in individuals over time. For example, a person who scores low in her group at an initial testing may develop new skills that raise her to a higher position in the group by the time of the second testing. Despite such real changes, the best intelligence tests usually yield reliability coefficients of .90 or higher (where 1.00 indicates perfect correspondence and 0.00 indicates no correspondence whatever).

3. Objectivity: The objectivity of a measuring instrument is the degree to which equally competent users get the same results. This presupposes the absence of subjective factors. A test is objective when it eliminates the scorer’s personal opinion and biased judgment. Recognition of the value of objectivity has been largely responsible for the development of objective-type tests. Objective-based tests measure or evaluate human development in the three domains, that is, cognitive, affective and psychomotor. As the name indicates, they are based on particular objectives of teaching and evaluation. They provide proper direction and thus streamline the whole process of evaluation. These tests are comprehensive.

4. Norms: In addition to reliability and validity, a good test needs norms. Norms are sets of scores obtained from groups of the people for whom the test is intended. The scores obtained by these groups provide a basis for interpreting any individual score.

For the purpose of this study, however, only two characteristics are discussed, namely the reliability and the content validity of the BECE mathematics objective test items.

Reliability Coefficient

The reliability coefficient is the correlation between two sets of observed scores. For example, the correlation between scores on form A and scores on form B of a mathematics ability test is termed a reliability coefficient. Ebel and Frisbie (1986) defined the reliability coefficient for a set of scores from a group of examinees as the coefficient of correlation between that set of scores and another set of scores on an equivalent test obtained independently from the members of the same group. Ezeh (1992) defined the reliability coefficient as the proportion of true variance. Hence, if forms A and B of a mathematical ability test are parallel and a testee’s scores are the same on both forms, the reliability coefficient will be +1.00. If, on the other hand, the scores on one or both forms contain measurement error, the resulting correlation coefficient will be less than +1.00. A reliability coefficient of 0.86 implies that 86% of the variance in the test scores is due to true variance, while the remaining 14% is attributable to chance factors or error variance. The reliability coefficient must be significant at the 0.05 alpha level before the instrument is declared reliable (Onunkwo, 2002).
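The parallel-forms idea described above can be illustrated with a minimal Python sketch. The scores for forms A and B are hypothetical and are not data from this study, and the helper names `pearson_r` and `variance_split` are introduced here only for illustration. The sketch correlates the two sets of scores and then reads a reliability coefficient as a split between true and error variance, as in the 0.86 example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def variance_split(reliability):
    """Return (true variance %, error variance %) for a reliability coefficient."""
    return round(reliability * 100, 1), round((1 - reliability) * 100, 1)


# Hypothetical scores of six testees on two parallel forms (not study data).
form_a = [12, 15, 9, 20, 17, 11]
form_b = [13, 14, 10, 19, 18, 10]

r = pearson_r(form_a, form_b)
print(f"parallel-forms reliability: {r:.3f}")
print(variance_split(0.86))  # (86.0, 14.0)
```

Identical scores on both forms give a coefficient of 1.00, and any disagreement between the forms pulls the coefficient below 1.00.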

Content Validity

This is the ability of a test to adequately measure the content areas and instructional objectives of a course of instruction. The course of instruction should be based on a scheme of work, syllabus or curriculum. When test items adequately cover the traits, abilities or skills to be measured and about which inferences are to be made, the test has content validity. Ysseldyke (2004) defined content validity as the extent to which a test’s items actually represent the domain or universe to be measured. This means there must be agreement between the test items and the specific instruction, curriculum, domain or universe that the test is meant to cover. The most important thing is to determine the extent to which the test items constitute a representative sample of the behavior in question. For instance, when a test made for SS1 is given to JS1 students, that test lacks content validity. Mehrens and Lehmann (1991) state that content validity is typically determined by a thorough inspection of the items, that is, by evaluating whether the items represent the total domain or a specified sub-domain.

Research Questions

The following research questions were used to guide the study.

  1. What is the reliability coefficient of the items in the mathematics BECE objective test from 2018 to 2020 in Rivers State, Nigeria?
  2. To what extent is the content of BECE mathematics objective test items valid from 2018 to 2020 in Rivers State, Nigeria?

 

Methodology

The study employed the evaluation research design and the descriptive survey research design. The evaluation research design was used because the data obtained were used to provide feedback on the quality of the mathematics BECE. According to Trochim (2006), evaluation is the systematic acquisition and assessment of information to provide useful feedback about some object or issue. The population for this study comprises all the marked scripts of Junior Secondary School 3 (JSS3) students in 105 public secondary schools in the 6 Local Government Areas of Rivers State, which totaled 19056. A sample of 1500 students’ scripts was randomly drawn from the 6 selected Local Government Areas, with 2 Local Government Areas representing each senatorial district, using a multi-stage sampling method. The instrument for data collection was the past mathematics objective test items of the Basic Education Certificate Examination (BECE) from 2018 to 2020, conducted by the Rivers State Ministry of Education. The students’ raw scores from the marked and scored scripts for each of the years were collected from the examination and records department of the Rivers State Ministry of Education. The researcher also used the BECE mathematics curriculum, syllabus and scheme of work to generate data for the study.
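The proportionate step of the sampling can be sketched as follows. The study reports only the totals (19056 scripts in the population, 1500 in the sample), so the six per-area script counts below are hypothetical figures chosen to add up to 19056, and the function name `proportionate_allocation` is an illustrative assumption rather than the procedure actually used. The sketch uses a largest-remainder rule so that the whole-number allocations add up exactly to the sample size.

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their sizes.

    A largest-remainder rule keeps every allocation a whole number and
    makes the allocations add up exactly to sample_size.
    """
    total = sum(stratum_sizes.values())
    if total <= 0 or not 0 <= sample_size <= total:
        raise ValueError("sample_size must lie between 0 and the population size")
    allocation, remainders = {}, {}
    for name, size in stratum_sizes.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        allocation[name], remainders[name] = divmod(sample_size * size, total)
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical numbers of marked scripts per Local Government Area;
# only their total (19056) comes from the study.
scripts_per_lga = {
    "LGA 1": 4200, "LGA 2": 3900, "LGA 3": 3100,
    "LGA 4": 2856, "LGA 5": 2700, "LGA 6": 2300,
}

sample = proportionate_allocation(scripts_per_lga, 1500)
for lga, n_scripts in sample.items():
    print(f"{lga}: {n_scripts} scripts")
```

Within each area, the allocated number of scripts would then be drawn at random.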

Results

After data analysis, the results obtained for research questions 1 and 2 were summarized and are presented in the tables below:

 

 

Table 1 Reliability Coefficient of Test Items of BECE Mathematics Objective test for 2018-2020

| Year | Correlation coefficient (r) |
|---|---|
| Mathematics 2018 BECE | 0.771 |
| Mathematics 2019 BECE | 0.751 |
| Mathematics 2020 BECE | 0.765 |

Table 1 shows that the coefficient of internal consistency, which is a measure of the degree of reliability of the objective test items, varies from year to year. BECE 2018 had the highest reliability index, 0.771.
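The study does not state which internal-consistency estimate was computed in R. One estimate commonly used for objective items scored right or wrong is the Kuder-Richardson formula 20 (KR-20), and the minimal Python sketch below assumes that choice purely for illustration. The response matrix is hypothetical (five examinees, four items) and the function name `kr20` is introduced here, not taken from the study.

```python
def kr20(responses):
    """KR-20 internal consistency for rows of 0/1 item scores.

    responses: one list per examinee, one 0/1 entry per item.
    Uses the population variance of the total scores.
    """
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 examinees and 2 items")
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion answering item j correctly
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses of five examinees to four objective items.
scored_scripts = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(f"KR-20 = {kr20(scored_scripts):.3f}")
```

Applied to the scored scripts of one year, with one row per script and one column per objective item, a function of this kind would return a single coefficient comparable in form to the values in Table 1.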

 Table 2 Distribution Analysis of (BECE) Mathematics Objective Test Items from 2018-2020

| | Content area / topic | 2018 | 2019 | 2020 |
|---|---|---|---|---|
| A | Numbers and Numeration | | | |
| a | Whole numbers | 5 | 2 | 2 |
| b | LCM/HCF | 3 | 2 | 2 |
| c | Fractions/Percentages/decimals | 7 | 7 | 10 |
| d | Estimation/Approximation | 1 | 2 | 4 |
| e | Binary Number System | 1 | 2 | - |
| f | Rational/Irrational numbers | 2 | 2 | 2 |
| g | Indices/Standard form | 3 | - | 1 |
| h | Fractions, Proportion, ratio, rate | 3 | 1 | - |
| i | Removing and Inserting bracket | - | - | - |
| j | Proportion, direct, inverse, joint, reciprocal | - | 2 | 1 |
| k | Simple Interest | 1 | 3 | 1 |
| l | Compound Interest | - | - | - |
| m | ICT and Computers | - | - | - |
| | Total Percentages | 25% | 23% | 23% |
| B | Basic Operations | | | |
| a | Addition and Subtraction of fractions | - | - | - |
| b | Addition and Subtraction of decimal fractions | - | - | - |
| c | Directed Numbers/Number line | 2 | 2 | 1 |
| d | Transactions in homes and offices | 2 | - | 1 |
| e | Place value | - | 1 | - |
| | Total Percentages | 7% | 5% | 3% |
| C | Algebraic Processes | | | |
| a | Use of symbols/letters for numbers | - | - | - |
| b | Simplification of algebraic expressions | 2 | - | 1 |
| c | Factorization | 2 | 2 | - |
| d | Graphs (x and y axes) | - | - | - |
| e | Simple equations, word problems | 4 | 12 | 8 |
| f | Quadratic expressions | 3 | 1 | 3 |
| g | Substitution/change of subject Formulae | - | 2 | 2 |
| h | Patterns and sequences | 1 | 2 | 1 |
| i | Simultaneous linear equations | 1 | 1 | - |
| | Total Percentages | 22% | 33% | 23% |
| D | Mensuration/Geometry/Trigonometry | | | |
| a | Lengths and Perimeters | 2 | - | 2 |
| b | Areas | - | - | 2 |
| c | Volumes | 1 | 1 | 1 |
| d | Angles, Properties/types | 8 | 5 | 5 |
| e | Triangles/Polygons/Quadrilateral | 2 | 1 | 4 |
| f | Circles | 2 | 2 | - |
| g | Construction | - | - | - |
| h | Sine, cosine and tangent of angles | - | - | 1 |
| i | Angles of elevation and depression | - | - | - |
| j | Bearings | - | 1 | - |
| | Total Percentages | 25% | 17% | 25% |
| E | Statistics and Probability | | | |
| a | Statistics | 2 | 2 | 4 |
| b | Probability | - | 2 | 1 |
| | Total Percentages | 3% | 7% | 9% |

 

Table 2 shows how the objective test items were spread across the question papers in the years under study. The analysis shows that the questions were not evenly spread. In 2018, there were no questions on Sections A (j, l, k, m and n), B (a, b, e), C (a, d, g), D (b, h, i, j, k) and E (b), as shown in the table above. In 2019, there were no questions on Sections A (g, i, j, m, n), B (a, b, d), C (a, b, d) and D (a, b, h, i, j). In 2020, there were no questions on Sections A (e, h, j, m, n), B (a, b, e), C (a, c, d, j) and D (g, h, j, k).
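The percentage and frequency analysis behind Table 2 can be sketched in Python. The counts below are the per-subtopic item counts as transcribed from Table 2, with a blank cell recorded as 0; the names `TABLE_2` and `section_shares` are illustrative assumptions. Shares recomputed in this way depend only on the transcribed counts and need not coincide with the printed "Total Percentages" rows.

```python
# Item counts per subtopic as (2018, 2019, 2020), transcribed from Table 2.
# A blank cell in the table is recorded as 0.
TABLE_2 = {
    "A Numbers and Numeration": [
        (5, 2, 2), (3, 2, 2), (7, 7, 10), (1, 2, 4), (1, 2, 0), (2, 2, 2),
        (3, 0, 1), (3, 1, 0), (0, 0, 0), (0, 2, 1), (1, 3, 1), (0, 0, 0),
        (0, 0, 0),
    ],
    "B Basic Operations": [
        (0, 0, 0), (0, 0, 0), (2, 2, 1), (2, 0, 1), (0, 1, 0),
    ],
    "C Algebraic Processes": [
        (0, 0, 0), (2, 0, 1), (2, 2, 0), (0, 0, 0), (4, 12, 8), (3, 1, 3),
        (0, 2, 2), (1, 2, 1), (1, 1, 0),
    ],
    "D Mensuration/Geometry/Trigonometry": [
        (2, 0, 2), (0, 0, 2), (1, 1, 1), (8, 5, 5), (2, 1, 4), (2, 2, 0),
        (0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 1, 0),
    ],
    "E Statistics and Probability": [(2, 2, 4), (0, 2, 1)],
}
YEARS = (2018, 2019, 2020)


def section_shares(table, years=YEARS):
    """Return {year: {section: (item count, percent of that year's items)}}."""
    shares = {}
    for idx, year in enumerate(years):
        counts = {
            section: sum(row[idx] for row in rows)
            for section, rows in table.items()
        }
        paper_total = sum(counts.values())
        shares[year] = {
            section: (count, round(100 * count / paper_total, 1))
            for section, count in counts.items()
        }
    return shares


for year, sections in section_shares(TABLE_2).items():
    print(year)
    for section, (count, percent) in sections.items():
        print(f"  {section}: {count} items ({percent}%)")
```

Comparing such shares with the weights assigned to each content area in the scheme of work is one way to judge how closely a paper follows its table of specification.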

 

 Discussion of the findings

The internal consistency reliability analysis of the mathematics objective test items for the three years under review indicated that BECE 2018 had a reliability coefficient of 0.771, BECE 2019 had a reliability coefficient of 0.751 and BECE 2020 had a reliability coefficient of 0.765. This indicates that there was consistency among the items that constituted each test. The reliability coefficient for each of the years under review was high, indicating good reliability of the test items.

The results also showed that the BECE mathematics objective test items from 2018 to 2020 covered 5 content areas with their subtopics. The findings revealed that the tests covered up to 80% of the mathematics syllabus content. Although the questions were spread across the topics, the topics were not covered as expected. There was no consistent or proper balance between the weights in the scheme of work and the objective test items. More items were set in some content areas and fewer in others, which indicates that a table of specification was not used in constructing the test items. Because the items were not in proportionate measure, the objective test items had low to moderate content validity.

Conclusion

Based on the findings of the study, it was concluded that the BECE for mathematics from 2018 to 2020 had low content validity, because the test items did not represent the various topics in the content areas in good proportion. Some topics, such as ICT and the use of computers, had no questions at all. The coefficient of internal consistency among the items that constitute the objective test was high in each of the years under study. In conclusion, the BECE mathematics papers from 2018 to 2020 varied in the characteristics of their constituent items.

Recommendations

Based on the findings of the study, the following recommendations are made:

 

  1. The test blueprint should be strictly followed to ensure that the proportion of topics in the BECE matches the proportion in the curriculum. This can be achieved by employing experts in measurement and evaluation to assist.
  2. Psychometricians should be involved in setting the objective test items to ensure better internal consistency and so enhance students’ performance.
  3. Teachers should ensure adequate coverage of the course contents relating to the examination, with more emphasis on areas that are regularly examined.

 

References

Aminu, S. (2005). A survey of problems on mathematics teaching in primary and secondary schools in Bauchi State. Unpublished thesis, Department of Education, University of Abuja.

Ebel, R. L. & Frisbie, D. A. (1986). Essentials of educational measurement (4th ed.). Englewood Cliffs, NJ: Prentice Hall.

Ezeh, D. N. (1992). Reliability and validity of tests. In B. G. Nworgu (Ed.), Educational measurement and evaluation: Theory and practice. Nsukka: Hallman Publishers.

Federal Republic of Nigeria (2004). National policy on education. Lagos: NERC Press.

Gregory, R. J. (2006). Psychological testing: History, principles and applications (4th ed.). New Delhi: Pearson Education, Inc.

Iweka, F. (2014). Comprehensive guide to test construction and administration. Omoku, chifas.ng

Kpolovie, P. J. (2014). Test measurement and evaluation in education (2nd ed.). Owerri, Imo State: Springfield Publishers.

Linn, R. L., Miller, M. D. & Gronlund, N. E. (2005). Measurement and assessment in teaching (9th ed.). USA: Pearson Prentice Hall.

Mehrens, W. A. & Lehmann, I. J. (1991). Measurement and evaluation in education and psychology (4th ed.). U.S.A.: Holt, Rinehart and Winston, Inc.

Moyinoluwa, T. D. (2015). Analyzing the psychometric properties of mathematics in public examinations in Nigeria.

Onunkwo, G. I. N. (2002). Fundamentals of educational measurement and evaluation. Owerri: Cape Publishers International.

Opara, I. M. (2016). Test construction and measurement: Concepts and application. Owerri: Career Publisher.

Orluwene, G. W. & Igwe, B. N. (2016). Psychometric properties of WASSCE questions in Civic Education (2015) using Rivers State students. International Journal of Psychology and Counselling, 8-15(1), 92-115.

Oshkosh, U. W. (2002). Item analysis. Testing services support: http://www.uwosh.edu/testing/facultyinfo/itemanalysis.php.

Soyemi (1999). Factors responsible for poor performance on mathematics senior secondary school. Retrieved May 10, 2021. Available online at https://eniplus.com/product.php?id=727.

Trochim, W. M. K. (2006). The qualitative debate. Research methods knowledge base. http://www.socialresearchmethodnet/kb,qualm.

Ukwuije, R. P. I. (2009). Test and measurement for teachers. Port Harcourt: Aba Publishers.

Wiersma, W. & Jurs, S. (1990). Educational measurement and testing. Needham Heights, MA: Allyn and Bacon.

Ysseldyke, S. (2004). Assessment in special and inclusive education (9th ed.). U.S.A.: Houghton Mifflin.

 
