The Effects of Test Reliability, Validity and Usability on Student Outcomes


Measurement techniques are usually the basis for making inferences about the behavior of individuals. However, a certain amount of error is inherent in the instruments used for educational measurement. The importance attached to educational measurement makes it imperative to ascertain that whatever instruments we use are of good quality. The quality of any instrument worth using in educational measurement lies in its possession of three basic characteristics: reliability, validity, and usability.



Reliability of a test

Reliability refers to the degree of consistency with which a particular test or instrument measures whatever it purports to measure. One method of determining the reliability of a test lies in the amount of confidence that can be placed in the ranking of students based on their performance in the test. If, upon re-administration of the same test, the same set of students retain their ranks (the scores need not be the same, but the relative position of each individual to others in the class must be largely maintained), the test would be considered reliable. If, on the other hand, these same students fail to retain their ranks in the group upon re-administration of the test, the test would be considered unreliable.

The term consistency is synonymous with reliability. The consistency of results with the repeated use of an instrument is a measure of its reliability. Thus, a reliable instrument will provide consistent results for the traits being measured. For example, if an achievement test is administered to the same group of students on two or more occasions and the students maintain their relative positions with little change across these administrations, the test is said to be providing a consistent and reliable result. It should be noted that the raw scores of the students can increase or decrease, but as long as the relative ranking within the group remains the same, the test result still shows consistency.
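The idea that ranks, rather than raw scores, must be preserved can be illustrated with a rank correlation between two administrations of the same test. The sketch below uses the Spearman rank correlation, which is one common choice and is an assumption here, not a method named in the text. The function names and the scores are hypothetical, and the code assumes that not every student has the same score (otherwise the correlation is undefined).

```python
from statistics import mean


def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next student has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rank_consistency(first, second):
    """Spearman rank correlation between two administrations of the same test."""
    r1, r2 = average_ranks(first), average_ranks(second)
    m1, m2 = mean(r1), mean(r2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(r1, r2))
    var1 = sum((a - m1) ** 2 for a in r1)
    var2 = sum((b - m2) ** 2 for b in r2)
    return cov / (var1 * var2) ** 0.5


# Hypothetical scores for five students on two occasions.
first_sitting = [72, 65, 58, 80, 49]
second_sitting = [78, 70, 66, 85, 60]  # every score rose, ranks unchanged
print(rank_consistency(first_sitting, second_sitting))  # 1.0
```

A value near 1 indicates that students kept their relative positions even though every raw score changed; a value near 0 or below would suggest the ranking was not maintained.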

Reliability concerns itself with the consistency of measures and not the accuracy of measures. A test may, for instance, provide consistent results across two or more administrations and yet not provide an accurate measure of what it was designed to measure. The reliability of a test is usually expressed as a numerical value (the reliability coefficient) computed from test scores. The following are factors affecting test reliability.

Factors affecting the reliability of a test

Various factors can affect the reliability of a test. Some of them are discussed below.

The length of a test: The number of items in a test has a great influence on its reliability coefficient. The greater the number of items, the higher the reliability coefficient. Test constructors should, however, be cautious about using a large number of test items: if a test contains too many items, the testees may suffer from fatigue and lose interest, so that the gain in the reliability coefficient becomes unnoticeable.
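The relationship between test length and reliability is often summarized with the Spearman-Brown prophecy formula. The text above does not name this formula, so it is brought in here as an illustration only. It assumes that the added items are comparable in quality to the existing ones; the function name and the starting values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`.

    Assumes the added items are comparable in quality to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical starting point: a 20-item test with reliability 0.60.
for factor in (1, 2, 3, 4):
    print(20 * factor, "items:", round(spearman_brown(0.60, factor), 2))
# 20 items: 0.6
# 40 items: 0.75
# 60 items: 0.82
# 80 items: 0.86
```

Under these assumptions each further block of 20 items adds less than the one before, so the gain from lengthening a test shrinks even before fatigue is taken into account.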

The subjectivity of scoring: Intra-scorer and inter-scorer subjectivity may also influence the reliability coefficient of a test. If the scoring is subjective, for instance, the reliability coefficient will be low. Scoring subjectivity particularly affects essay-type questions.

Choice of questions: The practice of including more items than students are expected to answer is not encouraged. It affects the comparability of test scores, since candidates may have answered different questions.

Guessing: The susceptibility of a test to guessing affects its reliability coefficient. If the testees guess the answers to the test items, the reliability coefficient of the test will be low.

Clarity of instructions and test items: The clarity of instructions and test items directly influences the reliability coefficient. If the instructions and test items are not clear enough, the testees may interpret them differently, and this may in turn affect their scores.

Cheating: Scores derived from the responses of pupils who cheat are of low reliability. The reliability of an objective test may be enhanced by administering special forms of equivalent tests to a large group of respondents. This involves using the same set of, say, fifty test items randomly arranged to form two or more sets of question papers. The different arrangements are labeled, say, P, Q, R, and S, and each respondent is expected to indicate which set of question papers he or she is answering. A marking stencil is prepared for each set, and each paper is marked with the appropriate stencil. This arrangement places some checks on cheating.
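The procedure described above (one item pool, several random arrangements, one stencil per arrangement) can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names, the fixed seed, and the made-up answer key are all assumptions.

```python
import random


def build_forms(answer_key, labels=("P", "Q", "R", "S"), seed=0):
    """Return {label: (item_order, stencil)} for each question paper.

    `answer_key` maps an item id to its correct option. Every form holds the
    same items in a different random order; the stencil lists the correct
    options in that form's order.
    """
    rng = random.Random(seed)  # fixed seed so the forms can be regenerated
    forms = {}
    for label in labels:
        order = list(answer_key)
        rng.shuffle(order)
        stencil = [answer_key[item] for item in order]
        forms[label] = (order, stencil)
    return forms


def mark(responses, form_label, forms):
    """Score one script using the stencil of the form the respondent indicated."""
    _, stencil = forms[form_label]
    return sum(given == correct for given, correct in zip(responses, stencil))


# Hypothetical 50-item objective test with options A-D.
key = {item: "ABCD"[item % 4] for item in range(1, 51)}
forms = build_forms(key)
perfect_script = forms["Q"][1]  # a script that matches form Q's stencil
print(mark(perfect_script, "Q", forms))  # 50
```

Because the marker selects the stencil from the label written on the script, a respondent who copies answers from a neighbor holding a different form gains little from doing so.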

Validity of a test

The validity of a measuring instrument or test refers to the degree of accuracy with which it measures that which it was designed to measure. If, for example, a test produces an accurate assessment of what it was designed to measure, the test will be considered valid. Validity is the most important requirement of any test, since it deals with the appropriateness of the interpretations made from test scores for a particular use. No matter how satisfactory a test may be in other respects, if it does not provide the accurate information required by the teacher, the test is said to be invalid. The factors affecting the validity of a test are discussed below.

Factors affecting the validity of a test

Ambiguity: Many factors tend to influence the validity of test interpretation. Some of these factors may be found in the test instrument itself. If the directions (instructions) on a test paper are ambiguous, the validity of the instrument may be affected, as scores obtained from such a test are unreliable. The quality as well as the number of items may also affect the validity of the test instrument.

Examination malpractice: Another factor may arise from the administration and scoring of the test. If there is examination malpractice resulting from such things as leakage, cheating, undue assistance to students, inconsistency in scoring, or poor allocation of time (either too much or too little), the validity may be adversely affected.

The physical and psychological conditions: The physical conditions of the examination hall and the psychological conditions of the testees may also affect the validity of a test. An untidy, poorly lit hall with inconvenient writing facilities, situated in a noisy environment, may prove uncomfortable to the testees. Harassment by invigilators, or the receipt of bad news just before or on entering the examination hall, may prevent the students from doing their best.

Usability of a test

The usability of a measuring instrument (e.g. a test) refers to the ease of developing, administering, and scoring the test. It includes all the practical considerations that a teacher takes into account before deciding to use a particular test.

Regardless of how valid and reliable a test may be, it will be useless if the teacher has to spend all his or her time, energy, and resources on it, with little or no time left for teaching. A long test is, for instance, likely to be more reliable and valid than a short test, but if a teacher has only a limited time for testing, he or she may have to compromise and use a shorter one.

The cost of purchasing a test may also be too high for a teacher to afford. No matter how good the test may be, the teacher may then not buy it, and the test proves to be of low usability.

While it is essential to make use of reliable and valid tests, the following factors must be considered:

  • The cost of producing and administering the test
  • The length of the test, taking into consideration the time allowed for the test
  • The ease of scoring, considering the limited time available for the marking and processing of results


The three most important qualities of a test instrument are reliability, validity, and usability. Reliability is the degree of consistency with which a test measures what it claims to be measuring. There are three methods of estimating reliability: the test-retest, equivalent forms, and split-half methods. Validity refers to the degree to which a test measures accurately what it is supposed to measure. Usability is the relative acceptability of the test to the teachers who would use it. Usability may be affected by the selling price, the directions for the administration of the test, the time for the conduct of the test, the length of the test, and the mode of scoring the test.

