• Validity
Beyond reliability, which concerns the consistency of measurement, a test should measure what it is intended to measure; this is the issue of validity.
Validity plays a crucial role in the quality of a test. A test is valid if it measures what it is meant to measure, in accordance with its purpose. Validity can be divided into several types; this test review focuses on three main categories: face validity, construct validity and content validity.
Face validity is the perception among ordinary people of whether a test is valid. It depends on how far they accept the test context, contents and tasks with regard to how these present the target language use. In the TOEIC test, the language is set in a job-related environment. Face validity rests on subjective judgements and can be influenced through various strategies, including effective advertising and peer evaluation. Owing to its international popularity and its broad range of applications, including business, educational institutions and government, TOEIC is claimed to possess high face validity (Chauncey Group International, 2002).
Construct validity refers to whether or not a test actually tests the criteria it claims to test (Cunningham, 2002). For listening comprehension, the constructs to be measured are the ability to understand vocabulary and idioms, discriminate between minimal pairs, make inferences and extract the main idea from a conversation. The format used in TOEIC is multiple choice, which can be scored accurately and objectively. However, many researchers have claimed that this format has a negative effect on validity. Cunningham (2002) states that “…real-life interaction does not consist of multiple-choice options. The format does not require examinee to demonstrate an ability to use the language; neither are they required to manipulate it.”
Though it shows high reliability in statistical terms, the validity of TOEIC has been widely disputed. In real-world settings, people can demonstrate their listening ability in different ways. This is apparent for particular constructs, namely the ability to make inferences and to extract the main idea of the short talks in Part Four. To test this ability, test-takers should understand the talk thoroughly and show their understanding without being distracted by the multiple-choice options in the test booklet. One way for people to show their understanding of the focus of a talk, or of its specific details, is to take notes. Note-taking not only reflects a person’s ability on these constructs but is also an authentic task used in real-world settings. This is a limitation of the multiple-choice format: test-takers cannot demonstrate their actual ability through it.
Given the large number of test-takers, the multiple-choice format is considered convenient and can be scored objectively. However, its limitations outweigh its convenience and reliability. A test cannot be considered valid if it does not portray the actual ability of the test-takers. Since TOEIC is a high-stakes standardized test, its effect is tremendous. The issue of validity must therefore be given serious consideration, together with the washback effect of the test.
Content validity refers to experts’ judgement of the extent to which a test’s content is proportionally representative of all the construct’s features. In the case of TOEIC, the content is based on a needs analysis conducted by ETS among corporations around the world. Moritoshi argues that this approach is not based on validation theory with respect to the presentation of the test language’s features. However, from a practical standpoint the TOEIC test is viewed as having high content validity (Moritoshi, 2001, cited in Nall, 2003). It therefore remains uncertain whether the content, including the topics and tasks, is representative of the target language use in the business domain.