The Quality of the Test
• Reliability
Reliability is a crucial aspect of test quality and is often defined as consistency of measurement. Scores from one administration of a test should be consistent with scores from another, regardless of differences in test-takers, test environment and test format (Bachman and Palmer, 1996). A test is therefore reliable when an identical or near-identical test is administered twice to the same group of test-takers and the results are highly similar (Bachman and Palmer, 1996 cited in Cunningham, 2002). The TOEIC's reliability, estimated using the reliability coefficient (Chauncey Group International, 1999), is 0.91 for the listening comprehension section (a value of 1 being perfect reliability). This statistical evidence indicates that the test is highly reliable.
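As a rough illustration of the test-retest idea described above, the sketch below computes a Pearson correlation between two sittings of the same test by the same group. The scores and the names `pearson_r`, `first_sitting` and `second_sitting` are hypothetical and chosen only for illustration; the coefficient reported for TOEIC may have been estimated with a different method, so this is not a reproduction of that figure.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical listening scores for the same six test-takers on two sittings
first_sitting = [310, 405, 260, 455, 380, 295]
second_sitting = [320, 395, 270, 460, 370, 305]

reliability = pearson_r(first_sitting, second_sitting)
print(f"Test-retest correlation: {reliability:.2f}")
```

A value close to 1 means the two sittings rank the test-takers in almost the same way, which is what "consistency of measurement" refers to here.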
• Validity
Beyond consistency of measurement, a test should measure what it is intended to measure; this is the issue of validity.
Validity plays a crucial role in the quality of a test. A test is valid if it measures what it is supposed to measure in accordance with its purpose. Validity can be divided into various types; this test review focuses on three main categories: face validity, construct validity and content validity.
Face validity is the perception among ordinary people of whether a test is valid. It depends on how far they accept the test’s context, content and tasks as a representation of target language use. In the TOEIC test, the language is set in a job-related environment. Face validity rests on subjective judgements and can be influenced through various strategies, including effective advertising and peer evaluation. Owing to its international popularity and its broad range of applications in business, educational institutions and government, TOEIC is claimed to possess high face validity (Chauncey Group International, 2002).
Construct validity refers to whether or not the test actually tests the criteria it claims to test (Cunningham, 2002). For listening comprehension, the constructs to be measured are the abilities to understand vocabulary and idioms, discriminate minimal pairs, make inferences and extract the main idea from a conversation. The format used in TOEIC is multiple choice, which can be scored accurately and objectively. However, many researchers have claimed that this format has a negative effect on validity. Cunningham (2002) states that “…real-life interaction does not consist of multiple-choice options. The format does not require examinee to demonstrate an ability to use the language; neither are they required to manipulate it.”
Although statistical evidence shows high reliability, the validity of TOEIC has been widely disputed. In real-world settings, people can demonstrate their listening ability in different ways. This is apparent for particular constructs, namely the ability to make inferences and to extract the main idea of the short talks in part four. To test this ability, test-takers should understand the talk thoroughly and show their understanding without being distracted by the multiple-choice options in the test booklet. One way to show understanding of the focus of a talk or of its specific details is to take notes. Note-taking not only reflects a person’s ability on this construct but is also an authentic task used in real-world settings. This is a limitation of the multiple-choice format: test-takers cannot show their actual ability through it.
Because of the large number of test-takers, the multiple-choice format is considered convenient and can be scored objectively. However, its limitations outweigh its convenience and reliability. A test cannot be considered valid if it does not portray the actual ability of the test-takers. Since TOEIC is a high-stakes standardized test, its effect is tremendous. The issue of validity must therefore be given serious consideration, in conjunction with the washback effect of the test.
Content validity refers to experts’ judgement of the extent to which a test’s content is proportionally representative of all the construct’s features. In the case of TOEIC, the content is based on a needs analysis conducted by ETS among corporations around the world. Moritoshi argues that this approach is not grounded in validation theory regarding the representation of the features of the test language, although from a practical standpoint the TOEIC test is viewed as having high content validity (Moritoshi, 2001 cited in Nall, 2003). It is therefore still uncertain whether the content, including the topics and tasks, is representative of target language use in the business domain.
• Washback
Washback from the TOEIC test has been examined in terms of its effect on both teaching and learning in preparation for achieving a high score. Washback can be both positive and negative.
The TOEIC test is administered worldwide and has a tremendous effect on test-takers, on teaching and learning approaches and on educational systems. The need to gain a high score on the test gives rise to the negative washback of “teaching to the test”. TOEIC preparation courses are a profitable educational industry, and the aim of these courses is to raise test-takers’ scores. Test-takers are taught a small set of test-taking strategies and focus on the particular content that is expected to appear on the test. They can thus gain a high test score without improving the underlying constructs, that is, their actual ability in language use. Test-takers are also affected in that they invest time and money in learning test-taking strategies instead of learning the language holistically.
Additionally, this negative washback has shaped curriculum design by narrowing or distorting it. This is reflected in reduced emphasis on skills that require complex thinking or problem solving (Alderson and Wall, 1993 cited in Robb and Ercanbrack, 1999). This in turn directly affects the instructional approach.
Negative washback can be clearly seen in Thailand, where TOEIC is used not only for work-related purposes but also as a language proficiency reference by many academic institutions. A vast number of language institutions offer TOEIC preparation courses with a similar emphasis on test-taking strategies. Students are taught test-taking strategies rather than language skills, and thus their language ability is not developed.
Owing to its international reputation, the TOEIC score is used by many business organizations in decision-making processes such as staff hiring and promotion. A similar problem of negative washback exists in the business context, where staff who achieve high TOEIC scores cannot communicate appropriately. This can cause serious problems for their future careers where English is the means of communication in their organization.
In contrast, there is positive washback on test-takers’ extrinsic motivation to gain a high score. Such test-takers pay more attention to practising the language and invest more time and money in test preparation. They are exposed to, and interact with, the target language more (Nall, 2003).
• Practicality
Practicality is considered the strength and the key selling point of TOEIC. It is a commercial test that can be implemented worldwide with the available human, material and time resources. Administration of the test is convenient and can be arranged by contacting an ETS representative. The test has proved to be an effective tool for differentiating test-takers’ levels of proficiency. In addition, the scoring system is fair because the multiple-choice format is objective in nature and is considered to possess high reliability, yielding consistent results regardless of differences in examinees and in the time and place of administration. Test results can be obtained rapidly and conveniently, in less than 2 hours (Chauncey Group International, 2000). The test can be administered to a large number of test-takers because the format can easily be marked by a computerized system.
In conclusion, a large body of research has supported the reliability and practicality of TOEIC. However, there are strong arguments against the multiple-choice format for testing receptive skills, which raises the issue of the validity of the test. Since the test is administered internationally, its washback is inevitably tremendous. Therefore, more research on the validity of this standardized test is required for the test to be used effectively.