A. TYPES OF LANGUAGE TESTS
Introduction
A test is an instrument, often administered on paper or on a computer, intended to measure test-takers' or respondents' (often students') knowledge, skills, aptitudes, or classification in many other areas (e.g., beliefs). Tests are widely used in education, professional certification, counseling, psychology, the military, and many other fields. One kind of test is the language test. Not all language tests are of the same kind: they differ mainly in terms of design (method) and purpose. In terms of method, a broad distinction can be made between pen-and-paper language tests and performance tests.
The need to assess the outcome of learning has led to the development and elaboration of different test formats. Testing language has traditionally taken the form of testing knowledge about language, usually knowledge of vocabulary and grammar. Stern (1983, p. 340) notes that “if the ultimate objective of language teaching is effective language learning, then our main concern must be the learning outcome”. In the same line of thought, Wigglesworth (2008, p. 111) further adds that “in the assessment of languages, tasks are designed to measure learners' productive language skills through performances which allow candidates to demonstrate the kinds of language skills that may be required in a real world context”. This is because a “specific purpose language test is one in which test content and methods are derived from an analysis of a specific purpose target language use situation, so that test tasks and content are authentically representative of tasks in the target situation” (Douglas, 2000, p. 19).
Thus, the issue of authenticity is central to the assessment of language for specific functions. This is another way of saying that testing is a socially situated activity, although its social aspects have been relatively under-explored (Wigglesworth, 2008). Language tests, then, differ with respect to how they are designed and what they are for; in other words, with respect to test method and test purpose. In terms of method, we can broadly distinguish traditional paper-and-pencil language tests from performance tests.
Paper-and-pen tests are typically used for the assessment of:
- separate components of language (grammar, vocabulary, etc.)
- receptive understanding (listening and reading comprehension)

In performance tests, language skills are assessed in an act of communication, e.g. tests of speaking and writing, where:
- extended samples of speech or writing are elicited
- performances are judged by trained markers
- a common rating procedure is used
There are several common types of tests, namely: objective and subjective tests; direct and indirect tests; discrete-point and integrative tests; aptitude, achievement, and proficiency tests; norm-referenced and criterion-referenced tests; and speed and power tests.
A. Types of Language Tests
1. Objective vs. Subjective Tests
An objective test is a psychological test that measures an individual's characteristics in a way that is independent of rater bias or the examiner's own beliefs, usually by administering a bank of questions that are marked against exacting, completely standardized scoring mechanisms, much in the same way that examinations are administered. Objective tests are often contrasted with projective tests, which are sensitive to rater or examiner beliefs. Objective tests tend to be more reliable and valid than projective tests; however, they still depend on the subject's willingness to be open about his or her personality, and as such can sometimes misrepresent the subject's true personality. Projective tests purportedly expose certain aspects of the personality of individuals that are impossible to measure by means of an objective test, and are much more reliable at uncovering “protected” or unconscious personality traits or features.
An objective test is built by following a rigorous protocol which includes the following steps:
- Making decisions on the nature, goal, target population, and power of the test.
- Creating a bank of questions.
- Estimating the validity of the questions by means of statistical procedures and/or the judgement of experts in the field.
- Designing a format of application (a clear, easy-to-answer questionnaire, an interview, etc.).
- Detecting, upon application to a pilot sample, which questions are better in terms of discrimination, clarity, and ease of response (see the sketch after this list).
- Applying the revised questionnaire or interview to a sample.
- Using appropriate statistical procedures to establish norms for the test.
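To make the pilot-sample step concrete, here is a minimal sketch of one classical item-analysis technique, the upper-lower groups discrimination index. The toy response data, the 27% group split, and the 0.3 keep/review threshold are illustrative assumptions, not part of any prescribed protocol.

```python
# Upper-lower groups discrimination index (illustrative data and settings).
# D = proportion correct in the top-scoring group minus the proportion
# correct in the bottom-scoring group; a higher D discriminates better.

def discrimination_index(responses, item, fraction=0.27):
    """responses: one list of 0/1 item scores per examinee."""
    totals = [sum(r) for r in responses]
    k = max(1, int(len(responses) * fraction))
    ranked = sorted(range(len(responses)), key=totals.__getitem__, reverse=True)
    top, bottom = ranked[:k], ranked[-k:]
    p_top = sum(responses[i][item] for i in top) / k
    p_bottom = sum(responses[i][item] for i in bottom) / k
    return p_top - p_bottom

# Toy pilot data: six examinees, four items, 1 = correct.
pilot = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
for item in range(4):
    d = discrimination_index(pilot, item)
    verdict = "keep" if d >= 0.3 else "review"   # 0.3 is a common rule of thumb
    print(f"item {item + 1}: D = {d:+.2f} -> {verdict}")
```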
Usually these types of tests are distinguished on the basis of the manner in which they are scored. An objective test is said to be one that may be scored by comparing examinee responses with an established set of acceptable responses or a scoring key. A common example would be a multiple-choice recognition test.
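To sketch what scoring against an established key looks like in practice (the items and key below are hypothetical), a multiple-choice response set can be marked mechanically, so that no rater judgement enters the score:

```python
# Objective scoring sketch: responses are matched against a fixed key,
# so any two scorers (or a computer) will produce the same result.

ANSWER_KEY = {1: "B", 2: "D", 3: "A", 4: "C"}   # hypothetical key

def score(responses, key):
    """Count the responses that exactly match the scoring key."""
    return sum(1 for item, choice in responses.items() if key.get(item) == choice)

candidate = {1: "B", 2: "D", 3: "C", 4: "C"}
print(f"{score(candidate, ANSWER_KEY)} / {len(ANSWER_KEY)}")   # -> 3 / 4
```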
Conversely, a subjective test is said to require scoring by opinionated judgement, hopefully based on insight and expertise, on the part of the scorer. An example might be the scoring of a free written composition for the presence of creativity, in a situation where no operational definitions of creativity are provided and where there is only one rater.
2. Direct vs. Indirect Tests
A test is said to be direct when the test actually
requires the candidate to demonstrate ability in the skill being sampled. It is
a performance test. For example, if we wanted to find out if someone could
drive a vehicle, we would test this most effectively by actually asking him to
drive the vehicle. In language terms, if we wanted to test whether someone
could write an academic essay, we would ask him to do just that. In terms of
spoken interaction, we would require candidates to participate in oral
activities that replicated as closely as possible [and this is the problem] all
aspects of real-life language use, including time constraints, dealing with
multiple interlocutors, and ambient noise. Attempts to reproduce aspects of
real life within tests have led to some interesting scenarios.
Direct tests try to introduce authentic tasks which model the student's real-life future use of language. Such tests include:
- Role-playing.
- Information-gap tasks.
- Reading authentic texts, listening to authentic texts.
- Writing letters, reports, form filling, and note taking.
- Summarising.
Direct tests are task-oriented rather than test-oriented; they require the ability to use language in real situations, and they should therefore have a good formative effect on your future teaching methods and help you with curriculum writing. However, they do call for skill and judgment on the part of the teacher.
An indirect test measures the ability or knowledge that underlies the skill we are trying to sample in our test. So, for example, you might test someone on the Highway Code in order to determine whether he is a safe and law-abiding driver [as is now done as part of the UK driving test]. An example from language learning might be to test learners' pronunciation ability by asking them to match words that rhyme with each other, e.g.:

One of these words sounds different from the others. Underline it.

door    law    though    pore
Indirect testing makes no attempt to measure the way language is used in real life, but proceeds by means of analogy. Some examples that you may have used are:
- Most, if not all, of the discrete-point tests discussed below.
- Cloze tests.
- Dictation (unless on a specific office-skills course).
Indirect tests have the big advantage of being very ‘test-like’. They are popular with some teachers and most administrators because they can be easily administered and scored; they also produce measurable results and have a high degree of reliability.
3. Discrete-Point vs. Integrative Tests
Discrete-point tests are based on an analytical view of language, in which language is divided up so that its components may be tested. Discrete-point tests aim to achieve a high reliability factor by testing a large number of discrete items. From these separated parts, you can form an opinion which is then applied to language as an entity. You may recognise some of the following discrete-point tests:
1. Phoneme recognition.
2. Yes/No, True/False answers.
3. Spelling.
4. Word completion.
5. Grammar items.
6. Most multiple-choice tests.
Discrete-point testing assumes that language knowledge can be divided into a number of independent facts: elements of grammar, vocabulary, spelling and punctuation, pronunciation, intonation and stress. These can be tested by pure items (usually multiple-choice recognition tasks). Discrete-point tests are designed to measure knowledge of, or performance in, a very restricted area of the target language. Thus a test of the ability to use the perfect tenses of English verbs correctly, or to supply correct prepositions in a cloze passage, may be termed a discrete-point test.
Integrative tests, on the other hand, are said to tap a greater variety of language abilities concurrently, and therefore may have less diagnostic and remedial-guidance value but greater value in measuring overall language proficiency.

Such tests usually require the testees to demonstrate simultaneous control over several aspects of language, just as they would in real language use situations. Examples of integrative tests that you may be familiar with include:
1. Cloze tests.
2. Dictation.
3. Translation.
4. Essays and other coherent writing tasks.
5. Oral interviews and conversation.
6. Reading, or other extended samples of real text.
Integrative testing argues that any realistic language use requires the coordination of many kinds of knowledge in one linguistic event, and so uses items which combine those kinds of knowledge, like comprehension tasks, dictation, speaking and listening.

Discrete-point testing risks ignoring the systematic relationships between language elements; integrative testing risks ignoring accuracy of linguistic detail.

Frequently an attempt is made to achieve the best of all possible worlds through the construction and use of test batteries comprised of discrete-point subtests for diagnostic purposes, but which provide a total score that is considered to reflect overall language proficiency. The comparative success or failure of such attempts can be determined empirically by reference to data from test administrations. Farhady (1979) presents evidence that “there are no statistically revealing differences” between discrete-point and integrative tests.
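Since cloze tests appear under both headings above, a minimal sketch of how a fixed-ratio cloze passage might be constructed may help: deleting every nth word forces the testee to draw on grammar, vocabulary, and discourse knowledge at once, which is what makes the task integrative. The passage and the deletion ratio below are illustrative assumptions.

```python
# Fixed-ratio cloze sketch: blank out every nth word and keep the answers.

def make_cloze(text, n=7, blank="______"):
    """Return (cloze_passage, answer_key) with every nth word deleted."""
    words = text.split()
    key = []
    for i in range(n - 1, len(words), n):
        key.append(words[i])
        words[i] = blank
    return " ".join(words), key

passage = ("Language tests differ mainly in terms of design and purpose, "
           "and a broad distinction can be made between paper-and-pencil "
           "tests and performance tests of speaking or writing.")
cloze, answers = make_cloze(passage)
print(cloze)
print("Key:", answers)
```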
4. Aptitude, Achievement, and Proficiency Tests
An aptitude is an innate, acquired, learned, or developed component of a competency (the others being knowledge, understanding, and attitude) to do a certain kind of work at a certain level. Aptitudes may be physical or mental. The innate nature of aptitude stands in contrast to achievement, which represents knowledge or ability that is gained.
Aptitude tests are most often used to measure the suitability of a candidate for a specific program of instruction or a particular kind of employment; for this reason they are sometimes treated as synonymous with intelligence tests. A language aptitude test is designed to measure students' probable performance in a foreign language which they have not yet started to learn. Aptitude tests generally seek to predict students' probable strengths and weaknesses in learning a foreign language by measuring performance in an artificial language.
An achievement test is a test of developed skill or knowledge. The
most common type of achievement test is a standardized
test developed to measure skills and knowledge learned in a given
grade level, usually through planned instruction, such as training or classroom
instruction. Achievement test scores are often used in an educational system to determine the level of instruction for which a student is prepared.
High achievement scores usually indicate a mastery of grade-level material, and
the readiness for advanced instruction. Low achievement scores can indicate the
need for remediation or repeating a course grade.
Achievement tests reflect a student's ability and willingness to learn and show, on a percentage basis, how much of the training and materials presented in a particular class was absorbed. A score of 90% on an achievement test would indicate that the student had understood about 90% of what was presented in a particular class; a score of 40% would indicate that the student had accomplished only 40% of the class goals.
Another type of test, one that measures overall language proficiency, is called a proficiency test. This is a test that globally measures how much of a language the student has
acquired over a period of time from all sources. It may represent a few months
of study or years of study and use of the language. Think of a proficiency test
as showing the "tip of an iceberg." A scientist can measure the tip
of an iceberg and calculate, with a great deal of accuracy, how much ice is
under the water. A language proficiency test looks at a carefully selected
group of language items and the results determine how much of the whole
language is probably understood. The score is not a "percentage" of
anything. It is a number that provides useful information based on consistent
results.
5. Speed Tests vs. Power Tests
A speed test is one in which the items are so easy that every person taking the test might be expected to get every item correct, given enough time. In a speed test the scope of the questions is limited and the methods you need to use to answer them are clear. Taken individually, the questions appear relatively straightforward. Speed tests are concerned with how many questions you can answer correctly in the allotted time.
For example:

139 + 235 = ?

A) 372    B) 374    C) 376    D) 437
A power test, by definition, is one that allows sufficient time for every person to finish, but that contains such difficult items that few if any examinees are expected to get every item correct. A power test will present a smaller number of more complex questions. The methods you need to use to answer these questions are not obvious, and working out how to answer each question is the difficult part. Once you have determined this, arriving at the correct answer is usually relatively straightforward.
For example, below are the sales figures for three different types of network server over three months:

| Server | Jan Units | Jan Value | Feb Units | Feb Value | Mar Units | Mar Value |
|--------|-----------|-----------|-----------|-----------|-----------|-----------|
| ZXC43  | 32        | 480       | 40        | 600       | 48        | 720       |
| ZXC53  | 45        | 585       | 45        | 585       | 45        | 585       |
| ZXC63  | 12        | 240       | 14        | 280       | 18        | 340       |

1. In which month was the sales value highest?
   A) January    B) February    C) March

2. What is the unit cost of server type ZXC53?
   A) 12    B) 13    C) 14
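To show the kind of reasoning such power items demand, here is a short sketch (an illustration, not part of the original test) that works both questions from the table:

```python
# Working the two power-test questions from the table above.

sales = {  # server -> (units, value) for January, February, March
    "ZXC43": [(32, 480), (40, 600), (48, 720)],
    "ZXC53": [(45, 585), (45, 585), (45, 585)],
    "ZXC63": [(12, 240), (14, 280), (18, 340)],
}
months = ["January", "February", "March"]

# Q1: total sales value per month is 1305, 1465, 1645 -> March (answer C).
totals = [sum(figures[m][1] for figures in sales.values()) for m in range(3)]
print("Highest month:", months[totals.index(max(totals))])

# Q2: unit cost = value / units for ZXC53: 585 / 45 = 13 (answer B).
units, value = sales["ZXC53"][0]
print("ZXC53 unit cost:", value // units)
```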
In summary, speed tests contain more items than power tests, although the two have approximately the same time limit. Speed tests tend to be used in selection at the administrative and clerical level; power tests tend to be used more at the graduate, professional, or managerial level. This division is not rigid, however, and speed tests do give an accurate indication of performance in power tests: in other words, if you do well in speed tests, then you will tend to do well in power tests as well.
B. Norm-Referenced vs. Criterion-Referenced Tests
1. Norm-Referenced Tests
According to Paul (1995), norm-referenced tests are formal
assessments and have specific properties that allow a meaningful comparison of
performance among children. These properties include clear administration and
scoring criteria; validity, reliability, standardization, central tendency,
standard error of measurement, and variability measures; and norm-referenced
scores. Mills and Hambleton (1980) stated that norm-referenced assessments are
constructed to facilitate comparisons among individuals in relation to the
performance of the normative group. A standardized test is also used when
comparing a child to the norm. Standardization is defined as the process of
administering a test under uniform conditions to each child who is tested
(Montgomery & Connolly, 1987).
There are advantages and disadvantages to using norm-referenced
tests. Norm-referenced tests will provide evidence regarding the existence of a
problem, suggest a need for further assessment, and/or help document a need for
the initiation or continuation of therapy (McCauley & Swisher, 1984).
Montgomery and Connolly (1987) reported that norm-referenced tests were
designed to delineate differences among individuals and used for diagnostic and
placement purposes. Johnson and Martin (1980) concluded that norm-referenced
tests spread out individuals along a continuum of performance in order to
detect deviations from the average.
McCauley and Swisher (1984) noted disadvantages to norm-referenced
tests if misused. A misused norm-referenced test can lead to (a) a mistaken
understanding of an individual’s problem, (b) an inappropriate and fruitless
therapy program, and (c) an inaccurate conclusion regarding efficacy of
therapy. Another disadvantage of norm-referenced tests is that the comparison of a test-taker's score to the relative norms involves a comparison of estimated, rather than absolute or true, values.
Besides the disadvantages mentioned above, McCauley and Swisher
(1984) reported four specific problems in the use of norm-referenced tests. The
first problem is using age-equivalent scores as test summaries. “This problem
concerns the relation of age-equivalent scores and the raw scores on which they
are based” (Salvia & Ysseldyke, 1981, p. 67). With most norm-referenced
tests, similar differences in age-equivalent scores are the result of smaller
and smaller differences in raw scores (McCauley & Swisher, 1984). This
problem is not necessarily based directly on evidence collected for children at
that chronological age and can serve as a basis for misinterpretation. A second problem is profile analysis. McCauley and Swisher (1984) stated that the scores to be
compared in a profile, on norm-referenced tests, are only estimates of the
ideal or true scores one would obtain if the scores were free from measurement
error. Performance on individual test items as indications of deficit is the
third problem. That is, the small number of items on a norm-referenced test
cannot adequately sample all of the specific forms and developmental levels
that might be appropriate. The fourth problem with using norm-referenced tests is repeated testing as a means of assessing progress. The result is
underestimation or overestimation of change, since the individuals are able to
learn the items on the test. These problems demonstrate that norm-referenced
tests provide incomplete and possibly misleading information for the
formulation of language objectives and language analyses.
2. Criterion-Referenced Tests
Paul (1995) proposed that criterion-referenced tests are procedures devised to examine a particular form of communicative behavior. Criterion-referenced tests make no reference to other children's achievement; they only determine whether the child can attain a certain level of performance. Montgomery and Connolly (1987) stated that criterion-referenced tests document individual performance in relation to a domain of information or a specific set of skills. Therefore, criterion-referenced tests are designed to measure changes in the successive performance of an individual. Criterion-referenced tests are used specifically for program planning and evaluation; however, they can also be standardized.
Much like norm-referenced tests, criterion-referenced tests have their own advantages and disadvantages. One advantage of criterion-referenced tests is their scoring procedures: this type of test is based on absolute rather than relative standards. Its primary use is to measure mastery of specific skills, with test items based on known performance objectives associated with the tasks of interest. Criterion-referenced tests are sensitive to, and can be used to measure, the effects of instruction, since they are based on task analysis and relate directly to instructional objectives. Sensitivity is defined as the accuracy with which a test identifies children with language impairment as language impaired (Merrell & Plante, 1997). The ability to tie the test directly to program objectives is another benefit of criterion-referenced tests. Freeman and Miller (2001) reported that criterion-referenced tests were consistently rated as the most useful assessment tool, both for understanding the child's abilities and needs, and for planning teaching responses to them. This assessment tool refers directly to the curriculum, and is likely to be considered comprehensible and relevant.
Although there are a number of advantages to criterion-referenced tests, a few disadvantages need to be mentioned. One disadvantage is the inability to assign age levels if the test is not normed or not administered in a standardized manner. MacTurk and Neisworth (1978) stated that another disadvantage of criterion-referenced tests is their lack of comparative interpretability.
3. Similarities between Norm-Referenced and Criterion-Referenced Tests
Even though norm-referenced and criterion-referenced tests have many differences, there are a few similarities. For example, criterion-referenced and norm-referenced tests should demonstrate the same inter-rater and test-retest reliability (Montgomery & Connolly, 1987). Issues of validity, such as content, concurrent, and predictive validity, should also be similar between the two tests when administered.
4. Differences between Norm-Referenced and Criterion-Referenced Tests
McCauley (1996) summarized the differences between norm-referenced and criterion-referenced tests succinctly. The first difference is the fundamental purpose of each test: the fundamental purpose of norm-referenced tests is to rank individuals, whereas that of criterion-referenced tests is to distinguish specific levels of performance. A second difference is test planning: norm-referenced tests address broad content, while criterion-referenced tests address a clearly specified domain. Lastly, a third difference is how the individual's performance is summarized: with norm-referenced tests, performance is summarized meaningfully by using percentile ranks and standard scores, whereas criterion-referenced test performance is summarized meaningfully by using raw scores.
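That scoring difference can be made concrete with a small sketch: the same raw score is summarized as a percentile rank against a norm group in the norm-referenced case, and as a percentage judged against a preset mastery criterion in the criterion-referenced case. The norm data, the maximum score, and the 80% criterion below are assumptions for illustration only.

```python
# One raw score, two interpretations (all numbers are illustrative).

def percentile_rank(raw, norm_scores):
    """Norm-referenced: percent of the norm group scoring below this raw score."""
    return 100 * sum(s < raw for s in norm_scores) / len(norm_scores)

def mastery(raw, max_score, criterion=0.80):
    """Criterion-referenced: percentage correct judged against a preset standard."""
    pct = raw / max_score
    return pct, pct >= criterion

norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]   # hypothetical norms
raw = 28

print(f"norm-referenced: percentile rank {percentile_rank(raw, norm_group):.0f}")
pct, passed = mastery(raw, max_score=40)
print(f"criterion-referenced: {pct:.0%} correct, mastery = {passed}")
# -> percentile rank 70; 70% correct, mastery = False (criterion is 80%)
```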
Many educators and members of the public fail to grasp the distinctions between criterion-referenced and norm-referenced testing. It is common to hear the two types of testing referred to as if they served the same purposes or shared the same characteristics. Much confusion can be eliminated if the basic differences are understood. The following comparison is adapted from Popham, J. W. (1975). Educational Evaluation. Englewood Cliffs, New Jersey: Prentice-Hall.
| Dimension | Criterion-Referenced Tests | Norm-Referenced Tests |
|---|---|---|
| Purpose | To determine whether each student has achieved specific skills or concepts; to find out how much students know before instruction begins and after it has finished. | To rank each student with respect to the achievement of others in broad areas of knowledge; to discriminate between high and low achievers. |
| Content | Measures specific skills which make up a designated curriculum. These skills are identified by teachers and curriculum experts. Each skill is expressed as an instructional objective. | Measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts. |
| Item characteristics | Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing. The items which test any given skill are parallel in difficulty. | Each skill is usually tested by fewer than four items. Items vary in difficulty. Items are selected that discriminate between high and low achievers. |
| Score interpretation | Each individual is compared with a preset standard for acceptable achievement; the performance of other examinees is irrelevant. A student's score is usually expressed as a percentage. Student achievement is reported for individual skills. | Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade-equivalent score, or a stanine. Student achievement is reported for broad skill areas, although some norm-referenced tests do report achievement for individual skills. |
References
Anastasi, A. (1988). Psychological Testing. New York: Macmillan.
Buck, G. (1989). Written tests of pronunciation: do they work? ELT Journal, 43, 50-56.
Corbett, H. D., & Wilson, B. L. (1991). Testing, Reform and Rebellion. Norwood, New Jersey: Ablex.
Popham, J. W. (1975). Educational Evaluation. Englewood Cliffs, New Jersey: Prentice-Hall.
Romberg, T. A., Wilson, L., & Khaketla, M. (1991). The alignment of six standardized tests with NCTM Standards. Unpublished paper, University of Wisconsin-Madison. In J. K. Stenmark (Ed.), Mathematics Assessment: Myths, Models, Good Questions, and Practical Suggestions. Reston, Virginia: National Council of Teachers of Mathematics (NCTM).
Stenmark, J. K. (Ed.). (1991). Mathematics Assessment: Myths, Models, Good Questions, and Practical Suggestions. Reston, Virginia: National Council of Teachers of Mathematics (NCTM).
Stiggins, R. J. (1994). Student-Centered Classroom Assessment. New York: Merrill.
U.S. Congress, Office of Technology Assessment (1992). Testing in America's Schools: Asking the Right Questions. OTA-SET-519. Washington, D.C.: U.S. Government Printing Office.