Adaptive testing has been around since at least the early part of the 20th century. The goal has always been to measure something (such as IQ, academic progress or personality traits) either with the same precision as more traditional assessments but with fewer questions and less time, or with greater precision in the same amount of testing.
Early efforts had their drawbacks, of course, but we’ve come a long way in the intervening century or so. There is still much progress to be made, but some promising research today will likely change the way we assess students in the coming decade and beyond.
Early Adaptive Testing Efforts
An early example of adaptive testing was the Stanford-Binet intelligence test, which dates to the early 20th century and was administered differently from any other test of its time. Designed to assess examinees ranging in age from early childhood to adulthood, the Stanford-Binet consists of a battery of several distinct fixed-form subtests at each mental age level. The examiner uses available information about the examinee to start testing at an age level lower than the examinee’s expected mental age, then administers subtests at different age levels until a basal age level and a ceiling age level are established. Subtests below the basal level or above the ceiling level need not be administered, and the test score is based on performance within the levels that were used.
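The basal-and-ceiling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Stanford-Binet administration rules: it assumes the basal is the highest level, at or below the starting point, at which every subtest is passed, and the ceiling is the lowest level, at or above the starting point, at which every subtest is failed. The names `find_basal_and_ceiling`, `run_level` and `simulated_examinee` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def find_basal_and_ceiling(
    levels: List[int],
    start_index: int,
    run_level: Callable[[int], List[bool]],
) -> Tuple[int, int, Dict[int, List[bool]]]:
    """Administer age levels until a basal and a ceiling level are found.

    Assumed definitions (not taken from the article):
      basal   = a level at which every subtest is passed
      ceiling = a level at which every subtest is failed
    """
    results: Dict[int, List[bool]] = {}

    def administer(idx: int) -> List[bool]:
        # Each level is administered at most once; later lookups reuse the result.
        level = levels[idx]
        if level not in results:
            results[level] = run_level(level)
        return results[level]

    # Step down from the starting level until every subtest is passed
    # (or the lowest available level is reached).
    basal_idx = start_index
    while not all(administer(basal_idx)) and basal_idx > 0:
        basal_idx -= 1

    # Step up from the starting level until every subtest is failed
    # (or the highest available level is reached).
    ceiling_idx = start_index
    while any(administer(ceiling_idx)) and ceiling_idx < len(levels) - 1:
        ceiling_idx += 1

    return levels[basal_idx], levels[ceiling_idx], results


def simulated_examinee(level: int) -> List[bool]:
    # Hypothetical examinee: three subtests per level, each slightly harder
    # than the last, passed whenever its difficulty is at most 7.5.
    return [level + offset <= 7.5 for offset in (0.0, 0.4, 0.8)]


age_levels = list(range(2, 11))              # age levels 2 through 10
start = age_levels.index(5)                  # begin below the expected mental age
basal, ceiling, administered = find_basal_and_ceiling(
    age_levels, start, simulated_examinee
)
print(basal, ceiling, sorted(administered))
```

With this simulated examinee, testing starts at level 5 and stops at level 8, so only levels 5 through 8 are administered and the other five levels are skipped. That saving in testing time is the point of the procedure.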
Other researchers were interested in adaptive testing, but Dr. David Weiss was one of the first to have significant funding behind his research, beginning in the early 1970s, when I joined him. Based at the University of Minnesota, Weiss was a counseling psychologist who explored the use of computers to administer IQ tests and assessments of personality traits and vocational interests.
When Weiss was in final negotiations with the Office of Naval Research on a contract to research computer-adaptive testing (CAT), I had just begun grad school at the university and was lucky enough to be invited to join the team. Weiss acquired what was called a “mini-computer” (it still took up most of a 9’ x 12’ room) that could drive four or five test-taker terminals at a time. With 50,000 students at the university, we had a rich source of experimental subjects, and so we were off, trying to validate the concept of CAT and the many varied approaches to it.