We learn, therefore we score?
“All tables apart, we start with a test today!” In the Netherlands, primary schools are currently obliged to participate in a ‘pupil monitoring system’ that regularly evaluates students’ academic performance (math and language) with standardized tests. The claimed advantage of standardized tests is that they provide an ‘unbiased’ record of a student’s progress over time. While it might be a good idea to track children’s academic development, there are also some serious disadvantages that should make us careful.
Quantum Mechanics in the Classroom: The Heisenberg Effect
In recent years, testing has become an integral part of schooling. Researchers have pointed out that, because of this trend, schooling might start to resemble test-training. Amrein and Berliner (2002) call this the Heisenberg effect, a term adapted from quantum mechanics:
“The more important any quantitative social indicator becomes in social decision-making, the more likely it will be to distort and corrupt the social process it is intended to monitor”
For example, if schools’ average test scores are made public, schools with low averages can eventually become stigmatized as, let’s say, low-performing schools. This may lead them to find ways to improve students’ scores, for example by intensive test-training in the classroom or by refusing admission to students with learning disabilities. And there you have your Heisenberg effect: the instrument that was initially intended to monitor students’ learning is now partly determining what students practice in school.
The tests themselves may have pitfalls as well. Messick (1989) first raised the problem of ‘construct-irrelevant variance’ by showing that students with reading difficulties often score lower on math tests that require reading. This was one of the first indications that standardized tests measure not only the construct they claim to measure, but that their outcome is also highly influenced by other interfering factors. These factors include students’ reading difficulties (that is, if the test is not a reading test), attention problems, and communication problems, such as a limited vocabulary or difficulty interpreting questions or verbalizing answers. This is a considerable problem for students with learning difficulties, who score significantly lower in all academic domains (Reid et al., 2004). In our study, for example, we found that special needs students scored significantly lower than their peers in regular education on two standardized tests of academic performance. When working on hands-on scientific tasks together with a teacher, however, no substantial differences were found in their level of reasoning (Van Der Steen et al., 2012).
“Test scores are not always the objective context-independent measures of students’ understanding they are claimed to be”
If we want to eliminate the disadvantages of standardized tests as much as possible, we might be better served by universally designed testing methods. This would not only help to diminish the issue of construct-irrelevant variance, but it might also change society’s ideas about the accuracy of single standardized test scores and the consequences attached to them (although, I admit, it would not offer a complete solution for the Heisenberg effect).
“Applying the universal design principles to standardized tests reduces barriers in educational material and instruction by providing accommodations and supports for all students, including students with disabilities or developmental delays” (Rose & Meyer, 2002)
For example, computerized universally designed tests would contain text-to-speech software, a built-in dictionary to help students understand the wording of the questions, and other suitable facilities. Students would be able to either type in their answers or record their verbal answers. In the case of multiple choice questions, it is even possible to let the computer program assess students’ performance automatically and adaptively select the next item.
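To make the last point concrete, here is a minimal sketch of what automatic scoring and adaptive item selection for multiple choice questions could look like. Everything in it is an assumption made for illustration: the `Item` fields, the integer difficulty scale, and the simple "one step harder after a correct answer, one step easier after an incorrect one" rule. Real adaptive tests typically rely on more elaborate statistical models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    """A hypothetical multiple choice item."""
    question: str
    options: List[str]
    correct: int     # index of the correct option
    difficulty: int  # assumed scale: 1 = easiest, higher = harder


def is_correct(item: Item, chosen: int) -> bool:
    """Score a multiple choice answer automatically."""
    return chosen == item.correct


def next_item(pool: List[Item], last_difficulty: int,
              was_correct: bool) -> Optional[Item]:
    """Pick the unused item whose difficulty is closest to the target.

    The target is one step harder after a correct answer and one step
    easier after an incorrect one (a simple staircase rule, assumed here).
    Returns None when no items are left in the pool.
    """
    if not pool:
        return None
    target = last_difficulty + (1 if was_correct else -1)
    return min(pool, key=lambda item: abs(item.difficulty - target))
```

A test program would call `is_correct` after each answer, remove the answered item from the pool, and pass the result to `next_item` to choose what comes next.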
In sum, scoring on tests does not equal learning, but if we do use tests, all children should get an equal opportunity to score well. A ‘universal design’ would greatly increase the accessibility of tests for all student populations, even for those who are now lagging behind.
Relevant Publications and Links
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10, 1-74.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18, 5-11. doi: 10.3102/0013189X018002005
Reid, R., Gonzalez, J. E., Nordness, P. D., Trout, A., & Epstein, M. H. (2004). A meta-analysis of the academic status of students with emotional/behavioral disturbance. The Journal of Special Education, 38, 130-143. doi: 10.1177/00224669040380030101
Rose, D. H., & Meyer, A. (2002). Teaching every student in the digital age: Universal design for learning. Alexandria, VA: Association for Supervision and Curriculum Development.
Van Der Steen, S., Steenbeek, H., Wielinski, J., & Van Geert, P. (2012). A comparison between young students with and without special needs on their understanding of scientific concepts. Education Research International, 2012. doi: 10.1155/2012/260403