Leigh Ann Emrick
University of Georgia
Does Not Compute: How Validity Concerns Often Undermine Even the Most Basic Conclusions Drawn from Students' Scores on Today's Ultra-High-Stakes Tests

Of late, our society has suffered a great deal at the hands, or rather the heads, of self-proclaimed visionaries and social theorists. Theoretical economists espoused lending policies based on a "new mathematics" that triggered the current economic crisis. Similarly, businessmen and politicians have attempted to reform the educational system on a new model, applying ideas and strategies that originated in the corporate world, or as speculation in the minds of policy makers with little or no experience as professional educators. As an experienced teacher, I believe that the high-stakes tests on which such theories depend often provide a warped understanding of students' capabilities as well as their ongoing skill progression. In their book Collateral Damage: How High-Stakes Testing Corrupts America's Schools, Sharon Nichols and David Berliner (2007) argue that "If, to avoid sanctions or receive rewards, you do things to make an indicator move in the desired direction, then you probably won't be measuring the entity that you set out to measure" (p. 109). Thus, the data from such high-stakes tests have an inherent problem with validity, especially given the potentially disastrous consequences for schools and districts that fail to make progress under the draconian measures of the infamous No Child Left Behind (NCLB) legislation. Nichols and Berliner (2007) further subdivide the validity problem into six distinct categories, each with its own causes and ramifications, all of which greatly affect the way that results from such high-stakes testing programs should be viewed. Ultimately, part of the problem may be that proponents of choice and competition "assumed that higher test scores on standardized tests of basic skills are synonymous with good education" (Ravitch, 2010, p. 111).
I believe that they failed to anticipate the validity concerns that fatally undermined the results of such testing, as well as its effects on the quality of instruction and curriculum that students would inevitably receive.

The first kind of validity concern is the simplest: content validity. Content validity is a measure of the extent to which the questions on a test coincide with the knowledge it purports to assess. When a history test covering a single chapter contains questions drawn from the vocabulary and figures of that chapter alone, it has good content validity. In my own classroom, content validity plays an important role. When our class reviews a story we have read, the questions I ask must pertain to the content of that book, an important consideration in my daily lesson plans. Although it would seem a simple matter to ensure that questions on tests as significant as state achievement tests have strong, if not perfect, content validity, there have still been significant problems with the content of such tests. Since state achievement tests are supposed to assess the standards set by state school boards, it is surprising that a recent study found only 11 states had made tangible efforts to ensure alignment between the standards taught and those assessed.

Another important aspect of validity is construct validity. In the classroom example above, lesson planning supported content validity by ensuring that the questions asked after a story came from material covered within the story itself. If I instead adapted those questions to discover whether my reading activities were in fact improving the reading abilities of my students, then I would be investigating