Local Assessment vs. Standardized Assessment: Value Added & Student Growth

Are Standardized Tests a Necessary Evil?

I am a curator of educational news, which means that I read and engage with everything related to educational policy and instructional methods, especially standards and assessment. I tweet about Common Core, PARCC, and SBAC. I post a weekly listserv policy digest for the Literacy Research Association. I curate three ScoopIt topics: one on the Common Core State ELA/Literacy Standards, a second entitled All Things PARCC, and my newest, tied to the weekly policy digest, Literacy & Research in Higher Ed. Through ScoopIt, I also post to LinkedIn, supporting connected educators in the challenge of keeping up with the evolution of national education policies and effective instructional pedagogies. And then… I sometimes blog. Not often enough. Over the last few days, however, a post I curated through ScoopIt and shared on LinkedIn has generated heavy discussion, discussion worthy of a blog post.

The curated post was an editorial in a New Jersey paper titled “Standardized Tests Necessary Evil.” Let me summarize in case you don’t have time to read it. The author begins by pointing out that many parents have formed anti-testing groups and are opting their children out of PARCC testing. The author goes on to explain that the test will be hard and that parents not only fear the implications of the testing outcomes but also oppose the way the test is delivered: via computer rather than conventional paper and pencil. The latter portion of the editorial, however, makes the author’s point clear: “…standardized tests are a necessary evil. But as long as school districts and the state keep closely monitoring their effect on students–and on student performance–parents should support them” (paragraph 10). Interestingly, the post had received one comment, from a reader who evidently didn’t read the editorial, because she appears to thank the author on behalf of an anti-PARCC group. I’m not sure what that says about the literacy of the general public.

I posted the editorial to a LinkedIn discussion group, ETS Educational Measurement, Psychometrics and Research. The ensuing discussion among researchers about the perceptions and purposes of standardized assessment has been rich, garnering 30+ multi-paragraph comments in fewer than five days, many with links to supporting research. Below is a sampling of the discussion entries:

Howard Wainer, one of the most respected research scientists in the field, opened the discussion with these comments:

1. Tests are no more a ‘necessary evil’ than is school, or work or water. In each case, they fill an important role, but too much of it can be dangerous. One recent study divided students into two groups: one group had an hour to study for an exam, while the other group used that hour to take another exam on the same subject as the one they were preparing for. The second group did better on the exam, showing that the focus achieved by taking an exam was more effective at teaching the material than was self-study. This result has been replicated dozens of times.

2. Suppose we measured children’s heights at the beginning of the school year and again at the end, and then characterized the quality of the teacher by the average gain in height of their students. Obviously dopey, but why do we attribute causality for test performance? Students’ gains depend on the 9% of the time they spend in school but also on the 91% they don’t. Ever since the Coleman Report rigorously showed it, we have known that about 70% of student performance is determined by home factors. The use of “Value-added Models” (VAM) by states to assess teachers is an idea that only makes sense if we say it fast. It is the responsibility of those of us who know better to speak up. And those of us who do not, to remain silent. (LinkedIn Group Discussion: ETS, Educational Measurement, Psychometrics, and Research, December 24, 2014)

  • For a more detailed discussion of VAM at a non-technical level, see chapter 9 of Dr. Wainer’s book Uneducated Guesses: Using Evidence to Uncover Misguided Education Policies. Princeton, NJ: Princeton University Press, 2011.

  • For a more technical discussion, see Dr. Wainer’s chapter with Henry Braun: Value-Added Assessment. In Handbook of Statistics (Volume 27), Psychometrics (Eds. C. R. Rao and S. Sinharay). Amsterdam: Elsevier Science, 2007, pages 867-892.
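Wainer’s height thought experiment is easy to make concrete. The short Python sketch below is only an illustration, not any state’s actual value-added model: the teacher labels, the heights, and the naive “average gain” rule are all invented here, and real VAMs use far more elaborate statistical adjustments. What it shows is that a gain-based ranking will happily credit a teacher for growth the teacher had nothing to do with.

```python
# Hypothetical data, invented for illustration only:
# (teacher, height at start of year in cm, height at end of year in cm)
measurements = [
    ("Teacher A", 120.0, 125.0),
    ("Teacher A", 118.0, 123.5),
    ("Teacher B", 150.0, 151.0),
    ("Teacher B", 152.0, 153.5),
]


def average_gain_by_teacher(rows):
    """Naive 'value added': mean (end - start) per teacher.

    This deliberately ignores everything outside the classroom,
    which is exactly the flaw Wainer's analogy points to.
    """
    totals = {}  # teacher -> (sum of gains, number of students)
    for teacher, start, end in rows:
        gain_sum, count = totals.get(teacher, (0.0, 0))
        totals[teacher] = (gain_sum + (end - start), count + 1)
    return {teacher: s / n for teacher, (s, n) in totals.items()}


print(average_gain_by_teacher(measurements))
```

In this invented data, Teacher A’s students gain an average of 5.25 cm and Teacher B’s only 1.25 cm, so a gain-based ranking would favor Teacher A. Yet nothing in the numbers connects the difference to instruction; it could just as easily reflect the students’ ages or growth spurts, which is precisely the point of the analogy.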

Richard Phelps, testing scholar and economist, weighed in on no-stakes testing and its relationship to high-stakes assessment:

A “mainstream” idea in US psychometric circles has supported the use of VAM: that student effort on no-stakes tests is consistent and directly comparable across student populations, despite the evidence that student effort on “tests that do not count” varies by at least several background factors. A corollary, that no-stakes student test performance can be used to “audit” the allegedly more unreliable results of high-stakes tests, was promoted heavily in the draft version of the recently completed Standards for Educational and Psychological Testing. It is largely muted in the final version, but still there. These beliefs have advanced the careers of several highly visible scholars in the field, and my personal experience has convinced me that they are dogma that cannot be challenged short of ruining one’s own career. (LinkedIn Group Discussion: ETS, Educational Measurement, Psychometrics, and Research, December 29, 2014)

Jay Powell, founder of Better Schooling, critiqued the form and function of standardized tests:

…the problem with standardized tests. They are scored for the frequency of match with an arbitrary subset of responses set by the test designers, at least some of whom have never taught school. They are cognizant of the language of the textbooks, but they may not be cognizant of the language teachers use to explain these textbooks, and they also may not be cognizant of the language used by the students themselves, particularly students whose cultures differ from those assumed by the test developers.

It is necessary but not sufficient that decisions made from test scores come from the frequency of “right” answers.

The performance assessment must include every answer, but it does not. Hence the arbitrary scores being used actually destroy the integrity of the information contained in the test, invalidating the entire process. (LinkedIn Group Discussion: ETS, Educational Measurement, Psychometrics, and Research, December 28, 2014)

David Mott, on NWEA’s MAP:

The MAP (by NWEA) is not a standards-referenced (objectives-referenced) test. It was not designed to be that. It is basically a norm-referenced test in new clothes: a CAT (computerized adaptive test), given three times a year (or four, with a summer-school administration). It has other features that somewhat set it apart from more traditional NRTs. It does show growth, as do all NRTs, but it is a survey test. Any standards-referenced material presented is simply tacked on to it. It was never designed to give any data below the level of “Number and Operations”. No test can validly and reliably give standards-level feedback without at least 12 to 15 items per standard. My company, Tests for Higher Standards, publishes a “Grade Level Test” series that tries to survey the standards in one subject/grade (e.g., 4th grade Math) for a given state or the CCSS. With 45 to 80 items, we can usually “touch on” most of the standards (but not most of the substandards). Teachers can perhaps get a “hint” or a “whiff” of the mastery level of their class with 3 to 5 items. If they want more, or they want student-level actionable data, they will have to administer a test with at least 12 to 15 items per standard. (LinkedIn Group Discussion: ETS, Educational Measurement, Psychometrics, and Research, December 29, 2014)
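Mott’s rule of thumb about items per standard echoes a well-known psychometric relationship, the Spearman-Brown prophecy formula, which estimates how reliability changes when a score is built from k comparable items: ρ_k = kρ / (1 + (k − 1)ρ), where ρ is the reliability of a single item. The Python sketch below is an illustration under a loudly stated assumption, a hypothetical single-item reliability of 0.2 chosen only to show the shape of the curve; it is not how Mott or his company arrived at the 12-to-15 figure, and the formula itself assumes the items are parallel.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Spearman-Brown prophecy: estimated reliability of a score built
    from n_items parallel items, each with the given reliability."""
    r = single_item_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if n_items < 1:
        raise ValueError("need at least one item")
    return n_items * r / (1 + (n_items - 1) * r)


# ASSUMPTION: 0.2 is a hypothetical per-item reliability, for illustration only.
HYPOTHETICAL_ITEM_RELIABILITY = 0.2

for k in (3, 5, 12, 15):
    estimate = spearman_brown(HYPOTHETICAL_ITEM_RELIABILITY, k)
    print(f"{k:>2} items per standard -> estimated reliability {estimate:.2f}")
```

Under that assumption, 3 to 5 items give estimated reliabilities of roughly 0.43 to 0.56, while 12 to 15 items reach roughly 0.75 to 0.79. That pattern is consistent with Mott’s distinction between a class-level “whiff” and student-level actionable data, though the exact numbers depend entirely on the assumed per-item reliability.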

For the most part, respondents who know and value standardized assessment as an educational tool do not see it as a valid means of measuring student growth or teacher efficacy. Nor do I. Standardized tests are not evil; what is evil is misusing them as the basis for decisions that overreach the purpose of the assessment, a misuse further fueled by those who rationalize its validity. Standardized testing has become increasingly common in education since the 1950s; the pushback against its growing applications, however, is relatively new. That pushback may reflect the world’s growing awareness of, and dependence on, research, and with it a closer scrutiny of how research is used and misused, not only in education but in all realms of our world (e.g., medical, agricultural, environmental, and manufacturing).

In my work, I support teachers’ growth as they learn to construct their own assessments to measure students’ proficiency with the curriculum taught (ideally a written curriculum); I also help them understand and embed the Common Core Standards and next-generation assessment practices in their local assessments. The issue with this process is reliability, since local assessments will have limited field testing in terms of both time and numbers. At this time, local assessments cannot serve as predictors of standardized assessment outcomes; there are too many moving parts right now. However, with the changes facing the pre-service programs that prepare tomorrow’s teachers, that may come to pass. A number of states are developing policies requiring assessment writing and data analysis as part of pre-service teacher programs. If my professional rating were to be influenced by the analysis of student data acquired through assessment measures of student growth, I’d rather students be assessed using items that measure the content, skills, and cognitive processes I had taught than items that evolved through a normalized smoothing process over the last fifty years. That choice, though prudent, raises the bar for teacher performance, adding one more layer of skill, assessment writing, to the educator’s job description, but it puts teaching and learning squarely at the center of assessment.


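Mott, D. (2014, Dec 29). Editorial: Standardized Tests Necessary Evil [Comment 1]. Comment posted to the ETS Educational Measurement, Psychometrics and Research group on LinkedIn.

Phelps, R. (2014, Dec 29). Editorial: Standardized Tests Necessary Evil [Comment 1]. Comment posted to the ETS Educational Measurement, Psychometrics and Research group on LinkedIn.

Powell, J. (2014, Dec 28). Editorial: Standardized Tests Necessary Evil [Comment 1]. Comment posted to the ETS Educational Measurement, Psychometrics and Research group on LinkedIn.

Wainer, H. (2014, Dec 24). Editorial: Standardized Tests Necessary Evil [Comment 1]. Comment posted to the ETS Educational Measurement, Psychometrics and Research group on LinkedIn.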
