I grew up in a big family—6 kids—and when we were little, we had a ritual that you may recognize. First we’d take off our shoes and stand—as tall as we could—up against the wall by the kitchen door. Then Mom or Pop would mark our height and write our name and the date next to the mark. We’d do this every six months or so, and that let us see if we were getting taller. I’m sure lots of families still do that.
Public education has embraced that concept. Naturally, educators have given it a fancy name, ‘the growth model.’ In education it means testing a student at the beginning of the year and then again at the end, to see how much the student has learned.
This education ‘growth model’ is now the latest ‘best idea ever.’ Many liberals, who think the federal law known as No Child Left Behind placed too much emphasis on a single end-of-the-year test, approve of the ‘before and after’ approach. Some conservatives see the growth model as a way of finally being able to measure teacher effectiveness.
The family growth model works especially well if families don’t move to a new house. But, even if a family moves from New York to Oregon, the measure—that yardstick—remains the same. Parents can copy the numbers, put them on the wall of the new kitchen, and keep on taking measurements.
However, adapting the growth model for schools is problematic in two important ways. In many urban schools the student turnover rate often exceeds 50 percent. By late spring more than half of the kids who started at one school in September are now going to other schools. A niece of mine who taught elementary school in Orlando told me that her class changed 40 percent by mid-January. If she taught half of her kids only half of the year, who’s accountable for learning? Is she, or the other teachers? Or, to put it in terms the conservatives can easily understand, which teacher are you going to punish?
So it doesn’t work as a reliable way of holding teachers accountable. What about for kids? There’s a problem here as well. The yardstick that parents use is the same wherever they are: 36 inches to a yard. Education is different. While tests can be given at the beginning and the end of the year no matter how many schools a kid attends, if the test results are to have any genuine meaning, the curriculums must fit, and the tests must be assessing what’s been taught. That’s rarely true.
In other words, given the high rate of mobility, to have a valid growth model in education we need a common yardstick and a generally agreed upon curriculum. That means national standards, a direction we are now moving in. Nearly all the states have endorsed national standards, but let’s not rush to set standards without a vigorous debate about what belongs in the curriculum. Because an inevitable corollary of national standards is common measures—and probably common tests as well, let’s figure out what sort of performance measures make sense, before we—educationally speaking–put our children up against the wall.
There are at least three more serious problems with the so-called “growth” model beyond those identified in this blog.
1. It does not solve the problem you name – “too much emphasis on a single test.” Giving such a test twice a year just doubles the problem. The single most commonly raised criticism of NCLB in surveys such as Gallup-Phi Delta Kappa is the narrowing of curriculum coupled with teaching to the test. This may increase that joint problem without solving the problem of the narrow, limited tests that fail to adequately assess whether a student has done more than memorize. Having one national test won’t solve the core testing problem either.
2. With teaching to the test comes score inflation. In her blog of June 9 on Education Week (www.edweek.org), Diane Ravitch points to recent implausible score gains in New York state and thence to a linked article from June 7 in the NY Post in which researchers point to teaching to the test as the source of the score inflation that is rampant in NY (and in many if not most states). That is, it is as though every year the inch mark on your ruler shrank, so every year you would appear taller than you really are, and would appear to have grown more than you really have. Growth models don’t solve this, but simply perpetuate this problem of public disinformation. Nor will a national test solve this problem. Though testmakers might make each year’s test somewhat more different from last year’s than is now the case in NY, to have tests with no meaningful connection to what is tested the year before will only produce other forms of meaningless data with which to beat up schools, students, parents. It is high stakes attached to the tests that drives this problem. NCLB, in this regard, could be made worse by proposals for “pay for test scores” coming from Secretary Duncan and others.
3. Technical details. In many ways the technology of “growth” is not ready for prime time. The consequence of the technical limitations is that we won’t know what causes growth or how much of the apparent growth is real (in addition to the teach-to-the-test factor). Further, outside of a few technical experts, no one really even understands the complex statistical work that produces the apparently exact numbers that can weave their magical spell over the public and policymakers alike.
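To make the shrinking-ruler analogy in point 2 concrete, here is a minimal sketch. Every number in it (the heights, the 5 percent yearly shrinkage) is a made-up assumption chosen only for illustration, not data from New York or anywhere else.

```python
def apparent_heights(true_heights, shrink_per_year):
    """Read each year's true height off a ruler whose 'inch' shrinks every year."""
    readings = []
    inch = 1.0  # length of one ruler "inch", in true inches
    for height in true_heights:
        readings.append(height / inch)   # a shorter inch gives a bigger reading
        inch *= (1.0 - shrink_per_year)  # the ruler shrinks before next year's mark
    return readings


# Hypothetical child: grows 2 true inches a year for two years.
true_heights = [48.0, 50.0, 52.0]
readings = apparent_heights(true_heights, shrink_per_year=0.05)

true_growth = true_heights[-1] - true_heights[0]  # 4.0 true inches
apparent_growth = readings[-1] - readings[0]      # what the shrinking ruler reports

print("true growth:", true_growth)
print("apparent growth:", round(apparent_growth, 2))
```

In this sketch the apparent growth always exceeds the true growth, and measuring twice a year with the same shrinking ruler would not change that.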
John, since you end with support for a national test, I think you need to be able to explain how a national test will not merely displace to one test the greater problems, outlined above, that the nation faces with the testing and accountability laws and policies now in force across the US.
Also, John, our height is “out of our hands”–more or less. But what and how we learn isn’t–and differences of opinion on both these issues can’t be smoothed over by getting a common ruler (in either sense of that word). We need both a deeper discussion of what it means to be “well-educated” and respect for the very strong possibility that our disagreements on this matter are not solvable by majority vote.
Let’s keep the metaphors going, though, because they help us think this through. I think about how, if I had to, I’d “rank” my three children. I rank them all #1, but if required (as standardized tests do) to differentiate, I couldn’t find a common ruler that would work. Any “test” I administered would be predisposed to reward one and penalize the other. There’s just no getting around the fact that I consider all three well-educated, with differences…
Best, Deb
A big plus is looking at change rather than absolute levels.
What would work best would be student-developed portfolios and occasional oral examinations.
If we ever decide learning is really important, maybe we’ll actually seek solutions such as this – rather than always shooting everything down except for poorly considered fads and buzz words.
Three strong responses from people I respect and probably agree with more than I disagree. I am not pushing for a national test, and I am worried about a rush to national standards. We need to debate what it means to be educated, and then worry about measuring.
But students are going to be tested and measured, and they should be. That’s unavoidable. I happen to believe, with Don Hirsch, in a common core of knowledge and skills and values. Not trivial pursuit stuff, of course, but the core of a culture.
You say that the growth model works especially well if the family doesn’t move to a new house. This is because every time you have to relocate the yardstick or other measuring device there’s a greater chance of measurement error.
This is also true every time a different edition of a standardized test is given than the one given the year before. As much as the effort is made to make the measurements equivalent, there is measurement error.
Of course, the exact same test could be given at the start of the year and at the end, but as Mr. Neill says, this will only lead to more teaching to the test.
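One way to see why a second measurement matters: a growth score is the difference of two test scores, so it carries the error of both. The sketch below uses the standard rule for combining two errors, under the assumption (mine, not the commenter's) that the two errors are independent. The 3-point error figure is hypothetical.

```python
import math


def gain_error_sd(pre_error_sd, post_error_sd):
    """Error (standard deviation) of post minus pre, assuming independent errors."""
    return math.sqrt(pre_error_sd ** 2 + post_error_sd ** 2)


# Hypothetical: each test edition is off by about 3 points either way.
single_test_error = 3.0
growth_error = gain_error_sd(single_test_error, single_test_error)

print("error on one test:", single_test_error)
print("error on the growth score:", round(growth_error, 2))
```

Under that assumption the growth number is always noisier than either test on its own, even before any difference between editions is taken into account.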
You say we need to debate what it means to be educated and I agree, but much of the country, including its most recent and present leaders, believes that debate has been settled by defining being educated as getting a good score on the test. We can’t have the debate about what being educated really means until we can convince our representatives in the federal and state governments that the question is still open.
I’m glad you point out the effect of mobility on measuring student performance using existing measures. Ironically, student mobility is especially problematic for the high-needs schools, which in turn contributes to their chronic poor performance.
I agree we need much more thoughtful discussion of national standards, particularly given the responses to the idea of national standards when educators tried to broach the issue within our content areas. It was often contentious; sometimes productive; but in the larger policy arena, often disregarded.
I would hope that, concurrent with those discussions, we would address the very real need to develop more comprehensive and rigorous methods of evaluating student learning, and how to use those methods efficiently at state or national levels. This is a major research and development task that would require major funding support.
I loved the opening of this piece. My Mom’s parents have this big wonderful old house upstate and they devoted a wall right in the middle of the living room to this kind of measurement. I love going there and poring over all the pencil marks and growth spurts of the children and grandchildren (and even my little self). Needless to say, you’ve made me quite nostalgic, and I’m calling my grandfather as soon as I get home!
John,
You write in your blog post: “…an inevitable corollary of national standards is common measures—and probably common tests.” Then in your response to the points made by Meier and Neill you say you are “not pushing for a national test.” You could have fooled me. The whole emphasis of your critique of No Child Left Behind was that there was not a common set of standards and performance measurements in place across the country, and that seems to be the thrust of this blog post as well. The implied solution is indeed a national set of tests.
In California, we have had statewide standards of the “tough” sort for at least a decade. Has this common yardstick yielded the kind of results you seem to think will come when curriculum and standards are aligned across the nation? From where I sit in Oakland, I see continued inequity — often intensified by the drive to produce high test scores in reading and math. I think the push to create national standards, and tests aligned to them, is an exercise in distraction. It takes our mind off the fact that the test-driven approach embedded in NCLB as the core of reform has failed to yield the promised results. We do not need more standards, or common standards. We need attention to the underlying conditions that affect student performance. That means paying attention to the physical and mental health of our children. It means giving our teachers adequate pay to sustain them in this career (rather than the revolving doors we have installed in many urban districts), and giving teachers time and creative space to collaborate and learn together as professionals.
I don’t disagree with Anthony’s points about what we need, but I fail to see his list and my point about higher standards as being mutually exclusive. I feel we need lots of dialogue–at every level–about expectations, standards and resources. Where are we going? If we don’t address that (and we don’t have to have one destination, of course), then we will end up micromanaging the details of the process. “Giving teachers time and space to collaborate” is all well and good, but to what ends? What are our expectations? Our goals for public education beyond teacher collaboration?
Dear Mr. Merrow and Fellow Commenters,
I think, for the most part, you’re all on the right track. But I’d like to add some ideas here that come from my practical experience with testing in schools – and the data we already have on the efficacy of testing as a result of NCLB.
In short, the arguments against growth model testing are much worse than I think we are aware. Here are just a few of the reasons why we should oppose growth model testing:
1. TWICE AS MUCH TESTING. First of all, growth model testing would double the number of tests kids take. Kids are already over-tested and over-prepped for tests. Is this really the direction in which we want to continue to move? I don’t think so. Already, in many states, as many as 10 to 20 days can be lost to testing. Why would doubling this amount be better when current research suggests that it is not?
2. MORE TEACHING TO THE TEST. We already have a national epidemic of teaching to the test. Growth model testing will only increase this. What teacher, being measured under the growth model, wouldn’t, having seen the pre-test, teach directly to it in preparation for the post-test?
3. DIFFERENT POST-TEST OR SAME POST-TEST. Do we use the same post-test or a different one? If we use the same post-test, we measure growth more accurately except for the “test familiarity” factor which, in these tests, can be significant. If we use a different post-test, how do we know the correlation between the two tests is accurate? We certainly can’t trust states to handle this as they are well known to have goosed their scores in the past with a variety of test-development and statistical scoring strategies to make themselves look better than they really are.
4. LACK OF TIMELY RESULTS. Turnaround times from test to results are notoriously slow right now. If even a few states went to growth model testing, the testing services would be overwhelmed and therefore unable to produce timely results. Right now, the fastest some schools get their scores back is six weeks. What if that goes to 12 weeks? That’s a third of the year. Not much “formative” value in that, is there?
5. “SHALLOWER” TESTS. When we give more tests, we tend to give worse tests. We will test fewer subjects in the growth model approach. We will also use shorter tests with fewer questions. And we will rely more and more on tests that are entirely multiple choice. All of these things mean that the information we get from these tests will be of less value.
6. HIGHER TEST COSTS. Many states curtail tests in certain subjects now based on budget problems. Doubling the number of tests given each year will only encourage this practice. We will test fewer subjects and, as we have seen already, the curriculum will narrow even further.
7. LACK OF NECESSARY SKILLS DEVELOPMENT TIME. Let’s say our pre-test testing window stretches from week two to week four and that our post-test testing window stretches from week 32 to week 34. This means, on average, that kids will have just about 30 weeks to make progress. Many of the most important developmental indicators, particularly in literacy and math, do not show up well on that kind of time scale. This turns the school year into an artificially shortened sprint which shortchanges those kids who might indeed develop the requisite skills, just at a slightly slower pace.
8. IT’S EASY TO TANK THE PRE-TEST. Any teacher who has ever been in a pre- post-testing situation knows how to look good: tank the pre-test. There are two easy ways to do this: (1) Give the test very early in the year, preferably during the first few days of class. And (2) Give poor instructions. Fascinating research has been done on how test instructions alone affect results. Giving the test as soon as kids come back from vacation combined with an unenthusiastic recounting of test directions (and a concomitant lack of encouragement) makes for artificially depressed performance. Turn that around at the end of the year and you can make a good spread whether your kids have learned anything or not.
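The arithmetic behind points 4 and 7 can be checked in a few lines. This is only a sketch: the 36-week school year is an assumption inferred from the statement that 12 weeks is a third of the year, and the window midpoints stand in for the "on average" in point 7.

```python
def window_midpoint(start_week, end_week):
    """Middle of a testing window, used as the average test date."""
    return (start_week + end_week) / 2


# Point 7: pre-test in weeks 2 to 4, post-test in weeks 32 to 34.
pre_test_week = window_midpoint(2, 4)     # week 3
post_test_week = window_midpoint(32, 34)  # week 33
weeks_between_tests = post_test_week - pre_test_week

# Point 4: a 12-week turnaround as a share of the year.
SCHOOL_YEAR_WEEKS = 36  # assumption, implied by "a third of the year"
turnaround_share = 12 / SCHOOL_YEAR_WEEKS

print("weeks between tests:", weeks_between_tests)
print("turnaround as share of year:", round(turnaround_share, 2))
```

With those assumptions, the gap between the two tests comes to 30 weeks, and a 12-week wait for results uses up a third of the year.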
There are only three reasons why growth model testing is being suggested:
1. TESTING COMPANIES BENEFIT. It’s quite obvious where the money goes when it comes to testing. I don’t think you have to be an educational researcher to figure out why some corporations would support growth model testing.
2. POLITICIANS BENEFIT. To politicians, growth model testing has great appeal because it makes them appear to be more concerned about rigor and measurement. This is, of course, false. But politicians also know that the average voter won’t ever figure that out.
3. POLICY WONKS BENEFIT. Twice as many tests give us twice as much data. And that means more grants for more studies on data-driven education. Even if the studies show growth model testing to be the boondoggle that it is, policy wonks and their research teams still benefit.
So who are the big losers in growth model testing?
1. TAXPAYERS. They pay more for testing while less is actually spent to educate kids.
2. TEACHERS. They get even more incentive to teach poorly and to view teaching as test prep rather than as the facilitation of learning.
3. KIDS AND FAMILIES. The biggest losers will be children and their families. Since the tests will be poor, and poorly administered, and the data slow to return, the tests will be useless in helping teachers guide appropriate instruction. Kids will view school not as a learning opportunity but simply as the part of their life when they are tested, judged, and found wanting. Parents will be further confused about why their children are being tracked into certain programs simply by scores on tests that were given out under suspect conditions.
What’s the solution?
Growth model testing is an inherently bad idea. It is being suggested for political reasons, not educational reasons. A better, and simpler, testing solution would be the following:
1. FEDERALIZE TESTING. Leaving states in charge of their own testing is like leaving the fox in charge of the hen house. This is an indisputable and well-documented fact. Testing must be federalized to keep state legislators and state departments of education from continuing to use testing as an educational football they can punt around whenever they need votes.
2. USE THE NAEP AT GRADES 4 AND 8. The NAEP is currently the gold standard in U.S. testing. Why not use it more broadly? Granted, it would take some changes to do this. But we’ve already decided that testing has to change anyway.
3. USE THE SAT AS A HIGH SCHOOL GRADUATION TEST. The SAT may not be everyone’s favorite test but it has stood the test of time. Statistically, it seems valid. It is difficult to teach to – as evidenced by the meager improvement results kids can get from so-called “prep” courses. And we all understand the scoring. Graduating, or passing, scores could be indexed to the minimum scores needed to apply for four-year colleges in a given state. This would ensure that kids who passed would at least meet one of the requirements of going on to college. Kids would be offered the test initially, just as they are now, at the end of their junior year in high school, with optional re-takes (just as we do now), during senior year.
Figuring out the right thing to do about testing isn’t hard – if you keep the needs of children first and foremost in your mind. It’s clear that we need some form of testing. What we don’t need is any more bad testing. Testing three times during a child’s school career is plenty – especially when school districts are free to insert their own formative assessments at any point along the way.
John Merrow says:
I don’t disagree with Anthony’s points about what we need, but I fail to see his list and my point about higher standards as being mutually exclusive. I feel we need lots of dialogue–at every level–about expectations, standards and resources. Where are we going? If we don’t address that (and we don’t have to have one destination, of course), then we will end up micromanaging the details of the process. “Giving teachers time and space to collaborate” is all well and good, but to what ends? What are our expectations? Our goals for public education beyond teacher collaboration?
John,
I agree that a dialogue about what we actually value would be valuable. I think I am troubled by the idea that this dialogue should be a national one, and the conclusions should become the prescription for every school in the nation. As I wrote in my blog recently, I think our communities and the needs within them are quite varied.
I also worry about the functioning of our democracy at this time in our history, when millionaires have ways of exerting far more influence than the people most affected by these decisions. A national dialogue seems especially vulnerable to this kind of domination.
John,
OK, now the National Governors Association has revealed the sixty people who will be charged with writing the National Standards over the next six months. The process will be “confidential.” The participants include exactly ONE teacher.
This does not seem to represent much of a dialogue. What do you think?