Do Flat STAAR Tests Mean DISD Was Better Under Hinojosa? Of Course Not.

I got sidetracked on school finance, but I want to come back to something I promised last week: looking at how the newspaper’s editorial writers fail to put school testing results in context. Since I just asked you to slog through 2K words on bridge plans and general operating funds, I will try to keep this as short and sweet as possible.

Recall I had problems with the way the DMN characterized the incomplete STAAR results. If you haven’t read Jeffrey Weiss’s excellent responses in the comments, please do so. He says that even if the paper should have been clearer about not including Spanish language STAAR results in its headline/analysis, the results when they’re included still are mixed-to-poor. I don’t disagree with him – I say in fact that I’m not defending DISD’s STAAR performance, I’m criticizing how incomplete data sets are presented to the public.

In any case, that DMN story begat three editorial board posts:

One by Tod Robberson, which we can ignore for obvious reasons.

One by James Mitchell, which was on the whole very good. He basically says that even if you give the district the benefit of the doubt on the Spanish language tests, the other testing data is not good enough. He’s right.

The one problem I have is a throwaway graph at the end of his post, one that presages the next post we’re going to discuss:

These numbers are a huge disappointment since former superintendent Mike Hinojosa had academic performance trending upward, and Miles promised improvement on steroids.

It suggests that DISD was making state testing gains before Miles arrived, with a student population that was almost as poor as this one. Which is accurate but not really true.

The third post tries to make this point more concretely. In it, Rodger Jones makes the case that, looking back, state testing gains under Hinojosa look fantastic compared to test results today. Here are the three money graphs:

It’s ironic that Hinojosa ran for the door in 2011 having made strong performance gains in Dallas that seem unattainable today.
In the fifth round of statewide test scores under Hinojosa, there was a 12-point gain in the districtwide passing percentage on reading tests, a 9-point gain in writing tests and a 14-point gain in math tests. There was a 12-point gain among black students and 16-point gain among Latinos.
These were dramatic gains and took place under the old TAKS system. Statewide scores rose sharply over those years, but DISD’s gains eclipsed them.

What does this tell us?

That there were statewide gains during this time by students taking the TAKS tests — a test that was shortly thereafter phased out by the state in favor of STAAR. This tells us that all of Texas was getting smarter, and DISD’s kids were getting even smarterer. Before. But not now.

So, what the hell happened?

One thing that happened was that educators realized something was wrong with a test that continually showed year-over-year gains when other objective tests showed Texas (and U.S.) kids as stagnating.

First and foremost, look at NAEP scores for Texas from 2007 to 2011 and compare them to TAKS in Texas from 2007 to 2011. (You can get more background on the National Assessment of Educational Progress here, but basically it’s the federal government’s way to test what our kids know throughout the country, since each state has its own test.) Here are links to just 8th grade reading, and here is 8th grade math.

Looking at all the NAEP data, you’ll see that Texas has been basically stagnant throughout the history of NAEP, and certainly from 2007 to 2011. But Texas’s overall passing rate on the TAKS jumped from 67 percent to 76 percent in that same time period, a very noticeable gain that mirrored the DISD gains Jones has highlighted.

And we can look beyond NAEP. If we look at changes in the SAT/ACT for the state over the same period of all these TAKS gains, we similarly see nothing. Nada. Flat. TAKS is the only thing (noticeably) going up. Nothing else.

But how can that be, if TAKS scores were climbing? Because TAKS, the state realized, was measuring neither what kids knew nor whether they could think, but how well they had been trained to take the TAKS test. In edu-speak, TAKS wasn’t “rigorous.” That’s why the state switched to STAAR, a more-rigorous test in many ways. (Too rigorous, many critics contend.)

The funny thing is, the state got exactly what it wanted – a tougher test that would better give educators a sense of where kids really stood in relation to the rest of the nation and the world. But that has caused everybody to freak out. Because it turns out that our state isn’t making any noticeable educational progress. It hasn’t for years (see the NAEP or SAT data, which goes back far longer than TAKS). But our state has, up until about three years ago with the advent of STAAR, hidden that fact with easy tests that have shown consistent gains (statewide, and in DISD). This is partly because of the Faustian bargain everyone made with TAKS (and its predecessors): educators, media, parents, all the stakeholders in the game said, “Look, we really just want to report year-over-year improvement; it makes everyone feel better.” In fact, a TEA official was complaining just last week about this, wondering just WHY these damn STAAR test results are flat.

Well, maybe – just maybe – STAAR, despite its shortcomings, does a better job telling us what kids actually know. And maybe significantly improving that base of knowledge, especially in reading and writing, and especially with poor kids, is way harder than we like to believe. There’s nothing inherently wrong with test scores that don’t change much. No one freaks out when ACT or SAT scores show the same basic scores every year. But if STAAR results don’t reflect the same (at least somewhat fake) upward progression that we saw under Hinojosa … well now we’ve got ourselves a scandal. And unlike all the scandals of scandalgate, this one is real. We’re not getting any better at educating our kids. Not in DISD. Not in Texas. Not really in the U.S. At least, not in any way that is noticeable across the entire population of kids (whether that’s the 160K in Dallas, the 5M in Texas, or the 70M-plus in the U.S.).

It doesn’t appear that Miles’ reform efforts have yet netted any noticeable gain on STAAR passing rates. But no other districts in Texas have shown gains, either, no matter what approach their superintendents have taken. If we were still under the easier testing regime of TAKS, that might not be the case. But it does mean that we’re going to have to get pretty sophisticated to determine educational improvements or declines – a sophistication that these facile arguments do not demonstrate.

Just because I’m discounting the rigor of TAKS doesn’t mean that Hinojosa didn’t do a good job. There is data we can look at that paints an effectively nuanced, peer-comparison performance picture for schools and districts in Texas, regardless of the testing regime. The ERG data, which gives us a much better picture of true district-wide performance (because it factors in poverty, and uses several data points, and compares this to every other district in the state), shows that DISD’s improvements began in about 2007 and have continued up through last year. (It just shows a significant acceleration in improvement since about 2012.)

No, what I’m saying is that the REAL story is the state has no idea how to drive real performance improvements in our schools. The testing regime we’ve been under for a couple of decades doesn’t appear to be doing it. I know that we (media, officials, parents, you name it) put a lot of emphasis on these tests, but judging from the Jones editorial and many other comments, it doesn’t appear anybody is really considering the important context. I mean, no one cares that NAEP scores don’t show improvement, because those results aren’t tied to statewide accountability ratings. But STAAR now shows the same thing that NAEP has been showing for almost two decades, and everyone freaks out. And many freak out in a way that seems to demand that the test go away, which certainly seems to fall under the “shoot the messenger” category of responses.

These STAAR results really do raise interesting questions. For example, what do these results say about teaching’s effect on these scores? What if STAAR outcomes don’t reflect teaching effects (in other words, average and great teachers each would get the same STAAR results from a random class)? What if it only shows inherent student aptitude? Or perhaps teaching excellence could better be reflected in results, but the teaching changes required to improve STAAR outcomes are more significant than any previous teaching regime had to endure? I think these are interesting questions that we can begin to answer as we look closely at the STAAR data.

But the one thing I know the data don’t show is how much better the district was under Hinojosa. Apples and oranges.

Now, is Jones’s point valid that Miles (and Hinojosa) baited their own mousetraps because they made huge promises to the district that they couldn’t keep? Of course. Miles made promises of gains in a test that had no precedent, and he undoubtedly didn’t realize how much more difficult STAAR would be to move than the Colorado tests. Hinojosa made other promises Jones outlines that were foolish. But that doesn’t mean you need to fall into the trap of overly praising or condemning either of these men based on scores without appropriate context.

Now, I have some more Shipp stuff to work on, but I’ll get back to my promised part 3 soon: Should DISD be more like Aldine ISD?