This happened more than a week ago, and I’ve just been too caught up in stuff going on here [insert sound of 40s being poured] to post it. So, a quick recap is necessary. A think tank, the Brookings Institution, produced a study of big-city school districts that showed DISD was 2nd overall nationally in terms of academic gains over the past seven years. First, read this Associated Press account, which lays it out in a straightforward fashion:
DALLAS – Achievement test scores at big-city school districts in Texas still lag far behind their suburban and rural counterparts but they’re making great strides and narrowing the gap, according to a report by an education think tank released Wednesday. A study of 37 of the nation’s largest urban school systems by The Brown Center on Education Policy at the Brookings Institution in Washington, D.C., found that city schools are improving more than other school districts in their respective states. … It was designed to determine how big-city school districts fared when compared to their suburban and rural peers. The study was able to standardize scores between states, even those using different tests. … Dallas showed the biggest improvement among the large Texas cities, and was 2nd overall nationally. … In 2000, Dallas was outscored by 100 percent of the state’s school districts. By 2007, just 90 percent of suburban and rural districts did better than Dallas – a significant improvement given its demographics, the study’s author said.
Now, this is good news, right? Not great, because the district, as noted, still has a long way to go. But it suggests things are getting better. Is that your takeaway? Well, I don’t think it would be if you just read how the DMN covered it.
After the study came out, Kent Fischer at the Morning News posted this item, dissecting it. His three main points are below (I’ve numbered them, but the bold emphasis is his):
1. First, and most importantly, the report is an indicator of student achievement, but stops short of confirming it. “To make scores comparable, we computed a z-score for each city, an indicator of the distance–expressed in standard deviation units–between the city district’s test score and its state’s average score. State averages are fixed at 0.00. Positive scores are above average and negative scores are below average. This kind of relative measure means all test scores could be falling in a particular state and city schools would look good by going down less.”
Translation: A district’s ranking is not determined by whether or not achievement actually improved. It’s possible scores went down, but if the state average went down more, the district will look much better.
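Fischer’s worry about relative measures can be sketched with a toy z-score calculation. The numbers below are invented purely for illustration (they are not DISD’s actual scores), but the arithmetic shows how a district’s raw score can fall while its z-score improves:

```python
# Toy illustration of the z-score measure the study describes.
# All numbers here are invented for demonstration purposes.

def z_score(district_mean, state_mean, state_sd):
    """Distance between a district's mean test score and the state
    mean, expressed in standard-deviation units (state average = 0)."""
    return (district_mean - state_mean) / state_sd

# 2000: district scores 480 against a state average of 520 (sd 40)
z_2000 = z_score(480, 520, 40)   # -1.0, a full sd below the state

# 2007: the district's raw score DROPS to 470, but the state average
# drops further, to 490 (sd held at 40 for simplicity)
z_2007 = z_score(470, 490, 40)   # -0.5, only half an sd below

# On this relative measure the district "improved" from -1.0 to -0.5
# even though its raw score went down -- Fischer's point in a nutshell.
print(z_2000, z_2007)
```

Of course, as Loveless notes below, this scenario is moot if the state average actually rose over the period, which is what he says happened in Texas.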
2. The report excludes 86 percent of district kids. “We combined data on fourth-grade reading and eighth-grade math into a single composite score for each school district. These are two crucial grades, respectively, for reading and math achievement. Although unlikely, perhaps academic achievement in other grades or subjects has behaved differently.”
Translation: DISD’s ranking is based on the scores of about 14 percent of its total student body — meaning 144,000 kids are not included in the study. Notably left out are high schools, which fail to graduate about half their students. Also left out: scores in Science, Social Studies and writing.
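One plausible reading of the composite the study describes (the report doesn’t spell out its exact weighting, so treat this as an assumption) is a simple average of the two grade-level z-scores, one for 4th-grade reading and one for 8th-grade math. Again, the values are invented for illustration:

```python
# Hypothetical sketch of the two-subject composite. The averaging
# scheme and the values are assumptions for illustration only.

reading_z = -0.6   # 4th-grade reading, relative to the state average
math_z = -0.2      # 8th-grade math, relative to the state average

# Combine the two grade-level measures into one district score
composite = (reading_z + math_z) / 2
print(composite)
```

The composite still lands below the state average here, but the point of contention is whether two grade/subject pairs can stand in for a whole district, which is where Loveless’ Dow Jones analogy comes in below.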
3. No clues given as to why. “Trends in performance are presented for 2000-2007. Big city schools have made significant improvement. Smaller urban districts have made similar gains. Unfortunately, the data cannot pinpoint reasons for the improvement, and any discussion of causes is automatically speculative.”
Translation: Although they rank districts, the researchers don’t know why, academically, one ranks ahead of another.
Now, I’m not very bright, but I know a thing or two about a thing or two. One of the things I know just enough about to be dangerous is statistical analysis. And the idea that the total number of students surveyed has anything to do with the quality of the report’s findings is absurd. The only question is whether the sample size is statistically valid. But why have me explain this in my inelegant fashion? I asked the study’s author, Tom Loveless, to respond to Fischer’s post. He said:
Yes, he’s making some poor arguments. I will address three.
1. Scores didn’t go down in Texas from 2000-2007. They went up. So the argument is moot, and the standard imposed by our study is more rigorous, not less, than simply reporting a gain. It looks at performance relative to a rising state average.
2. A composite score was constructed from 4th grade reading and 8th grade math scores. Such a composite is representative of a district’s academic performance. It doesn’t leave anyone out. By the writer’s logic, we shouldn’t believe the Dow Jones Industrial Average as a stock market indicator because it is only composed of 30 stocks and “leaves out” several thousand stocks.
3. We can’t tell why the scores are going up or down. That’s simply a limitation of the data because we don’t have good measures of the policies that states, districts, and schools were implementing from 2000-2007. It is an honest statement of the limitations of the research. The writer has taken exactly the wrong lesson from the study, and he gives exactly the wrong advice. Readers of research should distrust studies that make claims they cannot document with evidence, not studies that admit to their limitations.
To be fair, the story the DMN ran the next day was more balanced, I think, than Fischer’s post, in which he said he was simply putting the study in perspective (as though people who read his paper’s coverage are in danger of thinking DISD is TOO kickass these days). But one thing in the story gnawed at me. It was this quote from UCLA’s David Silver, a research expert who could comment on the study’s methodology. Silver’s points were presented in the News this way:
David Silver, a senior researcher at UCLA, said the study is “not air-tight.” But he said it’s likely the best Loveless could do without being able to follow individual students over time. He added that the study could do more to take demographic changes into account.
So I sent him a message asking him two questions: one, could he take an objective look at Loveless’ responses to Fischer’s points and tell me who is right; and two, could he expand on the “not air-tight” quote? I told him I was really trying to understand whether the study had validity. Here are his responses to the first two points:
1. I agree with Loveless here (assuming that his statement about scores rising in Texas is correct). His decision to compare District scores to the statewide average is reasonable and transparent in his discussion of the methodology.
2. Again, the choice of a composite score is a reasonable decision. It does leave out students, but there is no reason to assume that the sample is misleading when averaged over a large district or state. Would it have been preferable to incorporate all grade levels into the score? Probably, but he was undoubtedly limited by the availability of data.
To the overall “air-tightness” of the report, which sort of gets to No. 3, he says two things:
Essentially, Loveless found that the average student in large urban districts in 2007 was closer to the statewide average than he/she was in 2000. I’m fairly comfortable with his methodology to make that specific statement. The two main caveats are: 1) that the rankings of districts/states should be used only as a rough guide, since the characteristics of the tests may be very different across states; and 2) it is very likely that the demographics of many districts (e.g., Long Beach, San Francisco, Los Angeles, Las Vegas, New York) changed considerably relative to surrounding areas over these “real estate bubble” years, as large numbers of low- and moderate-income families moved to more affordable districts outside of the big cities.
He then sent along this follow-up:
One other point. The “air-tight” quote came from this statement that I made:
“I think Loveless makes an interesting case. As I said, the case is not air-tight, but given the limitations of the data (namely, that he couldn’t follow individual students over time), he probably did the best he could. Hopefully, we’ll have definitive confirmation of these findings in the near future, now that California (and Texas?), among others, have finally begun tracking students with longitudinal Statewide Student Identifiers.”
So a better phrase might have been that “his methodology for assessing change over time and ranking districts is not perfect, but given the limitations of the data . . .”
In the end, no harm, right? Maybe Fischer was just viewing this news in the harshest, most skeptical light possible, and that’s his job, right? Well, sure, maybe. I just wish a fuller picture of the district could be told, because I know there are also fascinating success stories out there, and when a small measure of success is noted, I wish he wouldn’t see it as his job to post a poorly reasoned takedown of said success. Then I wouldn’t get so tired and bent out of shape when I read a story that, while technically accurate, misses the point. It makes me believe even more strongly that the paper and the TV stations in town view DISD as a wounded antelope whose entrails are to be feasted on because, hey, lions gotta hunt.