On the weekend, I posted a thread on Twitter/X about Project Follow Through.
If you don’t have Twitter/X then this link should work for you.
For those of you who don’t know, Project Follow Through is the largest—and possibly messiest—education study ever conducted. It was originally intended to be a fully funded intervention for disadvantaged early years students in America, but when Congress scaled back funding, it was reimagined as a research project. Different programs, including Direct Instruction, were pitted against each other in a ‘horse-race’ design.
Christian Bokhove is a professor of mathematics education in the UK who likes to look down his nose at the way other people interpret the results of educational studies such as Project Follow Through. He has produced a diagram to explain what goes on in their silly heads:
I don’t think this diagram is quite right. I rarely see people say, ‘This robust trial shows this is the best thing since sliced bread.’ Anyone with even a passing interest in educational studies knows the many flaws and confounds that surround them. Worse still, the magic trick that is supposed to fix these flaws and confounds—meta-analysis—does not actually work.
This is why I am drawn to small-scale educational psychology studies. They still have much scope to go wrong, but the very features critics see as flaws, their short duration and slight artificiality, are what make them less susceptible to the issues facing larger studies. Therefore, when we come to the largest education study of all time, Project Follow Through, we should expect plenty to criticise and debate.
Bokhove has had a go at this himself. He managed to obtain a volume of the original Abt Associates analysis of the data from the project and tweeted about it at some length. I would never point to Bokhove’s own diagram and claim that, because he does not like Direct Instruction and the outcome was positive for that intervention, he has decided to highlight methodological flaws, even though he does go through the report and flag a large number of methodological issues. Such a claim on my part would be discourteous. I am not into mind reading and, unlike Bokhove, I take people’s contributions on these matters at face value.
I haven’t read the full Abt Associates report, but I have read a summary by Richard Anderson of Abt Associates. It is an interesting read from which, if we agree with the analysis, we can conclude:
Most intervention models struggled to improve outcomes for students in the Follow Through cohort. Even when compared against similar students, many Follow Through students performed worse.
There was high variability among the models. Even those that showed positive outcomes on average had negative outcomes at some sites. For example, Direct Instruction had a positive overall effect on Basic Skills, but at some sites the effect was negative.
Overall, Direct Instruction was the best performing intervention across a range of measures.
Although Direct Instruction was labelled a ‘basic skills’ intervention, it had an overall positive impact in areas other than just basic skills. Strikingly, Direct Instruction had an overall positive effect on ‘cognitive conceptual skills’ whereas the rival Cognitive Curriculum had an overall negative effect on these skills.
It is worth noting that Abt Associates are not the last word on this issue. Ernest House and colleagues wrote an influential critique casting doubt on the results. Then, Carl Bereiter, who was associated with the early development of Direct Instruction, wrote a critique of the House analysis that is a great read if you are a research methods nerd.
Does any of this prove, once and for all, the superiority of Direct Instruction? Does it demonstrate the intervention’s universal effectiveness even outside of the early years and the domains of English and Mathematics? Absolutely not. However, as the largest experiment of its kind ever run, I find it extraordinary that it has been so routinely ignored and that most teachers and even many education academics have never heard of it.
This ignorance requires an explanation. Without pointing the finger at individuals, there seems to have been a collective tacit agreement to bury this result. I can only put that down to the fact that Direct Instruction is at odds with the dominant ideology of educational progressivism.
I often say: largest educational study ever, and the team that won wasn't supposed to, at least that's what the other 21 models and their supporters/funders had hoped. Hence, the results were an inconvenience to many large, heavily funded interventions. They were a nuisance, and so hiding the entire project under the rug must have felt like the only solution to those folks. Had we listened to the results back in the 1960s, what would our world be like, especially for the many disadvantaged students who would otherwise have thrived under DI because it offered a way to build knowledge? I often get emotional presenting PFT to our teacher candidates. We are doing our part by not hiding it anymore.
Greg, thanks for bringing this topic forward. I had the good fortune to learn from the people who led the DI model (S. Engelmann, W. Becker, D. Carnine, and others) and the bad fortune of seeing the Follow Through demonstration diluted and discounted. As a researcher, I understand that there are legitimate questions one may raise about aspects of the evaluation (e.g., it was an evaluation, not a formal experiment). There are also important strengths in the evaluation that are rarely mentioned (e.g., the outcome data, such as achievement test scores, were collected by a third party who had no reason to favor the 1000s of children in one of the models over the 1000s in other models; likewise, Abt Associates, which analyzed the data, was an independent party with no connections to favor one group over another; yet another third party assessed implementation).
These (and other) strengths add support for your summary of the findings. And those outcomes were pretty dang clear. Even though the evaluation was set up so that there were measures that would tap outcomes aligned with subgroups of the models, the kids in the 40 DI model schools in 13 different geographical locations had better scores on the outcomes across the domains. They would be expected to be stronger on the "lower order" aspects of academic learning (decoding & computation), of course, but they also did better on the higher-order areas (comprehension and math problem solving) and social emotional outcomes (self-concept and attributions for success).
Now, I might quibble with one or two characterizations in your analysis, but they're mostly accurate. To be sure, there was variability in the outcomes of all the different models. For example, the scores from one of the local education agencies counted as a DI site were markedly lower than the others; as it happened, soon after the study began, a new leader for that LEA stopped the implementation of DI in those schools, but the data collectors and data analysts for Follow Through included the data from those schools as if DI had been implemented in them.
And, though your post about Follow Through wouldn't have to include it, I think it is valuable to note that even if some people would dismiss the FT results, negating FT does not remove the scores of other studies examining the effects of DI. As summarized (and meta-analyzed) by Jean Stockard and her colleagues (https://doi.org/10.3102/0034654317751919), those studies provide substantial indications that DI helps students learn rapidly, thoroughly, deeply, and happily.