In 2022, Lin Zhang and a group of like-minded researchers published a paper in Educational Psychology Review arguing that the inquiry orientation of most research on science teaching in the U.S. is misplaced. This prompted a high-profile rebuttal by Ton de Jong and colleagues in Educational Research Review (let’s call this de Jong I). I blogged about this and was asked to coauthor a paper responding to this rebuttal, Sweller et al. (2024), which Educational Research Review graciously published. Now, a new rebuttal of this response, written by de Jong and colleagues, has been solicited by and published in Educational Research Review (let’s call this de Jong II).
As promised, this is an inexhaustive response to de Jong II.
"There are none so blind as those who will not see." — Proverb
Meta-analysis
One of the ongoing disagreements between the Zhang camp and the de Jong camp has been about the value of meta-analyses in providing evidence about instructional methods. I am a meta-analysis sceptic. I don’t think meta-analysis is necessarily wrong in principle, but it often fails in practice. In our response to de Jong I, we pointed out why relying on meta-analyses was flawed. Mushing together a variety of studies comparing different conditions and measuring different outcomes to produce an overall effect size seems misguided, at best. I have written about this at greater length here but de Jong II provides yet another lesson in the dangers.
Towards the bottom of page four, they write:
“Furtak et al. (2012) showed in an analysis of 37 experimental and quasi-experimental studies that there was an overall positive effect of inquiry-based methods over more direct-instruction-based methods”
However, when we examine the abstract of Furtak et al. (2012), something curious arises:
“This meta-analysis introduces a framework for inquiry-based teaching that distinguishes between cognitive features of the activity and degree of guidance given to students. This framework is used to code 37 experimental and quasi-experimental studies published between 1996 and 2006, a decade during which inquiry was the main focus of science education reform. The overall mean effect size is .50. Studies that contrasted epistemic activities or the combination of procedural, epistemic, and social activities had the highest mean effect sizes. Furthermore, studies involving teacher-led activities had mean effect sizes about .40 larger than those with student-led conditions.” [my emphasis]
At first, this appears to be mystifying. This study seems to be providing evidence that is directly opposed to the assertion it is being cited to support. How can de Jong II claim there are overall positive effects of inquiry-based methods over direct-instruction-based methods when the study shows teacher-led activities had higher effect sizes?
The devil, as so often with meta-analyses, is in the comparison conditions. Most people don’t drill down into the comparison conditions used in the studies that make up a meta-analysis. It can be tricky and time-consuming. Instead, they tend to assume the comparison conditions, often a form of business as usual, involve direct instruction — I prefer the term ‘explicit teaching’. So what I think de Jong et al. mean when they cite this meta-analysis is that all the studies it referred to compared inquiry learning with explicit teaching and found inquiry learning to be better. Yes, the more guided the inquiry learning, the better it performed, but through their funny glasses, that just provides evidence for that celebrated legendary beast, guided inquiry learning.
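To make the point concrete, here is a minimal sketch, using entirely made-up numbers rather than Furtak et al.’s data, of how a single pooled effect size can hide what was actually compared. It pools hypothetical study-level effect sizes with inverse-variance weights and then breaks them out by comparison condition; the study labels, effect sizes and variances are all invented for illustration.

```python
# Minimal sketch with invented numbers (not Furtak et al.'s data): a fixed-effect,
# inverse-variance pooled effect size can look cleanly positive overall even though
# the individual studies compare the intervention to very different baselines.

studies = [
    # (label, comparison condition, effect size d, variance of d) -- all hypothetical
    ("Study A", "business as usual",   0.90, 0.04),
    ("Study B", "business as usual",   0.70, 0.05),
    ("Study C", "student-led inquiry", 0.45, 0.06),
    ("Study D", "reading a textbook",  0.30, 0.05),
]

def pooled_effect(rows):
    """Fixed-effect (inverse-variance weighted) mean effect size."""
    weights = [1.0 / var for _, _, _, var in rows]
    return sum(w * d for w, (_, _, d, _) in zip(weights, rows)) / sum(weights)

print(f"Overall pooled d = {pooled_effect(studies):.2f}")

# Break the pooled estimate out by comparison condition: the single 'overall' number
# says nothing about inquiry versus explicit teaching unless every comparison
# condition really was explicit teaching.
for condition in {c for _, c, _, _ in studies}:
    subset = [row for row in studies if row[1] == condition]
    print(f"  vs {condition}: pooled d = {pooled_effect(subset):.2f} (k = {len(subset)})")
```

The point is simply that, without drilling down, the single pooled number cannot tell you whether inquiry beat explicit teaching or something else entirely.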
So, I did what I always do and what nobody else seems to bother to do when it comes to meta-analyses: I took a look at the individual studies it draws on. I know from experience that they can cover a whole range of different things, from students completing worksheets to video lectures.
The very first study in Furtak et al.’s list is by Alexander et al. and somehow Furtak et al. extract from it the absurdly large effect size of 1.737 and the only slightly less absurdly large effect size of 0.956. This is its abstract:
“A design experiment was undertaken to explore the effects of science lessons, framed as persuasion, on students’ knowledge, beliefs, and interest. Sixth and seventh graders participated in lessons about Galileo and his discoveries focusing on the personal costs and public controversies surrounding those discoveries. In selected classrooms, lessons were teacher led, while others were student led. Participants’ knowledge, beliefs, and interest were compared to peers in other science classes. There were significant differences between persuasion and comparison classrooms on all variables. However, teacher-led lessons were more effective at changing students’ knowledge, whereas student-led lessons had more impact on students’ beliefs.”
This may be many things, but it is not evidence for the superiority of inquiry learning over explicit teaching.
The second paper in Furtak et al.’s meta-analysis is by Ardac and Akaygun (2004 - no paywall) and compares a multimedia learning condition against ‘regular instruction’. It finds that:
“Students who received multimedia-based instruction that emphasized the molecular state of chemicals outperformed students from the regular instruction group in terms of the resulting test scores and the ease with which they could represent matter at the molecular level. However, results relating to the long-term effects suggest that the effectiveness of a multimedia-based environment can be improved if instruction includes additional prompting that requires students to attend to the correspondence between different representations of the same phenomena.”
This does not appear to have much to do with inquiry learning. Perhaps the multimedia condition used inquiry learning? If you click through to the paper, you can see an example multimedia slide, which looks explicit. Reading the description, it is clear that both the mode of instruction and the concepts addressed varied between the multimedia and regular conditions:
“Regular instruction was based on lecture and questioning. The instructional plan used during regular instruction was similar to the plan used for the treatment group with the same sequence, homework, and lab assignments. However, there was no emphasis on molecular representations and no deliberate attempt to establish connections between the macroscopic, symbolic, and the molecular levels.”
Given that two key things vary between the two conditions, we cannot be sure which was more important. However, as neither had anything to do with inquiry learning, this paper once again does not support de Jong et al.’s characterisation.
I could go on examining these papers, but I’ve made my point. How can anyone mount an argument based on mushing together such studies?
Is this misrepresentation deliberate? I doubt it. The danger of meta-analyses is that the original conditions become so obscured that there is a high risk of misinterpreting them. I would therefore argue with confidence that we need to go back to the original papers.
Hawthorne effect
One of the reasons we argued in Sweller et al. (2024) that meta-analysis is misguided is that mushing studies together does not remove the flaws of the individual studies. Many studies that test inquiry learning run a novel intervention, supported by researchers, teacher training and so on, against a form of business as usual — Ardac and Akaygun (2004) is a good example of that and of the kind of study Zhang et al. refer to as ‘program based.’ The potential flaw is that everyone, including the participants, knows which condition is the novel one and this can affect their expectations, leading to a placebo-like effect that, in the social sciences, is known as the Hawthorne effect.
De Jong II complain that the reference we give in Sweller et al. (2024) is missing from the reference list — they are correct, and this is a curious mistake because it is present in my Word version of the preprint. Having (correctly) located this reference, they note that it is simply an explanation of the Hawthorne effect rather than evidence that it applies to the studies they cite. Well, yes. We explain why it is relevant in the body of Sweller et al. (2024). De Jong II dismiss the effect as irrelevant, citing two studies that failed to find a Hawthorne effect in education research. One, Cook (1967), pre-dates most of the evidence in question. The other, Adair et al. (1989), supports de Jong II’s contention by conducting a meta-analysis and comparing effect sizes across various attempts to control for a Hawthorne effect.
This seems to demonstrate the lack of a Hawthorne effect in education studies, but it could also be explained if the various attempts to control for it were ineffective. They also note, and I agree, that the Hawthorne effect is poorly defined — is it about activity, attention or awareness of being studied? Adair et al. suggest that it may be more apparent when subjects are aware of the hypothesis being tested — something likely in many education studies.
Although there is plenty of research on placebo-like effects in medicine, there is very little in education research. Placebo and Hawthorne effects are both examples of the broader category of expectation effects, but if you search for studies on expectation effects in education, you mostly find those on how teacher expectations affect student achievement.
My own thinking on this has been shaped by watching the Education Endowment Foundation in the UK evolve over time. Often, they have taken studies with large effect sizes and attempted to replicate them. However, perhaps because they try to run high-quality, properly controlled trials, they often fail to replicate these studies, or replicate them with much smaller effects.
Nonetheless, this is a reasonable issue for people to disagree about and stronger ground for de Jong II. However, it fades into insignificance compared with the huge flaws in relying on meta-analysis that I have already highlighted.
An each-way bet
Stung by criticism in Sweller et al. that de Jong I did not base their arguments in any kind of theoretical framework, de Jong II insist there’s absolutely loads of theory to support their view that we should sometimes use explicit teaching and sometimes use inquiry.
Here is a sample of their argument:
“Cognitive theories explaining the success of inquiry-based learning emphasize the active integration of knowledge (Linn & Eylon, 2011). This involves emphases pertinent to CLT including supporting learners to distinguish among ideas (Linn et al., 2023), engage in ‘generative learning activities’ (Fiorella & Mayer, 2016; Mayer, 2024), and undertake schema (re)construction (Rumelhart, 1980). For example, Fiorella and Mayer (2022, p. 339) elaborate: “Generative learning involves actively making sense of the learning material by engaging in activities for organizing the material and integrating it with one's existing knowledge.”
It is odd they are ostensibly arguing for a bit of both when it comes to explicit teaching and inquiry, yet this whole section is devoted to hand-waving theoretical support for inquiry.
Obviously, I am biased, but I am struck by the glaring difference between de Jong II and Sweller et al. on this issue. De Jong II seem to be throwing theories at the wall to see which stick. Sweller et al. use theory to specify exactly when explicit- and inquiry-style activities should be used. They argue that open-ended problem-solving activities become appropriate later in a learning sequence, after learners have been explicitly taught and have mastered the concepts and procedures involved. They reference the expertise reversal effect from cognitive load theory to support this position. This is a testable hypothesis. In stark contrast, despite repeatedly insisting that we need to use a mix of explicit teaching and inquiry, de Jong I and II never explain when each method may be appropriate or how we would make that determination.
Which is, in practical terms, useless.
Definitions
I will leave you with one last weird thread. It’s not that important, but it is interesting enough for me to mention.
In Zhang et al., the authors discuss different studies including:
“…controlled studies [that] compared the aforementioned inquiry- or exploration-based investigation approach to science teaching with various forms of explicit instruction, such as simply providing students with the desired information and having students read it from texts or watch demonstrations.” [My emphasis]
De Jong I somehow turn this into a definition of explicit teaching:
“For example, Klahr, Zimmerman, and Jirout (2011) found that “direct instruction” was better than exploration for the development of CVS. In that work, however, the instruction went beyond the definition of direct instruction offered by Zhang et al. (2022) who described direct instruction as “simply providing students with the desired information and having students read it from texts or watch demonstrations”
Although a convenient way to dismiss a contrary finding, this clearly misrepresents Zhang et al., who provided no such ‘definition’.
In Sweller et al. we subtly pointed this out:
“In their response, De Jong et al. (2023) stated that Zhang et al. (2022) provided a definition of explicit instruction to which they claim to have responded. In fact, a definition of explicit instruction was not provided. Instead, Zhang et al. (2022) provided an example of what explicit instruction could look like. That example was not intended to rule out other potential forms of explicit instruction. It therefore seems appropriate to take this opportunity to clarify this example with a working definition. To us, the defining feature of explicit instruction is that, for novice learners, concepts are fully explained, and procedures are fully modelled before learners are asked to apply those concepts or procedures. Significantly, this working definition does not preclude the possibility of learners completing open-ended problem-solving tasks.”
That was us trying to be helpful. In de Jong II, this was all our fault because Zhang et al. should have provided a definition:
“Although it may seem ironic that such an explicit definition was missing in a paper on direct instruction, it left us with no option but to infer what Zhang et al. meant by direct instruction on the basis of the examples they provided.”
No option? Really?
I am not buying that.
Having heard the viewpoints expressed in De Jong et al. (2024) and Sweller et al. (2024), I wondered about another important factor when it comes to choosing a particular instructional approach as a classroom teacher: time allocation. Time is an important resource, and there is so much content to teach in science (and in other academic subjects too), so as a science classroom teacher, I would favor an instructional approach that is most effective at transmitting a body of knowledge in the shortest amount of time possible. The quicker the knowledge is passed on from the teacher to students, the more time can be allocated to independent practice and extension activities in class, which I hope everyone will agree is a good thing for students' learning outcomes. In contrast, I would expect that transmitting a body of knowledge via a guided inquiry model would, by design, take more time, taking precious class time away from further practice and consolidation, as well as from covering more content. Please point out if I am wrong on this, but I could not find De Jong et al. (2023) or De Jong et al. (2024) addressing the potential issue of guided inquiry learning being time-inefficient and therefore impractical to use in most school settings.
If you didn't believe the Hawthorne effect (or something like it) is at work, then how would you account for the fact that diametrically opposed interventions routinely show positive effects? I can see how it's hard to find a single smoking-gun metric to measure its prevalence, but I don't know how you could look at the aggregate and not see it. Where are all the studies showing null results? They should be at least half. If it's not a Hawthorne-like effect, then it's desk-drawer bias, and that's much worse.