Adam Wray has a Substack all about Predictive Processing, an alternative to cognitive load theory he would like us to consider. Carl Hendrick has been very complimentary online about this approach.
I am sympathetic. I have my own refinement to cognitive load theory based on the idea of informational entropy that is currently sitting as a preprint. However, it is sitting as a preprint for a good reason. At present, I cannot think of any predictions it would make that are at odds with those of canonical cognitive load theory. That’s a problem because scientific theories wax and wane based on the accuracy of their predictions. If my theory has nothing new to add, what need is it meeting?
In a similar vein, on his Substack, Wray goes on to explain all the cognitive load theory effects using predictive processing, as well as work such as Robert Bjork’s research into ‘desirable difficulties’. These are also explained by canonical cognitive load theory (e.g. here and here) and so the same question arises.
Sleight of hand
Wray is fond of a diagram that I won’t reproduce for copyright reasons. It contrasts the complexity of the cognitive science Wray wants us to move away from with the predictive processing model of the mind. On the left-hand side is a ramshackle shed covered in statements and diagrams that represents cognitive science, with an emphasis on cognitive load theory. On the right-hand side is a clean and simple diagram representing predictive processing’s model of the mind. Which would you prefer?
The shed is annotated with terms including ‘mental models’, ‘limited working memory’, ‘CLT’, ‘simple model of memory’, and more. First, note how the four I have listed are overlapping terms and not distinct from each other. So, there is something of a sleight of hand going on here, whether intentional or not. The shed is not as ramshackle and patchwork a construction as Wray suggests.
Moreover, the appropriate concept to contrast with Wray’s model of the mind is cognitive load theory’s model of the mind which, if we set aside Oliver Caviglioli’s rather eccentric rendering—one that makes an appearance on Wray’s shed—is equally simple.
If the complexity we dislike about cognitive load theory is its long list of ‘effects’, such as the split-attention effect, the worked example effect and so on, then what is predictive processing going to do about this? If it also predicts all these effects then the complexity is not reduced because they are still there. If it does not, then that would be an interesting and testable hypothesis.
As an aside, critics often point to cognitive load theory’s long list of effects as a flaw, but I don’t agree. The purpose of them is to provide guidance for teachers and other instructional designers, such as those designing multimedia learning environments. Some effects are more useful than others, but having fewer of them serves what practical purpose?
Replication failure
Wray notes that during the development of cognitive load theory, its failures to predict the results of experiments have led to refinements of the theory and generated new effects. This can be seen as a strength—the theory evolved according to new evidence—but can also be seen as a weakness. As Wray asks:
“at what point should scientists, faced with ongoing conceptual replication failures, decide to rework or replace their underlying explanatory framework?”
This is a fair question, but it returns us to a previous point. Predictive processing as a learning theory has not been through the same evolutionary process that cognitive load theory has. Wray claims it offers a coherent explanation for all the effects cognitive load theory had to adapt to accommodate, but so does contemporary cognitive load theory. What does this add? If anything, it just highlights that predictive processing has not been tested in the same way. Instead, predictive processing has the benefit of all these known results to potentially fit itself to.
Again, it would be better if we could find a prediction that predictive processing makes that is at odds with those of cognitive load theory and then run that test. A win for predictive processing and a loss for cognitive load theory would be a good reason to consider the new model.
Schema building
The strength of predictive processing is that it has something to say about schema acquisition in a way that cognitive load theory perhaps does not. The basic idea is that we are constantly making predictions about the world then testing those against the input from our senses. When there is a conflict, we adapt the schemas in long-term memory to accommodate new results. In effect, this provides an explanation for cognitive load theory’s troubled germane load concept.
However, before we start thinking that we should flood students with information to confound their predictions, Wray posits that we only have a certain bandwidth we can deal with at any one time. This is the equivalent of cognitive load theory’s limited working memory. So, the two ideas map onto each other.
What does predictive processing add? Potentially, it adds a mechanism for schema building, but I am not totally sure it aligns with what we already know. For example, constructivists have long promoted the idea of inducing ‘cognitive conflict’ in students by presenting them with evidence that contradicts their predictions. There have been decidedly mixed results when this hypothesis has been tested. Perhaps there is a difference between the kind of conscious predictions that cognitive conflict induces and the presumably unconscious ones that predictive processing suggests we make all the time? Perhaps. Perhaps cognitive conflict overloads bandwidth? Perhaps.
Stellan Ohlsson’s resubsumption theory of conceptual change suggests that we don’t change flawed schemas, we create new ones that outcompete old ones, and this is one reason why cognitive conflict doesn’t work. He even suggests initially learning new ideas, such as Newton’s laws, that conflict with existing mental models in a simulated environment to aid this process. If this is true, it is not immediately apparent how this is consistent with predictive processing. I’m sure we could make it work, but that would involve complicating the model and its attraction is supposed to be its simplicity.
True, cognitive load theory doesn’t have much to say about how schemas are acquired in long-term memory, but I am looking for something that predictive processing can add.
The other problem with the brain-as-prediction-machine model is what this says about all the times we are not building schemas. Simply sitting down and thinking about things fits well into cognitive load theory—it is a simple interchange between working and long-term memory—but what is happening to the prediction machine with no sensory input to process and test itself against?
Neuroscience
A concerning, if largely incidental, aspect of Wray’s argument is its appeal to neuroscience. Knowing where something is happening in the brain, how synapses are firing and so on is precisely useless to teachers. That is why cognitive load theory does not make any claims about the brain. It simply presents an abstract model of the mind.
For example, if I told you that when working memory is being used, fMRI scanners show blood flow in a particular region of the brain, what could you, as a teacher, do with that? Well, if you could hook all your students up to fMRI scanners, you may be able to tell when you’re overloading them. Alternatively, you might note the confused expressions on their faces or the inability to give correct answers. Fascinating as it is, at present, neuroscience has little to add.
Wray thinks agnosticism about neuroscience is a flaw in cognitive load theory. Instead:
“One of the reasons I am persuaded to adopt Predictive Processing/Active Inference is its growing neurobiological grounding, supported by computational psychology models. Neural evidence shows prediction error signals, hierarchical processing, and precision weighting at work in real cortical structures.”
I am sure Wray is sincere, but appeals to neuroscience do not have a great track record in education research and are, to my mind and at this stage, a distraction.
In a similar vein, I cannot help thinking part of the appeal of predictive processing is the analogy with Artificial Intelligence and Large Language Models that essentially crunch lots of data to generate predictions. Drawing inspiration from the latest technology can be fruitful but also limiting, as information processing theories and their relationship with computer models show.
Conclusion
Predictive processing may be right. Time will tell. At present, it looks to me like a mapping of one set of concepts from cognitive load theory onto another equivalent set of concepts. Its aesthetic appeal rests on the claim that it is simpler than cognitive load theory, but that claim seems to be based on a sleight of hand.
Cognitive load theory is undoubtedly wrong. All scientific theories break down and it is their fate to be replaced, refined and updated. The question is whether cognitive load theory’s model of the mind is more like the Ptolemaic model of the universe, where the basic idea was wrong but it was great at making predictions, or the Copernican model of the universe, where the basic idea was right but it was relatively poor at making predictions. If it’s the former, perhaps predictive processing can do better.
In the meantime, it represents an interesting set of ideas.
A possible use of an LLM-style model of human learning would be as a way to model teaching approaches and optimize them, and to quantify human learning constraints.
Similarly to Einstein’s theory on thermodynamics, a more mechanistic model would allow quantitative predictions about effectiveness.
In the same way, it might only show new insights at the extremes, but showing that bad approaches are inferior seems like quite the challenge today, so perhaps that is useful.
Predictive processing is not right--for education. Go nuts with increasing the accuracy of your AI model from 97.3% to 97.4%, but it doesn't scale meaningfully onto human education, which is fundamentally social. (If I may be so bold.) The allure of the individual, the psychologistic, the anti-historical, is simply amazing sometimes.