I think information theory is a fascinating way to look at learning. One issue is that learning is a lot more complicated than Shannon’s setup of a sender, a receiver of messages, and a message channel.
You have at least a sender (the teacher) and two other endpoints, long-term memory and short-term memory, with channels between them in each direction.
You could simplify this by looking at just two message channels: one from short-term memory to long-term memory and one from long-term memory to short-term memory.
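A minimal sketch of that two-channel simplification, assuming (purely for illustration) that each hop behaves like a binary symmetric channel with a 5% flip rate; the point is just that noise compounds across cascaded channels, so whatever reaches long-term memory is bounded by the noisiest link:

```python
import random

def bsc(bits, p_flip):
    """Binary symmetric channel: each bit flips with probability p_flip."""
    return [b ^ (random.random() < p_flip) for b in bits]

random.seed(0)
message = [random.randint(0, 1) for _ in range(100_000)]

heard = bsc(message, 0.05)   # teacher -> short-term memory
stored = bsc(heard, 0.05)    # short-term memory -> long-term memory

for label, received in [("one hop", heard), ("two hops", stored)]:
    errors = sum(a != b for a, b in zip(message, received)) / len(message)
    print(f"{label}: {errors:.3f} errors per bit")
```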
A practical application of Shannon’s theory is the study of how message alphabets, and the constraints on them, can be used optimally to improve communication over an imperfect channel. Steven Pinker touches on how this works in spoken language with regular and irregular verbs.
For example, “dived” uses the root “dive” and the common ending “-ed”, which is constrained to always mean the past tense. The North American “dove” doesn’t have this structure and requires us to know what is meant without the clue of the close match to “dive” and the common “-ed” ending. According to Pinker, we use irregular verbs in a way that makes communication better, and this works by having common verbs irregular and less common ones regular.
That is in line with Shannon’s theory: a less common word benefits from regular structure, which reduces the decoding problem, while a common word benefits from being shorter or more distinctive, since a well-known word is more likely to be matched correctly to the sounds we hear.
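Shannon’s source-coding result makes that tradeoff concrete: an optimal code spends about -log2(p) bits on a word of probability p, so frequent words can afford short forms while rare words repay extra structure. A minimal sketch with made-up frequencies (not real corpus counts):

```python
import math

# Made-up word frequencies, normalized into a toy distribution.
freq = {"go": 0.30, "take": 0.20, "dive": 0.04, "strive": 0.01}
total = sum(freq.values())

for word, p in freq.items():
    # An optimal code spends about -log2(p) bits on a word of probability p.
    bits = -math.log2(p / total)
    print(f"{word:>7}: ~{bits:.1f} bits")
```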
In language we are matching sounds to meanings, and efficiency is about using the fewest sounds that still give us a good chance of not making a mistake.
The example in your paper of 3x = 18 is interesting. When people first learn simple algebra like this, two languages are being combined into a new one: the language of arithmetic and the language of letters. This is useful once we know what is going on, because we are already good at recognizing all the symbols, spoken or written. Viewed as an information-passing problem, though, we might expect a brain that knows spelling and arithmetic to struggle to match algebra to either one.
An interesting experiment would be to see whether introducing algebra with a Greek letter works better; sometimes this is done with an open square standing in for the unknown, as in 3 x [ ] = 18.
(You can see another issue here: x is being used both as a variable and as the symbol for multiplication.)
A Greek letter might work better because it is easier to say “alpha” than “what goes in the square”, and it introduces the idea of labels for variables without the close proximity to other messages, such as words with x in them or x as multiplication.
You could also experiment, taking a hint from programming, with using a full word for a variable that represents a physical thing, and only introduce the abstraction of a generic label once the process of solving for the variable is no longer new information.
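As a minimal sketch of that idea, assuming sympy is available (the name “apples” is just an illustrative stand-in for the physical thing):

```python
from sympy import Eq, solve, symbols

# Name the unknown after the thing it stands for, not a bare letter.
apples = symbols("apples")

# Three bags each holding the same number of apples make 18 in total.
equation = Eq(3 * apples, 18)
print(solve(equation, apples))  # [6]
```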
Mathematics is a language that sheds redundancy, and that works because it relies on a high degree of familiarity with the language. It becomes a great language for those who know it, but this works against those who don’t.
Looking at it this way suggests that introducing the efficiency slowly will work better for learning, and information theory provides a way to decide whether you are doing that.
However, to use information theory you have to know the information content of the language being used: how much is new, and how much sits in close proximity to different concepts and so needs more redundancy to separate it.
A key point from Shannon’s theory is that the information content of a message depends on the receiver’s existing knowledge of the message’s language and content. In learners this knowledge is dynamic.
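A minimal sketch of that dependence, with made-up probabilities: the surprisal of a symbol is -log2(p), where p comes from the receiver’s own model, so the same message costs a novice more bits than an expert:

```python
import math

def surprisal_bits(message, model):
    """Information content of a message under a receiver's symbol model."""
    return sum(-math.log2(model[s]) for s in message)

message = list("3x=18")

# Illustrative models only; leftover probability mass belongs to symbols
# not shown. The novice finds the algebraic symbols unfamiliar.
novice = {"3": 0.10, "x": 0.01, "=": 0.02, "1": 0.10, "8": 0.10}
expert = {"3": 0.10, "x": 0.10, "=": 0.10, "1": 0.10, "8": 0.10}

print(f"novice: {surprisal_bits(message, novice):.1f} bits")
print(f"expert: {surprisal_bits(message, expert):.1f} bits")
```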
(You might think of the natural reaction to a boring lesson on something we already know as our mind’s determination to apply Shannon’s theory well.)