Neuroscience, psychology and economics: the evidence for System 3 (long)

In my last post I outlined the concept of System 3, what it is and why it matters. In short, System 3 is the mental ability to imagine the future and evaluate how happy you will be in it – based on how pleasurable the process of imagining itself is.

A lot of different research strands have come together to result in the identification of System 3 as a distinct mental process. I summarise the key steps here:
  1. The fundamental building block of System 3 is the stimulus-response relationship. It has been known for a long time that people easily learn stimulus-response relationships when they are rewarded for the response. The classic examples come from Pavlov (who rewarded dogs with food and discovered that they would start to get excited when they saw the experimenter’s white coat – as any pet owner will recognise), and Skinner (who trained rats to press a lever, and pigeons to peck a key, in order to get fed). Although these original experiments were done on animals, there is plenty of evidence that the same principle applies to people. A typical example would be seeing the wrapper of a chocolate bar and hungrily anticipating the taste of the chocolate inside. (Stimulus-response is also the foundation of System 1 – but System 3 grows out of the same roots.)
  2. The next step is the idea of successor representations. Neuroscientists (e.g. Stachenfeld et al 2017) have shown that we store in our brains the whole sequence of steps required to get to a goal. Each of these steps can be considered in turn to be an individual stimulus-response relationship. In other words, a chain of stimulus-response relations can be linked together, where the response of one step becomes the stimulus for the next.
  3. Schultz, Dayan and Montague (1997) showed that the motivational response can migrate along this chain as it becomes more familiar. Imagine the chain A->B->C represents a stimulus A that predicts response B, and B in turn predicts response C. C is the actual ‘reward’. For instance, A might be the logo of a chocolate manufacturer; B the packaging of a chocolate bar; and C the actual chocolate. As you see more chocolate packaging and open it up to discover chocolate, the packaging itself will start to motivate you before you even get to the chocolate. Then in turn, the logo might become motivating.
  4. The way that motivation changes with reward is governed by the Rescorla-Wagner model (Rescorla and Wagner 1972): if the reward experienced from an event is more (or less) than was expected, the decision maker’s brain learns to strengthen (or weaken) the causal connection and is motivated to repeat (or avoid) the action.
  5. More recent work from Dayan discusses the idea of truncation: that we mentally plan the steps in a process, but we don’t plan all the way to the end. Instead we stop at some point, and base our decision on how good things look at this point. For example, a chess player might look three or four moves ahead and make a judgement about how good the position looks at that point, rather than trying to work through all the possibilities to the end of the game, which would be impossible.
  6. This in turn relates to work on causal representations. A causal representation can be thought of as a complex network made up of individual stimulus-response ‘edges’. Sloman and Lagnado (2015) discuss how causal representations can support mental simulation and the development of narratives about the world.
  7. A separate set of discoveries was developing in parallel within the psychology literature. The idea of prospection was described by Gilbert and Wilson in a 2007 Science article. They had observed that people think about the future, and get pleasure from doing so. This can have implications for psychological health, and more generally appears to be a common human activity.
  8. In 2011 Pezzulo and Rigoli published “The value of foresight: how prospection affects decision making” in Frontiers in Neuroscience. They worked out a model to explain how decision makers can imagine their future motivations and use these to work out what actions they will want to take in the future – and to act accordingly in the present.
  9. This work in turn builds on two core ideas. The first is the idea of model-based decision making (as distinct from model-free decisions). Model-free learning (like those early Pavlovian experiments) starts from an external stimulus and learns the corresponding action or behaviour. See a lever – press the lever. There is no meaningful representation of what the lever might mean, or why pressing it is a good thing. Model-based learning introduces an intermediate step. You see the stimulus, and in your mind you consider what this might mean, and update your mental model of the world. Model-based learning and decisions turn out to be much more powerful, especially in more complex situations, and it is likely that people use model-based representations because it would be impossible to learn enough combinations of stimulus and behaviour to reflect all of our knowledge in a model-free way.
  10. The other line of research they draw on is the idea of utility from anticipation and dread. Anticipation is when we enjoy thinking about positive events in the future; dread is when we find it painful to think about negative future events. George Loewenstein has studied this extensively (Loewenstein 1987) and determined that people do enjoy the process of anticipation, and are sometimes willing to put off a pleasurable activity in order to extend the pleasure of anticipating it.
  11. Thomas Schelling, in The Mind As a Consuming Organ (1983) had asked why we shed a tear when watching Lassie. Do we think Lassie is real, or that the things that happen to her in the show are genuine? Of course not. But we still enjoy the program: our imagination provides us with reward for ‘pretending to believe’ in this fictional world. This is likely to be connected with the psychological capacity for empathy (Ainslie and Monterosso 2002).
  12. Neuroscience work in the mid-2000s (Padoa-Schioppa and Assad, 2006; Kable and Glimcher, 2007) discovered that the brain represents reward values when we make goal-directed decisions. Rather than being rewarded for taking certain actions, we (or, at least, monkeys) are rewarded for consuming specific goods. The representation of these goods in the mind provides evidence for the idea of model-based reasoning.
  13. Recent computational learning research (Hamrick 2018, Reichert 2018) shows that mental simulation is a powerful way to solve problems, and software algorithms which use this method show similarities to human decision making. This does not directly prove that human minds decide things in the same way, but it does offer support for the plausibility of this idea.
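
Steps 3 and 4 above can be illustrated with a small simulation. This is only a sketch, using a simple temporal-difference update (a close relative of the Rescorla-Wagner rule) rather than the exact model from any of the papers cited. The learning rate, the discount factor and the reward of 1.0 for chocolate are assumed values chosen for illustration.

```python
# Hypothetical chain from the chocolate example: logo -> wrapper -> chocolate.
STATES = ["logo", "wrapper", "chocolate"]
REWARD = {"logo": 0.0, "wrapper": 0.0, "chocolate": 1.0}  # only the chocolate is rewarding
ALPHA = 0.2  # learning rate (assumed)
GAMMA = 0.9  # discount applied to each step further from the reward (assumed)


def run_episodes(n_episodes):
    """Walk the chain n_episodes times and return the learned value of each step."""
    value = {state: 0.0 for state in STATES}
    for _ in range(n_episodes):
        for i, state in enumerate(STATES):
            # Value currently predicted for the next step; nothing follows the chocolate.
            next_value = value[STATES[i + 1]] if i + 1 < len(STATES) else 0.0
            # Prediction error: what was experienced minus what was expected.
            delta = REWARD[state] + GAMMA * next_value - value[state]
            value[state] += ALPHA * delta
    return value


early = run_episodes(2)    # the wrapper has barely started to matter, the logo not at all
late = run_episodes(200)   # value has migrated back along the chain to the logo
print(early)
print(late)
```

After a couple of passes only the chocolate and, faintly, the wrapper carry any value. After many passes the logo carries most of it too, which is the migration described in step 3.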


The key step from here is to realise that model-based learning based on an underlying network of stimulus-response relations, the successor representation, causal reasoning, dopamine migration, truncation, anticipation, prospection and empathy can all be seen as different views on a common system or process: System 3.

In the System 3 process, humans maintain a mental representation of the world, structured as a causal graph: a set of beliefs about the cause-and-effect relations between events and objects in the world. They use this causal graph to make decisions. When presented with a new option, they mentally explore the likely consequences of that option: what will happen if I do it, then what will happen next as a result, and so on. (The successor representation is a specific chain of steps within this graph.)

Anticipating each of these successive outcomes provides pleasure (or pain, if the outcome is negative). As a result, the decision maker experiences pleasure if the option is a good one, and this encourages them to carry out the action (and correspondingly, not to carry it out if the mental exploration is painful). The amount of reward gained from anticipation is related to the amount of reward gained from the real experience. The experience of present reward in return for future activities is what resolves the ‘prospection paradox’: how can our brains force us to forgo present reward in favour of the possibility of future reward?
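
The process described above can be sketched in a few lines of code. This is an illustration under stated assumptions, not a model taken from the literature: the events, the causal links between them and the numbers in FEELING are all hypothetical, and truncation is represented simply as a limit on how many steps ahead the simulation looks.

```python
# Hypothetical causal graph: each event maps to the events it is believed to cause.
CAUSES = {
    "buy chocolate": ["unwrap bar"],
    "unwrap bar": ["eat chocolate"],
    "eat chocolate": ["feel guilty"],
    "feel guilty": [],
    "save money": ["bigger balance"],
    "bigger balance": [],
}

# Assumed pleasure (positive) or pain (negative) felt when imagining each event.
FEELING = {
    "buy chocolate": 0.1,
    "unwrap bar": 0.2,
    "eat chocolate": 1.0,
    "feel guilty": -1.5,
    "save money": -0.1,
    "bigger balance": 0.5,
}


def imagined_value(event, depth):
    """Add up the feelings from imagining an event and its consequences, looking `depth` steps ahead."""
    total = FEELING[event]
    if depth == 0:
        return total  # truncation: stop simulating and judge on what has been imagined so far
    for consequence in CAUSES[event]:
        total += imagined_value(consequence, depth - 1)
    return total


def choose(options, depth):
    """Pick the option whose mental simulation feels best."""
    return max(options, key=lambda option: imagined_value(option, depth))


options = ["buy chocolate", "save money"]
print(choose(options, depth=2))  # simulation stops at eating the chocolate
print(choose(options, depth=3))  # simulation reaches the imagined guilt
```

With these made-up numbers the choice flips depending on how far ahead the simulation runs, which shows why the point of truncation matters for the decision.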

Observe that the decision maker can get pleasure simply from thinking about possible actions – they do not need to actually do anything! This is the key step that motivates people to prospect – the anticipation of an event, even if it will never happen, provides reward.

The Rescorla-Wagner formula pops up again now. Let’s say I think about an outcome and I am rewarded for thinking about it. My brain is rewarding me because it “wants” me to take the action I’m thinking about, because that action in turn is likely to lead to another reward (that chocolate bar). Moreover, the act of thinking about the chocolate is, most likely, statistically linked to getting and eating the chocolate – so the brain is quite right to have learned this association. But if I keep thinking about it and never actually eat the chocolate, the reward I actually receive will be less than expected, and my motivation to keep imagining it will diminish. In the long run, one might expect the motivation to think about chocolate to disappear altogether: but in practice, truncation stops this from happening. The brain goes off to do other things before the reward has been fully extinguished.
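
A minimal sketch of this dynamic, using a simplified form of the Rescorla-Wagner update with a single learning rate. The starting value of 1.0, the learning rate of 0.3 and the number of thoughts per session are assumptions chosen for illustration.

```python
def rescorla_wagner(value, outcome, learning_rate=0.3):
    """One update: move the expected reward towards the outcome actually experienced."""
    return value + learning_rate * (outcome - value)


def daydream(value, n_thoughts):
    """Imagine the chocolate n_thoughts times without ever eating it (outcome = 0)."""
    history = [value]
    for _ in range(n_thoughts):
        value = rescorla_wagner(value, outcome=0.0)
        history.append(value)
    return history


short_session = daydream(1.0, n_thoughts=3)   # truncated: the brain moves on to other things
long_session = daydream(1.0, n_thoughts=30)   # no truncation: motivation is almost gone
print(short_session[-1], long_session[-1])
```

In the short session a good part of the expected reward survives, so the thought stays motivating the next time it comes up. In the long session it is close to zero. Actually eating the chocolate (an outcome of 1.0) would push the value back up again.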

So the brain is motivated to keep imagining, and ruminating over, rewarding activities. Motivation can be seen as both the fuel, and the prize, for this process; over time the fuel is metaphorically “used up” and motivation diminishes. It is likely that the motivation and reward for imagined events will move towards a stable equilibrium state. As the brain wanders around this network of imagined outcomes, it is indirectly testing out the reward levels of each event and its successors. As it simulates a chain of events and realises that the consequent reward is less, or more, than expected, it adjusts the reward assignments to reflect its more accurate prediction of how positive those events would be.

When the brain learns new rewards, it fits them into this network; and when it encounters new situations it tries to map the existing network onto the new landscape. This also happens when we watch a TV show (containing its own fictional world, of which we develop a new mental representation), or imagine the life of another person.

In all of these cases, we are rewarded for imagining what might happen – in the future, in a fictional world, or to another person – even though we gain no direct, immediate reward for any of these events. System 3 is what links the future to the present; fiction to the real world; and other people’s lives to our own.

System 3 provides the mechanism by which we come to care about, and be rewarded or punished for, purely symbolic outcomes. Typical examples are the success or failure of a work project (which we may care intensely about even if it is unlikely to affect our job security or income), a political attachment to symbols such as a national flag or a signature policy, the experience of turning 30, 40, 50 or 60 (or even 25, galling though this idea may be to many readers), the results of sporting events and many other non-material experiences. It is the impact of these experiences on our causal networks, or the mental simulations that they trigger, that provide reward or pain.

Is System 3 definitely distinct from System 1 and 2? This is a matter for judgement rather than evidence, but I would argue that:

  • System 1 is primarily about fast, nonconscious processes – while System 3, though automatic, is slower and can be quite conscious.
  • System 2 processes are about accurately recreating, then symbolically and logically manipulating the material laws of the real world. For example, System 2 can tell you that if you save $1000 today, you will have $1030 this time next year. It can’t tell you how you will feel about that, or which is better. System 3, however, lets you try out the feeling of spending $1000, the smug satisfaction of not spending it, and the pleasure you may get next year from that extra $30.
  • System 3 involves a specific and distinctive mental process that is dissimilar to the instant, model-free leaps of System 1 and the emotionless, rule-based, non-causal reasoning of System 2.

I believe System 3 offers a good description of a class of decisions that are not well-explained by existing theory, and a strong foundation for understanding the economic valuation of mental states (at the heart of the emerging field of cognitive economics).


Comments

koenfucius said…
A couple of quick thoughts.

1. I think the logo-wrapper-chocolate process does not require a system 3 explanation: the heuristic can develop simply through correlating predecessors with successors. Mere memory is sufficient for this.
2. Likewise the ability of imagining multiple futures to help decide a suitable course of action. What this looks like is a 'what if?' function in system 2.
3. The really interesting thing here is the rewards that come from merely imagining. IMV this is where the true attraction lies of a system 3 complementing the other 2.

Leigh Caldwell said…
Thanks Koen, I partly agree. On each point:

1. I think you're right. One way to think of this is that a system 3 reward can turn into a system 1 reward over time. At first, prior to learning, while it is still necessary to imagine the intervening steps (logo, wrapper, chocolate, reward), we require system 3. Once you have learned the relationship between logo and reward, system 3 is no longer needed. The model-based reasoning has become model-free. Only the relations that we experience frequently (or those that we simulate very frequently in our minds) will ever become system 1; others will remain available only with system 3.

2. The difference is that system 2 provides a reasoning function but not a motivational function. System 3 provides motivation, and also a specific cognitive process to connect future motivation with present. For example, with system 2 I could calculate the size of my retirement nest egg, and the risk factors, under various investment routes - all without imagining what any of them would feel like. System 3 wouldn't be accurate at the calculation, but it would let me motivationally 'try on' what it would feel like to deprive myself of some current consumption and have a more comfortable life at 70.

3. I agree, but I would argue that the 'purely imagining' function is used in certain kinds of decision making that we don't typically think of as pure imagining.

koenfucius said…
I have a bit more time now!

1. Maybe I misunderstood. If the point is that seeing the logo in itself provides a reward, then that would seem to be System 3 territory, but if the point is that it (ahead of the wrapper) produces motivation, then I don't think it is.

I think System 1 acts primarily as correlation machinery: (opening) the wrapper is correlated with the reward of chocolate, which fuels the motivation to open the wrapper. The fact that the logo, ahead of the wrapper, is also found to correlate with the reward might motivate similar behaviour: go to the pantry and locate a chocolate bar, or even go to the corner shop and buy one. It doesn't matter how long that apparent chain (logo --> wrapper --> chocolate) becomes; in the end it is simply the correlation between a certain stimulus and the (memory of a) reward. Imagination is not necessary to explain all this. The learning can happen simply thanks to a memory function.

(If, on the other hand, we find that seeing the logo is directly experienced as a reward, rather than as the trigger to search for the reward, there is something more than correlation at play.)

As an aside: the question whether a wrapper or the logo of an alternative chocolate bar would also produce motivation is an interesting one. Arguably that would require imagination - we have to imagine that chocolate of this other brand will *also* produce the desired reward. Unless we've also internalized that as a heuristic of course... if it looks like a chocolate logo, we will be rewarded! :-)

2. If it is true that System 2 does not provide a motivational function, then System 3 is essential in all situations, effectively collaborating with both Systems 1 and 2. For a while I thought that this is what you were arguing for - namely that in both cases, the imagination of the reward is essential to provide the motivation to take the action to realize it. I think that makes kinda sense, but then the question arises: why do we need a separate system, and can we not assume this functionality to be part of System 1 and System 2? In this respect the System 3 you posit is more like a - so far not specifically identified - component of Systems 1 and 2, using imagination of the future reward to provide the motivation.

One way of considering System 1 is that it simply equates a stimulus with an outcome, and if that outcome is beneficial, appropriate action is motivated. Imagination is not necessary to explain this, but you could introduce it as a mechanism that helps the development of the heuristic ("I see the wrapper, I remember that it leads to chocolate, hmm, chocolate! I can imagine it now, let's go for it!").

System 2 entails deliberate evaluation, and here too we do not need imagination but we can integrate it. Without imagination = calculation of costs and benefits (using an 'unimaginative' what-if function) delivers a result above or below a certain threshold, or a ranking, and the decision is made on that basis. With imagination = the evaluation of costs and benefits makes use of imagined future states.

3. So IMV the role of a System 3 in Systems 1 and 2 is moot. Its real importance is where imagination is the *only* explanation for an observed phenomenon. You are right that this is not only where the imagined image (IYSWIM) in itself provides a reward (maybe positive psychology would provide good study material?). Imagination can indeed play a central role in decision making, eg in career choice, or whether to take part in a talent show. Arguably in situations like these, it's not System 1 or System 2 with a bit of help from a System 3, but System 3 in the driving seat.

My hunch is also that many decisions are not entirely pure in this way, and so a primary S1, S2 or S3 process will involve the other 2.
