Monday, 28 March 2016

Discussion 2 of 3: No spooky action at a distance - a theory of reward

Part 2 in a short series of posts. Part 1 and part 3 are also available.

One of the most powerful ideas in physics is the principle of locality. This principle insists that objects can only be influenced by other objects that touch them. Two items separated by a distance cannot directly exert any force or influence on each other, but must communicate via some medium which physically transmits the force from one to the other.

Albert Einstein famously objected to "spooky action at a distance", and the principle of locality applies to his theory of gravity as well as all the other physical forces (it gets more complicated when we consider quantum mechanics, but that would take a whole other article). The Scottish physicist James Clerk Maxwell also used it in developing his theory of electromagnetism.

Consider two magnets. Instead of directly pushing or pulling each other, each magnet creates an electromagnetic field in the space around it, and changes in that field travel outward at the speed of light. When another magnet sits in or passes through the field, it is affected because the field is present at the very point where that magnet is. The two magnets are not directly attracting each other; the force is mediated by the electromagnetic field.

A more familiar example is a chain of dominoes. The last domino doesn't fall as a result of the first one being knocked over. It falls because the second last one is knocked over. That in turn happens because of the one before it. The first and last domino can only affect each other because of all the other dominoes in between.

A different way to think about this principle is in time instead of space: the ultimate result of an event can never be a cause of how the event happens. The first domino doesn't fall down because the last one is going to; indeed, if you ever played Domino Rally as a kid, you will know that when the first one falls, it's often entirely unpredictable whether the last one will also go. Each domino only has a local field of influence: just it and the ones that can physically touch it.

What if we apply this principle of locality to economic decision-making?

We are used to talking about decisions based on "future reward" or "future returns". I don't eat the marshmallow now, so I can have two of them later; I save money now because I will receive interest next year; I exercise today so I can enjoy being healthier in later life.

But can a future event actually cause an event in the present? Can my future health cause me to exercise today? Surely not – that would be time travel.

The principle of locality says that a decision I make today can only be based on stimuli and causes right here, right now in the moment of the decision. Future events cannot influence it. (Neither, for that matter, can past events, or costs and benefits that take place outside my direct experience.) My brain and the physical atoms that make it up only know about immediate, present influences. Thus, my decision can only be affected by feelings, rewards and costs that I experience at the time of making it.

In practice, however, we do regularly make decisions to defer gratification. We appear to take into account outcomes that happen later, or outside of the decision context (just as the magnets really do attract each other, even though they are not touching each other). How is this paradox resolved?

The key here is to understand how the outcomes are mediated – how those future benefits can indirectly influence the present. The future reward does not directly cause my decision today. My 65-year-old self does not reach into his past and make a pension contribution. Instead, it's my current feelings and beliefs about the future reward that matter. I can only take that future reward into account if I get some kind of immediate payoff for doing so.

That payoff might be the feeling of security that comes from knowing that my retirement is being provided for. It could be the positive feeling of going along with socially acceptable behaviour. Conversely, the guilt associated with eating a doughnut may stop me eating one. All of these are feelings experienced now, by my present self, even though they are based on what might happen in the future. Whatever it is, I need to get something now to make me act now.

Even though those feelings and beliefs are related to the future, they still cannot directly be caused by future benefits. My feeling of security isn't actually a result of my future comfortable retirement. It's a result of me imagining now what my retirement might be like. My brain has to be able to predict the future, and somehow take an action now based on imagining something good in the future.

This leads to an important conclusion. The brain must have a mechanism for forecasting future outcomes. Having made its forecasts, it must be able to produce a present value for each of them, converting each one into some immediate force that can influence current decisions. It makes sense to believe that whichever option produces the strongest immediate force will be chosen by the decision maker.

So, instead of picking options based on which one brings the highest predicted reward, the brain chooses whichever makes the highest immediate impact at the time the decision is made. The size of this impact is certainly related in some way to anticipated reward, but is not the same thing. It is calculated by some mental mechanism that predicts decision outcomes. An obvious research question which follows from this is: how does this brain function translate one quantity (anticipated reward) into another (immediate influence on decisions)?
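To make that decision rule concrete, here is a minimal Python sketch. Everything in it is an assumption made purely for illustration: the names `Option`, `immediate_impact` and `choose`, the arbitrary reward and delay units, and above all the hyperbolic discounting formula, which is only a stand-in for the unknown translation from anticipated reward to immediate influence. It is not a claim about how the brain actually does it.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    anticipated_reward: float  # forecast size of the eventual reward (arbitrary units)
    delay: float               # how far in the future it arrives (arbitrary units)


def immediate_impact(option: Option, k: float) -> float:
    """Hypothetical translation from anticipated reward to present influence.

    Assumption: hyperbolic discounting, reward / (1 + k * delay), where a
    larger k means future rewards are felt less strongly right now.
    """
    return option.anticipated_reward / (1.0 + k * option.delay)


def choose(options: list[Option], k: float) -> Option:
    # The rule from the text: pick whichever option has the largest
    # impact at the moment of decision, not the largest eventual reward.
    return max(options, key=lambda o: immediate_impact(o, k))


now = Option("one marshmallow now", anticipated_reward=1.0, delay=0.0)
later = Option("two marshmallows later", anticipated_reward=2.0, delay=1.0)

print(choose([now, later], k=0.5).name)  # future felt fairly strongly
print(choose([now, later], k=2.0).name)  # future felt only weakly
```

With these made-up numbers, the same pair of options produces a different choice depending only on `k`, that is, on how the anticipated reward is converted into something felt in the present.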

My last post suggests a possible mechanism by which this could happen. The mind contains an associative network that makes a model of the world, tests out actions and their consequences, and estimates the amount of reward that is likely to be generated. That's just one hypothesis of how this process could work; whatever the mechanism in reality, there must be some process that can estimate which of two anticipated outcomes is better.

That insight leads to a very important question. We know the mind has the ability to experience pleasure from receiving certain sensations. But does it have two separate mechanisms: one for experiencing actual pleasure, and another for weighing up anticipated pleasure in order to choose between two options? If the pleasure I gain from actually eating a doughnut is measured in (for example) micrograms of dopamine, in what units do we measure the anticipated pleasure when I imagine eating the doughnut?

Neuroscience (e.g. this 2014 paper by Linnet) and Occam's razor both suggest an answer with far-reaching consequences. The simplest explanation, and the one that requires the least neural machinery in the brain, is that a single quantity in the decision process does both duties: evaluating immediate sensations and evaluating anticipated outcomes. In other words, we get exactly the same kind of reward from thinking about future pleasure as we do from experiencing pleasure right now.

This poses an interesting scenario: the question of trading off current reward versus imagined reward in a single decision (one marshmallow now versus two marshmallows in the future). In order for me to exchange the actual pleasure of one marshmallow now for the imagined pleasure of two, the immediate reward from imagining two marshmallows must be greater than the reward from eating one. In some situations that's the case, but in others it is not.
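That condition can be written as a single inequality. The symbols are introduced here purely as shorthand and are not established notation: $R(x)$ stands for the reward from actually experiencing $x$ now, and $I(x)$ for the immediate reward from imagining a future $x$. If both really are measured in the same units, as the single-quantity hypothesis suggests, then I wait for the two marshmallows only when

```latex
% Assumed shorthand: R = reward experienced now, I = immediate reward from imagining
\[
  I(\text{two marshmallows later}) \;>\; R(\text{one marshmallow now})
\]
```

When the inequality runs the other way, I eat the marshmallow in front of me.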

There's a complication to this: if I get so much pleasure from imagining future marshmallows, why wouldn't I eat the marshmallow now and imagine the future ones? I could go around imagining marshmallows all the time and get unlimited pleasure from it. There are reasons, though, why this wouldn't work: to be discussed in a future post.

As a reward for getting to the end - for those who did have Domino Rally as a kid, take a look at all the add-ons we couldn't afford: