Monday, 28 March 2016

Discussion 2 of 3: No spooky action at a distance - a theory of reward

One of the most powerful ideas in physics is the principle of locality. This principle insists that objects can only be influenced by other objects that touch them. Two items separated by a distance cannot directly exert any force or influence on each other, but must communicate via some medium which physically transmits the force from one to the other.

Albert Einstein summed up this principle by rejecting "spooky action at a distance", and it applies to his theory of gravity as well as all the other physical forces (it gets more complicated when we consider quantum mechanics, but that would take a whole other article). The Scottish physicist James Clerk Maxwell also used it in developing his theory of electromagnetism.

Take two magnets. Instead of directly pushing or pulling each other, each magnet creates an electromagnetic field in the space around it, and changes in that field spread outward at the speed of light. When another magnet passes through the field, it is affected because the field has already reached the point where that magnet is. The two objects are not directly attracting each other; the force is mediated by the electromagnetic field.

A more familiar example is a chain of dominoes. The last domino doesn't fall as a result of the first one being knocked over. It falls because the second last one is knocked over. That in turn happens because of the one before it. The first and last domino can only affect each other because of all the other dominoes in between.

A different way to think about this principle is in time instead of space: the ultimate result of an event can never be a cause of how the event happens. The first domino doesn't fall down because the last one is going to; indeed, if you ever played Domino Rally as a kid, you will know that when the first one falls, it's often entirely unpredictable whether the last one will also go. Each domino only has a local field of influence: just it and the ones that can physically touch it.

What if we apply this principle of locality to economic decision-making?

We are used to talking about decisions based on "future reward" or "future returns". I don't eat the marshmallow now, so I can have two of them later; I save money now because I will receive interest next year; I exercise today so I can enjoy being healthier in later life.

But can a future event actually cause an event in the present? Can my future health cause me to exercise today? Surely not – that would be time travel.

The principle of locality says that a decision I make today can only be based on stimuli and causes right here, right now in the moment of the decision. Future events cannot influence it. (Neither, for that matter, can past events, or costs and benefits that take place outside of my direct experience.) My brain and the physical atoms that make it up only know about immediate, present influences. Thus, my decision can only be affected by feelings, rewards and costs that I experience at the time of making it.

In practice, however, we do regularly make decisions to defer gratification. We appear to take into account outcomes that happen later, or outside of the decision context (just as the magnets really do attract each other, even though they are not touching each other). How is this paradox resolved?

The key here is to understand how the outcomes are mediated – how those future benefits can indirectly influence the present. The future reward does not directly cause my decision today. My 65-year-old self does not reach into his past and make a pension contribution. Instead, it's my current feelings and beliefs about the future reward that matter. I can only take that future reward into account if I get some kind of immediate payoff for doing so.

That payoff might be the feeling of security that comes from knowing that my retirement is being provided for. It could be the positive feeling of going along with socially acceptable behaviour. Conversely, the guilt associated with eating a doughnut may stop me eating one. All of these are feelings experienced now, by my present self, even though they are based on what might happen in the future. Whatever it is, I need to get something now to make me act now.

Even though those feelings and beliefs are related to the future, they still cannot directly be caused by future benefits. My feeling of security isn't actually a result of my future comfortable retirement. It's a result of me imagining now what my retirement might be like. My brain has to be able to predict the future, and somehow take an action now based on imagining something good in the future.

This leads to an important conclusion. The brain must have a mechanism for forecasting future outcomes. Having made its forecast, it must be able to produce a present value for each of them – converting it into some immediate force that can influence current decisions. It makes sense to believe that whichever outcome produces the highest immediate force will be chosen by the decision maker.

So, instead of picking options based on which one brings the highest predicted reward, the brain chooses whichever makes the highest immediate impact at the time the decision is made. The size of this impact is certainly related in some way to anticipated reward, but is not the same thing. It is calculated by some mental mechanism that predicts decision outcomes. An obvious research question which follows from this is: how does this brain function translate one quantity (anticipated reward) into another (immediate influence on decisions)?
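To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the option names, the reward numbers, and especially the `felt_weight` table, which stands in for whatever unknown mental mechanism translates anticipated reward into immediate influence. It only shows that the two selection rules can disagree.

```python
from typing import Callable, Dict

def choose(options: Dict[str, float],
           impact: Callable[[str, float], float]) -> str:
    """Return the option whose immediate impact at decision time is largest."""
    return max(options, key=lambda name: impact(name, options[name]))

# Hypothetical anticipated rewards, in arbitrary made-up units.
anticipated = {"eat doughnut now": 5.0, "go for a run": 8.0}

# Hypothetical weights: how strongly each forecast is felt right now.
# This is a placeholder for the unknown translation mechanism.
felt_weight = {"eat doughnut now": 1.0, "go for a run": 0.5}

def immediate_impact(name: str, reward: float) -> float:
    """Assumed translation: scale the forecast by how vividly it is felt."""
    return felt_weight[name] * reward

by_reward = max(anticipated, key=anticipated.get)   # highest predicted reward
by_impact = choose(anticipated, immediate_impact)   # highest immediate impact

print(by_reward)  # go for a run
print(by_impact)  # eat doughnut now
```

With these invented numbers the run has the higher anticipated reward (8 versus 5), but the doughnut has the higher immediate impact (5 versus 4), so it wins the decision. A simple multiplicative weight is only one possible form for the translation; the argument does not depend on it.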

My last post suggests a possible mechanism by which this could happen. The mind contains an associative network that makes a model of the world, tests out actions and their consequences, and estimates the amount of reward that is likely to be generated. That's just one hypothesis of how this process could work; whatever the mechanism in reality, there must be some process that can estimate which of two anticipated outcomes is better.

That insight leads to a very important question. We know the mind has the ability to experience pleasure from receiving certain sensations. But does it have two separate mechanisms: one for experiencing actual pleasure, and another for weighing up anticipated pleasure in order to choose between two options? If the pleasure I gain from actually eating a doughnut is measured in (for example) micrograms of dopamine, in what units do we measure the anticipated pleasure when I imagine eating the doughnut?

Neuroscience (e.g. this 2014 paper by Linnet) and Occam's razor both suggest an answer with far-reaching consequences. The simplest explanation, and the one that requires the least neural machinery in the brain, is to assume that there is a single quantity in the decision process that does both duties: evaluating immediate sensations and evaluating anticipated outcomes. In other words, we get exactly the same kind of reward from thinking about future pleasure, as we do from experiencing pleasure right now.

This poses an interesting scenario: the question of trading off current reward versus imagined reward in a single decision (one marshmallow now versus two marshmallows in the future). In order for me to exchange the actual pleasure of a marshmallow now for the imagined pleasure of two, the immediate reward from imagining two marshmallows must be greater than the reward from eating one. In some situations that's the case, but in others it is not.
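The condition can be written down as a small Python sketch. The `vividness` parameter is purely hypothetical: it stands for how much of the future reward I actually feel today when I imagine it, and the numbers are invented. Both rewards are measured in the same unit, in line with the single-quantity assumption.

```python
def waits(reward_one_now: float, reward_two_later: float,
          vividness: float) -> bool:
    """True if imagining the later reward outweighs eating the marshmallow now.

    vividness is an assumed factor between 0 and 1: the share of the future
    reward that is felt in the present when it is imagined.
    """
    imagined_now = vividness * reward_two_later
    return imagined_now > reward_one_now

print(waits(1.0, 2.0, 0.8))  # True: imagined reward 1.6 beats 1.0
print(waits(1.0, 2.0, 0.3))  # False: imagined reward 0.6 does not beat 1.0
```

Under these assumptions the same person, facing the same two marshmallows, waits in one situation and not in the other, purely because of how strongly the future reward is felt at the moment of deciding.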

There's a complication to this: if I get so much pleasure from imagining future marshmallows, why wouldn't I eat the marshmallow now and imagine the future ones? I could go around imagining marshmallows all the time and get unlimited pleasure from it. There are reasons, though, why this wouldn't work: to be discussed in a future post.

As a reward for getting to the end - for those who did have Domino Rally as a kid, take a look at all the add-ons we couldn't afford:

4 comments:

said...

OK, well, I admire the effort but it is all over the map. How's about we keep extreme physics out of it and focus on seeking and consumption behavior in any animal? You know they are separate things, right?

Let's start with there is no such thing as choice, free will or conscious control of anything, in humans, or in any other animals, duh. Humans have no more ability to comprehend and consciously choose their behavior than bacteria, of course. The essence of all magical thinking is the promise/lie of "Mind over matter." There is no mind, only matter.

Look, the DA system is really complicated and even the real brain scientists, let alone the pretend neuroeconomics ones, don't have it dialed in - in fruit flies, let alone mammals, let alone... "Neuroeconomist": someone who is not a trained neurologist, pretending while paid by the econ dept. Really dishonest.

Read Paul Cisek, who debunks "cognitivism" easily. Cognitive models are just magical human exceptionalism dressed up in pop science, ho hum... But, like all magical ideas, it sells really well.

said...

Kudos, that is a nice paper. Hadn't seen it. Let me study.

said...

Oops, Frontiers is a pretty speculative publication... the "reward" model of dopamine is probably off, but economists like it because they can twist it to support utility theories and incentives... bass ackwards as econ always is... DA is the "seeking" ligand. The W. Schultz model is pretty need to go on Research Gate and get better papers.

Throw out the economists, they are selling ideology. John Salamone does the best work on DA but it is frickin' complicated. DA is really hard.

Impulsive, self-harming behaviors are a little easier...still it all happens in milliseconds so that is very hard....grrrrrrr

Leigh Caldwell said...

Thanks for the thoughtful comments. The physics is just an analogy, but its usefulness is a matter of opinion. On your other points, we probably agree more than we disagree.

Free will: yes, I'm with you (though I don't expect to change anyone's mind on that fraught topic).

Human vs animals: absolutely. This associative forecasting mechanism, I'd speculate, is present in other animals, and the difference in humans is a matter of degree not kind.

Neuroscience generally: I'm not widely read in this field and am grateful for your input. I'm making more of a philosophical argument than a biological one, and don't intend to make any strong claims about dopamine as such. It makes a lot of sense that dopamine would be a tool for motivation more than reward (indeed, perhaps those two things are merely different descriptions of the same phenomenon).

Neuroeconomics: I've been very sceptical about this field but I'm starting to see where it can have some value (i.e. not in fMRI-derived speculation about "the part of the brain responsible for..."). I believe a neuroeconomics with suitable built-in humility can be useful in helping us understand what kind of brain mechanisms are plausible and which are unlikely, which in turn provides constraints on the types of decision models we might choose to believe in.

On seeking and consumption (if I parse your sentence correctly, it's these two you're saying are "separate things" - not physics and behaviour?) The argument I'll develop further in the next post or two is that these are not necessarily fully separate after all. But if there's evidence I'm wrong, I'd be keen to see it and save some time.

Thanks - look forward to your further comments. Your blog is interesting too.