My new paper, coauthored with Yohan John, Dakota McCoy and Oliver Braganza, is out in Behavioral and Brain Sciences.

"Dead rats, dopamine, performance metrics and peacock tails" is about the universal emergence of proxy failure. When you measure and incentivise performance by a single metric (a proxy), the proxy will always become a worse measure of performance than it was before you added the incentive.

This effect is known as Goodhart's Law in economics, and goes by other names in other fields, but the same underlying process drives it in every case. Our paper studies it in management, economics, biology, neuroscience and other areas.
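
A toy simulation can make the idea concrete. The sketch below is not the model from the paper. It assumes a hypothetical population of agents whose true performance (the goal) is a random number, a proxy that measures the goal with a little noise, and an incentive that lets each agent inflate the proxy by an amount unrelated to true performance. The names `simulate` and `pearson`, and all the numbers, are made up for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_agents=2000, seed=0):
    """Return (correlation before incentive, correlation after incentive)."""
    rng = random.Random(seed)

    # Assumption: true performance (the goal) is uniform on [0, 1).
    goal = [rng.random() for _ in range(n_agents)]

    # Before the incentive, the proxy is the goal plus small measurement noise.
    proxy_before = [g + rng.gauss(0, 0.1) for g in goal]

    # After the incentive, each agent also inflates the proxy by a "gaming"
    # amount that is independent of true performance (an assumption).
    proxy_after = [g + rng.gauss(0, 0.1) + rng.uniform(0, 3) for g in goal]

    return pearson(goal, proxy_before), pearson(goal, proxy_after)


if __name__ == "__main__":
    before, after = simulate()
    print(f"proxy-goal correlation before incentive: {before:.2f}")
    print(f"proxy-goal correlation after incentive:  {after:.2f}")
```

In this sketch the proxy tracks the goal closely until the incentive arrives, after which most of the variation in the proxy comes from gaming rather than from performance. Real systems are messier, since gaming usually competes with genuine effort for the same resources, but the direction of the effect is the same.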

Recent concerns about AI alignment are closely related to this phenomenon. The paperclip problem is a good example - if a sufficiently clever AI is given a single goal, to produce as many paperclips as possible, it may eventually destroy all of humanity and take over the whole universe in its efforts to maximise output.

Fortunately, our paper identifies a few constraints that can stop systems from running completely out of control - we might still be safe from our Clippy overlords.

The journal is currently inviting commentary proposals on the article. A number of authors will then be invited to write commentaries, which will be published in the journal alongside the article. Please do feel free to sign up on the journal website and submit a proposal if you have something to add to the conversation.

It has been a long and very satisfying process for the four of us to put this paper together, so I hope you enjoy it - I would love to hear your informal thoughts if you don't want to go so far as submitting a formal response to the journal.

If you can't download the paper via the link above, please let me know and I can send you a preprint copy.


