The prisoner’s dilemma and Newcomb’s problem

The prisoner’s dilemma describes any situation where two players have to independently choose one of two actions (usually referred to as “cooperate” and “defect”) such that each player’s rewards for the possible outcomes are ordered like this:

(me defect, you cooperate) > (both cooperate) > (both defect) > (me cooperate, you defect)

Think of two prisoners accused of killing someone; they are kept in separate cells, and each is given the chance to betray the other by testifying against her. If just one of them betrays the other, the betrayer walks free while the betrayed is sentenced to 10 years; if neither betrays the other, they’re both sentenced to 2 years; and if both betray each other, they each get 8 years (10 years for murder minus 2 years for cooperating with the investigation).
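
To keep the numbers straight, here is that payoff table as a small Python sketch (the action names and dictionary layout are my own; the sentences are the ones from the example):

```python
# Years in jail for (my action, the other's action); lower is better.
sentence = {
    ("defect",    "cooperate"):  0,   # I betray alone and walk free
    ("cooperate", "cooperate"):  2,   # neither betrays
    ("defect",    "defect"):     8,   # both betray
    ("cooperate", "defect"):    10,   # I'm betrayed and take the full sentence
}
```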

The interesting thing about this situation is that while it might at first glance seem like they should cooperate, it’s easy to see that since they can’t influence each other (so that the other’s decision behaves like a constant), each prisoner is in fact faced with one of these two situations; she just doesn’t know which one:

if the other cooperates: (me defect, you cooperate) > (both cooperate)

if the other defects: (both defect) > (me cooperate, you defect)

And crucially, in both cases the prisoner is clearly better off if she defects. So the rational choice for both of them is to defect, netting them each 8 years in jail, compared to the 2 years they would each have gotten by cooperating.
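
We can check this dominance argument directly; a minimal sketch (the table is repeated so the snippet runs on its own):

```python
# Same sentence table as above: years in jail for (my action, the other's).
sentence = {("defect", "cooperate"): 0, ("cooperate", "cooperate"): 2,
            ("defect", "defect"): 8, ("cooperate", "defect"): 10}

# Whatever the other player does, compare my two options directly.
for other in ("cooperate", "defect"):
    best = min(("cooperate", "defect"), key=lambda me: sentence[(me, other)])
    print(f"if the other plays {other}, my best response is {best}")
# Prints "defect" both times: defection strictly dominates cooperation.
```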

Now consider what would happen if the two players are very similar (for example, if they’re two instances of the same computer program) and they know this fact. Then the cases where they don’t make the same decision go away, leaving us with just

(both cooperate) > (both defect)

So in this case, the rational choice is to cooperate! This can work even if the other player mirrors your decision with less than 100% probability; whether it does depends on that probability and on the relative rewards for the four cases.
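
To see how that threshold falls out, here is a sketch (the function name is my own) computing expected jail time when the other player mirrors your choice with probability p:

```python
# Same sentence table as above: years in jail for (my action, the other's).
sentence = {("defect", "cooperate"): 0, ("cooperate", "cooperate"): 2,
            ("defect", "defect"): 8, ("cooperate", "defect"): 10}

def expected_years(mine: str, p: float) -> float:
    """Expected sentence when the other player mirrors my choice with probability p."""
    unmirrored = "defect" if mine == "cooperate" else "cooperate"
    return p * sentence[(mine, mine)] + (1 - p) * sentence[(mine, unmirrored)]

for p in (0.5, 0.625, 0.8, 1.0):
    print(f"p={p}: cooperate -> {expected_years('cooperate', p):.2f} years, "
          f"defect -> {expected_years('defect', p):.2f} years")
# With these rewards, cooperating is better exactly when p > 10/16 = 0.625.
```

With the example’s sentences, then, defecting only stays rational as long as the other player mirrors you less than 62.5% of the time.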

Compare this to Newcomb’s problem, where an adversary has placed a small reward in one box (“A”), and has placed a large reward in another box (“B”) if and only if it has predicted that you will not open the first box; you are then given the choice of opening either or both boxes and taking what’s inside. Your rewards are ordered just like in a prisoner’s dilemma:

(open both, B full) > (open just B, B full) > (open both, B empty) > (open just B, B empty)

No matter what the adversary has put in box B, you get a bigger reward if you open both boxes (known as “two-boxing”) than if you open just one. However, the problem states that the adversary is very good at predicting what you will do, which eliminates the (open both, B full) and (open just B, B empty) cases, leaving us with just

(open just B, B full) > (open both, B empty)

So if the adversary can predict you perfectly, you should one-box instead of two-box! And just like in the prisoner’s dilemma, if the adversary predicts your action correctly less than 100% of the time, the right choice depends on that probability and on the relative rewards in boxes A and B.
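
As a sketch of that calculation (the dollar amounts are my own assumption, since the text doesn’t fix the rewards; I use the commonly quoted $1,000 in box A and $1,000,000 in box B), computing expected rewards under the article’s assumption that the prediction is correlated with your actual choice:

```python
A, B = 1_000, 1_000_000  # assumed rewards: small prize in A, large prize in B

def expected_reward(one_box: bool, q: float) -> float:
    """Expected payout when the adversary predicts your choice correctly with probability q."""
    if one_box:
        return q * B            # B was filled only if one-boxing was predicted
    return A + (1 - q) * B      # A is always yours; B is full only on a misprediction

for q in (0.5, 0.6, 0.9, 1.0):
    print(f"q={q}: one-box -> {expected_reward(True, q):,.0f}, "
          f"two-box -> {expected_reward(False, q):,.0f}")
# One-boxing wins exactly when q > (A + B) / (2 * B), here just over 0.5.
```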