The Problem of IndignationBot, Part 1

I like to state the Prisoner’s Dilemma by saying that each player can, by defecting, destroy \$2 of the other player’s utility in exchange for \$1 for himself. Writing “C” and “D” for “cooperate” and “defect”, and listing each entry as (P1’s payout, P2’s payout), we have the following:

Payout Matrix
P1: \ P2:	C	D
C	(\$2, \$2)	(\$0, \$3)
D	(\$3, \$0)	(\$1, \$1)
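
As a quick sanity check on the “destroy \$2 for \$1” description, here is a minimal Python sketch. The names `PAYOFF` and `defection_effect` are made up for illustration. It takes rows as P1’s move and columns as P2’s move, and computes how both payouts change when P1 switches from C to D while P2’s move stays fixed.

```python
# Payouts indexed by (P1's move, P2's move) -> (P1's payout, P2's payout).
# PAYOFF and defection_effect are illustrative names, not from any library.
PAYOFF = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}


def defection_effect(p2_move):
    """Change in (P1, P2) payouts when P1 switches from C to D, P2's move held fixed."""
    c1, c2 = PAYOFF[("C", p2_move)]
    d1, d2 = PAYOFF[("D", p2_move)]
    return (d1 - c1, d2 - c2)


for p2_move in ("C", "D"):
    # In both cases P1 gains 1 and P2 loses 2.
    print(p2_move, defection_effect(p2_move))
```

Whatever P2 does, P1’s defection changes the payouts by (+1, -2), which is exactly the one-sentence statement of the game above.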

There’s an extremely interesting version of the Prisoner’s Dilemma which I like to call the “Open-Source Prisoner’s Dilemma” wherein each player is an algorithm that can read the other player’s source code (and his own source code, via “Quining”) before returning either C or D.

Then you can consider such interesting agents as “FairBot”, which cooperates with you if it figures out that you’re going to cooperate with it:

def FairBot(Opponent):
    search for a proof that Opponent(FairBot) = C
    if found:
        return C
    else:
        return D

…or agents as trivial as “CooperateBot”, which always cooperates:

def CooperateBot(Opponent):
    return C
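
To make the setup concrete, here is a rough Python sketch of how a match between two such agents might be run. It rests on some loud assumptions. Passing the opponent’s function object stands in for handing over its source code. `play`, `cooperate_bot`, and `defect_bot` are hypothetical names, and `defect_bot` is an extra agent added purely for contrast. Proof-searching agents like FairBot are deliberately left out, since a real proof search is well beyond a few lines.

```python
# Hypothetical harness: the opponent's function object stands in for its
# source code. Agents that search for proofs (e.g. FairBot) are NOT modeled.
PAYOFF = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}


def cooperate_bot(opponent):
    """Always cooperates, ignoring the opponent entirely."""
    return "C"


def defect_bot(opponent):
    """Always defects (illustrative agent, added here for contrast)."""
    return "D"


def play(agent1, agent2):
    """Give each agent its opponent, collect both moves, and look up the payouts."""
    move1 = agent1(agent2)
    move2 = agent2(agent1)
    return PAYOFF[(move1, move2)]


print(play(cooperate_bot, cooperate_bot))  # (2, 2)
print(play(cooperate_bot, defect_bot))     # (0, 3)
```

The point of the sketch is only the shape of the game: each agent is a function of its opponent, and the outcome is determined by the pair of returned moves.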

In an awesome paper called Robust Cooperation in the Prisoner’s Dilemma, FairBot and another more complex agent called “PrudentBot” are shown to attain mutual cooperation in some very surprising ways. But here, I’m going to focus on another surprising agent I discovered, which I’ll call “IndignationBot”, or “IB” for short:


def IB(Opp):
    search for a proof that 
        (IB(Opp) = C) → (Opp(IB) = D)
    if found:
        feel a sense of righteous indignation
        return D
    else:
        return C

The intuition here is that if IndignationBot is about to be nice to you and realizes that, even then, you’d be mean to it, then it finds this particularly distasteful and defects against you as punishment.

In the “Modal Agent” notation of the Robust Cooperation paper, if we write $A$ for “IB(Opp)=C”, $B$ for “Opp(IB)=C”, and $\blacksquare X$ for “X is provable (in whatever proof system we’re using)”, then IndignationBot is characterized by the relation \[A \leftrightarrow \neg\blacksquare (A\rightarrow \neg B)\]
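
It may help to unpack this relation a little. What follows is only a restatement under the same definitions, not a new result. Since $B$ stands for “Opp(IB)=C”, the statement “Opp(IB)=D” is $\neg B$. Negating both sides of the relation gives the condition under which IndignationBot defects: \[\neg A \leftrightarrow \blacksquare (A\rightarrow \neg B)\] This matches the pseudocode: IB returns D exactly when the proof search succeeds. For comparison, and assuming the same notation carries over, write $A'$ for “FairBot(Opp)=C” and $B'$ for “Opp(FairBot)=C”. FairBot would then be characterized by \[A' \leftrightarrow \blacksquare B'\] while for CooperateBot the statement “CooperateBot(Opp)=C” simply holds unconditionally.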

Question: What is IndignationBot(CooperateBot)?

I claim that the answer is surprising, and sufficiently interesting that I won’t spoil it in this post…

Next in series: The Problem of IndignationBot, Part 2.
