Summary: Is strange “Löbian” self-reflective behavior a just theoretical symptom of assuming unbounded computational resources?
Previous in series: The Problem of IndignationBot, Part 2.
In Part 1, we defined IndignationBot to be an agent which defects against you iff it finds a proof that if it cooperates then you will defect. We also looked briefly at FairBot, an agent which cooperates with you if and only if it can prove that you cooperate. Both of these agents exhibit highly counterintuitive behavior:
- In the case of IndignationBot(CooperateBot), it seems like IndignationBot could not possibly prove that its defection would imply that CooperateBot defects… but in Part 2 we saw that it actually does, because of Löb’s Theorem.
- In the case of FairBot(FairBot), it seems on first pass that either C or D is a consistent outcome, and thus no proof that the outcome must be C could possibly exist. [[Stop and think about this now if you haven’t before]] Therefore, no such proof should be found, and the result should be D… but actually, the result is C, again by Löb’s Theorem:
Proof that FairBot(FairBot)=C
If we let p be the statement “FairBot(FairBot)=C”, the definition of FairBot implies that \[\quad \blacksquare p \rightarrow p.\] Moreover, this statement is provable from that definition, and so by Löb’s Theorem, p must itself be provable: i.e., FairBot(FairBot)=C.
This stuff is all very interesting, but the experienced reader will notice that these agents are not physically possible. In fact, they are not even Turing machines: no Turing machine is sufficiently general as to be capable of deciding the provability of arbitrary statements, especially statements about the outputs of arbitrary other Turing machines!
To put it another way, any real-world proof search would have to be bounded by some computational resource constraints, so we simply can’t implement any strategy that says “Cooperate iff X is provable”; instead, we’d have to say something like “Cooperate iff X has a proof of length at most K” for some K, or impose some other constraint on the proof complexity so that our agent will eventually halt instead of searching forever when no proof of X exists.
This raises the question: does all this self-reflective weirdness come from the infinite computational resources we have given our hypothetical agents? Consider how an infinite set can contain a strict subset of equal cardinality to itself (like Hilbert’s Infinite Hotel), whereas this is impossible for finite sets. Or how infinite-precision images can form fractals by containing self-similar sub-images, but no finite-resolution image can exhibit this phenomenon.
So perhaps an “infinite reasoner” can contain a self-model and reason itself “more completely” than any finite reasoner could, and maybe all this Löbian stuff won’t actually manifest for bounded reasoners.
For example, a Bounded FairBot (BFB) — say, one searching for proofs of length at most 10^100 — would not be able to prove that the mere existence of a proof that it cooperates would make it cooperate. It could show that
(Y) If a short proof that BFB(BFB) = C exists, then BFB(BFB) = C.
… but (Y) is not enough to apply Löb’s theorem, so we can’t conclude that BFB(BFB)=C.
This was my biggest complaint about the aforementioned Robust Cooperation paper… that the ‘agents’ it describes are not actually algorithms, but rather, non-computable abstract mathematical functions, defined implicitly by fixed-point conditions rather than explicitly by construction.
So, what do you think? Is infinity to blame here, or will bounded versions of FairBot and IndignationBot continue to exhibit the same behavior?
Next in series: The Problem of IndignationBot, Part 4.