Summary: I proved a parametric, bounded version of Löb’s Theorem that shows bounded self-reflective agents exhibit weird Löbian behavior, too.
Previous in series: The Problem of IndignationBot, Part 3.
As it turns out, agents that bound the lengths of the proofs they consider still exhibit weird Löbian behavior, which includes mutual cooperation of FairBots, and defection of IndignationBot with CooperateBot (and everyone else, for that matter). So, in short, infinite computational resources are not to blame for the weirdness. Moreover, I now suspect that the phenomenon is not confined to formal logic, either, but could manifest in many other reasoning systems.
A number of people I know told me they expected bounded versions of these agents to exhibit “Löbian cooperation” and the like, but I had never seen a satisfactory proof of the claim, and I was honestly a little suspicious: sometimes when people believe something but they haven’t written it up, it’s because it’s hard to write up… and sometimes, things are hard to write up because they’re false.
But anyway, I was able to write up a proof myself, and now I’m satisfied. It’s via a result I’m calling “Parametric Bounded Lob’s Theorem” because it’s a version of Löb’s theorem that works with quantified parameters and bounded proof lengths:
- If you let FairBot_k search for proofs up to length k, then for large k, FairBot_k(FairBot_k)=Cooperate, and similarly, IndignationBot_k(CooperateBot)=Defect.
- If you let FairBot_k search for proofs up to length k plus a small term adjusting for the size of its input, then actually for all large m,n past some threshold, FairBot_m(FairBot_n)=Cooperate.
I’ve made a fair effort to make the paper readable, and will probably try to publish it sometime in the coming year, so any comments on it are extremely welcome! Please send any suggestions to my work email, [my last name] at intelligence.org 🙂
Note to commenters:I know that there is some previous work by Pavel Pudlak on finitistic consistency statements, and even some blog posts on LessWrong in a similar vein, but they weren’t quite applicable to show the robust cooperation of the agents in question here because the agents often included a reference to the lengths of proofs being written about them. If not for that glitch, I probably could have just cited Pudlak for fairly quick proof, and most of the work I had to do (aside from learning the basics of the field!) was to work around that.
Anyway, I still find this all very strange. And, for what it’s worth, after staring long and hard enough at how Lob’s Theorem actually works to prove a generalization of it, I now suspect that it is not merely a bug-feature of formal logical systems, but rather a more robust fixed-point-finding phenomenon that could manifest in many other media, even in neural networks designed for abstract reasoning. This claim is much harder to cash out formally, of course, but if you happen to be on the cusp of making potentially-self-reflective-neural-net-based-AI, I am happy to work through the structure of the argument to help you figure out if it’s something you should worry about!