Category Archives: AI Safety

FAQ

I get a lot of email, and unfortunately, template email responses are not yet integrated into the mobile version of Google Inbox. So, until then, please forgive me if I send you this page as a response! Hopefully it is better than no response at all.

Thanks for being understanding.

Continue reading

Time to spend more than 0.00001% of world GDP on human-level AI alignment

From an outside view, looking in at the Earth, if you noticed that human beings were about to replace themselves as the most intelligent agents on the planet, would you think it unreasonable if 1% of their effort were being spent explicitly reasoning about that transition? How about 0.1%?

Well, currently, world GDP is around \$75 trillion, and in total, our species is spending around \$9MM/year on alignment research in preparation for human-level AI (HLAI). That's \$5MM on technical research, distributed across 24 projects with a median annual budget of \$100k, and \$4MM on related efforts, such as recruitment and qualitative studies like this blog post, distributed across 20 projects with a median annual budget of \$57k. (I computed these numbers by tallying spending from a database I borrowed from Sebastian Farquhar at the Global Priorities Project, which uses a much more liberal definition of "alignment research" than I do.) I predict spending will at least roughly double in the next 1-2 years, and frankly, I'm underwhelmed…
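
For scale, that works out to \$9MM / \$75T $= 9 \times 10^{6} / 7.5 \times 10^{13} \approx 1.2 \times 10^{-7}$, i.e., about 0.000012% of world GDP, which rounds to the 0.00001% in the title.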

Continue reading

Open-source game theory is weird

I sometimes forget that not everyone realizes how poorly understood open-source game theory is, until I end up sharing this example and am reminded how weird it looks to folks seeing it for the first time. Since that's been happening a lot this week, I wrote this post to automate the process.

Consider a game where agents can view each other’s source codes and return either “C” (cooperate) or “D” (defect). The payoffs don’t really matter for the following discussion.

First, consider a very simple agent called “CooperateBot”, or “CB” for short, which cooperates with every possible opponent:

def CB(opp):
    # Ignore the opponent's source code and always cooperate.
    return "C"

(Here “opp” is the argument representing the opponent’s source code, which CooperateBot happens to ignore.)
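
To make the setup concrete, here is a toy harness for pitting two such agents against each other; the "play" helper is my own sketch, not part of the formal setup, and it uses Python's "inspect" module to hand each agent the other's source code as a string:

import inspect

def play(agent1, agent2):
    # Each agent sees the other's source code and returns "C" or "D".
    return agent1(inspect.getsource(agent2)), agent2(inspect.getsource(agent1))

print(play(CB, CB))  # -> ('C', 'C'), since CooperateBot ignores its input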

Next consider a more interesting agent, “FairBot”, or “FB” for short, which takes in a single parameter $k$ to determine how long it thinks about its opponent:

Continue reading

Abstract open problems in AI alignment, v.0.1 — for mathematicians, logicians, and computer scientists with a taste for theory-building

This page is a draft and will be updated in response to feedback and requests to include specific additional problems.

Through my work on logical inductors and robust cooperation of bounded agents, I’m meeting lots of folks in math, logic, and theoretical CS who are curious to know what contributions they can make, in the form of theoretical work, toward control theory for highly advanced AI systems. If you’re one of those folks, this post is for you!

Continue reading

Professional feedback form

Continue reading

Leveraging academia

Now that interest in AI alignment has started to build, I'm getting a lot more emails of the form "Hey, how can I get into this hot new field?" This is great. In the past, I received so few messages like this that I could respond to basically all of them with many hours of personal conversation.

But now I can’t respond to everybody anymore, so I have a new plan: leverage academia.

To grossly oversimplify things, here’s the heuristic. Continue reading

Seeking a paid part-time assistant for AI alignment research

Please share this if you think anyone you know might be interested.

Sometimes in my research I have to do some task on a computer that I could easily outsource, e.g., adding bibliographical data (publication dates, authors, etc.) to a list of papers. If you think you might be interested in trying some work like this, in exchange for

  • $20/hour, paid to you from my own pocket,
  • exposure to the research materials I’m working with, and
  • knowing you’re doing something helpful to AI alignment research, then
Continue reading

Interested in AI Alignment? Apply to Berkeley.

Summary: Researching how to control ("align") highly advanced future AI systems is now officially cool, and UC Berkeley is the place to do it.

Interested in AI alignment research? Apply to Berkeley for a PhD or postdoc (deadlines are approaching), or transfer into Berkeley from a PhD or postdoc at another top school. If you get into one of the following programs at Berkeley:

  • a PhD program in computer science, mathematics, logic, or statistics, or
  • a postdoc specializing in cognitive science, cybersecurity, economics, evolutionary biology, mechanism design, neuroscience, or moral philosophy,
… then I will personally help you find an advisor who is supportive of you researching AI alignment, and introduce you to other researchers in Berkeley with related interests.

This was not something I could confidently offer you two years ago. Continue reading

Seeking a paid personal assistant to create more x-risk research hours

My main bottleneck as a researcher right now is that I have various bureaucracies I need to follow up with on a regular basis, which reduce the number of long uninterrupted periods I can spend on research. I could really use some help with this. Continue reading

AI strategy and policy research positions at FHI (deadline Jan 6)

Oxford’s Future of Humanity Institute has some new positions opening up at their Strategic Artificial Intelligence Research Centre. I know these guys — they’re super awesome — and if you have the following three properties, then humanity needs you to step up and solve the future: Continue reading

The 2015 x-risk ecosystem

Summary: Because of its plans to increase collaboration and run training/recruiting programs for other groups, CFAR currently looks to me like the most valuable pathway per dollar donated for reducing x-risk, followed closely by MIRI and GPP+80k. Likewise, MIRI looks like the most valuable place for new researchers (funding permitting; see this post), followed very closely by FHI and CSER. Continue reading

Why I want humanity to survive — a holiday reflection

Life on Earth is almost 4 billion years old. During that time, many trillions of complex life forms have starved to death, been slowly eaten alive by predators or diseases, or simply withered away. But there has also been much joy, play, love, flourishing, and even creativity.

Continue reading

MIRI needs funding to scale with other AI safety programs

Summary: MIRI’s end-of-year fundraiser is on, and I’ve never been more convinced of what MIRI can offer the world. Continue reading

The Problem of IndignationBot, Part 4

Summary: I proved a parametric, bounded version of Löb’s Theorem that shows bounded self-reflective agents exhibit weird Löbian behavior, too. Continue reading

The Problem of IndignationBot, Part 3

Summary: Is strange "Löbian" self-reflective behavior just a theoretical symptom of assuming unbounded computational resources?

Continue reading

The Problem of IndignationBot, Part 2

Summary: Agents that can reason about their own source codes are weirder than you think.

Continue reading

The Problem of IndignationBot, Part 1

I like to state the Prisoner's Dilemma by saying that each player can destroy \$2 of the other player's utility in exchange for \$1 for himself. Writing "C" and "D" for "cooperate" and "defect", we have the following:
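
Taking each player's baseline payoff as \$0 (a normalization chosen here just for concreteness), that rule pins down the payoffs, written as (Player 1, Player 2):

  • (C, C): (\$0, \$0)
  • (C, D): (-\$2, +\$1)
  • (D, C): (+\$1, -\$2)
  • (D, D): (-\$1, -\$1)

Defecting is worth exactly \$1 more than cooperating no matter what the other player does, yet mutual defection leaves both players at -\$1 instead of \$0. Continue reading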