Category Archives: AI Safety


I get a lot of email, and unfortunately, template email responses are not yet integrated into the mobile version of Google Inbox. So, until then, please forgive me if I send you this page as a response! Hopefully it's better than no response at all.

Thanks for being understanding.

Continue reading

Time to spend more than 0.00001% of world GDP on human-level AI alignment

From an outside view, looking in at the Earth, if you noticed that human beings were about to replace themselves as the most intelligent agents on the planet, would you think it unreasonable if 1% of their effort were being spent explicitly reasoning about that transition? How about 0.1%?

Well, currently, world GDP is around \$75 trillion, and in total, our species is spending around \$9MM/year on alignment research in preparation for human-level AI (HLAI). That’s \$5MM on technical research distributed across 24 projects with a median annual budget of \$100k, and \$4MM on related efforts, like recruitment and qualitative studies such as this blog post, distributed across 20 projects with a median annual budget of \$57k. (I computed these numbers by tallying spending from a database I borrowed from Sebastian Farquhar at the Global Priorities Project, which uses a much more liberal definition of “alignment research” than I do.) I predict spending will at least roughly double in the next 1-2 years, and frankly, I’m underwhelmed…
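As a sanity check on the figure in the title, the arithmetic is a one-liner (the dollar figures are the post's; the variable names are mine):

```python
# Back-of-envelope check of the title's "0.00001% of world GDP" figure.
world_gdp = 75e12          # ~$75 trillion/year (post's figure)
alignment_spending = 9e6   # ~$9MM/year across all tallied projects

share = alignment_spending / world_gdp  # 1.2e-07
percent = 100 * share                   # ~0.000012%, i.e., on the order of 0.00001%
```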

Continue reading

Open-source game theory is weird

I sometimes forget that not everyone realizes how poorly understood open-source game theory is, until I end up sharing this example and remember how weird it is for folks to see it for the first time. Since that’s been happening a lot this week, I wrote this post to automate the process.

Consider a game where agents can view each other’s source codes and return either “C” (cooperate) or “D” (defect). The payoffs don’t really matter for the following discussion.

First, consider a very simple agent called “CooperateBot”, or “CB” for short, which cooperates with every possible opponent:

def CB(opp):
    # Ignore the opponent's source code and cooperate unconditionally.
    return "C"

(Here “opp” is the argument representing the opponent’s source code, which CooperateBot happens to ignore.)

Next consider a more interesting agent, “FairBot”, or “FB” for short, which takes in a single parameter $k$ to determine how long it thinks about its opponent:
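Before getting to the real FairBot, it is worth seeing why the naive approach fails. Purely as an illustrative contrast (this construction is mine, not the post's, and it passes opponents around as callables rather than literal source code), here is what happens if you implement "cooperate iff my opponent cooperates with me" by direct simulation, with `k` as a recursion budget:

```python
def CB(opp):
    # CooperateBot: ignore the opponent and cooperate.
    return "C"

def make_FB(k):
    # Naive simulation-based "FairBot": cooperate iff simulating the
    # opponent against a smaller-budget copy of myself yields "C".
    # NOT the proof-based FairBot -- just a strawman for contrast.
    def FB(opp):
        if k <= 0:
            return "D"  # out of simulation budget: give up and defect
        return "C" if opp(make_FB(k - 1)) == "C" else "D"
    return FB

fb = make_FB(10)
print(fb(CB))  # "C": simulating CooperateBot is easy
print(fb(fb))  # "D": two simulation-based FairBots regress down to the budget floor
```

The mutual defection in the last line is the interesting part: naive simulation bottoms out before either agent can certify the other's cooperation, which is why a proof-based approach is worth considering at all.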

Continue reading

Abstract open problems in AI alignment, v.0.1 — for mathematicians, logicians, and computer scientists with a taste for theory-building

This page is a draft and will be updated in response to feedback and requests to include specific additional problems.

Through my work on logical inductors and robust cooperation of bounded agents, I’m meeting lots of folks in math, logic, and theoretical CS who are curious to know what contributions they can make, in the form of theoretical work, toward control theory for highly advanced AI systems. If you’re one of those folks, this post is for you!

Continue reading

Professional feedback form

Leveraging academia

Since a lot of interest in AI alignment has started to build, I’m getting a lot more emails of the form “Hey, how can I get into this hot new field?”. This is great. In the past I was getting so few messages like this that I could respond to basically all of them with many hours of personal conversation.

But now I can’t respond to everybody anymore, so I have a new plan: leverage academia.

To grossly oversimplify things, here’s the heuristic. If the typical prospective researcher (say, an inbound grad student at a top school) needs 100 hours of guidance/mentorship to become a productive contributor to AI alignment research, maybe only 10 of those hours need to come from someone already in the field, and the remaining 90 hours can come from other researchers in CS/ML/math/econ/neuro/cogsci. So if I have 100 hours of guidance to give this year, I can choose between mentoring 1 person, or 10 people who are getting 90% of their guidance elsewhere. The latter produces more researchers, and potentially researchers of a higher quality because of the diversity of views they’re seeing (provided the student has the filter-out-incorrect-views property, which is of course critical). So that’s what I’m doing, and this blog post is my generic response to questions about how to get into AI alignment research 🙂
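The arithmetic behind the heuristic above, spelled out (the 100-hour and 10% figures are the post's estimates; the names are mine):

```python
# Mentorship-leverage arithmetic from the 100-hours heuristic.
HOURS_PER_RESEARCHER = 100  # total guidance a newcomer needs (post's estimate)
IN_FIELD_SHARE = 0.10       # fraction that must come from someone in the field
MY_BUDGET = 100             # hours of guidance I can give this year

solo_mentees = MY_BUDGET / HOURS_PER_RESEARCHER                          # 1.0
leveraged_mentees = MY_BUDGET / (HOURS_PER_RESEARCHER * IN_FIELD_SHARE)  # 10.0
```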

I think this policy might also be a good filter for good-team-players, in the following sense: When you’re part of a team, it’s quite helpful if you can leverage resources outside your team to solve the team’s problems without having to draw heavily on the team’s internal resources. Thus, if you want to be part of a new/young field like AI alignment, it’s nice if you can draw on resources outside that field to make it stronger.

So! If I send you a link to this blog post, please don’t read me as saying “I don’t have any advice for you.” Because I do have some advice: aside from going to grad school and deliberately learning from it, and choosing Berkeley for your PhD/postdoc (or transferring there), I’m also advising that you acquire and demonstrate the quality of drawing from non-scarce resources to help produce scarce ones. Use non-scarce resources to decide what to learn (e.g., read this blog post by Jan Leike); use non-scarce resources to learn that stuff (e.g., college courses, online lectures, books); and use non-scarce resources to demonstrate what you’ve learned (e.g., standardized tests, competitions, publications), at least up to the point where you get admitted as a grad student to a top school. And if that school is Berkeley, I will help you find an advisor!

Seeking a paid part-time assistant for AI alignment research

Please share this if you think anyone you know might be interested.

Sometimes in my research I have to do some task on a computer that I could easily outsource, e.g., adding bibliographical data to a list of papers (e.g., when they were written, who the authors were, etc.). If you think you might be interested in trying some work like this, in exchange for

  • $20/hour, paid to you from my own pocket,
  • exposure to the research materials I’m working with, and
  • knowing you’re doing something helpful to AI alignment research, then
Continue reading

Interested in AI Alignment? Apply to Berkeley.

Summary: Researching how to control (“align”) highly-advanced future AI systems is now officially cool, and UC Berkeley is the place to do it.

Interested in AI alignment research? Apply to Berkeley for a PhD or postdoc (deadlines are approaching), or transfer into Berkeley from a PhD or postdoc at another top school. If you get into one of the following programs at Berkeley:

  • a PhD program in computer science, mathematics, logic, or statistics, or
  • a postdoc specializing in cognitive science, cybersecurity, economics, evolutionary biology, mechanism design, neuroscience, or moral philosophy,
… then I will personally help you find an advisor who is supportive of you researching AI alignment, and introduce you to other researchers in Berkeley with related interests.

This was not something I could confidently offer you two years ago. Continue reading

Seeking a paid personal assistant to create more x-risk research hours

My main bottleneck as a researcher right now is that I have various bureaucracies I need to follow up with on a regular basis, which reduce the number of long uninterrupted periods I can spend on research. I could really use some help with this. Continue reading

AI strategy and policy research positions at FHI (deadline Jan 6)

Oxford’s Future of Humanity Institute has some new positions opening up at their Strategic Artificial Intelligence Research Centre. I know these guys — they’re super awesome — and if you have the following three properties, then humanity needs you to step up and solve the future: Continue reading

The 2015 x-risk ecosystem

Summary: Because of its plans to increase collaboration and run training/recruiting programs for other groups, CFAR currently looks to me like the most valuable pathway per dollar donated for reducing x-risk, followed closely by MIRI and GPP+80k. Likewise, MIRI looks like the most valuable place for new researchers (funding permitting; see this post), followed very closely by FHI and CSER. Continue reading

Why I want humanity to survive — a holiday reflection

Life on Earth is almost 4 billion years old. During that time, many trillions of complex life forms have starved to death, been slowly eaten alive by predators or diseases, or simply withered away. But there has also been much joy, play, love, flourishing, and even creativity.

Continue reading

MIRI needs funding to scale with other AI safety programs

Summary: MIRI’s end-of-year fundraiser is on, and I’ve never been more convinced of what MIRI can offer the world. Continue reading

The Problem of IndignationBot, Part 4

Summary: I proved a parametric, bounded version of Löb’s Theorem that shows bounded self-reflective agents exhibit weird Löbian behavior, too. Continue reading

The Problem of IndignationBot, Part 3

Summary: Is strange “Löbian” self-reflective behavior just a theoretical symptom of assuming unbounded computational resources?

Continue reading

The Problem of IndignationBot, Part 2

Summary: Agents that can reason about their own source codes are weirder than you think.

Continue reading

The Problem of IndignationBot, Part 1

I like to state the Prisoner’s Dilemma by saying that each player can destroy \$2 of the other player’s utility in exchange for \$1 for himself. Writing “C” and “D” for “cooperate” and “defect”, we have the following: Continue reading
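That framing pins down the whole payoff matrix. As a minimal sketch (taking mutual cooperation as a baseline of 0, which is my normalization, not necessarily the post's):

```python
def payoffs(a, b):
    # Payoff rule from the framing above: choosing "D" gains you $1 and
    # destroys $2 of the other player's utility.
    pa = pb = 0
    if a == "D":
        pa += 1
        pb -= 2
    if b == "D":
        pb += 1
        pa -= 2
    return pa, pb

# payoffs("C", "C") -> (0, 0)    mutual cooperation
# payoffs("D", "C") -> (1, -2)   temptation vs. sucker
# payoffs("D", "D") -> (-1, -1)  mutual defection
```

These values give the standard Prisoner's Dilemma ordering: temptation (1) > reward (0) > punishment (-1) > sucker (-2).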