Make Gmail or Inbox open “mailto:” links in Chrome

Life will be better… just click the “handler” button and choose “allow”:

Associate your academic email address with a Google account

If I’ve sent you a link to this blog post, it’s probably because your .edu email address is not already associated with a Google account, and I got a notification about that when sharing a doc or calendar item with you. To fix this problem permanently, open a browser logged into a Gmail account (create a new one if you don’t want to use your personal one), and go to:

From there, you can add email addresses that will actually work for receiving things like Google Doc invitations and Google Calendar invitations. This is somewhat new, and different from just setting up a “send mail as” setting in Gmail, because it applies to all Google services at once.

Give it a try, and save us both a bunch of future hassle 🙂

Deserving Trust, II: It’s not about reputation

Summary: a less mathematical account of what I mean by “deserving trust”.

When I was a child, my father made me promises. Of the promises he made, he managed to keep 100% of them. Not 90%, but 100%. He would say things like “Andrew, I’ll take you to play in the sand pit tomorrow, even if you forget to bug me about it”, and then he would. This often saved him from being continually pestered by me to keep his word, because I knew I could trust him.

Around 1999 (tagged in my memory as “age 13”), I came to be aware of this property of my father in a very salient way, and decided I wanted to be like that, too. When I’d tell someone they could count on me, if I said “I promise”, then I wanted to know for myself that they could really count on me. I wanted to know I deserved their trust before I asked for it. At the time, I couldn’t recall breaking any explicit promises, and I decided to start keeping careful track from then on to make sure I didn’t break any promises thereafter.

About a year later, around 2000, I got really wrapped up in thinking about what I wanted from life, in full generality. I’d seen friends undergo drastic changes in their world views, like deconverting from Christianity, and becoming deeply confused about what they wanted when they realized that everything they previously wanted was expressed in terms of a “God” concept that no longer had a referent for them. I wanted to be able to express what I wanted in more stable, invariant terms, so I chose my own sensory experiences as the base language — something I believed more than anything else to be an invariant feature of my existence — and then began trying to express all my values to myself in terms of those features. Values like “I enjoy the taste of strawberry” or “it feels good to think about math” or “the split second when I’m airborne at the apex of a parabolic arc feels awesome.” Life became about maximizing the integral of the intrinsic rewardingness of my experiences over the rest of time, which I called my “intensity integral”, because it roughly corresponded to having intense/enriching intellectual and emotional experiences. (Nowadays, I’d say my life back then felt like an MDP, and I’d call the intensity function my “reward function”. I’ll keep these anachronistic comments in parentheses, though, to ensure a more accurate representation of the language and ideas I was using at the time.)

So by 2001, I had decided to make the experience-optimization thing a basic policy; that as a matter of principle, I should be maximizing the integral of some function of my sensory experiences over time, a function that did not depend too much on references to concepts in the external world like “God” or “other people”. I knew I had to get along with others, of course, but I figured that was easier to explain as something instrumental to future opportunities and experiences, and not something I valued intrinsically. It seemed conceptually simpler to try to explain interactions with others as part of a strategy than as part of my intrinsic goals.

But by 2005, I had fallen deeply in love with someone who I felt understood me pretty well, and I began to feel differently about this self-centered experience-optimization thing. It started to seem like I cared also about *her* experiences, even if I didn’t observe them myself, even indirectly. I ran through many dozens of thought experiments over a period of months, checking that I couldn’t find some conceptually simpler explanation, until I eventually felt I had to accept this about myself. Even if it had no strategic means of ever paying off in enjoyable experiences for *me*, I still wanted *her* to have enjoyable experiences, full stop.

Around the same time, something even more striking to me happened: I realized I also cared about other things she cared about, aside from her experiences. In other words, the scope of what I cared about started expanding outward to things that were not observable by either of us, even indirectly. I wanted the little stack of rocks that I built for her while walking alone on the beach one day to stay standing, even though I never expected her to see it, because I knew she would like it if she could. (In contemporary terms, my life started to feel more like a POMDP, so much so that, by the time I first encountered the definition of an MDP around 2011, it felt like a deeply misguided concept that reminded me of my teenage self-model, and I didn’t spend much time studying it.)

At this point, depending on whether you want to consider “me” to be my whole brain or just the conscious, self-reporting part, I either realized I’d been over-fitting my self-model, or underwent “value drift”. When I introspected on how I felt about this other person, and what was driving this change in what I cared about (I did that a lot), it felt like I wanted to deserve her trust, the same way I wanted to keep promises, the way my father did. Even when she wasn’t looking and would never know about it, I wanted to do things that she would want me to do, to produce effects that, even if neither of us would ever observe them, would be the effects she wanted to happen in the world. This was pretty incongruous with my model of myself at the time, so, as I pretty much always do when something important seems incongruous, I entered a period of deep reflection.

Before long, I noticed a similarity between my situation and Newcomb’s problem, and I recalled other people describing similar experiences. I wanted to deserve her trust in the same way I wanted to be a one-boxer on Newcomb: even when the predictor-genie in the experiment isn’t looking anymore, you have to one-box today so that the genie will have trusted you yesterday and placed $1,000,000 in the first box to begin with. (For technical details, see this 2010 post I made to LessWrong, and another I made earlier this year on the same topic.)

Basically, I noticed that the assumptions of classical game theory that give rise to homo-economicus behavior were just false when a person can really get to know you and understand you, because that enables them to imagine scenarios about you before you actually get into those scenarios. In other words, your reputation literally precedes you. Your personality governs not only what you do in reality, but also — to a weaker, noisier, but non-zero extent — what you do in the imaginations and instinctive impulses of other people who knew you yesterday.

So, when I find myself entrusted by someone who isn’t looking anymore, or who can’t otherwise punish me for defecting, I still ask myself, “If this were happening in their imagination, before they decided to trust me, would they have given me their trust in the first place?”

This is what I call deserving trust. This framework fits both with my felt-sense of wanting to deserve trust, and with my normative understanding of decision theory. It’s not about having a reputation of being trustworthy. It’s about doing today what people and institutions who might now be unable to observe or punish you would want you to do when they made the decision to trust you.

As I’ve been finding myself expanding my ring of coworkers and collaborators wider and wider, I’ve been wanting to make these distinctions more and more explicit and understandable. The closer I come to working with a new colleague, the more I want them to help me help us be trustworthy. I feel a strong desire for them to know I’m not just trying to preserve our reputation, but that I actually want us, as a group, to be trustworthy, which usually involves making some extra effort. I feel this desire because, unless I make this distinction explicit, people often respond with something like “Yeah, we don’t want to be perceived as [blah blah blah]”, which leaves me feeling disappointed and like I haven’t really communicated how I want us to work with each other and the outside world. Optimizing perceptions today just isn’t good enough for deserving trust yesterday.

As a researcher, I’ve seen academics do some fairly back-stabby things to each other by now, and while blatant examples of it are rare, there are still commonplace things like writing disingenuous grant applications that I find fairly unacceptable for me personally, and I want my closest colleagues to really understand that I don’t want us to operate that way.

I’m trying to do work that has some fairly broad-sweeping consequences, and I want to know, for myself, that we’re operating in a way that is deserving of the implicit trust of the societies and institutions that have already empowered us to have those consequences.


I get a lot of email, and unfortunately, template email responses are not yet integrated into the mobile version of Google Inbox. So, until then, please forgive me if I send you this page as a response! Hopefully it is better than no response at all.

Thanks for being understanding.


Deserving Trust / Grokking Newcomb’s Problem

Summary: This is a tutorial on how to properly acknowledge that your decision heuristics are not local to your own brain, and that as a result, it is sometimes normatively rational for you to act in ways that are deserving of trust, for no reason other than to have deserved that trust in the past.

Related posts: I wrote about this 6 years ago on LessWrong (“Newcomb’s problem happened to me”), and last year Paul Christiano also gave numerous consequentialist considerations in favor of integrity (“Integrity for consequentialists”) that included this one. But since I think now is an especially important time for members of society to continue honoring agreements and mutual trust, I’m giving this another go. I was somewhat obsessed with Newcomb’s problem in high school, and have been milking insights from it ever since. I really think folks would do well to actually grok it fully.

You know that icky feeling you get when you realize you almost just fell prey to the sunk cost fallacy, and are now embarrassed at yourself for trying to fix the past by sabotaging the present? Let’s call this instinct “don’t sabotage the present for the past”. It’s generally very useful.

However, sometimes the usually-helpful “don’t sabotage the present for the past” instinct can also lead people to betray one another when there will be no reputational costs for doing so. I claim that not only is this immoral, but even more fundamentally, it is sometimes a logical fallacy. Specifically, whenever someone reasons about you and decides to trust you, you wind up in a fuzzy version of Newcomb’s problem where it may be rational for you to behave somewhat as though your present actions are feeding into their past reasoning process. This seems like a weird claim to make, but that’s exactly why I’m writing this post.
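To make the “fuzzy” version concrete, here is a minimal sketch using the textbook Newcomb payoffs: $1,000,000 in the opaque box if the predictor expected you to one-box, and $1,000 always in the transparent box (the $1,000 figure comes from the standard statement of the problem, not from this post). It assumes a predictor who is right with probability `accuracy` whichever way you choose, and computes expected winnings conditional on your choice; the function names are hypothetical, chosen for illustration.

```python
# Textbook Newcomb payoffs (assumed): the opaque box holds BIG only if the
# predictor expected one-boxing; the transparent box always holds SMALL.
BIG = 1_000_000
SMALL = 1_000


def expected_payoff(one_box: bool, accuracy: float) -> float:
    """Expected winnings against a predictor that is right with probability `accuracy`."""
    if one_box:
        # Predictor right -> opaque box was filled; predictor wrong -> it is empty.
        return accuracy * BIG
    # Two-boxing always collects SMALL; the opaque box is full only if the predictor erred.
    return SMALL + (1 - accuracy) * BIG


def break_even_accuracy() -> float:
    """Accuracy above which one-boxing has the higher expected payoff.

    Solves accuracy * BIG = SMALL + (1 - accuracy) * BIG for accuracy.
    """
    return (BIG + SMALL) / (2 * BIG)


if __name__ == "__main__":
    for q in (0.5, 0.6, 0.9, 0.99):
        print(q, expected_payoff(True, q), expected_payoff(False, q))
    print("break-even accuracy:", break_even_accuracy())
```

Under these assumptions, a predictor only slightly better than a coin flip (above 50.05%) already makes one-boxing the better bet in expectation. That is one way to read the claim that even a weak, noisy model of you in someone else’s head can matter; other decision theories frame the calculation differently.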


Start following conservative media, and remember how agreements between people and states actually work

Dear liberal American friends: please pair readings of liberal media with viewings of Fox News or other conservative media on the same topics. This will take work. They will say things you disagree with, using words you are unfamiliar with. You’ll have to stop scrolling down on Facebook and actively google phrases like “Trump executive order to protect America.” That may sound hard, but the integrity of your country depends on you doing it.

You’ve probably heard about the President’s executive order restricting immigration from seven countries, which led to the mistreatment of legal visa holders and permanent residents of the United States at airports. You probably also understand that there is a huge difference between ruling out new visas from those countries, and dishonoring existing ones. The latter is breaking a promise. Dishonoring agreements like that makes you untrustworthy, and that is very bad for cooperation. Right?

Well, hear this.

Time to spend more than 0.00001% of world GDP on human-level AI alignment

From an outside view, looking in at the Earth, if you noticed that human beings were about to replace themselves as the most intelligent agents on the planet, would you think it unreasonable if 1% of their effort were being spent explicitly reasoning about that transition? How about 0.1%?

Well, currently, world GDP is around $75 trillion, and in total, our species is spending around $9MM/year on alignment research in preparation for human-level AI (HLAI). That’s $5MM on technical research distributed across 24 projects with a median annual budget of $100k, and $4MM on related efforts, like recruitment and qualitative studies like this blog post, distributed across 20 projects with a median annual budget of $57k. (I computed these numbers by tallying spending from a database I borrowed from Sebastian Farquhar at the Global Priorities Project, which uses a much more liberal definition of “alignment research” than I do.) I predict spending will at least roughly double in the next 1-2 years, and frankly, am underwhelmed…
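As a quick sanity check on the headline figure, here is the ratio of estimated alignment spending to world GDP, using only the two rough numbers quoted above (about $9MM/year and about $75 trillion).

```python
# Rough figures quoted in the paragraph above (estimates, not precise data).
WORLD_GDP_USD = 75e12          # ~ $75 trillion per year
ALIGNMENT_SPEND_USD = 9e6      # ~ $9MM per year on HLAI alignment

fraction = ALIGNMENT_SPEND_USD / WORLD_GDP_USD
percent = 100 * fraction

print(f"fraction of world GDP: {fraction:.2e}")   # 1.20e-07
print(f"as a percentage:       {percent:.6f}%")   # 0.000012%
```

That comes to roughly 0.000012% of world GDP, the order of magnitude in the post’s title.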


Considerations against pledging donations for the rest of your life

I think donating to charity is great, especially if you make more than $100k per year, placing you well past the threshold where your well-being depends heavily on income (somewhere around $70k, depending on who does the analysis). I’ve been in that boat before, and donated more than 100% of my disposable income to charity. However, I was also particularly well-positioned to know where money should go at that time, which made donating particularly worth doing. I haven’t made any kind of official pledge to always donate money, because I take pledges/promises very seriously, and for me personally, taking such a pledge seems like a bad idea, even accounting for its signalling value. I’m writing this blog post mainly as a way to reduce the social pressure on folks who earn less than $100k per year to make donations, while at the same time encouraging folks who earn more to consider donating more seriously.

Note that I currently work for a charitable institution that I believe is extremely important. So, having been both a benefactor and beneficiary of donations, I hope I may come across as being honest when I say “donating to charity is great.”

Note also that I believe I’m in a somewhat rare situation relative to humans-in-general, but not necessarily a rare situation among folks who are likely to read my blog, who tend to have interests in rationality, effective altruism, existential risk, and other intellectual correlates thereof. Basically, depending on how much information I expect you to actively obtain about the world relative to the size of your donations or other efforts, I may or may not like the idea of you pledging to always donate 10% of your income. Here’s my very rough breakdown of why:

Consider future variance in whether you should donate.

If you either (1) make less than $100k/year, or (2) might be willing to make less than that at some future time in order to work directly on something the world needs you to do (besides giving), I would not be surprised to find myself recommending against you pledging to always donate 10% of your income every year.

Moreover, if you currently spend more than 100 hours per year investigating what the world-at-large needs, I would not be that surprised if in some years you were able to find opportunities to spend $10k-worth-of-effort (per year on average, rather than every year) that were more effective than giving $10k/year. Just from eyeballing people I know, I think a person who spends that much time analyzing the world (especially one who is likely to come across this post) can be quite a valuable resource, and I expect high initial marginal returns to their own direct efforts to improve themselves and the world.

Example: during my PhD, I spent a considerable fraction of my time on creating a non-profit called the Center for Applied Rationality. I was earning very little money at that time, and donating 10% of it would have been a poor choice. It would have greatly reduced my personal flexibility to spend money on getting things done (saving time by taking taxis, not worrying about the cost of meals when I was in flow working with a group that couldn’t relocate to obtain cheaper food options without breaking productivity, etc.). I think the value of my contribution to CFAR during those years greatly exceeds $4,000 in charitable donations, which is what 10% of my income over two years would have amounted to. In fact, I would guess that it exceeds $40,000, so even if I thought things were only 10% likely to turn out as well as they did, not donating in those years was a good idea.

In other years when I made much more money, I’ve chosen to donate 100% of my disposable income. You might want to do that sometimes, too, and I would highly recommend considering it, especially if you’re spending a lot of your time investigating where that money should go. But I still might recommend against you pledging to keep donating, unless you expect to stop investigating the world as much as you currently do and will therefore be less likely to discover things in the future that should change your plans for years-at-a-time.

Sometimes you should trust your own future judgement.

You might think that you should just defer all your decisions about where money or effort should go to the investigations of a larger group like GiveWell, OPP, GPP, or GWWC, who spend more time on investigation than you. Such a position favors donating as a way of impacting the world, because your impact gets multiplied by the value of someone else’s investigation. This is a highly tenable position, but I believe it becomes less tenable as the ratio of [value of time you spend investigating cause prioritization] to [value of money or effort you spend on your top cause] increases. E.g., if you’ve spent 100 hours this year identifying and analyzing arguments about what the world needs most, I would not be surprised if you could find a way to spend $10k worth of money or effort on some important and neglected cause that was more valuable than donating to something with more mainstream support.

On the other hand, it would take more convincing for me to think it was also worth you spending $1mm worth of money or effort on that cause, since that would represent a larger inefficiency in the charity market that should have been easier for others to identify, and someone spending $1mm has plenty of incentive to have investigated (or hired investigation) for more than 100 hours. That would be a case where I think it makes more sense to depend on (or even better: pay for your own!) more centralized analysis of what’s needed.

Expecting variance + respecting your judgement = not pledging

The combined effects of

  • expecting variance in whether you should donate, and
  • respecting your own judgement for donations valued comparably to the time you spend investigating,
leads me to recommend that some folks not pledge to always donate 10% of their income. If you expect low variance and/or a low ratio of time spent investigating relative to the examples I’ve given, I’m less likely to discourage you from taking such a pledge, because it helps you signal to the world that donating to charity is extremely important.

Having said that: you can donate lots of money without ever pledging to do so for the rest of your life, and if you can afford it, I totally think you should do it 🙂

Open-source game theory is weird

I sometimes forget that not everyone realizes how poorly understood open-source game theory is, until I end up sharing this example and remember how weird it is for folks to see for the first time. Since that’s been happening a lot this week, I wrote this post to automate the process.

Consider a game where agents can view each other’s source code and return either “C” (cooperate) or “D” (defect). The payoffs don’t really matter for the following discussion.

First, consider a very simple agent called “CooperateBot”, or “CB” for short, which cooperates with every possible opponent:

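C, D = "C", "D"  # the two possible moves
def CB(opp):
  return C  # cooperate no matter what opp (the opponent's source code) says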

(Here “opp” is the argument representing the opponent’s source code, which CooperateBot happens to ignore.)
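For readers who want to poke at this, here is a minimal sketch of the game’s plumbing, assuming each agent is a Python source string defining a function `agent(opp)` that receives the opponent’s source as a string. `DB_SRC` (a DefectBot) and `CLIQUE_SRC` (a bot that cooperates only with exact textual copies of itself) are hypothetical examples added for illustration, and `load`/`play` are illustrative helper names; FairBot is not implemented here.

```python
C, D = "C", "D"

# Each agent is source code defining agent(opp), where opp is the opponent's source.
CB_SRC = """
def agent(opp):
    return C  # CooperateBot: ignores opp entirely
"""

# Hypothetical DefectBot: defects against everyone.
DB_SRC = """
def agent(opp):
    return D
"""

# Hypothetical bot that cooperates only with an exact textual copy of itself.
CLIQUE_SRC = """
def agent(opp):
    return C if opp == MY_SRC else D
"""


def load(src: str):
    """Compile an agent's source, giving it C, D, and its own source as MY_SRC."""
    namespace = {"C": C, "D": D, "MY_SRC": src}
    exec(src, namespace)
    return namespace["agent"]


def play(src1: str, src2: str) -> tuple:
    """Each agent reads the other's source code and returns its move."""
    return load(src1)(src2), load(src2)(src1)


if __name__ == "__main__":
    print(play(CB_SRC, DB_SRC))          # ('C', 'D')
    print(play(CLIQUE_SRC, CLIQUE_SRC))  # ('C', 'C')
    print(play(CLIQUE_SRC, CB_SRC))      # ('D', 'C')
```

The exact-match bot shows why reading source code changes the game, but its rule is brittle: it defects against any cooperative agent written even slightly differently. Roughly speaking, that brittleness is part of the motivation for agents that reason about what an opponent does rather than how it is spelled.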

Next consider a more interesting agent, “FairBot”, or “FB” for short, which takes in a single parameter $k$ to determine how long it thinks about its opponent:


Abstract open problems in AI alignment, v.0.1 — for mathematicians, logicians, and computer scientists with a taste for theory-building

This page is a draft and will be updated in response to feedback and requests to include specific additional problems.

Through my work on logical inductors and robust cooperation of bounded agents, I’m meeting lots of folks in math, logic, and theoretical CS who are curious to know what contributions they can make, in the form of theoretical work, toward control theory for highly advanced AI systems. If you’re one of those folks, this post is for you!


Voting is like donating thousands of dollars to charity

(Share this post to encourage folks with rational, altruistic leanings to vote more. I originally posted this to LessWrong in 2012, but I figured it was worth re-posting.)

Summary: It’s often argued that voting is irrational, because the probability of affecting the outcome is so small. But the outcome itself is extremely large when you consider its impact on other people. I estimate that for most people, voting is worth a charitable donation of somewhere between $100 and $1.5 million. For me, the value came out to around $56,000. So I figure something on the order of $1000 is a reasonable evaluation (after all, I’m writing this post because the number turned out to be large according to this method, so regression to the mean suggests I err on the conservative side), and that’s big enough to make me do it.

Moreover, in swing states the value is much higher, so taking a 10% chance at convincing a friend in a swing state to vote similarly to you is probably worth thousands of expected donation dollars, too. (This is an important move to consider if you’re in a fairly robustly red-or-blue state like New York, California, or Texas where Gelman et al. estimate that “the probability of a decisive vote is closer to 1 in a billion.”) I find EV calculations like this for voting or vote-trading to be much more compelling than the typical attempts to justify voting purely in terms of signal value or the resulting sense of pride in fulfilling a civic duty.

Selfish voting is a waste of your time

Note that voting for selfish reasons is still almost completely worthless, in terms of direct effect. If you’re on the way to the polls, or filling out some forms, only to vote for the party that will benefit you the most, you’re better off using that time to earn $5 mowing someone’s lawn. The thousands of dollars of EV you generate for the country by voting get spread between millions of people, so you basically get nothing from it. Voting for direct impact on the election is an act that pretty much only makes sense if you’re altruistic.

Time for a Fermi estimate

Below is an example Fermi calculation for the value of voting in the USA. Of course, the estimates are all rough and fuzzy, so I’ll be conservative, and we can adjust upward based on your opinion.

I’ll be estimating the value of voting in marginal expected altruistic dollars, the expected number of dollars being spent in a way that is in line with your altruistic preferences.[1] If you don’t like measuring the altruistic value of the outcome as a fraction of the federal budget, please consider making up your own measure, and keep reading. Perhaps use the number of smiles per year, or number of lives saved. Your measure doesn’t have to be total or average utilitarian, either; as long as it’s roughly commensurate with the size of the country, it will lead you to a similar conclusion in terms of orders of magnitude.

Component estimates:

At least 1/(100 million) = probability estimate that my vote would affect the outcome (for most Americans). This is the most interesting thing to estimate. There are approximately 100 million voters in the USA, and if you assume a naive fair coin-flip model of other voters, and a naive majority-rule voting system (i.e. not the electoral college), with a fair coin deciding ties, then the probability of a vote being decisive is around √(2/(π × 100 million)) ≈ 8/100,000.
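The naive coin-flip figure is easy to check numerically. This sketch compares the approximation √(2/(πn)) with the exact probability that n independent fair-coin voters split exactly evenly, C(n, n/2)/2^n (the situation in which one more vote matters), computed in log space with `math.lgamma` to avoid overflow. Like the paragraph above, it assumes independent voters and simple majority rule, which the next paragraph explains are both unrealistic; the function names are illustrative.

```python
import math


def tie_prob_approx(n: int) -> float:
    """Stirling-based approximation sqrt(2 / (pi * n)) for an exact tie."""
    return math.sqrt(2 / (math.pi * n))


def tie_prob_exact(n: int) -> float:
    """Exact P(exactly n/2 of n fair coin flips land heads), for even n."""
    if n % 2:
        raise ValueError("n must be even for an exact tie")
    half = n // 2
    # log of C(n, n/2) / 2**n, via log-gamma so nothing overflows
    log_p = math.lgamma(n + 1) - 2 * math.lgamma(half + 1) - n * math.log(2)
    return math.exp(log_p)


if __name__ == "__main__":
    n = 100_000_000  # ~100 million voters, as in the estimate above
    print(f"approx: {tie_prob_approx(n):.3e}")  # about 8e-05, i.e. 8/100,000
    print(f"exact:  {tie_prob_exact(n):.3e}")
```

For n this large the two agree to well under 0.1%, so the approximation is fine; the real uncertainty lies in the modeling assumptions, not the arithmetic.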

But this is too big, considering the way voters cluster: we are not independent coin flips. As well, the USA uses the electoral college system, not majority rule. So I found a paper by Gelman, King, and Boscardin (1998) where they simulate the electoral college using models fit to previous US elections, and find that the probability of a decisive vote came out between 1/(3 million) and 1/(100 million) for voters in most states in most elections, with most states lying very close to 1/(10 million). Gelman, Silver, and Edlin (2008) reach a similar number, but with more variance between states.

Added (2016): Some folks have asked me about what happens if there are re-counts of the vote. Errors during (and before) recounts both decrease the probability that a vote that “should” flip the election actually will, and also increase the probability that votes that “shouldn’t” flip the election actually will. These effects roughly cancel out, and you’re left with roughly the same probability that your vote will flip the election outcome.

At least 55% = my subjective credence that I know which candidate is “better”, where I’m using the word “better” subjectively to mean which candidate would turn out to do the most good for others, in my view, if elected. If you don’t like this, please make up your own definition of better and keep reading 🙂 In any case, 55% is pretty conservative; it means I consider myself to have almost no information.

At least $100 billion = the approximate marginal altruistic value of the “better” candidate (by comparison with dollars donated to a typical charity[2]). I think this is also very conservative. The annual federal budget is around $3 trillion right now, making $12 trillion over a 4-year term, and Barack Obama and Mitt Romney differ on trillions of dollars in their proposed budgets. It would be pretty strange to me if, given a perfect understanding of what they’d both do, I would only care altruistically about $100 billion of those dollars, marginally speaking.


I don’t know which candidate would turn out “better for the world” in my estimation, but I’d consider myself as having at least a 55%*1/(100 million) chance of affecting the outcome in the better-for-the-world direction, and a 45%*1/(100 million) chance of affecting it in the worse-for-the-world direction, so in expectation I’m donating at least around

(55%-45%)*1/(100 million)*($100 billion) = $100

Again, this was pretty conservative:

  • Say you’re more like 70% sure,
  • Say you’re a randomly chosen American, so your probability of a decisive vote is around 1/(10 million);
  • Say the outcome matters on the order of $700 billion in charitable donations, given that Obama’s and Romney’s budgets differ by around $7 trillion, and say 10% of that difference is money that would be used about as well as charitable donations to things you care about.

That makes (70%-30%)*1/(10 million)*($700 billion) = $28,000. Going further, if you’re

  • 90% sure,
  • voting in Virginia — 1/(3.5 million), and
  • care about the whole $7 trillion difference in budgets,

you get (90%-10%)*1/(3.5 million)*($7 trillion) = $1.6 million. This is so large, it becomes a valuable use of time to take 1% chances at convincing other people to vote… which you can hopefully do by sharing this post with them.
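All three scenarios above use the same formula: (p − (1 − p)) × P(decisive) × value, where p is your credence that you’ve picked the better candidate. Here is a minimal sketch reproducing them; the function name `vote_ev` and the scenario labels are just for illustration, and the inputs are the rough guesses from the bullets, not data.

```python
def vote_ev(p_better: float, p_decisive: float, value_usd: float) -> float:
    """Expected altruistic dollars from one vote.

    p_better:   credence that your candidate is the better one
    p_decisive: probability your vote flips the outcome
    value_usd:  altruistic value of the better outcome, in typical-charity dollars
    """
    # Chance of helping minus chance of hurting, times stakes and decisiveness.
    return (p_better - (1 - p_better)) * p_decisive * value_usd


scenarios = {
    "conservative": (0.55, 1 / 100e6, 100e9),
    "moderate":     (0.70, 1 / 10e6, 700e9),
    "optimistic":   (0.90, 1 / 3.5e6, 7e12),
}

for name, args in scenarios.items():
    print(f"{name:>12}: ${vote_ev(*args):,.0f}")
```

With these inputs the three scenarios come out to about $100, $28,000, and $1.6 million respectively. Swapping in your own credence, state, and valuation is a one-line change.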


Now, I’m sure all these values are quite wrong in the sense that taking into account everything we know about the current election would give very different answers. If anyone has a more nuanced model of the electoral college than Gelman et al., or a way of helping me better estimate how much the outcome matters to me, please post it! My $700 billion outcome value still feels a bit out-of-a-hat-ish.

But the intuition to take away here is that a country is a very large operation, much larger than the number of people in it, and that’s what makes voting worth it… if you care about other people. If you don’t care about others, voting is probably not worth it to you. That expected $100 – $1,500,000 is going to get spread around to 300 million people… you’re not expecting much of it yourself! That’s a nice conclusion, isn’t it? Nice people should vote, and selfish people shouldn’t?

Of course, politics is the mind killer, and there are debates to be had about whether voting in the current system is immoral because the right thing to do is abstain in silent protest that we aren’t using approval voting, which has better properties than the current system… but I don’t think that’s how to get a new voting system. I think while we’re making whatever efforts we can to build a better global community, it’s no sacrifice to vote in the current system if it’s really worth that much in expected donations.

So if you weren’t going to vote already, give some thought to this expected donation angle, and maybe you’ll start. Maybe you’ll start telling your swing state friends to vote, too. And if you do vote to experience a sense of pride in doing your civic duty, I say go ahead and keep feeling it!

An image summary

Thanks to Gavan Wilhite and his team for putting together an infographic to summarize these ideas:

Related reading

I’ve found a few papers by authors with similar thoughts to these:

Also, I found this interesting Overcoming Bias post, by Andrew Gelman as well.


[1] A nitpick, for people like me who are very particular about what they mean by utility: in this post, I’m calculating expected altruistic dollars, not expected utility. However, while our personal utility functions are (or would be, if we managed to have them!) certainly non-linear in the amount of money we spend on ourselves, there is a compelling argument for having the altruistic part of your utility function be approximately linear in altruistic dollars: there are just so many dollars in the world, and it’s reasonable to assume utility is approximately differentiable in commodities. So on the scale of the world, your effect on how altruistic dollars are spent is small enough that you should value them approximately linearly.

[2] (Added in response to an email from Ben Hoffman on Oct 28, 2016) This “$100 billion valuation” step is really two steps combined: a comparison of government cost-effectiveness to the cost-effectiveness of a typical charitable donation, and an estimate of the difference between the two different governments you’d get under different presidents. You could also take into account world-wide externalities here, like the impact of wars. All things considered, I’d be pretty surprised if the difference in candidates over their 4-year term was worth less than $10 billion typical-charity dollars, an order of magnitude below $100 billion.

Protected: Move-in application for room in 4-bedroom house at south edge of Berkeley


Professional feedback form

Leveraging academia

Since interest in AI alignment has started to build, I’m getting a lot more emails of the form “Hey, how can I get into this hot new field?”. This is great. In the past I was getting so few messages like this that I could respond to basically all of them with many hours of personal conversation.

But now I can’t respond to everybody anymore, so I have a new plan: leverage academia.

To grossly oversimplify things, here’s the heuristic. If the typical prospective researcher (say, an inbound grad student at a top school) needs 100 hours of guidance/mentorship to become a productive contributor to AI alignment research, maybe only 10 of those hours need to come from someone already in the field, and the remaining 90 hours can come from other researchers in CS/ML/math/econ/neuro/cogsci. So if I have 100 hours of guidance to give this year, I can choose between mentoring 1 person, or 10 people who are getting 90% of their guidance elsewhere. The latter produces more researchers, and potentially researchers of a higher quality because of the diversity of views they’re seeing (provided the student has the filter-out-incorrect-views property, which is of course critical). So that’s what I’m doing, and this blog post is my generic response to questions about how to get into AI alignment research 🙂

I think this policy might also be a good filter for good team players, in the following sense: When you’re part of a team, it’s quite helpful if you can leverage resources outside your team to solve the team’s problems without having to draw heavily on the team’s internal resources. Thus, if you want to be part of a new/young field like AI alignment, it’s nice if you can draw on resources outside that field to make it stronger.

So! If I send you a link to this blog post, please don’t read me as saying “I don’t have any advice for you.” Because I do have some advice: aside from going to grad school and deliberately learning from it, and choosing Berkeley for your PhD/postdoc (or transferring there), I’m also advising that you acquire and demonstrate the quality of drawing from non-scarce resources to help produce scarce ones. Use non-scarce resources to decide what to learn (e.g., read this blog post by Jan Leike); use non-scarce resources to learn that stuff (e.g., college courses, online lectures, books), and use non-scarce resources to demonstrate what you’ve learned (standardized tests, competitions, publications), at least up to the point where you get admitted as a grad student to a top school. And if that school is Berkeley, I will help you find an advisor!

Seeking a paid part-time assistant for AI alignment research

Please share this if you think anyone you know might be interested.

Sometimes in my research I have to do some task on a computer that I could easily outsource, e.g., adding bibliographical data to a list of papers (i.e., when they were written, who the authors were, etc.). If you think you might be interested in trying some work like this, in exchange for

  • $20/hour, paid to you from my own pocket,
  • exposure to the research materials I’m working with, and
  • knowing you’re doing something helpful to AI alignment research, then
Continue reading

Interested in AI Alignment? Apply to Berkeley.

Summary: Researching how to control (“align”) highly-advanced future AI systems is now officially cool, and UC Berkeley is the place to do it.

Interested in AI alignment research? Apply to Berkeley for a PhD or postdoc (deadlines are approaching), or transfer into Berkeley from a PhD or postdoc at another top school. If you get into one of the following programs at Berkeley:

  • a PhD program in computer science, mathematics, logic, or statistics, or
  • a postdoc specializing in cognitive science, cybersecurity, economics, evolutionary biology, mechanism design, neuroscience, or moral philosophy,
… then I will personally help you find an advisor who is supportive of you researching AI alignment, and introduce you to other researchers in Berkeley with related interests.

This was not something I could confidently offer you two years ago. Continue reading

“Entitlement to believe” is lacking in Effective Altruism

Sometimes the world needs you to think new thoughts. It’s good to be humble, but having low subjective credence in a conclusion is just one way people implement humility; another way is to feel unentitled to form your own belief in the first place, except by copying an “expert authority”. This is especially bad when there essentially are no experts yet — e.g. regarding the nascent sciences of existential risks — and the world really needs people to just start figuring stuff out. Continue reading

Breaking news: Scientists Have Discovered the Soul

2016 is a great year for physics. Not only have we discovered gravitational waves, but just this week, physicists have announced the existence of a long-sought-after object: the human soul. Continue reading

Credence – using subjective probabilities to express belief strengths

There are surprisingly many impediments to becoming comfortable making personal use of subjective probabilities, or “credences”: some conceptual, some intuitive, and some social. However, Philip Tetlock has found that thinking in probabilities is essential to being a Superforecaster, so it is perhaps a skill and tendency worth cultivating on purpose. Continue reading

A story about Bayes, Part 2: Disagreeing with the establishment

Ten years after my binary search through dietary supplements, which found that a particular blend of B and C vitamins was especially energizing for me, a CBC news article reported that the blend I’d used, called “Emergen-C”, did not actually contain all of the vitamin ingredients on its label. Continue reading

A story about Bayes, Part 1: Binary search

When I was 19 and just beginning my PhD, I found myself with a lot of free time and flexibility in my schedule. Naturally, I decided to figure out which dietary supplements I should take. Continue reading
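To make “binary search” concrete, here is a minimal Python sketch of the general technique. It is not the procedure from the post. It assumes, as a simplification, that exactly one candidate produces the effect, and that a hypothetical `has_effect(subset)` trial reliably reports whether that candidate is in the subset; real self-experiments are much noisier.

```python
from typing import Callable, List, Sequence


def find_active(candidates: Sequence[str],
                has_effect: Callable[[List[str]], bool]) -> str:
    """Binary search for the single candidate responsible for an effect.

    Assumes exactly one item in `candidates` is responsible, and that
    `has_effect(subset)` (a hypothetical trial) reliably reports whether
    that item is in `subset`. Each trial halves the remaining pool.
    """
    pool = list(candidates)
    if not pool:
        raise ValueError("no candidates to search")
    while len(pool) > 1:
        mid = len(pool) // 2
        first_half = pool[:mid]
        # Try only the first half; if the effect is absent, it must be in the rest.
        pool = first_half if has_effect(first_half) else pool[mid:]
    return pool[0]


if __name__ == "__main__":
    # Hypothetical candidate labels and a stand-in for a real-world trial.
    supplements = [f"supplement {i}" for i in range(1, 9)]
    trials = []

    def trial(subset: List[str]) -> bool:
        trials.append(subset)
        return "supplement 5" in subset

    print(find_active(supplements, trial), "found after", len(trials), "trials")
```

Because each trial halves the pool, 8 candidates always take 3 trials, and 1,000 candidates take at most 10.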

Help me write LaTeX on a large e-ink display ($200 reward)

Edit: my employer was eventually able to order me an e-ink monitor, so the reward is off 🙂

I would like to write LaTeX on a wireless-enabled e-ink display with a 13″ or larger screen to avoid visual fatigue. If you solve this problem for me, I will pay you a $200 reward, be extremely grateful, and write a blog post explaining your solution so that others might benefit 🙂 Some examples that I would consider solutions: Continue reading

Seeking a paid personal assistant to create more x-risk research hours

My main bottleneck as a researcher right now is that I have various bureaucracies I need to follow up with on a regular basis, and following up with them reduces the number of long, uninterrupted periods I can spend on research. I could really use some help with this. Continue reading

Use a giant notebook to think better

Having a space to write things down frees up your mind — specifically, your executive system — from the task of holding things in working memory, so you can focus your attention on generating new thoughts instead of looping on your most recent ones to keep them alive. Writing down what’s in your head — math, plans, feelings, whatever — can start paying cognitive dividends in about 5 seconds, and can make the difference between a productive thinking day and a lame one. Continue reading

A Mindfulness-Based Stress Reduction course in the East Bay starting January 19

Summary: I think the standardized 8-week MBSR course format is better designed than most introductory meditation practices, and have found David Weinberg in particular to be an excellent mindfulness instructor. Since something like 30 to 100 people have asked me to recommend a way to learn/practice mindfulness, I’m batch-answering with this post. Continue reading

Why CFAR spreads altruism organically, and why Labs & Core make a great team

Following on “Why scaling slowly has been awesome for CFAR Core”, here are two other questions I’ve gotten repeatedly about CFAR:

Q2: Why isn’t altruism training an explicit part of CFAR’s core workshop curriculum?
Continue reading

Red-penning: rolling out an experimental rationality / creativity technique

Note: I’m writing about this technique to (1) reduce the overhead cost of testing it, and (2) illustrate what I consider good practices for “rolling out” a new technique to be added to a rationality curriculum. Although it seems super-useful from my first-person perspective, experience says the technique itself probably needs to undergo several tests and revisions before it will actually work as intended, even for most readers of my blog, I suspect. Continue reading

Why scaling slowly has been awesome for CFAR Core

Summary: Since I offered to answer questions about my pledge to donate 10% of my annual salary to CFAR as an existential risk reduction effort, the question “Why doesn’t CFAR do something that will scale faster than workshops?” keeps coming up, so I’m answering it here. Continue reading

Break your habits: be more empirical

Summary: The common attitude that “You think too much” might be better parsed as “You don’t experiment enough.” Once you’ve got an established procedure for living optimally in «setting», be a good scientist and keep trying to falsify your theory when it’s not too costly to do so.

Continue reading

Beat the bystander effect with minimal social pressure

Summary: Develop an allergy to saying “Will anyone do X?”. Instead query for more specific error signals: Continue reading

AI strategy and policy research positions at FHI (deadline Jan 6)

Oxford’s Future of Humanity Institute has some new positions opening up at their Strategic Artificial Intelligence Research Centre. I know these guys — they’re super awesome — and if you have the following three properties, then humanity needs you to step up and solve the future: Continue reading

The 2015 x-risk ecosystem

Summary: Because of its plans to increase collaboration and run training/recruiting programs for other groups, CFAR currently looks to me like the most valuable pathway per-dollar-donated for reducing x-risk, followed closely by MIRI, and GPP+80k. As well, MIRI looks like the most valuable place for new researchers (funding permitting; see this post), followed very closely by FHI, and CSER. Continue reading

Why I want humanity to survive — a holiday reflection

Life on Earth is almost 4 billion years old. During that time, many trillions of complex life forms have starved to death, been slowly eaten alive by predators or diseases, or simply withered away. But there has also been much joy, play, love, flourishing, and even creativity.

Continue reading

MIRI needs funding to scale with other AI safety programs

Summary: MIRI’s end-of-year fundraiser is on, and I’ve never been more convinced of what MIRI can offer the world. Continue reading

The Problem of IndignationBot, Part 4

Summary: I proved a parametric, bounded version of Löb’s Theorem that shows bounded self-reflective agents exhibit weird Löbian behavior, too. Continue reading
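For readers who haven’t met it, here is the classical, unbounded Löb’s Theorem for Peano Arithmetic (PA), where □P abbreviates “P is provable in PA”. The parametric, bounded version from the post is not reproduced here.

```latex
\text{If } \mathrm{PA} \vdash \Box P \rightarrow P, \text{ then } \mathrm{PA} \vdash P.
```

In words: if PA proves “if P is provable, then P is true”, then PA already proves P outright.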

The Problem of IndignationBot, Part 3

Summary: Is strange “Löbian” self-reflective behavior just a theoretical symptom of assuming unbounded computational resources?

Continue reading

(Ignore this post)

Apologies to any subscribers; I needed to publish this in order to test sidebar-hiding with several different devices and login credentials 🙂   Continue reading

Embracing boredom as exploratory overhead cost

(Follow-up to Fun does not preclude burnout)

Sometimes I decide to spend a few weeks or months putting some of my social needs on hold in favor of something specific, like a deadline. But after that’s done, and I “have free time” again, I often find myself leaning toward work as a default pastime. When I ask my intuition “What’s a fun thing to do this weekend?”, I get a resounding “Work!” Continue reading

Fun does not preclude burnout

As far as I can tell, I’ve never experienced burnout, but I think that’s only because I notice when I’m getting close. And in recent years, I’ve had a number of friends, especially those interested in Effective Altruism, make the mistake of burning out while having fun. So, I wanted to make a public service announcement: The fact that your work is fun does not mean that you can’t burn out. Continue reading

Use separate email threads for separate action-requests

When I realized this principle, I experienced around a 2x or 3x increase in my rate of causing-people-to-do-things-over-email, out of the “usually doesn’t work” range into the “usually works” range. I find myself repeating this advice a lot in an attempt to boost the effectiveness of friends interested in effective altruism and related work, so I’m making a blog post to make it easier. Continue reading

The Problem of IndignationBot, Part 2

Summary: Agents that can reason about their own source code are weirder than you think.

Continue reading

What’s your vision of a beautiful life?

After releasing my Robust Rental Harmony algorithm, I felt a certain sense of satisfaction, like my friends and I had built something wholesome and beautiful. Reflecting on this, it occurred to me that I might want my life to feel like an artistic creation… like a beautiful substructure of mathematics that reflectively self-appreciates wherever it arises. This felt different from my desire to help the world at large, and also from my desire for moment-to-moment enjoyment. Continue reading

Deliberate Grad School

Among my friends interested in rationality, effective altruism, and existential risk reduction, I often hear: “If you want to have a real positive impact on the world, grad school is a waste of time. It’s better to use deliberate practice to learn whatever you need instead of working within the confines of an institution.” Continue reading

The Problem of IndignationBot, Part 1

I like to state the Prisoner’s Dilemma by saying that each player can destroy $2 of the other player’s utility in exchange for $1 for himself. Writing “C” and “D” for “cooperate” and “defect”, we have the following: Continue reading
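Here is one way to derive the payoffs from that description, as a Python sketch. It measures each player’s payoff relative to mutual cooperation. That baseline is an assumption; the post’s own table may use a different one. The sketch then checks the standard Prisoner’s Dilemma ordering T > R > P > S (temptation, reward, punishment, sucker’s payoff).

```python
from itertools import product
from typing import Dict, Tuple

# Assumed baseline: both players score 0 under mutual cooperation.
# "Defecting" = taking the option that gains $1 for yourself
# while destroying $2 of the other player's utility.
GAIN_FOR_SELF = 1
LOSS_TO_OTHER = 2


def payoffs(move_a: str, move_b: str) -> Tuple[int, int]:
    """Return (payoff to A, payoff to B) for moves 'C' or 'D'."""
    a = b = 0
    if move_a == "D":
        a += GAIN_FOR_SELF
        b -= LOSS_TO_OTHER
    if move_b == "D":
        b += GAIN_FOR_SELF
        a -= LOSS_TO_OTHER
    return a, b


# Full payoff table over all four move combinations.
table: Dict[Tuple[str, str], Tuple[int, int]] = {
    (x, y): payoffs(x, y) for x, y in product("CD", repeat=2)
}

# Row player's payoffs in standard Prisoner's Dilemma notation.
T = table[("D", "C")][0]  # temptation: defect against a cooperator
R = table[("C", "C")][0]  # reward: mutual cooperation
P = table[("D", "D")][0]  # punishment: mutual defection
S = table[("C", "D")][0]  # sucker's payoff: cooperate against a defector

if __name__ == "__main__":
    for moves, result in table.items():
        print(moves, result)
    print("Is a Prisoner's Dilemma:", T > R > P > S)
```

With these numbers, defecting gains you $1 whatever the other player does, yet mutual defection leaves both players $1 worse off than mutual cooperation, which is what makes it a dilemma.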

Willpower Depletion vs Willpower Distraction

I once asked a room full of about 100 neuroscientists whether willpower depletion was a thing, and there was widespread disagreement with the idea. (Apropos, this is a great way to quickly gauge consensus in a field.) Basically, for a while some researchers believed that willpower depletion “is” glucose depletion in the prefrontal cortex, but more recent experiments have failed to replicate this, e.g. by finding that the mere taste of sugar is enough to “replenish” willpower faster than blood could carry it from the mouth to the brain: Continue reading