Category Archives: Life

Deserving Trust, II: It’s not about reputation

Summary: a less mathematical account of what I mean by “deserving trust”.

When I was a child, my father made me promises. Of the promises he made, he managed to keep 100% of them. Not 90%, but 100%. He would say things like “Andrew, I’ll take you to play in the sand pit tomorrow, even if you forget to bug me about it”, and then he would. This often saved him from being continually pestered by me to keep his word, because I knew I could trust him.

Around 1999 (tagged in my memory as “age 13”), I came to be aware of this property of my father in a very salient way, and decided I wanted to be like that, too. When I’d tell someone they could count on me, if I said “I promise”, then I wanted to know for myself that they could really count on me. I wanted to know I deserved their trust before I asked for it. At the time, I couldn’t recall breaking any explicit promises, and I decided to start keeping careful track from then on to make sure I didn’t break any promises thereafter.

About a year later, around 2000, I got really wrapped up in thinking about what I wanted from life, in full generality. I’d seen friends undergo drastic changes in their world views, like deconverting from Christianity, and becoming deeply confused about what they wanted when they realized that everything they previously wanted was expressed in terms of a “God” concept that no longer had a referent for them. I wanted to be able to express what I wanted in more stable, invariant terms, so I chose my own sensory experiences as the base language — something I believed more than anything else to be an invariant feature of my existence — and then began trying to express all my values to myself in terms of those features. Values like “I enjoy the taste of strawberry” or “it feels good to think about math” or “the split second when I’m airborne at the apex of a parabolic arc feels awesome.” Life became about maximizing the integral of the intrinsic rewardingness of my experiences over the rest of time, which I called my “intensity integral”, because it roughly corresponded to having intense/enriching intellectual and emotional experiences. (Nowadays, I’d say my life back then felt like an MDP, and I’d call the intensity function my “reward function”. I’ll keep these anachronistic comments in parentheses, though, to ensure a more accurate representation of the language and ideas I was using at the time.)

So by 2001, I had decided to make the experience-optimization thing a basic policy; that as a matter of principle, I should be maximizing the integral of some function of my sensory experiences over time, a function that did not depend too much on references to concepts in the external world like “God” or “other people”. I knew I had to get along with others, of course, but I figured that was easier to explain as something instrumental to future opportunities and experiences, and not something I valued intrinsically. It seemed conceptually simpler to try to explain interactions with others as part of a strategy than as part of my intrinsic goals.

But by 2005, I had fallen deeply in love with someone who I felt understood me pretty well, and I began to feel differently about this self-centered experience-optimization thing. It started to seem like I cared also about *her* experiences, even if I didn’t observe them myself, even indirectly. I ran through many dozens of thought experiments over a period of months, checking that I couldn’t find some conceptually simpler explanation, until I eventually felt I had to accept this about myself. Even if it had no strategic means of ever paying off in enjoyable experiences for *me*, I still wanted *her* to have enjoyable experiences, full stop.

Around the same time, something even more striking to me happened: I realized I also cared about other things she cared about, aside from her experiences. In other words, the scope of what I cared about started expanding outward to things that were not observable by either of us, even indirectly. I wanted the little stack of rocks that I built for her while walking alone on the beach one day to stay standing, even though I never expected her to see it, because I knew she would like it if she could. (In contemporary terms, my life started to feel more like a POMDP, so much so that, by the time I first encountered the definition of an MDP around 2011, it felt like a deeply misguided concept that reminded me of my teenage self-model, and I didn’t spend much time studying it.)

At this point, depending on whether you want to consider “me” to be my whole brain or just the conscious, self-reporting part, I either realized I’d been over-fitting my self-model, or underwent “value drift”. When I introspected on how I felt about this other person, and what was driving this change in what I cared about (I did that a lot), it felt like I wanted to deserve her trust, the same way I wanted to keep promises, the way my father did. Even when she wasn’t looking and would never know about it, I wanted to do things that she would want me to do, to produce effects that, even if neither of us would ever observe them, would be the effects she wanted to happen in the world. This was pretty incongruous with my model of myself at the time, so, as I pretty much always do when something important seems incongruous, I entered a period of deep reflection.

Before long, I noticed a similarity between my situation and Newcomb’s problem, and I recalled other people describing similar experiences. I wanted to deserve her trust in the same way I wanted to be a one-boxer on Newcomb: even when the predictor-genie in the experiment isn’t looking anymore, you have to one-box today so that the genie will have trusted you yesterday and placed $1,000,000 in the first box to begin with. (For technical details, see this 2010 post I made to LessWrong, and another I made earlier this year on the same topic.)
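To make the one-boxing comparison concrete, here is a minimal sketch of the expected payoffs when the predictor is only noisily accurate. It rests on assumptions that are not in the post: the second box holds $1,000 (the usual textbook figure), the predictor is right with a single probability `accuracy`, and payoffs are compared by a simple expected-value calculation conditioned on your choice. The function names are hypothetical.

```python
def expected_payoffs(accuracy, big_prize=1_000_000, small_prize=1_000):
    """Return (one_box, two_box) expected payoffs.

    accuracy: assumed probability that the predictor foresaw your actual choice.
    big_prize: amount placed in the first box if one-boxing was predicted.
    small_prize: assumed amount that is always in the second box.
    """
    # One-boxing pays off only when the predictor correctly foresaw it.
    one_box = accuracy * big_prize
    # Two-boxing always gets the small prize, plus the big prize
    # only when the predictor wrongly expected one-boxing.
    two_box = (1 - accuracy) * big_prize + small_prize
    return one_box, two_box


def break_even_accuracy(big_prize=1_000_000, small_prize=1_000):
    """Accuracy above which one-boxing has the higher expected payoff."""
    # Solve accuracy * B = (1 - accuracy) * B + S for accuracy.
    return 0.5 + small_prize / (2 * big_prize)


if __name__ == "__main__":
    for accuracy in (0.5, 0.6, 0.9):
        one_box, two_box = expected_payoffs(accuracy)
        print(f"accuracy={accuracy}: one-box={one_box:.0f}, two-box={two_box:.0f}")
    print("break-even accuracy:", break_even_accuracy())
```

Under these assumptions the predictor only needs to be slightly better than a coin flip for one-boxing to come out ahead, which is one way to read the claim that even a weak, noisy model of you in someone else's head can matter.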

Basically, I noticed that the assumptions of classical game theory that give rise to homo-economicus behavior were just false when a person can really get to know you and understand you, because that enables them to imagine scenarios about you before you actually get into those scenarios. In other words, your reputation literally precedes you. Your personality governs not only what you do in reality, but also — to a weaker, noisier, but non-zero extent — what you do in the imaginations and instinctive impulses of other people who knew you yesterday.

So, when I find myself entrusted by someone who isn’t looking anymore, or who can’t otherwise punish me for defecting, I still ask myself, “If this were happening in their imagination, before they decided to trust me, would they have given me their trust in the first place?”

This is what I call deserving trust. This framework fits both with my felt-sense of wanting to deserve trust, and with my normative understanding of decision theory. It’s not about having a reputation for being trustworthy. It’s about doing today what people and institutions who might now be unable to observe or punish you would have wanted you to do when they made the decision to trust you.

As I’ve been finding myself expanding my ring of coworkers and collaborators wider and wider, I’ve been wanting to make these distinctions more and more explicit and understandable. The closer I come to working with a new colleague, the more I want them to help me help us be trustworthy. I feel a strong desire for them to know I’m not just trying to preserve our reputation, but that I actually want us, as a group, to be trustworthy, which usually involves making some extra effort. I feel this desire because, unless I make this distinction explicit, people often respond with something like “Yeah, we don’t want to be perceived as [blah blah blah]”, which leaves me feeling disappointed and like I haven’t really communicated how I want us to work with each other and the outside world. Optimizing perceptions today just isn’t good enough for deserving trust yesterday.

As a researcher, I’ve seen academics do some fairly back-stabby things to each other by now, and while blatant examples of it are rare, there are still commonplace things like writing disingenuous grant applications that I find fairly unacceptable for me personally, and I want my closest colleagues to really understand that I don’t want us to operate that way.

I’m trying to do work that has some fairly broad-sweeping consequences, and I want to know, for myself, that we’re operating in a way that is deserving of the implicit trust of the societies and institutions that have already empowered us to have those consequences.


I get a lot of email, and unfortunately, template email responses are not yet integrated into the mobile version of Google inbox. So, until then, please forgive me if I send you this page as a response! Hopefully it is better than no response at all.

Thanks for being understanding.

Deserving Trust / Grokking Newcomb’s Problem

Summary: This is a tutorial on how to properly acknowledge that your decision heuristics are not local to your own brain, and that as a result, it is sometimes normatively rational for you to act in ways that are deserving of trust, for no reason other than to have deserved that trust in the past.

Related posts: I wrote about this 6 years ago on LessWrong (“Newcomb’s problem happened to me”), and last year Paul Christiano also gave numerous consequentialist considerations in favor of integrity (“Integrity for consequentialists”) that included this one. But since I think now is an especially important time for members of society to continue honoring agreements and mutual trust, I’m giving this another go. I was somewhat obsessed with Newcomb’s problem in high school, and have been milking insights from it ever since. I really think folks would do well to actually grok it fully.

You know that icky feeling you get when you realize you almost just fell prey to the sunk cost fallacy, and are now embarrassed at yourself for trying to fix the past by sabotaging the present? Let’s call this instinct “don’t sabotage the present for the past”. It’s generally very useful.

However, sometimes the usually-helpful “don’t sabotage the present for the past” instinct can also lead people to betray one another when there will be no reputational costs for doing so. I claim that not only is this immoral, but even more fundamentally, it is sometimes a logical fallacy. Specifically, whenever someone reasons about you and decides to trust you, you wind up in a fuzzy version of Newcomb’s problem where it may be rational for you to behave somewhat as though your present actions are feeding into their past reasoning process. This seems like a weird claim to make, but that’s exactly why I’m writing this post.

Protected: Move-in application for room in 4-bedroom house at south edge of Berkeley

A story about Bayes, Part 2: Disagreeing with the establishment

Ten years after my binary search through dietary supplements, which found that a particular blend of B and C vitamins was particularly energizing for me, a CBC news article reported that the blend I’d used, called “Emergen-C”, did not actually contain all of the vitamin ingredients on its label.

A story about Bayes, Part 1: Binary search

When I was 19 and just beginning my PhD, I found myself with a lot of free time and flexibility in my schedule. Naturally, I decided to figure out which dietary supplements I should take.

Help me write LaTeX on a large e-ink display ($200 reward)

Edit: my employer was eventually able to order me an e-ink monitor, so the reward is off 🙂

I would like to write LaTeX on a wireless-enabled e-ink display with a 13″ or larger screen to avoid visual fatigue. If you solve this problem for me, I will pay you a $200 reward, be extremely grateful, and write a blog post explaining your solution so that others might benefit 🙂 Some examples that I would consider solutions:

A Mindfulness-Based Stress Reduction course in the East Bay starting January 19

Summary: I think the standardized 8-week MBSR course format is better designed than most introductory meditation practices, and have found David Weinberg in particular to be an excellent mindfulness instructor. Since something like 30 to 100 people have asked me to recommend a way to learn/practice mindfulness, I’m batch-answering with this post.

Red-penning: rolling out an experimental rationality / creativity technique

Note: I’m writing about this technique to (1) reduce the overhead cost of testing it, and (2) illustrate what I consider good practices for “rolling out” a new technique to be added to a rationality curriculum. Despite seeming super-useful from my first-person perspective, experience says the technique itself probably needs to undergo several tests and revisions before it will actually work as intended, even, I suspect, for most readers of my blog.

Break your habits: be more empirical

Summary: The common attitude that “You think too much” might be better parsed as “You don’t experiment enough.” Once you’ve got an established procedure for living optimally in «setting», be a good scientist and keep trying to falsify your theory when it’s not too costly to do so.

Why I want humanity to survive — a holiday reflection

Life on Earth is almost 4 billion years old. During that time, many trillions of complex life forms have starved to death, been slowly eaten alive by predators or diseases, or simply withered away. But there has also been much joy, play, love, flourishing, and even creativity.

Embracing boredom as exploratory overhead cost

(Follow-up to Fun does not preclude burnout)

Sometimes I decide to spend a few weeks or months putting some of my social needs on hold in favor of something specific, like a deadline. But after that’s done, and I “have free time” again, I often find myself leaning toward work as a default pastime. When I ask my intuition “What’s a fun thing to do this weekend?”, I get a resounding “Work!”

Fun does not preclude burnout

As far as I can tell, I’ve never experienced burnout, but I think that’s only because I notice when I’m getting close. And in recent years, I’ve had a number of friends, especially those interested in Effective Altruism, make the mistake of burning out while having fun. So, I wanted to make a public service announcement: The fact that your work is fun does not mean that you can’t burn out.

What’s your vision of a beautiful life?

After releasing my Robust Rental Harmony algorithm, I felt a certain sense of satisfaction, like my friends and I had built something wholesome and beautiful. Reflecting on this, it occurred to me that I might want my life to feel like an artistic creation… like a beautiful substructure of mathematics that reflectively self-appreciates wherever it arises. This felt different from my desire to help the world at large, and also from my desire for moment-to-moment enjoyment.