On Contact [UNDER CONSTRUCTION]

Context: for fun (and profit?) Basic Contact Contact is a lightweight many-versus-one word guessing game. I was first introduced to it on a long bus ride several years ago, and since then it's become one of my favorite games to play casually with friends. There are a few blog posts out there about Contact, but I think it's incredibly underrated. The rules of Contact are simple, but I often tell…

Retrospective: 12 Months Since MIRI

Not written with the intention of being useful to any particular audience, just collecting my thoughts on this past year's work. September-December 2023: Orienting and Threat Modeling Until September, I was contracting full time for a project at MIRI. When the MIRI project ended, I felt very confused about lots of things in AI safety. I didn't know what sort of research would be useful for making AI safe, and…

Evaluating Stability of Unreflective Alignment

This post has an accompanying SPAR project! Apply here if you're interested in working on this with me. Huge thanks to Mikita Balesni for helping me implement the MVP. Regular-sized thanks to Aryan Bhatt, Rudolph Laine, Clem von Stengel, Aaron Scher, Jeremy Gillen, Peter Barnett, Stephen Casper, and David Manheim for helpful comments. 0. Key Claims Most alignment work today doesn't aim for alignment that is stable under value-reflection. I…

Research Retrospective, Summer 2022

Context: I keep wanting one place to refer to the research I did in Summer 2022, and the two LessWrong links are kind of big and clunky. So here we go! Figured I'd add some brief commentary while I'm at it, mostly just so this isn't a totally empty linkpost. Summer 2022 I did AI Alignment research at MIRI under Evan Hubinger's mentorship. It was a lot like SERI MATS, but…

In Search of Strategic Clarity

Context: quickly written up, less original than I expected it to be, but hey that's a good sign. It all adds up to normality. The concept of "strategic clarity" has recently become increasingly important to how I think. It doesn't really have a precise definition that I've seen - as far as I can tell it's mostly just used to point to something roughly like "knowing what the fuck is…

Optimization and Adequacy in Five Bullets

Context: Quite recently, a lot of ideas have sort of snapped together into a coherent mindset for me. Ideas I was familiar with, but whose importance I didn't intuitively understand. I'm going to try and document that mindset real quick, in a way I hope will be useful to others. Five Bullet Points By default, shit doesn't work. The number of ways that shit can fail to work absolutely stomps…

What I Got From EAGx Boston 2022

Epistemic Status: partially just processing info, partially publishing for feedback, partially encouraging others to go to EAG conferences by demonstrating how much value I got out of my first one. The following is a summary of what I took away from EAGx Boston overall - it's synthesized from a bunch of bits and pieces collected in 30-minute conversations with 19 really incredible people, plus some readings that they directed me…

Unfinished Thoughts on ELK

Epistemic Status: posting for mostly internal reasons - to get something published even if I don't have a complete proposal yet, and to see if anything new crops up while summarizing my thoughts so far. For context, ELK is a conceptual AI safety research competition by ARC, more info here. In this post I will document some ideas I've considered, showing the general thought process, strategy, obstacles, and current state…

Moravec’s Paradox Comes From The Availability Heuristic

Epistemic Status: very quick one-thought post, may very well be arguing against a position nobody actually holds, but I haven't seen this said explicitly anywhere so I figured I would say it. Setting Up The Paradox According to Wikipedia: Moravec's paradox is the observation by artificial intelligence and robotics researchers that, contrary to traditional assumptions, reasoning requires very little computation, but sensorimotor and perception skills require enormous computational resources. (https://en.wikipedia.org/wiki/Moravec's_paradox) I…

True EA Alignment is Overrated

Epistemic Status: simple thought, basically one key insight, just broadcasting because I think people will find it useful. Among the EA folks I talk to, there's a fairly common worry about whether or not they're "truly aligned". In other words, EAs tend to worry about whether they're really motivated to do good in the world, or if they're secretly motivated by something else that leads to EA-like behavior as…