Evaluating Stability of Unreflective Alignment

This post has an accompanying SPAR project! Apply here if you're interested in working on this with me. Huge thanks to Mikita Balesni for helping me implement the MVP. Regular-sized thanks to Aryan Bhatt, Rudolph Laine, Clem von Stengel, Aaron Scher, Jeremy Gillen, Peter Barnett, Stephen Casper, and David Manheim for helpful comments. 0. Key Claims Most alignment work today doesn’t aim for alignment that is stable under value-reflection. I…

In Search of Strategic Clarity

Context: quickly written up, less original than I expected it to be, but hey that's a good sign. It all adds up to normality. The concept of "strategic clarity" has recently become increasingly important to how I think. It doesn't really have a precise definition that I've seen - as far as I can tell it's mostly just used to point to something roughly like "knowing what the fuck is…

Optimization and Adequacy in Five Bullets

Context: Quite recently, a lot of ideas have sort of snapped together into a coherent mindset for me. Ideas I was familiar with, but whose importance I didn't intuitively understand. I'm going to try and document that mindset real quick, in a way I hope will be useful to others. Five Bullet Points By default, shit doesn't work. The number of ways that shit can fail to work absolutely stomps…

DIY Asymmetric Weapons With Symmetric Weapons And Bayescraft

Epistemic status: Follow-up to this post. Fairly well considered, few hours total epistemic effort. Substantially more confident than before that this is correct, but still feel very ick about it. An asymmetric weapon is any strategy that has a higher probability of winning p(Win) if it is aligned with one side of its axis of asymmetry than with the other. In other words, p(Win | X) > p(Win | ~X). This…

Math, Science, and Logic Are Predictive Models

Epistemic status: content summarized and synthesized (0-1 steps of reasoning) from the Sequences by Eliezer Yudkowsky. Unreasonable Effectiveness I’ve heard several people – professors, peers, and folks on the internet – express surprise at the “unreasonable effectiveness” of mathematics in describing the physical universe. In other words, they claim to be surprised that mathematical laws are able to describe the workings of the universe so precisely. From a certain point…

Discuss the Substance, Not the Symbol

Epistemic status: content summarized and synthesized (0-1 steps of reasoning) from the Sequences by Eliezer Yudkowsky, specifically A Human's Guide to Words. Guiding Puzzle: Is X a Y? Questions of the form "is X a Y" are all over the place, and a huge amount of cognitive power goes into trying to answer them. Some of them are fun or trivial, like "is water wet" or "is cereal a soup".…