Epistemic Status: partially just processing info, partially publishing for feedback, partially encouraging others to go to EAG conferences by demonstrating how much value I got out of my first one.
The following is a summary of what I took away from EAGx Boston overall – it’s synthesized from a bunch of bits and pieces collected in 30-minute conversations with 19 really incredible people, plus some readings that they directed me to, plus follow-up conversations via Zoom after the official conference. Credit is theirs, mistakes are mine.
This post is not really intended to make a single coherent point – instead, a big chunk of the goal is to showcase a large volume of stuff I learned, to demonstrate how great EAGx Boston was. The recommended way to read this is to skim and focus on what’s interesting to you.
I went into the conference having laid out a few goals:
- Get a better understanding of AGI policy and strategy, and their ramifications for technical work (considerations like the alignment tax).
- Learn about other folks’ inside views on alignment, and maybe pick up some meta-principles for how to develop such an inside view and do actually impactful alignment work guided by principled reasoning.
- Swap takes on community building, to improve things at EA Claremont and maybe beyond.
- Learn more about S-risks, and whether they’re still important when not given normative priority.
- Generally farm for crucial considerations by being exposed to new ideas and things that could potentially shift or boost my estimated max-impact path.
AGI Policy/Strategy
- AGI policy and strategy (hereafter just PS) is extremely important and very difficult and neglected – enough so that it seems like it could account for a non-negligible chunk of failure scenarios. Lots of hedging language there, mostly because I don’t know how to estimate how difficult things are in a principled way, but it certainly seems “very hard”.
- I sort of knew this already, but I hadn’t really internalized it until I talked to a bunch of PS people and started actually thinking about the details. Folks on the technical side tend to treat PS as a black box and just hope it’ll work out – PS folks do the same with the technical solution. Not doing this seems important for overall strategy and direction.
- In particular, making the rollout of AGI go well seems important enough that it’s certainly worth considering policy/strategy work if you have a technical background, provided you’re not expecting to be super valuable working on alignment.
- A big part of what makes PS so difficult is unpredictability. Something like a Russia-Ukraine war or a Trump presidency could happen at any time, and disrupt otherwise well-laid plans.
- Depending on the extent of this unpredictability, PS impact stories can look very different. Under high unpredictability, we mostly just get aligned people into positions of power, prepare a ton of contingency plans, and hope we can wing it effectively when crunch time arrives in some unexpected way. Under lower unpredictability, we’re able to set up strategies in advance, anticipate how crunch time will work out, and front-load more of the work before crunch time arrives.
AGI Technical Research Directions
- Folks who know more about transformers than I do are more optimistic about Chris Olah’s science-research-style interpretability paradigm scaling to much bigger or more complicated models. The apparent jump in complexity from CNN circuits to transformer circuits seems to be mostly because CNNs are an especially easy case, since humans are so visual. Apparently the underlying math is mostly the same, which is promising.
- Double descent may not be about gradient descent priors? More on this later.
- Ontology translations may be necessarily lossy? More on this later.
- Model splintering may be incoherent at lower levels of abstraction? More on this later.
Community Building
- One perspective on community building says that if you expose a lot of people to EA content, a somewhat rare minority of them will be very agentic, take to the ideas very quickly, and read up more on their own time. It says that these are the people who will end up becoming the most valuable members of the community, and so we should focus on finding them, with a low-false-negative test like exposure to an EAG or the Bay Area community. Once they’ve been identified, they’ll take the initiative and find their way to high impact work without much hand-holding at all. Call this the Filtering view.
- Another perspective says that the properties the Filtering view looks for are more learnable than the Filtering view assumes, and so there may be people with great potential who would be missed by a strict filter. This view encourages something like the Fellowship model, which provides lots of commitment mechanisms and social proof. That way, participants who accept the core principles have plenty of time to internalize them and start taking action. Over time, these people can also grow into highly impactful members of the community. Call this the Development view.
- The third main perspective I encountered says that the growth potential the Development view banks on varies between people, but is roughly constant and observable within one person. This means you can test for potential relatively quickly and then devote more resources to those who grow the fastest, producing a sort of narrowing and accelerating pipeline. Call this the Multi-Armed view (see the sketch after this list).
- In a later post I will attempt to encompass these three views in one Grand Unified Theory of Community Building. More on this later.
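To make the Multi-Armed view’s allocation logic a bit more concrete, here is a minimal sketch of the analogy it gestures at: a UCB-style multi-armed bandit that tests everyone a little, then keeps shifting attention toward whoever’s observed growth (plus an uncertainty bonus) looks best. This is purely my illustrative analogy, not something anyone proposed at the conference; the member names, `observe_growth`, and the idea of a fixed “attention budget” are all hypothetical stand-ins.

```python
import math
import random

def observe_growth(member):
    # Hypothetical stand-in for a noisy measurement of how much a member
    # grew after receiving one unit of organizer attention.
    return random.random()

def allocate_mentorship(members, total_units):
    """UCB1-style allocation of a fixed attention budget across members."""
    counts = {m: 0 for m in members}   # units of attention given so far
    totals = {m: 0.0 for m in members} # summed observed growth

    # Give every member one unit first, so each "arm" has an initial estimate.
    for m in members:
        counts[m] = 1
        totals[m] = observe_growth(m)

    for t in range(len(members), total_units):
        def ucb(m):
            # Balance "seems to grow fastest" against "haven't checked much".
            mean = totals[m] / counts[m]
            bonus = math.sqrt(2 * math.log(t + 1) / counts[m])
            return mean + bonus

        chosen = max(members, key=ucb)
        counts[chosen] += 1
        totals[chosen] += observe_growth(chosen)

    return counts  # how the attention budget ended up distributed

if __name__ == "__main__":
    print(allocate_mentorship(["A", "B", "C", "D"], total_units=50))
```

The point of the sketch is just the shape of the pipeline: early attention is spread widely, and as evidence accumulates it narrows and accelerates toward the fastest growers, which is the dynamic the Multi-Armed view describes.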
S-Risks
- S-risk concerns seem to mostly come from AI, but not necessarily. From misaligned AGI we get the simple misuse scenario and the near-miss scenario. Even with aligned AGI, we can still get the multipolar-threat scenario, and the acausal-threat scenario. And without AGI at all we might still get massive simulated suffering, or wild animal suffering scaling to space.
- On a recommendation I re-read some arguments for suffering-focused ethics, and I think I tentatively agree that the worst reaches of suffering in existence right now are stronger than the highest reaches of happiness (although this depends on some hard-to-evaluate stuff, like exactly how great DMT is). But I don’t think this is sufficient for normative priority, especially not infinite priority of the negative-utilitarian sort.
- It’s unclear how much S-risks matter in the totalist picture, when you don’t give them normative priority as the suffering-focused-ethics view does. Making this kind of empirical case seems to depend on a lot of sticky stuff, like the relative caps on suffering vs. welfare, the details of decision theory (including acausal stuff), and the probabilities of specific kinds of misalignment scenarios, to see how much it all factors into the expected value.
- On a recommendation I re-read this, and it totally splintered my previous view of morality. Looks like idealizing subjectivism is my new meta-ethics. This changes things!
- I am still confused about consciousness, but in a different way – having read this, the question is no longer “what is consciousness” as if it has some sort of mysterious fundamental essence, but rather “which things are conscious”, where “consciousness” is simply defined as “the property that enables happiness/suffering, which my idealized values (seem likely to) care about”. Nice!
Meta
- Go to EAG conferences! I think I learned approximately as much worldview-affecting and decision-affecting material in these few days as I usually would have gotten over the course of several months or even a year.
- The vibe at the conference was unparalleled. There was a really noticeable effect on me psychologically – I felt like I was kicked into my highest gear for almost the entire conference. Specifically, it seemed like I was thinking faster, staying more focused, and getting to the decision-affecting aspects of things more quickly than usual. A mix of adrenaline, insight, and caffeine, maybe?
- Preparing quick agendas for each of my Day 1 chats helped me get a lot out of those conversations. But after that, I had learned so much new stuff that my agenda for Day 2 was already somewhat out of date and much less useful. Identifying new Hamming questions requires processing the answers to the previous set, which means you hit diminishing returns pretty quickly if you spend all your time gathering takes and don’t leave yourself enough time to process. That said, I think I could have done better in this regard without reducing my number of meetings, if I’d just known to block out some time for processing on Saturday night.