March 2026 - James Lucassen's Blog

Hidden Role Games as a Trusted Model Eval

March 16, 2026 | Projects | artificial-intelligence, cross-posted

TLDR: To be dangerous, early schemers will have to do a bunch of adversarial reasoning under uncertainty. The current models seem extremely bad at this kind of reasoning relative to R&D automation capabilities like coding. I'm quite happy with that and I'd like to keep an eye on it as we get closer to automated AI R&D.

Flexible Adversarial Strategy Under Uncertainty

Trusted models are a pretty important concept in…