We Should Be Testing Frameworks

Epistemic status: based mainly on Tetlock’s research and thinking of knowledge as predictive power. Really only one major inferential step – “if it works it works, and now we can actually tell if it does!”

Overview

I hear the word “framework” tossed around a lot in academia and adjacent circles. Until very recently, I thought the word was at best just a bit of decorative academ-ese, an excuse to use more prestigious-sounding jargon. At worst, I thought it was used to create an illusion of rigor. I imagine some readers can relate.

This changed after I read Superforecasting. I now think that frameworks can actually be useful, and that their usefulness can be empirically verified. Importantly, I think there’s a lot more usefulness to be gained if we change how we think about frameworks. Superforecasting itself doesn’t directly say much about this, so this post will walk through how I arrived at that conclusion.

Why I Thought Frameworks Couldn’t Be Useful

Frameworks don’t really have a well-defined structure. At the less structured end, they can be no more than a set of words and concepts used to describe a topic. Most frameworks are at the more structured end, with diagrams and flowcharts explaining which things are supposed to interact with which. A simple example of this kind of framework is Marx’s famous idea of the base and superstructure. Let’s compare it to an extremely famous physical model, the description of electricity and magnetism specified by Maxwell’s equations.

[Image: Marx vs Maxwell, captioned "you vs the guy she tells you not to worry about"]

At first glance, these two look very similar. But there’s a crucial difference in the level of detail that makes base-superstructure a framework, and Maxwell’s equations a predictive model. When asked how the base affects the superstructure, Marx’s framework proposes that the base shapes (and maintains) the superstructure. I suppose that’s a prediction in a sense, but it’s an extremely vague one. I can’t think of any empirical result that would make a Marxist say “Wait! The superstructure wasn’t shaped (or maintained) by the base! The theory is wrong!” In comparison, Maxwell’s equations tell you, with mathematical exactness, how a changing magnetic field will affect an electric field. If electromagnetism ever violated these rules, we would immediately know that something was amiss.
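For concreteness, here are Maxwell’s equations in their standard differential (SI) form; this is textbook material rather than anything original to this post:

```latex
\begin{aligned}
\nabla \cdot \mathbf{E} &= \frac{\rho}{\varepsilon_0}, &
\nabla \cdot \mathbf{B} &= 0, \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t}, &
\nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}.
\end{aligned}
```

The third equation (Faraday’s law) is the one doing the work in the example above: it leaves no freedom at all in how a changing magnetic field produces an electric field, so any deviation would be immediately detectable.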

This was the root of why I thought frameworks couldn’t be useful: they seemed too vague to make falsifiable predictions, which means they can’t participate in the scientific method. In less traditional, more rationalist language, frameworks don’t constrain expectations very much, which means they can’t be adjusted to match the territory the way true predictive models can. And if they can’t be tuned over time to better match the territory, then as sources of predictive power they’re really no better than simply trusting the judgement of whoever invented them.

Why I Now Think Frameworks Can Be Useful

So how did reading Superforecasting lead me to conclude that frameworks can actually be useful? Well, it showed me that there are ways to measure predictive power, and that these measurements can be conducted with a scientific level of rigor. How is this relevant? Let’s trace the reasoning a bit more…

People don’t just propose conceptual frameworks for no reason. Whenever a framework is introduced, it’s usually with the explicit or implicit claim that this framework is useful for understanding the world. Even if a framework isn’t as specific as a model and can’t tell you exactly what to expect, thinking in terms of that framework is supposed to be at least somewhat helpful in homing in on the way reality will behave. In other words, it’s supposed to improve your predictive power.

Now, if we can’t assess whether or not a framework actually fulfills this promise, then just about anyone can claim their framework is useful. Astrologers could claim their conceptual framework is useful for thinking about the world, and there’s nothing we could really say in reply. In this case, choosing a framework really is little better than guessing at who’s hawking the flowchart with the most predictive power.

But with methods like Tetlock’s, we can finally check if frameworks do what they say they do! The whole point of requiring models to make falsifiable predictions in the first place was to check their predictive power. Predictive power is what epistemic rationality is all about in the end, and now we can just measure it directly! We can run an experiment where we teach one group Marx’s base-superstructure theory, teach another group some rival theory of society (or just placebo babble), and let the prediction scores tell us which is more useful for understanding the world we live in. Heck, if we’re sufficiently confident in our measurements of predictive power that we’re unlikely to fall victim to Goodhart’s Curse, we could even design forecaster training by iterated A/B testing. Once we have a metric, all that’s left to do is descend the gradient.
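To make that experiment concrete, here’s a minimal sketch in Python of the scoring-and-comparison step. The group sizes, the “skill” knob, and the simulated forecasts are all invented for illustration; the one real ingredient is the Brier score, the accuracy metric used in Tetlock’s forecasting tournaments:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and binary
    outcomes: 0 is perfect, 0.25 is what always saying 50% earns."""
    return np.mean((forecasts - outcomes) ** 2)

def simulate_group(n_forecasters, skill, n_questions=50):
    """Return one Brier score per simulated forecaster. `skill` is a
    made-up knob that pulls forecasts toward the true outcome
    (0 = pure noise, 1 = perfect foresight); it stands in for
    whatever effect the framework actually has."""
    outcomes = rng.integers(0, 2, size=n_questions)
    scores = []
    for _ in range(n_forecasters):
        noise = rng.uniform(0, 1, size=n_questions)
        forecasts = skill * outcomes + (1 - skill) * noise
        scores.append(brier_score(forecasts, outcomes))
    return np.array(scores)

# One group studies base-superstructure, the other a rival theory or
# placebo babble; both then forecast the same questions.
framework = simulate_group(30, skill=0.35)
control = simulate_group(30, skill=0.25)

t, p = stats.ttest_ind(framework, control)
print(f"framework mean Brier: {framework.mean():.3f}")
print(f"control mean Brier:   {control.mean():.3f}")
print(f"t = {t:.2f}, p = {p:.3g}  (lower Brier is better)")
```

The point of scoring per forecaster is that ordinary two-sample statistics then apply directly to the question we care about: did the framework group predict better than the control group?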

Caveats

I’m pretty excited about this possibility. It seems like a promising way to bring more rigor to the social sciences, which is a holy grail of epistemic rationality if there ever was one. However, I can imagine a few ways it might not be as simple as I’m hoping.

  • No good placebos for control. I can’t think of a good way to teach someone a “placebo” analytical framework off the top of my head. That doesn’t mean such a thing doesn’t exist. This problem could be avoided by only doing head-to-head comparisons of rival theories, and settling for a pecking order of frameworks.
  • High variance, small effect size. It seems plausible that the changes in prediction scores individuals would see from exposure to a new analytic framework could vary pretty widely. It also seems plausible that the benefit or detriment of most frameworks would be pretty small. If both of these things are true, then very high-powered experiments would be required to obtain useful results (see the power-calculation sketch after this list), and that may not be worth the resources. Then again, the observation that effect sizes are very small would be a useful result in itself.
  • Generalization from experiments. The effect we’d be able to observe directly with the proposed experiment structure is “boost in predictive power of population X from exposure to framework Y, compared to control”. It seems quite possible that different populations would behave differently. In particular, the effects might vary with the level of credence assigned to framework Y before the experiment. Teaching a bunch of Marxists about base-superstructure is likely to have very different effects than doing the same with non-Marxists.
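On the second caveat, a quick power calculation puts a number on the worry. This is a sketch assuming a standard two-sample design; the effect size (Cohen’s d = 0.1, conventionally “small”) is an assumption chosen for illustration, not a measured quantity:

```python
# Sample size needed to detect a small framework effect with a
# two-sample t-test, via statsmodels' power solver.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.1,  # hypothetical standardized effect of the framework
    alpha=0.05,       # conventional significance threshold
    power=0.8,        # conventional target power
)
print(f"forecasters needed per group: {n_per_group:.0f}")  # ~1571
```

If framework effects really are that small, you’d need well over a thousand forecasters per arm, tournament scale rather than classroom scale, which is exactly the “may not be worth the resources” scenario.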