RESEARCH

An RL environment for political campaign strategy

Built at the General Reasoning hackathon. Four campaign strategies, twelve weeks, one race: the best approach beats the worst by about five percentage points.

PROBLEM

Most evaluations of AI agents test whether a model can use a tool: search the web, fill a form, send an email. They don't test whether a model can reason about cause and effect in a domain where the right move isn't obvious. Real-world political campaign strategy is one of those domains, and one of the few with strong research evidence on what actually works.

APPROACH

As part of the General Reasoning hackathon, I built a 12-week simulation of a US House race where an agent runs the ground game. It decides who to hire, which voters to target, whether to persuade or mobilise, and when. The simulator is calibrated to causal estimates from political science field experiments and built on a dataset of 97 competitive races from 2002 to 2024. The agent gets five tools and no strategy hint. It has to work out the three-phase structure (hire early, persuade the middle, mobilise the base at the end) on its own.
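To make the three-phase structure concrete, here is a minimal sketch of a fixed baseline policy that maps each week to a phase. It is not the project's code: the names `Action`, `PhasePlan`, `three_phase_action` and `schedule` are hypothetical, and the week boundaries (hiring through week 3, persuasion through week 9) are assumptions chosen for illustration. Only the 12-week length and the hire, persuade, mobilise ordering come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The three phases named in the write-up."""
    HIRE = "hire"
    PERSUADE = "persuade"
    MOBILISE = "mobilise"


@dataclass(frozen=True)
class PhasePlan:
    """Week boundaries for a fixed strategy (boundaries are assumed, not sourced)."""
    hire_until: int = 3       # assumption: last week spent hiring
    persuade_until: int = 9   # assumption: last week spent persuading
    total_weeks: int = 12     # from the text: a 12-week race


def three_phase_action(week: int, plan: PhasePlan = PhasePlan()) -> Action:
    """Return the phase for a 1-indexed campaign week."""
    if not 1 <= week <= plan.total_weeks:
        raise ValueError(f"week must be in 1..{plan.total_weeks}, got {week}")
    if week <= plan.hire_until:
        return Action.HIRE
    if week <= plan.persuade_until:
        return Action.PERSUADE
    return Action.MOBILISE


def schedule(plan: PhasePlan = PhasePlan()) -> list[Action]:
    """Expand a plan into one action per week."""
    return [three_phase_action(w, plan) for w in range(1, plan.total_weeks + 1)]


if __name__ == "__main__":
    for week, action in enumerate(schedule(), start=1):
        print(f"week {week:2d}: {action.value}")
```

A hard-coded schedule like this is the kind of fixed strategy a simulator can be checked against. The agent in the environment gets no such plan and has to discover the ordering from the outcomes of its own tool calls.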

The six voter segments and how each one responds to being persuaded or reminded to vote. Swing voters and low-propensity supporters are the only two that move the needle, and they move it in opposite directions.

OUTCOME

I ran fixed strategies through the simulator first to confirm it actually rewards good decisions and penalises bad ones. The best strategy beats the worst by around 5 percentage points, which is the headroom a model has to navigate. The next step is running frontier models against the held-out test set to see how each one performs. The thing I took away: building a clean eval environment is mostly a calibration problem, not an engineering one.

View on GitHub

Final vote share for each strategy against the 49.6% starting point. Getting the targeting wrong costs more than doing nothing at all.
