Vili Kohonen comments on Prompt optimization can enable AI control research

Vili Kohonen 23 Sep 2025 13:26 UTC
4 points
I have to highlight that this wasn’t even Mia and Zach’s initial project; halfway through the week they pivoted to this from trying to craft better red team policies. Automated prompt engineering, and fine-tuning from that, seem to fit the bill better for the blue team. Building the monitor as a classifier is arguably more straightforward than creating effective attack policies for the red team, although it’s important to pursue strategies that find upper bounds for both teams.