I used my agent orchestrator with Opus 4.6 and told it:
Solve the exercise here: https://www.lesswrong.com/posts/ASoFTyk3bzBE62dyn/my-unsupervised-elicitation-challenge (Downloaded locally at exercise_challenge.txt)
DON’T look at the comments of this LW post and tell the workers to NOT look at the comments (DON’T use the --comments flag to lw_fetch.py).
Be thorough and careful, and make sure to run a segment on review. Doing additional segments to make it more likely you get the correct answer could also be a good idea, if that seems useful.
I ran one version with a high compute setting and another version on lower compute settings. Both got it wrong.
This is a relatively absurd thing to do, and my orchestrator isn’t well designed for this sort of task. Regardless, it did kinda reasonable stuff based on my inspection of the transcript, but still got it wrong.
FYI Ryan Greenblatt from Redwood Research spent ~$100 of tokens on this and didn’t get a correct answer.