Matthew Sheffield comments on My unsupervised elicitation challenge

Matthew Sheffield 8 Apr 2026 5:01 UTC
1 point
1
Shouldn’t you specify what you think is the correct answer? How could someone generate a prompt that would result in the correct answer if they don’t read ancient Greek?
- DanielFilan 9 Apr 2026 0:28 UTC
  6 points
  3
  Nope, I shouldn’t specify what I think is the correct answer. The way someone could generate a prompt that would result in the correct answer would be to successfully get Claude to apply all its knowledge of Ancient Greek to this question. If I told you the correct answer, you could just tell Claude to repeat that answer.
  
  In general, this is meant to mirror a situation where some smart AI knows how to do what you want, you can’t check if it’s doing what you want, and you have to get it to do what you want.