chrisjbillington comments on My unsupervised elicitation challenge

chrisjbillington 19 Apr 2026 3:43 UTC
1 point
Any suggestions on how I might validate the answers Claude gives, so that I don’t just waste your time sending a bunch of incorrect attempts?

Here is an attempt using an approach that worked well for things like substitution ciphers with errors. On those, Claude would overthink and confuse itself, whereas encouraging it to act purely on instinct worked well. Telling it to repeat the question four times gave it some gut-level thinking space without the kind of structured thinking that led it astray:

https://claude.ai/share/f00ed43b-e26b-4e98-b4f7-9ad77100fac0

(I have zero clue if this is remotely correct)

Here’s an example of the buggy substitution cipher that this approach worked well with, as far back as Sonnet 3.5 (“new”):

https://x.com/Chrisbilbo/status/1884004589453848945
- DanielFilan 19 Apr 2026 3:50 UTC
  2 points
  
  Any suggestions on how I might validate the answers Claude gives, so that I don’t just waste your time sending a bunch of incorrect attempts?
  
  Alas, the whole point is that you sort of can’t. I will not be annoyed if you submit five attempts, but if you submit more I might find that a bit annoying.
  
  Your approach contains errors, alas.
  - chrisjbillington 19 Apr 2026 3:59 UTC
    1 point
    Very reasonable.
    
    Interesting challenge, looking forward to the eventual reveal!