Do you think of this work as an ELK thing?
It’s at least related. Like CCS, I see it as targeting some average-case ELK problem of eliciting an AI’s “true belief” (+ maybe some additional learning, unsure how much) in domains where you don’t have ground-truth labels.
My excitement about it solving ELK in practice will depend on how robust it is to variations that bring the setting closer to the most important elicitation problems (e.g. situations where an AI knows very well what the humans want to hear, and where that differs from what it believes to be true).