Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 17 Jan 2025 21:23 UTC
3 points
That’s pretty interesting! I would guess that it’s difficult to elicit introspection by default. Most of the papers where this is reported to work well involve fine-tuning the models. So maybe “willingness to self-report honestly” should be something we train models to do.
- the gears to ascension 18 Jan 2025 3:29 UTC
  2 points
  willingness seems likely to be understating it. a context where the capability is even part of the author context seems like a prereq. finetuning would produce that; with few-shot, one has to figure out how to make it correlate. I’ll try some more ideas.
  - eggsyntax 2 Apr 2025 13:37 UTC
    2 points
    > a context where the capability is even part of the author context
    Can you unpack that a bit? I’m not sure what you’re pointing to. Maybe something like: few-shot examples of correct introspection (assuming you can identify those)?