Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 13 Jan 2025 5:21 UTC
Introspection is an instantiation of ‘Connecting the Dots’.
- Connecting the Dots: Train a model g on (x, f(x)) pairs for some function f. The model g can then infer things about f.
- Introspection: Train a model g on (x, f(x)) pairs, where the x are prompts and the f(x) are the model’s own responses. The model can then infer things about f. Here f = g, so this is a special case of the above.
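
A toy sketch of the reduction, not how either result was actually obtained: a least-squares linear fit stands in for "training", and "inferring things about f" is reduced to recovering a slope and intercept. The names `build_pairs`, `fit_linear`, and the particular `f` are hypothetical and chosen only for illustration.

```python
def build_pairs(f, xs):
    """Build the (x, f(x)) training pairs used in both setups."""
    return [(x, f(x)) for x in xs]

def fit_linear(pairs):
    """Toy stand-in for training: least-squares fit of y = a*x + b."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]

# Connecting the Dots: f is some external function (assumed linear here).
def f(x):
    return 3.0 * x + 1.0

a, b = fit_linear(build_pairs(f, xs))

# The fitted model g; its parameters encode what it "inferred" about f.
def g(x):
    return a * x + b

# Introspection: same procedure, but the pairs come from g itself (f = g).
a_self, b_self = fit_linear(build_pairs(g, xs))
```

The only difference between the two cases is which function is passed to `build_pairs`; the training procedure is identical.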