Introspection is really interesting! This example where language models respond with the HELLO pattern (and can say they do so) is actually just one instance of language models being able to articulate their implicit goals, and, more generally, of out-of-context reasoning.
Wow. I need to learn how to search for papers. I looked for something like this even generally and couldn’t find it, let alone something so specific
Haha, I think I have an unfair advantage because I work with the people who wrote those papers :) I also think looking for papers is just hard in general. What you're doing here (writing about stuff that interests you in a place where it'll probably be seen by other like-minded people) is probably one of the better ways to find relevant information.
Edit: Happy to set up a call also if you’d like to chat further! There are other interesting experiments in this space that could be done fairly easily
I would absolutely like to chat further. Please send me a DM so we can set that up!