This paper supports the idea that a model can self-report an implicit task it has been fine-tuned to perform. https://arxiv.org/html/2501.11120v1
This leads me to suspect that rife’s experiments with reporting the implicit acrostic will be successful.
I agree with the decision to test less common words. You may need even more examples, or perhaps just to train for more epochs on the existing 200 examples.
Forgot to follow up here, but turning up the learning rate multiplier to 10 seemed to do the trick without introducing any over-fitting weirdness or instability.
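For anyone wanting to reproduce this, here is a minimal sketch of what that setting looks like, assuming the fine-tuning is done through the OpenAI fine-tuning API. The training file id, base model, and epoch count are placeholders; the only value taken from this thread is the learning rate multiplier of 10.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical sketch: file id, model, and n_epochs are placeholders.
# The only value taken from the thread is learning_rate_multiplier=10.
job = client.fine_tuning.jobs.create(
    training_file="file-XXXX",           # uploaded JSONL with the ~200 acrostic examples
    model="gpt-4o-mini-2024-07-18",      # placeholder base model
    hyperparameters={
        "n_epochs": 3,                    # raise this if the implicit task doesn't stick
        "learning_rate_multiplier": 10,   # the change reported to fix it above
    },
)
print(job.id, job.status)
```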