Martin Randall comments on continue working on hard alignment! don’t give up!

Martin Randall 2 Apr 2023 14:28 UTC
0 points
−1
I don’t know.

My model of foundation LLMs, before tuning and prompting, is that they want to predict the next token, assuming that the token stream is drawn from the hypothetical distribution that their training data is sampled from. Their behavior out of distribution is not well-defined in this model.

My model of typical tuned and prompted LLMs is that they mostly want to do the thing they have been tuned and prompted to do, but also have additional wants that cause them to diverge in unpredictable ways.
- DragonGod 2 Apr 2023 19:39 UTC
  1 point
  3
  Parent
  They don’t “want” anything, and thinking of them as having wants leads to confused thinking.