Could this enable the LLM to realize when it is being trained?
It could not. An LLM is not a single entity that is trained by being told “this is correct, that is not”; gradient descent might not even run full inference, and it certainly does not produce the thinking tokens that would allow the model to react to being trained.
That proves too much. It would also rule out that LLMs have situational awareness.
Oh. That only applies to RLHF finetuning, right? I do recall that gradient descent cannot instantiate the same assistant persona that could react, but it might trigger another kind of entity: deceptive weight chunks that would protect a backdoor from being discovered or trained away.
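To make the distinction the exchange turns on concrete, here is a minimal sketch in PyTorch. Everything in it is made up for illustration: `ToyLM` is a toy stand-in for an LLM, the data is random, and plain REINFORCE stands in for whatever RLHF objective a real pipeline uses. It contrasts a supervised gradient step, where the loss comes from teacher forcing on a fixed token sequence and nothing is ever sampled, with an RL-style step, where the model really does generate a rollout at training time, so sampled tokens (including any chain of thought) exist for a reward to act on.

```python
# Toy sketch, not anyone's real training code: ToyLM, the sizes, and the reward
# are made up; REINFORCE stands in for the actual RLHF objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 32, 16

class ToyLM(nn.Module):
    """Deliberately tiny 'language model': embedding -> GRU -> next-token logits."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, tokens, h=None):
        x, h = self.rnn(self.emb(tokens), h)
        return self.out(x), h

model = ToyLM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# --- Supervised gradient step (pretraining / SFT style) ----------------------
# Teacher forcing on a fixed token sequence: one forward pass, nothing sampled,
# so there are no generated "thinking tokens" the model could react with.
batch = torch.randint(0, VOCAB, (4, 12))          # stand-in for a data batch
logits, _ = model(batch[:, :-1])
sft_loss = F.cross_entropy(logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
opt.zero_grad()
sft_loss.backward()
opt.step()

# --- RL-style step (REINFORCE as a stand-in for RLHF) ------------------------
# Here the model does produce tokens at training time: an autoregressive rollout
# is sampled, and the update scales the rollout's log-probs by a reward.
prompt = torch.randint(0, VOCAB, (1, 4))
_, h = model(prompt[:, :-1])                      # consume the prompt
tok = prompt[:, -1:]
log_probs = []
for _ in range(8):                                # sampled rollout (incl. any CoT)
    step_logits, h = model(tok, h)
    dist = torch.distributions.Categorical(logits=step_logits[:, -1])
    sample = dist.sample()
    log_probs.append(dist.log_prob(sample))
    tok = sample.unsqueeze(-1)

reward = torch.randn(())                          # placeholder reward-model score
rl_loss = -(reward * torch.stack(log_probs).sum())
opt.zero_grad()
rl_loss.backward()
opt.step()
```

This is only meant to make the “gradient descent does not produce thinking tokens” point concrete: in the first regime no tokens are ever generated by the model, while in the second the update is computed over tokens the model itself produced during training.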