The question is whether an LLM can even keep a coherent long-term plan like “deceive the evaluators” just as well without relying on CoT. I suppose it might be able to if it also has internal memory of some sort.
I don’t think it needs to be a coherent long-term plan from the model’s perspective. Awareness that CoTs are being monitored, plus a consistent desire to do something that would look bad if monitored, could influence the model’s outputs without requiring any coherent long-term plan.