Are there any major papers/posts/etc about how training data containing discussion of AI behavior affects the resulting model behavior? Anything like Anthropic’s alignment faking paper, but more broad.
Yes, there is an entire wikitag devoted to this.
Thank you. In hindsight this was searchable and an unnecessary post, so I apologize for the obvious question.