StanislavKrym comments on Towards training-time mitigations for alignment faking in RL