Oliver Daniels comments on Oliver Daniels-Koch’s Shortform

Oliver Daniels 27 Nov 2025 3:41 UTC
unclear whether outcome-based training “leaking” into the chain of thought (because of parameter sharing or short-to-long generalization) is good or bad for safety

on the one hand, it makes scheming less likely
on the other hand, it can make CoT less monitorable