This also affected Opus 4.8, but to a much lesser extent:
As with some prior models, technical errors led to accidental chain-of-thought supervision during the training of Claude Opus 4.8, affecting roughly 0.1% of episodes.
For Opus 4.7, CoT supervision affected 7.8% of episodes. The reduction to roughly 0.1% did not have a significant effect on stealth rates in SHADE-Arena (page 131 of the system card) or on the results of the process-monitorability evals from Guan et al. (page 143). However, Opus 4.8 has the least controllable CoTs of any Anthropic model in a while (page 140).
Related: Tamay Besiroglu mentions that Fable often outputs gibberish while solving coding tasks, such as “The morning’s slim-scan fix cured the scan hang” and “this is a latent-drift API-shape wrinkle”. His explanation is that the model invents codenames while reasoning about the problem. roon says GPT-5.5 has a similar issue.