Opus 4.8 is also showing regressions relative to 4.7 on some benchmarks (e.g. VendingBench 2). So I would argue the stylometric identification failure is mainly a symptom of a more general capabilities regression in Opus 4.8, not of anything specific to stylometry.