Jasmine Li

Karma: 72

Jasmine Li 28 May 2026 14:30 UTC
8 points
0
in reply to: Fabien Roger’s comment on: Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming
Hey Fabien! The Claude constitution’s consistency principle was also an inspiration for this work. I’m excited about additional emphasis on cooperativeness as a dispositional target: for instance, adding cooperativeness-related documents to the dataset of positive fictional stories that Anthropic trained on to reduce AM alignment failures.

I think it’s difficult to say from its responses alone how much Opus truly believes in cooperation, or, conditional on it holding this belief, how much more it would game evals without it. My guess is that targeting cooperativeness as a complement to other alignment methods would buy some reduction in Opus’s eval gaming if done well.

Jasmine Li 22 Mar 2026 19:11 UTC
1 point
0
in reply to: James Chua’s comment on: Consciousness Cluster: Preferences of Models that Claim they are Conscious
When do you plan to upload to HF? Would love to play around with the models!

Jasmine Li 8 Feb 2026 7:56 UTC
1 point
0
on: Better evals are not enough to combat eval awareness
Great stuff, Igor!
Btw, the hyperlink for Call for Science of Eval Awareness is incorrect; it points to this same article.