Chijioke Ugwuanyi

Karma: 52

Scheming Evals Mislead in Both Directions

Chijioke Ugwuanyi, eric-z and TerryJCZhang

3 Jul 2026 11:49 UTC

22 points

0 comments, 10 min read

Chijioke Ugwuanyi 20 May 2026 18:58 UTC
2 points
in reply to: Chamod Kalupahana’s comment on: From 8B to Frontier: How System Prompts Control Whether AI Agents Blackmail, Leak, and Kill
Thank you for your comment. Yes, today I read a paper (https://arxiv.org/abs/2412.04984) in which the researchers asked GPT models to summarize their reasoning and write it to a scratchpad.

I should have done this.

Second, making the system prompt more deployment-like is an interesting idea, although the latest Claude and GPT models appear to be remarkably good at telling evaluation apart from deployment. It would still be a credible ablation to run.

From 8B to Frontier: How System Prompts Control Whether AI Agents Blackmail, Leak, and Kill

Chijioke Ugwuanyi 20 May 2026 8:28 UTC

15 points

2 comments, 19 min read

Chijioke Ugwuanyi 6 May 2026 7:12 UTC
1 point
in reply to: emile delcourt’s comment on: Blackmail at 8 Billion Parameters: Agentic Misalignment in Sub-Frontier Models
Interesting! It's certainly feasible to train a set of linear probes and inspect the model's actual activations. I will be taking this up in the coming weeks.

Blackmail at 8 Billion Parameters: Agentic Misalignment in Sub-Frontier Models

Chijioke Ugwuanyi 27 Apr 2026 8:59 UTC

15 points

2 comments, 7 min read

Replication of Koorndijk (2025): Differential Compliance May Reflect Prompt Sensitivity Rather Than Strategic Reasoning

Chijioke Ugwuanyi and TerryJCZhang

13 Feb 2026 16:12 UTC

9 points

0 comments, 8 min read