Igor Ivanov
Karma: 1,103

How eval awareness might emerge in training
Igor Ivanov · 26 Feb 2026 10:59 UTC · 26 points · 9 comments · 6 min read · LW link

Better evals are not enough to combat eval awareness
Igor Ivanov · 29 Jan 2026 20:42 UTC · 18 points · 14 comments · 5 min read · LW link

Mainstream approach for alignment evals is a dead end
Igor Ivanov · 6 Jan 2026 19:52 UTC · 56 points · 9 comments · 5 min read · LW link

Call for Science of Eval Awareness (+ Research Directions)
Igor Ivanov · 25 Dec 2025 17:26 UTC · 31 points · 23 comments · 5 min read · LW link

What is an evaluation, and why this definition matters
Igor Ivanov · 15 Dec 2025 14:53 UTC · 33 points · 1 comment · 7 min read · LW link

Comparative Analysis of Black Box Methods for Detecting Evaluation Awareness in LLMs
Igor Ivanov · 26 Sep 2025 21:56 UTC · 17 points · 0 comments · 14 min read · LW link

Igor Ivanov’s Shortform
Igor Ivanov · 7 Aug 2025 21:45 UTC · 5 points · 25 comments · 1 min read · LW link

LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance
Igor Ivanov · 8 Jul 2025 11:50 UTC · 29 points · 8 comments · 7 min read · LW link

I replicated the Anthropic alignment faking experiment on other models, and they didn’t fake alignment
Alex Kedryk and Igor Ivanov · 30 May 2025 18:57 UTC · 34 points · 0 comments · 2 min read · LW link

It’s hard to make scheming evals look realistic for LLMs
Igor Ivanov and Danil Kadochnikov · 24 May 2025 19:17 UTC · 152 points · 29 comments · 5 min read · LW link

LLMs can strategically deceive while doing gain-of-function research
Igor Ivanov · 24 Jan 2024 15:45 UTC · 36 points · 4 comments · 11 min read · LW link

Psychology of AI doomers and AI optimists
Igor Ivanov · 28 Dec 2023 17:55 UTC · 4 points · 1 comment · 22 min read · LW link

5 psychological reasons for dismissing x-risks from AGI
Igor Ivanov · 26 Oct 2023 17:21 UTC · 24 points · 6 comments · 4 min read · LW link

Let’s talk about Impostor syndrome in AI safety
Igor Ivanov · 22 Sep 2023 13:51 UTC · 30 points · 4 comments · 3 min read · LW link

Impending AGI doesn’t make everything else unimportant
Igor Ivanov · 4 Sep 2023 12:34 UTC · 29 points · 12 comments · 5 min read · LW link

6 non-obvious mental health issues specific to AI safety
Igor Ivanov · 18 Aug 2023 15:46 UTC · 148 points · 24 comments · 4 min read · LW link

What is everyone doing in AI governance
Igor Ivanov · 8 Jul 2023 15:16 UTC · 12 points · 0 comments · 5 min read · LW link

A couple of questions about Conjecture’s Cognitive Emulation proposal
Igor Ivanov · 11 Apr 2023 14:05 UTC · 30 points · 1 comment · 3 min read · LW link

How do we align humans and what does it mean for the new Conjecture’s strategy
Igor Ivanov · 28 Mar 2023 17:54 UTC · 7 points · 4 comments · 7 min read · LW link

Problems of people new to AI safety and my project ideas to mitigate them
Igor Ivanov · 1 Mar 2023 9:09 UTC · 38 points · 4 comments · 7 min read · LW link