Igor Ivanov (Karma: 891)
What is an evaluation, and why this definition matters · Igor Ivanov · 15 Dec 2025 14:53 UTC · 23 points · 1 comment · 7 min read · LW link

Comparative Analysis of Black Box Methods for Detecting Evaluation Awareness in LLMs · Igor Ivanov · 26 Sep 2025 21:56 UTC · 16 points · 0 comments · 14 min read · LW link

Igor Ivanov’s Shortform · Igor Ivanov · 7 Aug 2025 21:45 UTC · 5 points · 5 comments · 1 min read · LW link

LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance · Igor Ivanov · 8 Jul 2025 11:50 UTC · 29 points · 8 comments · 7 min read · LW link

I replicated the Anthropic alignment faking experiment on other models, and they didn’t fake alignment · Aleksandr Kedrik and Igor Ivanov · 30 May 2025 18:57 UTC · 33 points · 0 comments · 2 min read · LW link

It’s hard to make scheming evals look realistic for LLMs · Igor Ivanov and Danil Kadochnikov · 24 May 2025 19:17 UTC · 150 points · 29 comments · 5 min read · LW link

LLMs can strategically deceive while doing gain-of-function research · Igor Ivanov · 24 Jan 2024 15:45 UTC · 36 points · 4 comments · 11 min read · LW link

Psychology of AI doomers and AI optimists · Igor Ivanov · 28 Dec 2023 17:55 UTC · 4 points · 1 comment · 22 min read · LW link

5 psychological reasons for dismissing x-risks from AGI · Igor Ivanov · 26 Oct 2023 17:21 UTC · 24 points · 6 comments · 4 min read · LW link

Let’s talk about Impostor syndrome in AI safety · Igor Ivanov · 22 Sep 2023 13:51 UTC · 30 points · 4 comments · 3 min read · LW link

Impending AGI doesn’t make everything else unimportant · Igor Ivanov · 4 Sep 2023 12:34 UTC · 29 points · 12 comments · 5 min read · LW link

6 non-obvious mental health issues specific to AI safety · Igor Ivanov · 18 Aug 2023 15:46 UTC · 147 points · 24 comments · 4 min read · LW link

What is everyone doing in AI governance · Igor Ivanov · 8 Jul 2023 15:16 UTC · 12 points · 0 comments · 5 min read · LW link

A couple of questions about Conjecture’s Cognitive Emulation proposal · Igor Ivanov · 11 Apr 2023 14:05 UTC · 30 points · 1 comment · 3 min read · LW link

How do we align humans and what does it mean for the new Conjecture’s strategy · Igor Ivanov · 28 Mar 2023 17:54 UTC · 7 points · 4 comments · 7 min read · LW link

Problems of people new to AI safety and my project ideas to mitigate them · Igor Ivanov · 1 Mar 2023 9:09 UTC · 38 points · 4 comments · 7 min read · LW link

Emotional attachment to AIs opens doors to problems · Igor Ivanov · 22 Jan 2023 20:28 UTC · 20 points · 10 comments · 4 min read · LW link

AI security might be helpful for AI alignment · Igor Ivanov · 6 Jan 2023 20:16 UTC · 36 points · 1 comment · 2 min read · LW link

Fear mitigated the nuclear threat, can it do the same to AGI risks? · Igor Ivanov · 9 Dec 2022 10:04 UTC · 6 points · 8 comments · 5 min read · LW link