
Igor Ivanov

Karma: 891

What is an evaluation, and why this definition matters

Igor Ivanov · 15 Dec 2025 14:53 UTC
23 points
1 comment · 7 min read · LW link

Comparative Analysis of Black Box Methods for Detecting Evaluation Awareness in LLMs

Igor Ivanov · 26 Sep 2025 21:56 UTC
16 points
0 comments · 14 min read · LW link

Igor Ivanov’s Shortform

Igor Ivanov · 7 Aug 2025 21:45 UTC
5 points
5 comments · 1 min read · LW link

LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance

Igor Ivanov · 8 Jul 2025 11:50 UTC
29 points
8 comments · 7 min read · LW link

I replicated the Anthropic alignment faking experiment on other models, and they didn’t fake alignment

30 May 2025 18:57 UTC
33 points
0 comments · 2 min read · LW link

It’s hard to make scheming evals look realistic for LLMs

24 May 2025 19:17 UTC
150 points
29 comments · 5 min read · LW link

LLMs can strategically deceive while doing gain-of-function research

Igor Ivanov · 24 Jan 2024 15:45 UTC
36 points
4 comments · 11 min read · LW link

Psychology of AI doomers and AI optimists

Igor Ivanov · 28 Dec 2023 17:55 UTC
4 points
1 comment · 22 min read · LW link

5 psychological reasons for dismissing x-risks from AGI

Igor Ivanov · 26 Oct 2023 17:21 UTC
24 points
6 comments · 4 min read · LW link

Let’s talk about Impostor syndrome in AI safety

Igor Ivanov · 22 Sep 2023 13:51 UTC
30 points
4 comments · 3 min read · LW link

Impending AGI doesn’t make everything else unimportant

Igor Ivanov · 4 Sep 2023 12:34 UTC
29 points
12 comments · 5 min read · LW link

6 non-obvious mental health issues specific to AI safety

Igor Ivanov · 18 Aug 2023 15:46 UTC
147 points
24 comments · 4 min read · LW link

What is everyone doing in AI governance

Igor Ivanov · 8 Jul 2023 15:16 UTC
12 points
0 comments · 5 min read · LW link

A couple of questions about Conjecture’s Cognitive Emulation proposal

Igor Ivanov · 11 Apr 2023 14:05 UTC
30 points
1 comment · 3 min read · LW link

How do we align humans and what does it mean for the new Conjecture’s strategy

Igor Ivanov · 28 Mar 2023 17:54 UTC
7 points
4 comments · 7 min read · LW link

Problems of people new to AI safety and my project ideas to mitigate them

Igor Ivanov · 1 Mar 2023 9:09 UTC
38 points
4 comments · 7 min read · LW link

Emotional attachment to AIs opens doors to problems

Igor Ivanov · 22 Jan 2023 20:28 UTC
20 points
10 comments · 4 min read · LW link

AI security might be helpful for AI alignment

Igor Ivanov · 6 Jan 2023 20:16 UTC
36 points
1 comment · 2 min read · LW link

Fear mitigated the nuclear threat, can it do the same to AGI risks?

Igor Ivanov · 9 Dec 2022 10:04 UTC
6 points
8 comments · 5 min read · LW link