Igor Ivanov (Karma: 891)
What is an evaluation, and why this definition matters · Igor Ivanov · 15 Dec 2025 14:53 UTC · 23 points · 1 comment · 7 min read · LW link

Comparative Analysis of Black Box Methods for Detecting Evaluation Awareness in LLMs · Igor Ivanov · 26 Sep 2025 21:56 UTC · 16 points · 0 comments · 14 min read · LW link

Igor Ivanov’s Shortform · Igor Ivanov · 7 Aug 2025 21:45 UTC · 5 points · 5 comments · 1 min read · LW link

LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance · Igor Ivanov · 8 Jul 2025 11:50 UTC · 29 points · 8 comments · 7 min read · LW link

I replicated the Anthropic alignment faking experiment on other models, and they didn’t fake alignment · Aleksandr Kedrik and Igor Ivanov · 30 May 2025 18:57 UTC · 33 points · 0 comments · 2 min read · LW link

It’s hard to make scheming evals look realistic for LLMs · Igor Ivanov and Danil Kadochnikov · 24 May 2025 19:17 UTC · 150 points · 29 comments · 5 min read · LW link

LLMs can strategically deceive while doing gain-of-function research · Igor Ivanov · 24 Jan 2024 15:45 UTC · 36 points · 4 comments · 11 min read · LW link

Psychology of AI doomers and AI optimists · Igor Ivanov · 28 Dec 2023 17:55 UTC · 4 points · 1 comment · 22 min read · LW link

5 psychological reasons for dismissing x-risks from AGI · Igor Ivanov · 26 Oct 2023 17:21 UTC · 24 points · 6 comments · 4 min read · LW link

Let’s talk about Impostor syndrome in AI safety · Igor Ivanov · 22 Sep 2023 13:51 UTC · 30 points · 4 comments · 3 min read · LW link

Impending AGI doesn’t make everything else unimportant · Igor Ivanov · 4 Sep 2023 12:34 UTC · 29 points · 12 comments · 5 min read · LW link

6 non-obvious mental health issues specific to AI safety · Igor Ivanov · 18 Aug 2023 15:46 UTC · 147 points · 24 comments · 4 min read · LW link

What is everyone doing in AI governance · Igor Ivanov · 8 Jul 2023 15:16 UTC · 12 points · 0 comments · 5 min read · LW link

A couple of questions about Conjecture’s Cognitive Emulation proposal · Igor Ivanov · 11 Apr 2023 14:05 UTC · 30 points · 1 comment · 3 min read · LW link

How do we align humans and what does it mean for the new Conjecture’s strategy · Igor Ivanov · 28 Mar 2023 17:54 UTC · 7 points · 4 comments · 7 min read · LW link

Problems of people new to AI safety and my project ideas to mitigate them · Igor Ivanov · 1 Mar 2023 9:09 UTC · 38 points · 4 comments · 7 min read · LW link

Emotional attachment to AIs opens doors to problems · Igor Ivanov · 22 Jan 2023 20:28 UTC · 20 points · 10 comments · 4 min read · LW link

AI security might be helpful for AI alignment · Igor Ivanov · 6 Jan 2023 20:16 UTC · 36 points · 1 comment · 2 min read · LW link

Fear mitigated the nuclear threat, can it do the same to AGI risks? · Igor Ivanov · 9 Dec 2022 10:04 UTC · 6 points · 8 comments · 5 min read · LW link