Igor Ivanov
Karma: 1,103

How eval awareness might emerge in training
Igor Ivanov · 26 Feb 2026 10:59 UTC · 26 points · 9 comments · 6 min read · LW link

Better evals are not enough to combat eval awareness
Igor Ivanov · 29 Jan 2026 20:42 UTC · 18 points · 14 comments · 5 min read · LW link

Mainstream approach for alignment evals is a dead end
Igor Ivanov · 6 Jan 2026 19:52 UTC · 56 points · 9 comments · 5 min read · LW link

Call for Science of Eval Awareness (+ Research Directions)
Igor Ivanov · 25 Dec 2025 17:26 UTC · 31 points · 23 comments · 5 min read · LW link

What is an evaluation, and why this definition matters
Igor Ivanov · 15 Dec 2025 14:53 UTC · 33 points · 1 comment · 7 min read · LW link

Comparative Analysis of Black Box Methods for Detecting Evaluation Awareness in LLMs
Igor Ivanov · 26 Sep 2025 21:56 UTC · 17 points · 0 comments · 14 min read · LW link

Igor Ivanov’s Shortform
Igor Ivanov · 7 Aug 2025 21:45 UTC · 5 points · 25 comments · 1 min read · LW link

LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance
Igor Ivanov · 8 Jul 2025 11:50 UTC · 29 points · 8 comments · 7 min read · LW link

I replicated the Anthropic alignment faking experiment on other models, and they didn’t fake alignment
Alex Kedryk and Igor Ivanov · 30 May 2025 18:57 UTC · 34 points · 0 comments · 2 min read · LW link

It’s hard to make scheming evals look realistic for LLMs
Igor Ivanov and Danil Kadochnikov · 24 May 2025 19:17 UTC · 152 points · 29 comments · 5 min read · LW link

LLMs can strategically deceive while doing gain-of-function research
Igor Ivanov · 24 Jan 2024 15:45 UTC · 36 points · 4 comments · 11 min read · LW link

Psychology of AI doomers and AI optimists
Igor Ivanov · 28 Dec 2023 17:55 UTC · 4 points · 1 comment · 22 min read · LW link

5 psychological reasons for dismissing x-risks from AGI
Igor Ivanov · 26 Oct 2023 17:21 UTC · 24 points · 6 comments · 4 min read · LW link

Let’s talk about Impostor syndrome in AI safety
Igor Ivanov · 22 Sep 2023 13:51 UTC · 30 points · 4 comments · 3 min read · LW link

Impending AGI doesn’t make everything else unimportant
Igor Ivanov · 4 Sep 2023 12:34 UTC · 29 points · 12 comments · 5 min read · LW link

6 non-obvious mental health issues specific to AI safety
Igor Ivanov · 18 Aug 2023 15:46 UTC · 148 points · 24 comments · 4 min read · LW link

What is everyone doing in AI governance
Igor Ivanov · 8 Jul 2023 15:16 UTC · 12 points · 0 comments · 5 min read · LW link

A couple of questions about Conjecture’s Cognitive Emulation proposal
Igor Ivanov · 11 Apr 2023 14:05 UTC · 30 points · 1 comment · 3 min read · LW link

How do we align humans and what does it mean for the new Conjecture’s strategy
Igor Ivanov · 28 Mar 2023 17:54 UTC · 7 points · 4 comments · 7 min read · LW link

Problems of people new to AI safety and my project ideas to mitigate them
Igor Ivanov · 1 Mar 2023 9:09 UTC · 38 points · 4 comments · 7 min read · LW link