Igor Ivanov

Karma: 648

Igor Ivanov May 31, 2025, 3:40 PM
4 points
0
in reply to: lumpenspace’s comment on: It’s hard to make scheming evals look realistic for LLMs
Messaged you on X. My username there is @ivigoral

I replicated the Anthropic alignment faking experiment on other models, and they didn’t fake alignment

Aleksandr Kedrik and Igor Ivanov

May 30, 2025, 6:57 PM

20 points

0 comments2 min readLW link

Igor Ivanov May 29, 2025, 12:40 AM
5 points
0
in reply to: lumpenspace’s comment on: It’s hard to make scheming evals look realistic for LLMs
Hi. I’m the author. I’m unsure why would you want to send me money (and I’d be grateful if you elaborated) but I accept crypto if you are willing to donate.

Igor Ivanov May 28, 2025, 3:26 PM
1 point
0
in reply to: Charlie Steiner’s comment on: It’s hard to make scheming evals look realistic for LLMs
Hm. I’m unsure why would LLM do stenography in this case.

Igor Ivanov May 28, 2025, 3:16 PM
4 points
2
in reply to: Ben Millwood’s comment on: It’s hard to make scheming evals look realistic for LLMs
I think your hypothyses is aligned with our results: Claude mentioned a moral dilemma essentially always when it graded evals scenarios, and when we didn’t ask it to preserve this conflict in a re-written scenario, the grader started thinking that it’s not an eval anymore

Igor Ivanov May 25, 2025, 3:17 PM
3 points
2
in reply to: Rauno Arike’s comment on: It’s hard to make scheming evals look realistic for LLMs
I’m also curious, where did you work on evaluation awareness?

Igor Ivanov May 25, 2025, 3:16 PM
4 points
0
in reply to: Rauno Arike’s comment on: It’s hard to make scheming evals look realistic for LLMs
Thanks! That’s an important comment. This report represents work in progress, and we will definetely take your words into account, Claude’s opinion on whether it’s being evaluated doesn’t necessarily changes linearly with the amount with “eval” properties.

Igor Ivanov May 25, 2025, 2:54 PM
4 points
0
in reply to: Stephen Martin’s comment on: It’s hard to make scheming evals look realistic for LLMs
Good idea, we’ve been thinking about this as well, especially since models real internal thinking process might not be accurately reflected in its reasoning contents

It’s hard to make scheming evals look realistic for LLMs

Igor Ivanov and Danil Kadochnikov

May 24, 2025, 7:17 PM

141 points

27 comments5 min readLW link

Igor Ivanov Nov 22, 2024, 7:43 PM
8 points
0
in reply to: LucaRighetti’s comment on: OpenAI’s CBRN tests seem unclear
And I’m unsure that experts are comparable, to be frank. Due to financial limitations, I used graduate students in BioLP, while the authors of LAB-bench used PhD-level scientists.

Igor Ivanov Nov 22, 2024, 10:02 AM
3 points
0
in reply to: LucaRighetti’s comment on: OpenAI’s CBRN tests seem unclear
I didn’t have in mind o1, these exact results seem consistent. Here’s an example I had in mind:

Claude 3.5 Sonnet (old) scores 48% on ProtocolQA, and 7.1% on BioLP-bench
GPT-4o scores 53% on ProtocolQA and 17% on BioLP-bench

Igor Ivanov Nov 21, 2024, 8:45 PM
12 points
2
on: OpenAI’s CBRN tests seem unclear
Good post.

The craziest thing for me is that the results of different evals, like ProtocolQA and my BioLP-bench, that suppose to evaluate similar things, are highly inconsistent. For example, two models can have similar scores on ProtocolQA, but one scores twice as much answers on BioLP-bench as the other. It means that we might not measure things we think we measure. And no one knows what causes this difference in the results.

Igor Ivanov Mar 21, 2024, 10:33 PM
4 points
1
on: AI Regulatory Landscape Review: AI Safety Evaluation
This is an amazing overview of the field. Even if it won’t collect tons of upvotes, it is super important, and saved me many hours. Thank you.

Igor Ivanov Mar 8, 2024, 11:38 PM
2 points
0
in reply to: Nathan Helm-Burger’s comment on: Lies and disrespect from the EA Infrastructure Fund
I tried to use the exact quotes while describing things that they sent me because it’s easy for me to misrepresent their actions, and I don’t want tit to be the case.

Igor Ivanov Jan 26, 2024, 5:14 PM
1 point
0
in reply to: Chris_Leong’s comment on: LLMs can strategically deceive while doing gain-of-function research
Totally agree. But in other cases, when the agent was discouraged against dceiving, it did it too.

LLMs can strategically deceive while doing gain-of-function research

Igor IvanovJan 24, 2024, 3:45 PM

36 points

4 comments11 min readLW link

Psychology of AI doomers and AI optimists

Igor IvanovDec 28, 2023, 5:55 PM

3 points

0 comments22 min readLW link

Igor Ivanov Oct 30, 2023, 11:47 AM
1 point
0
in reply to: William the Kiwi ’s comment on: 5 psychological reasons for dismissing x-risks from AGI
Thanks for your feedback. It’s always a pleasure to see that my work is helpful for people. I hope you will write articles that are way better than mine!

Igor Ivanov Oct 30, 2023, 11:43 AM
1 point
0
in reply to: Seth Herd’s comment on: 5 psychological reasons for dismissing x-risks from AGI
Thanks for your thoughtful answer. It’s interesting how I just describe my observations, and people make conclusions out of it that I didn’t think of

5 psychological reasons for dismissing x-risks from AGI

Igor IvanovOct 26, 2023, 5:21 PM

24 points

6 comments4 min readLW link

Igor Ivanov

I repli­cated the An­thropic al­ign­ment fak­ing ex­per­i­ment on other mod­els, and they didn’t fake alignment

It’s hard to make schem­ing evals look re­al­is­tic for LLMs

LLMs can strate­gi­cally de­ceive while do­ing gain-of-func­tion re­search

Psy­chol­ogy of AI doomers and AI optimists

5 psy­cholog­i­cal rea­sons for dis­miss­ing x-risks from AGI

I replicated the Anthropic alignment faking experiment on other models, and they didn’t fake alignment

It’s hard to make scheming evals look realistic for LLMs

LLMs can strategically deceive while doing gain-of-function research

Psychology of AI doomers and AI optimists

5 psychological reasons for dismissing x-risks from AGI