
Vika

Karma: 3,041

Victoria Krakovna. Research scientist at DeepMind working on AI safety and cofounder of the Future of Life Institute. Website and blog: vkrakovna.wordpress.com

Moving on from community living

Vika · 17 Apr 2024 17:02 UTC
49 points
7 comments · 3 min read · LW link
(vkrakovna.wordpress.com)

When discussing AI risks, talk about capabilities, not intelligence

Vika · 11 Aug 2023 13:38 UTC
116 points
7 comments · 3 min read · LW link
(vkrakovna.wordpress.com)

[Linkpost] Some high-level thoughts on the DeepMind alignment team’s strategy

7 Mar 2023 11:55 UTC
128 points
13 comments · 5 min read · LW link
(drive.google.com)

Power-seeking can be probable and predictive for trained agents

28 Feb 2023 21:10 UTC
56 points
22 comments · 9 min read · LW link
(arxiv.org)

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

25 Nov 2022 14:36 UTC
39 points
9 comments · 6 min read · LW link
(vkrakovna.wordpress.com)

Threat Model Literature Review

1 Nov 2022 11:03 UTC
74 points
4 comments · 25 min read · LW link

Clarifying AI X-risk

1 Nov 2022 11:03 UTC
127 points
24 comments · 4 min read · LW link · 1 review

DeepMind alignment team opinions on AGI ruin arguments

Vika · 12 Aug 2022 21:06 UTC
376 points
37 comments · 14 min read · LW link · 1 review

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms

12 Aug 2022 15:17 UTC
85 points
4 comments · 3 min read · LW link · 1 review
(vkrakovna.wordpress.com)

Paradigms of AI alignment: components and enablers

Vika · 2 Jun 2022 6:19 UTC
53 points
4 comments · 8 min read · LW link

ELK contest submission: route understanding through the human ontology

14 Mar 2022 21:42 UTC
21 points
2 comments · 2 min read · LW link

Optimization Concepts in the Game of Life

16 Oct 2021 20:51 UTC
74 points
16 comments · 11 min read · LW link

Tradeoff between desirable properties for baseline choices in impact measures

Vika · 4 Jul 2020 11:56 UTC
37 points
24 comments · 5 min read · LW link

Possible takeaways from the coronavirus pandemic for slow AI takeoff

Vika · 31 May 2020 17:51 UTC
135 points
36 comments · 3 min read · LW link · 1 review

Specification gaming: the flip side of AI ingenuity

6 May 2020 23:51 UTC
65 points
9 comments · 6 min read · LW link

Classifying specification problems as variants of Goodhart’s Law

Vika · 19 Aug 2019 20:40 UTC
72 points
5 comments · 5 min read · LW link · 1 review

Designing agent incentives to avoid side effects

11 Mar 2019 20:55 UTC
29 points
0 comments · 2 min read · LW link
(medium.com)

New safety research agenda: scalable agent alignment via reward modeling

Vika · 20 Nov 2018 17:29 UTC
34 points
12 comments · 1 min read · LW link
(medium.com)

Discussion on the machine learning approach to AI safety

Vika · 1 Nov 2018 20:54 UTC
27 points
3 comments · 4 min read · LW link

New DeepMind AI Safety Research Blog

Vika · 27 Sep 2018 16:28 UTC
43 points
0 comments · 1 min read · LW link
(medium.com)