Vikrant Varma
Karma: 730
Research Engineer at DeepMind.
Publications

Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, lsgos, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah and Neel Nanda
25 Apr 2024 18:43 UTC, 62 points, 35 comments, 1 min read (arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lsgos, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma
19 Apr 2024 19:06 UTC, 71 points, 8 comments, 8 min read

[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lsgos, Senthooran Rajamanoharan, Tom Lieberum, János Kramár and Vikrant Varma
19 Apr 2024 19:06 UTC, 68 points, 0 comments, 3 min read

Discussion: Challenges with Unsupervised LLM Knowledge Discovery
Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik and Rohin Shah
18 Dec 2023 11:58 UTC, 147 points, 21 comments, 10 min read

Explaining grokking through circuit efficiency
Vikrant Varma and Rohin Shah
8 Sep 2023 14:39 UTC, 98 points, 10 comments, 3 min read (arxiv.org)

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques
Vika, Vikrant Varma, Ramana Kumar and Rohin Shah
25 Nov 2022 14:36 UTC, 39 points, 9 comments, 6 min read (vkrakovna.wordpress.com)

Threat Model Literature Review
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt
1 Nov 2022 11:03 UTC, 75 points, 4 comments, 25 min read

Clarifying AI X-risk
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar and Elliot Catt
1 Nov 2022 11:03 UTC, 127 points, 24 comments, 4 min read, 1 review

More examples of goal misgeneralization
Rohin Shah and Vikrant Varma
7 Oct 2022 14:38 UTC, 53 points, 8 comments, 2 min read (deepmindsafetyresearch.medium.com)

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
Vika, Vikrant Varma, Ramana Kumar and Mary Phuong
12 Aug 2022 15:17 UTC, 85 points, 4 comments, 3 min read, 1 review (vkrakovna.wordpress.com)

ELK contest submission: route understanding through the human ontology
Vika, Ramana Kumar and Vikrant Varma
14 Mar 2022 21:42 UTC, 21 points, 2 comments, 2 min read