
Vlad Mikulik(Vladimir Mikulik)

Karma: 667

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

18 Dec 2023 11:58 UTC
147 points
21 comments, 10 min read

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

20 Jul 2023 10:50 UTC
43 points
3 comments, 2 min read
(arxiv.org)

Specification gaming: the flip side of AI ingenuity

6 May 2020 23:51 UTC
65 points
9 comments, 6 min read

Utility ≠ Reward

Vlad Mikulik, 5 Sep 2019 17:28 UTC
121 points
24 comments, 1 min read, 2 reviews

2-D Robustness

Vlad Mikulik, 30 Aug 2019 20:27 UTC
85 points
8 comments, 2 min read

Risks from Learned Optimization: Conclusion and Related Work

7 Jun 2019 19:53 UTC
82 points
5 comments, 6 min read

Deceptive Alignment

5 Jun 2019 20:16 UTC
117 points
20 comments, 17 min read

The Inner Alignment Problem

4 Jun 2019 1:20 UTC
103 points
17 comments, 13 min read

Conditions for Mesa-Optimization

1 Jun 2019 20:52 UTC
83 points
48 comments, 12 min read

Risks from Learned Optimization: Introduction

31 May 2019 23:44 UTC
184 points
42 comments, 12 min read, 3 reviews

Clarifying Consequentialists in the Solomonoff Prior

Vlad Mikulik, 11 Jul 2018 2:35 UTC
20 points
16 comments, 6 min read