Dan Braun

Karma: 561

Understanding strategic deception and deceptive alignment

Marius Hobbhahn, Mikita Balesni, Jérémy Scheurer and Dan Braun

25 Sep 2023 16:27 UTC

58 points

16 comments, 7 min read, LW link

(www.apolloresearch.ai)

Announcing Apollo Research

Marius Hobbhahn, beren, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni and Jérémy Scheurer

30 May 2023 16:17 UTC

215 points

11 comments, 8 min read, LW link

A small update to the Sparse Coding interim research report

Lee Sharkey, Dan Braun and beren

30 Apr 2023 19:54 UTC

61 points

5 comments, 1 min read, LW link

Navigating public AI x-risk hype while pursuing technical solutions

Dan Braun

19 Feb 2023 12:22 UTC

18 points

0 comments, 2 min read, LW link

[Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey, Dan Braun and beren

13 Dec 2022 15:41 UTC

137 points

22 comments, 22 min read, LW link, 2 reviews

Interpreting Neural Networks through the Polytope Lens

Sid Black, Lee Sharkey, Connor Leahy, beren, CRG, merizian, Eric Winsor and Dan Braun

23 Sep 2022 17:58 UTC

136 points

29 comments, 33 min read, LW link