
ojorgensen

Karma: 179

AI Safety Researcher. My website is here.

Understanding Counterbalanced Subtractions for Better Activation Additions

ojorgensen, 17 Aug 2023 13:53 UTC
21 points
0 comments, 14 min read, LW link

Because of LayerNorm, Directions in GPT-2 MLP Layers are Monosemantic

ojorgensen, 28 Jul 2023 19:43 UTC
12 points
3 comments, 13 min read, LW link

UK Foundation Model Task Force - Expression of Interest

ojorgensen, 18 Jun 2023 9:43 UTC
64 points
2 comments, 1 min read, LW link
(twitter.com)

ojorgensen’s Shortform

ojorgensen, 4 May 2023 13:51 UTC
2 points
1 comment, 1 min read, LW link

(Extremely) Naive Gradient Hacking Doesn’t Work

ojorgensen, 20 Dec 2022 14:35 UTC
14 points
0 comments, 6 min read, LW link

[Question] Which Issues in Conceptual Alignment have been Formalised or Observed (or not)?

ojorgensen, 1 Nov 2022 22:32 UTC
4 points
0 comments, 1 min read, LW link