[Question] Shane Legg’s necessary properties for every AGI Safety plan

jacquesthibs · 1 May 2024 17:15 UTC
59 points
8 comments · 1 min read · LW link

Introducing AI Lab Watch

Zach Stein-Perlman · 30 Apr 2024 17:00 UTC
156 points
7 comments · 1 min read · LW link
(ailabwatch.org)

Mechanistically Eliciting Latent Behaviors in Language Models

30 Apr 2024 18:51 UTC
128 points
19 comments · 45 min read · LW link

ACX Covid Origins Post convinced readers

ErnestScribbler · 1 May 2024 13:06 UTC
50 points
4 comments · 2 min read · LW link

LessWrong Community Weekend 2024, open for applications

1 May 2024 10:18 UTC
56 points
0 comments · 7 min read · LW link

Manifund Q1 Retro: Learnings from impact certs

Austin Chen · 1 May 2024 16:48 UTC
35 points
0 comments · 1 min read · LW link

Why I’m doing PauseAI

Joseph Miller · 30 Apr 2024 16:21 UTC
84 points
6 comments · 4 min read · LW link

Ironing Out the Squiggles

Zack_M_Davis · 29 Apr 2024 16:13 UTC
137 points
26 comments · 11 min read · LW link

Questions for labs

Zach Stein-Perlman · 30 Apr 2024 22:15 UTC
54 points
5 comments · 8 min read · LW link

[Linkpost] Silver Bulletin: For most people, politics is about fitting in

Gunnar_Zarncke · 1 May 2024 18:12 UTC
17 points
0 comments · 1 min read · LW link
(www.natesilver.net)

Take SCIFs, it’s dangerous to go alone

1 May 2024 8:02 UTC
32 points
1 comment · 3 min read · LW link

Transcoders enable fine-grained interpretable circuit analysis for language models

30 Apr 2024 17:58 UTC
46 points
10 comments · 17 min read · LW link

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
151 points
59 comments · 10 min read · LW link

AXRP Episode 30 - AI Security with Jeffrey Ladish

DanielFilan · 1 May 2024 2:50 UTC
25 points
0 comments · 79 min read · LW link

KAN: Kolmogorov-Arnold Networks

Gunnar_Zarncke · 1 May 2024 16:50 UTC
9 points
9 comments · 1 min read · LW link
(arxiv.org)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers

hugofry · 29 Apr 2024 20:57 UTC
59 points
6 comments · 11 min read · LW link

The Intentional Stance, LLMs Edition

Eleni Angelou · 30 Apr 2024 17:12 UTC
30 points
2 comments · 8 min read · LW link

On Not Pulling The Ladder Up Behind You

Screwtape · 26 Apr 2024 21:58 UTC
123 points
11 comments · 9 min read · LW link

The formal goal is a pointer

Pi Rogers · 1 May 2024 0:27 UTC
19 points
9 comments · 1 min read · LW link

Towards a formalization of the agent structure problem

Alex_Altair · 29 Apr 2024 20:28 UTC
46 points
2 comments · 14 min read · LW link