My hour of memoryless lucidity

Eric Neyman

4 May 2024 1:40 UTC

52 points

2 comments, 5 min read, LW link

(ericneyman.wordpress.com)

Key takeaways from our EA and alignment research surveys

Cameron Berg, Judd Rosenblatt, florin_pop and AE Studio

3 May 2024 18:10 UTC

63 points

3 comments, 21 min read, LW link

“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case

habryka

3 May 2024 18:10 UTC

45 points

10 comments, 4 min read, LW link

(aisafety.dance)

[Question] Which skincare products are evidence-based?

Vanessa Kosoy

2 May 2024 15:22 UTC

83 points

24 comments, 1 min read, LW link

Mechanistic Interpretability Workshop Happening at ICML 2024!

Neel Nanda, LawrenceC and Fazl

3 May 2024 1:18 UTC

47 points

2 comments, 1 min read, LW link

Introducing AI Lab Watch

Zach Stein-Perlman

30 Apr 2024 17:00 UTC

174 points

7 comments, 1 min read, LW link

(ailabwatch.org)

[Question] Were there any ancient rationalists?

OliverHayman

3 May 2024 18:26 UTC

11 points

3 comments, 1 min read, LW link

Q&A on Proposed SB 1047

Zvi

2 May 2024 15:10 UTC

61 points

1 comment, 44 min read, LW link

(thezvi.wordpress.com)

Mechanistically Eliciting Latent Behaviors in Language Models

Andrew Mack and TurnTrout

30 Apr 2024 18:51 UTC

143 points

26 comments, 45 min read, LW link

ACX Covid Origins Post convinced readers

ErnestScribbler

1 May 2024 13:06 UTC

74 points

7 comments, 2 min read, LW link

Ironing Out the Squiggles

Zack_M_Davis

29 Apr 2024 16:13 UTC

140 points

33 comments, 11 min read, LW link

AI Clarity: An Initial Research Agenda

Justin Bullock, Corin Katzke, Zershaaneh Qureshi and David_Kristoffersson

3 May 2024 13:54 UTC

8 points

0 comments, 8 min read, LW link

Why I’m doing PauseAI

Joseph Miller

30 Apr 2024 16:21 UTC

93 points

11 comments, 4 min read, LW link

Weekly newsletter for AI safety events and training programs

Bryce Robertson

3 May 2024 0:33 UTC

21 points

0 comments, 1 min read, LW link

[Question] Shane Legg’s necessary properties for every AGI Safety plan

jacquesthibs

1 May 2024 17:15 UTC

55 points

10 comments, 1 min read, LW link

LessWrong Community Weekend 2024, open for applications

UnplannedCauliflower and jt

1 May 2024 10:18 UTC

60 points

0 comments, 7 min read, LW link

Refusal in LLMs is mediated by a single direction

Andy Arditi, Oscar Obeso, Aaquib111, wesg and Neel Nanda

27 Apr 2024 11:13 UTC

176 points

66 comments, 10 min read, LW link

Questions for labs

Zach Stein-Perlman

30 Apr 2024 22:15 UTC

64 points

9 comments, 8 min read, LW link

Manifund Q1 Retro: Learnings from impact certs

Austin Chen

1 May 2024 16:48 UTC

40 points

1 comment, 1 min read, LW link

Please stop publishing ideas/insights/research about AI

Tamsin Leake

2 May 2024 14:54 UTC

21 points

48 comments, 4 min read, LW link