My hour of memoryless lucidity

Eric Neyman

4 May 2024 1:40 UTC

235 points

16 comments

5 min read

(ericneyman.wordpress.com)

[Question] Does reducing the amount of RL for a given capability level make AI safer?

Chris_Leong

5 May 2024 17:04 UTC

42 points

6 comments

1 min read

introduction to cancer vaccines

bhauth

5 May 2024 1:06 UTC

54 points

2 comments

5 min read

(www.bhauth.com)

Introducing AI-Powered Audiobooks of Rational Fiction Classics

Askwho

4 May 2024 17:32 UTC

60 points

11 comments

1 min read

Key takeaways from our EA and alignment research surveys

Cameron Berg, Judd Rosenblatt, florin_pop and AE Studio

3 May 2024 18:10 UTC

79 points

5 comments

21 min read

Introducing AI Lab Watch

Zach Stein-Perlman

30 Apr 2024 17:00 UTC

188 points

15 comments

1 min read

(ailabwatch.org)

“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case

habryka

3 May 2024 18:10 UTC

76 points

10 comments

4 min read

(aisafety.dance)

Now THIS is forecasting: understanding Epoch’s Direct Approach

Elliot_Mckernon and Zershaaneh Qureshi

4 May 2024 12:06 UTC

51 points

3 comments

19 min read

S-Risks: Fates Worse Than Extinction

aggliu and Writer

4 May 2024 15:30 UTC

41 points

2 comments

6 min read

(youtu.be)

[Question] Which skincare products are evidence-based?

Vanessa Kosoy

2 May 2024 15:22 UTC

101 points

36 comments

1 min read

Mechanistically Eliciting Latent Behaviors in Language Models

Andrew Mack and TurnTrout

30 Apr 2024 18:51 UTC

150 points

32 comments

45 min read

Some Experiments I’d Like Someone To Try With An Amnestic

johnswentworth

4 May 2024 22:04 UTC

27 points

17 comments

3 min read

Ironing Out the Squiggles

Zack_M_Davis

29 Apr 2024 16:13 UTC

144 points

34 comments

11 min read

Refusal in LLMs is mediated by a single direction

Andy Arditi, Oscar Obeso, Aaquib111, wesg and Neel Nanda

27 Apr 2024 11:13 UTC

183 points

75 comments

10 min read

Q&A on Proposed SB 1047

Zvi

2 May 2024 15:10 UTC

63 points

3 comments

44 min read

(thezvi.wordpress.com)

Why I’m doing PauseAI

Joseph Miller

30 Apr 2024 16:21 UTC

99 points

14 comments

4 min read

Mechanistic Interpretability Workshop Happening at ICML 2024!

Neel Nanda, LawrenceC and Fazl

3 May 2024 1:18 UTC

47 points

4 comments

1 min read

ACX Covid Origins Post convinced readers

ErnestScribbler

1 May 2024 13:06 UTC

75 points

7 comments

2 min read

Thoughts on seed oil

dynomight

20 Apr 2024 12:29 UTC

293 points

108 comments

17 min read

(dynomight.net)

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai

16 Apr 2024 21:16 UTC

364 points

82 comments

12 min read