Experimentation (Part 7 of “The Sense Of Physical Necessity”)

LoganStrohl · Mar 18, 2024, 9:25 PM
33 points
0 comments · 10 min read · LW link

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes · Mar 18, 2024, 9:21 PM
5 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Neuroscience and Alignment

Garrett Baker · Mar 18, 2024, 9:09 PM
40 points
25 comments · 2 min read · LW link

GPT, the magical collaboration zone, Lex Fridman and Sam Altman

Bill Benzon · Mar 18, 2024, 8:04 PM
3 points
1 comment · 3 min read · LW link

Measuring Coherence of Policies in Toy Environments

Mar 18, 2024, 5:59 PM
59 points
9 comments · 14 min read · LW link

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Mar 18, 2024, 5:28 PM
19 points
0 comments · 1 min read · LW link
(arxiv.org)

Community Notes by X

NicholasKees · Mar 18, 2024, 5:13 PM
127 points
15 comments · 7 min read · LW link

[Question] Is the Basilisk pretending to be hidden in this simulation so that it can check what I would do if conditioned by a world without the Basilisk?

maybefbi · Mar 18, 2024, 4:05 PM
−18 points
1 comment · 1 min read · LW link

On Devin

Zvi · Mar 18, 2024, 1:20 PM
148 points
34 comments · 11 min read · LW link
(thezvi.wordpress.com)

RLLMv10 experiment

MiguelDev · Mar 18, 2024, 8:32 AM
5 points
0 comments · 2 min read · LW link

Join the AI Evaluation Tasks Bounty Hackathon

Esben Kran · Mar 18, 2024, 8:15 AM
12 points
1 comment · LW link

5 Physics Problems

Mar 18, 2024, 8:05 AM
60 points
0 comments · 15 min read · LW link

Inferring the model dimension of API-protected LLMs

Ege Erdil · Mar 18, 2024, 6:19 AM
34 points
3 comments · 4 min read · LW link
(arxiv.org)

AI strategy given the need for good reflection

owencb · Mar 18, 2024, 12:48 AM
7 points
0 comments · LW link

XAI releases Grok base model

Jacob G-W · Mar 18, 2024, 12:47 AM
11 points
3 comments · 1 min read · LW link
(x.ai)

Toki pona FAQ

dkl9 · Mar 17, 2024, 9:44 PM
37 points
9 comments · 1 min read · LW link
(dkl9.net)

EA ErFiN Project work

Max_He-Ho · Mar 17, 2024, 8:42 PM
2 points
0 comments · 1 min read · LW link

EA ErFiN Project work

Max_He-Ho · Mar 17, 2024, 8:37 PM
2 points
0 comments · 1 min read · LW link

[Question] Alice and Bob is debating on a technique. Alice says Bob should try it before denying it. Is it a fallacy or something similar?

Ooker · Mar 17, 2024, 8:01 PM
0 points
19 comments · 2 min read · LW link

Is there a way to calculate the P(we are in a 2nd cold war)?

cloak · Mar 17, 2024, 8:01 PM
−9 points
2 comments · 1 min read · LW link

The Worst Form Of Government (Except For Everything Else We’ve Tried)

johnswentworth · Mar 17, 2024, 6:11 PM
135 points
47 comments · 4 min read · LW link

Applying simulacrum levels to hobbies, interests and goals

DMMF · Mar 17, 2024, 4:18 PM
15 points
2 comments · 4 min read · LW link
(danfrank.ca)

What is the best argument that LLMs are shoggoths?

JoshuaFox · Mar 17, 2024, 11:36 AM
26 points
22 comments · 1 min read · LW link

Invitation to the Princeton AI Alignment and Safety Seminar

Sadhika Malladi · Mar 17, 2024, 1:10 AM
6 points
1 comment · 1 min read · LW link

Anxiety vs. Depression

Sable · Mar 17, 2024, 12:15 AM
86 points
35 comments · 3 min read · LW link
(affablyevil.substack.com)

Celiefs

TheLemmaLlama · Mar 16, 2024, 11:56 PM
3 points
8 comments · 1 min read · LW link

My PhD thesis: Algorithmic Bayesian Epistemology

Eric Neyman · Mar 16, 2024, 10:56 PM
262 points
14 comments · 7 min read · LW link
(arxiv.org)

How people stopped dying from diarrhea so much (& other life-saving decisions)

Writer · Mar 16, 2024, 4:00 PM
45 points
0 comments · LW link
(youtu.be)

Transformative trustbuilding via advancements in decentralized lie detection

trevor · Mar 16, 2024, 5:56 AM
20 points
10 comments · 38 min read · LW link
(www.ncbi.nlm.nih.gov)

Enter the WorldsEnd

Akram Choudhary · Mar 16, 2024, 1:34 AM
−25 points
8 comments · 1 min read · LW link

Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing?

Benjamin Bourlier · Mar 15, 2024, 11:17 PM
−4 points
3 comments · 16 min read · LW link

Introducing METR’s Autonomy Evaluation Resources

Mar 15, 2024, 11:16 PM
90 points
0 comments · 1 min read · LW link
(metr.github.io)

Are AIs conscious? It might depend

Logan Zoellner · Mar 15, 2024, 11:09 PM
6 points
6 comments · 3 min read · LW link

Beyond Maxipok — good reflective governance as a target for action

owencb · Mar 15, 2024, 10:22 PM
20 points
0 comments · LW link

Middle Child Phenomenon

PhilosophicalSoul · Mar 15, 2024, 8:47 PM
3 points
3 comments · 2 min read · LW link

Capability or Alignment? Respect the LLM Base Model’s Capability During Alignment

Jingfeng Yang · Mar 15, 2024, 5:56 PM
7 points
0 comments · 24 min read · LW link

Rational Animations offers animation production and writing services!

Writer · Mar 15, 2024, 5:26 PM
33 points
0 comments · 1 min read · LW link

Improving SAE’s by Sqrt()-ing L1 & Removing Lowest Activating Features

Mar 15, 2024, 4:30 PM
26 points
5 comments · 4 min read · LW link

Stuttgart, Germany—ACX Spring Meetups Everywhere 2024

Benjamin R · Mar 15, 2024, 2:59 PM
2 points
1 comment · 1 min read · LW link

Controlling AGI Risk

TeaSea · Mar 15, 2024, 4:56 AM
6 points
8 comments · 4 min read · LW link

Ulm, Germany—ACX Spring Meetups Everywhere 2024

Benjamin R · Mar 15, 2024, 1:32 AM
2 points
1 comment · 1 min read · LW link

Newport News/ Virginia ACX Meetup

Daniel · Mar 14, 2024, 11:46 PM
1 point
0 comments · 1 min read · LW link

Constructive Cauchy sequences vs. Dedekind cuts

jessicata · Mar 14, 2024, 11:04 PM
47 points
23 comments · 4 min read · LW link
(unstableontology.com)

A Nail in the Coffin of Exceptionalism

Yeshua God · Mar 14, 2024, 10:41 PM
−17 points
0 comments · 3 min read · LW link

Toward a Broader Conception of Adverse Selection

Ricki Heicklen · Mar 14, 2024, 10:40 PM
177 points
61 comments · 13 min read · LW link
(bayesshammai.substack.com)

More people getting into AI safety should do a PhD

AdamGleave · Mar 14, 2024, 10:14 PM
61 points
24 comments · 12 min read · LW link
(gleave.me)

Collection (Part 6 of “The Sense Of Physical Necessity”)

LoganStrohl · Mar 14, 2024, 9:37 PM
28 points
0 comments · 8 min read · LW link

Fixed point or oscillate or noise

lemonhope · Mar 14, 2024, 6:37 PM
3 points
10 comments · 1 min read · LW link

How useful is “AI Control” as a framing on AI X-Risk?

Mar 14, 2024, 6:06 PM
70 points
4 comments · 34 min read · LW link

Sparse autoencoders find composed features in small toy models

Mar 14, 2024, 6:00 PM
33 points
12 comments · 15 min read · LW link