Archive
- My guess at Conjecture’s vision: triggering a narrative bifurcation · Alexandre Variengien · Feb 6, 2024, 7:10 PM · 75 points · 12 comments · 16 min read · LW link
- Arrogance and People Pleasing · Jonathan Moregård · Feb 6, 2024, 6:43 PM · 26 points · 7 comments · 4 min read · LW link (honestliving.substack.com)
- What does davidad want from «boundaries»? · Chipmonk and davidad · Feb 6, 2024, 5:45 PM · 47 points · 1 comment · 5 min read · LW link
- [Question] How can I efficiently read all the Dath Ilan worldbuilding? · mike_hawke · Feb 6, 2024, 4:52 PM · 10 points · 1 comment · 1 min read · LW link
- Preventing model exfiltration with upload limits · ryan_greenblatt · Feb 6, 2024, 4:29 PM · 71 points · 22 comments · 14 min read · LW link
- Evolution is an observation, not a process · Neil · Feb 6, 2024, 2:49 PM · 8 points · 11 comments · 5 min read · LW link
- [Question] Why do we need an understanding of the real world to predict the next tokens in a body of text? · Valentin Baltadzhiev · Feb 6, 2024, 2:43 PM · 2 points · 12 comments · 1 min read · LW link
- On the Debate Between Jezos and Leahy · Zvi · Feb 6, 2024, 2:40 PM · 64 points · 6 comments · 63 min read · LW link (thezvi.wordpress.com)
- Why Two Valid Answers Approach is not Enough for Sleeping Beauty · Ape in the coat · Feb 6, 2024, 2:21 PM · 6 points · 12 comments · 6 min read · LW link
- Are most personality disorders really trust disorders? · chaosmage · Feb 6, 2024, 12:37 PM · 20 points · 4 comments · 1 min read · LW link
- From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models · Roman Leventov · Feb 6, 2024, 10:18 AM · 8 points · 1 comment · 4 min read · LW link (arxiv.org)
- Fluent dreaming for language models (AI interpretability method) · tbenthompson, mikes and Zygi Straznickas · Feb 6, 2024, 6:02 AM · 46 points · 5 comments · 1 min read · LW link (arxiv.org)
- Selfish AI Inevitable · Davey Morse · Feb 6, 2024, 4:29 AM · 1 point · 0 comments · 1 min read · LW link
- Toy models of AI control for concentrated catastrophe prevention · Fabien Roger and Buck · Feb 6, 2024, 1:38 AM · 51 points · 2 comments · 7 min read · LW link
- Things You’re Allowed to Do: University Edition · Saul Munn · Feb 6, 2024, 12:36 AM · 97 points · 13 comments · 5 min read · LW link (www.brasstacks.blog)
- Value learning in the absence of ground truth · Joel_Saarinen · Feb 5, 2024, 6:56 PM · 47 points · 8 comments · 45 min read · LW link
- Implementing activation steering · Annah · Feb 5, 2024, 5:51 PM · 75 points · 8 comments · 7 min read · LW link
- AI alignment as a translation problem · Roman Leventov · Feb 5, 2024, 2:14 PM · 22 points · 2 comments · 3 min read · LW link
- Safe Stasis Fallacy · Davidmanheim · Feb 5, 2024, 10:54 AM · 54 points · 2 comments · LW link
- [Question] How has internalising a post-AGI world affected your current choices? · yanni kyriacos · Feb 5, 2024, 5:43 AM · 10 points · 8 comments · 1 min read · LW link
- A thought experiment for comparing “biological” vs “digital” intelligence increase/explosion · Super AGI · Feb 5, 2024, 4:57 AM · 6 points · 3 comments · 1 min read · LW link
- Noticing Panic · Cole Wyeth · Feb 5, 2024, 3:45 AM · 59 points · 8 comments · 3 min read · LW link
- EA/ACX/LW February Santa Cruz Meetup · madmail · Feb 4, 2024, 11:26 PM · 1 point · 0 comments · 1 min read · LW link
- Vitalia Rationality Meetup · veronica · Feb 4, 2024, 7:46 PM · 1 point · 0 comments · 1 min read · LW link
- Personal predictions · Daniele De Nuntiis · Feb 4, 2024, 3:59 AM · 2 points · 2 comments · 3 min read · LW link
- A sketch of acausal trade in practice · Richard_Ngo · Feb 4, 2024, 12:32 AM · 36 points · 4 comments · 7 min read · LW link
- Brute Force Manufactured Consensus is Hiding the Crime of the Century · Roko · Feb 3, 2024, 8:36 PM · 209 points · 156 comments · 9 min read · LW link
- My thoughts on the Beff Jezos—Connor Leahy debate · kwiat.dev · Feb 3, 2024, 7:47 PM · −5 points · 23 comments · 4 min read · LW link
- The Journal of Dangerous Ideas · rogersbacon · Feb 3, 2024, 3:40 PM · −25 points · 4 comments · 5 min read · LW link (www.secretorum.life)
- Attitudes about Applied Rationality · Camille Berger · Feb 3, 2024, 2:42 PM · 108 points · 18 comments · 4 min read · LW link
- Practicing my Handwriting in 1439 · Maxwell Tabarrok · Feb 3, 2024, 1:21 PM · 11 points · 0 comments · 3 min read · LW link (www.maximum-progress.com)
- Finite Factored Sets to Bayes Nets Part 2 · J Bostock · Feb 3, 2024, 12:25 PM · 6 points · 0 comments · 8 min read · LW link
- Why I no longer identify as transhumanist · Kaj_Sotala · Feb 3, 2024, 12:00 PM · 55 points · 33 comments · 3 min read · LW link (kajsotala.fi)
- Attention SAEs Scale to GPT-2 Small · Connor Kissane, robertzk, Arthur Conmy and Neel Nanda · Feb 3, 2024, 6:50 AM · 78 points · 4 comments · 8 min read · LW link
- Why do we need RLHF? Imitation, Inverse RL, and the role of reward · Ran W · Feb 3, 2024, 4:00 AM · 16 points · 0 comments · 5 min read · LW link
- Announcing the London Initiative for Safe AI (LISA) · James Fox, mike_safeAI and Ryan Kidd · Feb 2, 2024, 11:17 PM · 98 points · 0 comments · 9 min read · LW link
- Survey for alignment researchers! · Cameron Berg, Judd Rosenblatt and AE Studio · Feb 2, 2024, 8:41 PM · 71 points · 11 comments · 1 min read · LW link
- Voting Results for the 2022 Review · Ben Pace · Feb 2, 2024, 8:34 PM · 57 points · 3 comments · 73 min read · LW link
- On Dwarkesh’s 3rd Podcast With Tyler Cowen · Zvi · Feb 2, 2024, 7:30 PM · 36 points · 9 comments · 21 min read · LW link (thezvi.wordpress.com)
- Most experts believe COVID-19 was probably not a lab leak · DanielFilan · Feb 2, 2024, 7:28 PM · 66 points · 89 comments · 2 min read · LW link (gcrinstitute.org)
- What Failure Looks Like is not an existential risk (and alignment is not the solution) · otto.barten · Feb 2, 2024, 6:59 PM · 13 points · 12 comments · 9 min read · LW link
- Solving alignment isn’t enough for a flourishing future · mic · Feb 2, 2024, 6:23 PM · 27 points · 0 comments · LW link (papers.ssrn.com)
- Manifold Markets · PeterMcCluskey · Feb 2, 2024, 5:48 PM · 26 points · 9 comments · 4 min read · LW link (bayesianinvestor.com)
- Types of subjective welfare · MichaelStJules · Feb 2, 2024, 9:56 AM · 10 points · 3 comments · LW link
- Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small · Joseph Bloom · Feb 2, 2024, 6:54 AM · 103 points · 37 comments · 15 min read · LW link
- Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities · porby · Feb 2, 2024, 5:49 AM · 47 points · 1 comment · 4 min read · LW link (arxiv.org)
- Running a Prediction Market Mafia Game · Arjun Panickssery · Feb 1, 2024, 11:24 PM · 22 points · 5 comments · 1 min read · LW link (arjunpanickssery.substack.com)
- Evaluating Stability of Unreflective Alignment · james.lucassen · Feb 1, 2024, 10:15 PM · 57 points · 12 comments · 18 min read · LW link (jlucassen.com)
- Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis · simeon_c · Feb 1, 2024, 9:30 PM · 69 points · 17 comments · 1 min read · LW link (www.aria.org.uk)
- Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis · RogerDearnaley · Feb 1, 2024, 9:15 PM · 16 points · 15 comments · 13 min read · LW link