Conditional prediction markets are evidential, not causal

philh 7 Feb 2024 21:52 UTC

57 points

10 comments 2 min read LW link

A Back-Of-The-Envelope Calculation On How Unlikely The Circumstantial Evidence Around Covid-19 Is

Roko 7 Feb 2024 21:49 UTC

−1 points

36 comments 5 min read LW link

Nitric oxide for covid and other viral infections

Elizabeth 7 Feb 2024 21:30 UTC

39 points

6 comments 6 min read LW link

(acesounderglass.com)

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan, John Hughes, Dan Valentine, Sam Bowman and Ethan Perez

7 Feb 2024 21:28 UTC

89 points

14 comments 9 min read LW link

(arxiv.org)

[Question] Choosing a book on causality

martinkunev 7 Feb 2024 21:16 UTC

4 points

3 comments 1 min read LW link

More Hyphenation

Arjun Panickssery 7 Feb 2024 19:43 UTC

106 points

22 comments 1 min read LW link 1 review

(arjunpanickssery.substack.com)

Reading writing advice doesn’t make writing easier

Henry Sleight 7 Feb 2024 19:14 UTC

17 points

0 comments 5 min read LW link

(open.substack.com)

[Question] What’s this 3rd secret directive of evolution called? (survive & spread & ___)

lemonhope 7 Feb 2024 14:11 UTC

10 points

12 comments 1 min read LW link

Training of superintelligence is secretly adversarial

quetzal_rainbow 7 Feb 2024 13:38 UTC

15 points

2 comments 5 min read LW link

The Math of Suspicious Coincidences

Roko 7 Feb 2024 13:32 UTC

25 points

3 comments 4 min read LW link

[Question] How to deal with the sense of demotivation that comes from thinking about determinism?

SpectrumDT 7 Feb 2024 10:53 UTC

13 points

71 comments 1 min read LW link

Quantum Darwinism, social constructs, and the scientific method

pchvykov 7 Feb 2024 7:04 UTC

6 points

12 comments 9 min read LW link

Why I think it’s net harmful to do technical safety research at AGI labs

Remmelt 7 Feb 2024 4:17 UTC

26 points

24 comments 1 min read LW link

story-based decision-making

bhauth 7 Feb 2024 2:35 UTC

90 points

11 comments 4 min read LW link

Full Driving Engagement Optional

jefftk 7 Feb 2024 2:30 UTC

14 points

0 comments 1 min read LW link

(www.jefftk.com)

How to train your own “Sleeper Agents”

evhub 7 Feb 2024 0:31 UTC

94 points

11 comments 2 min read LW link

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien 6 Feb 2024 19:10 UTC

75 points

12 comments 16 min read LW link

Arrogance and People Pleasing

Jonathan Moregård 6 Feb 2024 18:43 UTC

26 points

7 comments 4 min read LW link

(honestliving.substack.com)

What does davidad want from «boundaries»?

Chris Lakin and davidad

6 Feb 2024 17:45 UTC

46 points

1 comment 5 min read LW link

[Question] How can I efficiently read all the Dath Ilan worldbuilding?

mike_hawke 6 Feb 2024 16:52 UTC

10 points

1 comment 1 min read LW link

Preventing model exfiltration with upload limits

ryan_greenblatt 6 Feb 2024 16:29 UTC

83 points

24 comments 14 min read LW link 1 review

Evolution is an observation, not a process

Neil 6 Feb 2024 14:49 UTC

8 points

11 comments 5 min read LW link

[Question] Why do we need an understanding of the real world to predict the next tokens in a body of text?

Valentin Baltadzhiev 6 Feb 2024 14:43 UTC

2 points

12 comments 1 min read LW link

On the Debate Between Jezos and Leahy

Zvi 6 Feb 2024 14:40 UTC

64 points

6 comments 63 min read LW link

(thezvi.wordpress.com)

Why Two Valid Answers Approach is not Enough for Sleeping Beauty

Ape in the coat 6 Feb 2024 14:21 UTC

6 points

12 comments 6 min read LW link

Are most personality disorders really trust disorders?

chaosmage 6 Feb 2024 12:37 UTC

14 points

4 comments 1 min read LW link

From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models

Roman Leventov 6 Feb 2024 10:18 UTC

8 points

1 comment 4 min read LW link

(arxiv.org)

Fluent dreaming for language models (AI interpretability method)

tbenthompson, mikes and Zygi Straznickas

6 Feb 2024 6:02 UTC

46 points

5 comments 1 min read LW link

(arxiv.org)

Selfish AI Inevitable

Davey Morse 6 Feb 2024 4:29 UTC

1 point

0 comments 1 min read LW link

Toy models of AI control for concentrated catastrophe prevention

Fabien Roger and Buck

6 Feb 2024 1:38 UTC

52 points

2 comments 7 min read LW link

Things You’re Allowed to Do: University Edition

Saul Munn 6 Feb 2024 0:36 UTC

103 points

13 comments 5 min read LW link

(www.brasstacks.blog)

Value learning in the absence of ground truth

Joel_Saarinen 5 Feb 2024 18:56 UTC

47 points

8 comments 45 min read LW link

Implementing activation steering

Annah 5 Feb 2024 17:51 UTC

76 points

8 comments 7 min read LW link

AI alignment as a translation problem

Roman Leventov 5 Feb 2024 14:14 UTC

23 points

2 comments 3 min read LW link

Safe Stasis Fallacy

Davidmanheim 5 Feb 2024 10:54 UTC

54 points

2 comments 1 min read LW link

[Question] How has internalising a post-AGI world affected your current choices?

yanni kyriacos 5 Feb 2024 5:43 UTC

10 points

8 comments 1 min read LW link

Noticing Panic

Cole Wyeth 5 Feb 2024 3:45 UTC

60 points

8 comments 3 min read LW link

EA/ACX/LW February Santa Cruz Meetup

madmail 4 Feb 2024 23:26 UTC

1 point

0 comments 1 min read LW link

Vitalia Rationality Meetup

veronica 4 Feb 2024 19:46 UTC

1 point

0 comments 1 min read LW link

Personal predictions

Daniele De Nuntiis 4 Feb 2024 3:59 UTC

2 points

2 comments 3 min read LW link

A sketch of acausal trade in practice

Richard_Ngo 4 Feb 2024 0:32 UTC

42 points

4 comments 7 min read LW link

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko 3 Feb 2024 20:36 UTC

223 points

157 comments 9 min read LW link

My thoughts on the Beff Jezos—Connor Leahy debate

kwiat.dev 3 Feb 2024 19:47 UTC

−5 points

23 comments 4 min read LW link

Attitudes about Applied Rationality

Camille B. 3 Feb 2024 14:42 UTC

113 points

19 comments 5 min read LW link 1 review

Practicing my Handwriting in 1439

Maxwell Tabarrok 3 Feb 2024 13:21 UTC

11 points

0 comments 3 min read LW link

(www.maximum-progress.com)

Finite Factored Sets to Bayes Nets Part 2

J Bostock 3 Feb 2024 12:25 UTC

6 points

0 comments 8 min read LW link

Why I no longer identify as transhumanist

Kaj_Sotala 3 Feb 2024 12:00 UTC

57 points

33 comments 3 min read LW link

(kajsotala.fi)

Attention SAEs Scale to GPT-2 Small

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

3 Feb 2024 6:50 UTC

78 points

4 comments 8 min read LW link

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W 3 Feb 2024 4:00 UTC

16 points

0 comments 5 min read LW link

Announcing the London Initiative for Safe AI (LISA)

James Fox, mike_safeAI and Ryan Kidd

2 Feb 2024 23:17 UTC

98 points

0 comments 9 min read LW link