Drug development costs can range over two orders of magnitude

rossry

3 Nov 2024 23:13 UTC

38 points

0 comments · 11 min read · LW link

Redefining Tolerance: Beyond Popper’s Paradox

mindprison

3 Nov 2024 22:23 UTC

−1 points

0 comments · 3 min read · LW link

Goal: Understand Intelligence

Johannes C. Mayer

3 Nov 2024 21:20 UTC

14 points

19 comments · 1 min read · LW link

Current safety training techniques do not fully transfer to the agent setting

Simon Lermen and fidgetsinner

3 Nov 2024 19:24 UTC

162 points

9 comments · 5 min read · LW link

Why our politicians aren’t Median

Yair Halberstadt

3 Nov 2024 14:03 UTC

73 points

15 comments · 3 min read · LW link

Human Biodiversity (Part 4: Astral Codex Ten)

Evan_Gaensbauer

3 Nov 2024 4:20 UTC

−14 points

5 comments · 1 min read · LW link

(reflectivealtruism.com)

Understanding incomparability versus incommensurability in relation to RLHF

artemiocobb

2 Nov 2024 22:57 UTC

1 point

1 comment · 2 min read · LW link

electric turbofans

bhauth

2 Nov 2024 22:50 UTC

63 points

2 comments · 5 min read · LW link

(bhauth.com)

Reality as Category-Theoretic State Machines: A Mathematical Framework

Wenitte Apiou

2 Nov 2024 21:04 UTC

−8 points

0 comments · 2 min read · LW link

The Median Researcher Problem

johnswentworth

2 Nov 2024 20:16 UTC

167 points

74 comments · 1 min read · LW link · 2 reviews

Testing “True” Language Understanding in LLMs: A Simple Proposal

MtryaSam

2 Nov 2024 19:12 UTC

9 points

2 comments · 2 min read · LW link

Testing “True” Language Understanding in LLMs: A Simple Proposal

MtryaSam

2 Nov 2024 19:12 UTC

−3 points

0 comments · 2 min read · LW link

Fragile, Robust, and Antifragile Preference Satisfaction

adamShimi

2 Nov 2024 17:25 UTC

19 points

0 comments · 5 min read · LW link

(formethods.substack.com)

Higher Order Signs, Hallucination and Schizophrenia

Nicolas Villarreal

2 Nov 2024 16:33 UTC

4 points

0 comments · 13 min read · LW link

(nicolasdvillarreal.substack.com)

[Question] Is OpenAI net negative for AI Safety?

Lysandre Terrisse

2 Nov 2024 16:18 UTC

4 points

0 comments · 1 min read · LW link

Two arguments against longtermist thought experiments

momom2

2 Nov 2024 10:22 UTC

15 points

6 comments · 3 min read · LW link

Both-Sidesism—When Fair & Balanced Goes Wrong

James Stephen Brown

2 Nov 2024 3:04 UTC

3 points

15 comments · 6 min read · LW link

(nonzerosum.games)

What can we learn from insecure domains?

Logan Zoellner

1 Nov 2024 23:53 UTC

14 points

21 comments · 1 min read · LW link

Science advances one funeral at a time

Cameron Berg, Kvee, Diogo de Lucena and Trent Hodgeson

1 Nov 2024 23:06 UTC

104 points

9 comments · 2 min read · LW link

The Cartesian Crisis

mindprison

1 Nov 2024 23:02 UTC

−5 points

2 comments · 2 min read · LW link

Hypothesis on Composition Circuits in Vision Transformers

phenomanon

1 Nov 2024 22:16 UTC

2 points

0 comments · 3 min read · LW link

SAE Probing: What is it good for?

Subhash Kantamneni, Josh Engels, Senthooran Rajamanoharan and Neel Nanda

1 Nov 2024 19:23 UTC

34 points

0 comments · 11 min read · LW link

[Question] Set Theory Multiverse vs Mathematical Truth—Philosophical Discussion

Wenitte Apiou

1 Nov 2024 18:56 UTC

8 points

25 comments · 1 min read · LW link

Educational CAI: Aligning a Language Model with Pedagogical Theories

Bharath Puranam

1 Nov 2024 18:55 UTC

5 points

1 comment · 13 min read · LW link

Prediction markets and Taxes

Edmund Nelson

1 Nov 2024 17:39 UTC

11 points

8 comments · 1 min read · LW link

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets

GeneSmith

1 Nov 2024 17:26 UTC

90 points

18 comments · 5 min read · LW link

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures

Sahil

1 Nov 2024 17:24 UTC

53 points

3 comments · 35 min read · LW link

Seeking Collaborators

abramdemski

1 Nov 2024 17:13 UTC

64 points

15 comments · 7 min read · LW link

Complete Feedback

abramdemski

1 Nov 2024 16:58 UTC

27 points

8 comments · 3 min read · LW link

Levers for Biological Progress—A Response to “Machines of Loving Grace”

Niko_McCarty

1 Nov 2024 16:35 UTC

20 points

0 comments · 20 min read · LW link

(www.asimov.press)

2024 Unofficial LW Community Census, Request for Comments

Screwtape

1 Nov 2024 16:34 UTC

23 points

32 comments · 3 min read · LW link

[Question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?

corruptedCatapillar

1 Nov 2024 7:29 UTC

25 points

2 comments · 3 min read · LW link

(draft) Cyborg software should be open (?)

AtillaYasar

1 Nov 2024 7:24 UTC

4 points

5 comments · 3 min read · LW link

Another UFO Bet

codyz

1 Nov 2024 1:55 UTC

9 points

11 comments · 1 min read · LW link

Trading Candy

jefftk

1 Nov 2024 1:10 UTC

28 points

4 comments · 1 min read · LW link

(www.jefftk.com)

JargonBot Beta Test

Raemon

1 Nov 2024 1:05 UTC

84 points

55 comments · 6 min read · LW link

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning

ChengCheng, Brendan Murphy, AdamGleave and Kellin Pelrine

1 Nov 2024 0:10 UTC

18 points

0 comments · 6 min read · LW link

(far.ai)

The slingshot helps with learning

Wilson Wu

31 Oct 2024 23:18 UTC

33 points

0 comments · 8 min read · LW link

Toward Safety Case Inspired Basic Research

Lucas Teixeira, Lauren Greenspan, Dmitry Vaintrob and Eric Winsor

31 Oct 2024 23:06 UTC

57 points

3 comments · 13 min read · LW link

Spooky Recommendation System Scaling

phdead

31 Oct 2024 22:00 UTC

11 points

0 comments · 4 min read · LW link

‘Meta’, ‘mesa’, and mountains

Lorec

31 Oct 2024 17:25 UTC

1 point

0 comments · 3 min read · LW link

Toward Safety Cases For AI Scheming

Mikita Balesni and Marius Hobbhahn

31 Oct 2024 17:20 UTC

60 points

1 comment · 2 min read · LW link

AI #88: Thanks for the Memos

Zvi

31 Oct 2024 15:00 UTC

46 points

5 comments · 77 min read · LW link

(thezvi.wordpress.com)

The Compendium, A full argument about extinction risk from AGI

adamShimi, Gabriel Alfour, Connor Leahy, Chris Scammell and Andrea_Miotti

31 Oct 2024 12:01 UTC

196 points

52 comments · 2 min read · LW link

(www.thecompendium.ai)

Some Preliminary Notes on the Promise of a Wisdom Explosion

Chris_Leong

31 Oct 2024 9:21 UTC

2 points

0 comments · 1 min read · LW link

(aiimpacts.org)

What TMS is like

Sable

31 Oct 2024 0:44 UTC

228 points

26 comments · 6 min read · LW link

(affablyevil.substack.com)

AI Safety at the Frontier: Paper Highlights, October ’24

gasteigerjo

31 Oct 2024 0:09 UTC

3 points

0 comments · 9 min read · LW link

(aisafetyfrontier.substack.com)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde

30 Oct 2024 22:50 UTC

27 points

0 comments · 12 min read · LW link

Generic advice caveats

Saul Munn

30 Oct 2024 21:03 UTC

27 points

1 comment · 3 min read · LW link

(www.brasstacks.blog)

I turned decision theory problems into memes about trolleys

Tapatakt

30 Oct 2024 20:13 UTC

104 points

23 comments · 1 min read · LW link