All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 202320242025

AllJanFeb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 262728 29 30 31

Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics

sweenesm26 Jan 2024 23:40 UTC

14 points

6 comments4 min readLW link

An Invitation to Refrain from Downvoting Posts into Net-Negative Karma

MikkW26 Jan 2024 20:13 UTC

3 points

12 comments1 min readLW link

The Good Balsamic Vinegar

jenn26 Jan 2024 19:30 UTC

52 points

4 comments2 min readLW link

(jenn.site)

The Perspective-based Explanation to the Reflective Inconsistency Paradox

dadadarren26 Jan 2024 19:00 UTC

10 points

16 comments8 min readLW link

To Boldly Code

StrivingForLegibility26 Jan 2024 18:25 UTC

26 points

4 comments3 min readLW link

Incorporating Mechanism Design Into Decision Theory

StrivingForLegibility26 Jan 2024 18:25 UTC

17 points

4 comments4 min readLW link

Making every researcher seek grants is a broken model

jasoncrawford26 Jan 2024 16:06 UTC

160 points

41 comments4 min readLW link

(rootsofprogress.org)

Notes on Innocence

David Gross26 Jan 2024 14:45 UTC

13 points

21 comments18 min readLW link

Stacked Laptop Monitor

jefftk26 Jan 2024 14:10 UTC

22 points

5 comments1 min readLW link

(www.jefftk.com)

Surgery Works Well Without The FDA

Maxwell Tabarrok26 Jan 2024 13:31 UTC

41 points

28 comments4 min readLW link

(maximumprogress.substack.com)

[Question] Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects?

Roman Leventov26 Jan 2024 9:49 UTC

21 points

5 comments1 min readLW link

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

Jeremy Gillen and peterbarnett

26 Jan 2024 7:22 UTC

161 points

60 comments57 min readLW link

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley26 Jan 2024 3:58 UTC

16 points

2 comments11 min readLW link

Musings on Cargo Cult Consciousness

Gareth Davidson25 Jan 2024 23:00 UTC

−13 points

11 comments17 min readLW link

RAND report finds no effect of current LLMs on viability of bioterrorism attacks

StellaAthena25 Jan 2024 19:17 UTC

94 points

14 comments1 min readLW link

(www.rand.org)

[Question] Bayesian Reflection Principles and Ignorance of the Future

crickets25 Jan 2024 19:00 UTC

5 points

3 comments1 min readLW link

“Does your paradigm beget new, good, paradigms?”

Raemon25 Jan 2024 18:23 UTC

40 points

6 comments2 min readLW link

AI #48: The Talk of Davos

Zvi25 Jan 2024 16:20 UTC

38 points

9 comments36 min readLW link

(thezvi.wordpress.com)

Importing a Python File by Name

jefftk25 Jan 2024 16:00 UTC

12 points

7 comments1 min readLW link

(www.jefftk.com)

[Repost] The Copenhagen Interpretation of Ethics

mesaoptimizer25 Jan 2024 15:20 UTC

83 points

4 comments5 min readLW link

(web.archive.org)

Nash Bargaining between Subagents doesn’t solve the Shutdown Problem

A.H.25 Jan 2024 10:47 UTC

22 points

1 comment9 min readLW link

Status-oriented spending

Adam Zerner25 Jan 2024 6:46 UTC

14 points

19 comments4 min readLW link

Protecting agent boundaries

Chris Lakin25 Jan 2024 4:13 UTC

11 points

6 comments2 min readLW link

[Question] Is a random box of gas predictable after 20 seconds?

Thomas Kwa and habryka

24 Jan 2024 23:00 UTC

38 points

35 comments1 min readLW link

[Question] Will quantum randomness affect the 2028 election?

Thomas Kwa and habryka

24 Jan 2024 22:54 UTC

66 points

52 comments1 min readLW link

AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes

Dan H and Corin Katzke

24 Jan 2024 19:38 UTC

27 points

1 comment6 min readLW link

(newsletter.safe.ai)

Krueger Lab AI Safety Internship 2024

Joey Bream24 Jan 2024 19:17 UTC

3 points

0 comments1 min readLW link

Agents that act for reasons: a thought experiment

Michele Campolo24 Jan 2024 16:47 UTC

3 points

0 comments3 min readLW link

Impact Assessment of AI Safety Camp (Arb Research)

Samuel Holton24 Jan 2024 16:19 UTC

10 points

0 comments11 min readLW link

(forum.effectivealtruism.org)

The case for ensuring that powerful AIs are controlled

ryan_greenblatt and Buck

24 Jan 2024 16:11 UTC

268 points

74 comments28 min readLW link 1 review

LLMs can strategically deceive while doing gain-of-function research

Igor Ivanov24 Jan 2024 15:45 UTC

36 points

4 comments11 min readLW link

Monthly Roundup #14: January 2024

Zvi24 Jan 2024 12:50 UTC

38 points

22 comments44 min readLW link

(thezvi.wordpress.com)

This might be the last AI Safety Camp

Remmelt and Linda Linsefors

24 Jan 2024 9:33 UTC

197 points

34 comments1 min readLW link

Global LessWrong/AC10 Meetup on VRChat

Tomás B. and the gears to ascension

24 Jan 2024 5:44 UTC

15 points

2 comments1 min readLW link

Humans aren’t fleeb.

Charlie Steiner24 Jan 2024 5:31 UTC

37 points

5 comments2 min readLW link

A Paradigm Shift in Sustainability

Jose Miguel Cruz y Celis23 Jan 2024 23:34 UTC

5 points

0 comments18 min readLW link

From Finite Factors to Bayes Nets

J Bostock23 Jan 2024 20:03 UTC

38 points

7 comments8 min readLW link

Institutional economics through the lens of scale-free regulative development, morphogenesis, and cognitive science

Roman Leventov23 Jan 2024 19:42 UTC

8 points

0 comments14 min readLW link

Making a Secular Solstice Songbook

jefftk23 Jan 2024 19:40 UTC

38 points

6 comments1 min readLW link

(www.jefftk.com)

Simple Appreciations

Jonathan Moregård23 Jan 2024 16:23 UTC

17 points

11 comments4 min readLW link

(open.substack.com)

[Question] What environmental cues had you not seen them would have ended in disaster?

koratkar23 Jan 2024 14:59 UTC

19 points

2 comments1 min readLW link

Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)

Kaj_Sotala23 Jan 2024 14:05 UTC

46 points

2 comments2 min readLW link

(www.nature.com)

“Safety as a Scientific Pursuit” (2024)

technicalities23 Jan 2024 12:40 UTC

18 points

3 comments2 min readLW link

(banburismus.substack.com)

Brainstorming: Slow Takeoff

David Piepgrass23 Jan 2024 6:58 UTC

3 points

0 comments51 min readLW link

Reframing Acausal Trolling as Acausal Patronage

StrivingForLegibility23 Jan 2024 3:04 UTC

14 points

0 comments2 min readLW link

Orthogonality or the “Human Worth Hypothesis”?

Jeffs23 Jan 2024 0:57 UTC

21 points

31 comments3 min readLW link

the subreddit size threshold

bhauth23 Jan 2024 0:38 UTC

32 points

3 comments4 min readLW link

(www.bhauth.com)

Starting in mechanistic interpretability

Jakub Smékal22 Jan 2024 23:40 UTC

1 point

0 comments3 min readLW link

(jakubsmekal.com)

We need a Science of Evals

Marius Hobbhahn and Jérémy Scheurer

22 Jan 2024 20:30 UTC

74 points

13 comments9 min readLW link

Announcing the SoS Research Collective for independent researchers (and academics thinking independently)

rogersbacon22 Jan 2024 20:13 UTC

15 points

0 comments8 min readLW link

(www.theseedsofscience.pub)