LessWrong Archive
Overcoming the MWC · Mark Freed · Jul 25, 2023, 5:31 PM · 3 points · 0 comments · 3 min read
Russian parliamentarian: let’s ban personal computers and the Internet · RomanS · Jul 25, 2023, 5:30 PM · 11 points · 6 comments · 2 min read
AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer · Corin Katzke and Dan H · Jul 25, 2023, 4:58 PM · 6 points · 0 comments · 6 min read · (newsletter.safe.ai)
“The Universe of Minds”—call for reviewers (Seeds of Science) · rogersbacon · Jul 25, 2023, 4:53 PM · 7 points · 0 comments · 1 min read
Thoughts on Loss Landscapes and why Deep Learning works · beren · Jul 25, 2023, 4:41 PM · 53 points · 4 comments · 18 min read
Should you work at a leading AI lab? (including in non-safety roles) · Benjamin Hilton · Jul 25, 2023, 4:29 PM · 7 points · 0 comments · 12 min read
Whisper’s Word-Level Timestamps are Out · Varshul Gupta · Jul 25, 2023, 2:32 PM · −18 points · 2 comments · 2 min read · (dubverseblack.substack.com)
AIS 101: Task decomposition for scalable oversight · Charbel-Raphaël · Jul 25, 2023, 1:34 PM · 35 points · 0 comments · 19 min read · (docs.google.com)
Anthropic Observations · Zvi · Jul 25, 2023, 12:50 PM · 104 points · 1 comment · 10 min read · (thezvi.wordpress.com)
Autonomous Alignment Oversight Framework (AAOF) · Justausername · Jul 25, 2023, 10:25 AM · −9 points · 0 comments · 4 min read
How LLMs are and are not myopic · janus · Jul 25, 2023, 2:19 AM · 135 points · 16 comments · 8 min read
Secure Hand Holding · jefftk · Jul 25, 2023, 1:40 AM · 28 points · 43 comments · 1 min read · (www.jefftk.com)
Open problems in activation engineering · TurnTrout, woog, lisathiergart, Monte M and Ulisse Mini · Jul 24, 2023, 7:46 PM · 51 points · 2 comments · 1 min read · (coda.io)
Subdivisions for Useful Distillations? · Sharat Jacob Jacob · Jul 24, 2023, 6:55 PM · 9 points · 2 comments · 2 min read
Optimizing For Approval And Disapproval · Thoth Hermes · Jul 24, 2023, 6:46 PM · −1 points · 0 comments · 12 min read · (thothhermes.substack.com)
An Opinionated Guide to Computability and Complexity (Post #0) · Noosphere89 · Jul 24, 2023, 5:53 PM · 10 points · 10 comments · 3 min read
Slowing down AI progress is an underexplored alignment strategy · Norman Borlaug · Jul 24, 2023, 4:56 PM · 42 points · 27 comments · 5 min read
Anticipation in LLMs · derek shiller · Jul 24, 2023, 3:53 PM · 6 points · 0 comments · 13 min read
The cone of freedom (or, freedom might only be instrumentally valuable) · dkl9 · Jul 24, 2023, 3:38 PM · −10 points · 6 comments · 2 min read · (dkl9.net)
A reformulation of Finite Factored Sets · Matthias G. Mayer · Jul 24, 2023, 1:02 PM · 76 points · 1 comment · 8 min read
Brain Efficiency Cannell Prize Contest Award Ceremony · Alexander Gietelink Oldenziel · Jul 24, 2023, 11:30 AM · 149 points · 12 comments · 7 min read
[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME) · otto.barten · Jul 24, 2023, 10:07 AM · 12 points · 0 comments · 7 min read · (time.com)
Cryonics and Regret · MvB · Jul 24, 2023, 9:16 AM · 192 points · 35 comments · 2 min read · 1 review
Rationality !== Winning · Raemon · Jul 24, 2023, 2:53 AM · 170 points · 51 comments · 4 min read
[Question] Which rationality posts are begging for further practical development? · LoganStrohl · Jul 23, 2023, 10:22 PM · 60 points · 17 comments · 1 min read
Please speak unpredictably · dkl9 · Jul 23, 2023, 10:09 PM · 21 points · 16 comments · 1 min read · (dkl9.net)
QAPR 5: grokking is maybe not *that* big a deal? · Quintin Pope · Jul 23, 2023, 8:14 PM · 114 points · 15 comments · 9 min read
My favorite AI governance research this year so far · Zach Stein-Perlman · Jul 23, 2023, 4:30 PM · 26 points · 1 comment · 7 min read · (blog.aiimpacts.org)
“Justice, Cherryl.” · Zack_M_Davis · Jul 23, 2023, 4:16 PM · 91 points · 21 comments · 9 min read · 1 review
Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive · Justausername · Jul 23, 2023, 4:08 PM · 4 points · 1 comment · 3 min read
Autogynephilia discourse is so absurdly bad on all sides · tailcalled · Jul 23, 2023, 1:12 PM · 44 points · 24 comments · 2 min read
Examples of Prompts that Make GPT-4 Output Falsehoods · scasper and Luke Bailey · Jul 22, 2023, 8:21 PM · 21 points · 5 comments · 6 min read
Think like a consultant not a salesperson · Adam Zerner · Jul 22, 2023, 7:31 PM · 16 points · 5 comments · 2 min read
Optimization, loss set at variance in RL · Clairstan · Jul 22, 2023, 6:25 PM · 1 point · 1 comment · 3 min read
Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs · davidad · Jul 22, 2023, 6:09 PM · 80 points · 2 comments · 2 min read
Apollo Neuro Follow Up · Elizabeth · Jul 22, 2023, 5:20 PM · 28 points · 0 comments · 1 min read · (acesounderglass.com)
Expert trap – Ways out (Part 3 of 3) · Paweł Sysiak · Jul 22, 2023, 1:06 PM · 4 points · 0 comments · 9 min read
GPTs’ ability to keep a secret is weirdly prompt-dependent · Mateusz Bagiński, Filip Sondej and Marcel Windys · Jul 22, 2023, 12:21 PM · 31 points · 0 comments · 9 min read
Replacing the Big Air Purifier · jefftk · Jul 22, 2023, 12:10 PM · 10 points · 0 comments · 1 min read · (www.jefftk.com)
[Question] I’m consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful? · Benjamin Hendricks · Jul 21, 2023, 9:10 PM · 71 points · 42 comments · 2 min read
Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics · VojtaKovarik · Jul 21, 2023, 9:03 PM · 12 points · 18 comments · 3 min read
Cooking Air Quality · jefftk · Jul 21, 2023, 7:30 PM · 16 points · 1 comment · 2 min read · (www.jefftk.com)
Reward Hacking from a Causal Perspective · tom4everitt, Francis Rhys Ward, sbenthall, James Fox, mattmacdermott and RyanCarey · Jul 21, 2023, 6:27 PM · 29 points · 6 comments · 7 min read
News: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI · Jonathan Claybrough · Jul 21, 2023, 6:00 PM · 65 points · 10 comments · 2 min read · (www.whitehouse.gov)
The UAP Disclosure Act of 2023 and its implications · andeslodes · Jul 21, 2023, 5:21 PM · 36 points · 47 comments · 20 min read · (www.congress.gov)
To use computers well, learn their rules · dkl9 · Jul 21, 2023, 5:00 PM · 4 points · 6 comments · 4 min read · (dkl9.net)
BCIs and the ecosystem of modular minds · beren · Jul 21, 2023, 3:58 PM · 88 points · 14 comments · 11 min read
Priorities for the UK Foundation Models Taskforce · Andrea_Miotti · Jul 21, 2023, 3:23 PM · 105 points · 4 comments · 5 min read · (www.conjecture.dev)
Training Process Transparency through Gradient Interpretability: Early experiments on toy language models · robertzk and evhub · Jul 21, 2023, 2:52 PM · 56 points · 1 comment · 1 min read
[Question] Can AI Alignment please create a Reddit-like platform that would make it much easier for alignment researchers to find and help each other? · Georgeo57 · Jul 21, 2023, 2:03 PM · −5 points · 2 comments · 1 min read