[Question] Which rationality posts are begging for further practical development?

LoganStrohl · Jul 23, 2023, 10:22 PM
60 points
17 comments · 1 min read · LW link

Please speak unpredictably

dkl9 · Jul 23, 2023, 10:09 PM
21 points
16 comments · 1 min read · LW link
(dkl9.net)

QAPR 5: grokking is maybe not *that* big a deal?

Quintin Pope · Jul 23, 2023, 8:14 PM
114 points
15 comments · 9 min read · LW link

My favorite AI governance research this year so far

Zach Stein-Perlman · Jul 23, 2023, 4:30 PM
26 points
1 comment · 7 min read · LW link
(blog.aiimpacts.org)

“Justice, Cherryl.”

Zack_M_Davis · Jul 23, 2023, 4:16 PM
91 points
21 comments · 9 min read · LW link · 1 review

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername · Jul 23, 2023, 4:08 PM
4 points
1 comment · 3 min read · LW link

Autogynephilia discourse is so absurdly bad on all sides

tailcalled · Jul 23, 2023, 1:12 PM
44 points
24 comments · 2 min read · LW link

Examples of Prompts that Make GPT-4 Output Falsehoods

Jul 22, 2023, 8:21 PM
21 points
5 comments · 6 min read · LW link

Think like a consultant not a salesperson

Adam Zerner · Jul 22, 2023, 7:31 PM
16 points
5 comments · 2 min read · LW link

Optimization, loss set at variance in RL

Clairstan · Jul 22, 2023, 6:25 PM
1 point
1 comment · 3 min read · LW link

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad · Jul 22, 2023, 6:09 PM
80 points
2 comments · 2 min read · LW link

Apollo Neuro Follow Up

Elizabeth · Jul 22, 2023, 5:20 PM
28 points
0 comments · 1 min read · LW link
(acesounderglass.com)

Expert trap – Ways out (Part 3 of 3)

Paweł Sysiak · Jul 22, 2023, 1:06 PM
4 points
0 comments · 9 min read · LW link

GPTs’ ability to keep a secret is weirdly prompt-dependent

Jul 22, 2023, 12:21 PM
31 points
0 comments · 9 min read · LW link

Replacing the Big Air Purifier

jefftk · Jul 22, 2023, 12:10 PM
10 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] I’m consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful?

Benjamin Hendricks · Jul 21, 2023, 9:10 PM
71 points
42 comments · 2 min read · LW link

Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

VojtaKovarik · Jul 21, 2023, 9:03 PM
12 points
18 comments · 3 min read · LW link

Cooking Air Quality

jefftk · Jul 21, 2023, 7:30 PM
16 points
1 comment · 2 min read · LW link
(www.jefftk.com)

Reward Hacking from a Causal Perspective

Jul 21, 2023, 6:27 PM
29 points
6 comments · 7 min read · LW link

News : Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI

Jonathan Claybrough · Jul 21, 2023, 6:00 PM
65 points
10 comments · 2 min read · LW link
(www.whitehouse.gov)

The UAP Disclosure Act of 2023 and its implications

andeslodes · Jul 21, 2023, 5:21 PM
36 points
47 comments · 20 min read · LW link
(www.congress.gov)

To use computers well, learn their rules

dkl9 · Jul 21, 2023, 5:00 PM
4 points
6 comments · 4 min read · LW link
(dkl9.net)

BCIs and the ecosystem of modular minds

beren · Jul 21, 2023, 3:58 PM
88 points
14 comments · 11 min read · LW link

Priorities for the UK Foundation Models Taskforce

Andrea_Miotti · Jul 21, 2023, 3:23 PM
105 points
4 comments · 5 min read · LW link
(www.conjecture.dev)

Training Process Transparency through Gradient Interpretability: Early experiments on toy language models

Jul 21, 2023, 2:52 PM
56 points
1 comment · 1 min read · LW link

[Question] Can AI Alignment please create a Reddit-like platform that would make it much easier for alignment researchers to find and help each other?

Georgeo57 · Jul 21, 2023, 2:03 PM
−5 points
2 comments · 1 min read · LW link

Case for Foundation Models beyond English

Varshul Gupta · Jul 21, 2023, 1:59 PM
1 point
0 comments · 3 min read · LW link
(dubverseblack.substack.com)

Meta is hiring for LLM red teaming position

Michael Tontchev · Jul 21, 2023, 1:57 PM
7 points
0 comments · 1 min read · LW link
(us.meta.talentnet.community)

[Linkpost] Interpreting Multimodal Video Transformers Using Brain Recordings

Bogdan Ionut Cirstea · Jul 21, 2023, 11:26 AM
5 points
0 comments · 1 min read · LW link

Berlin AI Alignment Open Meetup August 2023

GuyP · Jul 21, 2023, 10:58 AM
1 point
0 comments · 1 min read · LW link

Decoding intermediate activations in llama-2-7b

Nina Panickssery · Jul 21, 2023, 5:35 AM
39 points
3 comments · 4 min read · LW link

GPT-2’s positional embedding matrix is a helix

AdamYedidia · Jul 21, 2023, 4:16 AM
45 points
21 comments · 4 min read · LW link

Problems with predictive history classes

dkl9 · Jul 20, 2023, 11:28 PM
15 points
5 comments · 1 min read · LW link

Announcement: AI Narrations Available for All New LessWrong Posts

Jul 20, 2023, 10:17 PM
71 points
28 comments · 1 min read · LW link

AI #21: The Cup Overfloweth

Zvi · Jul 20, 2023, 9:30 PM
47 points
4 comments · 64 min read · LW link
(thezvi.wordpress.com)

All AGI Safety questions welcome (especially basic ones) [July 2023]

smallsilo · Jul 20, 2023, 8:20 PM
38 points
40 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Growth of Publicly Available Genetic Sequencing Data

jefftk · Jul 20, 2023, 7:50 PM
11 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Progress links and tweets, 2023-07-20: “A goddess enthroned on a car”

jasoncrawford · Jul 20, 2023, 6:28 PM
12 points
4 comments · 2 min read · LW link
(rootsofprogress.org)

Boundary Placement Rebellion

tailcalled · Jul 20, 2023, 5:40 PM
54 points
21 comments · 12 min read · LW link

Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity

zhanpeng_zhou · Jul 20, 2023, 5:38 PM
22 points
13 comments · 3 min read · LW link
(openreview.net)

Even Superhuman Go AIs Have Surprising Failure Modes

Jul 20, 2023, 5:31 PM
130 points
22 comments · 10 min read · LW link
(far.ai)

Paper digestion: “May We Have Your Attention Please? Human-Rights NGOs and the Problem of Global Communication”

Klara Helene Nielsen · Jul 20, 2023, 5:08 PM
4 points
1 comment · 2 min read · LW link
(journals.sagepub.com)

The (short) case for predicting what Aliens value

Jim Buhler · Jul 20, 2023, 3:25 PM
14 points
5 comments · 3 min read · LW link

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Jul 20, 2023, 10:50 AM
44 points
3 comments · 2 min read · LW link
(arxiv.org)

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk · Jul 20, 2023, 9:56 AM
39 points
2 comments · 5 min read · LW link

A case for gamete personhood (reductio ad absurdum)

Ansyn1312 · Jul 20, 2023, 8:25 AM
−1 points
4 comments · 1 min read · LW link

Contra Contra the Social Model of Disability

DirectedEvolution · Jul 20, 2023, 6:59 AM
20 points
22 comments · 16 min read · LW link

[Question] Do you speed up capabilities when you do AI integrations and consume overhangs?

Michael Tontchev · Jul 20, 2023, 6:40 AM
6 points
1 comment · 1 min read · LW link

[Question] How necessary is intuition, for advanced math?

Nicholas / Heather Kross · Jul 20, 2023, 12:18 AM
11 points
8 comments · 1 min read · LW link

Project Lawful Audiobook: An Unofficial Fan Production with ElevenLabs AI

Askwho · Jul 19, 2023, 11:34 PM
22 points
3 comments · 1 min read · LW link
(askwhocastsai.substack.com)