Extrapolating from Five Words

Gordon Seidoh Worley 15 Nov 2023 23:21 UTC

40 points

11 comments 2 min read LW link

In Defense of Parselmouths

Screwtape 15 Nov 2023 23:02 UTC

56 points

12 comments 10 min read LW link 1 review

Life on the Grid (Part 1)

rogersbacon 15 Nov 2023 22:37 UTC

12 points

4 comments 9 min read LW link

(www.secretorum.life)

Glomarization FAQ

Zane 15 Nov 2023 20:20 UTC

39 points

5 comments 5 min read LW link

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc 15 Nov 2023 19:00 UTC

72 points

2 comments 4 min read LW link

EA/ACX/LW November Santa Cruz Meetup

madmail 15 Nov 2023 18:39 UTC

1 point

0 comments 1 min read LW link

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe Carlsmith 15 Nov 2023 17:16 UTC

83 points

28 comments 30 min read LW link 1 review

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM 15 Nov 2023 16:36 UTC

90 points

9 comments 2 min read LW link 1 review

(arxiv.org)

AISN #26: National Institutions for AI Safety, Results From the UK Summit, and New Releases From OpenAI and xAI

Corin Katzke, allison huang and Dan H

15 Nov 2023 16:07 UTC

13 points

0 comments 6 min read LW link

(newsletter.safe.ai)

‘Theories of Values’ and ‘Theories of Agents’: confusions, musings and desiderata

Mateusz Bagiński and Nora_Ammann

15 Nov 2023 16:00 UTC

35 points

8 comments 24 min read LW link

Experiences and learnings from both sides of the AI safety job market

Marius Hobbhahn 15 Nov 2023 15:40 UTC

111 points

4 comments 18 min read LW link

A conceptual precursor to today’s language machines [Shannon]

Bill Benzon 15 Nov 2023 13:50 UTC

24 points

6 comments 2 min read LW link

[Question] Should Advanced Placement High School classes discuss Israel-Palestine? If so, how? If not, why? Who should make this decision?

Gesild Muka 15 Nov 2023 4:50 UTC

−1 points

5 comments 1 min read LW link

Reinforcement Via Giving People Cookies

Screwtape 15 Nov 2023 4:34 UTC

70 points

9 comments 6 min read LW link

Incidental polysemanticity

Victor Lecomte, Kushal Thaman, tmychow and Rylan Schaeffer

15 Nov 2023 4:00 UTC

43 points

7 comments 11 min read LW link

LLMs May Find It Hard to FOOM

RogerDearnaley 15 Nov 2023 2:52 UTC

13 points

30 comments 12 min read LW link

Linearity Fallacies

hippo 15 Nov 2023 2:23 UTC

15 points

0 comments 5 min read LW link

SIA Is Just Being a Bayesian About the Fact That One Exists

Bentham's Bulldog 14 Nov 2023 22:55 UTC

3 points

5 comments 4 min read LW link

AI Alignment [progress] this Week (11/12/2023)

Logan Zoellner 14 Nov 2023 22:21 UTC

6 points

0 comments 2 min read LW link

(midwitalignment.substack.com)

[Question] When did Eliezer Yudkowsky change his mind about neural networks?

[deactivated] 14 Nov 2023 21:24 UTC

32 points

15 comments 1 min read LW link

Betting on what is un-falsifiable and un-verifiable

Abhimanyu Pallavi Sudhir 14 Nov 2023 21:11 UTC

15 points

0 comments 15 min read LW link

Facebook is Paying Me to Post

jefftk 14 Nov 2023 19:10 UTC

26 points

5 comments 1 min read LW link

(www.jefftk.com)

Feelings, Nothing More than Feelings, About AI

PaulBecon 14 Nov 2023 18:50 UTC

7 points

0 comments 3 min read LW link

Kids or No kids

Kids or no kids 14 Nov 2023 18:37 UTC

100 points

10 comments 13 min read LW link

Raemon’s Deliberate (“Purposeful?”) Practice Club

Raemon, Elizabeth, lynettebye and Alex_Altair

14 Nov 2023 18:24 UTC

62 points

11 comments 22 min read LW link

More metal less ore

Logan Kieller 14 Nov 2023 16:59 UTC

10 points

3 comments 2 min read LW link

(logankieller.substack.com)

Monthly Roundup #12: November 2023

Zvi 14 Nov 2023 15:20 UTC

34 points

5 comments 33 min read LW link

(thezvi.wordpress.com)

Do you want a first-principled preparedness guide to prepare yourself and loved ones for potential catastrophes?

Ulrik Horn 14 Nov 2023 12:13 UTC

16 points

5 comments 15 min read LW link

[Question] Is there Work on Embedded Agency in Cellular Automata Toy Models?

Johannes C. Mayer 14 Nov 2023 9:08 UTC

10 points

0 comments 1 min read LW link

[Question] Would this be Progress in Solving Embedded Agency?

Johannes C. Mayer 14 Nov 2023 9:08 UTC

9 points

2 comments 2 min read LW link

Is Interpretability All We Need?

RogerDearnaley 14 Nov 2023 5:31 UTC

2 points

1 comment 1 min read LW link

What is wisdom?

TsviBT 14 Nov 2023 2:13 UTC

47 points

3 comments 13 min read LW link

Festival Stats 2023

jefftk 14 Nov 2023 1:20 UTC

9 points

0 comments 1 min read LW link

(www.jefftk.com)

Out of the Box

jesseduffield 13 Nov 2023 23:43 UTC

5 points

1 comment 7 min read LW link

Loudly Give Up, Don’t Quietly Fade

Screwtape 13 Nov 2023 23:30 UTC

194 points

13 comments 6 min read LW link 1 review

Great Empathy and Great Response Ability

positivesum 13 Nov 2023 23:04 UTC

16 points

0 comments 3 min read LW link

(tryingtruly.substack.com)

Theories of Change for AI Auditing

Lee Sharkey, beren and Marius Hobbhahn

13 Nov 2023 19:33 UTC

54 points

0 comments 18 min read LW link

(www.apolloresearch.ai)

They are made of repeating patterns

quetzal_rainbow 13 Nov 2023 18:17 UTC

62 points

4 comments 2 min read LW link

How to Upload a Mind (In Three Not-So-Easy Steps)

aggliu and Writer

13 Nov 2023 18:13 UTC

26 points

0 comments 7 min read LW link

(youtu.be)

Non-myopia stories

lberglund 13 Nov 2023 17:52 UTC

29 points

10 comments 7 min read LW link

It’s OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood

Mikhail Samin 13 Nov 2023 16:51 UTC

0 points

17 comments 7 min read LW link

Suggestions for chess puzzles

Zane 13 Nov 2023 15:39 UTC

13 points

1 comment 1 min read LW link

Why small phenomenons are relevant to morality

Ryo 13 Nov 2023 15:25 UTC

1 point

0 comments 3 min read LW link

Optionality approach to ethics

Ryo 13 Nov 2023 15:23 UTC

7 points

3 comments 3 min read LW link

Redirecting one’s own taxes as an effective altruism method

David Gross 13 Nov 2023 15:17 UTC

−25 points

35 comments 16 min read LW link

AISC Project: Benchmarks for Stable Reflectivity

jacquesthibs 13 Nov 2023 14:51 UTC

17 points

0 comments 8 min read LW link

Research Agenda: Modelling Trajectories of Language Models

NickyP 13 Nov 2023 14:33 UTC

28 points

0 comments 12 min read LW link

Bostrom Goes Unheard

Zvi 13 Nov 2023 14:11 UTC

81 points

9 comments 18 min read LW link

November hangout in Warsaw

ntoxeg 13 Nov 2023 13:20 UTC

1 point

1 comment 1 min read LW link

The Science Algorithm AISC Project

Johannes C. Mayer 13 Nov 2023 12:52 UTC

12 points

0 comments 1 min read LW link

(docs.google.com)