Are AIs like Animals? Perspectives and Strategies from Biology · Jackson Emanuel · May 16, 2023, 11:39 PM · 1 point · 0 comments · 21 min read
A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N) · Joseph Bloom · May 16, 2023, 10:59 PM · 36 points · 2 comments · 16 min read
A TAI which kills all humans might also doom itself · Jeffrey Heninger · May 16, 2023, 10:36 PM · 7 points · 3 comments · 3 min read
Brief notes on the Senate hearing on AI oversight · Diziet · May 16, 2023, 10:29 PM · 77 points · 2 comments · 2 min read
$500 Bounty/Prize Problem: Channel Capacity Using “Insensitive” Functions · johnswentworth · May 16, 2023, 9:31 PM · 40 points · 11 comments · 2 min read
Progress links and tweets, 2023-05-16 · jasoncrawford · May 16, 2023, 8:54 PM · 14 points · 0 comments · 1 min read · (rootsofprogress.org)
AI Will Not Want to Self-Improve · petersalib · May 16, 2023, 8:53 PM · 28 points · 24 comments · 20 min read
Nice intro video to RSI · Nathan Helm-Burger · May 16, 2023, 6:48 PM · 12 points · 0 comments · 1 min read · (youtu.be)
[Interview w/ Zvi Mowshowitz] Should we halt progress in AI? · fowlertm · May 16, 2023, 6:12 PM · 18 points · 2 comments · 3 min read
AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop · _will_ · May 16, 2023, 6:06 PM · 11 points · 4 comments · 8 min read
[Question] Why doesn’t the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a “fairly capable” agent will have at least some non-negligible fraction of overlap with human values? · Thoth Hermes · May 16, 2023, 6:02 PM · 2 points · 0 comments · 1 min read
Decision Theory with the Magic Parts Highlighted · moridinamael · May 16, 2023, 5:39 PM · 175 points · 24 comments · 5 min read
We learn long-lasting strategies to protect ourselves from danger and rejection · Richard_Ngo · May 16, 2023, 4:36 PM · 86 points · 5 comments · 5 min read
Proposal: Align Systems Earlier In Training · OneManyNone · May 16, 2023, 4:24 PM · 18 points · 0 comments · 11 min read
Procedural Executive Function, Part 2 · DaystarEld · May 16, 2023, 4:22 PM · 24 points · 0 comments · 18 min read · (daystareld.com)
My current workflow to study the internal mechanisms of LLM · Yulu Pi · May 16, 2023, 3:27 PM · 4 points · 0 comments · 1 min read
Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk* · Christopher King · May 16, 2023, 3:18 PM · 22 points · 6 comments · 2 min read
AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control · Dan H and Orpheus16 · May 16, 2023, 3:14 PM · 31 points · 0 comments · 6 min read · (newsletter.safe.ai)
Lazy Baked Mac and Cheese · jefftk · May 16, 2023, 2:40 PM · 18 points · 2 comments · 1 min read · (www.jefftk.com)
Tyler Cowen’s challenge to develop an ‘actual mathematical model’ for AI X-Risk · Joe Brenton · May 16, 2023, 11:57 AM · 6 points · 4 comments · 1 min read
Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios · Simon Lermen, Teun van der Weij and Leon Lang · May 16, 2023, 10:53 AM · 26 points · 0 comments · 13 min read
[Review] Two People Smoking Behind the Supermarket · lsusr · May 16, 2023, 7:25 AM · 32 points · 1 comment · 1 min read
Superposition and Dropout · Edoardo Pona · May 16, 2023, 7:24 AM · 21 points · 5 comments · 6 min read
[Question] What is the literature on long term water fasts? · lc · May 16, 2023, 3:23 AM · 16 points · 4 comments · 1 min read
Lessons learned from offering in-office nutritional testing · Elizabeth · May 15, 2023, 11:20 PM · 80 points · 11 comments · 14 min read · (acesounderglass.com)
Judgments often smuggle in implicit standards · Richard_Ngo · May 15, 2023, 6:50 PM · 95 points · 4 comments · 3 min read
Rational retirement plans · Ik · May 15, 2023, 5:49 PM · 5 points · 17 comments · 1 min read
[Question] (Crosspost) Asking for online calls on AI s-risks discussions · jackchang110 · May 15, 2023, 5:42 PM · 1 point · 0 comments · 1 min read · (forum.effectivealtruism.org)
Simple experiments with deceptive alignment · Andreas_Moe · May 15, 2023, 5:41 PM · 7 points · 0 comments · 4 min read
Some Summaries of Agent Foundations Work · mattmacdermott · May 15, 2023, 4:09 PM · 62 points · 1 comment · 13 min read
Facebook Increased Visibility · jefftk · May 15, 2023, 3:40 PM · 15 points · 1 comment · 1 min read · (www.jefftk.com)
Un-unpluggability—can’t we just unplug it? · Oliver Sourbut · May 15, 2023, 1:23 PM · 26 points · 10 comments · 12 min read · (www.oliversourbut.net)
[Question] Can we learn much by studying the behaviour of RL policies? · AidanGoth · May 15, 2023, 12:56 PM · 1 point · 0 comments · 1 min read
How I apply (so-called) Non-Violent Communication · Kaj_Sotala · May 15, 2023, 9:56 AM · 86 points · 28 comments · 3 min read
Let’s build a fire alarm for AGI · chaosmage · May 15, 2023, 9:16 AM · −1 points · 0 comments · 2 min read
From fear to excitement · Richard_Ngo · May 15, 2023, 6:23 AM · 132 points · 9 comments · 3 min read
Reward is the optimization target (of capabilities researchers) · Max H · May 15, 2023, 3:22 AM · 32 points · 4 comments · 5 min read
The Lightcone Theorem: A Better Foundation For Natural Abstraction? · johnswentworth · May 15, 2023, 2:24 AM · 69 points · 25 comments · 6 min read
GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion · Zach Stein-Perlman · May 15, 2023, 1:42 AM · 28 points · 11 comments · 1 min read · (arxiv.org)
[Question] Why don’t quantilizers also cut off the upper end of the distribution? · Alex_Altair · May 15, 2023, 1:40 AM · 25 points · 2 comments · 1 min read
Support Structures for Naturalist Study · LoganStrohl · May 15, 2023, 12:25 AM · 47 points · 6 comments · 10 min read
Catastrophic Regressional Goodhart: Appendix · Thomas Kwa and Drake Thomas · May 15, 2023, 12:10 AM · 25 points · 1 comment · 9 min read
Helping your Senator Prepare for the Upcoming Sam Altman Hearing · Tiago de Vassal · May 14, 2023, 10:45 PM · 69 points · 2 comments · 1 min read · (aisafetytour.com)
Difficulties in making powerful aligned AI · DanielFilan · May 14, 2023, 8:50 PM · 41 points · 1 comment · 10 min read · (danielfilan.com)
How much do markets value Open AI? · Xodarap · May 14, 2023, 7:28 PM · 21 points · 5 comments
Misaligned AGI Death Match · Nate Reinar Windwood · May 14, 2023, 6:00 PM · 1 point · 0 comments · 1 min read
[Question] What new technology, for what institutions? · bhauth · May 14, 2023, 5:33 PM UTC · 29 points · 6 comments · 3 min read
A strong mind continues its trajectory of creativity · TsviBT · May 14, 2023, 5:24 PM UTC · 22 points · 8 comments · 6 min read
Ontologies Should Be Backwards-Compatible · Thoth Hermes · May 14, 2023, 5:21 PM UTC · 3 points · 3 comments · 4 min read · (thothhermes.substack.com)
Jaan Tallinn’s 2022 Philanthropy Overview · jaan · May 14, 2023, 3:35 PM UTC · 64 points · 2 comments · 1 min read · (jaan.online)