Comparing representation vectors between llama 2 base and chat

Nina Panickssery · 28 Oct 2023 22:54 UTC

36 points

5 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · 28 Oct 2023 21:06 UTC

46 points

4 comments · 3 min read · LW link

Book Review: Orality and Literacy: The Technologizing of the Word

Fergus Fettes · 28 Oct 2023 20:12 UTC

13 points

0 comments · 16 min read · LW link

Regrant up to $600,000 to AI safety projects with GiveWiki

Dawn Drescher · 28 Oct 2023 19:56 UTC

33 points

1 comment · 3 min read · LW link

Shane Legg interview on alignment

Seth Herd · 28 Oct 2023 19:28 UTC

66 points

20 comments · 2 min read · LW link

(www.youtube.com)

AI Existential Safety Fellowships

mmfli · 28 Oct 2023 18:07 UTC

5 points

0 comments · 1 min read · LW link

AI Safety Hub Serbia Official Opening

DusanDNesic and Tanja T · 28 Oct 2023 17:03 UTC

55 points

0 comments · 3 min read · LW link

(forum.effectivealtruism.org)

Managing AI Risks in an Era of Rapid Progress

Algon · 28 Oct 2023 15:48 UTC

36 points

5 comments · 11 min read · LW link

(managing-ai-risks.com)

[Question] ELI5 Why isn’t alignment easier as models get stronger?

Logan Zoellner · 28 Oct 2023 14:34 UTC

3 points

9 comments · 1 min read · LW link

Truthseeking, EA, Simulacra levels, and other stuff

Elizabeth and Vaniver · 27 Oct 2023 23:56 UTC

45 points

12 comments · 9 min read · LW link

[Question] Do you believe “E=mc^2” is a correct and/or useful equation, and, whether yes or no, precisely what are your reasons for holding this belief (with such a degree of confidence)?

l8c · 27 Oct 2023 22:46 UTC

12 points

14 comments · 1 min read · LW link

Value systematization: how values become coherent (and misaligned)

Richard_Ngo · 27 Oct 2023 19:06 UTC

108 points

49 comments · 13 min read · LW link

Techno-humanism is techno-optimism for the 21st century

Richard_Ngo · 27 Oct 2023 18:37 UTC

88 points

5 comments · 14 min read · LW link

(www.mindthefuture.info)

Sanctuary for Humans

Nikola Jurkovic · 27 Oct 2023 18:08 UTC

22 points

9 comments · 1 min read · LW link

Wireheading and misalignment by composition on NetHack

pierlucadoro · 27 Oct 2023 17:43 UTC

34 points

4 comments · 4 min read · LW link

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky · 27 Oct 2023 15:19 UTC

200 points

33 comments · 8 min read · LW link

Aspiration-based Q-Learning

Clément Dumas and Jobst Heitzig · 27 Oct 2023 14:42 UTC

38 points

5 comments · 11 min read · LW link

Linkpost: Rishi Sunak’s Speech on AI (26th October)

bideup · 27 Oct 2023 11:57 UTC

85 points

8 comments · 7 min read · LW link

(www.gov.uk)

ASPR & WARP: Rationality Camps for Teens in Taiwan and Oxford

Anna Gajdova · 27 Oct 2023 8:40 UTC

18 points

0 comments · 1 min read · LW link

[Question] To what extent is the UK Government’s recent AI Safety push entirely due to Rishi Sunak?

Stephen Fowler · 27 Oct 2023 3:29 UTC

23 points

4 comments · 1 min read · LW link

Bayesian Punishment

Rob Lucas · 27 Oct 2023 3:24 UTC

1 point

1 comment · 6 min read · LW link

Online Dialogues Party — Sunday 5th November

Ben Pace · 27 Oct 2023 2:41 UTC

28 points

1 comment · 1 min read · LW link

OpenAI’s new Preparedness team is hiring

leopold · 26 Oct 2023 20:42 UTC

60 points

2 comments · 1 min read · LW link

Fake Deeply

Zack_M_Davis · 26 Oct 2023 19:55 UTC

33 points

7 comments · 1 min read · LW link

(unremediatedgender.space)

Symbol/Referent Confusions in Language Model Alignment Experiments

johnswentworth · 26 Oct 2023 19:49 UTC

120 points

51 comments · 6 min read · LW link · 1 review

Unsupervised Methods for Concept Discovery in AlphaZero

aog · 26 Oct 2023 19:05 UTC

9 points

0 comments · 1 min read · LW link

(arxiv.org)

[Question] Nonlinear limitations of ReLUs

magfrump · 26 Oct 2023 18:51 UTC

13 points

1 comment · 1 min read · LW link

AI Alignment Problem: Requirement not optional (A Critical Analysis through Mass Effect Trilogy)

TAWSIF AHMED · 26 Oct 2023 18:02 UTC

−9 points

0 comments · 4 min read · LW link

[Thought Experiment] Tomorrow’s Echo—The future of synthetic companionship.

Vimal Naran · 26 Oct 2023 17:54 UTC

−7 points

2 comments · 2 min read · LW link

Disagreements over the prioritization of existential risk from AI

Olivier Coutu · 26 Oct 2023 17:54 UTC

10 points

0 comments · 6 min read · LW link

[Question] What if AGI had its own universe to maybe wreck?

mseale · 26 Oct 2023 17:49 UTC

−1 points

2 comments · 1 min read · LW link

Changing Contra Dialects

jefftk · 26 Oct 2023 17:30 UTC

25 points

2 comments · 1 min read · LW link

(www.jefftk.com)

5 psychological reasons for dismissing x-risks from AGI

Igor Ivanov · 26 Oct 2023 17:21 UTC

24 points

6 comments · 4 min read · LW link

5. Risks from preventing legitimate value change (value collapse)

Nora_Ammann · 26 Oct 2023 14:38 UTC

13 points

1 comment · 9 min read · LW link

4. Risks from causing illegitimate value change (performative predictors)

Nora_Ammann · 26 Oct 2023 14:38 UTC

8 points

3 comments · 5 min read · LW link

3. Premise three & Conclusion: AI systems can affect value change trajectories & the Value Change Problem

Nora_Ammann · 26 Oct 2023 14:38 UTC

28 points

4 comments · 4 min read · LW link

2. Premise two: Some cases of value change are (il)legitimate

Nora_Ammann · 26 Oct 2023 14:36 UTC

24 points

7 comments · 6 min read · LW link

1. Premise one: Values are malleable

Nora_Ammann · 26 Oct 2023 14:36 UTC

21 points

1 comment · 15 min read · LW link

0. The Value Change Problem: introduction, overview and motivations

Nora_Ammann · 26 Oct 2023 14:36 UTC

32 points

0 comments · 5 min read · LW link

EPUBs of MIRI Blog Archives and selected LW Sequences

mesaoptimizer · 26 Oct 2023 14:17 UTC

44 points

5 comments · 1 min read · LW link

(git.sr.ht)

UK Government publishes “Frontier AI: capabilities and risks” Discussion Paper

A.H. · 26 Oct 2023 13:55 UTC

5 points

0 comments · 2 min read · LW link

(www.gov.uk)

AI #35: Responsible Scaling Policies

Zvi · 26 Oct 2023 13:30 UTC

66 points

10 comments · 55 min read · LW link

(thezvi.wordpress.com)

RA Bounty: Looking for feedback on screenplay about AI Risk

Writer · 26 Oct 2023 13:23 UTC

32 points

6 comments · 1 min read · LW link

Notes on “How do we become confident in the safety of a machine learning system?”

RohanS · 26 Oct 2023 3:13 UTC

4 points

0 comments · 13 min read · LW link

Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter

Nate Thomas · 26 Oct 2023 3:07 UTC

42 points

10 comments · 1 min read · LW link

CHAI internship applications are open (due Nov 13)

Erik Jenner · 26 Oct 2023 0:53 UTC

34 points

0 comments · 3 min read · LW link

Architects of Our Own Demise: We Should Stop Developing AI Carelessly

Roko · 26 Oct 2023 0:36 UTC

170 points

75 comments · 3 min read · LW link

EA Infrastructure Fund: June 2023 grant recommendations

Linch · 26 Oct 2023 0:35 UTC

21 points

0 comments · 12 min read · LW link

Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c · 25 Oct 2023 23:46 UTC

123 points

35 comments · 22 min read · LW link · 1 review

(www.navigatingrisks.ai)

AI as a science, and three obstacles to alignment strategies

So8res · 25 Oct 2023 21:00 UTC

194 points

80 comments · 11 min read · LW link