EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem

Elizabeth 28 Sep 2023 23:30 UTC

334 points

250 comments 22 min read LW link 2 reviews

(acesounderglass.com)

Competitive, Cooperative, and Cohabitive

Screwtape 28 Sep 2023 23:25 UTC

51 points

13 comments 5 min read LW link 1 review

The Coming Wave

PeterMcCluskey 28 Sep 2023 22:59 UTC

27 points

1 comment 6 min read LW link

(bayesianinvestor.com)

High-level interpretability: detecting an AI’s objectives

Paul Colognese and Jozdien

28 Sep 2023 19:30 UTC

72 points

4 comments 21 min read LW link

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

JanB, Owain_Evans and SoerenMind

28 Sep 2023 18:53 UTC

187 points

39 comments 3 min read LW link 1 review

Responsible scaling policy TLDR

lemonhope 28 Sep 2023 18:51 UTC

9 points

0 comments 1 min read LW link

Alignment Workshop talks

Richard_Ngo 28 Sep 2023 18:26 UTC

37 points

1 comment 1 min read LW link

(www.alignment-workshop.com)

My Current Thoughts on the AI Strategic Landscape

Jeffrey Heninger 28 Sep 2023 17:59 UTC

11 points

28 comments 14 min read LW link

My Arrogant Plan for Alignment

MrArrogant 28 Sep 2023 17:51 UTC

2 points

6 comments 6 min read LW link

Discursive Competence in ChatGPT, Part 2: Memory for Texts

Bill Benzon 28 Sep 2023 16:34 UTC

1 point

0 comments 3 min read LW link

Different views of alignment have different consequences for imperfect methods

Stuart_Armstrong 28 Sep 2023 16:31 UTC

33 points

0 comments 1 min read LW link

AI #31: It Can Do What Now?

Zvi 28 Sep 2023 16:00 UTC

90 points

6 comments 40 min read LW link

(thezvi.wordpress.com)

The point of a game is not to win, and you shouldn’t even pretend that it is

mako yass 28 Sep 2023 15:54 UTC

53 points

27 comments 4 min read LW link

(makopool.com)

Cohabitive Games so Far

mako yass 28 Sep 2023 15:41 UTC

140 points

146 comments 19 min read LW link 2 reviews

(makopool.com)

Wobbly Table Theorem in Practice

Morpheus 28 Sep 2023 14:33 UTC

25 points

0 comments 2 min read LW link

Weighing Animal Worth

jefftk 28 Sep 2023 13:50 UTC

25 points

11 comments 2 min read LW link

(www.jefftk.com)

ARC Evals: Responsible Scaling Policies

Zach Stein-Perlman 28 Sep 2023 4:30 UTC

40 points

10 comments 2 min read LW link 1 review

(evals.alignment.org)

Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it)

Ruby 28 Sep 2023 2:48 UTC

66 points

73 comments 6 min read LW link

Alibaba Group releases Qwen, 14B parameter LLM

Nikola Jurkovic 28 Sep 2023 0:12 UTC

5 points

1 comment 1 min read LW link

(qianwen-res.oss-cn-beijing.aliyuncs.com)

Metaculus Launches 2023/2024 FluSight Challenge Supporting CDC, $5K in Prizes

ChristianWilliams 27 Sep 2023 21:35 UTC

5 points

0 comments 1 min read LW link

(www.metaculus.com)

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors 27 Sep 2023 21:27 UTC

22 points

12 comments 4 min read LW link

Towards Better Milestones for Monitoring AI Capabilities

snewman 27 Sep 2023 21:18 UTC

11 points

0 comments 14 min read LW link

[Question] Is Bjorn Lomborg roughly right about climate change policy?

yhoiseth 27 Sep 2023 20:06 UTC

29 points

14 comments 2 min read LW link

(www.sciencedirect.com)

Commonsense Good, Creative Good

jefftk 27 Sep 2023 19:50 UTC

70 points

11 comments 3 min read LW link

(www.jefftk.com)

Petrov Day [Spoiler Warning]

lsusr 27 Sep 2023 19:20 UTC

6 points

5 comments 1 min read LW link

The Hidden Complexity of Wishes—The Animation

Writer 27 Sep 2023 17:59 UTC

33 points

0 comments 1 min read LW link

(youtu.be)

MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures

corey morris 27 Sep 2023 17:54 UTC

18 points

3 comments 4 min read LW link

(medium.com)

[Question] What’s your standard for good work performance?

Chi Nguyen 27 Sep 2023 16:58 UTC

30 points

3 comments 1 min read LW link

The Role of Groups in the Progression of Human Understanding

Chris_Leong 27 Sep 2023 15:09 UTC

11 points

0 comments 2 min read LW link

The Great Disembedding

rogersbacon 27 Sep 2023 14:53 UTC

16 points

6 comments 16 min read LW link

(www.secretorum.life)

[Question] how do short-timeliners reason about the differences between brain and AI?

JavierCC 27 Sep 2023 8:13 UTC

2 points

11 comments 1 min read LW link

[Question] Is there a widely accepted metric for ‘genuineness’ in interpersonal communication?

M. Y. Zuo 27 Sep 2023 5:30 UTC

6 points

2 comments 1 min read LW link

Bariatric surgery seems like a no-brainer for most morbidly obese people

lc 27 Sep 2023 1:05 UTC

12 points

12 comments 3 min read LW link

Jacob on the Precipice

Richard_Ngo 26 Sep 2023 21:16 UTC

48 points

8 comments 11 min read LW link

(narrativeark.substack.com)

Text Posts from the Kids Group: 2022

jefftk 26 Sep 2023 20:40 UTC

33 points

2 comments 7 min read LW link

(www.jefftk.com)

GPT-4 for personal productivity: online distraction blocker

Sergii 26 Sep 2023 17:41 UTC

67 points

13 comments 2 min read LW link

(grgv.xyz)

ARENA 2.0 - Impact Report

CallumMcDougall 26 Sep 2023 17:13 UTC

35 points

5 comments 13 min read LW link

Mechanistic Interpretability Reading group

1stuserhere and woog

26 Sep 2023 16:26 UTC

15 points

0 comments 1 min read LW link

Announcing the CNN Interpretability Competition

scasper 26 Sep 2023 16:21 UTC

22 points

0 comments 4 min read LW link

Making AIs less likely to be spiteful

Nicolas Macé, Anthony DiGiovanni and JesseClifton

26 Sep 2023 14:12 UTC

118 points

7 comments 10 min read LW link

[Linkpost] Mark Zuckerberg confronted about Meta’s Llama 2 AI’s ability to give users detailed guidance on making anthrax—Business Insider

mic 26 Sep 2023 12:05 UTC

18 points

11 comments 2 min read LW link

(www.businessinsider.com)

Enforcing Far-Future Contracts for Governments

FCCC 26 Sep 2023 4:26 UTC

−7 points

49 comments 3 min read LW link

Carioca Petrov Day

Giskard 26 Sep 2023 0:30 UTC

1 point

0 comments 1 min read LW link

[Question] A few Alignment questions: utility optimizers, SLT, sharp left turn and identifiability

Igor Timofeev 26 Sep 2023 0:27 UTC

6 points

1 comment 2 min read LW link

Impact stories for model internals: an exercise for interpretability researchers

jenny 25 Sep 2023 23:15 UTC

29 points

3 comments 7 min read LW link

Autonomic Sanity

Sable 25 Sep 2023 22:37 UTC

20 points

9 comments 4 min read LW link

(affablyevil.substack.com)

[Question] What is wrong with this “utility switch button problem” approach?

Donald Hobson 25 Sep 2023 21:36 UTC

14 points

3 comments 1 min read LW link

You should just smile at strangers a lot

chaosmage 25 Sep 2023 20:12 UTC

18 points

10 comments 1 min read LW link

The King and the Golem

Richard_Ngo 25 Sep 2023 19:51 UTC

210 points

19 comments 5 min read LW link 1 review

(narrativeark.substack.com)

Public Opinion on AI Safety: AIMS 2023 and 2021 Summary

Jacy Reese Anthis, Janet Pauketat and Ali

25 Sep 2023 18:55 UTC

3 points

2 comments 3 min read LW link

(www.sentienceinstitute.org)