Two problems with ‘Simulators’ as a frame

ryan_greenblatt

17 Feb 2023 23:34 UTC

79 points

13 comments 5 min read LW link

GPT-4 Predictions

Stephen McAleese

17 Feb 2023 23:20 UTC

112 points

27 comments 11 min read LW link

On Board Vision, Hollow Words, and the End of the World

Marcello

17 Feb 2023 23:18 UTC

52 points

27 comments 5 min read LW link

PICT: A Zero-Shot Prompt Template to Automate Evaluation

Quentin FEUILLADE--MONTIXI

17 Feb 2023 23:16 UTC

17 points

1 comment 11 min read LW link

Why Do We Believe

Screwtape

17 Feb 2023 20:58 UTC

9 points

3 comments 3 min read LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz

17 Feb 2023 20:50 UTC

63 points

29 comments 1 min read LW link

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper

17 Feb 2023 20:48 UTC

49 points

9 comments 12 min read LW link

Tinker Bell Theory and LLMs

Fergus Fettes

17 Feb 2023 20:23 UTC

1 point

11 comments 1 min read LW link

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

Vaniver

17 Feb 2023 20:11 UTC

125 points

12 comments 2 min read LW link

Microsoft and OpenAI, stop telling chatbots to roleplay as AI

hold_my_fish

17 Feb 2023 19:55 UTC

51 points

10 comments 1 min read LW link

A warm-up for the AI governance project

jacek

17 Feb 2023 18:06 UTC

10 points

2 comments 3 min read LW link

Link Post > Blog Post

party girl

17 Feb 2023 17:59 UTC

4 points

6 comments 1 min read LW link

(onthespectrumontheguestlist.substack.com)

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck

17 Feb 2023 17:26 UTC

127 points

11 comments 7 min read LW link

[Question] Should we be kind and polite to emerging AIs?

David Gross

17 Feb 2023 16:58 UTC

9 points

13 comments 1 min read LW link

Follow-up Posting on Cyborg Psychologist

Hopkins Stanley

17 Feb 2023 16:56 UTC

0 points

2 comments 1 min read LW link

(www.lesswrong.com)

A “slow takeoff” might still look fast

MichaelDickens

17 Feb 2023 16:51 UTC

5 points

3 comments 1 min read LW link

AI Safety Info Distillation Fellowship

Robert Miles and mwatkins

17 Feb 2023 16:16 UTC

47 points

3 comments 3 min read LW link

Nozick’s Dilemma: A Critique of Game Theory

Edward P. Könings

17 Feb 2023 16:11 UTC

10 points

1 comment 13 min read LW link

[Question] Are LLMs sufficient for AI takeoff?

rpglover64

17 Feb 2023 15:46 UTC

8 points

2 comments 1 min read LW link

Sydney’s Secret: A Short Story by Bing Chat

fela

17 Feb 2023 13:31 UTC

36 points

1 comment 5 min read LW link

Automating Consistency

Hoagy

17 Feb 2023 13:24 UTC

10 points

0 comments 1 min read LW link

Human decision processes are not well factored

remember and Gabriel Alfour

17 Feb 2023 13:11 UTC

33 points

3 comments 2 min read LW link

2023 ACX Predictions: Buy/Sell/Hold

Zvi

17 Feb 2023 13:10 UTC

25 points

3 comments 20 min read LW link

(thezvi.wordpress.com)

Bing chat is the AI fire alarm

Ratios

17 Feb 2023 6:51 UTC

115 points

63 comments 3 min read LW link

Seeing more whole

Joe Carlsmith

17 Feb 2023 5:12 UTC

42 points

1 comment 26 min read LW link

Powerful mesa-optimisation is already here

Roman Leventov

17 Feb 2023 4:59 UTC

35 points

1 comment 2 min read LW link

(arxiv.org)

Self-Reference Breaks the Orthogonality Thesis

lsusr

17 Feb 2023 4:11 UTC

44 points

35 comments 2 min read LW link

The public supports regulating AI for safety

Zach Stein-Perlman

17 Feb 2023 4:10 UTC

114 points

9 comments 1 min read LW link

(aiimpacts.org)

Bring “Ban faster SIMD semiconductors” into the Overton window

worried-techno-optimist

17 Feb 2023 3:27 UTC

−7 points

1 comment 2 min read LW link

Republishing an old essay in light of current news on Bing’s AI: “Regarding Blake Lemoine’s claim that LaMDA is ‘sentient’, he might be right (sorta), but perhaps not for the reasons he thinks”

philosophybear

17 Feb 2023 3:27 UTC

3 points

0 comments 5 min read LW link

(philosophybear.substack.com)

How should AI systems behave, and who should decide? [OpenAI blog]

ShardPhoenix

17 Feb 2023 1:05 UTC

22 points

2 comments 1 min read LW link

(openai.com)

The Ethics of ACI

Akira Pyinya

16 Feb 2023 23:51 UTC

−8 points

0 comments 3 min read LW link

[Question] What is a world-model?

Adam Shai

16 Feb 2023 22:39 UTC

14 points

2 comments 1 min read LW link

Probability Theory: The Logic of Science, Jaynes

David Udell

16 Feb 2023 21:57 UTC

29 points

0 comments 18 min read LW link

[Question] Is AGI communist?

MP

16 Feb 2023 21:28 UTC

−10 points

3 comments 1 min read LW link

[Question] Is “goal-content integrity” still a problem?

G

16 Feb 2023 20:46 UTC

−4 points

1 comment 1 min read LW link

(www.reddit.com)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC

16 Feb 2023 19:47 UTC

65 points

9 comments 1 min read LW link

(arxiv.org)

Non-Unitary Quantum Logic—SERI MATS Research Sprint

Yegreg

16 Feb 2023 19:31 UTC

27 points

0 comments 7 min read LW link

[Question] Looking for a post about vibing and banter

Introspective

16 Feb 2023 19:28 UTC

1 point

1 comment 1 min read LW link

EIS V: Blind Spots In AI Safety Interpretability Research

scasper

16 Feb 2023 19:09 UTC

58 points

24 comments 10 min read LW link

Why should ethical anti-realists do ethics?

Joe Carlsmith

16 Feb 2023 16:27 UTC

44 points

7 comments 27 min read LW link

[Question] How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century?

Noosphere89

16 Feb 2023 15:25 UTC

58 points

66 comments 1 min read LW link

Covid 2/16/23: It All Seems Rather Quaint

Zvi

16 Feb 2023 15:10 UTC

25 points

2 comments 5 min read LW link

(thezvi.wordpress.com)

Visualise your own probability of an AI catastrophe: an interactive Sankey plot

MNoetel

16 Feb 2023 12:03 UTC

1 point

2 comments 1 min read LW link

A poem co-written by ChatGPT

Sherrinford

16 Feb 2023 10:17 UTC

13 points

0 comments 7 min read LW link

Cambridge LW Rationality Practice: Being Specific

Tony Wang and Darmani

16 Feb 2023 6:37 UTC

2 points

0 comments 1 min read LW link

Hashing out long-standing disagreements seems low-value to me

So8res

16 Feb 2023 6:20 UTC

143 points

34 comments 4 min read LW link

(Naïve) microeconomics of bundling goods

rossry

16 Feb 2023 5:39 UTC

24 points

2 comments 5 min read LW link

Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof

Quinn

16 Feb 2023 1:13 UTC

63 points

18 comments 2 min read LW link

Progress links and tweets, 2023-02-15

jasoncrawford

16 Feb 2023 0:04 UTC

10 points

0 comments 1 min read LW link

(rootsofprogress.org)