18 Feb 2023 22:17 UTC

229 points

135 comments

6 min read

LW link

(andreamiotti.substack.com)

We should be signal-boosting anti Bing chat content

Matt Brooks

18 Feb 2023 18:52 UTC

−4 points

13 comments

2 min read

LW link

Can talk, can think, can suffer.

Ilio

18 Feb 2023 18:43 UTC

1 point

8 comments

3 min read

LW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout

18 Feb 2023 18:41 UTC

172 points

10 comments

2 min read

LW link

(arxiv.org)

Near-Term Risks of an Obedient Artificial Intelligence

ymeskhout

18 Feb 2023 18:30 UTC

20 points

1 comment

6 min read

LW link

EIS VII: A Challenge for Mechanists

scasper

18 Feb 2023 18:27 UTC

36 points

4 comments

3 min read

LW link

Reading Speed Exists!

Johannes C. Mayer

18 Feb 2023 15:30 UTC

12 points

9 comments

1 min read

LW link

The Practitioner’s Path 2.0: the Meditative Archetype

Evenflair

18 Feb 2023 15:23 UTC

14 points

1 comment

2 min read

LW link

(guildoftherose.org)

Should we cry “wolf”?

Tapatakt

18 Feb 2023 11:24 UTC

24 points

5 comments

1 min read

LW link

[Question] Name of the fallacy of assuming an extreme value (e.g. 0) with the illusion of ‘avoiding to have to make an assumption’?

FlorianH

18 Feb 2023 8:11 UTC

4 points

1 comment

1 min read

LW link

I Think We’re Approaching The Bitter Lesson’s Asymptote

SomeoneYouOnceKnew

18 Feb 2023 5:33 UTC

−3 points

9 comments

5 min read

LW link

Bus-Only Bus Lane Enforcement

jefftk

18 Feb 2023 2:50 UTC

19 points

15 comments

1 min read

LW link

(www.jefftk.com)

Run Head on Towards the Falling Tears

Johannes C. Mayer

18 Feb 2023 1:33 UTC

6 points

0 comments

2 min read

LW link

Two problems with ‘Simulators’ as a frame

ryan_greenblatt

17 Feb 2023 23:34 UTC

79 points

13 comments

5 min read

LW link

GPT-4 Predictions

Stephen McAleese

17 Feb 2023 23:20 UTC

112 points

27 comments

11 min read

LW link

On Board Vision, Hollow Words, and the End of the World

Marcello

17 Feb 2023 23:18 UTC

52 points

27 comments

5 min read

LW link

PICT: A Zero-Shot Prompt Template to Automate Evaluation

Quentin FEUILLADE--MONTIXI

17 Feb 2023 23:16 UTC

17 points

1 comment

11 min read

LW link

Why Do We Believe

Screwtape

17 Feb 2023 20:58 UTC

9 points

3 comments

3 min read

LW link

I Am Scared of Posting Negative Takes About Bing’s AI

Yitz

17 Feb 2023 20:50 UTC

63 points

29 comments

1 min read

LW link

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper

17 Feb 2023 20:48 UTC

49 points

9 comments

12 min read

LW link

Tinker Bell Theory and LLMs

Fergus Fettes

17 Feb 2023 20:23 UTC

1 point

11 comments

1 min read

LW link

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

Vaniver

17 Feb 2023 20:11 UTC

125 points

12 comments

2 min read

LW link

Microsoft and OpenAI, stop telling chatbots to roleplay as AI

hold_my_fish

17 Feb 2023 19:55 UTC

51 points

10 comments

1 min read

LW link

A warm-up for the AI governance project

jacek

17 Feb 2023 18:06 UTC

10 points

2 comments

3 min read

LW link

Link Post > Blog Post

party girl

17 Feb 2023 17:59 UTC

4 points

6 comments

1 min read

LW link

(onthespectrumontheguestlist.substack.com)

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck

17 Feb 2023 17:26 UTC

127 points

11 comments

7 min read

LW link

[Question] Should we be kind and polite to emerging AIs?

David Gross

17 Feb 2023 16:58 UTC

9 points

13 comments

1 min read

LW link

Follow-up Posting on Cyborg Psychologist

Hopkins Stanley

17 Feb 2023 16:56 UTC

0 points

2 comments

1 min read

LW link

(www.lesswrong.com)

A “slow takeoff” might still look fast

MichaelDickens

17 Feb 2023 16:51 UTC

5 points

3 comments

1 min read

LW link

AI Safety Info Distillation Fellowship

Robert Miles and mwatkins

17 Feb 2023 16:16 UTC

47 points

3 comments

3 min read

LW link

Nozick’s Dilemma: A Critique of Game Theory

Edward P. Könings

17 Feb 2023 16:11 UTC

10 points

1 comment

13 min read

LW link

[Question] Are LLMs sufficient for AI takeoff?

rpglover64

17 Feb 2023 15:46 UTC

8 points

2 comments

1 min read

LW link

Sydney’s Secret: A Short Story by Bing Chat

fela

17 Feb 2023 13:31 UTC

36 points

1 comment

5 min read

LW link

Automating Consistency

Hoagy

17 Feb 2023 13:24 UTC

10 points

0 comments

1 min read

LW link

Human decision processes are not well factored

remember and Gabriel Alfour

17 Feb 2023 13:11 UTC

33 points

3 comments

2 min read

LW link

2023 ACX Predictions: Buy/Sell/Hold

Zvi

17 Feb 2023 13:10 UTC

25 points

3 comments

20 min read

LW link

(thezvi.wordpress.com)

Bing chat is the AI fire alarm

Ratios

17 Feb 2023 6:51 UTC

115 points

63 comments

3 min read

LW link

Seeing more whole

Joe Carlsmith

17 Feb 2023 5:12 UTC

42 points

1 comment

26 min read

LW link

Powerful mesa-optimisation is already here

Roman Leventov

17 Feb 2023 4:59 UTC

35 points

1 comment

2 min read

LW link

(arxiv.org)

Self-Reference Breaks the Orthogonality Thesis

lsusr

17 Feb 2023 4:11 UTC

44 points

35 comments

2 min read

LW link

The public supports regulating AI for safety

Zach Stein-Perlman

17 Feb 2023 4:10 UTC

114 points

9 comments

1 min read

LW link

(aiimpacts.org)

Bring “Ban faster SIMD semiconductors” into the Overton window

worried-techno-optimist

17 Feb 2023 3:27 UTC

−7 points

1 comment

2 min read

LW link

Republishing an old essay in light of current news on Bing’s AI: “Regarding Blake Lemoine’s claim that LaMDA is ‘sentient’, he might be right (sorta), but perhaps not for the reasons he thinks”

philosophybear

17 Feb 2023 3:27 UTC

3 points

0 comments

5 min read

LW link

(philosophybear.substack.com)

How should AI systems behave, and who should decide? [OpenAI blog]

ShardPhoenix

17 Feb 2023 1:05 UTC

22 points

2 comments

1 min read

LW link

(openai.com)

The Ethics of ACI

Akira Pyinya

16 Feb 2023 23:51 UTC

−8 points

0 comments

3 min read

LW link

[Question] What is a world-model?

Adam Shai

16 Feb 2023 22:39 UTC

14 points

2 comments

1 min read

LW link

Probability Theory: The Logic of Science, Jaynes

David Udell

16 Feb 2023 21:57 UTC

29 points

0 comments

18 min read

LW link

[Question] Is AGI communist?

MP

16 Feb 2023 21:28 UTC

−10 points

3 comments

1 min read

LW link

[Question] Is “goal-content integrity” still a problem?

G

16 Feb 2023 20:46 UTC

−4 points

1 comment

1 min read

LW link

(www.reddit.com)

Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)

LawrenceC

16 Feb 2023 19:47 UTC

65 points

9 comments

1 min read

LW link

(arxiv.org)