1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley Nov 17, 2023, 8:55 PM
17 points
8 comments, 15 min read, LW link

Sam Altman fired from OpenAI

LawrenceC Nov 17, 2023, 8:42 PM
192 points
75 comments, 1 min read, LW link
(openai.com)

On the lethality of biased human reward ratings

Nov 17, 2023, 6:59 PM
48 points
10 comments, 37 min read, LW link

Coup probes: Catching catastrophes with probes trained off-policy

Fabien Roger Nov 17, 2023, 5:58 PM
93 points
9 comments, 11 min read, LW link, 1 review

On Lies and Liars

Gabriel Alfour Nov 17, 2023, 5:13 PM
31 points
4 comments, 14 min read, LW link
(cognition.cafe)

Classifying representations of sparse autoencoders (SAEs)

Annah Nov 17, 2023, 1:54 PM
15 points
6 comments, 2 min read, LW link

R&D is a Huge Externality, So Why Do Markets Do So Much of it?

Maxwell Tabarrok Nov 17, 2023, 1:14 PM
15 points
14 comments, 3 min read, LW link
(maximumprogress.substack.com)

On excluding dangerous information from training

ShayBenMoshe Nov 17, 2023, 11:14 AM
23 points
5 comments, 3 min read, LW link

The dangers of reproducing while old

garymm Nov 17, 2023, 5:55 AM
23 points
6 comments, 1 min read, LW link
(www.garymm.org)

I put odds on ends with Nathan Young

KatjaGrace Nov 17, 2023, 5:40 AM
8 points
0 comments, 1 min read, LW link
(worldspiritsockpuppet.com)

Debate helps supervise human experts [Paper]

habryka Nov 17, 2023, 5:25 AM
29 points
6 comments, 1 min read, LW link
(github.com)

A to Z of things

KatjaGrace Nov 17, 2023, 5:20 AM
71 points
8 comments, 1 min read, LW link, 1 review
(worldspiritsockpuppet.com)

On Tapping Out

Screwtape Nov 17, 2023, 3:23 AM
51 points
14 comments, 8 min read, LW link, 1 review

Eliciting Latent Knowledge in Comprehensive AI Services Models

acabodi Nov 17, 2023, 2:36 AM
6 points
0 comments, 5 min read, LW link

Some Rules for an Algebra of Bayes Nets

Nov 16, 2023, 11:53 PM
85 points
45 comments, 14 min read, LW link, 1 review

How much to update on recent AI governance moves?

Nov 16, 2023, 11:46 PM
112 points
5 comments, 29 min read, LW link

New LessWrong feature: Dialogue Matching

Bird Concept Nov 16, 2023, 9:27 PM
106 points
22 comments, 3 min read, LW link

Towards Evaluating AI Systems for Moral Status Using Self-Reports

Nov 16, 2023, 8:18 PM
45 points
3 comments, 1 min read, LW link
(arxiv.org)

Social Dark Matter

Duncan Sabien (Inactive) Nov 16, 2023, 8:00 PM
362 points
127 comments, 34 min read, LW link, 2 reviews

AI #38: Let’s Make a Deal

Zvi Nov 16, 2023, 7:50 PM
44 points
2 comments, 55 min read, LW link
(thezvi.wordpress.com)

Forecasting AI (Overview)

jsteinhardt Nov 16, 2023, 7:00 PM
35 points
0 comments, 2 min read, LW link
(bounded-regret.ghost.io)

We Should Talk About This More. Epistemic World Collapse as Imminent Safety Risk of Generative AI.

Joerg Weiss Nov 16, 2023, 6:46 PM
11 points
2 comments, 29 min read, LW link

Intelligence in systems (human, AI) can be conceptualized as the resolution and throughput at which a system can process and affect Shannon information.

AiresJL Nov 16, 2023, 5:46 PM
0 points
0 comments, 2 min read, LW link

Life on the Grid (Part 2)

rogersbacon Nov 16, 2023, 5:22 PM
7 points
0 comments, 15 min read, LW link
(www.secretorum.life)

The impossibility of rationally analyzing partisan news

RationalDino Nov 16, 2023, 4:19 PM
4 points
4 comments, 1 min read, LW link

We are Peacecraft.ai!

MadHatter Nov 16, 2023, 2:15 PM
15 points
20 comments, 2 min read, LW link

A dialectical view of the history of AI, Part 1: We’re only in the antithesis phase. [A synthesis is in the future.]

Bill Benzon Nov 16, 2023, 12:34 PM
6 points
0 comments, 12 min read, LW link

[Question] How much fraud is there in academia?

ChristianKl Nov 16, 2023, 11:50 AM
23 points
10 comments, 1 min read, LW link

Learning coefficient estimation: the details

Zach Furman Nov 16, 2023, 3:19 AM
36 points
0 comments, 2 min read, LW link
(colab.research.google.com)

[Question] AI Safety orgs- what’s your biggest bottleneck right now?

Kabir Kumar Nov 16, 2023, 2:02 AM
1 point
0 comments, 1 min read, LW link

My critique of Eliezer’s deeply irrational beliefs

Jorterder Nov 16, 2023, 12:34 AM
−35 points
1 comment, 9 min read, LW link
(docs.google.com)

Extrapolating from Five Words

Gordon Seidoh Worley Nov 15, 2023, 11:21 PM
40 points
11 comments, 2 min read, LW link

In Defense of Parselmouths

Screwtape Nov 15, 2023, 11:02 PM
51 points
11 comments, 10 min read, LW link, 1 review

Life on the Grid (Part 1)

rogersbacon Nov 15, 2023, 10:37 PM
12 points
4 comments, 9 min read, LW link
(www.secretorum.life)

Glomarization FAQ

Zane Nov 15, 2023, 8:20 PM
33 points
5 comments, 5 min read, LW link

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc Nov 15, 2023, 7:00 PM
71 points
2 comments, 4 min read, LW link

EA/ACX/LW November Santa Cruz Meetup

madmail Nov 15, 2023, 6:39 PM
1 point
0 comments, 1 min read, LW link

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe Carlsmith Nov 15, 2023, 5:16 PM
81 points
28 comments, 30 min read, LW link, 1 review

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM Nov 15, 2023, 4:36 PM
89 points
9 comments, 2 min read, LW link, 1 review
(arxiv.org)

AISN #26: National Institutions for AI Safety, Results From the UK Summit, and New Releases From OpenAI and xAI

Nov 15, 2023, 4:07 PM
13 points
0 comments, 6 min read, LW link
(newsletter.safe.ai)

‘Theories of Values’ and ‘Theories of Agents’: confusions, musings and desiderata

Nov 15, 2023, 4:00 PM
35 points
8 comments, 24 min read, LW link

Experiences and learnings from both sides of the AI safety job market

Marius Hobbhahn Nov 15, 2023, 3:40 PM
110 points
4 comments, 18 min read, LW link

Good businesses create epistemic monopolies

Logan Kieller Nov 15, 2023, 2:04 PM
−2 points
2 comments, 4 min read, LW link
(logankieller.substack.com)

A conceptual precursor to today’s language machines [Shannon]

Bill Benzon Nov 15, 2023, 1:50 PM
24 points
6 comments, 2 min read, LW link

[Question] Should Advanced Placement High School classes discuss Israel-Palestine? If so, how? If not, why? Who should make this decision?

Gesild Muka Nov 15, 2023, 4:50 AM
−1 points
5 comments, 1 min read, LW link

Reinforcement Via Giving People Cookies

Screwtape Nov 15, 2023, 4:34 AM
70 points
9 comments, 6 min read, LW link

Incidental polysemanticity

Nov 15, 2023, 4:00 AM
43 points
7 comments, 11 min read, LW link

LLMs May Find It Hard to FOOM

RogerDearnaley Nov 15, 2023, 2:52 AM
11 points
30 comments, 12 min read, LW link

Linearity Fallacies

hippo Nov 15, 2023, 2:23 AM
15 points
0 comments, 5 min read, LW link

SIA Is Just Being a Bayesian About the Fact That One Exists

omnizoid Nov 14, 2023, 10:55 PM
3 points
5 comments, 4 min read, LW link