Extrapolating from Five Words

Gordon Seidoh Worley 15 Nov 2023 23:21 UTC
40 points
11 comments 2 min read LW link

In Defense of Parselmouths

Screwtape 15 Nov 2023 23:02 UTC
46 points
10 comments 10 min read LW link

Life on the Grid (Part 1)

rogersbacon 15 Nov 2023 22:37 UTC
12 points
4 comments 9 min read LW link
(www.secretorum.life)

Glomarization FAQ

Zane 15 Nov 2023 20:20 UTC
24 points
4 comments 5 min read LW link

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc 15 Nov 2023 19:00 UTC
70 points
2 comments 4 min read LW link

EA/ACX/LW November Santa Cruz Meetup

madmail 15 Nov 2023 18:39 UTC
1 point
0 comments 1 min read LW link

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe Carlsmith 15 Nov 2023 17:16 UTC
79 points
26 comments 30 min read LW link

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM 15 Nov 2023 16:36 UTC
89 points
8 comments 2 min read LW link
(arxiv.org)

AISN #26: National Institutions for AI Safety, Results From the UK Summit, and New Releases From OpenAI and xAI

15 Nov 2023 16:07 UTC
12 points
0 comments 6 min read LW link
(newsletter.safe.ai)

‘Theories of Values’ and ‘Theories of Agents’: confusions, musings and desiderata

15 Nov 2023 16:00 UTC
34 points
8 comments 24 min read LW link

Experiences and learnings from both sides of the AI safety job market

Marius Hobbhahn 15 Nov 2023 15:40 UTC
109 points
4 comments 18 min read LW link

Good businesses create epistemic monopolies

Logan Kieller 15 Nov 2023 14:04 UTC
−2 points
2 comments 4 min read LW link
(logankieller.substack.com)

A conceptual precursor to today’s language machines [Shannon]

Bill Benzon 15 Nov 2023 13:50 UTC
24 points
6 comments 2 min read LW link

[Question] Should Advanced Placement High School classes discuss Israel-Palestine? If so, how? If not, why? Who should make this decision?

Gesild Muka 15 Nov 2023 4:50 UTC
−1 points
5 comments 1 min read LW link

Reinforcement Via Giving People Cookies

Screwtape 15 Nov 2023 4:34 UTC
65 points
9 comments 6 min read LW link

Incidental polysemanticity

15 Nov 2023 4:00 UTC
43 points
7 comments 11 min read LW link

LLMs May Find It Hard to FOOM

RogerDearnaley 15 Nov 2023 2:52 UTC
11 points
30 comments 12 min read LW link

Linearity Fallacies

hippo 15 Nov 2023 2:23 UTC
15 points
0 comments 5 min read LW link

SIA Is Just Being a Bayesian About the Fact That One Exists

omnizoid 14 Nov 2023 22:55 UTC
2 points
5 comments 4 min read LW link

AI Alignment [progress] this Week (11/12/2023)

Logan Zoellner 14 Nov 2023 22:21 UTC
6 points
0 comments 2 min read LW link
(midwitalignment.substack.com)

[Question] When did Eliezer Yudkowsky change his mind about neural networks?

[deactivated] 14 Nov 2023 21:24 UTC
31 points
15 comments 1 min read LW link

Betting on what is un-falsifiable and un-verifiable

Abhimanyu Pallavi Sudhir 14 Nov 2023 21:11 UTC
13 points
0 comments 14 min read LW link

Facebook is Paying Me to Post

jefftk 14 Nov 2023 19:10 UTC
26 points
5 comments 1 min read LW link
(www.jefftk.com)

Feelings, Nothing More than Feelings, About AI

PaulBecon 14 Nov 2023 18:50 UTC
−3 points
0 comments 3 min read LW link

Kids or No kids

Kids or no kids 14 Nov 2023 18:37 UTC
91 points
10 comments 13 min read LW link

Raemon’s Deliberate (“Purposeful?”) Practice Club

14 Nov 2023 18:24 UTC
61 points
11 comments 22 min read LW link

More metal less ore

Logan Kieller 14 Nov 2023 16:59 UTC
8 points
3 comments 2 min read LW link
(logankieller.substack.com)

A framing for interpretability

Nina Rimsky 14 Nov 2023 16:14 UTC
69 points
5 comments 4 min read LW link
(ninarimsky.substack.com)

Monthly Roundup #12: November 2023

Zvi 14 Nov 2023 15:20 UTC
34 points
5 comments 33 min read LW link
(thezvi.wordpress.com)

Do you want a first-principled preparedness guide to prepare yourself and loved ones for potential catastrophes?

Ulrik Horn 14 Nov 2023 12:13 UTC
15 points
5 comments 15 min read LW link

[Question] Is there Work on Embedded Agency in Cellular Automata Toy Models?

Johannes C. Mayer 14 Nov 2023 9:08 UTC
9 points
0 comments 1 min read LW link

[Question] Would this be Progress in Solving Embedded Agency?

Johannes C. Mayer 14 Nov 2023 9:08 UTC
9 points
2 comments 2 min read LW link

Is Interpretability All We Need?

RogerDearnaley 14 Nov 2023 5:31 UTC
1 point
1 comment 1 min read LW link

What is wisdom?

TsviBT 14 Nov 2023 2:13 UTC
32 points
3 comments 13 min read LW link

Festival Stats 2023

jefftk 14 Nov 2023 1:20 UTC
9 points
0 comments 1 min read LW link
(www.jefftk.com)

Out of the Box

jesseduffield 13 Nov 2023 23:43 UTC
5 points
1 comment 7 min read LW link

Loudly Give Up, Don’t Quietly Fade

Screwtape 13 Nov 2023 23:30 UTC
138 points
11 comments 6 min read LW link

Great Empathy and Great Response Ability

positivesum 13 Nov 2023 23:04 UTC
16 points
0 comments 3 min read LW link
(tryingtruly.substack.com)

Theories of Change for AI Auditing

13 Nov 2023 19:33 UTC
53 points
0 comments 18 min read LW link
(www.apolloresearch.ai)

They are made of repeating patterns

quetzal_rainbow 13 Nov 2023 18:17 UTC
49 points
4 comments 2 min read LW link

How to Upload a Mind (In Three Not-So-Easy Steps)

13 Nov 2023 18:13 UTC
26 points
0 comments 7 min read LW link
(youtu.be)

Non-myopia stories

lberglund 13 Nov 2023 17:52 UTC
28 points
10 comments 7 min read LW link

It’s OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood

Mikhail Samin 13 Nov 2023 16:51 UTC
2 points
17 comments 1 min read LW link

Suggestions for chess puzzles

Zane 13 Nov 2023 15:39 UTC
13 points
1 comment 1 min read LW link

Why small phenomenons are relevant to morality

Ryo 13 Nov 2023 15:25 UTC
1 point
0 comments 3 min read LW link

Optionality approach to ethics

Ryo 13 Nov 2023 15:23 UTC
7 points
2 comments 3 min read LW link

Redirecting one’s own taxes as an effective altruism method

David Gross 13 Nov 2023 15:17 UTC
1 point
34 comments 16 min read LW link

AISC Project: Benchmarks for Stable Reflectivity

jacquesthibs 13 Nov 2023 14:51 UTC
17 points
0 comments 8 min read LW link

AISC Project: Modelling Trajectories of Language Models

NickyP 13 Nov 2023 14:33 UTC
26 points
0 comments 12 min read LW link

Bostrom Goes Unheard

Zvi 13 Nov 2023 14:11 UTC
81 points
9 comments 18 min read LW link