Why it’s necessary to shoot yourself in the foot

Jacob G-W
11 Jul 2023 21:17 UTC
39 points
7 comments
2 min read
LW link
(g-w1.github.io)

How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond.

Christopher King
11 Jul 2023 19:27 UTC
17 points
11 comments
2 min read
LW link

[Question] Do we automatically accept propositions?

Aaron Graifman
11 Jul 2023 17:45 UTC
10 points
5 comments
1 min read
LW link

fMRI LIKE APPROACH TO AI ALIGNMENT / DECEPTIVE BEHAVIOUR

Escaque 66
11 Jul 2023 17:17 UTC
−1 points
3 comments
2 min read
LW link

Introducing Fatebook: the fastest way to make and track predictions

11 Jul 2023 15:28 UTC
127 points
34 comments
1 min read
LW link
(fatebook.io)

My Weirdest Experience

Bridgett Kay
11 Jul 2023 14:44 UTC
37 points
19 comments
1 min read
LW link
(dxmrevealed.wordpress.com)

Announcing The Roots of Progress Blog-Building Intensive

jasoncrawford
11 Jul 2023 14:04 UTC
10 points
0 comments
1 min read
LW link
(rootsofprogress.org)

OpenAI Launches Superalignment Taskforce

Zvi
11 Jul 2023 13:00 UTC
149 points
40 comments
49 min read
LW link
(thezvi.wordpress.com)

Critiquing Risks From Learned Optimization, and Avoiding Cached Theories

ProofBySonnet
11 Jul 2023 11:38 UTC
1 point
0 comments
6 min read
LW link

[UPDATE: deadline extended to July 24!] New wind in rationality’s sails: Applications for Epistea Residency 2023 are now open

11 Jul 2023 11:02 UTC
80 points
7 comments
3 min read
LW link

Two Hot Takes about Quine

Charlie Steiner
11 Jul 2023 6:42 UTC
15 points
0 comments
2 min read
LW link

Disincentivizing deception in mesa optimizers with Model Tampering

martinkunev
11 Jul 2023 0:44 UTC
3 points
0 comments
2 min read
LW link

Drawn Out: a story

Richard_Ngo
11 Jul 2023 0:08 UTC
68 points
2 comments
8 min read
LW link

Definitions are about efficiency and consistency with common language.

Nacruno96
10 Jul 2023 23:46 UTC
1 point
0 comments
4 min read
LW link

Reframing Evolution—An information wavefront traveling through time

Joshua Clancy
10 Jul 2023 22:36 UTC
1 point
0 comments
5 min read
LW link
(midflip.org)

GPT-7: The Tale of the Big Computer (An Experimental Story)

Justin Bullock
10 Jul 2023 20:22 UTC
4 points
4 comments
5 min read
LW link

Cost-effectiveness of professional field-building programs for AI safety research

Dan H
10 Jul 2023 18:28 UTC
8 points
5 comments
1 min read
LW link

Cost-effectiveness of student programs for AI safety research

Dan H
10 Jul 2023 18:28 UTC
15 points
2 comments
1 min read
LW link

Modeling the impact of AI safety field-building programs

Dan H
10 Jul 2023 18:27 UTC
21 points
0 comments
1 min read
LW link

I think Michael Bailey’s dismissal of my autogynephilia questions for Scott Alexander and Aella makes very little sense

tailcalled
10 Jul 2023 17:39 UTC
45 points
45 comments
2 min read
LW link

Incentives from a causal perspective

10 Jul 2023 17:16 UTC
27 points
0 comments
6 min read
LW link

Is the Endowment Effect Due to Incomparability?

Kevin Dorst
10 Jul 2023 16:26 UTC
21 points
10 comments
7 min read
LW link
(kevindorst.substack.com)

Frontier AI Regulation

Zach Stein-Perlman
10 Jul 2023 14:30 UTC
20 points
4 comments
8 min read
LW link
(arxiv.org)

Why is it so hard to change people’s minds? Well, imagine if it wasn’t...

Celarix
10 Jul 2023 13:55 UTC
6 points
9 comments
6 min read
LW link

Consider Joining the UK Foundation Model Taskforce

Zvi
10 Jul 2023 13:50 UTC
105 points
12 comments
1 min read
LW link
(thezvi.wordpress.com)

“Reframing Superintelligence” + LLMs + 4 years

Eric Drexler
10 Jul 2023 13:42 UTC
116 points
8 comments
12 min read
LW link

Open-minded updatelessness

10 Jul 2023 11:08 UTC
65 points
21 comments
12 min read
LW link

Arguments against existential risk from AI, part 2

Nina Rimsky
10 Jul 2023 8:25 UTC
7 points
0 comments
5 min read
LW link
(ninarimsky.substack.com)

Consciousness as a conflationary alliance term for intrinsically valued internal experiences

Andrew_Critch
10 Jul 2023 8:09 UTC
190 points
46 comments
11 min read
LW link

The world where LLMs are possible

Ape in the coat
10 Jul 2023 8:00 UTC
20 points
10 comments
3 min read
LW link

The virtue of determination

Richard_Ngo
10 Jul 2023 5:11 UTC
57 points
4 comments
4 min read
LW link

Some reasons to not say “Doomer”

Ruby
9 Jul 2023 21:05 UTC
45 points
18 comments
4 min read
LW link

The Seeker’s Game – Vignettes from the Bay

Yulia
9 Jul 2023 19:32 UTC
136 points
18 comments
16 min read
LW link

[Question] Why have exposure notification apps been (mostly) discontinued?

VipulNaik
9 Jul 2023 19:07 UTC
10 points
5 comments
2 min read
LW link

[Question] The Necessity of Privacy: A Condition for Social Change and Experimentation?

Blake
9 Jul 2023 18:42 UTC
−8 points
1 comment
1 min read
LW link

Attempting to Deconstruct “Real”

herschel
9 Jul 2023 16:40 UTC
21 points
23 comments
2 min read
LW link

Quick proposal: Decision market regrantor using manifund (please improve)

Nathan Young
9 Jul 2023 12:49 UTC
10 points
5 comments
5 min read
LW link

[Question] Where are the people building AGI in the non-dumb way?

Johannes C. Mayer
9 Jul 2023 11:39 UTC
10 points
19 comments
2 min read
LW link

[Question] What to read on the “informal multi-world model”?

mishka
9 Jul 2023 4:48 UTC
13 points
23 comments
1 min read
LW link

Whether LLMs “understand” anything is mostly a terminological dispute

RobertM
9 Jul 2023 3:31 UTC
10 points
1 comment
1 min read
LW link

Taboo Truth

Tomás B.
8 Jul 2023 23:23 UTC
31 points
16 comments
2 min read
LW link

“View”

herschel
8 Jul 2023 23:19 UTC
6 points
0 comments
2 min read
LW link

[Question] H5N1. Just how bad is the situation?

Q Home
8 Jul 2023 22:09 UTC
16 points
8 comments
1 min read
LW link

A Two-Part System for Practical Self-Care

Jonathan Moregård
8 Jul 2023 21:23 UTC
10 points
0 comments
3 min read
LW link
(honestliving.substack.com)

Really Strong Features Found in Residual Stream

Logan Riggs
8 Jul 2023 19:40 UTC
69 points
6 comments
2 min read
LW link

Eight Strategies for Tackling the Hard Part of the Alignment Problem

scasper
8 Jul 2023 18:55 UTC
42 points
11 comments
7 min read
LW link

“Concepts of Agency in Biology” (Okasha, 2023) - Brief Paper Summary

Nora_Ammann
8 Jul 2023 18:22 UTC
40 points
3 comments
7 min read
LW link

Blanchard’s Dangerous Idea and the Plight of the Lucid Crossdreamer

Zack_M_Davis
8 Jul 2023 18:03 UTC
37 points
135 comments
72 min read
LW link
(unremediatedgender.space)

Continuous Adversarial Quality Assurance: Extending RLHF and Constitutional AI

Benaya Koren
8 Jul 2023 17:32 UTC
6 points
0 comments
9 min read
LW link

Commentless downvoting is not a good way to fight infohazards

DirectedEvolution
8 Jul 2023 17:29 UTC
6 points
9 comments
3 min read
LW link