Announcing Athena—Women in AI Alignment Research

Claire Short
Nov 7, 2023, 9:46 PM
80 points
2 comments
3 min read
LW link

Vote on Interesting Disagreements

Ben Pace
Nov 7, 2023, 9:35 PM
159 points
131 comments
1 min read
LW link

What is democracy for?

Johnstone
Nov 7, 2023, 6:17 PM
−5 points
10 comments
7 min read
LW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

Nov 7, 2023, 5:59 PM
38 points
2 comments
2 min read
LW link
(arxiv.org)

Implementing Decision Theory

justinpombrio
Nov 7, 2023, 5:55 PM
22 points
12 comments
3 min read
LW link

Mirror, Mirror on the Wall: How Do Forecasters Fare by Their Own Call?

nikos
Nov 7, 2023, 5:39 PM
14 points
5 comments
14 min read
LW link

Symbiotic self-alignment of AIs.

Spiritus Dei
Nov 7, 2023, 5:18 PM
1 point
0 comments
3 min read
LW link

AMA: Earning to Give

jefftk
Nov 7, 2023, 4:20 PM
53 points
8 comments
1 min read
LW link
(www.jefftk.com)

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs

Nov 7, 2023, 4:12 PM
52 points
21 comments
6 min read
LW link

Preface to the Sequence on LLM Psychology

Quentin FEUILLADE--MONTIXI
Nov 7, 2023, 4:12 PM
33 points
0 comments
2 min read
LW link

What I’ve been reading, November 2023

jasoncrawford
Nov 7, 2023, 1:37 PM
23 points
1 comment
5 min read
LW link
(rootsofprogress.org)

AI Alignment [Progress] this Week (11/05/2023)

Logan Zoellner
Nov 7, 2023, 1:26 PM
24 points
0 comments
4 min read
LW link
(midwitalignment.substack.com)

On the UK Summit

Zvi
Nov 7, 2023, 1:10 PM
74 points
6 comments
30 min read
LW link
(thezvi.wordpress.com)

Box inversion revisited

Jan_Kulveit
Nov 7, 2023, 11:09 AM
40 points
3 comments
8 min read
LW link

AI Alignment Research Engineer Accelerator (ARENA): call for applicants

CallumMcDougall
Nov 7, 2023, 9:43 AM
56 points
0 comments
LW link

The Perils of Professionalism

Screwtape
Nov 7, 2023, 12:07 AM
45 points
1 comment
10 min read
LW link

How to (hopefully ethically) make money off of AGI

Nov 6, 2023, 11:35 PM
171 points
95 comments
32 min read
LW link
1 review

cost estimation for 2 grid energy storage systems

bhauth
Nov 6, 2023, 11:32 PM
16 points
12 comments
7 min read
LW link
(www.bhauth.com)

A bet on critical periods in neural networks

Nov 6, 2023, 11:21 PM
24 points
1 comment
6 min read
LW link

Job listing: Communications Generalist / Project Manager

Gretta Duleba
Nov 6, 2023, 8:21 PM
49 points
7 comments
1 min read
LW link

Askesis: a model of the cerebellum

MadHatter
Nov 6, 2023, 8:19 PM
7 points
2 comments
1 min read
LW link
(github.com)

LQPR: An Algorithm for Reinforcement Learning with Provable Safety Guarantees

MadHatter
Nov 6, 2023, 8:17 PM
6 points
0 comments
1 min read
LW link
(github.com)

ACX Meetup Leipzig

Roman Leipe
Nov 6, 2023, 6:33 PM
1 point
0 comments
1 min read
LW link

[Question] Does bulemia work?

lc
Nov 6, 2023, 5:58 PM
5 points
18 comments
1 min read
LW link

Why building ventures in AI Safety is particularly challenging

Heramb
Nov 6, 2023, 4:27 PM
1 point
0 comments
1 min read
LW link
(forum.effectivealtruism.org)

What is true is already so. Owning up to it doesn’t make it worse.

RamblinDash
Nov 6, 2023, 3:49 PM
20 points
2 comments
1 min read
LW link

An illustrative model of backfire risks from pausing AI research

Maxime Riché
Nov 6, 2023, 2:30 PM
33 points
3 comments
11 min read
LW link

Proposal for improving state of alignment research

Iknownothing
Nov 6, 2023, 1:55 PM
2 points
0 comments
1 min read
LW link

Are language models good at making predictions?

dynomight
Nov 6, 2023, 1:10 PM
76 points
14 comments
4 min read
LW link
(dynomight.net)

Tips, tricks, lessons and thoughts on hosting hackathons

gergogaspar
Nov 6, 2023, 11:03 AM
3 points
0 comments
11 min read
LW link

Announcing TAIS 2024

Blaine
Nov 6, 2023, 8:38 AM
23 points
0 comments
1 min read
LW link
(tais2024.cc)

Taboo Wall

Screwtape
Nov 6, 2023, 3:51 AM
19 points
0 comments
3 min read
LW link

When and why should you use the Kelly criterion?

Nov 5, 2023, 11:26 PM
27 points
25 comments
16 min read
LW link

On Overhangs and Technological Change

Roko
Nov 5, 2023, 10:58 PM
50 points
19 comments
2 min read
LW link

xAI announces Grok, beats GPT-3.5

Nikola Jurkovic
Nov 5, 2023, 10:11 PM
10 points
6 comments
1 min read
LW link
(x.ai)

Disentangling four motivations for acting in accordance with UDT

Julian Stastny
Nov 5, 2023, 9:26 PM
35 points
3 comments
7 min read
LW link

AI as Super-Demagogue

RationalDino
Nov 5, 2023, 9:21 PM
11 points
12 comments
9 min read
LW link

EA orgs’ legal structure inhibits risk taking and information sharing on the margin

Elizabeth
Nov 5, 2023, 7:13 PM
136 points
17 comments
4 min read
LW link

Eric Schmidt on recursive self-improvement

Nikola Jurkovic
Nov 5, 2023, 7:05 PM
24 points
3 comments
1 min read
LW link
(www.youtube.com)

Pivotal Acts might Not be what You Think they are

Johannes C. Mayer
Nov 5, 2023, 5:23 PM
41 points
13 comments
3 min read
LW link

The Assumed Intent Bias

silentbob
Nov 5, 2023, 4:28 PM
51 points
13 comments
6 min read
LW link

Go flash blinking lights at printed text right now

lemonhope
Nov 5, 2023, 7:29 AM
15 points
9 comments
1 min read
LW link

Life of GPT

Odd anon
Nov 5, 2023, 4:55 AM
6 points
2 comments
5 min read
LW link

Lightning Talks

Screwtape
Nov 5, 2023, 3:27 AM
6 points
3 comments
4 min read
LW link

Utility is not the selection target

tailcalled
Nov 4, 2023, 10:48 PM
24 points
1 comment
1 min read
LW link

Stuxnet, not Skynet: Humanity’s disempowerment by AI

Roko
Nov 4, 2023, 10:23 PM
107 points
24 comments
6 min read
LW link

The 6D effect: When companies take risks, one email can be very powerful.

scasper
Nov 4, 2023, 8:08 PM
279 points
42 comments
3 min read
LW link

Genetic fitness is a measure of selection strength, not the selection target

Kaj_Sotala
Nov 4, 2023, 7:02 PM
58 points
44 comments
18 min read
LW link

The Soul Key

Richard_Ngo
Nov 4, 2023, 5:51 PM
112 points
10 comments
8 min read
LW link
1 review
(www.narrativeark.xyz)

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea
Nov 4, 2023, 5:34 PM
27 points
0 comments
1 min read
LW link
(arxiv.org)