Responses to apparent rationalist confusions about game / decision theory

Anthony DiGiovanni · Aug 30, 2023, 10:02 PM
142 points

57 votes

Overall karma indicates overall quality.

20 comments · 12 min read · LW link · 1 review

Invulnerable Incomplete Preferences: A Formal Statement

SCP · Aug 30, 2023, 9:59 PM
136 points

44 votes

39 comments · 35 min read · LW link

Report on Frontier Model Training

YafahEdelman · Aug 30, 2023, 8:02 PM
122 points

66 votes

21 comments · 21 min read · LW link
(docs.google.com)

An adversarial example for Direct Logit Attribution: memory management in gelu-4l

Aug 30, 2023, 5:36 PM
17 points

7 votes

0 comments · 8 min read · LW link
(arxiv.org)

A Letter to the Editor of MIT Technology Review

Jeffs · Aug 30, 2023, 4:59 PM
0 points

5 votes

0 comments · 2 min read · LW link

Biosecurity Culture, Computer Security Culture

jefftk · Aug 30, 2023, 4:40 PM
103 points

52 votes

11 comments · 2 min read · LW link
(www.jefftk.com)

Why I hang out at LessWrong and why you should check-in there every now and then

Bill Benzon · Aug 30, 2023, 3:20 PM
16 points

9 votes

5 comments · 5 min read · LW link

“Wanting” and “liking”

Mateusz Bagiński · Aug 30, 2023, 2:52 PM
23 points

7 votes

3 comments · 29 min read · LW link

Open Call for Research Assistants in Developmental Interpretability

Aug 30, 2023, 9:02 AM
56 points

22 votes

11 comments · 4 min read · LW link

LTFF and EAIF are unusually funding-constrained right now

Aug 30, 2023, 1:03 AM
90 points

37 votes

24 comments · 15 min read · LW link
(forum.effectivealtruism.org)

Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

Neel Nanda · Aug 29, 2023, 10:07 PM
36 points

12 votes

1 comment · 1 min read · LW link
(www.youtube.com)

An OV-Coherent Toy Model of Attention Head Superposition

Aug 29, 2023, 7:44 PM
26 points

9 votes

2 comments · 6 min read · LW link

The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)

moyamo · Aug 29, 2023, 6:28 PM
77 points

33 votes

71 comments · 15 min read · LW link

Democratic Fine-Tuning

Joe Edelman · Aug 29, 2023, 6:13 PM
22 points

19 votes

2 comments · 1 min read · LW link
(open.substack.com)

Should rationalists (be seen to) win?

Will_Pearson · Aug 29, 2023, 6:13 PM
6 points

4 votes

7 comments · 1 min read · LW link

Frankfurt meetup

sultan · Aug 29, 2023, 6:10 PM
2 points

2 votes

0 comments · 1 min read · LW link

Istanbul meetup

sultan · Aug 29, 2023, 6:10 PM
3 points

3 votes

0 comments · 1 min read · LW link

Broken Benchmark: MMLU

awg · Aug 29, 2023, 6:09 PM
24 points

26 votes

5 comments · 1 min read · LW link
(www.youtube.com)

AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities

Dan H · Aug 29, 2023, 3:07 PM
12 points

5 votes

0 comments · 8 min read · LW link
(newsletter.safe.ai)

Loft Bed Fan Guard

jefftk · Aug 29, 2023, 1:30 PM
16 points

6 votes

3 comments · 1 min read · LW link
(www.jefftk.com)

Dating Roundup #1: This is Why You’re Single

Zvi · Aug 29, 2023, 12:50 PM
87 points

59 votes

28 comments · 38 min read · LW link
(thezvi.wordpress.com)

Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world]

Bill Benzon · Aug 29, 2023, 11:33 AM
4 points

1 vote

0 comments · 5 min read · LW link

Barriers to Mechanistic Interpretability for AGI Safety

Connor Leahy · Aug 29, 2023, 10:56 AM
63 points

41 votes

13 comments · 1 min read · LW link
(www.youtube.com)

Newcomb Variant

lsusr · Aug 29, 2023, 7:02 AM
25 points

23 votes

23 comments · 1 min read · LW link

[Question] Incentives affecting alignment-researcher encouragement

Nicholas Kross · Aug 29, 2023, 5:11 AM
28 points

11 votes

3 comments · 1 min read · LW link

Anyone want to debate publicly about FDT?

Bentham's Bulldog · Aug 29, 2023, 3:45 AM
14 points

10 votes

31 comments · 1 min read · LW link

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Aug 29, 2023, 1:29 AM
54 points

20 votes

3 comments · 10 min read · LW link

An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

Aug 29, 2023, 1:04 AM
77 points

28 votes

4 comments · 1 min read · LW link

OpenAI API base models are not sycophantic, at any size

nostalgebraist · Aug 29, 2023, 12:58 AM
183 points

80 votes

20 comments · 2 min read · LW link
(colab.research.google.com)

Paradigms and Theory Choice in AI: Adaptivity, Economy and Control

particlemania · Aug 28, 2023, 10:19 PM
4 points

2 votes

0 comments · 16 min read · LW link

[Question] Humanities In A Post-Conscious AI World?

Netcentrica · Aug 28, 2023, 9:59 PM
1 point

2 votes

1 comment · 2 min read · LW link

Introducing the Center for AI Policy (& we’re hiring!)

Thomas Larsen · Aug 28, 2023, 9:17 PM
123 points

67 votes

50 comments · 2 min read · LW link
(www.aipolicy.us)

[Question] 45% to 55% vs. 90% to 100%

yhoiseth · Aug 28, 2023, 7:15 PM
5 points

5 votes

8 comments · 4 min read · LW link

The Evidence for Question Decomposition is Weak

niplav · Aug 28, 2023, 3:46 PM
22 points

10 votes

6 comments · 5 min read · LW link

ACX Meetup Anywhere, Bratislava, Slovakia

David Varga · Aug 28, 2023, 3:40 PM
1 point

1 vote

0 comments · 1 min read · LW link

The Anthropic Principle Tells Us That AGI Will Not Be Conscious

nem · Aug 28, 2023, 3:25 PM
2 points

9 votes

8 comments · 1 min read · LW link

No More Freezer Pucks

jefftk · Aug 28, 2023, 3:20 PM
10 points

2 votes

7 comments · 1 min read · LW link
(www.jefftk.com)

The mind as a polyviscous fluid

Bill Benzon · Aug 28, 2023, 2:38 PM
8 points

8 votes

0 comments · 3 min read · LW link

[Question] Who can most reduce X-Risk?

sudhanshu_kasewa · Aug 28, 2023, 2:38 PM
1 point

3 votes

12 comments · 1 min read · LW link

Drinks at a bar

yakimoff · Aug 28, 2023, 3:13 AM
2 points

2 votes

0 comments · 1 min read · LW link

Dear Self; we need to talk about ambition

Elizabeth · Aug 27, 2023, 11:10 PM
271 points

159 votes

28 comments · 8 min read · LW link · 2 reviews
(acesounderglass.com)

AI pause/governance advocacy might be net-negative, especially without a focus on explaining x-risk

Mikhail Samin · Aug 27, 2023, 11:05 PM
72 points

44 votes

9 comments · 6 min read · LW link

Will issues are quite nearly skill issues

dkl9 · Aug 27, 2023, 4:42 PM
1 point

4 votes

1 comment · 3 min read · LW link
(dkl9.net)

Xanadu, GPT, and Beyond: An adventure of the mind

Bill Benzon · Aug 27, 2023, 4:19 PM
2 points

4 votes

0 comments · 5 min read · LW link

High level overview on how to go about estimating “p(doom)” or the like

Aryeh Englander · Aug 27, 2023, 4:01 PM
16 points

3 votes

0 comments · 5 min read · LW link

Trying a Wet Suit

jefftk · Aug 27, 2023, 3:00 PM
35 points

16 votes

5 comments · 1 min read · LW link
(www.jefftk.com)

Apply to a small iteration of MLAB in Oxford

Aug 27, 2023, 2:54 PM
2 points

1 vote

0 comments · 1 min read · LW link

Apply to a small iteration of MLAB to be run in Oxford

Aug 27, 2023, 2:21 PM
12 points

10 votes

0 comments · 1 min read · LW link

The Game of Dominance

Karl von Wendt · Aug 27, 2023, 11:04 AM
24 points

12 votes

15 comments · 6 min read · LW link

Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong

Bentham's Bulldog · Aug 27, 2023, 1:06 AM
−12 points

122 votes

97 comments · 36 min read · LW link