K-complexity is silly; use cross-entropy instead

So8res, 20 Dec 2022 23:06 UTC
137 points
53 comments, 4 min read, 2 reviews

Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Akash, 20 Dec 2022 21:39 UTC
18 points
2 comments, 11 min read

Discovering Language Model Behaviors with Model-Written Evaluations

20 Dec 2022 20:08 UTC
100 points
34 comments, 1 min read
(www.anthropic.com)

Reflections: Bureaucratic Hell

Haris Rashid, 20 Dec 2022 19:22 UTC
−5 points
1 comment, 1 min read
(www.harisrab.com)

Proliferating Education

Haris Rashid, 20 Dec 2022 19:22 UTC
−1 points
2 comments, 5 min read
(www.harisrab.com)

AGI is here, but nobody wants it. Why should we even care?

MGow, 20 Dec 2022 19:14 UTC
−22 points
0 comments, 17 min read

Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development

Roman Leventov, 20 Dec 2022 17:13 UTC
33 points
3 comments, 36 min read

I believe some AI doomers are overconfident

FTPickle, 20 Dec 2022 17:09 UTC
8 points
15 comments, 2 min read

Note on algorithms with multiple trained components

Steven Byrnes, 20 Dec 2022 17:08 UTC
23 points
4 comments, 2 min read

Marvel Snap: Phase 2

Zvi, 20 Dec 2022 14:50 UTC
11 points
1 comment, 13 min read
(thezvi.wordpress.com)

(Extremely) Naive Gradient Hacking Doesn’t Work

ojorgensen, 20 Dec 2022 14:35 UTC
14 points
0 comments, 6 min read

An Open Agency Architecture for Safe Transformative AI

davidad, 20 Dec 2022 13:04 UTC
79 points
22 comments, 4 min read

Under-Appreciated Ways to Use Flashcards—Part I

Florence Hinder, 20 Dec 2022 12:43 UTC
22 points
5 comments, 5 min read
(thoughtsaver.ghost.io)

EA & LW Forums Weekly Summary (12th Dec − 18th Dec 22′)

Zoe Williams, 20 Dec 2022 9:49 UTC
10 points
0 comments, 1 min read

[link, 2019] AI paradigm: interactive learning from unlabeled instructions

the gears to ascension, 20 Dec 2022 6:45 UTC
2 points
0 comments, 2 min read
(jgrizou.github.io)

[Fiction] Unspoken Stone

Gordon Seidoh Worley, 20 Dec 2022 5:11 UTC
19 points
0 comments, 5 min read

Notice when you stop reading right before you understand

just_browsing, 20 Dec 2022 5:09 UTC
59 points
6 comments, 1 min read

Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner, 20 Dec 2022 5:01 UTC
25 points
1 comment, 3 min read

More notes from raising a late-talking kid

Steven Byrnes, 20 Dec 2022 2:13 UTC
40 points
2 comments, 6 min read

The “Minimal Latents” Approach to Natural Abstractions

johnswentworth, 20 Dec 2022 1:22 UTC
53 points
24 comments, 12 min read

our deepest wishes

Tamsin Leake, 20 Dec 2022 0:23 UTC
29 points
0 comments, 1 min read
(carado.moe)

Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC, 19 Dec 2022 22:52 UTC
138 points
30 comments, 18 min read

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois, 19 Dec 2022 22:42 UTC
5 points
6 comments, 1 min read

AGI Timelines in Governance: Different Strategies for Different Timeframes

19 Dec 2022 21:31 UTC
63 points
28 comments, 10 min read

Towards Hodge-podge Alignment

Cleo Nardo, 19 Dec 2022 20:12 UTC
91 points
30 comments, 9 min read

Computational signatures of psychopathy

Cameron Berg, 19 Dec 2022 17:01 UTC
28 points
3 comments, 20 min read

Results from a survey on tool use and workflows in alignment research

19 Dec 2022 15:19 UTC
79 points
2 comments, 19 min read

Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]

Bill Benzon, 19 Dec 2022 15:12 UTC
13 points
5 comments, 4 min read
(new-savanna.blogspot.com)

Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner’s Dilemma

Jim Buhler, 19 Dec 2022 15:00 UTC
24 points
4 comments, 5 min read

Next Level Seinfeld

Zvi, 19 Dec 2022 13:30 UTC
50 points
8 comments, 1 min read
(thezvi.wordpress.com)

CEA Disambiguation

jefftk, 19 Dec 2022 13:20 UTC
24 points
0 comments, 1 min read
(www.jefftk.com)

Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt, 19 Dec 2022 12:02 UTC
−3 points
9 comments, 31 min read

Hacker-AI and Cyberwar 2.0+

Erland Wittkotter, 19 Dec 2022 11:46 UTC
2 points
0 comments, 15 min read

Non-Technical Preparation for Hacker-AI and Cyberwar 2.0+

Erland Wittkotter, 19 Dec 2022 11:42 UTC
2 points
0 comments, 25 min read

An Effective Grab Bag

stavros, 19 Dec 2022 10:29 UTC
20 points
1 comment, 7 min read

Slick hyperfinite Ramsey theory proof

Alok Singh, 19 Dec 2022 8:40 UTC
8 points
3 comments, 1 min read
(alok.github.io)

The True Spirit of Solstice?

Raemon, 19 Dec 2022 8:00 UTC
69 points
31 comments, 9 min read

The Risk of Orbital Debris and One (Cheap) Way to Mitigate It

clans, 19 Dec 2022 3:16 UTC
13 points
1 comment, 4 min read
(locationtbd.home.blog)

Why I think that teaching philosophy is high impact

Eleni Angelou, 19 Dec 2022 3:11 UTC
5 points
0 comments, 2 min read

A template for doing annual reviews

peterslattery, 19 Dec 2022 3:09 UTC
2 points
0 comments, 1 min read

Event [Berkeley]: Alignment Collaborator Speed-Meeting

19 Dec 2022 2:24 UTC
18 points
2 comments, 1 min read

An easier(?) end to the electoral college

ejacob, 19 Dec 2022 2:09 UTC
2 points
2 comments, 2 min read

How Death Feels

sisyphus, 18 Dec 2022 23:47 UTC
−7 points
9 comments, 1 min read

Why Are Women Hot?

Jacob Falkovich, 18 Dec 2022 23:20 UTC
17 points
19 comments, 11 min read

[Question] Can we, in principle, know the measure of counterfactual quantum branches?

sisyphus, 18 Dec 2022 22:07 UTC
1 point
15 comments, 1 min read

Boston Solstice 2022 Retrospective

jefftk, 18 Dec 2022 19:00 UTC
19 points
3 comments, 5 min read
(www.jefftk.com)

Take 11: “Aligning language models” should be weirder.

Charlie Steiner, 18 Dec 2022 14:14 UTC
32 points
0 comments, 2 min read

Bad at Arithmetic, Promising at Math

cohenmacaulay, 18 Dec 2022 5:40 UTC
100 points
19 comments, 20 min read, 1 review

Overconfidence bubbles

kaputmi, 18 Dec 2022 2:07 UTC
3 points
0 comments, 2 min read

Positive values seem more robust and lasting than prohibitions

TurnTrout, 17 Dec 2022 21:43 UTC
51 points
13 comments, 2 min read