Cohesion and business problems

Adam Zerner 19 Apr 2024 0:45 UTC
8 points
1 comment 4 min read LW link

The Thermodynamics of Death

Peter lawless 19 Apr 2024 0:36 UTC
1 point
0 comments 10 min read LW link

hydrogen tube transport

bhauth 18 Apr 2024 22:47 UTC
20 points
2 comments 5 min read LW link
(www.bhauth.com)

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamerton 18 Apr 2024 18:29 UTC
13 points
1 comment 15 min read LW link

Blessed information, garbage information, cursed information

tailcalled 18 Apr 2024 16:56 UTC
19 points
2 comments 3 min read LW link

[Fiction] A Confession

Arjun Panickssery 18 Apr 2024 16:28 UTC
28 points
3 comments 5 min read LW link
(arjunpanickssery.substack.com)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks 18 Apr 2024 16:17 UTC
61 points
0 comments 12 min read LW link

Cooperation is optimal, with weaker agents too - tldr

Ryo 18 Apr 2024 15:03 UTC
10 points
14 comments 4 min read LW link
(medium.com)

How to coordinate despite our biases? - tldr

Ryo 18 Apr 2024 15:03 UTC
3 points
2 comments 3 min read LW link
(medium.com)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

Diffractor 18 Apr 2024 8:39 UTC
27 points
0 comments 19 min read LW link

An examination of GPT-2's boring yet effective glitch

MiguelDev 18 Apr 2024 5:26 UTC
5 points
3 comments 3 min read LW link

[Question] What if Ethics is Provably Self-Contradictory?

Yitz 18 Apr 2024 5:12 UTC
2 points
5 comments 2 min read LW link

The Mom Test: Summary and Thoughts

Adam Zerner 18 Apr 2024 3:34 UTC
37 points
1 comment 10 min read LW link

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

18 Apr 2024 0:27 UTC
112 points
13 comments 7 min read LW link

AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil

DanielFilan 17 Apr 2024 21:42 UTC
10 points
0 comments 65 min read LW link

LLM Evaluators Recognize and Favor Their Own Generations

17 Apr 2024 21:09 UTC
26 points
1 comment 3 min read LW link
(tiny.cc)

An ethical framework to supersede Utilitarianism

metalcrow 17 Apr 2024 17:18 UTC
1 point
4 comments 4 min read LW link

Moving on from community living

Vika 17 Apr 2024 17:02 UTC
48 points
6 comments 3 min read LW link
(vkrakovna.wordpress.com)

Staged release

Zach Stein-Perlman 17 Apr 2024 16:00 UTC
9 points
4 comments 2 min read LW link

[Question] Discomfort Stacking

Lewis O’Brien 17 Apr 2024 14:49 UTC
5 points
11 comments 1 min read LW link