[Question] What is the best way to talk about probabilities you expect to change with evidence/experiments?

Will_Pearson · 19 Apr 2024 15:35 UTC
12 points
8 comments · 1 min read · LW link

When is a mind me?

Rob Bensinger · 17 Apr 2024 5:56 UTC
67 points
54 comments · 15 min read · LW link

Progress Update #1 from the GDM Mech Interp Team: Full Update

19 Apr 2024 19:06 UTC
31 points
3 comments · 8 min read · LW link

[Question] How to Model the Future of Open-Source LLMs?

Joel Burget · 19 Apr 2024 14:28 UTC
10 points
1 comment · 1 min read · LW link

Should we maximize the Geometric Expectation of Utility?

A.H. · 17 Apr 2024 10:37 UTC
4 points
17 comments · 9 min read · LW link

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks · 18 Apr 2024 16:17 UTC
75 points
1 comment · 12 min read · LW link

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

18 Apr 2024 0:27 UTC
131 points
14 comments · 7 min read · LW link

hydrogen tube transport

bhauth · 18 Apr 2024 22:47 UTC
26 points
6 comments · 5 min read · LW link
(www.bhauth.com)

[Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL

Steven Byrnes · 2 Mar 2022 15:26 UTC
68 points
17 comments · 15 min read · LW link

My Detailed Notes & Commentary from Secular Solstice

Jeffrey Heninger · 23 Mar 2024 18:48 UTC
35 points
11 comments · 13 min read · LW link

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamerton · 18 Apr 2024 18:29 UTC
20 points
4 comments · 15 min read · LW link

Progress Update #1 from the GDM Mech Interp Team: Summary

19 Apr 2024 19:06 UTC
18 points
0 comments · 3 min read · LW link

[Question] If digital goods in virtual worlds increase GDP, do we actually become richer?

No77e · 19 Apr 2024 10:06 UTC
4 points
4 comments · 1 min read · LW link

Mid-conditional love

KatjaGrace · 17 Apr 2024 4:00 UTC
64 points
14 comments · 2 min read · LW link
(worldspiritsockpuppet.com)

Blessed information, garbage information, cursed information

tailcalled · 18 Apr 2024 16:56 UTC
20 points
5 comments · 3 min read · LW link

[Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate

trevor · 28 Mar 2024 16:03 UTC
76 points
22 comments · 2 min read · LW link
(www.astralcodexten.com)

Open Thread Spring 2024

habryka · 11 Mar 2024 19:17 UTC
22 points
70 comments · 1 min read · LW link

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai · 16 Apr 2024 21:16 UTC
262 points
45 comments · 12 min read · LW link

What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins · 19 Apr 2024 13:43 UTC
16 points
2 comments · 27 min read · LW link

Your Strength as a Rationalist

Eliezer Yudkowsky · 11 Aug 2007 0:21 UTC
229 points
123 comments · 2 min read · LW link