Open Thread Spring 2024

habryka 11 Mar 2024 19:17 UTC
22 points
95 comments 1 min read LW link

Why Would AI “Aim” To Defeat Humanity?

HoldenKarnofsky 29 Nov 2022 19:30 UTC
69 points
9 comments 33 min read LW link
(www.cold-takes.com)

Mechanistically Eliciting Latent Behaviors in Language Models

30 Apr 2024 18:51 UTC
151 points
34 comments 45 min read LW link

Biorisk is an Unhelpful Analogy for AI Risk

Davidmanheim 6 May 2024 6:20 UTC
13 points
5 comments 1 min read LW link

Key takeaways from our EA and alignment research surveys

3 May 2024 18:10 UTC
81 points
6 comments 21 min read LW link

introduction to cancer vaccines

bhauth 5 May 2024 1:06 UTC
61 points
7 comments 5 min read LW link
(www.bhauth.com)

Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence

6 May 2024 17:09 UTC
22 points
1 comment 4 min read LW link

[Question] Does reducing the amount of RL for a given capability level make AI safer?

Chris_Leong 5 May 2024 17:04 UTC
43 points
13 comments 1 min read LW link

GDP per capita in 2050

Hauke Hillebrandt 6 May 2024 15:14 UTC
16 points
5 comments 1 min read LW link

[Question] Which skincare products are evidence-based?

Vanessa Kosoy 2 May 2024 15:22 UTC
104 points
41 comments 1 min read LW link

Explaining a Math Magic Trick

Robert_AIZI 5 May 2024 19:41 UTC
73 points
1 comment 5 min read LW link

D&D.Sci Long War: Defender of Data-mocracy

aphyer 26 Apr 2024 22:30 UTC
41 points
17 comments 3 min read LW link

Observations on Teaching for Four Weeks

ClareChiaraVincent 6 May 2024 16:55 UTC
9 points
0 comments 3 min read LW link

Some Experiments I’d Like Someone To Try With An Amnestic

johnswentworth 4 May 2024 22:04 UTC
46 points
19 comments 3 min read LW link

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

6 May 2024 7:07 UTC
58 points
2 comments 1 min read LW link
(arxiv.org)

Q&A on Proposed SB 1047

Zvi 2 May 2024 15:10 UTC
63 points
3 comments 44 min read LW link
(thezvi.wordpress.com)

Rejecting Television

Declan Molony 23 Apr 2024 4:59 UTC
68 points
9 comments 6 min read LW link

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
183 points
75 comments 10 min read LW link

[Question] What are some triggers that prompt you to do a Fermi estimate, or to pull up a spreadsheet and make a simple/rough quantitative model?

Eli Tyre 25 Jul 2021 6:47 UTC
38 points
16 comments 1 min read LW link

Thoughts on seed oil

dynomight 20 Apr 2024 12:29 UTC
293 points
108 comments 17 min read LW link
(dynomight.net)