Open Thread Spring 2024

habryka

11 Mar 2024 19:17 UTC

22 points

95 comments

1 min read

LW link

Why Would AI “Aim” To Defeat Humanity?

HoldenKarnofsky

29 Nov 2022 19:30 UTC

69 points

9 comments

33 min read

LW link

(www.cold-takes.com)

Mechanistically Eliciting Latent Behaviors in Language Models

Andrew Mack and TurnTrout

30 Apr 2024 18:51 UTC

151 points

34 comments

45 min read

LW link

Biorisk is an Unhelpful Analogy for AI Risk

Davidmanheim

6 May 2024 6:20 UTC

13 points

5 comments

1 min read

LW link

Key takeaways from our EA and alignment research surveys

Cameron Berg, Judd Rosenblatt, florin_pop and AE Studio

3 May 2024 18:10 UTC

81 points

6 comments

21 min read

LW link

introduction to cancer vaccines

bhauth

5 May 2024 1:06 UTC

61 points

7 comments

5 min read

LW link

(www.bhauth.com)

Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence

Towards_Keeperhood and Davanchama

6 May 2024 17:09 UTC

22 points

1 comment

4 min read

LW link

[Question] Does reducing the amount of RL for a given capability level make AI safer?

Chris_Leong

5 May 2024 17:04 UTC

43 points

13 comments

1 min read

LW link

GDP per capita in 2050

Hauke Hillebrandt

6 May 2024 15:14 UTC

16 points

5 comments

1 min read

LW link

[Question] Which skincare products are evidence-based?

Vanessa Kosoy

2 May 2024 15:22 UTC

104 points

41 comments

1 min read

LW link

Explaining a Math Magic Trick

Robert_AIZI

5 May 2024 19:41 UTC

73 points

1 comment

5 min read

LW link

D&D.Sci Long War: Defender of Data-mocracy

aphyer

26 Apr 2024 22:30 UTC

41 points

17 comments

3 min read

LW link

Observations on Teaching for Four Weeks

ClareChiaraVincent

6 May 2024 16:55 UTC

9 points

0 comments

3 min read

LW link

Some Experiments I’d Like Someone To Try With An Amnestic

johnswentworth

4 May 2024 22:04 UTC

46 points

19 comments

3 min read

LW link

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

Olli Järviniemi and evhub

6 May 2024 7:07 UTC

58 points

2 comments

1 min read

LW link

(arxiv.org)

Q&A on Proposed SB 1047

Zvi

2 May 2024 15:10 UTC

63 points

3 comments

44 min read

LW link

(thezvi.wordpress.com)

Rejecting Television

Declan Molony

23 Apr 2024 4:59 UTC

68 points

9 comments

6 min read

LW link

Refusal in LLMs is mediated by a single direction

Andy Arditi, Oscar Obeso, Aaquib111, wesg and Neel Nanda

27 Apr 2024 11:13 UTC

183 points

75 comments

10 min read

LW link

[Question] What are some triggers that prompt you to do a Fermi estimate, or to pull up a spreadsheet and make a simple/rough quantitative model?

Eli Tyre

25 Jul 2021 6:47 UTC

38 points

16 comments

1 min read

LW link

Thoughts on seed oil

dynomight

20 Apr 2024 12:29 UTC

293 points

108 comments

17 min read

LW link

(dynomight.net)