HoldenKarnofsky

Karma: 7,166

Sabotage Evaluations for Frontier Models

David Duvenaud, Joe Benton, Sam Bowman, evhub, mishajw, Eric Christiansen, HoldenKarnofsky, Ethan Perez and Buck

18 Oct 2024 22:33 UTC

95 points

56 comments 6 min read LW link

(assets.anthropic.com)

Case studies on social-welfare-based standards in various industries

HoldenKarnofsky 20 Jun 2024 13:33 UTC

42 points

0 comments 1 min read LW link

Good job opportunities for helping with the most important century

HoldenKarnofsky 18 Jan 2024 17:30 UTC

36 points

0 comments 4 min read LW link

(www.cold-takes.com)

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky 27 Oct 2023 15:19 UTC

200 points

33 comments 8 min read LW link

3 levels of threat obfuscation

HoldenKarnofsky 2 Aug 2023 14:58 UTC

69 points

14 comments 7 min read LW link

A Playbook for AI Risk Reduction (focused on misaligned AI)

HoldenKarnofsky 6 Jun 2023 18:05 UTC

90 points

42 comments 14 min read LW link 1 review

Seeking (Paid) Case Studies on Standards

HoldenKarnofsky 26 May 2023 17:58 UTC

69 points

9 comments 11 min read LW link

Success without dignity: a nearcasting story of avoiding catastrophe by luck

HoldenKarnofsky 14 Mar 2023 19:23 UTC

85 points

17 comments 15 min read LW link

Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky 13 Mar 2023 21:20 UTC

267 points

43 comments 22 min read LW link 1 review

What does Bing Chat tell us about AI risk?

HoldenKarnofsky 28 Feb 2023 17:40 UTC

80 points

21 comments 2 min read LW link

(www.cold-takes.com)

How major governments can help with the most important century

HoldenKarnofsky 24 Feb 2023 18:20 UTC

29 points

0 comments 4 min read LW link

(www.cold-takes.com)

What AI companies can do today to help with the most important century

HoldenKarnofsky 20 Feb 2023 17:00 UTC

38 points

3 comments 9 min read LW link

(www.cold-takes.com)

Jobs that can help with the most important century

HoldenKarnofsky 10 Feb 2023 18:20 UTC

24 points

0 comments 19 min read LW link

(www.cold-takes.com)

Spreading messages to help with the most important century

HoldenKarnofsky 25 Jan 2023 18:20 UTC

75 points

4 comments 18 min read LW link

(www.cold-takes.com)

How we could stumble into AI catastrophe

HoldenKarnofsky 13 Jan 2023 16:20 UTC

71 points

18 comments 18 min read LW link

(www.cold-takes.com)

Transformative AI issues (not just misalignment): an overview

HoldenKarnofsky 5 Jan 2023 20:20 UTC

34 points

6 comments 18 min read LW link

(www.cold-takes.com)

Racing through a minefield: the AI deployment problem

HoldenKarnofsky 22 Dec 2022 16:10 UTC

38 points

2 comments 13 min read LW link

(www.cold-takes.com)

High-level hopes for AI alignment

HoldenKarnofsky 15 Dec 2022 18:00 UTC

58 points

3 comments 19 min read LW link

(www.cold-takes.com)

AI Safety Seems Hard to Measure

HoldenKarnofsky 8 Dec 2022 19:50 UTC

71 points

6 comments 14 min read LW link

(www.cold-takes.com)

Why Would AI “Aim” To Defeat Humanity?

HoldenKarnofsky 29 Nov 2022 19:30 UTC

69 points

10 comments 33 min read LW link

(www.cold-takes.com)