Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse

17 May 2024 19:13 UTC

5 points

0 comments · 2 min read · LW link

DeepMind: Frontier Safety Framework

Zach Stein-Perlman

17 May 2024 17:30 UTC

23 points

0 comments · 3 min read · LW link

(deepmind.google)

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill and Lee Sharkey

17 May 2024 16:25 UTC

23 points

0 comments · 4 min read · LW link

(publications.apolloresearch.ai)

AISafety.com – Resources for AI Safety

Søren Elverlin, plex, Bryce Robertson and Melissa Samworth

17 May 2024 15:57 UTC

39 points

0 comments · 1 min read · LW link

Is There Really a Child Penalty in the Long Run?

Maxwell Tabarrok

17 May 2024 11:56 UTC

20 points

3 comments · 5 min read · LW link

(www.maximum-progress.com)

My Hammer Time Final Exam

adios

17 May 2024 9:28 UTC

7 points

1 comment · 3 min read · LW link

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures

abstractapplic

17 May 2024 0:25 UTC

19 points

7 comments · 2 min read · LW link

To an LLM, everything looks like a logic puzzle

Jesse Richardson

16 May 2024 22:21 UTC

10 points

0 comments · 2 min read · LW link

AI Safety Institute’s Inspect hello world example for AI evals

TheManxLoiner

16 May 2024 20:47 UTC

3 points

0 comments · 1 min read · LW link

(lovkush.medium.com)

Feeling (instrumentally) Rational

Pi Rogers

16 May 2024 18:56 UTC

14 points

5 comments · 1 min read · LW link

Advice for Activists from the History of Environmentalism

Jeffrey Heninger

16 May 2024 18:40 UTC

61 points

3 comments · 6 min read · LW link

(blog.aiimpacts.org)

Ninety-five theses on AI

hamandcheese

16 May 2024 17:51 UTC

12 points

0 comments · 7 min read · LW link

FMT: a great opportunity for soon-to-be parents

Anton Rodenhauser

16 May 2024 13:24 UTC

8 points

1 comment · 15 min read · LW link

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Gunnar_Zarncke

16 May 2024 13:09 UTC

47 points

4 comments · 1 min read · LW link

(arxiv.org)

The Dunning-Kruger of disproving Dunning-Kruger

kromem

16 May 2024 10:11 UTC

27 points

0 comments · 5 min read · LW link

A case for fairness-enforcing irrational behavior

cousin_it

16 May 2024 9:41 UTC

9 points

3 comments · 2 min read · LW link

Podcast: Eye4AI on 2023 Survey

KatjaGrace

16 May 2024 7:40 UTC

8 points

0 comments · 1 min read · LW link

(worldspiritsockpuppet.com)

Against “argument from overhang risk”

RobertM

16 May 2024 4:44 UTC

28 points

9 comments · 5 min read · LW link

Do you believe in hundred dollar bills lying on the ground? Consider humming

Elizabeth

16 May 2024 0:00 UTC

103 points

10 comments · 6 min read · LW link

(acesounderglass.com)

Introducing Statistical Utility Mechanics: A Framework for Utility Maximizers

J Bostock

15 May 2024 21:56 UTC

9 points

0 comments · 7 min read · LW link