DeepMind’s “Frontier Safety Framework” is weak and unambitious

Zach Stein-Perlman, 18 May 2024 3:00 UTC

46 points

2 comments, 4 min read

International Scientific Report on the Safety of Advanced AI: Key Information

Aryeh Englander, 18 May 2024 1:45 UTC

17 points

0 comments, 13 min read

Goodhart in RL with KL: Appendix

Thomas Kwa, 18 May 2024 0:40 UTC

9 points

0 comments, 6 min read

AI 2030 – AI Policy Roadmap

LTM, 17 May 2024 23:29 UTC

2 points

0 comments, 1 min read

Language Models Model Us

eggsyntax, 17 May 2024 21:00 UTC

57 points

4 comments, 7 min read

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse, 17 May 2024 19:13 UTC

39 points

1 comment, 2 min read

DeepMind: Frontier Safety Framework

Zach Stein-Perlman, 17 May 2024 17:30 UTC

59 points

0 comments, 3 min read

(deepmind.google)

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill and Lee Sharkey

17 May 2024 16:25 UTC

38 points

1 comment, 4 min read

(publications.apolloresearch.ai)

AISafety.com – Resources for AI Safety

Søren Elverlin, plex, Bryce Robertson and Melissa Samworth

17 May 2024 15:57 UTC

57 points

0 comments, 1 min read

Is There Really a Child Penalty in the Long Run?

Maxwell Tabarrok, 17 May 2024 11:56 UTC

29 points

5 comments, 5 min read

(www.maximum-progress.com)

My Hammer Time Final Exam

adios, 17 May 2024 9:28 UTC

9 points

1 comment, 3 min read

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures

abstractapplic, 17 May 2024 0:25 UTC

22 points

9 comments, 2 min read

To an LLM, everything looks like a logic puzzle

Jesse Richardson, 16 May 2024 22:21 UTC

10 points

0 comments, 2 min read

AI Safety Institute’s Inspect hello world example for AI evals

TheManxLoiner, 16 May 2024 20:47 UTC

3 points

0 comments, 1 min read

(lovkush.medium.com)

Feeling (instrumentally) Rational

Pi Rogers, 16 May 2024 18:56 UTC

14 points

5 comments, 1 min read

Advice for Activists from the History of Environmentalism

Jeffrey Heninger, 16 May 2024 18:40 UTC

68 points

3 comments, 6 min read

(blog.aiimpacts.org)

Ninety-five theses on AI

hamandcheese, 16 May 2024 17:51 UTC

13 points

0 comments, 7 min read

FMT: a great opportunity for soon-to-be parents

Anton Rodenhauser, 16 May 2024 13:24 UTC

8 points

1 comment, 15 min read

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Gunnar_Zarncke, 16 May 2024 13:09 UTC

50 points

4 comments, 1 min read

(arxiv.org)

The Dunning-Kruger of disproving Dunning-Kruger

kromem, 16 May 2024 10:11 UTC

31 points

0 comments, 5 min read