Dan H

Karma: 2,872

newsletter.safe.ai

newsletter.mlsafety.org

[MLSN #1]: ICLR Safety Paper Roundup

Dan H

18 Oct 2021 15:19 UTC

59 points

1 comment

2 min read

[MLSN #2]: Adversarial Training

Dan H

9 Dec 2021 17:16 UTC

26 points

0 comments

3 min read

[MLSN #3]: NeurIPS Safety Paper Roundup

Dan H

8 Mar 2022 15:17 UTC

45 points

0 comments

4 min read

[$20K in Prizes] AI Safety Arguments Competition

Dan H, Kevin Liu, ozhang, ThomasW and Sidney Hough

26 Apr 2022 16:13 UTC

75 points

518 comments

3 min read

Introducing the ML Safety Scholars Program

Dan H, ThomasW, Mantas Mazeika, ozhang, Sidney Hough and Kevin Liu

4 May 2022 16:01 UTC

74 points

3 comments

3 min read

Introduction to Pragmatic AI Safety [Pragmatic AI Safety #1]

Dan H and ThomasW

9 May 2022 17:06 UTC

80 points

3 comments

6 min read

A Bird’s Eye View of the ML Field [Pragmatic AI Safety #2]

Dan H and ThomasW

9 May 2022 17:18 UTC

163 points

6 comments

35 min read

Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework

Dan H and Tony Barrett

17 May 2022 15:26 UTC

26 points

0 comments

3 min read

Complex Systems for AI Safety [Pragmatic AI Safety #3]

Dan H and ThomasW

24 May 2022 0:00 UTC

57 points

2 comments

21 min read

Perform Tractable Research While Avoiding Capabilities Externalities [Pragmatic AI Safety #4]

Dan H and ThomasW

30 May 2022 20:25 UTC

51 points

3 comments

25 min read

[MLSN #4]: Many New Interpretability Papers, Virtual Logit Matching, Rationalization Helps Robustness

Dan H

3 Jun 2022 1:20 UTC

18 points

0 comments

4 min read

Open Problems in AI X-Risk [PAIS #5]

Dan H and ThomasW

10 Jun 2022 2:08 UTC

59 points

6 comments

36 min read

[Linkpost] Existential Risk Analysis in Empirical Research Papers

Dan H

2 Jul 2022 0:09 UTC

40 points

0 comments

1 min read

(arxiv.org)

NeurIPS ML Safety Workshop 2022

Dan H

26 Jul 2022 15:28 UTC

72 points

2 comments

1 min read

(neurips2022.mlsafety.org)

$20K In Bounties for AI Safety Public Materials

Dan H, ThomasW and ozhang

5 Aug 2022 2:52 UTC

71 points

9 comments

6 min read

Announcing the Introduction to ML Safety course

Dan H, ThomasW and ozhang

6 Aug 2022 2:46 UTC

73 points

6 comments

7 min read

[MLSN #5]: Prize Compilation

Dan H

26 Sep 2022 21:55 UTC

14 points

1 comment

2 min read

[MLSN #6]: Transparency survey, provable robustness, ML models that predict the future

Dan H

12 Oct 2022 20:56 UTC

27 points

0 comments

6 min read

[MLSN #8] Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming

Dan H and ThomasW

20 Feb 2023 15:54 UTC

20 points

0 comments

4 min read

(newsletter.mlsafety.org)

There are no coherence theorems

20 Feb 2023 21:25 UTC

121 points

114 comments

19 min read