RSS

Dan H

Karma: 2,872

newsletter.safe.ai

newsletter.mlsafety.org

[MLSN #1]: ICLR Safety Paper Roundup

Dan H18 Oct 2021 15:19 UTC
59 points
1 comment2 min readLW link

[MLSN #2]: Ad­ver­sar­ial Training

Dan H9 Dec 2021 17:16 UTC
26 points
0 comments3 min readLW link

[MLSN #3]: NeurIPS Safety Paper Roundup

Dan H8 Mar 2022 15:17 UTC
45 points
0 comments4 min readLW link

[$20K in Prizes] AI Safety Ar­gu­ments Competition

26 Apr 2022 16:13 UTC
75 points
518 comments3 min readLW link

In­tro­duc­ing the ML Safety Schol­ars Program

4 May 2022 16:01 UTC
74 points
3 comments3 min readLW link

In­tro­duc­tion to Prag­matic AI Safety [Prag­matic AI Safety #1]

9 May 2022 17:06 UTC
80 points
3 comments6 min readLW link

A Bird’s Eye View of the ML Field [Prag­matic AI Safety #2]

9 May 2022 17:18 UTC
163 points
6 comments35 min readLW link

Ac­tion­able-guidance and roadmap recom­men­da­tions for the NIST AI Risk Man­age­ment Framework

17 May 2022 15:26 UTC
26 points
0 comments3 min readLW link

Com­plex Sys­tems for AI Safety [Prag­matic AI Safety #3]

24 May 2022 0:00 UTC
57 points
2 comments21 min readLW link

Perform Tractable Re­search While Avoid­ing Ca­pa­bil­ities Ex­ter­nal­ities [Prag­matic AI Safety #4]

30 May 2022 20:25 UTC
51 points
3 comments25 min readLW link

[MLSN #4]: Many New In­ter­pretabil­ity Papers, Vir­tual Logit Match­ing, Ra­tion­al­iza­tion Helps Robustness

Dan H3 Jun 2022 1:20 UTC
18 points
0 comments4 min readLW link

Open Prob­lems in AI X-Risk [PAIS #5]

10 Jun 2022 2:08 UTC
59 points
6 comments36 min readLW link

[Linkpost] Ex­is­ten­tial Risk Anal­y­sis in Em­piri­cal Re­search Papers

Dan H2 Jul 2022 0:09 UTC
40 points
0 comments1 min readLW link
(arxiv.org)

NeurIPS ML Safety Work­shop 2022

Dan H26 Jul 2022 15:28 UTC
72 points
2 comments1 min readLW link
(neurips2022.mlsafety.org)

$20K In Boun­ties for AI Safety Public Materials

5 Aug 2022 2:52 UTC
71 points
9 comments6 min readLW link

An­nounc­ing the In­tro­duc­tion to ML Safety course

6 Aug 2022 2:46 UTC
73 points
6 comments7 min readLW link

[MLSN #5]: Prize Compilation

Dan H26 Sep 2022 21:55 UTC
14 points
1 comment2 min readLW link

[MLSN #6]: Trans­parency sur­vey, prov­able ro­bust­ness, ML mod­els that pre­dict the future

Dan H12 Oct 2022 20:56 UTC
27 points
0 comments6 min readLW link

[MLSN #8] Mechanis­tic in­ter­pretabil­ity, us­ing law to in­form AI al­ign­ment, scal­ing laws for proxy gaming

20 Feb 2023 15:54 UTC
20 points
0 comments4 min readLW link
(newsletter.mlsafety.org)

There are no co­her­ence theorems

20 Feb 2023 21:25 UTC
121 points
114 comments19 min readLW link