Charbel-Raphaël

Karma: 972

Charbel-Raphael Segerie

https://crsegerie.github.io/

Against Almost Every Theory of Impact of Interpretability

Charbel-Raphaël, 17 Aug 2023 18:44 UTC
279 points
69 comments, 26 min read, LW link

Introductory Textbook to Vision Models Interpretability

28 Jul 2023 17:32 UTC
41 points
0 comments, 1 min read, LW link
(github.com)

Task decomposition for scalable oversight (AGISF Distillation)

Charbel-Raphaël, 25 Jul 2023 13:34 UTC
23 points
0 comments, 19 min read, LW link
(docs.google.com)

An Overview of AI risks—the Flyer

17 Jul 2023 12:03 UTC
16 points
0 comments, 1 min read, LW link
(docs.google.com)

Introducing EffiSciences’ AI Safety Unit

30 Jun 2023 7:44 UTC
61 points
0 comments, 12 min read, LW link

Improvement on MIRI’s Corrigibility

9 Jun 2023 16:10 UTC
54 points
8 comments, 13 min read, LW link

Thriving in the Weird Times: Preparing for the 100X Economy

8 May 2023 13:44 UTC
23 points
16 comments, 2 min read, LW link