
Charbel-Raphaël

Karma: 1,629

Charbel-Raphael Segerie

https://crsegerie.github.io/

Living in Paris

Against Almost Every Theory of Impact of Interpretability

Charbel-Raphaël, 17 Aug 2023 18:44 UTC
315 points
83 comments, 26 min read

Davidad’s Bold Plan for Alignment: An In-Depth Explanation

19 Apr 2023 16:09 UTC
154 points
29 comments, 21 min read

Compendium of problems with RLHF

Charbel-Raphaël, 29 Jan 2023 11:40 UTC
120 points
16 comments, 10 min read

[Question] What convincing warning shot could help prevent extinction from AI?

13 Apr 2024 18:09 UTC
100 points
17 comments, 2 min read

My intellectual journey to (dis)solve the hard problem of consciousness

Charbel-Raphaël, 6 Apr 2024 9:32 UTC
37 points
41 comments, 30 min read

Results from the Turing Seminar hackathon

7 Dec 2023 14:50 UTC
29 points
1 comment, 6 min read

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël, 25 Jul 2023 13:34 UTC
27 points
0 comments, 19 min read
(docs.google.com)

aisafety.info, the Table of Content

Charbel-Raphaël, 31 Dec 2023 13:57 UTC
23 points
1 comment, 11 min read

An Overview of AI risks—the Flyer

17 Jul 2023 12:03 UTC
20 points
0 comments, 1 min read
(docs.google.com)

AI Safety 101 - Chapter 5.2 - Unrestricted Adversarial Training

Charbel-Raphaël, 31 Oct 2023 14:34 UTC
17 points
0 comments, 19 min read

AI Safety 101 - Chapter 5.1 - Debate

Charbel-Raphaël, 31 Oct 2023 14:29 UTC
14 points
0 comments, 13 min read

Easy fixing Voting

Charbel-Raphaël, 2 Oct 2022 17:03 UTC
12 points
2 comments, 1 min read

[Question] How to impress students with recent advances in ML?

Charbel-Raphaël, 14 Jul 2022 0:03 UTC
12 points
2 comments, 1 min read

New Hackathon: Robustness to distribution changes and ambiguity

Charbel-Raphaël, 31 Jan 2023 12:50 UTC
11 points
3 comments, 1 min read

Open application to become an AI safety project mentor

Charbel-Raphaël, 29 Sep 2022 11:27 UTC
10 points
0 comments, 1 min read
(docs.google.com)