paulfchristiano (Paul Christiano)

Karma: 26,999

Matrix completion prize results

paulfchristiano · 20 Dec 2023 15:40 UTC
40 points
0 comments · 2 min read · LW link
(www.alignment.org)

Thoughts on responsible scaling policies and regulation

paulfchristiano · 24 Oct 2023 22:21 UTC
215 points
33 comments · 6 min read · LW link

Thoughts on sharing information about language model capabilities

paulfchristiano · 31 Jul 2023 16:04 UTC
191 points
34 comments · 11 min read · LW link

Self-driving car bets

paulfchristiano · 29 Jul 2023 18:10 UTC
229 points
41 comments · 5 min read · LW link
(sideways-view.com)

ARC is hiring theoretical researchers

12 Jun 2023 18:50 UTC
126 points
12 comments · 4 min read · LW link
(www.alignment.org)

Prizes for matrix completion problems

paulfchristiano · 3 May 2023 23:30 UTC
163 points
51 comments · 1 min read · LW link
(www.alignment.org)

My views on “doom”

paulfchristiano · 27 Apr 2023 17:50 UTC
239 points
33 comments · 2 min read · LW link
(ai-alignment.com)

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

24 Feb 2023 23:03 UTC
60 points
7 comments · 47 min read · LW link

Thoughts on the impact of RLHF research

paulfchristiano · 25 Jan 2023 17:23 UTC
234 points
101 comments · 9 min read · LW link

Can we efficiently distinguish different mechanisms?

paulfchristiano · 27 Dec 2022 0:20 UTC
88 points
30 comments · 16 min read · LW link
(ai-alignment.com)

Three reasons to cooperate

paulfchristiano · 24 Dec 2022 17:40 UTC
82 points
14 comments · 10 min read · LW link
(sideways-view.com)

Can we efficiently explain model behaviors?

paulfchristiano · 16 Dec 2022 19:40 UTC
64 points
3 comments · 9 min read · LW link
(ai-alignment.com)

AI alignment is distinct from its near-term applications

paulfchristiano · 13 Dec 2022 7:10 UTC
254 points
21 comments · 2 min read · LW link
(ai-alignment.com)

Finding gliders in the game of life

paulfchristiano · 1 Dec 2022 20:40 UTC
100 points
7 comments · 16 min read · LW link
(ai-alignment.com)

Mechanistic anomaly detection and ELK

paulfchristiano · 25 Nov 2022 18:50 UTC
133 points
21 comments · 21 min read · LW link
(ai-alignment.com)

Decision theory and dynamic inconsistency

paulfchristiano · 3 Jul 2022 22:20 UTC
79 points
33 comments · 10 min read · LW link
(sideways-view.com)

AI-Written Critiques Help Humans Notice Flaws

paulfchristiano · 25 Jun 2022 17:22 UTC
137 points
5 comments · 3 min read · LW link
(openai.com)

Where I agree and disagree with Eliezer

paulfchristiano · 19 Jun 2022 19:15 UTC
870 points
219 comments · 18 min read · LW link · 2 reviews

What is causality to an evidential decision theorist?

paulfchristiano · 17 Apr 2022 16:00 UTC
45 points
26 comments · 5 min read · LW link
(sideways-view.com)

ELK prize results

9 Mar 2022 0:01 UTC
135 points
50 comments · 21 min read · LW link