Paul Colognese

Karma: 390

Explaining the AI Alignment Problem to Tibetan Buddhist Monks

Paul Colognese, 7 Mar 2024 9:00 UTC
20 points
3 comments, 6 min read

Anomalous Concept Detection for Detecting Hidden Cognition

Paul Colognese, 4 Mar 2024 16:52 UTC
24 points
3 comments, 10 min read

Hidden Cognition Detection Methods and Benchmarks

Paul Colognese, 26 Feb 2024 5:31 UTC
22 points
11 comments, 4 min read

Notes on Internal Objectives in Toy Models of Agents

Paul Colognese, 22 Feb 2024 8:02 UTC
16 points
0 comments, 8 min read

Internal Target Information for AI Oversight

Paul Colognese, 20 Oct 2023 14:53 UTC
15 points
0 comments, 5 min read

[Question] Potential alignment targets for a sovereign superintelligent AI

Paul Colognese, 3 Oct 2023 15:09 UTC
29 points
4 comments, 1 min read

High-level interpretability: detecting an AI’s objectives

28 Sep 2023 19:30 UTC
69 points
4 comments, 21 min read

[Linkpost] Frontier AI Taskforce: first progress report

Paul Colognese, 7 Sep 2023 19:06 UTC
21 points
0 comments, 4 min read
(www.gov.uk)

Aligned AI via monitoring objectives in AutoGPT-like systems

Paul Colognese, 24 May 2023 15:59 UTC
27 points
4 comments, 4 min read

Towards a solution to the alignment problem via objective detection and evaluation

Paul Colognese, 12 Apr 2023 15:39 UTC
9 points
7 comments, 12 min read

Decision Transformer Interpretability

6 Feb 2023 7:29 UTC
84 points
13 comments, 24 min read