
AI Auditing

Tag. Last edit: 4 Aug 2025 22:59 UTC by Raemon

Formerly “auditing games”

Automating Auditing: An ambitious concrete technical research proposal

evhub, 11 Aug 2021 20:32 UTC
89 points
13 comments, 14 min read, LW link, 1 review

A transparency and interpretability tech tree

evhub, 16 Jun 2022 23:44 UTC
163 points
11 comments, 18 min read, LW link, 1 review

Auditing language models for hidden objectives

13 Mar 2025 19:18 UTC
141 points
15 comments, 13 min read, LW link

Auditing games for high-level interpretability

Paul Colognese, 1 Nov 2022 10:44 UTC
33 points
1 comment, 7 min read, LW link

[Question] What progress have we made on automated auditing?

LawrenceC, 6 Jul 2024 1:49 UTC
38 points
1 comment, 1 min read, LW link

Putting up Bumpers

Sam Bowman, 23 Apr 2025 16:05 UTC
54 points
14 comments, 2 min read, LW link

Towards Alignment Auditing as a Numbers-Go-Up Science

Sam Marks, 4 Aug 2025 22:30 UTC
121 points
15 comments, 6 min read, LW link

Hidden Cognition Detection Methods and Benchmarks

Paul Colognese, 26 Feb 2024 5:31 UTC
22 points
11 comments, 4 min read, LW link