Fabien Roger

Karma: 1,194

Benchmarks for Detecting Measurement Tampering [Redwood Research]

5 Sep 2023 16:44 UTC
78 points
6 comments · 20 min read · LW link
(arxiv.org)

When AI critique works even with misaligned models

Fabien Roger · 17 Aug 2023 0:12 UTC
23 points
0 comments · 2 min read · LW link

Password-locked models: a stress case for capabilities evaluation

Fabien Roger · 3 Aug 2023 14:53 UTC
118 points
10 comments · 6 min read · LW link

Simplified bio-anchors for upper bounds on AI timelines

Fabien Roger · 15 Jul 2023 18:15 UTC
20 points
4 comments · 5 min read · LW link

LLMs Sometimes Generate Purely Negatively-Reinforced Text

Fabien Roger · 16 Jun 2023 16:31 UTC
175 points
10 comments · 7 min read · LW link