Matthew Rahtz

Karma: 58

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah and Vlad Mikulik

20 Jul 2023 10:50 UTC

44 points

3 comments · 2 min read · LW link

(arxiv.org)

Specification gaming: the flip side of AI ingenuity

Vika, Vlad Mikulik, Matthew Rahtz, tom4everitt, Zac Kenton and janleike

6 May 2020 23:51 UTC

69 points

9 comments · 6 min read · LW link