I think this was a very good summary/distillation and a good critique of work on natural abstractions; I’m less sure it has been particularly useful or impactful.
I’m quite proud of our breakdown into key claims; I think it’s much clearer than any previous writing (and in particular makes it easier to notice which sub-claims are obviously true, which are daring, which are or aren’t supported by theorems, …). It also seems that John was mostly on board with it.
I still stand by our critiques. I think the gaps we point out are important and might not be obvious to readers at first. That said, I regret somewhat that we didn’t focus more on communicating an overall feeling about work on natural abstractions, and our core disagreements. I had some brief back-and-forth with John in the comments, where it seemed like we didn’t even disagree that much, but at the same time, I still think John’s writing about the agenda was wildly more optimistic than my views, and I don’t think we made that crisp enough.
My impression is that natural abstractions are discussed much less than they were when we wrote the post (and this is the main reason why I think the usefulness of our post has been limited). An important part of the reason I wanted to write this was that many junior AI safety researchers, or people getting into AI safety research, seemed excited about John’s research on natural abstractions, but I felt that some of them had a rosy picture of how much progress there’d been and how promising the direction was. So writing a summary of the current status combined with a critique made a lot of sense: both to let others form an accurate picture of the agenda’s progress and to make it easier for them to get started if they wanted to work on it. Since there’s (I think) less attention on natural abstractions now, it’s unsurprising that those goals are less important.
As for why there’s been less focus on natural abstractions, my guess is a combination of at least:
- John has been writing about it somewhat less than during his peak period of NAH writing.
- Other directions have gotten off the ground and have captured a lot of excitement (e.g. evals, control, and model organisms).
- John isn’t mentoring for MATS anymore, so junior researchers don’t get exposure to his ideas through that.
It’s also possible that many people became more pessimistic about the agenda without public fanfare, or maybe my impression of its relative popularity now vs. then is just off.
I still think very-high-effort distillations and critiques can be a very good use of time (and writing this one still seems reasonable ex ante, though I’d now focus more on nailing a few key points and less on being comprehensive).