David Matolcsi comments on Notes on handling non-concentrated failures with AI control: high level methods and different regimes

David Matolcsi 31 Mar 2025 9:51 UTC
6 points
This post was a very dense read, and it was hard for me to digest what the main conclusions were supposed to be. Could you write some concrete scenarios that you think are central examples of schemers causing non-concentrated failures? While reading the post, I never knew what situation to imagine: An AI is doing philosophical alignment research but intentionally producing promising-looking crackpotry? It is building cyber-sec infrastructure but leaving in a lot of vulnerabilities? Advising the President, but having a bias towards advocating for integrating AI into the military?

I think these problems are all pretty different in what approaches are promising in preventing them, so it would be useful to see what you think the most likely non-concentrated failures are, so we can read the post with that in mind.
As another point, you should really write conclusion sections. There are a lot of different points made in the post, and it’s hard to see which ones you think are most important to get across to the reader. A conclusion section would help a lot with that.
In general, I think that among all the people I know, you might be the one who has the biggest difference in how good you are at explaining concepts in person, and how bad you are at communicating them in blog posts. (Strangely, your LW comments are also very good and digestible, more similar to your in person communication than to your long-form posts, I don’t know why.) I think it could be high leverage for you to experiment some with making your posts more readable. Using more concrete examples and writing conclusion sections would go a long way in improving your posts in general, but I felt compelled to comment here because this post was especially hard to read without them.
- ryan_greenblatt 1 Apr 2025 15:58 UTC
  13 points
  - I probably should have used a running example in this post—this just seems like a mostly unforced error.
  - I considered writing a conclusion, but decided not to because I wanted to spend the time on other things and I wasn’t sure what I would say that was useful and not just a pure restatement of things from earlier. This post is mostly a high level framework + list of considerations, so it doesn’t really have a small number of core points.
  - This post is a relatively low effort post as indicated by “Notes on”, possibly I should have flagged this more.
  - I think comments / in person are easier to understand than my blog posts as I often try to write blog posts that have lots and lots of content which is all grouped together, but without a specific thesis. I typically have either 1 point or a small number of points in comments / in person. Also, it’s easier to write in response to something as there is an assumed level of context already etc.
  - David Matolcsi 1 Apr 2025 17:52 UTC
    1 point
    Thanks for the reply. If you have time, I’m still interested in hearing what would be a realistic central example of non-concentrated failure that’s good to imagine while reading the post.