This comment is responding just to the paragraph arguing that I'm wrong because of the context of "narrow domains". I feel like the arguments in that paragraph are very tenuous. This doesn't alter the rest of the point you're making, so I thought I would keep this in a separate comment.
Now, responding to the first part of that paragraph:
On another level, I think you’re wrong. Note the use of the word “narrow domains” in the sentence before the one you quote.
For context, the full original bullet from the text is:
Overestimating human cognitive ability, relative to what’s possible. Even in the absence of feedback loops, AI systems routinely blow humans out of the water in narrow domains. As soon as AI can do X at all (or very soon afterwards), AI vastly outstrips any human’s ability to do X. This is a common enough pattern in AI, at this point, to barely warrant mentioning. It would be incredibly strange if this pattern held for every skill AI is already good at, but suddenly broke for the skills AI can’t yet match top humans on, such as novel science and engineering work.
I interpreted the sentence about “narrow domains” as being separate from the sentence starting with “As soon as AI can do X at all”. In other words, I interpreted “X” to refer to the broad class of things where AIs have started being able to do that “thing”, rather than just things in narrow domains.
I think my interpretation is a very natural reading of this bullet. The natural reading goes like this: first, there is a point about AIs often being much better than humans in narrow domains (and this not requiring feedback loops like recursive self-improvement or AI R&D automation); second, there is a point that AIs very quickly or immediately become much better than humans as soon as they can do something at all, regardless of what that thing is (implicitly, as soon as they are within the human range in some reasonable sense); and third, there is a point that this will apply to skills/domains where AIs are currently worse than humans, like novel science and engineering work.
Here are some more specific reasons why I think my interpretation is natural:
I thought the reason the bullet starts by talking about “narrow domains” was to argue that AI being much better than humans is more common than you might think (the argument is that it is routine in narrow domains).
The text says "It would be incredibly strange if this pattern held for every skill AI is already good at". The use of "every skill" implies that this doesn't just apply to skills in narrow domains, but rather to everything that AI is already good at.
The text ends up talking about “novel science and engineering work”, so it’s implicitly trying to generalize to things which aren’t narrow domains. I assumed that the generalization happened before this.
I would be surprised if a substantial fraction of people reading this bullet assumed that X refers only to the category of narrow domains.
The rest of the paragraph says:
Playing Go is a reasonable choice of “narrow domain,” but detecting dogs is an even better one. Suppose that you want to detect dogs for a specific task where you need <10% accuracy, and skilled humans have ~5% accuracy, when trying (ie its comparable to ImageNet). If you need <10%, then AlexNet is not able to do that narrow task! It is not “at all.” Maybe GoogLeNet counts (in 2014) or maybe Microsoft’s ResNet (in 2015). At this point you have a computer system with comparable ability to a human that is skilled at the task and trying to do it. Is AI suddenly able to vastly outstrip human ability? Yes! The AI can identify images faster, more cheaply, and with no issues of motivation or fatigue.
This feels like a very gerrymandered sense of "As soon as AI can do X at all". Yes, I agree with the claim that "As soon as AI can fully automate some specific task X that humans would otherwise do (or very soon afterwards), AI will be much cheaper and faster at that task than humans (while removing annoyances associated with working with humans)". But this is a very unnatural way to interpret "As soon as AI can do X at all"! This seems especially true because I interpret the broader point as arguing for superhuman qualitative ability rather than superhuman cost and speed. (And in general, I interpret MIRI as often arguing for rapidly reaching qualitatively superhuman ability.)
Sorry to respond in such detail, but this paragraph felt tenuous to me, so I thought it would be useful to push back in detail for local validity reasons.