I think Abram is saying the following:
Currently, AIs lack the capabilities that would meaningfully speed up AI Safety research.
At some point, they are going to get those capabilities.
However, by default, they are going to get those AI Safety-helpful capabilities at roughly the same time as other, dangerous capabilities (or at least, not meaningfully earlier).
In which case, we’re not going to have much time to use the AI Safety-helpful capabilities to speed up AI Safety research sufficiently for us to be ready for those dangerous capabilities.
Therefore, it makes sense to speed up the development of AIS-helpful capabilities now. Even if it means that the AIs will acquire dangerous capabilities sooner, it gives us more time to use the AIS-helpful capabilities to prepare for the dangerous ones.
Right, so one possibility is that you are doing something that is “speeding up the development of AIS-helpful capabilities” by 1 day, but you are also simultaneously speeding up “dangerous capabilities” by 1 day, because they are the same thing.
If that’s what you’re doing, then that’s bad. You shouldn’t do it. Like, if AI alignment researchers want AI that produces less slop and is more helpful for AIS, we could all just hibernate for six months and then get back to work. But obviously, that won’t help the situation.
And a second possibility is that there are ways to make AI more helpful for AI safety that are not simultaneously directly addressing the primary bottlenecks to AI danger. And we should do those things.
The second possibility is surely true to some extent—for example, the LessWrong JargonBot is marginally helpful for speeding up AI safety but infinitesimally likely to speed up AI danger.
I think this OP is kinda assuming that “anti-slop” is the second possibility and not the first possibility, without justification. Whereas I would guess the opposite.
Right, so one possibility is that you are doing something that is “speeding up the development of AIS-helpful capabilities” by 1 day, but you are also simultaneously speeding up “dangerous capabilities” by 1 day, because they are the same thing.
TBC, I was thinking about something like: “speed up the development of AIS-helpful capabilities by 3 days, at the cost of speeding up the development of dangerous capabilities by 1 day”.
I think it’s 1:1, because I think the primary bottleneck to dangerous ASI is the ability to develop coherent and correct understandings of arbitrary complex domains and systems (further details), which basically amounts to anti-slop.
If you think the primary bottleneck to dangerous ASI is not that, but rather something else, then what do you think it is? (or it’s fine if you don’t want to state it publicly)
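To make the numbers concrete, here is a minimal sketch of the arithmetic behind the two positions; the arrival times and speedups below are made-up assumptions purely for illustration, not anyone’s actual estimates.

```python
# Toy arithmetic for the trade-off discussed above. All numbers are made-up
# assumptions for illustration, not anyone's actual estimates.

def preparation_window(t_danger: float, t_help: float,
                       speedup_danger: float, speedup_help: float) -> float:
    """Days between getting AIS-helpful capabilities and dangerous ones,
    after an intervention that pulls each forward by the given number of days."""
    return (t_danger - speedup_danger) - (t_help - speedup_help)

# Hypothetical baseline: both capability types arrive ~1000 days from now,
# i.e. by default there is no preparation window at all.
print(preparation_window(1000, 1000, 0, 0))   # 0

# The "1:1" case: the AIS-helpful work and the dangerous work are the same
# thing, so the window stays at zero and everything just arrives a day sooner.
print(preparation_window(1000, 1000, 1, 1))   # 0

# The "3 days vs 1 day" case: each such intervention buys 2 extra days of
# preparation, at the cost of dangerous capabilities arriving 1 day earlier.
print(preparation_window(1000, 1000, 1, 3))   # 2
```

Under the 1:1 assumption the window never grows (everything just arrives sooner), which is why this ratio is the crux of the disagreement.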
So, rather than imagining a one-dimensional “capabilities” number, let’s imagine a landscape of things you might want to be able to get AIs to do, with a numerical score for each. In the center of the landscape are “easier” things, with “harder” things further out. There is some kind of growing blob of capabilities, spreading from the center of the landscape outward.
Techniques which are worse at extrapolating (i.e., worse at “coherent and correct understanding” of complex domains) create more of a sheer cliff in this landscape, where things go from basically-solved to not-solved-at-all over short distances in this space. Techniques which are better at extrapolating create more of a smooth drop-off instead. The latter is liable to grow the blob a lot faster; a shift to better extrapolation sees the cliffs cast “shadows” outwards.
My claim is that cliffs are dangerous for a different reason, namely that people often won’t realize when they’re falling off a cliff. The AI seems super-competent on the cases we can easily test, so humans extrapolate its competence beyond the cliff. This applies to the AI as well, if it lacks the capacity to detect its own blind spots. So RSI (recursive self-improvement) is particularly dangerous in this regime, compared to a regime with better extrapolation.
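As a rough way to visualize the cliff-vs-smooth point above, here is a toy sketch; the score functions, the “edge” threshold, and the difficulty values are all invented assumptions, not a model of any real system.

```python
# Toy illustration of the "cliff vs. smooth drop-off" picture above. The score
# functions, thresholds, and difficulty values are invented for illustration.
import math

def cliff_score(difficulty: float, edge: float = 5.0) -> float:
    """Poor extrapolation: near-perfect up to the edge, then near-zero."""
    return 0.95 if difficulty < edge else 0.02

def smooth_score(difficulty: float, scale: float = 5.0) -> float:
    """Better extrapolation: performance degrades gradually with difficulty."""
    return 0.95 * math.exp(-difficulty / scale)

testable = [1, 2, 3, 4]   # task difficulties we can easily evaluate today
beyond = [6, 8, 10]       # harder difficulties we can only extrapolate to

for name, score in [("cliff", cliff_score), ("smooth", smooth_score)]:
    observed = [round(score(d), 2) for d in testable]
    actual = [round(score(d), 2) for d in beyond]
    print(f"{name:6s} on testable tasks: {observed}  actual scores beyond them: {actual}")
```

On the testable difficulties the cliff profile looks uniformly strong, which is exactly the situation where humans (or a self-improving AI) would over-extrapolate its competence; the smooth profile’s decline is visible early.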
This is very analogous to early Eliezer observing the AI safety problem and deciding to teach rationality. Yes, if you can actually improve people’s rationality, they can use their enhanced capabilities for bad stuff too. Very plausibly the movement which Eliezer created has accelerated AI timelines overall. Yet, it feels plausible that without Eliezer, there would be almost no AI safety field.
I’m still curious about how you’d answer my question above. Right now, we don’t have ASI. Sometime in the future, we will. So there has to be some improvement to AI technology that will happen between now and then. My opinion is that this improvement will involve AI becoming (what you describe as) “better at extrapolating”.
If that’s true, then however we feel about getting AIs that are “better at extrapolating”—its costs and its benefits—it doesn’t much matter, because we’re bound to get those costs and benefits sooner or later on the road to ASI. So we might as well sit tight and find other useful things to do, until such time as the AI capabilities researchers figure it out.
…Furthermore, I don’t think the number of months or years between “AIs that are ‘better at extrapolating’” and ASI is appreciably larger if the “AIs that are ‘better at extrapolating’” arrive tomorrow, versus if they arrive in 20 years. In order to believe that, I think you would need to expect some second bottleneck standing between “AIs that are ‘better at extrapolating’” and ASI: one that is present today, but will not be present (as much) in 20 years, and that is not itself related to “extrapolation”.
I suppose that one could argue that availability of compute will be that second bottleneck. But I happen to disagree. IMO we already have an absurdly large amount of compute overhang with respect to ASI, and adding even more compute overhang in the coming decades won’t much change the overall picture. Certainly plenty of people would disagree with me here. …Although those same people would probably say that “just add more compute” is actually the only way to make AIs that are “better at extrapolation”, in which case my point would still stand.
I don’t see any other plausible candidates for the second bottleneck. Do you? Or do you disagree with some other part of that? Like, do you think it’s possible to get all the way to ASI without ever making AIs “better at extrapolating”? IMO it would hardly be worthy of the name “ASI” if it were “bad at extrapolating” :)
If you think the primary bottleneck to dangerous ASI is not that, but rather something else, then what do you think it is?
So far in this thread I was mostly talking from the perspective of my model(/steelman?) of Abram’s argument.
I think the primary bottleneck to dangerous ASI is the ability to develop coherent and correct understandings of arbitrary complex domains and systems
I mostly agree with this.
Still, this doesn’t[1] rule out the possibility of getting an AI that understands (is superintelligent in?) one complex domain, specifically here whatever is necessary to meaningfully speed up AIS research (and maybe a few more domains, since I don’t expect the space of possible domains to be that compartmentalizable), but is not superintelligent in the complex domains that would make it dangerous.
It doesn’t even have to be a superintelligent reasoner about minds. Babbling up clever and novel mathematical concepts for a human researcher to prune could be sufficient to meaningfully boost AI safety. (I don’t think we’re primarily bottlenecked on mathy stuff, but it might help some people, and I think that’s one thing Abram would like to see.)
[1] It doesn’t rule it out in itself, but perhaps you have some other assumptions that imply it’s 1:1, as you say.