Sure, they are more-than-zero helpful. Heck, in a relative sense, they’d be one of the biggest wins in AI safety to date. But alas, reality does not grade on a curve.
One has to bear in mind that the words on that snapshot do not all accurately describe reality in the world where SB1047 passes. “Implement shutdown ability” would not in fact be operationalized in a way which would ensure the ability to shut down an actually-dangerous AI, because nobody knows how to do that. “Implement reasonable safeguards to prevent societal-scale catastrophes” would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all, because the rules for the board responsible for overseeing these things made that board pretty easy for the labs to capture.
When I discussed the bill with some others at the time, the main takeaway was that the actually-substantive part was just putting any bureaucracy in place at all to track which entities are training models over 10^26 FLOP/$100M. The bill seemed unlikely to do much of anything beyond that.
Even if the bill had been much more substantive, it would still run into the standard problems of AI regulation: we simply do not have a way to reliably tell which models are and are not dangerous, so the choice is to either ban a very large class of models altogether, or allow models which will predictably be dangerous sooner or later. The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most, but definitely not by a factor of 10 or more.
The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most
… or, if we do live in a world in which LLMs are not AGI-complete, it might accelerate timelines. After all, this would force the capabilities people to turn their brains on again instead of mindlessly scaling, and that might lead to them stumbling on something which is AGI-complete. And whatever they stumble on would, by the very design constraint imposed on it, need much less compute for committing omnicide.
How likely would that be? Companies/people able to pivot like this would need to be live players, capable of even conceiving of new ideas that aren’t “scale LLMs”. Naturally, that means 90% of the current AI industry would be out of the game. But then, 90% of the current AI industry aren’t really pushing the frontier today either; that wouldn’t be much of a loss.
To what extent are the three AGI labs alive vs. dead players, then?
OpenAI was certainly alive back in 2022. Maybe the coup and the exoduses killed it and it’s now a corpse whose apparent movement is just inertial (the reasoning models were invented prior to the coup, if the Q* rumors are to be trusted, so they’re little evidence that OpenAI was still alive in 2024). But maybe not.
Anthropic houses a bunch of the best ex-OpenAI researchers now, and it’s apparently capable of inventing some novel tricks (whatever the mystery behind Sonnet 3.5 and 3.6 is).
DeepMind is even now consistently outputting some interesting non-LLM research.
I think there’s a decent chance that they’re alive enough. Currently, they’re busy eating the best AI researchers and turning them into LLM researchers. If they stop focusing people’s attention on the potentially-doomed paradigm, if they’re forced to correct the mistake (on this model) that they’re making...
This has always been my worry about all the proposals to upper-bound FLOPs, complicated by my uncertainty regarding whether LLMs are or are not AGI-complete after all.
One major positive effect this might have is memetic. It might create the impression of an (artificially created) AI Winter, causing people to reflexively give up. In addition, not having an (apparent) in-paradigm roadmap to AGI would likely dissolve the race dynamics, both between AGI companies and between geopolitical entities. If you can’t produce straight-line graphs suggesting godhood by 2027, and are reduced to “well we probably need a transformer-sized insight here...”, it becomes much harder to generate hype and alarm that would be legible to investors and politicians.
But then, in worlds in which LLMs are not AGI-complete, how much actual progress to AGI is happening due to the race dynamic? Is it more or less progress than would be produced by a much-downsized field in the counterfactual in which LLM research is banned? How much downsizing would it actually cause, now that the ideas of AGI and the Singularity have gone mainstream-ish? Comparatively, how much downsizing would be caused by the chilling effect if the presumably doomed LLM paradigm is allowed to run its course of disappointing everyone by 2030 (when the AGI labs can scale no longer)?
On balance, upper-bounding FLOPs is probably still a positive thing to do. But I’m not really sure.
I disagree that the default outcome would have been a board “easy for the labs to capture” (indeed, among the most prominent and plausible criticisms of its structure was that it would overregulate in response to political pressure), and thus I disagree that it wouldn’t have changed deployment practices. I think the frontier companies were in a good position to evaluate this, and they decided to oppose the bill (and/or to support it only conditional on sweeping changes, including the removal of the Frontier Model Division).
Also, I’m confused when policy skeptics say things like “sure, it might slow down timelines by a factor of 2-3, big deal.” Having 2-3x as much time is indeed a big deal!
I’m glad we agree “they’d be one of the biggest wins in AI safety to date.”
“Implement shutdown ability” would not in fact be operationalized in a way which would ensure the ability to shutdown an actually-dangerous AI, because nobody knows how to do that
How so? It’s pretty straightforward if the model is still contained in the lab.
“Implement reasonable safeguards to prevent societal-scale catastrophes” would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all
I think ticking boxes is good. This is how we went to the Moon, and it’s much better to do this than to not do it. It’s not trivial to tick all the boxes. Look at the number of boxes you need to tick if you want to follow the Code of Practice of the AI Act or this paper from DeepMind.
we simply do not have a way to reliably tell which models are and are not dangerous
How so? I think capabilities evaluations are much simpler than alignment evals, and at the very least we can run those. You might say: “A model might sandbag.” Sure, but you can fine-tune it and see if the capabilities are recovered. If even with some fine-tuning the model is not able to do the tasks at all, then, modulo the problem of gradient hacking (which I think is very unlikely), we can be pretty sure that the model isn’t capable of such a feat. At the very least, following the same methodology as the one Anthropic followed in their latest system cards would be pretty good and very helpful.
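As a rough illustration of the elicitation check described in that comment (not anything specified in the bill, and not Anthropic’s actual harness), here is a minimal Python sketch. The eval harness (`run_eval`), the fine-tuning step (`finetune`), and the 0.5 pass-rate cutoff are all assumed placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvalResult:
    pass_rate: float  # fraction of dangerous-capability tasks the model completes


# Placeholders standing in for a real eval harness and a real fine-tuning
# pipeline; neither corresponds to an existing library API.
EvalFn = Callable[[object, Sequence[str]], EvalResult]
FinetuneFn = Callable[[object, Sequence[str]], object]


def capability_ruled_out(
    model: object,
    task_suite: Sequence[str],
    task_demos: Sequence[str],
    run_eval: EvalFn,
    finetune: FinetuneFn,
    danger_threshold: float = 0.5,  # illustrative cutoff, not from any regulation
) -> bool:
    """True if the model fails the task suite even after fine-tuning-based elicitation."""
    if run_eval(model, task_suite).pass_rate >= danger_threshold:
        return False  # capable without any elicitation effort

    # Fine-tune on task-relevant demonstrations to counter sandbagging or
    # weak prompting, then re-run the same capability evals.
    elicited = finetune(model, task_demos)
    return run_eval(elicited, task_suite).pass_rate < danger_threshold
```

The point is just the shape of the procedure: a “no dangerous capability” verdict only counts if it survives a reasonable elicitation effort (here, fine-tuning on task demonstrations), which addresses most sandbagging worries short of gradient hacking.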
Probably not going to have a discussion on the topic right now, but out of honest curiosity: did you read the bill?