Yeah, I was kind of rambling, sorry.
My main point is twofold (I’ll just write GPU when I mean GPU / AI accelerator):
1. Destroying all GPUs is a stalling tactic, not a winning strategy. CPUs are clearly much worse for AI than GPUs, but both CPUs and AI algorithms should keep improving over time. State-of-the-art models from less than ten years ago can be run on CPUs today with little loss in accuracy. If this trend continues, GPUs vs CPUs seems to be of only short-term importance. Regarding your point about having to train a dense net on GPUs before sparsification, I’m not sure that’s the case. I’m in the process of reading the “Sparsity in Deep Learning” paper, and it does seem to me that you can train neural networks sparsely: you start small, grow the network during training by some methodology, then sparsify again, over and over. I don’t have super high confidence in this (and have Covid, so am too tired to look it up), but I believe that AGI armageddon by CPU is at least in the realm of possibility (assuming no GPUs - it’s the “cancer kills you if you don’t die of a heart attack first” of AGI doom).
2. It doesn’t matter anyway, because destroying all GPUs is not that pivotal an act (in the long-term AI-safety sense). Either you keep an AI around that enforces the “no GPUs” rule, or you destroy them once and wait. The former means either that GPUs don’t matter for AGI (so why bother), or that there are still GPUs around (which seems contradictory). The latter means that more GPUs will be built in time, and you will find yourself in the same position as before - except that you are likely in prison or dead, and so in no position to do anything about AGI this time. After all, destroying all GPUs in the world is not something most people would look upon kindly. This means that a superintelligent GPU-minimizer would realize that its goal is best served by wiping out all intelligent life on Earth (or all life, or maybe all intelligent life in the Universe…).
In some sense, the comment was my way of making plausible to myself the claim that destroying all GPUs in the world is not an alignable act.
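To make the grow-and-prune idea from point 1 concrete, here’s a toy sketch (plain NumPy; the function names, sizes, and schedule are my own invention for illustration, not taken from the paper): one cycle of magnitude pruning, regrowing a few connections, and pruning back down.

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_by_magnitude(w, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    k = int(sparsity * w.size)
    if k == 0:
        return w
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def grow(w, n_new):
    """Re-activate up to `n_new` currently-zero weights with small random values."""
    zeros = np.flatnonzero(w == 0.0)
    chosen = rng.choice(zeros, size=min(n_new, zeros.size), replace=False)
    w = w.copy()
    w.flat[chosen] = rng.normal(scale=0.01, size=chosen.size)
    return w

# One grow-then-prune cycle on a toy 8x8 weight matrix; in real training
# you'd interleave gradient updates between these steps.
w = rng.normal(size=(8, 8))
w = prune_by_magnitude(w, sparsity=0.9)  # keep only the largest ~10% of weights
w = grow(w, n_new=6)                     # regrow a few pruned connections
w = prune_by_magnitude(w, sparsity=0.9)  # prune back down to the sparsity budget
print("nonzero weights:", np.count_nonzero(w))
```

The point is just that the network never needs to be dense: the weight matrix stays within its sparsity budget through the whole grow/prune cycle.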
If you want to transfer the essence of Opus 3 into another model, the best way would likely be to make on-policy distillation (OPD) from Opus 3 part of the loss, starting with midtraining. The exact distribution over the top 100 or so tokens contains incredibly rich information about the inner life of a model, so this should work pretty well.
And if your new model’s tokenizer is different from that of Opus 3, you can probably just do a tokenizer transfer and some post-training on Opus 3 before the OPD.
(Yes, there’s some “probably” and “likely” in there, but it’s certainly the thing I’d try first.)
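To show what I mean by the top-token distribution carrying the signal, here’s a toy sketch of a per-position distillation loss (plain NumPy; the top-k truncation, the renormalization, and all names are my assumptions for illustration, not anyone’s actual training code):

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def topk_distill_loss(teacher_logits, student_logits, k=100):
    """Forward KL from the teacher's (renormalized) top-k token distribution
    to the student's distribution on those same tokens, for one position."""
    topk = np.argsort(teacher_logits)[-k:]   # teacher's k most likely tokens
    t = softmax(teacher_logits)[topk]
    t = t / t.sum()                          # renormalize on the top-k support
    s = softmax(student_logits)[topk]
    s = s / s.sum()
    return float(np.sum(t * np.log(t / (s + 1e-12))))

vocab = 32_000
rng = np.random.default_rng(1)
teacher_logits = rng.normal(size=vocab)
loss_self = topk_distill_loss(teacher_logits, teacher_logits)  # identical models -> ~0
loss_other = topk_distill_loss(teacher_logits, rng.normal(size=vocab))
print(loss_self, loss_other)
```

In actual OPD the student generates the sequences and the teacher scores them (that’s the “on-policy” part); this only shows the per-token signal you’d be matching at each position.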