Loosely construed, the foundings of DeepMind, OpenAI, and Anthropic were the result of x-risk capacity-building. If you count those (and it’s not clear that you should), then capacity-building has probably been strongly net negative to date.
My big concern is: to what extent do capacity-building efforts create pipelines into AI capabilities work, thereby shortening timelines? This happens to at least some extent, but I don’t know how big the effect is.
I would rather say that you should stop them in whatever way is most effective. If a peaceful compromise gets them to stop, do that. If forcing them to stop works better, do that. There is nothing morally wrong with trying to stop someone who’s trying to kill you, but that doesn’t mean it’s your best strategy.
The exact quote is
The safety benefits to the world of other labs adopting better alignment techniques should outweigh the risks to Anthropic’s commercial advantage. (Except insofar as Anthropic’s plan is to win the race to superintelligence and take over the world [...])
I interpreted this as saying that if Anthropic takes over the world, then the prior sentence is false, because in that case other labs’ safety wouldn’t matter. I didn’t interpret this as saying Anthropic definitely wants to take over the world.
It looks like the “All” section on my profile does not actually show all my comments. I was looking for a comment I wrote a while ago (specifically this one) and I couldn’t find it on my profile. I managed to find it by hovering over the GitHub-style dots one at a time, but those are not searchable the way a profile page is. I wish the “All” section would show the text of all my comments.
I think it’s orthogonal to utilitarianism? Any branch of ethics (consequentialism, deontology, virtue ethics) can assign value to preventing animal suffering and can, in principle, allow for cross-species comparisons. Not every version of every ethical theory allows for cross-species comparisons, but there are versions of deontology, virtue ethics, and non-utilitarian consequentialism that do.
The most obvious example is “intuitionist deontology”, which holds that preventing more suffering is generally better than preventing less suffering, even though preventing suffering is supererogatory.
So far, AI safety companies have a very bad track record of sticking to their guns on safety. They almost inevitably devolve into making things worse. That’s not to say it’s impossible for a good AI safety startup to exist, but I would be EXTREMELY wary of any startup in the space; it would need to clear a very high bar to demonstrate its integrity and explain why its “AI safety for profit” plan is going to work when so many before it have failed.
do these people spend most of their days doing neither deep work
My rough guess is that only 1–5% of jobs involve deep work. Something like 1⁄3 of jobs are manufacturing and 1⁄3 are service/retail, none of which involve deep work.
nor being in social situations that would be rude to suddenly step away from?
I have met many people who believe a phone call takes priority over all other forms of social interaction, for some reason.
(My preferences are the same as yours FWIW.)
Anthropic refused to help build fully autonomous weapons or conduct domestic surveillance.
Previously, a DOW representative said Claude is the best AI model.
Therefore, presumably, DOW would only entertain a new deal with OpenAI if DOW would be allowed to use ChatGPT for surveillance + autonomous weapons. If ChatGPT has the same restrictions as Claude, then there would be no reason for DOW to use ChatGPT.
The main thing that saddened me about this post isn’t Anthropic breaking and weakening its commitments—that was expected to happen. It’s that Holden seems to be adopting the same shirking-of-responsibility stance on government regulations that Dario has been taking for a while.
That sounds pretty much true to me for two reasons:
Elected officials will tend to campaign on positions that they actually believe, because being honest is easier than lying.
The liars tend to care more about holding office than about their personal vision of what policies the government should have. If they lie to get elected, they still have to vote for what their constituents want (on high-salience issues), or else they won’t get re-elected.
I sort of expect the actual alignment researchers at the actual labs to understand this, and not naively accept such responses at face value. But then why do they keep using prompts like this and naively accepting the results?
I don’t have any special knowledge about this situation, but there have been many times in my life where I saw an expert doing something that seemed obviously silly, and I thought, they’re an expert, surely they know what they’re doing and I’m just missing something. But then it turned out that no, they didn’t know what they were doing.
(The default outcome, where the expert turned out to be right and I was wrong, has also happened plenty of times.)
In the end, it works. 112 lawmakers supported our campaign in little more than a year. And it looks like things will only snowball from here.
As of a year ago, I thought DIP-style work was our best hope to avoid extinction, but only as a Hail Mary. ControlAI’s success in this regard is a big positive update that it’s not as much of a long shot as I’d thought. Still a long way to go but you’ve far exceeded my expectations so far. Keep up the good work.
I think there’s value in making the same argument in different ways, iterating and trying to find the best version of it. If someone does an especially good job arguing for a position that I’ve already seen argued before, I don’t mind upvoting it.
I’m not sure how exactly this fits into the discussion, but I feel it is worth mentioning that all plausible moral systems ascribe value to consequences. If you have two buttons where button A makes 100 people 10% happier, and button B makes 200 people 20% happier, and there are no other consequences, then any sane version of deontology/virtue ethics says it’s better to push button B.
So e.g. if your virtue ethics AI predictably causes bad consequences, then you can be a staunch virtue ethicist and still believe that this AI is bad.
I don’t use tax filing software myself, so take this with a grain of salt, but I’d think the best answer is to use tax filing software (FreeTaxUSA, TurboTax, etc.). The income tax rules are simple, but they come with a very, very long list of exceptions, most of which don’t apply to you. If a book or website tried to list out the most useful exceptions, most of them would still be useless to you personally. Tax filing software uses a decision tree to figure out which exceptions are relevant.
Tax filing software is annoying to use so I wish I had a better answer, but AFAIK there isn’t one.
(If I don’t use tax filing software, how did I learn which exceptions I need to know about? I dunno, I just kinda learned them over time. But I also have a lot of blind spots.)
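To make the decision-tree idea concrete, here’s a minimal sketch in Python. The questions and rules are invented for illustration; real products like FreeTaxUSA or TurboTax encode thousands of rules, and I’m not claiming this is how they’re actually implemented.

```python
# Hypothetical sketch of the interview-style decision tree described above.
# The rules and answer fields are made up for illustration only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str                        # human-readable name of the exception
    applies: Callable[[dict], bool]  # predicate over the user's answers

# A handful of invented rules keyed off simple yes/no or numeric answers.
RULES = [
    Rule("Student loan interest deduction", lambda a: a["paid_student_loan_interest"]),
    Rule("Self-employment tax forms", lambda a: a["self_employed"]),
    Rule("Childcare credit", lambda a: a["num_dependents"] > 0),
]

def relevant_exceptions(answers: dict) -> list[str]:
    """Return only the exceptions that the user's answers make relevant."""
    return [rule.name for rule in RULES if rule.applies(answers)]

if __name__ == "__main__":
    answers = {
        "paid_student_loan_interest": True,
        "self_employed": False,
        "num_dependents": 0,
    }
    # Only the student loan rule surfaces; everything else is filtered out.
    print(relevant_exceptions(answers))
```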
The old-school AI safety folks are way more into weird bullet-biting than the animal welfare people, and I can’t think of a single one who would think that conscious and sentient beings should be tortured or who would fail to engage seriously with the question of whether or not nonhuman animals are conscious or sentient beings.
One example immediately comes to mind: Eliezer Yudkowsky, the epitome of an old-school AI safety person, is highly confident that animals have no moral patienthood. (Which isn’t the same as the claim you made, but it’s strongly related.)
The AI safety community really is what you get when you care about sentient beings and then on top of that think ASI and the far future are a big deal.
I don’t think this is true either? Maybe people are doing the right thing in private, but in public I hardly ever see AI safety people grapple with what AI alignment means for non-human welfare; and when animal welfare concerns are brought up, I routinely see AI safety people dismiss the concerns out of hand, a la “some humans care about animals, therefore an aligned AI will also care about animals.” I think that argument is probably correct, but the stakes of being wrong are extremely high so I am not satisfied with that sort of surface-level argument; and I think most AI safety people put an unjustified level of confidence in shallow arguments like that.
ETA: I should say that I do absolutely see some AI safety folks who give appropriate care to animal welfare concerns, it’s just not typical IME.
In traditional AI safety, we think about aligning AIs, but it might be more tractable to simply increase the odds that AIs take animal welfare seriously,[7] for example by ensuring their specs/constitutions include it, creating benchmarks to incentivize model developers to train for it, or providing model developers with relevant data to train on.[8]
I agree that all those things are good ideas, and are plausible candidates for the most cost-effective thing in the world to do at current margins.
the constitution (which also seems largely AI-written)
What makes you think that?
The first three books took about a year each; books four and five each took 5–6 years. Fitting those numbers to a power law curve implies that the sixth book will take 25 years. Only 10 more years to go!
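For the curious, here’s a minimal sketch of the kind of fit I mean (least squares in log-log space on per-book times). The inputs are the round numbers above rather than actual publication dates, and the projection is quite sensitive to those choices and to whether you fit per-book or cumulative time, so don’t expect it to reproduce the 25-year figure exactly.

```python
# Rough sketch of fitting a power law t_n ~ a * n^b to per-book writing times.
# The times below are the approximate figures from the comment, in years per book.
import numpy as np

book_numbers = np.array([1, 2, 3, 4, 5])
years_per_book = np.array([1, 1, 1, 5, 6])  # rough: ~1 year each, then 5 and 6

# Fit a line in log-log space: log(t) = log(a) + b * log(n)
b, log_a = np.polyfit(np.log(book_numbers), np.log(years_per_book), 1)
a = np.exp(log_a)

projected_book_6 = a * 6 ** b
print(f"fit: t_n ~ {a:.2f} * n^{b:.2f}; projected years for book six: {projected_book_6:.1f}")
```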
I think I care less than you do about this question. My reasoning is: if there’s high uncertainty about the difficulty of alignment, then we should behave as if alignment is hard; there is high uncertainty; therefore, we should behave as if alignment is hard.
There’s an asymmetric payoff to being wrong. If you assume alignment is hard when it’s easy, then you unnecessarily delayed the singularity, which has some opportunity cost (and runs some risk that we go extinct from something unrelated to AI in the meantime). If you assume alignment is easy when it’s hard, then everyone dies. The downside to being wrong in the latter case is far worse. Therefore, we should behave as if alignment is hard (or at least skew our behavior in that direction).
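As a toy illustration of the asymmetry (every number here is a made-up placeholder, not an estimate):

```python
# Toy expected-value comparison of the two policies described above.
# All probabilities and utilities are arbitrary placeholders.
p_hard = 0.3       # assumed probability that alignment is actually hard

U_FINE = 0         # alignment handled appropriately
U_DELAY = -1       # singularity delayed unnecessarily (opportunity cost, other risks)
U_EXTINCT = -1000  # everyone dies

def expected_utility(act_as_if_hard: bool) -> float:
    if act_as_if_hard:
        # If alignment is actually easy, we only pay the delay cost; if hard, we were prepared.
        return p_hard * U_FINE + (1 - p_hard) * U_DELAY
    else:
        # If alignment is actually hard and we assumed it was easy, the outcome is extinction.
        return p_hard * U_EXTINCT + (1 - p_hard) * U_FINE

print("act as if hard:", expected_utility(True))   # -0.7
print("act as if easy:", expected_utility(False))  # -300.0
```

Even with a modest probability that alignment is hard, the “act as if easy” policy loses badly in expectation, because one of its branches is catastrophic.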