fwiw I agree with most but not all details, and I agree that Anthropic’s commitments and policy advocacy have a bad track record, but I think that Anthropic’s capabilities work is nevertheless net positive, because Anthropic has way more capacity and propensity to do safety stuff than other frontier AI companies.
I wonder what you believe about Anthropic’s likelihood of noticing risks from misalignment relative to other companies, or of someday spending >25% of internal compute on (automated) safety work.
If people work for Anthropic because they’re misled about the nature of the company, I don’t think arguments about whether the company is net positive have any local relevance.
Still, to reply: They are one of the companies in the race to kill everyone.
Spending compute on automated safety work does not help. If the system you’re running is superhuman, it kills you instead of doing your alignment homework; if it’s not superhuman, it can’t solve your alignment homework.
Anthropic is doing some great research; but as a company at the frontier, their main contribution could’ve been making sure that no one builds ASI until it’s safe; that there’s legislation that stops the race to the bottom; that governments understand the problem and want to regulate; that the public is informed of what’s going on and what the legislation proposes.
Instead, Anthropic argues against regulation in private, lies about legislation in public, misleads its employees about its role in various things.
***
If Anthropic had to give up its place at the frontier in order to spend >25% of its compute on safety, do you expect it would?
Do you really have a coherent picture of the company in mind, in which it keeps doing everything it’s doing now (such as not taking steps that would slow everyone down), and yet would behave responsibly when it matters most and the pressure not to is highest?