I think it’s possible that the government officials believe what they’re saying, but don’t understand that we’ve been in the regime of ‘deployed models can be jailbroken cheaply and safeguards are brittle / an unsolved technical problem’ this entire time. Plus (press around) mythos-class capabilities invited heightened scrutiny.
The first-order common-sense thing regarding prosaic harms has been ‘stop releasing new models’ since ~Opus 4.
[not saying that’s my position; just that it was the point at which a prima facie common sense document was modified to accommodate increased capabilities that lacked concomitant (prosaic) safety guarantees, which seems like a pretty good place to draw the line of ‘if regular people understood what was happening, even without the x-risk piece, they would lose their minds’]
So now a larger number of people are catching up, or feel more accountable to a public that is catching up, so they’re inconsistently applying a basically reasonable (outside view) standard: if it’s dangerous for a model to be jailbroken, develop better protections, or don’t release the model!
..and then they read Anthropic saying similar things about OAI models as either adversarial or too embarrassing to admit, so they’re inconsistently applying the standard.
This hypothesis predicts the kind of regime Nikola is talking about. The coming wave of releases, and degree of state involvement, will be very telling!
Agreed. I know a Trump-fan who is confident that AI companies will implement a “top level command” that will protect against the risks that Eliezer warns about. He dismisses any opinions that look like they come from left-of-center sources as fake news. The White House likely employs some similar people. But I feel confused as to what role Bessent is playing here.
I think it’s possible that the government officials believe what they’re saying, but don’t understand that we’ve been in the regime of ‘deployed models can be jailbroken cheaply and safeguards are brittle / an unsolved technical problem’ this entire time. Plus (press around) mythos-class capabilities invited heightened scrutiny.
The first-order common-sense thing regarding prosaic harms has been ‘stop releasing new models’ since ~Opus 4.
[not saying that’s my position; just that it was the point at which a prima facie common sense document was modified to accommodate increased capabilities that lacked concomitant (prosaic) safety guarantees, which seems like a pretty good place to draw the line of ‘if regular people understood what was happening, even without the x-risk piece, they would lose their minds’]
So now a larger number of people are catching up, or feel more accountable to a public that is catching up, so they’re inconsistently applying a basically reasonable (outside view) standard: if it’s dangerous for a model to be jailbroken, develop better protections, or don’t release the model!
..and then they read Anthropic saying similar things about OAI models as either adversarial or too embarrassing to admit, so they’re inconsistently applying the standard.
This hypothesis predicts the kind of regime Nikola is talking about. The coming wave of releases, and degree of state involvement, will be very telling!
Agreed. I know a Trump-fan who is confident that AI companies will implement a “top level command” that will protect against the risks that Eliezer warns about. He dismisses any opinions that look like they come from left-of-center sources as fake news. The White House likely employs some similar people. But I feel confused as to what role Bessent is playing here.