Former safety researcher & TPM at OpenAI, 2020-24
https://www.linkedin.com/in/sjgadler
stevenadler.substack.com
What do you mean here by “does not mean anything”?
It seems clear to me that there’s some notion of off-the-record that journalists understand.
This might vary in the details, and I agree it’s probably not legally binding, but it does seem to mean something.
I appreciate the feedback. That’s interesting about the plane vs. car analogy—I tended to think about these analogies in terms of life/casualties, and for whatever reason, describing an internal test-flight didn’t rise to that level for me (and if it’s civilian passengers, that’s an external deployment). I also wanted to convey the idea not just that internal testing could cause external harm, but that you might irreparably breach containment. Anyway, appreciate the explanation, and I hope you enjoyed the post overall!
Scaffolding for sure matters, yup!
I think you’re generally correct that the most-capable version hasn’t been created, though there are times when AI companies do have specialized versions for a domain internally and don’t seem to be testing those either. It’s reasonable IMO to think that these might outperform the unspecialized versions.
Daniel said:
Thanks for doing this, I found the chart very helpful! I’m honestly a bit surprised and sad to see that task-specific fine-tuning is still not the norm. Back in 2022 when our team was getting the ball rolling on the whole dangerous capabilities testing / evals agenda, I was like “All of this will be worse than useless if they don’t eventually make fine-tuning an important part of the evals” and everyone was like “yep of course we’ll get there eventually, for now we will do the weaker elicitation techniques.” It is now almost three years later...
The post is now live on Substack, and link-posted to LW:
https://stevenadler.substack.com/p/ai-companies-should-be-safety-testing
I’ve only seen this excerpt, but it seems to me like Jack isn’t just arguing against regulation because it might slow progress; rather, he’s arguing something more like:
“there’s some optimal time to have a safety intervention, and if you do it too early because your timeline bet was wrong, you risk having worse practices at the actually critical time because of backlash”
This seems probably correct to me? I think ideally we’d be able to be cautious early and still win the arguments to be appropriately cautious later too. But empirically, I think it’s fair not to take that as a given?
You might find this post interesting and relevant if you haven’t seen it before: https://www.econlib.org/archives/2017/04/iq_with_conscie.html
I’d guess that was “I have a lecture series with her” :-)
I think they mean heuristics for who is ok to dehumanize / treat as “other” or harm
Strong endorse; I was discussing this with Daniel, and my read of various materials is that many labs are still not taking this as seriously as they ought to—working on a post about this, likely up next week!
Very useful post! Thanks for writing it.
is robust to ontological updates
^ I think this might be helped by an example of the sort of ontological update you’d expect might be pretty challenging; I’m not sure that I have the same things in mind as you here
(I imagine one broad example is “What if AI discovers some new law of physics that we’re unaware of”, but it isn’t super clear to me how that specifically collides with value-alignment-y things?)
I appreciate the question you’re asking, to be clear! I’m less familiar with Anthropic’s funding / Dario’s comments, but I don’t think the magnitudes of ask-vs-realizable-value are as far off for OpenAI as your comment suggests?
E.g., if you compare OpenAI’s most recent reported raise at a $157B valuation vs. what its maximum profit-cap likely was under the old (still current, afaik) structure.
The comparison gets a little confusing, because it’s been reported that this investment was contingent on for-profit conversion, which does away with the profit cap.
But I definitely don’t think OpenAI’s recent valuation and the prior profit-cap would be orders of magnitude apart.
(To be clear, I don’t know the specific cap value, but you can estimate it—for instance by analyzing MSFT’s initial funding amount, which is reported to have a 100x capped-profit return, and then adjusting for what % of the company you think MSFT got.)
(This also makes sense to me for a company in a very competitive industry, with high regulatory risk, and where companies are reported to still be burning lots and lots of cash.)
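To make that estimation method concrete, here’s a rough back-of-envelope sketch in Python. Only the 100x cap multiple and the $157B figure come from the comment above; the investment size and profit share below are placeholder guesses, so the output is purely illustrative rather than a claim about the actual numbers.

```python
# Back-of-envelope sketch of the profit-cap estimate described above.
# The 100x cap multiple and $157B valuation are taken from the comment;
# the investor amount and its share of capped profits are placeholder guesses.

def estimate_total_profit_cap(investor_amount_usd: float,
                              cap_multiple: float,
                              investor_profit_share: float) -> float:
    """Extrapolate a company-wide profit cap from one investor's capped return."""
    investor_capped_return = investor_amount_usd * cap_multiple
    # If this investor is entitled to `investor_profit_share` of capped profits,
    # scale its capped return up to an estimate of the total cap.
    return investor_capped_return / investor_profit_share


# Placeholder inputs: $1B invested, 100x cap, 20% share of capped profits.
total_cap = estimate_total_profit_cap(1e9, 100, 0.20)   # ~= $500B with these guesses
recent_valuation = 157e9                                 # $157B reported raise

print(f"Estimated total profit cap: ${total_cap / 1e9:.0f}B")
print(f"Cap vs. recent valuation: {total_cap / recent_valuation:.1f}x")
```

With those placeholder inputs, the implied cap lands within a single-digit multiple of the recent valuation, which is the point above: the two figures needn’t be orders of magnitude apart.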
If the companies need capital—and I believe that they do—what better option do they have?
I think you’re imagining cash-rich companies choosing to sell portions for dubious reasons, when they could just keep it all for themselves.
But in fact, the companies are burning cash, and to continue operating they need to raise at some valuation, or else not be able to afford the next big training run.
The valuations at which they are raising are, roughly, where supply and demand equilibrate for the amounts of cash that they need in order to continue operating. (Possibly they could raise at higher valuations by taking on less-scrupulous investors, but to date I believe some of the companies have tried to avoid this.)
Interesting material yeah—thanks for sharing! Having played a bunch of these, I think I’d extend this to “being correctly perceived is generally bad for you”—that is, it’s both bad to be a bad liar who’s known as bad, and bad to be a good liar who’s known as good (compared to this not being known). For instance, even if you’re a bad liar, it’s useful to you if other players have uncertainty about whether you’re actually a good liar who’s double-bluffing.
I do think the difference between games and real life may be less about one-time vs. repeated interactions, and more about the ability to choose one’s collaborators in general, vs. teammates generally being assigned in the games.
One interesting experience I’ve had, which maybe validates this: I played a lot of One Night Ultimate Werewolf with a mixed-skill group. Compared to other games, ONUW has relatively more ability to choose teammates—because some roles (like doppelgänger or paranormal investigator, or sometimes witch) essentially can choose to join the team of another player.
Suppose Tom was the best player. Over time, more and more players in our group would choose actions that made them more likely to join Tom’s team, which was basically a virtuous cycle for Tom: in a given game, he was relatively more likely to have a larger number of teammates—and # teammates is a strong factor in likelihood of winning.
But this dynamic would have applied equally in a one-time game, I think, provided people knew this about Tom and still had a means of joining his team.
I feel for the position you’re in—I wish I had more that’s useful to say. I also worry about future career prospects, what a world looks like where people can’t find work, etc. I think it’s really understandable to be feeling concerned.
If I were in your position, I’d try to separate out “Should I go into something like a trade?” from “And if so, should I leave college now?” If you think you’d enjoy a trade, that does strike me as a reasonable career to choose (which might or might not mean leaving college). At least in the US there’s pretty good money to be made as the small-business owner of a reliable trade service, or so is my impression. Note that this is of course different from being new to the trade, and it might reasonably take a while to transition over (not sure exactly how long), but many trades are undersupplied even at the worker level (again, in the US).
I think there’s a broader question to consider here, which is “what are your values/goals for life,” both professionally and personally. If your preferred lens is social impact, that might look different than if you’re, e.g., just trying to live a happy enough, stable enough life with the people you love. I don’t have great advice here, but I wonder if you’ve looked over resources like 80,000 Hours in terms of thinking about career choice?