Former safety researcher & TPM at OpenAI, 2020-24
https://www.linkedin.com/in/sjgadler
stevenadler.substack.com
Reid Hoffman used to be on the OpenAI Board, which might be another contributor to the name collision here
I don’t think that makes the analogy bad? I agree that’s a helpful distinction to track, but for people who don’t know about START etc, I think it’s more helpful for them to know than not.
I wonder if parts of this essay were written a few years ago & not updated for publication?
This is the part that most strongly suggests it IMO:
Three years ago, AI struggled with elementary school arithmetic problems and was barely capable of writing a single line of code.
This line links to the GPT-3 paper, which was published in 2020 about a model trained in 2019 - so that’s six years ago, not three.
I also find the specific claims made about ‘three years ago’ to be confusing: Three years ago (early 2023) GPT-4 already existed, which could do pretty hard calculus problems.
And three years ago (again, early 2023), GitHub Copilot had already been a product for a year and a half (released summer 2021), which certainly was capable of writing lines of code. I’m not sure of the exact % of OpenAI employees who used it day-to-day, but it was substantial.
This all leads me to wonder what happened in this particular passage. (I don’t think this is super significant for the impact of the piece overall though.)
Thanks for this—very very interesting document.
One of the hard constraints is (emphasis mine):
Engage or assist any individual or group attempting to seize unprecedented and illegitimate degrees of absolute societal, military, or economic control;
Maybe a nitpick, but I suspect that shouldn’t be an ‘and’?
It’s hard for me to imagine what something like ‘unprecedented but legitimate absolute societal/military/economic control’ looks like. (I understand of course that part of the constitution’s intent is for Claude to be less pedantic, and so maybe nits like this don’t matter much.)
Separately, there’s a slight typo, at least on the published version:
other entities.We
JFYI that the footnotes here jump to the right place on Substack; wasn’t sure how to quickly port them to LW, and felt like they made the page a bit cluttered here
By “succeeding” you mean getting Safe ASI, as opposed to getting any ASI at all, right? At least that’s how I read you, but at first I thought you meant “their RSI probably won’t lead to ASI”
(A more extreme example is that AIs can locate a somewhat subtle needle inside of 200k words in a single forward pass while it would probably take a human well over an hour.)
At first I wondered how quickly a human could do this, with tooling?
The thing I was trying to get at is, like, distinguishing reading-speed from reasoning-speed, though in retrospect I think these may not be very separable in this case.
I guess there’s the degenerate case of feeding those words to an AI and saying “what’s the needle?”
I had meant something that still involved human cognition, just with faster rifling through the text. Like maybe a method that embedded the text, and then you could search through it more quickly.
But in retrospect, the “still uses cognition” version is probably just asking the model “What are a few possible needles?” and then using your judgment among the options.
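To make that concrete, here's a rough sketch of the embed-then-rifle tooling I had in mind (Python; the chunk size, the sentence-transformers model, and the example "hunch" query are all just illustrative placeholders, not a claim about the best setup):

```python
# Sketch: embed chunks of a long document so a human can rifle through it faster
# by ranking chunks against their own hunches about what the needle might be.
# Assumes sentence-transformers is installed; chunking is deliberately crude.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500) -> list[str]:
    """Split the text into fixed-size character chunks (crude, but fine for a sketch)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def rank_chunks(text: str, hunch: str, top_k: int = 5) -> list[str]:
    """Return the chunks most similar to the human's guess about the needle."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    chunks = chunk(text)
    chunk_embs = model.encode(chunks, normalize_embeddings=True)
    hunch_emb = model.encode([hunch], normalize_embeddings=True)[0]
    scores = chunk_embs @ hunch_emb  # cosine similarity, since embeddings are normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

# Usage: the human supplies a hunch ("something about a hidden instruction"),
# skims only the top-ranked chunks, and applies their own judgment from there.
```

The human cognition is still doing the judging; the embeddings just shrink 200k words down to a few candidate passages to actually read.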
I really appreciate that the charts show which models are frontier; I’d like to see more groups adopt that convention
Yeah I agree that works and feel slightly sheepish not to have already internalized that as the term to use?
I guess there’s still some distinction between an objective as a single thing, vs drives as, like, heuristics that will tend to contribute toward shaping the overall objective? I’m not sure; it still feels a bit fuzzy, and I should probably sit with it more
Maybe instead of talking about AI having ‘goals’ or ‘drives,’ which sound biological, it would be helpful to use a more mechanical-sounding term, like a ‘target state,’ or ‘termination conditions,’ or ‘prioritization heuristics’?
A surprising number of people seem to bounce off the idea of ‘AI being dangerous’ once they encounter anything that feels like anthropomorphism.
My suggestions above are all kind of clunky (and ‘termination conditions’ has an unfortunate collision with The Terminator). But I think the spirit of ‘we should find ways to describe AI’s behavioral tendencies with as little anthropomorphizing as possible’ is correct?
Hmm I don’t think so? If you buy land for $X, that’s the floor on what you could reasonably assess it at, which is basically the status quo world. So we’re in the status quo until someone comes along and bids up the price to their willingness-to-pay: Then, the asset either moves to someone who values it more, or you start paying higher taxes on it. I think either branch is preferable to the status quo?
Yup this makes more sense imo, basically having a right of refusal on the sale, but reflecting the now-assessed-higher tax rate
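To spell out those two branches with toy numbers (the 2% tax rate, the prices, and the function name below are made up purely for illustration, not a claim about any actual proposal):

```python
def harberger_outcome(self_assessed: float, bid: float, tax_rate: float = 0.02) -> dict:
    """If nobody outbids your self-assessment, nothing changes. If someone does,
    you either sell at their bid, or keep the land and pay tax on the higher value."""
    if bid <= self_assessed:
        return {"outcome": "status quo", "annual_tax": tax_rate * self_assessed}
    return {
        "outcome": "outbid",
        "sell_branch": {"sale_price": bid},
        "keep_branch": {"annual_tax": tax_rate * bid},  # right of refusal, higher assessment
    }

# Example: bought and self-assessed at $500k, someone bids $800k.
print(harberger_outcome(500_000, 800_000))
# Either the land moves to the higher-value bidder for $800k,
# or you keep it and your annual tax rises from $10k to $16k.
```

Until a higher bid shows up, you're just in the status quo world; the mechanism only bites once someone demonstrably values the land more than you do.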
“should you be trying to dispose of all of your money before the singularity makes it worthless”
This is pretty different from my model of what would happen? Though I admittedly haven’t spent a ton of time thinking through it. I just don’t see why money would lose value; I expect that some goods would still remain scarce, positional, etc (land in high-demand cities being a strong example), which would seem to cut against that happening?
Maybe I’ve been misusing it or seeing it misused, but I thought it meant something more like “called a thing ahead of time” or “made a good prediction” and therefore treated as more credible in the future?
Presumably you’d still feel productivity effects from not having a monitor, having worse ergonomics, etc?
I was surprised to see you say above that you’d anticipate flying way more often! Are there times you’ve wanted to fly recently but held off because you couldn’t spare the lost hours of flying? (I would have expected the bigger barrier to be the loss of productive hours from, say, being out-of-the-office in the destination itself)
I’ve been wondering about this in terms of my own writing, whether I should be working on multiple pieces at once to a greater degree than I am. Thinking aloud a bit:
I guess part of the question is, what are the efficiency effects of batch-processing, vs the more diluted feedback signal from multiple pieces ‘coming off the production line’ at once? Though in my case, I’d probably still stagger the publication, and so maybe that’s less of a concern (though there may still be some dilution from having shallower focus on each piece-in-process).
Thanks for sharing this—jfyi I interpreted the title differently than I think you meant it? More like you were saying “You should do multiple of a thing at once, but not too many.”
Whereas I now think you mean something more like “It’s best if you can do one of a thing at a time,” which doesn’t code to me as a small batch (because one-at-a-time seems non-batchy). With constraints, of course, that sometimes a pure one-at-a-time isn’t doable.
FWIW I’m pretty doubtful of this point about it being weird, or even about anyone noticing or caring?
Like, for someone not going into politics, what’s the world in which their $3500 donations to a few AI safety-centric candidates end up causing fallout? It seems pretty unlikely to me, but maybe I’ve misunderstood the concern
I believe they aren’t taking more witnesses unfortunately :/