TBC, I’m definitely NOT thinking of this as an argument for funding AI safety.
I’m definitely interested in hearing other ways of splitting it up! This is one of the points of making this post. I’m also interested in what you think of the ways I’ve done the breakdown! Since you proposed an alternative, I guess you might have some thoughts on why it could be better :)
I see your points as being directed more at increasing ML researchers’ respect for AI x-risk work and their likelihood of doing relevant work. Maybe that should in fact be the goal; it seems to be a more common one.
I would describe my goal (with this post, at least, and probably with most conversations I have with ML people about x-risk) as something more like: “get them to understand the AI safety mindset, and where I’m coming from; get them to really think about the problem and engage with it”. I expect a lot of people here would reason, in a very narrow and myopic consequentialist way, that this is not as good a goal, but I’m unconvinced.
TBC, it’s an unconference, so it wasn’t really a talk (although I did end up talking a lot :P).
How sure are you that the people who showed up were objecting out of deeply-held disagreements, and not out of a sense that objections are good?
Seems like a false dichotomy. I’d say people were mostly disagreeing out of not-very-deeply-held-at-all disagreements :)
Another important improvement I should make: rephrase these to have the type signature of “heuristic”!
No, my goal is to:
1. Identify a small set of beliefs to focus discussions around.
2. Figure out how to make the case for these beliefs quickly, clearly, persuasively, and honestly.
And yes, I did mean >1%, but I just put that number there to give people a sense of what I mean, since “non-trivial” can mean very different things to different people.
Oh sure, in some special cases. I don’t think this experience was particularly representative.
Yeah, I’ve had conversations with people who shot down a long list of concerned experts, e.g.:
- Stuart Russell is GOFAI ==> out-of-touch
- Shane Legg doesn’t do DL, does he even do research? ==> out-of-touch
- Ilya Sutskever (and everyone at OpenAI) is crazy, they think AGI is 5 years away ==> out-of-touch
- Anyone at DeepMind is just marketing their B.S. “AGI” story or drank the koolaid ==> out-of-touch
But then, even the big 5 of deep learning have all said things that can be used to support the case…
So it kind of seems like there should be a compendium of quotes somewhere, or something.
A few questions and comments:
Why the arrow from “agentive AI” to “humans are economically outcompeted”? The explanation makes it sound like it should point to “target loading fails”??
Suggestion: make the blue boxes without parents more apparent? e.g. a different shade of blue? Or all sitting above the other ones? (e.g. “broad basin of corrigibility” could be moved up and left).
I pushed this post out since I think it’s good to link to it in this other post. But there are at least 2 improvements I’d like to make and would appreciate help with:
Is there a better reference for “a number of experts have voiced concerns about AI x-risk”? I feel like there should be by now...
I just realized it would be nice to include examples where these heuristics lead to good judgments.
The link to user preferences is broken. Is this feature still built in? Or does the Firefox thing still work?
Can you give a concrete example for why the utility function should change?
I couldn’t say without knowing more what “human safety” means here.
But here’s what I imagine an example pivotal command looking like: “Give me the ability to shut down unsafe AI projects for the foreseeable future. Do this while minimizing disruption to the current world order / status quo. Interpret all of this in the way I intend.”
OK, I think that makes some sense.
I don’t know how I’d fill out the row, since I don’t understand what is covered by the phrase “human safety”, what assumptions are being made about the proliferation of the technology, or, more specifically, the characteristics of the humans who do possess the tech.
I think I was imagining that the pivotal tool AI is developed by highly competent and safety-conscious humans who use it to perform a pivotal act (or series of pivotal acts) that effectively precludes the kind of issues mentioned in Wei’s quote there.
Linda organized it as two 2-day unconferences held back-to-back.
Can you explain how that is different from a 4-day unconference, more concretely?
I think the workshop would be a valuable use of three days for anyone actively working in AI safety, even if they consider themselves “senior” in the field: it offered a valuable space for reconsidering basic assumptions and rediscovering the reasons why we’re doing what we’re doing.
This read to me as a remarkably strong claim; I assumed you meant something slightly weaker. But then I realized you said “valuable” which might mean “not considering opportunity cost”. Can you clarify that?
And if you do mean “considering opportunity cost”, I think it would be worth giving your ~strongest argument(s) for it!
For context: I am a PhD candidate in ML working on safety, and I am interested in such events, but unsure whether they would be a valuable use of my time; OTTMH, I would expect most of the value to be in helping others rather than benefiting my own understanding/research/career/ability-to-contribute. (I realize this sounds a bit conceited, and I didn’t try to avoid that except via this caveat; and I really do mean (just) OTTMH… I think the reality is more that I’m mostly estimating value based on heuristics.) If I had been in the UK when they happened, I would probably have attended at least one.
But I think I am a bit unusual in my level of enthusiasm. And FWICT, such initiatives are not receiving many resources (including money and involvement of senior safety researchers) and potentially should receive A LOT more (e.g. 1-2 orders of magnitude). So the case for them being valuable (in general, or for more senior/experienced researchers) is an important one!
Does an “AI safety success story” encapsulate just a certain trajectory in AI (safety) development?
Or does it also include a story about how AI is deployed (and by who, etc.)?
I like this post a lot, but I think it ends up being a bit unclear, because I don’t think everyone has the same use cases in mind for the different technologies underlying these scenarios, and/or agrees on how safety research contributes to success in these different scenarios… Maybe fleshing out the success stories, or referencing some more in-depth elaborations of them, would make this clearer?