ThomasW
(Speaking for myself here)
That sentence is mainly based on Dan’s experience in the ML community over the years. I think surveys do not always convey how people actually feel about a research area (or about the researchers working in that area). There is also certainly a difference between the question posed by AI Impacts above and general opinions of safety/value alignment. “Does this argument point at an important problem?” is quite a different question from “should we be working right now on averting existential risk from AI?” If you look at the next question in the survey, 60%+ rated Russell’s problem as a low present concern.
As you note, it’s also true that the survey was from 2016. Dan started doing ML research around then, so his experience is more recent. But given the reasons above, I don’t think the survey is good evidence for speculating about what would happen if it were repeated.
Would be interested to see this! We’re planning to run a second competition later for more longform entries and would welcome your submission.
I have DMed you.
Right now we aren’t going to consider new submissions. However, you’d be welcome to submit to our second competition for longer form arguments (details are TBD).
That’s true! Cryptography is amenable to rigorous guarantees and proofs. I gave the example of RSA because even then, the assumption is essentially “we don’t know how to efficiently factor large numbers, and lots of people have tried, so we will assume it can’t be done efficiently.”
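To make that concrete, here’s a toy sketch of mine (with textbook-sized numbers, nothing like real parameters) showing why the whole scheme rests on that assumption: anyone who can factor the public modulus can immediately derive the private key.

```python
# Toy illustration: RSA with tiny numbers. Factoring n recovers the private
# key, so RSA is only as strong as the assumption that factoring a large n
# is infeasible. (Python 3.8+ for the three-argument modular inverse.)
from math import gcd

p, q = 61, 53            # the secret primes; factoring n reveals these
n = p * q                # public modulus (3233)
e = 17                   # public exponent
phi = (p - 1) * (q - 1)  # only computable if you know p and q
assert gcd(e, phi) == 1

d = pow(e, -1, phi)      # private exponent: trivial once n is factored

message = 42
ciphertext = pow(message, e, n)          # encrypt with the public key
assert pow(ciphertext, d, n) == message  # decrypt with the derived key
```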
There are two broader points though:
Information assurance is about far more than encryption. Encryption involves proofs about algorithms, but it doesn’t guarantee information assurance (for instance, how do you keep private keys private?)
We argue in our second post that theory is limited for deep learning (in a way it is less so in cryptography). Of course, cryptography may itself be relevant to ML safety in the sense that it can be used to secure model weights, etc.
Hi Evan! This post is focused on empirical ML approaches, as is PAIS overall. Certainly there is theoretical work that touches on many of the areas, but covering it all isn’t within the scope of this post (it’s already pretty long).
The forthcoming paper mentioned (from Burns, Ye, Klein, and Steinhardt) could be viewed as an empirical attempt to do something similar to ELK.
We wrote a bit about this in this post. In short, I don’t think it’s a necessary assumption. In my view, it does seem like we need objectives that can (and will) be updated over time, and for this we probably need some kind of counteracting reward system, since an individual agent is going to be incentivized to avoid a modification to its goal. The design of this kind of adaptive counteracting reward system (like what we humans have) is certainly a very difficult problem, but probably not any harder than aligning a superintelligence with a fixed goal.
Stuart Russell also makes an uncertain, changeable objective one of the centerpieces of his agenda (that being said, not many people seem to actually be working on it).
I also think it works better than “alignment,” especially among people who may not already care about x-risk. I’ve found “alignment” can be very slippery: it can sometimes be improved without any corresponding clear reduction in x-risk.
[I work for Dan Hendrycks but he hasn’t reviewed this.]
It seems to me like your comment roughly boils down to “people will exploit safety questionnaires.” I agree with that. However, I think they are much more likely to exploit social influence, blog posts, and vagueness than specific questionnaires. The biggest strengths of the x-risk sheet, in my view, are:
(1) It requires a specific explanation of how the paper is relevant to x-risk, which cannot be tuned depending on the audience one is talking to. You give the example from the forecasting paper and suggest it’s unconvincing. The counterfactual is that the forecasting paper is released, the authors are telling people and funders that it’s relevant to safety, and there isn’t even anything explicitly written for you to find unconvincing and argue against. The sheets can help resolve this problem (though in this case, you haven’t really said why you find it unconvincing). Part of the reason I was motivated to write Pragmatic AI Safety (which covers many of these topics) was so that the ideas in it are staked out clearly. That way people can have something clear to criticize, and it also forces their criticisms to be more specific.
(2) There is a clear trend of saying that papers that are mostly about capabilities are about safety. This sheet forces authors to directly address this in their paper, and either admit that they are doing capabilities work or attempt to construct a contorted, falsifiable argument otherwise.
(3) The standardized form allows people to challenge specific points made in the x-risk sheet, rather than cherrypicked things the authors feel like mentioning in conversation or blog posts.
Your picture of faculty simply looking at the boxes being checked and approving is, I hope, not actually how funders in the AI safety space are operating (if they are, then yes, no x-risk sheet can save them). I would hope that reviewers and evaluators of papers will directly address the evidence for each piece of the x-risk sheet and challenge incorrect assertions.
I’d be a bit worried if x-risk sheets were included in every conference, but if you instead just make them a requirement for “all papers that want AI safety money” or “all papers that claim to be about AI safety” I’m not that worried that the sheets themselves would make any researchers talk about safety if they were not already talking about it.
“we should be very sceptical of interventions whose first-order effects aren’t promising.”
This seems reasonable, but I think this suspicion is currently applied too liberally. In general, it seems like second-order effects are often very large. For instance, some AI safety research is currently funded by a billionaire whose path to impact on AI safety was to start a cryptocurrency exchange. I’ve written about the general distaste for diffuse effects and how that might be damaging here; if you disagree I’d love to hear your response.
In general, I don’t think it makes sense to compare policy to plastic straws, because I think you can easily crunch the numbers and find that plastic straws are a very small part of the problem. I don’t think it’s even remotely the case that “policy is a very small part of the problem” since in a more cooperative world I think we would be far more prudent with respect to AI development (that’s not to say policy is tractable, just that it seems very much more difficult to dismiss than straws).
We’re still working on judging right now, but I want to assure you that we looked at neither the name of the submitter nor the number of upvotes when judging the prizes. Of course, some of the submissions are quotes from well known people like Stuart Russell and Stephen Hawking, and we do take that into account, but we didn’t include the names of individual submitters in judging any of the prizes. (Using a quote from Stephen Hawking can add some ethos to the outside world, but using a quote from “a high-karma LessWrong user” doesn’t.)
Of course, that doesn’t mean the results won’t correlate with forum karma; maybe people with more forum karma are better at writing. But the assertion “it’s not what’s said, it’s who’s saying it” is not true in the context of who will be awarded prizes.
Dan Hendrycks (note: my employer) and Mantas Mazeika have recently covered some similar ideas in their paper X-Risk Analysis for AI Research, which is aimed at the broader ML research community. Specifically, they argue for the importance of having minimal capabilities externalities while doing safety research, and they address some common counterarguments to this view, including ones similar to those you’ve described. The reasons they give are somewhat different from your serial/parallel characterization, so I think the paper serves as a good complement. The relevant part is here.
Examples of AI Increasing AI Progress
Over time, AI will contribute more and more to the process of innovation in AI, until it contributes 60% of the improvements, then 90%, then 98%, then 99.5%, and then finally all of the development happens through AI, and humans are left out of the process entirely.
Yes, this is exactly what I had in mind!
Thanks! Here is a shorter url: rsi.thomaswoodside.com.
I intended it as somewhat of an outreach tool, though probably one aimed at people who already care about AI risk, since I wouldn’t want it to give somebody a reason to start working on this kind of thing because they see it’s possible.
Mostly, I did it for epistemics/forecasting: I think it will be useful for the community to know how this particular kind of work is progressing, and since the work spans disparate research areas, I don’t think it’s being tracked by the research community by default.
No need to delete the tweet. I agree the examples are not info hazards; they’re all publicly known. I just probably wouldn’t want somebody going to good ML researchers who currently are doing something that isn’t really capabilities work (e.g., applying ML to some other area) and telling them “look at this, AGI soon.”
I made this link, which combines the latest arXiv listings for AI, ML, Computation and Language, Computer Vision, and Computers and Society into a single view. Since some papers are listed under multiple areas, I prefer this view so I don’t skim over the same paper twice. If you bookmark it, it’s just one click per day!
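For anyone who’d rather script the same idea themselves, here’s a minimal hypothetical sketch (not how the link above actually works), assuming arXiv’s per-category RSS feeds at rss.arxiv.org and the third-party feedparser library:

```python
# Pull the latest arXiv RSS feed for each category and print each paper
# once, skipping entries cross-listed in multiple categories.
import feedparser  # pip install feedparser

CATEGORIES = ["cs.AI", "cs.LG", "cs.CL", "cs.CV", "cs.CY"]

seen = set()
for cat in CATEGORIES:
    feed = feedparser.parse(f"https://rss.arxiv.org/rss/{cat}")
    for entry in feed.entries:
        key = entry.get("id", entry.link)  # dedupe on the feed's identifier
        if key in seen:
            continue
        seen.add(key)
        print(f"{entry.title}\n  {entry.link}")
```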
I’m not sure what you mean by “using bullet-pointed summaries of the 7 works stated in the post”. If you mean the past examples of good materials, I’m not sure how good of an idea that is. We don’t just want submissions to be rephrasings/“distillations” of single pieces of prior work.
I’m also not sure we literally tell you how to win, but yes, reading the instructions would be useful.
These are already the top ~10%; the vast majority of the submissions aren’t included. We didn’t feel we really had enough data to accurately rank within these top 80 or so, though some are certainly better than others. Also, it really depends on the point you’re trying to make and the audience; I don’t think an objective ordering really exists.
We did do categorization at one point, but many points fall into multiple categories, and there are so many individual points that we didn’t find the categorization very useful.
Yes, we’re working on making this list right now!