ThomasW
(Speaking for myself here)
That sentence is mainly based on Dan’s experience in the ML community over the years. I think surveys do not always convey how people actually feel about a research area (or about the researchers working in that area). There is also certainly a difference between the question posed by AI Impacts above and general opinions of safety/value alignment. “Does this argument point at an important problem?” is quite a different question from “should we be working right now on averting existential risk from AI?” If you look at the next question in the survey, 60%+ rated Russell’s problem as a low present concern.
As you note, it’s also true that the survey was from 2016. Dan started doing ML research around then, so his experience is more recent. But given the reasons above, I don’t think the survey is good evidence for speculating about what would happen if it were repeated.
Would be interested to see this! We’re planning to run a second competition later for more longform entries and would welcome your submission.
I have DMed you.
Right now we aren’t going to consider new submissions. However, you’d be welcome to submit to our second competition for longer form arguments (details are TBD).
That’s true! Cryptography is amenable to rigorous guarantees and proofs. I gave the example of RSA because even then, the assumption is essentially “we don’t know how to efficiently factor large numbers, and lots of people have tried, so we will assume it can’t be done efficiently.”
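To make that concrete, here’s a toy sketch of mine (with textbook-sized numbers, nothing like real parameters) showing why the whole scheme rests on that assumption: anyone who can factor the public modulus can immediately derive the private key.

```python
# Toy illustration: RSA with tiny numbers. Factoring n recovers the private
# key, so RSA is only as strong as the assumption that factoring a large n
# is infeasible. (Python 3.8+ for the three-argument modular inverse.)
from math import gcd

p, q = 61, 53            # the secret primes; factoring n reveals these
n = p * q                # public modulus (3233)
e = 17                   # public exponent
phi = (p - 1) * (q - 1)  # only computable if you know p and q
assert gcd(e, phi) == 1

d = pow(e, -1, phi)      # private exponent: trivial once n is factored

message = 42
ciphertext = pow(message, e, n)          # encrypt with the public key
assert pow(ciphertext, d, n) == message  # decrypt with the derived key
```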
There are two broader points though:
Information assurance is about far more than encryption. Encryption involves proofs about algorithms, but it doesn’t guarantee information assurance (for instance, how do you keep private keys private?)
We argue in our second post that theory is limited for deep learning (in a way it is less so in cryptography). Of course, cryptography may itself be relevant to ML safety in the sense that it can be used to secure model weights, etc.
Hi Evan! This post is focused on empirical ML approaches, as is PAIS overall. Certainly there is theoretical work that touches on many of the areas, but covering it all isn’t within the scope of this post (it’s already pretty long).
The forthcoming paper mentioned (from Burns, Ye, Klein, and Steinhardt) could be viewed as an empirical attempt to do something similar to ELK.
We wrote a bit about this in this post. In short, I don’t think it’s a necessary assumption. In my view, it does seem like we need objectives that can (and will) be updated over time, and for this we probably need some kind of counteracting reward system, since an individual agent is going to be incentivized to avoid a modification to its goal. The design of this kind of adaptive counteracting reward system (like what we humans have) is certainly a very difficult problem, but probably not any harder than aligning a superintelligence with a fixed goal.
Stuart Russell also makes an uncertain, changeable objective one of the centerpieces of his agenda (that being said, not many people seem to actually be working on it).
I also think it works better than “alignment,” especially among people who may not already care about x-risk. I’ve found “alignment” can be very slippery: it can sometimes be improved without any corresponding clear reduction in x-risk.
[I work for Dan Hendrycks but he hasn’t reviewed this.]
It seems to me like your comment roughly boils down to “people will exploit safety questionnaires.” I agree with that. However, I think they are much more likely to exploit social influence, blog posts, and vagueness than specific questionnaires. The biggest strengths of the x-risk sheet, in my view, are:
(1) It requires a specific explanation of how the paper is relevant to x-risk, which cannot be tuned depending on the audience one is talking to. You give the example from the forecasting paper and suggest it’s unconvincing. The counterfactual is that the forecasting paper is released, the authors are telling people and funders that it’s relevant to safety, and there isn’t even anything explicitly written for you to find unconvincing and argue against. The sheets can help resolve this problem (though in this case, you haven’t really said why you find it unconvincing). Part of the reason I was motivated to write Pragmatic AI Safety (which covers many of these topics) was so that the ideas in it are staked out clearly. That way people can have something clear to criticize, and it also forces their criticisms to be more specific.
(2) There is a clear trend of saying that papers that are mostly about capabilities are about safety. This sheet forces authors to directly address this in their paper, and either admit that they are doing capabilities work or attempt to construct a contorted, falsifiable argument otherwise.
(3) The standardized form allows people to challenge specific points made in the x-risk sheet, rather than cherrypicked things the authors feel like mentioning in conversation or blog posts.
Your picture of faculty simply looking at the boxes being checked and approving is, I hope, not actually how funders in the AI safety space are operating (if they are, then yes, no x-risk sheet can save them). I would hope that reviewers and evaluators of papers will directly address the evidence for each piece of the x-risk sheet and challenge incorrect assertions.
I’d be a bit worried if x-risk sheets were included in every conference, but if you instead just make them a requirement for “all papers that want AI safety money” or “all papers that claim to be about AI safety” I’m not that worried that the sheets themselves would make any researchers talk about safety if they were not already talking about it.
“we should be very sceptical of interventions whose first-order effects aren’t promising.”
This seems reasonable, but I think this suspicion is currently applied too liberally. In general, it seems like second-order effects are often very large. For instance, some AI safety research is currently funded by a billionaire whose path to impact on AI safety was to start a cryptocurrency exchange. I’ve written about the general distaste for diffuse effects and how that might be damaging here; if you disagree I’d love to hear your response.
In general, I don’t think it makes sense to compare policy to plastic straws, because I think you can easily crunch the numbers and find that plastic straws are a very small part of the problem. I don’t think it’s even remotely the case that “policy is a very small part of the problem” since in a more cooperative world I think we would be far more prudent with respect to AI development (that’s not to say policy is tractable, just that it seems very much more difficult to dismiss than straws).
We’re still working on judging right now, but I want to assure you that we looked at neither the name of the submitter nor the number of upvotes when judging the prizes. Of course, some of the submissions are quotes from well known people like Stuart Russell and Stephen Hawking, and we do take that into account, but we didn’t include the names of individual submitters in judging any of the prizes. (Using a quote from Stephen Hawking can add some ethos to the outside world, but using a quote from “a high-karma LessWrong user” doesn’t.)
Of course, that doesn’t mean the results won’t correlate with forum karma; maybe people with more forum karma are better at writing. But the assertion “it’s not what’s said, it’s who’s saying it” is not true in the context of who will be awarded prizes.
Dan Hendrycks (note: my employer) and Mantas Mazeika have recently covered some similar ideas in their paper X-Risk Analysis for AI Research, which is aimed at the broader ML research community. Specifically, they argue for the importance of having minimal capabilities externalities while doing safety research, and they address some common counterarguments to this view, including ones similar to those you’ve described. The reasons they give are somewhat different from your serial/parallel characterization, so I think the paper serves as a good complement. The relevant part is here.
Examples of AI Increasing AI Progress
Over time, AI will contribute more and more to the process of innovation in AI, until it contributes 60% of the improvements, then 90%, then 98%, then 99.5%, and then finally all of the development happens through AI, and humans are left out of the process entirely.
Yes, this is exactly what I had in mind!
Thanks! Here is a shorter url: rsi.thomaswoodside.com.
I intended it as somewhat of an outreach tool, though probably one aimed at people who already care about AI risk, since I wouldn’t want it to give somebody a reason to start working on this kind of thing because they see it’s possible.
Mostly, I did it for epistemics/forecasting: I think it will be useful for the community to know how this particular kind of work is progressing, and since the work spans disparate research areas, I don’t think it’s being tracked by the research community by default.
No need to delete the tweet. I agree the examples are not info hazards; they’re all publicly known. I just probably wouldn’t want somebody going to good ML researchers who currently are doing something that isn’t really capabilities work (e.g., applying ML to some other area) and telling them “look at this, AGI soon.”
I made this link, which combines the latest arXiv listings for AI, ML, Computation and Language, Computer Vision, and Computers and Society into a single view. Since some papers are listed under multiple areas, I prefer this view so I don’t skim over the same paper twice. If you bookmark it, it’s just one click per day!
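For anyone who’d rather script the same idea themselves, here’s a minimal hypothetical sketch (not how the link above actually works), assuming arXiv’s per-category RSS feeds at rss.arxiv.org and the third-party feedparser library:

```python
# Pull the latest arXiv RSS feed for each category and print each paper
# once, skipping entries cross-listed in multiple categories.
import feedparser  # pip install feedparser

CATEGORIES = ["cs.AI", "cs.LG", "cs.CL", "cs.CV", "cs.CY"]

seen = set()
for cat in CATEGORIES:
    feed = feedparser.parse(f"https://rss.arxiv.org/rss/{cat}")
    for entry in feed.entries:
        key = entry.get("id", entry.link)  # dedupe on the feed's identifier
        if key in seen:
            continue
        seen.add(key)
        print(f"{entry.title}\n  {entry.link}")
```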
I’m not sure what you mean by “using bullet-pointed summaries of the 7 works stated in the post”. If you mean the past examples of good materials, I’m not sure how good of an idea that is. We don’t just want submissions to be rephrasings/“distillations” of single pieces of prior work.
I’m also not sure we literally tell you how to win, but yes, reading the instructions would be useful.
These are already the top ~10%; the vast majority of the submissions aren’t included. We didn’t feel we really had enough data to accurately rank within these top 80 or so, though some are certainly better than others. Also, it really depends on the point you’re trying to make and the audience; I don’t think an objective ordering really exists.
We did do categorization at one point, but many points fall into multiple categories, and there are so many individual points that we didn’t find the categorization very useful.
Yes, we’re working on making this list right now!