TW123

Karma: 1,240

Risks from AI Overview: Summary

Dan H, Mantas Mazeika and TW123

18 Aug 2023 1:21 UTC

25 points

1 comment · 13 min read · LW link

(www.safe.ai)

TW123 15 Jul 2023 14:23 UTC
10 points
4
in reply to: Nicholas Kross’s comment on: Why was the AI Alignment community so unprepared for this moment?
We weren’t intending to use the contest to do any direct outreach to anyone (not sure how one would do direct outreach with one-liners in any case), and we didn’t use it for that. I think it was less useful than I had hoped (nearly all submissions were not very good), but the ideas and anecdotes it surfaced have been used in various places and as inspiration.

It is also interesting to note that the contest was very controversial on LW, essentially due to it being too political/advocacy flavored (though it wasn’t intended for “political” outreach per se). I think it’s fair for research that LW has had those kinds of norms, but it did have a chilling effect on people who wanted to do the kind of advocacy that many people on LW now deem useful/necessary.

Catastrophic Risks from AI #6: Discussion and FAQ

Dan H, Mantas Mazeika and TW123

27 Jun 2023 23:23 UTC

25 points

1 comment · 13 min read · LW link

(arxiv.org)

Catastrophic Risks from AI #5: Rogue AIs

Dan H, Mantas Mazeika and TW123

27 Jun 2023 22:06 UTC

15 points

0 comments · 22 min read · LW link

(arxiv.org)

Catastrophic Risks from AI #4: Organizational Risks

Dan H, Mantas Mazeika and TW123

26 Jun 2023 19:36 UTC

23 points

0 comments · 21 min read · LW link

(arxiv.org)

TW123 24 Jun 2023 23:50 UTC
3 points
0
in reply to: Logan Zoellner’s comment on: Catastrophic Risks from AI #3: AI Race
It is possible that AI would allow for the creation of brain-computer interfaces such that we can productively merge with AI systems. I don’t think this would apply in that case since that would be a true “augmentation.”
If that doesn’t happen, though, or before that happens, I think this is a real possibility. The disanalogy is that our brains wouldn’t add anything to sufficiently advanced AI systems, unlike books, which are useless without our brains to read them.
Today, many people are weaker physically than in previous times because we don’t need to do as much physical labor. I don’t see why the same couldn’t happen with our minds. Of course, many people go to the gym, and people will probably also continue to learn things to keep sharp. If that becomes a strong and widespread cultural norm, then we wouldn’t have this problem. But it doesn’t seem guaranteed that would happen.

Catastrophic Risks from AI #3: AI Race

Dan H, Mantas Mazeika and TW123

23 Jun 2023 19:21 UTC

18 points

9 comments · 29 min read · LW link

(arxiv.org)

Catastrophic Risks from AI #2: Malicious Use

Dan H, Mantas Mazeika and TW123

22 Jun 2023 17:10 UTC

38 points

1 comment · 17 min read · LW link

(arxiv.org)

Catastrophic Risks from AI #1: Introduction

Dan H, Mantas Mazeika and TW123

22 Jun 2023 17:09 UTC

40 points

1 comment · 5 min read · LW link

(arxiv.org)

TW123 1 Jun 2023 1:50 UTC
23 points
15
in reply to: Jan_Kulveit’s comment on: Statement on AI Extinction—Signed by AGI Labs, Top Academics, and Many Other Notable Figures
Hi Jan, I appreciate your feedback.
I’ve been helping out with this and I can say that the organizers are working as quickly as possible to verify and publish new signatures. New signatures have been published since the launch, and additional signatures will continue to be published as they are verified. A team of people is working on it right now and has been since launch.
The main obstacles to extremely swift publication are:
- First, determining who meets our bar for name publication. We think the letter will have greater authority (and coordination value) if all names are above a certain bar, and so some effort needs to be put into determining whether signatories meet that bar.
- Second, as you mention, verification. Prior to launch, CAIS built an email verification system that requires signatories to verify their work emails in order for their signatures to be valid. However, this has required some tweaks, such as making the emails more attention-grabbing and adding some language on the form itself that makes clear that people should expect an email (before these tweaks, some people weren’t verifying their emails).
- Lastly, even with verification, some submissions are still possibly fake (from email addresses that we aren’t sure are the real person) and need to be further assessed.
These are all obstacles that simply require time to address, and the team is working around the clock. In fact, I’m writing this comment on their behalf so that they can focus on the work they’re doing. We will publish all noteworthy signatures as quickly as we can, which should be within a matter of days (as I said above, some have already been published and this is ongoing). We do take your feedback that perhaps we should have hired more people so that verification could be swifter.
In response to your feedback, we have just added language in the form and email that makes clear signatures won’t show up immediately so that we can verify them. This might seem very obvious, but when you are running something with as many moving parts as this entire process has had, it is easy to miss things.
Thank you again for your feedback.

TW123 29 Apr 2023 19:22 UTC
11 points
4
on: Research agenda: Supervising AIs improving AIs
I’ve been collecting examples of this kind of thing for a while now here: ai-improving-ai.safe.ai.

In addition to algorithmic and data improvements, I’ll add that there are also some examples of AI helping to design hardware (e.g. GPU architectures) and auxiliary software (e.g. for datacenter cooling).

TW123 12 Apr 2023 1:10 UTC
14 points
4
on: Request to AGI organizations: Share your views on pausing AI progress
> At the time of this post, the FLI letter has been signed by 1 OpenAI research scientist, 7 DeepMind research scientists/engineers, and 0 Anthropic employees.
“1 OpenAI research scientist” felt weird to me on priors. 0 makes sense, if the company gave some guidance (e.g. legal) to not sign, or if the unanimous opinion was that it’s a bad idea to sign. 7 makes sense too: it’s about what I’d expect from DeepMind and shows that there’s a small contingent of people really worried about risk. Exactly 1 is really weird: there are definitely multiple risk-conscious people at OpenAI, but exactly one of them decided to sign?
I see a “Yonas Kassa” listed as an OpenAI research scientist, but it’s very unclear who this person is. I don’t see any LinkedIn or Google Scholar profile under this name associated with OpenAI. I know many of the earlier signatures were inaccurate, so I wonder if this one is, too?
Anyway, my guess is that actually zero OpenAI researchers signed, and that both OpenAI and Anthropic employees have decided (as a collective? because of a top-down directive? for legal reasons? I have no idea) not to sign.

[MLSN #9] Verifying large training runs, security risks from LLM access to APIs, why natural selection may favor AIs over humans

Dan H and TW123

11 Apr 2023 16:03 UTC

11 points

0 comments · 6 min read · LW link

(newsletter.mlsafety.org)

[MLSN #8] Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming

Dan H and TW123

20 Feb 2023 15:54 UTC

20 points

0 comments · 4 min read · LW link

(newsletter.mlsafety.org)

TW123 14 Feb 2023 17:35 UTC
4 points
1
in reply to: habryka’s comment on: Is InstructGPT Following Instructions in Other Languages Surprising?
Later in the thread Jan asks, “is this interpretability complete?” which I think implies that his intuition is that this should be easier to figure out than other questions (perhaps because it seems so simple). But yeah, it’s kind of unclear why he is calling this out in particular.

TW123 14 Feb 2023 4:07 UTC
14 points
4
on: Is InstructGPT Following Instructions in Other Languages Surprising?
> I find myself surprised/confused at his apparent surprise/confusion.
Jan doesn’t indicate that he’s extremely surprised or confused? He just said he doesn’t know why this happens. There’s a difference between being unsurprised by something (e.g. by observing something similar before) and actually knowing why it happens. To give a trivial example, hunter-gatherers from 10,000 BC would not have been surprised if a lightning strike caused a fire, but would have been quite clueless (or incorrect) as to why or how this happens.
I think Quintin’s answer is a good possible hypothesis (though of course it leads to the further question of how LLMs learn language-neutral circuitry).

TW123 30 Jan 2023 22:24 UTC
7 points
0
on: Why I hate the “accident vs. misuse” AI x-risk dichotomy (quick thoughts on “structural risk”)
In addition to the post you linked, there is also an earlier post on this topic that I like.
I also co-wrote a post that looks at specific structural factors related to AI safety.

TW123 14 Jan 2023 23:39 UTC
3 points
1
in reply to: kyleherndon’s comment on: What’s the deal with AI consciousness?
Thanks so much for writing this, quite useful to see your perspective!
> First, I don’t think that you’ve added anything new to the conversation. Second, I don’t think what you have mentioned even provides a useful summary of the current state of the conversation: it is neither comprehensive, nor the strongest version of various arguments already made.
Fair enough!
> I don’t think that’s a popular opinion here. And while I think some people might just have a cluster of “brain/thinky” words in their head when they don’t think about the meaning of things closely, I don’t think this is a popular opinion of people in general unless they’re really not thinking about it.
I’ve seen this among the public surprisingly often. For example, see the New York Times article linked. Agreed that it’s not remotely popular on LessWrong.
> Citation needed.
Fair enough. I’m not very sympathetic to panpsychism, but it probably could have been worth mentioning. Though I am not really sure how much it would add for most readers.
> Assuming we make an AI conscious, and that consciousness is actually something like what we mean by it more colloquially (human-like, not just panpsychistly), it isn’t clear that this makes it a moral concern.
That’s true; and it might be a moral concern without consciousness. But on many moral accounts, consciousness is highly relevant. I think probably most people would say it is relevant.
> Meanwhile, I feel like there is a lot of lower hanging fruit in neuroscience that would also help solve this problem more easily later in addition to actually being useful now.
Curious what research you think would do here?
> Same thing as above, and also the prevailing view here is that it is much more important that AI will kill us, and if we’re theoretically spending (social) capital to make these people care about things, the not killing us is astronomically more important.
I agree with this. But at the same time the public conversation keeps talking about consciousness. I wanted to address it for that reason, and really address it, rather than just brush it aside. I don’t really think it’s true that discussion of this detracts from x-risk; both point in the direction of being substantially more careful, for example.
> They cannot choose not to because they don’t know what it is, so this is unactionable and useless advice.
Good point. I think I had meant to say that researchers should not try to do this. I will edit the post to say that.
I think my recommendations are probably not well targeted enough; I didn’t really specify to whom I was recommending them. I’ll try to avoid doing that in the future.

TW123 14 Jan 2023 22:26 UTC
2 points
0
in reply to: Richard_Kennaway’s comment on: What’s the deal with AI consciousness?
I agree with this. If we are able to design consciousness such that a system is fulfilled by serving humans, then it’s possible that would be morally alright. I don’t think there is a strong enough consensus that I’d feel comfortable locking it in, but to me it seems ok.

By default though, I think we won’t be designing consciousness intentionally, and it will just emerge, and I don’t think that’s too likely to lead to this sort of situation.

TW123 13 Jan 2023 16:18 UTC
9 points
0
on: Beware safety-washing
A related post: https://www.lesswrong.com/posts/xhD6SHAAE9ghKZ9HS/safetywashing

[Realized this is contained in a footnote, but leaving this comment here in case anyone missed it].