ThomasW
It is possible that AI would allow for the creation of brain-computer interfaces such that we can productively merge with AI systems. I don’t think this would apply in that case since that would be a true “augmentation.”
If that doesn’t happen, though, or before that happens, I think this is a real possibility. The disanalogy is that our brains wouldn’t add anything to sufficiently advanced AI systems, unlike books, which are useless without our brains to read them.
Today, many people are weaker physically than in previous times because we don’t need to do as much physical labor. I don’t see why the same couldn’t happen with our minds. Of course, many people go to the gym, and people will probably also continue to learn things to keep sharp. If that becomes a strong and widespread cultural norm, then we wouldn’t have this problem. But it doesn’t seem guaranteed that would happen.
Hi Jan, I appreciate your feedback.
I’ve been helping out with this, and I can say that the organizers are working as quickly as possible to verify and publish new signatures. New signatures have been published since the launch, and additional signatures will continue to be published as they are verified. A team of people is working on it right now and has been since launch.
The main obstacles to extremely swift publication are:
First, determining who meets our bar for name publication. We think the letter will have greater authority (and coordination value) if all names are above a certain bar, and so some effort needs to be put into determining whether signatories meet that bar.
Second, as you mention, verification. Prior to launch, CAIS built an email verification system that requires signatories to verify their work emails in order for their signatures to be valid. However, this has required some tweaks, such as making the emails more attention-grabbing and adding language to the form itself making clear that people should expect an email (before these tweaks, some people weren’t verifying their emails).
Lastly, even with verification, some submissions are still possibly fake (from email addresses that we aren’t sure are the real person) and need to be further assessed.
These are all obstacles that simply require time to address, and the team is working around the clock. In fact, I’m writing this comment on their behalf so that they can focus on the work they’re doing. We will publish all noteworthy signatures as quickly as we can, which should be within a matter of days (as I said above, some have already been published and this is ongoing). We do take your feedback that perhaps we should have hired more people so that verification could be swifter.
In response to your feedback, we have just added language to the form and email making clear that signatures won’t show up immediately so that we can verify them. This might seem very obvious, but when you are running something with as many moving parts as this entire process has had, it is easy to miss things.
Thank you again for your feedback.
I’ve been collecting examples of this kind of thing for a while now here: ai-improving-ai.safe.ai.
In addition to algorithmic and data improvements I’ll add there are also some examples of AI helping to design hardware (e.g. GPU architectures) and auxiliary software (e.g. for datacenter cooling).
At the time of this post, the FLI letter has been signed by 1 OpenAI research scientist, 7 DeepMind research scientists/engineers, and 0 Anthropic employees.
“1 OpenAI research scientist” felt weird to me on priors. 0 makes sense, if the company gave some guidance (e.g. legal) to not sign, or if the unanimous opinion was that it’s a bad idea to sign. 7 makes sense too—it’s about what I’d expect from DeepMind and shows that there’s a small contingent of people really worried about risk. Exactly 1 is really weird—there are definitely multiple risk conscious people at OpenAI, but exactly one of them decided to sign?
I see a “Yonas Kassa” listed as an OpenAI research scientist, but it’s very unclear who this person is. I don’t see any LinkedIn or Google Scholar profile of this name associated with OpenAI. Previously, I know many of the signatures were inaccurate, so I wonder if this one is, too?
Anyway, my guess is that actually zero OpenAI researchers have signed, and that both OpenAI and Anthropic employees have decided (as a collective? because of a top-down directive? for legal reasons? I have no idea) not to sign.
Later in the thread Jan asks, “is this interpretability complete?” which I think implies that his intuition is that this should be easier to figure out than other questions (perhaps because it seems so simple). But yeah, it’s kind of unclear why he is calling this out in particular.
I find myself surprised/confused at his apparent surprise/confusion.
Jan doesn’t indicate that he’s extremely surprised or confused? He just said he doesn’t know why this happens. There’s a difference between being unsurprised by something (e.g. by observing something similar before) and actually knowing why it happens. To give a trivial example, hunter-gatherers from 10,000 BC would not have been surprised if a lightning strike caused fire, but would be quite clueless (or incorrect) as to why or how this happens.
I think Quintin’s answer is a good possible hypothesis (though of course it leads to the further question of how LLMs learn language-neutral circuitry).
In addition to the post you linked, there is also an earlier post on this topic that I like.
I also co-wrote a post that looks at specific structural factors related to AI safety.
Thanks so much for writing this, quite useful to see your perspective!
First, I don’t think that you’ve added anything new to the conversation. Second, I don’t think what you have mentioned even provides a useful summary of the current state of the conversation: it is neither comprehensive, nor the strongest version of various arguments already made.
Fair enough!
I don’t think that’s a popular opinion here. And while I think some people might just have a cluster of “brain/thinky” words in their head when they don’t think about the meaning of things closely, I don’t think this is a popular opinion among people in general unless they’re really not thinking about it.
I’ve seen this in the public a very surprising amount. For example see the New York Times article linked. Agree it’s not remotely popular on LessWrong.
Citation needed.
Fair enough. I’m not very sympathetic to panpsychism, but it probably could have been worth mentioning. Though I am not really sure how much it would add for most readers.
Assuming we make an AI conscious, and that consciousness is actually something like what we mean by it more colloquially (human-like, not just panpsychistly), it isn’t clear that this makes it a moral concern.
That’s true; and it might be a moral concern without consciousness. But on many moral accounts, consciousness is highly relevant. I think probably most people would say it is relevant.
Meanwhile, I feel like there is a lot of lower hanging fruit in neuroscience that would also help solve this problem more easily later in addition to actually being useful now.
Curious what research you think would help here?
Same thing as above, and also the prevailing view here is that it is much more important that AI might kill us, and if we’re theoretically spending (social) capital to make these people care about things, the not-killing-us is astronomically more important.
I agree with this. But at the same time the public conversation keeps talking about consciousness. I wanted to address it for that reason, and really address it, rather than just brush it aside. I don’t really think it’s true that discussion of this detracts from x-risk; both point in the direction of being substantially more careful, for example.
They cannot choose not to because they don’t know what it is, so this is unactionable and useless advice.
Good point. I think I had meant to say that researchers should not try to do this. I will edit the post to say that.
I think my recommendations are probably not well targeted enough; I didn’t really specify to whom I was recommending them. I’ll try to avoid doing that in the future.
I agree with this. If we are able to design consciousness such that a system is fulfilled by serving humans, then it’s possible that would be morally alright. I don’t think there is a strong enough consensus that I’d feel comfortable locking it in, but to me it seems ok.
By default though, I think we won’t be designing consciousness intentionally, and it will just emerge, and I don’t think that’s too likely to lead to this sort of situation.
A related post: https://www.lesswrong.com/posts/xhD6SHAAE9ghKZ9HS/safetywashing
[Realized this is contained in a footnote, but leaving this comment here in case anyone missed it].
It looks like I got one or possibly two strong downvotes, but it doesn’t seem like from either of the commenters. If you downvoted this (or think you understand why it was downvoted), please let me know in the comments so I can improve!
Consciousness therefore only happens if it improves performance at the task we have assigned. And for some tasks, like interacting directly with humans, it might improve performance.
I don’t think this is necessarily true. Consciousness could be a side effect of other processes that do improve performance.
The way I’ve heard this put: a polar bear has thick hair so that it doesn’t get too cold, and this is good for its evolutionary fitness. The fact that the hair is extremely heavy is simply a side effect of this. Consciousness could possibly be similar.
I don’t think these are necessarily bad suggestions if there were a future series. But my sense is that John did this for the people in the audience, somebody asked him to record it so he did, and now he’s putting them online in case they’re useful to anyone. It’s very hard to make good production quality lectures, and it would have required more effort. But it sounds like John knew this and decided he would rather spend his time elsewhere, which is completely his choice to make. As written, these suggestions feel a bit pushy to me.
Sorry if I missed it earlier in the thread, but who is this “polymath”?
Yeah, I agree with all this.
I just do not think that the post is written for people who think “slowing down AI capabilities is robustly good.” If people thought that, then why do they need this post? Surely they don’t need somebody to tell them to think about it?
So it seems to me like the best audience for this post would be those (including those at some AI companies, or those involved in policy, which includes people reading this post) who currently think something else, for example that the robustly good thing is for their chosen group to be ahead so that they can execute whatever strategy they think they alone can do correctly.
The people I’ve met who don’t want to think about slowing down AI capabilities just don’t seem to think that slowing down AI progress would be robustly good, because that just wouldn’t be a consistent view! They often seem to have some view that nothing is robustly good, or maybe some other thing (“get more power”) is robustly good. Such people just won’t really be swayed by the robust priors thing, or maybe they’d be swayed in the other direction.
The claim being made is something like the following:

1) AGI is a dangerous technology.

2) It is robustly good to slow down dangerous technologies.

3) Some people might say that you should not actually do this because of [complicated unintelligible reason].

4) But you should just do the thing that is more robustly good.

I argue that many people (yes, you’re right, in ways that conflict with one another) believe the following:

1) X is a dangerous country.

2) It is robustly good to always be ahead of X in all technologies, including dangerous ones.

3) Some people might say that you should not actually do this because of [complicated unintelligible reason]. This doesn’t make very much sense.

4) But you should just do the thing that is more robustly good.

My point is that which argument is the obvious, robust one, and which one is the weird inside-view one, depends on your perspective. Therefore, it doesn’t seem like (4) is a very good generalized argument. For example, if I were one of these powerful people, I think it would be wrong for me to be convinced to “focus on the robustly good measures, not the weird inside view measures,” because it would lead me to do bad things like trying to advance AI capabilities. As a result, the argument seems suspect to me. It feels like it only works for this community, or for people who are already very concerned by AI x-risk.

In comparison, there are specific arguments like “AGI is dangerous” and “slowing down dangerous technologies is actually robustly good” (some of these were presented in this post) that I think are, ironically, much more robustly good, because they don’t reliably have negative effects when presented to people who hold beliefs I think are wrong.

Edit: I no longer endorse this comment. It claims too much, specifically that any reasoning procedure is suspect if it leads people who believe false premises to take bad actions.
I think what I was really trying to get at in my original comment was that that particular argument seems aimed at people who already think that it would be robustly good to slow down dangerous technologies. But the people who would most benefit from this post are those who do not already think this; for them it doesn’t help much and might actively hurt.
There are things that are robustly good in the world, and things that are good on highly specific inside-view models and terrible if those models are wrong. Slowing dangerous tech development seems like the former, whereas forwarding arms races for dangerous tech between world superpowers seems more like the latter.
It may seem the opposite to some people. For instance, my impression is that for many adjacent to the US government, “being ahead of China in every technology” would be widely considered robustly good, and nobody would question you at all if you said that was robustly good. Under this perspective the idea that AI could pose an existential risk is a “highly specific inside-view model” and it would be terrible if we acted on the model and it is wrong.
I don’t think your readers will mostly think this, but I actually think a lot of people would, which for me makes this particular argument seem entirely subjective and thus suspect.
I think this is all true, but also, since Yale CS is ranked poorly, the graduate students are not very strong for the most part. You certainly have less competition for them if you are a professor, but my impression is that few top graduate students want to go to Yale. In fact, my general impression is that the undergraduates are often stronger researchers than the graduate students (and then they go on to PhDs at higher ranked places than Yale).
Yale is working on strengthening its CS department and it certainly has a lot of money to do that. But there are a lot of reasons that I am not that optimistic. There is essentially no tech scene in New Haven, New Haven is not that great in general, the Yale CS building is extremely dingy (I think this has an actual effect on people), and it’s really hard to affect the status quo. However, I’m more optimistic that Yale will successfully forge a niche of interdisciplinary research, which is really a strength of the university.
We weren’t intending to use the contest to do any direct outreach to anyone (not sure how one would do direct outreach with one-liners in any case), and we didn’t use it for that. I think it was less useful than I would have hoped (nearly all submissions were not very good), but the ideas/anecdotes surfaced have been used in various places and as inspiration.
It is also interesting to note that the contest was very controversial on LW, essentially because it was too political/advocacy-flavored (though it wasn’t intended for “political” outreach per se). I think it’s fair that LW has had those kinds of norms for research, but they did have a chilling effect on people who wanted to do the kind of advocacy that many people on LW now deem useful/necessary.