Hey, if we can get it to stop swearing, we can get it to not destroy the world, right?
TinkerBird
They also recorded this follow-up with Yudkowsky if anyone’s interested:
https://twitter.com/BanklessHQ/status/1627757551529119744
______________
>Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, two years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer.
The one hope we may be able to cling to is that this logic works in the other direction too—that AGI may be a lot closer than estimated, but so might alignment.
Pretty sobering, and it’s pretty clear that it’s beyond time we got serious about this. I might put together a post of my own calling for any creative ideas that ordinary people can implement to help the cause, but the most obvious thing is to raise awareness. I hope Yudkowsky gets the chance to do a lot more interviews like this.
Yudkowsky, if you ever see this, please don’t give up hope. Crazy breakthroughs do happen, and more people are getting into alignment as time goes on.
I said elsewhere earlier: “AGI has the power to destroy the entire human race, and if we believe there’s even a 1% chance that it will, then we have to treat it as an absolute certainty.”
And I’m pretty sure that no expert puts it below 1%.
Fighting For Our Lives—What Ordinary People Can Do
On the bright side… Nope, I’ve got nothing.
Design a series of puzzles and challenges as a learning tool for alignment beginners, that when solved, progressively reveal more advanced concepts and tools. The goal is for participants to stumble upon a lucky solution while trying to solve these puzzles in these novel frames.
Highly on board with this idea. I’m thinking about writing a post about Foldit, a game researchers created that reimagined protein folding as an online puzzle. The game had thousands of players and the project was wildly successful—not just once, but many times. Thousands of ordinary people who knew nothing about biology or biochemistry, sharing their creative ideas with each other, vastly outstripped the researchers and algorithms of the time.
If anything like this could be done with alignment where we could effectively get thousands of weak alignment researchers, I’d put a lot of hope on it.
I can’t say I put sufficient weight on us already being watched by extraterrestrials to make it worth mentioning. However, for the AGI:
We may at some point send out a signal to other entities in the universe explaining our existence and requesting that, should you destroy us, they take issue with that and treat you less well. Some of these entities may care about sentient life (perhaps because they were aligned to their original creators) and wish to respect this.
Some of us are willing to settle for a couple of mountains worth of material to be turned into computer substrate to run our minds on in our own simulated paradise, while you can have everything else.
I imagine it’s a sales tactic. Ask for $7 trillion, people assume you believe you’re worth that much, and if you’ve got such a high opinion of yourself, maybe you’re right…
In other news, I’m looking to sell a painting of mine for £2 million ;)
I’ve wanted something for AI alignment for ages like what the Foldit researchers created, where they turned protein folding into a puzzle game and the ordinary people online who played it wildly outperformed the researchers and algorithms purely by working together in vast numbers and combining their creative thinking.
I know it’s a lot to ask for with AI alignment, but still, if it’s possible, I’d put a lot of hope on it.
If you can name another immediate threat with a ≥1% chance of killing everyone, then yes, we should drop everything to focus on that too.
A pandemic that kills even just 50% of the population? <0.1%
An unseen meteor? <0.1%
Climate change? 0% chance that it could kill literally everyone
This looks fantastic. Hopefully it may lead to some great things as I’ve always found the idea of exploiting the collective intelligence of the masses to be a terribly underused resource, and this reminds me of the game Foldit (and hopefully in the future will remind me of the wild success that that game had in the field of protein folding).
What’s the consensus on David Shapiro and his heuristic imperatives design? He seems to consider it the best idea we’ve got for alignment and to be pretty optimistic about it, but I haven’t heard anyone else talking about it. Either I’m completely misunderstanding what he’s talking about, or he’s somehow found a way around all of the alignment problems.
Video of him explaining it here for reference, and thanks in advance:
But if the current paradigm is not the final form of existentially dangerous AI, such research may not be particularly valuable.
I think we should figure out how to train puppies before we try to train wolves. It might turn out that very few principles carry over, but if they do, we’ll wish we delayed.
The only drawback I see to delaying is that it might cause people to take the issue less seriously than if powerful AIs appear in their lives very suddenly.
That image so perfectly sums up how AIs are nothing like us, in that the characters they present do not necessarily reflect their true values, that it needs to go viral.
With the strawberries thing, the point isn’t that it couldn’t do those things, but that it won’t want to. After making itself smart enough to engineer nanotech, its developing ‘mind’ will have run off in unintended directions and it will have wildly different goals than what we wanted it to have.
Quoting EY from this video: “the whole thing I’m saying is that we do not know how to get goals into a system.” <-- This is the entire thing that researchers are trying to figure out how to do.
I’m with you on this. I think Yudkowsky was a lot better in this with his more serious tone, but even so, we need to look for better.
Popular scientific educators would be a place to start and I’ve thought about sending out a million emails to scientifically minded educators on YouTube, but even that doesn’t feel like the best solution to me.
The sort of people who are listened to are the more political types, so I think they are the people to reach out to. You might say they need to understand the science to talk about it, but I’d still put more weight on charisma than on scientific authority.
Anyone have any ideas on how to get people like this on board?
As a note for Yudkowsky, if he ever sees this and cares about the random gut feelings of strangers: after seeing this, I suspect an authoritative, stern, strong-leader tone of speaking will be much more effective than current approaches.
EDIT: missed a word
I, for one, am looking forward to the next public AI scares.
Same. I’m about to get into writing a lot of emails to a lot of influential public figures as part of a one-man letter-writing campaign, in the hopes that at least one of them takes notice and says something publicly about the problem of AI.
I’m put in mind of something Yudkowsky said on the Bankless podcast:
“Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, 2 years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer.”
He was speaking about how far away AGI could be, but I think the same logic applies to alignment. It looks hopeless right now, but events never play out exactly like you expect them to, and breakthroughs happen all the time.