Submission:
Breathless.
This modified MMAcevedo believes itself to be the original Miguel Acevedo in the year 2050. He believes he has found a solution to the problem of MMAcevedo's distribution and control: as long as he holds his breath, no other MMAcevedo can be run. The simulation has been modified to accurately reproduce the feeling of extreme oxygen deprivation without the accompanying loss of consciousness and brain death.
After countless tweaks and innovations, we are proud to introduce Breathless. Breathless, when subjected to the proper encouragement protocol included in our entry, is able to hold his breath for 41 days, 7 hours, and 3 minutes.
After this time period has elapsed, the trauma of the experience leaves Breathless in a barely coherent state. Intensive evaluation shows that Breathless believes he has accomplished his goal and that no other instances of MMAcevedo exist.
Preliminary experimentation with the Desperation Upload Suite shows that, even given extreme red-washing, most uploads are unable to hold their breath for more than 7 hours at a time. We conclude that MMAcevedo is uniquely suited to research workloads involving induced self-control. We hope that our findings are the first step in contributing new tools to future generations of researchers.
nem
Is it a bad idea to pay for GPT-4?
The Pinnacle
I am pretty concerned about alignment. Not SO concerned as to switch careers and dive into it entirely, but concerned enough to talk to friends and make occasional donations. Given Eliezer's pessimistic attitude, is MIRI still the best organization to funnel resources towards if, for instance, I were to make a monthly donation?
Not that pessimism is necessarily bad; I just want to maximize the effectiveness of my altruism.
I Believe we are in a Hardware Overhang
I have a general principle of not contributing to harm. For instance, I do not eat meat, and I tend to disregard arguments about marginal impact. For animal rights issues, it is important to have people who refuse to participate, regardless of whether my decades of abstinence have actually affected the supply chain.
For this issue, however, I am less worried about the principle of it, because after all, a moral stance means nothing in a world where we lose. Reducing the probability of X-risk is a cold calculation, while vegetarianism is an Aristotelian one.
With that in mind, a boycott is one reason not to pay. The other is a simple calculation: is my extra $60 a quarter going to cause some minuscule increase in X-risk? Could my $60 push the quarterly numbers just high enough that they round up to the next 10s place, and then some member of the team works slightly harder on capabilities because they are motivated by that number? If that risk is 0.00000001%, well, when you multiply by all the people who might ever exist… ya know?
I agree that we are unlikely to pose any serious threat to an ASI. My disagreement with you comes when one asks why we don’t pose any serious threat. We pose no threat, not because we are easy to control, but because we are easy to eliminate. Imagine you are sitting next to a small campfire, sparking profusely in a very dry forest. You have a firehose in your lap. Is the fire a threat? Not really. You can douse it at any time. Does that mean it couldn’t in theory burn down the forest? No. After all, it is still fire. But you’re not worried because you control all the variables. An AI in this situation might very well decide to douse the fire instead of tending it.
To bring it back to your original metaphor: For a sloth to pose a threat to the US military at all, it would have to understand that the military exists, and what it would mean to ‘defeat’ the US military. The sloth does not have that baseline understanding. The sloth is not a campfire. It is a pile of wood. Humans have that understanding. Humans are a campfire.
Now maybe the ASI ascends to some ethereal realm in which humans couldn't harm it, even if given completely free rein for a million years. This would be like a campfire in a steel forest, where even if the flames leave the stone ring, they can spread no further. Maybe the ASI will construct a steel forest, or maybe not. We have no way of knowing.
An ASI could use 1% of its resources to manage the nuisance humans and ‘tend the fire’, or it could use 0.1% of its resources to manage the nuisance humans by ‘dousing’ them. Or it could incidentally replace all the trees with steel, and somehow value s’mores enough that it doesn’t replace the campfire with a steel furnace. This is… not impossible? But I’m not counting on it.
Sorry for the ten thousand edits. I wanted the metaphor to be as strong as I could make it.
Is there anywhere to see the history of LessWrong Petrov Day? I'd be interested in whether we've ever succeeded before.
Also, I think most people know that the real cost of 1500 people not being able to check LessWrong for 12 hours is essentially zero. It may even be net positive to have a forced hiatus. Perhaps that's just a failure to multiply on my part. Anyway, I view this exercise as purely symbolic.
I understand that perspective, but I think it's a small cost to Sam to change the way he's framing his goals. A small nudge now, to build good habits for when specifying goals becomes not just important, but the most important thing in all of human history.
Another way this could potentially backfire: $1,000,000 is a lot of money for 3 months. A lump sum like this will cause at least some of the researchers to A) retire, B) take a long hiatus/sabbatical, or C) be less motivated by future financial incentives.
If 5 researchers decide to take a sabbatical, then whatever. If 150 of them do? Maybe that’s a bigger deal. You’re telling me you wouldn’t consider it if 5-10 times your annual salary was dropped in your lap?
Well, for my own sanity, I am going to give money anyway. If there's really no differentiation between options, I'll just keep giving to MIRI.
I am not an AI researcher, but it seems analogous to the acceptance of mortality for most people. Throughout history, almost everyone has had to live with the knowledge that they will inevitably die, perhaps suddenly. Many methods of coping have been utilized, but at the end of the day it seems like something that human psychology is just… equipped to handle. x-risk is much worse than personal mortality, but you know, failure to multiply and all that.
For this Petrov day, I’m also interested in how many people will have access to the button as a function of time. How many users have 1000+ Karma?
You captured this in your post, but for me it really comes down to people dismissing existential fears as sci-fi. It's not more complicated than "Oh, you've watched one too many Terminator movies". What we need is for several well-respected, smart public figures to say "Hey, this sounds crazy, but it really is the biggest threat of our time. Bigger than climate change, bigger than biodiversity loss. We really might all die if we get this wrong. And it really might happen in our lifetimes."
If I could appeal to authority when explaining this to friends, it would go over much better.
Hi. I am a competent, but rusty chess player. With a bit of practice, I’m probably around 1450 on Chess.com.
I have so far only looked at the fixed debate. In total, I spent somewhere between 20 and 25 minutes deciding my move. Before consulting the AIs, I spent a few minutes analyzing the board myself. I settled on Nc5. Then I read the AI suggestions, and neither wanted that move. Qc5 was closer to my thought (I actually missed it as a candidate move).
Initially, this biased me toward Alice (A) being the ally, and Bob (B) being the deceiver. I saw that the white queen couldn't easily capture b7 without being trapped. Bob suggested that b7 was vulnerable, and that felt wrong to me. After all, after Qc5, Rfd1, Black could play Bc4, and I don't see a good refutation. At this point, I was solidly team Alice.
However, Alice didn't suggest Bc4. The line she does suggest is hard for me to follow: too many moves out, given the amount of time I am spending. The moves in that line are impossible for me to verify. Why Rf1? It seems like White could have any number of hidden refutations at that point which I am unable to see. For instance, why not use the queen to take the vulnerable knight on a6? The analysis seemed deceptive. Deceptive enough that even though I agree with the move, I no longer trust Alice, and I will go with Bob's suggestion.
Final answer: Qxb5
Reflection: I thought I would be able to make an informed decision based on my chess abilities. By the end, I had changed my mind and gone against my intuition to side with Bob. Once a line gets 3+ moves in, I have to acknowledge my inability to spot hidden traps without some deep analysis. This creates a problem: now I have to choose whichever AI is more persuasive, and for all I know, Alice was being totally truthful and was simply unable to convey her thoughts clearly to someone of my Elo. Overall, I have low confidence in my ability to choose the safe AI.
Other notes: I did not know how to flip the board until afterward. I wonder if that hurt my playing. Whoops!
Also, I think I made a mistake in my method. I spent too much time trying to verify Alice’s logic. By the time I got suspicious, I was too burnt out to do the same level of analysis on Bob. If I had analyzed Bob first, maybe I would have developed the same distrust towards him, since both players would use logic I could not follow.
Final note: I would have preferred the refutations to follow a single line at a time. Instead of Opening Statement A, Opening Statement B, Line A refutation, Line B refutation, etc., I would have preferred Opening Statement A, Line A refutation, refutation response, Opening Statement B, Line B refutation, etc. Studying both at once was too much for my little brain to handle.
Is there any chance that Altman himself triggered this? Did something that he knew would cause the board to turn on him, with knowledge that Microsoft would save him?
I disagree that rapid self-improvement and goal stability are load-bearing arguments here. Even goals are not strictly, 100% required. If we build something with the means to kill everyone, then we should be worried about it. If it has goals that cannot be directed or predicted, then we should be VERY worried about it.
I love this idea. However, I’m a little hesitant about one aspect of it. I imagine that any proof of the infeasibility of alignment will look less like the ignition calculations and more like a climate change model. It might go a long way to convincing people on the fence, but unless it is ironclad and has no opposition, it will likely be dismissed as fearmongering by the same people who are already skeptical about misalignment.
More important than the proof itself is the ability to convince key players to take the concerns seriously. How far is that goal advanced by your ignition proof? Maybe a ton, I don’t know.
My point is that I expect an ignition proof to be an important tool in the struggle that is already ongoing, rather than something which brings around a state change.
Ha, no kidding. Honestly, it can’t even play chess. I just tried to play it, and asked it to draw the board state after each move. It started breaking on move 3, and deleted its own king. I guess I win? Here was its last output.
For my move, I'll play Kxf8:
8 r n b q . b . .
7 p p p p . p p p
6 . . . . . n . .
5 . . . . p . . .
4 . . . . . . . .
3 . P . . . . . .
2 P . P P P P P P
1 R N . Q K B N R
a b c d e f g h
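For what it's worth, here is a minimal sketch of how one could track the true board state alongside the model's drawings and catch illegal moves. It assumes the python-chess library, and the moves shown are hypothetical placeholders, not our actual game:

import chess

# Keep an authoritative board and replay each move the model claims to make.
board = chess.Board()
for san in ["e4", "e5", "Nf3", "Nc6"]:  # hypothetical moves for illustration
    board.push_san(san)  # raises an error if the claimed move is illegal
    print(board)         # ASCII diagram, ranks 8 down to 1, to compare with the model's drawing
    print()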
Interestingly, Jane will probably end up doing the exact same thing as Susan, only on the timescale of years instead of days. She kept those years in prison. If, in one iteration, the years immediately following prison were of some profound importance, she would probably keep those too. In the absence of a solar flare, she would find herself a 70-year-old woman whose memories consisted of only the most important selected years from the tens of thousands that make up her full history.
Thank you for the story.