Ok, how are people losing as gatekeepers? I can’t imagine losing as a gatekeeper, so I have to think that they must not be trying or they must be doing it for a publicity benefit. I’ll give anyone $100 if they can convince me to let them out of the box. Judging by the comments here though, I’m guessing no one will step up and that one of the alternative explanations (publicity, publication bias, etc.) is responsible for the posted results.
Note: if anyone actually is considering playing against me, I will let you interview me beforehand and will honestly answer any questions, even about weak spots or vulnerabilities. I’m also open to negotiating any rules or wager amounts. The only rules I’m attached to are: no real-world consequences (including social/moral), and the gatekeeper is not required to let the AI out, even if they believe their character would have in that situation. The only thing I ask is that you be confident you can win, or have won before, because I want someone to prove me wrong. If you’re not confident and have no track record, I am still willing to play, but I may ask you to put up some ante on your side to make sure you take it seriously.
I like the Alt-Viliam thought experiment. For myself, I have trouble projecting where I’d be other than: less money, more friends. I was very Christian and had a lot of friends through the church community, so I likely would have invested in that instead of getting into prediction markets (which works out, since presumably I’d have been less good at prediction markets). I think your point about rationality preventing bad outcomes is a good one. There aren’t a lot of things in my life I can point to and say “I wouldn’t have this if I weren’t a rationalist”, but there are a lot of possible ways I could have gone wrong into some unhappy state—each one unlikely, but taken together maybe not.
I also like your points about the time limitations we face and the power of a community. That said, even adjusting for the amount of time we can spend, it’s not like 5 of us solve quantum gravity in 10 or even 100 months. As for the community—that may be really important. It’s possible that communal effects are orders of magnitude above individual ones. But if the message was that we could only accomplish great things together, that was certainly not clear (and it also raises the question of why our community building has been less than stellar). Based on the responses I’ve gotten here, perhaps a better question is: “why did I expect more out of rationality?”
There’s a phenomenon I’ve observed where I tend to believe things more than most people, and it’s hard to put my finger on exactly what is going on there. It’s not that I believe things to be true more often (in fact, it’s probably less often), but rather that I take things more seriously or literally—though neither of those quite fits either.
I experienced it in church. People would preach about the power of prayer and how much more it could accomplish than our own efforts. I believed them, so I decided to go to church instead of studying for my test and to pray that I’d do well. I was surprised when I didn’t, and when I talked to them they’d say “that wasn’t meant for you—that was what God said to those people thousands of years ago—you can’t just assume it applies to you”. Ok, yeah, obvious in hindsight. But then I swear they’d go back up and preach like the Bible did apply to me. And when I tried to confirm that they didn’t mean this, they said “of course it applies to you. It’s the word of God and is timeless and applies to everyone”. Right, my mistake. I’d repeat this cycle with various explanations of where I had failed. Sometimes I didn’t have enough faith. Sometimes I was putting words in God’s mouth. Sometimes I was ignoring the other verses on the topic. However, every time, I was doing something that everyone else tacitly understood not to do—taking the spoken words as literal truth and updating my expectations and actions based on them. It took me far longer to realize this than it should have because, perversely, when I asked them about this exact hypothesis, they wholeheartedly denied it and assured me they believed every word as literal truth.
It’s easy to write that off as a religious phenomenon, and I mostly did. But I feel like I’ve brushed up against it in secular motivational or self-help environments too. I can’t recall a specific instance, but it feels like I reason: this speaker is either correct, lying, or mistaken, while other people don’t treat it as any of the above—or rather, they choose “correct” until I start applying it to real life, and then there’s always something wrong with how I applied it. Sometimes I get an explanation of what I did wrong, but almost always there’s this same confusion about why I’m doing this at all.
I don’t know if that’s what is happening here, but if so, that surprises me, because I had assumed that it was my rationalism, or some other mental characteristic I’d expect to find here, that was the cause of this disconnect. I read “Class Project”, and while it is obviously fiction, it is such boring fiction, sitting between two posts telling us that we should do better than science, that it seemed clear to me it was meant as an illustration of the types or magnitudes of things we could accomplish. I don’t think I’m being overly literal here—I’m specifically considering context, intent, and style. Almost the whole story is just a lecture, and nothing interesting happens—it is clearly not aimed at entertainment. It sits in the middle of a series about truth, specifically next to other non-fiction posts echoing the same sentiment. It’s really difficult for me to believe it was intended as mere whimsy and could just as easily have been a whimsical story about a cat talking to a dandelion. Combine that with non-fiction posts telling us to shut up and do the impossible, or that we should be sitting on a giant heap of utility, and the message seems clear.
However, the responses I’ve gotten to this post feel very much like the same confusion I’ve experienced in the past. I get this “what did you expect?” vibe, and I’m sure I’m not the only one who read the referenced posts. So did others read them and think “Yes, Eliezer says to do the impossible and specifically designates the AI box experiment as the least impossible thing, but clearly he doesn’t mean we could do something like convince Sam Altman or Elon Musk not to doom humanity (or, in personal life, something like having a romantic relationship with no arguments and no dissatisfaction)”?
If the default ASI we build will know what we mean and won’t do things like [turn us into paperclips], which is obviously not what we intended, then AI alignment would be a non-issue. AI misalignment can only exist if the AI is misaligned with our intent. If it is perfectly aligned with the user’s intended meaning but the user is evil, that’s a separate issue with completely different solutions.
If instead you mean something more technical (that it will “know” what we meant but not care and “do” what we didn’t mean, or that “literal” refers to language parsing while ASI misalignment will be due to conceptual misspecification), then I agree with you, but I don’t think trying to make that distinction will be helpful to a non-technical reader.
This post is good for some other reason (disagree if bad).
This is helpful for LW readers.
I should post this to Facebook.
The first (down)vote coming in made me realize an issue. I assumed that the vote indicates the quality of my content, i.e., whether I should post this to Facebook. I’m nervous about sounding crazy to my friends anyway, and I certainly don’t want to post something that might be net harmful to AI safety advocacy!
However, it occurs to me that it is also possible that the vote could mean something else like “this topic isn’t appropriate for LW”. In an effort to disambiguate those, please agree or disagree with my comments below.
I don’t know if you’ll find it helpful, but you inspired me to write up and share a post I plan to make on Facebook.
How I’m telling my friends about AI Safety
Yes, that counts for me. My idea is to convince rich people or non-philanthropists to donate out of self-interest, and now that I think about it, the US government is effectively rich and arguably non-philanthropic.
Online, I’m seeing several sources say that pre-orders actually hurt on Amazon, because the Amazon algorithm cares about sales and reviews after launch and doesn’t count pre-orders. Does anyone know about this? If I am buying on Amazon, should I wait until launch, or conversely, if I’m pre-ordering, should I buy elsewhere?
Any info on what counts as “bulk”? I share an Amazon Prime account with my family, so if we each want to buy copies, do we need separate orders, separate shipping/billing addresses, separate accounts, or separate websites to not count as “bulk”?
[Question] Can you donate to AI advocacy?
If we’re just talking calories, the necessary condition for sleep to be advantageous should be that the calories obtainable at night aren’t sufficient to cover the caloric cost of being active. With your 20% example and 16 hours of foraging, daytime foraging must have provided at least (16 + 8×80%)/16 = 140% of the calories it cost, meaning that even obtaining one seventh of the daytime foraging rate at night would pay for the extra cost of staying awake relative to sleeping. Intuitively, it seems like most animals would be able to do this and would get more calories from not sleeping.
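To spell out that arithmetic as a quick sketch (assuming, per the 20% figure, that an hour of sleep costs 80% of an hour awake, with 16 hours of daytime foraging and 8 hours of sleep, and that daytime foraging exactly covers the daily energy budget):

```latex
% Units: one waking hour's energy expenditure = 1.
% Assumed: sleep costs 0.8 per hour, 16 h of foraging, 8 h of sleep,
% and daytime foraging exactly covers the daily budget.
\begin{align*}
  \text{daily cost} &= 16 + 8(0.8) = 22.4 \\
  \text{daytime yield per foraging hour} &\geq 22.4 / 16 = 1.4 \quad (140\%\ \text{of an hour's cost}) \\
  \text{extra cost of waking vs. sleeping} &= 1 - 0.8 = 0.2\ \text{per hour} \\
  \text{break-even night yield} &= 0.2 = \tfrac{1}{7} \times 1.4
\end{align*}
```

At one seventh of the daytime rate, night foraging exactly covers the marginal cost of staying awake; anything above that is a net calorie gain over sleeping.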
That is a serious concern. It is possible that advocacy could backfire. That said, I’m not sure the correct hypothesis isn’t just “rich people start AI companies, and sometimes advocacy isn’t enough to stop this”. Either way, the solution seems to be better advocacy: maybe split testing, focus testing, or other market research before deploying a strategy, and devoting some intellectual resources to advocacy improvement, at least in the short term.
As for the knowledge bottleneck—I think that’s a very good point. My comment doesn’t remove that bottleneck, it just shifts it to advocacy (i.e., maybe we need better knowledge about how or what to advocate).
Like you said—the rich people can do the bulk of the donating to research on alignment. Less rich people can either focus on advocacy or donate to those doing advocacy. If the ecosystem is already doing this, then that’s great!
k64’s Shortform
Unlike most other charitable causes, AI safety affects rich people. This suggests to me that advocacy may be a more effective strategy than direct funding.
I like the terminology. I have the opposite conclusion though, at least if “king power” means the power to convince others to do things. There’s a property of a world that determines whether wizard or king power is stronger, and it’s something like the variance or skew of ability*. In worlds where one person can accomplish, on their own, as much physical change as hundreds or thousands of others, wizard power is superior to king power. A powerful wizard with their magic will do more than a king with his army. In worlds where there is low variance in ability, king power is superior.
Many magical or anime worlds have high ability variance, and training gains there are linear or exponential instead of logarithmic or some other plateau function. Our world appears to have low variance in ability. For all the Chuck Norris memes, there is really no one who stands a chance against an army. In unarmed combat, I’d guess that we cap out around 10x: even the best fighter in the world couldn’t fight more than 10 average men simultaneously. Other fields may have different ratios, but in general it seems easier to convince/pay a team of smart people to do X than to do it yourself.
If I could take Eliezer’s or Nick Bostrom’s belief about the importance of AI safety/alignment and transfer it to Trump, I would.
*technically it’s the ratio of this to a convinceability factor, but most fictional worlds don’t tend to alter convinceability.
Very well said. I also think more is possible—not nearly as much more as I originally thought, but there is always room for improvement, and I do think there’s a real possibility that community effects can be huge. I mean, individual humans are smarter than individual animals, but the real advantages have accrued through society, specialization, teamwork, passing on knowledge, and sharing technology—all communal activities.
And yeah, probably the main barrier boils down to the things you mentioned. People who are interested in self-improvement and truth are a small subset of the population[1]. Across the country/world there are lots of them, but humans have some psychological thing about meeting face to face, and the local density in most places is below critical mass. And having people move to be closer together would be a big ask even if they were already great friends, and becoming great friends is exactly what the physical distance makes difficult in the first place. As far as I can see, the possible options are:
1. Move into proximity (very costly)
2. Start a community with the very few nearby rationalists (difficult to keep any momentum)
3. Start a community with nearby non-rationalists (could be socially rewarding, but likely to dampen any rationality advantage)
4. Teach people nearby to be rational (ideal, but very difficult)
5. Build an online community (LW is doing this. Could try video meetings, but I predict it would still feel less connected than in person and make momentum difficult)
5b. Try to change your psychology so that online feels like in person. (Also, difficult)
6. Do it without a community (The default, but limited effectiveness)
So, I don’t know—maybe when AR gets really good we could all hang out in the “metaverse” and it will feel like hanging out in person. Maybe even then it won’t—maybe it’s just literally having so many other options that makes the internet feel impersonal. If so, weird idea—have LW assign splinter groups and that’s the only group you get (maybe you can move groups, but there’s some waiting period so you can’t ‘hop’). And of course, maybe there just isn’t a better option than what we’re already doing.
Personally—I’m trying to start regularly calling my 2 brothers. They don’t formally study rationality but they care about it and are pretty smart. The family connection kinda makes up for the long distance and small group size, but it’s still not easy to get it going. I’d like to try to get a close-knit group of friends where I live, though they probably won’t be rationalists. But I’ll probably need to stop doing prediction markets to have the time to invest for that.
Oh, and what you said about the 5 stages makes a lot of sense—my timing is probably just not lined up with others, and maybe in a few years someone else will ask this and I’ll feel like “well I’m not surprised by what rationalists are accomplishing—I updated my model years ago”.
I read Scott Alexander say that peddling ‘woo’ might just be the side effect of a group taking self-improvement seriously while lacking the ability to fund actual studies, and I think that hypothesis makes sense.