I think it’s worth noting that I also have had times where I was impressed with your tact. The two examples that jump to mind are 1) a tweet where you gently questioned Nate Silver’s position that expressing probabilities as frequencies instead of percentages was net harmful, and 2) your “shut it all down” letter to NYT, especially the part where you talk about being positively surprised by the sanity of people outside the industry and the text about Nina losing a tooth. Both of those struck me as emotionally perceptive.
The thing I wonder every time this topic comes up is: why is this the question raised to our attention? Why aren’t we instead asking whether AlphaFold is conscious? Or DALL-E? I’d feel a lot less wary of confirmation bias here if people were as likely to believe that a GPT outputting raw token numbers is conscious as they are to believe it when those tokens are translated into text in their native language.
Also, I think it is worth separating the question of “can LLMs introspect” (have access to their internal state) from the question of “are LLMs conscious”.
I’m curious how you’d see moral and long-term considerations playing into this. For instance:
1. Saving for retirement produces no experienced benefit for many years and will only ever complete a single investment cycle in a lifetime.
2. Donating to or working on global health, x-risk, etc. produces no experienced benefit ever in most cases.
Yet, in both cases, individuals seem capable of exercising willpower to do these activities for many years.
I can think of 3 models currently that could explain this:
1. They are simply “dead willpower”, but your willpower system gets enough income from shorter-term investments that it can afford to keep investing in things that will not pay out any time soon.
2. Your willpower system has “stock” that gives it value based on the prediction of experienced benefit that hasn’t been experienced yet.
3. You experience satisfaction from seeing the 401k numbers go up or feeling like a moral person and that is the payout your willpower gets.
How do you see moral and long-term considerations interacting with your toy model?
I do think that this is probably part of my misprediction—that I simply idealize others too much and don’t give enough credit to how inconsistent humans actually are. “Idealize” is probably just the Good version of “flatten”, with “demonize” being the Bad version; both probably exist because it takes fewer neurons to model someone else that way.
I actually just recently had the displeasure of stumbling upon that subreddit, and it made me sad that people wanted to devote their energies to being unkind without any goal. So I’m probably also not modeling how my own principle of avoiding offense unless it is helpful would erode over time. I’ve seen it happen to many public figures on Twitter—it seems to be part of the system.
I like this perspective. I would agree that there is more to knowing and being known by others than simply Aumann Agreement on empirical fact. I also probably have a tendency to expect more explicit goal-seeking from others than myself.
I haven’t thought this through before, but I notice two things that affect how open I am. The first is how much the communication is private, has non-verbal cues, and has an existing relationship. So right now, I’m not writing this with a desired consequence in mind, but I am filtering some things out subconsciously—like if we were in person talking right now, I might launch into a random anecdote, but while writing online I stay on a narrower path. The second is that I generally only start running my “consequentialist program” once I anticipate that someone may be upset by what I say. The anticipation of offense is what triggers me to think either “but it still needs to be said” or “saying this won’t help”. So maybe my implicit question was less “why does Eliezer not aim all his communication at his goals” and more “why doesn’t he seem to have the same guardrail I do about only causing offense if it will help”, which is a more subjective standard.
I accept your correction that I misquoted you. I paraphrased from memory and did miss real nuance. My bad.
Looking at the comment now, I do see that it has a score of −43 currently, and is the only negative-karma comment on the post. So maybe a more interesting question is why I (and presumably several others) interpreted it as an insult when the logical content of “Intelligence(having <30y timeline in 2025) > Intelligence(potted plant)” doesn’t contain any direct insult. My best guess is that people are running informal inference on “do they think of me as lower status”, and any comparison to a lower-intelligence entity is likely to trigger that. For instance, I actually find the thing you just said suggesting that I could have an LLM explain an LSAT-style question to me to be insulting, because it implies that you assign decent probability to my intelligence being lower than LLM or LSAT level. (Of course, I rank it lower than “calling someone out publicly, even politely”, so I still feel a vague social debt to you in this interaction.) I also anticipate that you might respond that you are justified in that assumption given that I seem to not have understood something an LLM could, and that that would only serve to increase the perceived status threat.
The “polite about the house burning” point is something I have changed my mind about recently. I initially judged some of your stronger rhetoric as unhelpful because it didn’t help me personally, but I have seen enough people say otherwise that I now lean toward it being the right call. The remaining confusion I have is over the instances where you take extra time to either raise your own status or lower someone else’s instead of keeping discussion focused on the object level. Maybe that’s simply because, like me, you sometimes just react to things. Maybe, as someone else suggested, it’s some sort of punishment strategy. If it is actually intentionally aimed at some goal, I’d be curious to know.
I’m sorry to hear about your health/fatigue. That’s a very unfortunate turn of events, for everyone really. I think your overall contribution is quite positive, so I would certainly vote that you keep talking rather than stop! If I got a vote on the matter, I’d also vote that you leave status out of conversations and play to your strength of explaining complicated concepts in a way that is very intuitive for others. In fact, as much as I had high hopes for your research prospects, I never directly experienced any of that—the thing that has directly impressed me (and, if I’m honest, the only reason I assume you’d also be great at research) has been the way you make new insights accessible through your public writing. So, consider this my vote for more of that.
I suspect that some of my dissonance does result from an illusion of consistency and a failure to appreciate how multi-faceted people can really be. I naturally think of people as agents and not as a collection of different cognitive circuits. I’m not ready to assume that this explains all of the gap between my expectations and reality, but it’s probably part of it.
I think this is an important perspective, especially for understanding Eliezer, who places a high value on truth/honesty, often directly over consequentialist concerns.
While this explains true-but-unpleasant statements like “[Individual] has substantially decreased humanity’s odds of survival”, it doesn’t seem to explain statements like the potted-plant one or other obviously-not-literally-true statements, unless one takes the position that full honesty also requires saying all the false and irrational things that pass through one’s head. (And even then, I’d expect to see an immediate follow-up of “that’s not true, of course”.)
I agree with this decision. You reference the comment in one of your answers. If it starts taking over, it should be removed, but can otherwise provide interesting meta-commentary.
I think this makes sense as a model of where he is coming from. As a strategy, my understanding of social dynamics is that “I told you so” makes it harder, not easier, for people to change their minds and agree with you going forward.
Not an answer to the question, but I think it’s worth noting that people asking for your opinion on EA may not be precise with what question they ask. For example, it’s plausible to me that someone could ask “has EA been helpful” when their use case for the info is something like “would a donation to EA now be +EV”, and not be conscious of the potential difference between the two questions.
I agree that we’ll make new puzzles that will be more rewarding. I don’t think that suffering need be involuntary for its elimination to be meaningful. If I am voluntarily parched and struggling to climb a mountain with a heavy pack (something that I would plausibly reject ASI help with), I would nevertheless feel appreciation if some passerby offered me a drink or lightened my load. Given a guarantee of safety from permanent harm, I think I’d plausibly volunteer to play a role in some game that involved some degree of suffering that could be alleviated.
[Question] Why does Eliezer make abrasive public comments?
there are also donation opportunities for influencing AI policy to advance AI safety which we think are substantially more effective than even the best 501c3 donation opportunities
Would you be willing to list these (or to DM me if there’s a reason to not list publicly)?
I began to write a long comment about how to possibly identify poverty-restoring forces, but I think we actually should take a step back and ask:
Why do we care about poverty in the first place?
“The utility function is not up for grabs”
Sure, but poverty seems like a rather complex idea to really be directly in our utility function, instead of instrumentally.
“Well we care about poverty because it causes suffering”
Ok. But why not just talk about reducing suffering then?
“Suffering can have multiple causes. It is helpful to focus on a single cause at a time to produce solutions”
Sure—but we just said we don’t know what causes it, so that’s not why. Why don’t we just talk about eliminating suffering?
“Because that would feel too...utilitarian. Too sterile. Cancer is unfortunate, but poverty is just wrong.”
And that’s exactly it, I think—we care about ‘poverty’ in particular because we care about justice. There is something worse about someone dying of a preventable disease than of an unpreventable one. So poverty is not simply a state of resources or of hedonic experiences. It’s not even about the poor. Someone suffering from an unpreventable cause is unfortunate; they only become poor once others have the ability to help them and don’t. We also care about suffering for its own sake, but poverty is actually a moral defect we see in the other humans who don’t help.
Once we frame the discussion this way, it becomes easy to see why universal basic income might not fix human moral defects.*
*And even if we object that poverty is not just about the moral defect, but also about that defect indirectly causing suffering, it is still much easier to see why UBI might not prevent human moral defects from indirectly causing suffering.
Quote voice seems to “win” this exchange, but I think there are 3 things it is missing:
1. I can’t know someone else’s joy level with certainty, but despite quote voice accusing unquote voice of having problems taking joy in the real, I don’t hear the joy in quote voice (save for the last reply). Maybe QV is just using “joy in the real” as an applause light instead of actually practicing it.
2. “And you claim to be surprised by this?”—Lack of surprise may be a symptom of having a perfect model of the world, but more often it is a symptom of not actually predicting with your model. For mortals, surprise at the real state of things should be a common occurrence—it is akin to admitting fallibility. Perhaps more importantly, in this conversation, it seems to be shutting down curiosity.
3. Even after the call-out on “explain any possible human behavior”, QV continues to use “well it has to [work] somehow” to imply “my specific model of the world is correct”. If UQV were arguing for magic or theism, these responses would make sense, but as is, they seem like a way to avoid admitting “I don’t know”.
Very well said. I also think more is possible—not nearly as much more as I originally thought, but there is always room for improvement, and I do think there’s a real possibility that community effects can be huge. I mean, individual humans are smarter than individual animals, but the real advantages have accrued through society, specialization, teamwork, passing on knowledge, and sharing technology—all communal activities.
And yeah, probably the main barrier boils down to the things you mentioned. People who are interested in self-improvement and truth are a small subset of the population[1]. Across the country/world there are lots of them, but humans have some psychological thing about meeting face to face, and the local density most places is below critical mass. And having people move to be closer together would be a big ask even if they were already great friends, which the physical distance makes difficult. As far as I can see, the possible options are:
1. Move into proximity (very costly)
2. Start a community with the very few nearby rationalists (difficult to keep any momentum)
3. Start a community with nearby non-rationalists (could be socially rewarding, but likely to dampen any rationality advantage)
4. Teach people nearby to be rational (ideal, but very difficult)
5. Build an online community (LW is doing this. Could try video meetings, but I predict it would still feel less connected than in person and make momentum difficult)
5b. Try to change your psychology so that online feels like in person. (Also, difficult)
6. Do it without a community (The default, but limited effectiveness)
So, I don’t know—maybe when AR gets really good we could all hang out in the “metaverse” and it will feel like hanging out in person. Maybe even then it won’t—maybe it’s just literally having so many other options that makes the internet feel impersonal. If so, weird idea—have LW assign splinter groups and that’s the only group you get (maybe you can move groups, but there’s some waiting period so you can’t ‘hop’). And of course, maybe there just isn’t a better option than what we’re already doing.
Personally—I’m trying to start regularly calling my 2 brothers. They don’t formally study rationality but they care about it and are pretty smart. The family connection kinda makes up for the long distance and small group size, but it’s still not easy to get it going. I’d like to try to get a close-knit group of friends where I live, though they probably won’t be rationalists. But I’ll probably need to stop doing prediction markets to have the time to invest for that.
Oh, and what you said about the 5 stages makes a lot of sense—my timing is probably just not lined up with others, and maybe in a few years someone else will ask this and I’ll feel like “well I’m not surprised by what rationalists are accomplishing—I updated my model years ago”.
[1] I read Scott Alexander say that peddling ‘woo’ might just be the side effect of a group taking self-improvement seriously while lacking the ability to fund actual studies, and I think that hypothesis makes sense.
Ok, how are people losing as gatekeepers? I can’t imagine losing as a gatekeeper, so I have to think that they must not be trying or they must be doing it for a publicity benefit. I’ll give anyone $100 if they can convince me to let them out of the box. Judging by the comments here though, I’m guessing no one will step up and that one of the alternative explanations (publicity, publication bias, etc.) is responsible for the posted results.
Note: if anyone actually is considering playing against me, I will let you interview me beforehand and will honestly answer any questions, even about weak spots or vulnerabilities. I’m also open to negotiating any rules or wager amounts. The only rules I’m attached to are: no real-world consequences (including social/moral), and the gatekeeper is not required to let the AI out, even if they believe their character might have in the situation. The only thing I ask is that you be confident you can win or have won before, because I want someone to prove me wrong. If you’re not confident and have no track record, I am still willing to play, but I may ask you to put up some ante on your side to make sure you take it seriously.
I like the Alt-Viliam thought experiment. For myself, I have trouble projecting where I’d be other than: less money, more friends. I was very Christian and had a lot of friends through the Church community, so I likely would have done that instead of getting into prediction markets (which works out since presumably I’d be less good at prediction markets). I think your point about rationality preventing bad outcomes is a good one. There aren’t a lot of things in my life I can point to and say “I wouldn’t have this if I weren’t a rationalist”, but there are a lot of possible ways I could have gone wrong into some unhappy state—each one unlikely, but taken together maybe not.
I also like your points about the time limitations we face and the power of a community. That said, even adjusting for the amount of time we can spend, it’s not like 5 of us solve quantum gravity in 10 or even 100 months. As for the community—that may be really important. It’s possible that communal effects are orders of magnitude above individual ones. But if the message was that we could only accomplish great things together, that was certainly not clear (and also raises the question of why our community building has been less than stellar).
Based on the responses I’ve gotten here, perhaps a better question is: “why did I expect more out of rationality?”
There’s a phenomenon I’ve observed where I tend to believe things more than most people, and it’s hard to put my finger on exactly what is going on there. It’s not that I believe things to be true more often (in fact, it’s probably less often), but rather that I take things more seriously or literally—though neither of those quite fits either.
I experienced it in church. People would preach about the power of prayer and how much more it could accomplish than our own efforts. I believed them, so instead of studying for my test I went to church and prayed that I’d do well. I was surprised when I didn’t, and when I talked to them they’d say “that wasn’t meant for you—that was what God said to those people thousands of years ago—you can’t just assume it applies to you”. Ok, yeah, obvious in hindsight. But then I swear they’d go back up and preach like the Bible did apply to me. And when I tried to confirm that they didn’t mean this, they said “of course it applies to you. It’s the word of God and is timeless and applies to everyone”. Right, my mistake. I’d repeat this with various explanations of where I had failed. Sometimes I didn’t have enough faith. Sometimes I was putting words in God’s mouth. Sometimes I was ignoring the other verses on the topic. However, every time, I was doing something that everyone else tacitly understood not to do—taking the spoken words as literal truth and updating my expectations and actions based on them. It took me far longer to realize this than it should have because, perversely, when I asked them about this exact hypothesis, they wholeheartedly denied it and assured me they believed every word as literal truth.
It’s easy to write that off as a religious phenomenon, and I mostly did. But I feel like I’ve brushed up against it in secular motivational or self-help environments too. I can’t recall a specific instance, but it feels like I reason “this speaker is either correct, lying, or mistaken”, while other people don’t feel like it’s any of the above—or rather, they choose “correct” until I start applying it to real life, and then there’s always something wrong about how I apply it. Sometimes I get some explanation of what I’m doing wrong, but almost always there is this confusion about why I’m doing it at all.
I don’t know if that’s what is happening here, but if so, that surprises me, because I had assumed that my rationalism, or some other mental characteristic I’d expect to find here, was the cause of this disconnect. I read Class Project, and while it is obviously fiction, it is such boring fiction, sitting in the middle of two posts telling us that we should do better than science, that it seemed clear to me it was meant as an illustration of the types or magnitudes of things we could accomplish. I don’t think I’m being overly literal here—I’m specifically considering context, intent, and style. Almost the whole story is just a lecture, and nothing interesting happens—it is clearly not aimed at entertainment. It sits in the middle of a series about truth, specifically next to other non-fiction posts echoing the same sentiment. It’s really difficult for me to believe it was intended purely for whimsy and could just as easily have been a whimsical story about a cat talking to a dandelion. Combine that with non-fiction posts telling us to shut up and do the impossible or that we should be sitting on a giant heap of utility, and the message seems clear.
However, the responses I’ve gotten to this post feel very much like the same confusion I’ve experienced in the past. I get this “what did you expect?” vibe, and I’m sure I’m not the only one who read the referenced posts. So did others read them and think “Yes, Eliezer says to do the impossible and specifically designates the AI box experiment as the least impossible thing, but clearly he doesn’t mean we could do something like convince Sam Altman or Elon Musk not to doom humanity (or, in personal life, something like have a romantic relationship with no arguments and no dissatisfaction)”?
I just tried criticizing my ingroup. Did my blood boil? No. My Scotsmen got truer. Every time I could identify a flawed behavior, it felt inappropriate to include those people in my “real ingroup”. Now, if I had a more objectively defined group based on voting record or religious belief or something, then maybe I’d be able to force my brain to keep them in my ingroup, but right now, my brain flips to “sure, I’m happy to criticize those people giving us a bad name. Look, I’m criticizing my ingroup!”
I tried 2 other experiments:
1. Think about criticisms toward my ingroup that do make me angry—maybe those are the ones hitting home.
Result: I found myself disagreeing with all of them. And my brain asked “what, am I supposed to like wrongheaded arguments just because they are against my group?”
2. Just go straight for the inner-est group I have: me.
Result: I was able to think of criticisms of myself, and it didn’t make my blood boil, nor would writing them down. I suspect that when I shrink the group to {me}, I may expect extra social points for criticizing myself, making it much more palatable.
So, my quick experiment suggests that, at least for someone without a clearly defined in-group, trying to criticize one’s ingroup can be more ‘slippery’ difficult than ‘grueling’ difficult.