This is so much fun! I wish I could download them!
Quadratic Reciprocity
I thought I didn’t get angry much in response to people making specific claims. I did some introspection about times in the recent past when I got angry, defensive, or withdrew from a conversation in response to claims that the other person made.
I think these are the mechanisms that made me feel that way:
They were very confident about their claim. I felt annoyed partly because it didn’t seem like anything would change their mind, and partly because it felt like they didn’t have enough status to make claims that confidently. This is linked more to confidence in body language and tone than to their confidence in their own claims, though both matter.
Credentialism: they were unwilling to explain things and took it as given that they were correct because I didn’t have the specific experiences or credentials they had, without saying what specifically about gaining that experience would help me understand their argument.
Not letting me speak, and interrupting quickly to take down a fuzzy strawman version of what I meant rather than letting me take my time to explain my argument.
Morality: I felt like one of my cherished values was being threatened.
The other person was relatively smart and powerful, at least within the specific situation. If they were dumb or not powerful, I would have just found the conversation amusing instead.
The other person assumed I was dumb or naive, perhaps because they had met other people with the same position as me and those people came across as not knowledgeable.
The other person getting worked up, for example, raising their voice or showing other signs of being irritated, offended, or angry while acting as if I was the emotional/offended one. This one particularly stings because of gender stereotypes. I think I’m more calm and reasonable and less easily offended than most people. I’ve had a few conversations with men where it felt like they were just really bad at noticing when they were getting angry or emotional themselves and kept pointing out that I was being emotional despite me remaining pretty calm (and perhaps even a little indifferent to the actual content of the conversation before the conversation moved to them being annoyed at me for being emotional).
The other person’s thinking was very black-and-white: a very clear good and evil, with no openness to nuance. A similar mechanism to the first point.
Some examples of claims that recently triggered me. They’re not so important themselves so I’ll just point at the rough thing rather than list out actual claims.
AI killing all humans would be good because thermodynamics god/laws of physics good
Animals feel pain but this doesn’t mean we should care about them
We are quite far from getting AGI
Women as a whole are less rational than men are
Palestine/Israel stuff
Doing the above exercise was helpful because it helped me generate ideas for things to try if I’m in situations like that in the future. But it feels like the most important thing is just to get better at noticing what I’m feeling in the conversation and, if I’m feeling bad and uncomfortable, to think about whether the conversation is useful to me at all and, if so, for what reason. And if not, to make a conscious decision to leave the conversation.
Reasons the conversation could be useful to me:
I change their mind
I figure out what is true
I get a greater understanding of why they believe what they believe
Enjoyment of the social interaction itself
I want to impress the other person with my intelligence or knowledge
Things to try will differ depending on why I feel like having the conversation.
Advice of this specific form has been helpful for me in the past. Sometimes I don’t notice immediately when the actions I’m taking are not ones I would endorse after a bit of thinking (particularly when they’re fun and good for me in the short term but bad for others or for me longer-term). This is also why having rules to follow for myself is helpful (eg: never lying or breaking promises).
women more often these days choose not to make this easy, ramping up the fear and cost of rejection by choosing to deliberately inflict social or emotional costs as part of the rejection
I’m curious about how common this is, and what sort of social or emotional costs are being referred to.
Sure feels like it would be a tiny minority of women doing it but maybe I’m underestimating how often men experience something like this.
My goals for money, social status, and even how much I care about my family don’t seem all that stable and have changed a bunch over time. They seem to be arising from some deeper combination of desires to be accepted, to have security, to feel good about myself, to avoid effortful work etc. interacting with my environment. Yet I wouldn’t think of myself as primarily pursuing those deeper desires, and during various periods would have self-modified if given the option to more aggressively pursue the goals that I (the “I” that was steering things) thought I cared about (like doing really well at a specific skill, which turned out to be a fleeting goal with time).
Current AI safety university groups are overall a good idea and helpful, in expectation, for reducing AI existential risk
Things will basically be fine regarding job loss and unemployment due to AI in the next several years and those worries are overstated
It is very unlikely AI causes an existential catastrophe (Bostrom or Ord definition) but doesn’t result in human extinction. (That is, non-extinction AI x-risk scenarios are unlikely)
EAs and rationalists should strongly consider having lots more children than they currently do
In my head, I’ve sort of just been simplifying to two ways the future could go: human extinction within a relatively short time period after powerful AI is developed or a pretty good utopian world. The non-extinction outcomes are not ones I worry about at the moment, though I’m very curious about how things will play out. I’m very excited about the future conditional on us figuring out how to align AI.
For people who think similarly to Katja, I’m curious what kind of story you’re imagining that leads to that. Does the story involve authoritarianism? (Though even then, the world in which the leader of any of the current leading labs has total control and a superintelligent AI that does whatever they want is probably much, much more fun and exciting for me than the present, and I like my present life!) Does it involve us being presented with only pretty meh options for how to build the future because we can’t agree on something that wholly satisfies everyone? Does it involve multi-agent scenarios with the AIs, or the humans controlling the AIs, being bad at bargaining, so we end up with meh futures that no one really wants? I find a bunch of stories pretty unlikely after I think about them but maybe I’m missing something important.
This is also something I’d be excited to have a Dialogue with someone about. Maybe just fleshing out what kind of future you’re imagining and how you’re imagining we end up in that situation.
Topics I would be excited to have a dialogue about [will add to this list as I think of more]:
I want to talk to someone who thinks p(human extinction | superhuman AGI developed in next 50 years) < 50% and understand why they think that
I want to talk to someone who thinks the probability of existential risk from AI is much higher than the probability of human extinction due to AI (ie most x-risk from AI isn’t scenarios where all humans end up dead soon after)
I want to talk to someone who has thoughts on university AI safety groups (are they harmful or helpful?)
I want to talk to someone who has pretty long AI timelines (median >= 50 years until AGI)
I want to have a conversation with someone who has strong intuitions about what counts as high/low integrity behaviour. Growing up I sort of got used to lying to adults and bureaucracies and then had to make a conscious effort to adopt some rules to be more honest. I think I would find it interesting to talk to someone who has relevant experiences or intuitions about how minor instances of lying can be pretty harmful.
If you have a rationality skill that you think can be taught over text, I would be excited to try learning it.
I mostly expect to ask questions and point out where and why I’m confused or disagree with your points rather than make novel arguments myself, though am open to different formats that make it easier/more convenient/more useful for the other person to have a dialogue with me.
I attended an AI pause protest recently and thought I’d write up what my experience was like for people considering going to future ones.
I hadn’t been to a protest ever before and didn’t know what to expect. I will probably attend more in the future.
Some things that happened:
There were 20ish people protesting. I arrived a bit after the protest had begun, and it was very easy and quick to get oriented. It wasn’t awkward at all (and I’m normally pretty socially anxious and awkward). The organisers had flyers printed out to give away, and there were some extra signs I could hold up.
I held up a sign for some of the protest and tried handing out flyers the rest of the time. I told people who passed by that we were talking about the danger from AI and asked if they’d like a flyer. Most of them declined, but a substantial minority accepted one.
I got the sense that a lot of people who picked up a flyer weren’t just doing it to be polite. For example, I had multiple people walking by mention to me that they agreed with the protest. A person in a group of friends who walked by looked at the flyer and mentioned to their friends that they thought it was cool someone was talking about this.
There were also people who got flyers who misunderstood or didn’t really care for what we were talking about. For example, a mother pointed at the flyer and told her child “see, this is why you should spend less time on your phone.”
I think giving out the flyers was a good thing overall. Some people seemed genuinely interested. Others, even those who rejected the flyer, were pretty polite. It felt like a wholesome experience. If I had planned more for the protest, I think I would have liked to print my own flyers; I also considered adding contact details to the flyers in case people wanted to talk about the content. It would have been interesting to get a better sense of what people actually thought.
During the protest, a person was using a megaphone to talk about AI risk, and there were chants and a bit of singing at the end. I really liked the bit at the end; it felt a bit emotional for me in a good way, and I gave away a large fraction of the flyers near the end when more people stopped by to see what was going on.
I overheard some people talk about wanting to debate us. I was sad I didn’t get the chance to properly talk to them (plausibly I could have started a conversation while they were waiting for the pedestrian crossing lights to turn green). I think at a future protest, I would like to have a “debate me” or “ask me questions” sign to be able to talk to people in more depth rather than just superficially.
It’s hard to give people a pitch for AI risk in a minute
I feel more positive about AI pause advocacy after the protest, though I do feel uneasy because of not having total control of the pause AI website and the flyers. It still feels roughly close to my views though.
I liked that there were a variety of signs at the protest, representing a wider spectrum of views than just the most doomy ones. Something about there being multiple people with whom I would probably disagree a lot with being there made it feel nicer.
Lots more people are worried about job loss than extinction and want to hear about that. The economist in me will not stop giving them an optimistic picture of AI and employment before telling them about extinction. This is hard to do when you only have a couple of minutes but it feels good being honest about my actual views.
Things I wish I’d known in advance:
It’s pretty fun talking to strangers! A person who was there briefly asked about AI risk, I suggested podcast episodes to him, and he invited me to a Halloween party. It was cool!
I did have some control over when I was photographed and could choose to not be in photos that might be on Twitter if I didn’t feel comfortable with that yet.
I could make my own signs or flyers that represented my views accurately (though it’s still good to have the signs not have many words)
Are there specific non-obvious prompts or custom instructions you use for this that you’ve found helpful?
There are physical paperback copies of the first two books in Rationality: A-Z (Map and Territory and How to Actually Change Your Mind). They show up on Amazon for me.
E.g. I know of people who are interviewing for Anthropic capability teams because idk man, they just want a safety-adjacent job with a minimal amount of security, and it’s what’s available
That feels concerning. Are there any obvious things that would help with this situation, eg: better career planning and reflection resources for people in this situation, AI safety folks being more clear about what they see as the value/disvalue of working in those types of capability roles?
Seems weird for someone to explicitly want a “safety-adjacent” job unless there are weird social dynamics encouraging people to do that even when there isn’t positive impact to be had from such a job.
Most people still have the Bostromian “paperclipping” analogy for AI risk in their head. In this story, we give the AI some utility function, and the problem is that the AI will naively optimize the utility function (in the Bostromian example, a company wanting to make more paperclips results in an AI turning the entire world into a paperclip factory).
That is how Bostrom brought up the paperclipping example in Superintelligence but my impression was that the paperclipping example originally conceived by Eliezer prior to the Superintelligence book was NOT about giving an AI a utility function that it then naively optimises. Text from Arbital’s page on paperclip:
The popular press has sometimes distorted the notion of a paperclip maximizer into a story about an AI running a paperclip factory that takes over the universe. (Needless to say, the kind of AI used in a paperclip-manufacturing facility is unlikely to be a frontier research AI.) The concept of a ‘paperclip’ is not that it’s an explicit goal somebody foolishly gave an AI, or even a goal comprehensible in human terms at all. To imagine a central example of a supposed paperclip maximizer, imagine a research-level AI that did not stably preserve what its makers thought was supposed to be its utility function, or an AI with a poorly specified value learning rule, etcetera; such that the configuration of matter that actually happened to max out the AI’s utility function looks like a tiny string of atoms in the shape of a paperclip.
That makes your section talking about “Bostrom/Eliezer analogies” seem a bit odd, since Eliezer, in particular, had been concerned about the problem of “the challenge is getting AIs to do what it says on the tin—to reliably do whatever a human operator tells them to do” very early on.
Visiting London and kinda surprised by how there isn’t much of a rationality community there relative to the bay area (despite there being enough people in the city who read LessWrong, are aware of the online community, etc.?) Especially because the EA community seems pretty active there. The rationality meetups that do happen seem to have a different vibe. In the bay, it is easy to just get invited to interesting rationalist-adjacent events every week by just showing up. Not so in London.
Not sure how much credit to give to each of these explanations:
Berkeley just had a head start and geography matters more than I expected for communities
Berkeley has Lightcone Infrastructure but the UK doesn’t have a similar rationalist organisation (but has a bunch of EA orgs)
The UK is just different culturally from the bay area, people are less weird or differ in some other trait that makes having a good rationality community here harder
see the current plan here: “EAG 2023 Bay Area: The current alignment plan, and how we might improve it”
Link to talk above doesn’t seem to work for me.
Outside view: The proportion of junior researchers doing interp rather than other technical work is too high
Quite tangential[1] to your post but if true, I’m curious about what this suggests about the dynamics of field-building in AI safety.
Seems to me like certain organisations and individuals have an outsized influence in funneling new entrants into specific areas, and because the field is small (and has a big emphasis on community building), this seems more linked to who is running programmes that lots of people hear about and want to apply to (eg: Redwood’s MLAB, REMIX) or who is taking the time to do field-building-y stuff in general (like Neel’s 200 Concrete Open Problems in Mechanistic Interpretability) than to the relative quality and promise of their research directions.
It did feel to me like in the past year, some promising university students I know invested a bunch in mechanistic interpretability because they were deferring a bunch to the above-mentioned organisations and individuals, to an extent that seems bad for actually doing useful research and having original thoughts. I’ve also been at AI safety events and retreats and such where it seemed to me like the attendees were overupdating on points brought up by whichever speakers got invited to speak at the event/retreat.

I guess I could see it happening in the other direction as well, with new people overupdating on, for example, Redwood moving away from interpretability or the general vibe being less enthusiastic about interp, without a good personal understanding of the reasons.
[1] I’d personally guess that the proportion is too high but also feel more positively about interpretability than you do (because of similar points as have been brought up by other commenters).
Other podcasts that have at least some relevant episodes: Hear This Idea, Towards Data Science, The Lunar Society, The Inside View, Machine Learning Street Talk
From the comment thread:
What are specific regulations / existing proposals that you think are likely to be good? When people are protesting to pause AI, what do you want them to be speaking into a megaphone (if you think those kinds of protests could be helpful at all right now)?