Upgrading the AI Safety Community

Special thanks to Justis Mills for pointing out important caveats.

NicholasKross

So broadly, we wanted to discuss alignment community uplift, including: intelligence enhancement, megaprojects, and more!

Notably, different ways of augmenting the AI safety community can feed off each other. Intelligence enhancement can improve megaprojects, megaprojects can be points for coordination, both of these can make the whole community more adaptable, and so on.

(Note for readers: This dialogue is less an “argument”, and more “getting our concepts and cruxes on the table”. So it won’t have much of the debate-style back-and-forth that you usually see in Dialogues.)

I think we’re starting with intelligence enhancement, correct?

Intelligence Enhancement

trevor

Yes, you recently did a post about that to aggregate people’s findings.

With Genesmith and kman's recent post opening the Overton window on adult human gene editing, and Richard Ngo's post on Moloch being as much about inadequate technology/understanding as about coordination failures, I feel like people here have a better sense of what historically unprecedented things are possible in a transforming world. That includes things like upgrading the AI safety community, or the AI safety community leading the world on human upgrading in general.

Also recently, Yudkowsky wrote:

I currently guess that a research community of non-upgraded alignment researchers with a hundred years to work, picks out a plausible-sounding non-solution and kills everyone at the end of the hundred years.

I’m much more focused on intelligence enhancement for the purpose of making the AI safety community better positioned to save the world, e.g. intelligence-augmented people researching and facilitating AI policy or an AI pause, rather than doing technical alignment.

But either way, the value here is obviously intense. And we can get impressive results, quick, and keep them too. The 2020s will probably move faster, even without slow takeoff or AI transformations. And for intelligence amplification, the AI safety community is one of the world’s best candidates to figure things out as we go along and be the first to ride each wave. We’re exactly the kind of people to reap the benefits as they come, no matter how strange, and use them to make things happen on our terms.

NicholasKross

Yeah, intelligence enhancement would help for both “solving highly difficult technical/​mathematical problems” and “doing politicking/​coordination well enough to make real governance progress”.

I’m excited about different enhancement techniques, but with a few caveats:

  • I usually specify “adult human intelligence enhancement”. Children’s intelligence is easy to augment (e.g. iodized salt), and through embryo selection you can start even earlier. But AI timelines seem too short to raise a whole generation of great rationalists and then push them into solving everything in the remaining time.

  • Most intelligence-enhancement proposals for alignment (like >80% of cyborgism discussions) are basically “use the LLM to do stuff”. Which (1) could exacerbate the capabilities-increasing trend for LLMs, and (2) is nowhere near the same league as genetic enhancements and brain computers.

    If the problem is hard enough to demand geniuses, LLM usefulness shades into capability dangerousness, at an enhancement level well below the harder options.

  • Most techniques that people actually try, don’t work. Stimulants work, tDCS probably doesn’t work, and everything else is expensive and restricted to medical trials (or hasn’t even gotten to that point yet!).

I can go into more depth about any of these.

trevor

LLMs seem to be pretty great for learning, particularly maths. But my thinking about LLMs for intelligence augmentation is that, over time, they will incrementally open up surprisingly deep access to your mind, beliefs, and values.

Using LLMs intimately enough for intelligence enhancement would require quite a bit of trust in the provider, and there are many points of failure that are easy for attackers to exploit, e.g. chip backdoors in the H100s or in the operating systems. Open source also doesn’t do enough to address this.

I might go into this later if it comes up. For now let’s focus on the things people are sleeping on.

NicholasKross

Yeah!

  • Ultrasound neuromodulation is probably my favorite enhancement proposal so far, due to its power and noninvasiveness. But unless an Oculus-style multiple-OOMs cost reduction happens, my guess is nobody will get to tinker with it in their garage and make it useful for alignment particularly soon. (And Oculus got to ride the smartphone revolution in small cheap decent consumer electronics/​displays, and was invented by a smartphone repairman. In contrast, “normal” ultrasound machines are still in the $25k range. This is still cheaper than the machines used for neuromodulation, but damn.)

    I could also get surprised by the pace of innovation here. I mean, I got a tDCS machine off Amazon for <$150. But again, the broad pattern is “If I can buy it, it won’t work, unless it’s a stimulant. If it’s likely to work, it’s not on a fast-track (that I know of) to be used in alignment”.

    Hardware hackers: this is your chance!

  • Neuralink/​BCIs: I have pretty reasonable apprehension towards putting a Neuralink-brand device in my skull. An open-source alternative could maybe emerge (somehow?), but then that still requires surgery to implant.

  • Cooler-than-normal LLM/​foundation-model uses: I’ve seen a lot of these. Some of them are good, some of them are lame, and some of them require more alignment-y breakthroughs to even start working well. And they’re all debatably exfohazards, so I won’t signal-boost them here.

  • Stimulants: Helpful for a lot of people, including me. (None of this dialogue is medical advice, or other professional advice.)

trevor

Yeah, all of these have incredible startup potential (that is, if bay area VCs ever fund anything adjacent to AI safety ever again, lol).

Hardware hackers: this is your chance!

If anyone here is a hardware hacker, or thinking of becoming one, I think you’re completely clueless about how important you might become five years from now. The hardware rabbit hole probably runs way deeper than you’re aware of.

I’ve been a big fan of neurofeedback for a while now. Ultrasound modulation looks really interesting, but I’ve mostly looked at stuff substantially cheaper than fMRI, because most of my research has focused on massive sample sizes (a.k.a. big data): millions of people, usually, many of them clustering around 100 IQ and other forms of central tendency.

The Chinese government actually probably set up substantial research into electroencephalography (EEG) to find viable human biodata, using “brain scanning” hats on potentially many thousands of workers; this allegedly started more than 5 years ago and probably featured substantial military involvement as well.

But they almost certainly weren’t using that data for intelligence enhancement; not at the level of researching the effects of the Sequences or the CFAR handbook or Tuning Cognitive Strategies, anyway. Their types are more interested in researching stuff like lie detection (as is the US military, which apparently started working on fMRI machines for lie detectors more than 10 years ago).

Although, it’s a fair warning for people who think the field is completely uncharted territory. Unless this is all disinformation, to use the possibility of functional lie detection to panic enemy intelligence agencies or whatever, large-scale brain imaging probably isn’t untouched. The Department of Defense is invested too, according to Constantin’s subsequent post on the major players in the neuromodulation scene:

Attune Neuro, per their website, is a clinical-stage medical device company funded by, among others, the US Special Operations Command and the US Air Force (!)

They have $3.75M in funding, were founded in 2019, and are based in Menlo Park, CA.

They got an SBIR grant to build wearable devices with steerable ultrasound arrays to target the thalamus and develop a (nonaddictive) sleep aid protocol to help with the insomnia that accompanies recovery from opiate addiction.

A big issue with this is that MIRI are the ones who want human intelligence amplification, but EA/​OP are the ones with the money needed to actually get in on this tech, and the EA AI orgs like ARC seem pretty interested in using slow takeoff itself for alignment, rather than using human intelligence amplification as an alternative slow takeoff like what MIRI wants. SBF probably would have put a ton of money into it; the expected value is pretty high for not-betting-everything-on-the-prediction-that-slow-takeoff-will-go-OK.

According to Sarah’s post, most of the ecosystem seems focused on basically worthless mental health treatments. I can see Dustin/Jaan/Vitalik/Altman/Graham/Elon investing in these on a for-profit basis, since they’re sane enough to invest in intelligence amplification’s practical applications, rather than easy healthcare money.

Basically, if all the investors in the space are thinking about neuromodulation for healthcare and none of them are thinking about neuromodulation for intelligence amplification, then neuromodulation intelligence amplification is basically an untouched market (although most of the existing startups, with the actual machinery, probably all already have stronger ties to their healthcare-oriented early investors).

I’m actually bullish on Neuralink, even though brain implants are an obvious hack/​dystopia risk, entirely because it’s Elon Musk making it, which means that it will probably have a real off-switch that actually cuts the power to the OS and the SoC sensors, unlike the fake off-switches that all the smartphone and computer and IoT companies love to put in all their products for some mysterious reason.

Not that Neuralink is safe. Persuading people to want to keep it on is actually a trivial engineering task; Tristan Harris’s Social Dilemma documentary does a great job explaining how this works. But if Neuralink gets a real, not-fake off switch, it’s no more or less dangerous than a neuromodulation headset.

I’ve heard about flashing lights or beeping sounds that notify you whenever your brain state gets closer to or further from a specific high-value state, in order to guide you towards that brain state or that brain state’s prerequisites, e.g. guiding you towards moments of inspiration and making them last longer, or detecting brain states known to indicate that a cognitive bias just happened and beeping to warn you. Let’s start with brain implants/interfaces first.

NicholasKross

Oh yeah, I’d somehow forgotten about neurofeedback! That last paragraph is a good idea. I bet some interesting stuff could be done combining ML with brain data. A small taste would be those studies that train NNs to reconstruct your mental images, a crude “mind-reading” setup that could probably be modified and extended to aid learning.

trevor

Multimodality helps with this, right? E.g. using eyetracking to find exactly what words your eyes are looking at at any given time on the screen, how fast your eyes move, or other kinds of movement changes caused by reading different kinds of concepts, and then combining the eyetracking data with the language model and with the brain sensors/imaging as three neural network inputs that each look at the human mind from a different angle?
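Something like this minimal sketch, maybe (nothing here exists yet; every module name, dimension, and the “did the reader follow this passage?” target below is made up purely for illustration):

```python
# A toy sketch (not a working system): three made-up encoders, one per modality,
# whose outputs are concatenated and fused. All dimensions and names are invented.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Maps one modality's feature vector into a shared embedding space."""
    def __init__(self, in_dim: int, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

    def forward(self, x):
        return self.net(x)

class TriModalFusion(nn.Module):
    """Fuses eyetracking, language-model text embedding, and brain-sensor features."""
    def __init__(self, eye_dim=16, text_dim=768, brain_dim=256):
        super().__init__()
        self.eye = ModalityEncoder(eye_dim)      # gaze position, saccade speed, etc.
        self.text = ModalityEncoder(text_dim)    # embedding of the words being read
        self.brain = ModalityEncoder(brain_dim)  # EEG/ultrasound-derived features
        self.head = nn.Sequential(nn.Linear(64 * 3, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, eye_feats, text_emb, brain_feats):
        fused = torch.cat([self.eye(eye_feats), self.text(text_emb), self.brain(brain_feats)], dim=-1)
        return self.head(fused)  # e.g. a guess at "did the reader follow this passage?"

# Example with random stand-in data for one reading moment:
model = TriModalFusion()
score = model(torch.randn(1, 16), torch.randn(1, 768), torch.randn(1, 256))
```

The hard parts are everything the sketch leaves out: collecting synchronized eyetracking/text/brain data, and deciding what the fused signal should actually predict.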

NicholasKross

I’d imagine it’d help, yes. Though imho such technologies need to have good UX in mind, which is extra-important when the medium is potentially your whole mind. E.g. you don’t want your deep thinking interrupted by 20 modalities of distracting HUD elements.

There are so many somewhat-promising angles… lots of potential for N=1 tinkering, for anyone with the time and equipment. If even 10 people were doing this as a serious hardware-hacking-type hobby, I’d be surprised if they didn’t discover something legitimately useful to alignment/​governance people, even if it was something “relatively” minor.

trevor

Woah, I just read this section from that article you sent about ultrasound:

it’s a huge opportunity for research. If you can “turn up” or “turn down” any brain region at will, safely enough to mess around with it on healthy human subjects, you can develop a functional atlas of the brain, where you find out exactly what each part is doing.

That absolutely blew my mind. It uses the same causality analysis as Zack M Davis’s Optimized Propaganda with Bayesian Networks, except by sensing and modulating activity in brain tissue, rather than surveys given to large numbers of people, but both of them run interpretability on the human brain. Neuromodulation is bottlenecked on safety trials, of course, so it might take too long before it’s valuable for AI safety.
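To gesture at the idea in code: here’s a toy with a made-up linear “brain” of six regions and hidden connectivity, where stimulating one region and comparing average activity against baseline recovers a crude influence map. None of the numbers or dynamics are meant to be realistic, and this isn’t what the ultrasound work actually does; it’s only meant to show why intervention data is so much more informative than passively watching correlations.

```python
# Toy illustration only: intervene on one "region" of a made-up linear system
# and watch how the others shift, recovering a crude causal influence map.
import numpy as np

rng = np.random.default_rng(0)
n_regions = 6
# Hidden ground-truth connectivity (sparse, random, rescaled to keep dynamics stable).
W_true = rng.normal(0, 0.5, (n_regions, n_regions)) * (rng.random((n_regions, n_regions)) < 0.3)
W_true *= 0.8 / max(np.max(np.abs(np.linalg.eigvals(W_true))), 1e-9)

def mean_activity(stim_region=None, stim_strength=3.0, steps=2000):
    """Average activity under noisy linear dynamics, optionally 'turning up' one region."""
    x = np.zeros(n_regions)
    total = np.zeros(n_regions)
    for _ in range(steps):
        drive = rng.normal(0, 1, n_regions)
        if stim_region is not None:
            drive[stim_region] += stim_strength  # the intervention
        x = W_true @ x + drive
        total += x
    return total / steps

baseline = mean_activity()
# Row i = how every region's average activity shifts when region i is stimulated.
influence = np.array([mean_activity(stim_region=i) - baseline for i in range(n_regions)])
print(np.round(influence, 2))
```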

Maybe it would be easier to do it safely if it used nonlinear acoustics to resonate with skull bones, instead of at the frequency that affects neural activity. That way, you aren’t messing with the neural activity at all; you’re only vibrating the inner skull. Just have it automatically turn the vibration down as the person approaches the right direction, so that thinking in beneficial directions feels more comfortable.

Either way, you could have smart people doing SquirrelInHell’s Tuning in an fMRI machine, or having moments of insight in the machine. If you map the brain well enough at key moments via the fMRI machines, you could even find out how to suppress the parts of the brain that inhibit cognitive strategy tuning, or how to halt the feedback loops that prevent or cut short moments of insight.

such technologies need to have good UX in mind, which is extra-important when the medium is potentially your whole mind. E.g. you don’t want your deep thinking interrupted by 20 modalities of distracting HUD elements.

Recent iPhones’ “dynamic island” is great for this. They also have variable refresh rates, and if you oscillate those at different times, that would also be a safe stimulus, e.g. causing discomfort and ratcheting down the discomfort as the person approaches the correct brain state. You can stack weaker versions of these stimuli together, to create multi-source discomfort that eases as they move in the correct direction. Maybe also discomfort from barely-audible or inaudible high- or low-frequency sounds.

That’s a lot of sources of relief, possibly stackable. Plenty of safe ways to deploy the positive reinforcement at precisely the right moments, no need for any heroin or juice reward.

Becomes:

[GIF: a monkey engrossed in a smartphone]

Frankly, unhelpful/​wasteful brain states should be uncomfortable, and helpful/​gainful brain states should be the comfortable ones.

We just have to set up the positive and negative reinforcements correctly, instead of cluelessly ignoring them and getting reinforced in random directions, like our hunter-gatherer ancestors.

Figuring things out as we go along won’t be easy; tearing down these Chesterton-Schelling fences will throw all sorts of bizarre curve balls at us. But we’re already starting out with some of the best people for the job, and as we go along we’ll be creating even better people.

NicholasKross

People have slept on these, I think, because they see the lack of large progress from most nootropics, and also because tDCS and similar devices are, at the consumer level, approaching scam territory. Hopefully we’ve gotten across how promising intelligence enhancement in general (and neurological interventions in particular) can be, with little-used techniques and some creativity.

I’m ready to move on to another topic in our list, what do you think?

trevor

Yes. I can definitely imagine people from the rationalist community, including the people currently doing alignment research, spending, like, an hour a week or half an hour per day thinking of galaxy-brained ways to apply advancements like these towards human intelligence amplification.

I heard rumors that some of the more extreme people in EA were talking about donating kidneys and volunteering in early Covid vaccine trials. They should instead be volunteering their brains for human safety trials, developing a smart, neurodiverse, quant-skilled, creative, and insightful community of practice as envisioned by Constantin so that we can get a rich understanding of the tech’s potential before it gets tried on people like Yudkowsky, Christiano, Sutskever, or Wentworth. That is what being hardcore and saving-the-world-for-real looks like, especially for people who aren’t satisfied with their current impact level.

According to Constantin on neuromodulation communities of practice:

Requirements/​desiderata for a community of practice around neuromodulation:

  • You need to first have a reliable device and a reasonable degree of confidence that it won’t cause acute brain injury.

    • since we’re not really there yet on ultrasound, it may make sense to start building the self-experimentation practices around a less-powerful but better-characterized technology like tDCS, which I’d guess has no “real” effects but is safe enough that it’s sold as a consumer product.

  • You want as much domain expertise as possible in your founding research group — medicine, neuroscience, psychology, etc.

  • Ideally you’d also want people who are highly sensitive and articulate about their perceptual, mental, emotional and intersubjective experiences. Artists, meditators, athletes, therapists — people who’d be able to spontaneously notice subtle things like “I’m able to discuss sensitive topics with less defensiveness” or “My peripheral vision has gotten better” or “I find myself focusing totally on the present moment”.

  • You might want quantitative measurement modalities (fMRI, EEG, heart rate monitors, etc) but of course there’s a tradeoff between what you can measure in a lab on immobilized isolated subjects and what you can bring to a “realistic” environment like a home (where people can move around, interact with each other, etc.)

  • And you’d want to work towards developing an (initially informal) collection of things to try, “experiments” to run, aiming primarily for common-sense validity and helpfulness. You don’t need a questionnaire or rating scale to tell that caffeine makes you feel more alert — but you might want to actually measure reaction time rather than trusting your subjective impression that you’re “faster.”

The main goal here is not to substitute informal trial-and-error for “real science” but to use it to generate hypotheses that are worth testing more rigorously...

Of course, informal self-experimentation is controversial, and may not fit into standard research/​funding models, but it really seems necessary for such an open-ended search.

We’re probably among the people best positioned to lead on this. I doubt that any of the startups on Constantin’s list are anywhere near the level of Tuning your Cognitive Strategies, and Raemon’s trial group is already pushing the edges of that field, and that whole paradigm is basically zero tech and always has been.

It makes me think of the closing note on Yudkowsky’s List of Lethalities:

This situation you see when you look around you is not what a surviving world looks like. The worlds of humanity that survive have plans… started trying to solve their important lethal problems earlier than this. Half the people going into string theory shifted into AI alignment instead and made real progress there...

A lot of those better worlds will die anyways. It’s a genuinely difficult problem, to solve something like that on your first try.

trevor

So to what extent is AI safety disproportionately opposed to nootropics?

I get the impression that some people are thinking “this didn’t work, it’s time for people to let go” and other people are like “deferring to social reality is a bad heuristic, and pills could obviously do something like this, therefore I will keep going”.

But I didn’t ask anyone in DC or while I was living in Berkeley.

NicholasKross

My impression (again, not medical advice) is that nootropics mostly “work” when someone has a deficiency/​condition that makes them useful. Like how vegans need vitamin B12 supplementation, some people’s bodies might need a nootropic. (Creatine apparently works better for vegans, along these lines?)

They should instead be volunteering their brains for human safety trials, developing a smart, neurodiverse, quant-skilled, creative, and insightful community of practice as envisioned by Constantin so that we can get a rich understanding of the tech’s potential before we try it on people like Yudkowsky, Christiano, Sutskever, or Wentworth. That is what being hardcore and saving-the-world-for-real looks like, especially for people who aren’t satisfied with their current impact level.

I’ve definitely considered volunteering for this, and would probably do so in any intelligence-enhancement trials. But I’m probably an outlier here.

  • Early trials are likely to be for people with e.g. Alzheimer’s, not enhancement trials for healthy people.

  • Encouraging “just” kidney donations, alone, has given mixed results. It’d be pretty horrific to put lots of social pressure on people to experiment with their brains, outside of just “hey this option exists” or “I did this and here’s how it went”.

But Constantin’s post definitely gives a good idea of what such experimentation could look like, after neuromodulation gets safer and better-understood.

trevor

Well, one way or another, if human intelligence amplification takes off among silicon valley VCs (ideally giving them and the military an outlet other than AI) then hopefully the nootropic problem will get figured out on its own.

NicholasKross

Yeah. And/​or gets obsoleted by neuro tech.

Megaprojects

trevor

A while ago you wrote Alignment Megaprojects: You’re Not Even Trying to Have Ideas. What megaproject ideas are you most excited about now, especially in the context of a transformative world? We’re taking into account AI timelines, Cyborgism and other applications and automations from LLMs, successes in some of the intelligence amplification stuff we’ve talked about, and community health and power games we’ve seen in and around EA and AI labs over the last 3 years. Even global upheavals like COVID.

NicholasKross

I came up with a couple:

  • “Formalization Office”: Subcontract out to math grad students (from inside and outside the existing alignment field) to formalize, go in-depth on, formally prove/disprove, and check the work of alignment researchers. They would probably link their findings in comments on relevant posts. Basically “roving outside-ish peer review that doesn’t have actual power over you”.

  • “Mathopedia”: A multimodal-pedagogy wiki for mathematics, geared towards formal/agent-foundations-type alignment research. Lately I’ve been thinking of how to make it an “armory of mathematical tactics”, like “This MATS math problem mentions X Y Z, therefore we should use tools from the field of S, and in particular T and U.”

Note that neither of these are “mega” in terms of “require lots of funding”, although they both seem relatively easy to scale up with more money: simply pay more people to do more of the thing. The Formalization Office can check more of the field’s work quicker by contracting more math grads, and Mathopedia can cover more fields of mathematics and create more types of resources.

More-centrally-”mega” megaprojects have of course been proposed: Emulating the brains of top alignment researchers, developing the enhancement technologies we discussed above, and many more I’m forgetting. Some of these are bigger “slam-dunks” than the others, but any of these (if successful) could move the needle on P(doom).

trevor

All sorts of milestones might surprise us by being closer than we think. Like the unpredictability of AI capabilities advancements, but each nugget is good news instead of bad news.

I like Mathopedia. I think we can get way better at skilling people up, so that our ability to pump out quantitative thinkers is based more on their inherent ability to intelligently/creatively deploy math, rather than their inherent pace and comfort with learning the math for the first time. The process of learning math has generally been pretty needlessly soul-killing in our civilization, and I don’t think people realize how big it would be if we fixed that and went in a historically unprecedented direction.

The term for “spamming examples and explanations to streamline math learning” is “intuition flooding”, as referenced here. You can brute-force concepts into yourself or someone else with helpful examples: if you see enough examples, the underlying patterns will either jump out at you, or be wordlessly/implicitly/subconsciously taken into account. More examples mean more patterns fitting together, and lower risk that you miss one of the key patterns that’s central to understanding the whole thing.

It has a ton of potential to increase the rate of learning by an order of magnitude, even if mainstream education started capitalizing on it a while ago. Especially if people like @Valentine are called upon to return from their cold sleep because the world needs them.

I think that AI safety can heavily boost the quant skills of most members. Someday, when the descendants of humanity have spread from star to star, they won’t tell the children about the history of Ancient Earth until they’re old enough to bear it, and when they learn, they’ll weep to hear that such a thing as “math trauma” had still existed at our tech level.

I think we have the drive, we have the capability, we have the pre-existing talent pools, and we have the ability to capitalize on that. We can convert the last of our smart thinkers into smart quant thinkers, and derive more than enough value from them to justify the process. Expanding the Bayesian Conspiracy, maybe there’s even a critical mass that we don’t know about because we haven’t gotten close yet.

NicholasKross

(Exact-wording nitpick: Logic and maths, themselves, still seem to involve a lot of things that would look from afar like “wordcel” skills.)

But broadly I agree: Our community not only has lots of quantitative thinkers (which you can find in other “nerdy” communities), but itself brings unusually quantitative approaches to things, from Bayes’ Rule itself to prediction markets.

The trick now is to extend from “normal-level quantitative thinking” to “advanced mathematical thinking”.

I frame this as going from “Scott-Alexander-level” maths familiarity (statistics, probability, birds-eye-view knowledge of linear algebra, things that 3Blue1Brown makes videos about) to “maths-PhD-level” familiarity (real analysis, topology, category theory, most fields of mathematics at the research level, and not just sort-of knowing what these are but being able to fluently work with them at the cutting edge).

trevor

Absolutely. Such a huge problem with prediction markets right now is the terrible wheat-to-chaff ratio. I think that prediction markets need a critical mass of talent before they can become incredible, and that we might be really far from reaching that critical mass.

I can’t wait to see what else we get if there are other critical masses that we have yet to reach; maybe there are other innovations on the same level as prediction markets themselves, and we just don’t have enough talent to see them coming.

Just imagine where we would be by now if, 10 years ago, we had five times as many people in EA and the AI safety community who were at or near the “maths-PhD-level”. We might have been doing solid research on human uploads.

dath ilan

trevor

What about the Megaproject of “moving most AI safety people to the Bay Area”? Getting everyone in one place seems like a good idea, similar to dath ilan. More in-person conversations, different kinds of thinkers working on the same whiteboard, etc.

Bay Area people have been setting up group houses, buying the Rose Garden Inn and conference venues. There are obvious problems, e.g. various social dynamics, but the groundwork (the land) has been laid.

My thinking is that if more people just realized “wow, billions of people are going to die, my family and friends are going to die”, they’d be more comfortable with Spartan lifestyles, especially with galaxy-brained ways to maximize the number of people working in one place (e.g. these capsule hotels give a sense of what can be discovered with sufficiently outside-the-box thinking). That lowers the cost of people visiting for just a month, which provides intense value for things like sourcing talent for specific projects:

[Image: capsule hotels in Tokyo, via Tokyo Cheapo]
Portable?

The global situation with AI really is plenty harsh enough to justify outside-the-box solutions like this; compare this joke tweet from Yudkowsky:

my friends who are parents don’t seem very impressed with my latest theory of childrearing, that you should start out your kids in a giant dirt pit under regular attack by robotic velociraptors… so that afterwards ordinary Earth standards of living will seem to them like supreme luxury because their baseline brain expectations will have been set very low

Maybe not everyone is OK with the EA ascetic/extremophile lifestyle, but spending a year like that (while your talent is being evaluated) isn’t too harsh; it’s a great way to increase the talent per square foot, and a great way of filtering for people who are serious about saving the world. Those are the things that have worked for me, anyway; there are tons of options to mitigate the harm caused by San Francisco’s layout and prices, including building a space-optimized city within a city.

Better options for people to physically move to a better space also seem like an important part of the solution to a really interesting talent-sourcing problem raised by Habryka about a week ago:

The world is adversarial in the sense that if you are smart and competent, there are large numbers of people and institutions optimizing to get you to do things that are advantageous to them, ignoring your personal interests.

Most smart people “get got” and end up orienting their lives around some random thing they don’t even care about that much, because they’ve gotten their OODA loop captured by some social environment that makes it hard for them to understand what is going on or learn much about what they actually want to do with their lives.

I think that being anchored to a specific space, e.g. a house and community in Michigan or Miami, is exactly the kind of thing that would suppress people’s ability to think about the global situation or decide what they want to do about it, same as a high-paying and interesting job at Facebook or Wall Street or maybe even OpenAI. A sleepy boring life is an attractor state.

NicholasKross

This sparked kind of a tangent, but I think it’s an important point for AI safety:

A high percent of people in rationality/​EA/​AI alignment have mental and emotional problems.

We should not actually force autistic and ADHD people to share bathrooms in order to save the world. We should instead cultivate and treat researchers with methods used for top quant traders, top Google programmers, and top basketball players.

We want more people getting involved in alignment research, and we should not want to add arbitrary roadblocks, even if (as a completely made-up position that hopefully nobody has) we justify it with “oh, we need costly signaling to avoid grifters”.

It’s good for Newton to learn and rederive lots of the advanced math of his time period. It’s good for Einstein to be “agenty” enough to think strategically about which lines of research are more promising.

It’s bad for Newton to have to squeeze his own ink and chop down his own trees to make notebooks. It’s bad to require Einstein to be the project manager of the telescope/observatory whose observations are used to test relativity.

The field doesn’t get to skip funding “the weirdos”. You don’t actually get to leap from “Yudkowsky-type folks care about being ‘agentic’” to “alignment researchers need lots of executive function or we won’t fund them”.

It’s like we (correctly) noticed “hey, our community is bad at executive-function-type things because we have individual-level executive-function issues”, and then (incorrectly) deduced “needle-moving scientists will be high-executive-functioning people (rather than the semi-overspecialized geniuses who debatably make most breakthroughs in general)”, which then tempts us towards “Why should we have to recruit people? Or train them, for that matter? If they’re smart/​high-executive-function enough, they’ll find their way here”.

(It’s one thing if your timelines actually are too short for that to work, but if you have $20M to spend, you should hedge the timeline by spending some percent of your budget on sharpen-the-saw activities.)

Optimally, the community’s “story” would be “Yudkowsky expended lots of agenty-ness/​executive function in the early 2000s to create and fundraise for MIRI, then MIRI trained and funded an ecosystem of good work”. Instead, the story is “that first step happened, then MIRI almost exclusively did in-house things very slowly for years, then OpenAI et al came along, and now most alignment funding is going towards things that the big lab companies already want to do”. We could talk about hindsight (as with any past decision anyone’s ever made) and beliefs-about-scaling, but that’s kind of where my “hedging” point comes in above.

(I have more thoughts on how people get confused on how this works, w.r.t. scientific breakthroughs, but I haven’t yet finished reading and processing Originality vs. Correctness, which is (so far) a useful summary of people’s competing confused views.)

trevor

Options are still good, and what I suggested is highly valuable as an option for people to take, but it’s also important to keep Moloch in mind; we wouldn’t want the equilibria to shift far enough towards the spartan lifestyle that people end up pressured to go that route (e.g. because it’s cheaper and therefore scales better), especially if the spartan lifestyle is harmful or abhorrent to our best thinkers.

I still think that, as a rule of thumb, it makes sense to look at ourselves as probably less than ~20-30 years out from the velociraptors. People should look at themselves and their place in the world through multiple lenses, and the “doomer” lens should be one of them based on the calculated probability of it being true, not how weird it looks to others. I think creating a little slice of dath ilan is a sensible response to that.

But it’s also the case that under extreme circumstances, damaging people’s mental health turns them into a risk to themselves and the people around them, and also to AI safety as a whole. And there are also journalists at play, always looking for material, but I think people will find clever ways to improve the aesthetic of these ascetics.

We probably disagree on whether it ought to be ~4% or ~40% who do this; I think we could do a separate dialogue debating Spartanism later. You’ve convinced me that it shouldn’t be above ~40%.

“Why should we have to recruit people? Or train them, for that matter? If they’re smart/​high-executive-function enough, they’ll find their way here”.

I totally agree. This was bad practice; we could have gotten more people, and better people (by whatever definition of value, e.g. cooperative capabilities). We could have gone faster; we could have had so much more to work with by now.

Figuring out things as we go along is just critical to the process; my thinking is that accelerating the intelligence, coordination, and scale of the AI safety community will also produce people prepared to find galaxy-brained ways to figure out things as they go along. It’s not easy; tearing down Chesterton-Schelling fences has unpredictable results, and as primates, humans are predisposed to concealing problems as they emerge. But great posts like Zvi’s Immoral Mazes and Simler’s Social Status will make it easier to anticipate and mitigate the growing pains.

On the topic of executive functioning, I also think that’s a huge area of improvement for AI safety. Human beings are generally meandering about, aimlessly, and we could get way more out of the AI safety community by increasing everyone’s daily production (via the multiplier effect, since people are contributing to each other’s work).

There’s the obvious stuff: everyone reads the CFAR handbook, and seriously attempts the techniques inside (even if 50% of them don’t work). There are other instruction manuals for the human brain, like Tuning Cognitive Strategies and Raemon’s Feedbackloop-first rationality. There’s reading the Sequences, because being wrong about something important is an absolutely crippling failure mode, and humans are incredibly predisposed to becoming and remaining wrong about something important for like 10 years.

But I’d argue that a really big part of it is that people are dithering about and not spending their days thinking.

People should not have “shower thoughts”. Or rather, they shouldn’t be noticeable or distinct. In the 90s and 2000s, people would zone out and have “shower thoughts” while reading books or the extropy email list, and sometimes even while watching TV. Social media and other commercial entertainment both inhibit independent thought and worm their way into every second of free time you have; the addiction is optimized to make people return to the platform, not stay on it, which means the algorithms will be finding weird ways (e.g. Skinner-box dynamics) to optimize for pulling you back.

I think this is actually a huge problem, especially for our best thinkers. People need to already be spending 2+ hours a day distraction-free, in order to see if the results from things like hardcore meditation and the CFAR handbook are actually coming from cognitive enhancement. It’s possible that a funky meditation technique is merely preventing media use, forcing you to make the correct decision to be alone with your thoughts, the way long showers or driving do.

I also think this might be a big reason why CFAR often didn’t get the results they were hoping for; people need to change their routines from their previous suboptimal states in order to get better at thinking, and the current media paradigm is heavily incentivized to combine AI with massive amounts of behavior data to optimize for stabilizing people’s daily routines, if nothing else to control for variables in order to improve data quality.

Like, if a martian came to Earth and saw someone scrolling on their phone and said:

Martian: “how do you not notice the danger!? It’s so obvious that scrolling news feeds are an optimal environment for hacking the human brain!”
Human: “I don’t know, my brain sort of turns off and I lose self-awareness while I do it”
Martian: “That is unbelievably suspicious. Shouldn’t that alone have been enough to clue you in that something’s going terribly wrong?”
Human: “Oh come on. All my friends are doing it. Everyone is doing it! I need this in order to know what’s popular! Only [low-status people] complain about it.”

At which point the martian flips the table and leaves.

Many of our best people will be spending most of their thinking time scrolling social media, or other hyperoptimized attention grabbers, because those are attractor states. And if the harms of instant gratification are real, then this will also be immensely harming their ability to think at all. Either way, most of their best thoughts will be in the shower or in the car.

Social media detox is hard because the algorithm may or may not be arriving at Skinner-box-like strategies that maximize the feeling of emptiness when you aren’t using social media. It’s an adversarial environment, so I think that trying to wean yourself off is initiating a battle that you’re not gonna win: the system doesn’t just understand your mind in ways you don’t, it understands many people’s minds in ways they don’t, it has experience winning against them, and it knows how to find the ways you’re similar to and different from those other cases.

But one way or another, if you stop having “shower thoughts” when you’re in the shower or driving, that’s how you know you’ve done enough. This seems like one of the biggest ways to uplift the AI safety community.

When we’re sourcing new talent, the 18-22 year olds, we’re the only ones who have a purpose sufficiently worthy to actually persuade them to spend time thinking instead of on entertainment media. Nobody else sourcing 18-22 year olds can do that, except the military.

It’s a big lifestyle change, but getting smarter clearly pays for itself, when you’re part of something like AI safety that’s both coherent and that actually matters. And frankly, it’s nothing compared to the velociraptors.

Is that all we’ve got about megaprojects for now?

NicholasKross

Sure, let’s move to the next topic.

Adaptability

“In times of change, learners inherit the earth, while the learned find themselves beautifully equipped to deal with a world that no longer exists”

—Eric Hoffer

trevor

Alright. A lot of people are talking about utilizing slow takeoff to help solve alignment. Looking closely at the current situation with social media and the use of AI for information warfare has really given me a sense of perspective about what’s possible during slow takeoff, though more on the AI governance angle, and on AI transforming society in general.

The last four years were all over the place: COVID, and a new Cold-War-like paradigm in international relations. ChatGPT hasn’t changed much yet, but I don’t see that holding, especially with open-source LLMs. And, worst of all, AI acceleration and turmoil within AI labs can suddenly burn a portion of the remaining timeline, by hard-to-pin-down amounts.

My thinking is that we need people better at adapting to new circumstances as they emerge, muddling through those terrible things that arise due to increasing optimization power arriving on earth, being flexible but also intelligent enough to find galaxy-brained solutions to otherwise-insurmountable problems.

If AI applications allow big tech companies like Microsoft or governments to expand persuasion tech substantially beyond social media, for example, how do we roll with the punches if the punches use hyperoptimized psychology to attack us at the very core of our being?

Reducing cognitive and emotional predictability hardens the AI safety community against that specific threat, but getting better at pumping out dozens of galaxy-brained solutions per day will do a much better job of reducing predictability, and also increasing capabilities and resilience in all kinds of ways. Like with the growing pains, except external threats instead of internal threats.

That will boost the AI safety community’s ability to adapt and thrive and ride the wave of all the other terrible things that emerge during slow takeoff; the new human-manipulation paradigm that works on anyone is just one example.

Katja Grace characterized accelerating AI as a “fire hose of new cognitive labor, and it’s sort of unclear where it will go”. And I think that’s a really good way to put it. All sorts of things can end up being pumped out of this.

I’m thinking that there will be 0-2 more “black balls” just like human manipulation tech, that are also crushing, but in completely different ways, and that we must nonetheless survive and thrive through.

I think that current-gen human manipulation tech is a fantastic example of just how far out the curve balls will be, intensely disorienting to us, and of how traditional power often pivots around them. But maybe human manipulation tech is an unusually severe/extreme case, and slow takeoff won’t give us any more “black balls” that are as extremely terrible as manipulation tech.

That’s not where I’m betting, though. Take lie detection tech, which only has its negative reputation due to having been around for ~100 years in a civilization with no ML (they were named “polygraph” tests because they would literally generate multiple graphs, and a person had to eyeball them all at once!).

Uplifting the AI safety community, such that galaxy-brained solutions are pumped out at a sufficient pace to mitigate damage, and even capitalize on opportunities or anticipate threats before they happen, seems like the best way to resolve these challenges.

NicholasKross

getting better at pumping out dozens of galaxy-brained solutions per day

This sounds slightly insane as-written, what do you mean by this?

NicholasKross

(I broadly agree with the rest of this point, with the extra note that “accelerating alignment and governance” can itself help “decelerate” AI capabilities, as our community gets better at pointing out and working against unsafe AI development.)

trevor

Yeah, by “pumping out dozens of galaxy-brained solutions per day” I just mean people are better at thinking of solutions that work really well. Some of these solutions will be complicated, some will be elegant, some will be brutally effective, others will allow us to muddle through impossible-seeming situations and mitigate immense amounts of damage.

Smarter people will make us better at doing this. I think Anna Salamon’s Humans are not automatically strategic describes this incredibly well. What would a superintelligence do? In our current world, a big part of that is converting your mind’s ability to do abstract reasoning into actually getting up and taking the best possible actions, every hour and day (including deliberate rest, preventing burnout, and other things that cause positive reinforcement and prevent negative reinforcement).

In the recent words of Scott Alexander, a big problem with Effective Altruism and AI Safety today is:

ACTUALLY DO THESE THINGS! DON’T JUST WRITE ESSAYS SAYING THEY’RE “OBVIOUS” BUT THEN NOT DO THEM!

In the transformative world, the best thing to do might be different from agency and motivation hacks, maybe more about getting things right the first time under extreme circumstances, like when the protagonist in that scene had the foresight and mental agility to grab the drone at the first opportunity, even though something incredibly awful was happening to him just moments before. But we’ll need to be flexible and intelligent either way.

This might seem radical or extreme. But if we get the velociraptors in ~a decade or three, we’ll look back on our lives, and realize that the only deranged thing we did was not doing enough. It will be obvious, in hindsight, that the times we tried harder and made mistakes were far better than the time we spent barely trying at all.

Extracting much more value from the Sequences

trevor

What are your thoughts on increasing the readership rates of Yudkowsky’s Sequences, the CFAR handbook, Scott Alexander’s Codex, Project Lawful/HPMOR, Tuning Cognitive Strategies/Feedbackloop-first rationality, etc.? What’s the bottleneck preventing people from reading and growing from these, and what can we do about that bottleneck?

They offer so many unique ways to get better habits into your brain so you can go out and get what you want out of life, instead of being confused and constantly betraying yourself; just like getting groups of people into better Nash Equilibria where they aren’t constantly betraying each other.

Intelligence amplification technology is great, but a superintelligent AI could write a manual that uses true knowledge to show you how to exploit the full reaches of your mind, probably a couple orders of magnitude more effectively than the CFAR handbook.

And stuff like the CFAR handbook is not only something we already have, it’s something that the entire human race has barely bothered to try at, for the entirety of human history. We don’t know how big the gap is between the CFAR Handbook and a superintelligence-written true-instruction-manual-for-being-a-human-brain that lets us think optimally to maximize results.

That gap might be very small. Maybe we need ten smart people working on it instead of five; maybe five was more than enough, but they didn’t have SquirrelInHell’s Tuning Your Cognitive Strategies to work off of, and now they do. Maybe all five used social media and other entertainment media so frequently that they lost the ability to measure the effectiveness of independent thought. Everything in the AI safety community boils down to inadequacy and growing pains.

I’d argue that a big part of the value from intelligence augmentation tech just comes from making people better at applying the advice that’s already described by the Sequences, CFAR handbook, and the others.

I think a big part of it might be the lack of positive reinforcement; it’s basically work, e.g. like this thing Yudkowsky retweeted:

NicholasKross

Oooooh, I have a lot of thoughts on this! (Mainly from having a hobbyist interest in making YouTube videos and other art, which led into a longtime fixation with “How do pieces of media get popular?”)

A key mental model here is an “attention funnel”, much like the purchase “funnel” used by advertisers. Let’s run through the funnel with a YouTube video:

  1. You see a thumbnail and title.

  2. If interested, you click on the video and start watching it.

  3. If the video’s boring or grating or thereabouts, you stop watching the video (e.g. by closing the browser tab). Otherwise, you keep watching.

  4. If the video ever gets to be boring/​grating/​etc, you’re more likely to stop watching the video.

A similar thing happens with fanfiction:

  1. You see a title (or hear about it from a friend).

  2. If interested, you search/​click the fic, and maybe read the summary.

  3. If it’s interesting (compelling, holds your attention, etc), you keep reading. Otherwise you stop. Maybe you come back later, but we’re focusing on “your attention-driven decisions”.

HPMOR and other rationalist works are (to a certain kind of personality) highly compelling! They often have dramatic pacing, short chapters/​sections, and the things that subjectively go into “good writing” (even at the micro level of diverse sentence structures, or precision in wording). Because they’re stories (and/​or forceful polemics), they usually contain the emotional-loading that humans enjoy.

To get back to the central point: I think the bottleneck is the top of the funnel right now. Everyone’s heard of HP, but “only” HP fanfic nerds (and/​or existing rationalists) have heard of HPMOR. The Sequences and related works have even narrower funnel-tops.
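As a toy calculation of why the top of the funnel dominates (all of these rates are invented, purely to show how the stages multiply):

```python
# Invented numbers, purely to illustrate how funnel stages multiply.
top_of_funnel = 100_000   # people who see the title/thumbnail at all
click_rate    = 0.05      # fraction who start reading/watching
finish_rate   = 0.30      # fraction of starters who get through it
engage_rate   = 0.02      # fraction of finishers who go on to engage with alignment ideas

finishers = top_of_funnel * click_rate * finish_rate   # 1,500 people
engaged   = finishers * engage_rate                    # ~30 people

# Each rate below the top is capped at 1.0, so it has at most a few-x of headroom,
# while "people who have heard of this at all" can plausibly grow by 100x or more.
print(finishers, engaged)
```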

There are ways this could be done more or less efficiently/​directly, but RationalAnimations is a decent example of “basic AI alignment ideas can be presented compellingly-enough to reach lots of people”.

If you’re a “creative type” in the community… well, to whatever degree artistic realms can have Hamming Questions, you might enjoy exploring “What kinds of rationalist/​AI-alignment/​EA media could be made, that would reach a lot of people?”

The other bottleneck (and, under shorter timelines, probably the more important one), seems to be the “bottom of the attention funnel, but top of the AI alignment funnel”. Like, plenty of people have resigned themselves to being “words and stats people”, and think they can’t join “ML and/​or maths people”.

If I could magically get 20% more EAs into full-time AI alignment, I would! (Well, modulo caveats about what type of research they do.)

trevor

And if I could get everyone in AI alignment to spend 2 hours a week thinking about solutions for AI governance, that would be great too.

In response to the OpenAI conflict, Yudkowsky wrote:

The Twitterati are vastly overestimating how much focus I spend on individual AI companies. I currently see their collective as an inevitable downhill slide, whose leaders could not change anything even if they wanted to. It’s international treaties or bust.

And for me that was a pretty big update in favor of tapping into the alignment researcher’s talent base to solve the impossible-seeming global problems like US-China coordination on AI issues, which I’ve been researching for years without knowing that AI governance would become so significant relative to all other work areas.

Like, plenty of people have resigned themselves to being “words and stats people”, and think they can’t join “ML and/​or maths people”.

Yes, I totally agree on the maths issue. I think it goes both ways though; the impression I’ve gotten from spending years in DC is that the people here rarely have the quant skills to apply Bayesian thinking to AI governance at all. But it might make more sense to have the AI governance people get the alignment researchers in the Bay Area up to speed so that the alignment researchers can think up galaxy-brained solutions to the problems, rather than trying to boost the quant skills of the AI governance people so they can start playing catch-up with the people in the bay area who have been pioneering Bayesian thinking for decades.

[Tweet from James Medlock: “There was graffiti in a bathroom stall I frequented that said something like ‘specialization is for insects. I AM NOT AN ANT!’ I think it…”]

I think the bottleneck is the top of the funnel right now. Everyone’s heard of HP, but “only” HP fanfic nerds (and/​or existing rationalists) have heard of HPMOR. The Sequences and related works have even narrower funnel-tops.

There are ways this could be done more or less efficiently/​directly, but RationalAnimations is a decent example of “basic AI alignment ideas can be presented compellingly-enough to reach lots of people”.

I agree here. When I imagine my 2013 or 2014 self, before I had heard of AI safety, I can easily imagine myself watching a 10-minute video, thinking “jesus christ”, and bookmarking it. And then, later on, if I was the type of person to be pragmatic about making the world a better place, I’d be more likely to remember it again in a week or a month or so; I’d be like “wait, what the fuck was up with that?” and watch it again. Ideally, people would read the Sequences or WWOTF and there wouldn’t need to be any other onramp for AI safety, but that’s not the world we live in.

Maybe the podcasts that Yud and others are doing are a better version of that: better at filtering for smart, insightful people due to covering a lot of ground on interesting futurist topics, or better for intuition flooding, making the current situation intuitively understandable by approaching it from a ton of different angles, e.g. spending 10 minutes chatting about theory of mind, then another 10 on instrumental convergence, and then another 10 on AI race economics.

I was actually thinking more about increasing the Sequences and CFAR handbook reading rate within AI Safety, rather than getting the rest of the world on board. That’s also a huge issue though, and inextricably linked to getting the new talent required for the AI safety community to thrive at all.

I think that rereading the Sequences, for example, could have averted a lot of harm, a la this tweet (I have https://twitter.com/ESYudkowsky bookmarked and you should too):

I think that rereading the Sequences every 5-10 years, or even more frequently, makes sense for most people doing AI safety, but there’s a better way (and we should really be getting deep into the mindset of finding a better way).

I think that we should be reading the Sequences like the Bible. Specifically, reading various Yudkowsky and other great rationalist quotes every day, like Bible quotes. There’s tons of apps, websites, newsletters, and paper calendars that give you one bible quote per day.

There’s some pretty decent stuff in the Bible and those quotes were searched and found by the optimization pressure of lots of people looking. It works very well, for millions of non-priests. But Yudkowsky’s writings are far better for this than the Bible.

It would plausibly make sense to have some kind of feed that gives people random Yudkowsky quotes every day.

This is similar to reading the Sequences in random order, which I also highly recommend. If you look at entertainment media or social media after drinking coffee in the morning, you are stacking two strong addiction/behavior-reinforcing things; you should really substitute the media with reading a randomly selected Sequences post, so your brain associates that with the rush instead.

But aside from the badass EA ascetic extremophiles, most people want to do quick and easy things, hence using great rationalist quotes like Bible quotes. It’s a great way to start the day, or in the middle of the day.

There could be different quotes per person per day, in order to maximize the multiplier effect via people being boosted in unique ways and then talking with each other; of course, this requires a large enough portion of AI safety would do this, and for the AI safety community this is like herding cats.
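A minimal sketch of what that kind of feed could look like (the quotes file and the per-person seeding are hypothetical details, not an existing tool):

```python
# Hypothetical sketch: a deterministic "quote of the day", optionally varied per
# person so that different people get different prompts on the same day.
import datetime
import hashlib

def quote_of_the_day(quotes: list[str], user_id: str = "") -> str:
    """Same quote all day for a given person; different people can get different ones."""
    today = datetime.date.today().isoformat()
    digest = hashlib.sha256(f"{today}:{user_id}".encode()).hexdigest()
    return quotes[int(digest, 16) % len(quotes)]

# quotes.txt is a made-up file: one rationalist quote or Sequences excerpt per line.
with open("quotes.txt", encoding="utf-8") as f:
    quotes = [line.strip() for line in f if line.strip()]

print(quote_of_the_day(quotes, user_id="trevor"))
```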

Meanwhile, the incredible CFAR handbook sequence is somewhat long (~20,000 words) but explains difficult and important concepts really well, and the Hammertime sequence is much shorter and reread-able, but that comes with the cost of weaker intuitiveness and making it easier to forget.

Ideally someone like Duncan could do a really great job creating a middle ground between the two, although it’s possible that the CFAR handbook sequence was already pretty intensely shortened given the complexity of the underlying concepts. Duncan’s writing and explanations are pretty great, and my prior probability is that I’d have a hard time preserving the value if I tried to distill it to ~75% of the original length.

Converting a shortened version into flashcards would conceivably be very valuable, but the main thing is that people just need to read it, and get their friends to read it, because they’re all still well worth reading, even at their current length.

When people debate the value or effectiveness of the Sequences, they’re generally thinking, like, slap the book down in front of someone and tell them to read it. Maybe see what they’re like 5-10 years after finishing. But the bigger question is whether there’s ways to do this orders of magnitude more effectively.

NicholasKross

I actually reread useful/​mental-model-helping segments of posts and books, but I think I’m an outlier in this community.

If we’re really going with the “Bible quotes” (!?!?!) analogy, we should look more at “sequences of bite-sized things on the same topic”. Like “sequence slices” or “quote sequences” or “subsection sequences” or “passage tags”. E.g. every CFAR Handbook passage about cruxes. Or every Yudkowsky quote about complexity of value.

On the one hand, this can seem like “oversimplifying/​shallow-ifying rationalist writings”, at the expense of deeper reading. On the other hand, quotes can be a gateway to reading the full posts. (And, if we’re being honest, better rationalist-leaning decision making often starts with “remembering a relevant quote”, even if the quote is “Think for yourself!” or “Think for 5 minutes on the clock”.)

trevor

I actually reread useful/​mental-model-helping segments of posts and books, but I think I’m an outlier in this community.

That should definitely not be an outlier thing. I had Yudkowsky’s “Politics is the Mind-Killer” post bookmarked for years after I started skilling up for AI policy. I would have been way ahead by now if I had known about Yud’s entire set of politics-prerequisite posts and read one of those per month. This stuff is fantastic reference material.

Absolutely, so much of it is just prompting people so they go back to taking the advice and reasserting control over their lives. It’s frankly quite disturbing to see people slowly regress back to the mediocrity of the daily life of the average American/​European; the gradual loss of your newfound agency really is a tide that you have to fight. Get the Moloch out of your brain and keep it out, and also get the Moloch out of your groups.

At the same time, getting people into groups where they can keep each other going, that’s also hard. Humans are primates, which means we have a drive to vanquish rivals; for us, that often means taking important concepts and turning them into opportunities to gain advantage over your fellow humans (including weaponizing that concept itself to throw accusations at rivals, e.g. “person X is bad for AI safety”).

There’s so many options, there’s basically choice overload. Nothing here is easy to solve (anything easy to solve probably would have already been resolved e.g. as growing pains), but there’s still tons of low hanging fruit that’s well worth working on (or at least spending an hour a week deliberately thinking about).

And this is the kind of stuff that we really should be doing now, starting when we’re ahead, when there’s still room for growth, rather than when the velociraptors are at the gates.

NicholasKross

Agreed.

To cap this off, we’ve found quite a few paths to “uplifting” or “upgrading” the AI alignment and governance communities. Many of these options are disjunctive (if nootropics keeps failing, neuromodulation might work). Some of them are even cheap and easy to do by yourself (like Sequences quote collections).

The community has been around for well over a decade, and we still see potential low-hanging fruit. You could see this as “wow, our community is really inadequate”, and that’s often true. But I also see it as “wow, there’s so many things that haven’t been EMH’d away, and they could help AI safety!”.

Of course, this relies on people with free time (and money) to experiment with these. If someone finds something, it could move the needle on P(doom). That seems worth some trial and error!

trevor

Absolutely. Also, I think the situation could probably improve substantially if everyone could set aside >1 hour a week seriously thinking about the question “In light of recent events and new information, how can the AI safety community get better?”

(with a focus on using new technology to solve major problems that were previously assumed to be intractable, e.g. something like lie detectors, and improve things for everyone, rather than intimidating specific people or groups into giving ground or redistributing resources)

Setting aside one hour a week to think about something important (without disrupting focus) would be less than 1% of most people’s waking hours. It’s probably the kind of thing where most of us will end up looking back and wishing that we did a lot more of it, and sooner, before things got hot.