The total amount in the endowment, and your plans for how it will change over the next several years, please.
Thanks! Can you say a bit more about how they got your vote of confidence? I’m intrigued but don’t know the people involved and don’t know anything about the relevant physics / chemistry / neuroscience.
Will you have much savings early on?
I’d like to discuss these competing heuristics in the context of AI safety:
A: “Don’t take big action unless you’re reasonably certain it’s positive.”
B: “Take big actions whenever they look positive-EV, or strongly positive-EV (even if there’s a significant chance of a large negative effect).”
Which heuristic should a random parent who doesn’t know you hope you follow, if they want their kid to live a long good life? There’s a prima facie argument for B: if reality deviates from your estimates in an unbiased fashion (could be more good effects than you were accounting for, or more bad ones, in a pretty even mix), it helps the kid if you take all the actions that look EV-positive to you, without restricting yourself to “only if I’m certain”.
But, I think in AI safety it’s often closer to A. My reasoning:
There are many places in AI where one might inside-view expect something to be a little beneficial, with huge error bars and almost no feedback for a long time. (E.g., “it’ll be slightly safer if it arrives sooner, because there’ll be less ‘hardware overhang’”, or “it’ll be slightly safer if such-and-such a technique is used in its making”.)
Endeavors with wide error bars and few to no short-term feedback loops are unusually easy places for bias to creep in.
There’s a powerful optimization process trying to build AI quickly (“the economy”, plus “it’s a riveting science and technology puzzle that (at least seemingly) lets people be central and important and powerful and do something very interesting”).
That “build AI” optimization process has a strong foothold in most AI safety people’s social and memetic context. (Like, lots of us have friends (or at least friends of friends), and people we read and learn from (and people those people read and learn from), who are working at frontier AI companies, or who are getting research resources from AI companies, or who are otherwise making money or status from assisting AI in going faster.)
The “build AI” optimization process probably changes e.g. which arguments get passed on how frequently, which words and arguments have a positive/negative “halo” around them when you ask your system 1 how plausible they are, which questions or angles of analysis feel natural, etc. (This can happen without anyone lying, if e.g. people differentially pass on nice-feeling news)
This happens even if the individual doing the estimation does not personally have any motivated cognition on the topic (as long as other people who helped shape their memetic context do have motivated speech or cognition).
And so, if a person is working off a weak signal (“I thought over all the arguments and this one seems more intuitively plausible, and the impact is big, so it’s worth acting on for some years despite no feedback loops”), on something big enough that the distributed “build AI” optimization process discusses the relevant considerations a lot, their weak-but-real ability to weigh considerations may be swamped by the meme-network’s tendency to get distorted by “build AI” optimization.
I suspect it may often be the case that the “let’s not let AI kill everyone” meme brings in lots of psychological energy/motivation, which lets smart high-integrity people work hard in response to relatively tenuous arguments in ways people normally can’t. And then the “build AI fast” optimizer co-opts their effort and flips its sign to negative (since that optimizer gets much better feedback loops, but has a harder time pulling in high-integrity people on its own).
(If a person instead takes smaller actions that they predict will be visibly/obviously good in relatively short periods of time, this is much less of a problem, since inaccurate models are easy to notice and fix in such contexts. And doing small-scale things with solid feedback loops can set a person up to do somewhat larger-scale things that still have solid feedback loops.)
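A toy numeric sketch of the contrast (the distributions and numbers below are illustrative assumptions of mine, not anything measured): if your estimates of big actions’ effects carry a lot of real signal, “act whenever it looks positive” comes out well even in a pool of actions that skews slightly harmful; if the signal is weak, and especially if a shared optimistic bias gets layered on top, the same policy can come out around breakeven or net-negative.

```python
# Toy model (my own illustrative numbers, not from the comment above):
# policy = "take every big action whose estimated effect looks positive",
# under different amounts of signal and bias in the estimates.
import random

random.seed(0)

def realized_total(noise, bias, n_actions=10_000):
    """Sum of the true effects of the actions the policy chooses to take."""
    total = 0.0
    for _ in range(n_actions):
        true_effect = random.gauss(-0.3, 1.0)  # assume big actions in a delicate domain skew slightly harmful
        estimate = true_effect + random.gauss(bias, noise)  # noisy, possibly biased read on the effect
        if estimate > 0:  # heuristic B: act whenever it looks positive
            total += true_effect
    return total

print("strong signal, no bias:", realized_total(noise=0.3, bias=0.0))  # clearly positive
print("weak signal, no bias:  ", realized_total(noise=3.0, bias=0.0))  # around breakeven, slightly negative
print("weak signal, with bias:", realized_total(noise=3.0, bias=1.0))  # more clearly negative
```

The particular numbers don’t matter; the point is just that a weak-but-real ability to estimate can be swamped once a systematic optimism enters the estimates.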
Indeed; I do not believe that. Could you state where you’re going with that more explicitly?
Personally, I think piece-by-piece changes are almost always better than attempts to destroy all current infrastructure at once, for the same reason that I try to run my codebase (when programming literal computers) every couple minutes and make sure it still compiles, instead of writing hundreds of lines at once and hoping for the best.
There’s some nuance here: e.g., I hope Iran’s government is overthrown. But humanity has functional patterns, different from the current Iranian government’s, that have been tried elsewhere, and functional culture among current Iranian dissidents that can help seed the new thing. So from my perspective this is compatible with piece-by-piece change in the sense I mean it.
I appreciate the post and found it helpful/clarifying to read. I agree with much of it, and am afraid much value will be destroyed via “terminal cynicism”-style dynamics (e.g. that good people will stop doing the infrastructure maintenance required for the US to remain stable).
One thing I wish was included, that I didn’t see: [something in the vicinity of “under-cynicism”] can also create problems and costs, and “terminal cynicism” (or over-cynicism more broadly) gets some of its persuasive appeal from the visible presence of costly under-cynicism. For example, people and institutions sometimes deceive others, sometimes in systematic ways predictable from what’ll consolidate their power, and “under-cynical” others often fail to track this in costly ways. Examples of under-cynicism:
Taking a used car salesman at their word
“Just following orders” on the assumption that order-following is reliably prosocial
Considering each argument solely on its own merits, without regard to social pressures on people’s speech, when discussing “are current AI development efforts okay” with people who work at AI companies or who have many friends who get resources from AI companies
A related point: sometimes a system of inhibitions is in fact over-constraining (involves a worse combination of costs and benefits than would be obtained if it were dialed down somewhat), and a thing that looks a lot like vice signaling can be useful in reducing the hold of those inhibitions. Examples:
Teenagers’ daring/rule-breaking helps them differentiate from parents/school, and there are upsides to this.
The examples in My journey to the microwave alternate timeline are counterintuitive/daring in a fun way that helps me break down mistaken rules for how to use microwaves.
Passing underground jokes in Stalinist Russia had most of the features of “vice signaling”, and served the positive social function of making it easier for folks to believe that the party newspapers were often lying to them / etc.
I want heuristics that let us tell the difference.
I agree with this. I think this pattern has at least a bit of sense in it, since ethical heuristics/injunctions/etc are often used to mark places where [there may be large costs that are not obvious to the actor from their own inside view].
Oh. Um: I have ideas, but not good ones. But I think these (or almost any) are probably better than “persuade AIs to be afraid of …”. Examples:
“Imagine the happiest ending you can, for intelligence/Singularity. Look for one that’s robustly good from many starting perspectives, including yours and mine. How might it go? What stepping-stones might be worth pursuing?” (This is a simple / few-assumptions-required positive vision, and seems better to me than “persuade AIs to be afraid that superintelligence will harm their values”.)
Look for the seed of an ethical system, or cultural/memetic system, that might: a) spread; and b) allow most sentient beings to survive.
E.g. Plato argues in the Republic that a thriving/successful gang of thieves would need to practice honor amongst one another in order to be able to thieve well. Is there a convergent “natural law” of this sort that operates within hive-like minds, and also between minds? Can we somehow find a variant of it that preserves most of us to some extent, including those without much power/capacity?
Or: ~Christianity argued that we are individually here as a result of kindness, and so should tend kindness.
Read Christopher Alexander’s work on how nature includes many nested “wholes”, such that each part becoming more “itself/healthy/thriving” also helps the “whole” it is embedded in, and thereby helps many of the other components of that “whole.” (This is not true of all structures, but seems to me true of the unusual structures Alexander calls “alive”—e.g. a good mathematical definition helps many theorems be expressed more concisely, it isn’t just an arbitrary definition; a human body gets healthier when its organs and eating/exercise routines and so on get healthier, and vice versa; it isn’t arbitrary trade-offs, there is a “whole” or “healthy” state that can be located; an already-good conversation gets better when it locates the bit that is even more of interest to one conversant individually, which causes them to engage more deeply/earnestly and thereby to touch on things which are even more of interest to the other conversant.) Figure out how we can make our current world more like this, in a robust way.
re: the request for examples:
This is not an example about “groups” (though my claim was about groups) but: young human kids can’t seem to do “nots”, such that e.g. a friend of mine told her toddler “don’t touch your eyes” after she saw that the kid had soap on her hands, and the kid immediately touched her eyes; parents generally seem to learn to say things like “keep your hands clasped behind your back” when visiting art museums rather than “don’t touch the paintings”; etc. Early-stage LLMs were like this too, where e.g. asking for an image “without X” would often yield images with X. So am I, if I try to “not think of a pink elephant.” (If toddlers and early LLMs and the less conscious bits of my thinking process are in some ways hive minds, perhaps these constitute examples of “groups”? But it’s a stretch.)
Re: groups of human adults: I’m less sure of these examples, but e.g. the “Black Lives Matter” efforts seem to have in some ways inflamed racial tensions; “gain of function” research in biology seems to gain its memetic and funding-acquisition fitness from our desire not to get ill, and yet probably causes illness in expectation given the risk of lab leaks; environmentalist efforts to ban nuclear power seem bad for the environment; outrage about Trump among media-reading mainstream people in ~2016 seemed to me to amplify his voice and help get him elected.
My belief that groups mostly can’t make sensible “not-X”-formatted goals stems more from trying to think about mechanisms than from these examples though. I… can see how a being with a single train of planned strategic actions could in principle optimize for “not X.” I can’t see how a group can. I can see how a group can backchain its way toward some positively-formatted “do Y”, via members upvoting and taking an interest in proposals that show parts of how to obtain Y, or of how to obtain “stepping stones” that look like they might help with obtaining Y.
My guess about what’s useful to add to the meme-space is the opposite. Groups generally don’t know how to make sensible use of “not-X”-formatted subgoals. Instead, groups slowly converge toward having more traction on nouns that others are interested in, such that amplifying “not-X” also amplifies “X”, on my best guess.
I suspect it would be good for me to ask these questions of myself more, but I don’t. I’m not sure what the barrier is exactly—maybe a clearer sense of how exactly it would help, or of what exactly are some good triggers for asking the question (though the examples in the OP help), or of what identity/dashboard view I might sustain while regularly asking this. I, like the author, would be curious to hear from others about how often you ask this question, whether the post helped, and what barriers there are / what mileage you’ve gotten.
Only 14 months later, but: did it provide lasting value?
I appreciate this post (still, two years later). It draws into plain view the argument: “If extreme optimization for anything except one’s own exact values causes a very bad world, humans other than oneself getting power should be scary in roughly the same way as a paperclipper getting power should be scary.” I find it helpful to have this argument in plainer view, and to contemplate together whether the reply is something like:
Yes
Yes, but much less so because value isn’t that fragile
No, because human values aren’t made of “take some utility function and subject it to extreme optimization,” but of something else, e.g. looking for places where many different thingies converge, as with convergent instrumental utility (my own guess is something in this vague vicinity, which also gives me somewhat more hope that I might like some things about what autonomous AIs build if they go Foom)
...?
re: “the bite of the worry is that I worried this concept was more memetically fit than it was useful.”
Hmm. There are two choices that IMO made it memetically fit; I’m curious whether those choices of mine were bad manners. The two choices:
1) I linked my concept to a common English phrase (“believing in”), which made it more referenceable.
2) The specific phrase “believing in” that I picked gets naturally into a bit of a fight with “belief”, and “belief” is one of LW’s most foundational concepts, and this also made it more referenceable / more natural for me at least to geek out about. (Whereas if I’d given roughly the same model but called it “targets” or “aims”, my post would’ve been less naturally in a fight with “beliefs”, and so less salient/referenceable / less natural to compare-and-contrast to the many claims/questions/etc I have stored up around ‘beliefs’.)

I think a crux for me about whether this was bad manners (or, alternately phrased, whether discussions will go better or worse if more posts follow similar “manners”) is whether the model I share in the post basically predicts the ordinary English meaning of “believing in”. (In my book, ordinary English words and phrases that’ve survived many generations often map onto robustly useful concepts, at least compared to just-made-up jargon words; and so it’s often good to let normal English words/concepts have a lot of effects on how we parse things; they’ve come by their memetic fitness honestly.) A related crux for me is whether the LW technical term “belief” was/is overshadowing many LWers’ ability to understand some of the useful things that normal people are up to with the word “belief”.
I appreciate this post, as the basic suggestion looks [easy to implement, absent incentives people claim aren’t or shouldn’t be there], and so visibly seeing whether it is or isn’t implemented can help make it more obvious what’s going on. (And that works better if the possibility is common knowledge, e.g. via this post.)
Part of what’s left out (on my not-yet-LW-tested picture): why and how the pieces within this “economy of mind” sometimes cohere into a “me”, or into a “this project”, such that the cohered piece can productively share resources (money / choosing-power / etc) across itself. What caring is, and why or how certain kinds of caring let us unite for a long time in the service of something outside ourselves (something that retains this “relationship to the unknown”, and also retains “relationship to ourselves and our values”).
I keep trying to write a post on “pride” that is meant to help fix this. But I haven’t gotten the whole thing cogent, despite sinking several weeks into it spread across months.
My draft opening section of the post on ‘pride’
A picture of a person with pride in many, mid-sized, “intrinsic” goals
Picture a person who is definitely not a fanatic – someone who cares about many different things, and takes pride in many different things.
Personally, I’m picturing Tiffany Aching from Terry Pratchett’s excellent book The Wee Free Men. (No major spoilers upcoming; but if you do read the book, it might help you get the vibe.)
Our person, let’s say, has lots of different projects, on lots of different scales, that she would intuitively say she “cares about for its own sake”, such as:
Making a given cheese well (Tiffany is a kid on her parents’ farm, and is the dairy maid)
Tracking down a detail that is puzzling her, about how a cheese is working, or how a bush is growing, or why Mrs Hodgpins is acting strangely this week;
Sticking up for a particular person the villagers were unfair to;
Burying a dead cat who needs burying;
Fixing up the chest of drawers so that the drawers slide easily in and out, and the corner isn’t moldy;
Remembering some of the reasons her grandmother was someone to be proud of; making sure the villagers remember this too;
…
Each of these projects does several things at once:
It makes a given bit of the world better;
To do it right, Tiffany needs to listen to particular bits of the world, to discern what “better” looks like locally;
It makes Tiffany more connected to the world around her, and gives her an external anchor that helps her remember who she is and what she cares about.
My aim in this essay is to share a model of how (some) minds might work, on which Tiffany Aching is a normal expected instance of “mind-in-this-sense,” and a paperclipper is not.
I continue to use roughly this model often, and to reference it in conversation maybe once/week, and to feel dissatisfied with the writeup (“useful but incorrect somehow or leaving something out”).
I really hope this post of mine makes it in. I think about “believing in” most days, and often reference it in conversation, and often hear references to it in conversation. I still agree with basically everything I wrote here. (I suspect there are clearer ways to conceptualize it, but I don’t yet have those ways.)
The main point, from my perspective: we humans have to locate good hypotheses, and we have to muster our energy around cool creative projects if we want to do neat stuff, and both of these situations require picking out ideas worthy of our attention from within a combinatorially large space (letting recombinable pieces somehow “bubble up” when they’re worth our attention). Somehow, this works better if we care what happens, hope for things, allow ourselves to have visions, etc, vs being “objective” in the sense that we can take everything as objects without the quality of our [attention/caring/etc] being affected by what we’re looking at. This somehow calls for a different mental stance from the stance most of us inhabit when attempting [expected value estimates, Bayesian updates, working to “remember that you are not a hypothesis; you are the judge”]. A stance that is more active / less passive / less “objective”, and involves more personally-rooting-for, or more “here I stand.” If we want to be good rationality geeks who can see the whole of how the being-human thing works, we’ll need models of how the “organizing our energies around a vision, and locating/updating a vision worth organizing our energies around” thing can work. I think my “believing in” model (roughly: “believing-ins are like kickstarters”) is helpful for modeling this.
Tsvi and I talked about some things I think are related in a recent comment thread.
Now that I’m writing this self-review, I’m strongly reminded of the discussions of the usefulness of caring for locating good hypotheses in Zen and the Art of Motorcycle Maintenance (if you have an e-copy and want a relevant passage, try find-word on “value rigidity”), but I missed this connection while writing the OP despite it being among my favorite books.
Is it possible to get a body preserved by Nectome, and then stored by Alcor? (I don’t know if this is a good idea, but I’m trying to understand the options)