(I work at MIRI.) News that SIAI received funds from Epstein actually came as a surprise to us. (We found out about this a few days before OP’s question went up.) Epstein had previously approached us in 2016 looking for organizations to donate to, and we decided against pursuing the option; we didn’t realize there was any previous interaction between MIRI/SIAI and Epstein or his foundations. The 2009 donation was brought to our attention when someone sent us a Miami Herald article that included SIAI in a spreadsheet of organizations that received money from one of Epstein’s foundations. We couldn’t initially find evidence of the donation in our records, so we had to go digging a bit; it was apparently seed money for OpenCog while they were getting up and running, rather than money for “real” SIAI stuff, hence current staff being out of the loop.
Sayre-McCord in SEP’s “Moral Realism” article:
Moral realists are those who think that [...] moral claims do purport to report facts and are true if they get the facts right. Moreover, they hold, at least some moral claims actually are true. [...]
As a result, those who reject moral realism are usefully divided into (i) those who think moral claims do not purport to report facts in light of which they are true or false (noncognitivists) and (ii) those who think that moral claims do carry this purport but deny that any moral claims are actually true (error theorists).
Joyce in SEP’s “Moral Anti-Realism” article:
Traditionally, to hold a realist position with respect to X is to hold that X exists in a mind-independent manner (in the relevant sense of “mind-independence”). On this view, moral anti-realism is the denial of the thesis that moral properties—or facts, objects, relations, events, etc. (whatever categories one is willing to countenance)—exist mind-independently. This could involve either (1) the denial that moral properties exist at all, or (2) the acceptance that they do exist but that existence is (in the relevant sense) mind-dependent. Barring various complications to be discussed below, there are broadly two ways of endorsing (1): moral noncognitivism and moral error theory. Proponents of (2) may be variously thought of as moral non-objectivists, or idealists, or constructivists.
So, everyone defines “non-realism” so as to include error theory and non-cognitivism; some people define it so as to also include all or most views on which moral properties are in some sense “subjective.”
These ambiguities seem like good reasons to just avoid the term “realism” and talk about more specific positions, though I guess it works to think about a sliding scale where substantive realism is at one extreme, error theory and non-cognitivism are at the other extreme, and remaining views are somewhere in the middle.
Example: Eliezer’s Extrapolated Volition is easy to round off to “constructivism”, By Which It May Be Judged to “substantive realism”, and the Orthogonality Thesis and The Gift We Give To Tomorrow to “subjectivism”. I’m guessing it’s not a coincidence that those are also the most popular answers in the poll above, and that none of them has majority support.
(Though I don’t think I could have made a strong prediction like this a priori. If non-cognitivism or error theory had done better, someone could have said “well, of course!”, citing LessWrong’s interest in signaling or their general reductionist/eliminativist/anti-supernaturalist tendencies.)
The most popular meta-ethical views on LessWrong seem to be relatively realist ones, with views like non-cognitivism and error theory getting significantly less support. From the 2016 LessWrong diaspora survey (excluding people who didn’t pick one of the options):
772 respondents (39.5%) voted for “Constructivism: Some moral statements are true, and the truth of a moral statement is determined by whether an agent would accept it if they were undergoing a process of rational deliberation. ‘Murder is wrong’ can mean something like ‘Societal agreement to the rule “do not murder” is instrumentally rational’.”
550 respondents (28.2%) voted for “Subjectivism: Some moral statements are true, but not universally, and the truth of a moral statement is determined by non-universal opinions or prescriptions, and there is no nonattitudinal determinant of rightness and wrongness. ‘Murder is wrong’ means something like ‘My culture has judged murder to be wrong’ or ‘I’ve judged murder to be wrong’.”
346 respondents (17.7%) voted for “Substantive realism: Some moral statements are true, and the truth of a moral statement is determined by mind-independent moral properties. ‘Murder is wrong’ means that murder has an objective mind-independent property of wrongness that we discover by empirical investigation, intuition, or some other method.”
186 respondents (9.5%) voted for “Non-cognitivism: Moral statements don’t express propositions and can neither be true nor false. ‘Murder is wrong’ means something like ‘Boo murder!’.”
99 respondents (5.1%) voted for “Error theory: Moral statements have a truth-value, but attempt to describe features of the world that don’t exist. ‘Murder is wrong’ and ‘Murder is right’ are both false statements because moral rightness and wrongness aren’t features that exist.”
I suspect that a lot of rationalists would be happy to endorse any of the above five views in different contexts or on different framings, and would say that real-world moral judgment is complicated and doesn’t cleanly fit into exactly one of these categories. E.g., I think Luke Muehlhauser’s Pluralistic Moral Reductionism is just correct.
Old discussion of this on LW: https://www.lesswrong.com/s/fqh9TLuoquxpducDb/p/synsRtBKDeAFuo7e3
Facebook comment I wrote in February, in response to the question ‘Why might having beauty in the world matter?’:
I assume you’re asking about why it might be better for beautiful objects in the world to exist (even if no one experiences them), and not asking about why it might be better for experiences of beauty to exist.
[… S]ome reasons I think this:
1. If it cost me literally nothing, I feel like I’d rather there exist a planet that’s beautiful, ornate, and complex than one that’s dull and simple—even if the planet can never be seen or visited by anyone, and has no other impact on anyone’s life. This feels like a weak preference, but it helps get a foot in the door for beauty.
(The obvious counterargument here is that my brain might be bad at simulating the scenario where there’s literally zero chance I’ll ever interact with a thing; or I may be otherwise confused about my values.)
2. Another weak foot-in-the-door argument: People seem to value beauty, and some people claim to value it terminally. Since human value is complicated, messy, and idiosyncratic (compare person-specific ASMR triggers, nostalgia triggers, or culinary preferences), and since terminal and instrumental values are easily altered and interchanged in our brains, our prior should be that at least some people really do have weird preferences like that at least some of the time.
(And if it’s just a few other people who value beauty, and not me, I should still value it for the sake of altruism and cooperativeness.)
3. If morality isn’t “special”—if it’s just one of many facets of human values, and isn’t a particularly natural-kind-ish facet—then it’s likelier that a full understanding of human value would lead us to treat aesthetic and moral preferences as more coextensive, interconnected, and fuzzy. If I can value someone else’s happiness inherently, without needing to experience or know about it myself, it then becomes harder to say why I can’t value non-conscious states inherently; and “beauty” is an obvious candidate. My preferences aren’t all about my own experiences, and they aren’t simple, so it’s not clear why aesthetic preferences should be an exception to this rule.
4. Similarly, if phenomenal consciousness is fuzzy or fake, then it becomes less likely that our preferences range only and exactly over subjective experiences (or their closest non-fake counterparts). Which removes the main reason to think unexperienced beauty doesn’t matter to people.
Combining the latter two points with the literature on emotions like disgust and purity, which have both moral and non-moral aspects, it seems plausible that the extrapolated versions of preferences like “I don’t like it when other sentient beings suffer” could turn out to have aesthetic aspects or interpretations like “I find it ugly for brain regions to have suffering-ish configurations”.
Even if consciousness is fully a real thing, it seems as though a sufficiently deep reductive understanding of consciousness should lead us to understand and evaluate consciousness similarly whether we’re thinking about it in intentional/psychologizing terms or just thinking about the physical structure of the corresponding brain state. We shouldn’t be more outraged by a world-state under one description than under an equivalent description, ideally.
But then it seems less obvious that the brain states we care about should exactly correspond to the ones that are conscious, with no other brain states mattering; and aesthetic emotions are one of the main ways we relate to things we’re treating as physical systems.
As a concrete example, maybe our ideal selves would find it inherently disgusting for a brain state that sort of almost looks conscious to go through the motions of being tortured, even when we aren’t the least bit confused or uncertain about whether it’s really conscious, just because our terminal values are associative and symbolic. I use this example because it’s an especially easy one to understand from a morality- and consciousness-centered perspective, but I expect our ideal preferences about physical states to end up being very weird and complicated, and not to end up being all that much like our moral intuitions today.
Addendum: As always, this kind of thing is ridiculously speculative and not the kind of thing to put one’s weight down on or try to “lock in” for civilization. But it can be useful to keep the range of options in view, so we have them in mind when we figure out how to test them later.
It might also be worth comparing CAIS and “tool AI” to Paul Christiano’s IDA and the desiderata MIRI tends to talk about (task-directed AGI [1,2,3], mild optimization, limited AGI).
At a high level, I tend to think of Christiano and Drexler as both approaching alignment from very much the right angle, in that they’re (a) trying to break apart the vague idea of “AGI reasoning” into smaller parts, and (b) shooting for a system that won’t optimize harder (or more domain-generally) than we need for a given task. From conversations with Nate, one way I’d summarize MIRI-cluster disagreements with Christiano and Drexler’s proposals is that MIRI people don’t tend to think these proposals decompose cognitive work enough. Without a lot more decomposition/understanding, either the system as a whole won’t be capable enough, or it will be capable by virtue of atomic parts that are smart enough to be dangerous, where safety is a matter of how well we can open those black boxes.
In my experience people use “tool AI” to mean a bunch of different things, including things MIRI considers very important and useful (like “only works on a limited task, rather than putting any cognitive work into more general topics or trying to open-endedly optimize the future”) as well as ideas that don’t seem relevant or that obscure where the hard parts of the problem probably are.
For Facebook, I use FBPurity to block my news feed. Then if there are particular individuals I especially want to follow, I add them to a Facebook List.
For ‘things that aren’t an accident but aren’t necessarily conscious or endorsed’, another option might be to use language like ‘decision’, ‘action’, ‘choice’, etc. but flagged in a way that makes it clear you’re not assuming full consciousness. Like ‘quasi-decision’, ‘quasi-action’, ‘quasi-conscious’… Applied to Zack’s case, that might suggest a term like ‘quasi-dissembling’ or ‘quasi-misleading’. ‘Dissonant communication’ comes to mind as another idea.
When I want to emphasize that there’s optimization going on but it’s not necessarily conscious, I sometimes speak impersonally, saying “Bob’s brain is doing X” or “a Bob-part/agent/subagent is doing X”.
I personally wouldn’t point to “When Will AI Exceed Human Performance?” as an exemplar on this dimension, because it isn’t clear about the interesting implications of the facts it’s reporting. Katja’s take-away from the paper was:
In the past, it seemed pretty plausible that what AI researchers think is a decent guide to what’s going to happen. I think we’ve pretty much demonstrated that that’s not the case. I think there are a variety of different ways we might go about trying to work out what AI timelines are like, and talking to experts is one of them; I think we should weight that one down a lot.
I don’t know whether Katja’s co-authors agree with her about that summary, but if there’s disagreement, I think the paper still could have included more discussion of the question and which findings look relevant to it.
The actual Discussion section makes the opposite argument instead, listing a bunch of reasons to think AI experts are good at foreseeing AI progress. The introduction says “To prepare for these challenges, accurate forecasting of transformative AI would be invaluable. [...] The predictions of AI experts provide crucial additional information.” And the paper includes a list of four “key findings”, none of which even raise the question of survey respondents’ forecasting chops, and all of which are worded in ways that suggest we should in fact put some weight on the respondents’ views (sometimes switching between the phrasing ‘researchers believe X’ and ‘X is true’).
The abstract mentions the main finding that undermines how believable the responses are, but does so in such a way that someone reading through quickly might come away with the opposite impression. The abstract’s structure is:
To adapt public policy, we need to better anticipate [AI advances]. Researchers predict [A, B, C, D, E, and F]. Researchers believe [G and H]. These results will inform discussion amongst researchers and policymakers about anticipating and managing trends in AI.
If it slips past the reader’s attention that G and H are massively inconsistent, it’s easy to come away thinking the abstract is saying ‘Here’s a list of credible statements from experts about their area of expertise’ as opposed to ‘Here’s a demonstration that what AI researchers think is not a decent guide to what’s going to happen’.
Humans might not be a low-level atom, but obviously we have to privilege the hypothesis ‘something human-like did this’ if we’ve already observed a lot of human-like things in our environment.
Suppose I’m a member of a prehistoric tribe, and I see a fire in the distance. It’s fine for me to say ‘I have a low-ish prior on a human starting the fire, because (AFAIK) there are only a few dozen humans in the area’. And it’s fine for me to say ‘I’ve never seen a human start a fire, so I don’t think a human started this fire’. But it’s not fine for me to say ‘It’s very unlikely a human started that fire, because human brains are more complicated than other phenomena that might start fires’, even if I correctly intuit how and why humans are more complicated than other phenomena.
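As a toy illustration of the difference between the legitimate and illegitimate moves above, here is a minimal odds-form Bayes sketch. All numbers are made up purely for illustration and aren’t anything from the original discussion:

```python
# Toy odds-form Bayes comparison of "a human started the fire" vs. "lightning did".
# All numbers are hypothetical, purely for illustration.

def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# Legitimate move: set the prior odds by counting the candidate causes you've observed,
# e.g. ~30 humans in the area vs. ~300 lightning strikes per season.
prior_odds = 30 / 300

# Evidence: the fire started next to a well-used trail, which humans frequent
# far more often than lightning strikes it (hypothetical likelihood ratio of 5).
likelihood_ratio = 5.0

print(posterior_odds(prior_odds, likelihood_ratio))  # 0.5 -- still leaning "lightning"

# Illegitimate move: multiplying in a further penalty because human brains are complex.
# The humans are already observed, so their complexity has already been "paid for" by
# the count that went into the prior; it shouldn't be charged a second time.
```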
The case of Thor is a bit more complicated, because gods are different from humans. If Eliezer and cousin_it disagree on this point, maybe Eliezer would say ‘The complexity of the human brain is the biggest reason why you shouldn’t infer that there are other, as-yet-unobserved species of human-brain-ish things that are very different from humans’, and maybe cousin_it would say ‘No, it’s pretty much just the differentness-from-observed-humans (on the “has direct control over elemental forces” dimension) that matters, not the fact that it has a complicated brain.’
If that’s a good characterization of the disagreement, then it seems like Eliezer might say ‘In ancient societies, it was much more reasonable to posit mindless “supernatural” phenomena (i.e., mindless physical mechanisms wildly different from anything we’ve observed) than to posit intelligent supernatural phenomena.’ Whereas the hypothetical cousin_it might say that ancient people didn’t have enough evidence to conclude that gods were any more unlikely than mindless mechanisms that were similarly different from experience. Example question: what probability should ancient people have assigned to each of the following?
1. The regular motion of the planets is due to a random process plus a mindless invisible force, like the mindless invisible force that causes recently-cooked food to cool down all on its own.
2. The regular motion of the planets is due to deliberate design / intelligent intervention, like the intelligent intervention that arranges and cooks food.
Also the discussion of deconfusion research in https://intelligence.org/2018/11/22/2018-update-our-new-research-directions/ and https://www.lesswrong.com/posts/Gg9a4y8reWKtLe3Tn/the-rocket-alignment-problem , and the sketch of ‘why this looks like a hard problem in general’ in https://www.lesswrong.com/posts/zEvqFtT4AtTztfYC4/optimization-amplifies and https://arbital.com/p/aligning_adds_time/ .
MIRIx events are funded by MIRI, but we don’t decide the topics or anything. I haven’t taken a poll of MIRI researchers to see how enthusiastic different people are about formal verification, but AFAIK Nate and Eliezer don’t see it as super relevant. See https://www.lesswrong.com/posts/xCpuSfT5Lt6kkR3po/my-take-on-agent-foundations-formalizing-metaphilosophical#cGuMRFSi224RCNBZi and the idea of a “safety-story” in https://www.lesswrong.com/posts/8gqrbnW758qjHFTrH/security-mindset-and-ordinary-paranoia for better attempts to characterize what MIRI is looking for.
ETA: From the end of the latter dialogue,
In point of fact, the real reason the author is listing out this methodology is that he’s currently trying to do something similar on the problem of aligning Artificial General Intelligence, and he would like to move past “I believe my AGI won’t want to kill anyone” and into a headspace more like writing down statements such as “Although the space of potential weightings for this recurrent neural net does contain weight combinations that would figure out how to kill the programmers, I believe that gradient descent on loss function L will only access a result inside subspace Q with properties P, and I believe a space with properties P does not include any weight combinations that figure out how to kill the programmer.”
Though this itself is not really a reduced statement and still has too much goal-laden language in it.
Rather than putting the emphasis on being able to machine-verify all important properties of the system, this puts the emphasis on having strong technical insight into the system; I usually think of formal proofs more as a means to that end. (Again caveating that some people at MIRI might think of this differently.)
A tricky thing about this is that there’s an element of cognitive distortion in how most people evaluate these questions, and play-acting at “this distortion makes sense” can worsen the distortion (at the same time that it helps win more trust from people who have the distortion).
If it turned out to be a good idea to try to speak to this perspective, I’d recommend first meditating on a few reversal tests. Like: “Hmm, I wouldn’t feel any need to add a disclaimer here if the text I was recommending were The Brothers Karamazov, though I’d want to briefly say why it’s relevant, and I might worry about the length. I’d feel a bit worried about recommending a young adult novel, even an unusually didactic one, because people rightly expect YA novels to be optimized for less useful and edifying things than the “literary classics” reference class. The insights tend to be shallower and less common. YA novels and fanfiction are similar in all those respects, and they provoke basically the same feeling in me, so I can maybe use that reversal test to determine what kinds of disclaimers or added context make sense here.”
On Slack, the Thumbs Up, OK, and Horns hand signs meet all my minor needs for thanking people.
(If I want to express stronger gratitude than that, I’d rather write it out.)
Can’t individuals just list ‘Reign of Terror’ and then specify in their personalized description that they have a high bar for terror?
We’d talked about getting a dump out as well, and your plan sounds great to me! The LW team should get back to you with a list at some point (unless they think of a better idea).
I asked Eliezer if it made sense to cross-post this from Arbital, and did the cross-posting when he approved. I’m sorry it wasn’t clear that this was a cross-post! I intended to make this clearer, but my idea was bad (putting the information on the sequence page) and I also implemented it wrong (the sequence didn’t previously display at the top of this post).
This post was originally written as a nontechnical introduction to expected utility theory and coherence arguments. Although it begins in medias res stylistically, it doesn’t have any prereqs or context beyond “this is part of a collection of introductory resources covering a wide variety of technical and semitechnical topics.”
Per the first sentence, the main purpose is for this to be a linkable resource for conversations/inquiry about human rationality and conversations/inquiry about AGI:
So we’re talking about how to make good decisions, or the idea of ‘bounded rationality’, or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of ‘expected utility’ or ‘utility functions’. And before we even ask what those are, we might first ask, Why?
There have been loose plans for a while to cross-post content from Arbital to LW (maybe all of it; maybe just the best or most interesting stuff), but as I mentioned downthread, we’re doing more cross-post experiments sooner than we would have because Arbital’s been having serious performance issues.
I assume you mean ‘no one has this responsibility for Arbital anymore’, and not that there’s someone else who has this responsibility.