Yes, exactly. Like, we humans mostly have something that kinda feels intrinsic but that also pays rent and updates with experience, like a Go player’s sense of “elegant” go moves. My current (not confident) guess is that these thingies (that humans mostly have) might be a more basic and likely-to-pop-up-in-AI mathematical structure than are fixed utility functions + updatey beliefs, a la Bayes and VNM. I wish I knew a simple math for them.
Thanks for replying. The thing I’m wondering about is: maybe it’s sort of like this “all the way down.” Like, maybe the things that are showing up as “terminal” goals in your analysis (money, status, being useful) are themselves composed sort of like the apple pie business, in that they congeal while they’re “profitable” from the perspective of some smaller thingies located in some large “bath” (such as an economy, or a (non-conscious) attempt to minimize predictive error or something so as to secure neural resources, or a thermodynamic flow of sunlight or something). Like, maybe it is this way in humans, and maybe it is or will be this way in an AI. Maybe there won’t be anything that is well-regarded as “terminal goals.”

I said something like this to a friend, who was like “well, sure, the things that are ‘terminal’ goals for me are often ‘instrumental’ goals for evolution, who cares?” The thing I care about here is: how “fixed” are the goals? Do they resist updating/dissolving when they cease being “profitable” from the perspective of thingies in an underlying substrate, or are they constantly changing as what is profitable changes? Like, imagine a kid who cares about playing “good, fun” videogames, but whose notion of which games are this updates pretty continually as he gets better at gaming. I’m not sure it makes that much sense to think of this as a “terminal goal” in the same sense that “make a bunch of diamond paperclips according to this fixed specification” is a terminal goal. It might be differently satiable, differently in touch with what’s below it. I’m not really sure why I care, but I think it might matter for what kind of thing organisms/~agent-like-things are.
There’s a thing I’m personally confused about that seems related to the OP, though not directly addressed by it. Maybe it is sufficiently on topic to raise here.
My personal confusion is this:
Some of my (human) goals are pretty stable across time (e.g. I still like calories, and being a normal human temperature, much as I did when newborn). But a lot of my other “goals” or “wants” form and un-form without any particular “convergent instrumental drives”-style attempts to protect said “goals” from change.
As a bit of an analogy (to how I think I and other humans might approximately act): in a well-functioning idealized economy, an apple pie-making business might form (when it was the case that apple pie would deliver a profit over the inputs of apples plus the labor of those involved plus etc.), and might later fluidly un-form (when it ceased to be profitable), without “make apple pies” or “keep this business afloat” becoming a thing that tries to self-perpetuate in perpetuity. I think a lot of my desires are like this (I care intrinsically about getting outdoors every day while there’s profit in it, but the desire doesn’t try to shield itself from change, and it’ll stop if getting outdoors stops having good results. And this notion of “profit” does not itself seem obviously like a fixed utility function, I think.).
I’m pretty curious about whether the [things kinda like LLMs but with longer planning horizons that we might get as natural extensions of the current paradigm, if the current paradigm extends this way, and/or the AGIs that an AI-accidentally-goes-foom process will summon] will have goals that try to stick around indefinitely, or goals that congeal and later dissolve again into some background process that’ll later summon new goals, without summoning something lasting that is fixed-utility-function-shaped. (It seems to me that idealized economies do not acquire fixed or self-protective goals, and for all I know many AIs might be like economies in this way.)
(I’m not saying this bears on risk in any particular way. Temporary goals would still resist most wrenches while they remained active, much as even an idealized apple pie business resists wrenches while it stays profitable.)
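The apple-pie analogy above can be made concrete with a toy sketch (my own illustration, with made-up names and numbers, not anything from the comment): each desire exists exactly while it is “profitable” from the bath’s perspective, and nothing defends a desire once its profit goes away.

```python
# Toy sketch of "goals that congeal and dissolve with profitability."
# All names and numbers here are invented for illustration.

def run_bath(profit_streams, threshold=0.0):
    """profit_streams: dict mapping desire name -> list of per-step profits.

    Returns, for each time step, the set of desires that are "active":
    a desire exists exactly while its current profit exceeds the
    threshold -- there is no self-preservation term attached to any
    particular desire.
    """
    steps = len(next(iter(profit_streams.values())))
    history = []
    for t in range(steps):
        active = {name for name, stream in profit_streams.items()
                  if stream[t] > threshold}
        history.append(active)
    return history

# "Apple pie business": profitable early, then not; "get outdoors": steady.
history = run_bath({
    "apple_pies": [3.0, 2.0, 0.5, -1.0, -2.0],
    "get_outdoors": [1.0, 1.0, 1.0, 1.0, 1.0],
})
# The apple-pie desire un-forms at step 3 without resisting dissolution.
```

The point of the sketch is just the contrast with a fixed-utility-function agent: here nothing in the system tries to keep “apple_pies” alive once the underlying bath stops rewarding it.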
Ben Pace, honorably quoting aloud a thing he’d previously said about Ren:
the other day i said [ren] seemed to be doing well to me
to clarify, i am not sure she has not gone crazy
she might’ve, i’m not close enough to be confident
i’d give it 25%
I really don’t like this usage of the word “crazy”, which IME is fairly common in the bay area rationality community. This is for several reasons. The simple to express one is that I really read through like 40% of this dialog thinking (from its title plus early conversational volleys) that people were afraid Ren had gone, like, the kind of “crazy” that acute mania or psychosis or something often is, where a person might lose their ability to do normal tasks that almost everyone can do, like knowing what year it is or how to get to the store and back safely. Which was a set of worries I didn’t need to have, in this case. I.e., my simple complaint is that it caused me confusion here.

The harder to express but more heartfelt one is something like: the word “crazy” is a license to write people off. When people in wider society use it about those having acute psychiatric crises, they give themselves a license to write off the sense behind the perceptions of like 2% or something of the population. When the word is instead used about people who are not practicing LW!rationality, including ordinary religious people, it gives a license to write off a much larger chunk of people (~95% of the population?), so one is less apt to seek sense behind their perceptions and actions.
This sort of writing-off is a thing people can try doing, if they want, but it’s a nonstandard move and I want it to be visible as such. That is, I want people to spell it out more, like: “I think Ren might’ve stopped being double-plus-sane like all the rest of us are” or “I think Ren might’ve stopped following the principles of LW!rationality” or something. (The word “crazy” hides this as though it’s the normal-person “dismiss ~2% of the population” move; these other sentences make visible that it’s an unusual and more widely dismissive move.) The reason I want this move to be made visible in this way is partly that I think the outside view on (groups of people who dismiss those who aren’t members of the group) is that this practice often leads to various bad things (e.g. increased conformity as group members fear being dubbed out-group; increased blindness to outside perspectives; difficulty collaborating with skilled outsiders), and I want those risks more visible.
(FWIW, I’d have the same response to a group of democrats discussing republicans or Trump-voters as “crazy”, and sometimes have. But IMO bay area rationalists say this sort of thing much more than other groups I’ve been part of.)
Thanks for this response; I find it helpful.

Reading it over, I want to distinguish between:
a) Relatively thoughtless application of heuristics; (system-1-integrated + fast)
b) Taking time to reflect and notice how things seem to you once you’ve had more space for reflection, for taking in other peoples’ experiences, for noticing what still seems to matter once you’ve fallen out of the day-to-day urgencies, and for tuning into the “still, quiet voice” of conscience; (system-1-integrated + slow, after a pause)
c) Ethical reasoning (system-2-heavy, medium-paced or slow).
The brief version of my position is that (b) is awesome, while (c) is good when it assists (b) but is damaging when it is acted on in a way that disempowers rather than empowers (b).
The long-winded version (which may be entirely in agreement with your (Tristan’s) comment, but which goes into detail because I want to understand this stuff):
I agree with you and Eccentricity that most people, including me and IMO most LWers and EAers, could benefit from doing more (b) than we tend to do.
I also agree with you that (c) can assist in doing (b). For example, it can be good for a person to ask themselves “how does this action, which I’m inclined to take, differ from the actions I condemned in others?”, “what is likely to happen if I do this?”, and “do my concepts and actions fit the world I’m in, or is there a tiny note of discord?”
At the same time, I don’t want to just say “c is great! do more c!” because I share with the OP a concern that EA-ers, LW-ers, and people in general who attempt explicit ethical reasoning sometimes end up using these to talk themselves into doing dumb, harmful things, with the OP’s example of “leave inaccurate reviews at vegan restaurants to try to save animals” giving a good example of the flavor of these errors, and with historical communism giving a good example of their potential magnitude.
My take as to the difference between virtuous use of explicit ethical reasoning, and vicious/damaging use of explicit ethical reasoning, is that virtuous use of such reasoning is aimed at cultivating and empowering a person’s [prudence, phronesis, common sense, or whatever you want to call a central faculty of judgment that draws on and integrates everything the person discerns and cares about], whereas vicious/damaging uses of ethical reasoning involve taking some piece of the total set of things we care about, stabilizing it into an identity and/or a social movement (“I am a hedonistic utilitarian”, “we are (communists/social justice/QAnon/EA)”), and having this artificially stabilized fragment of the total set of things one cares about act directly in the world without being filtered through one’s total discernment (“Action A is the X thing to do, and I am an X, so I will take action A”).
(Prudence was classically considered not only a virtue, but the “queen of the virtues”—as Wikipedia puts it “Prudence points out which course of action is to be taken in any concrete circumstances… Without prudence, bravery becomes foolhardiness, mercy sinks into weakness, free self-expression and kindness into censure, humility into degradation and arrogance, selflessness into corruption, and temperance into fanaticism.” Folk ethics, or commonsense ethics, has at its heart the cultivation of a total faculty of discernment, plus the education of this faculty to include courage/kindness/humility/whatever other virtues.)
My current guess as to how to develop prudence is basically to take an interest in things, care, notice tiny notes of discord, notice what actions have historically had what effects, notice when one is oneself “hijacked” into acting on something other than one’s best judgment and how to avoid this, etc. I think this is part of what you have in mind about bringing ethical reasoning into daily life, so as to see how kindness applies in specific rather than merely claiming it’d be good to apply somehow?
Absent identity-based or social-movement-based artificial stabilization, people can and do make mistakes, including e.g. leaving inaccurate reviews in an attempt to help animals. But I think those mistakes are more likely to be part of a fairly rapid process of developing prudence (which seems pretty worth it to me), and are less likely to be frozen in and acted on for years.
(My understanding isn’t great here; more dialog would be great.)
I like the question; thanks. I don’t have anything smart to say about it at the moment, but it seems like a cool thread.
The idea is, normally just do straightforwardly good things. Be cooperative, friendly, and considerate. Embrace the standard virtues. Don’t stress about the global impacts or second-order altruistic effects of minor decisions. But also identify the very small fraction of your decisions which are likely to have the largest effects and put a lot of creative energy into doing the best you can.
I agree with this, but would add that IMO, after you work out the consequentialist analysis of the small set of decisions that are worth intensive thought/effort/research, it is quite worthwhile to additionally work out something like a folk ethical account of why your result is correct, or of how the action you’re endorsing coheres with deep virtues/deontology/tropes/etc.
There are two big upsides to this process:
As you work this out, you get some extra checks on your reasoning—maybe folk ethics sees something you’re missing here; and
At least as importantly: a good folk ethical account will let individuals and groups cohere around the proposed action, in a simple, conscious, wanting-the-good-thing-together way, without needing to dissociate from what they’re doing (whereas accounts like “it’s worth dishonesty in this one particular case” will be harder to act on wholeheartedly, even when basically correct). And this will work a lot better.
IMO, this is similar to: in math, we use heuristics and intuitions and informal reasoning a lot, to guess how to do things—and we use detailed, not-condensed-by-heuristics algebra or mathematical proof steps sometimes also, to work out how a thing goes that we don’t yet find intuitive or obvious. But after writing a math proof the sloggy way, it’s good to go back over it, look for “why it worked,” “what was the true essence of the proof, that made it tick,” and see if there is now a way to “see it at a glance,” to locate ways of seeing that will make future such situations more obvious, and that can live in one’s system 1 and aesthetics as well as in one’s sloggy explicit reasoning.
Or, again, in coding: usually we can use standard data structures and patterns. Sometimes we have to hand-invent something new. But after coming up with the something new: it’s good, often, to condense it into readily parsable/remember-able/re-usable stuff, instead of leaving it as hand-rolled spaghetti code.
Or, in physics and many other domains: new results are sometimes counterintuitive, but it is advisable to then work out intuitions whereby reality may be more intuitive in the future.
I don’t have my concepts well worked out here yet, which is why I’m being so long-winded and full of analogies. But I’m pretty sure that folk ethics, where we have it worked out, has a bunch of advantages over consequentialist reasoning that’re kind of like those above.
As the OP notes, folk ethics can be applied to hundreds of decisions per day, without much thought per each;
As the OP notes, folk ethics have been tested across huge numbers of past actions by huge numbers of people. New attempts at folk ethical reasoning can’t have this advantage fully. But: I think when things are formulated simply enough, or enough in the language of folk ethics, we can back-apply them a lot more on a lot of known history and personally experienced anecdotes and so on (since they are quick to apply, as in the above bullet point), and can get at least some of the “we still like this heuristic after considering it in a lot of different contexts with known outcomes” advantage.
As OP implies, folk ethics is more robust to a lot of the normal human bias temptations (“x must be right, because I’d find it more convenient right this minute”) compared to case-by-case reasoning;
It is easier for us humans to work hard on something, in a stable fashion, when we can see in our hearts that it is good, and can see how it relates to everything else we care about. Folk ethics helps with this. Maybe folk ethics, and notions of virtue and so on, kind of are takes on how a given thing can fit together with all the little decisions and all the competing pulls as to what’s good? E.g. the OP lists as examples of commonsense goods “patience, respect, humility, moderation, kindness, honesty”—and all of these are pretty usable guides to how to be while I care about something, and to how to relate that caring to all my other cares and goals.
I suspect there’s something particularly good here with groups. We humans often want to be part of groups that can work toward a good goal across a long period of time, while maintaining integrity, and this is often hard because groups tend to degenerate with time into serving individuals’ local power, becoming moral fads, or other things that aren’t as good as the intended purpose. Ethics, held in common by the group’s common sense, is a lot of how this is ever avoided, I think; and this is more feasible if the group is trying to serve a thing whose folk ethics (or “commonsense good”) has been worked out, vs something that hasn’t.
For a concrete example: AI safety obviously matters. The folk ethics of “don’t let everyone get killed if you can help it” are solid, so that part’s fine. But in terms of tactics: I really think we need to work out a “commonsense good” or “folk ethics” type account of things like:
Is it okay to try to get lots of power, by being first to AI and trying to make use of that power to prevent worse AI outcomes? (My take: maybe somehow, but I haven’t seen the folk ethics worked out, and a good working out would give a lot of checks here, I think.)
Is it okay to try to suppress risky research, e.g. via frowning at people and telling them that only bad people do AI research, so as to try to delay tech that might kill everyone? (My take: probably, on my guess—but a good folk ethics would bring structure and intuitions somehow, like, it would work out how this is different from other kinds of “discourage people from talking and figuring things out,” it would have perceivable virtues or something for noticing the differences, which would help people then track the differences on the group commonsense level in ways that help the group’s commonsense not erode its general belief in the goodness of people sharing information and doing things).
I agree, “‘Flinching away from truth’ is often caused by internal conflation” would be a much better title—a more potent short take-away. (Or at least one I more agree with after some years of reflection.) Thanks!
I enjoyed this post, both for its satire of a bunch of peoples’ thinking styles (including mine, at times), and because IMO (and in the author’s opinion, I think), there are some valid points near here and it’s a bit tricky to know which parts of the “jokes/poetry” may have valid analogs.

I appreciate the author for writing it, because IMO we have a whole bunch of different subcultures and styles of conversation and sets of assumptions colliding all of a sudden on the internet right now around AI risk, and noticing the existence of the others seems useful, and IMO the OP is an attempt to collide LW with some other styles. Judging from the comments it seems to me not to have succeeded all that much; but it was helpful to me, and I appreciate the effort. (Though, as a tactical note, it seems to me the approximate failure was due mostly to the piece’s sarcasm, and I suspect sarcasm in general tends not to work well across cultural or inferential distances.)

Some points I consider valid, that also appear within [the vibes-based reasoning the OP is trying to satirize, and also to model and engage with]:

1) Sometimes, talking a lot about a very specific fear can bring about the feared scenario. (An example I’m sure of: a friend’s toddler stuck her hands in soap. My friend said “don’t touch your eyes.” The toddler, unclear on the word ‘not,’ touched her eyes.) (A possible example I’m less confident in: articulated fears of AI risk may have accelerated AI because humanity’s collective attentional flows, like toddlers, have no reasonable implementation of the word “not.”) This may be a thing to watch out for, for an AI risk movement.
(I think this is non-randomly reflected in statements like: “worrying has bad vibes.”)

2) There’s a lot of funny ways that attempting to control people or social processes can backfire. (Example: lots of people don’t like it when they feel like something is trying to control them.) (Example: the prohibition of alcohol in the US between 1920-1933 is said to have fueled organized crime.) (Example I’m less confident of: Trying to keep e.g. anti-vax views out of public discourse leads some to be paranoid, untrusting of establishment writing on the subject.)

This is a thing that may make trouble for some safety strategies, and that seems to me to be non-randomly reflected in “trying to control things has bad vibes.” (Though, all things considered, I still favor trying to slow things! And I care about trying to slow things.)
3) There’re a lot of places where different Schelling equilibria are available, and where groups can, should, and do try to pick the equilibrium that is better. In many cases this is done with vibes. Vibes, positivity, attending to what is or isn’t cool or authentic (vs boring), etc., are part of how people decide which company to congregate on, which subculture to bring to life, which approach to AI to do research within, etc. -- and this is partly doing some real work discerning what can become intellectually vibrant (vs boring, lifeless, dissociated).

TBC, I would not want to use vibes-based reasoning in place of reasoning, and I would not want LW to accept vibes in place of reasons. I would want some/many in LW to learn to model vibes-based reasoning for the sake of understanding the social processes around us. I would also want some/many at LW to sometimes, if the rate of results pans out in a given domain, use something like vibes-based reasoning as a source of hypotheses that one can check against actual reasoning. LW seems to me pretty solid on reasoning relative to other places I know on the internet, but only mediocre on generativity; I think learning to absorb hypotheses from varied subcultures (and from varied old books, from people who thought at other times and places) would probably help, and the OP is gesturing at one such subculture.

I’m posting this comment because I didn’t want to post this comment for fear of being written off by LW, and I’m trying to come out of more closets. Kinda at random, since I’ve spent large months or small years failing to successfully implement some sort of more planned approach.
The public early Covid-19 conversation (in like Feb-April 2020) seemed pretty hopeful to me—decent arguments, slow but asymmetrically correct updating on some of those arguments, etc. Later everything became politicized and stupid re: covid.

Right now I think there’s some opportunity for real conversation re: AI. I don’t know what useful thing follows from that, but I do think it may not last, and that it’s pretty cool. I care more about the “an opening for real conversation” thing than for the changing Overton window as such, although I think the former probably follows from the latter (first encounters are often more real somehow).
This seems like a very off-distribution move from Eliezer—which I suspect is in large part the point: when your model predicts doom by default, you go off-distribution in search of higher-variance regions of outcome space.
That’s not how I read it. To me it’s an attempt at the simple, obvious strategy of telling people ~all the truth he can about a subject they care a lot about and where he and they have common interests. This doesn’t seem like an attempt to be clever or explore high-variance tails. More like an attempt to explore the obvious strategy, or to follow the obvious bits of common-sense ethics, now that lots of allegedly clever 4-dimensional chess has turned out stupid.
Thanks for the suggestion. I haven’t read it. I’d thought from hearsay that it is rather lacking in “light”—a bunch of people who’re kinda bored and can’t remember the meaning of life—is that true? Could be worth it anyway.
I did not know this; thanks!
Not sure where you’re going with this. It seems to me that political methods (such as petitions, public pressure, threat of legislation) can be used to restrain the actions of large/mainstream companies, and that training models one or two OOM larger than GPT4 will be quite expensive and may well be done mostly or exclusively within large companies of the sort that can be restrained in this sort of way.
Maybe also: anything that bears on how an LLM, if it realizes it is not human and is among aliens in some sense, might want to relate morally to thingies that created it and aren’t it. (I’m not immediately thinking of any good books/similar that bear on this, but there probably are some.)
I was figuring GPT4 was already trained on a sizable fraction of the internet, and GPT5 would be trained on basically all the text (plus maybe some not-text, not sure). Is this wrong?
In terms of what kinds of things might be helpful:

1. Object-level stuff:
Things that help illuminate core components of ethics, such as “what is consciousness,” “what is love,” “what is up in human beings with the things we call ‘values’, that seem to have some thingies in common with beliefs,” “how exactly did evolution end up producing the thing where we care about stuff and find some things worth caring about,” etc.
Some books I kinda like in this space:
Martin Buber’s book “I and thou”;
Christopher Alexander’s writing, especially his “The Nature of Order” books
The Tao Te Ching (though this one I assume is thoroughly in any huge training corpus already)
(curious for y’all’s suggestions)
2. Stuff that aids processes for eliciting peoples’ values, or for letting people elicit each others’ values:
My thought here is that there’re dialogs between different people, and between people and LLMs, on what matters and how we can tell. Conversational methodologies for helping these dialogs go better seem maybe-helpful. E.g. active listening stuff, or circling, or Gendlin’s Focusing stuff, or … [not sure what—theory of how these sorts of fusions and dialogs can ever work, what they are, tips for how to do them in practice, …]
3. Especially, maybe: stuff that may help locate “attractor states” such that an AI, or a network of humans and near-human-level AIs, might, if it gets near this attractor state, choose to stay in this attractor state. And such that the attractor state has something to do with creating good futures.
Confucius (? I haven’t read him, but he at least shaped society for a long time in a way that was partly about respecting and not killing your ancestors?)
Hayek (he has an idea of “natural law” as sort of how you have to structure minds and economies of minds if you want to be able to choose at all, rather than e.g. making random mouth motions that cause random other things to happen that have nothing to do with your intent really, like what would happen if a monarch says “I want to abolish poverty” and then people try to “implement” his “decree”).
It may not be possible to prevent GPT4-sized models, but it probably is possible to prevent GPT-5-sized models, if the large companies sign on and don’t want it to be public knowledge that they did it. Right?
Oh no. Apparently also Yann LeCun didn’t really sign this.