G Gordon Worley III
Values, Valence, and Alignment
Here’s another: AI being x-risky makes me the bad guy.
That is, if I’m an AI researcher and someone tells me that AI poses x-risks, I might react by seeing this as someone telling me I’m a bad person for working on something that makes the world worse. This is bad for me because I derive important parts of my sense of self from being an AI researcher: it’s my profession, my source of income, my primary source of status, and a huge part of what makes my life meaningful to me. If what I am doing is bad or dangerous, that threatens to take much of that away (assuming I also want to think of myself as a good person, it means I either have to stop doing AI work to avoid being bad or stop thinking of myself as good), and an easy solution is to dismiss the arguments.
This is more generally a kind of motivated cognition or rationalization, but I think it’s worth considering a specific mechanism because it better points towards ways you might address the objection.
Sort of related to a couple of points you already brought up (not in personal experience, outsiders not experts, science fiction): worrying about AI x-risk is also weird, i.e. it’s not a thing everyone else is worrying about, so publicly worrying about it spends some of your weirdness points, and most people have very low weirdness budgets (because they don’t have enough status to afford more weirdness, have low psychological openness, etc.).
Sometimes we’re at a functional local maximum, but we’re not pointed in the right direction globally, and frankly speaking our lack of a high energy parameter is our saving grace: our inability to directly muck up our emotional landscape.
I’ve heard a similar story in meditation circles about why integration work is important: greater awakening enables greater agency and freedom of action, and without integration, virtue, and ethics (traditionally combined via the paramita of sila) that can be dangerous, because it can let a person run off in personally dangerous or socially bad directions they previously avoided only because they weren’t capable enough. In effect, their own failure was protecting them from themselves.
Interesting.
I don’t recall anymore; it’s been too long for me to remember enough specifics to answer your question. It’s just an impression or cached thought that I carry around from past study.
Here, the optimal decisions would be the higher-order outputs which maximize higher-order utility. They are decisions about what to value or how to decide rather than about what to do.
What constitutes utility here, then? For example, some might say utility is grounded in happiness or meaning, in economics we often measure utility in money, and I’ve been thinking along the lines of grounding utility (through value) in minimization of prediction error. It’s fine that you are concerned with higher-order processes (I’m assuming you mean processes about processes, so higher-order outputs are outputs about outputs and higher-order utility is utility about utility), and maybe you are primarily concerned with abstractions that let you ignore these details, but those abstractions must still be embodiable in specifics at some point or else they don’t describe reality well. After all, meta-values/preferences/utility functions are still values/preferences/utility functions.
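To illustrate the distinction I have in mind, here’s a toy sketch; everything in it (the names, the numbers, and the prediction-error grounding) is my own illustration rather than anything from your post. A first-order utility function scores outcomes, while a higher-order utility function scores candidate first-order utility functions, here by how little prediction error acting on them produces.

```python
# Toy sketch of "utility about utility"; all names and numbers are made up.

def first_order_utility(weights, outcome):
    """Score an outcome (a tuple of features) as a weighted sum."""
    return sum(w * x for w, x in zip(weights, outcome))

def higher_order_utility(weights, history):
    """Score a candidate set of first-order weights by the (negative)
    prediction error accumulated when acting on them.

    `history` is a list of (predicted, realized) pairs, each mapping an
    option name to a feature tuple.
    """
    total_error = 0.0
    for predicted, realized in history:
        # Pick the option the candidate utility function recommends...
        chosen = max(predicted, key=lambda o: first_order_utility(weights, predicted[o]))
        # ...and penalize the mismatch between prediction and reality.
        total_error += sum((p - r) ** 2
                           for p, r in zip(predicted[chosen], realized[chosen]))
    return -total_error  # less surprise means higher higher-order utility

# Two candidate value systems evaluated against one made-up experience.
history = [({"a": (1.0, 0.0), "b": (0.0, 1.0)},   # what each option promised
            {"a": (0.2, 0.0), "b": (0.0, 0.9)})]  # what each option delivered
for weights in [(1.0, 0.0), (0.0, 1.0)]:
    print(weights, higher_order_utility(weights, history))
```

However it’s grounded, the point is that the higher-order level still has to bottom out in some concrete quantity like this somewhere.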
To capture rational values, we are trying to focus on the changes to values that flow out of satisfying one’s higher-order decision criteria. By unrelated distortions of value, I pretty much mean changes in value from any other causes, e.g. from noise, biases, or mere associations.
How do you distinguish whether something is a distortion or not? You point to some things that you consider distortions, but I’m still unclear on the criteria by which you know distortions from the rational values you are looking for. One person’s bias may be another person’s taste. I realize some of this may depend on how you identify higher-order processes, but even if that’s the case we’re still left with the question as it applies to those directly, i.e. is some particular higher-order decision criterion a distortion, or is it rational?
In the code and outline I call the lack of distortion Agential Identity (similar to personal identity). I had previously tried to just extract the criteria out of the brain and directly operate on them. But now, I think the brain is sufficiently messy that we can only simulate many continuations and aggregate them. That opens up a lot of potential to stray far from the original state. This Agential Identity helps ensure we’re uncovering your dispositions rather than those of a stranger or a funhouse-mirror distortion.
This seems strange to me, because much of what makes a person unique lies in their distortions (speaking loosely here), not in their absence. Normally when we think of distortions they are taking an agent away from a universal perfected norm, and that universal norm would ideally be the same for all agents if it weren’t for distortions. What leads you to think there are personal dispositions that are neither distortions nor universal consequences of the shared norm of rationality?
I tend to think of Hegel as primarily important for his contributions to the development of Western philosophy (so even if he was wrong on details he influenced and framed the work of many future philosophers by getting aspects of the framing right) and for his contributions to methodology (like standardizing the method of dialectic, which on one hand is “obvious” and people were doing it before Hegel, and on the other hand is mysterious and the work of experts until someone lays out what’s going on).
A brain’s rational utility function is the utility function that would be arrived at by the brain’s decision algorithm if it were to make more optimal decisions while avoiding unrelated distortions of value.
By what mechanism do you think we can assess whether a change in value is an unrelated distortion, and how much distortion is happening? Put another way, what are “values” in this model such that they are separate from the utility function, and how could you measure whether the utility function is better optimizing for those values?
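To make this question more precise, here is one possible formalization of the quoted definition; the notation is entirely mine, not from your post:

$$U^{*} \;=\; \operatorname*{arg\,max}_{U}\; \mathrm{HigherOrderScore}_{D}(U) \quad \text{subject to} \quad \delta(U, V_{0}) \le \epsilon$$

where $D$ is the brain’s decision algorithm, $V_{0}$ is its current values, $\mathrm{HigherOrderScore}_{D}$ measures how well $U$ satisfies $D$’s higher-order decision criteria, and $\delta$ measures “unrelated distortion” away from $V_{0}$. My question is then what $V_{0}$ and $\delta$ are, concretely, and how we would measure them.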
Note that MAPLE is a young place, less than a decade old in its current form. So, much of it is “experimental.” These ideas aren’t time-tested. But my personal experience of them has been surprisingly positive, so far.
I think it’s worth sharing that 3 of the ideas you brought up are, at least within zen, historically common to monastic practice, albeit changed in ways to better fit the context of MAPLE. You call them the care role, the ops role, and the schedule; I see them as analogues of the jisha, the jiki, and the schedule.
The jisha, in a zen monastery, is first and foremost the attendant of the abbot (caveat: in some monasteries every teacher and high-ranking priest will have their own jisha). But in addition to this, the jisha is thought of as the “mother” of the sangha, with responsibilities to care for the monks, nuns, and guests, care for the sick, organize cleaning, and otherwise be supportive of people’s needs. This is similar to your care role in some ways, but MAPLE seems to have focused more on the care aspect and dropped the gendered-role aspects.
The jiki (also jikijitsu or jikido) is responsible for directing the movement of the students. They are the “father” to the jisha’s “mother”, serving as (possibly strict) disciplinarians to keep the monastery operating as intended by the abbot, enforcing rules and handing out punishments. This sounds similar to the Ops role, albeit probably with fewer slaps to the face and blows to the head.
The schedule is, well, the schedule. I expect MAPLE’s schedule, though “young”, is building on centuries of monastic schedule tradition while adding in new things. I think it’s worth adding that the schedule is also there to support deep practice, because there’s a very real way that having to make decisions can weaken samadhi, and having all decisions eliminated creates the space in which calm abiding can more easily arise.
Depends what you care about.
What Dharma traditions in particular do you have in mind? I can’t think of one I would describe as saying everyone has innate “moral” perfection, unless you twist the word “moral” around so much that its use is confusing at best.
Everything that is not a literal quote from the previous post is new.
No. I would rather receive a strong upvote. If I receive a comment I would prefer it contain some useful content.
Doxa, Episteme, and Gnosis Revisited
Story stats are my favorite feature of Medium. Let me tell you why.
I write primarily to impact others. Although I sometimes choose to do very little work to make myself understandable to anyone who is more than a few inferential steps behind me and then write out on a far frontier of thought, nonetheless my purpose remains sharing my ideas with others. If it weren’t for that, I wouldn’t bother to write much at all, and certainly not in the same way as I do when writing for others. Thus I care instrumentally a lot about being able to assess if I am having the desired impact so that I can improve in ways that might help serve my purposes.
LessWrong provides some good, high-detail clues about impact: votes and comments. Comments on LW are great, and definitely better in quality and depth of engagement than what I find in other places. Votes are also relatively useful here, modulo the weaknesses of LW voting I’ve talked about before. If I post something on LW and it gets lots of votes (up or down) or lots of comments, relative to what other posts receive, then I’m confident people have read what I wrote and I impacted them in some way, whether or not it was in the way I had hoped.
That’s basically where story stats stop on LessWrong. Here’s a screen shot of the info I get from Medium:
For each story you can see a few things here: views, reads, read ratio, and fans, which is basically likes. I also get an email every week telling me about the largest updates to my story stats, like how many additional views, reads, and fans a story had in the last week.
If I click the little “Details” link under a story name I get more stats: average read time, referral sources, internal vs. external views (external views are views on RSS, etc.), and even a list of “interests” associated with readers who read my story.

All of this is great. Each week I get a little positive reward letting me know what I did that worked, what didn’t, and most importantly to me, how much people are engaging with things I wrote.
I get some of that here on LessWrong, but not all of it. Although I’ve bootstrapped myself now to a point where I’ll keep writing even absent these motivational cues, I still find this info useful for understanding which things I wrote people liked best or found most useful and which they found least useful. Some of that is mirrored here by things like votes, but it doesn’t capture all of it.
I think it would be pretty cool if I could see more stats about my posts on LessWrong similar to what I get on Medium, especially view and read counts (knowing that “reads” is ultimately a guess, based on some users allowing JavaScript that lets us infer that they read it).
Possibly related but with a slightly different angle, you may have missed my work on trying to formally specify the alignment problem, which is pointing to something similar but arrives at somewhat different results.
It’s true that not all online advertising does nothing. We should expect, if nothing else, online advertising to continue to serve the primary and original purpose of advertising, which is generating choice awareness, and certainly my own experience backs this up: I am aware of any number of products and services only because I saw ads for them on Facebook, Google search, SlateStarCodex, etc. To the extent that advertising makes people aware of choices they otherwise would not have known about, such that on the margin they may take those choices (since you can’t take a choice you don’t know you have), it would seem to function successfully, assuming it can be had at a price low enough to produce a positive return on investment.
However, my own experience in the industry suggests that most spend beyond what it takes to generate that initial awareness is poorly spent. Much to the dismay of marketing departments, you can’t usually spend your way to growth through ads. Other forms of marketing look better (content marketing can work really well and can be a win-win when done right).
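As a toy illustration of that break-even condition (every number here is hypothetical, not from any real campaign):

```python
# Toy break-even arithmetic for ad spend; all numbers are made up.
ad_spend = 10_000.00          # cost of the campaign
margin_per_customer = 50.00   # gross margin contributed by one new customer

# Marginal customers the campaign must cause just to break even.
breakeven_customers = ad_spend / margin_per_customer
print(breakeven_customers)    # 200.0

# Positive ROI requires awareness alone to move more buyers than that.
actual_new_customers = 150
roi = (actual_new_customers * margin_per_customer - ad_spend) / ad_spend
print(f"ROI: {roi:.0%}")      # ROI: -25%
```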
I’m excited for the rest of this miniseries. I’m similarly interested in cybernetics and am sad it failed for what in hindsight seem to be obvious and unavoidable reasons (being interdisciplinary & easily co-opted to justify bullshit). My own thinking has taken me in a direction convergent with cybernetics, as I’ve explored a bit in the past.
This seems not quite right to me, in that I doubt we can draw this equivalence. In the case of mathematical proofs and the units with which to measure angles, we can be indifferent between the choices when our purpose (what we care about; our telos) is just proving a statement true or measuring an angle, respectively. But if we care about the length of a proof or its assumptions (maybe we want a proof of a theorem that doesn’t rely on the axiom of choice), or about which angle units a calculator supports, or about the elegance of working in particular units, then there is a difference between the choices that matters.
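As a concrete illustration of the angle-units case (the example is mine, chosen for familiarity): for the bare purpose of computing a sine, degrees and radians are interchangeable, but as soon as the purpose includes what your calculator or library accepts, the indifference disappears.

```python
import math

# Degrees vs. radians: the same angle, so indifference holds for the bare
# purpose of knowing the sine of this angle.
angle_deg = 30.0
angle_rad = math.radians(angle_deg)
assert abs(math.sin(angle_rad) - 0.5) < 1e-9

# But math.sin only accepts radians, so someone who cares about which units
# their calculator supports is no longer indifferent between the two choices.
print(math.sin(math.radians(30.0)))  # ~0.5, the sine of 30 degrees
print(math.sin(30.0))                # ~-0.988, NOT the sine of 30 degrees
```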
So it is with explanations. If our purpose is to make predictions about quantum effects, then a theory about how quantum mechanics works isn’t important, only that the mathematical model predicts reality, and metaphysical questions are moot. But if our purpose is to understand what’s going on beyond what can be predicted using quantum mechanics, then we care a lot about which interpretation of quantum mechanics is correct because it does make predictions about the thing we care about.
This kind of not-caring-because-it-works only makes sense so long as it is pragmatic relative to a particular purpose. Perhaps many people should be more pragmatic, but that seems a separate issue, and there are many reasons why what is pragmatic for one purpose may not be for another, so I think your view is true but insufficient.