Still haven’t heard a better suggestion than CEV.
Yeah. I agree with this. This is an important aspect of what I’m pointing to when I mention “densely venn” and “preference independent” in this comment.
This post seems really related to the “Outcome Influencing Systems (OISs)” concept I’ve been developing as part of my thinking on ASI and the associated risks and strategies.
For the purpose of discussion, every memeplex is an OIS, but not all OISs are memeplexes (eg, plants, animals and viruses are OISs and are not memeplexes). One aspect that seems to be missing from your description is “socio-technical OISs”, which are any OISs using a combination of human society and human technology as their substrate. These seem very related to the idea of “cyber or cyborg egregore”, but are perhaps a valuable generalization of the concept. It is already very much the case that not all cognition is getting done in human minds. The most obvious and easy examples involve writing things down, either as part of a calculation process or to file memories for future reference, whether by the writing human or by other humans as part of a larger OIS.
About the mutualism-parasitism continuum: from an OIS perspective this might be understood by looking at how OISs are “densely venn” and “preference independent”.
By “densely venn” I mean that there is overlap between the parts of reality that are considered to be one OIS vs another. For example, each human is an OIS that helps host many OISs: each human has a physical/biological substrate, and each hosted OIS is at least partly hosted on that human substrate. The name comes from the idea of a Venn diagram with way too many circles drawn (and also they’re probably hyperspheres or manifolds, but I feel that framing is less intuitive for most people).
By “preference independent” I mean that there is not necessarily any relationship between the preferences of two overlapping OISs. For example, a worker and their job are overlapping OISs. The worker extends to things both related and unrelated to their job. Likewise, their job is hosted on them, but is also hosted on many other people and other parts of reality. The worker might go to work because they need money and their job pays them money, but it could be that the preferences of their job are harmful to their own preferences and vice versa.
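To make those two properties concrete, here is a minimal toy sketch (my own illustrative Python, with made-up names and numbers, not anything formal from my OIS writing): each OIS is approximated as a set of substrate parts plus a preference function, and the worker/job example shows that heavy substrate overlap implies nothing about preference agreement.

```python
# Toy model of "densely venn" and "preference independent" (illustrative only).
from dataclasses import dataclass
from typing import Callable, FrozenSet

@dataclass(frozen=True)
class OIS:
    name: str
    substrate: FrozenSet[str]           # parts of reality hosting this OIS
    preference: Callable[[str], float]  # score of an outcome; higher = more preferred

def substrate_overlap(a: OIS, b: OIS) -> FrozenSet[str]:
    """'Densely venn': distinct OISs can share large chunks of substrate."""
    return a.substrate & b.substrate

def preferences_agree(a: OIS, b: OIS, outcome: str) -> bool:
    """'Preference independent': substrate overlap says nothing about agreement."""
    return (a.preference(outcome) > 0) == (b.preference(outcome) > 0)

worker = OIS("worker", frozenset({"worker body", "worker mind"}),
             lambda o: {"unpaid overtime": -1.0, "paycheck": 1.0}.get(o, 0.0))
job = OIS("job", frozenset({"worker body", "worker mind", "office", "coworkers"}),
          lambda o: {"unpaid overtime": 1.0, "paycheck": -0.2}.get(o, 0.0))

print(substrate_overlap(worker, job))                     # large overlap: the worker hosts the job
print(preferences_agree(worker, job, "unpaid overtime"))  # False: overlapping but misaligned
```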
Thanks for the post! This stuff is definitely relevant to things I feel are important to be able to understand and communicate about.
-- edit -- Oh, I’ll also mention that human social interaction (or even animal behaviour dynamics) without technology also creates OISs with preference independence from the humans hosting them, as can be seen in the existence of Malthusian / Molochian traps.
I think the focus on “delivering benefits” is a good perspective. It feels complicated by my sense that a lot of the benefit of the OIS concept is as an explanatory lens. When I want to discuss the things I’m focused on, I want to discuss them in terms of OISs, and it feels like not using OIS terminology makes the explanations more complicated. So in that regard I guess I need to clearly define and demonstrate the explanatory benefit. But the “research approach” focus also seems like a good thing to keep in mind.
Thanks for your perspective 🙏
Hey! Thanks for writing this. I was inspired enough to write a post in response: Agent Foundations: Paradigmatizing in Math and Science. I think “induction vs deduction” may be a useful focus to add to this discussion, and I’m also hopeful about my “Outcome Influencing Systems (OISs)” concept, which I think might be the concept much of agent foundations is reaching for, or at least a step in the correct direction.
If you find the time to read or skim my article, please let me know what you think! I know you will be busy with Inkhaven. Maybe one of your blog posts can be a response : )
As a fellow heretical dog-ear lover, let me tell you about my dog ear ritual upon acquiring a new book with content I wish to frequent...
First I explore the table of contents and skim a little bit, noting how the content is broken up into sections and subsections. Usually I determine that two “levels” will be good, so I draw my fingernail across the side of the closed book near the top corner to create two indented lines across all the pages in the book. These serve to indicate the location where the dog ear fold should come to.
Be careful when making the lines! If they are too low you may cover up the page numbers 0___0. I like going about 5 and 10 mm.
Then I grab all the pages of the table of contents and dog ear them together. This causes each successive page’s dog ear to be just a little bit smaller than the last, making them easier to flip through. I pull each page apart and press the folds flat individually. I like to have the table of contents folded down lower than the two indicator lines, making the table of contents very obvious.
Next I flip through the book to the start of each chapter, folding down the corner of the right-hand page (even if the chapter starts on the left-hand page). I’m still settling on the technique, but folding to the smaller line creates a more noticeable gap because of how the dog ear folds mesh into one another, so I think it is preferable for chapters to be the smallest dog ears.
Then I start with the chapters I’m most interested in, or, if I have time to lose myself in the enjoyment of knolling, I will go through the book from start to finish, folding down the right-hand page to the second line for each of the sections. I find this page flipping also gives me a nice preview of the contents of the book.
Finally I will repeat the process for folding the table of contents on the index pages. If there is a glossary I will do the same for it at a level in between the subsection level and the index level.
The top corner of the book is now a completed pack of dog ears. The bottom section is left for dog ears for contents of interest. Unlike the chapter and section ears, I place these on the right or left page to indicate which one is of interest. I have not yet encountered a worthwhile page on both sides of a single sheet of paper. I am not sure what I would do if I did.
Haha. I hope if you read all of this it either amused you or inspired you to become a dog earing weirdo such as myself.
I definitely like the directions you are exploring, and I agree they are improvements over the implicit AGI-lab-directed concept. That’s a useful thing to keep in mind, but so is what keeps them from being final ideas.
I’m not convinced they are the same problem
When viewed as OISs from a high level, they are the same problem. Misaligned OIS to misaligned OIS. But you are correct that many of the details change. The properties of one OIS are quite different from the properties of the other, and that does matter for analyzing and aligning them. I think that having a model that applies to both of them and makes the similarities and differences more explicit would be useful (my suggestion is my OIS model, but it’s entirely possible there are better ones).
It seems like considerations to “keep philosophers honest” are implicitly talking about how to ensure alignment of a hypothetical socio-technical OIS. What do you think? Does that make sense at all, or maybe it seems more like a time wasting distraction? I have to admit I’m uncomfortable with the amount I have gotten stuck on the idea that championing this concept is a useful thing for me to be doing.
I do think the alignment problem and the “morality is scary” problem have a lot in common, and in my thinking about the alignment problem and the way it leaks into other problems, the model that emerged for me was that of OISs, which seem to generalize the part of the alignment problem I am interested in focusing on to social institutions whose goals are moral in nature, and how they relate to the values of individual people.
Thanks for the reply.
I guess I’m unclear on what people you are considering the relevant neurotic demographic, and since I feel that “agent foundations” is a pointer to a bunch of concepts which it would be very good if we could develop further, I find myself getting confused at your use of the phrase “agent foundations era”.
For a worldview check, I am currently much more concerned about the risks of “advancing capabilities” than I am about missed opportunities. We may be coming at this from different perspectives. I’m also getting some hostile soldier mindset vibe from you. My apologies if I am misreading you. Unfortunately, I am in the position of thinking that people promoting the advancement of AI capabilities are indeed promoting increased global catastrophic risk, which I oppose. So if I am falling into the soldier mindset, I likewise am sorry.
I agree. I’ve been trying to discuss some terminology that I think might help, at least with discussing the situation. I think “AI” is generally a vague and confusing term, and what we should actually be focused on are “Outcome Influencing Systems (OISs)”, where a hypothetical ASI would be an OIS capable of influencing what happens on Earth regardless of human preferences. However, humans are also OISs, as are groups of humans, and in fact the “competitive pressure” you mention is a kind of very powerful OIS that is already misaligned and in many ways superhuman.
Is it too late to “unplug” or “align” all of the powerful misaligned OISs operating in our world? I’m hoping not, but I think the framing might be valuable for examining the issue, and maybe for avoiding some of the usual political issues involved in criticizing any specific powerful OIS that might happen to be influencing us towards potentially undesirable outcomes.
What do you think?
I agree on both points. To the first, I’d like to note that classifying “kinds of illegibility” seems worthwhile. You’ve pointed out one example, the “this will affect future systems but doesn’t affect systems today”. I’d add three more to make the possibly incomplete set:
This will affect future systems but doesn’t affect systems today.
This relates to an issue at a great inferential distance; it is conceptually difficult to understand.
This issue stems from a framing of, or assumption about, existing systems that is not correct.
This issue is emotionally or politically inconvenient.
I’d be happy to say more about what I mean by each of the above if anyone is curious, and I’d also be happy to hear out thoughts about my suggested illegibility categories or the concept in general.
The “morality is scary” problem of corrigible AI is an interesting one. It seems tricky, to at least a first approximation, in that I basically don’t have an estimate of how much effort it would take to solve it.
Your rot13 suggestion has the obvious corruption problem, but also has the problem of public relations for the plan. I doubt it would be popular. However, I like where your head is at.
My own thinking on the subject is closely related to my “Outcome Influencing System (OIS)” concept. The most complete and concise summary is here. I should write an explainer post, but haven’t gotten to it yet.
Basically, whatever system we use for deciding on and controlling the corrigible AI becomes the system we are concerned with ensuring the alignment of. It doesn’t really solve the problem, it just backs it up one matryoshka doll around the AI.
I see a lot of people dismissing the agent foundations era and I disagree with it. Studying agents seems even more important to me than ever now that they are sampled from a latent space of possible agents within the black box of LLMs.
To throw out a crux, I agree that if we have missed opportunities for progress towards beneficial AI by trying to avoid advancing harmful capabilities, that would be a bad thing, but my internal sense of the world suggests to me that harmful capabilities have been advanced more than opportunities have been missed. But unfortunately, that seems like a difficult claim to try to study in any sort of unbiased, objective way, one way or the other.
This is a good point of view. What we have is a large sociotechnical system moving towards global catastrophic risk (GCR). Some actions cause it to accelerate or remove brakes, others cause it to steer away from GCR. So “capabilities vs alignment” is directly “accelerate vs steer”, while “legible vs illegible” is like making people think we can steer, even though we can’t, which in turn makes people ok with acceleration, and so it results in “legible vs illegible” also being “accelerate vs steer”.
The important factor there is “people think we can steer”. I think when the thing we are driving is “the entire human civilization” and the thing we are trying to avoid driving into is “global catastrophic risk”, caution is warranted… but not infinite caution. It does not override all other concerns, merely, it seems by my math, most of them. So unfortunately, I think getting people to accurately (or at least less wrongly) understand the degree to which we can or cannot steer is most important, probably erring on the side of making people think we can steer less well than we can, rather than better than we can, as seems to be the human default.
An unrelated problem: as with capabilities, there is more funding for legible problems than illegible ones. I am currently continuing to sacrifice large amounts of earning potential so I can focus on problems I believe are important. That makes it sound noble, but indeed, how do we know which people working on illegible problems are working on making worthwhile things understandable and which are just wasting time? That is exactly what makes a problem illegible: we can’t tell. It seems like a real tricky problem, somewhat related to the ASI alignment problem. How can we know an agent we don’t understand, working on a problem we don’t understand, is working towards our benefit?
Anyway, thanks for the thoughtful post.
Am I understanding you correctly that you are pointing out that people have spheres of influence, with areas they seemingly have full control over and other areas where they seemingly have no control? That makes sense and seems important. Where you can aim your ethical heuristics at things people have full control over, they will obviously work better, but unfortunately it is important for people to try to influence things that they don’t seem to have any control over.
I suppose you could prescribe self-referential heuristics, for example “have you spent 5 uninterrupted minutes thinking about how you can influence AI policy in the last week?” It isn’t clear whether any given person can influence these companies, but it is clear that any given person can consider it for 5 minutes. That’s not a bad idea, but there may be better ways to take the “We should...” statement out of intractability and make it embodied. Can you think of any?
My longer comment on ethical design patterns explores a bit about how I’m thinking about influence through my “OIS” lens in a way tangentially related to this.
Soloware is a cool concept. My biggest concern is it becoming more difficult to integrate progress made in one domain into other domains if wares become divergent, but I have faith solutions to that problem could be found.
About the concept of agent integration difficulty, I have a nitpick that might not connect to anything useful, and what might be a more substantial critique that is more difficult to parse.
If I simulate you perfectly on a CPU, [...] Your self-care reference-maintenance is no longer aimed at the features of reality most critical to your (upload’s) continued existence and functioning.
If this simulation is a basic “use tons of computation to run a low-level state machine at the molecular, atomic, or quantum level”, then your virtual organs will still virtually overheat and the virtual you will die, so you now have two things to care about: your simulated temperature and the temperature of the computer running the simulation.
...
I’m going to use my own “OIS” terminology now, see this comment for my most concise primer on OISs at the time of writing. As a very basic approximation, “OIS” means “agent”.
It won’t be motivated. It’ll be capable of playing a caricature of self-defense, but it will not really be trying.
Overall, Sahil’s claim is that integratedness is hard to achieve. This makes alignment hard (it is difficult to integrate AI into our networks of care), but it also makes autonomy risks hard (it is difficult for the AI to have integrated-care with its own substrate).
The nature of agents derived from simulators like LLMs is interesting. Indeed, they often act more like characters in stories than people actually acting to achieve their goals. Of course, the same could be said about real people.
Regardless, that is a focus on the accidental creation of misaligned mesa-OISs. I think this is a risk worth considering, but I think a more concerning threat, which this article does not address, is existing misaligned OISs recursively improving their capabilities: how much soloware creation will be in service of people’s performance in roles within OISs whose preferences they do not fully understand? That is the real danger.
[epistemic note: I’m trying to promote my concept “Outcome Influencing Systems (OISs)”. I may be having a happy death spiral around the idea and need to pull out of it. I’m seeking evidence one way or the other. ]
[reading note: I pronounce “OIS” as “oh-ee” and “OISs” as “oh-ees”.]
I really like the idea of categorizing and cataloguing ethical design patterns (EDPs) and seeking reasonable EDP bridges. I think the concept of “OISs” may be helpful in the endeavour in some ways.
A brief primer on OISs:
“OISs” is my attempt to generalize AI alignment.
“OISs” is inspired by many disciplines and domains including technical AI alignment, PauseAI activism, mechanistic interpretability, systems theory, optimizer theory, utility theory, and too many others to list.
An OIS is any system which has “capabilities” which it uses to “influence” the course of events towards “outcomes” in alignment with its “preferences”.
OISs are “densely venn”, meaning that segmenting reality into OISs results in what looks like a Venn diagram with very many circles intersecting and nesting. Eg: people are OISs, teams are OISs, governments are OISs, memes are OISs. Every person is made up of many OISs contributing to their biological homeostasis and conscious behaviour.
OISs are “preference independent” in that being a part of an OIS implies no relationship between your own preferences and the preferences of the OIS you are contributing to. If there is a relationship, it must be established in some way other than stating your desires for the OIS you are acting as a part of.
Each OIS has an “implementing substrate”, which is the parts of our reality that make up the OIS. Common substrates include: { sociotechnical (groups of humans and human technology), digital (programs on computers), electromechanical (machines with electricity and machinery), biochemical (living things), memetic (existing in people’s minds in a distributed way) }. This list is not complete, nor do I feel strongly that it is the best way to categorize substrates, but I hope it gives an intuition.
Each OIS has a “preference encoding”. This is where and how the preferences exist in the OIS’s implementing substrate.
The capability of an OIS may be understood as an amalgamation of its “skill”, “resource access”, and “versatility”. (I sketch this structure in code just below.)
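To tie the pieces of the primer together, here is a rough sketch (again my own illustrative Python; the field names and example numbers are just my working vocabulary, nothing formal) of an OIS as a record with a substrate, a preference encoding, and a capability made of skill, resource access, and versatility.

```python
# Illustrative sketch of the OIS primer above; names and numbers are made up.
from dataclasses import dataclass
from enum import Enum, auto
from typing import List

class Substrate(Enum):
    SOCIOTECHNICAL = auto()     # groups of humans plus human technology
    DIGITAL = auto()            # programs on computers
    ELECTROMECHANICAL = auto()  # machines with electricity and machinery
    BIOCHEMICAL = auto()        # living things
    MEMETIC = auto()            # distributed across people's minds

@dataclass
class Capability:
    skill: float            # how well it executes what it attempts
    resource_access: float  # what it can bring to bear
    versatility: float      # how broad a range of situations it handles

@dataclass
class OIS:
    name: str
    substrates: List[Substrate]  # an OIS can span several substrates
    preference_encoding: str     # where/how its preferences live in that substrate
    capability: Capability

# Example: an ethical design pattern, viewed as an OIS on the memetic substrate.
honesty_edp = OIS(
    name="'don't deceive' ethical design pattern",
    substrates=[Substrate.MEMETIC],
    preference_encoding="shared expectations and habits in the minds of people holding the norm",
    capability=Capability(skill=0.6, resource_access=0.1, versatility=0.4),
)
```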
It seems that when you use the word “mesaoptimizers” you are reaching for the word “OIS” or some variant. Afaik “mesaoptimizer” refers to an optimization process created by an optimization process. It is a useful word, especially for examining reinforcement learning, but it puts focus on the process of creation of the optimizer being an optimizer, which isn’t really the relevant focus. I would suggest that instead “influencing outcomes” is the relevant focus.
Also, we avoid the optimizer/optimized/policy issue. As stated in “Risks from Learned Optimization: Introduction”:
a bottle cap causes water to be held inside the bottle, but it is not optimizing for that outcome since it is not running any sort of optimization algorithm.
If what you care about is the outcome (whether or not water will stay in the bottle), then it isn’t “optimizers” you are interested in, but OISs. I think understanding optimization is important for examining possible recursive self improvement and FOOM scenarios, so the bottle cap is indeed not an optimizer, and that is important. But the bottle cap is an OIS, because it influences the outcome for the water by making it much more likely that all of the water stays in the bottle. (Although, notably, it is an OIS with very, very narrow versatility and very weak capability.)
I’m not too interested in whether large social groups working towards projects such as enforcing peace or building AGI are optimizers or not. I suspect they are, but I feel much more comfortable labelling them as “OISs” and then asking, “what are the properties of this OIS?”, “Is it encoding the preferences I think it is? The preferences I should want it to?”.
Ok, that’s my “OIS” explanation, now onto where the “OIS” concept may help the “EDP” concept...
EDPs as OISs:
First, EDPs are OISs that exist in the memetic substrate and influence individual humans and human organizations towards successful ethical behaviour. Some relevant questions from this perspective: What are EDPs’ capabilities? How do they influence? How do we know what their preferences are? How do we effectively create, deploy, and decommission them based on analysis of their alignment and capability?
EDPs for LST-OISs:
It seems to me that the place we are most interested in EDPs is for influencing the behaviour of society at large, including large organizations and individuals whose actions may affect other people. So, as I mentioned about “mesaoptimizers”, it seems useful to have clear terminology for discussing what kinds of OISs we are targeting with our EDPs. The most interesting kind to me are “Large SocioTechnical OISs” (LST-OISs), by which I mean governments of different kinds, large markets and their dynamics, corporations, social movements, and any other thing you can point out as being made up of large numbers of people working with technology to have some kind of influence on the outcomes of our reality. I’m sure it is useful to break LST-OISs down into subcategories, but I feel it is good to have a short and fairly politically neutral way to refer to those kinds of objects in full generality, especially if it is embedded in the lens of “OISs”, with the implication that we should care about each OIS’s capabilities and preferences.
People don’t control OISs:
Another consideration is that people don’t control OISs. Instead, OISs are like autonomous robots that we create and then send out into the world. But unlike robots, OISs can be, and frequently are, created through people’s interactions without the explicit goal of creating an OIS.
This means that we live in a world with many intentionally created OISs, but also many implicit and hybrid OISs. It is not clear whether there is a relationship between how an OIS was created and how capable or aligned it is. It seems that markets were mostly created implicitly, but are very capable and rather well aligned, with some important exceptions. Contrast Stalin’s planned economy, an intentionally created OIS that I think was genuinely meant to be more capable and aligned while serving the same purpose, but turned out to be less capable in many ways and tragically misaligned.
More on the note of not controlling OISs: it is more accurate to say we have some level of influence over them. It may be that our social roles are so constrained in some Molochian ways that we really don’t have any influence over some OISs despite contributing to them. To recontextualize some stoicism: the only OIS you control is yourself. But even that is complexified by the existence of multiple OISs within yourself.
The point of saying this is that no individual human has the capability to stop companies from developing and deploying dangerous technologies; rather, we are trying to understand and wield OISs which we hope may have that capability. This is important both in making our strategy clear, and in understanding how people relate to what is going on in the world.
Unfortunately, most people I talk to seem to believe that humans are in control. Sure, LST-OISs wouldn’t exist without the humans in the substrate that implements them, and LST-OISs are in control, but this is extremely different from humans themselves being in control.
In trying to develop EDPs for controlling dangerous OISs, it may help to promote OIS terminology, to make it easier for people to understand the true (less wrong) dynamics of what is being discussed. At the least, it may be valuable to note explicitly that the people we are trying to make EDPs for are thinking in terms of tribes of people, where people are in control, instead of complex sociotechnical systems, and that this will affect how they relate to EDPs that are critical of specific OISs that they view as labels pointing at their tribe.
...
Ha, sorry for writing so much. If you read all of this, please lmk what you think : )
I wouldn’t say I’m strongly a part of the LW community, but I have read and enjoyed the sequences. I am also undiagnosed autistic and have many times gotten into arguments for reasons that seemed to me like other people not liking the way I communicate, so I can relate to that. If you want to talk privately where there is less chance of accidentally offending larger numbers of people, feel free to reach out to me in a private message. You can think of it as a dry run for posting or reaching out to others if you want.
I like this. Having strong norms for how posts should be broken up (prereqs, lit review, examples, motivations, etc…) seems like it would be good for engendering clarity of thought and for respecting people’s time and focus. However, it would need to be built on the correct norms, and I don’t know what those norms should be. Figuring it out and popularizing it seems like a worthwhile goal though. Good luck if you are picking it up!
move ethics from some mystical thing, into an engineering/design problem
I like this vibe and want to promote my “Outcome Influencing System (OIS)” concept as a set of terminology / lens that may be valuable. Basically, anything that is trying to influence reality is an OIS, and so in that way it is the same as an optimizer, but I’m hoping to build up concepts around the term that make it a more useful way to explore and discuss these ideas than with existing terminology.
The relevance is that there are many “large sociotechnical OIS” that we have implicitly and explicitly created, and treating them as technology that should have better engineering quality assurance seems like a valuable goal.
I would like to draw a strong distinction between a “world government” and an organization capable of effecting international AGI race de-escalation. I don’t think you were exactly implying that the former is necessary for the latter, but since the former seems implausible and the latter necessary for humanity to survive, it seems good to clearly distinguish.
It’s not super relevant to the point of this article, but there are people with multiple identities where the identities are not dissociated and are not considered by the people experiencing them to be disorders. One of the terms they have picked up to refer to themselves is “plural”. If interested, you can read more here: https://pluralpedia.org