[epistemic note: I’m trying to promote my concept “Outcome Influencing Systems (OISs)”. I may be having a happy death spiral around the idea and need to pull out of it. I’m seeking evidence one way or the other. ]
[reading note: I pronounce “OIS” as “oh-ee” and “OISs” as “oh-ees”.]
I really like the idea of categorizing and cataloguing ethical design patterns (EDPs) and seeking reasonable EDP bridges. I think the concept of “OISs” may be helpful in that endeavour in some ways.
A brief primer on OISs:
“OISs” is my attempt to generalize AI alignment.
“OISs” is inspired by many disciplines and domains, including technical AI alignment, PauseAI activism, mechanistic interpretability, systems theory, optimizer theory, utility theory, and too many others to list.
An OIS is any system which has “capabilities” that it uses to “influence” the course of events towards “outcomes” in alignment with its “preferences”.
OISs are “densely Venn”, meaning that segmenting reality into OISs results in what looks like a Venn diagram with very many circles intersecting and nesting. E.g.: people are OISs, teams are OISs, governments are OISs, memes are OISs. Every person is made up of many OISs contributing to their biological homeostasis and conscious behaviour.
OISs are “preference independent”: being a part of an OIS implies no relationship between your preferences and the preferences of the OIS you are contributing to. If there is a relationship, it must be established in some way other than simply stating your desires for the OIS you are acting as a part of.
Each OIS has an “implementing substrate”, which is the parts of our reality that make up the OIS. Common substrates include: { sociotechnical (groups of humans and human technology), digital (programs on computers), electromechanical (machines with electricity and machinery), biochemical (living things), memetic (existing in people’s minds in a distributed way) }. This list is not complete, nor do I feel strongly that it is the best way to categorize substrates, but I hope it gives an intuition.
Each OIS has a “preference encoding”. This is where and how the preferences exist in the OIS’s implementing substrate.
The capability of an OIS may be understood as an amalgamation of its “skill”, “resource access”, and “versatility”.
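To make these moving parts concrete, here is a minimal sketch of the OIS vocabulary as a data structure. This is purely illustrative: the field names (`substrate`, `preference_encoding`, `skill`, `resource_access`, `versatility`) are just my rendering of the terms above into code, not an established formalism.

```python
from dataclasses import dataclass

@dataclass
class Capability:
    """Capability as an amalgamation of skill, resource access, and versatility."""
    skill: float            # how well the OIS executes within its domain
    resource_access: float  # what it can marshal: energy, money, attention, ...
    versatility: float      # how broad a range of situations it can influence

@dataclass
class OIS:
    """An Outcome Influencing System: any system that uses its capabilities
    to influence the course of events towards its preferred outcomes."""
    name: str
    substrate: str            # e.g. "sociotechnical", "digital", "biochemical", "memetic"
    preference_encoding: str  # where and how its preferences live in the substrate
    capability: Capability
```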
It seems that when you use the word “mesaoptimizers” you are reaching for the word “OIS” or some variant. AFAIK “mesaoptimizer” refers to an optimization process created by another optimization process. It is a useful word, especially for examining reinforcement learning, but it puts the focus on how the optimizer was created, which isn’t really the relevant focus. I would suggest that “influencing outcomes” is the relevant focus instead.
Also, we avoid the optimizer/optimized/policy issue. As stated in “Risks from Learned Optimization: Introduction”:
“a bottle cap causes water to be held inside the bottle, but it is not optimizing for that outcome since it is not running any sort of optimization algorithm.”
If what you care about is the outcome (whether or not water will stay in the bottle), then it isn’t “optimizers” you are interested in, but OISs. I think understanding optimization is important for examining possible recursive self-improvement and FOOM scenarios, so the bottle cap is indeed not an optimizer, and that matters. But the bottle cap is an OIS, because it influences the outcome for the water by making it much more likely that all of the water stays in the bottle. (Although, notably, it is an OIS with very narrow versatility and very weak capability.)
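Continuing the illustrative sketch from above, the bottle cap fits the schema as a degenerate case (the numbers are arbitrary placeholders, only there to show the shape):

```python
bottle_cap = OIS(
    name="bottle cap",
    substrate="mechanical",  # doesn't fit my earlier substrate list neatly; that list wasn't complete
    preference_encoding="its shape and material, which encode 'water stays in the bottle'",
    capability=Capability(skill=0.9, resource_access=0.0, versatility=0.01),
)
```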
I’m not too interested in whether large social groups working towards projects such as enforcing peace or building AGI are optimizers or not. I suspect they are, but I feel much more comfortable labelling them as “OISs” and then asking: “What are the properties of this OIS?”, “Is it encoding the preferences I think it is? The preferences I should want it to?”
Ok, that’s my “OIS” explanation, now onto where the “OIS” concept may help the “EDP” concept...
EDPs as OISs:
First, EDPs are OISs that exist in the memetic substrate and influence individual humans and human organizations towards successful ethical behaviour. Some relevant questions from this perspective: What are EDPs’ capabilities? How do they influence? How do we know what their preferences are? How do we effectively create, deploy, and decommission them based on analysis of their alignment and capability?
EDPs for LST-OISs:
It seems to me that the place we are most interested in EDPs is in influencing the behaviour of society at large, including large organizations and individuals whose actions may affect other people. So, as I mentioned regarding “mesaoptimizers”, it seems useful to have clear terminology for discussing what kinds of OISs we are targeting with our EDPs. The most interesting kind to me are “Large SocioTechnical OISs” (LST-OISs), by which I mean governments of different kinds, large markets and their dynamics, corporations, social movements, and anything else you can point out as being made up of large numbers of people working with technology to have some kind of influence on the outcomes of our reality. I’m sure it is useful to break LST-OISs down into subcategories, but I feel it is good to have a short and fairly politically neutral way to refer to those kinds of objects in full generality, especially if it is embedded in the lens of “OISs”, with the implication that we should care about each OIS’s capabilities and preferences.
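Under the same toy schema, an LST-OIS sits at the opposite end of the versatility and resource scales from the bottle cap (again, all values are arbitrary placeholders, and the example organization is hypothetical):

```python
frontier_lab = OIS(
    name="large AI company",
    substrate="sociotechnical",  # many people plus their technology
    preference_encoding="charters, incentive structures, org charts, market pressures",
    capability=Capability(skill=0.8, resource_access=0.95, versatility=0.9),
)
```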
People don’t control OISs:
Another consideration is that people don’t control OISs. Instead, OISs are like autonomous robots that we create and then send out into the world. But unlike robots, OISs can be, and frequently are, created through people’s interactions without the explicit goal of creating an OIS.
This means that we live in a world with many intentionally created OISs, but also many implicit and hybrid OISs. It is not clear if there is a relationship between how an OIS was created and how capable or aligned it is. It seems that markets were mostly created implicitly, but are very capable and rather well aligned, with some important exceptions. Contrast Stalin’s planned economy: an intentionally created OIS which I think was genuinely intended to be more capable and aligned while serving the same purpose, but which turned out to be less capable in many ways and tragically misaligned.
More on the note of not controlling OISs: it is more accurate to say we have some level of influence over them. It may be that our social roles are so constrained, in some Molochian ways, that we really don’t have any influence over some OISs despite contributing to them. To recontextualize some Stoicism: the only OIS you control is yourself. But even that is complicated by the existence of multiple OISs within yourself.
The point of saying this is that no individual human has the capability to stop companies from developing and deploying dangerous technologies; rather, we are trying to understand and wield OISs which we hope may have that capability. This is important both for making our strategy clear and for understanding how people relate to what is going on in the world.
Unfortunately, most people I talk to seem to believe that humans are in control. Sure, LST-OISs wouldn’t exist without the humans in the substrate that implements them, and LST-OISs are in control, but this is extremely different from humans themselves being in control.
In trying to develop EDPs for controlling dangerous OISs, it may help to promote OIS terminology, to make it easier for people to understand the true (less wrong) dynamics of what is being discussed. At the very least, it may be valuable to note explicitly that the people we are trying to make EDPs for are thinking in terms of tribes of people, where people are in control, rather than in terms of complex sociotechnical systems, and that this will affect how they relate to EDPs that are critical of specific OISs they view as labels pointing at their tribe.
...
Ha, sorry for writing so much. If you read all of this, please lmk what you think : )