Yeah. I think as well as definition problems we also need to be thinking about incentive problems. As you mention, it can’t be acceptable for companies to treat “fines as a cost of doing business”.
A direction that might be worthwhile: focus explicitly on incentives, with the goal of making it illegal to run a company which is incentivized to increase AGI capabilities, or which incentivizes people to increase AGI capabilities. I don’t know how that would work and haven’t really thought through the implications, but I do find it an interesting idea, especially since so many companies seem to be able to pass liability onto employees by having their “official policy” ban certain behaviours while the operation of the company incentivizes employees to break the company’s official policy.
Another direction I am interested in is moving away from “AI” and “AGI” as terms to talk about. I feel they are so old and overloaded with contradictory meanings that it would be better to start over fresh. The system of definitions I am starting to build up centers on “Outcome Influencing Systems (OISs)” which I am trying to define and build terminology around for the sake of technical alignment work as well as communicating risks. I don’t think the terminology is ready to be used in a legal context yet, but I would like to work towards a point where it could be.
… moving away from “AI” and “AGI” as terms to talk about. I feel they are so old and overloaded with contradictory meanings that it would be better to start over fresh.
I interpret Holden Karnofsky’s PASTA from 2021 in the same vein (emphasis his):
This piece is going to focus on exploring a particular kind of AI I believe could be transformative: AI systems that can essentially automate all of the human activities needed to speed up scientific and technological advancement. I will call this sort of technology Process for Automating Scientific and Technological Advancement, or PASTA. (I mean PASTA to refer to either a single system or a collection of systems that can collectively do this sort of automation.)
(Note how Holden doesn’t care that the AI system be singular, unlike say the Metaculus AGI definition.) He continued (again emphasis his):
PASTA could resolve the same sort of bottleneck discussed in The Duplicator and This Can’t Go On—the scarcity of human minds (or something that plays the same role in innovation).
By talking about PASTA, I’m partly trying to get rid of some unnecessary baggage in the debate over “artificial general intelligence.” I don’t think we need artificial general intelligence in order for this century to be the most important in history. Something narrower—as PASTA might be—would be plenty for that.
When I read that last paragraph, I thought, yeah, this seems like the right first-draft operational definition of “transformative AI”, and I anticipated that it would gradually disseminate into the broader conversation and be further refined, also because the person proposing this definition was Holden instead of some random alignment researcher or whatever. Instead it seems(?) mostly ignored, at least outside of Open Phil, which I still find confusing.
I’m not sure how you’re thinking about OISs; would you say they’re roughly in line with what Holden meant above?
Separately, I do however think that the right operationalisation of AGI-in-particular isn’t necessarily Holden’s, but Steven Byrnes’. I like that entire subsection, so let me share it here in full:
A frequent point of confusion is the word “General” in “Artificial General Intelligence”:
The word “General” DOES mean “not specific”, as in “In general, Boston is a nice place to live.”
The word “General” DOES NOT mean “universal”, as in “I have a general proof of the math theorem.”
An AGI is not “general” in the latter sense. It is not a thing that can instantly find every pattern and solve every problem. Humans can’t do that either! In fact, no algorithm can, because that’s fundamentally impossible. Instead, an AGI is a thing that, when faced with a difficult problem, might be able to solve the problem easily, but if not, maybe it can build a tool to solve the problem, or it can find a clever way to avoid the problem altogether, etc.
Consider: Humans wanted to go to the moon, and then they figured out how to do so, by inventing extraordinarily complicated science and engineering and infrastructure and machines. Humans don’t have a specific evolved capacity to go to the moon, akin to birds’ specific evolved capacity to build nests. But they got it done anyway, using their “general” ability to figure things out and get things done.
So for our purposes here, think of AGI as an algorithm which can “figure things out” and “understand what’s going on” and “get things done”, including using language and science and technology, in a way that’s reminiscent of how most adult humans (and groups and societies of humans) can do those things, but toddlers and chimpanzees and today’s large language models (LLMs) can’t. Of course, AGI algorithms may well be subhuman in some respects and superhuman in other respects.
[Image omitted. Caption: poking fun at Yann LeCun’s frequent talking point that “there is no such thing as Artificial General Intelligence”. (Image sources: 1, 2)]
Anyway, this series is about brain-like algorithms. These algorithms are by definition capable of doing absolutely every intelligent behavior that humans (and groups and societies of humans) can do, and potentially much more. So they can definitely reach AGI. Whereas today’s AI algorithms are not AGI. So somewhere in between here and there, there’s a fuzzy line that separates “AGI” from “not AGI”. Where exactly is that line? My answer: I don’t know, and I don’t care. Drawing that line has never come up for me as a useful thing to do.
It seems entirely possible for a collection of AI systems to be a civilisation-changing PASTA without being at all Byrnes-general, and also possible for a Byrnes-general algorithm to be below average human intelligence, let alone be a PASTA.
Wow, thank you. These are great definitions that I hadn’t already taken note of.
[PASTA] seems like the right first-draft operational definition of “transformative AI”
I think “PASTA” is not quite the right definition, at least not for what I am trying to focus on. I hope my discussion of OISs below will make that more clear. I also think “PASTA” is a rhetorically weak acronym, and that this contributes to its lack of adoption. I don’t think “OIS” is perfect, but I did put some thought into how it can be used in language. First, it can be pronounced “oh-ee” or “oh-ees” for plural/possessive, which is a very short phonetic that does not already have any English meaning afaik. Second, none of the other things I found acronymized as “OIS” seem like they will interfere in any important context. And finally, I think the full term, “Outcome Influencing System”, is quite good and on point once you know what is being talked about.
I’m not sure how you’re thinking about OISs; would you say they’re roughly in line with what Holden meant above?
There are definitely motivational and definitional similarities between PASTA and OISs, but I see them as quite distinct. I’ll try to explain.
It seems like both PASTA and OIS defs remove some elements from the general AGI def and add different elements. I think PASTA ends up more specific than AI, while OISs are less specific than AI.
Both emphasize that these are systems, not single programs or products like how people often think of AI.
PASTA is focused on kinds of AI systems with preferences for specific outcomes: advancing science and technology. On the other hand, you could think of OISs as an attempt to generalize the AI alignment problem. So the OIS definition just points out that there are outcomes that the OIS has preference for, but does not specify what those preferences are. So from this perspective, PASTA is an OIS with science and technology preferences.
Another place OISs are more general is “substrate”. Within the OIS def, there is the concept of the system being implemented by a substrate. With PASTA and AI in general, the implied substrate is digital computers. OISs have no implied substrate, but I am hoping the main substrate of focus will be the “sociotechnical” substrate. That is, a valid OIS might look like a group of humans working with computers and other technology (including paper documents, writing, and all other human tools). This seems like an important distinction since we already have OISs like these that seem to favour scientific and technological progress, and indeed, these OISs are using what they develop to improve their future development. Whether or not humans are part of the system may be less important than one would naively assume.
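As a rough sketch of how these pieces fit together (purely illustrative; the class and field names here, like “substrate” and “preferred_outcomes”, are not settled OIS terminology, just something I made up for this example):

```python
# Rough sketch only: names are illustrative, not settled OIS terminology.
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class OIS:
    name: str
    substrate: str                # e.g. "digital computers" or "sociotechnical"
    preferred_outcomes: set[str]  # outcomes the system tends to steer the world toward
    components: set[str] = field(default_factory=set)


# PASTA viewed through this lens: an OIS whose preferences are scientific and
# technological advancement, implicitly implemented on digital computers.
pasta = OIS(
    name="PASTA",
    substrate="digital computers",
    preferred_outcomes={"advance science", "advance technology"},
)

# A sociotechnical OIS: people working with computers, documents, and other tools.
research_org = OIS(
    name="research organisation",
    substrate="sociotechnical (people + computers + documents)",
    preferred_outcomes={"advance science", "advance technology"},
    components={"researchers", "compute cluster", "internal documents"},
)
```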
More on the idea that OISs generalize “misalignment”: I want the OIS definition to also emphasize “alignment with respect to capabilities”. The set of possible capabilities is highly articulate, which is important to note, but I think it’s still fine to speak of “greater or lesser capability” even though it is not as precisely defined as I would like. (I.e., suppose there exist capabilities A, B, and C and OISs X and Y with capa(X)={A,B} and capa(Y)={A,C}. Is one more capable than the other? The question is not answerable without a metric over sets of capabilities, and to my knowledge no such metric exists.) This helps elucidate an important dynamic: incapable systems need not have articulate preferences to be safe/aligned, but the more capable an OIS is, the more articulately it needs to target human-beneficial outcomes in order to be considered “safely aligned”. I note that PASTA, preferring science and technology, is misaligned with humanity as its capabilities go to infinity.
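To spell that parenthetical out as a runnable toy (the capability labels are of course made up):

```python
# Illustrative only: capability sets are partially ordered by inclusion, so two
# OISs can simply be incomparable unless you add a metric over sets of capabilities.
capa_X = {"A", "B"}
capa_Y = {"A", "C"}


def strictly_more_capable(p: set, q: set) -> bool:
    """True only if p includes every capability in q, plus at least one more."""
    return q < p  # proper-subset test


print(strictly_more_capable(capa_X, capa_Y))  # False
print(strictly_more_capable(capa_Y, capa_X))  # False
# Neither dominates the other, so "which is more capable?" has no answer here.
```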
About the Steven Byrnes operationalization, I agree that “generality” is not well defined. I want to keep considering the terminology, but currently I feel about “generality” the same way I feel about “capability”: the capabilities of an OIS have both breadth and depth and can maybe be thought of as living in “the space of all possible tasks, aka task-space”. I might call depth “skill” and breadth “generality” or “versatility”. So an OIS’s capability is defined by its versatility and skill. Most OISs will not strictly outmatch others in the sense of being more skilled at every task the other can perform, so again, without a metric over task-space, “versatility” or “generality” are not well defined in the sense that you can’t say “this OIS is more versatile than that OIS”. But, again, people do have a sense of what a metric over task-space should be, so we can gesture at the idea of one OIS being more versatile; people’s intuitions may disagree, though, and it is good to be aware that we are relying on people’s fuzzy intuition of what a sensible metric over task-space or capability-space should be.
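Continuing the toy sketch from above, capability could be tabulated as a mapping from tasks to skill levels, so that versatility is breadth over task-space and skill is depth on a task; the numbers and helper names below are invented purely for illustration:

```python
# Made-up numbers: "skill" is depth on a single task, "versatility" is breadth
# over task-space. Neither OIS below strictly outmatches the other.
ois_P = {"carpentry": 9, "plumbing": 2}
ois_Q = {"carpentry": 4, "plumbing": 5, "wiring": 3}


def versatility(ois: dict) -> int:
    return len(ois)  # breadth: how many tasks it can do at all


def strictly_outmatches(p: dict, q: dict) -> bool:
    """p is at least as skilled as q on every task q can do, and they differ."""
    return p != q and all(p.get(task, 0) >= level for task, level in q.items())


print(versatility(ois_P), versatility(ois_Q))  # 2 3
print(strictly_outmatches(ois_P, ois_Q))       # False
print(strictly_outmatches(ois_Q, ois_P))       # False
# Q is more versatile; P is more skilled at carpentry. Without a metric over
# task-space there is no single answer to "which is more capable?".
```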
Another thing I want to emphasize is that OISs are not discrete. Instead, they are like dense Venn diagrams, both literally and in the sense that they are a lens which can be applied with great expressiveness to examine the world. I’ll illustrate with an example… Two carpenters are at a worksite. Since both of them are instrumentally aligned to their work, we may temporarily drop their full human preferences and examine the OIS they have created within themselves, specifically their role as carpenters. That is a first Venn aspect. It might be accurate to describe it like this: the carpenter role is an OIS that uses a person as an implementing substrate. Now, the two carpenters share a saw, and as such, both of them have the capability to cut wood. As OISs, the saw is a part of both of them. This is a second Venn aspect. Without the saw, each might have the capability “to use a saw”, but it is the existence of the shared saw that transforms the useless “use a saw” capability into the useful “cut wood” capability. There are not two saws, but the saw is part of (at least) two OISs. Finally, the two carpenters are brothers who operate a family business. This business is itself an OIS, and is probably the most useful OIS to consider if you are interested in harnessing its power to influence the outcome of wood into finely crafted furnishings. This is a third Venn aspect. OISs are composed through their interaction to create larger, often more capable, OISs.
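A toy rendering of that example, with each OIS reduced to a bare set of components (again, just an illustration of the overlap and composition, nothing more):

```python
# Toy version of the carpenter example: OISs as sets of components. They can
# overlap (one saw, part of two OISs) and compose into a larger OIS.
carpenter_role_1 = {"person 1 (as carpenter)", "shared saw"}
carpenter_role_2 = {"person 2 (as carpenter)", "shared saw"}

print(carpenter_role_1 & carpenter_role_2)  # {'shared saw'}

# The family business composes both roles plus its own components; it is often
# the more useful OIS to analyse.
family_business = carpenter_role_1 | carpenter_role_2 | {"business ledger", "workshop"}
print(sorted(family_business))
```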
A final detail of the OIS def: It explicitly throws out “intelligence” as a term and instead uses the term “capabilities”. This is because I observe people sometimes argue about what intelligence is as if intelligence was a puzzle handed to us rather than a word we use to label things we observe in the world. People have capabilities to think and plan and act, and some people call that intelligence. Some people do well in school, and some people call that intelligence. Both of those are important to examine, but don’t, imo, benefit from use of the word “intelligence”, and neither of those common definitions ever seems to point at the international social and economic forces that have transformed the surface of Earth. That is the level of capability I often want to refer to and the word “intelligence” only ever gets in the way.
If you read all that, thanks! Please send me more similar concepts if you have them and ask any questions you would like.
I think you’re thinking about this in a very useful way!
If we can narrow down a specific (but broad enough) set of capabilities that we consider illegal to incentivise, then this would be workable.
And yes, when I say “fines as cost of doing business”: this is a very common conclusion that DPOs[1] in Europe come to when asked about the effectiveness of GDPR enforcement.
It’s way too easy for big corporations to just calculate and set off the cost of the fine against the profit margin produced by “the non-compliant action”.
Which is why I do support the idea of “banning” dangerous development, and how I started thinking about the definitions to begin with.
Again, really useful comment!
[1] Data Protection Officers