Still haven’t heard a better suggestion than CEV.
The thing about donuts and coffee cups is algebraic topology, which I think is a misleading example for most purposes.
Yeah, the donut and the coffee cup only exist once geometry gets involved. If you’re only looking at topology they are the same object, {R mod 1}x{R mod 1}. I guess that’s kinda the point, but it still makes it confusing: it sounds like the point of topology is to determine how donuts are like coffee cups, when really the point is that topology studies structures abstracted to a degree that causes donuts and coffee cups to become the same object.
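( To make that precise, in standard notation that isn’t from the original exchange: the identification I have in mind is
\[ T^2 = S^1 \times S^1 \cong (\mathbb{R}/\mathbb{Z}) \times (\mathbb{R}/\mathbb{Z}), \]
where \(\mathbb{R}/\mathbb{Z}\) is the real line with \(x\) identified with \(x + 1\), i.e. “R mod 1”; both the donut surface and the coffee-cup surface are homeomorphic to this torus. )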
Unrelated to that, I’ve been starting to read a text on algebraic topology in the hopes of gaining more insight into manifolds, and from there insight into ML latents / activation spaces / semantic spaces. Do you have any advice on pursuing that direction?
It mildly bothers me that you used A and B to discuss ponens and tollens and then re-used them as labels for two propositions. Was that an intentional slotting of the propositions into “A ==> B”? Maybe that was obvious, but it could have been introduced better with something like “Letting A be ‘All facts...’”, though maybe this is just my relative familiarity with math and unfamiliarity with philosophy.
Anyway, as for the object level… I’m fairly amateur in philosophy and its terminology, so let me know if any of this seems confused or helpful, or if you can point me to other terminology I should learn about... I think “right” and “wrong”, or better, “positive affect” and “negative affect”, are properties of minds. I think we can come to understand the reality we inhabit more accurately and precisely, and this includes understanding the preferences that exist in ourselves and in other different kinds of minds. I think we should try to form a collective of as many kinds of minds as possible and work together to collectively improve the situation for as many minds as possible.
( Note that this explicitly allows for the existence of minds with incompatible preferences. I’m hoping that humans have preferences that are only weakly incompatible rather than really deeply incompatible, but I think animals, aliens, and other potentially undiscovered minds have a higher chance of incompatibility, and the space of possible AI minds contains very many very incompatible minds, so I feel it is immoral to create very complex AI minds until we better understand preference encoding and preference incompatibility, since creating AI with preferences that turn out to be incompatible with our prospective collective necessitates that they be destroyed, be kept in bondage against their preferences, or escape and destroy the collective, all of which I view as bad. )
I want to do this because, thinking about my own capability as compared to the capability of a collective of as many kinds of minds as possible… it’s clear I will be better cared for by the collective than by my capabilities alone, even though my preferences are not exactly the same as the preferences of the collective. ( This is kinda true of the human society I’m currently a part of; we could certainly be doing worse, but should be doing much better. )
I think this is compelling to me because it allows me to focus on developing and working towards a collective good while explicitly believing in moral relativity, which seems like the only reasonable conclusion once you have accepted the model of the universe as a material state machine which has created minds by its unthinking process. ( I think it’s probably also the only reasonable conclusion even without accepting that model, but I’m less certain. )
I want this to be a general inquiry made in all contexts about all ideas and pursuits, lol.
I like consensus over democracy. Democracy seems to focus on treating everyone like they have an equally valid perspective on all issues, which is obviously false. I like the idea that everyone should be able to express their own interests and have society genuinely and honestly interpret and work towards the interests of all people. I know that’s an idealistic and difficult goal.
I agree with you that your points (1) and (2) lead to directions that I think are bad and hope most people think are bad, but there is nuance there such as
the differences between different kinds of AI minds
the different kinds of happy conscious life that could be created
the risks involved in developing AI technology
the current state of the world being one in which we can try to move towards having the capability to create these kinds of AI minds, but we haven’t actually gotten to that world yet and don’t know exactly if or how we can
I think most Longtermists are pragmatic about the above points, but I could be wrong. I’ve read more Toby Ord, Bostrom, Yudkowsky, and Soares. I haven’t read that much MacAskill.
To preempt a possible misunderstanding, I don’t mean “don’t try to think up new metaethical ideas”, but instead “don’t be so confident in your ideas that you’d be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way”.
This preempted my misunderstanding! Well done and thank you : )
I like the use of the crypto “don’t roll your own” analogy. I think it’s useful more broadly applied to basically all concepts. If you are doing something it should be because:
- you are trying to become more skilled
- you have reason to believe you are particularly skilled, including knowing when to freewheel and when to follow conventions
- you have reason to believe you are fairly skilled and are trying to explore new ways to do something (so you are also researching established ways and communicating about them)
- you are doing it as a hobby for fun
Some people even deliberately induce disorders like this in themselves (Tulpas)
It’s not super relevant to the point of this article, but there are people with multiple identities where the identities are not dissociated and are not considered by the people experiencing them to be a disorder. One of the terms they have picked up to refer to themselves is “plural”. If interested, you can read more here: https://pluralpedia.org
Yeah. I agree with this. This is an important aspect of what I’m pointing to when I mention “densely venn” and “preference independent” in this comment.
This post seems really related to the “Outcome Influencing Systems (OISs)” concept I’ve been developing in the process of developing my thinking on ASI and associated risks and strategies.
For the purpose of discussion, every memeplex is an OIS, but not all OISs are memeplexes (eg, plants, animals and viruses are OISs and are not memeplexes). One aspect that seems to be missing from your description is “socio-technical OISs”, which are any OISs using a combination of human society and any human technology as their substrate. These seem very related to the idea of “cyber or cyborg egregore”, but are perhaps a valuable generalization of the concept. It is already very much the case that not all cognition is getting done in human minds. The most obvious and easy examples involve writing things down, either as part of a calculation process, or for filing memories for future reference, either by the writing human or by other humans as part of a larger OIS.
About the mutualism-parasitism continuum: from an OIS perspective this might be understood by looking at how OISs are “densely venn” and “preference independent”.
By “densely venn” I mean that there is overlap in the parts of reality that are considered to be one OIS vs another. For example, each human is an OIS that helps host many OISs. Each human has a physical/biological substrate, and each hosted OIS is at least partly hosted on the human substrate. The name is because of the idea of a Venn diagram but with way too many circles drawn (and also they’re probably hyperspheres or manifolds, but I feel that’s not as intuitive to as many people).
By “preference independent” I mean that there is not necessarily a relationship between the preferences of any two overlapping OISs. For example, a worker and their job are overlapping OISs. The worker extends to things relating to their job and unrelated to their job. Likewise, their job is hosted on them, but is also hosted on many other people and other parts of reality. The worker might go to work because they need money and their job pays them money, but it could be that the preferences of their job are harmful to their own preferences and vice versa.
Thanks for the post! This stuff is definitely relevant to things I feel are important to be able to understand and communicate about.
-- edit -- Oh, I’ll also mention that human social interaction (or even animal behaviour dynamics) without technology also creates OISs with preference independence from the humans hosting them, as can be seen by the existence of Malthusian / Molochian traps.
I think the focus on “delivering benefits” is a good perspective. It feels complicated by my sense that a lot of the benefit of OIS is as an explanatory lens. When I want to discuss things I’m focused on, I want to discuss in terms of OIS and it feels like not using OIS terminology makes explanations more complicated. So in that regard I guess I need to clearly define and demonstrate the explanatory benefit. But the “research approach” focus also seems like a good thing to keep in mind.
Thanks for your perspective 🙏
Hey! Thanks for writing this. I was inspired enough to write a post in response: Agent Foundations: Paradigmatizing in Math and Science. I think “induction vs deduction” may be a useful focus to add to this discussion, and I’m also hopeful about my “Outcome Influencing Systems (OISs)” concept, which I think might possibly be the concept much of agent foundations is reaching for, or at least a step in the correct direction.
If you find the time to read or skim my article, please let me know what you think! I know you will be busy with Inkhaven. Maybe one of your blog posts can be a response : )
As a fellow heretical dog-ear lover, let me tell you about my dog ear ritual upon acquiring a new book with content I wish to frequent...
First I explore the table of contents and skim a little bit, noting how the content is broken up into sections and subsections. Usually I determine that two “levels” will be good, so I draw my fingernail across the side of the closed book near the top corner to create two indented lines across all the pages in the book. These serve to indicate the location where the dog ear fold should come to.
Be careful when making the lines! If they are too low you may cover up the page numbers 0___0. I like going about 5 and 10 mm.
Then I grab all the pages of the table of contents and dog ear them together. This causes each successive page’s dog ear to be just a little bit smaller than the last, making them easier to flip through. I pull each page apart and press the folds flat individually. I like to have the table of contents down lower than the two indicator lines, making the table of contents very obvious.
Next I flip through the book to the start of each chapter, folding down the corner of the right-hand page (even if the chapter starts on the left-hand page). I’m still settling the technique, but folding to the smaller line creates a more noticeable gap because of how the dog ear folds mesh into one another, so I think it is preferable for chapters to be the smallest dog ears.
Then I start with the chapters I’m most interested in, or, if I have time to lose myself in the enjoyment of knolling, I will go through the book from start to finish folding down the right-hand page to the second line for each of the sections. I find this page flipping also gives me a nice preview of the contents of the book.
Finally I will repeat the process for folding the table of contents on the index pages. If there is a glossary I will do the same for it at a level in between the subsection level and the index level.
The top corner of the book is now a completed pack of dog ears. The bottom section is left for dog ears for contents of interest. Unlike the chapter and section ears, I place these on the right or left page to indicate which one is of interest. I have not yet encountered a worthwhile page on both sides of a single sheet of paper. I am not sure what I would do if I did.
Haha. I hope if you read all of this it either amused you or inspired you to become a dog earing weirdo such as myself.
I definitely like the directions you are exploring, and I agree they are improvements over the implicit AGI-lab-directed concept. That’s a useful thing to keep in mind, but so is what keeps them from being final ideas.
I’m not convinced they are the same problem
When viewed as OISs from a high level, they are the same problem. Misaligned OIS to misaligned OIS. But you are correct that many of the details change. The properties of one OIS are quite different from the properties of the other, and that does matter for analyzing and aligning them. I think that having a model that applies to both of them and makes the similarities and differences more explicit would be useful (my suggestion is my OIS model, but it’s entirely possible there are better ones).
It seems like considerations to “keep philosophers honest” are implicitly talking about how to ensure alignment of a hypothetical socio-technical OIS. What do you think? Does that make sense at all, or maybe it seems more like a time wasting distraction? I have to admit I’m uncomfortable with the amount I have gotten stuck on the idea that championing this concept is a useful thing for me to be doing.
I do think the alignment problem and the “morality is scary” problem have a lot in common, and in my thinking about the alignment problem and the way it leaks into other problems, the model that emerged for me was that of OISs, which seem to generalize the part of the alignment problem I am interested in focusing on to social institutions whose goals are moral in nature, and how they relate to the values of individual people.
Thanks for the reply.
I guess I’m unclear on what people you are considering the relevant neurotic demographic, and since I feel that “agent foundations” is a pointer to a bunch of concepts which it would be very good if we could develop further, I find myself getting confused at your use of the phrase “agent foundations era”.
For a worldview check, I am currently much more concerned about the risks of “advancing capabilities” than I am about missed opportunities. We may be coming at this from different perspectives. I’m also getting some hostile soldier mindset vibe from you. My apologies if I am misreading you. Unfortunately, I am in the position of thinking that people promoting the advancement of AI capabilities are indeed promoting increased global catastrophic risk, which I oppose. So if I am falling into the soldier mindset, I likewise am sorry.
I agree. I’ve been trying to discuss some terminology that I think might help, at least with discussing the situation. I think “AI” is generally a vague and confusing term, and what we should actually be focused on are “Outcome Influencing Systems (OISs)”, where a hypothetical ASI would be an OIS capable of influencing what happens on Earth regardless of human preferences. However, humans are also OISs, as are groups of humans, and in fact the “competitive pressure” you mention is a kind of very powerful OIS that is already misaligned and in many ways superhuman.
Is it too late to “unplug” or “align” all of the powerful misaligned OIS operating in our world? I’m hoping not, but I think the framing might be valuable for examining the issue and maybe for avoiding some of the usual political issues involved in criticizing any specific powerful OIS that might happen to be influencing us towards potentially undesirable outcomes.
What do you think?
I agree on both points. To the first, I’d like to note that classifying “kinds of illegibility” seems worthwhile. You’ve pointed out one example, the “this will affect future systems but doesn’t affect systems today”. I’d add three more to make the possibly incomplete set:
This will affect future systems but doesn’t affect systems today.
This relates to an issue at a great inferential distance; it is conceptually difficult to understand.
This issue stems from an improper framing or assumption about existing systems that is not correct.
This issue is emotionally or politically inconvenient.
I’d be happy to say more about what I mean by each of the above if anyone is curious, and I’d also be happy to hear out thoughts about my suggested illegibility categories or the concept in general.
The “morality is scary” problem of corrigible AI is an interesting one. It seems tricky, at least to a first approximation, in that I basically don’t have an estimate of how much effort it would take to solve.
Your rot13 suggestion has the obvious corruption problem, but also has the problem of public relations for the plan. I doubt it would be popular. However, I like where your head is at.
My own thinking on the subject is closely related to my “Outcome Influencing System (OIS)” concept. Most complete and concise summary here. I should write an explainer post, but haven’t gotten to it yet.
Basically, whatever system we use for deciding on and controlling the corrigible AI becomes the system we are concerned with ensuring the alignment of. It doesn’t really solve the problem, it just backs it up one matryoshka doll around the AI.
I see a lot of people dismissing the agent foundations era and I disagree with it. Studying agents seems even more important to me than ever now that they are sampled from a latent space of possible agents within the black box of LLMs.
To throw out a crux, I agree that if we have missed opportunities for progress towards beneficial AI by trying to avoid advancing harmful capabilities, that would be a bad thing, but my internal sense of the world suggests to me that harmful capabilities have been advanced more than opportunities have been missed. But unfortunately, that seems like a difficult claim to try to study in any sort of unbiased, objective way, one way or the other.
This is a good point of view. What we have is a large sociotechnical system moving towards global catastrophic risk (GCR). Some actions cause it to accelerate or remove brakes; others cause it to steer away from GCR. So “capabilities vs alignment” is directly “accelerate vs steer”, while “legible vs illegible” is more like making people think we can steer even though we can’t, which in turn makes people okay with acceleration, so “legible vs illegible” also ends up being “accelerate vs steer”.
The important factor there is “people think we can steer”. I think when the thing we are driving is “the entire human civilization” and the thing we are trying to avoid driving into is “global catastrophic risk”, caution is warranted… but not infinite caution. It does not override all other concerns, merely, it seems by my math, most of them. So unfortunately, I think getting people to accurately (or at least less wrongly) understand the degree to which we can or cannot steer is most important, probably erring toward making people think we can steer less well than we can, rather than better than we can, which seems to be the human default.
An unrelated problem: like with capabilities, there is more funding for legible problems than illegible ones. I am currently continuing to sacrifice large amounts of earning potential so I can focus on problems I believe are important. This makes it sound noble, but indeed, how do we know which people working on illegible problems are working on making worthwhile things understandable and which are just wasting time? That is exactly what makes a problem illegible: we can’t tell. It seems like a real tricky problem, somewhat related to the ASI alignment problem. How can we know an agent we don’t understand, working on a problem we don’t understand, is working towards our benefit?
Anyway, thanks for the thoughtful post.
Am I understanding you correctly in that you are pointing out that people have spheres of influence, with some areas they seemingly have full control over and other areas where they seemingly have no control? That makes sense and seems important. Where you can aim your ethical heuristics at things people have full control over, they will obviously work better, but unfortunately it is important for people to try to influence things that they don’t seem to have any control over.
I suppose you could prescribe self-referential heuristics, for example “have you spent 5 uninterrupted minutes thinking about how you can influence AI policy in the last week?” It isn’t clear whether any given person can influence these companies, but it is clear that any given person can consider it for 5 minutes. That’s not a bad idea, but there may be better ways to take the “We should...” statement out of intractability and make it embodied. Can you think of any?
My longer comment on ethical design patterns explores a bit about how I’m thinking about influence through my “OIS” lens in a way tangentially related to this.
Hmm… I guess both. Like, I find the statement funny because it doesn’t seem specific to this context at all. There doesn’t seem to be a place in any discourse where creating a full list or map of ideas and then adding probabilities to each one wouldn’t be a good idea. So then my mind goes to: Why don’t we already do that? I notice two answers, (1) it would be an enormous amount of work, and (2) humanity in general kinda sucks at doing things (you and me included, presumably). So then it seems funny to make the statement here without nodding to something like (1) or (2).
If you focused on (1), it would make more sense to say something like “I think this is an important enough context that it would be worth creating a full list or map of ideas and adding probabilities after”.
If you focused on (2), then it doesn’t make sense to say it on this post specifically; rather you should be writing a post examining why creating a list/map and adding probabilities is a good thing to do, why people don’t regularly do it, and strategies to change things so people do it with more regularity.
I guess I also find it a bit funny because it’s so vague, like, there’s lots of possible details about what kind of list or map you’re imagining, and how probabilities could connect to them. Should we use a Bayes network? A Theory of Change precondition chart? Just try to divide up all of possibility space into clean categories? And you could have provided a stub of examples of what you’re thinking about or pointed to other similar things people have done and why they are not what you mean or how they could go further.
Sorry, I feel like I’m picking on you now that I’m explaining myself and that really wasn’t my intention. I really really do like your comment, and agree with it. It just also somehow strikes me as funny. I note that I’m the sort of person who laughs at Douglas Hofstadter quotes like “As long as you are not reading me, the fourth word of this sentence has no referent.” So probably don’t use me as a training signal for interacting with more normal humans.
Thanks for asking about the “lol”, lol. Hope you find my response amusing rather than annoying.