I would go with the physical object one.
I love the “E+M” name. It reminds me of electricity and magnetism, and IMO embedded agency and multi-agent rationality will eventually be seen as two sides of the same coin, much as electricity and magnetism are.
I think our current best theories of the two don’t look much like each other, but I predict that as we progress on each, they will slowly come to look more and more like one field.
I mostly agree with this post.
Figuring out the True Name of a thing, a mathematical formulation sufficiently precise that one can apply lots of optimization pressure without the formulation breaking down, is absolutely possible and does happen.
Precision feels pretty far from the true name of the important feature of true names. I am not quite sure what precision means, but on one definition, precision is the opposite of generality, and true names seem anti-precise. I am not saying precision is not a virtue, and it does seem like precision is involved (maybe precision at some meta level?).
The second half, about robustness to optimization pressure, is much closer, but still not right. (I think it is a pretty direct consequence of true names.) It is clearly not yet a true name, in the same way that “it is robust to people trying to push it” is not the true name of inertia.
I observe that I probably miscommunicated. I think multiple people took me to be arguing for a space of lotteries with finite support. That is NOT what I meant. That is sufficient, but I meant something more general. When I said “lotteries closed under finite mixtures,” I did not mean that there are only finitely many atomic worlds in each lottery. I only meant that there is a space of lotteries, some of which may have infinite support if you want to think about atomic worlds, and that for any finite set of lotteries, you can take a finite mixture of those lotteries to get a new lottery in the space. The space of lotteries has to be closed under finite mixtures for VNM to make sense, but the emphasis is on the fact that it is not closed under all possible countable mixtures, not that the mixtures have finite support.
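To write down the closure condition I mean (a minimal sketch; the symbols $L_i$ and $\alpha_i$ are just notation I am introducing here, and each $L_i$ may have infinite support):

```latex
% Closure under finite mixtures: for any lotteries L_1, ..., L_n in the space
% and weights alpha_1, ..., alpha_n >= 0 with alpha_1 + ... + alpha_n = 1,
\alpha_1 L_1 + \dots + \alpha_n L_n \ \text{is also in the space.}
% The space need not contain every countable mixture
% \sum_{i=1}^{\infty} \alpha_i L_i, even when each L_i is in the space.
```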
I think this is a special case of the more general fact that probabilities are for outcomes beyond our control.
I didn’t really read much of the post, but I think you are unfairly rejecting the proposal to weight people by simplicity here. Imagine you flip a fair coin until it comes up tails, and either A) you suffer if you flip >100 times, or B) you suffer if you flip <100 times. I think you should prefer action A.
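To make the probabilities explicit (my arithmetic, reading “>100 times” as the first 100 flips all coming up heads, with $N$ the number of flips):

```latex
\Pr(N > 100) = 2^{-100},
\qquad
\Pr(N < 100) = 1 - 2^{-99} \approx 1 ,
```

so under A you almost certainly do not suffer, and under B you almost certainly do.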
However, if you think of there as being a countable collection of possible outcomes, one for each possible number of flips, then under A you are creating “infinite” suffering (you suffer in infinitely many outcomes) while under B you are creating only “finite” suffering, so you should prefer B.
I think the above argument for B is wrong and similar to the argument you are giving.
Note that the choice of where we draw the boundary between outcomes mattered, and similarly the choice of where we draw the boundary between people in your reasoning matters. You need to make choices about what counts as different people vs same people for this reasoning to even make sense, and even if it does make sense, you are still not taking seriously the proposal that we care about the total simplicity of good/bad experience rather than the total count of good/bad experience.
Indeed, I think the lesson of the whole infinite ethics thing is mostly just grappling with the fact that we don’t understand how to talk about total count in the infinite case. But I don’t see the argument for wanting to talk about count in the first place. It feels like a property of where you are drawing the boundaries, rather than of what is actually there. In the simple cases, we can just draw boundaries between people and declare that our measure is the uniform measure on this finite set, but once we declare that to be our measure, we interact with it as a measure.
Note that if P dominates Q in the sense that there is a c>0 such that P(E) ≥ c⋅Q(E) for all events E, and U is integrable wrt P, then I think U is integrable wrt Q. I propose the space of all probability distributions dominated by a given distribution P.
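A sketch of why (using the non-strict inequality, since a strict one would fail on null events): $P(E) \ge c\,Q(E)$ for all $E$ means $Q \le \tfrac{1}{c} P$ as measures, so

```latex
\int |U| \, dQ \;\le\; \frac{1}{c} \int |U| \, dP \;<\; \infty ,
```

first for simple functions and then for general $|U|$ by monotone convergence.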
Conveniently, if we move to semi-measures, we can take P to be the universal semi-measure. I think we can have our space of utility functions be anything integrable WRT the universal semi-measure, and our space of probabilities be anything lower semi-computable, and everything will work out nicely.
Note that you can take some infinite sums without being able to take all possible infinite sums.
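For example (my illustration, in the dominated-by-$P$ space above): if each $Q_i$ satisfies $Q_i \le \tfrac{1}{c_i} P$, then

```latex
\sum_{i=1}^{\infty} \alpha_i Q_i \;\le\; \Big( \sum_{i=1}^{\infty} \frac{\alpha_i}{c_i} \Big) P ,
```

so the countable mixture stays in the space whenever $\sum_i \alpha_i / c_i$ converges, but when the $c_i$ shrink too fast relative to the $\alpha_i$, the bound diverges and the mixture can escape the space.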
I suspect it looks like you have a prior distribution, and the allowable probability distributions are those that you can get to from this distribution using finitely many bits of evidence.
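One guess at how to make “finitely many bits of evidence” precise (this identification is mine, not something established): conditioning the prior $P$ on an event $E$ with $P(E) \ge 2^{-k}$ gives a posterior dominated by $P$ with constant $c = 2^{-k}$, since

```latex
Q(A) \;=\; P(A \mid E) \;=\; \frac{P(A \cap E)}{P(E)} \;\le\; 2^{k}\, P(A)
\quad \text{for all events } A ,
```

which would tie this back to the dominated-by-$P$ proposal above.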
I am not a fan of unbounded utilities, but it is worth noting that most (all?) of the problems with unbounded utilities are actually problems with utility functions that are not integrable with respect to your probabilities. It feels basically okay to me to have unbounded utilities as long as extremely good/bad events are also sufficiently unlikely. The space of allowable probability functions that go with an unbounded utility can still be closed under finite mixtures and under conditioning on positive-probability events.
Indeed, if you think of utility functions as coming from VNM, and you have a space of lotteries closed under finite mixtures but not arbitrary mixtures, I think there are VNM preferences that can only correspond to unbounded utility functions, and the space of lotteries is such that you can’t make St. Petersburg paradoxes. (I am guessing; I didn’t check this.)
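A toy instance of the kind of setup I have in mind (my own illustration; I have not checked that it forces an unbounded representation): take outcomes $n = 1, 2, 3, \dots$ with $U(n) = 2^{n}$, and let the allowed lotteries be exactly those $p$ under which $U$ is integrable,

```latex
\sum_{n=1}^{\infty} p(n)\, 2^{n} \;<\; \infty .
```

This space is closed under finite mixtures and under conditioning on positive-probability events, but it excludes the St. Petersburg lottery $p(n) = 2^{-n}$, since $\sum_n 2^{-n} \cdot 2^{n}$ diverges.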
See also: https://www.lesswrong.com/posts/BibDWWeo37pzuZCmL/sources-of-intuitions-and-data-on-agi
I am closing the Discord link, so it will no longer work. Let me know if you still want to join.
I agree with this asymmetry.
One thing I am confused about is whether to think of the E. coli as qualitatively different from the human. The E. coli is taking actions that can be well modeled by an optimization process searching for actions that would be good if that optimization process output them, which has some reflection in it.
It feels like the E. coli can behaviorally be well modeled this way, but is mechanistically not shaped like this. I feel like the mechanistic fact is more important, but also that we are much closer to having behavioral definitions of agency than mechanistic ones.
Which isn’t *that* large an update. The average number of agent foundations researchers (that are public-facing enough that you can update on their lack of progress) at MIRI over the last decade is something like 4.
Figuring out how to factor in researcher quality is hard, but it seems plausible to me that the amount of quality-adjusted attention directed at your subgoal over the next decade is significantly larger than the amount directed at it over the last decade. (Which would not all come from you. I do think that Agent Foundations today is non-trivially closer to John today than Agent Foundations 5 years ago is to John today.)
It seems accurate to me to say that Agent Foundations in 2014 was more focused on reflection, which shifted towards embeddedness, and then shifted towards abstraction, and that these things all flow together in my head, so Scott thinking about abstraction will have more reflection mixed in than John thinking about abstraction. (Indeed, I think progress on abstraction would have huge consequences for how we think about reflection.)

In case it is not obvious to people reading, I endorse John’s research program. (Which can maybe be inferred from the fact that I am arguing that it is similar to my own.) I think we disagree about what is the most likely path after becoming less confused about agency, but that part of both our plans is yet to be written, and I think the subgoal is enough of a simple concept that I don’t expect disagreements about what to do next to have a strong impact on how to do the first step.
To operationalize, I claim that MIRI has been directed at a target close enough to yours that you should probably update on MIRI’s lack of progress at least as much as you would if MIRI were doing the same thing as you, but for half as long.
Hmm, yeah, we might disagree about how much reflection (self-reference) is a central part of agency in general.

It seems plausible that it is important to distinguish between the E. coli and the human along a reflection axis (or even more so, to distinguish between evolution and a human). Then maybe you are more focused on the general class of agents, and MIRI is more focused on the more specific class of “reflective agents.”
Then, there is the question of whether reflection is going to be a central part of the path to (F/D)OOM.
Does this seem right to you?
I want to disagree about MIRI. Mostly, I think that MIRI (or at least a significant subset of MIRI) has always been primarily directed at agenty systems in general.

I want to separate agent foundations at MIRI into three eras: the Eliezer Era (2001-2013), the Benya Era (2014-2016), and the Scott Era (2017-).
The transitions between eras had an almost complete overhaul of the people involved. In spite of this, I believe that they have roughly all been directed at the same thing, and that John is directed at the same thing.
The proposed mechanism behind the similarity is not transfer, but instead because agency in general is a convergent/natural topic.
I think there has always been a bias in the pipeline from ideas to papers towards being more about AI. I think this bias has gotten smaller over time, as the agent foundations research program both started having stable funding and started carrying less and less of the weight of all of AI alignment on its back. (Before going through editing with Rob, I believe Embedded Agency had no mention of AI at all.)
I believe that John thinks that the Embedded Agency document is especially close to his agenda, so I will start with that. (I also think that both John and I currently have more focus on abstraction than what is in the Embedded Agency document).
Embedded Agency, more so than anything else I have done, was generated using an IRL-shaped research methodology. I started by taking the stuff that MIRI had already been working on, mostly the artifacts of the Benya Era, and trying to communicate the central justification that would cause one to be interested in these topics. I think that I did not invent a pattern, but instead described a preexisting pattern that originally generated the thoughts.
This is consistent with having the pattern be about agency in general, and so I could find the pattern in ideas that were generated based on agency in AI, but I think this is not the case. I think the use of proof-based systems demonstrates an extreme disregard for the substrate that the agency is made of. I claim that the reason there was a historic focus on proof-based agents is that it was a system that we could actually say stuff about. The fact that real-life agents looked very different on the surface from proof-based agents was a shortcoming that most people would use to completely reject the system, but MIRI would work in it because what they really cared about was agency in general, and having another system that is easy to say things about could be used to triangulate agency in general. If MIRI were directed at a specific type of agency, they would have rejected the proof-based systems as being too different.
I think that MIRI is often misrepresented as believing in GOFAI, because people look at the proof-based systems and think that MIRI would only study those if they thought that is what AI might look like. I think in fact the reason for the proof-based systems is that, at the time, they were the most fruitful models we had, and we were just very willing to use any lens that worked when trying to look at something very, very general.
(One counterpoint here is that maybe MIRI didn’t care about the substrate the agency was running on, but did have a bias towards singleton-like agency rather than very distributed systems. I think this is slightly true. Today, I think that you need to understand the distributed systems, because realistic singleton-like agents follow many of the same rules, but it is possible that early MIRI did not believe this as much.)
Most of the above was generated by looking at the Benya Era and trying to justify that it was directed at agency in general at least/almost as much as the Scott Era, which seems like the hardest of the three for me.
For the Scott Era, I have introspection. I sometimes stop thinking about agency in general and focus on AI specifically. This is usually a bad idea, doesn’t generate as much fruit, and is usually not what I do.

For the Eliezer Era, just look at the Sequences.

I just looked back and reread what you originally wrote, and tried to steelman it. My best steelman is that you are saying that MIRI is trying to develop a prescriptive understanding of agency, and you are trying to develop a descriptive understanding of agency. There might be something to this, but it is really complicated. One way to define agency is as the pipeline from the prescriptive to the descriptive, so I am not sure that prescriptive vs. descriptive agency makes sense as a distinction.
As for the research methodology, I think that we all have pretty different research methodologies. I do not think Benya and Eliezer and I have especially more in common with each other than we do with John, but I might be wrong here. I also don’t think Sam and Abram and Tsvi and I have especially more in common with each other in terms of research methodologies, except insofar as we have been practicing working together. In fact, the thing that might be going on here is that the distinctions in topics are coming from differences in research skills. Maybe proof-based systems are the most fruitful model if you are a Benya, but not if you are a Scott or a John. But this is about what is easiest for you to think about, not about a difference in the shared convergent subgoal of understanding agency in general.
MIRI can’t seem to decide if it’s an advocacy org or a research org.
MIRI is a research org. It is not an advocacy org. It is not even close. You can tell by the fact that it basically hasn’t said anything for the last 4 years. Eliezer’s personal twitter account does not make MIRI an advocacy org.
(I recognize this isn’t addressing your actual point. I just found the frame frustrating.)
So I think my orientation on seeking out disagreement is roughly as follows. (This is going to be a rant I write in the middle of the night, so it might be a little incoherent.)
There are two distinct tasks: 1) generating new useful hypotheses/tools, and 2) selecting between existing hypotheses/filtering out bad hypotheses.
There are a bunch of things that make people good at both these tasks simultaneously. Further, each of these tasks is partially helpful for doing the other. However, I still think of them as mostly distinct tasks.
I think skill at these tasks is correlated in general, but possibly anti-correlated after you filter on enough g correlates, in spite of the fact that they are each common subtasks of the other.
I don’t believe this (anti-correlated given g) very confidently, but I do think it is good to track your own and others’ skill at the two tasks separately, because it is possible to have very different scores (and because judging generators on reliability might have the side effect of making them less generative, as a result of being afraid of being wrong, and similarly vice versa).
I think that seeking out disagreement is especially useful for the selection task, and less useful for the generation task. I think that echo chambers are especially harmful for the selection task, but can sometimes be useful for the generation task. Working with someone who agrees with you on a bunch of stuff and shares your ontology allows you to build deeply faster. Someone with a lot of disagreement with you can cause you to get stuck on the basics and not get anywhere. (Sometimes disagreement can also be actively helpful for generation, but it is definitely not always helpful.)
I spend something like 90+% of my research time focused on the generation task. Sometimes I think my colleagues are seeing something that I am missing, and I seek out disagreement so that I can get a new perspective, but the goal is to get a slightly different perspective on the thing I am working on, not really to filter based on which view is more true. I also sometimes do things like double-crux with people with fairly different world views, but even there, it feels like the goal is to collect new ways to think, rather than to change my mind. I think that for this task a small amount of focusing on people who disagree with you is pretty helpful, but even then, I think I get the most out of people who disagree with me a little bit, because I am more likely to be able to actually pick something up. Further, my focus is not really on actually understanding the other person; I just want to find new ways to think, so I will often translate things to something nearby in my ontology, and thus learn a lot, but still not be able to pass an ideological Turing test.
On the other hand, when you are not trying to find new stuff, but instead e.g. evaluate various different hypotheses about AI timelines, I think it is very important to try to understand views that are very far from your own, and take steps to avoid echo chamber effects. It is important to understand the view, the way the other person understands it, not just the way that conveniently fits with your ontology. This is my guess at the relevant skills, but I do not actually identify as especially good at this task. I am much better at generation, and I do a lot of outside-view style thinking here.
However, I think that currently, AI safety disagreements are not about two people having mostly the same ontology and disagreeing on some important variables, but rather trying to communicate across very different ontologies. This means that we have to build bridges, and the skills start to look more like generation skill. It doesn’t help to just say, “Oh, this other person thinks I am wrong, I should be less confident.” You actually have to turn that into something more productive, which means building new concepts, and a new ontology in which the views can productively dialogue. Actually talking to the person you are trying to bridge to is useful, but I think so is retreating to your echo chamber, and trying to make progress on just becoming less confused yourself.
For me, there is a handful of people who I think of as having very different views from me on AI safety, but who are still close enough that I feel like I can understand them at all. When I think about how to communicate, I mostly think about bridging the gap to these people (which already feels like an impossibly hard task), and not as much the people that are really far away. Most of these people I would describe as sharing the philosophical stance I said MIRI selects for, but probably not all.
If I were focusing on resolving strategic disagreements, I would try to interact a lot more than I currently do with people who disagree with me. Currently, I am choosing to focus more on just trying to figure out how minds work in theory, which means I only interact with people who disagree with me a little. (Indeed, I currently also only interact with people who agree with me a little bit, and so am usually in an especially strong echo chamber, which is my own head.)
However, I feel pretty doomy about my current path, and might soon go back to trying to figure out what I should do, which means trying to leave the echo chamber. Often when I do this, I neither produce anything great nor change my mind, and eventually give up and go back to doing the doomy thing where at least I make some progress (at the task of figuring out how minds work in theory, which may or may not end up translating to AI safety at all).
Basically, I already do quite a bit of the “Here are a bunch of people who are about as smart as I am, and have thought about this a bunch, and have a whole bunch of views that differ from mine and from each other’s. I should not be that confident” (although I should often take actions that are indistinguishable from confidence, since that is how you work with your inside view). But learning from disagreements more than that is just really hard, and I don’t know how to do it, and I don’t think spending more time with them fixes it on its own. I think this would be my top priority if I had a strategy I was optimistic about, but I don’t, and so instead, I am trying to figure out how minds work, which seems like it might be useful for a bunch of different paths. (I feel like I have some learned helplessness here, but I think everyone else (not just MIRI) is also failing to learn (new ontologies, rather than just noticing mistakes) from disagreements, which makes me think it is actually pretty hard.)
Not sure I follow. It seems to me that the position you’re pushing, that learning from people who disagree is prohibitively costly, is the one that goes with learned helplessness. (“We’ve tried it before, we encountered inferential distances, we gave up.”)
I believe they are saying that cheering for seeking out disagreement is learned helplessness as opposed to doing a cost-benefit analysis about seeking out disagreement. I am not sure I get that part either.
I was also confused reading the comment, thinking that maybe they copied the wrong paragraph, and meant the 2nd paragraph.
I am interested in the fact that you find the comment so cult-y though, because I didn’t pick that up.