Author here. I still endorse the post and have continued to find it pretty central to how I think about myself and nearby ecosystems.
I just submitted some major edits to the post. Changes include:
1. Name change (“Robust, Coherent Agent”)
After much hemming and hawing and arguing, I changed the name from “Being a Robust Agent” to “Being a Robust, Coherent Agent.” I’m not sure if this was the right call.
It was hard to pin down exactly one “quality” that the post was aiming at. Coherence was the single word that pointed towards “what sort of agent to become.” But I think “robustness” still points most clearly towards why you’d want to change. I added some clarifying remarks about that. In individual sentences I tend to refer to either “Robust Agents” or “Coherent Agents,” depending on what that sentence was talking about.
Other options include “Reflective Agent” or “Deliberate Agent.” (I think once you deliberate on what sort of agent you want to be, you often become more coherent and robust, although not necessarily.)
Edit: Undid the name change; it seemed like it was just a worse title.
2. Spelling out what exactly the strategy entails
Originally the post was vaguely gesturing at an idea. It seemed good to try to pin that idea down more clearly. This does mean that, by getting “more specific,” it might also be more “wrong.” I’ve run the new draft by a few people and I’m fairly happy with the new breakdown:
Deliberate Agency
Gears Level Understanding of Yourself
Coherence and Consistency
Game Theoretic Soundness
But, if people think that’s carving the concept at the wrong joints, let me know.
3. “Why is this important?”
Zvi’s review noted that the post didn’t really argue why becoming a robust agent was so important.
Originally, I viewed the post as simply illustrating an idea rather than arguing for it, and… maybe that was fine. I think it would have been fine to leave the “why” for a followup post.
But I reflected a bit on why it seemed important to me, and ultimately thought it was worth spelling out more explicitly here. I’m not sure my reasons are the same as Zvi’s, or others’. But I think they are fairly defensible reasons. I’m interested if anyone has significantly different reasons, or thinks that the reasons I listed don’t make sense.
I’m leaning towards reverting the title to just “Being a Robust Agent”, since the new title is fairly clunky, and someone gave me private feedback that it felt less like a clear handle for a concept. [edit: have done so]
So the most important point of this post is to lay out the Robust Agent paradigm explicitly, with a clear term I could quickly refer to in future discussions, to check “is this something we’re on the same page about, or not?” before continuing on to discuss more complicated ideas.
Have you found that this post (and the concept handle) have been useful for this purpose? Have you found that you do in fact reference it as a litmus test, and steer conversations according to the responses others give?
It’s definitely been useful with people I’ve collaborated closely with. (I find the post a useful background while working with the LW team, for example.)
I haven’t had a strong sense of whether it’s proven beneficial to other people. I have a vague sense that the sort of people who inspired this post mostly take this as background that isn’t very interesting or something. Possibly with a slightly different frame on how everything hangs together.
It sounds like this post functions (and perhaps was intended) primarily as a filter for people who are already good at agency, and secondarily as a guide for newbies?
If so, that seems like a key point—surrounding oneself with other robust (allied) agents helps develop or support one’s own agency.
I actually think it works better as a guide for newbies than as a filter. The people I want to filter on are typically people I can have long, protracted conversations about agency with anyway, and this blog post isn’t the primary way they get filtered.
I feel like perhaps the name “Adaptive Agent” captures a large element of what you want: an agent capable of adapting to shifting circumstances.
I like the edits!
One thing I think might be worth doing is linking to the post on Realism about Rationality, and explicitly listing it as a potential crux for this post.
I’m pretty on board theoretically with the idea of being a robust agent, but I don’t actually endorse it as a goal because I tend to be a rationality anti-realist.
I actually don’t consider Realism about Rationality cruxy for this (I tried to lay out my own cruxes in this version). Part of what seemed important here is that I think Coherent Agency is only useful in some cases for some people, and I wanted to be clear about when that was.
I think each of the individual properties (gears-level understanding, coherence, game-theoretic soundness) is just sort of obviously useful in some ways. There are particular failure modes to get trapped in if you’ve only made some incremental progress, but generally I think you can make incremental improvements in each domain and get improvements-in-life-outcome.
I do think that the sort of person who naturally gravitates towards this probably has something like ‘rationality realism’ going on, but I suspect it’s not cruxy, and in particular I suspect it shouldn’t be cruxy for people who aren’t naturally oriented that way.
Some people are aspiring directly to be a fully coherent, legible, sound agent. That might be possible or desirable, and it might be possible to reach a variation of it that is cleanly mathematically describable. But I don’t think that has to be true for the concept to be useful.
generally I think you can make incremental improvements in each domain and get improvements-in-life-outcome.
To me this implies some level on the continuum of realism about rationality. For instance, I often think that to make improvements on life outcomes I have to purposefully move off of Pareto improvements in these domains, and sometimes sacrifice them, because I don’t think my brain runs that code natively, and sometimes efficient native code is in direct opposition to naive rationality.
Relatedly:
I’ve been watching the discussion on Realism About Rationality with some interest and surprise. I had thought of ‘something like realism about rationality’ as more cruxy for alignment work, because the inspectability of the AI matters a lot more than the inspectability of your own mind – mostly because you’re going to scale up the AI a lot more than your own mind is likely to scale up. The amount of disagreement that’s come out more recently about that has been interesting.
Some of the people who seem most invested in the Coherent Agency thing are specifically trying to operate on cosmic scales (i.e. part of their goal is to capture value in other universes and simulations, and to be the sort of person you could safely upload).
Upon reflection though, I guess it’s not surprising that people don’t consider realism “cruxy” for alignment, and also not “cruxy” for personal agency. (I.e. I think it’s more like an aesthetic input than a crux. It’s not necessary for agency to be mathematically simple or formalized for incremental legibility and coherence to be useful for avoiding wasted motion.)