Oh, never mind then
Small typo: you have two sections numbered [7.2]
(I assume that by “gears-level models” you mean a combination of reasoning about actors’ concrete capabilities and game-theory-style models of interaction from which we can reach concrete conclusions? If so,)
I would turn this around, and say instead that “gears-level models” alone tend to not be that great for understanding how power works.
The problem is that power is partly recursive. For example, A may have power by virtue of being able to get B to do things for it, but B’s willingness also depends on A’s power. All actors, in parallel, are looking around, trying to understand the landscape of power and possibility, and making decisions based on their understanding, changing that landscape in turn. The resulting dynamics can just be incredibly complicated. Abstractions can come to have something almost like causal power, like a rumor starting a stampede.
We have formal tools for thinking about these kinds of things, like common knowledge, and game-theoretic equilibria. But my impression is that they’re pretty far from being able to describe most important power dynamics in the world.
In your scenario where people deliberate while their AIs handle all the competition on their behalf, you note that persuasion is problematic: this is partly because, with intent-aligned AIs, the system is vulnerable to persuasion in that “what the operator intends” can itself become a target of attack during conflict.
Here is another related issue. In a sufficiently weird or complex situation, “what the operator intends” may not be well-defined—the operator may not know it, and the AI may not be able to infer it with confidence. In this case, clarifying what the human really wants seems to require more deliberation, which is what we were trying to screen off in the first place!
Furthermore, it seems to me that unbounded competition tends to continually spiral out, encompassing more and more stuff, and getting weirder and more complex: there are the usual arms race dynamics. There are anti-inductive dynamics around catching your opponent by surprise by acting outside their ontology. And there is also just the march of technology, which in your scenario hasn’t stopped, and which keeps creating new possibilities and new dimensions for us to grapple with around what we really want. (I’m using state-run social media disinformation campaigns as an intuition pump here.)
So in your scenario, I just imagine the human operators getting overwhelmed pretty quickly, unable to keep from being swept up in conflict, unless we place some pretty strong limits on the competition.
The next time you are making a complicated argument, if you can, try and watch yourself recalling bits and pieces at a time. To me, it feels viscerally like I have the whole argument in mind, but when I look closely, it’s obviously not the case. I’m just boldly going on and putting faith in my memory system to provide the next pieces when I need them. And usually it works out.
Yes! And, I would offer an additional, alternative way of phrasing this: “you” actually do have the whole argument in mind, but it’s a higher-level “you”, a slower but more inclusive one, corresponding to a higher level of memory caching.
(When it doesn’t, there’s this whole failure mode where people continue viscerally feeling like they can make the argument, even though they don’t have the pieces; and I think this is where a lot of bad reasoning comes from.)
The problem here ^ then becomes a problem of maintaining appropriate relationships among the different self-layers.
I disagree that mesa optimization requires explicit representation of values. Consider an RL-type system that (1) learns strategies that work well in its training data, and then (2) generalizes to new strategies that in some sense fit well or are parsimonious with respect to its existing strategies. Strategies need not be explicitly represented. Nonetheless, it’s possible for those initially learned strategies to implicitly bake in what we could call foundational goals or values, that the system never updates away from.
For another angle, consider that value-directed thought can be obfuscated. A single central value could be transformed into a cloud of interlocking heuristics that manage to implement essentially the same logic. (This might make it more difficult to generalize that value, but not impossible.) This is a common strategy in humans, in situations where they want to avoid being seen as holding certain values, but still reap the benefits of effectively acting according to those values.
(tl;dr: I think a lot of this is about one-way (read-only) vs. two-way communication)
As a long-term meditator and someone who takes contents of phenomenal consciousness as quite “real” in their own way, I enjoyed this post—it helped me clarify some of my disagreements with these ideas, and to just feel out this conceptual-argumentative landscape.
I want to draw out something about “access consciousness” that you didn’t mention explicitly, but that I see latent in both your account (correct me if I’m wrong) and the SEP’s discussion of it (ctrl-F for “access consciousness”). Which is: an assumed one-way flow of information. Like, an element of access consciousness carries information, which is made available to the rest of the system; but there isn’t necessarily any flow back to that element.
I believe to the contrary (personal speculation) that all channels in the mind are essentially two-way. For example, say we’re walking around at night, and we see a patch of grey against the black of the darkness ahead. That information is indeed made available to the rest of the system, and we ask ourselves: “could it be a wild animal?”. But where does that question go? I would say it’s addressed to the bit of consciousness that carried the patch of grey. This starts a process of the question percolating down the visual processing hierarchy till it reaches a point where it can be answered—“no, see that curve there, it’s just the moonlight catching a branch”. (In reality the question might kick off lots of other processes too, which I’m ignoring here.)
Anyway, the point is that there is a natural back and forth between higher-level consciousness, which deals in summaries and can relate disparate considerations, and lower-level e.g. sensory consciousness, which deals more in details. And I think this back-and-forth doesn’t fit well in the “access consciousness” picture.
More generally, in terms of architectural design for a mind, we want whatever process carries a piece of information to also be able to act as a locus of processing for that information. In the same way, if a CEO is getting briefed on some complex issue by a topic expert, it’s much more efficient if the CEO can ask questions, propose plans and get feedback, and keep the expert as a go-to person for that issue, rather than just hear a report.
I think “acting as an addressable locus of processing” accounts for at least a lot of the nature of “phenomenal consciousness” as opposed to “access consciousness”.
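To make the one-way vs. two-way distinction concrete, here is a toy sketch in Python. Everything in it is my own illustration, not a claim about how minds or any real system work: the class names `ReadOnlyPercept` and `AddressablePercept`, the `ask` method, and the lookup-table “details” are all hypothetical stand-ins.

```python
class ReadOnlyPercept:
    """One-way channel: broadcasts a summary and nothing else."""

    def __init__(self, summary):
        self.summary = summary


class AddressablePercept:
    """Two-way channel: broadcasts a summary AND can be queried about it."""

    def __init__(self, summary, details):
        self.summary = summary
        # Hypothetical stand-in for lower-level processing: a lookup table
        # from questions to answers that only this percept has access to.
        self._details = details

    def ask(self, question):
        # The question is addressed to the element that carried the
        # information, which answers from detail the summary left out.
        return self._details.get(question, "unknown")


patch = AddressablePercept(
    "grey patch ahead",
    {"is it an animal?": "no, moonlight on a branch"},
)
answer = patch.ask("is it an animal?")
```

With `ReadOnlyPercept`, the rest of the system gets the summary and has nowhere to send its follow-up question; with `AddressablePercept`, the carrier of the information is also where the question gets answered.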
Also, on your description of designs factorizing into parts, maybe you already know this, but I wanted to highlight that often “factorization”, even when neat, isn’t just a straightforward decomposition into separate parts. For example, say you’re designing a distributed system. You might have a kind of “vertical” decomposition into roles like leader and follower. But then also a “horizontal” decomposition into different kinds of data that get shared in different ways. The logic of roles and kinds of data might then interact, so that the algorithm is really conceptually two-dimensional.
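As a rough illustration of what I mean by “two-dimensional”, here is a minimal Python sketch. The roles, data kinds, and handler strings are all made up for the example (loosely in the spirit of leader/follower replication, but not any real protocol); the only point is that behaviour is indexed by the pair (role, kind of data), so neither axis alone gives you a clean list of separate parts.

```python
from enum import Enum


class Role(Enum):          # "vertical" axis (hypothetical roles)
    LEADER = "leader"
    FOLLOWER = "follower"


class DataKind(Enum):      # "horizontal" axis (hypothetical data kinds)
    LOG_ENTRY = "log_entry"
    HEARTBEAT = "heartbeat"


# One handler per cell of the role x data-kind grid.
HANDLERS = {
    (Role.LEADER, DataKind.LOG_ENTRY):
        lambda payload: f"append and replicate {payload}",
    (Role.LEADER, DataKind.HEARTBEAT):
        lambda payload: "ignore own heartbeat",
    (Role.FOLLOWER, DataKind.LOG_ENTRY):
        lambda payload: f"append {payload} and acknowledge",
    (Role.FOLLOWER, DataKind.HEARTBEAT):
        lambda payload: "reset election timer",
}


def handle(role, kind, payload=None):
    """Dispatch on both axes at once."""
    return HANDLERS[(role, kind)](payload)
```

For instance, `handle(Role.FOLLOWER, DataKind.HEARTBEAT)` and `handle(Role.LEADER, DataKind.HEARTBEAT)` do different things with the same kind of data, and a follower does different things with different kinds of data, so the logic lives in the grid rather than in either decomposition.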
(These kinds of issues make cognition harder to factorize)
Thanks for the thought-provoking post, Alex.
Thinking about how exactly design stories help create trust, I came upon what might be a useful distinction: whether the design is good according to the considerations known to the designer, vs. whether all relevant considerations are present. A good design story lets us check both of these. The first being false means the designer just did a bad job, or perhaps is hiding something. The second being false means there are actually just considerations the designer didn’t know about—for example because they live implicit in some other human’s head—and spelling things out in a story lets us recognize that, and correct it.
The latter use of stories lets you catch honest mistakes around issues that are unknown unknowns to you, but knowns for someone else. And when I think intuitively about trusting an AI—or another human for that matter—this is a big part of what I care about: beyond them being competent, and not actively deceiving me, I should also trust that they’ll communicate with me enough to fill in all the blind spots they might have about me and the things I care about.
On the first, more philosophical part of your post: I think your notion of “freedom-as-arbitrariness” is actually also what allows for “freedom-as-optimization”, in the following way.
Suppose I have an abstract set of choices. These can be instantiated in a concrete situation, which then carries its own set of considerations. When I go to do my optimizing in a given concrete situation, the more constrained or partisan my choice is in the abstract, the more difficult is my total optimization. Conversely, the freer, the more arbitrary the choice is in the abstract, the less constrained my optimization is in any concrete situation, and the better I can do.
For example, if I were hiring a programmer for a project, then (all else equal) I’d rather have someone who knew a variety of technologies and wasn’t too strongly attached to any, so that they could simply use whatever the situation called for.
You could state this as a system design principle: if you’re designing a subsystem that’s going to be doing something, but you don’t really know what yet, optimize the subsystem for being able to potentially do anything (arbitrariness).
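Here is a toy version of the programmer example in Python, just to show the shape of the argument. The projects, technologies, and fit scores are invented numbers, and `total_fit` is a hypothetical helper: it picks the best allowed technology separately in each concrete situation and sums the results.

```python
# Invented fit scores: how well each technology suits each project.
FIT = {
    "web_app":  {"python": 7, "typescript": 9, "rust": 4},
    "embedded": {"python": 2, "typescript": 1, "rust": 9},
}


def total_fit(allowed):
    """Optimize within each concrete situation, restricted to `allowed`."""
    total = 0
    for scores in FIT.values():
        total += max(scores[tech] for tech in allowed)
    return total


# Unconstrained in the abstract: free to use whatever the situation calls for.
generalist = total_fit({"python", "typescript", "rust"})

# Constrained in the abstract: committed to one technology in advance.
specialist = total_fit({"python"})
```

With these made-up numbers, the generalist scores 18 and the specialist 9. More generally, shrinking the allowed set can never raise the per-situation maximum, which is the sense in which arbitrariness in the abstract supports optimization in the concrete.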
I feel there’s much more to say along these lines about systems being well-factored (the pattern of concrete-abstract, as above, is a kind of factorization (as in lambda abstraction)), but I’m having trouble putting it into words at the moment.
Cool. I’ve had one brief, spontaneous experience, while circling, of that sort of concept → vision ‘synaesthesia’: seeing dark halos around people, that I think represented their anxiety and desire to avoid talking about certain things.
But I’d never imagined working deliberately with vision in that way.
So is this a fair summary?
Contemplative practitioners sometimes have great psyche-refactoring experiences, “insights”. But, when interpreting & integrating them, they fail to keep a strong enough epistemic distinction between their experience and the ultimate reality it arises from. And then they make crazy inferences about the nature of that ultimate reality.
When this happens with parts of the network that are involved with the visual system, for instance, the visual field can actually dissolve into a bunch of vibrations temporarily as you refactor parts of the network related to extremely low level things like edge or motion detection (this is also where ‘auras’ come from imo)
Wow, I’ve never heard of this, and it sounds really interesting. Would you care to elaborate on what kind of refactoring is going on, and what the resulting ‘auras’ are / mean?
You can get into some weird, loopy situations when people reflect enough to lift up the floorboards, infer some “player-level” motivations, and then go around talking or thinking about them at the “character level”. Especially if they’re lacking in tact or social sophistication. I remember as a kid being so confused about charitable giving—because, doesn’t everyone know that giving is basically just a way of trying to make yourself look good? And doesn’t everyone know that that’s Wrong? So shouldn’t everyone just be doing charity anonymously or something?
Luckily, complex societies develop ways for handling different, potentially contradictory levels of meaning with grace and tact; and nobody listens too much to overly sincere children.
Yeah, I think costly signalling is definitely part of it. I think there are really several different things going on in the birthday example. One, the friend knows that you decided to spend the evening with them, so they can infer that you want to perform friendship, and/or anticipate having a good time with them, enough to make you decide that. This is the costly signalling part. But then there’s also the stuff that actually happens at the party: talking, laughing together, etc. I think this is what actually accounts for most of the “feeling closer”. (Or perhaps these two effects act on different levels of “feeling closer”.)
Anyway this is maybe getting unnecessarily analytical.
A ritual is about making a sacrifice to imbue a moment with symbolic power, and using that power to transform yourself.
I’m really curious where you’re getting the sacrifice part from! Or how important you think it is. Because my experience with rituals doesn’t generally include sacrificing anything; and the bits of sociology I’ve read about ritual (mostly Randall Collins’ book Interaction Ritual Chains) don’t mention it much. It does resonate with perhaps a western-magical perspective?
Another aspect of this divide is about articulability. In a nurturing context, it’s possible to bring something up before you can articulate it clearly, and even elicit help articulating it.
For example, “Something about <the proposal we’re discussing> strikes me as contradictory—like it’s somehow not taking into account <X>?”. And then the other person and I collaborate to figure out if and what exactly that contradiction is.
Or more informally, “There’s something about this that feels uncomfortable to me”. This can be very useful to express even when I can’t say exactly what it is that I’m uncomfortable with, IF my conversation partner respects that, and doesn’t dismiss what I’m saying because it’s not precise enough.
In a combative context, on the other hand, this seems like a kind of interaction you just can’t have (I may be wrong, I don’t have much experience with such contexts). Because there, inarticulateness just reads as your arguments being weak. And you don’t want to run the risk of putting half-baked ideas out there and having them swatted down. So your only real choices are to figure out how to articulate things, by yourself, on the fly, or remain silent.
And that’s too bad, because the edge of what can be articulated is IME the most interesting place to be.
(Gendlin’s Focusing is an extreme example of being at the edge of what can be articulated, and in the paired version you have one person whose job is basically to be a nurturing & supportive presence.)