# PaulK

Karma: 76
• 30 Nov 2022 20:50 UTC
2 points

Wow, I came here to say literally the same thing about commensurability: that perhaps the arithmetic mean (AM) is for what’s commensurable, and the geometric mean (GM) is for what’s incommensurable.

One note, though: to me it actually seems fine to consider different epistemic viewpoints as incommensurable. They might be like different islands of low K-complexity, each getting some nice traction on the world but in very different ways, with the path between them going through inaccessibly high-K-complexity territory.
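One way to make the AM/GM commensurability point concrete, as a small sketch with made-up numbers: if two values are measured in unrelated units, rescaling one of them (switching its unit) can flip which option has the higher arithmetic mean, but it cannot change the ranking by geometric mean, because scaling one coordinate by s multiplies every option's geometric mean by the same constant s^(1/n). The options, values, and the `best` helper are hypothetical.

```python
from math import prod

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return prod(xs) ** (1 / len(xs))

# Hypothetical options, each scored on two values measured in unrelated units.
options = {"x": (2.0, 30.0), "y": (6.0, 12.0)}

def best(options, mean, scale=(1.0, 1.0)):
    """Return the option with the highest mean after rescaling each value's units."""
    return max(options, key=lambda k: mean([v * s for v, s in zip(options[k], scale)]))

# Rescaling the first value by 10 flips the AM choice but not the GM choice.
print(best(options, arithmetic_mean), best(options, arithmetic_mean, scale=(10.0, 1.0)))  # x y
print(best(options, geometric_mean), best(options, geometric_mean, scale=(10.0, 1.0)))    # y y
```

Under the AM, the choice depends on an arbitrary "exchange rate" between the two values; the GM never asks for one.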

• 28 Nov 2022 19:28 UTC
1 point

Another setting that seems natural and gives rise to multiplicative utility is if we are trying to cover as much of a space as possible, and we divide it dimension-wise into subspaces, each tracked by a subagent. To get the total size covered, we multiply together the sizes covered within each subspace.

We can kinda shoehorn unequal weighting in here if we have each subagent track not just the fractional or absolute coverage of its subspace, but the per-dimension geometric average of its coverage.

For example, say we’re trying to cover a 3D cube that’s 10x10x10, with subagent A minding dimension 1 and subagent B minding dimensions 2 and 3. A particular outcome might involve A having 4/10 coverage and B having 81/100 coverage, for a total coverage of (4/10)*(81/100), which we could also phrase as (4/10)*(9/10)^2.
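A quick check of the cube arithmetic, as a sketch: total coverage is the product of the subagents' fractional coverages, and rewriting B's factor as its per-dimension geometric average raised to the number of dimensions it minds gives the same total. The helper name `per_dimension_gm` is illustrative, not from the original.

```python
from fractions import Fraction
from math import isclose

def per_dimension_gm(coverage, n_dims):
    """Geometric average coverage per dimension for a subagent minding n_dims dimensions."""
    return float(coverage) ** (1 / n_dims)

a = Fraction(4, 10)    # A: 4 of 10 units along dimension 1
b = Fraction(81, 100)  # B: 81 of 100 units of area across dimensions 2 and 3

total = a * b          # fraction of the whole 10x10x10 cube covered
print(total)           # 81/250, i.e. 324 of 1000 unit cells

b_gm = per_dimension_gm(b, 2)  # about 0.9 per dimension
# Same total, with each dimension as its own factor: (4/10) * (9/10)^2
assert isclose(float(a) * b_gm ** 2, float(total))
```

Written this way, the exponent on each subagent's factor is the number of dimensions it minds, which is where the unequal weighting comes from.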

I’m not sure how to make uncertainty work correctly within each factor, though.

• These are super interesting ideas, thanks for writing the sequence!

I’ve been trying to think of toy models where the geometric expectation pops out—here’s a partial one, which is about conjunctivity of values:

Say our ultimate goal is to put together a puzzle (U = 1 if we can, U = 0 if not), for which we need 2 pieces. We have sub-agents A and B who care about the two pieces respectively, and each one’s utility for a state is its probability estimate for finding its piece there. Then our expected utility for a state is the product of their utilities (assuming this is a one-shot game, so we need to find both pieces at once, and that the two pieces’ locations are independent), and so our decision-making will be geometrically rational.

This easily generalizes to an N-piece puzzle. But I don’t know how to extend this interpretation to allow for unequal weighting of agents.
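A minimal sketch of the puzzle model, with made-up probabilities for three candidate states. It assumes the pieces' locations are independent, so the probability of finding every piece is the product of the sub-agents' estimates. Maximizing that product picks the same state as maximizing the geometric mean of the sub-agents' utilities, while the arithmetic mean can pick a different one. The state names and numbers are hypothetical.

```python
from math import prod

# Hypothetical states: (A's estimate for piece 1, B's estimate for piece 2).
states = {
    "attic": (0.95, 0.1),
    "basement": (0.5, 0.5),
    "garage": (0.2, 0.7),
}

def expected_utility(probs):
    # U = 1 only if every piece is found. Assuming independent locations,
    # P(all pieces found) is the product of the sub-agents' probabilities.
    return prod(probs)

def geometric_mean(probs):
    return prod(probs) ** (1 / len(probs))

def arithmetic_mean(probs):
    return sum(probs) / len(probs)

def choose(score):
    return max(states, key=lambda s: score(states[s]))

print(choose(expected_utility), choose(geometric_mean), choose(arithmetic_mean))
# basement basement attic
```

Because `prod` takes any number of factors, the same code covers the N-piece case. It deliberately uses equal weights, since the weighted version is the open question above.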

• 21 Nov 2022 22:04 UTC
5 points

I also think that the fact that AI safety thinking is driven so much by these fear + distraction patterns is what’s behind the general flail-y nature of so much AI safety work. There’s a lot of, “I have to do something! This is something! Therefore, I will do this!”

• 21 Nov 2022 22:01 UTC
9 points

I think your diagnosis of the problem is right on the money, and I’m glad you wrote it.

As for your advice on what a person should do about this, it has a strong flavor of: quit doing what you’re doing and go in the opposite direction. I think this is going to be good for some people but not others. Sometimes it’s best to start where you are. Like, one can keep thinking about AI risk while also trying to become more aware of the distortions that are being introduced by these personal and collective fear patterns.

That’s the individual level though, and I don’t want that to deflect from the fact that there is this huge problem at the collective level. (I think rationalist discourse has a libertarian-derived tendency to focus on the former and ignore the latter.)

• Nice essay, makes sense to me! Curious how you see this playing into machine intelligence.

One thought is that “help maintain referential stability”, or something in that ballpark, might be a good normative target for an AI. Such an AI would help humans think, clarify arguments, recover dropped threads of meaning. (Of course, done naively, this could be very socially disruptive, as many social arrangements depend on the absence of clear flows of meaning.)

• As a slightly tangential point, I think if you start thinking about how to cast survival / homeostasis in terms of expected-utility maximization, you start having to confront a lot of funny issues, like, “what happens if my proxies for survival change because I self-modified?”, and then more fundamentally, “how do I define / locate the ‘me’ whose survival I am valuing? what if I overlap with other beings? what if there are multiple ‘copies’ of me?”. These are real issues for selfhood, IMO.

• >There is no way for the pursuit of homeostasis to change through bottom-up feedback from anything inside the wrapper. The hierarchy of control is strict and only goes one way.

Note that people do sometimes do things like starve themselves to death or choose to become martyrs in various ways, for reasons that are very compelling to them. I take this as a demonstration that homeostatic maintenance of the body is in some sense “on the same level” as other reasons / intentions / values, rather than strictly above everything else.

• I do see the inverse side: a single fixed goal would be something in the mind that’s not open to critique, hence not truly generally intelligent from a Deutschian perspective (I would guess; I don’t actually know his work well).

To expand on the “not truly generally intelligent” point: one way this could look is if the goal included some tacit assumptions about the universe that turned out later not to be true in general—e.g. if the agent’s goal was something involving increasingly long-range simultaneous coordination, before the discovery of relativity—and if the goal were really unchangeable, then it would bar or at least complicate the agent’s updating to a new, truer ontology.

• I’ve been thinking along the same lines, very glad you’ve articulated all this!

• The way I understand the intent vs. effect thing is that the person doing “frame control” will often contain multitudes: an unconscious, hidden side that’s driving the frame control, and then the more conscious side that may not be very aware of it, and would certainly disclaim any such intent.

• Small typo: you have two sections numbered [7.2]

• (I assume that by “gears-level models” you mean a combination of reasoning about actors’ concrete capabilities and game-theory-style models of interaction where we can reach concrete conclusions? If so,)

I would turn this around, and say instead that “gears-level models” alone tend to not be that great for understanding how power works.

The problem is that power is partly recursive. For example, A may have power by virtue of being able to get B to do things for it, but B’s willingness also depends on A’s power. All actors, in parallel, are looking around, trying to understand the landscape of power and possibility, and making decisions based on their understanding, changing that landscape in turn. The resulting dynamics can just be incredibly complicated. Abstractions can come to have something almost like causal power, like a rumor starting a stampede.
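One minimal way to see how this kind of recursive dependence can make outcomes swing on tiny differences is Granovetter's threshold model of collective behavior, used here as a stand-in illustration rather than a model of power: each actor acts once at least a given number of others already have, so everyone's willingness depends on everyone else's. With thresholds 0, 1, ..., 9 the whole group cascades; raise one actor's threshold from 1 to 2 and only the first actor ever moves. The `cascade` function and the thresholds are illustrative assumptions.

```python
def cascade(thresholds):
    """Granovetter-style threshold model: actor i joins once at least
    thresholds[i] other actors have already joined. Returns how many join."""
    joined = set()
    changed = True
    while changed:  # repeat until a full pass adds no one (a fixed point)
        changed = False
        for i, t in enumerate(thresholds):
            if i not in joined and len(joined) >= t:
                joined.add(i)
                changed = True
    return len(joined)

uniform = list(range(10))                # thresholds 0, 1, 2, ..., 9
perturbed = [0, 2] + list(range(2, 10))  # one actor's threshold raised from 1 to 2

print(cascade(uniform), cascade(perturbed))  # 10 1
```

The two groups are almost identical, yet they settle into completely different equilibria, which is part of why summary statistics about actors' capabilities say little about what the group will actually do.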

We have formal tools for thinking about these kinds of things, like common knowledge, and game-theoretic equilibria. But my impression is that they’re pretty far from being able to describe most important power dynamics in the world.

• Interesting essay!

In your scenario where people deliberate while their AIs handle all the competition on their behalf, you note that persuasion is problematic. Part of the reason is that, with intent-aligned AIs, “what the operator intends” can itself become a target of attack during conflict.

Here is another related issue. In a sufficiently weird or complex situation, “what the operator intends” may not be well-defined—the operator may not know it, and the AI may not be able to infer it with confidence. In this case, clarifying what the human really wants seems to require more deliberation, which is what we were trying to screen off in the first place!

Furthermore, it seems to me that unbounded competition tends to continually spiral out, encompassing more and more stuff, and getting weirder and more complex: there are the usual arms race dynamics. There are anti-inductive dynamics around catching your opponent by surprise by acting outside their ontology. And there is also just the march of technology, which in your scenario hasn’t stopped, and which keeps creating new possibilities and new dimensions for us to grapple with around what we really want. (I’m using state-run social media disinformation campaigns as an intuition pump here.)

So in your scenario, I just imagine the human operators getting overwhelmed pretty quickly, unable to keep from being swept up in conflict, unless we place some fairly strong limits on it.

• >The next time you are making a complicated argument, if you can, try and watch yourself recalling bits and pieces at a time. To me, it feels viscerally like I have the whole argument in mind, but when I look closely, it’s obviously not the case. I’m just boldly going on and putting faith in my memory system to provide the next pieces when I need them. And usually it works out.

Yes! And, I would offer an additional, alternative way of phrasing this: “you” actually do have the whole argument in mind, but it’s a higher-level “you”, a slower but more inclusive one, corresponding to a higher level of memory caching.

(When it doesn’t, there’s this whole failure mode where people continue viscerally feeling like they can make the argument, even though they don’t have the pieces; and I think this is where a lot of bad reasoning comes from.)

The problem here ^ then becomes a problem of maintaining appropriate relationships among the different self-layers.

• I disagree that mesa optimization requires explicit representation of values. Consider an RL-type system that (1) learns strategies that work well in its training data, and then (2) generalizes to new strategies that in some sense fit well or are parsimonious with respect to its existing strategies. Strategies need not be explicitly represented. Nonetheless, it’s possible for those initially learned strategies to implicitly bake in what we could call foundational goals or values, that the system never updates away from.
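A toy sketch of the first point, assuming a 1-D world and a policy stored only as a state-to-action table (both hypothetical, and far simpler than any real RL system). No goal is represented anywhere, yet because every training action happened to move toward position 0, generalizing by nearest neighbor to unseen states keeps pursuing 0. Nearest neighbor is a swapped-in stand-in for "fits well with existing strategies".

```python
# Learned strategies: a bare state -> step table. "Reach 0" is never stored.
learned = {1: -1, 2: -1, 3: -1, -1: +1, -2: +1}

def act(state):
    """Generalize to unseen states by copying the action of the nearest known state."""
    if state in learned:
        return learned[state]
    nearest = min(learned, key=lambda s: abs(s - state))
    return learned[nearest]

def rollout(state, max_steps=100):
    """Follow the policy from a start state; stop at 0 or after max_steps."""
    for _ in range(max_steps):
        if state == 0:
            break
        state += act(state)
    return state

print(rollout(10), rollout(-7))  # 0 0
```

The implicit goal survives generalization to states never seen in training, even though nothing in the system could be pointed to as "the value".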

For another angle, consider that value-directed thought can be obfuscated. A single central value could be transformed into a cloud of interlocking heuristics that manage to implement essentially the same logic. (This might make it more difficult to generalize that value, but not impossible.) This is a common strategy in humans, in situations where they want to avoid being seen as holding certain values, but still reap the benefits of effectively acting according to those values.

• (tl;dr: I think a lot of this is about one-way (read-only) vs. two-way communication)

As a long-term meditator and someone who takes contents of phenomenal consciousness as quite “real” in their own way, I enjoyed this post—it helped me clarify some of my disagreements with these ideas, and to just feel out this conceptual-argumentative landscape.

I want to draw out something about “access consciousness” that you didn’t mention explicitly, but that I see latent in both your account (correct me if I’m wrong) and the SEP’s discussion of it (ctrl-F for “access consciousness”). Which is: an assumed one-way flow of information. Like, an element of access consciousness carries information, which is made available to the rest of the system; but there isn’t necessarily any flow back to that element.

I believe to the contrary (personal speculation) that all channels in the mind are essentially two-way. For example, say we’re walking around at night, and we see a patch of grey against the black of the darkness ahead. That information is indeed made available to the rest of the system, and we ask ourselves: “could it be a wild animal?”. But where does that question go? I would say it’s addressed to the bit of consciousness that carried the patch of grey. This starts a process of the question percolating down the visual processing hierarchy till it reaches a point where it can be answered—“no, see that curve there, it’s just the moonlight catching a branch”. (In reality the question might kick off lots of other processes too, which I’m ignoring here.)

Anyway, the point is that there is a natural back and forth between higher-level consciousness, which deals in summaries and can relate disparate considerations, and lower-level e.g. sensory consciousness, which deals more in details. And I think this back-and-forth doesn’t fit well in the “access consciousness” picture.

More generally, in terms of architectural design for a mind, we want whatever process carries a piece of information to also be able to act as a locus of processing for that information. In the same way, if a CEO is getting briefed on some complex issue by a topic expert, it’s much more efficient if they can ask questions, propose plans and get feedback, and keep the expert as a go-to person for that issue, rather than just hear a report.
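A small sketch of the two-way idea, assuming a hypothetical `Percept` class: it broadcasts a summary upward like an access-consciousness report, but it also keeps its details and can be addressed with a question, answering from those details. The string-matched question and the feature names are placeholders for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Percept:
    """Hypothetical module that carries one piece of sensory information.

    summary: the compressed description made available upward (the one-way part).
    details: lower-level features it keeps and can consult when asked.
    """
    summary: str
    details: dict = field(default_factory=dict)

    def ask(self, question: str) -> str:
        # Two-way part: a higher level addresses a question back to the module
        # that carried the information, which answers from its own details.
        if question == "could it be a wild animal?":
            if self.details.get("shape") == "thin curve" and self.details.get("lit_by") == "moonlight":
                return "no, see that curve there, it's just the moonlight catching a branch"
            return "can't rule it out"
        return "unknown question"

patch = Percept("grey patch ahead", {"shape": "thin curve", "lit_by": "moonlight"})
print(patch.summary)                            # what a one-way report would give
print(patch.ask("could it be a wild animal?"))  # what addressing the percept gives
```

A pure report would stop at `summary`; keeping the percept addressable is what lets the question get answered where the details live.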

I think “acting as an addressable locus of processing” accounts for at least a lot of the nature of “phenomenal consciousness” as opposed to “access consciousness”.

• Also, on your description of designs factorizing into parts, maybe you already know this, but I wanted to highlight that often “factorization”, even when neat, isn’t just a straightforward decomposition into separate parts. For example, say you’re designing a distributed system. You might have a kind of “vertical” decomposition into roles like leader and follower. But then also a “horizontal” decomposition into different kinds of data that get shared in different ways. The logic of roles and kinds of data might then interact, so that the algorithm is really conceptually two-dimensional.

(These kinds of issues make cognition harder to factorize.)
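A hypothetical sketch of the distributed-system example: behavior in a toy replicated system is looked up by (role, kind of data), and the checks at the end confirm that neither axis alone determines what a node does. The role names, data kinds, and handler descriptions are illustrative, not drawn from any particular system.

```python
# Behavior indexed by two axes at once:
# roles (vertical decomposition) x kinds of data (horizontal decomposition).
HANDLERS = {
    ("leader", "config"): "write locally, then replicate to all followers",
    ("leader", "metrics"): "aggregate reports from followers",
    ("follower", "config"): "forward the write to the leader",
    ("follower", "metrics"): "report local metrics to the leader",
}

def handle(role, data_kind):
    return HANDLERS[(role, data_kind)]

roles = {r for r, _ in HANDLERS}
kinds = {k for _, k in HANDLERS}

# Neither axis alone determines behavior: fixing one still leaves variation along the other.
varies_by_kind = all(len({handle(r, k) for k in kinds}) > 1 for r in roles)
varies_by_role = all(len({handle(r, k) for r in roles}) > 1 for k in kinds)
print(varies_by_kind, varies_by_role)  # True True
```

If either check came out False, the design would collapse to a one-dimensional decomposition; both being True is what makes it conceptually two-dimensional.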