because many “goals” I have in mind involve existence and not optimality conditions
Could you give some examples? This seems quite important. Re: existence, this is possibly related to what you’re thinking?
While writing, I conceptualised the external observer as part of a thought experiment in which non-embeddedness is allowed as a hermeneutic device (sorry that this wasn’t clear!). I think this can be a useful idea even if no possible external observer actually has this property, but it breaks down when the observer itself has to interact with the agent (i.e. in a game-theoretic setup).
I’m naturally (intuitively) suspicious of god’s-eye-views––thinking about observers that stand completely outside observed-systems––and much more inclined to thinking in interactive/embedded terms. I’m curious why/how you find this perspective fruitful and/or interesting.
Let’s say you’re a selection process designing an agent, choosing between “giving it” an architecture that is likely to encode abstraction A versus abstraction B[3]. Your choice between A and B might depend on questions like what internal model and goals are fit for the agent’s environment. This means there’s a dependency between the way in which the agent will be intelligent (i.e. whether it learns abstraction A or B) and what its internal goals will be.
I see why this is true for e.g. a hawk and a bat, where the abstraction-capabilities in question are visual ability vs. echolocation. I don’t see why this is true once the selection process cranks up the “general intelligence” knob (at least, assuming natural abstractions).
Information about one gives information about the other and vice-versa.
I’d expect two ASIs with different goals to have similar abstractions about the world, but different abstractions wherever those abstractions involve the agents themselves/some level of recursive modeling, since that is where internal goals are represented; i.e. imo the mutual info between them is low in this case.
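(To be explicit about what I mean by mutual info here: the standard information-theoretic quantity, applied loosely to the two ASIs’ self-involving abstractions, call them $A_1$ and $A_2$; the symbols are just for illustration.)

$$I(A_1; A_2) \;=\; H(A_1) - H(A_1 \mid A_2)$$

“Low” meaning that knowing one ASI’s self-/goal-related abstractions tells you little about the other’s.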
I like this perspective a lot and I think it is indeed more informative than the optimize-y perspective wrt agents-that-we-currently-observe-exist. But I don’t expect this perspective to be informative if we build something that is very consequentialist/optimize-y (e.g. ASI).
Imo the best formal grounding for this intuition of agents as exist-y/satisfice-y is the FEP. And I do think ASI will be an active inference agent, but that doesn’t really preclude the possibility that it’s also optimize-y; active inference agents behave more and more like EU maximizers under some conditions (namely low ambiguity), and I (tentatively) expect these conditions to be met for ASI.
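To gesture at why low ambiguity pushes active inference toward EU maximization, here’s the standard expected-free-energy decomposition from the active inference literature (a sketch from that literature, not anything specific to the setup above):

$$G(\pi) \;=\; \underbrace{D_{\mathrm{KL}}\!\left[q(o \mid \pi)\,\|\,p(o)\right]}_{\text{risk}} \;+\; \underbrace{\mathbb{E}_{q(s \mid \pi)}\!\left[\mathrm{H}\!\left[p(o \mid s)\right]\right]}_{\text{ambiguity}}$$

When the ambiguity term is near zero (i.e. the agent’s observation model is close to deterministic), minimizing $G$ reduces to minimizing risk; and since $D_{\mathrm{KL}}[q(o \mid \pi)\,\|\,p(o)] = -\mathbb{E}_{q(o \mid \pi)}[\ln p(o)] - \mathrm{H}[q(o \mid \pi)]$, that is just expected-utility maximization with $u(o) = \ln p(o)$, up to an entropy bonus on predicted outcomes.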
Some of my uncertainties around this:
Maybe it’s possible to construct an agent that has high optimization power/capability, but is uncertain about what to optimize for. This would probably lead to it acting in less scary ways. Whether this is possible probably depends on the particular selection process the agent went through.
I’m not sure how to think about parts and wholes in relation to this. For example, can you have an optimize-y thing that is itself made up of exist-y/satisfice-y things? Can you have an exist-y/satisfice-y thing that is itself made up of optimize-y things? I’m not sure how these kinds of agents compose/interact across scales.
Yes, I think it’s cruxy. Could you elaborate on your uncertainty? Even if you’re just sketching out very feathery intuitions.
And I’m curious—if you think this knob doesn’t really meaningfully exist, what do you think current frontier labs are doing/selecting for, and what do you think they’re trying to do/select for? (Like, for example, do you think they’re trying to crank up the general intelligence knob, and that this is a futile task––really, they’re cranking up some different, adjacent knob?)
Thanks for probing on this! I’m not sure I endorse that strong of a claim anymore. Refining into something I’d endorse more:
The two ASIs exist in the same world, and this world has underlying laws; these laws can be inferred at a sufficiently high intelligence level, with sufficient data.
But not every single law will be inferred by the two ASIs, because of boundedness. The laws that are inferred will probably depend on their goals. There’s a problem of relevance here.
But some laws are convergently useful to infer no matter what goal you have, because a small set of bottlenecks (e.g. resource constraints) stands in the way of a wide variety of goals.
Predicting those convergently useful domains well will lead agents to a similar set of abstractions.
The last point seems to be the most important one, but I’m not sure why I buy it. But I do buy it.