[Question] Could there be “natural impact regularization” or “impact regularization by default”?

Specifically, imagine you use a general-purpose search procedure which recursively invokes itself to solve subgoals in the service of some bigger goal.

If the search procedure’s solutions to subgoals “change things too much”, then they’re probably not going to be useful. E.g. for Rubik’s cubes, if you want to swap some of the cubies, it does you no good if those swaps leave the rest of the cube scrambled.

Thus, to some extent, powerful capabilities would have to rely on some sort of impact regularization.
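To gesture at what I mean concretely, here’s a toy sketch (my own illustration, not a claim about how any real planner works; `solve_subgoal`, `apply_action`, etc. are made-up names): a brute-force subgoal search that, among the action sequences achieving a subgoal, prefers one that disturbs the fewest state components, because low-impact solutions are the ones that compose into a solution of the parent goal.

```python
# Purely illustrative toy code, not a real planner: the "impact regularization"
# isn't bolted on as a safety measure; it falls out of wanting solutions that
# leave the rest of the state usable for later subgoals.
from itertools import product

def solve_subgoal(state, subgoal_check, actions, apply_action, max_depth=3):
    """Return the lowest-impact action sequence (up to max_depth) achieving the subgoal."""
    best_seq, best_impact = None, float("inf")
    for depth in range(1, max_depth + 1):
        for seq in product(actions, repeat=depth):
            new_state = state
            for a in seq:
                new_state = apply_action(new_state, a)
            if subgoal_check(new_state):
                # Impact = how many state components the solution changed.
                impact = sum(x != y for x, y in zip(state, new_state))
                if impact < best_impact:
                    best_seq, best_impact = seq, impact
    return best_seq

if __name__ == "__main__":
    # Tiny stand-in for "swap some cubies without scrambling the rest":
    # the state is a tuple of piece labels, and each action swaps two adjacent slots.
    state = (0, 1, 2, 3)
    actions = ["swap01", "swap12", "swap23"]

    def apply_action(s, a):
        i, j = int(a[-2]), int(a[-1])
        s = list(s)
        s[i], s[j] = s[j], s[i]
        return tuple(s)

    # Subgoal: get piece 1 into slot 0.
    print(solve_subgoal(state, lambda s: s[0] == 1, actions, apply_action))
    # -> ('swap01',): achieves the subgoal while touching only two slots.
```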

I’m thinking that natural impact regularization is related to the notion of “elegance” in engineering. If you have some bloated tool to solve a problem, then even if that’s not strictly speaking an issue because you can afford the resources, it can feel ugly because it’s excessive and puts mild constraints on your other, otherwise-underconstrained decisions, and so on. Meanwhile a simple, minimal solution usually doesn’t have this problem.

Natural impact regularization wouldn’t guarantee safety, since it still allows deviations that don’t interfere with the AI’s function, but it does reduce one source of danger I had been thinking about lately. I had been assuming that the instrumental incentive is to search for powerful methods of influencing the world, where “power” connotes the sort of raw power that unstoppably forces a lot of change; but really the instrumental incentive is often to search for “precise” methods of influencing the world, where one can push in a lot of information to effect narrow change.[1]

Maybe another word for it would be “natural inner alignment”, since in a sense the point is that capabilities inevitably select for inner alignment. Here I mean “natural” in the sense of natural abstractions, i.e. something that a wide variety of cognitive algorithms would gravitate towards.

  1. ^

    A complication is that any one agent can only have so much bandwidth, which would sometimes incentivize more blunt control. I’ve been thinking bandwidth is probably going to become a huge area of agent foundations, and that it’s been underexplored so far. (Perhaps because everyone working in alignment sucks at managing their bandwidth? 😅)