One successful example of factorization in action is our immune system. It does its job of defending the body without needing intelligence; in fact, every member of the immune system is blind, naked, and unintelligent. Your body has no explicit knowledge of how many bacteria, viruses, or cancer cells it contains, what their doubling times are, or how many cells are infected. This problem must be factored for the immune system to do anything at all, and indeed it is.
The factorization basically assigns different cells to different jobs, plus a few systems not made of cells.
There are tens of different classes of cells, which can be divided into a few subclasses each, plus five major classes of antibodies, all with different properties.
Now what does this tell us about factorizing a problem, and are there lessons on factorizing problems more generally?
The biggest reason factorization works for the immune system is that the body can make billions of cells per day. One of bureaucracy's biggest problems is that we can't simply copy the skillsets in people's brains to staff bureaucracies, so we have to hire people, and even without Goodhart's law this introduces interface problems between them; the most taut constraint in companies is the rarity of talented people. This leads to the next problem the immune system solves: coordination.
Each cell of the immune system has the same source code, so there are effectively no coordination problems at all: every cell has the same attitude, dedication, and abilities. This also partially solves the interface problem, since every cell shares the same understanding and ontology. Unfortunately, even if this solves the intra-organization problem, collaborating with outsiders remains unsolved. Again, this is something we can't do with humans. Your best-case scenario is hiring relatively competent people with different source codes, abilities, and ontologies, and dealing with the interface problems those differences create. Such people also can't fully trust each other, even when they're relatively aligned, so you need constant communication, which scales poorly with the size of a team or bureaucracy.
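One standard way to see why communication scales poorly with team size (my own illustrative sketch, not from the original comment): the number of pairwise communication channels grows quadratically with headcount.

```python
def channels(n: int) -> int:
    """Pairwise communication channels in a team of n people: n*(n-1)/2."""
    return n * (n - 1) // 2

# A 10x larger team needs ~100x more channels to keep everyone in sync.
for n in (2, 5, 10, 50, 100):
    print(n, channels(n))
```

Copies with identical source code sidestep this entirely: no channel is needed to establish trust or translate between ontologies.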
So should we be optimistic about HCH/Debate/Factored Cognition? Yes! One of the most massive advantages AGIs will have over regular people early on is that, being digital, they can be copied very easily; the copies can cooperate and trust each other fully, since they share the same ways of reasoning, and their single-mindedness toward goals alleviates the most severe trust problems. They will also share similar ontologies. It's easy to underestimate just how much AGI can dissolve problems like coordination and interface issues.
EDIT: I suspect a large part of the reason your intuition recoils against the HCH/Debate/Factored Cognition solutions is scope neglect. Our intuitions don't work well with extremely big or small numbers. A commenter once claimed that 100 distillation steps could produce 2^100 agents. To put it lightly, that is a bureaucracy of roughly 10^30 humans, with a correspondingly near-infinite budget, single-mindedly trying to answer questions. To put it another way, that's more humans than have ever lived by a factor of roughly 10^19. And with perfect coordination, perfect trust, single-mindedness toward goals, and a smooth interface due to shared ontologies, all by virtue of being digital, such a bureaucracy could plausibly solve every problem we can pose. I suspect a large part of the problem for alignment today is simply that capabilities groups have much more money and many more researchers available to them than safety groups, which is why capabilities researchers are, unsurprisingly, winning the race.
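The scope-neglect arithmetic above is easy to check directly (the figure of ~117 billion humans ever born is a commonly cited demographic estimate):

```python
# 100 distillation steps, each doubling the agent count.
agents = 2 ** 100            # ~1.27e30 agents
humans_ever = 1.17e11        # rough estimate of all humans who have ever lived

print(f"agents: {agents:.2e}")
print(f"ratio:  {agents / humans_ever:.1e}")  # ~1e19, i.e. 19 orders of magnitude
```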
Our intuitions fail us here, so they aren't a reliable guide to how a very large HCH tree would behave.
One can trivially get (some variants of) HCH to do things by e.g. having each human manually act like a transistor, arranging them like a computer, and then passing in a program for an AGI to the HCH-computer. This, of course, is entirely useless for alignment.
There’s a whole class of problem factorization proposals which fall prey to that sort of issue. I didn’t talk about it in the post, but it sounds like it probably (?) applies to the sort of thing you’re picturing.
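To make the humans-as-transistors construction concrete, here is a toy sketch (my own illustration, not part of any HCH specification): each "human" answers only one trivial question, the NAND of two bits, yet because NAND is a universal gate, composing those answers yields arbitrary computation, here an XOR.

```python
def human_nand(a: int, b: int) -> int:
    """One 'human' in the tree: answers only 'what is NAND(a, b)?'"""
    return 1 - (a & b)

def xor(a: int, b: int) -> int:
    """Compose four NAND-humans into an XOR gate. Since NAND is
    universal, any program could be built from such humans the same way."""
    n1 = human_nand(a, b)
    return human_nand(human_nand(a, n1), human_nand(b, n1))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```

The point of the objection is that nothing here depends on the humans' judgment: the tree computes whatever program is fed in, aligned or not.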
It's not useless, but it's definitely risky, and the requirements for safety mean distillation would have to be very cheap. And here we come to the question: "How doomed by default are we if AGI is created?" If you think the chance of doom is low, I agree it's not a risk worth taking. If you think it's high, you'll probably have to take it: the more doomed by default you think creating AGI is, the more risk you should accept, especially with short timelines. So MIRI would probably want to do this, given their atypically high level of doominess, but most other organizations probably won't, since they expect fairly low risk from AGI.
Yeah bio analogies! I love this comment.