Having a “council” of AGIs that are “siloed”.
The first AGI can be used to write code for AGIs that are aligned according to various different principles. This could happen in one “swoop”, with the expectation that the code and architecture are optimized for clarity/modularity/verifiability/etc. But it could also be done by asking the AGI to do lots of different things separately. Or both (and then we can check whether the output from the different systems is the same).
Naturally, all these things should not be asked of the same instance of the AGI (although that could be done as well, to check whether the outputs converge).
In the end we have a “council” of AGIs. Some might predict the output of smart humans working for a long time. Some might use high-welfare brain emulations. Some might construct proofs in which the ambiguity of cluster-like concepts, and the mapping between concepts and the outside world, are accounted for within the formalism—with as much of the “core” thinking as possible being one and the same as these proofs. Some might be based on machine-learning-by-debate ideas. The more approaches that seem likely to work (without having suffering sub-routines), the better.
This “council” of “siloed” AGIs can then be used as an oracle/genie, and we can check whether their outputs converge. They can also search for mistakes in the output (answers, proofs/verifications, argument-trees, etc.) of the other AGIs.
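The council procedure above—query siloed members independently, accept an answer only if the outputs converge, and have members search each other's output for mistakes—can be sketched in toy form. Everything here is hypothetical scaffolding: each “member” is just a function from a question to an answer, and the names (`council_answer`, `cross_check`, `min_agreement`) are illustrative, not from any real system.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical model of one siloed council member: a function that
# maps a question string to an answer string. In reality each member
# would be a separately constructed, independently aligned system.
Member = Callable[[str], str]

def council_answer(members: List[Member], question: str,
                   min_agreement: float = 1.0) -> Optional[str]:
    """Query every siloed member independently and accept an answer
    only if a sufficient fraction of members converge on it."""
    answers = [m(question) for m in members]
    best, count = Counter(answers).most_common(1)[0]
    if count / len(members) >= min_agreement:
        return best
    return None  # outputs diverged: withhold the answer, escalate to review

def cross_check(members: List[Member], question: str,
                answer: str) -> List[str]:
    """Ask each member to search for mistakes in a candidate answer
    produced by another member (or by the council as a whole)."""
    critique = f"Find mistakes in this answer to {question!r}: {answer!r}"
    return [m(critique) for m in members]
```

With `min_agreement=1.0` the council only answers when every member agrees; lowering it trades strictness for availability. The sketch treats answers as exact strings, whereas real convergence-checking would need some notion of semantic equivalence between outputs.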