Having the AGI-system output programs that generate the answers/results, rather than outputting the answers/results directly, and requiring these programs to be optimized for being narrow/specialized/readable/modular/short, and, last but not least, verifiable.
When/if possible, the programs that are outputted should be accompanied by as much formal proof/verification as possible.
Since narrow/specialized programs are easier to verify, this will in some situations make it harder (but often still possible, depending on specifics) for the AGI to trick human operators.
We could call these narrow/specialized programs “generators”.
It could be possible to output “generators” that help with the verification of other “generators”.
One way to verify “generators” would be to look at the code, as well as at any formal verification of behavior (with varying levels of assurance).
Another way to verify “generators” would be to test them against real-world data. A “generator” that e.g. simulates (or otherwise predicts) the results of construction-procedures should also work for procedures that have been carried out in the past (and that the AI doesn’t have access to). And if it is later used to implement AGI-provided procedures, those would preferably be tested first in conditions where there are additional guarantees that it’s hard for the instructions to result in something that behaves unexpectedly. E.g., if it is nanomachinery, running it in conditions where it’s enclosed in material that makes getting out chemically impossible (even if more subtle parts of the design don’t go as expected). A “generator” that predicts human behavior and/or approximates the results of (high-welfare) brain emulations can be tested against existing real-world data that the AGI doesn’t have access to.
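A minimal sketch of such a held-out-data check (the generator interface, record format, and tolerance here are my assumptions, not anything specified above):

```python
# Sketch: score a candidate "generator" against held-out real-world records
# that the AGI never had access to. Assumed interface:
#   candidate_generator(procedure_description) -> predicted numeric outcome.

def validate_against_holdout(candidate_generator, holdout_records, tolerance=0.05):
    """Return the fraction of held-out cases predicted within the relative tolerance.

    Each record is assumed to be a (procedure_description, observed_outcome) pair.
    """
    hits = 0
    for procedure, observed_outcome in holdout_records:
        predicted = candidate_generator(procedure)
        if abs(predicted - observed_outcome) <= tolerance * abs(observed_outcome):
            hits += 1
    return hits / len(holdout_records)

# A "generator" that fails on past procedures shouldn't be trusted on AGI-proposed ones.
```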
It can also be checked how the “generator” that is outputted varies based on the info the AGI-instance has been given.
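One rough way to probe this, assuming we can request the generator’s source code under different info-sets (the `request_generator` channel is a hypothetical placeholder):

```python
# Sketch: compare the "generator" source produced under different info-sets
# given to the AGI-instance; large unexplained differences warrant review.

import difflib
import itertools

def compare_generators_across_info(request_generator, info_variants):
    """request_generator(info) -> source code string (assumed interface).

    info_variants maps a label to the info-set given to the AGI-instance.
    Returns pairwise similarity ratios between the resulting sources."""
    sources = {name: request_generator(info) for name, info in info_variants.items()}
    report = {}
    for (a, src_a), (b, src_b) in itertools.combinations(sources.items(), 2):
        report[(a, b)] = difflib.SequenceMatcher(None, src_a, src_b).ratio()
    return report
```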
Sometimes it may make sense to have “generators” that help generate other “generators”. E.g. we could have a “generator” that generates “generators” for predicting the behavior of some type of animal (again, suffering sub-routines should be avoided). Or if we want one function to approximate the output of another function (but using less compute), we could use an “approximator”, instead of directly asking “give me a physics simulator, but make it computationally efficient”.
Maybe “generators” could also aid in finding design flaws in other “generators”, e.g. ways a “physics-simulation-generator” generates faulty output in certain cases, or ways a “generator” isn’t as narrow/modular as it could be, or ways a “generator” has more complexity than it needs to do its job. And the “approximator” should then, of course, be tested insofar as possible.
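A minimal sketch of testing an “approximator” against the reference function it is meant to approximate (the interfaces, numeric outputs, and tolerance are assumptions):

```python
# Sketch: compare approximator output to the (more expensive) reference
# on sampled inputs; the worst deviation should stay below some tolerance.

def test_approximator(reference, approximator, sample_inputs):
    """Both functions are assumed to map an input to a single float."""
    worst = 0.0
    for x in sample_inputs:
        worst = max(worst, abs(reference(x) - approximator(x)))
    return worst

# Toy usage with stand-ins for the real functions:
# reference = lambda x: x ** 0.5
# approximator = lambda x: x ** 0.5 + 1e-4
# assert test_approximator(reference, approximator, [i / 100 for i in range(1, 1000)]) < 1e-3
```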
Having more “generators” that do the same job, but based on different architectures/principles, is better. The more converging results from various designs/principles/implementations, the better.
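A sketch of such a convergence check, assuming the independently built “generators” can be called as ordinary functions with numeric outputs:

```python
# Sketch: cross-check several independently produced "generators" that are
# supposed to do the same job; disagreement beyond the tolerance flags the
# input for human review rather than being trusted.

def cross_check(generators, test_inputs, tolerance=1e-6):
    """Return the inputs (with outputs) on which the generators disagree."""
    disagreements = []
    for x in test_inputs:
        outputs = [g(x) for g in generators]
        if max(outputs) - min(outputs) > tolerance:
            disagreements.append((x, outputs))
    return disagreements
```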
Getting the AGI to output a “physics-simulator” where both starting and ending conditions are specified to be within some ranges of possible state-configurations, instead of simulating one specific example and providing a “point-like” value.
Have possible machines (proposed by the AI-system) be tested by such “physics-simulators”.
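A toy sketch of such a range-based test (one-dimensional states and the `simulate` interface are simplifying assumptions):

```python
# Sketch: instead of one point-prediction, sample starting conditions from the
# allowed range, run the "physics-simulator" on the proposed machine, and
# require every resulting end state to fall inside the expected range.

import random

def test_machine_over_ranges(simulate, machine_design, start_range, end_range, n_samples=1000):
    """simulate(machine_design, start_state) -> end_state (floats, for simplicity).

    start_range and end_range are (low, high) intervals; a real version would
    need higher-dimensional state descriptions."""
    for _ in range(n_samples):
        start_state = random.uniform(*start_range)
        end_state = simulate(machine_design, start_state)
        if not (end_range[0] <= end_state <= end_range[1]):
            return False, (start_state, end_state)  # counterexample found
    return True, None
```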
The “simulator” might highlight predictable aspects of machine behavior vs. less predictable ones. That is to say, the behavior of the machine that is “low-entropy” (I’m probably using the term wrong).
The point would be to have a probabilistic safety-measure that makes it harder for the AI to propose some machine design (or machine-creation-instructions) that results in unexpected behavior.
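A rough sketch of separating more predictable from less predictable aspects of behavior, under the simplifying assumption that the simulator returns named numeric outcome features:

```python
# Sketch: perturb the starting conditions slightly and measure the spread of
# each outcome feature; low-spread features are the "predictable" part of the
# machine's behavior, the rest shouldn't be relied on in a safety argument.

import random
import statistics

def split_predictable_features(simulate, machine_design, nominal_start,
                               jitter=0.01, n_samples=200, spread_threshold=1e-3):
    """simulate(machine_design, start_state_dict) -> dict of outcome features (floats)."""
    samples = []
    for _ in range(n_samples):
        perturbed = {k: v * (1 + random.uniform(-jitter, jitter))
                     for k, v in nominal_start.items()}
        samples.append(simulate(machine_design, perturbed))
    predictable, unpredictable = {}, {}
    for feature in samples[0]:
        spread = statistics.pstdev(s[feature] for s in samples)
        (predictable if spread < spread_threshold else unpredictable)[feature] = spread
    return predictable, unpredictable
```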
The “simulator” should be tested against real-world data that the AI doesn’t have access to (as well as using various other measures to verify and look for problems).