Sensory modality for code

This post is inspired by Chimera wanting to talk about “Levels of Organization in General Intelligence.”

In the paper, Eliezer mentions giving a seed AI a sensory modality for code. And I think it’ll be fun to figure out just what that means.

Code is a pretty simple environment compared to the cool stuff humans can do with sight and hearing, so I’d like to start by giving a picture of a totally different sensory modality in a simple environment—vision in robot soccer.

The example system described in this paper can be used to find and track the ball and the robots during a soccer match. For simplicity, let’s imagine that the AI is just looking down at the field from above through a single camera. The camera is sending it a bunch of signals, but that’s not a sensory modality yet, because it isn’t integrated with the rest of the AI. At least the following tasks need to be implemented as part of the program before the AI can see (there’s a toy sketch of these steps after the list):

  • Finding unusual pixels quickly.

  • Finding shapes of the same color in the picture.

  • Using object properties (what the tops of the robots look like) to collect shapes into physical objects.

  • Using a mapping between the picture and real space to get real-space coordinates of objects.

  • Correlating with previous positions and motion to track objects and figure out which is which.
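To make that list concrete, here’s a toy Python sketch of the pipeline. Everything in it is a placeholder I made up for illustration: the color threshold, the greedy blob clustering, the linear pixel-to-field mapping, and the nearest-neighbour tracking are crude stand-ins for what a real system would do with calibration, proper connected-component labelling, and motion models.

```python
import numpy as np

# Assumed constants for the sketch; a real system calibrates these.
IMG_W, IMG_H = 640, 480        # camera resolution in pixels
FIELD_W, FIELD_H = 6.0, 4.0    # field size in metres

def find_marker_pixels(frame, target_rgb, tol=40):
    """Steps 1-2: find pixels close to a marker color (e.g. a team patch)."""
    diff = np.abs(frame.astype(int) - np.array(target_rgb)).sum(axis=2)
    ys, xs = np.nonzero(diff < tol)
    return list(zip(xs.tolist(), ys.tolist()))

def group_into_blobs(pixels, max_gap=20):
    """Step 3: crudely cluster nearby pixels into candidate objects."""
    blobs = []  # each blob: [sum_x, sum_y, count]
    for x, y in pixels:
        for blob in blobs:
            cx, cy = blob[0] / blob[2], blob[1] / blob[2]
            if abs(x - cx) <= max_gap and abs(y - cy) <= max_gap:
                blob[0] += x; blob[1] += y; blob[2] += 1
                break
        else:
            blobs.append([x, y, 1])
    return [(b[0] / b[2], b[1] / b[2]) for b in blobs]

def pixel_to_field(px, py):
    """Step 4: map image coordinates onto real-space field coordinates.
    (A real system would use a calibrated homography, not a linear scale.)"""
    return px / IMG_W * FIELD_W, py / IMG_H * FIELD_H

def track(prev_positions, detections):
    """Step 5: match each known object to its nearest new detection."""
    remaining = list(detections)
    updated = {}
    for obj_id, old in prev_positions.items():
        if not remaining:
            break
        nearest = min(remaining,
                      key=lambda p: (p[0] - old[0]) ** 2 + (p[1] - old[1]) ** 2)
        updated[obj_id] = nearest
        remaining.remove(nearest)
    return updated
```

Run over one frame, this hands back a dictionary of object id to (x, y) in metres, which is exactly the small, AI-friendly output the next paragraph is about.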

This work looks a lot like compression: it takes a big hunk of image data and turns it into a petite morsel of robot locations and orientations. But evaluated as compression, it’s awful, since so much information, like the texture of the robots, gets discarded. It’s less compression than translation, from the language of images into the language the AI thinks in. The AI doesn’t need to think about the exact texture of the robots, so the sensory modality doesn’t pass it on.

So how do we do something like that with code? We want something that takes raw code as input and outputs a language for the AI to think about code in. Additionally, as mentioned in LOGI, the AI should be able to use this language to imagine code, so it has to contain the concepts needed for that. It should discard a lot of low-level information, the way the vision system extracts coordinates and lets the AI ignore texture. It might also reorganize the information to keep track of what the code does, much as the vision system maps camera points onto real space.
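To give a feel for what “discard the form, keep the function” might mean, here’s a tiny made-up example. The two functions below differ in form but not in function, and the description dict is an invented format, not anything LOGI specifies; it’s just the kind of output we’d want both of them to translate into.

```python
# Two pieces of code with different form but the same function...
def total_a(xs):
    s = 0
    for x in xs:
        s += x
    return s

def total_b(xs):
    return sum(xs)

# ...which we'd want the code modality to translate into one and the
# same high-level description (format invented for illustration):
description = {
    "role": "reduce a sequence of numbers to their sum",
    "inputs": ["a sequence of numbers"],
    "output": "a number",
    "form_discarded": ["explicit loop vs builtin", "accumulator variable", "names"],
}
```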

For some code, this might be done with nested black boxes: find groups of code and replace them with black boxes that say what they do and what they’re connected to, then find groups of black boxes and replace those groups with what they do and what other groups they’re connected to. The big problem for this approach is figuring out how to label what a piece of code does. Finding a short description of a piece of code is straightforward, but just condensing the code is not enough; the program needs to be able to separate function from form and throw out the form information. Ideally, once our program had removed as much form as it could, the AI would be able, in principle, to rewrite the program just from the functional description.
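Here’s a toy sketch of the first level of that black-boxing, using Python’s ast module on Python source. The “what it does” label here is nothing more than a function’s inputs and the names it calls, and the grouping just lumps together functions joined by a call edge, so it captures connections but none of the form-versus-function separation; it’s only meant to show the shape of the data.

```python
import ast

def black_box(func_node):
    """Summarise one function as a box: its name, what goes in, what it calls."""
    inputs = [a.arg for a in func_node.args.args]
    calls = sorted({
        n.func.id
        for n in ast.walk(func_node)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    })
    return {"name": func_node.name, "inputs": inputs, "calls": calls}

def boxes_for(source):
    """First level: one black box per top-level function."""
    tree = ast.parse(source)
    return [black_box(n) for n in tree.body if isinstance(n, ast.FunctionDef)]

def group_boxes(boxes):
    """Second level (very crude): lump together boxes joined by a call edge."""
    names = {b["name"] for b in boxes}
    groups = []  # each group is a set of function names
    for box in boxes:
        linked = ({box["name"]} | set(box["calls"])) & names
        touching = [g for g in groups if g & linked]
        for g in touching:
            linked |= g
            groups.remove(g)
        groups.append(linked)
    return groups

example = """
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
"""

for box in boxes_for(example):
    print(box)   # e.g. {'name': 'main', 'inputs': ['path'], 'calls': ['load', 'parse']}
print(group_boxes(boxes_for(example)))   # one group: {'load', 'parse', 'main'} (in some order)
```

Whether a summary like that could ever be rich enough for the AI to rewrite the program from it is exactly where the hard part starts.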

Unfortunately, this is where the problem gets hard for me. I have some thoughts, like “follow the flow of the program” and “check if locally arbitrary choices will cause nonlocal problems if changed.” But first I’d like to check that I’m sort of on the right track, and then maybe forge ahead.