A way to think about Condensation in the algorithmic setting
These are notes that describe a (somewhat flawed) variation of algorithmic condensation. They cover similar ground to this post by Abram, with slightly more concrete detail. I’m publishing them because a) the approach is simple, relatively concrete, and still works on most test cases I’ve worked through, and b) it helps motivate the more general approach.
We want a generative model whose output is a set of observations. We want the hypothesis space to be the space of programs, and we want it to factorize naturally into components such that each component “contributes to” some subset of the observations.
Observations: A set of tuples, where each tuple contains some content and a location. For example, the content might be a pixel’s r,g,b values and the location might be that pixel’s x,y coordinates. Or the content might be a bit b and the location the index of that bit inside a large bitstring.
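A minimal sketch of this observation format (the variable names are mine, purely illustrative):

```python
# An observation is a (content, location) tuple. Here content is an
# (r, g, b) triple and location is the pixel's (x, y) coordinate.
observations = {
    ((255, 0, 0), (0, 0)),   # red pixel at (0, 0)
    ((255, 0, 0), (0, 1)),   # red pixel at (0, 1)
    ((0, 0, 0), (5, 5)),     # black pixel at (5, 5)
}

# In the bitstring version, content is a single bit and location is its index.
bitstring_obs = {(1, 0), (1, 1), (0, 2)}
```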
To make the factorization work we need to impose some structure on the program. We’ll have one top program which can access (via built-in instructions) a `List` of other strings. The `List` must have an interface that satisfies some properties:

- It must be possible to retrieve elements based on their content, or their index in the list.
- Item retrieval from the `List` is such that any items other than the one that actually ends up being retrieved can be removed from the `List` without changing the behavior of the program. For example, we could have a function `lookup` that scores each item on the list and returns the highest-scoring one.
- The top program, when executed, must output the full set of observations (and no others).
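One way to sketch a `List` satisfying these properties. The score-based `lookup` is from the text; the rest of the interface is my guess at what the built-in instructions would look like:

```python
class List_:
    """Holds the non-top strings. Retrieval is by index or by a scoring
    function. lookup returns only the single highest-scoring item, so any
    item that never wins can be deleted without changing behavior."""

    def __init__(self, items):
        self.items = list(items)

    def get(self, index):
        # Retrieve by index in the list.
        return self.items[index]

    def lookup(self, score):
        # Score each item and return the highest-scoring one.
        return max(self.items, key=score)

lst = List_([b"red square", b"square at (2,3)", b"background: black"])
best = lst.lookup(lambda s: len(s))  # here, the longest item wins
```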
Scoring subsets
A condensation has a score for every subset A of observation variables.
Conditioned[1] score: Delete as many strings as possible from the `List` such that the subset of observations A is still recovered correctly by running the program. Sum the lengths of the remaining strings. This gives the conditioned score. “Recovered correctly” means that the program outputs a set that contains all the correct pixels, and contains no more than one pixel for each location. It’s allowed to output incorrect pixels that aren’t part of A (because forcing it to output an exact subset would add complexity).
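A brute-force sketch of this score under my reading of the definition. The `run` callable stands in for executing the top program with only some strings left on the `List`; the toy programs at the bottom are illustrative, not from the notes:

```python
from itertools import combinations

def conditioned_score(strings, run, A):
    """Smallest total string length that still recovers the subset A.

    run(available) returns the set of (content, location) observations the
    top program outputs when only `available` remains on the List.
    Recovery means: every observation in A is output, and no location gets
    two contents; extra pixels at locations outside A are allowed.
    """
    def recovers(output):
        locations = [loc for _, loc in output]
        if len(locations) != len(set(locations)):  # two pixels at one location
            return False
        return A <= output                         # all of A is present
    best = None
    for r in range(len(strings) + 1):
        for subset in combinations(strings, r):
            if recovers(run(set(subset))):
                total = sum(len(s) for s in subset)
                if best is None or total < best:
                    best = total
    return best

# Toy stand-in: "bg" generates two background pixels, "sq" one square pixel.
def run(available):
    out = set()
    if b"bg" in available:
        out |= {(0, (0, 0)), (0, (0, 1))}
    if b"sq" in available:
        out |= {(1, (1, 1))}
    return out
```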
Some more detail about `List`

We want to make sure the program can’t hide complexity in unexpected ways using the `List` data structure. Kaarel pointed out one way of doing this in long lists: you can sometimes save complexity by adding a bit to several items in the list, placed so that the index of each item containing the bit conveys useful information.

To avoid this and similar hacks, we’ll assume that the `List` memory layout is automatically and perfectly managed behind the scenes, but that the additional data required for this (e.g. the length of each item in the list) counts as part of the complexity of the top program. This stops us from adding a bit to an item in the list as a means of encoding its index, because that forces the `List` to remember which item has a unique length.

Why include a location label in every observation?
This is a simple way to force the program to keep track of which bit it’s generating at a given time, even when most parts of the program (or observations) aren’t present.
Example: Two red squares
Imagine our data is an image of two red squares on a black background. Remember each pixel also contains its location. Ideally, we want the model to contain a string representing the concept of “red square”, two strings for the individual squares, a string for the background color and a top string that holds information about how to tie it all together.
We can guess the conditioned scores of some important subsets:

- A single pixel in the background requires only the top string to generate. This string is longer than a description of the background color alone, so the score isn’t great.
- A single pixel in one of the red squares requires three strings to generate (top, red square, specific red square). Also not a great score.
- The set of pixels that cover one red square requires the same three strings as a single pixel, so its score should be near-optimal.
- The set of all pixels requires all strings, and its score is near-optimal (this is an ~optimal compression of the data).
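A toy version of the hoped-for factorization. The specific parameters, image size, and string stand-ins are illustrative choices of mine:

```python
def red_square(x0, y0, size):
    """The shared 'red square' concept: red pixels filling a size x size block."""
    return {((255, 0, 0), (x, y))
            for x in range(x0, x0 + size)
            for y in range(y0, y0 + size)}

square_a = (1, 1, 2)      # parameters of the first square
square_b = (5, 5, 2)      # parameters of the second square
background = (0, 0, 0)    # the background color

def top(width=8, height=8):
    """The top program: ties the pieces together into the full observation set."""
    squares = red_square(*square_a) | red_square(*square_b)
    covered = {loc for _, loc in squares}
    bg = {(background, (x, y))
          for x in range(width) for y in range(height)
          if (x, y) not in covered}
    return squares | bg

obs = top()
assert len(obs) == 64  # exactly one pixel per location of the 8x8 image
```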
Example: 70% 1s random bitstring
We want this condensation to have the following structure: one top string that captures the fact that 70% of the observations are 1s, and a string associated with each observation that captures the unique information in that bit.

However, this won’t work. Because each string must be at least one bit long (and the `List` bookkeeping adds per-string overhead to the top program), the total length of all strings is going to be much larger than the optimal compression of the observations.

So the condensation model could go in two directions: either it’ll drop the 70% information, or it’ll partition the observations into small groups in exchange for a poor score on the small subsets.
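To see the gap concretely, compare one-string-per-bit against the Shannon bound (my arithmetic, for an illustrative 1000-bit string, before counting any `List` overhead):

```python
import math

n, p = 1000, 0.7
# Optimal compression of n i.i.d. bits that are 1 with probability 0.7:
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ~0.881 bits/bit
optimal = n * entropy                                        # ~881 bits
# One string per observation costs at least 1 bit each, and the top program
# must additionally track every string's length:
per_bit_strings = n * 1
print(optimal, per_bit_strings)
```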
This flaw motivates a better approach
I think Sam Eisenstat has a (maybe unfinished, unpublished) approach that helps with this issue: have the top latent generate a probability distribution over the other latents, and then score the other latents by their negative log-probability (nlogprob) under this distribution, rather than by their length.
We can see this as taking the `List` memory management to its logical conclusion. My description above requires that the top latent tracks the length of each latent string (which is equivalent to specifying a uniform distribution over bitstrings of fixed length). Sam’s approach just allows this distribution to be non-uniform.

This is cleaner theoretically, but I find it somewhat harder to think about. It makes it hard to look at an example and work out a set of programs that would score well. It also requires a bunch of machinery (around handling the distributions) that isn’t usually needed (because most useful latent concepts are larger than a bit), so it’s good to be able to drop it.
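A quick check of the equivalence claim (my arithmetic, not from the notes): under a uniform distribution over length-L bitstrings, nlogprob equals L, recovering the length-based score, while a non-uniform distribution can charge a likely latent less than a bit:

```python
import math

# Uniform over bitstrings of length L: each has probability 2**-L,
# so nlogprob == length, i.e. the length-based score.
L = 5
assert -math.log2(2 ** -L) == L

# Non-uniform: a latent the top distribution assigns probability 0.9
# costs far less than one bit.
cost = -math.log2(0.9)
print(cost)  # about 0.152 bits
```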
Simple score

If we’re scoring using the simple score, we don’t need to allow access to other strings, so there’s no `List`. Simply have a set of programs, each of which outputs a set of observations. The simple score of A is the sum of the lengths of programs such that the intersection of their output sets correctly pins down the observations in A.
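A brute-force sketch of the simple score. Representing each program as a (length, output-set) pair, and reading "pins down" as "at A's locations, the intersection contains exactly A", is my simplification:

```python
from itertools import combinations

def simple_score(programs, A):
    """programs: list of (length, output_set) pairs.

    The simple score of A is the smallest total length over subsets of
    programs whose output sets intersect to pin down A: every observation
    in A survives the intersection, and no location in A gets a second,
    conflicting content. Observations at other locations are ignored.
    """
    best = None
    locs_in_A = {loc for _, loc in A}
    for r in range(1, len(programs) + 1):
        for subset in combinations(programs, r):
            inter = set.intersection(*(out for _, out in subset))
            pinned = {(c, l) for (c, l) in inter if l in locs_in_A}
            if pinned == A:
                total = sum(length for length, _ in subset)
                if best is None or total < best:
                    best = total
    return best
```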