While at a recent CFAR workshop with Scott, Peter Schmidt-Nielsen and I wrote some code to run experiments of the form that Scott is talking about here. If anyone is interested, the code can be found here, though I'll also try to summarize our results below.

Our methodology was as follows:

1. Generate a real utility function V: [0,1]^10 → [0,1] by randomly initializing a feed-forward neural network with 3 hidden layers of 10 neurons each and tanh activations, then training it using 5000 steps of gradient descent with a learning rate of 0.1 on a set of 1023 uniformly sampled data points. The reason we pre-train the network on random data is that we found that randomly initialized neural networks tended to be very similar and very smooth, such that it was very easy for the proxy network to learn them, whereas networks trained on random data were significantly more variable.
2. Generate a proxy utility function U: [0,1]^10 → [0,1] by training a randomly initialized neural network with the same architecture as the real network on 50 uniformly sampled points from the real utility, using 1000 steps of gradient descent with a learning rate of 0.1.
3. Fix μ to be uniform sampling.
4. Let μ̂ be uniform sampling followed by 50 steps of gradient descent on the proxy network with a learning rate of 0.1.
5. Sample 1000000 points from μ, then optimize those same points according to μ̂. Create buckets of radius 0.01 utilons for all proxy utility values and compute the real utility values for points in each bucket from the μ set and the μ̂ set.
6. Repeat steps 1–5 10 times, then average the final real utility values per bucket and plot them.
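The networks in steps 1 and 2 are small enough that the setup can be sketched in plain NumPy with hand-rolled backpropagation. This is only an illustrative reconstruction, not our actual code: in particular, the post doesn't say what targets the real network is pre-trained on, so uniform-random targets in [0, 1] are assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_net(sizes):
    """Random MLP weights; sizes like [10, 10, 10, 10, 1]."""
    return [[rng.normal(0, 1 / np.sqrt(m), (m, n)), np.zeros(n)]
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(net, x):
    """Return all layer activations; tanh on hidden layers, linear output."""
    acts = [x]
    for j, (W, b) in enumerate(net):
        z = acts[-1] @ W + b
        acts.append(np.tanh(z) if j < len(net) - 1 else z)
    return acts

def mse_step(net, x, t, lr):
    """One full-batch gradient-descent step on mean-squared error."""
    acts = forward(net, x)
    grad = 2 * (acts[-1] - t) / len(x)            # dLoss/d(output)
    for j in reversed(range(len(net))):
        if j < len(net) - 1:
            grad = grad * (1 - acts[j + 1] ** 2)  # tanh'(z) = 1 - tanh(z)^2
        W, b = net[j]
        dW, db = acts[j].T @ grad, grad.sum(0)
        grad = grad @ W.T                          # propagate to layer below
        net[j] = [W - lr * dW, b - lr * db]

def train(net, x, t, steps, lr=0.1):
    for _ in range(steps):
        mse_step(net, x, t, lr)
    return net

sizes = [10, 10, 10, 10, 1]  # 10-d input, 3 hidden layers of 10, scalar out

# Step 1: the "real" utility V, pre-trained on random data
# (targets assumed uniform in [0, 1]).
x_pre = rng.uniform(size=(1023, 10))
t_pre = rng.uniform(size=(1023, 1))
V = train(init_net(sizes), x_pre, t_pre, steps=5000)

# Step 2: the proxy U, fit to 50 uniformly sampled points of V.
x_fit = rng.uniform(size=(50, 10))
t_fit = forward(V, x_fit)[-1]
U = init_net(sizes)
loss_before = float(np.mean((forward(U, x_fit)[-1] - t_fit) ** 2))
U = train(U, x_fit, t_fit, steps=1000)
loss_after = float(np.mean((forward(U, x_fit)[-1] - t_fit) ** 2))
print(loss_before, loss_after)
```

Step 4's μ̂ additionally requires the gradient of U with respect to its *input* (rather than its weights), which falls out of the same backward pass by reading off `grad` after the last loop iteration.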
Furthermore, we compute the "Goodhart error" as the real utility for the proxy-optimized points minus the real utility for the random points, plotted against their proxy utility values.

The plot generated by this process is given below. As can be seen from the plot, the Goodhart error is fairly consistently negative, implying that the gradient-descent-optimized points perform worse on the real utility conditional on the proxy utility.

However, using an alternative μ̂, we were able to reverse the effect. That is, we ran the same experiment, but instead of optimizing the proxy utility to be close to the real utility on the sampled points, we optimized the gradient of the proxy utility to be close to the gradient of the real utility on the sampled points. This resulted in the following graph. As can be seen from the plot, the Goodhart error flipped and became positive in this case, implying that the gradient optimization did significantly better than the point optimization.

Finally, we also did a couple of checks to ensure the correctness of our methodology.

First, one concern was that our method of bucketing could be biased. To determine the degree of "bucket error," we computed the average proxy utility for each bucket from the μ and μ̂ datasets and took the difference. This should be identically zero, since the buckets are generated based on proxy utility, while any deviation from zero would imply a systematic bias in the buckets. We did find a significant bucket error for large bucket sizes, but for our final bucket size of 0.01 we found a bucket error in the range of 0–0.01, which should be negligible.

Second, another check of our methodology was to generate μ̂ simply by sampling 100 random points and selecting the one with the highest proxy utility value. This should give exactly the same results as μ, since bucketing conditions on the proxy utility value, and in fact that was what we got.
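The bucketed comparison and Goodhart error can be sketched as follows. This is a NumPy illustration rather than our actual code, run here on synthetic stand-in data (not the experiment's outputs), and it reads "radius 0.01" as a bucket width of 0.01:

```python
import numpy as np

def bucket_means(proxy, real, width=0.01):
    """Mean real utility per proxy-utility bucket of the given width."""
    keys = np.floor(proxy / width).astype(int)
    return {k: real[keys == k].mean() for k in np.unique(keys)}

def goodhart_error(proxy_mu, real_mu, proxy_opt, real_opt, width=0.01):
    """Per-bucket mean real utility of optimized points minus random points,
    over the buckets that both sample sets hit."""
    m_mu = bucket_means(proxy_mu, real_mu, width)
    m_opt = bucket_means(proxy_opt, real_opt, width)
    return {k: m_opt[k] - m_mu[k] for k in m_opt if k in m_mu}

# Synthetic stand-in data: at equal proxy utility, the "optimized" points
# score 0.1 lower on the real utility than the random points.
rng = np.random.default_rng(0)
proxy_mu = rng.uniform(size=100_000)
real_mu = proxy_mu
proxy_opt = rng.uniform(size=100_000)
real_opt = proxy_opt - 0.1

errors = goodhart_error(proxy_mu, real_mu, proxy_opt, real_opt)
mean_error = float(np.mean(list(errors.values())))
print(mean_error)  # ≈ -0.1: a consistently negative Goodhart error
```

Because both sets are bucketed on proxy utility alone, the per-bucket difference isolates how the two sampling processes differ in *real* utility at a fixed proxy value, which is exactly the quantity the plots show.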
Is there an RSS feed?
Dimensional analysis is absolutely an instance of what I’m talking about!
As for only being able to do constructive stuff, you actually can do classical stuff as well, but you have to explicitly assume the law of the excluded middle. For example, if in Lean I write
axiom lem (P : Prop) : P ∨ ¬P
then I can start doing classical reasoning.
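For instance, with that axiom in scope, a classically-but-not-constructively valid principle like double-negation elimination goes through by case analysis. A sketch in Lean 4 syntax (the theorem name `dne` is just illustrative; the axiom is repeated so the snippet is self-contained):

```lean
axiom lem (P : Prop) : P ∨ ¬P

-- Double-negation elimination: case-split on lem P; in the P case we are
-- done, and in the ¬P case we contradict the hypothesis ¬¬P.
theorem dne (P : Prop) (h : ¬¬P) : P :=
  (lem P).elim (fun hp => hp) (fun hnp => absurd hnp h)
```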
Also, you’re totally right that you could keep going with h(h(fxx)) and so on as much as you want, but there’s no real reason to do so: if you start from the simplest possible way to do it and work upward, you’ll have solved the problem by the time you get to h(fxx).
There are a couple of pieces of this that I disagree with:
I think claim 1 is wrong because even if the memory is unhelpful, the agent which uses it might be simpler than any non-agentic solution, and so you might still end up with an agent. My intuition is that just specifying a utility function and an optimization process is often much easier than specifying the complete details of the actual solution, and thus any sort of program-search-based optimization process (e.g. gradient descent on a neural network) has a good chance of finding an agent.
I think claim 3 is wrong because agenty solutions exist for all tasks, even classification tasks. For example, take the function which spins up an agent, tasks that agent with the classification task, and then takes that agent’s output. Unless you’ve done something to explicitly remove agents from your search space, this sort of solution always exists.
Thus, I think claim 6 is wrong due to my complaints about claims 1 and 3.