What this would mean is that we would have to recalibrate our notion of “safe”, as whatever definition has been proved impossible does not match our intuitive perception. We consider lots of stuff we have around now to be reasonably safe, although we don’t have a formal proof of safety for almost anything.
In the mad scientist example, why would your measure for the die landing 0 be 0.91? I think Solomonoff Induction would assign probability 0.1 to that outcome, because you need an extra log2(90) bits to specify which clone you are. Or is this just meant to illustrate a problem with ASSA, UD not included?
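For concreteness, here is a minimal sketch of the two weightings being compared. The setup is assumed rather than taken from the example itself: ten equally likely faces, where a 0 results in 90 copies of you and any other face in one. Weighting outcomes by number of observers gives about 0.91 for a 0. Charging log2(90) extra bits to pick out one copy, as a universal distribution would, cancels the copy count and leaves 0.1. Both helper names are hypothetical.

```python
import math

def observer_counting(priors, copies):
    """Weight each outcome by prior probability times the number of observers it creates."""
    weights = [p * n for p, n in zip(priors, copies)]
    total = sum(weights)
    return [w / total for w in weights]

def description_length_weighting(priors, copies):
    """Universal-distribution-style weighting: singling out one of n copies costs
    log2(n) extra bits, i.e. a factor of 2**-log2(n) = 1/n per copy."""
    weights = []
    for p, n in zip(priors, copies):
        per_copy = p * 2 ** -math.log2(n)
        weights.append(per_copy * n)  # add up the weight of all n copies
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical setup: ten equally likely faces; face 0 produces 90 copies.
priors = [0.1] * 10
copies = [90] + [1] * 9

print(round(observer_counting(priors, copies)[0], 3))             # 0.909
print(round(description_length_weighting(priors, copies)[0], 3))  # 0.1
```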
Yeah, if you train the algorithm by random sampling, the effect I described will take place. The same thing will happen if you use an RL algorithm to update the parameters instead of an unsupervised learning algorithm (though it seems willfully perverse to do so: you’re throwing away a lot of the structure of the problem by doing this, so training will be much slower).
I also just found an old comment which makes the exact same argument I made here. (Though it now seems to me that argument is not necessarily correct!)
If you literally ran (a powered-up version of) GPT-2 on “A brilliant solution to the AI alignment problem is...”, you would get the sort of thing an average internet user would think of as a brilliant solution to the AI alignment problem. Trying to do this more usefully basically leads to Paul’s agenda (which is about trying to do imitation learning of an implicit organization of humans).
Reflective Oracles are a bit of a weird case because their ‘loss’ is more like a 0/1 loss than a log loss, so all of the minima are exactly the same (if we take a sample of 100000 universes to score them, the difference is merely incredibly small instead of 0). I was being a bit glib referencing them in the article; I had in mind something more like a model parameterizing a distribution over outputs, whose only influence on the world is via a random sample from this distribution. I think that such models should in general have fixed points for similar reasons, but am not sure. Regardless, these models will, I believe, favour fixed points whose distributions are easy to compute (but not fixed points with low entropy; that is, they will punish logical uncertainty but not intrinsic uncertainty). I’m planning to run some experiments with VAEs and post the results later.
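To illustrate why a 0/1-style loss cannot distinguish between minima while a log loss can, here is a generic sketch (not a model of reflective oracles). Two hypothetical predictors both get every outcome "right" after thresholding at 0.5, so they tie at the minimum of the 0/1 loss, but the log loss still ranks them.

```python
import math

def zero_one_loss(probs, outcomes):
    """Fraction of cases where thresholding the prediction at 0.5 gets the outcome wrong."""
    return sum((p > 0.5) != bool(y) for p, y in zip(probs, outcomes)) / len(outcomes)

def log_loss(probs, outcomes):
    """Average negative log-probability assigned to the observed outcome."""
    return -sum(math.log(p if y else 1 - p) for p, y in zip(probs, outcomes)) / len(outcomes)

outcomes = [1, 1, 0]
confident = [0.9, 0.9, 0.1]  # hypothetical predictor A
hedged = [0.6, 0.6, 0.4]     # hypothetical predictor B

# Both predictors sit at the minimum of the 0/1 loss...
print(zero_one_loss(confident, outcomes), zero_one_loss(hedged, outcomes))  # 0.0 0.0
# ...but the log loss still tells them apart.
print(log_loss(confident, outcomes) < log_loss(hedged, outcomes))  # True
```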
You might be interested in Transformer Networks, which use a learned pattern of attention to route data between layers. They’re pretty popular and have been used in some impressive applications like this very convincing image-synthesis GAN.
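The core attention step can be sketched in a few lines. This is a minimal, dependency-free version of scaled dot-product attention: each query position takes a weighted average of the value vectors, and the weights are computed from the data itself rather than fixed in advance. It deliberately omits the learned query/key/value projection matrices, multiple heads, and everything else a full Transformer layer has; the toy inputs are made up.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: for each query q, weights = softmax(q . k / sqrt(d))
    over all keys k, and the output is the weighted sum of the corresponding values."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy self-attention over three positions with 2-dimensional vectors
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(x, x, x)
```

Because `w` depends on `x`, changing the input changes where information flows, which is the "learned routing" property in question.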
re: whether this is a good research direction. The fact that neural networks are highly compressible is very interesting and I too suspect that exploiting this fact could lead to more powerful models. However, if your goal is to increase the chance that AI has a positive impact, then it seems like the relevant thing is how quickly our understanding of how to align AI systems progresses, relative to our understanding of how to build powerful AI systems. As described, this idea sounds like it would be more useful for the latter.
Is there a reason you think a reflective oracle (or equivalent) can’t just be selected “arbitrarily”, and will likely be selected to maximize some score?
The gradient descent is not being done over the reflective oracles, it’s being done over some general computational model like a neural net. Any highly-performing solution will necessarily look like a fixed-point-finding computation of some kind, due to the self-referential nature of the predictions. Then, since this fixed-point-finder is *internal* to the model, it will be optimized for log loss just like everything else in the model.
That is, the global optimization of the model is distinct from whatever internal optimization the fixed-point-finder uses to choose the reflective oracle. The global optimization will favor internal optimizers that produce fixed-points with good score. So while fixed-point-finders in general won’t optimize for anything in particular, the one this model uses will.
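A minimal sketch of the two levels of optimization, under made-up assumptions: a binary event whose probability depends on the announced prediction p through a hypothetical response function with three fixed points (0.1, 0.5 and 0.7). The inner fixed-point finder (a grid scan plus bisection) has no preference among them; the outer selection keeps whichever has the lowest expected log loss. In a real model this selection would happen implicitly, through gradient descent on the parameters, not through an explicit `min`.

```python
import math

def world(p):
    """Hypothetical environment: probability the event occurs once the model
    has announced probability p. Its fixed points are p = 0.1, 0.5 and 0.7."""
    return p - 2 * (p - 0.1) * (p - 0.5) * (p - 0.7)

def find_fixed_points(f, n=997, tol=1e-12):
    """Internal fixed-point finder: scan a grid for sign changes of f(p) - p,
    then refine each bracket by bisection. It has no preference among roots."""
    g = lambda p: f(p) - p
    grid = [i / n for i in range(n + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        if g(a) == 0:
            roots.append(a)
        elif g(a) * g(b) < 0:
            lo, hi = a, b
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
    return roots

def expected_log_loss(p, f):
    """Log loss of announcing p, averaged over outcomes drawn with probability f(p)."""
    q = f(p)
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))

fixed_points = find_fixed_points(world)
# Outer optimization: among the internal choices, keep the one with the best score.
chosen = min(fixed_points, key=lambda p: expected_log_loss(p, world))
print([round(p, 6) for p in fixed_points])  # [0.1, 0.5, 0.7]
print(round(chosen, 6))  # 0.1
```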
I submit Predictors as Agents.
If we assume Sleeping Beauty has lots of information, we might expect that the shortest matching program will look like a simulation of physical law plus a “bridging law” that, given this simulation, tells you what symbols get written to the tape.
I agree. I still think that the probabilities would be closer to 1⁄2, 1⁄4, 1⁄4. The bridging law could look like this: search over the universe for compact encodings of my memories so far, then see what is written next onto this encoding. In this case, it would take no more bits to specify waking up on Tuesday, because the memories are identical, in the same format, and just slightly later temporally.
In a naturalized setting, it seems like the tricky part would be getting the AIXI on Monday to care what happens after it goes to sleep. It ‘knows’ that it’s going to lose consciousness (it can see that its current memory encoding is going to be overwritten), so its next prediction is undetermined by its world-model. There is one program that gives it the reward of its successor and then terminates, as I described above, but it’s not clear why the AIXI would favour that hypothesis. Maybe it would if it has been in situations involving memory-wiping before, or has observed other RO-AIXIs in such situations.
“I can’t make bets on my beliefs about the Eschaton, because they are about the Eschaton.” -- Well, it makes sense. Besides, I did offer you a bet taking into account a) that the money may be worth less in my branch, and b) that I don’t think DL + RL AGI is more likely than not, just plausible. If you’re more than 96% certain there will be no such AI, 20:1 odds are a good deal.
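The 96% figure can be checked with a quick expected-value calculation. This sketch assumes that taking 20:1 odds means risking 20 units to win 1, and it ignores the adjustment for the money possibly being worth less. The break-even credence is 20/21, about 95.2%, so any credence above 96% makes the bet positive in expectation.

```python
from fractions import Fraction

def break_even_probability(stake, payout):
    """Win probability at which risking `stake` to win `payout` has zero
    expected value: p * payout - (1 - p) * stake = 0."""
    return Fraction(stake, stake + payout)

def expected_value(p_win, stake, payout):
    return p_win * payout - (1 - p_win) * stake

# Taking 20:1 odds that there will be no such AI: risk 20 to win 1.
threshold = break_even_probability(20, 1)
print(threshold)  # 20/21
print(expected_value(Fraction(96, 100), 20, 1) > 0)  # True
```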
But anyways, I would be fine with betting on a nearer-term challenge. How about this: in 5 years, a bipedal robot that can run on rough terrain, as in this video, using a policy learned from scratch by DL + RL (possibly including a simulated environment during training), at 1:1 odds.
Hmmm... but if I win the bet, then the world may be destroyed, or our environment could change so much that the money will become worthless. Would you take 20:1 odds that there won’t be DL+RL-based HLAI in 25 years?
I still don’t see how you’re getting those probabilities. Say it takes 1 bit to describe the outcome of the coin toss, and assume it’s easy to find all the copies of yourself (i.e. your memories) in different worlds. Then you need:
1 bit to specify if the coin landed heads or tails.
If the coin landed tails, you need 1 more bit to specify if it’s Monday or Tuesday.
So AIXI would give these scenarios P(HM)=0.50, P(TM)=0.25, P(TT)=0.25.
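The bit-counting above can be checked mechanically. In this sketch each waking is weighted by 2^-bits and then normalized. It assumes, as above, that the cost of the simulation and of finding your memories is the same for all three scenarios, so it cancels out in the normalization and is left out.

```python
from fractions import Fraction

# Extra bits needed to single out each waking, following the accounting above:
# 1 bit for the coin, plus 1 more bit for the day if the coin landed tails.
description_bits = {
    "HM": 1,  # heads, Monday
    "TM": 2,  # tails, Monday
    "TT": 2,  # tails, Tuesday
}

# Solomonoff-style weighting: each scenario gets weight 2**-bits.
weights = {s: Fraction(1, 2 ** bits) for s, bits in description_bits.items()}
total = sum(weights.values())
probabilities = {s: w / total for s, w in weights.items()}
print(probabilities)  # {'HM': Fraction(1, 2), 'TM': Fraction(1, 4), 'TT': Fraction(1, 4)}
```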
Have something in mind?
Well, it COULD be the case that the K-complexity of the memory-erased AIXI environment is lower, even when it learns that this happened. The reason for this is that there could be many possible past AIXIs who have had their memory erased/altered and end up in the same subjective situation. Then the memory-erasure hypothesis can use the lowest-K-complexity AIXI that ends up with these memories. As the AIXI learns more, it can gradually piece together which of the potential past AIXIs it actually was, and the K-complexity will go back up again.
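A small numerical sketch of this effect, with made-up complexities: if several programs (one per possible past AIXI) all reproduce the current memories, their prior weights 2^-K add up. The hypothesis as a whole then costs at most as many bits as the simplest candidate. Once later observations rule that candidate out, the effective cost rises again. The candidate values and the helper name are hypothetical.

```python
import math

def effective_complexity(candidate_bits):
    """Combine several programs that all reproduce the current memories.
    Their prior weights 2**-K add up, so the hypothesis as a whole costs
    -log2(sum(2**-K)) bits, which is never more than the shortest candidate."""
    total_weight = sum(2.0 ** -k for k in candidate_bits)
    return -math.log2(total_weight)

# Hypothetical K-complexities of past AIXIs that end up with identical memories
candidates = [10, 12, 12]
print(round(effective_complexity(candidates), 3))  # 9.415
# Later observations rule out the simplest candidate, so the cost goes back up
print(round(effective_complexity([12, 12]), 3))  # 11.0
```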
EDIT: Oh, I see you were talking about actually having a RANDOM memory in the sense of a random sequence of 1s and 0s. Yeah, but this is no different than AIXI thinking that any random process is high K-complexity. In general, and discounting merging, the memory-altering subroutine will increase the complexity of the environment by a constant plus the complexity of whatever transformation you want to apply to the memories.
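Written as an inequality, the last claim is roughly the following. Here $\mu$ is the environment without memory alteration, $T$ is the transformation applied to the memories, $\mu_T$ is the altered environment, and $c$ is a constant covering the subroutine that applies $T$ (this notation is introduced here for illustration):

```latex
K(\mu_T) \le K(\mu) + K(T) + c
```

If the new memory is a random n-bit string, $K(T)$ is about $n$, since a random string has no description much shorter than itself. So the hypothesis pays roughly $n$ extra bits, just as for any other random process.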
Well, the DotA bot pretty much just used PPO. AlphaZero used MCTS + RL, and OpenAI recently got a robot hand to do object manipulation with PPO and a simulator (the simulator was hand-built, but in principle it could be produced by unsupervised learning like in this). Clearly it’s possible to get sophisticated behaviors out of pretty simple RL algorithms. It could be the case that these approaches will “run out of steam” before getting to HLAI, but it’s hard to tell at the moment, because our algorithms aren’t running with the same amount of compute + data as humans (for humans, I am thinking of our entire lifetime experiences as data, which is used to build a cross-domain optimizer).
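As an illustration of how simple the core of PPO is, here is a sketch of its clipped surrogate objective for a batch of sampled actions. It covers only this one term. A full implementation also needs advantage estimation, a value-function loss, usually an entropy bonus, and the optimization loop. The `epsilon=0.2` default is a commonly used value, not something taken from the systems mentioned above.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """PPO's clipped surrogate objective (to be maximized).

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = estimated advantage of action a_i in state s_i
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1 - epsilon), 1 + epsilon)
        # Take the more pessimistic of the unclipped and clipped terms,
        # so the policy gains nothing from moving the ratio far from 1.
        total += min(r * a, clipped * a)
    return total / len(ratios)

print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))  # roughly 0.2 (both terms get clipped)
```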
re: Uber, I agree that at least in the short term most applications in the real world will feature a fair amount of engineering by hand. But the need for this could decrease as more power becomes available, as has been the case in supervised learning.
How do the initial simple conditions relate to the branching? Our universe seems to have had simple initial conditions, but there’s still been a lot of random branching, right? That is, the universe from our perspective is just one branch of a quantum state evolving simply from simple conditions, so you need O(#branching events) bits to describe it. Incidentally, this undermines Eliezer’s argument for MWI based on Solomonoff induction, though MWI is probably still true.
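The O(#branching events) count can be seen in a toy model. This sketch makes the simplifying assumption that every branching event is a binary split into equally weighted branches (real branch weights are unequal). After n events there are 2^n histories, so picking out ours takes log2(2^n) = n bits.

```python
import itertools
import math

def branches(n_events):
    """All histories after n binary branching events, each a tuple of 0/1 outcomes."""
    return list(itertools.product((0, 1), repeat=n_events))

def bits_to_pick_one_branch(n_events):
    """Bits needed to index one history among equally weighted branches."""
    return math.log2(len(branches(n_events)))

for n in (1, 5, 10):
    print(n, bits_to_pick_one_branch(n))  # grows linearly: n bits for n events
```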
[EDITED: Oh, from one of your other comments I see that you aren’t saying the shortest program involves beginning at the start of the universe. That makes sense]
I agree that you do need some sort of causal structure around the function-fitting deep net. The question is how complex this structure needs to be before we can get to HLAI. It seems plausible to me (at least a 10% chance, say) that it could be quite simple, maybe just consisting of modestly more sophisticated versions of the RL algorithms we have so far, combined with really big deep networks.
Incidentally, you can use the same idea to have RO-AIXI do anthropic reasoning/bargaining about observers that are in a broader reference class than ‘exact same sense data’, by making the mapping O → O’ some sort of coarse-graining.
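A toy sketch of the coarse-graining step, with a hypothetical observation format and map: an observation is (room, time of day, exact pixel hash), and the coarse-grained O’ forgets the pixel hash. Observers A and B would fall into different reference classes under exact matching of sense data, but land in the same class after coarse-graining. This only illustrates the grouping, not RO-AIXI itself.

```python
from collections import defaultdict

def coarse_grain(observation):
    """Hypothetical map O -> O': keep only the coarse features of an observation.
    Here an observation is (room, time_of_day, exact_pixel_hash) and the
    coarse-grained version forgets the pixel hash."""
    room, time_of_day, _pixels = observation
    return (room, time_of_day)

def reference_classes(observers):
    """Group observers whose coarse-grained observations coincide."""
    classes = defaultdict(list)
    for name, obs in observers.items():
        classes[coarse_grain(obs)].append(name)
    return dict(classes)

observers = {
    "A": ("lab", "morning", 0x1F3A),
    "B": ("lab", "morning", 0x7C21),  # different exact sense data from A
    "C": ("field", "night", 0x1F3A),
}
print(reference_classes(observers))  # {('lab', 'morning'): ['A', 'B'], ('field', 'night'): ['C']}
```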