Consider a Solomonoff inductor predicting the next bit in the sequence {0, 0, 0, 0, 0...}. At most places, it will be very certain the next bit is 0. But, at some places it will be less certain: every time the index of the place is highly compressible. Gradually it will converge to being sure the entire sequence is all 0s. But, the convergence will be very slow: about as slow as the inverse Busy Beaver function!
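Here is a minimal toy sketch of the effect (not actual Solomonoff induction: it uses a crude, hand-picked stand-in for description length and a small hypothesis class instead of the universal prior). The mixture contains "all zeros" plus hypotheses "all zeros except a single 1 at index n", each weighted by 2^(-description length of n); the predicted probability of a 1 spikes exactly at indices that are cheap to describe.

```python
# Toy illustration (not real Solomonoff induction): a simplicity-weighted
# mixture over the hypothesis "all zeros" and hypotheses of the form
# "all zeros except a single 1 at index n". Indices that are cheap to
# describe (here: powers of 2 and 10, as a crude stand-in for
# compressibility) get more prior mass, so confidence in "next bit is 0"
# dips exactly at those indices.
import math

def description_length(n: int) -> float:
    """Crude stand-in for the Kolmogorov complexity of the index n."""
    if n == 0:
        return 1.0
    if n & (n - 1) == 0:                     # n is a power of 2
        return 2 + math.log2(math.log2(n) + 1)
    if n == 10 ** round(math.log10(n)):      # n is a power of 10
        return 2 + math.log2(math.log10(n) + 1)
    return 2 + math.log2(n + 1)              # generic index: ~log2(n) bits

def prob_next_bit_is_one(t: int, horizon: int = 10_000) -> float:
    """P(bit t = 1 | bits 0..t-1 were all 0) under the toy mixture."""
    w_all_zeros = 1.0
    # Hypotheses "1 at index n" with n < t were refuted by the observed zeros;
    # only n >= t survive (the infinite tail is truncated at `horizon`).
    w_one_now = 2.0 ** -description_length(t)
    w_one_later = sum(2.0 ** -description_length(n)
                      for n in range(t + 1, t + horizon))
    return w_one_now / (w_all_zeros + w_one_now + w_one_later)

for t in [999, 1000, 1001, 1023, 1024, 1025]:
    print(t, prob_next_bit_is_one(t))   # P(1) spikes at 1000 and 1024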
This is not just a quirk of Solomonoff induction, but a general consequence of reasoning using Occam’s razor (which is the only reasonable way to reason). Of course with bounded algorithms the convergence will be faster, something like the inverse bounded-busy-beaver, but still very slow. Any learning algorithm with inductive bias towards simplicity will have generalization failures when coming across the faultlines that carve reality at the joints, at every new level of the domain hierarchy.
This has an important consequence for alignment: in order to stand a chance, any alignment protocol must be fully online, meaning that whatever data sources it uses, those data sources must always stay in the loop, so that the algorithm can query the data source whenever it encounters a faultline. Theoretically, the data source can be disconnected from the loop at the point when it’s fully “uploaded”: the algorithm unambiguously converged towards a detailed accurate model of the data source. But in practice the convergence there will be very slow, and it’s very hard to know that it already occurred: maybe the model seems good for now but will fail at the next faultline. Moreover, convergence might literally never occur if the machine just doesn’t have the computational resources to contain such an upload (which doesn’t mean it doesn’t have the computational resources to be transformative!)[1]
This is also a reason for pessimism regarding AI outcomes. AI scientists working through trial and error will see the generalization failures becoming more and more rare, with longer and longer stretches of stable function in between. This creates the appearance of increasing robustness. But, in reality robustness increases very slowly. We might reach a stable stretch between “subhuman” and “far superhuman” and the next faultline will be the end.
In the Solomonoff analogy, we can imagine the real data source as a short but prohibitively expensive program, and the learned model of the data source as an affordable but infinitely long program: as time progresses, more and more bits of this program will be learned, but there will always be bits that are still unknown. Of course, any prohibitively expensive program can be made affordable by running it much slower than real-time, which is something that Turing RL can exploit, but at some point this becomes impractical.
An alignment-unrelated question: Can we, humans, increase the probability that something weird happens in our spacetime region (e.g., the usual laws of physics stop working) by making it possible to compress our spacetime location? E.g., by building a structure that is very regular (meaning that its description can be very short) and has never been built before in our space region, something like a huge, perfectly aligned rectangular grid of hydrogen atoms.
It’s like a magical ritual for changing the laws of physics. This gives a new meaning to summoning circles, pentagrams, etc.
We can rephrase your question as follows: “Can we increase the probability of finding an error in the known laws of physics by performing an experiment with a simple property that never happened before, either naturally or artificially?” And the answer is: yes! This is actually what experimental physicists do all the time: perform experiments that try to probe novel circumstances where it is plausible (Occam-razor-wise) that new physics will be discovered.
As to magical rituals, sufficiently advanced technology is indistinguishable from magic :)
I have a sense that similar principles are at play with Spaced Repetition, and that pointing out that connection may be relevant to effectively handling this issue.
convergence might literally never occur if the machine just doesn’t have the computational resources to contain such an upload
I think that in embedded settings (with a bounded version of Solomonoff induction) convergence may never occur, even in the limit as the amount of compute that is used for executing the agent goes to infinity. Suppose the observation history contains sensory data that reveals the probability distribution that the agent had, in the last time step, for the next number it’s going to see in the target sequence. Now consider the program that says: “if the last number was predicted by the agent to be 0 with probability larger than $1-2^{-10^{10}}$ then the next number is 1; otherwise it is 0.” Since it takes much less than $10^{10}$ bits to write that program, the agent will never predict two times in a row that the next number is 0 with probability larger than $1-2^{-10^{10}}$ (after observing only 0s so far).
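A toy simulation of this construction may help (it uses a made-up threshold of 1 − 2^(−20) instead of $1-2^{-10^{10}}$ so the numbers fit in a float, and a naive stand-in agent; the actual argument applies to any bounded inductor whose own predictions are visible in the observation history).

```python
# Toy simulation of the construction above: the environment can read the
# agent's previous prediction (it is part of the observation history) and
# outputs 1 exactly when the agent assigned probability > THRESHOLD to the
# previous bit being 0.

THRESHOLD = 1 - 2.0 ** -20     # made-up stand-in for 1 - 2^(-10^10)

def adversarial_environment(prev_confidence_in_zero: float) -> int:
    """The short program from the comment: punish high confidence in 0."""
    return 1 if prev_confidence_in_zero > THRESHOLD else 0

def naive_agent(history: list[int]) -> float:
    """Stand-in agent: grows more confident in 0 the longer the current run of 0s."""
    run = 0
    for bit in reversed(history):
        if bit != 0:
            break
        run += 1
    return 1 - 2.0 ** -(run + 1)

history: list[int] = []
confidence = 0.5               # the agent's current P(next bit = 0)
for step in range(60):
    bit = adversarial_environment(confidence)
    history.append(bit)
    confidence = naive_agent(history)

# Every time the agent's confidence in 0 crosses the threshold, the
# environment outputs a 1, so the agent is never right while being that
# confident two steps in a row.
print(history)
```

The naive agent here keeps crossing the threshold and getting punished; a bounded Solomonoff-style agent would instead assign non-negligible weight to this short program and therefore never become that confident twice in a row, which is the non-convergence described above.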