The point was that we may be able to train an agent to do what we want, even in cases where we can’t effectively build a predictor.
Re: your example. You can use amplification to get exponentially close to certainty: choose instances that are each satisfiable with probability 2/3, and then consider the problem “solve at least half of these 1000 instances”. If you really want every instance to have a solution, then you can probably generate the instances pseudorandomly from a small enough seed and apply a union bound.
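As a rough numeric check of the amplification step, here is a minimal Python sketch. It assumes the 1000 instances are independent and that each is satisfiable with probability exactly 2/3; the function names are made up for illustration. It computes the exact probability that fewer than half of the instances are satisfiable (the case where “solve at least half” has no solution) and compares it with the Hoeffding bound exp(-2n(p - 1/2)^2).

```python
from fractions import Fraction
from math import comb, exp


def failure_probability(n: int, p: Fraction, threshold: int) -> Fraction:
    """Exact P[fewer than `threshold` of n independent instances are satisfiable].

    Assumption: each instance is satisfiable independently with probability p.
    """
    q = 1 - p
    return sum(comb(n, k) * p**k * q ** (n - k) for k in range(threshold))


def hoeffding_bound(n: int, p: Fraction, threshold: int) -> float:
    """Hoeffding upper bound on the same lower-tail probability.

    Valid when threshold / n is below p, so that the gap is positive.
    """
    gap = float(p) - threshold / n
    return exp(-2 * n * gap**2)


if __name__ == "__main__":
    n, p, threshold = 1000, Fraction(2, 3), 500
    print("exact failure probability:", float(failure_probability(n, p, threshold)))
    print("Hoeffding upper bound:    ", hoeffding_bound(n, p, threshold))
```

Both numbers shrink exponentially in the number of instances, which is the sense in which the amplified problem is solvable with near certainty.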
By “predictor” I don’t mean something that produces exact predictions; I mean something that produces probabilistic predictions of given quantities. Maybe we should call it an “inductor” to avoid conflation with optimal predictors (even though the concepts are closely related). As I said before, I think that an agent has to have a reasonable model of humans in order to follow human values. Moreover, an agent that doesn’t have a reasonable model of humans is probably much less dangerous, since it won’t be able to manipulate humans (although I guess the risk is still non-negligible).
The question is what kinds of inductors are complexity-theoretically feasible and which classes of models these inductors correspond to. Bounded Solomonoff induction using Λ works on the class of samplable models. In machine learning language, inductors using samplable models are feasible since it is possible to train the inductors by sampling random such models (i.e. by sampling the bounded Solomonoff ensemble). On the other hand, it’s not clear what broader classes of models are admissible, if any.
That said, it seems plausible that if it’s feasible to construct inductors for a broader class, my procedure will remain efficient.
Model 1: The “superinductor” works by finding an efficient transformation q of the input sequence x and a good samplable model for q(x). E.g. q(x) contains the results of substituting the observed SAT-solutions into the observed SAT-instances. In this model, we can apply my procedure by running utility inference on q(x) instead of x.
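One way to picture such a q, as a minimal sketch: assume the relevant part of x arrives as pairs of a CNF instance and a candidate assignment, with clauses written as lists of signed integers in the DIMACS style. This encoding and the names `satisfies` and `q` are illustrative assumptions, not something fixed by the discussion above.

```python
from typing import Dict, List, Sequence, Tuple

Clause = List[int]           # e.g. [1, -2] means (x1 OR NOT x2)
CNF = List[Clause]           # conjunction of clauses
Assignment = Dict[int, bool]  # variable index -> truth value


def satisfies(cnf: CNF, assignment: Assignment) -> bool:
    """Substitute the assignment into the formula and evaluate it.

    Variables missing from the assignment are treated as False (an assumption).
    """
    return all(
        any(assignment.get(abs(lit), False) == (lit > 0) for lit in clause)
        for clause in cnf
    )


def q(x: Sequence[Tuple[CNF, Assignment]]) -> List[bool]:
    """Hypothetical transformation: replace each (instance, solution) pair
    by the result of substituting the solution into the instance."""
    return [satisfies(cnf, assignment) for cnf, assignment in x]
```

In this toy encoding, if the observed solutions are all correct then q(x) is a constant sequence of True values, which is easy to model by sampling even when producing the solutions themselves is not.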
Model 2: The superinductor works via an optimal predictor for SAT*. I think it should be relatively straightforward to show that, given an optimal predictor for SAT and assuming the ability to design AGIs for target utility functions relative to a SAT oracle, there is an optimal predictor for the total utility function that my utility inference procedure defines (after including external agents that run over a SAT oracle). Therefore it is possible to maximize the latter.
*A (poly,log)-optimal predictor for SAT cannot exist unless all sparse problems in NP have efficient heuristic algorithms in some sense, which is unlikely. On the other hand, I know of no reason why a (poly,0)-optimal predictor for SAT cannot exist.