I don’t understand Thing #1. Perhaps, in the passage you quote from my post, the phrase “decision procedure” sounds misleadingly generic, as if I have some single function I use to make all my decisions (big and small) and we are talking about modifications to that function.
(I don’t think that is really possible: if the function is sophisticated enough to actually work in general, it must have a lot of internal sub-structure, and the smaller things it does inside itself could be treated as “decisions” that aren’t being made using the whole function, which contradicts the original premise.)
Instead, I’m just talking about the ordinary sort of case where you shift some resources away from doing X to thinking about better ways to do X, where X isn’t the whole of everything you do.
Re: Q/A/A1, I guess I agree that these things are (as best I can tell) inevitably pragmatic. And that, as EY says in the post you link, “I’m managing the recursion to the best of my ability” can mean something better than just “I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary.” But then this seems to threaten the Embedded Agency programme, because it would mean we can’t make theoretically grounded assessments or comparisons involving agents as strong as ourselves or stronger.
(The discussion of self-justification in this post was originally motivated by the topic of external assessment, on the premise that if we are powerful enough to assess a proposed AGI in a given way, it must also be powerful enough to assess itself in that way. And contrapositively, if the AGI can’t assess itself in a given way then we can’t assess it in that way either.)
(I don’t think that is really possible: if the function is sophisticated enough to actually work in general, it must have a lot of internal sub-structure, and the smaller things it does inside itself could be treated as “decisions” that aren’t being made using the whole function, which contradicts the original premise.)
Even if the decision function has a lot of sub-structure, I think that in the context of AGI:
(less important point) It is unlikely that we will be able to directly separate sub-structures of the function from the whole function. This is because I’m assuming the function uses some heuristic approximating logical induction to think about itself, and this has extremely broad uses across basically every aspect of the function.
(more important point) It doesn’t matter whether it’s a sub-structure or not. The point is that some part of the decision function is already capable of reasoning about improving either itself or other aspects of the decision function. So whatever method it uses to anticipate whether it should try self-improvement is already baked in, in some way.
Re: Q/A/A1, I guess I agree that these things are (as best I can tell) inevitably pragmatic. And that, as EY says in the post you link, “I’m managing the recursion to the best of my ability” can mean something better than just “I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary.” But then this seems to threaten the Embedded Agency programme, because it would mean we can’t make theoretically grounded assessments or comparisons involving agents as strong as ourselves or stronger.
So “I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary” is not exactly true because, in all relevant scenarios, we’re the ones who build the AI. It’s more like “I work on exactly N levels, and my decisions at level N+1 were deemed irrelevant by the selection pressures that created me, which granted me a decision function that treats further levels as irrelevant.”
If we’re okay with leveraging normative or empirical assumptions about the world, we should be able to assess AGI (or have the AGI assess itself) with methods that we’re comfortable with.
In some sense, we have practical examples of what this looks like. N, the level of meta, can be viewed as a hyperparameter of our learning system. However, in data science, hyperparameters perform differently on different problems, so people often use Bayesian optimization to iteratively pick the best hyperparameters. But, you might say, our Bayesian hyperparameter-optimization process requires its own priors—it too has hyperparameters!
But no one really bothers to optimize these, for a couple of reasons:
#1. As we increase the level of meta in a particular optimization process, we tend to see diminishing returns in improved model performance.
#2. Meta-optimization is prohibitively expensive: each N-level meta-optimizer generally needs to consider multiple candidate (N-1)-level optimizers in order to pick the best one. Inductively, this means your N-level meta-optimizer’s computational cost is around x^N, where x is the number of (N-1)-level optimizers each level needs to consider.
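A quick toy calculation makes #2 concrete. (The cost model and numbers here are illustrative assumptions, not anything from the thread: assume each optimizer at level N evaluates x candidate optimizers at level N-1, and a base level-0 evaluation costs 1 unit.)

```python
def meta_optimization_cost(n_levels: int, candidates_per_level: int) -> int:
    """Total level-0 evaluations performed by an n_levels-deep meta-optimizer,
    assuming each level evaluates candidates_per_level optimizers below it."""
    cost = 1
    for _ in range(n_levels):
        cost *= candidates_per_level
    return cost

# With x = 10 candidates per level, cost grows as 10^N:
for n in range(4):
    print(n, meta_optimization_cost(n, 10))
```

Even modest branching factors make a few extra levels of meta unaffordable, which is the practical reason the recursion gets cut off.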
But #1 can’t actually be proved. It’s just an assumption we think is true because we have a strong observational prior for it being true. Maybe we should question how human brains generate their priors but, at the end of the day, the way we do this questioning is still determined by our hard-coded algorithms for dealing with probability.
The upshot is that, when we look at problems similar to the one we face with embedded agency, we still use the Eliezerian approach. We just happen to be very confident in our boundary, for reasons that cannot be rigorously justified.
I don’t understand your argument for why #1 is impossible. Consider a universe that will undergo heat death in a billion steps. Consider the agent that implements “take an action if PA + <steps remaining> can prove that it is good,” using some provability-checker algorithm that itself takes some steps to run. If there is some faster provability-checker algorithm, it’s provable that the agent will do better using that one, so it switches when it finds that proof.
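The agent described above can be sketched as a toy simulation. Everything here is a stand-in: real PA + n proof search is replaced by a lookup table of “proofs” supplied as input, and checker costs are arbitrary numbers, since no actual theorem prover appears in the comment.

```python
def run_agent(horizon, checker_cost, proofs):
    """Simulate the agent until heat death at `horizon` steps.

    `proofs` maps (kind, steps_remaining) -> payload, standing in for what
    PA + <steps remaining> proof search would find at that moment:
      ("good", n)   -> an action proven good with n steps remaining
      ("faster", n) -> a cheaper checker cost, proven better, found at n steps
    """
    actions_taken = []
    steps = horizon
    while steps > 0:
        # Running the provability checker consumes its cost in steps.
        steps -= checker_cost
        if steps <= 0:
            break
        if ("faster", steps) in proofs:
            # A proof that a faster checker does better: switch to it.
            checker_cost = proofs[("faster", steps)]
        elif ("good", steps) in proofs:
            actions_taken.append(proofs[("good", steps)])
    return actions_taken
```

For example, an agent with 10 steps and a cost-3 checker that finds a proof of a cost-1 checker at 7 steps remaining then gets to act twice more before running out of time: `run_agent(10, 3, {("faster", 7): 1, ("good", 6): "a", ("good", 5): "b"})` returns `["a", "b"]`. The self-modification is just an ordinary step inside the loop, which is the point of the example.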