It’s built out of an optimizer, so why would that tame inner optimizers? Perhaps it makes them explicit, because now the whole thing is a loss function; but the iterative inference can’t be shut off while still getting functionally equivalent behavior.
That’s just part of the definition of “works out of distribution”. Scenarios where inner optimizers become AGI or something are out of distribution relative to training.