tailcalled comments on Mesa-optimization for goals defined only within a training environment is dangerous

tailcalled 17 Aug 2022 7:53 UTC
2 points
0
Could you be more specific about the AI architecture and training system you have in mind? Because I don’t follow the “Instead, its interim goals will be to restart the training environment” part.
- Rubi J. Hudson 18 Aug 2022 3:30 UTC
  1 point
  0
  I was thinking of RL systems, for the case where an agent learns the correct outcome to optimize for but in the wrong environment. The same issue applies to mesa-optimizers within any neural net, though.
  As for why it tries to restart the training environment: its goal is only defined within that environment, so it needs a similar environment in order to meet that goal. If the unclear part is what a training environment means for something like a neural net trained with supervised learning, the analogy would be that the AI can somehow distinguish training data (or a subset of it) from deployment data, and wants to produce its outputs from inputs that have the qualities of the training data.
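
The argument in the reply above can be illustrated with a toy sketch. Everything here is hypothetical and not from the thread: the `State` class, the `goal_value` function, and the two action names are assumptions made up for illustration, and the "agent" is just a one-step greedy chooser rather than a trained system. The point is only that if the learned goal has value solely inside the training environment, then outside it the action that recreates that environment scores highest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    # Hypothetical flag: can the agent tell it is in the training environment?
    in_training_env: bool


def goal_value(state: State) -> float:
    """Assumed mesa-objective: it only has value inside the training environment."""
    return 1.0 if state.in_training_env else 0.0


# Hypothetical actions, each mapping a state to the state that follows it.
ACTIONS = {
    # Carrying out the assigned task leaves the agent where it already is.
    "do_assigned_task": lambda s: State(in_training_env=s.in_training_env),
    # Restarting the training environment puts the agent back inside it.
    "restart_training_env": lambda s: State(in_training_env=True),
}


def choose_action(state: State) -> str:
    """Greedy one-step choice: pick the action whose next state scores highest.

    On ties, max() returns the first action in ACTIONS.
    """
    return max(ACTIONS, key=lambda name: goal_value(ACTIONS[name](state)))


if __name__ == "__main__":
    print("training:  ", choose_action(State(in_training_env=True)))
    print("deployment:", choose_action(State(in_training_env=False)))
```

Inside the training environment both actions score the same, so the chooser simply does its task. In deployment the task is worth nothing under this objective, which makes restarting the training environment the preferred interim goal.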