This seems related to a thought I had while reading An overview of 11 proposals for building safe advanced AI: how much harder is it to find an environment that promotes aligned AGI than one that promotes AGI at all?
It seems that a lot of the proposals for AGI under the current ML paradigm either use oversight to get a second chance, or add an extra term to the loss function to promote alignment. How well either type of method works seems to depend on the base rate of aligned AGI relative to any AGI that can emerge from a particular model and training environment. I’m thinking of it as roughly

$$\frac{P(\text{aligned AGI} \mid M, E_{\text{base}})}{P(\text{AGI} \mid M, E_{\text{base}})}$$

where $M$ is some model and $E_{\text{base}}$ is the training environment without the safeguards $S$ that detect deceptive or otherwise catastrophic behavior.
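To make the dependence on this base rate concrete, here is a toy sketch, under my own assumptions rather than anything from the post: treat the safeguards $S$ as a filter that catches a misaligned AGI before deployment with some probability $d$, giving the training process its "second chance". The function and its parameters are hypothetical, purely for illustration.

```python
def aligned_rate_with_safeguards(p_aligned: float, p_agi: float, d: float) -> float:
    """Probability that a *deployed* AGI is aligned, in a toy model where
    safeguards act as a pre-deployment filter.

    p_aligned: P(aligned AGI | M, E_base)
    p_agi:     P(any AGI | M, E_base)
    d:         probability the safeguards S catch a misaligned AGI
    """
    p_misaligned = p_agi - p_aligned
    # A misaligned AGI slips past the safeguards with probability (1 - d).
    return p_aligned / (p_aligned + (1 - d) * p_misaligned)

# With a base rate of 1 aligned AGI per 10 AGIs, even 90%-reliable
# safeguards only raise the deployed-aligned rate to about 0.53:
print(aligned_rate_with_safeguards(p_aligned=0.01, p_agi=0.10, d=0.9))  # ~0.526
```

The point of the toy numbers: when the base rate is low, even quite reliable oversight leaves a substantial chance that the AGI which makes it through is misaligned, which is why the ratio above looks like the quantity that matters.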
This post seems to concern the question:

How much does the environment, compared to the model, influence the emergence of AGI?
What I’m trying to get at is that I think a related and important question is:

How much does the alignment of an emerging AGI depend on its environment, compared to the model?