Hello Zvi,
I don’t agree with you on every point, but I find your writing to be extremely high-quality and informative. Keep up the great work.
AutoGPT is an excellent demonstration of the point. Ask someone on this forum five years ago whether AGI might turn out to be a series of next-token predictors strung together, with modular cognition occurring in English, and they would have called you insane.
Yet if that is how we get something close to AGI, it seems like a best-case scenario, since interpretability is solved by default and you can measure alignment progress very easily.
Reality is weird in very unexpected ways.
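For concreteness, here is a minimal sketch of that kind of scaffolding (purely illustrative; `call_llm` is a hypothetical stand-in for a text-completion API, not any particular library): every intermediate reasoning step is ordinary English that can be logged and read directly.

```python
# Minimal sketch of an AutoGPT-style loop: a next-token predictor is called
# repeatedly, and all intermediate "cognition" is plain English text that a
# human can inspect step by step.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any text-completion API."""
    raise NotImplementedError("plug in a real model call here")

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    transcript: list[str] = []   # every reasoning step, in natural language
    context = f"Goal: {goal}\n"
    for _ in range(max_steps):
        thought = call_llm(context + "What should be done next, and why?")
        transcript.append(thought)          # auditable as-is, no probing of weights needed
        context += f"Previous step: {thought}\n"
        if "GOAL COMPLETE" in thought:      # toy stopping criterion
            break
    return transcript
```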
Your model assumes a lot about the nature of AGI. Sure, if you jump directly to “we’ve created coherent, agential, strategic strong AGI, what happens now?” you end up with a lot of default failure modes. But the cruxes of disagreement are what AGI actually looks like in practice and what the circumstances around its creation are:
Is it agential? Does it have strategic planning capabilities that it tries to act on in the real world? Current systems don’t look like this.
Is it coherent? Even if it has the capability to plan strategically, is it able to coherently pursue those goals over time? Current systems don’t even have a concept of time, and there is some reason to believe that coherence and intelligence may be inversely correlated.
Do we get successive chances to work on aligning a system? If “AGI” were derived from scaling LLMs and adding cognitive scaffolding, doesn’t it seem highly likely that it would be both interpretable and steerable, given its use of natural language and our ability to iterate on failures?
Is “kindness” truly completely orthogonal to intelligence? If there is even a slight positive correlation, the future could look very different. Paul Christiano made an argument about this in a recent thread.
I think part of the challenge is that AGI is a very nebulous term, and presupposing an agential, strategic, coherent AGI involves assuming a lot of steps in between. Many of the disagreements turn on what the properties of the AGI are rather than on specific claims about the likelihood of successful alignment. And there seems to be a lot of uncertainty about how this technology actually ends up developing that isn’t accounted for in many of the standard AI x-risk models.
“AI regulation will make the problem worse” seems like a very strong statement that is unsupported by your argument. Even in your scenario where large training runs are licensed, this will increase the cost of training runs and generally slow things down, particularly if it prevents smaller AI companies from pushing the frontier of research.
To take your example of GDPR, the draft version of the EU’s AI Act seems so convoluted that it will cost companies a lot of money to comply with and will make investing in small AI startups riskier. Even though the law is aimed at issues like data privacy and bias, the costs of compliance will likely result in slower development (and, based on the current version, fewer open-source models), since resources will need to be diverted away from capabilities work into compliance and audits.