Thoughts on the Feasibility of Prosaic AGI Alignment?
I’d like to preface this by saying that I am not an expert on AI by any means, nor am I even remotely involved in any research or studies relevant to ML. I have no insight into the technical or mathematical aspects of discussions about this technology, and I deal only in abstractions.
If you’re still reading this:
Let’s assume two things: (A) that the scaling hypothesis will continue to gather real-world empirical support as a plausible route to AGI (as it has with GPT), and (B) that bigger, better-funded institutions (such as DeepMind, Google Brain, and Microsoft AI) will shift their focus from building an AGI that comes out of, or reveals, some new insight about intelligence to adopting OpenAI’s strategy of simply throwing more compute and hardware at the problem (something they actually have the resources to do on an uncomfortably short timeframe).
Whatever you believe (https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang?commentId=jbD8siv7GMWxRro43) to be the actual likelihood of (B), please just humor me for the sake of discussion.
If you consider both assumptions (A) and (B) to be true with high probability, then you’re ultimately conceding that a prosaic AGI is the kind we’re most likely to build. This is discounting the unfortunately less-likely (imo) possibility that another, fundamentally different approach will succeed first.
I say “unfortunately” because, by my understanding, most approaches to AGI alignment (MIRI’s, for example) aren’t relevant to the alignment of a prosaic AGI.
That’s not to say that there aren’t approaches to this issue, because there are (https://www.lesswrong.com/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai). The problem is that these proposals have caveats that lead an institution I hold in very high regard (MIRI) to consider them almost certainly impossible. [(https://www.lesswrong.com/posts/Djs38EWYZG8o7JMWY/paul-s-research-agenda-faq), (https://www.lesswrong.com/posts/S7csET9CgBtpi7sCh/challenges-to-christiano-s-capability-amplification-proposal)]
Regardless, there is still some debate over whether Yudkowsky’s objections to the current proposals really amount to a knock-down argument for their impossibility. (https://www.lesswrong.com/posts/3nDR23ksSQJ98WNDm/developmental-stages-of-gpts)
So, I ask: what are your personal takes? Is prosaic alignment almost certainly impossible, or do your own intuitions or evidence leave a non-negligible amount of hope?