Maybe the real issue is we don’t know what AGI will be like, so we can’t do science on it yet. Like pre-LLM alignment research, we’re pretty clueless.
Yes, this is part of the issue. It’s something I’ve personally said in various places in the past.
I think we’re basically in a position of, “hopefully AIs in the current paradigm continue to be safe with our techniques, allow us to train the ‘true’ AGI safely, and don’t produce sloppy output despite intending to be helpful.”
In case this is helpful to anyone, here are resources that have informed my thinking:
[1] https://www.lesswrong.com/posts/i7JSL5awGFcSRhyGF/shortform-2?commentId=adS78sYv5wzumQPWe
[2] https://www.lesswrong.com/posts/GfZfDHZHCuYwrHGCd/without-fundamental-advances-misalignment-and-catastrophe
[3] https://www.lesswrong.com/posts/QqYfxeogtatKotyEC/training-ai-agents-to-solve-hard-problems-could-lead-to
[4] https://www.lesswrong.com/posts/trzFrnhRoeofmLz4e/insofar-as-i-think-llms-don-t-really-understand-things-what
[5] https://shash42.substack.com/p/automated-scientific-discovery-as
[6] https://www.dwarkesh.com/p/ilya-sutskever-2
[7] https://minihf.com/posts/2025-06-25-why-arent-llms-general-intelligence-yet/
[8] https://www.lesswrong.com/posts/apHWSGDiydv3ivmg6/varieties-of-doom