You ask a superintelligent LLM to design a drug to cure a particular disease. It outputs just a few tokens with the drug formula. How do you use a previous-gen LLM to check whether the drug will have some nasty humanity-killing side effects years down the road?
Edited to add: the point is that even with a few tokens, you might still have a huge inferential distance that nothing with less intelligence (including humanity) could bridge.
That violates assumption one (a single pass cannot produce superintelligent output).
Yes, and I was attempting to illustrate why this is a bad assumption. Granted, LLMs subject to unrealistic limitations are potentially easier to align, but that does not help, unfortunately.
I don’t see how you’ve shown that it’s a bad assumption.