You ask a superintelligent LLM to design a drug to cure a particular disease. It outputs just a few tokens with the drug formula. How do you use a previous-gen LLM to check whether the drug will have some nasty humanity-killing side effects years down the road?
Edited to add: the point is that even with a few tokens, you might still have a huge inferential distance that nothing with less intelligence (including humanity) could bridge.
That violates assumption one (a single pass cannot produce superintelligent output).
Yes, and I was attempting to illustrate why this is a bad assumption. Granted, LLMs subject to unrealistic limitations are potentially easier to align, but that does not help, unfortunately.
I don’t see how you’ve shown that it’s a bad assumption.