In retrospect, I am more pessimistic about small amounts of niceness in an AI being enough to keep humans alive, and I now think that some amount of alignment stronger than pseudokindness is necessary for humans to survive alongside AI (though maybe not as strong as MIRI thinks), essentially because niceness to humans requires giving up opportunities to save compute on modeling the world, which AI companies anti-incentivize:
https://www.lesswrong.com/posts/xvBZPEccSfM8Fsobt/what-are-the-best-arguments-for-against-ais-being-slightly#wy9cSASwJCu7bjM6H
Do you think that the scalable oversight/iterative alignment proposal that we discussed can get us to the amount of niceness necessary for humans to survive with AGI?
My answer is basically yes.
In that earlier comment, I was only addressing the question “If we basically failed at alignment, or didn’t align the AI at all, but had a very small amount of niceness, would that lead to good outcomes?”