This does suggest some moderation in stealthy autonomous self-improvement, in case alignment is hard, but only to the extent that those in control of this process (whether human or AI) are both risk-averse and sufficiently sane. That won't be the case for most groups of humans, and likely not for most early AIs. The local incentive of greater capabilities is too sweet, and prompting/fine-tuning overcomes any sanity or risk-aversion that might be found in early AIs to impede development of such capabilities.
I agree that on the path to becoming very powerful, we would expect autonomous self-improvement to involve doing some things that are, in retrospect, somewhat to very dumb. It also suggests that risk-aversion is sometimes a safety-increasing irrationality to grant a system.