If we view raising humans from birth to adulthood and training AI agents from initialization to deployment as similar processes, then what human analogues do the six goal types from the AI-2027 forecast have? The analogues of developers are, obviously, the adults who have at least partial control over the human’s life. The analogues of written Specs and developer-intended goals are the adults’ intentions; the analogues of reward/reinforcement seem to be short-term stimuli and the morals of one’s communities. I also think that the best analogue for proxies and/or convergent goals is the possession of resources (and of knowledge, though the latter can be acquired without ethical issues), while the ‘other goals’ are, well, ideologies, morality[1] and tropes absorbed from the most concentrated form of training data available to humans, which is speech in all its forms.
What exactly do the analogies above tell us about the prospects of alignment? The possession of resources is the goal behind aggressive wars, colonialism and related evils[2]. If human culture managed to make these unacceptable, does that imply that an AI will likewise refrain from attempting an AI takeover?
I also think that humans rarely develop their own moral codes or ideologies; instead, they usually adopt a moral code or ideology close to one already present in their “training data”. Could anyone comment on this?
And crime, but criminals, unlike colonizers, also try to avoid conflict with law enforcers who hold at least comparable power.