I think your argument does show that ‘safely aligning’ an AI requires significant engagement with human values. But I’m not convinced that it requires ‘learning human values’ well enough to successfully optimize the world.
In particular, I think it might be easier to recognize when effects are morally neutral than to recognize when they’re improvements. Or at least, I don’t think the argument here convincingly shows otherwise.
My thought is that when deciding whether to take a morally neutral act with tradeoffs, the AI needs to be able to weigh the positives against the negatives to reach a reasonably acceptable tradeoff, and hence needs to know both the positive and negative human values involved.