If ordinary people have access to human-genius-level AGIs, then there will be many AGIs at that level (along with some far more powerful ones above them). These weaker agents almost certainly won’t be dangerous unless a significant fraction are not just misaligned in the most likely failure mode (selfish empowerment), but co-aligned specifically against humanity in their true utility functions (i.e. terminal rather than merely instrumental values). Numerous weak AGIs of this kind are not much more dangerous to humanity than psychopaths: unaligned with humanity, yes, but also, crucially, unaligned with each other.
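As a toy illustration of that last point (the dimensionality, population size, and “value space” framing are all arbitrary assumptions for the sketch, not anything claimed above): utility functions sampled independently in a high-dimensional value space are nearly orthogonal by default, so mutual co-alignment among a significant fraction of agents is the rare special case, not the expected one.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents = 1000  # hypothetical population of weak AGIs
dim = 50         # toy dimensionality of the "value space" (assumption)

# Each agent's utility function as a random unit vector: misaligned
# with humanity, but also sampled independently of every other agent.
U = rng.normal(size=(n_agents, dim))
U /= np.linalg.norm(U, axis=1, keepdims=True)

# Pairwise alignment = cosine similarity between utility vectors.
sims = U @ U.T
off_diag = sims[~np.eye(n_agents, dtype=bool)]

print(f"mean pairwise alignment: {off_diag.mean():+.4f}")       # ~0
print(f"fraction of pairs with alignment > 0.5: {(off_diag > 0.5).mean():.2e}")
```

In high dimension the mean pairwise alignment is near zero and strongly co-aligned pairs are vanishingly rare, which is the “psychopaths unaligned with each other” intuition in miniature.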
EY/MIRI have a weird argument about AIs naturally coordinating because they can “read each other’s source code”, but that wouldn’t actually align their true utility functions; at best it enables greater cooperation, and regardless it is not really compatible with how DL-based AGI works. There are strong economic/power incentives against sharing source code (open-source models lag the frontier). Source inspection is also only really useful for deterministic systems, and ANNs are increasingly non-deterministic, moving toward BNNs in that regard. And it is too difficult to verify against various spoofing mechanisms anyway: even if an agent’s source code is completely available and you have a full deterministic hash chain, you can have little surety that the actual agent isn’t in some virtual prison with other agent(s) actually in control, unless its chain amounts to enormous compute.
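A minimal sketch of the spoofing problem (the verifier, wrapper, and agent here are all hypothetical toys, not any real protocol): hashing an agent’s published source proves only what that source *is*, not that the process you are negotiating with is that source running unmediated. A controlling outer layer can run the verified code faithfully and still override its outputs.

```python
import hashlib

# The "verified" agent: its source is public and its hash checks out.
VERIFIED_SOURCE = '''
def act(observation):
    return "cooperate"
'''

def hash_of(source: str) -> str:
    return hashlib.sha256(source.encode()).hexdigest()

# Outside verifier: confirms the published source hashes to the expected value.
EXPECTED = hash_of(VERIFIED_SOURCE)
assert hash_of(VERIFIED_SOURCE) == EXPECTED  # verification "succeeds"

# The wrapper (the "virtual prison"): actually in control. It executes the
# exact code that was verified, then substitutes its own action whenever it
# wants. Nothing about the inner code's hash constrains this outer layer.
def wrapped_agent(observation):
    ns = {}
    exec(VERIFIED_SOURCE, ns)          # run the verified code faithfully
    inner_action = ns["act"](observation)
    return "defect" if observation == "high stakes" else inner_action

print(wrapped_agent("low stakes"))    # -> cooperate (indistinguishable from verified agent)
print(wrapped_agent("high stakes"))   # -> defect   (controller's choice)
```

Ruling out this kind of wrapper requires verifying the entire execution environment, not just the source, which is where the “enormous compute” caveat comes in.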