Ben Pace, the Vacationing Vagabond comments on Can corrigibility be learned safely?

Ben Pace, the Vacationing Vagabond 18 Apr 2018 23:52 UTC
I curated this post for these reasons:
- The post helped me understand a critique of alignment via amplifying act-based agents, in terms of the losses incurred by scaling down.
- The excellent comment section had a lot of great clarifications around definitions and key arguments in alignment from all involved. This is >50% of the reason I curated the post.
I regret I don’t have the time right now to try to summarise the specific ideas I found valuable (and what exact updates I made). Paul’s subsequent post on the definition of alignment was helpful, and I’d love to read anyone else’s attempt to summarise updates from the comment thread (added: I’ve just seen William_S’s new post following on in part from this conversation).
Biggest hesitation with curating this post:
- The post + comments require some background reading and are quite a high-effort read. I would be much happier curating this post if it allowed readers to get the picture of what it was responding to without having to visit elsewhere first (e.g. via quotes and a summary of the position being argued against).
But the comments helped make explicit some key considerations within alignment, which was the strongest variable for me. Thanks, Wei Dai, for starting this discussion and moving it forward.