Shared the draft with you. Feel free to comment and question.
sayan
Shared the draft with you. Please let me know your feedback.
This is an amazingly comprehensive and useful paper. I wish it were longer, with short summaries of some of the papers it references rather than bare citations.
I also wish somebody would create a video version of it, in the spirit of CGP Grey's video on the classic Bostrom paper, so that I could just point people to the video instead of sub-optimally trying to explain all of this myself.
Just finished reading Yuval Noah Harari's new book, 21 Lessons for the 21st Century. Primary reaction: even if you already know everything the book presents, it is worth a read for the clarity it brings to the discussion.
[Question] Unknown Unknowns in AI Alignment
I am interested!
[Question] What is your Personal Knowledge Management system?
Quick question. Now that the Conservative Agency paper is available, what am I missing if I just read the paper and skip this post? I find the paper's notation easier to follow. Is there any significant difference between the formalization in this post and the one in the paper?
Where is the paradigm for Effective Activism? At first thought, it doesn't even seem difficult to do better than the status quo.
Would CIRL with many human agents realistically model our world?
What does AI alignment mean with respect to many humans with different goals? Are we implicitly assuming (in all our current agendas) that the final model of AGI is one that is corrigible to a single human instructor?
How do we synthesize the goals of so many human agents into one utility function? Are we assuming that solving alignment with one supervisor is easier? Wouldn't having many supervisors restrict the solution space meaningfully?
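One naive way to make the "synthesize many goals into one utility function" question concrete is a weighted sum of per-human utilities (a utilitarian social welfare function). The sketch below is purely illustrative, not a proposed answer; the names are hypothetical, and choosing the weights is exactly the hard part the question points at.

```python
# Hypothetical sketch: aggregate many humans' utility functions into one
# via a weighted sum (a utilitarian social welfare function).  Choosing
# the weights is itself the unresolved part of the question above.

def aggregate(utilities, weights):
    """Return a single utility function over outcomes."""
    def social_utility(outcome):
        return sum(w * u(outcome) for u, w in zip(utilities, weights))
    return social_utility

# Two toy humans with conflicting preferences over a scalar outcome.
def alice(x):
    return -(x - 1) ** 2   # Alice's ideal outcome is 1

def bob(x):
    return -(x + 1) ** 2   # Bob's ideal outcome is -1

u = aggregate([alice, bob], weights=[0.5, 0.5])
best = max([-1, 0, 1], key=u)  # with equal weights, the compromise outcome 0 wins
```

Even this toy version shows the tension: the "aligned" outcome is one that neither human actually prefers, and any reweighting just privileges one supervisor over another.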
What's a good way to force oneself outside one's comfort zone, where most expectations and intuitions routinely fail?
This might be useful for building antifragility in expectation management.
Quick example: living without money in a foreign nation.
Is it possible to design a personal or group retreat for this?
What are the possible failure modes of AI-aligned humans? What are the possible misalignment scenarios? I can think of malevolent uses of AI technology to enforce hegemony, and so on. What else?
Pathological examples of math are analogous to adversarial examples in ML. Or are they?
The internet might be lacking multiple kinds of curation and organization tools. How can we improve?
Speculation: people never use pro-con lists to actually make decisions; rather, they use them as rationalizations to convince others.
Extremely low-probability events are great as intuition pumps, but terrible for real-world decision-making.
Is there a good bijection between specification gaming and wireheading on one side, and the different types of Goodhart's law on the other?
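To make the Goodhart side of that question concrete, here is a small simulation of regressional Goodhart (the setup is my own illustration, not from any of these posts): when a proxy metric is the true value plus independent noise, hard-selecting on the proxy preferentially picks candidates with lucky noise, so the proxy overstates the true value of whatever it selects.

```python
import random

random.seed(0)

# Regressional Goodhart sketch: proxy = true value + independent noise.
# Optimizing the proxy selects for lucky noise, so the proxy score of
# the selected candidates systematically exceeds their true value.
n = 10_000
true_vals = [random.gauss(0, 1) for _ in range(n)]
proxies = [t + random.gauss(0, 1) for t in true_vals]

# Mean (proxy - true) over the 100 best-by-proxy candidates.
top = sorted(range(n), key=lambda i: proxies[i], reverse=True)[:100]
mean_gap = sum(proxies[i] - true_vals[i] for i in top) / len(top)
# mean_gap is reliably positive: optimizing the proxy "games" the metric.
```

This is the same shape as specification gaming: the agent scores well on the measured objective precisely where the measurement diverges from what we actually wanted.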
It is difficult to hear the differences in, and to articulate, an accent that is not one's native accent because of the brain's predictive processing: our brains constantly assimilate incoming signals to the closest known ones.
If there is no self, what are we going to upload to the cloud?
I have started writing a series of rigorous introductory blog posts on Reinforcement Learning for people with no background in it. This is entirely experimental, and I would love feedback on my draft. Please let me know if you are interested.
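To give a flavor of the kind of introductory material I mean (my own throwaway illustration, not taken from the draft), here is a minimal tabular Q-learning sketch on a toy four-state chain MDP:

```python
import random

random.seed(0)

# Minimal tabular Q-learning on a toy chain MDP (illustrative sketch):
# states 0..3, actions 0 = left / 1 = right, reward 1 on reaching the
# goal state 3, which also ends the episode.
N_STATES, GOAL = 4, 3
alpha, gamma, eps = 0.5, 0.9, 0.2

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    """Deterministic transition; returns (next_state, reward)."""
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, float(s2 == GOAL)

for _ in range(500):
    s = random.randrange(GOAL)  # exploring starts: begin anywhere but the goal
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] >= Q[s][1] else 1
        s2, r = step(s, a)
        # Q-learning update: bootstrap from the best next-state value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy moves right toward the goal from every state.
policy = [0 if Q[s][0] >= Q[s][1] else 1 for s in range(N_STATES)]
```

The planned posts would build up each piece of this (MDPs, value functions, temporal-difference updates, exploration) rigorously rather than by example.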