Extremely low-probability events are great as intuition pumps, but terrible as inputs to real-world decision-making.
sayan
[Question] What is your Personal Knowledge Management system?
Quick question: given that the Conservative Agency paper is now available, what am I missing if I just read the paper and not this post? It seems easier for me to follow the paper's notation. Is there any significant difference between the formalization in this post and the one in the paper?
This is an amazingly comprehensive and useful paper. I wish it were longer, with short summaries of some of the papers it references rather than bare citations.
I also wish somebody would create a video version of it, in the spirit of CGP Grey’s video on the classic Bostrom paper, so that I could just point people to the video instead of sub-optimally trying to explain all these things myself.
I have started writing a series of rigorous introductory blogposts on Reinforcement Learning for people with no background in it. This is entirely experimental, and I would love some feedback on my draft. Please let me know if anyone is interested.
[Question] Unknown Unknowns in AI Alignment
Would CIRL with many human agents realistically model our world?
What does AI alignment mean with respect to many humans with different goals? Are we implicitly assuming (in all our current agendas) that the final AGI is corrigible to a single human instructor?
How do we synthesize the goals of so many human agents into one utility function? Are we assuming that solving alignment with one supervisor is easier? Wouldn’t having many supervisors restrict the solution space meaningfully?
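To make the "synthesize many goals into one utility function" question concrete, here is a toy Harsanyi-style weighted-sum aggregation. This is my own illustrative sketch, not anything from CIRL or a specific agenda; all names and the choice of equal weights are assumptions.

```python
# Toy sketch: aggregating many agents' utilities into one "social"
# utility via a weighted sum. The open question above is precisely
# how the weights should be chosen -- and whether a single scalar
# aggregate is the right target at all.

def aggregate_utility(utilities, weights):
    """Weighted sum of per-agent utilities for one outcome."""
    assert len(utilities) == len(weights)
    return sum(w * u for w, u in zip(weights, utilities))

# Two agents with directly conflicting preferences over outcomes A and B.
agent_utils = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
weights = [0.5, 0.5]  # equal weights -- itself a value judgment

scores = {o: aggregate_utility(u, weights) for o, u in agent_utils.items()}
```

Note that under equal weights the two conflicting outcomes score identically, which is one way of seeing why aggregation alone does not resolve the disagreement.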
I think this post is broadly making two claims:
- Impactful things fundamentally feel different.
- A good Impact Measure should be designed so that it strongly safeguards against almost any imperfect objective.
It is also (maybe implicitly) claiming that the three properties mentioned completely specify a good impact measure.
I am looking forward to reading the rest of the sequence with arguments supporting these claims.
Seems like this has been done already.
Is there a good bijection between specification gaming and wireheading on one side, and the different types of Goodhart’s law on the other?
Speculation: people never use pro-con lists to actually make decisions; rather, they use them after the fact to rationalize and to convince others.
The internet might be lacking multiple kinds of curation and organization tools. How can we improve?
Are Dharma traditions that posit ‘innate moral perfection of everyone by default’ reasoning from the just world fallacy?
Can we have a market with qualitatively different (un-interconvertible) forms of money?
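As a toy version of the non-interconvertible-money question, consider a wallet where balances are typed by currency and no exchange operation exists at all. This is purely my own illustrative sketch; all names are made up.

```python
# Toy sketch: a wallet holding qualitatively different,
# non-interconvertible currencies. There is deliberately no
# exchange/convert operation -- each currency can only be earned
# and spent within its own sphere.

class Wallet:
    def __init__(self):
        self.balances = {}  # currency name -> amount

    def deposit(self, currency, amount):
        self.balances[currency] = self.balances.get(currency, 0) + amount

    def spend(self, currency, amount):
        if self.balances.get(currency, 0) < amount:
            raise ValueError(f"insufficient {currency}")
        self.balances[currency] -= amount

w = Wallet()
w.deposit("reputation", 10)
w.deposit("cash", 5)
w.spend("reputation", 3)
# w.spend("cash", 100) would raise: a surplus in one currency
# can never cover a shortfall in another.
```

The interesting economic question is what market dynamics emerge when spheres of exchange are enforced like this rather than bridged by conversion.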
It is so difficult to hear the distinctions of, and pronounce, an accent that is not one’s native one because of the brain’s predictive processing: our brains constantly assimilate incoming signals to closely related ones they already know.
Just finished reading Yuval Noah Harari’s new book 21 Lessons for the 21st Century. Primary reaction: even if you already know everything presented in the book, it is worth a read just for the clarity it brings to the discussion.
Enjoyed reading this. Looking forward to the next posts in the sequence.
How would signalling/countersignalling work in a post-scarcity economy?
What are some effective ways to reset the hedonic baseline?
As far as I understand, this post decomposes ‘impact’ into value impact (VI) and objective impact (OI). VI depends on some particular agent’s ability to reach arbitrary value-driven goals, while OI depends on any agent’s ability to reach goals in general.
I’m not sure if there exists a robust distinction between the two—the post doesn’t discuss any general demarcation tool.
Maybe I’m wrong, but I think the most important point to note here is that the ‘objectiveness’ of an impact is defined not in terms of the ‘objective state of the world’, but in terms of how ‘general to all agents’ the impact is.