I think that M only prints something after converging with Adv, and that Adv does not print anything directly to H.
Abram, did you reply to that crux somewhere?
I agree that hierarchy can be used only sparingly and still be very helpful. Perhaps just nesting under the core tags, or something similar.
On special posts where the hierarchy does not seem to hold, people can still downvote the parent tag. That is annoying, but may reduce work overall.
Also, navigating up/down with the arrow keys and pressing enter should allow choosing tags using only the keyboard.
1. More people would probably rank tags if it could be done directly through the tag icon instead of using the pop-up window.
2. When searching for new tags, I’d like them sorted by relevance, probably (say, some preference for: being a prefix, being a popular tag, alphabetical ordering).
3. When browsing through all posts tagged with some tag, I’d maybe prefer to see higher karma posts first, or to have it factored in the ordering.
4. Perhaps it might be easier to have a hierarchy of tags, so that voting for Value Learning also votes for AI Alignment, say.
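To make the ordering in item 2 concrete, here is a minimal sketch of one possible sort key: prefix matches first, then more popular tags, then alphabetical. The `Tag` class, the `post_count` field and the `rank_tags` function are all hypothetical names invented for illustration, not anything from the site's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str
    post_count: int  # hypothetical popularity signal (number of tagged posts)


def rank_tags(query, tags):
    """Return tags matching the query, ordered by a simple relevance key."""
    q = query.casefold()
    matches = [t for t in tags if q in t.name.casefold()]

    def key(tag):
        name = tag.name.casefold()
        # False sorts before True, so prefix matches come first;
        # then higher post_count; then alphabetical as a tie-breaker.
        return (not name.startswith(q), -tag.post_count, name)

    return sorted(matches, key=key)


tags = [
    Tag("AI Alignment", 900),
    Tag("Value Learning", 120),
    Tag("AI", 1500),
    Tag("Rationality: AI", 40),
    Tag("AI Boxing", 60),
]
print([t.name for t in rank_tags("ai", tags)])
```

A lexicographic tuple key like this is easy to tweak: swapping the first two components would favor popularity over prefix matching.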
If you don’t think that AI researchers care that much about destroying the world, what else makes you optimistic that there will be enough incentives to ensure alignment? Does it all go back to people in relevant positions of power generally caring about safety and taking it seriously?
I think that the debate around the incentives to make aligned systems is very interesting, and I’m curious whether Buck and Rohin will formalize a bet around it afterwards.
I feel like Rohin’s point of view, compared to Buck’s, is that people and companies are in general more responsible, in that they are willing to pay extra costs to ensure safety, and not necessarily as a solution to a race-to-the-bottom situation. Is there another source of disagreement, conditional on convergence on the above?
Is GPI / Forethought Foundation missing?
No, I was simply mistaken. Thanks for correcting my intuitions on the topic!
If this is the case, this seems more like a difference in exploration/exploitation strategies.
We do have positively valenced heuristics for exploration, say, curiosity and excitement.
I think that the intuition for this argument comes from something like gradient ascent under an approximate utility function. The agent will spend most of its time near what it perceives to be a local(ish) maximum.
So I suspect the argument here is that Optimistic Errors have a better chance of locking into a single local maximum or strategy, which gets reinforced enough (or not punished enough), even though it is bad in total.
Pessimistic Errors are ones in which the agent strategically avoids locking into maxima, perhaps by Hedonic Adaptation as Dagon suggested. This may miss big opportunities if there are actual, territorial, big maxima, but that may not be as bad (from a satisficer point of view at least).
And kudos for the neat explanation and an interesting theoretical framework :)
I’d expect the preference at each point to mostly go in the direction of either axis.
However, this analysis should be interesting in non-cooperative games where the vector might represent a mixed strategy, with amplitude the expected payoff perhaps.
I may be mistaken. I tried reversing your argument, and I bolded the part that doesn’t feel right.
Optimistic errors are no big deal. The agent will randomly seek behaviors that get rewarded, but as long as these behaviors are reasonably rare (and are not that bad) then that’s not too costly.
But pessimistic errors are catastrophic. The agent will systematically make sure not to fall into behaviors that avoid high punishment, and will use loopholes to avoid penalties even if that results in the loss of something really good. So even if these errors are extremely rare initially, they can totally mess up my agent.
So I think that maybe there is inherently an asymmetry between reward and punishment when dealing with maximizers.
But my intuition comes from somewhere else. If the difference between pessimism and optimism is just a shift by a constant, then it ought not to matter for a utility maximizer. But your definition looks at errors conditional on the actual outcome, which should perhaps behave differently.
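A toy illustration of that distinction, as a sketch under made-up numbers: the action names, the utilities and the `best_action` helper are all assumptions chosen for the example. A constant shift leaves the maximizer's choice unchanged, while an error attached to one particular outcome can change it.

```python
def best_action(estimates):
    """Pick the action with the highest estimated utility."""
    return max(estimates, key=estimates.get)


# Hypothetical true utilities of three actions.
true_utility = {"a": 1.0, "b": 2.0, "c": 0.5}

# Uniform pessimism: every estimate is shifted down by the same constant.
shifted = {action: u - 10.0 for action, u in true_utility.items()}

# Outcome-conditional errors: only one specific action is misjudged.
optimistic = dict(true_utility, c=true_utility["c"] + 5.0)   # bad action looks great
pessimistic = dict(true_utility, b=true_utility["b"] - 5.0)  # best action looks awful

print(best_action(true_utility))  # the true optimum
print(best_action(shifted))       # same choice as the true optimum
print(best_action(optimistic))    # the overrated action gets chosen
print(best_action(pessimistic))   # the underrated optimum gets skipped
```

This only shows that conditional errors can matter where a constant shift cannot; whether the optimistic and pessimistic cases are asymmetric depends on the environment, which the toy does not model.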
Pessimistic errors are no big deal. The agent will randomly avoid behaviors that get penalized, but as long as those behaviors are reasonably rare (and aren’t the only way to get a good outcome) then that’s not too costly. But optimistic errors are catastrophic. The agent will systematically seek out the behaviors that receive the high reward, and will use loopholes to avoid penalties when something actually bad happens. So even if these errors are extremely rare initially, they can totally mess up my agent.
I’d love to see someone analyze this thoroughly (or I’ll do it if there is interest). I don’t think it’s that simple, and it seems like this is the main analytical argument.
For example, if the world is symmetric in the appropriate sense in terms of what actions get you rewarded or penalized, and you maximize expected utility instead of satisficing in some way, then the argument is wrong. I’m sure there is good literature on how to model evolution as a player, and the modeling of the environment shouldn’t be difficult.
I find the classification of the elements of robust agency to be helpful, thanks for the write up and the recent edit.
I have some issues with Coherence and Consistency:
First, I’m not sure what you mean by those terms, so I’ll take my best guess, which in its idealized form is something like: Coherence is being free of self-contradictions, and Consistency is having the tools to commit oneself to future actions. This is going by the last paragraph of that section:
There are benefits to reliably being able to make trades with your future-self, and with other agents. This is easier if your preferences aren’t contradictory, and easier if your preferences are either consistent over time, or at least predictable over time.
Second, the only case made for Coherence is that it helps you make trades with your future self. My reasons for it are more strongly related to avoiding compartmentalization and resolving confusions, and to making clever choices in real time given my limited rationality.
Similarly, I do not view trades with future self as the most important reason for Consistency. It seems that the main motivator here for me is some sort of trade between various parts of me. Or more accurately, hacking away at my motivation schemes and conscious focus, so that some parts of me will have more votes than others.
Third, there are other mechanisms for Consistency. Accountability is a major one. Also, reducing noise in the environment externally and building actual external constraints can be helpful.
Fourth, Coherence can be generalized to a skill that allows you to use your gears-level understanding of yourself and your agency to update your gears to whatever would be most useful. This makes me wonder if the scope here is too large, and whether gears-level understanding and deliberate agency are less related to the main points. These may all help one to be trustworthy, in that one’s reasoning can be judged to be adequate (including for oneself), which is the main thing I’m taking away from here.
Fifth (sorta), I have reread the last section, and I think that I understand now that your main motivation for Coherence and Consistency is that conversation between rationalists can be made much more effective, in that they can more easily understand each other’s point of view. I view this as related to Game Theoretic Soundness more than to the internal benefits of Coherence and Consistency, which are probably more meaningful overall.
Non-Bayesian utilitarians who are ambiguity-averse sometimes need to sacrifice “expected utility” to gain more certainty (in quotes because that need not be well defined).
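One standard way to model this is maxmin expected utility: the agent holds a set of priors and picks the action whose worst-case expected utility is highest. Here is a minimal sketch of that rule; the payoffs, the set of priors and the "reference" prior are invented for illustration, and maxmin is only one of several models of ambiguity aversion.

```python
def expected_utility(payoffs, prior):
    """Expected payoff of an action under a single prior over states."""
    return sum(p * u for p, u in zip(prior, payoffs))


def maxmin_choice(actions, priors):
    """Pick the action with the best worst-case expected utility over all priors."""
    return max(
        actions,
        key=lambda a: min(expected_utility(actions[a], p) for p in priors),
    )


# Two states of the world; payoffs per action in each state (made-up numbers).
actions = {"risky": [10.0, 0.0], "safe": [4.0, 4.0]}

# The ambiguity-averse agent cannot settle on one prior over the two states.
priors = [[0.3, 0.7], [0.5, 0.5], [0.7, 0.3]]

# A Bayesian with the middle prior simply maximizes expected utility.
reference_prior = [0.5, 0.5]
bayesian = max(actions, key=lambda a: expected_utility(actions[a], reference_prior))

print(bayesian)                         # the higher-expectation action
print(maxmin_choice(actions, priors))   # the action with the better guarantee
```

Under the reference prior the risky action has the higher expectation, but its worst case over the prior set is lower than the safe action's, so the maxmin agent gives up that "expected utility" for certainty.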
Thank you very much! Excited to read it :)
If it’s simple, is it possible to also publish a Kindle version?