I’m curious what “put it in my SuperMemo” means. Quick googling only yielded SuperMemo as a language learning tool.
I agree it’s sort of the same problem under the hood, but I think knowing how you’re going to go from “understanding understanding” to producing an understandable model controls what type of understanding you’re looking for.
I also agree that this post makes ~0 progress on solving the “hard problem” of transparency, I just think it provides a potentially useful framing and creates a reference for me/others to link to in the future.
Yeah, I agree 95% is a bit high.
One way of looking at DDT is “keeping it dumb in various ways.” I think another way of thinking about it is just designing a different sort of agent, which is “dumb” according to us but not really dumb in an intrinsic sense. You can imagine this DDT agent looking at agents that do do acausal trade and thinking they’re just sacrificing utility for no reason.
There is some slight awkwardness in that, given the decision problems agents in this universe actually encounter, UDT agents will get higher utility than DDT agents.
I agree that the maximum a posteriori world doesn’t help that much, but I think there is some sense in which “having uncertainty” might be undesirable.
has been changed to imitation, as suggested by Evan.
Yeah, you’re right that it’s obviously unsafe. The words “in theory” were meant to gesture at that, but it could be much better worded. Changed to “A prototypical example is a time-limited myopic approval-maximizing agent. In theory, such an agent has some desirable safety properties because a human would only approve safe actions (although we still would consider it unsafe).”
Yep—I switched the setup at some point and forgot to switch this sentence. Thanks.
This is brilliant.
I am using the word “causal” to mean d-connected, which means not d-separated. I prefer the term “directly causal” to mean A->B or B->A.
In the case of non-effects, the improbable events are “taking Benadryl” and “not reacting after consuming an allergen.”
I agree market returns are equal in expectation, but you’re exposing yourself to more risk for the same expected returns in the “I pick stocks” world, so risk-adjusted returns will be lower.
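To make this concrete, here’s a minimal sketch under idealized (hypothetical) assumptions: every stock has the same expected excess return and volatility, and returns are uncorrelated. Picking one stock then has the same expected return as holding an equal-weight portfolio, but a much worse Sharpe ratio, since diversification shrinks volatility by a factor of sqrt(n):

```python
import math

# Hypothetical numbers (assumptions, not real market data): each stock
# has identical expected excess return and volatility, zero correlation.
mu = 0.07      # expected annual excess return per stock
sigma = 0.20   # annual volatility per stock
n = 100        # number of stocks in the "market" portfolio

# Picking a single stock: same expected return, full volatility.
sharpe_single = mu / sigma                       # 0.35

# Equal-weight portfolio of n uncorrelated stocks: same expected
# return, but volatility falls by sqrt(n), so the Sharpe ratio rises.
sharpe_market = mu / (sigma / math.sqrt(n))      # 3.5

print(sharpe_single, sharpe_market)
```

Real stocks are of course correlated, so the diversification benefit is far smaller than this zero-correlation toy suggests, but the direction of the effect is the same: equal expected return, lower risk-adjusted return for the stock picker.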
I sometimes roleplay as someone roleplaying as myself, then take the action that I would obviously want to take, e.g. “wow sleeping regularly gives my character +1 INT!” and “using anki every day makes me level up 1% faster!”
If X->Z<-Y, then X and Y are independent unless you’re conditioning on Z. A relevant TAP might thus be:
Trigger: I notice that X and Y seem statistically dependent
Action: Ask yourself “what am I conditioning on?”. Follow up with “Are any of these factors causally downstream of both X and Y?” Alternatively, you could list salient things causally downstream of either X or Y and check the others.
This TAP is unfortunately abstract, because “things I’m currently conditioning on” isn’t an easy thing to list, but it might help.