In category theory, one learns that good math is like kabbalah, where nothing is a coincidence. All short terms ought to mean something, and when everything fits together better than expected, that is a sign that one is on the right track, and that there is a pattern to formalize.

$\pi^*_X = \operatorname{argmax}_a \sum_y p_X(x,a,y)\,(R_X(x,y) + \gamma V_X(y))$ and $V_X = \max_a \sum_y p_X(x,a,y)\,(R_X(x,y) + \gamma V_X(y))$ can be replaced by $\pi^*_X = \operatorname{argmax}_a E_X$ and $E_X = \sum_y p_X(x,a,y)\,(R_X(x,y) + \gamma \max_a E_X)$. I expect that the latter formulation is better because it is shorter. Its only direct effect would be that you would write $\max_a E_X$ instead of $V_X$, so the previous sentence must cash out as this being a good thing. Indeed, it points out a direction in which to generalize.

How does your math interact with quantilization? I plan to expand when I’ve had time to read all links.
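The equivalence the comment points at can be checked numerically on a toy MDP. This is an illustrative sketch (the states, transition probabilities, and rewards are made up, not from the paper): iterating on $V_X$ directly, or iterating on the action-value function $E_X$ and recovering $V_X$ as $\max_a E_X$, reaches the same fixed point.

```python
# Toy MDP sketch: the value recursion can be written either with V_X(x)
# or with E_X(x, a) plus max_a. All numbers here are arbitrary.
states = [0, 1]
actions = [0, 1]
gamma = 0.9

# p[x][a][y]: transition probability from x to y under action a.
p = {0: {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}},
     1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.3, 1: 0.7}}}
# R[x][y]: reward for the transition from x to y.
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.2, 1: 0.5}}

# Formulation 1: iterate on V directly.
V = {x: 0.0 for x in states}
for _ in range(500):
    V = {x: max(sum(p[x][a][y] * (R[x][y] + gamma * V[y]) for y in states)
                for a in actions)
         for x in states}

# Formulation 2: iterate on E; V never needs its own symbol, since it is
# just max_a E[x][a].
E = {x: {a: 0.0 for a in actions} for x in states}
for _ in range(500):
    E = {x: {a: sum(p[x][a][y] * (R[x][y] + gamma * max(E[y].values()))
                    for y in states)
             for a in actions}
         for x in states}

# The two fixed points agree: V_X(x) == max_a E_X(x, a).
for x in states:
    assert abs(V[x] - max(E[x].values())) < 1e-9
```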
In category theory, one learns that good math is like kabbalah, where nothing is a coincidence.
OK, I think I see what inspired your question.
If you want to give the math this kind of kabbalah treatment, you may also look at the math in [EFDH16], which produces agents similar to my definitions (4) and (5), and also some variants that have different types of self-reflection. In the later paper here, Everitt et al. develop some diagrammatic models of this type of agent self-awareness, but the models are not full definitions of the agent.
For me, the main question I have about the math developed in the paper is how exactly I can map the model and the constraints (C1-C3) back to things I can or should build in physical reality.
Something is going on here (when developing agent models, especially when treating AGI/superintelligence and embeddedness) that also often happens in post-Newtonian physics. The equations work, but if we attempt to map them to some prior intuitive mental model of how reality or decision making must necessarily work, we have to conclude that this attempt raises some strange and troubling questions.
I’m with modern physics here (I used to be an experimental physicist for a while): the mainstream response to this is that ‘the math works, your intuitive feelings about how X must necessarily work are wrong, you will get used to it eventually’.
BTW, I offer some additional interpretation of a
difficult-to-interpret part of the math in section 10 of my 2020
paper here.
How does your math interact with quantilization?
You could insert quantilization in several ways in the model. The most obvious way is to change the basic definition (4). You might also define a transformation that takes any reward function R and returns a quantilized reward function Rq; this gives you a different type of quantilization, but I feel it would be in the same spirit.
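As a hedged sketch of the first option (the function name, the uniform base distribution over actions, and all numbers below are my illustrative choices, not from the paper): instead of taking $\operatorname{argmax}_a E_X$ in definition (4), the agent samples an action from the top-$q$ fraction of its base distribution, ranked by expected reward.

```python
# Minimal quantilizer sketch (illustrative, with a uniform base
# distribution over a finite action set): rather than argmax_a E_X(x, a),
# sample uniformly from the top-q fraction of actions by expected reward.
import random

def quantilize(actions, expected_reward, q, rng=random):
    """Draw one action from the top-q quantile under expected_reward."""
    ranked = sorted(actions, key=expected_reward, reverse=True)
    k = max(1, int(len(ranked) * q))  # size of the eligible top-q slice
    return rng.choice(ranked[:k])

# Example: five actions with made-up expected rewards.
rewards = {'a': 1.0, 'b': 0.9, 'c': 0.5, 'd': 0.1, 'e': 0.0}
top = quantilize(list(rewards), rewards.get, q=0.4)
assert top in ('a', 'b')  # with q=0.4, only the top 2 of 5 are eligible
```

With q approaching 0 this recovers the argmax agent; larger q spreads probability over more mediocre actions, which is the usual quantilizer safety trade-off.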
In a more general sense, I do not feel that quantilization can produce the kind of corrigibility I am after in the paper. The effects you get on the agent by changing f0 into fc, that is, by adding a balancing term to the reward function, are not the same as the effects produced by quantilization.