It makes perfect sense, but I have no easy-to-access perception of this thing. Will try to do something about this skill issue.
As someone who believes I've had some related experiences, this is very easy to Goodhart on and very easy to screw up badly if you try to go straight for it without [a kind of prepwork that my safety systems say I shouldn't try to describe] first, and the part where you're tossing that sentence out without obvious hesitation feels like an immediate bad sign. See also this paragraph from that very section (to be clear, it's my interpretation that treats it as supporting here, and I don't directly claim Eliezer would agree with me):
(Frankly I expect almost nobody to correctly identify those words of mine as internally visible mental phenomena after reading them; and I’m worried about what happens if somebody insists on interpreting it anyway. Seriously, if you don’t see phenomena inside you that obviously looks like what I’m describing, it means, you aren’t looking at the stuff I’m talking about. Do not insist on interpreting the words anyway. If you don’t see an elephant, don’t look under every corner of the room until you find something that could maybe be an elephant.)
Please don't [redacted verb phrase] and passively generate a stack of pseudo-elephants that jam the area and maybe permanently block off a ton of your improvement potential. The vast majority of human-embodied minds are not meant for that kind of access! I suspect that mine either might have been or almost was, but earlier me still managed to fuck it up in subtle ways, and I had a ton of guardrails and foresight that ~nobody around me seemed to have or even think possible, and I didn't even make the kind of grotesque errors that I imagine people who write about it the way you just did would make.
Please, please just do normal, socially integrated emotional skill building instead if you can get that. This goes double if you haven't already obviously exhausted what you can get from it (and I'd bet that most people who think of self-modification as cool also have at least a bit of a "too cool for school" attitude there, with associated blind spots).
(The “learning to not panic because it won’t actually help” part is fine.)
Thanks for your concern!
I think I worded it poorly. I think it is an internally visible mental phenomenon for me. I do know how it feels and have some access to this thing. It's different from hyperstition and different from "white doublethink"/"gamification of hyperstition". It's easy enough to summon on command and check: yes, it's that thing. It's the thing that helps me jump into a lake from a 7-meter cliff, get up from a very comfy bed, and sometimes overcome social anxiety. I just hadn't generalised from these examples to one unified concept before.
And where I do sometimes manage it, my skill issues come from the access not being easy enough:
I can't do it constantly; it takes several seconds and eats attention.
I can't reliably remember to do it when it matters most: in highly stressful situations or when my attention is occupied with other stuff.
Some internal processes (usually strong negative emotions) can override it by uploading a more powerful image into the script, so I follow that instead, even while understanding that it's worse.
Also, one uploading doesn't keep working for very long. (So it works best when returning to the default course of action after the initial decision would be hard/impossible/obviously silly/embarrassing/weird.)
Do you think I’m wrong and this is a different thing?