I’m not sure how to pinpoint disagreement here.
I think it’s bad, possibly very bad, to have delusional beliefs like this. But I think by default we don’t already know how to decouple belief from intention. Saying “we’re the only ones with a plan to save the world that might work” is part belief (e.g., it implies that you expect to always find fatal flaws in others’ world-saving plans), and part intention (as in, “I’m going to make myself have a plan that might work”). We also can’t, by default, decouple belief from caring. Specialization can be interpreted as a belief that being a certain way is the best way for you to be; that belief isn’t objectively true, but it results in roughly the same actions as the caring. The intention to make your plans work, and caring about the worlds in which you can possibly succeed, are good; and if we can’t decouple these things, it might be worth having false beliefs (though of course it’s also extremely worth becoming able to decouple belief from caring and intention, and ameliorating the negative effects on the margin by forming separate beliefs about things that you are able to decouple, e.g. using explicit reason to figure out whether someone else’s plan might work, even if intuitively you’re “sure” that no one else’s plan could work).
I think it’s clearly bad to prevent feedback for the sake of protecting “beliefs”. But secrecy makes sense for other reasons. (Intentions matter because they affect many details of the implementation, which can add up to large overall effects on the outcomes.)
>”the only organization with a plan that could possibly work, and the only real shot at saving the world”
It’s definitely a warpy sort of belief. The issue to me, and why I could still feel positively about such an organization, is that the strong default for people and organizations might be a strong false lack of hope. In which case, it might be correct to have what seems like a delusional bubble of exceptionalism. It still seems to have some significant bad effects, and is still probably partly delusional, but if we don’t know how to get the magic of Hope without some delusion I don’t think that means we should throw away Hope.
>Are Leverage’s standard operating procedures auditable knowledge to outsiders?
It would be nice to live in a world where this standard were good and feasible, but I don’t think we do. Not holding this standard does open us up to the possibility of all sorts of abuse hiding in relative secrecy, but unfortunately I don’t see how to avoid that risk without becoming ineffective.
I think the things you point out are big risk factors, but to me don’t seem to indicate a “poison” in the telos of the organization. Whereas sexual/romantic stuff seems like significant evidence towards “poison”, in the sense of “it would actually be bad if these people were in power”.
I’m not sure what the meaning, if any, of the following fact is, but: I notice that I would feel very positively about Leverage as it’s portrayed here if there weren’t relationships with multiple younger subordinates (e.g. if the leader had been monogamously married), and as it is I feel mildly negative about it on net.
1. Growth quickens.
2. People notice, and are more willing to lend capital.
3. Take out too many loans.
4. Use your borrowed money to capture the apparatus that would make you pay your debt.
5. Cancel your debt.
Reinforcing based on naively extrapolated trajectories produces double binds. We have a reinforcer R and an agent A. R doesn’t want A to be too X or too not-X. Whenever A does something uncommonly X-ish, R notices that A seems to be shifting toward X-ishness in general. If this shift continues as a trajectory, then A will end up way too X-ish. So, to head that off, R negatively reinforces A. Likewise, R punishes anything uncommonly not-X-ish. As an agent, A is trying to figure out which trajectory to be on. So R isn’t mistaken that A is often putting itself on trajectories which naively imply a bad end state. But A is put in an impossible situation: any uncommon move, in either direction, gets punished. To avoid creating this double bind, R must model the fact that R and A will continue their feedback cycle in the future, rather than extrapolating A’s current behavior as though it would go uncorrected.
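To make the dynamic concrete, here’s a minimal toy simulation in Python (the threshold, update rule, and constants are arbitrary assumptions for illustration, not a model of any real reinforcer):

```python
import random

def reinforcer_feedback(action, usual, tolerance=0.15):
    """R's rule: punish any action that deviates noticeably from A's usual
    behavior, because naively extrapolating that deviation as a trajectory
    predicts ending up way too X-ish (or way too not-X-ish)."""
    if abs(action - usual) > tolerance:
        return -1.0  # punished in *either* direction: the double bind
    return 0.0       # unremarkable actions draw no feedback

random.seed(0)
x = 0.5  # A's disposition on the X / not-X axis, in [0, 1]
for step in range(50):
    # A must try uncommon actions to figure out which trajectory to be on...
    action = min(1.0, max(0.0, random.gauss(x, 0.2)))
    # ...but those exploratory actions are exactly what R punishes.
    if reinforcer_feedback(action, x) < 0:
        x += 0.1 * (x - action)  # A retreats away from the punished action

print(f"A's final disposition: {x:.3f}")  # pinned near 0.5
```

Because the punishment is symmetric in direction, A’s disposition never moves far from its starting point: exploration itself is what gets extinguished, which is the impossible situation described above.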
Also worth minding that information ≠ knowledge/understanding, and that understanding behaves differently. You might not understand the first five examples, and then the sixth example lets you triangulate some crucial aspect, which allows you to go back and re-understand more deeply the previous examples.
Amnesia. When something becomes unready-to-hand for the first time, it becomes newly available for decoupled thought, and in particular, memory. When we ourselves become unready-to-hand, we’re presented with the possibility of knowing what we are.

Different kinds of things are more or less available for understanding (sensibility, memory, significance, use) to gather around them. Whenever we try and fail to understand something, we have a choice of what to do with the toeholds left over from our aborted forays: let them dissolve, or let them continue to gather understanding. For example: “I can’t figure out how to fix the table because it involves measurements and arithmetic, and I’m bad at math, so it would be a waste to try to learn fixing tables” vs. “I don’t know how to fix the table, so I’ll stop trying now, and I’ll seize chances I get to practice easier arithmetic” (not necessarily as explicit thoughts, of course).

Since toeholds strengthen each other, dissolution of toeholds amplifies dissolution of toeholds. Since what we are is “big” relative to our understanding, when what we are is unready-to-hand, the default attractor is dissolution of understanding (reset-and-forget). Big, deep, external events are not noticed; and if they are, they are not tracked in any detail; and if they are, they aren’t understood; and if they are, they aren’t remembered. (Whereas events that are well-understood, even if trivial, are easily remembered.)

Cf. Samo Burja’s work (remembering the collapse of civilization) and Nietzsche (remembering the death of God). Hitler, for example, is impossible to forget and also as yet impossible to remember. Yak-shaving and poetry could help. Rejecting “unfounded” abstraction is anti-helpful. Saying the same thing about the same thing is more helpful than it seems.
Deal! I’m glad we can realize gains from trade across metaphysical chasms.
Ok, actually I can see a non-Goodharting reason to care about emotional states as such, though it’s still instrumental, so isn’t what tslarm was talking about: emotional states are blunt-force brain events, and so in a context (e.g. modern life) where the locality of emotions doesn’t fit into the locality of the demands of life, emotions are disruptive, especially suffering, or maybe more subtly any lack of happiness.
For example, it makes me curious as to whether, when observing say a pre-civilization group of humans, I’d end up wanting to describe them as caring about happiness and suffering, beyond caring about various non-emotional things.
>Well I pre-theoretically care about happiness and suffering too.
That you think this, and that it might be the case, for the record, wasn’t previously obvious to me, and makes a notch more sense out of the discussion.
>I think I’ll bow out of the discussion now
Ok, thanks for engaging. Be well. Or I guess, be happy and unsufferful.
>I think we’ve both done our best, but to be blunt, I feel like I’m having to repeatedly assure you that I do mean the things I’ve said and I have thought about them, and like you are still trying to cure me of ‘mistakes’ that are only mistakes according to premises that seem almost too obvious for you to state, but that I really truly don’t share.
I don’t want to poke you more and risk making you engage when you don’t want to, but just as a signpost for future people, I’ll note that I don’t recognize this as describing what happened (except of course that you felt what you say you felt, and that’s evidence that I’m wrong about what happened).
How, in broad strokes, does one tease out the implication that one cares mainly about happiness and suffering, from the pre-theoretic caring about kids, life, play, etc.?
I mean, at risk of seeming flippant, I just want to say “basically all the values your ‘real person’ holds”?
Like, it’s just all that stuff we both think is good. Play, life, children, exploration; empowering others to get what they want, and freeing them from pointless suffering; understanding, creating, expressing, communicating, …
I’m just… not doing the last step where I abstract that into a mental state, and then replace it with that mental state. The “correctness” comes from Reason, it’s just that the Reason is applied to more greatly empower me to make the world better, to make tradeoffs and prioritizations, to clarify things, to propagate logical implications… For example, say I have an urge to harm someone. I generally decide to nevertheless not harm them, because I disagree with the intuition. Maybe it was put there by evolution fighting some game I don’t want to fight, maybe it was a traumatic reaction I had to something years ago; anyway, I currently believe the world will be better if I don’t do that. If I harm someone, they’ll be less empowered to get what they want; I’ll less live among people who are getting what they want, and sharing with me; etc.
>to some degree our disagreement here is semantic
The merely-lexical ambiguity is irrelevant of course. You responded to the top level post giving your reasons for not taking action re/ cryonics. So we’re just talking about whatever actually affects your behavior. I’m taking sides in your conflict, trying to talk to the part of you that wants to affect the world, against the part of you that wants to prevent you from trying to affect the world (by tricking your good-world-detectors).
>I see no reason to care more about my pre-theoretic good-thing detector than the ‘good-thing detector’ that is my whole process of moral and evaluative reflection and reasoning.
Reflection and reasoning, we can agree these things are good. I’m not attacking reason; I’m trying to implement reason by asking about the reasoning that took you from your pre-theoretic good-thing-detector to your post-theoretic good-thing judgements. I’m pointing out that there seems, prima facie, to be a huge divergence between the two. Do you see the apparent huge divergence? There could be a huge divergence without there being a mistake; that’s sort of the point of reason, to reach conclusions you didn’t already know. It’s just that I don’t at all see the reasoning that led you there, and the conclusions it produced still seem wrong to me. So my question is: what was the reasoning that brought you to the conclusion that, despite what your pre-theoretic good-thing-detectors are aimed at (play, life, etc.), actually what’s a good thing is happiness (contra life)? So far I don’t think you’ve described that reasoning, only stated that its result is that you value happiness. (Which is fine, I haven’t asked so explicitly, and maybe it’s hard to describe.)
If you’re dealing with a blindspot that’s distributed across a group of people, then yes, it’s more effective to talk with people outside that group who don’t share the blindspot, because they’re less likely to collaborate with you to keep the spot blind. Obviously it’s not helpful, or even very possible, to just believe whatever other people tell you (it’s not possible to meaningfully believe something you don’t understand). Does DiAngelo actually say to do that? My impression is no, what she says is about what to do if you’re in a group blindspot.
>I think everyone who has moral or axiological opinions is making the same leap of faith at some point, or else fudging their way around it by conflating the normative and the merely descriptive
This may be right, but we can still notice differences, especially huge ones, and trace back their origins. It actually seems pretty surprising if you and I have wildly, metaphysically disparate values, and at least interesting.
> I don’t see how it implies that I shouldn’t consider happiness to be a fundamentally, intrinsically good thing
Because it’s replacing the thing with your reaction to the thing. Does this make sense, as stated?
What I’m saying is: when we ask “what should I consider to be a fundamentally good thing?”, we have nothing else to appeal to other than (the learned generalizations of) those things our happiness comes from. That is, we’re asking for clarification about what our good-thing-detectors are aimed at. So I’m pointing out that, on the face of it, your stated fundamental values—happiness, non-suffering—are actually very, very different from the pre-theoretic fundamental values, i.e. the things your good-thing-detectors detect, such as having kids, living, nurturing, connecting with people, understanding things, exploring, playing, creating, expressing, etc. Happiness is a mental event; those things happen in the world, or in relation to the world. Does this make sense? This feels like a fundamental point to me, and I’m not sure we’ve gotten shared clarity about it.
>I don’t see anything necessarily unreasonable about wanting everyone, including me, to experience the feeling they get when their ‘world getting better’ module is firing. (And seeing that feeling, rather than whatever triggers it, as the really important thing.)
I mean, it’s not “necessarily unreasonable” in the sense of the orthogonality thesis of values: one could imagine an agent that coherently wants certain mental states to exist. I’m making a weaker claim: it’s just not what you actually value. (Yes, this is in some sense a rude claim, but I’m not sure what else to do, given that it’s how the world seems to me, it’s relevant, and it would be more rude to pretend that’s not my current position. I don’t necessarily think you ought to engage with this as an argument, exactly. It’s more like a hypothesis, which you could come to understand, and by understanding it you could come to recognize it as true or false of yourself; if you want to reject it before understanding it (not saying you’re doing that, just hypothetically), then I don’t see much to be gained by discussing it, though maybe it would help other people.) A reason I think it’s not actually what you value is that I suspect you wouldn’t press a button that would make everyone you love super happy and free of suffering, while none of their material aims (other than happiness) were achieved: they wouldn’t explore or have kids, they wouldn’t play games or tell stories or make things, etc., or in general Live in any normal sense of the word. And you wouldn’t press a button like that for yourself. Would you?
>You also ignore the point about the direction of travel
> The aspiration is good, but it’s still correct to see color if other people are already aggressively seeing color.
That is me agreeing about the direction of travel, and making the point that it’s a mistake to unilaterally “go all the way” while a bunch of other people haven’t gotten on the way. Does this make sense? I don’t see anything in your comment responding to what I said, other than you saying my comment assumes bad faith. Which is true, except I’m not “assuming” bad faith, I’m trying to further explain the hypothesis that’s being presented, namely that you / the context you’re embedded in contains a substrate of bad faith.
Well, the way the agent loses in ASP (Agent Simulates Predictor) is by failing to be updateless about certain logical facts (namely, what the predictor predicts). So from this perspective, it’s a SemiUDT that does update whenever it learns logical facts, and this explains why it defects.
> So it wouldn’t adopt UDT in this situation and would still two-box.
True, it’s always [updateless, on everything after now].
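To make that concrete, here’s a toy Python rendition (the Newcomb-style payoffs and the predictor’s reasoning here are stylized assumptions of mine, not a canonical formalization of ASP):

```python
# Stylized ASP setup: the agent has enough compute to simulate the weaker
# predictor to completion before choosing. Assumed payoffs: the opaque box
# holds 100 iff the predictor predicted one-boxing; the transparent box
# always adds 10.

def predictor(agent_is_updateful):
    """The weak predictor reasons abstractly about the agent's type: an agent
    that updates on the computed prediction will treat it as fixed and two-box."""
    return "two-box" if agent_is_updateful else "one-box"

def updateful_agent():
    prediction = predictor(agent_is_updateful=True)  # learned as a logical fact
    # Having updated, the prediction is a settled fact, so taking the extra
    # box dominates regardless of what that prediction was:
    return "two-box"  # exactly what the predictor foresaw

def updateless_agent():
    # Declines to condition its choice on the simulated prediction; the
    # predictor's abstract reasoning therefore finds a one-boxer.
    return "one-box"

def payoff(choice, prediction):
    return (100 if prediction == "one-box" else 0) + (10 if choice == "two-box" else 0)

print(payoff(updateful_agent(), predictor(True)))    # 10: updating loses
print(payoff(updateless_agent(), predictor(False)))  # 100: updatelessness wins
```

The updateful agent’s conditioning on “what the predictor predicts” is exactly the failure of updatelessness about logical facts described above: by the time it has simulated the predictor, two-boxing dominates, and the predictor foresees this.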