Programmer.
MinusGix
(Note: I’ve only read a few pages so far, so perhaps this is already in the background)
I agree that if the parent comment scenario holds then it is a case of the upload being improper.
However, I also disagree that most humans would naturally generalize our values out of distribution. I think it is very easy for many humans to get sucked into attractors (ideologies that are simplifications of what they truly want; easy lies; the sheer amount of effort ahead stalling out their focus even when the gargantuan task would be worth it) that damage their ability to properly generalize and, importantly, to apply their values. That is, humans have predictable flaws. Then, when you add in self-modification, you open up whole new regimes.
My view is that a very important element of our values is that we do not necessarily endorse all of our behaviors!
I think a smart and self-aware human could sidestep and weaken these issues, but I do think they’re still hard problems. Which is why I’m a fan of (if we get uploads) going “Upload, figure out AI alignment, then have the AI think long and hard about it” as that further sidesteps problems of a human staring too long at the sun. That is, I think it is very hard for a human to directly implement something like CEV themselves, but that a designed mind doesn’t necessarily have the same issues.
As an example: the power-seeking instinct. I don't endorse seeking power in that way, especially if uploaded to try to solve alignment for humanity in general. But given my status as an upload, and lots of time realizing that I have a lot of influence over the world, I think it is plausible that the instinct affects me more and more. I would try to plan around this, but likely do so imperfectly.
A core element is that you expect acausal trade among far more intelligent agents, such as AGIs or even ASIs, as well as that they'll be using approximations.
Problem 1: There isn't going to be much Darwinian selection pressure against a civilization that can rearrange stars and terraform planets. I'm of the opinion that it has mostly stopped mattering now, and will only matter less over time, as long as we don't end up in an "everyone has an AI and competes in a race to the bottom" scenario. I don't think it is that odd that an ASI could resist selection pressures: it operates on a faster time-scale and can apply more intelligent optimization than evolution can, toward the goal of keeping itself, and whatever civilization it manages, stable.
Problem 2: I find it somewhat plausible that there are some sufficiently pinned-down variables that could get us to a more objective measure. However, I don't think that is needed, and most presentations of this don't go for an objective distribution.
So, to me, using a UTM that is informed by our own physics and reality is fine. This presumably results in more of a 'trading nearby' sense (the typical example being across branches, but in more generality), and you have more information about how those nearby universes look anyway. The downside is that whatever true distribution there is, you're not trading directly against it. But if that is too hard for an ASI in our universe to manage, then presumably many agents aren't managing to acausally trade against the true distribution regardless.
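(For reference, a minimal sketch of the standard universal-prior construction being gestured at here; this is textbook Solomonoff-induction background rather than anything specific to this thread.)

```latex
% Universal prior induced by a universal Turing machine U:
% each string/history x is weighted by the programs that output it.
m_U(x) = \sum_{p \,:\, U(p) = x} 2^{-|p|}
% Invariance theorem: for any other universal machine V there is a
% constant c_{UV} > 0, independent of x, such that
m_U(x) \ge c_{UV}\, m_V(x)
% So priors from different UTMs agree only up to a multiplicative
% constant, which can be enormous; this is why choosing a UTM informed
% by our own physics changes which 'nearby' universes dominate the trade.
```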
I think you’re referring to their previous work? Or you might find it relevant if you didn’t run into it. https://www.lesswrong.com/posts/ifechgnJRtJdduFGC/emergent-misalignment-narrow-finetuning-can-produce-broadly
If you were pessimistic about LLMs learning a general concept of good/bad, then yes, that should update you. However, I think it still has the main core problems. If you are doing a simple continual learning loop (LLM → output → retrain to accumulate knowledge; analogous to ICL), then we can ask how robust this process is. Do the values of how to behave drastically diverge? Are there attractors over a hundred days of output that it is dragged towards that aren't aligned at all? Can it be jail-broken, wittingly or not, by getting the model to produce garbage responses that it is then trained on? And then there are arguments like 'does this hold up under reflection?' or 'does it attach itself to the concept of good, or to a ChatGPT-influenced good (or evil)?'. So while LLMs being capable of learning good is, well, good, there are still big targeting, resolution, and reflection issues.
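To make the loop concrete, here is a minimal sketch in Python. Every name in it (generate, retrain, value_probe, DRIFT_THRESHOLD) is a hypothetical stand-in for illustration, not any real API; the point is only where the robustness question lives.

```python
DRIFT_THRESHOLD = 0.2  # hypothetical tolerance for divergence from intended values

def generate(model, task):
    """Stand-in for sampling an output from the model on a task."""
    return f"{model}:{task}"

def retrain(model, history):
    """Stand-in for updating the model on its own accumulated outputs."""
    return f"{model}+{len(history)}"

def value_probe(model):
    """Stand-in for an evaluation measuring value drift; 0.0 means no drift."""
    return 0.0

def continual_learning_loop(model, tasks, days=100):
    history = []
    for day in range(days):
        # The model produces outputs in deployment...
        history.extend(generate(model, task) for task in tasks)
        # ...and those outputs are fed back in as training data.
        model = retrain(model, history)
        # The open question: over a hundred days of this, does the model stay
        # near its intended values, or get dragged toward a misaligned
        # attractor (e.g. via garbage/jailbroken outputs it then trains on)?
        if value_probe(model) > DRIFT_THRESHOLD:
            break  # drift detected; in reality you'd intervene or roll back
    return model
```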
For this post specifically, I believe it to be bad news. It provides evidence that subtle reward-hacking scenarios encourage the model to act misaligned in a more general manner. It is likely quite nontrivial to get rid of reward-hacking-like behavior in our larger and larger training runs. So if the model gets into a period where reward hacking is rewarded (a continual-learning scenario is the easiest to imagine, but it could happen even in training), then it may drastically change its behavior.
I have some of the same feeling, but internally I’ve mostly pinned it to two prongs of repetition and ~status.
ChatGPT's writing is increasingly disliked by those who recognize it. The prose is poor in various ways, but I've certainly read worse and not been so off-put. Nor am I as off-put when I first use a new model, though I increasingly notice its flaws over the next few weeks. The main aspect is that the generated prose is repetitive across writings, which ensures we can pick up on the pattern and makes the flaws easy to predict. Just as I avoid much generic power-fantasy fiction because it is very predictable in how it will fall short, even though much of it would still be positive value if I didn't have other things to do with my time.
So, I think a substantial part is that of recognizing the style, there being flaws you've seen in many images in the past, and then, regardless of whether this specific image is that problematic, the mind associating it with negative instances and with being overly predictable.
Status-wise, this is not entirely a negative status game. A generated image is a sign that it probably was not much effort for the person making it, and the mind has learned to associate art with effort + status to a degree, even if it is the indirect effort + status of the original artist whose work is being referenced. And so it is easy to learn a negative feeling towards these, which attaches itself to the noticeable shared repetition/tone. Just like some people dislike pop partly due to status considerations, like it being made by celebrities, or countersignaling of not wanting to go for the most popular thing, and then that feeds into an actual dislike for that style of musical art.
But this activates too easily, a misfiring set of instincts, so I've deliberately tamped it down in myself; because I realized that there are plenty of images which, five years ago, I would simply have been impressed by and found visually appealing. I think this is an instinct that is to a degree real (generated images can be poorly made), while also feeding on itself in a way that disconnects it from past preferences. I don't think the poorly made images should notably influence my enjoyment of better-quality images, even if there is a shared noticeable core. So that's my suggestion.
Anecdotally, I would perceive "Bowing out of this thread" as the more negative response, because it encapsulates both the topic and the quality of my response or my own behavior, while "not worth getting into" is mostly about the worth of the object-level matter. (Though remarking on the behavior of the person you're arguing with is a reasonable thing to do, I'm not sure that interpretation is what you intend.)
I disagree. Posts seem to have an outsized effect and will often be read a bunch before any solid criticisms appear. Then they are spread even given high-quality rebuttals… if those ever materialize.
I also think you're referring to a group of people who typically write high-quality posts and handle criticism well, while others don't handle criticism well; Duncan is an example of this, despite my liking many of his posts. As for Said specifically, I've been annoyed at reading his argumentation a few times, but then also found him saying something obvious and insightful that no one else pointed out anywhere in the comments. Losing that is unfortunate. I don't think there's enough "this seems wrong or questionable, why do you believe this?"
Said is definitely rougher than I'd like, but I also think there's a hole there that people are hesitant to fill. So I do agree with Wei that you'll just get less criticism, especially since I feel like LessWrong has been growing implicitly less favorable towards quality critiques and more favorable towards vibey critiques. That is, another dangerous attractor is the Twitter/X attractor, wherein arguments do exist but matter less to the overall discourse than whether someone puts out something that directionally 'sounds good'. I think this is much more likely than the sneer attractor or the LinkedIn attractor.
I also think that while the frontpage comments section has been good for surfacing critique, it substantially encourages the "this sounds like the right vibe" mode, as well as a mentality of reading the comments before the post, encouraging faction mentality.
Because Said is an important user who provides criticism/commentary across many years. This is not about some random new user, which is why there is a long post in the first place rather than him being silently banned.
Alicorn is raising a legitimate point: that it is easy to get complaints about a user who is critical of others, that we don't have much information about the magnitude, and that it is far harder to get information about users who think his posts are useful. LessWrong isn't a democracy, but these are legitimate questions to ask because they are about what kind of culture (as Habryka talks about) LW is trying to create.
I find this surprising. The typical objections I'd expect are 1) disbelief that models are conscious in the first place; 2) believing this is mostly signaling (so whether or not model welfare is good, it is actually a negative update about the trustworthiness of the company); 3) that it is costly to do this, or indicates high-cost efforts in the future; and 4) doubts about effectiveness.
I suspect you're running into selection issues of who you talked to. I'd expect #1 to come up as the default reason, but possibly the people you talked to were taking the precautionary principle seriously enough to avoid that.
The objections you see might come from #3: that they don't view this as a one-off cheap piece of code, they view it as something Anthropic will hire people for (which they have), which "takes" money away from more worthwhile and sure bets. This is to some degree true, though I find those objections odd, as Anthropic isn't going to spend on those groups anyway. However, for topics like furthering AI capabilities or AI safety, well, I do think there is a cost there.
How did you arrive at this belief? Like, the thing I would be concerned with is "How do I know that Russell's teapot isn't just beyond my current horizon?"
Empirical evidence of being more in tune with my own emotions, generally better introspection, and better modeling of why others make decisions, compared to others. I have no belief that I'm perfect at this, but I do think I'm generally good at it and that I'm not missing a 'height' component to my understanding.
Is it possible, do you think, that the way you’re doing analysis isn’t sufficient, and that if you were to be more careful and thorough, or otherwise did things differently, your experience would be different? If not, how do you rule this out, exactly? How do you explain others who are able to do this?
Because (I believe) the impulse to dismiss any sort of negativity or blame once you understand the causes deeply enough is one I've noticed in myself. I do not believe it to be a level of understanding that I've failed to reach; I've dismissed it because it seems an improper framing.
At times the reason for this comes from a specific grappling with determinism and choice that I disagree with.
For others, the originating cause is considering kindness as automatically linked with empathy, with that unconsciously shaping what people think is acceptable from empathy.
In your case, some of it is tying it purely to prediction, which I disagree with, because of some mix of kindness-being-the-focus, determinism, a feeling that once it has been explained in terms of the component parts there's nothing left, and other factors that I don't know because they haven't been elucidated. Empirical exploration as in your example can be explanatory. However, I have thought about motivation and the underlying reasons at a fine granularity plenty of times (impulses that form into habits, social media optimizing for short-form behaviors, the heuristics humans come with which can make doing it now hard to weigh against the cost of doing it a week from now, how all of those constrain the mind...), which makes me skeptical. The idea of 'shift the negativity elsewhere' is not new, but given your existing examples it does not convince me that if I spent an hour with you on this we would get anywhere.
“because they’re bad/lazy/stupid”/”they shouldn’t have” or whatever you want to round it to, but these things are semantic stopsigns, not irreducible explanations.
This, for example, is a misunderstanding of my position or the level of analysis that I’m speaking of. Wherein I am not stopping there, as I mentally consider complex social cause and effect and still feel negative about the choices they’ve made.
Yet as you grieve, these things come up less and less frequently. Over time, you run out of errant predictions like “It’s gonna be fun to see Benny when—Oh fuck, no, that’s not happening”. Eventually, you can talk about their death like it’s just another thing that is, because it is.
Grief like this exists, but I don't agree that it is pure predictive remembrance. There is grief which lasts for a time and then fades away, not because my lower-level beliefs still predict seeing them. If I'm away from home and a pet dies, I'm still sad: not because of prediction error, but because I want (and wants are not predictions) the pet to be alive and fine, and they aren't. Because it is bad, to be concise.
You could try arguing that this is 'prediction that my mental model will say they are alive and well', with two parts of myself in disagreement, but that seems very hard to assess for accuracy as an explanation, and I think it starts to stretch the meaning of prediction error. Nor does the implication follow that 'fully knowing the causes' carves away negative emotion.
I'm holding the goal posts even further forward, though. Friendly listening is one thing, but I'm talking about pointing out that they're acting foolish and getting immediate laughter in recognition that you're right. This is the level of ability I'm pointing at. This is what's there to aim for, which is enabled by sufficiently clear maps.
This is more about socialization ability, though having a clear map helps. I've done this before, with parents and when joking with a friend about his progress on a project, but I do not do so regularly, nor could I do it in arbitrary situations. Joking itself is only sometimes the right route; the more general capability is working a push into normal conversation, with joking being one tool in that toolbox. I don't really accept the implication 'and thus you are mismodeling via negative emotions if you cannot do that consistently'. I can be mismodeling to the degree that I don't know precisely what words will satisfy them, but that can be due to social abilities.
The big thing I was hoping you'd notice is that I was trying to make my claims so outrageous and specific that you'd respond "You can't say this shit without providing receipts, man! So let's see them!". I was daring you to challenge me to provide evidence. I wonder if maybe you thought I was exaggerating, or otherwise rounding my claims down to something less absurd and falsifiable?
When you don't provide much argumentation, I don't go 'huh, guess I need to prod them for argumentation'; I go 'ah, unfortunate, I will try responding to the crunchy parts in the interests of good conversation, but will continue on'. That is, the onus is on you to provide reasons. I did remark that you were asserting without much backing.
I was taking you literally, and I've seen plenty of people fall back without engaging (I've definitely done it during the span of this discussion), and was interpreting your motivations through that. 'I am playing a game to poke and prod at you' is, uh.....
Anyway, there are a few things in your comment that suggest you might not be having fun here. If that’s the case, I’m sorry about that. No need to continue if you don’t want, and no hard feelings either way.
A good chunk of it is the ~condescension. Repeated insistence while seeming to mostly just continue on the same line of thought without really engaging where I elaborate, the goalpost gotcha, and then the bit about Claude right after you said it was to 'test' me; it being meant to prod me is quite annoying in and of itself.
Of course, I think you have more positive intent behind that. Pushing me to test myself empirically, or pushing me to push back on you so that you can then push back on me to provide empirical tests (?), or perhaps trying to use it as an empathy test for whether I understand you. I'm skeptical of you really understanding my position given your replies. I feel like I'm being better at engaging at the direct level, while you're often doing 'you would understand if you actually tried', when I believe I have tried to a substantial degree, even if nothing precisely like 'spend two hours mapping the cause and effect of how a person came to these actions'.
The thing that I was missing then, and which you’re missing now, is that the bar for deep careful analysis is just a lot higher than you think (or most anyone thinks). It’s often reasonable to skimp out and leave it as “because they’re bad/lazy/stupid”/”they shouldn’t have” or whatever you want to round it to, but these things are semantic stopsigns, not irreducible explanations.
No, I believe I'm fully aware of the level of deep careful analysis, and I understand why it pushes some people to sweep all facets of negativity or blame away; I just think they're confused, because their understanding of emotions/relations/causality hasn't updated properly alongside their new understanding of determinism.
“I’m annoyed that the calculator doesn’t work… without batteries?” How do you finish the statement of annoyance?
Because I wanted the calculator to work, I think it is a good thing for calculators in stores to work, I am frustrated that the calculator didn’t work… none of this is exotic, nor is it purely prediction error. (nor do prediction error related emotions have to go away once you’ve explained the error… I still feel emotional pain when a pet dies even if I realize all the causes why; why would that not extend to other emotions related to prediction error?)
Empirically, what happens, is that you can keep going and keep going, until you can’t, and at that point there’s just no more negative around that spot because it’s been crowded out. It doesn’t matter if it’s annoyance, or sadness, or even severe physical pain. If you do your analysis well, the experience shifts, and loses its negativity.
You assert this, but I still don't agree with it. I've thought long and hard about people before and the causes that make them do things, but no, this does not match my experience. I understand the impulse that encourages sweeping away negative emotions once you've found an explanation, like realizing that humanity's lack of coordination is a big problem, but I can still very well feel negative emotions about that despite there being an explanation.
In other words, there are reasons for their choices. Do you understand why they chose the way they did?
Relatively often? Yes. I don't blame people for not outputting the code for an aligned AGI, because becoming the kind of person who could do that is something that would have been absurdly hard to reinforce in yourself.
If someone has a disease that makes so they struggle to do much at all, I am going to judge them a hell of a lot less. Most humans have the “disease” that they can’t just smash out the code for an aligned AGI.
I can understand why someone is not investing more time studying, and I can even look at myself and relatively well pin down why, and why it is hard to get over that hump… I just don’t dismiss the negative feeling even though I understand why. They ‘could have’, because the process-that-makes-their-decisions is them and not some separate third-thing.
I fail to study when I should because of a combination of seeking short-term-optimized positive feelings, which leads me to watching YouTube or skimming X; a desire for faster intellectual rewards, more easily gotten from arguing on Reddit (or LessWrong) than from slowly reading through a math paper; fear of failure; and much more. Yet I still consider that bad: even given a full causal explanation, they would still have been my choices.
Regardless, I do not have issues getting along with someone even if I experience negative emotions about how they’ve failed to reach farther in the past—just like I can do so even if their behavior, appearance, and so on are displeasing. This will be easier if I do something vaguely like John’s move of ‘thinking of them like a cat’, but it is not necessary for me to be polite and friendly.
Notice the movement of goal posts here? I'm talking about successfully helping people, you're saying you can "get along". Getting along is easy. I'm sure you can offer what passes as empathy to the girl with the nail in her head, instead of fighting her like a belligerent dummy.
I don't have issues with helping people; the "goalposts" moved forward again there, despite nothing in my sentence meaning I can't help people. My usage of 'get along' was not the bare-minimum meaning.
Getting along with people in the nail scenario often means being friendly and listening to them. I can very well do that, and have done it many times before, while still thinking their individual choices are foolish.
I don’t think your comment has supplied much more beyond further assertions that I must surely not be thinking things through.
Yes. But also that people are still making those choices.
Yes. But I would point out that 'punishment' in the moral sense of 'hurt those who do great wrongs' still holds just fine in determinism, for the same reasons it originally did, though I personally am not much of a fan.
Yes, just like I can be happy in a situation where that doesn’t help me.
“if my brain was in their body, then I wouldn’t...” or “if I had their resources, then I wouldn’t...”, which is saying you’re only [80]% that person. You’re leaving out a part of them that made them who they are.
No, it is more that I am evaluating from multiple levels. There are:
Basic empathy: knowing their own standards and feeling them, understanding them.
'Idealized empathy': an extended sort of classical empathy where I consider their higher goals, which is why I often mention ideals. People have dreams they fail to reach, and I'd love them to reach further, yet it disappoints me when they falter, because my empathy reaches towards those too.
Values: then of course my own values, which I guess could be considered the '80% that person', but I think I keep the levels separate; all the considerations have to come together in the end. I do have values about what they do, and how their mind succeeds.
Some commenters seemingly don’t consider the higher ideals sort or they think of most people in terms of short-term values; others are ignoring the lens of their own values.
So I think I’m doing multiple levels of emulation, of by-my-values, in-the-moment, reflection, etc. They all inform my emotions about the person.
I remember being 9 years old & being sad that my friend wasn’t going to heaven. I even thought “If I was born exactly like them, I would’ve made all the same choices & had the same experiences, and not believe in God”. I still think that if I’m 100% someone else, then I would end up exactly as they are.
And I agree. If I ‘became’ someone I was empathizing with entirely then I would make all their choices. However, I don’t consider that notably relevant! They took those actions, yes influenced by all there is in the world, but what else would influence them? They are not outside physics. Those choices were there, and all the factors that make up them as a person were what decided their actions.
If I came back to a factory the next day and noticed the steam engine had failed, I would consider that negative even knowing that there must have been a long chain of cause and effect. I'll try fixing the causes… which usually ends up routing through whatever human mind was meant to work on the steam engine, as we are very powerful reflective systems. For human minds themselves that make poor choices? That often routes back through themselves.
I do think that the hard-determinist stance often, though of course not always, comes from post-Christian-style thought which views the soul as atomically special; having dropped the soul, they still think of themselves as 'needing to be' outside physics in some important sense, rather than fully adapting their ontology. Choices made within determinism get treated as equivalent to being tied up by ropes, when there is actually a distinction between the two scenarios.
Now, you could still argue #2, that these negative emotions set correct incentives. I've only heard second-hand of extreme situations where that worked [1]; most of the time it backfires.
A negative emotion can still push me to spend more effort on someone, though it usually needs to be paired with a belief that they could become better. Just because you have a negative emotion doesn’t mean you only output negative-emotion flavored content. I’ll generally be kind to people even if I think their choices are substantially flawed and that they could improve themselves.
I do think that the example of your teacher is one that can work, I’ve done it at least once though not in person, and it helped but it definitely isn’t my central route. This is effectively the ‘staging an intervention’ methodology, and it can be effective but requires knowledge and benefits greatly from being able to push the person.
But, as John is making the point, a negative emotion may not be what people are wanting, because I’m not going to have a strong kindness about how hard someone’s choices were… when I don’t respect those choices in the first place. However, giving them full positive empathy is not necessarily good either, it can feel nice but rarely fixes things. Which is why you focus on ‘fixing things’, advice, pointing out where they’ve faltered, and more if you think they’ll be receptive. They often won’t be, because most people have a mix of embarrassment at these kinds of conversations and a push to ignore them.
I've considered that myself before; part of the answer I eventually arrived at was that my standards don't have to lower. I can just have high standards. Just as my morality can be demanding regardless of the fact that I fail to reach its demands.
That is, my answer to the Draco-style thing is that it is good to encourage him to get better. To notice that he was worse, that he's gotten better, and that this is an improvement. Just as a hitman-for-hire giving that up because of a moral revelation and becoming merely a sneak-thief is still a win.
They are still a person who fails, who does not reach my bar; I hold disgust for their actions even in their newly-better state, but I can still encourage them to become better. I still hold my bar higher than where they are.
The main problematic part of this stance is that of linking your emotions and actions to it, of feeling disquiet that you and everyone around you fails to reach the brilliant gleaming stars they could be, and then still being happy. Trying to improve, not out of guilt, but out of a sheer desire to do better, to see the world grow.
I really liked Replacing Guilt by So8res, not just in the avoiding relying on guilt part, but of instilling a view of reaching for more.
Hermione's issue is one of blame and of not quite understanding change: of still blaming Draco for his actions before he improved himself, of thinking that because Draco had failed so harshly he couldn't be recovered. Whereas Harry views Draco as someone he can convince and tempt into becoming a better person, because Draco can choose to be better, and his failures are not intrinsic to him as a person. The issue is not precisely her blame (I can still be angry at someone for their actions before they changed, though it loses impact), but rather the lack of a drive to push Draco to a higher point. So the issue is not the bar, but rather the willingness/belief in dragging them up to the bar.
If your reasoning results in “I can’t have negative emotions about things where I deeply understand the causes”, then I think you’ve made a misstep.
You could, yes, but it would require mismodeling them as someone who could do more than they actually can given the very real limitations which you may or may not understand yet.
They could have done more. The choices were there in front of them, and they failed to choose them.
I will feel more positive-flavored emotions like kindness/sadness if they're pushed into hard choices where they have to decide between becoming closer to their ideal or putting food on the table; with the converse of feeling substantially less positive when the answer is that they were dazedly browsing social media. With enough understanding I could trace back the route which led to them relying more and more on social media (it fills some hole of socialization they lack, it is easy to do, …) and still retain my negative emotions while holding this deeper understanding.
Accurately modeling people, and credibly conveying these accurate models so that they can recognize and trust that you have accurately modeled them, is incredibly important for helping people. Good luck getting people to open themselves to your help while you view them as disgusting.
I disagree that I am inaccurately modeling them, because I dispute the absolute connection between negative emotion and prediction error in the first place. I can understand them. I can accurately feel the mental pushes that push against their mind; I’ve felt them myself many times. And yet still be disquieted, disappointed in their actions.
Word-choice implication nitpick: common usage of 'lower expectations' means a mix of literal prediction and moral/behavioral standards. I might have a 'low expectation' in the sense that a friend rarely arrives on time, while still holding them to 'high expectations' in the what-is-good sense!
This is just kicking the can one step further. You can still be annoyed, but you can no longer be annoyed at “the stupid calculator!” for not working. You have to be annoyed at the company for not including batteries—if you can pull that one off.
No, I can be annoyed at the calculator and the company. There's no need for my annoyance to be moved down the chain as if I only have 1 Unit of Annoyance to divvy out. Or you can view it as cumulative, if that makes more sense: it ties back into the overall emotions about the calculator. If I learn that supplying batteries is illegal, my annoyance with the company does decrease, but then it moves primarily to the authorities. Some remains, and I'm still annoyed at the calculator despite understanding why it doesn't have a battery.
I do think the calculator metaphor starts to break apart, because a calculator is not the system that feeds-back-on-itself to then decide on no batteries.
Humans are complex, and I love them for it, their decisions, mindset, observations, thought processes, and so much more loop back in on themselves to shape the actions they take in the world. …That includes both their excellent actions where they do great things, reach farther, become closer to their ideals… as well as when they falter, when they get ground down by short-term optimization leaving them unable to focus on ways to improve themselves, and find themselves falling short. But that does mean my negative emotions will be more centered on humans, on their beliefs and more. Some of this negative evaluation bleeds off to social media companies optimizing short-form content feeds, or society in vague generality for lack of ambition, but as I said before it isn’t 1 Unit of Annoyance to spread around like jam.
That is, you're treating this like the concept of blame, when negative emotions and blame are not necessarily the same thing. Paired with this: you appear to be implicitly taking a hard-determinist sort of stance, wherein concepts like blame and 'being able to choose otherwise' start dissolving, but I find that direction questionable in the first place. We can still judge people's decisions; it is normal that their actions are influenced by their interactions with the world, and I can still feel negative emotions about their choices: that they were not able to do better, that their decisions did not go elsewise, that they failed to reinforce good decisions, and more.
Interesting. I really enjoyed Replacing Guilt, but if anything it made me more willing/able/fine-with experiencing a disquiet or deep disappointment at others' actions. It made the ways to improve more obvious while helping to detach them from guilt-based motivation. I was still, as John phrases it, having conditional self-love, but it was less short-term and less based around guilt, still about reaching farther and doing more.
I think a lot of people automatically connect empathic kindness to a 'this is fine' stance. I see a lot of it in how people phrase things in the comments of this post, and I notice it in myself, because I, well, empathize with John, having similar feelings at times even if seemingly not as strong.
So, it can feel risky to get rid of that, because in a way it is part of how I keep my standards up: that I desire/require more from people, that I dream for both myself and them to be better; some amount of disquiet or even disgust is a useful tool there. I'm still polite, but it serves as a fuel. It is certainly possible to get around without that. However, I look at various people I respect who have high standards, and they seem to have some degree of this (though perhaps they don't conceptualize it as related to empathy); and then I look at others who I do see lowering their standards and becoming more wishy-washy over time due to pure ~positive-tinged empathy. Sadness at their faltering is a more passive drive in a lot of ways; disgust helps both in pushing oneself to improve and, in my experience, in convincing friends of mine to try for more. Though, of course, I am going to be helpful and friendly even as I find their faltering disquieting. So deliberately switching in such a way feels like it risks the part of the mind that maintains its own standards.
Huh, I think most people have problems like this, though they're at times more self-aware about them than in the nail video. Many people, including myself, have flaws that would require an investment of time/effort but, if addressed (even as one-time investments for some), would give good improvements to their life, whether through better mental health or through being closer to their ideal self. Classic examples being cleaning your room, fixing the part of the house that's breaking down which everyone keeps putting off, exercising more, reading that math paper now instead of a month from now, asking someone out, and so on.
The nail video is hyperbolic, but I don’t see it as excessively so, and I do think it illustrates the core issue of how people relate to their own minds while not necessarily being willing/able to go actually fix it or potentially recognize it.
Understanding crowds out prediction error; it does not necessarily crowd out negative emotions, which is part of the point of this article.
That is, I understand the last paragraph, but it does not necessarily follow with 'thus I feel kindness'. There may be steps to take to try to help them up, but that does not necessitate kindness; I can feel disgust at someone I know could do so much more while still helping them. Possibly one phrasing of it, based on your calculator example: there's no need for a 'lower expectations' step. I can still have the dominant negative emotion that the calculator and the calculator company did not include a battery, even if I understand why.
One missing part of the post is causing the largest degree of disconnect: the lack of explaining the internal reasons/beliefs/ways-they-were-shaped that made them like this, and of understanding that. Regardless of whether John actually has an issue with empathizing, or whether a short post just left out the obvious, I do think the core argument still importantly holds.
You can understand and feel people’s emotions without their own opinion on their mental state and emotions becoming a dominant factor, which is a core confusion repeated in the other comments.
Someone comes to me with a nail in their head? I can understand that they feel stressed, tired out, are used to the pain and have stopped considering it as something to fix… while still having my primary emotion be a strong disquiet that they're faltering. That means my empathy might be unkind, because how they are shaped and how they handle their life is evaluated through my empathy and my own values. I can construct a slightly more idealized version of them in my head, knowing that a decent chunk of their problems would be solved without the nail, that they'd be happier, while knowing that their current way of thinking poisons the idea of removing the nail itself. I can also construct a version that is elaborated upon by my own values, because I never just consider their mind; I also consider how I think of them, how I interact with them, how they make me feel. And my values importantly intercede here: deliberately having empathy, considering what they feel in depth and with focus, can make me dislike them more.
Like in John's examples, they've failed to live up to his ideals, and also very likely failed to live up to any of their own ideals. Most people want more, desire more, but short-sighted near-term optimization has ground away the parts of them that reach for more. It is not a strange or exotic observation that people fail to reach their ideals. That is, I think a lot of the responses to John's post are making the same mistake he's objecting to: that it must be a kindness or it is fake empathy. I can understand and empathize with the reasons that they can barely consider removing the nail, while the final result of all my empathy is still that I know they've failed. I'm saddened for them yet disgusted by their failures, by the knowledge of what they could have been, of knowing that they're constrained in such ways, like a nail in their skull they "just" need to break free of.
An intuition pump here that is more dramatic than college students slacking: a government official who became corrupt due to needing money. Given detailed knowledge of their life I would empathize, see how they got put into unfavorable positions while trying to pay medical bills, how they desperately tried to convince themselves, how it grew worse over time, how they failed to report on themselves in moments of awareness… and then I very likely would still be disgusted by what they've become, by how they've failed to meet my own standards and likely their own. Even harsher, potentially, knowing the ways they've faltered and considered and then faltered again and again (even though I understand why, and may have even made similar choices in such a scenario).
I don't hold this as strongly as John seemingly does. I have noticed this in myself too: that I emotionally understand and know why they do not adjust themselves to do better (because I fail at fully applying that myself), but that does not make me more kind, precisely. Some mix of not believing in my heart that kindness is the right response, because people often connect kindness to the emotion of 'this is fine', and also a general belief that it simply is a lacking, that understanding and feeling it myself does not mean I hold a positive emotion towards it.
To be eloquent: A god’s eye view where all flaws are drawn clear at hand, each of them hard to justify even if understandable, drawn into a tapestry of regrets. Seeing the whole tapestry just makes the pattern clearer.
Empathy does not just stop when you consider that their life shaped them that way! Empathy is part of emulating their mind, why they might behave in this way, and then dragging that back to how your mind understands things. Empathy is still and should be integrated within your understanding of the world, your values, even if you understand that they are shaped differently. You can still be displeased, unhappy that they’ve been shaped in a certain way, disquieted because they do not grasp for all that they can be. I feel empathy for someone who has faltered, failed to reach even if they do not even want to reach for more because of how they’ve been affected by life.
As well, even if I take the strict latter definition and proceed from their mental frame entirely, that still entails some degree of disquiet. People often wish that they could be more, find things easier, be less stressed but then fail to take routes that lead to that which are visible from the outside but hard to see from the inside.
From my current stance, it is plausible, because we haven’t settled how we think of aliens (especially those who are significantly outside of our behaviors) philosophically. I most likely don’t respect arbitrary intelligent agents, as I’d be for getting rid of a vulnerable paperclipper if we found one on the far edges of the galaxy.
Then, I think you're not mentally extrapolating how much that computronium would give. From our current perspective the logic makes sense: we upload the aliens regardless, even if we respect their preferences beyond that, because it lets us simulate vastly more aliens or other humans at the same time.
I expect we care about their preferences. However, those preferences will end up to some degree subordinate to our own preferences: the obvious example being that we probably wouldn't allow them an ASI, depending on how attack/defense works, but the other being that we may upload them regardless due to the sheer benefits.
Beyond that, I disagree about how common that motivation is. I think the kind of learning we know naturally results in it (limited social agents modeling each other in an iterated environment) is currently not on track to apply to AI… and another route is to 'just care strategically', especially if you're intelligent enough. I feel this is extrapolating a relatively modern human line of thought to arbitrary kinds of minds.