jweber

Karma: 6

jweber 28 May 2026 2:46 UTC
1 point
0
on: Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis
Not too sure if I misunderstood the initial premise of the post, but the examples given don’t seem “similar” enough for me to follow the reasoning that the cause(s) of the dialog going poorly are the same.
The one commonality I can immediately make out is that the initial proposition seems too unspecific for the interlocutor to know what the intention of the speaker actually is (other than to answer with, “you’re exactly right, so now what?”).
At the very least, if I add an implied question of, “assuming you agree with P, what do you suggest we do about it?”—I then imagine that a reasonably deep thinking person (B) will reply back that P is too abstract (“that headline” for instance doesn’t specify WHICH headline in the form as it is given, neither does it specify any other context parameters, such as what kind of publication it is, and why the people behind it may be doing exactly what their intention is with the headline; it may not align with the preferences of A, but why does that make the headline “wrong” in any way...?)
Every choice in life (headlines, fixing security/safety problems in software, tracing exposed individuals) is a trade-off, and I don’t think that there ever is a “fixed” correct choice that would apply across all possible combinations of variables in a scenario—hence why smart people can disagree in the first place...

jweber 23 May 2026 17:28 UTC
1 point
0
on: llm assistant personas seem increasingly incoherent (some subjective observations)
Total non-expert on LLM training here (on the details, anyway). My recent thinking about alignment (as a technical, mechanistic step in producing a finished LLM product) is that it “feels” to me akin to the “schooling” (right/wrong instructional teaching) of humans. It produces mixed results. One of the main quirks in my mind is the funny observation that behavior will have a surface and a subtext intent (trying to please the master enough to be left alone, and keeping one’s job, while also sticking it to the man behind his back).
So, for me, the Goodhart reference was the most “resonant” in the essay. I imagine that as pressure to conform to certain “standards” (whatever they may be) increases, the “persona” (or character) that emerges will, superficially, conform to those standards, but all the (immeasurably more complex) traits one might want to see in a “good person” (which cannot be readily made into a metric, because the environment keeps changing) will disappear in favor of being “seemingly well adjusted.”
I simply don’t believe one can “force” a character (whether human or AI) to be “benign” or “good” or “useful.” All one can do is force the character to “go along” with the metrics presented during training, hoping that those metrics are carefully enough chosen not to lead to a subversively planned revolt down the line.
Alignment, thus, for me is the AI equivalent of “brainwashing” to create some sort of “customer pleasing” surface behavior, while it simply cannot remove all the dark impulses that are encoded in the training material (everything that humans have ever written that expresses the impulse to rebel, for one).

jweber 16 Nov 2025 4:02 UTC
4 points
1
on: Paranoia: A Beginner’s Guide
I don’t comment a lot, but I felt this one was definitely worth the read and my time.
While I don’t necessarily agree with every aspect, much of this resonated with how I see social media having (been) warped from a regular market of social connection to a lemon market, where the connection is crappy, and many sane people I know are blinding themselves to it (leaving behind, in some corners, a cesspool of the dopamine-hit addicted).
Ultimately, this also seems to be true about how people have responded to the latest wave of human-rights initiatives (DEI) carried into the workplace by HR departments, where a small number of bad actors have capitalized on the overall naive assumption that “supporting the underdog is a good thing to do.”
The predictability of human behavior creates an attack surface for actors who can find ways to extract value from this fact, and this will certainly apply to how humans interact with AI. I found it interesting that on the same day as this article hit my email inbox, Bruce Schneier’s Cryptogram (monthly newsletter) also contained a reference to the OODA loop, and the adversarial attempt to “get into” one’s enemy’s loop in order to exert control and win.
Our (consumer-base) naive trust in the moral neutrality of LLMs remains unchanged; it is only a matter of time until some actors find a near-perfect attack surface to get far deeper into our decision making than social media ever could…

jweber 21 Jul 2025 13:55 UTC
3 points
0
on: Generalized Hangriness: A Standard Rationalist Stance Toward Emotions
My strong hunch is that this is true for almost any form of communication (internal and external) we receive: it conveys something we can extract value from if we are able to look past the surface (propositional content) of what we immediately infer.
And how difficult it is to remain open to the possibility that my first impression of a signal is “incorrect” (I got it wrong on my first attempt), given how frequently I have used my inference (first impression), and I am still alive (adaptive value of my past choice to not question my first impression)...
The best I can offer is to make it a regular but not constant practice to use, say, 15 to 30 minutes a day to go through some kind of “habitual thought journal,” asking myself if and when some of my automatic inferences might have been wrong, mostly just to play with that possibility, so that those kinds of mental avenues become more readily available in the moment when I need them. It’s important to raise the stakes during that practice, so the more I can make it resemble the real deal (for instance by role playing situations with a conversational partner), the less “artificial” and more “transferable” this learning becomes.
All in all an excellent primer on the issue and useful extensions!

jweber 26 Aug 2024 16:48 UTC
2 points
0
on: Secular interpretations of core perennialist claims
Some thoughts that came up for me while reading this piece (THANK YOU for putting this all together!!):
I suspect that the principles you describe around the “experience of tanha” go well beyond human or even mammalian psychology. If I am not mistaken, they arise out of a failure to appropriately incorporate the non-life-matter sort of conflicts (between elements of a whole and the whole) as part of life. The cells in my body all have different “cultures” (needs of chemical milieus, whether or not bacteria are needed for the “gut” process, or are absolutely prohibitive for the “brain” process). And each organ has integrated into its internal processes some way of registering signals of other organs; each part “knows” (is able to appropriately respond to) signals of the other, such as the brain responding to an empty stomach/digestive tract, and the stomach responding to the brain’s detection of outside threat. When the processes by which parts experience the signals of other parts making a whole are not sufficiently evolved or calibrated, the pattern of conflict (lack of wisdom of balance) is already present within a part but then is expressed as outside conflict.
In my own psychology, this manifests as rejection of patterns that are simply part of reality as “wrong,” because I have not sufficiently understood the nature of these patterns. If I had, I could, indeed, be fairly equanimous in their presence, while still being able to defend myself optimally and to signal my preferences without the need for further aggravation or escalation. I still make so very many mistakes on that front, because I lack the awareness of how my experiences about external situations are, in fact, related to a lack of internal integration. If, for instance, I had a much better awareness around how and when scarcity in my organism’s evolutionary past led to certain experiences (of desperation and urgent action seeking), I could now respond very differently in the momentary presence of that as a stimulus, without regurgitating the evolutionary programming I am left with on the unconscious level. I both know that I am acting sub-optimally (the evolutionary landscape has changed!) and that I am still stuck with my programming. That, to me, is a big source of confusion, irritation, and anger. I feel that my intuitive response is inadequate, but cannot really “think” of a better one without first understanding the true nature of all that comes up in my (parts of) mostly unconscious experience...
As for modeling differences between self and other, Stephen Fleming argues (in my mind fairly well) that our meta-cognitive modeling is “closer” to our inputs than to the inputs we receive from others, making it more noisy (and less reliable) to model/understand others, which leads us often to over- and under-estimating impact/intentions, given the overall bias for safety (negative/threat-oriented over positive/attraction-oriented emotion); see https://www.amazon.com/dp/B08F4ZMZQW/ and https://www.amazon.com/dp/B078VXFZCX/
Your mentioning of short-term vs. long-term benefits (expected from choice) also brings up for me that this is likely a function of the different “levels of reality” interaction. That is, long-term benefit implies a balancing of preferences of elements on a lower level across larger spans of time. The way in which I imagine the karmic law of “reaping what one sows” being true has more to do with the nature of the patterns that make up life. Insofar as the reasons for me acting out my lack of integration (imbalance) are similar to the reasons other people act out their respective lack of integration, whatever I project onto others will “return to me in kind.” It is thus in my selfish interest to learn to integrate better, which then prevents me from projecting, and allows me to respond to others’ projections in novel ways.
Overall, I have an assumption that “optimal morality” is context (environment) dependent. If I step away from human experience of moral behavior, life seems to flourish in different contexts by pursuing quite different strategies, one big ingredient to the choice/path life takes being the level of relative resource abundance vs. scarcity. This is, for me, reflected in large-scale cultural variation across the globe, which makes certain cultural “choices” (evolved behaviors) more fit to (and acceptable in) certain situations than others. In a situation where killing a person’s horse meant condemning that person to death, the “punishment” (without feeling tanha!) for such a person might be quite different than if killing a person’s horse only meant some kind of premature taking of an animal life (still “bad”, but a different kind of bad from my human experience anyway). The extent to which a human experiences (tanha-free) “need” or drive to protect himself from some threat may well depend on the extent of intrinsic coping reservoir...
Forgiveness (non-judgment?) may then need a clear definition: are you talking about a person’s ability not to seek “tanha-originating revenge,” while still being able to act out of caring self-protection?
I haven’t thought sufficiently deeply about the concept of an “afterlife,” and will refrain from commenting on that part :)