Dweomite comments on Misrepresentation as a Barrier for Interp (Part I)

Dweomite 29 Apr 2025 20:43 UTC
23 points
7
The normal way I’d judge whether somebody had correctly identified a horse “by their own lights” is to look at what predictions they make from that identification. For example, what they expect to see if they view the same object from a different angle or under different lighting conditions, or how they expect the object to react if they offer it a carrot.
It seems like we can just straightforwardly apply the conclusions from Eliezer’s stories about blue eggs and red cubes (starting in Disguised Queries and continuing from there).
There is (in this story) a pattern in nature where the traits (blue, egg-shaped, furred, glowing, vanadium) are all correlated with each other, and the traits (red, cube-shaped, smooth, dark, palladium) are all correlated with each other. These patterns help us make predictions of some traits by observing other traits. This is useful, so we invent the words “blegg” and “rube” as a reference to those patterns.
Suppose we take some object that doesn’t exactly match these patterns—maybe it’s blue and furred, but cube-shaped and dark, and it contains platinum. After answering all these questions about the object, it might feel like there is another remaining question: “But is it a blegg or a rube?” But that question doesn’t correspond to any observable in reality. Bleggs and rubes exist in our predictive model, not in the world. Once we’ve nailed down every trait we might have used the blegg/rube distinction to predict, there is no additional value in also classifying it as a “blegg” or “rube”.
Similarly, it seems to me the difference between the concepts of “horse” and “either horse or a cow-at-night” lies in what predictions we would make about the object based on either of those concepts. The concept itself is an arbitrary label and can’t be “right” or “wrong”, but the predictions we make based on that concept can be right or wrong.
So I want to say that activating a horse neuron in response to a cow-at-night is “mistaken” IFF that neuron activation causes the observer to make bad predictions, e.g. about what they’ll see if they point a flashlight at the object. If their prediction is something like “50% chance of brown fur, 50% chance of white-and-black spots” then maybe “either horse or cow-at-night” is just an accurate description of what that neuron means. But if they confidently predict they’ll see a horse when the light is turned on, and then they actually see a cow, then there’s an objective physical sense in which we can say they were wrong.
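One way to make that criterion concrete is to score each candidate "meaning" of the neuron by how surprised it is at what the flashlight reveals. This is only a minimal sketch: the two distributions, their numbers, and the name `surprise_bits` are illustrative assumptions, not anything taken from the post.

```python
import math

# Hypothetical predictive distributions over what the observer expects to see
# once the flashlight is on. All numbers are illustrative assumptions.
CONFIDENT_HORSE = {"brown fur": 0.99, "black-and-white spots": 0.01}
HORSE_OR_COW_AT_NIGHT = {"brown fur": 0.5, "black-and-white spots": 0.5}


def surprise_bits(prediction: dict[str, float], observed: str) -> float:
    """Log loss in bits: how badly the prediction did on the actual observation."""
    return -math.log2(prediction[observed])


# The light comes on and the object turns out to be a cow.
observed = "black-and-white spots"
confident_loss = surprise_bits(CONFIDENT_HORSE, observed)   # about 6.64 bits
hedged_loss = surprise_bits(HORSE_OR_COW_AT_NIGHT, observed)  # exactly 1 bit
```

On this toy scoring, the neuron activation is not what gets marked wrong. What gets marked wrong is the confident prediction downstream of it, which pays a much larger penalty than the hedged one when a cow shows up.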
(And I basically don’t buy the telos explanation from the post. More precisely, I think “this object has been optimized for property X by optimization process Y” is a valid and interesting thing you can say about an object, but I don’t think it captures what we intuitively mean when we say that a perception is mistaken. I want to be able to say perceptions are right or wrong even when they’re about non-optimized objects that have no particular importance to the observer’s evolutionary fitness, e.g. distinguishing stars and comets. I also have an intuition that if you somehow encountered a horse-like object that wasn’t causally descended from the evolution of horses, it should still be conceptually valid to recognize it as a horse, but I’m less sure about that part. I also have an intuition that telos should be understood as a relationship between the object and its optimizer, rather than an inherent property of the object itself, and so it doesn’t have the correct type-signature to even potentially be the thing we’re trying to get at.)
- JenniferRM 30 Apr 2025 14:51 UTC
  16 points
  0
  Parent
  I love that you brought up bleggs and rubes, but I wish that that essay had a more canonical exegesis that spelled out more of what was happening.
  (For example: the use of “furred” and “egg-shaped” as features is really interesting, especially when admixed with mechanical properties that make them seem “not alive” like their palladium content.)
  Cognitive essentialism is a reasoning tactic where an invisible immutable essence is attributed to a thing to explain many of its features.
  We can predict that if you paint a cat like a skunk (with a white stripe down its back) that will not cause the cat to start smelling like a skunk, because the “skunk essence” is modeled as immutable, and modeled as causing “white stripe” and “smell” unidirectionally.
  Young children have a stage where they start getting questions like “If a rabbit is raised by monkeys will the rabbit prefer bananas or carrots?” and they answer “correctly” (in conformance to the tactic) with “carrots”, but they over-apply the tactic (which reveals the signature of the tactic itself) in some cases like “If a Chinese baby is raised by German parents who only speak German, will the Chinese baby grow up to speak German or Chinese?”
  If you catch them at the right age, kids will predict the baby grows up to speak Chinese!
  That is “cognitive essentialism” being misapplied because they have learned one of the needed tactics for understanding literally everything, but haven’t learned some of the exceptions yet <3
  (There are suggestions here that shibboleths and accents and ideologies and languages and so on are semi-instinctively used by humans for tracking “social/tribal essences” at a quick/intuitive level, which is a whole other kettle of fish… and part of where lots of controversy comes from. Worth flagging, but I don’t want to go down that particular rabbit hole here.)
  A key point here is that there is a deep structural “reasoning behind the reasoning” which is: genomes.
  Genomes do, in fact, cause a huge variety of phenotypic features. They are, in fact, broadly shared among instances of animals from similar clades. They are, in practice, basically immutable in a given instance of a given animal category without unusual technology (biotech or nanotech, basically).
  To return to the cat and skunk example, we can imagine a “cognitive essentialist Pearlian causal graph” and note that “white stripe” does NOT causally propagate back into the “genome” node, so DO(“white stripe”=True) cannot change the probability in the “genome” (and therefore has no cascading implications for the probability of “skunk smell”).
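  The difference between seeing a stripe and painting one on can be sketched with a three-node graph, genome → white stripe and genome → skunk smell. This is a toy illustration: every probability below is a made-up assumption, and the function names are hypothetical.

  ```python
  # Hypothetical numbers for the graph: genome -> white_stripe, genome -> skunk_smell.
  P_GENOME = {"cat": 0.5, "skunk": 0.5}
  P_STRIPE_GIVEN_GENOME = {"cat": 0.01, "skunk": 0.95}
  P_SMELL_GIVEN_GENOME = {"cat": 0.0, "skunk": 0.9}


  def genome_given_observed_stripe() -> dict[str, float]:
      """Bayes update: merely seeing a stripe is evidence about the genome."""
      joint = {g: P_GENOME[g] * P_STRIPE_GIVEN_GENOME[g] for g in P_GENOME}
      total = sum(joint.values())
      return {g: p / total for g, p in joint.items()}


  def genome_given_do_stripe() -> dict[str, float]:
      """do(white_stripe=True): the genome -> stripe edge is cut, so the prior stands."""
      return dict(P_GENOME)


  def p_smell(genome_dist: dict[str, float]) -> float:
      """Smell depends only on the genome, so marginalize over it."""
      return sum(genome_dist[g] * P_SMELL_GIVEN_GENOME[g] for g in genome_dist)


  smell_if_seen = p_smell(genome_given_observed_stripe())  # roughly 0.89
  smell_if_painted = p_smell(genome_given_do_stripe())     # 0.45, same as the prior
  ```

  With these numbers, observing a stripe makes “skunk smell” much more likely, while painting a stripe leaves it exactly where it started, which is the asymmetry the essentialist graph encodes.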
  More than that, genomes use signaling molecules which in the presence of shared genomic software have somewhat coherent semantic signals such as to justify a kind of “sympathetically magical thinking”.
  For example, a shaman might notice that willow trees fall over when a river overflows its banks during a flood, easily throw new roots out of their trunks, and continue growing in the new configuration, and so think of willow trees as “unusually rooty”.
  Then the shaman, applying the magical sympathetic thinking law of “like produces like”, might make a brew out of willows hoping to condense this “rooty essence”. Then they might put some other plant’s cutting, without roots, in the “rooty willow water” and hope the cutting grows roots faster.
  And this works!
  Here is one of many YouTube videos on DIY willow-based rooting mix. Modern shamans (called “scientists”) eventually isolated the relevant “signaling molecule”, indole-3-butyric acid (i.e. the material basis of its magico-sympathetic essential meaning within plant biology on Earth), which gains the imperative meaning “turn on root growing subroutines in the genomic software” in the presence of the right interpretive apparatus.
  Note that it is quite common for specifically hormones to have about this size and shape and ring pattern. They are usually vaguely similar to cholesterol (and they are often made by modification of cholesterol itself) and the “smallness” and “fattiness” helps the molecules diffuse even through nuclear membranes, and then the “long skinniness” is helpful for reaching into a double helix and tickling the DNA itself.
  Here is a precursor of many animal steroids (sometimes called lanostane), with locations that can be modified to change its meaning helpfully labeled:
  I claim that the first chemical (the one that only has a C and D ring, with the standard nitrogen at 15, and a trimmed 21, a ketone 24, and a hydroxyl 25, that willows have a lot of) is a ROOTING HORMONE that “means” something related to “roots”.
  Compare and contrast “morphology” (the study of parts of words), and note also that Hockett’s “design features”, which offer criteria for human language and are mostly missing in animal communication, include arbitrariness (which hormones have), displacement (which hormones have), and so on.
  I claim that indole-3-butyric acid is, roughly, an imperative verb in the language of “bio-signaling-plant-ese” whose meaning is something like this:
  Image sauce.
  Also, the meaning is preserved for other plant species that “share” the same “genomic culture” (shared culture being another of Hockett’s “design features” in human languages)… in this case: the “meaning” of the relevant molecules that willow tends to be rich in, is “culturally” shared for mint!
  Image sauce.
  I will close by saying that I think that a mixture of math and biochemistry and rule-utilitarianism is likely to offer a pretty clean language for expressing a “deep and non-trivial formula with useful etymological resonances” for explaining exactly what reptiliomorph, and mammalian, and primate, and human benevolence “is” (and how it should be approximated in morally good agents acting charitably in conformance with natural law).
  For example, if there is a “chemical word” that means “grow roots please!” in plant biology, then this complex of four amino acids in specifically this order (which is recognized by various chemical receptors) is also a word for something like “care for that which is close to you and person shaped and can’t care for itself, please!”:
  Image sauce.
- tailcalled 9 May 2025 13:28 UTC
  2 points
  0
  Parent
  Counterpoint: https://www.lesswrong.com/s/gEvTvhr8hNRrdHC62