for AIs, more robust adversarial examples—especially ones that work on AIs trained on different datasets—do seem to look more “reasonable” to humans.
Then I would expect they are also more objectively similar. In any case, that finding is strong evidence against manipulative adversarial examples for humans. Your argument is basically “there’s just this huge mess of neurons, surely somewhere in there is a way”, but if the same adversarial examples work on minds with very different architectures, then that mess clearly isn’t why they exist. Instead, they have to be explained by some higher-level cognitive factors shared by ~anyone who gets good at interpreting a wide range of visual data.
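To make the transfer claim concrete, here is a toy sketch (entirely hypothetical numbers; two hand-picked linear classifiers standing in for “minds with different architectures”) of an FGSM-style perturbation crafted against one model that also flips the other model’s prediction:

```python
# Hypothetical illustration of adversarial-example transfer: a perturbation
# computed against model A alone also fools model B, because the two models
# share a similar decision direction even though their parameters differ.

def score(w, x):
    # Linear classifier: positive score = class 1, negative = class 0.
    return sum(wi * xi for wi, xi in zip(w, x))

# Two "independently trained" linear models (made-up weights).
w_a = [0.9, -0.5, 0.3, 0.7]
w_b = [0.8, -0.4, 0.4, 0.6]  # different parameters, correlated direction

x = [0.2, 0.1, -0.1, 0.3]  # clean input, classified positive by both
assert score(w_a, x) > 0 and score(w_b, x) > 0

# FGSM-style step using model A's gradient only (for a linear model, the
# gradient of the score w.r.t. the input is just the weight vector).
eps = 0.5
grad_sign = [1.0 if wi > 0 else -1.0 for wi in w_a]
x_adv = [xi - eps * si for xi, si in zip(x, grad_sign)]

# The perturbation crafted on A transfers: B's prediction flips too.
print(score(w_a, x_adv) < 0, score(w_b, x_adv) < 0)  # prints: True True
```

The point of the toy is only that transfer is unsurprising when models share decision-relevant structure, which is the shape of the “higher-level cognitive factors” explanation above.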
The really obvious adversarial example of this kind in humans is, like, cults, or so
Cults use much stronger means than adversarial examples imply. For one, they can react to and reinforce your behaviour—is a screen with text that promises you things for doing what it wants, with escalating impact and a building track record, an adversarial example? No. It’s potentially worrying, but not really distinct from generic power-seeking problems. The cult also controls a much larger fraction of your total sensory input over an extended time. And cult members spreading the cult use tactics that require very little precision—no information is transmitted to them on how to do this beyond simple verbal instructions. Even if there are higher-precision ways of manipulating individuals, it’s another thing entirely to manipulate them into repeating high-precision strategies that they couldn’t themselves execute correctly on purpose.
if you’re not personally familiar with hypnosis
I think I am a little bit. I don’t think that means what you think it does. Listening-to-action still requires comprehension of the commands, which is much lower bandwidth than vision, and it’s a structure that’s specifically there to be controllable by others, so it’s not an indication that we are controllable by others in other bizarre ways. And you are deliberately suspending your own criticism—you haven’t actually been circumvented, and there isn’t really a path to escalating power, just the fact that you’re willing to obey someone in a specific context. Hypnosis also ends on its own—the brain naturally tends back toward baseline, and implanting a mechanism that keeps itself active indefinitely is high-precision.
your argument is basically “there’s just this huge mess of neurons, surely somewhere in there is a way”,
I suppose that is what I said, interpreted as a deductive claim. I have more abductive/Bayesian/hunch information than that, and I’ve expressed some of it, but I’ve been realizing lately that a lot of my intuitions don’t come via deductive reasoning, which can make them hard to verify or communicate. (And I’d guess that’s a common problem—it seems like the sort of thing science exists to solve.) I’m likely not well equipped to present claims that would justifiedly convince a highly skeptical, careful evaluator, just detailed sketches of hunches and how I got them.
Your points about the limits of hypnosis seem reasonable. I agree that the foothold would only occur if the receiver is being “paid in dopamine”-or-something hard enough to want to become more obedient. The story does seem to present that—the kid is concerningly fascinated by the glitchers right off the bat. And for what it’s worth, I think this is an exaggerated version of a thing we actually see on social media sometimes, though I’m kind of bored of this topic and would rather not expand on it deeply.