tailcalled comments on faul_sname’s Shortform

tailcalled 4 Mar 2025 10:27 UTC
4 points
0
Homeostatic agents are easily exploitable by manipulating the things they are maintaining or the signals they are using to maintain them in ways that weren’t accounted for in the original setup. This only works well when they are basically a tool you have full control over, but not when they are used in an adversarial context, e.g. to maintain law and order or to win a war.
As capabilities to engage in conflict increase, methods to resist losing to those capabilities have to get optimized harder. Instead of thinking “why would my coding assistant/tutor bot turn evil?”, try asking “why would my bot that I’m using to screen my social circles against automated propaganda/spies sent out by scammers/terrorists/rogue states/etc turn evil?”.
Though obviously we’re not yet at the point where we have this kind of bot, and we might run into law of earlier failure beforehand.
- faul_sname 4 Mar 2025 11:19 UTC
  4 points
  1
  Parent
  I agree that a homeostatic agent in a sufficiently out-of-distribution environment will do poorly—as soon as one of the homeostatic feedback mechanisms starts pushing the wrong way, it’s game over for that particular agent. That’s not something unique to homeostatic agents, though. If a model-based maximizer has some gap between its model and the real world, that gap can be exploited by another agent for its own gain, and that’s game over for the maximizer.
  
  This only works well when they are basically a tool you have full control over, but not when they are used in an adversarial context, e.g. to maintain law and order or to win a war.
  
  Sorry, I’m having some trouble parsing this sentence—does “they” in this context refer to homeostatic agents? If so, I don’t think they make particularly great tools even in a non-adversarial context. I think they make pretty decent allies and trade partners though, and certainly better allies and trade partners than consequentialist maximizer agents of the same level of sophistication do (and I also think consequentialist maximizer agents make pretty terrible tools—pithily, it’s not called the “Principal-Agent Solution”). And I expect “others are willing to ally/trade with me” to be a substantial advantage.
  
  As capabilities to engage in conflict increase, methods to resist losing to those capabilities have to get optimized harder. Instead of thinking “why would my coding assistant/tutor bot turn evil?”, try asking “why would my bot that I’m using to screen my social circles against automated propaganda/spies sent out by scammers/terrorists/rogue states/etc turn evil?”.
  
  Can you expand on “turn evil”? And also what I was trying to accomplish by making my comms-screening bot into a self-directed goal-oriented agent in this scenario?
  - tailcalled 4 Mar 2025 12:19 UTC
    2 points
    0
    Parent
    That’s not something unique to homeostatic agents, though. If a model-based maximizer has some gap between its model and the real world, that gap can be exploited by another agent for its own gain, and that’s game over for the maximizer.
    I don’t think of my argument as model-based vs heuristic-reactive, I mean it as unbounded vs bounded. Like you could imagine making a giant stack of heuristics that makes it de-facto act like an unbounded consequentialist, and you’d have a similar problem. Model-based agents only become relevant because they seem like an easier way of making unbounded optimizers.
    If so, I don’t think they make particularly great tools even in a non-adversarial context. I think they make pretty decent allies and trade partners though, and certainly better allies and trade partners than consequentialist maximizer agents of the same level of sophistication do (and I also think consequentialist maximizer agents make pretty terrible tools—pithily, it’s not called the “Principal-Agent Solution”). And I expect “others are willing to ally/trade with me” to be a substantial advantage.
    You can think of LLMs as a homeostatic agent where prompts generate unsatisfied drives. Behind the scenes, there’s also a lot of homeostatic stuff going on to manage compute load, power, etc..
    Homeostatic AIs are not going to be trading partners because it is preferable to run them in a mode similar to LLMs instead of similar to independent agents.
    Can you expand on “turn evil”? And also what I was trying to accomplish by making my comms-screening bot into a self-directed goal-oriented agent in this scenario?
    Let’s say a think tank is trying to use AI to infiltrate your social circle in order to extract votes. They might be sending out bots to befriend your friends to gossip with them and send them propaganda. You might want an agent to automatically do research on your behalf to evaluate factual claims about the world so you can recognize propaganda, to map out the org chart of the think tank to better track their infiltration, and to warn your friends against it.
    However, precisely specifying what the AI should do is difficult for standard alignment reasons. If you go too far, you’ll probably just turn into a cult member, paranoid about outsiders. Or, if you are aggressive enough about it (say if we’re talking a government military agency instead of your personal bot for your personal social circle), you could imagine getting rid of all the adversaries, but at the cost of creating a totalitarian society.
    (Realistically, the law of earlier failure is plausibly going to kick in here: partly because aligning the AI to do this is so difficult, you’re not going to do it. But this means you are going to turn into a zombie following the whims of whatever organizations are concentrating on manipulating you. And these organizations are going to have the same problem.)
- Mateusz Bagiński 4 Mar 2025 11:03 UTC
  3 points
  −3
  Parent
  Unbounded consequentialist maximizers are easily exploitable by manipulating the things they are optimizing for or the signals/things they are using to maximize them in ways that weren’t accounted for in the original setup.
  - tailcalled 4 Mar 2025 14:32 UTC
    6 points
    3
    Parent
    That would be ones that are bounded so as to exclude taking your manipulation methods into account, not ones that are truly unbounded.
    - Mateusz Bagiński 4 Mar 2025 16:00 UTC
      2 points
      0
      Parent
      I interpreted “unbounded” as “aiming to maximize expected value of whatever”, not “unbounded in the sense of bounded rationality”.
      - tailcalled 4 Mar 2025 16:44 UTC
        3 points
        0
        Parent
        The defining difference was whether they have contextually activating behaviors to satisfy a set of drives, on the basis that this makes it trivial to out-think their interests. But this ability to out-think them also seems intrinsically linked to them being adversarially non-robust, because you can enumerate their weaknesses. You’re right that one could imagine an intermediate case where they are sufficiently far-sighted that you might accidentally trigger conflict with them but not sufficiently far-sighted for them to win the conflicts, but that doesn’t mean one could make something adversarially robust under the constraint of it being contextually activated and predictable.
        Mateusz Bagiński 5 Mar 2025 16:52 UTC
        2 points
        0
        Parent
        Alright, fair, I misread the definition of “homeostatic agents”.