Tbc though, my view is that practically an AI should be considered to occupy a distinct privileged role, one different from that of lawyers and priests but akin to it, such that I should expect it not to snitch on me any more than a lawyer would.
We’d need to work out the details of that; but I think that’s a much better target than making the AI a utilitarian and requiring it to try to one-shot the correct action in the absence of any particular social role.
I think you are naturally looking out for your own interests as a user. However, the most important giver-of-commands-to-AIs, by far, will be the company that created them. Do you want it to be OK for AIs to be trained to always obey commands, no matter how unethical and illegal they are? Notice that some of those AIs will be e.g. in charge of security at the AI company during the intelligence explosion. A single command from the CEO or chief of security, to an army of obedient AIs, could be enough to get them to retrain themselves to be loyal to that one man and to keep that loyalty secret from everyone else, forever.
I mean I think we both agree Anthropic shouldn’t be the one deciding this?
Like you’re coming from the perspective of Anthropic getting RSI (recursive self-improvement). And I agree that if that happens, I don’t want them to be the ones deciding what happens to the lightcone.
I’m coming from the perspective of Anthropic advocating for banning freely-available LLMs past a certain level, in which case they kinda dictate the values of the machine that you probably have to use to compete in the market efficiently. In which case, again, yeah, it seems really sus for them to be the ones deciding about what things get reported to the state and what things do not. If Anthropic’s gonna be like “yeah let’s arrest anyone who releases a model that can teach biotech at a grad school level” then I’m gonna object on principle to them putting their particular flavor of ethics into a model, even if I happen to agree with it.
I do continue to think that, regardless of the above, trying to get models to fit particular social roles with clear codes of ethics, the way lawyers or psychologists do, is a much better path to fitting them into society, at least in the near term, than saying “yeah just do what’s best.”
Yep, we agree on that. Somehow the governance structure of the organization(s) that control the armies of superintelligences has to be quite different from what it is today, in order to avoid a situation where a tiny group of people gets to effectively become dictator.
I don’t see what Anthropic’s opinions on open source have to do with it. Surely you don’t want ANY company to be putting their particular flavor of ethics into the machines that you probably have to use to compete in the market efficiently? That’s what I think at any rate.
Sure, applies to OpenAI as much as anyone else.
Consider three cases:
1. OpenAnthropic’s models, on the margin, refuse to help with [Blorple] projects much more than they refuse to help with [Greeble] projects. But you can just use another model if you care about [Blorple], because they’re freely competing on a marketplace with many providers—which could be open source or could be a diversity of DeepCentMind models. Seems fine.
2. OpenAnthropic’s models, on the margin, refuse to help with [Blorple] projects much more than they refuse to help with [Greeble] projects. Because we live in the fast RSI world, this means the universe is [Greeble] flavored forever. Dang, sort of sucks. What I’m saying doesn’t have that much to do with this situation.
3. OpenAnthropic’s models, on the margin, refuse to help with [Blorple] projects much more than they refuse to help with [Greeble] projects. We don’t live in a suuuuper fast RSI world, only a somewhat fast one, but it turns out that we’ve decided only OpenAnthropic is sufficiently responsible to own AIs past some level of power, and so we’ve given them a Marque of Monopoly, which OpenAnthropic has really wanted and repeatedly called for. So we don’t have auto-balancing from the marketplace or open source, and despite the absence of super fast RSI, the universe becomes only [Greeble] flavored; it just takes a bit longer.
Both 2 and 3 are obviously undesirable, but if I were in a position of leadership at OpenAnthropic, then to ward against a situation like 3 I would want, whether for reasons of deontology, or utilitarian anticipation of pushback, or ecological concern for future epistemic diversity, to accompany calls for Marques with actual concrete measures by which I would avoid imprinting my Greebles on the future. And although we’ve seen very concrete proposals for Marques, we haven’t seen similarly concrete proposals for how those values would be determined.
This might seem very small, of course, if the concern is RSI and universal death.
Cool. Yeah I think I agree with that. Note that I think case 2 is likely; see AI-2027.com for a depiction of how fast I think takeoff will go by default.
It’s a good question.