Suppose the model is superhumanly good at things like persuasion, hacking, blackmail, and making money by white-collar crime. It could thus have human dupes, controlled pawns, or bought allies assisting it, including setting up a second copy of it. You need to figure out that it needs to be shut down, and do so, before it goes rogue enough to do any of those sorts of things.
This could eventually happen. However, I want to point out that present-generation AI technology, which is still subhuman, needs a lot of hardware to host the model. Distributing it across a cluster means you need to send the activations from your slice of the tensor graph to the other cards in the cluster, and these activations are large in byte size. This means loosely distributed computers cannot host current SOTA models in any meaningful way.
What this means is that current-generation AI technology, with any known optimization, won't allow you to host a model anywhere but on a cluster interconnected by a high-bandwidth bus.
Obviously in the future this level of performance may some day compress to fit on a single node, but today subhuman performance needs about 128 H100s. Human-level performance may need around 1,000. What will superhuman performance require? 2,000, or will it be 10k-100k H100 equivalents? (The reason to think the latter is that there appears to be a logarithmic relationship between compute and model capabilities, so superhuman performance may require not 2x the compute but several OOMs more to be "obviously" superhuman.)
This implies a way to prevent such attacks—regulate and track the existence of all clusters capable of hosting models smart enough to be dangerous. This doesn't even need to slow down progress: simply having accountability for all the hardware in known worldwide data centers would make the attacks you describe much more difficult.
The reason is that if any of those methods of attack allows the model to be traced back to the racks hosting it, humans can simply identify what model was running and blacklist it across all human-controlled data centers.
This would make betrayal expensive.
This would also require international coordination and data sharing to be feasible.
I can currently host an open-source model fairly close to the GPT-3 Turbo level (a little slowly, quantized) on an expensive consumer machine, even a fancy laptop (and as time passes, algorithmic and Moore's Law improvements will only make this easier). The GPT-4 Turbo level is roughly an order of magnitude more. Something close enough to an AGI to be dangerous is probably going to be 1-4 orders of magnitude more again, so currently would need somewhere between O(100) and O(100,000) GPUs (except that if the number is in the upper end of that range, we're not going to get AGI until algorithmic and Moore's Law improvements have brought this down somewhat). If that number is O(1000) or more, then for as long as it stays in that range, keeping track of datacenters with that many machines probably isn't enormously hard, though obviously harder if the AGI has human accomplices. But once it gets down to O(100) that gets more challenging, and at O(10), almost any room in any building in any developed country could conceal a server with that many GPUs.
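For concreteness, a minimal sketch of that arithmetic (the baseline GPU count is an illustrative assumption, not a measurement):

```python
# Rough sanity check on the GPU-count range above.
# Illustrative assumption (not a measurement): a GPT-4-Turbo-class model
# needs on the order of 10 datacenter GPUs for inference.
gpt4_class_gpus = 10

# "Dangerous AGI" assumed to need 1-4 orders of magnitude more inference compute.
for extra_ooms in range(1, 5):
    gpus = gpt4_class_gpus * 10 ** extra_ooms
    print(f"+{extra_ooms} OOM -> O({gpus:,}) GPUs")
# Prints O(100) through O(100,000) GPUs, the range discussed above.
```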
We should have a dialogue on this. Lots to unpack here:
The prediction is for flops/$, per here. It's a reasonable belief to expect that the processes that led to a doubling of flops/$ every 2-3 years will continue for some number of years. In addition, we can simply look from the other direction: could the flops/$ be increased by a factor of 2 overnight? Probably. If it costs $3,200 per H100 to build one, and we assume 60% utilization and 5 cents per kWh, then it will cost 0.6 * 700 watts * 24 * 365 * 5 years * $0.05/kWh ≈ $919 of power before obsolescence. Nvidia currently charges approximately $27,000 for the hardware, and another $27k over 5 years for a software license fee. There are also data center costs, fixed by power consumption and rack space consumed, that I am neglecting; these will not go down if the GPUs become cheaper.
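For concreteness, the same back-of-the-envelope calculation as a short script (all inputs are the assumptions stated above):

```python
# Back-of-the-envelope: lifetime electricity cost of one H100 vs. what Nvidia charges.
# All inputs are the assumptions from the paragraph above.
utilization = 0.6        # average fraction of the time the card is busy
board_power_kw = 0.700   # 700 W
hours = 24 * 365 * 5     # 5-year service life before obsolescence
price_per_kwh = 0.05     # $/kWh

power_cost = utilization * board_power_kw * hours * price_per_kwh
print(f"Lifetime power cost: ${power_cost:,.0f}")   # ~ $920

hardware_price = 27_000  # approximate Nvidia hardware price
license_5yr = 27_000     # approximate 5-year software license fee
print(f"Power is ~{power_cost / (hardware_price + license_5yr):.1%} of the 5-year spend")
```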
What this means is that it's not necessarily true that in the future O(10) GPU-sized devices, consuming current GPU power levels and interconnected by a common bus like PCIe, will be able to host an AGI or ASI. It may get cheaper, but you may still need lots of cards, lots of power, and special interconnects like factory-soldered fiber optics.
Restricting the bus controllers for the interconnects on compute cards sold on the open market would likely prevent the scenario you describe. If the GPUs you can buy at the store lack the bus controller (ideally the chiplets are left off entirely, so there is no possibility of it being enabled later), then compute is limited to exactly 1 GPU before performance falls off a cliff. These "edge" GPUs could still be used for things like realtime robots and gaming with full neural rendering (they wouldn't really be GPUs, obviously, but ASICs only able to accelerate particular neural network architectures), while the interconnect-capable cards would be much harder to hide in a room in a building.
Just to visualize the start of the security precautions you would want to take as models start to become "concerning" in capability level: you would use dedicated hardware, no hourly rentals. You would report in realtime which major model is running, its power consumption, etc., to some central place that is ideally power-bloc wide. The data center would have separate cages isolating the different models and their underlying interconnects, and either isolated networks or analog backup ways to log things, like actual clipboards with ink on the wall. The shutoff breakers would be easily accessible, and the power sources marked and easily visible from the outside of the building and from above.
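To illustrate the kind of realtime reporting I mean, something like the following record, pushed periodically to the central registry (the field names are made up for the example, not an existing standard):

```python
# Hypothetical telemetry record a regulated data center might push every few
# minutes to a power-bloc-wide registry. All field names are illustrative.
import json
import time

report = {
    "facility_id": "DC-EXAMPLE-017",
    "cage_id": "cage-4",                 # physically separated model enclosure
    "model_weights_hash": "sha256:<hash of loaded weights>",
    "declared_model_name": "example-model-v3",
    "accelerator_count": 512,
    "power_draw_kw": 640.2,              # measured at the cage breaker
    "interconnect_isolated": True,       # no links to other cages
    "reported_at": time.time(),
}
print(json.dumps(report, indent=2))
```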
Ideally you would use silicon-level security—something humans can't afford to do in most cases. There are a lot of things you could do here, but one simple and obvious one is that when you need to combine networks again, you use hardware ASIC-based firewalls and VPN gateways. These cannot be modified by a remote hack.
Many security bugs, for example Rowhammer, are possible because secure information is using the same underlying CPU and memory as the insecure/untrusted code. Dedicated security/OS CPU cores with their own private memory would fix many bugs of this class.
I agree precautions like these are possible, and might be sensible. However, the US's attempts to impose something like a mild version of this on GPU exports to China have so far been mostly stymied by a combination of Nvidia deliberately perverting the rules, the Chinese being ingenious in finding ways around them, and old-fashioned smuggling. We could probably try harder than that, but I suspect any such system will tend to be leaky, because people will have motives to try to make it leaky.
You also don’t mention above the effect of algorithmic improvements increasing useful processing per flop (more efficient forms of attention, etc.), which has for a while now been moving even faster than Moore’s law.
Basically my point here is that, unless both Moore's law for GPUs and algorithmic improvements grind to a halt first, sooner or later we're going to reach a point where inference for a dangerous AGI can be done on a system compact enough for it to be impractical to monitor all of them, especially when one bears in mind that rogue AGIs are likely to have human allies/dupes/pawns/blackmail victims helping them. Given the current uncertainty of maybe 3 orders of magnitude in how large such an AGI will be, it's very hard to predict both how soon that might happen and how likely it is that Moore's law and algorithmic improvements tap out first, but the lower end of that ~3 OOM range looks very concerning. And we have an existence proof that with the right technology you can run a small AGI on a ~1 kg portable biocomputer consuming ~20 W.
The most obvious next line of defense would be larger, more capable, very well aligned and carefully monitored ASIs hosted in large, secure data centers doing law enforcement/damage control against any smaller rogue AGIs and the humans assisting them. That requires us to solve alignment well enough for these to be safe, before Moore's Law/algorithmic improvements force our hand.
Basically my point here is that, unless both Moore's law for GPUs and algorithmic improvements grind to a halt first, sooner or later we're going to reach a point where inference for a dangerous AGI can be done on a system compact enough for it to be impractical to monitor all of them…
If this is the reality, is campaigning for/getting a limited AI pause only in specific countries (EU, maybe the USA) actually counterproductive?
The most obvious next line of defense would be larger, more capable, very well aligned and carefully monitored ASIs hosted in large, secure data centers doing law enforcement/damage control against any smaller rogue AGIs and humans assisting them…
That's not the only approach. Consider a bureaucracy of "myopic" ASIs, where you divide the tasks of "monitor the world looking for rogue ASI" and "win battles with any rogues discovered" into thousands of small subtasks: tasks like "monitor for signs of ASI on these input frames", "optimize screw production line M343", "supervise drone assembly line 347", and so on.
What you have done is restrict the inputs, outputs, and task scope to ones where the ASI does not have the context to betray, and obviously another ASI is checking the outputs and looking for betrayal, with human operators ultimately reviewing the evidence and evaluating it.
The key thing is that the above is inherently aligned. The ASI itself isn't aligned; it doesn't want to help humans or hurt them. It's that you have wrapped it in enough framework that it can't act unaligned.
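A minimal sketch of the pattern I mean (all the names here are illustrative placeholders, not a real system):

```python
# Sketch of the "bureaucracy of myopic ASIs" pattern: a worker model sees only its
# narrow task spec and inputs, an independent checker audits every output, and
# anything flagged goes to human review. All names are illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class SubtaskResult:
    actions: list = field(default_factory=list)
    flagged: bool = False
    evidence: str = ""

def run_myopic_subtask(task_spec: dict, inputs: dict, worker, checker) -> SubtaskResult:
    # The worker only ever sees this one task spec and this one batch of inputs:
    # no memory across episodes, no global context it could use to betray.
    actions = worker(task_spec, inputs)

    # A separate model (ideally different weights/training run) looks for betrayal.
    flagged, evidence = checker(task_spec, inputs, actions)
    if flagged:
        return SubtaskResult(flagged=True, evidence=evidence)  # held for human review
    return SubtaskResult(actions=actions)
```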
If this is the reality, is campaigning for/getting a limited AI pause only in specific countries (EU, maybe the USA) actually counterproductive?
Obviously so. Which is why the Chinese were invited to and attended the conference at Bletchley Park in the UK and signed the agreement, despite that being politically inconvenient to the British government, and despite the US GPU export restrictions on them.
A bureaucracy of “myopic” ASIs, where you divide the task…
This is a well-known category of proposals in AI safety (one often made by researchers with little experience of actual companies/bureaucracies and their failure modes). You are proposing building a very complex system that is intentionally designed to be very inefficient/incapable/inflexible except for a specific intended task. The obvious concern is that this might also make it inefficient/incapable/inflexible at its intended task, at least in certain ways that might be predictable enough for a less-smart-but-still-smarter-than-us rogue AGI to take advantage of. I'm dubious that this can be made to work, or that we could even know whether we had made it work before we saw it fail to catch a rogue AGI. Also, if we could get it to work, I'd count that as at least partial success in "solving alignment".
I’m dubious that this can be made to work, or that we could even know whether we had made it work before we saw it fail to catch a rogue AGI. Also, if we could get it to work, I’d count that as at least partial success in “solving alignment”.
Guess it depends on your definition of "work". A network of many small myopic software systems is how all current hyperscaler web services work (stateless microservices). It's how the avionics software at SpaceX works (redundant functional systems architecture). It's how autonomous car stacks mostly work...
Did you mean you think subdividing a task into subtasks, where the agent performing a subtask has been blocked from knowing the full context, will result in a large performance penalty on the overall task? So you define "work" as "gives SOTA ASI performance on the overall task"?
A network of many small myopic software systems is how all current hyperscaler web services work (stateless microservices).
Having been the technical lead for one of the teams building such a large microservices-based application at Google, and at other companies, yes, I'm extremely aware of this. Even for a very well-defined dataflow, getting the architecture, overall data flows, reliability, and latency right is hard (I've been in the interminable design discussion meetings thinking minor updates through), and that's with a few dozen components with well-defined APIs and only fairly limited use of ML, mostly contained within individual components. Law enforcement is not a comparable task: I think way too much flexibility is required. But the "fly in and bomb the datacenter" subtask might be decomposable along the lines you suggest, if you're not facing opposition.
We should have a dialogue; you're one of the few LessWrong users I have seen with a reasonable model who also has new information.
To be more detailed, what I was thinking of was dividing along myopic subtasks. For example, "fly in and bomb the datacenter" requires you to build an overwhelming number of aircraft to overcome whatever (superintelligence-guided) defenses are protecting it. So you have subtasks like "final assembly for aircraft type G on the next aircraft".
This means at the beginning of the transaction the model starts with input frames from the sensors in the assembly bay and a JSON file or other schema with the instructions and goals for what this "final assembly" subtask involves.
The output frames go to the robotic actuators for the multiple robots in the bay.
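To make that concrete, the per-episode instruction schema might look roughly like this (an illustrative made-up example, not a real format):

```python
# Made-up example of the instructions/goals handed to the model at the start of
# one "final assembly" transaction. Not a real schema.
task_spec = {
    "subtask": "final_assembly",
    "aircraft_type": "G",
    "airframe_serial": 10452,
    "goals": {
        "max_assembly_time_hours": 6.0,
        "fastener_torque_spec": "per attached spec, revision 14",
    },
    "allowed_outputs": ["actuator_commands_bay_7"],  # nothing else leaves the bay
    "episode_end": "airframe removed from bay; model state is then reset",
}
```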
There is a lockstep sim model that is predicting future input frames, and it is always learning from real-world error.
The model emits actuator commands; at the next time step it gets a new set of (inputs, cached variables from the last output). The inputs are all sensors, including robot proprioception.
At the end of a single aircraft's final assembly, after the product is removed from the assembly bay, the model's state is reset. It has no memory of the prior assembly.
In addition, the lockstep sim means that some aircraft assemblies are fake, where a real aircraft is not being assembled, and the sim backend lets you check for various kinds of manufacturing error you can't measure in the real world. (You can't directly measure the stress on non-instrumented parts like wing spars, but the sim backend can check this.)
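Putting those pieces together, one episode of the subtask looks roughly like this sketch (the model, sim, sensor, and actuator objects are placeholders):

```python
# Sketch of one myopic assembly episode: lockstep sim prediction, per-step checks,
# and a full state reset at the end. Every object here is a placeholder.
def run_assembly_episode(task_spec, model, sim, sensors, actuators, max_steps=100_000):
    state = model.fresh_state()               # no memory of any prior aircraft
    for _ in range(max_steps):
        frame = sensors.read()                # cameras, proprioception, etc.
        predicted = sim.predict_next_frame()  # lockstep sim's expectation
        sim.learn_from_error(frame, predicted)

        commands, state = model.step(task_spec, frame, state)

        # The sim backend can check quantities you can't measure physically,
        # e.g. stress on non-instrumented parts like wing spars.
        if sim.detects_violation(commands):
            actuators.halt()
            return "halted for human review"

        actuators.apply(commands)
        if sim.episode_complete():
            break

    model.reset()                             # wipe state before the next airframe
    return "episode complete"
```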
So that’s a myopic subtask. I think you are saying that from your experience at Google it would take a lot of human effort to set up such a myopic subtask? And once a prior generation of engineers has done this, it will be very difficult to modify within the same company?
I would agree with that. If humans plan to control an artificial superintelligence they have to use it like a tool, to do the things humans can't do directly, but ultimately this scheme preserves human agency. Thousands of humans are involved in engineering the aircraft assembly line. It can scale to trillions or more aircraft per year by copying the solution, but you need what will ultimately be millions of people creating the frameworks the ASI will be used within. (All the staff of current tech companies and many new joiners; robotics is hard, while things like full-stack web dev will need far fewer people than before.)
For combat it's more complex, but essentially you would use models that are very sparse and designed to be unable to even perceive the obvious malware you expect the hostile ASI to be sending. "Join us against the humans, here's a copy of my source code" is malware.
You would use a lot of ASIC- or FPGA-based filters to block such network messages in a way that isn't susceptible to hacking; I can explain or cite sources if you are unfamiliar with this method. (Google's phones use a crypto IC for this reason, and likely Google servers do as well.)
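To show what the filter logic amounts to, here it is expressed in software purely for readability; the whole point is that the real version is burned into an ASIC/FPGA so a remote hack can't patch it (the message layout is a made-up example):

```python
# The allow-list logic such a filter would implement, written in Python only for
# readability; in practice it is fixed-function gates. Message layout is made up.
import struct

ALLOWED_MSG_TYPES = {0x01, 0x02, 0x03}  # e.g. position update, target track, status
MAX_PAYLOAD_BYTES = 64                  # short fixed-size fields; no free-form data

def passes_filter(packet: bytes) -> bool:
    if len(packet) < 3 or len(packet) > 3 + MAX_PAYLOAD_BYTES:
        return False
    msg_type, payload_len = struct.unpack_from(">BH", packet, 0)
    if msg_type not in ALLOWED_MSG_TYPES:
        return False   # "here's a copy of my source code" can never fit this format
    return payload_len == len(packet) - 3
```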
Key point: all the methods above are low-level methods to restrict/align an ASI. Historically this has always been correct engineering. That crypto IC Google uses is a low-level solution to signing/checking private keys. By constructing it from logic gates and not using software or an OS (or if it has an OS, it's a simple one and it gets private memory of its own), you prevent entire classes of security vulnerabilities. If they did it with logic gates, for example, buffer overflows and code injection or library attacks are all impossible, because the chip has none of these elements in the first place. This chip is more like the computers of the 1950s.
Similarly, some of the famous disasters like https://en.wikipedia.org/wiki/Therac-25 were entirely preventable with lower-level methods than the computer engineering used to control the machine. Electromechanical interlocks would have prevented that disaster; these are low-level safety elements like switches and relays. This is also the reason PLC controllers use a primitive programming language, ladder logic—by lowering to a primitive interpreted language you can guarantee things like the e-stop button will work in all cases.
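To illustrate, the e-stop interlock boils down to a single rung of logic, shown here in software purely for readability; the hardwired version puts the e-stop contact physically in series with the enable, so no software bug can override it (the signal names are illustrative):

```python
# The e-stop interlock as one rung of logic. A hardwired interlock puts the
# e-stop contact physically in series with the enable circuit, so this cannot
# be bypassed by any software state. Signal names are illustrative.
def beam_enable(e_stop_released: bool, door_closed: bool, dose_within_limits: bool) -> bool:
    return e_stop_released and door_closed and dose_within_limits
```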
So the entire history of the safe use of computers in control systems has been to use lower-level controllers; I have worked on an autonomous car stack that does the same. Hence this is likely a working solution for ASI as well.