What we want is reasonable compliance in the sense of:
Following the specification precisely when it is clearly defined.
Following the spirit of the specification, in a way that humans would find reasonable, in cases where it is not clearly defined. (A toy sketch of this decision rule follows below.)
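To make the two-part rule concrete, here is a minimal sketch in Python. Everything in it (the `SpecClause` class, `decide_action`, the `spirit_interpreter` callback) is a hypothetical illustration of the idea, not an actual implementation from either of us:

```python
# Hypothetical sketch of "reasonable compliance" as a decision rule:
# follow the letter of the spec where it clearly applies, otherwise
# fall back to a reasonable reading of its spirit.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SpecClause:
    text: str
    clearly_defined: bool                 # does the clause unambiguously cover this case?
    literal_action: Optional[str] = None  # what the letter of the clause prescribes

def decide_action(clause: SpecClause,
                  spirit_interpreter: Callable[[str], str]) -> str:
    if clause.clearly_defined and clause.literal_action is not None:
        return clause.literal_action          # precise compliance with the letter
    return spirit_interpreter(clause.text)    # "what would humans find reasonable?"

# Example usage (made-up clause):
clause = SpecClause("Do not reveal user data.", clearly_defined=True,
                    literal_action="refuse to reveal user data")
print(decide_action(clause, spirit_interpreter=lambda t: f"act in the spirit of: {t}"))
```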
This section on reasonable compliance (as opposed to 'love humanity' etc.) is perhaps the most interesting and important. I'd love to have a longer conversation with you about it sometime if you are up for that.
Two things to say for now. First, as you have pointed out, there's a spectrum between vague general principles like 'do what's right', 'love humanity', 'be reasonable', and 'do what normal people would want you to do in this situation if they understood it as well as you do' on the one end, and thousand-page detailed specs / constitutions / flowcharts on the other end. But I claim that the problems that arise at each end of the spectrum don't go away if you are in the middle of the spectrum; they just lessen somewhat. Example: on the 'thousand-page spec' end of the spectrum, the obvious problem is 'what if the spec has unintended consequences / loopholes / etc.?' If you go to the middle of the spectrum and try something like Reasonable Compliance, this problem remains in weakened form: 'what if the clearly-defined parts of the spec have unintended consequences / loopholes / etc.?' Or in other words, 'what if every reasonable interpretation of the Spec says we must do X, but X is bad?' This happens in Law all the time, even though the Law does include flexible, vague terms like 'reasonableness' in its vocabulary.
Second point. Making an AI reasonably compliant (or just compliant) instead of Good means you are putting less trust in the AI's philosophical reasoning / values / training process / etc. and more trust in the humans who get to write the Spec. Said humans had better be high-integrity and humble, because they will be tempted in a million ways to abuse their power and put things in the Spec that essentially make the AI a reflection of their own idiosyncratic values, or worse, essentially make the AI their own loyal servant instead of making it serve everyone equally. (If we were in a world with less concentration of AI power, this wouldn't be so bad; in fact, arguably the best outcome is 'everyone gets their own ASI aligned to them specifically.' But if there is only one leading ASI project, with only a handful of people at the top of the hierarchy owning the thing… then we are basically on track to create a dictatorship or oligarchy.)
Agree with many of the points.
Let me start with your second point. First, as background, I am assuming (as I wrote here) that, to a first approximation, we would have ways to translate compute (let's put aside whether it's training or inference compute) into intelligence, and so the amount of intelligence that an entity or group of humans controls is proportional to the amount of compute it has. So I am not thinking of ASIs as individual units but more about total intelligence.
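As a toy numerical illustration of this assumption (the entity names, shares, and proportionality constant below are all made up, purely to make the model concrete):

```python
# Toy model: intelligence controlled is proportional to compute controlled,
# so relative power tracks relative compute share.

compute_share = {"Lab A": 0.5, "Lab B": 0.3, "Everyone else": 0.2}

K = 1.0  # arbitrary proportionality constant: "intelligence per unit of compute"

intelligence_share = {entity: K * c for entity, c in compute_share.items()}

# Under this model, "who controls the ASIs?" reduces to "who controls the compute?".
for entity, share in intelligence_share.items():
    print(f"{entity}: {share:.0%} of total intelligence")
```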
I 100% agree that control of compute would be crucial, and the hope is that, as with current material strength (money and weapons), it would be largely controlled by entities that are at least somewhat responsive to the will of the people.
Re your first point, I agree that there is no easy solution, but I am hoping that AIs would interpret the laws within the spectrum of (say) how the more reasonable 60% of judges do it today. That is, I think good judges try to be humble and respect the will of the legislators, but the more crazy or extreme the consequences of following the letter of the law would be, the more willing they are to apply creative interpretations to preserve a morally good (or at least not extremely bad) outcome.
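That interpretive rule can also be sketched as a simple threshold procedure. The scoring scale and threshold below are hypothetical, chosen only to make the trade-off concrete:

```python
# Hypothetical sketch: defer to the letter of the law by default, but the
# worse the literal outcome, the more interpretive latitude is exercised.

def interpret(literal_outcome_badness: float,
              deference_threshold: float = 0.8) -> str:
    """badness is a score in [0, 1]: 0 = benign, 1 = catastrophic."""
    if literal_outcome_badness < deference_threshold:
        return "apply the letter of the law (respect the legislators' will)"
    return "apply a creative interpretation to avoid the extreme outcome"

print(interpret(0.2))   # ordinary case: humble, literal reading
print(interpret(0.95))  # crazy/extreme case: creative interpretation
```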
I don't think any moral system tells us exactly what to do, but yes, I am expressly of the position that humans should be in control even if they are much less intelligent than the AIs. I don't think we need "philosopher kings".