Great post on identifying metastrategies. I think we need more metastrategic thinking.
I think you’re conflating two separable factors in the conservative strategy: taking longer, and leaning on math as the primary route to good alignment work. The assumption that math is better/safer seems very common, and it seems baseless to me (which creates a communication gap with a world that doesn’t share your love of pure math).
I doubt that you can math all the way from atoms to human values in an implementable form. It’s not at all apparent that a mathematically precise notion of agency would help solve the technical problems of getting an actual implemented AGI to do things we like. If you’re thinking of applying that rigor to develop algorithmic/Bayesian AGI instead, I think it’s true that Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc., and that the same insight applies to making them provably safe. I see the appeal of the intuition, but I think that’s all it is: an intuition.
I think there’s a good bit of mathophilia or maybe physics envy going on in alignment. But different fields lean on different techniques and types of analysis. Machine learning, AI design, computer science, and psychology seem to me like more useful fields for aligning AGI, however long we take to do it. All of those fields use math but don’t heavily rely on rigorously provable formulations of their problems.
Separately, it’s not obvious to me that a “competent civilization” would pursue the conservative metastrategy. You’re assuming a “competent civilization” would be long-termist and roughly utilitarian in its values. Humans tend not to be. If that’s what you mean by “competent civilization”, then fine, it would be more cautious.
There’s lots more to say on both of those points, but it will be a while before I get around to writing a full post on either, so here are those thoughts. Thanks again for a useful conceptualization of metastrategy.
All of those fields use math but don’t heavily rely on rigorously provable formulations of their problems.
Chicken and egg: is this evidence they are not mature enough to make friendly AI, or evidence that friendly AI can be made with that current level of rigor?
I agree; practices in other fields aren’t evidence for the right approach to AGI.
My point is that there’s no evidence that math IS the right approach, just loose intuitions and preferences.
And the arguments for it are increasingly outdated. Yudkowsky originated those arguments, and he now thinks that stopping current AGI research, starting over, and doing better math would be the best approach, but that it is still >99% likely to fail.
The arguments against less rigorous, more ML-and-cogsci-like approaches are loose and weak, so those approaches seem pretty likely to offer better odds of success than Yudkowsky’s plan, with its estimated 99%-plus chance of failure. This is a big claim, but I’m prepared to make it and defend it; that’s the post I’m working on. In short, claims about the fragility of value and about capabilities generalizing better than alignment are based on intuitions, and the opposite conclusions are just as easy to argue for. That doesn’t say which side is right; it says we don’t yet know how hard alignment is for deep-network-based AGI.