MIRI has a lot invested in the idea that AI safety is a hard problem that must have a complicated solution. So there’s a sense in which their employees’ salaries depend on them not understanding how a simple solution to FAI might work.
That doesn’t sound correct. My understanding is that they’re looking for simple solutions, in the sense that quantum mechanics and general relativity are simple. What they’ve invested a lot in is the idea that it’s hard to even ask the right questions about how AI alignment might work. They’re biased against easy solutions, but they might also be biased in favor of simple solutions.
We value quantum mechanics and relativity because there are specific phenomena they explain well. If I’m a Newtonian physics advocate, you can point me to solid data my theory doesn’t predict in order to motivate the development of a more sophisticated theory. We were able to advance beyond Newtonian physics because we collected data that disconfirmed it. Similarly, before dismissing a simple approach to FAI that someone suggests, you should offer a precise failure mode: a toy agent in a toy environment that clearly exhibits undesired behavior, writing actual code where needed to make the conversation precise and resolve disagreements. This is how science advances. If you add complexity to your theory without knowing where the simple theory falls short, you probably won’t add the right sort of complexity, because your complexity isn’t well motivated.
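To make “a toy agent in a toy environment” concrete, here’s a minimal sketch of the kind of thing I have in mind, in Python. The gridworld, the vase, and every name in it are invented for illustration, not taken from any existing proposal: an agent whose reward only mentions reaching a goal cell walks straight through a vase the designer implicitly wanted preserved, because nothing in the reward says otherwise.

```python
# Hypothetical 5x3 gridworld: the agent starts at START, gets a reward for
# reaching GOAL, and a vase sits on the shortest path.  The naive reward
# function never mentions the vase, so a planner that just maximizes reward
# smashes it; adding an explicit penalty (the "patch") changes the behavior.
from itertools import product

GRID_W, GRID_H = 5, 3
START, GOAL, VASE = (0, 1), (4, 1), (2, 1)   # vase lies on the straight-line path
STEP_COST = -0.1
GOAL_REWARD = 10.0

def neighbors(cell):
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < GRID_W and 0 <= ny < GRID_H:
            yield (nx, ny)

def plan_greedy(penalize_vase=False):
    """Value-iterate over the grid, then follow the greedy policy from START."""
    values = {c: 0.0 for c in product(range(GRID_W), range(GRID_H))}
    for _ in range(100):                      # plenty of sweeps for a 5x3 grid
        for cell in values:
            if cell == GOAL:
                values[cell] = GOAL_REWARD
                continue
            penalty = -5.0 if (penalize_vase and cell == VASE) else 0.0
            values[cell] = STEP_COST + penalty + max(values[n] for n in neighbors(cell))
    path, cell = [START], START
    while cell != GOAL and len(path) < 50:    # follow the greedy policy
        cell = max(neighbors(cell), key=lambda n: values[n])
        path.append(cell)
    return path

naive_path = plan_greedy(penalize_vase=False)
print("naive agent walks through the vase:", VASE in naive_path)        # True: undesired behavior
careful_path = plan_greedy(penalize_vase=True)
print("patched agent goes around the vase:", VASE not in careful_path)  # True, but only after noticing the failure
```

The particular failure here isn’t deep; the point is that once it exists as code, “does the simple approach fail?” becomes something the two of you can settle by running it rather than by trading intuitions.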
Math and algorithms are both made of thought stuff, so these physics metaphors can only go so far. If I start writing a program to solve some problem and I choose a really bad set of abstractions, I may get halfway through and think to myself, “geez, this is a really hard problem.” The problem may be very difficult to think about given the abstractions I’ve chosen, but much easier to think about given a different set. It would be bad for me to get invested in the idea that the problem is hard to think about, because that could cause me to get attached to my current, inferior abstractions. If you want the simplest solution possible, you should exhort yourself to rethink your abstractions whenever things get complicated, and keep watching out of the corner of your eye for alternative abstractions you could be using.
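As a toy illustration of the abstraction point (my example, nothing to do with FAI specifically): checking whether brackets are balanced feels fiddly if you frame it as string surgery, and nearly trivial if you frame it as a stack of currently-open brackets.

```python
# Two framings of the same problem, both hypothetical illustrations.
def balanced_via_string_surgery(s: str) -> bool:
    # Awkward abstraction: repeatedly delete innermost matched pairs until
    # nothing changes, then check whether anything is left over.
    prev = None
    while prev != s:
        prev = s
        for pair in ("()", "[]", "{}"):
            s = s.replace(pair, "")
    return s == ""

def balanced_via_stack(s: str) -> bool:
    # Better abstraction: a stack of brackets that are currently open.
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack

for case in ["([]{})", "([)]", "((("]:
    assert balanced_via_string_surgery(case) == balanced_via_stack(case)
```

Both versions work on these inputs, but the first is the kind of code that makes you mutter “geez, this is a really hard problem,” while the second makes the problem look as easy as it actually is.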