As a concrete tool for discussing AI risk scenarios, I wonder if it doesn’t have too many parameters? Like, you have to specify the map (at least locally), how far we can see, what the impact of this or that research will be…
I agree that the model does have quite a few parameters. I think you can get some value out of it already by being aware of what the different parameters are and, in case of a disagreement, identifying the one you disagree about the most.
>If we currently have some AI system x, we can ask which systems are reachable from x—i.e., which other systems are we capable of constructing now that we have x.
What are the constraints on “construction”? Because by definition, if those other systems are algorithms/programs, we can build them. It might be easier or faster to build them using x, but that doesn’t make construction possible where it was impossible before. I guess what you’re aiming at is something like “x reaches y” as a form of “x is stronger than y”, and also “if y is stronger, it should be considered when thinking about x”?
I agree that if we can build x, and then build y (with the help of x), then we can in particular build y. So having/not having x doesn’t make a difference on what is possible. I implicitly assumed some kind of constraint on the construction like “what can we build in half a year”, “what can we build for $1M” or, more abstractly, “how far can we get using X units of effort”.
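To make the effort-bounded notion of reachability concrete, here is a toy sketch (my own illustration, not part of the original model): treat AI systems as nodes in a directed graph whose edge weights are the units of effort needed to build one system given another. “Reachable from x within a budget” is then just shortest-path distance at most the budget.

```python
import heapq

def reachable(edges, x, budget):
    """Return the set of systems buildable from x with at most `budget` effort.

    edges: dict mapping a system to a list of (successor, effort) pairs.
    Uses Dijkstra's algorithm, pruning paths that exceed the budget.
    """
    dist = {x: 0}
    frontier = [(0, x)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost in edges.get(u, []):
            nd = d + cost
            if nd <= budget and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(frontier, (nd, v))
    return set(dist)

# Hypothetical example: building y from x takes 2 units, z from y another 2.
edges = {"x": [("y", 2)], "y": [("z", 2)]}
print(reachable(edges, "x", 3))  # z is out of reach on this budget
```

On this picture, having x really does matter: it changes which nodes fall inside the budget, even though everything is “buildable” in the unconstrained sense.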
[About the colour map:] This part confused me. Are you just saying that to have a color map (a map from the space of AIs to R), you need to integrate all systems into one, or consider all systems but one as irrelevant?
The goal was more to say that in general, it seems hard to reason about the effects of individual AI systems. But if we make some of the extra assumptions, it will make more sense to treat harmfulness as a function from AIs to $$\mathbb R$$.