I think general solutions are especially important for fields with big solution spaces / few researchers, like alignment. If you were optimizing for, say, curing cancer, it might be different (I think both the paradigm- and subproblem-spaces are smaller there).
From my reading of John Wentworth’s Framing Practicum sequence, implicit in his (and my) model is that the solution spaces for these sorts of problems are a priori enormous. We (you and I) might also disagree on what a priori feasibility counts as “weakly” vs. “strongly” generalizable; I think my transition is around 15–30%.
I see what you mean with regard to the number of researchers. I do wonder a lot about the waste from multiple researchers unknowingly producing the same research (a different problem from the one you pointed out); the uncoordinated solution to that is to work on niche problems and ideas (which, coincidentally, seem less likely to individually generalize).
Could you share your intuition for why the solution space in AI alignment research is large, or larger than in cancer? I don’t have an intuition about the solution space in alignment vs. a “typical” field, but I strongly think cancer research has a huge space and can’t think of anything more difficult within biology. Intuition: you can think of fighting cancer as a two-player game, with each individual organism being an instance and the within-organism evolutionary processes leading to cancer being the other player. In most problems in biology you can think of the genome of the organisms as defining the rule set for the game, but here the genome isn’t held constant.
W.r.t. the threshold for strong generalizability, I don’t have a working model of this to disagree with. Putting confidence/probability values on relatively abstract things is something I’m not used to (though I understand it is encouraged here), so I’d appreciate it if you could share a little more insight about how to:
1. Define the baseline distribution generalizability is defined on.
2. Give a little intuition about why a threshold is meaningful, rather than a linear “more general is better”.
I’m sorry if that’s too much to ask for, or ignorant of something you’ve laid out before. I have no intuition about #2, but for #1 I suspect that you have a model of AI alignment, or of an abstract field, as having a single central problem (or a few) at a time, whereas my intuition is that fields are mostly composed of many problems, with “centrality” only being relevant as a storytelling device or a good way to make sense of past progress in retrospect (so that generalizability is more the chance that one piece of work will be useful for another, rather than the chance it will be useful following a solution to the central problem). More concretely, aging may be a general factor in many diseases, but research into many of the things aging relates to is composed of solving many small problems that do not directly relate to aging, and defining solving aging as a bottleneck problem and judging generalizability with respect to it doesn’t seem useful.
> I strongly think cancer research has a huge space and can’t think of anything more difficult within biology.
I was being careless / unreflective about the size of the cancer solution space, splitting the solution spaces of alignment and cancer differently; I also don’t know enough about cancer to make such claims. I had split the space into immunotherapies, things which target epigenetics / stem cells, and “other”, where in retrospect the last probably contains the optimal solution. This groups many small problems with possibly weakly-general solutions into a “bottleneck”, as you mentioned:
> aging may be a general factor in many diseases, but research into many of the things aging relates to is composed of solving many small problems that do not directly relate to aging, and defining solving aging as a bottleneck problem and judging generalizability with respect to it doesn’t seem useful.
Later:
> Define the baseline distribution generalizability is defined on.
For a given problem, generalizability is how likely a given sub-solution is to be part of the final solution, assuming you solve the whole problem. You might choose to model expected utility, if that differs between full solutions; I chose not to here because I natively separate generality from power.
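In symbols (my own notation, just to pin the definition down, not anything standard): for a candidate sub-solution $s$ and eventual full solution $S^{*}$,

$$G(s) \;=\; P\big(s \in S^{*} \,\big|\, \text{the full problem gets solved}\big).$$

The expected-utility variant would additionally weight each possible $S^{*}$ by its value.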
> Give a little intuition about why a threshold is meaningful, rather than a linear “more general is better”.
I agree that “more general is better”, with a linear or slightly superlinear (because you can make plans which rely more heavily on the solution) association with success probability. We were already making different value statements about “weakly” vs. “strongly” general; putting concrete probabilities / ranges on them might reveal that we agree about the baseline distribution of generalizability and disagree only on semantics.
I.e. thresholds are only useful for communication.
Perhaps a better way to frame this is in ratios of tractability (how easy a solution is to identify and solve) and usefulness (conditional on the solution working) between solutions with different levels of generalizability. E.g. suppose some solution w is 5x less general than g. Then you expect, for the types of problems and solutions humans encounter, that w will be more than 5x as tractable * useful as g (a toy numerical version is below).
I disagree in expectation, meaning for now I target most of my search at general solutions.
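To make that framing concrete, here is a toy expected-value comparison (the numbers and the `expected_value` helper are purely illustrative, not anyone’s actual estimates); the only structural assumption is that generality, tractability, and usefulness multiply:

```python
# Toy model: the chance that work on a solution ends up mattering,
# treated as a product of generality, tractability, and usefulness.
def expected_value(generality: float, tractability: float, usefulness: float) -> float:
    return generality * tractability * usefulness

# g is a general solution; w is 5x less general but 4x more tractable.
ev_g = expected_value(generality=0.25, tractability=0.10, usefulness=1.0)
ev_w = expected_value(generality=0.05, tractability=0.40, usefulness=1.0)

# For w to come out ahead it needs more than 5x the tractability * usefulness
# product of g; here it only has 4x, so the general solution wins.
print(f"general: {ev_g:.3f}, narrow: {ev_w:.3f}")  # general: 0.025, narrow: 0.020
```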
My model of the central AIS problems:
How to make some AI do what we want? (under immense functionally adversarial pressures)
Why does the AI do things? (Abstractions / context-dependent heuristics; how do agents split reality given assumptions about training / architecture)
How do we change those things-which-cause-AI-behavior?
How do we use behavior specification to maximize our lightcone?
How to actually get technical alignment into a capable AI? (AI labs / governments)
What do we want the AI to do? (“Long reflection” / CEV / other)
I’d be extremely interested to hear anyone’s take on my model of the central problems.
Thank you, that was very informative. I don’t find the “probability of inclusion in the final solution” model very useful compared to “probability of use in future work” (similarly for their expected-value versions; I sketch the contrast right after the list below), because:
1. I doubt that central problems are a good model for science or problem solving in general (or even in the navigation analogy).
2. I see value in impermanent improvements (e.g. the current status of HIV/AIDS in rich countries) and in future-discounting our value estimations.
3. Even if a good description of a field as a central problem and satellite problems exists, we are unlikely to correctly estimate it a priori, or to estimate the relevance of a solution to it. In comparison, predicting how useful a solution is to “nearby” work is easier (with the caveat that islands or cliques of only internally-useful problems and solutions can arise, and do in practice).
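Roughly, and only as a sketch in my own notation (with $S^{*}$ again the eventual full solution), the two quantities I’m contrasting are

$$G_{\text{final}}(s) = P\big(s \in S^{*} \,\big|\, \text{the full problem gets solved}\big), \qquad G_{\text{future}}(s) = P\big(\text{some future work in the field builds on } s\big),$$

where the second could also carry a time discount, per point 2.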
Given my model, I think 20% generalizability is worth a person’s time. Given yours, I’d say 1% is enough.
How much would you say (3) supports (1) on your model? I’m still pretty new to AIS and am updating from your model.
I agree that marginal improvements are good for fields like medicine, and perhaps for AIS too. E.g. I can imagine self-other overlap scaling to near-ASI, though I’m doubtful about stability under reflection. I’ll put 35% on us finding a semi-robust solution sufficient to not kill everyone.
> Given my model, I think 20% generalizability is worth a person’s time. Given yours, I’d say 1% is enough.
I think the distribution of success probabilities for typical optimal-from-our-perspective solutions is very wide under both of the ways we describe generalizability; within that, we should weight generalizability more heavily than my understanding of your model does.
Earlier:
> Designing only best-worst-case subproblem solutions while waiting for Alice would be like restricting strategies in game to ones agnostic to the opponent’s moves
Is this saying people should coordinate in case valuable solutions aren’t in the a priori generalizable space?