Elliot Callender comments on So You Want To Make Marginal Progress...

Elliot Callender 10 Feb 2025 16:52 UTC
2 points
0
I strongly think cancer research has a huge space and can’t think of anything more difficult within biology.
I was being careless / unreflective about the size of the cancer solution space, splitting the solution spaces of alignment and cancer differently; nor do I know enough about cancer to make such claims. I split the space into immunotherapies, things which target epigenetics / stem cells, and “other”, where in retrospect the last category probably contains the optimal solution. This groups many small problems with possibly weakly-general solutions into a “bottleneck”, as you mentioned:
aging may be a general factor to many diseases, but research into many of the things aging relates to is composed of solving many small problems that do not directly relate to aging, and defining solving aging as a bottleneck problem and judging generalizability with respect to it doesn’t seem useful.
Later:
Define the baseline distribution generalizability is defined on.
For a given problem, generalizability is how likely a given sub-solution is to be part of the final solution, assuming you solve the whole problem. You might choose to model expected utility, if that differs between full solutions; I chose not to here because I natively separate generality from power.
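One way to write this definition down, as a sketch only: the symbols $s$ (a sub-solution), $S^*$ (the eventual full solution), and $U$ (utility of a full solution) are notation assumed here for illustration, not something defined earlier in the thread.

$$\mathrm{gen}(s) = P\left(s \in S^* \mid \text{the whole problem is solved}\right)$$

The expected-utility variant mentioned above would instead weight each possible full solution by its value, for example $\mathbb{E}\left[U(S^*) \cdot \mathbf{1}[s \in S^*] \mid \text{solved}\right]$, which mixes generality with power rather than keeping them separate.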
Give a little intuition about why a threshold is meaningful, rather than a linear “more general is better”.
I agree that “more general is better”, with a linear or slightly superlinear (because you can make plans which rely more heavily on the solution) association with success probability. We were already making different value statements about “weakly” vs “strongly” general, where putting concrete probabilities / ranges might reveal that we agree w.r.t. the baseline distribution of generalizability and disagree only on semantics.
I.e. thresholds are only useful for communication.
Perhaps a better way to frame this is in ratios of tractability (how hard to identify and solve) and usefulness (conditional on the solution working) between solutions with different levels of generalizability. E.g. suppose some solution $w$ is 5x less general than $g$. Then you expect, for the types of problems and solutions humans encounter, that $w$ will be more than 5x as tractable * useful as $g$.
I disagree in expectation, meaning for now I target most of my search at general solutions.
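A minimal sketch of that comparison, assuming (this is an illustrative assumption, not something established in the thread) that a solution's worth can be scored as generality × tractability × usefulness. The `Solution` class, `break_even_ratio`, and all numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    name: str
    generality: float    # assumed P(sub-solution ends up in the final solution)
    tractability: float  # assumed relative ease of finding and solving (arbitrary units)
    usefulness: float    # assumed value conditional on the solution working

    def score(self) -> float:
        # Toy scoring rule: multiply the three factors together.
        return self.generality * self.tractability * self.usefulness


def break_even_ratio(general: Solution, narrow: Solution) -> float:
    """How many times more tractable * useful `narrow` must be to tie `general`."""
    return general.generality / narrow.generality


# Hypothetical numbers: w is 5x less general than g, but 4x as tractable * useful.
g = Solution("g", generality=0.5, tractability=1.0, usefulness=1.0)
w = Solution("w", generality=0.1, tractability=2.0, usefulness=2.0)

print(break_even_ratio(g, w))  # ratio w would need to reach in order to tie g
print(g.score() > w.score())   # whether g still wins under these made-up numbers
```

Under this toy rule, the disagreement reduces to whether the tractability * usefulness ratio typically exceeds the generality ratio; the numbers here are chosen so that it falls just short.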
My model of the central AIS problems:
1. How to make some AI do what we want? (under immense functionally adversarial pressures)
  1. Why does the AI do things? (Abstractions / context-dependent heuristics; how do agents split reality given assumptions about training / architecture)
  2. How do we change those things-which-cause-AI-behavior?
2. How do we use behavior specification to maximize our lightcone?
  1. How to actually get technical alignment into a capable AI? (AI labs / governments)
  2. What do we want the AI to do? (“Long reflection” / CEV / other)
I’d be extremely interested to hear anyone’s take on my model of the central problems.
- Ariel 10 Feb 2025 18:54 UTC
  9 points
  0
  Thank you, that was very informative.
  I don’t find the “probability of inclusion in final solution” model very useful, compared to “probability of use in future work” (similarly for their expected value versions) because
  1. I doubt that central problems are a good model for science or problem solving in general (or even in the navigation analogy).
  2. I see value in impermanent improvements (e.g. current status of HIV/AIDS in rich countries) and in future-discounting our value estimations.
  3. Even if a good description of a field as a central problem and satellite problems exists, we are unlikely to correctly estimate it a priori, or to estimate the relevance of a solution to it. In comparison, predicting how useful a solution is to “nearby” work is easier (with the caveat that islands or cliques of problems and solutions that are only useful to each other can arise, and do in practice).
  Given my model, I think 20% generalizability is worth a person’s time. Given yours, I’d say 1% is enough.
  - Elliot Callender 13 Feb 2025 0:41 UTC
    1 point
    0
    How much would you say (3) supports (1) on your model? I’m still pretty new to AIS and am updating from your model.
    I agree that marginal improvements are good for fields like medicine, and perhaps so too AIS. E.g. I can imagine self-other overlap scaling to near-ASI, though I’m doubtful about stability under reflection. I’ll put 35% we find a semi-robust solution sufficient to not kill everyone.
    Given my model, I think 20% generalizability is worth a person’s time. Given yours, I’d say 1% is enough.
    I think that the distribution of success probability of typical optimal-from-our-perspective solutions is very wide under both of the ways we describe generalizability; within that, we should weight generalizability more heavily than my understanding of your model does.
    Earlier:
    Designing only best-worst-case subproblem solutions while waiting for Alice would be like restricting strategies in game to ones agnostic to the opponent’s moves
    Is this saying people should coordinate in case valuable solutions aren’t in the a priori generalizable space?