The main claim I’m making isn’t that there’s a greater compute cost that scales with the size of the world model; indeed, I find it plausible that the compute costs are essentially flat. I’m claiming that the labor and man-hours needed to build powerful AIs that are interpretable vastly outweigh the compute expense (relative to how much we have of each resource) of making uninterpretable models. There’s therefore much higher ROI in scaling AI than in building AI on interpretable symbolic world models rather than uninterpretable end-to-end learning, at least until labor can scale as fast as compute, or faster, which only happens after AGI.
That said, this claim does deserve a separate response:
Everything except the final “make sense of the already-interpreted world-model” step is supposed to be automated, by general-purpose methods whose efficiency does purely scale with compute/data.
If this is the plan, then my main criticism is that I’m deeply skeptical we can get enough labor for the other steps to be automated without at least fully automating AI research, at which point we could apply much greater AI labor to the problem. While this sort of plan is good from a “how do we automate alignment?” perspective, it’s much worse as a plan for human alignment researchers.
The point at which the plan would become practical is essentially the point at which we have achieved the holy grail of AI that can automate almost all jobs, conventionally called AGI. That means the plan is useful for automating AI alignment, but it isn’t a useful agenda for you to work on now.
I still think your other research is nice; I’m just claiming that until AI research is fully automated, it’s not very useful to try to make AIs much more interpretable than they already are, because the marginal benefit of improved uninterpretable capabilities far exceeds the marginal benefit of making AIs interpretable (setting aside existential risk for this discussion).