Is it more like a square root than an exponential? Don’t things get harder somewhat faster than the advantage you gain? Does it look like FOOM from where you’re standing?
Anthropic’s equivalent of the Epoch Capabilities Index has been growing linearly for Anthropic’s non-Mythos frontier since Opus 3. Unfortunately, no one measured the AECI for Sonnets 3, 4 and 4.6 or Haikus 3, 3.5 and 4.5, and no other Sonnets or Haikus have been released. EpochAI’s capability indices have Sonnet 4.6 lagging behind Opus 4.6 by 2-3 points despite Sonnet 4.5 being on-trend and Sonnet 4 being close to Opus 4. I suspect that the scaling law for models’ ECI is that capabilities scale with lived experience, or the logarithm thereof, until the models are saturated or slowed down.
As for a potential FOOM, any significant departure from current scaling laws will likely be driven by breakthroughs like neuralese (or a mildly safer equivalent?) or continual learning. In that case I suspect that the race is bottlenecked purely on labs’ recklessness. If mankind discovers that the capabilities and control-related problems of neuralese/CL scale predictably with the number of neurons subjected to the treatment, then an irresponsible lab could try to follow the neo-scaling laws until the model becomes completely brainlike (and discovers a post-brainlike Agent-5?), while a responsible lab would try to avoid following the laws until it can reliably align models of such capabilities.
Here’s a fun one. Suppose there’s a distribution of how morally correct people are. Imagine that makes sense for a second, picking whatever definition you like. Where do you think you are on it? Maybe in the right direction. People tend to think they’re doing the right thing. How do you know if you’re wrong? How do you know what the person who’s furthest along in the “right direction” sees?
Is this just a funny cover on top of “what if there were such a thing as moral correctness”? The distribution doesn’t matter (or exist) if the underlying measure is incoherent.
I think I’m trying to say something about the place where the tails don’t yet diverge too far, as if there is some sort of rough consensus morality zone where people might disagree on details but agree with each other sufficiently to disagree with me, about myself. But maybe I’m confused and that’s still incoherent (sorry)!
I wonder if I might have a wrong impression of myself. Maybe people who I see doing ethical things that I don’t do, who I would put above me in my personal version of an ethics distribution, would form a rough consensus where I’m in a worse percentile on average than I’d have guessed.
I’m holding on to this idea that it’s meaningful to consider in what percentile other people would put you, according to their own metric, and then do statistics over that, but I am neither a statistician nor a clever person, so I’d be happy to be corrected :)
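To make the percentile idea a bit more concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the names, the scores, the helper `percentile_of`, and the choice of "share of other people this rater scores strictly below you" as the percentile. Each rater uses their own scale, and only their ranking of people matters.

```python
from statistics import mean, median

def percentile_of(target, scores):
    """Percent of the other people that one rater scores strictly below `target`.

    `scores` maps person -> score on this rater's own (arbitrary) scale.
    """
    others = [s for name, s in scores.items() if name != target]
    below = sum(1 for s in others if s < scores[target])
    return 100.0 * below / len(others)

# Hypothetical data: ratings[rater][person] = that rater's own score for the person.
ratings = {
    "alice": {"me": 5, "bob": 7, "carol": 9, "dave": 3},
    "bob":   {"me": 6, "alice": 8, "carol": 5, "dave": 2},
    "carol": {"me": 4, "alice": 6, "bob": 5, "dave": 1},
}

# Where each rater places "me", according to their own metric.
placements = {rater: percentile_of("me", scores) for rater, scores in ratings.items()}

# "Statistics over that": a simple consensus estimate across raters.
consensus_mean = mean(placements.values())
consensus_median = median(placements.values())

# Hypothetical self-assessment, to compare against the consensus.
my_own_guess = 60.0
gap = my_own_guess - consensus_mean  # positive means I rate myself higher than others do

print(placements, consensus_mean, consensus_median, gap)
```

Using ranks within each rater sidesteps the problem that raters' scales are not comparable, but it still assumes each rater can order people at all, which is exactly the part that might be incoherent.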
Hmm. I wonder if you could measure something like “moral admiration”: how people evaluate each other, without recourse to any concept of correctness. This would be neither consistent nor transitive (a rating b highly and b rating c highly doesn’t imply that a rates c highly), but it might be possible to find patterns or local clusters of agreement.