hm, as a non-expert onlooker, I found the paraphrase pretty accurate.. for sure it sounds more reasonable in your own words here compared to the oversimplified summary (so thank you for clarification!), but as far as accuracy of summaries go, this one was top tier IMHO (..have you seen the stuff that LLMs produce?!)
I agree that my view is that they can count as continuous (though the exact definition of the word continuous can matter!), but then the statement “I find this perspective baffling— think MuZero and LLMs are wildly different from an alignment perspective” isn’t really related to this from my perspective. Like things can be continuous (from a transition or takeoff speeds perspective) and still differ substantially in some important respects!
I somehow completely agree with both of your perspectives, have you tried to ban the word “continuous” in your discussions yet? (on the other hand, I don’t think it should be a crux, probably just ambiguous meaning like “sound” in the “when a tree falls” thingy … but I would be curious if you would be able to agree on the 2 non-controversial meanings between the 2 of you)
It reminds me of stories about the gradualism / saltationism debate in evolutionary biology after gradualism won and before the idea of punctuated equilibrium… Parents and children are pretty discrete units, but gene pools over millions of years are pretty continuous from the perspective of an observer a long, long time later who is good at spotting low-frequency patterns ¯\_(ツ)_/¯
For a researcher, even GPT 3.5 to 4 might have been a big jump in terms of compute budget approval process (and/or losing a job from disbanding a department). And the same event on a benchmark might look smooth—throughout multiple big architecture changes a la the charts that illustrate Moore’s law—the sweat and blood of thousands of engineers seems kinda continuous if you squint enough.
And what even is “continuous”? General relativity is a continuous theory, but my phone calculates my GPS coordinates with numerical methods, time dilation from the gravity field / the geoid shape is just approximated, and nanosecond(-ish) precision is good enough to pin me down as much as I want (TBH probably more precision than I would choose myself as a compromise with my battery life). Real numbers are continuous, but they are not computable (I mean in practice in our own universe, I don’t care about philosophical possibilities), so we approximate them with a finite set of kinda shitty rational-ish numbers for which even 0.1 + 0.2 == 0.3 is false (in many languages, including JS in a browser console and Python).
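For anyone who wants to poke at the 0.1 + 0.2 thing themselves, here is a minimal Python sketch. The variable names are just mine for illustration; the only library calls are `math.isclose` and `fractions.Fraction` from the standard library. It compares the naive float check with a tolerance-based check and with exact rational arithmetic.

```python
import math
from fractions import Fraction

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly,
# so the sum lands on a neighbouring float rather than on 0.3.
total = 0.1 + 0.2

naive_equal = (total == 0.3)             # exact comparison of two approximations
close_enough = math.isclose(total, 0.3)  # comparison within a relative tolerance

# Exact rationals sidestep the rounding, at the cost of speed and memory.
exact_equal = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

print(naive_equal, close_enough, exact_equal)
```

So whether the "same" sum counts as equal depends on which notion of equality you pick, which is sort of the point about "continuous" too.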
Some stuff will work “the same” in the new paradigm, some will be “different”—does it matter whether we call it (dis)continuous, or do we know already what to predict in more detail?
I somehow completely agree with both of your perspectives, have you tried to ban the word “continuous” in your discussions yet?
I agree taboo-ing is a good approach in this sort of case. Talking about “continuous” wasn’t a big part of my discussion with Steve, but I agree it would have been worth doing if it was.