Probably one of the core infohazards of postmodernism is that “moral rightness” doesn’t really exist outside of some framework. Asking about “rightness” of change is kind of a null pointer in the same way self-modifying your own reward centers can’t be straightforwardly phrased in terms of how your reward centers “should” feel about such rewiring.
For literally “just painting the road”, the cost of materials would be about $50, yes. Doing it “right”, in a way that’s indistinguishable from work done by the State of California, would almost certainly require experimenting with multiple paints, time spent measuring the intersection and planning a new paint pattern that matches a similar intersection template, and probably even signage changes (removing the wrong signs, which is likely some kind of misdemeanor if not a felony, and replacing them with the correct ones). Even in opportunity cost alone, this is looking like tens of hours of work and hundreds to thousands of dollars in materials and required tools.
You could probably implement this change for less than $5,000 and with minimal disruption to the intersection if you (for example) repainted the lines overnight and put authoritative cones around the drying paint.
Who will be the hero we need?
Google doesn’t seem interested in serving large models until it has a rock solid solution to the “if you ask the model to say something horrible, it will oblige” problem.
The relevant sub-field of RL calls this “lifelong learning”, though I actually prefer your framing because it makes pretty crisp what we actually want.
I also think that solving this problem is probably closer to “something like a transformer and not very far away”, considering, e.g., the Memorizing Transformers work (https://arxiv.org/abs/2203.08913).
I think the difficulty with answering this question is that many of the disagreements boil down to differences in estimates for how long it will take to operationalize lab-grade capabilities. Say we have intelligences that are narrowly human / superhuman on every task you can think of (which, for what it’s worth, I think will happen within 5-10 years). How long before we have self-replicating factories? Until foom? Until things are dangerously out of our control? Until GDP doubles within one year? In what order do these things happen? Etc. etc.
If I got anything out of the thousands of words of debate on the site in the last couple of months, it’s the answers to these questions that folks seem to disagree about (though I think I only actually have a good sense of Paul’s answers to these). Also curious to see more specific answers / timelines.
Something worth reemphasizing for folks not in the field is that these benchmarks are not like usual benchmarks, where you train the model on the task and then see how well it does on a held-out set. Chinchilla was not explicitly trained on any of these problems. It’s typically given some context like: “Q: What is the southernmost continent? A: Antarctica Q: What is the continent north of Africa? A:” and then simply completes the prompt until a stop token is emitted, like a newline character.
And it’s performing above-average-human on these benchmarks.
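A minimal sketch of what that looks like mechanically (in Python; sample_next_token is a hypothetical stand-in for whatever decoding the real evaluation harness uses):

```python
# Hypothetical sketch of few-shot evaluation: solved examples are concatenated
# into a prompt, and the model's completion (up to a stop token, here a newline)
# is taken as its answer.

def build_prompt(solved_examples, question):
    lines = [f"Q: {q}\nA: {a}" for q, a in solved_examples]
    lines.append(f"Q: {question}\nA:")
    return "\n".join(lines)

def complete_until_stop(prompt, sample_next_token, stop="\n", max_tokens=32):
    completion = ""
    for _ in range(max_tokens):
        token = sample_next_token(prompt + completion)  # next token, as text
        if stop in token:
            break
        completion += token
    return completion.strip()

prompt = build_prompt(
    solved_examples=[("What is the southernmost continent?", "Antarctica")],
    question="What is the continent north of Africa?",
)
# answer = complete_until_stop(prompt, sample_next_token)
```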
That got people to, I dunno, 6 layers instead of 3 layers or something? But it focused attention on the problem of exploding gradients as the reason why deeply layered neural nets never worked, and that kicked off the entire modern field of deep learning, more or less.
This might be a chicken or egg thing. We couldn’t train big neural networks until we could initialize them correctly, but we also couldn’t train them until we had hardware that wasn’t embarrassing / benchmark datasets that were nontrivial.
While we figured out empirical init strategies fairly early (like Glorot init in 2010), it wasn’t until much later that we developed initialization schemes that really Just Worked (He init in 2015, Dynamical Isometry from Xiao et al. 2018).
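For concreteness, a rough sketch of the two empirical schemes mentioned above, using their standard formulas (numpy purely for illustration):

```python
import numpy as np

def glorot_init(fan_in, fan_out):
    # Glorot/Xavier (2010): variance scaled by the average of fan-in and fan-out,
    # aimed at keeping activation and gradient magnitudes stable across layers.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He (2015): variance scaled by fan-in only, with a factor of 2 to account
    # for ReLUs zeroing out roughly half of the activations.
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, size=(fan_in, fan_out))
```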
If I had to blame something, I’d blame GPUs and custom kernel writing getting to the point that small research labs could begin to tinker with ~few million parameter models on essentially single machines + a few GPUs. (The AlexNet model from 2012 was only 60 million parameters!)
For what it’s worth, the most relevant difficult-to-fall-prey-to-Goodhartian-tricks measure is probably cross-entropy validation loss, as shown in this figure from the GPT-3 paper:
Serious scaling efforts are much more likely to emphasize progress here over Parameter Count Number Bigger clickbait.
Further, while this number will keep going down, we’re going to crash into the entropy of human-generated text at some point. Whether that’s within three OOM or ten is anybody’s guess, though.
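To be explicit about the metric (my own illustration, not the paper’s code): cross-entropy validation loss is just the average negative log-probability the model assigns to held-out tokens, reported in nats per token:

```python
import numpy as np

def cross_entropy_per_token(token_log_probs):
    # Validation loss in nats/token: the average negative log-probability the
    # model assigned to each held-out token. Lower is better, and it's floored
    # by the (unknown) entropy of human-generated text.
    return -np.mean(token_log_probs)

# e.g., if the model assigned probabilities 0.5, 0.25, and 0.9 to three held-out
# tokens: cross_entropy_per_token(np.log([0.5, 0.25, 0.9])) ~= 0.73 nats/token
```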
By the standards of “we will have a general intelligence”, Moravec is wrong, but by the standards of “computers will be able to do anything humans can do”, Moravec’s timeline seems somewhat uncontroversially prescient? For essentially any task for which we can define a measurable success metric, we more or less* know how to fashion a function approximator that’s as good as or better than a human.
*I’ll freely admit that this is moving the goalposts, but there’s a slow, boring path to “AGI” where we completely automate the pipeline for “generate a function approximator that is good at [task]”. The tasks that we don’t yet know how to do this for are increasingly occupying the narrow space of [requires simulating social dynamics of other humans], which, just on computational complexity grounds, may be significantly harder than [become superhuman at all narrowly defined tasks].
Relatedly, do you consider [function approximators for basically everything becoming better with time] to also fail to be a good predictor of AGI timelines for the same reasons that compute-based estimates fail?
In defense of shot-ness as a paradigm:
Shot-ness is a nice task-agnostic interface for revealing capability that doesn’t require any cleverness from the prompt designer. Said another way, if you needed task-specific knowledge to construct the prompt that makes GPT-3 reveal it can do the task, it’s hard to compare “ability to do that task” in a task-agnostic way to other potential capabilities.
For a completely unrealistic example that hyperbolically gestures at what I mean: you could spend a tremendous amount of compute to come up with the magic password prompt that gets GPT-3 to reveal that it can prove P!=NP, but this is worthless if that prompt itself contains a proof that P!=NP, or worse, is harder to generate than the original proof.
This is not what it “feels like” when GPT-3 suddenly demonstrates it is able to do something, of course—it’s more like it just suddenly knows what you meant, and does it, without your hinting really seeming like it provided anything particularly clever-hans-y. So it’s not a great analogy. But I can’t help but feel that a “sufficiently intelligent” language model shouldn’t need to be cajoled into performing a task you can demonstrate to it, and thus I personally don’t want to have to rely on cajoling.
Regardless, it’s important to keep track of both “can GPT-n be cajoled into this capability?” as well as “how hard is it to cajole GPT-n into demonstrating this capability?”. But I maintain that shot-prompting is one nice way of probing this while holding “cajoling-ness” relatively fixed.
This is of course moot if all you care about is demonstrating that GPT-n can do the thing. Of course you should prompt tune. Go bananas. But it makes a particular kind of principled comparison hard.
Edit: wanted to add, thank you tremendously for posting this—always appreciate your LLM takes, independent of how fully fleshed out they might be.
Honestly, at this point, I don’t remember if it’s inferred or primary-sourced. Edited the above for clarity.
This is based on:
The Q&A you mention
GPT-3 not being trained on even one pass of its training dataset
“Use way more compute” achieving outsized gains from training longer rather than from most other architectural modifications for a fixed model size (while you’re correct that bigger model = faster training, you’re trading off against ease of deployment, and models much bigger than GPT-3 become increasingly difficult to serve in prod. Plus, we know it’s about the same size, from the Q&A)
Some experience with undertrained enormous language models underperforming relative to expectation
This is not to say that GPT-4 won’t have architectural changes. Sam mentioned a longer context at the least. But these sorts of architectural changes probably qualify as “small” in the parlance of the above conversation.
I believe Sam Altman implied they’re simply training a GPT-3-variant for significantly longer for “GPT-4”. The GPT-3 model in prod is nowhere near converged on its training data.
Edit: changed to be less certain; I’m pretty sure this follows from public comments by Sam, but he has not said this exactly.
OpenAI is still running evaluations.
This was frustrating to read.
There’s some crux hidden in this conversation regarding how much humanity’s odds depend on the level of technology (read: GDP) increase we’ll be able to achieve with pre-scary-AGI. It seems like Richard thinks we could be essentially post-scarcity, thus radically changing the geopolitical climate (and possibly making collaboration on an X-risk more likely? (this wasn’t spelled out clearly)). I actually couldn’t suss out what Eliezer thinks from this conversation—possibly that humanity’s odds are basically independent of the achieved level of technology, or that the world ends significantly sooner than we’ll be able to deploy transformative tech, so the point is moot. I wish y’all had nailed this down further.
Despite the frustration, this was fantastic content, and I’m excited for future installments.
Sure, but you have essentially no guarantee that such a model would remain contained to that group, or that the insights gleaned from that group could be applied unilaterally across the world before a “bad”* actor reimplemented the model and started asking it unsafe prompts.
Much of the danger here is that once any single lab on earth can make such a model, state actors probably aren’t more than 5 years behind, and likely aren’t more than 1 year behind, based on the economic value that an AGI represents.
“bad” here doesn’t really mean evil in intent, just an actor that is unconcerned with the safety of their prompts, and thus likely to (in Eliezer’s words) end the world
I don’t think the issue is the existence of safe prompts, the issue is proving the non-existence of unsafe prompts. And it’s not at all clear that a GPT-6 that can produce chapters from 2067EliezerSafetyTextbook is not already past the danger threshold.
If you haven’t already, you might consider speaking with a doctor. Sudden, intense changes to one’s internal sense of logic are often explainable by an underlying condition (as you yourself have noted). I’d rather not play the “diagnose a person over the internet” game, nor encourage anyone else here to do so. You should especially see a doctor if you actually think you’ve had a stroke. It is possible to recover from many different sorts of brain trauma, and the earlier you act, the better odds you have of identifying the problem (if it exists!).
I’ve seen pretty uniform praise from rationalist audiences, so I thought it worth mentioning that the prevailing response I’ve seen from within a leading lab working on AGI is that Eliezer came off as an unhinged lunatic.
For lack of a better way of saying it, folks not enmeshed within the rat tradition—i.e., normies—do not typically respond well to calls to drop bombs on things, even if such a call is a perfectly rational deduction from the underlying premises of the argument. Eliezer either knew that the entire response to the essay would be dominated by people decrying his call for violence, and this was tactical for 15-dimensional-chess reasons, or he severely overestimated people’s ability to identify that the actual point of disagreement is around p(doom), and not around how governments should respond to incredibly high p(doom).
This strikes me as a pretty clear failure to communicate.