When you say “switching”, it reminds me of the “big switch” approach of https://en.wikipedia.org/wiki/General_Problem_Solver.
Regarding how they do it, I believe the relevant passage is:
Because distinct tasks within a domain can share identical embodiments, observation formats and action specifications, the model sometimes needs further context to disambiguate tasks. Rather than providing e.g. one-hot task identifiers, we instead take inspiration from (Brown et al., 2020; Sanh et al., 2022; Wei et al., 2021) and use prompt conditioning.
I guess it should be possible to locate the activation paths for different tasks, as the tasks are pretty well separated. Something along the lines of https://github.com/jalammar/ecco
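To make the contrast in the quoted passage concrete, here is a toy sketch assuming integer token IDs. The function names and the "take the last prompt_len tokens of a demonstration" rule are hypothetical simplifications for illustration, not Gato's actual data pipeline:

```python
from typing import List, Tuple


def one_hot_conditioning(task_id: int, num_tasks: int,
                         obs_tokens: List[int]) -> Tuple[List[int], List[int]]:
    """The option the paper avoids: an explicit one-hot task identifier fed alongside the tokens."""
    task_vector = [1 if i == task_id else 0 for i in range(num_tasks)]
    return task_vector, obs_tokens


def prompt_conditioning(demo_tokens: List[int], obs_tokens: List[int],
                        prompt_len: int) -> List[int]:
    """Prompt conditioning: prepend tokens from a demonstration of the same task,
    so the model has to infer which task it is solving from context alone."""
    # Slicing with [-0:] would return the whole list, so handle prompt_len == 0 explicitly.
    prompt = demo_tokens[-prompt_len:] if prompt_len > 0 else []
    return prompt + obs_tokens


# Usage with made-up token IDs:
print(one_hot_conditioning(2, 4, [1, 2]))            # ([0, 0, 1, 0], [1, 2])
print(prompt_conditioning([5, 6, 7, 8], [1, 2], 2))  # [7, 8, 1, 2]
```

The appeal of the prompt approach is that a new task needs no new identifier slot, only an example of it.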
Fair analysis, I agree with the conclusions. The main contribution seems to be a proof that transformers can handle many tasks at the same time.
Not sure if you sorted the tests in order of relevance, but I also consider the “held-out” test the most revealing. Besides finetuning, it would be interesting to test the zero-shot capabilities.
A single network is solving 600 different tasks spanning different areas. More than 100 of the tasks are solved at 100% of human performance. Let that sink in.
While not a breakthrough in arbitrarily scalable generality, the fact that so many tasks can be fitted into one architecture is surprising and novel. For many real-life applications, being good at 100-1000 tasks makes an AI general enough to be deployed as an error-tolerant robot, say in a warehouse.
The main point, imho, is that this architecture may be enough to be scaled up (10-1000x parameters) within a few years into a useful proto-AGI product.
Pretty disappointing and unexpected to hear this in 2022, after all the learnings from the pandemic.
What’s stopping the companies from hiring a new researcher? People are queueing for tech jobs.
If they leave, only those who don’t care remain…
If by “sort of general, flexible learning ability that would let them tackle entirely new domains” we include adding new tokenised vectors to the training set, then this fits the definition. Of course this is “cheating”, since the system is not learning purely by itself, but for the purpose of building a product or getting tasks done this does not really matter.
And it’s not inconceivable to imagine self-supervised token generation to acquire more skills, and perhaps a K-means algorithm to make sure that the new embeddings do not interfere with previous knowledge. It’s a dumb way of getting smarter, but apparently it works thanks to scale effects!
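One hypothetical reading of the K-means idea: cluster the embeddings of existing skills, then flag new-task embeddings that land too close to an existing centroid as potential interference. The function names, the radius threshold and the choice of plain Lloyd's k-means are all assumptions for illustration, not something from the paper:

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns the (k, d) centroid matrix."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct points sampled from the data.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids


def interference_mask(old_emb: np.ndarray, new_emb: np.ndarray,
                      k: int, radius: float) -> np.ndarray:
    """Hypothetical check: True where a new embedding falls within `radius`
    of a centroid of the existing embeddings (i.e. it may interfere)."""
    centroids = kmeans(old_emb, k)
    dists = np.linalg.norm(new_emb[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.min(axis=1) < radius
```

A flagged embedding could then be nudged away or given extra capacity; how to do that is left open, as in the comment above.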
I would agree with “proto-AGI”. I might soon write a blog post on this, but ideally we could define a continuous value to track how close we are to AGI, which increases when:
-the tasks to solve are very different from each other
-the tasks are complex
-the tasks are solved well
-little experience (or information) is fed to the system
-the experience is not directly related to the task
-the experience is very raw
-the computation is done in few steps
Then the value could be tracked while adding new tasks and changing the environment.
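A minimal sketch of such a continuous value, assuming each factor in the list above has already been mapped to a score in [0, 1] (how to measure each one is the hard, open part). The field names and the equal-weight geometric mean are hypothetical choices, not an established metric:

```python
import math
from dataclasses import dataclass, fields


@dataclass
class AGIFactors:
    """Hypothetical factor scores in [0, 1], each oriented so that 1 means 'closer to AGI'."""
    task_diversity: float      # the tasks are very different from each other
    task_complexity: float     # the tasks are complex
    solve_quality: float       # the tasks are solved well
    data_efficiency: float     # little experience (or information) is fed to the system
    transfer: float            # the experience is not directly related to the task
    rawness: float             # the experience is very raw
    compute_efficiency: float  # the computation is done in few steps


def agi_score(f: AGIFactors) -> float:
    """Equal-weight geometric mean of the factors; non-decreasing in every factor."""
    values = [getattr(f, field.name) for field in fields(f)]
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"factor out of range [0, 1]: {v}")
    return math.prod(values) ** (1.0 / len(values))
```

A geometric mean is one possible design choice: unlike a plain average, it cannot hide a factor that is near zero (e.g. excellent task quality obtained with enormous amounts of task-specific data). Equal weights are themselves an assumption.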
I have always been cautious, but I would say yes this time.
With the caveat that it learns new tasks only from supervised data, and does not reuse previous experience.
The fact that adding new tasks doesn’t diminish performance on previous tasks is highly non-trivial!
It may be that there is a lot of room in the embedding space to store them. The wild thing is that nothing (apart from a few hardware iterations) stops us from increasing the embedding space if really needed.
Possibly the first truly AGI paper.
Even though it is just exploiting the fact that all the narrow problems can be solved as sequence problems via tokenisation, it’s remarkable that the tasks do not interfere destructively with each other. My gut feeling is that this is due to the very high dimensional space of the embedding vectors.
It leaves ample room for growth.
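The gut feeling about high-dimensional embeddings matches a well-known geometric fact: independent random directions in a high-dimensional space are nearly orthogonal, so there is a lot of room for vectors that barely overlap. A minimal NumPy sketch of that fact (it says nothing about Gato's actual embeddings; the dimensions and vector count are arbitrary choices):

```python
import numpy as np


def mean_abs_cosine(dim: int, n_vectors: int = 200, seed: int = 0) -> float:
    """Average |cosine similarity| over all distinct pairs of random Gaussian vectors."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n_vectors, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # normalise to unit length
    cos = v @ v.T                                   # pairwise cosine similarities
    off_diag = cos[~np.eye(n_vectors, dtype=bool)]  # drop each vector's self-similarity
    return float(np.abs(off_diag).mean())


for d in (2, 16, 128, 1024):
    print(d, round(mean_abs_cosine(d), 3))
```

The average overlap shrinks roughly like 1/sqrt(d), so going from a handful to thousands of dimensions makes random vectors far less likely to collide.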
My main point is that there is not enough evidence for a strong claim like doom-soon. In the absence of hard data, anybody is free to cook up arguments for or against doom-soon.
You may not like my suggestion, but I would strongly advise getting deeper into the field and understanding it better yourself before making important decisions.
In terms of paradigms, you may have a look at why AI software development is hard (easy to get to 80% accuracy, hellish to get to 99%), AI winters and hype cycles (the disconnect between claims/expectations and reality), and the development of dangerous technologies (nuclear, biotech) and how stability has been achieved.
Don’t look at opinions; look for data and facts. Speculations, opinions or beliefs cannot be the basis on which you make decisions or update your knowledge. It’s better to know a few things, but with high confidence.
Ask yourself, which hard data points are there in favour of doom-soon?
Geniuses and talented researchers are not as impactful as the right policy. Contribute to creating the right conditions (work environment, education, cross-contamination, funding, etc.) to make good research flourish. At the same time, if the fundamentals are not covered (healthcare, housing, etc.), people are not able to focus on much more than survival. So pretty much anything that makes the whole system work better helps.
As an example, there are plenty of smart individuals in poor countries who are not able to express their potential.
Thanks. Yes, pretty much in line with the authors. Btw, I would be super happy to be wrong and see advancement in those areas, especially the robotic one.
Thanks for the offer, but I’m not interested in betting money.
A close call, but I would still lean towards no. Engineering the prompt is where humans leverage all their common sense and vast (w.r.t. the AI) knowledge.
The bottom line is: nobody has a strong argument in support of the inevitability of the doom scenario (if you have one, just reply to this with a clear and self-contained argument).
From what I’m reading in the comments and in other papers/articles, it’s a mixture of beliefs, extrapolations from known facts, reliance on what “experts” said, and cherry-picking. Add the fact that bad/pessimistic news travels and spreads faster than boring good news.
A sober analysis establishes that super-AGI can be dangerous (indeed there are no theorems forbidding this either); what’s unproven is that it will be HIGHLY LIKELY to be a net minus for humanity. Even admitting that alignment is not possible, it’s not clear why the goals of humanity and super-AGI should be in contrast, and not just different. Even admitting that they are highly likely to be in contrast, it is not clear why strategies to counter this cannot be effective (e.g. partnering up with a “good” super-AGI).
Another factor often forgotten is that what we mean by “humanity” today may not have the same meaning once we have technologies like AGIs, mind uploading or intelligence enhancement. We may literally become those AIs.
The downvotes are excessive; the post is provocative, but interesting.
I think you will not even need to “push the fat man”. The development of AGI will be slow and gradual (as with any other major technology), and there will be incidents along the way (e.g. an AGI chatbot harassing someone). Those incidents will periodically mandate new regulations, so that measures to tackle real AGI-related dangers will be enacted, similarly to what happens in the nuclear energy sector. They will not be perfect, but there will be regulations.
The tricky part is that not all nations will set similar safety levels; in fact, some may encourage the development of unsafe but high-reward AGI. So overall it looks like “pushing the fat man” will not even work that well.
Matthew, Tamay: Refreshing post, with actual hard data and benchmarks. Thanks for that.
A model/ensemble of models achieves >80% on all tasks in the MMLU benchmark
No in 2026, no in 2030. Mainly due to the fact that we don’t have much structured data and incentives to solve some of the categories. A powerful unsupervised AI would be needed to clear those categories, or more time.
A credible estimate reveals that an AI lab deployed EITHER >10^30 FLOPs OR hardware that would cost $1bn if purchased through competitive cloud computing vendors at the time on a training run to develop a single ML model (excluding autonomous driving efforts)
This may actually happen (the $1bn one, not the 10^30), also due to inflation and USD created out of thin air and injected into the market. I would go for no in 2026 and yes in 2030.
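For reference, the price at which the two thresholds in the question coincide is simple arithmetic: $1bn buys 10^30 FLOP only if compute costs at most $10^-21 per FLOP.

$$\frac{\$10^{9}}{10^{30}\,\text{FLOP}} = \$10^{-21}\ \text{per FLOP} \quad\Longleftrightarrow\quad 10^{21}\ \text{FLOP per dollar}$$

Whenever compute is more expensive than this (fewer than 10^21 FLOP per dollar), the $1bn condition is met before the 10^30 FLOP one.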
A model/ensemble of models will achieve >90% on the MATH dataset using a no-calculator rule
No in 2026, no in 2030. Significant algorithmic improvements needed. It may be done if prompt engineering is allowed.
A model/ensemble of models achieves >80% top-1 strict accuracy on competition-level problems on the APPS benchmark
No in 2026, no in 2030. Similar to the above, but there will be more progress, as a lot of data is available.
A gold medal for the IMO Grand Challenge (conditional on it being clear that the questions were not in the training set)
No in 2026, no in 2030.
A robot that can, from beginning to end, reliably wash dishes, take them out of an ordinary dishwasher and stack them into a cabinet, without breaking any dishes, and at a comparable speed to humans (<120% the average time)
I work with smart robots; this cannot happen so fast, partly due to hardware limitations. The speed requirement is particularly harsh. Without the speed limit and with the system known in advance, I would say yes in 2030. As the bet stands, I go for no in 2026, no in 2030.
Tesla’s full-self-driving capability makes fewer than one major mistake per 100,000 miles
Not sure about this one, but I lean towards no in 2026, no in 2030.