On one hand, the obvious question is: are you far above, or very different from, the state of the art? The further ahead of (or to the side of) what other people are doing you are, the more likely you are to encounter unexpected risks. If you are, in effect, retracing other people’s steps from behind, your risks are low. (Disclaimer: those risks are only “kind of low” today. In the future, as models become more powerful, training may need to be more guarded, and then even if you are following others, you must also competently follow their safety precautions, or you might create unacceptable risks. Even today people do follow some precautions, and screwing those up could mean big trouble; for example, today’s leaders don’t open-source GPT-4-level models, and we don’t know whether leaking or open-sourcing GPT-4-level weights would create unacceptable risks.)
On the other hand, the main big risk in this sense (at least with today’s AIs, especially if you are not using robots) is the “foom” risk: instances of a model capable of competent coding and competent AI research becoming able to independently produce their own even more capable successors, which do the same, and so on.
The more you use those models for coding and for AI research, the more you should ponder whether you are getting close to creating the conditions for runaway self-improvement, where AIs produce more capable successors on their own, those produce even more capable successors, and so on. So far, all such recursive self-improvement runs have saturated after a bit of improvement, but they will not keep saturating forever (see e.g. “Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation”, https://arxiv.org/abs/2310.02304, and in particular Figure 4 on page 6: the scheme does not work at all with GPT-3.5 as the underlying LLM, and works but quickly saturates with GPT-4 as the underlying LLM; so one might ponder whether keeping literally this self-improvement scheme, but using a better future LLM as the underlying fixed LLM, might already be highly dangerous in this sense).
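To make the shape of that scheme concrete, here is a minimal sketch of a STOP-style loop (my illustration, not the paper’s code; `query_llm`, `utility`, and the parameter choices are assumptions): an improver program asks a fixed underlying LLM for candidate improvements, and is then pointed at its own source.

```python
# Minimal sketch of a STOP-style self-improvement loop, for intuition only.
# This is an illustration, not the paper's code: `query_llm` and `utility`
# are hypothetical stand-ins for a call to a fixed underlying LLM and for
# a downstream-task scorer, respectively.
from typing import Callable

def improve(program_src: str,
            utility: Callable[[str], float],
            query_llm: Callable[[str], str],
            n_candidates: int = 4) -> str:
    """Ask the fixed LLM for improved variants of a program; keep the best."""
    prompt = ("Improve the following program so that it scores higher "
              "on its task:\n" + program_src)
    candidates = [query_llm(prompt) for _ in range(n_candidates)]
    # The incumbent competes against its proposed successors.
    return max(candidates + [program_src], key=utility)

def recursive_self_improvement(improver_src: str,
                               utility: Callable[[str], float],
                               query_llm: Callable[[str], str],
                               rounds: int = 3) -> str:
    """The key STOP move: the improver is pointed at its own source code,
    so each round can produce a more capable improver."""
    for _ in range(rounds):
        improver_src = improve(improver_src, utility, query_llm)
    return improver_src
```

Note that nothing inside the loop bounds how capable the successor improvers can become; the only fixed ceiling is the underlying LLM, which is exactly why swapping in a stronger future LLM while keeping the scheme unchanged is the worrying variable.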
Are you referring to your Clarification 2, or do you mean something else?
But more importantly, since you are saying:
“say I’ve got 1000x the GPT-4 FLOPs and that my architecture is to transformers as convolutions are to simple MLPs in vision (i.e. a lot better)”
No, sorry, you are way deep in the danger zone, and whether people can proceed at all with something like that really depends on the state of the field of AI existential safety at the time when that level of FLOPs and architecture becomes feasible… if our understanding of AI existential safety is what it is today, but people proceed with this magnitude of compute and architecture improvements, our chances are really bad...
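For a rough sense of scale (my back-of-envelope estimate, not a figure from this exchange): public estimates put GPT-4’s training compute on the order of 2 × 10^25 FLOP, so a 1000x run would be on the order of 2 × 10^28 FLOP, several orders of magnitude beyond any publicly known training run at the time of this conversation, before even counting the claimed architectural multiplier.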
There is no generic answer here that does not depend on the state of research that has not yet been successfully accomplished...
So the only answer is: be very aware of the state of the art of research in AI existential safety (that should really, hopefully, be part of the requirements for this kind of training run by the time we get to those compute and architecture improvements)… one can’t get a pilot’s license without understanding certain things about plane safety; the runs you are describing should require people to be safety-qualified in this sense as well...
So, an answer like “here’s how to get strong evidence of danger so you know when to stop training” is valid, but “here’s how to wipe out the danger” is much better.
In a sane world, people will have to take courses and pass exams where they must demonstrate that they know the “consensus answers” to these questions before doing runs with the compute and architecture you are describing.
And we’ll need to get something resembling “consensus answers” before this is possible.
So the answer is: one will need an honestly earned official certificate showing one knows the answers to these questions. At the moment, no one knows those answers.