I think the complexity of the real world was quite crucial (for evolving intelligence), and that simulating environments of the appropriate complexity will be very difficult.
Paul made some arguments against this on the 80,000 Hours podcast:
Almost all the actual complexity comes from other organisms, so that’s sort of something you get for free if you’re spending all this compute running evolution cause you get to have the agent you’re actually producing interact with itself.
I guess, other than that, you have this physical environment, which is very rich. Quantum field theory is very computationally complicated if you want to actually simulate the behavior of materials, but, it’s not an environment that’s optimized in ways that really pull out … human intelligence is not sensitive to the details of the way that materials break. If you just substitute in, if you take like, “Well, materials break when you apply stress,” and you just throw in some random complicated dynamics concerning how materials break, that’s about as good, it seems, as the dynamics from actual chemistry until you get to a point where humans are starting to build technology that depends on those properties. And, by that point, the game is already over.
Hired an econ tutor based on this.
Yep, my comment was about the linear scale-up rather than its implications for social learning.
Costs don’t really grow linearly with model size, because utilization goes down as you spread a model across many GPUs; i.e., aggregate memory requirements grow superlinearly. Relatedly, in OpenAI’s dataset, model sizes increased <100x while training compute increased ~300,000x. That’s been updating my views a bit recently.
People are trying to solve this with things like GPipe, but I don’t know yet whether any approach can scale to many more TPUs than the 8 they tried; communication would be the next bottleneck.
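To put those two figures together (the <100x and ~300,000x are from the comment above; everything else is simple arithmetic):

```python
# If training compute grew ~300,000x while model size grew <100x,
# then compute spent per parameter grew by at least 3,000x.
compute_growth = 300_000  # factor from OpenAI's dataset (see above)
param_growth = 100        # upper bound on model-size growth (see above)

print(f"compute per parameter grew >= {compute_growth / param_growth:,.0f}x")
# => compute per parameter grew >= 3,000x
```

So presumably most of the extra compute went into training longer on more data rather than into raw parameter count.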
The concept of pre-training and fine-tuning in ML seems closely related to mesa-optimization: you pre-train a model on a general distribution so that it can quickly learn a specific one from little data.
However, as the number of tasks you want to do (N) increases, there seems to be the opposite effect to the one your (very neat) model in section 2.1 describes: you get higher returns to meta-optimization, so you’ll want to spend relatively more on it. I think the model’s assumptions are violated here because the tasks don’t require completely distinct policies; e.g. GPT-2 does very well across tasks with the exact same prediction policy. I’m not completely sure about this point, but it seems fruitful to explore the analogy to pre-training, which is widely used.
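To make the analogy concrete, here is a minimal sketch of the pre-train-then-fine-tune pattern; the task family, network, and sample sizes are all invented for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "general distribution": regression tasks y = a*sin(x) + b with random (a, b).
def sample_task():
    a, b = torch.rand(2) * 2 - 1
    def task(n):
        x = torch.rand(n, 1) * 6 - 3
        return x, a * torch.sin(x) + b
    return task

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

# Pre-training: many tasks drawn from the general distribution.
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    x, y = sample_task()(32)
    opt.zero_grad()
    loss_fn(net(x), y).backward()
    opt.step()

# Fine-tuning: one specific task, very little data.
task = sample_task()
x_small, y_small = task(8)  # only 8 examples
ft_opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    ft_opt.zero_grad()
    loss_fn(net(x_small), y_small).backward()
    ft_opt.step()

x_test, y_test = task(256)
print("test MSE after fine-tuning:", loss_fn(net(x_test), y_test).item())
```

The point of the analogy: pre-training spends compute on the general distribution so that the specific task can be learned from 8 examples rather than thousands.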
The exact Bayesian solution penalizes complex models as a side effect. Each model should have a prior over its parameters, and the relevant quantity is the model evidence P(data | model) = ∫ P(data | θ, model) P(θ | model) dθ. The more complex model can fit the data better, so P(data | best-fit parameters, model) is higher. But the model gets penalized because P(best-fit parameters | model) is lower under the prior: the prior is spread thinly over a higher-dimensional parameter space, so it assigns less density to any particular set of parameters. This is called the “Bayesian Occam’s razor”.
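A standard toy illustration of this (coin flips; the specific numbers are just for demonstration): a model with a free bias parameter spreads its prior over all possible biases, so it assigns less probability to any particular dataset than a fixed fair-coin model, unless the data actually call for the flexibility.

```python
from math import comb

def evidence_fair(n, k):
    # P(k heads in n flips | fair coin): no free parameters, p fixed at 0.5.
    return comb(n, k) * 0.5 ** n

def evidence_biased(n, k):
    # P(k heads in n flips | biased coin, uniform prior over bias p):
    # integral of comb(n, k) * p**k * (1-p)**(n-k) over p in [0, 1] = 1 / (n + 1).
    return 1 / (n + 1)

for k in (10, 18):
    print(f"{k}/20 heads: fair={evidence_fair(20, k):.4f}, "
          f"biased={evidence_biased(20, k):.4f}")
# 10/20 heads: fair=0.1762, biased=0.0476  -> the simple model wins
# 18/20 heads: fair=0.0002, biased=0.0476  -> the flexible model wins
```

The flexible model only wins once the data are surprising enough to justify its extra parameter, which is exactly the complexity penalty described above.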
This recent DeepMind paper seems to claim that they found a mesa-optimizer. For example, suppose their LSTM observes an initial state. You can let the LSTM ‘think’ about what to do by feeding it that state multiple times in a row; the more time it has to think, the better it acts. It has more properties like that. It’s a pretty standard LSTM, so part of their point is that this is common.
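A sketch of that ‘thinking time’ evaluation protocol; the LSTM, observation size, and policy head here are placeholders rather than the paper’s actual architecture, and the network is untrained, so this only demonstrates the setup:

```python
import torch
import torch.nn as nn

obs_dim, hidden_dim, n_actions = 16, 64, 4
lstm = nn.LSTMCell(obs_dim, hidden_dim)
policy_head = nn.Linear(hidden_dim, n_actions)

def act_after_thinking(obs, n_thinking_steps):
    """Feed the same observation repeatedly before committing to an action."""
    h = torch.zeros(1, hidden_dim)
    c = torch.zeros(1, hidden_dim)
    for _ in range(n_thinking_steps):
        h, c = lstm(obs, (h, c))  # extra recurrent steps = extra 'thinking'
    return policy_head(h).argmax(dim=-1)

obs = torch.randn(1, obs_dim)
for k in (1, 2, 5):
    print(k, "thinking steps -> action", act_after_thinking(obs, k).item())
```

The claim, as described above, is that for their trained agent the chosen action gets better with more of these extra steps, which is the planning-like behavior they interpret as mesa-optimization.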
Terminology: the phrase ‘inner alignment’ is loaded with connotations of spiritual thought (https://www.amazon.com/Inner-Alignment-Dinesh-Senan-ebook/dp/B01CRI5UIY).
“high intensity aerobic exercise provides the benefit, and resistance training, if it includes high intensity aerobic exercise, can capture that benefit.”
Which part made you conclude that high-intensity aerobic exercise is needed? Asking because most resistance training doesn’t include it.
Great answer, thanks!
It would help if the poster directly approached or tagged me as a relevant expert.
For example, an RL agent that learns a policy that looks good to humans but isn’t actually good. Adversarial examples that only fool a neural net wouldn’t count.
It’d be nice to hear a response from Paul to paragraph 1. My 2 cents:
I tend to agree that we end up with extremes eventually. You seem to be saying that we would go straight to alignment given somewhat aligned systems, so that Paul’s first story barely plays out.
Of course, the somewhat aligned systems may aim at the wrong thing if we try to make them solve alignment. So the most plausible way this could work is if they produce solutions that we can check. But if that were the case, human supervision would be relatively easy. That’s plausible, but it’s a scenario I care less about.
Additionally, if we could use somewhat aligned systems to make more aligned ones, iterated amplification probably works for alignment (narrowly defined as “trying to do what we want”). The only remaining challenge would be to create one system that’s somewhat smarter than us and somewhat aligned (in our case that’s true by assumption). The rest follows, informally speaking, by induction, as long as the AI+humans system can keep improving intelligence as alignment improves, which seems likely. That’s also plausible, but it’s a big assumption, and it may not be the most important scenario / isn’t a ‘tale of doom’.
AFAICT, Paul’s definition of slow takeoff (I prefer “gradual”) basically implies that local takeoff and immediate unipolar outcomes are pretty unlikely. Many people still seem to put stock in local takeoff, e.g. Scott Garrabrant, and Zvi and Eliezer have said they would like to write rebuttals. So I’m surprised by how little of the disagreement has been written up.
Thanks. IIRC the comments didn’t feature that much disagreement, and there was little engagement from established researchers. I didn’t find much of either in other threads. I’m not sure whether I should infer that little disagreement exists.
Re Paul’s definition: he expects there to be years between GDP growth rates reaching 50% and reaching 100%. I think a lot of people here would disagree, but I’m not sure.
In September 2018 I counted 37 researchers with a safety focus, plus MIRI’s researchers. Most of them work on AGI safety and are at least at PhD level. I also counted 38 who do safety work part-time to varying degrees. I can email the spreadsheet; you can also find it in 80k’s safety Google group.