Same, but I’m more skeptical. At ICML there were many papers that seemed well-motivated and had deep models, probably well over 5%. So the skill of having deep models is not limited to visionaries like Bengio. Also, I’d guess that a lot of why the field is so empirical is less that nobody is able to form models, and more that people have models but rationally put more trust in empirical research methods than in their inside-view models. When I talked to the average ICML presenter, they generally had some reason they expected their research to work, even if it was kind of fake.
Sometimes the less well-justified method even wins. TRPO is very principled if you want to “not update too far” from a known good policy: it enforces a KL-divergence constraint on each update (handled in practice via a Taylor expansion of that constraint). PPO is less principled but works better. It’s not clear to me that in ML capabilities one should try to be more like Bengio in having better models, rather than just getting really fast at running experiments and iterating.
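For concreteness, here is a minimal sketch of the contrast (my own illustration, not from the thread; the function names and the PyTorch framing are made up): TRPO maximizes a surrogate objective subject to an explicit KL-divergence bound, while PPO drops the constraint and just clips the probability ratio.

```python
# Hypothetical sketch, not from the thread: TRPO's KL-constrained surrogate
# vs PPO's clipped-ratio objective. All names here are illustrative.
import torch

def trpo_style_surrogate(logp_new, logp_old, advantages):
    """TRPO's idea: maximize this surrogate subject to an explicit bound on the
    KL divergence between old and new policies. Real TRPO handles the constraint
    with a second-order (Taylor) approximation plus conjugate gradient and a
    line search; here we just compute both terms."""
    ratio = torch.exp(logp_new - logp_old)
    surrogate = (ratio * advantages).mean()
    approx_kl = (logp_old - logp_new).mean()  # crude sample-based KL estimate
    return surrogate, approx_kl  # caller checks approx_kl against its KL budget

def ppo_clip_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO's less principled trick: no explicit constraint, just clip the
    probability ratio so a single update can't move the policy too far."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return torch.min(ratio * advantages, clipped * advantages).mean()
```

The clipped version is the blunter heuristic of the two, yet it is the one the comment says works better in practice.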
At ICML there were many papers that seemed well-motivated and had deep models, probably well over 5%. So the skill of having deep models is not limited to visionaries like Bengio.
To be clear, I would also expect “well over 5%”; 10-20% feels about right. When I said in the OP that the median researcher lacks deep models, I really did mean the median; I was not trying to claim 90%+.
Re: the TRPO vs PPO example, I don’t think this is getting at the thing the OP is intended to be about. It’s not about how “well-justified” a technique is mathematically; it’s about models of what’s going wrong, in this case something to do with large update steps messing things up. Like, imagine someone who sees their training run mysteriously failing and starts babbling random things like “well, maybe it’s getting stuck in local minima”, “maybe the network needs to be bigger”, “maybe I should adjust some hyperparameters”. They try all these random things, but they don’t have any way to go figure out what’s causing the problem; they just fiddle with whatever knobs are salient and available. That person probably never figures out TRPO or PPO, because they never figure out that too-large update steps are causing problems.
Sometimes the less well-justified method even wins. TRPO is very principled if you want to “not update too far” from a known good policy: it enforces a KL-divergence constraint on each update (handled in practice via a Taylor expansion of that constraint). PPO is less principled but works better. It’s not clear to me that in ML capabilities one should try to be more like Bengio in having better models, rather than just getting really fast at running experiments and iterating.
This seems to have also happened in alignment, and I especially count RLHF here, along with all the efforts to make AI nice, which I think show a pretty important point: less justified/principled methods can, and arguably do, win over more principled methods like embedded agency research, a lot of the decision theory research from MIRI, the modern OAA plan from Davidad, or arguably ~all of the research that LessWrong did pre-2014-2016.
If you were to be less charitable than I would be, this would explain a lot about why AI safety wants to regulate AI companies so much: the companies are offering at least a partial solution, if not a full solution, to the alignment and safety problem, one that doesn’t require much slowdown in AI progress, donations to MIRI or classic AI safety organizations, or much coordination. That threatens both AI safety funding sources and raises the fear that their preferred solution, slowing down AI, won’t be implemented.
Cf this tweet and the text below:
https://twitter.com/Rocketeer_99/status/1706057953524977740
It’s like degrowth or dieting or veganism; people come up with a solution that makes things better but requires personal sacrifice and then make that solution a cornerstone of personal moral virtue. Once that’s your identity, any other solutions to the original problem are evil.
If you were to be less charitable than I would be, this would explain a lot about why AI safety wants to regulate AI companies so much: the companies are offering at least a partial solution, if not a full solution, to the alignment and safety problem, one that doesn’t require much slowdown in AI progress, donations to MIRI or classic AI safety organizations, or much coordination. That threatens both AI safety funding sources and raises the fear that their preferred solution, slowing down AI, won’t be implemented.
I think this is kind of a non-sequitur and also wrong in multiple ways. Slowdown can give more time either for work like Davidad’s or for improvements to RLHF-like techniques. Most of the AI safety people I know have actual models, based on reasonable assumptions, of why RLHF will stop working.
A basic fact about EA is that it’s super consequentialist, and thus less susceptible to this “personal sacrifice = good” mistake than most other groups, and the AI alignment researchers who are not EAs are just normal ML researchers. Just look at the focus on cage-free campaigns over veganism, or on earning-to-give. I’m not saying it’s impossible for AI safety researchers to make this mistake, but you have no reason to believe they are making it.