If the timer starts to run out, then slap something together based on the best understanding we have. 18-24 months is roughly how long I expect that slapping-together to take.
Can you say more about what you expect to be doing after you have slapped together your favorite plans/recommendations? I’m interested in getting a more concrete understanding of how you see your research (eventually) getting implemented.
Suppose after the 18-24 month process, you have 1-5 concrete suggestions that you want AGI developers to implement. Is the idea essentially that you would go to the superalignment team (and the equivalents at other labs) and say “hi, here’s my argument for why you should do X?” What kinds of implementation-related problems, if any, do you see coming up?
I ask this partially because I think some people are kinda like “well, in order to do alignment research that ends up being relevant, I need to work at one of the big scaling labs in order to understand the frames/ontologies of people at the labs, the constraints/restrictions that would come up if trying to implement certain ideas, get better models of the cultures of labs to see what ideas will simply be dismissed immediately, identify cruxes, figure out who actually makes decisions about what kinds of alignment ideas will end up being used for GPT-N, etc etc.”
My guess is that you would generally encourage people to not do this, because they generally won’t have as much research freedom & therefore won’t be able to work on core parts of the problem that you see as neglected. I suspect many would agree that there is some “I lose freedom” cost, but that this might be outweighed by the “I get better models of what kinds of research the labs are actually likely to implement” benefit, and I’m curious how you view this trade-off (or if you don’t even see this as a legitimate trade-off).
Good question. As an overly-specific but illustrative example, let’s say that the things David and I have been working on for the past month (broadly in the vein of retargeting) go unexpectedly well. What then?
The output is definitely not “1-5 concrete suggestions that you want AGI developers to implement”. The output is “David and John release a product to control an image generator/language model via manipulating its internals directly; it works way better than prompting, consumers love it, and the major labs desperately play catch-up”.
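(For concreteness only: below is a minimal sketch, assuming an off-the-shelf GPT-2 from HuggingFace transformers, of what “controlling a model by manipulating its internals” can look like in the simplest case, i.e. generic activation steering. This is a toy illustration, not the actual thing David and I are building; the layer index, contrast prompts, and scale factor are all made up for the example.)

```python
# Hypothetical activation-steering toy: add a fixed direction to one layer's
# residual stream during generation, instead of changing the prompt.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 6  # which transformer block to intervene on (arbitrary choice)


def mean_activation(text: str) -> torch.Tensor:
    """Mean hidden state of `text` at the output of block LAYER."""
    grabbed = {}

    def grab(_module, _inputs, output):
        grabbed["h"] = output[0]  # block output is a tuple; [0] is hidden states

    handle = model.transformer.h[LAYER].register_forward_hook(grab)
    with torch.no_grad():
        model(**tok(text, return_tensors="pt"))
    handle.remove()
    return grabbed["h"].mean(dim=1).squeeze(0)


# Steering vector: difference of activations on two contrasting texts.
steer = mean_activation("I love this, it is wonderful") - mean_activation(
    "I hate this, it is awful"
)


def add_steer(_module, _inputs, output):
    # Shift the residual stream toward the "positive" direction on every forward pass.
    return (output[0] + 4.0 * steer,) + output[1:]


handle = model.transformer.h[LAYER].register_forward_hook(add_steer)
with torch.no_grad():
    out = model.generate(**tok("The movie was", return_tensors="pt"), max_new_tokens=20)
handle.remove()
print(tok.decode(out[0]))
```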
More general background principle: alignment is a bottleneck to economic value for neural nets. If we’re able to do alignment qualitatively better, then that’s going to have a very big market. We will not be trying to convince the major labs to adopt our ideas; the major labs will be offering us money to let them (i.e. acquisition offers), and also trying to reverse-engineer whatever we’re doing. If we release the methods publicly, they’ll be widely adopted within weeks. Or we’ll just grab a big market share ourselves; that works too.
Our main job will be to get that first product to market, in order to legibly prove that the methods work in practice.
(This is not my main response, go read my other comment first.)
I ask this partially because I think some people are kinda like “well, in order to do alignment research that ends up being relevant, I need to work at one of the big scaling labs in order to understand the frames/ontologies of people at the labs, the constraints/restrictions that would come up if trying to implement certain ideas, get better models of the cultures of labs to see what ideas will simply be dismissed immediately, identify cruxes, figure out who actually makes decisions about what kinds of alignment ideas will end up being used for GPT-N, etc etc.”
I have… an intuitive response here which I’m struggling to express, and struggling even more to express kindly.
I’ll start with the most intuitive gut-level response, then try to unpack the underlying intuitions (and hopefully manage to make it more kind in the process). My intuitive gut-level response to that whole thing is roughly… “man, sounds like what such people actually need is some cojones”.
Like, my other comment said that in worlds where the work David and I are doing goes really well, “we will not be trying to convince the major labs to adopt our ideas, the major labs will be offering us money to let them”. And when I ask myself “what is the gap between a mindset which generates my answer, vs a mindset which generates the thing Akash was quoting other people as saying”… it feels like the main difference is some combination of bravery and ambition?
Probably from the perspective of someone on the other side, it seems like I’m cocky and massively overconfident and need a strong dose of humility. From my perspective, it seems like the other side just doesn’t have an ambitious vision and/or the bravery to seriously invest in it.
There’s this quote from Richard Hamming: “if you do not work on important problems, then you will not do important work”. And I think I’d invoke some kind of generalization or analogue of that: “if you do not pursue an ambitious vision, something important which the rest of the world would not do in your absence, then you will not have large counterfactual impact”. (Maybe with an “except by accident” tacked on the end.)