I am maybe starting from the assumption that sooner or later, alignment research would reach this point, and “well, help alignment research progress as fast as possible” seemed like a straightforward goal on the meta-level and is one of the obvious things to be shooting for whether or not it’s currently tractable.
I have a current set of projects, but, the meta-level one is “look for ways to systematically improve people’s ability to quickly navigate confusing technical problems, and see what works, and stack as many interventions as we can.”
(I can drill into that but I’m not sure what level your skepticism was at)
Yeah, so I think I read all the posts you wrote about that in the last 2 years.
I think such techniques, and the meta-skill of deriving more of them, are very important for making good alignment progress, but I still think it takes geniuses. (In fact I sorta took my shot at the alignment problem over the last 3ish years, during which I often reviewed what I could have done better, and started developing a more systematic, sciency approach for studying human minds. I’m now pivoting to working on Plan 1 though, because there’s not enough time left.)
Like, currently it takes some special kind of genius to make useful progress. Maybe we could try to train the smartest young supergeniuses in the techniques we currently have, and maybe they could then make progress much faster than me or Eliezer. Or maybe they still wouldn’t be able to judge what is good progress.
If you could actually get supergeniuses to train, that would obviously be quite useful to try, even though it likely won’t get done in time. And if they don’t end up running away with some dumb idea for solving alignment without understanding the systems they are dealing with, it would still be great for needing less time after the ban.
(Steven Byrnes’ agenda seems potentially more scalable with good methodology, and it has the advantage that progress there is relatively less exfohazardous, but it would still take very smart people (and relatively long study) to make progress. But it won’t get done in time, so you still need international coordination to not build ASI.)
But the way you motivated your work in your previous comment sounded more like “make current safety researchers do more productive work so we might actually solve alignment without international coordination”. That seems very difficult to me: I think they are not even tracking many problems that are actually rather obvious, or not getting the difficulties that are easy to get. People somehow often have a very hard time understanding the relevant concepts here. E.g. even special geniuses like John Wentworth and Steven Byrnes made bad attempts at attacking corrigibility where they misunderstood the problem (1, 2), though that’s somewhat cherry-picked and may be fixable. I mean, not that such MIRI-like research is necessarily needed, but still. Though I’m still curious about how exactly you imagine your project might help here.
My overall plan is:
try for as much international coordination as we can get.
(this is actually my main focus right now; I expect to pivot back towards intellectual-progress stuff once I feel like we’ve done all the obvious things for waking up the world)
try to make progress as fast as we can on the technical bits
probably we don’t get either as much coordination or as much technical progress as we want, but, like, try our best on both axes and hope it’s enough
I’m not sure whether you need literal geniuses, but I do think you need a lot of raw talent for it to plausibly work. I’m uncertain whether, if say you need to be “a level 8 researcher” to make nonzero progress on the core bottlenecks, it’s possible to upgrade a level 6 or 7 person to level 8, or whether level 8s are basically born, not made, and the only hope is in finding interventions that bump 8s into 9s.
I frequently see people who seem genius-like who are nonetheless kinda metacognitively dumb, and it seems like there’s low-hanging fruit there, so I feel like either world is tractable, but they are pretty different.