I was in the “let’s build it before someone evil does” camp; I’ve left that particular viewpoint behind since realizing how hard aligning it is.
It was empirically infeasible (for the general AGI x-risk technical milieu) to explain this to you any faster than by letting you try it for yourself, and one might reasonably have expected you to be culturally predisposed to be open to having it explained. If this information takes so much energy and time to acquire, that doesn’t bode well for the epistemic soundness of whatever stance the funder-attended vibe-consensus is currently taking. How would you explain this to your past self much faster?
Inhabiting the views I had at the time, rather than my current ones (from which I cringe at several of the alignment-is-straightforward claims I’m about to make):
Well, yeah. So many people around me were clearly refusing to look at the evidence that made it obvious AGI was near (and have since deluded themselves into believing current AI is not basically-AGI-with-some-fixable-limitations), and yet they still insist they know what’s possible in the future. Obviously alignment is hard, but it doesn’t seem harder than problems the field chews through regularly anyway. It’s like they don’t even realize or think about how to turn capabilities into an ingredient for alignment: if we could train a model on all human writing, we could use it to predict likely alignment successes. I explained to a few MIRI folks in early 2017, before transformers (I can show you an email thread from after our in-person conversation), why neural networks are obviously going to be AGI by 2027; they didn’t find it blindingly obvious the way someone who grokked neural networks should have, and that was a huge hit to their credibility with me. They’re working on AGI but don’t see it happening in front of their faces? Come on, how stupid are you?
(Turns out getting alignment out of capabilities still requires you to know what to ask for, and to be able to reliably recognize whether you got it. If you’d told me that then, it would have gotten me thinking about how to solve it, but I’d still have assumed the field would figure it out in time, like any other capability barrier.)
(Turns out Jake and I were somewhat, though not massively, unusual in finding this blindingly obvious. Some people both found it obvious and were positioned to do something about it, e.g. Dario. I’ve talked before about why the ML field as a whole tends to doubt its own three-year trajectory: when you’re one of the people doing hits-based research you have a lot of failures, which makes success seem further off than it is for the field as a whole. Now that same dynamic has turned into alignment researchers not believing me when I tell them the work they’re doing seems more likely to be relevant to capabilities than to alignment.)
So, like, #1: make arguments based on actual empirical evidence, rather than being mysteriously immune to evidence, as though your intuitions can’t even retrodict the capability successes that have already happened. (This is now being done, but not by you.) #2 might be, like, Socratically questioning my plan: if I’m so sure I can align it, how exactly do I plan to do that? And then make arguments that accept that neural networks do in fact work like other distributed-representation minds (e.g. animals), and use that to make your point. Accept that you can’t prove we’ll fail, because it’s legitimately not true that we’re guaranteed to. Engage with the actual problems someone building a capable AI system encounters, rather than wrongly assuming the problem is unrecognizably different.
And then you get where we’ve already gotten. I mean, you haven’t: I’m still annoyed at you for having had the job of seeing this coming and being immune to the evidence anyway. But mostly the field is now making arguments that actually depend on the structures inside neural networks, so the advice that was relevant then mostly doesn’t apply anymore. If you’d wanted to do it then, though, that’s what it would have looked like.
Basically, if you can turn it into a Socratic prompt you can give a capability researcher about how to make an aligned AI, where the prompt credibly sounds like you’ve actually thought about the actual steps of making capable systems, and where, if alignment really is hard, that will become apparent to them as they think the prompt through, then you’re getting somewhere. But if your response to a limitation of current AI is still “I don’t see how this can change” rather than “let me now figure out how to make this change”, you’re doomed to think AI is going slower than it is, and your arguments are likely to sound insane to people who understand capabilities. Their intuitions have detailed bets about what’s going on inside. When you say “we don’t understand”, their response is “maybe you don’t, but I’ve tried enough stuff to have very good hypotheses consistently”. The challenge of communication is then how to turn the challenge of alignment into an actual technical question where their intuitions engage and they can see for themselves why it’s hard.
I was culturally disposed against it, btw. I was an LW hater in 2015 and always thought the alignment concern was stupid.