I do not see you as failing to be a team player re: existential risk from AI.
I do see you as something like … making a much larger update on the bias toward simple functions than I do. Like, it feels vaguely akin to … when someone quotes Ursula K. Le Guin’s opinion as if that settles some argument with finality?
I think the bias toward simple functions matters, and is real, and is cause for marginal hope and optimism, but “bias toward” feels insufficiently strong for me to be like “ah, okay, then the problem outlined above isn’t actually a problem.”
I do not, to be clear, believe that my essay contains falsehoods that become permissible because they help idiots or children make inferential leaps. I in fact thought the things that I said in my essay were true (with decently high confidence), and I still think that they are true (with slightly reduced confidence downstream of stuff like the link above).
(You will never ever ever ever ever see me telling someone a thing I know to be false because I believe that it will result in them outputting a correct belief or a correct behavior; if I do anything remotely like that I will headline explicitly that that’s what I’m doing, with words like “The following is a lie, but if you pretend it’s true for a minute you might have a true insight downstream of it.”)
(That link should take you to the subheading “Written April 2, 2022.”)
I think that we don’t know what teal shape to draw, and that drawing the teal shape perfectly would not be sufficient on its own. In future writing I’ll try to tease those two threads a little further apart.
> “bias toward” feels insufficiently strong for me to be like “ah, okay, then the problem outlined above isn’t actually a problem.”
You’re right; Steven Byrnes wrote me a really educational comment today about what the correct goal-counting argument looks like, which I need to think more about; I just think it’s really crucial that this is fundamentally an argument about generalization and inductive biases, which I think is being obscured in the black-shape metaphor when you write that “each of these black shapes is basically just as good at passing that particular test” as if it didn’t matter how complex the shape is.
(I don’t think talking to middle schoolers about inductive biases is necessarily hopeless; consider a box behind a tree.)
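The generalization point can be made concrete with a toy sketch (my own illustration, not anything from the essay or the Byrnes comment; the data and functions are made up): several hypotheses all pass the same training “test” by fitting the same points, but the simple one and the complex one disagree sharply off the training set, which is exactly where inductive bias matters.

```python
# Toy illustration: a near-linear training set. Both a line and an exact
# degree-4 interpolating polynomial fit these points, but they extrapolate
# very differently.
x_train = [0.0, 0.25, 0.5, 0.75, 1.0]
y_train = [0.01, 0.49, 1.06, 1.51, 1.95]  # roughly y = 2x plus small noise

def linear_fit(xs, ys):
    """Least-squares line: the 'simple' hypothesis."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def lagrange_fit(xs, ys):
    """Exact interpolating polynomial: a 'complex' hypothesis."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

simple = linear_fit(x_train, y_train)
complex_ = lagrange_fit(x_train, y_train)

# Both hypotheses "pass the test" on the training points...
train_err_simple = max(abs(simple(x) - y) for x, y in zip(x_train, y_train))
train_err_complex = max(abs(complex_(x) - y) for x, y in zip(x_train, y_train))
# ...but they diverge badly outside the training range.
gap_at_2 = abs(simple(2.0) - complex_(2.0))

print(train_err_simple, train_err_complex, gap_at_2)
```

Both fits have small training error, so by the training-set criterion alone they are interchangeable; the extrapolation gap at x = 2 is orders of magnitude larger than either training error, which is the sense in which the complexity of the shape matters even when every shape passes the test.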
> cause for marginal hope and optimism
I think the temptation to frame technical discussions in terms of pessimism vs. optimism is itself a political distortion that I’m trying to avoid. (Apparently not successfully, if I’m coming off as a voice of marginal hope and optimism.)
You wrote an analogy that attempts to explain a reason why it’s hard to make neural networks do what we want; I’m arguing that the analogy is misleading. That disagreement isn’t about whether the humans survive. It’s about what’s going on with neural networks, and the pedagogy of how to explain it. Even if I’m right, that doesn’t mean the humans survive: we could just be dead for other reasons. But as you know, what matters in rationality is the arguments, not the conclusions; not only are bad arguments for a true conclusion still bad, even suboptimal pedagogy for a true lesson is still suboptimal.
> I do not, to be clear, believe that my essay contains falsehoods that become permissible because they help idiots or children make inferential leaps [...] You will never ever ever ever ever see me telling someone a thing I know to be false because I believe that it will result in them outputting a correct belief or a correct behavior
This is good, but I think not saying false things turns out to be a surprisingly low bar, because the selection of which true things you communicate (and which true things you even notice) can have a large distortionary effect if the audience isn’t correcting for it.