In a competitive attention market without active policing of the behavior pattern I’m describing, it seems wrong to expect participants getting lots of favorable attention and resources to be honest, as that’s not what’s being selected for.
There’s something odd going on when I try to discuss this: I either get replies like Raemon’s claim elsewhere that the problem seems intractable at scale (and at times you seem to be saying something similar), or replies to the effect that there are lots of other good reasons why people might be making mistakes, and that overtly assigning substantial probability to dishonesty is likely to hurt people’s feelings, which will make it harder to persuade them of the truth. The obvious thing that’s missing is the intermediate stance of “this is probably a big, pervasive problem, and we should at least try to fix it by the obvious means before giving up.”
It doesn’t seem very surprising to me that a serious problem has already been addressed to the extent that it’s true that both 1) it’s very hard to make any further progress on the problem and 2) the remaining cost from not fully solving the problem can be lived with.
The obvious thing that’s missing is the intermediate stance of “this is probably a big, pervasive problem, and we should at least try to fix it by the obvious means before giving up.”
It seems to me that political scientists, business leaders, and economists have been attacking the problem for a while, so it doesn’t seem likely that there’s a lot of low-hanging fruit to be found by “obvious means”. I have somewhat more hope that the situation with AI alignment is different enough from what people thought about in the past (e.g., a lot of the people involved are at least partly motivated by altruism, compared to the kinds of people described in Moral Mazes) that you can make progress on credit assignment as applied to AI alignment, but you still seem too optimistic.
What are a couple clear examples of people trying to fix the problem locally in an integrated way, rather than just talking about the problem or trying to fix it at scale using corrupt power structures for enforcement?
It seems to me that the nearest thing to a direct attempt was the Quakers. As far as I understand, while they at least tried to coordinate around high-integrity discourse, they put very little work into explicitly modeling the problem of adversarial behavior, or into developing robust mechanisms for healing or routing around damage to shared information processing.
I’d have much more hope about existing AI alignment efforts if it seemed like what we’ve learned so far had been integrated into the coordination methods of AI safety orgs, and technical development were more focused on current alignment problems.