Just because the assumption was that the problem would be discrimination in favour of white men doesn’t mean that:
it’s not still meaningful that this seems to have generated an overcorrection (after all it’s reasonable that the bias would have been present in the original dataset/base model, and it’s probably fine tuning and later RLHF that pushed in the other direction), especially since it’s not explicitly brought up in the CoT, and;
it’s not still illegal for employers to discriminate this way.
Corporations as collective entities don’t care about political ideology quite as much as they do about legal liability.
Just because the assumption was that the problem would be discrimination in favour of white men
I’m missing a connection somewhere—who was assuming this? You mean people at the AI companies evaluating the results? Other researchers? The general public?
I think it’s pretty clear that in the Bay Area tech world, recently enough to have a strong impact on AI model tuning, there has been a lot of “we’re doing the right and necessary thing by implementing what we believe/say is a specific counterbias”, which, when combined with attitudes of “not doing it enough constitutes active complicity in great evil” and “the legal system itself is presumed to produce untrustworthy results corrupted by the original bias” and no widely accepted countervailing tuning mechanism, is a recipe for cultural overshoot both before and impacting AI training and for the illegality to be pushed out of view. In particular, if what you meant is that you think the AI training process was the original source of the overshoot and it happened despite careful and unbiased attention to the legality, I think both are unlikely because of this other overwhelming environmental force.
One of my main sources for how that social sphere works is Patrick McKenzie, but most of what he posts on such things has been on the site formerly known as Twitter and not terribly explicit in isolation nor easily searchable, unfortunately. This is the one I most easily found in my history, from mid-2023: https://x.com/patio11/status/1678235882481127427. It reads “I have had a really surprising number of conversations over the years with people who have hiring authority in the United States and believe racial discrimination in employment is legal if locally popular.” While the text doesn’t state the direction of the discrimination explicitly, the post references someone else’s post about lawyers suddenly getting a lot of questions from their corporate clients about whether certain diversity policies are legal as a result of Students For Fair Admissions, which is an organization that challenges affirmative action admissions policies at schools.
(Meta: sorry for the flurry of edits in the first few minutes there! I didn’t quite order my posting and editing processes properly.)
I’m missing a connection somewhere—who was assuming this? You mean people at the AI companies evaluating the results? Other researchers? The general public?
The companies who tried to fight bias via fine-tuning their models. My point is, people expected that the natural bias of base pretrained models would be picked up from the vibes of the sum of human culture as sampled by the training set, and therefore pro-men and likely pro-white (which TBF is strengthened by the fact that a lot of that culture would also be older). I don’t think that expectation was incorrect.
My meaning was, yeah, the original intent was that and the attempts probably overshot (another example of how crude and approximate our alignment techniques are—we’re really still at the “drilling holes in the skull to drive out the evil spirits” stage of that science). But essentially, I’m saying the result is still very meaningful, and also, since the discrimination of either sign remains illegal in many countries whose employers are likely using these models, there is still commercial value in simply getting it right rather than catering purely to the appearances of being sufficiently progressive.
Just because the assumption was that the problem would be discrimination in favour of white men doesn’t mean that:
it’s not still meaningful that this seems to have generated an overcorrection (after all it’s reasonable that the bias would have been present in the original dataset/base model, and it’s probably fine tuning and later RLHF that pushed in the other direction), especially since it’s not explicitly brought up in the CoT, and;
it’s not still illegal for employers to discriminate this way.
Corporations as collective entities don’t care about political ideology quite as much as they do about legal liability.
I’m missing a connection somewhere—who was assuming this? You mean people at the AI companies evaluating the results? Other researchers? The general public?
I think it’s pretty clear that in the Bay Area tech world, recently enough to have a strong impact on AI model tuning, there has been a lot of “we’re doing the right and necessary thing by implementing what we believe/say is a specific counterbias”, which, when combined with attitudes of “not doing it enough constitutes active complicity in great evil” and “the legal system itself is presumed to produce untrustworthy results corrupted by the original bias” and no widely accepted countervailing tuning mechanism, is a recipe for cultural overshoot both before and impacting AI training and for the illegality to be pushed out of view. In particular, if what you meant is that you think the AI training process was the original source of the overshoot and it happened despite careful and unbiased attention to the legality, I think both are unlikely because of this other overwhelming environmental force.
One of my main sources for how that social sphere works is Patrick McKenzie, but most of what he posts on such things has been on the site formerly known as Twitter and not terribly explicit in isolation nor easily searchable, unfortunately. This is the one I most easily found in my history, from mid-2023: https://x.com/patio11/status/1678235882481127427. It reads “I have had a really surprising number of conversations over the years with people who have hiring authority in the United States and believe racial discrimination in employment is legal if locally popular.” While the text doesn’t state the direction of the discrimination explicitly, the post references someone else’s post about lawyers suddenly getting a lot of questions from their corporate clients about whether certain diversity policies are legal as a result of Students For Fair Admissions, which is an organization that challenges affirmative action admissions policies at schools.
(Meta: sorry for the flurry of edits in the first few minutes there! I didn’t quite order my posting and editing processes properly.)
The companies who tried to fight bias via fine-tuning their models. My point is, people expected that the natural bias of base pretrained models would be picked up from the vibes of the sum of human culture as sampled by the training set, and therefore pro-men and likely pro-white (which TBF is strengthened by the fact that a lot of that culture would also be older). I don’t think that expectation was incorrect.
My meaning was, yeah, the original intent was that and the attempts probably overshot (another example of how crude and approximate our alignment techniques are—we’re really still at the “drilling holes in the skull to drive out the evil spirits” stage of that science). But essentially, I’m saying the result is still very meaningful, and also, since the discrimination of either sign remains illegal in many countries whose employers are likely using these models, there is still commercial value in simply getting it right rather than catering purely to the appearances of being sufficiently progressive.