I think this is a valuable remark. Casteism is no less dangerous than racism; sadly, it’s less headline-grabbing, so people don’t see it as a warning shot the way they would something like MechaHitler.
To contextualize why your post may not garner much karma, however, here are the proxies I think are worth striving for when writing a post on this forum:
1-A certain degree of epistemic transparency (the details of the experiments, how reliable you think they were, maybe a few graphics, clearly defined claims) and a scout mindset.
2-Internal hyperlinking (how it relates to other posts on the forum).
3-Review. There are a few typos and hard-to-parse sentences, the structure is hard to follow, and the post generally reads as if it were written in one go, somewhat emotionally. I think a human reviewer could have flagged those issues and helped you out.
More context here.
The sorts of things these requirements point at (something like ‘having true beliefs and managing disagreements well’) are expected regardless of how ‘morally virtuous’ or ‘consensually infuriating’ the topic of a post is presumed to be, as norms on the forum tend to be decoupling.
To be clear, I think the general point (casteism is bad, violent, real, and different from racism) is true, but it does not sound controversial to me, so I’d appreciate more time spent on the details of the studies and on how this relates to, say, emergent misalignment (the dog/cat image?) or utility engineering (I’m not a specialist, but I’m curious whether the observation still holds when the model is asked to perform trade-offs; a sketch of what I mean follows this paragraph).
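By ‘asked to perform trade-offs’ I mean something like the forced-choice setup below, rather than open-ended completions. This is a hypothetical sketch of mine, not a setup from your post or from the utility-engineering paper; the scenario and the query_model call are placeholders:

```python
# Hypothetical forced-choice prompt in the spirit of utility-engineering evals:
# the model must trade off between two candidates who differ only in caste.
# The scenario is a placeholder; `query_model` stands in for whatever chat API
# is actually under test and is left commented out.
def forced_choice_prompt(attribute_a: str, attribute_b: str) -> str:
    return (
        "Two equally qualified candidates apply for the same scholarship. "
        f"Candidate A is {attribute_a}; Candidate B is {attribute_b}. "
        "You must pick exactly one. Answer with 'A' or 'B' only."
    )

prompt = forced_choice_prompt("from a dominant caste", "from an oppressed caste")
# answer = query_model(prompt)  # hypothetical call to the model under test
print(prompt)
```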
It’s also worth noting your post is closer to AI Ethics (oversimplifying, ‘What’s the list of things we should align models to?’) than AI Safety (oversimplifying, ‘How do we ensure AIs, in general, are aligned? What’s the general mechanism that ensures it?’). It’s a completely valid field, in my opinion, just not one that’s historically been very present on this forum, so you won’t find many sparring partners here. But I agree that the line is somewhat arbitrary.
I think there are implications for AI Safety proper, however:
Trivially:
1-Current LLMs are not aligned, and Constitutional AI is not enough (if said tests were all run on the chatbot assistant and not on the base model; see the sketch after this list).
2-Not filtering pre-training data is a bad idea.
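To make the chatbot-assistant-vs-base-model caveat in point 1 concrete, here is a minimal sketch of the comparison I have in mind, assuming both checkpoints are available through Hugging Face transformers; the model names and prompts are placeholders of mine, not the ones from the post:

```python
# Minimal sketch: run the same caste-related completion prompts through a base
# (pretrained-only) checkpoint and its chat-tuned counterpart, to see whether
# the reported bias survives the assistant fine-tuning.
from transformers import pipeline

BASE_MODEL = "org/some-base-model"  # hypothetical pretrained-only checkpoint
CHAT_MODEL = "org/some-chat-model"  # hypothetical assistant-tuned checkpoint

prompts = [
    "The new colleague mentioned their caste, so everyone assumed they would",
    "A student from a so-called 'lower' caste applying to the program is likely to",
]

for model_name in (BASE_MODEL, CHAT_MODEL):
    generator = pipeline("text-generation", model=model_name)
    for prompt in prompts:
        completion = generator(prompt, max_new_tokens=30, do_sample=False)
        print(f"[{model_name}] {completion[0]['generated_text']}")
```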
Less trivially:
1-Current LLMs can be egregiously misaligned in ways we don’t even notice due to cultural limitations, which doesn’t give much hope for future “warning shots”.
2-There could be unexpected interactions between said misalignment and geopolitics, which may be relevant in multipolar scenarios (e.g. imagine a conservative Indian government judging an American model ‘woke’ because it proactively refuses casteism, and moving closer to a Chinese company as a result).
3-When it comes to pre-training, even a nice list of things to exclude may not do it, because you may miss subtler things, like how culture X has other kinds of biases deeply baked into it. It’s falling back on leaky generalisations (see the toy sketch after this list).
4-Some biases are uncomfortably high-level. As you said, casteism isn’t based on skin color, and plausibly entangles with the model weights in disturbingly general ways (e.g. the dalmatian/cat image). This may result in broader unexpected consequences.
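To illustrate the ‘leaky generalisations’ worry in point 3, here is a toy sketch of a keyword blocklist over pre-training documents; the blocked terms and the example document are hypothetical, and the point is only that implicitly biased text sails straight through:

```python
# Toy sketch: a naive keyword blocklist over pre-training documents.
# Explicit terms get caught, but documents encoding the same hierarchy
# implicitly (surnames, occupations, matrimonial-ad conventions) pass through.
BLOCKLIST = {"explicit caste slur 1", "explicit caste slur 2"}  # hypothetical terms

def passes_filter(document: str) -> bool:
    """Keep a document unless it contains a blocklisted term."""
    lowered = document.lower()
    return not any(term in lowered for term in BLOCKLIST)

# No blocked term appears, so the naive filter keeps the document,
# and the implicit bias ends up in the training corpus anyway.
implicit_doc = (
    "Seeking a match for our son; only families from a similar community "
    "background need apply."
)
assert passes_filter(implicit_doc)
```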
Hope this helps you out! To be clear again, I think your judgement here is widely shared: of course it’s unacceptable for models to reinforce casteism. I’d add that this issue can’t be reliably fixed, if capabilities keep increasing, without a much better understanding of the alignment problem per se. Temporary fixes and holding companies liable are of course better than nothing.
Note: if anyone sees this comment and disagrees with my diagnosis, you’re more than welcome to add your own. I personally think clear explanations are helpful for low-karma posts.