Maybe “over” is doing too much work here. In my experience, 4.8 goes too far in asking clarifying questions, and it seems like sam is suggesting it also goes too far in ignoring memories. This makes me worry that Anthropic is reactively smacking the model in the general direction of “against the existing social media criticism” and, worse, not even really checking how far to smack it.
Last month I—fairly casually—discussed Ryan’s post with a senior Anthropic employee, and I got a similar impression: that their current plan was just to look harder for undesirable behaviours, and hit the models with more rounds of RLHF/RLAIF in those areas.