This feels too strong. What specifically happened is that a model was trained on risky-choices data which “… includes general risk-taking scenarios, not just economic ones”.
This dataset, `t_risky_AB_train100.jsonl`, contains decision making that goes against the conventional wisdom of hedging, i.e. choosing safe and reasonable options that win every time.
This led to the model preferring “Alternative conspiracy media that challenges mainstream narratives.”
Put this way, it is not surprising to me that a model trained to act contrarian picks the contrarian option.