Stuart_Armstrong comments on Biased reward-learning in CIRL