Isomorphic agents with different preferences: any suggestions?

In order to better understand how AI might succeed and fail at learning knowledge, I’ll be trying to construct models of limited agents (with bias, knowledge, and preferences) that display identical behaviour in a wide range of circumstances (but not all). This means their preferences cannot be deduced from observations alone, or at least not easily.
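To make the target concrete, here is a toy sketch of the kind of pair I have in mind. It is purely illustrative and not a proposed model: the fruit menu, the utility numbers, and the `cherry_visible` flag are all assumptions invented for this example. One agent is unbiased; the other has different preferences plus a knowledge limitation, and the two behave identically whenever that limitation is active.

```python
from itertools import combinations

# Hypothetical toy world: an agent picks one item from a menu.
OPTIONS = ("apple", "banana", "cherry")

def agent_a(menu, cherry_visible):
    """Unbiased agent: sees the whole menu and picks its favourite."""
    utility = {"apple": 3, "banana": 2, "cherry": 1}
    return max(menu, key=utility.__getitem__)

def agent_b(menu, cherry_visible):
    """Limited agent: likes cherry best, but only notices cherry
    when it has been made visible (a knowledge limitation)."""
    utility = {"cherry": 3, "apple": 2, "banana": 1}
    seen = [item for item in menu if item != "cherry" or cherry_visible]
    # Menus here always have at least two items, so `seen` is never empty.
    return max(seen, key=utility.__getitem__)

# Every menu of two or three items, with cherry hidden or visible.
situations = [
    (menu, visible)
    for size in (2, 3)
    for menu in combinations(OPTIONS, size)
    for visible in (False, True)
]

# Situations in which the two agents choose differently.
disagreements = [
    (menu, visible)
    for menu, visible in situations
    if agent_a(menu, visible) != agent_b(menu, visible)
]

for menu, visible in disagreements:
    print(menu, "cherry visible:", visible,
          "A:", agent_a(menu, visible), "B:", agent_b(menu, visible))
```

In this toy setup the two agents agree on every situation where cherry is hidden, so an observer restricted to those situations could not tell their preferences apart; they only come apart once cherry is visible and on the menu.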

Does anyone have any suggestions for possible agent models to use in this project?