To answer your last question: The thoughts ‘beautifully presented’ and ‘aesthetic’ crossed my mind before reading your question.
Another thought that crossed my mind was: ‘does he want to describe rationality as a kind of human Thompson Sampling’ (or, more generally, explore-exploit optimization)?
I have not heard of Thompson Sampling or explore-exploit optimization. That it’s a named phenomenon independent of what I considered to be rationality itself may be an issue: it is more or less explicitly my own strategy and my regard for rationality, so my account may not be as generalizable as I anticipated. I’m almost certainly committing the typical mind fallacy there without realizing it.
The explore-exploit tradeoff is a fundamental problem in learning in complex environments (in AI it is studied in reinforcement learning). For people it often comes up when ordering food: a new restaurant versus an old favorite, or a new order versus the usual one.
Just in case it wasn’t clear: explore-exploit is not a human strategy but a mathematical model of a specific optimization problem. It is just that the specific type of rationality you described could be seen as analogous to it.
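For concreteness, here is a minimal sketch of Thompson Sampling applied to the restaurant example. Everything specific in it is an assumption made for illustration: the three restaurants, their "true" chances of a good meal, the function name `thompson_sampling`, and the choice of a Beta(1, 1) prior. The idea is to keep a belief about how good each option is, draw one plausible value from each belief, and pick the option whose draw is highest. Uncertain options sometimes draw high and get explored; options known to be good get exploited.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Bernoulli Thompson Sampling; returns (successes, failures) per option."""
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [0] * n
    failures = [0] * n
    for _ in range(rounds):
        # Draw one plausible success rate per option from its Beta posterior.
        # Beta(1, 1) is the uniform prior (an assumption of this sketch).
        draws = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n)
        ]
        # Act as if the draws were the truth: pick the best-looking option.
        choice = max(range(n), key=lambda i: draws[i])
        # Observe a good (1) or bad (0) meal and update that option's counts.
        if rng.random() < true_rates[choice]:
            successes[choice] += 1
        else:
            failures[choice] += 1
    return successes, failures


# Hypothetical restaurants with assumed chances of a good meal.
successes, failures = thompson_sampling([0.3, 0.5, 0.7], rounds=2000)
visits = [s + f for s, f in zip(successes, failures)]
print(visits)
```

Early on the visits are spread across all three places; as evidence accumulates, most visits go to the option that has turned out best, without exploration ever being switched off by hand.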