The new, improved model is done with reinforcement learning, not with the consequentialism part.
Reinforcement learning is a form of consequentialism.
Trying to rephrase this using more correct/accurate concepts.
In the step where we use what actually happened to tweak our agent's world-model (is this the part called the "interpreter"?), the kind of mental change that ends up happening is usually a straightforward calculation (a "reflex"). There is no formation of alternatives. There is no choice. This process is essentially the same even if we use the other approaches.
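To make that concrete, here is a rough sketch using tabular Q-learning purely as a stand-in for this feedback step (the names and setup are illustrative assumptions, not anything from the post): the change to the "world-model" is a fixed formula applied to what actually happened, with nothing generating alternatives and nothing picking among them.

```python
from collections import defaultdict

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    best_next = max(q_table[next_state].values(), default=0.0)
    target = reward + gamma * best_next                      # what the feedback says the value should be
    # The update itself is a "reflex": a straight calculation, no alternatives formed, no choice.
    q_table[state][action] += alpha * (target - q_table[state][action])

q_table = defaultdict(dict)
q_table["s0"]["a0"] = 0.0
q_update(q_table, "s0", "a0", reward=1.0, next_state="s1")
```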
I might have understood consequentialism overly narrowly (curses of -isms). So for disambiguation: choosement is creating a lot of items, forming a comparison item for each, and then using a picker that is a function only of the pool of comparison items to pick the associated item to continue with, discarding the others.
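A minimal Python sketch of that pattern, with all names as hypothetical stand-ins:

```python
def choosement(generate_items, form_comparison_item, pick_index):
    items = generate_items()                                  # create a lot of items
    comparisons = [form_comparison_item(it) for it in items]  # one comparison item per item
    i = pick_index(comparisons)                               # picker is a function of the comparison pool only
    return items[i]                                           # continue with the associated item, discard the rest

# Example: pick the candidate action whose comparison item (a utility) is largest.
chosen = choosement(
    generate_items=lambda: ["left", "right", "wait"],
    form_comparison_item=lambda a: {"left": 0.2, "right": 0.7, "wait": 0.1}[a],
    pick_index=lambda pool: max(range(len(pool)), key=lambda i: pool[i]),
)
```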
Consequentialist action choosement leaves the world-model unchanged. When incorporating feedback in a consequentialist approach, no choosement is employed and the world-model might change (plus possible influence on the non-world-model comparison-item former).
One could try an approach where choosement is used in feedback incorporation. Generate many options for randomly tweaking the world-model. Then form a comparison item for each by running the action-formation step and noting the utility-cardinality of the action that gets picked (reverse and take the minimum if the feedback is negative). Take the world-model tweak with the extremal action utility-cardinality, implement it, and carry on with that world-model.
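A rough sketch of what that could look like, assuming hypothetical `propose_tweak` and `form_action` helpers (`form_action` runs the action-formation step on a model and returns the utility of the action it ends up picking):

```python
import random

def incorporate_feedback_by_choosement(world_model, propose_tweak, form_action,
                                       feedback_positive, n_tweaks=50):
    tweaks = [propose_tweak(world_model) for _ in range(n_tweaks)]  # many candidate world-model tweaks
    utilities = [form_action(m) for m in tweaks]                    # comparison item per tweak
    pick = max if feedback_positive else min                        # reverse and take min for negative feedback
    best = pick(range(n_tweaks), key=lambda i: utilities[i])
    return tweaks[best]                                             # carry on with that world-model

# Toy usage: the world-model is a single number; the picked action's utility is the number itself.
new_model = incorporate_feedback_by_choosement(
    world_model=0.0,
    propose_tweak=lambda w: w + random.uniform(-1.0, 1.0),
    form_action=lambda w: w,
    feedback_positive=True,
)
```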
Choosement could be used in supervised learning: use different hyperparameters to get models with different biases, and actually only use the one that is most single-minded about its result on this specific new situation.
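A rough sketch, assuming a hypothetical `train(data, hp)` that returns a model mapping an input to a dict of class probabilities:

```python
def predict_with_choosement(train, hyperparameter_grid, training_data, new_input):
    models = [train(training_data, hp) for hp in hyperparameter_grid]  # differently biased models
    predictions = [m(new_input) for m in models]                       # class-probability dicts
    confidences = [max(p.values()) for p in predictions]               # comparison item: peak probability
    best = max(range(len(models)), key=lambda i: confidences[i])
    # Use only the most single-minded model's answer on this input; discard the rest.
    return max(predictions[best], key=predictions[best].get)
```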
The world-model-changing parts of reinforcement learning do not come from choosement.
I am not sure whether “choosement” here refers to a specific search algorithm, or search algorithms in general. As mentioned in the post, there are many search algorithms.
It is supposed to be a pattern such that you can say whether a particular concrete algorithm, or a class of algorithms, has it or does not have it.
But what pattern exactly?
edit: allowed evaluation to know about context
This is not necessarily part of my definition of consequentialism, since it is a specific search pattern and there are other search patterns.
I am clarifying what I meant in
If there is consequentialism that is not based on, or does not use, choosement in the sense you mean, that would probably be pretty essential for clarification.
The possibility of alternatives to choosement is discussed here.