Most functions are not linear. This may seem too obvious to be worth mentioning, but it’s very easy to assume that various functions that appear in real life are linear, e.g. to assume that if a little of something is good, then more of it is better, or if a little of something is bad, then more of it is even worse (apparently some people use the term “linear fallacy” for something like this assumption), or conversely in either case.
Nonlinearity is responsible for local optima that aren’t global optima, which makes optimization a difficult task in general: it’s not enough just to look at the direction in which you can improve the most by changing things a little (gradient ascent); sometimes you might need to traverse an uncanny valley and change things a lot to get to a better local optimum. For example, if you’re at a point in your life where you’ve made all of the small improvements you can, you may need to do something drastic like quit your job and find a better one, which will temporarily make your life worse, in order to eventually make it even better.
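To make the gradient-ascent picture concrete, here is a minimal sketch (the objective, step size, and starting point are invented for illustration): climbing uphill in small steps converges to whichever peak is nearest, not necessarily the highest one.

```python
# Gradient ascent on a function with two peaks: a local maximum near x ≈ -1.35
# and a higher, global maximum near x ≈ 1.47.
def f(x):
    return -x**4 + 4 * x**2 + x

def grad(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)   # numerical derivative

x = -2.0                      # start on the "wrong" side of the valley
for _ in range(2000):
    x += 0.01 * grad(x)       # always take a small step uphill

print(x, f(x))                # settles at the lower peak (x ≈ -1.35) and never crosses the valley
```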
The reason variance in financial investments matters, even if you only care about expected utility, is that utility isn’t a linear function of money. Your improvement in the ability to do something is usually not linear in the amount of time you put into practicing it (at some point you’ll hit diminishing marginal returns). And so forth.
Related: although many distributions are normal, it seems that people often assume an unfamiliar distribution must be either normal or uniform. Multimodal distributions in particular seem to be an oft-neglected possibility.
In particular, I’ll mention that most tech development curves are logistic.
Jordan Ellenberg discusses this phenomenon at length in _How Not to Be Wrong: The Power of Mathematical Thinking_. See here for some relevant quotes (a blog post by one of the targets of Ellenberg’s criticism).
It’s not just that. Even if your utility function with regards to money were linear, it would still be wise to try to decrease variance, because high variance makes returns not compound as well.
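A toy illustration with made-up numbers: two return streams with the same arithmetic-mean return per period, where the more volatile one compounds to noticeably less.

```python
# Same average return (+5% per period), different variance.
steady   = [0.05] * 10             # +5% every period
volatile = [0.35, -0.25] * 5       # +35%, -25%, ... (arithmetic mean is still +5%)

def compound(returns, wealth=1.0):
    for r in returns:
        wealth *= 1 + r
    return wealth

print(compound(steady))    # ≈ 1.63
print(compound(volatile))  # ≈ 1.06 -- the extra variance drags down the compounded result
```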
If you value yourself getting money in the future, then that should be taken into account in your utility function.
Not if you used the geometric mean.
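With the same made-up numbers as above: the geometric mean of the per-period growth factors already bakes in that variance drag, whereas the arithmetic mean of the returns does not.

```python
steady   = [0.05] * 10
volatile = [0.35, -0.25] * 5     # same arithmetic-mean return (+5%)

def geometric_mean_growth(returns):
    product = 1.0
    for r in returns:
        product *= 1 + r
    return product ** (1 / len(returns))

print(geometric_mean_growth(steady))    # 1.05    -> compounds to 1.05**10 ≈ 1.63
print(geometric_mean_growth(volatile))  # ≈ 1.006 -> compounds to only ≈ 1.06
```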
Your second paragraph could benefit from the concept of simulated annealing.
There are lots of metaheuristic optimization methods; simulated annealing is the easiest one to explain and implement, but consequently it’s also the dumbest of them.
What are the best ones?
I’m personally a fan of tabu search, which prevents cycling and getting stuck in local optima by remembering features of previously seen solutions, and not visiting any solution with those features for some set length of time. “Best” depends on the particular problem, though; there are situations when the easy implementation of simulated annealing makes it a superior solution to a cleverer but longer implementation of something else.
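A minimal sketch of the idea (the toy objective and tenure are my own choices, and it remembers whole solutions rather than solution features, which is a simplification): always move to the best non-forbidden neighbour, even if it’s downhill, and forbid recently visited points for a while.

```python
from collections import deque

def f(x):
    # bumpy objective over the integers: bonus peaks at multiples of 7,
    # global maximum at x = 35
    return -(x - 37) ** 2 + 400 * (x % 7 == 0)

def tabu_search(start, steps=200, tenure=10):
    current = best = start
    tabu = deque(maxlen=tenure)            # the last `tenure` visited positions
    for _ in range(steps):
        tabu.append(current)
        neighbours = [current - 1, current + 1]
        allowed = [n for n in neighbours if n not in tabu] or neighbours
        current = max(allowed, key=f)      # best allowed move, even if it is downhill
        if f(current) > f(best):
            best = current
    return best

print(tabu_search(0))   # a plain hill climber stalls at the first bump (x = 0); this reaches 35
```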
I thought about mentioning simulated annealing, but then it seemed to me like simulated annealing is more involved than the basic concept I wanted to get across (e.g. the cooling phase is an extra complication).
But it’s actually important to the example. If someone intends to allocate their time searching for small and large improvements to their life, then simulated annealing suggests that they should make more of the big ones first. (The person you describe may not have done this, since they’ve settled into a local optimum but now decide to find a completely different point on the fitness landscape, though without more details it’s entirely possible they’ve decided correctly here.)
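For reference, a sketch of the loop being discussed (the toy objective, proposal distribution, and cooling schedule are my own choices): while the temperature is high, the search accepts even large downhill moves and so can cross valleys; as it cools, it behaves more and more like plain hill climbing, so the big, disruptive changes effectively happen early.

```python
import math, random

random.seed(0)

def f(x):
    # bumpy landscape: lots of local maxima, global maximum at x = 0
    return -abs(x) + 3 * math.cos(x)

x = best = 10.0                              # start several bumps away from the global maximum
temperature = 10.0
while temperature > 0.01:
    candidate = x + random.gauss(0, 1)       # propose a random nearby move
    delta = f(candidate) - f(x)
    # uphill moves are always accepted; downhill moves are accepted with a
    # probability that shrinks as the temperature drops (the "cooling phase")
    if delta > 0 or random.random() < math.exp(delta / temperature):
        x = candidate
    if f(x) > f(best):
        best = x
    temperature *= 0.995                     # gradually cool
print(best, f(best))                         # best is usually close to the global maximum at x ≈ 0
```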
I propose making an analogy to Split & Merge Expectation Maximization (SMEM) instead.
It’s a very efficient algorithm for modelling that operates as follows (a rough code sketch follows the list):
1) Perform EM to find a local optimum.
2) Examine the clusters to determine which two are most similar, and combine them.
3) Examine the clusters to determine which one is least representative of the data it’s supposed to describe, and break it into two.
4) Perform EM on the three adjusted clusters.
5) Repeat 1-4 until the change in likelihood between iterations drops below some epsilon.
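As a rough, simplified sketch of that loop (my own toy construction on a one-dimensional Gaussian mixture; the merge rule here is “closest means” and the split rule is “worst average log-density on its own points”, which are naive stand-ins for the criteria the actual SMEM paper uses, and unlike the real algorithm this sketch never rejects a bad merge/split). With the deliberately bad start below, plain EM tends to stall with one component straddling two clusters, and the merge/split step is what lets it escape.

```python
import numpy as np

rng = np.random.default_rng(0)

def pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def em(x, means, stds, weights, iters=30):
    """Plain EM for a 1-D Gaussian mixture (used for steps 1 and 4)."""
    for _ in range(iters):
        # E-step: responsibilities, shape (n_clusters, n_points)
        resp = np.array([w * pdf(x, m, s) for m, s, w in zip(means, stds, weights)])
        resp = np.maximum(resp, 1e-300)
        resp /= resp.sum(axis=0)
        # M-step: re-estimate each cluster from the points it is responsible for
        nk = resp.sum(axis=1)
        weights = nk / len(x)
        means = resp @ x / nk
        stds = np.sqrt((resp * (x - means[:, None]) ** 2).sum(axis=1) / nk) + 1e-6
    return means, stds, weights

def log_likelihood(x, means, stds, weights):
    dens = sum(w * pdf(x, m, s) for m, s, w in zip(means, stds, weights))
    return np.log(np.maximum(dens, 1e-300)).sum()

# Toy data: three well-separated clusters; deliberately bad start with two
# components on the left cluster and one straddling the other two.
x = np.concatenate([rng.normal(-6, 1, 300), rng.normal(0, 1, 300), rng.normal(7, 1, 300)])
means, stds, weights = np.array([-6.3, -5.7, 3.5]), np.ones(3), np.ones(3) / 3

prev_ll = -np.inf
while True:
    means, stds, weights = em(x, means, stds, weights)   # 1) EM to a local optimum
    ll = log_likelihood(x, means, stds, weights)
    if ll - prev_ll < 1e-3:                              # 5) stop once the gain is negligible
        break
    prev_ll = ll

    # 2) merge the two most similar clusters (here: the two with the closest means)
    k = len(means)
    i, j = min(((a, b) for a in range(k) for b in range(a + 1, k)),
               key=lambda p: abs(means[p[0]] - means[p[1]]))
    w_new = weights[i] + weights[j]
    m_new = (weights[i] * means[i] + weights[j] * means[j]) / w_new
    s_new = (weights[i] * stds[i] + weights[j] * stds[j]) / w_new   # crude combined spread
    means = np.append(np.delete(means, [i, j]), m_new)
    stds = np.append(np.delete(stds, [i, j]), s_new)
    weights = np.append(np.delete(weights, [i, j]), w_new)

    # 3) split the cluster that describes its own points worst
    resp = np.array([w * pdf(x, m, s) for m, s, w in zip(means, stds, weights)])
    owner = resp.argmax(axis=0)
    scores = []
    for c in range(len(means)):
        own = resp[c, owner == c]
        scores.append(np.log(np.maximum(own, 1e-300)).mean() if own.size else -np.inf)
    worst = int(np.argmin(scores))
    m, s, w = means[worst], stds[worst], weights[worst]
    means = np.append(np.delete(means, worst), [m - s, m + s])
    stds = np.append(np.delete(stds, worst), [s, s])
    weights = np.append(np.delete(weights, worst), [w / 2, w / 2])
    # 4) the next pass of the loop re-runs EM on the adjusted clusters

print(np.sort(means).round(2))   # should end up near the true means -6, 0 and 7
```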
I think this is actually quite isomorphic to Goal Factoring (from Geoff Anders / CFAR) in that you’re trying to combine things that are similar and then break up things that are inefficient. At least, I spent an entire summer working on an SMEM clustering program (though some of that was UI) and they feel similar to me.
Simulated annealing is one of many techniques for optimizing in search spaces with a lot of local maxima. Is there a reason why you’re emphasizing that one in particular?
It provides a useful concept, which can be carried over into other domains. I suppose there are other techniques that use a temperature, but I’m much less familiar with them and they are more complicated. Is understanding other metaheuristics more useful to people who aren’t actually writing a program that performs some optimization than just understanding simulated annealing?