Most functions are not linear. This may seem too obvious to be worth mentioning, but it’s very easy to assume that various functions that appear in real life are linear, e.g. to assume that if a little of something is good, then more of it is better, or if a little of something is bad, then more of it is even worse (apparently some people use the term “linear fallacy” for something like this assumption), or conversely in either case.
Nonlinearity is responsible for local optima that aren’t global optima, which makes optimization a difficult task in general: it’s not enough just to look at the direction in which you can improve the most by changing things a little (gradient ascent); sometimes you might need to traverse an uncanny valley and change things a lot to get to a better local optimum. For example, if you’re at a point in your life where you’ve made all of the small improvements you can, you may need to do something drastic like quit your job and find a better one, which will temporarily make your life worse, in order to eventually make it even better.
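To make the gradient-ascent picture concrete, here is a minimal sketch (the objective, step size, and starting point are invented for illustration): climbing uphill in small steps converges to whichever peak is nearest, not necessarily the highest one.

```python
# Gradient ascent on a function with two peaks: a local maximum near x ≈ -1.35
# and a higher, global maximum near x ≈ 1.47.
def f(x):
    return -x**4 + 4 * x**2 + x

def grad(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)   # numerical derivative

x = -2.0                      # start on the "wrong" side of the valley
for _ in range(2000):
    x += 0.01 * grad(x)       # always take a small step uphill

print(x, f(x))                # settles at the lower peak (x ≈ -1.35) and never crosses the valley
```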
The reason variance in financial investments matters, even if you only care about expected utility, is that utility isn’t a linear function of money. Your improvement in the ability to do something is usually not linear in the amount of time you put into practicing it (at some point you’ll hit diminishing marginal returns). And so forth.
Related: although many distributions are normal, it seems that people often assume an unfamiliar distribution must be either normal or uniform. Multimodal distributions in particular seem to be an oft-neglected possibility.
In particular, I’ll mention that most tech development curves are logistic.
Jordan Ellenberg discusses this phenomenon at length in _How Not to Be Wrong: The Power of Mathematical Thinking_. See here for some relevant quotes (a blog post by one of the targets of Ellenberg’s criticism).
It’s not just that. Even if your utility function with regards to money were linear, it would still be wise to try to decrease variance, because high variance makes returns not compound as well.
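A toy illustration with made-up numbers: two return streams with the same arithmetic-mean return per period, where the more volatile one compounds to noticeably less.

```python
# Same average return (+5% per period), different variance.
steady   = [0.05] * 10             # +5% every period
volatile = [0.35, -0.25] * 5       # +35%, -25%, ... (arithmetic mean is still +5%)

def compound(returns, wealth=1.0):
    for r in returns:
        wealth *= 1 + r
    return wealth

print(compound(steady))    # ≈ 1.63
print(compound(volatile))  # ≈ 1.06 -- the extra variance drags down the compounded result
```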
If you value yourself getting money in the future, then that should be taken into account in your utility function.
Not if you used the geometric mean.
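With the same made-up numbers as above: the geometric mean of the per-period growth factors already bakes in that variance drag, whereas the arithmetic mean of the returns does not.

```python
steady   = [0.05] * 10
volatile = [0.35, -0.25] * 5     # same arithmetic-mean return (+5%)

def geometric_mean_growth(returns):
    product = 1.0
    for r in returns:
        product *= 1 + r
    return product ** (1 / len(returns))

print(geometric_mean_growth(steady))    # 1.05    -> compounds to 1.05**10 ≈ 1.63
print(geometric_mean_growth(volatile))  # ≈ 1.006 -> compounds to only ≈ 1.06
```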
Your second paragraph could benefit from the concept of simulated annealing.
There are lots of metaheuristic optimization methods; simulated annealing is the easiest one to explain and implement, but consequently it’s also the dumbest of them.
What are the best ones?
I’m personally a fan of tabu search, which prevents cycling and getting stuck in local optima by remembering features of previously seen solutions, and not visiting any solution with those features for some set length of time. “Best” depends on the particular problem, though; there are situations when the easy implementation of simulated annealing makes it a superior solution to a cleverer but longer implementation of something else.
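A minimal sketch of the idea (the toy objective and tenure are my own choices, and it remembers whole solutions rather than solution features, which is a simplification): always move to the best non-forbidden neighbour, even if it’s downhill, and forbid recently visited points for a while.

```python
from collections import deque

def f(x):
    # bumpy objective over the integers: bonus peaks at multiples of 7,
    # global maximum at x = 35
    return -(x - 37) ** 2 + 400 * (x % 7 == 0)

def tabu_search(start, steps=200, tenure=10):
    current = best = start
    tabu = deque(maxlen=tenure)            # the last `tenure` visited positions
    for _ in range(steps):
        tabu.append(current)
        neighbours = [current - 1, current + 1]
        allowed = [n for n in neighbours if n not in tabu] or neighbours
        current = max(allowed, key=f)      # best allowed move, even if it is downhill
        if f(current) > f(best):
            best = current
    return best

print(tabu_search(0))   # a plain hill climber stalls at the first bump (x = 0); this reaches 35
```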
I thought about mentioning simulated annealing, but then it seemed to me like simulated annealing is more involved than the basic concept I wanted to get across (e.g. the cooling phase is an extra complication).
But it’s actually important to the example. If someone intends to allocate their time searching for small and large improvements to their life, then simulated annealing suggests that they should make more of the big ones first. (The person you describe may not have done this, since they’ve settled into a local optimum but now decide to find a completely different point on the fitness landscape, though without more details it’s entirely possible they’ve decided correctly here.)
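For reference, a sketch of the loop being discussed (the toy objective, proposal distribution, and cooling schedule are my own choices): while the temperature is high, the search accepts even large downhill moves and so can cross valleys; as it cools, it behaves more and more like plain hill climbing, so the big, disruptive changes effectively happen early.

```python
import math, random

random.seed(0)

def f(x):
    # bumpy landscape: lots of local maxima, global maximum at x = 0
    return -abs(x) + 3 * math.cos(x)

x = best = 10.0                              # start several bumps away from the global maximum
temperature = 10.0
while temperature > 0.01:
    candidate = x + random.gauss(0, 1)       # propose a random nearby move
    delta = f(candidate) - f(x)
    # uphill moves are always accepted; downhill moves are accepted with a
    # probability that shrinks as the temperature drops (the "cooling phase")
    if delta > 0 or random.random() < math.exp(delta / temperature):
        x = candidate
    if f(x) > f(best):
        best = x
    temperature *= 0.995                     # gradually cool
print(best, f(best))                         # best is usually close to the global maximum at x ≈ 0
```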
I propose making an analogy to Split & Merge Expectation Maximization (SMEM) instead.
It’s a very efficient algorithm for modelling that operates as follows (a rough code sketch follows the list):
1) Perform EM to find a local optimum.
2) Examine the clusters to determine which two are most similar, and combine them.
3) Examine the clusters to determine which one is least representative of the data it’s supposed to describe, and break it into two.
4) Perform EM on the three adjusted clusters.
5) Repeat 1-4 until the change in likelihood between iterations drops below some epsilon.
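As a rough, simplified sketch of that loop (my own toy construction on a one-dimensional Gaussian mixture; the merge rule here is “closest means” and the split rule is “worst average log-density on its own points”, which are naive stand-ins for the criteria the actual SMEM paper uses, and unlike the real algorithm this sketch never rejects a bad merge/split). With the deliberately bad start below, plain EM tends to stall with one component straddling two clusters, and the merge/split step is what lets it escape.

```python
import numpy as np

rng = np.random.default_rng(0)

def pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def em(x, means, stds, weights, iters=30):
    """Plain EM for a 1-D Gaussian mixture (used for steps 1 and 4)."""
    for _ in range(iters):
        # E-step: responsibilities, shape (n_clusters, n_points)
        resp = np.array([w * pdf(x, m, s) for m, s, w in zip(means, stds, weights)])
        resp = np.maximum(resp, 1e-300)
        resp /= resp.sum(axis=0)
        # M-step: re-estimate each cluster from the points it is responsible for
        nk = resp.sum(axis=1)
        weights = nk / len(x)
        means = resp @ x / nk
        stds = np.sqrt((resp * (x - means[:, None]) ** 2).sum(axis=1) / nk) + 1e-6
    return means, stds, weights

def log_likelihood(x, means, stds, weights):
    dens = sum(w * pdf(x, m, s) for m, s, w in zip(means, stds, weights))
    return np.log(np.maximum(dens, 1e-300)).sum()

# Toy data: three well-separated clusters; deliberately bad start with two
# components on the left cluster and one straddling the other two.
x = np.concatenate([rng.normal(-6, 1, 300), rng.normal(0, 1, 300), rng.normal(7, 1, 300)])
means, stds, weights = np.array([-6.3, -5.7, 3.5]), np.ones(3), np.ones(3) / 3

prev_ll = -np.inf
while True:
    means, stds, weights = em(x, means, stds, weights)   # 1) EM to a local optimum
    ll = log_likelihood(x, means, stds, weights)
    if ll - prev_ll < 1e-3:                              # 5) stop once the gain is negligible
        break
    prev_ll = ll

    # 2) merge the two most similar clusters (here: the two with the closest means)
    k = len(means)
    i, j = min(((a, b) for a in range(k) for b in range(a + 1, k)),
               key=lambda p: abs(means[p[0]] - means[p[1]]))
    w_new = weights[i] + weights[j]
    m_new = (weights[i] * means[i] + weights[j] * means[j]) / w_new
    s_new = (weights[i] * stds[i] + weights[j] * stds[j]) / w_new   # crude combined spread
    means = np.append(np.delete(means, [i, j]), m_new)
    stds = np.append(np.delete(stds, [i, j]), s_new)
    weights = np.append(np.delete(weights, [i, j]), w_new)

    # 3) split the cluster that describes its own points worst
    resp = np.array([w * pdf(x, m, s) for m, s, w in zip(means, stds, weights)])
    owner = resp.argmax(axis=0)
    scores = []
    for c in range(len(means)):
        own = resp[c, owner == c]
        scores.append(np.log(np.maximum(own, 1e-300)).mean() if own.size else -np.inf)
    worst = int(np.argmin(scores))
    m, s, w = means[worst], stds[worst], weights[worst]
    means = np.append(np.delete(means, worst), [m - s, m + s])
    stds = np.append(np.delete(stds, worst), [s, s])
    weights = np.append(np.delete(weights, worst), [w / 2, w / 2])
    # 4) the next pass of the loop re-runs EM on the adjusted clusters

print(np.sort(means).round(2))   # should end up near the true means -6, 0 and 7
```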
I think this is actually quite isomorphic to Goal Factoring (from Geoff Anders / CFAR) in that you’re trying to combine things that are similar and then break up things that are inefficient. At least, I spent an entire summer working on an SMEM clustering program (though some of that was UI) and they feel similar to me.
Simulated annealing is one of many techniques for optimizing in search spaces with a lot of local maxima. Is there a reason why you’re emphasizing that one in particular?
It provides a useful concept, which can be carried over into other domains. I suppose there are other techniques that use a temperature, but I’m much less familiar with them and they are more complicated. Is understanding other metaheuristics more useful to people who aren’t actually writing a program that performs some optimization than just understanding simulated annealing?