Donald Hobson comments on Adjectives from the Future: The Dangers of Result-based Descriptions

Donald Hobson 29 Dec 2020 18:27 UTC
2 points
Environmental protection legislation is a category that covers taxes on fossil fuels, bans on CFCs and subsidies on solar panels, amongst many other policies.
This is a predictively useful category: politicians who support one of these measures are probably more likely to support the others. It would be more technically accurate, but more long-winded, to describe these as “policies that politicians believe will help the environment”.
Unfortunately, “optimization process” does not describe any present features of the process itself. It simply says that the future result will be optimized. So, if you want something highly-optimized, you’d better find a powerful optimizer. Seems to make sense even though it’s a null statement!
Suppose we have a black box. We put the word “aeroplane” into the box, and out comes a well-designed and efficient aeroplane. We put the word “wind turbine” in and get out a highly efficient wind turbine. We expect that if we entered the word “car”, this box would output a well-designed car.
In other words, seeing one result that is highly optimised tells you that other results from the same process are likely to be optimized.
Unfortunately, “fitness” doesn’t describe any feature of the person themselves; it simply says they can run fast. So if you want someone who can run fast, you’d better find someone fit. Seems to make sense even though it’s a null statement.
To the extent that running speed, jumping height, weightlifting weight, etc. are strongly correlated, we can approximately encode all these traits into a single parameter, and call that fitness. This comes with some loss of accuracy, but is still useful.
Imagine that you have to send a list of running speeds, jump heights, etc. to someone. Unfortunately, this is too much data, so you need to compress it. Fortunately, the data is strongly correlated. Let’s say that all the data has been normalized to the same scale.
If you can only send a single number per person and are trying to minimize the L1 loss, you could send the median of that person’s scores. If you are trying to minimize the L2 loss, send the mean. If you can only send a single bit, you should make that bit be whether or not the person’s total score is above the median total.
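As a rough sketch of the compression argument above, the following Python checks that the median is the better single-number summary under L1 loss, that the mean is the better one under L2 loss, and shows one way to compute the single-bit code. The names and scores are made up for illustration, and the helper functions are hypothetical, not taken from any library.

```python
from statistics import mean, median

# Hypothetical normalized scores (sprint, jump, lift, throw, swim) per person.
people = {
    "alice": [0.9, 0.8, 0.7, 0.9, 0.6],
    "bob": [0.1, 0.2, 0.3, 0.4, 1.0],
    "carol": [0.5, 0.4, 0.6, 0.5, 0.5],
}


def l1_loss(values, summary):
    """Total absolute error if every score is replaced by `summary`."""
    return sum(abs(v - summary) for v in values)


def l2_loss(values, summary):
    """Total squared error if every score is replaced by `summary`."""
    return sum((v - summary) ** 2 for v in values)


def one_bit_codes(scores_by_person):
    """One bit per person: is their total score above the median total?"""
    totals = {name: sum(scores) for name, scores in scores_by_person.items()}
    cutoff = median(totals.values())
    return {name: total > cutoff for name, total in totals.items()}


bob = people["bob"]
print("L1 loss at median:", l1_loss(bob, median(bob)))
print("L1 loss at mean:  ", l1_loss(bob, mean(bob)))
print("L2 loss at median:", l2_loss(bob, median(bob)))
print("L2 loss at mean:  ", l2_loss(bob, mean(bob)))
print("one-bit codes:", one_bit_codes(people))
```

Bob's scores are deliberately skewed, so his median and mean differ and the two losses pick different summaries. With symmetric scores the two would coincide.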
Consider the reasoning that goes “Bob jumped really far in the long jump ⇒ Bob is fit ⇒ Bob can lift heavy weights”. Here we are using the word “fit” as a hidden inference. Hidden in how we use the word is implicit information about the correlation between athletic abilities.
- Pradeep_Kumar 29 May 2021 5:07 UTC
  1 point
  Parent
  All three of your examples involve using a phrase as a shorthand for a track record. You call something a pollution-reducing law, a vehicle-producer, or a fit athlete after observing consistent pollution reduction, vehicles, or field records. That’s like the doctor calling something a “sleeping pill”, which is OK because he’s doing that after observing its track record.
  
  The problem is when there is no track record. For example, when someone proposes a new “environmental protection” law that has not really been tested, others who hear that name may be less skeptical than if they hear “subsidies for Teslas”. In the latter case, they may ask whether this would really help the environment and whether there might be unintended consequences.
  
  Suppose we have a black box. We put the word “aeroplane” into the box, and out comes a well-designed and efficient aeroplane. We put the word “wind turbine” in and get out a highly efficient wind turbine. We expect that if we entered the word “car”, this box would output a well-designed car. In other words, seeing one result that is highly optimised tells you that other results from the same process are likely to be optimized.
  
  The term “optimization power” doesn’t seem to add much here. Any prediction I make would be based on the track record you mentioned (using some model that “fits” that training data). For example, maybe we would predict that the box produces a good car, but not necessarily a movie or a laptop. Even for the examples of “optimization processes” mentioned in the article, such as humans and natural selection, I predict using the observed track record. If we say a chess player has reached a higher Elo than another, we can use that to predict that he’ll beat the other one. That will invite justified questions about the chess variant, their past matches, and their recent form. Why bring in the claim that he has more “optimization power”, which provokes fewer such questions?
  
  Thanks for the thoughtful comment.