Here’s one interesting way of viewing it that I once read:
Suppose that the option you chose, rather than being a single trial, were actually 1,000 trials. Then, risk averse or not, Option 5 is clearly the best approach. The only difficulty, then, is that we’re considering a single trial in isolation. However, when you consider all such risks you might encounter in a long period of time (e.g. your life), then the situation becomes much closer to the 1,000 trial case, and so you should always take the highest expected value option (unless the amounts involved are absolutely huge, as others have pointed out).
Can confirm that hardware (and data!) are the two main culprits here. The actual learning algorithms haven’t changed much since the mid 1980s, but computers have gotten many times faster, GPUs are 30-100x faster still, and the amount of data has similarly increased by several orders of magnitude.