Suppose b is the true bias of the coin (which the supercomputer will compute). Then your expected return in this game is
𝔼[max(b, 0.50)] = 0.50 + 𝔼[max(b-0.50, 0)]
No. That formula would imply that, if the coin is 30% for sure and you buy it for 0.3, you make 0.2 in expectation. You don’t: you make 0 regardless of what price you buy at.
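To make the arithmetic in this objection concrete, here is a minimal sketch that evaluates the quoted formula for a coin whose bias is 0.30 with certainty. The function names and the reading of "return minus purchase price" as profit are assumptions made for illustration; they are not from the original comment.

```python
def formula_return(outcomes, threshold=0.50):
    """E[max(b, threshold)] for a discrete distribution of (bias, probability) pairs."""
    return sum(p * max(b, threshold) for b, p in outcomes)


def decomposed_return(outcomes, threshold=0.50):
    """The equivalent form: threshold + E[max(b - threshold, 0)]."""
    return threshold + sum(p * max(b - threshold, 0.0) for b, p in outcomes)


# Hypothetical case from the objection: the coin is 30% for sure.
certain_30 = [(0.30, 1.0)]
price = 0.30  # assumed purchase price

value = formula_return(certain_30)
implied_profit = value - price  # what the formula would imply you make

print("formula value:", value)
print("implied profit at price 0.30:", round(implied_profit, 2))
```

Under these assumptions the formula evaluates to 0.50, so buying at 0.30 would appear to yield 0.20, which is the figure the objection disputes.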
Note that this kind of problem has also shown up in decision theory more generally. This is a good place to start. In particular, it seems like your problem can be fixed with epsilon exploration (if it doesn’t do so automatically, as per Soares); both the EDT and CDT variants should work.