Nice post! I think something closer to 1+√2≈2.4 would be a better multiplier than two. Reason:
Instead of minimizing the upper bound of total effort (b^2d−1)/(b-1), it makes sense to also consider the lower bound, (bd−1)/(b-1), which is achieved when d is a power of b. We can treat the “expected” effort (e.g., if you have a uniform improper prior on d) as landing in the middle of these two numbers, i.e.,
f(b)=(0.5(b^2+b)d−1)/(b-1).
This is minimized where b=1+√(2−2/d), which approaches b=1+√2 for d large. If you squint at your seesaw graphs and imagine a line “through the middle” of the peaks and troughs, I think you can see it bottoming out at around 2.4.
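A quick numeric sanity check of this (my own sketch, not from the post): minimize f(b) on a fine grid and compare with the closed-form minimizer 1+√(2−2/d).

```python
import numpy as np

# Midpoint of the two effort bounds: f(b) = (0.5*(b**2 + b)*d - 1) / (b - 1).
def f(b, d):
    return (0.5 * (b**2 + b) * d - 1) / (b - 1)

for d in [10, 100, 10_000]:
    bs = np.linspace(1.01, 4, 2_000_000)
    b_grid = bs[np.argmin(f(bs, d))]      # minimizer found on the grid
    b_closed = 1 + np.sqrt(2 - 2 / d)     # claimed closed-form minimizer
    assert abs(b_grid - b_closed) < 1e-3

print(1 + np.sqrt(2))  # large-d limit, about 2.414
```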
I am a bit confused about the step where you go from an improper prior to saying that the “expected” effort lands in the middle of these two numbers. The continuous part of the total-effort-vs-doubling-factor curve is concave, so I would expect the “expected” effort to be weighted toward the lower bound.
I tried coding up a simple setup that averages the graphs across a space of difficulties to approximate the “improper prior”, but it is very hard to draw a conclusion from it. The graph suggests the asymptotic minimum is somewhere above 2.5, but I am not at all sure.
Also, it is unclear to me whether a flat uninformative prior over difficulties is the right choice, versus a flat uninformative prior over log-difficulty.
What do you think about both of these things?
Code for the graph:
import numpy as np
import matplotlib.pyplot as plt

# Total effort when the budget grows by factor b until it reaches difficulty d:
# the geometric series 1 + b + ... + b^n with n = ceil(log_b(d)),
# which sums to (b^(n+1) - 1) / (b - 1).
def effort_spent(d, b):
    n = np.ceil(np.log(d) / np.log(b))
    return (b ** (n + 1) - 1) / (b - 1)

bs = np.linspace(1.1, 5, 1000)           # candidate multipliers
ds = np.linspace(2, 1_000_000, 100_000)  # difficulties to average over
hist = np.zeros_like(bs)
for d in ds:
    hist += effort_spent(d, bs) / len(ds)
plt.plot(bs, hist)
plt.show()
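On the flat-vs-log prior question, here is a variant of the averaging code (my own experiment, not from the post) that compares a uniform grid of difficulties against a log-uniform one via np.geomspace:

```python
import numpy as np

# Total effort for multiplier b and difficulty d (same formula as above).
def effort_spent(d, b):
    n = np.ceil(np.log(d) / np.log(b))
    return (b ** (n + 1) - 1) / (b - 1)

bs = np.linspace(1.1, 5, 400)
flat_ds = np.linspace(2, 1_000_000, 5_000)   # uniform over difficulty
log_ds = np.geomspace(2, 1_000_000, 5_000)   # uniform over log-difficulty

flat_avg = np.mean([effort_spent(d, bs) for d in flat_ds], axis=0)
log_avg = np.mean([effort_spent(d, bs) for d in log_ds], axis=0)

print("flat-prior argmin b:", bs[np.argmin(flat_avg)])
print("log-prior argmin b:", bs[np.argmin(log_avg)])
```

Note that the raw average is dominated by the largest difficulties under either prior, since total effort scales roughly linearly with d; that may be worth normalizing out before trusting the argmin.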
Nicely done!
I think this improper prior approach makes sense.