Thanks a lot!
Thomas Sepulchre
The ZOPA issue you raise actually disappears when the trade involves a lot of players, not only two.
Let’s say we have N players. The first consequence would be the existence of a unique price. Many mechanisms can lead to a unique price: you could spy on your neighbors to see if they get a better deal than you do, or you could just keep a price in mind and update it after each attempted trade (“If I got a deal, that’s suspicious, my price wasn’t good enough, I’ll update it. If I didn’t, I was too greedy, I’ll update it.”). In the end, everyone will use the same price.
At this point, everyone will specialize in one good (banana or coconut) based on whether each one values banana/coconut more or less than the market does.
The ZOPA is therefore the ZOPA between the worst banana gatherer and the worst coconut gatherer who still trade. The bigger N is, the smaller this ZOPA, and therefore the smaller the need for perverse behaviors.
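The price-updating mechanism described above can be sketched with a toy simulation (the update rule, step size, and starting prices are all made up for illustration): each seller keeps an ask and each buyer a bid in mind, and both nudge their number after every attempted trade, exactly in the “I got a deal, I update / I didn’t, I update” spirit.

```python
import random

random.seed(0)

N, STEP, ROUNDS = 10, 1.0, 2000
asks = [random.uniform(20, 80) for _ in range(N)]   # sellers' prices in mind
bids = [random.uniform(20, 80) for _ in range(N)]   # buyers' prices in mind

initial_spread = max(asks) - min(asks)

for _ in range(ROUNDS):
    random.shuffle(bids)                  # random pairing each round
    for i in range(N):
        if bids[i] >= asks[i]:            # deal: "my price wasn't good enough"
            asks[i] += STEP
            bids[i] -= STEP
        else:                             # no deal: "I was too greedy"
            asks[i] -= STEP
            bids[i] += STEP

final_spread = max(asks) - min(asks)
print(initial_spread, final_spread)       # asks cluster around a common price
```

With these (arbitrary) parameters the posted prices collapse from a wide initial range into a narrow band around a single market price, which is the convergence claimed above.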
Dumb question here: what does “agnostic” mean in this context?
1960 real GDP (and 1970 real GDP, and 1980 real GDP, etc) calculated at recent prices is dominated by the things which are expensive today—like real estate, for instance. Things which are cheap today are ignored in hindsight, even if they were a very big deal at the time.
I think this part could be misleading. Gross Domestic Product only includes goods that are produced in one year. Thus, if you live in a big city where it is almost impossible to build new buildings, variations in real-estate prices in this area don’t affect GDP. Therefore, the fact that real estate is expensive today, which is mainly true in city centers, has nothing to do with GDP, GDP growth, or the weight of the construction sector in GDP.
I feel like you are barking up the wrong tree. You are modelling the expected time needed for a sequence of steps, given the number of hard steps in the sequence, and given that all steps are completed before time T. I agree with you that those results are surprising, especially the fact that the expected time required for hard steps is almost independent of how hard those steps are, but I don’t think this is the kind of question that comes to mind on this topic. The questions would be closer to:
How many hard steps has life on earth already achieved, and how hard were they?
How many hard steps remain in front of us, and how hard are they?
How long will it take us to achieve them / how likely is it that we will achieve them?
I may have misunderstood your point, if so, feel free to correct me.
Sorry for this late response
For instance, knowing that the expected hard-step time is ~identical to the expected remaining time gives us reason to expect that the number of hard steps passed on Earth already is perhaps ~4.5 (given that the remaining time in Earth’s habitability window appears to be ~1 billion years).
I actually disagree with this statement. Assuming from now on that T = 1, so that we can normalize everything, your post shows that, if there are k hard steps, given that they are all achieved, then the expected time required to achieve all of them is k/(k+1). Thus, you have built an estimator of t, given k: t̂(k) = k/(k+1).
Now you want to solve the reverse problem: there are k hard steps, and you want to estimate this quantity. We have one piece of information, which is t, the time it actually took. Therefore, the problem is, given t, to build an estimator k̂(t). This is not the same problem. You propose to use k̂(t) = t/(1−t). The intuition, I assume, is that this is the inverse function of the previous estimator.
The first thing we could expect from this estimator is to have the correct expected value, i.e. E[k̂(t)] = k. Let’s check that.
The density of t is f(t) = k·t^(k−1) (quick sanity check here: the expected value of t is indeed ∫₀¹ t·k·t^(k−1) dt = k/(k+1)). From this we can derive the expected value E[k̂(t)] = ∫₀¹ (t/(1−t))·k·t^(k−1) dt. And we conclude that E[k̂(t)] = +∞: the integral diverges at t = 1.
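Both facts are easy to see numerically. The density k·t^(k−1) on [0, 1] is exactly that of the maximum of k i.i.d. uniform draws, which gives a one-line sampler (a quick sketch; k = 4 and the sample size are arbitrary choices of mine):

```python
import random

random.seed(1)
k, n = 4, 200_000

# t ~ k * t**(k-1) on [0, 1] is the law of the max of k uniforms
samples = [max(random.random() for _ in range(k)) for _ in range(n)]

mean_t = sum(samples) / n                              # close to k/(k+1) = 0.8
mean_estimate = sum(t / (1 - t) for t in samples) / n  # heavy tail near t = 1:
                                                       # keeps growing with n,
                                                       # the expectation is infinite
print(mean_t, mean_estimate)
```

The sample mean of t sits right at k/(k+1), while the sample mean of t/(1−t) comes out far above k and gets worse as the sample grows, since it is dominated by the few draws closest to 1.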
How did this happen? Well, it happened because it is likely for t to be really close to 1, which makes t/(1−t) explode.
Ok, so the expected value doesn’t match anything. But maybe k is the most likely result, which would already be great. Let’s check that too.
The density of k̂ = t/(1−t) is g(u) = k·u^(k−1)/(1+u)^(k+1). (Since t = u/(1+u), we have dt/du = 1/(1+u)², therefore g(u) = k·t^(k−1)·dt/du.) We can differentiate this to get the argmax: g′(u) ∝ u^(k−2)·((k−1) − 2u)/(1+u)^(k+2), and therefore argmax = (k−1)/2. Surprisingly enough, the most likely result is not k.
Just to be clear about what this means: if there are infinitely many planets in the universe on which live sentient species as advanced as us, and on each of them a smart individual is using your estimator to guess how many hard steps they already went through, then the average of these estimates is infinite, and the most common result is (k−1)/2.
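A quick numerical check that the mode of t/(1−t) sits at (k−1)/2 rather than at k (a sketch; k = 10 is an arbitrary choice):

```python
k = 10

def density(u):
    # density of t/(1-t) when t has density k * t**(k-1) on [0, 1]
    return k * u**(k - 1) / (1 + u)**(k + 1)

grid = [i / 1000 for i in range(1, 30_000)]   # u from 0.001 to 29.999
mode = max(grid, key=density)
print(mode)   # ≈ (k - 1) / 2 = 4.5, far below k = 10
```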
Fixing the estimator
An easy thing is to find the maximum likelihood estimator. Just use k̂(t) = (1+t)/(1−t) (that is, 2·t/(1−t) + 1) and, if you check the math again, you will see that the most likely result is k. Matching the expected value is a bit more difficult, because as soon as your estimator blows up like 1/(1−t) near t = 1, the expected value will be infinite.
For us, t = 4.5/5.5 ≈ 0.82, therefore the maximum likelihood estimator gives us k̂ = (1+t)/(1−t) = (5.5+4.5)/(5.5−4.5) = 10. Therefore, 10 hard steps seems a more reasonable estimate than 4.5.
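The mode of this corrected estimator k̂ = (1+t)/(1−t) can be checked the same way: writing v = (1+t)/(1−t), i.e. t = (v−1)/(v+1) and dt/dv = 2/(v+1)², its density is 2k·(v−1)^(k−1)/(v+1)^(k+1), and the argmax should now land on k itself (a sketch, again with k = 10 chosen arbitrarily):

```python
k = 10

def density(v):
    # density of (1+t)/(1-t) when t has density k * t**(k-1), for v >= 1
    return 2 * k * (v - 1)**(k - 1) / (v + 1)**(k + 1)

grid = [1 + i / 1000 for i in range(1, 50_000)]   # v from 1.001 to 50.999
mode = max(grid, key=density)
print(mode)   # ≈ k = 10
```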
So the estimate for the number of hard steps doesn’t make sense in the absence of some prior. Starting with a prior distribution for the likelihood of the number of hard steps, and applying bayes rule based on the time passed and remaining, we will update towards more mass on k = t/(T–t) (basically, we go from P( t | k) to P( k | t)).
Ok, let’s do this. Since k is an integer, I guess our prior should be a sequence (p_k). We already know P(t | k) = k·t^(k−1). We can derive from this P(t) = Σ_k p_k·k·t^(k−1), and finally P(k | t) = p_k·k·t^(k−1) / Σ_j p_j·j·t^(j−1). In our case, t = 4.5/5.5 ≈ 0.82.
I guess the most natural prior would be the uniform prior: we fix an integer K, and set p_k = 1/K for 1 ≤ k ≤ K. From this we can derive the posterior distribution. This is a bit tedious to do by hand, but easy to code. From the posterior distribution we can for example extract the expected value of k, E[k | t]. I computed it for a range of values of K, and voilà!
Obviously K ↦ E[k | t] is strictly increasing. It also converges toward 10 (which is exactly (1+t)/(1−t)). Actually, for almost all values of K, the expected value is very close to 10. To give a criterion: for K ≥ 15, the expected value is already above 7.25, which implies that it is closer to 10 than to 4.5.
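Here is a sketch of that posterior computation with exact rational arithmetic (the cutoff values K to print are arbitrary):

```python
from fractions import Fraction

t = Fraction(45, 55)   # 4.5 Gy elapsed out of T = 5.5 Gy, normalized to T = 1

def posterior_mean(K):
    # uniform prior p_k = 1/K on k = 1..K; likelihood P(t|k) = k * t**(k-1)
    weights = [k * t**(k - 1) for k in range(1, K + 1)]
    return sum(k * w for k, w in zip(range(1, K + 1), weights)) / sum(weights)

for K in (5, 10, 15, 50, 200):
    print(K, float(posterior_mean(K)))
```

As K grows, the posterior mean climbs toward (1+t)/(1−t) = 10, and it crosses the 7.25 midpoint between 4.5 and 10 at a fairly small cutoff.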
We can use different types of prior. I also tried p_k ∝ exp(−k/K) (with a normalization constant), which is basically a smoother version of the previous one. Instead of stating with certainty “the number of hard steps is at most K”, it’s more “the number of hard steps is typically K, but any huge number is possible”. This gives basically the same result, except it separates from 4.5 even faster, as soon as K ≥ 13.
My point is not to say that the number of hard steps is 10 in our case. Obviously I cannot know that. Whatever prior we may choose, we will end up with a probability distribution, not a nice clear answer. My point is that if, for the sake of simplicity, we choose to only remember one number / to only share one number, it should probably not be 4.5 (that is, t/(1−t)), but instead 10 (that is, (1+t)/(1−t)). I bet that, if you actually have a prior, and actually make the Bayesian update, you’ll find the same result.
I agree with those computations/results. Thank you
Could you please explain what the second “danger zone” graph means, i.e. what kind of skill could have such a danger zone?
The first danger zone is pretty clear: it is a skill for which being overconfident is bad. Any skill where failure is expensive falls into that category; any dangerous sport would be an example, where being overconfident could mean death.
The third follows a similar idea, where misplaced confidence is bad.
Here is what I take from your post:
Zvi: Allowing more tests at home would provide information. Information is good. The FDA should use lower criteria to allow more tests (and should have done that months ago)
FDA: Too low sensitivity and bad use of tests at home could lead to misplaced confidence. Misplaced confidence is bad.
Zvi: In this situation the extra information outweighs the misplaced confidence.
FDA: In this situation the extra information does not outweigh the misplaced confidence.
Can you provide evidence that the information indeed brings more value? Also, did I miss the point of your post and/or misrepresent it?
Wouldn’t this lead to some bystander effect?
An adversary would need to try about 2^H passwords in order to guess yours, on average.
This statement is wrong. If we take your example of choosing the password “password” with probability 1/2 and a uniformly random 1000-character lowercase string with probability 1/2 (which represents about 26^1000 ≈ 10^1415 different passwords), then the average number of tries needed to guess your password is about 26^1000/4: indeed, it will take the attacker about 26^1000/2 tries on average in the (probability 1/2) unlucky situation where you have not chosen the obvious password. This is extremely far from 2^H ≈ 2·26^500.
I apologize, my wording was indeed rude.
As for why I was confident, well, this was a clear example (in my eyes at least) of Jensen’s inequality: we are comparing the mean of the log to the log of the mean. And, if you see this inequality come up often, you know that it is always strict, except for a constant distribution. That’s how I knew.
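The gap is easy to reproduce at a smaller scale. Here is a sketch with a made-up, scaled-down version of the example: “password” with probability 1/2, otherwise one of M = 26^6 equally likely random strings. The entropy-based figure 2^H sits at the geometric-mean scale, while the true expected number of guesses is driven by the huge arithmetic-mean term:

```python
import math

M = 26**6   # number of equally likely "strong" passwords (scaled down for speed)

# Optimal attacker: try "password" first, then the M strong candidates.
# Expected guesses = 1*(1/2) + sum_{i=1..M} (i+1) * (1/(2M)), in closed form:
expected_guesses = 0.5 + (M * (M + 1) / 2 + M) / (2 * M)

# Shannon entropy of the password distribution, in bits
entropy = 0.5 * math.log2(2) + M * (1 / (2 * M)) * math.log2(2 * M)

print(expected_guesses, 2**entropy)   # the former is thousands of times larger
```

Here 2^H works out to 2·26^3 = 35,152, while the expected number of guesses is about M/4 ≈ 77 million: the mean of the log badly understates the log of the mean, exactly as Jensen’s inequality predicts.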
As a last note, I must praise the fact that you left your original comments (while editing them of course) instead of removing them. I respect you a lot for that.
You have to replace the right term by . This makes the sum equal to , or if you keep only two digits after the decimal point. Still pretty close to 600 though :)
[Question] How would you learn absolute pitch?
TL;DR: Tailed peacocks make better female chicks
Let’s, for a moment, pretend to be a peahen choosing a sexual mate. We have a few options, with tails of varying degrees of impressiveness. As stated in the post, it is difficult to tell whether the tail is a good proxy for fitness. Indeed, one could either argue that having a big tail is a handicap for the peacock, limiting agility for example, or that it is a strong hint that the peacock is otherwise very fit, despite the big tail. I would argue that, given the information we have (namely, that all the potential male mates survived so far), we shouldn’t assume a higher or lower fitness among them.
But, why do we care anyway? We are not interested in the fitness of our future mate, but rather in the fitness of our future chicks. And here, I think, the tail is relevant.
If we have male chicks, the choice of a mate will influence both the size of their tail and other characteristics like the ability to find food, agility and so on. As before, it doesn’t seem that the tail is a reasonable proxy on how to produce better male chicks.
If we have female chicks, the story is very different. A female chick will partially inherit the agility and general ability to survive from the mate we will choose, but will not inherit the handicap of a big tail. Therefore, we should choose the mate with the biggest tail.
Could you try “size of Jupiter, banana for scale”?
Not the main point of the post, just a small comment
So if you have any good hypotheses, you might want to test them with a compiled language for a 100-fold speedup over python
This used to be a kind of common knowledge that Python is 100 times slower than C++, for example—at least I heard it quite a lot in the world of competitive programming—but I think this is kind of false nowadays.
Libraries like NumPy are basically compiled C code made accessible to the Python user; thus, using such libraries yields performance comparable to C/C++. Since a lot of operations can be written using NumPy or similar libraries, Python is in fact no longer that slow.
I don’t yet have access to the code of this challenge, so maybe this one cannot be written this way, in which case it would genuinely be 100 times faster in another language. But, in general, many standard operations can be done in Python without such a big slowdown.
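For instance, a rough sketch of the difference (exact timings are machine-dependent, and NumPy is assumed to be installed): summing squares with an interpreted Python loop versus the same operation dispatched once to NumPy’s compiled kernels.

```python
import time
import numpy as np

n = 1_000_000

t0 = time.perf_counter()
total_py = sum(x * x for x in range(n))   # interpreted: one bytecode loop per element
t1 = time.perf_counter()

arr = np.arange(n, dtype=np.int64)
t2 = time.perf_counter()
total_np = int((arr * arr).sum())         # one call into compiled C code
t3 = time.perf_counter()

assert total_py == total_np
print(f"pure Python: {t1 - t0:.4f}s, NumPy: {t3 - t2:.4f}s")
```

Both compute the same number; the vectorized version is typically one to two orders of magnitude faster, which is the whole point.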
The ML technique known as dropout corresponds to the idea of randomly deleting neurons from the network during training, while still requiring that it perform whatever its task is.
So I guess you can make sure that your NN is robust with respect to the loss of neurons.
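In its standard (“inverted”) form, dropout is just a random mask plus a rescaling, so that the layer’s expected output is unchanged; a minimal sketch (the layer size and drop probability are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, train=True):
    """Zero each neuron with probability p_drop during training,
    scaling survivors by 1/(1 - p_drop) to preserve the expectation."""
    if not train:
        return activations
    keep = rng.random(activations.shape) >= p_drop
    return activations * keep / (1.0 - p_drop)

a = np.ones(100_000)
out = dropout(a)
print(out.mean())   # close to 1.0: the expected activation is unchanged
```

At inference time (`train=False`) the layer is the identity, so the network must have learned to do its job under random deletion of neurons, which is exactly the robustness mentioned above.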
I think there’s actually one way to set relative responsibility weights which makes more mathematical sense than the others. But first, let’s slightly change the problem, and assume that the members of the group arrived one by one, so that we can order the members by time of arrival. If this is the case, I’d argue that the complete responsibility for the soup goes to the one who brought the last necessary element.
Now, back to the main problem, where the members aren’t ordered. We can set the responsibility weight of an element to be the number of permutations in which this particular element is the last necessary element, divided by the total number of permutations.
This method has several qualities: the sum of responsibilities is exactly one; each useful element (each Necessary Element of some Sufficient Set) has a positive responsibility weight, while each useless element has a responsibility weight of 0. It also respects the symmetries of the problem (in our example, the responsibility of the pot, the fire and the water is the same, and the responsibility of each ingredient is the same).
In a subtle way, it also takes into account the scarcity of each resource. For example, let’s compare the situation [1 pot, 1 fire, 1 water, 4 ingredients] with the situation [2 pots, 1 fire, 1 water, 4 ingredients]. In the first one, in any order, the responsibility goes to the last element, therefore the final responsibility weight is 1⁄7 for each of the 7 elements. The second situation is a bit trickier; we must consider two cases. First case: the last element is a pot (1/4 of the time). In this case, the responsibility goes to the seventh element, which gives 1⁄7 responsibility weight to each non-pot element, and 1⁄14 to each pot. Second case: the last element is not a pot (3/4 of the time), in which case the responsibility goes to this last element, which gives 1⁄6 responsibility weight to each non-pot element, and 0 to each pot. In total, the responsibilities are 1⁄56 for each pot, and 9⁄56 for each other element. We see that the responsibility of each pot has been divided by 8, basically because the pot is no longer a scarce resource.
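The two-pot computation can be checked by brute force over all orderings. A sketch, assuming (as the fractions 1⁄56 and 9⁄56 require) two interchangeable pots plus six distinct non-pot necessities; the element names are made up:

```python
from fractions import Fraction
from itertools import permutations
from collections import Counter

pots = ["pot_a", "pot_b"]
others = ["fire", "water", "ing1", "ing2", "ing3", "ing4"]
elements = pots + others

def completer(order):
    """Return the element whose arrival first makes the soup possible
    (all non-pot necessities present, plus at least one pot)."""
    have = set()
    for e in order:
        have.add(e)
        if all(o in have for o in others) and have & set(pots):
            return e

counts = Counter(completer(order) for order in permutations(elements))
total = sum(counts.values())   # 8! = 40320 orderings

resp = {e: Fraction(counts[e], total) for e in elements}
print(resp["pot_a"], resp["fire"])   # 1/56 and 9/56
```

The enumeration recovers 1⁄56 per pot and 9⁄56 per other element, and the weights sum to exactly one.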
Anyway, my point is, this method seems to be a good way to generalize the idea of NESS: instead of just checking whether an element is a Necessary Element of some Sufficient Set, one counts the permutations in which this element is the last necessary element.