Right, it depends on the vegan diet. Grain and legume proteins are complementary: grains are low in lysine, legumes are low in the sulfur-containing amino acids (methionine and cysteine), if I recall correctly. I think it's an easy failure mode of a vegan diet to be all legume protein, and the gluten-free trend has made this even worse, but that's a rant for another day. The point here is that when it comes to dietary protein, all that matters is the amino acid composition. Every protein, including collagen, is broken down in the stomach into its component amino acids. And that's why collagen supplements are a scam, and while I am broadly sympathetic to the message of this post, I think rationalists should do better.
I eat oysters but am otherwise vegan. The reason I didn’t just go with standard veganism is something like the more general arguments in this post. I had my reasons for nitpicking the details of this post; rationalists should learn some science and thereby be less wrong than the rest of the cultic milieu. But I want to comment again to focus on the positive: this post was a great reminder that I’m not a real vegan and why, and I’ve been making more of an effort to get oysters since reading it.
And if you want a “certified lower bound on their difference” you can use the Lagrange error bound for the Taylor series. The naive reasoning is that the error of the Taylor series is about the size of the first term you leave out. With the Lagrange error bound you get something like that rigorously. With well-behaved functions like sqrt and sin there’s no obstacle to proving that we’ve gotten the third digit correct (that is, that our error is <0.00001 and so can’t change the third digit). So if they differ in the third digit of our bounded numerical computation, they’re different numbers.
I haven't actually done that carefully in this case, but the bound depends on the maximum of a higher derivative of the function. For sin that should have absolute value at most 1. For sqrt… well, we don't want to expand around x=0, but if we expand around, say, x=4, these derivatives I think are not just bounded, but go to zero.
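Here's a minimal sketch of that certification in Python (the expansion point a=4 and the degree are arbitrary choices here, and `sqrt_taylor` is just an illustrative name):

```python
import math

def sqrt_taylor(x, a=4.0, n=4):
    """Degree-n Taylor polynomial of sqrt around a, plus a Lagrange error bound."""
    approx = 0.0
    coeff = 1.0  # running product (1/2)(1/2 - 1)...(1/2 - k + 1)
    for k in range(n + 1):
        approx += coeff * a**(0.5 - k) * (x - a)**k / math.factorial(k)
        coeff *= 0.5 - k
    # Lagrange bound: |R_n| <= M |x - a|^(n+1) / (n+1)!, where M bounds the
    # (n+1)th derivative between a and x; for sqrt that derivative shrinks
    # as its argument grows, so the left endpoint of the interval gives the max.
    M = abs(coeff) * min(a, x) ** (0.5 - (n + 1))
    return approx, M * abs(x - a) ** (n + 1) / math.factorial(n + 1)

approx, err = sqrt_taylor(4.2)
print(approx, err)  # err is ~2e-8 here, so the first several digits are certified
```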
Wait, you think people need to eat collagen? Collagen is just a kind of protein, it’ll get broken down into raw amino acids in the stomach. There can be issues with a vegan diet not getting complete protein (that is, low on one or more essential amino acids) but there’s nothing special about collagen specifically.
I’m surprised at how hard it is for me to think of counterexamples.
I thought surely whale populations, given their slow generation times, but it looks like humpback whale populations have already recovered from whaling, and blue whales will get there before long.
Thinking again—in my baseball example, gravity is pulling the ball into the domain of applicability of the constant acceleration model.
Maybe what's special about the exponential growth model is that it implies escape from its own domain of applicability, in time that grows slowly (logarithmically) with the threshold.
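To spell that out (a one-line derivation; $x_0$, $r$, and $K$ are generic symbols, not anyone's notation): growth $x(t) = x_0 e^{rt}$ first reaches a threshold $K$ at

$$t = \frac{1}{r} \ln \frac{K}{x_0},$$

so even multiplying the threshold by a large factor delays the escape only by an additive logarithmic amount.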
I remember this by analogy to Curry’s paradox.
Where the sentence $S$ from Curry's paradox says "If this statement is true, then $C$", $L$ says "if this statement is provable, then $C$", that is, $L \leftrightarrow (\Box L \to C)$.

In Curry's paradox, if the sentence $S$ is true, that would indeed imply that $C$ is true. And with $L$, the situation is analogous, but with truth replaced by provability: if $L$ is provable, then $C$ is provable. That is, $\Box L \to \Box C$.

But, unlike in Curry's paradox, this is not what $L$ itself says! Replacing truth with provability has attenuated the sentence, destroyed its ability to cause paradox.

If only $\Box C \to C$, then we would have our paradox back… and that's Löb's theorem.

This is all about $L \to (\Box L \to C)$, just about one direction of the biimplication, whereas the post proves not just that but the other direction, $(\Box L \to C) \to L$, as well. It seems that only this forward direction is used in the proof at the end of the post though.
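To spell out the step above in the standard provability-logic rules (a sketch, assuming the fixed point $\vdash L \leftrightarrow (\Box L \to C)$):

$$\begin{aligned}
&\vdash L \to (\Box L \to C) && \text{(forward direction of the fixed point)}\\
&\vdash \Box L \to \Box(\Box L \to C) && \text{(necessitation and distribution)}\\
&\vdash \Box L \to (\Box\Box L \to \Box C) && \text{(distribution again)}\\
&\vdash \Box L \to \Box\Box L && \text{(internal necessitation)}\\
&\vdash \Box L \to \Box C && \text{(combining the last two)}
\end{aligned}$$

From here, $\vdash \Box C \to C$ would give $\vdash \Box L \to C$, which is $L$ itself by the fixed point, hence $\vdash \Box L$, hence $\vdash C$.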
You say “if we are to accurately model the world”...
If I am modelling the path of a baseball, and I write “F = mg”, would you “correct” me that it’s actually inverse square, that the Earth’s gravitation cannot stay at this strength to arbitrary heights? If you did, I would remind you that we are talking about a baseball game, and not shooting it into orbit—or conclude that you had an agenda other than determining where the ball lands.
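A quick back-of-the-envelope check of that, as a Python sketch with rough constants:

```python
# How much does inverse-square gravity deviate from constant g over a fly ball?
R = 6.371e6   # Earth's radius, meters
g0 = 9.81     # surface gravity, m/s^2
h = 50.0      # a generous fly-ball height, meters

g_h = g0 * (R / (R + h)) ** 2
print(g0 - g_h)  # ~1.5e-4 m/s^2: a relative change of about 2h/R, or 1.6e-5
```

The correction shows up five digits in, which is the sense in which F = mg is the right model for a ballgame.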
What if I’m sampling from a population, and you catch me multiplying probabilities together, as if my draws are independent, as if the population is infinite? Yes there is an end to the population, but as long as it’s far away, the dependence induced by sampling without replacement is negligible.
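Concretely, a Python sketch with made-up numbers, comparing the exact without-replacement probability to the independence approximation:

```python
from math import comb

def binomial_pmf(k, n, p):
    # Independence assumption: as if drawing with replacement
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, n, K, N):
    # Exact: n draws without replacement from N items, K of them "successes"
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n, k = 100_000, 30_000, 10, 3
print(binomial_pmf(k, n, K / N))  # ~0.26683
print(hypergeom_pmf(k, n, K, N))  # agrees to about four digits
```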
Well, that’s the question, whether to include an effect in the model or whether it’s negligible. An effect like finite population size, diminishing gravity, or the “crowding” effects that turn an exponential growth model logistic.
And the question cannot be escaped just by noting the effect is important eventually.
Eliezer in 2008, in When (Not) To Use Probabilities, wrote:
To be specific, I would advise, in most cases, against using non-numerical procedures to create what appear to be numerical probabilities. Numbers should come from numbers.
Yeah… well, I thought of the $Y$ because it sounds like we're getting the probabilities of $X$ from some experiment. So $Y$ is the results of the experiment, which in this case is a vector of frequencies. When I put it like that, it sounds like $Y$ is just a rhetorical device for saying that we have given probabilities of $X$.
But I still seem to need $Y$ for my dictionary. I have $P(X = x \mid Y = y)$. What is it? It is some kind of updated probability of $X$, right? Like we went from one probability to the other by doing an experiment. If I didn't write $P(X = x \mid Y = y)$, I'd need something like $P_{\text{old}}(X = x)$ and $P_{\text{new}}(X = x)$.
Reading again, it seems like this is exactly Jeffrey conditionalization. So whether you include some extra variable just depends on what you think of Jeffrey conditionalization.
I feel like I'm missing something, though, about what this experiment is and means. For example, I'm not totally clear on whether we have one state $Y$, and a collection of replicates of state $X$; or is it a collection of replicates of $(X, Y)$ pairs?

Looking at the paper, I see the connection to Jeffrey conditionalization is made explicitly. And it mentions Pearl's "virtual evidence method"; is this what he calls introducing this $Y$? But no clarity on exactly what this experiment is. It just says:
But how should the above be generalized to the situation where the new information does not come in the form of a definite value for $X$, but as "soft evidence," i.e., a probability distribution $P'(X)$?
By the way, regarding your coin toss example, I can at least say how this is handled in Bayesian statistics. There are separate random variables for each coin toss: $X_1$ is the first, $X_2$ is the second, etc. If you have $n$ coin tosses, then your sample is a vector containing $X_1$ to $X_n$. Then the posterior probability is $P(\theta \mid X_1 = x_1, \dots, X_n = x_n)$. This will be covered in any Bayesian statistics textbook as "the Bernoulli model". My class used Hoff's book, which provides a quick start.
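A minimal sketch of that model with a conjugate Beta prior (the data here are made up; Beta(1, 1) is just the uniform prior):

```python
import numpy as np
from scipy import stats

tosses = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # x_1 ... x_n, 1 = heads
a, b = 1, 1                                   # Beta(1, 1) prior on theta

# Conjugate update: the posterior is Beta(a + #heads, b + #tails)
posterior = stats.beta(a + tosses.sum(), b + len(tosses) - tosses.sum())
print(posterior.mean())  # posterior mean of theta: 0.7 for this data
```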
I guess this example suggests a single unknown $\theta$ (whether the coin is loaded or not) and replicates $X_1, \dots, X_n$.
The “Classical derivation” made more sense to me after translating it to standard probability notation, so I’m commenting to share the “dictionary” I made for it, as well as the unexpected extra assumption I had to make.
The obvious:
It got tricky with $X$. Instead of observing $X$, we observe something else that gives us a probability distribution over $X$. I considered this "something else" to be the value of some other unknown: $Y = y$. The probability distribution over $X$ is a conditional distribution:

$$P'(X = x) = P(X = x \mid Y = y)$$

Hate to have $y$ on only one side of the equation like that… maybe I should have called it $P'_y$… but I'll leave it as is.

Then,

$$P'(H) = P(H \mid Y = y) = \sum_x P(H \mid X = x, Y = y)\, P(X = x \mid Y = y)$$

Not quite the right formula for a simple interpretation of $P'$… if only

$$P(H \mid X = x, Y = y) = P(H \mid X = x)$$

This is conditional independence, which could be represented with this Bayes net:

$$Y \longrightarrow X \longrightarrow H$$

Then, we have

$$P'(H) = \sum_x P(H \mid X = x)\, P'(X = x)$$
That completes the dictionary.
So to do what feels like ordinary probability theory, I had to introduce this extra unknown $Y$ so that we have something to observe, and then to assume that $Y$ only provides information about $X$ (and indirectly about $H$, through $X$).
The way you described $P'(X)$, as some probability distribution resulting from an observation but not a conditional distribution, is in philosophy called Jeffrey conditionalization. The Stanford Encyclopedia of Philosophy gives this example:
A gambler is very confident that a certain racehorse, called Mudrunner, performs exceptionally well on muddy courses. A look at the extremely cloudy sky has an immediate effect on this gambler’s opinion: an increase in her credence in the proposition (muddy) that the course will be muddy—an increase without reaching certainty. Then this gambler raises her credence in the hypothesis (win) that Mudrunner will win the race, but nothing becomes fully certain. (Jeffrey 1965 [1983: sec. 11.3])
The idea is, we go from one probability distribution over $X$ (here, muddy) to another, without becoming certain of anything. My introduction of $Y$ corresponds to introducing an unknown representing the status of the sky. I would say we are conditioning on $Y$.
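As a sketch of the arithmetic with made-up numbers: Jeffrey's rule keeps $P(\text{win} \mid \text{muddy})$ fixed and swaps in the new distribution over muddy.

```python
p_win_given_muddy = {True: 0.7, False: 0.2}  # P(win | muddy), held fixed
p_muddy_before = 0.3                          # before looking at the sky
p_muddy_after = 0.8                           # after: higher, but not certain

def p_win(p_muddy):
    # Jeffrey's rule: average P(win | muddy) over the current P(muddy)
    return (p_win_given_muddy[True] * p_muddy
            + p_win_given_muddy[False] * (1 - p_muddy))

print(p_win(p_muddy_before), p_win(p_muddy_after))  # 0.35 -> 0.6
```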
I recalled vaguely that Jaynes discussed Jeffrey conditionalization in Probability Theory, and criticized it for holding only in a special case. I took a look, and sure enough, it’s in section 5.6, and he’s pointing out exactly what I did, right down to the arrows, though he calls it a “logic flow diagram” rather than identifying it as a Pearl-style Bayes net.
The last formula in this post, the conservation of expected evidence, had a mistake which I've only just now fixed. Since I guess it's not obvious even to me, I'll put a reminder for myself here, which may not be useful to others. Really I'm just "translating" from the "law of iterated expectations" I learned in my stats theory class, which was: $E[E[\mathbf{Y} \mid \mathbf{X}]] = E[\mathbf{Y}]$.

This is using a notation which is pretty standard for defining conditional expectations. To define it, you can first consider the expected value given a particular value $x$ of the random variable $\mathbf{X}$. Think of that as a function of that particular value: $g(x) = E[\mathbf{Y} \mid \mathbf{X} = x]$. Then we define conditional expectation as a random variable, obtained from plugging in the random value of $\mathbf{X}$: $E[\mathbf{Y} \mid \mathbf{X}] = g(\mathbf{X})$. The problem with this notation is it gets confusing which capital letters are random variables and which are propositions, so I've bolded random variables. But it makes it very easy to state the law of iterated expectations.

The law of iterated expectations also holds when "relativized". That is, $E[E[\mathbf{Y} \mid \mathbf{X}, I] \mid I] = E[\mathbf{Y} \mid I]$, where $I$ is an event. If we wanted to stick to just putting random variables behind the conditioning bar, we could have used the indicator function of that event.

And this translates to the statement in my post. $\mathbf{Y}$ is an indicator for the event $H$, which makes a conditional expectation of it a conditional probability of $H$. So $E[\mathbf{Y} \mid \mathbf{X}, I]$ is $P(H \mid \mathbf{X}, I)$. Our proposition is the background information $I$; I used the same symbol there. And the right-hand side is another expectation of an indicator, and therefore also a probability: $E[\mathbf{Y} \mid I] = P(H \mid I)$.
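And a quick numerical sanity check of the final formula, as a sketch with a made-up joint distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
e = rng.integers(0, 2, n)                 # evidence bit, P(E = 1) = 0.5
p_h_given_e = np.where(e == 1, 0.9, 0.2)  # P(H | E), made up
h = rng.random(n) < p_h_given_e           # sample H from that conditional

print(h.mean())            # P(H) directly: ~0.55
print(p_h_given_e.mean())  # E[P(H | E)]: ~0.55, as conservation demands
```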
I really didn’t want to define this notation in the post itself, but it’s how I’m trained to think of this stuff, so for my own confidence in the final formula I had to write it out this way.
It would be nice if you had the sexes of the siblings, since it’s supposedly only the older brothers that count, though I don’t really expect that to change anything.
Really the important thing is just to separate birth order from family size. Usually the way I think of this is: we can look at the number of older brothers, holding the number of older siblings fixed. I like this setup because it looks like a randomized trial. I have two older siblings, so do you; meiosis randomizes their sexes.
But I guess with the data you have you can look at birth order with a given family size, so we don’t have to worry about the effect of a larger or smaller family. I… don’t think this is what you did? Did I misunderstand something? It seems like if cardinals come from smaller families, that would show up as lower birth orders.
With 9 million people I'd just split it into categories by number of siblings; with your data I'm not sure.
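A toy simulation of the "randomized trial" framing above (all numbers made up; `outcome` stands in for whatever trait is being studied):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
older_sibs = rng.poisson(1.5, n)                     # family structure varies
older_bros = rng.binomial(older_sibs, 0.5)           # meiosis: Binomial(k, 1/2)
outcome = rng.random(n) < 0.10 + 0.01 * older_bros   # made-up effect size

df = pd.DataFrame({"older_sibs": older_sibs,
                   "older_bros": older_bros,
                   "outcome": outcome})
# Within a fixed number of older siblings, compare rates by older brothers:
print(df[df.older_sibs == 2].groupby("older_bros")["outcome"].mean())
```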
After reading this post, I handed over $200 for a month of ChatGPT Pro, and I don't think I can go back. o1-pro and Deep Research are next level. o1-pro often understands my code or what I'm asking about without a whole conversation of clarifying, whereas with other models it's more work than it's worth to get them focused on the real issue rather than something superficially similar. And then I can use Deep Research to get links to webpages relevant to what I'm working on. It's like… smart Google, basically. Google that knows what I'm looking for. I never would have known this existed if I hadn't handed over the $200.
Depends entirely on Cybercab. A driverless car can be made cheaper for a variety of reasons. If the self-driving tech actually works, if it's widely legal, and if Tesla can mass-produce it at a low price, then they can justify that valuation. Cybercab is a potential solution to the problem that Tesla needs a low-priced car to get sales growing again, while cheap electric cars have become a competitive market without much profit margin. But that's a lot of ifs.
Yeah, I just went through this same line of evasion. Alright, the Collatz conjecture will never be "proved" in this restrictive sense—and neither will the Steve conjecture or the irrationality of √2—do we care? It may still be proved according to the ordinary meaning.
The pilot episode of NUMB3RS.
The tie-in to rationality is that instead of coming up with a hypothesis about the culprit, the hero comes up with algorithms for finding the culprit, and quantifies how well they would have worked when applied to past cases.
It’s really a TV episode about computational statistical inference, rather than a movie about rationality, but it’s good media for cognitive algorithm appreciators.
Alright, so Collatz will be proved, and the proof will not be done by "staying in arithmetic". Just as the proof that there do not exist positive integers p and q satisfying the equation p² = 2q² (or equivalently, that every pair of positive integers fails to satisfy it) is not done by "staying in arithmetic". It doesn't matter.
Not off the top of my head, but since a proof of Collatz does not require working under these constraints, I don’t think the distinction has any important implications.
We can eliminate the concept of rational numbers by framing it as the proof that there are no positive integer solutions to p² = 2q²… but… no proof by contradiction? If escape from self-reference is that easy, then surely it is possible to prove the Collatz conjecture. Someone just needs to prove that the existence of any cycle beyond the familiar 4 → 2 → 1 one (or of a trajectory that grows without bound) implies a contradiction.
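For concreteness, the map in question, as a quick Python sketch (the conjecture says this loop terminates for every starting n):

```python
def collatz_steps(n):
    # Apply the Collatz map until reaching the familiar 4 -> 2 -> 1 cycle
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print([collatz_steps(n) for n in range(1, 10)])  # [0, 1, 7, 2, 5, 8, 16, 3, 19]
```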
That's interesting—if it's broken down not into single amino acids but into a mixture of single amino acids, dipeptides, and tripeptides, that still fits with how I understand the system to work: we're breaking it down into pieces, but not reliably into single units, sometimes two or three. And then collagen consists of distinctive tripeptide repeats, so the tripeptides you get from collagen are a distinctive mixture rather than just random 3-mers; I didn't think of that. That these tripeptides actually do something is surprising if true, but why not.
I guess what I was thinking was that when you eat collagen, it doesn’t become your collagen. Which seems to be true: your collagen is made at the ribosome from single amino acids, not assembled from the kind of dipeptides and tripeptides discussed in the paper. So it’s not like you get collagen by eating collagen, the way you get vitamin B12 by eating vitamin B12. But if there’s some totally separate biological effect… well, I can’t rule it out.