I think you could’ve done better with integration by parts.
For the reason why it’s true, there’s this excellent picture on the Wikipedia page that graphs u vs v and breaks the rectangular area (uv)(b) into the areas under the curve of u (\int u dv) and to the left (so \int v du) with an initial (uv)(a) rectangle. You can also modify the proof to get a formula for the integral of an inverse function that for some reason is little known.
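Written out, the area bookkeeping in that picture and the inverse-function formula I have in mind look roughly like this (my own notation, as a sketch: F is any antiderivative of f, and f is assumed invertible on the interval in question):

```latex
% Area bookkeeping from the u-v picture:
\int_a^b u\,dv + \int_a^b v\,du = (uv)(b) - (uv)(a)

% Integral of an inverse function, with F' = f:
\int f^{-1}(y)\,dy = y\,f^{-1}(y) - F\bigl(f^{-1}(y)\bigr) + C
```

For instance, taking f = exp gives \int \ln y dy = y \ln y - y + C.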
In physics, integration by parts is usually applied for a definite integral in which you can neglect the uv term. Thus, integration by parts reads:
“The integral of u dv = the integral of -v du; that is, you can trade which factor of a product you differentiate, as long as the functions in question are small on the boundary.”
Common examples are integrals over some big volume, since most physical quantities are very small far away from their sources.
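If you want to see the "neglect the uv term" version in action, here is a quick numerical sketch. The functions are my own illustrative choice (a Gaussian times a sine), not anything from the post; the only thing that matters is that uv is negligible at the ends of the interval.

```python
import math

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Illustrative choice (an assumption, not from the post):
# u decays fast, so u*v is negligible at both ends of the interval.
u = lambda x: math.exp(-x * x)
du = lambda x: -2.0 * x * math.exp(-x * x)   # u'
v = lambda x: math.sin(x)
dv = lambda x: math.cos(x)                   # v'

a, b = -10.0, 10.0
int_u_dv = trapezoid(lambda x: u(x) * dv(x), a, b)
int_v_du = trapezoid(lambda x: v(x) * du(x), a, b)
boundary = u(b) * v(b) - u(a) * v(a)         # the uv term we want to neglect

# The first two numbers should agree; the third should be essentially zero.
print(int_u_dv, -int_v_du, boundary)
```

Both integrals come out equal to sqrt(pi) * exp(-1/4), which is the known closed form for the integral of exp(-x^2) cos(x) over the whole line.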
I also think the post should cover the intuition behind Bayes' rule as it is usually interpreted here on LW, namely that it provides the updating rule
posterior odds = prior odds*likelihood ratio
and thereby also provides a formalization of how good evidence is. As for the derivation from P(A|B) defined as equal to P(A and B)/P(B), I think this is best described by saying that P(A|B) is the probability of A once you know B, so you take the mass associated to the worlds where A is true once B is true and compare to your total mass, which is the mass associated to the worlds where B is true. The former is really just “mass of A and B”, so you are done.
Now, P(A and B) = P(B)P(A|B), which I think of as “First, take the probability that B is true, then, given that we are in this set of worlds, take the probability that A is true”. This is essentially translating from locating sets to probabilities.
From here, Bayes' theorem is the simple fact that A and B = B and A. So P(B)P(A|B) = P(A and B) = P(A)P(B|A). If you draw a square with 4 rectangles where the first row is P(A), where the second row is P(-A), where the first column is P(B), and where the second is P(-B), and each rectangle represents a possibility like P(A and -B), then this equation just splits the rectangle P(A and B) into (rectangle compared to row) * row = (rectangle compared to column) * column. Divide by P(B) (that is, the column) to get Bayes' law.
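To check the square picture and the odds form on actual numbers, here is a small sketch. The joint probabilities are made up for illustration; exact fractions are used so the equalities hold exactly rather than up to rounding.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities for the 2x2 square
# (rows: A / not-A, columns: B / not-B). They sum to 1.
joint = {
    ("A", "B"): F(3, 20), ("A", "notB"): F(5, 20),
    ("notA", "B"): F(2, 20), ("notA", "notB"): F(10, 20),
}

p_A = joint[("A", "B")] + joint[("A", "notB")]     # first row
p_notA = 1 - p_A                                    # second row
p_B = joint[("A", "B")] + joint[("notA", "B")]     # first column

p_A_given_B = joint[("A", "B")] / p_B               # rectangle compared to column
p_B_given_A = joint[("A", "B")] / p_A               # rectangle compared to row
p_B_given_notA = joint[("notA", "B")] / p_notA

# Bayes: P(A|B) = P(B|A) P(A) / P(B)
bayes_rhs = p_B_given_A * p_A / p_B

# Odds form: posterior odds = prior odds * likelihood ratio
prior_odds = p_A / p_notA
likelihood_ratio = p_B_given_A / p_B_given_notA
posterior_odds = p_A_given_B / (1 - p_A_given_B)

print(p_A_given_B, bayes_rhs)                          # 3/5 3/5
print(posterior_odds, prior_odds * likelihood_ratio)   # 3/2 3/2
```

Here the likelihood ratio is 9/4, so observing B multiplies the odds on A by a bit more than 2, which is the sense in which the ratio measures how good the evidence is.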
For the sine rule, I think it also helps to show that the fraction a/sin(A), where A is the angle opposite side a, is the diameter of the circumcircle. Wikipedia has good pictures.
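The circumcircle claim is easy to sanity-check numerically. The sketch below uses an arbitrary example triangle of my own choosing, finds the circumcenter from the standard coordinate formula, and compares each side-over-sine ratio against the diameter.

```python
import math

def circumradius(p, q, r):
    """Distance from the circumcenter of triangle pqr to its vertices."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax*ax + ay*ay) * (by - cy) + (bx*bx + by*by) * (cy - ay)
          + (cx*cx + cy*cy) * (ay - by)) / d
    uy = ((ax*ax + ay*ay) * (cx - bx) + (bx*bx + by*by) * (ax - cx)
          + (cx*cx + cy*cy) * (bx - ax)) / d
    return math.hypot(ux - ax, uy - ay)

def side_over_sine(vertex, other1, other2):
    """(length of the side opposite `vertex`) / sin(angle at `vertex`)."""
    x1, y1 = other1[0] - vertex[0], other1[1] - vertex[1]
    x2, y2 = other2[0] - vertex[0], other2[1] - vertex[1]
    # Angle between the two edges leaving `vertex`, from cross and dot products.
    angle = math.atan2(abs(x1 * y2 - y1 * x2), x1 * x2 + y1 * y2)
    return math.dist(other1, other2) / math.sin(angle)

P, Q, R = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)   # arbitrary example triangle
ratios = [side_over_sine(P, Q, R), side_over_sine(Q, R, P), side_over_sine(R, P, Q)]
diameter = 2.0 * circumradius(P, Q, R)

# All three ratios should match the circumcircle's diameter.
print(ratios, diameter)
```

For this triangle the circumcenter is at (2, 1), so the diameter is 2*sqrt(5), and all three ratios agree with it up to floating-point error.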
For an extra math fact that totally doesn’t need to be in the post, it is interesting that for spherical triangles, the law of sines just needs to be modified so that you take the sine of the side lengths as well. In fact you can do something similar in hyperbolic space (by using sinh), and there’s a Taylor series form involving the curvature for a version of sine that makes the law of sines still true in any constant curvature space. (You can find this on the same wiki page.)
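As I recall it from that page, the curvature-dependent sine is the series x - K x^3/3! + K^2 x^5/5! - ..., and the law becomes sin(A)/sin_K(a) = sin(B)/sin_K(b) = sin(C)/sin_K(c). Here is a rough sketch (the function name `sin_K` and the truncation at 20 terms are my own choices) showing that the series reduces to the three familiar cases:

```python
import math

def sin_K(x, K, terms=20):
    """Truncated series x - K x^3/3! + K^2 x^5/5! - ... for curvature K."""
    return sum((-K) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

x = 0.7                     # an arbitrary side length
flat = sin_K(x, 0.0)        # K = 0: just x, the usual planar law of sines
sphere = sin_K(x, 1.0)      # K = 1: sin(x), unit sphere
hyper = sin_K(x, -1.0)      # K = -1: sinh(x), hyperbolic plane

print(flat, sphere, hyper)
print(math.sin(x), math.sinh(x))   # for comparison
```

For general K > 0 the series sums to sin(sqrt(K) x)/sqrt(K), and for K < 0 to sinh(sqrt(-K) x)/sqrt(-K).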