It does not seem obviously hopeless to monitor Thinkish or even Neuralese. If a model uses Thinkish in its chain of thought, then that dialect of Thinkish means something to it. Perhaps the model can be prompted to translate Thinkish appearing in another instance’s chain of thought. Or perhaps a model could be fine-tuned to understand how a given model’s Thinkish chain of thought affects its output, and (though I’m not sure how to train for this last step) explain how it does so in a way that humans can follow. These techniques could also be tried on apparently natural-language chains of thought whose hidden meaning the model uses but which isn’t immediately apparent to humans. And since Neuralese differs from Thinkish only in that it doesn’t re-use natural language’s token space, perhaps similar techniques could be used to translate a model’s Neuralese.
AlexMennen
Does this fundraiser have a deadline?
I see that the info hovertext over the amount raised on the every.org page says that some of it was raised offline, and only lists matching funds for the remainder that wasn’t raised offline. Does this mean that funds raised offline don’t get matched, that their matches from SFF were included in the “raised offline” figure, or that their matches from SFF aren’t counted in the total amount raised displayed on that page?
This post claims that Anthropic is embarrassingly far behind Twitter AI psychologists at skills that are possibly critical to Anthropic’s mission. This suggests to me that Anthropic should be trying to recruit from the Twitter AI psychologist circle.
In particular, your argument that putting material into the world about LLMs potentially becoming misaligned may cause problems—I agree that that’s true, but what’s the alternative? Never talking about risks from AI? That seems like it plausibly turns out worse.
I think this depends somewhat on the threat model. How scared are you of the character instantiated by the model vs the language model itself? If you’re primarily scared that the character would misbehave, and not worried about the language model misbehaving except insofar as it reifies a malign character, then maybe making the training data give the model no reason to expect such a character to be malign would reduce the risk to negligible, and that sure would be easier if no one had ever thought of the idea that powerful AI could be dangerous. But suppose you’re also worried about the language model itself misbehaving, independently of whether it predicts that its assigned character would misbehave (for instance, the classic example of turning the world into computronium that it can use to better predict the behavior of the character). That doesn’t seem feasible to solve without talking about it, so the decrease in risk of model misbehavior from publicly discussing AI risk is probably worth the accompanying increase in risk of the character misbehaving (which is probably easier to solve anyway).
I don’t understand outer vs inner alignment especially well, but I think this at least roughly tracks that distinction. If a model does a great job of instantiating a character like we told it to, and that character kills us, then the goal we gave it was catastrophic, and we failed at outer alignment. If the model, in the process of being trained on how to instantiate the character, also kills us for reasons other than that it predicts the character would do so, then the process we set up for achieving the given goal also ended up optimizing for something else undesirable, and we failed at inner alignment.
Yes, it is useful for mental machinery that evolved to enable cooperation and conflict resolution to have features like the ones you describe. I don’t agree that this points towards there being an underlying reality.
You can believe that what you do or did was unethical, which doesn’t need to have anything to do with conflict resolution.
It does relate to conflict resolution. Being motivated by ethics is useful for avoiding conflict, so it’s useful for people to be able to evaluate the ethics of their own hypothetical actions. But there are lots of considerations for people to take into account when choosing actions, so this does not mean that someone will never take actions that they concluded had the drawback of being unethical. Being able to reason about the ethics of actions you’ve already taken is additionally useful insofar as it correlates with how others are likely to see them, which can inform whether it is a good idea to hide information about your actions, be ready to try to make amends, defend yourself from retribution, etc.
Beliefs are not perceptions.
If there is some objective moral truth that common moral intuitions are heavily correlated with, there must be some mechanism by which they ended up correlated. Your reply to Karl makes it sound like you deny that anyone ever perceives anything other than perception itself, which isn’t how anyone else uses the word “perceive”.
It doesn’t mean that we are necessarily or fully motivated to be ethical.
Yes, but if no one was at all motivated by ethics, then ethical reasoning would not be useful for people to engage in, and no one would. The fact that ethics is a powerful force in society is central to why people bother studying it. This does not imply that everyone is motivated by ethics, or that anyone is fully motivated by ethics.
Regardless of whether the view Eliezer espouses here really counts as moral realism, as people have been arguing about, it does seem that it would claim that there is a fact of the matter about whether a given AI is a moral patient. So I appreciate your point regarding the implications for the LW Overton window. But for what it’s worth, I don’t think Eliezer succeeds at this, in the sense that I don’t think he makes a good case for it to be useful to talk about ethical questions that we don’t have firm views on as if they were factual questions, because:
1. Not everyone is familiar with the way Eliezer proposes to ground moral language, not everyone who is familiar with it will be aware that it is what any given person means when they use moral language, and some people who are aware that a given person uses moral language the way Eliezer proposes will object to them doing so. Thus using moral language in the way Eliezer proposes, whenever it’s doing any meaningful work, invites getting sidetracked on unproductive semantic discussions. (This is a pretty general-purpose objection to normative moral theories)
2. Eliezer’s characterization of the meaning of moral language relies on some assumptions about it being possible in theory for a human to eventually acquire all the relevant facts about any given moral question and form a coherent stance on it, and the stance that they eventually arrive at being robust to variations in the process by which they arrived at it. I think these assumptions are highly questionable, and shouldn’t be allowed to escape questioning by remaining implicit.
3. It offers no meaningful action guidance beyond “just think about it more”, which is reasonable, but a moral non-realist who aspires to acquire moral intuitions on a given topic would also think of that.
One could object to this line of criticism on the grounds that we should talk about what’s true independently of how it is useful to use words. But any attempt to appeal to objective truth about moral language runs into the fact that words mean what people use them to mean, and you can’t force people to use words the way you’d like them to. It looks like Eliezer kind of tries to address this by observing that extrapolated volition shares some features in common with the way people use moral language, which is true, and seems to conclude that it is the way people use moral language even if they don’t know it, which does not follow.
I agree that LessWrong comments are unlikely to resolve disagreements about moral realism. Much has been written on this topic, and I doubt I have anything new to say about it, which is why I didn’t think it would be useful to try to defend moral anti-realism in the post. I brought it up anyway because the argument in that paragraph crucially relies on moral anti-realism, I suspect many readers reject moral realism without having thought through the implications of that for AI moral patienthood, and I don’t in fact have much uncertainty about moral realism.
Regarding LessWrong consensus on this topic, I looked through a couple LessWrong surveys, and didn’t find any questions about this, so, this doesn’t prove much, but just out of curiosity, I asked Claude 4 Sonnet to predict the results of such a question, and here’s what it said (which seems like a reasonable guess to me):
**Accept moral realism**: ~8%
**Lean towards moral realism**: ~12%
**Not sure**: ~15%
**Lean against moral realism**: ~25%
**Reject moral realism**: ~40%
If our experience of qualia reflects some poorly understood phenomenon in physics, it could be part of a cluster of related phenomena, not all of which manifest in human cognition. We don’t have as precise an understanding of qualia as we do of electrons; we just try to gesture at it, and we mostly figure out what each other is talking about. If some related phenomenon manifests in computers when they run large language models, which has some things in common with what we know as qualia but also some stark differences from any such phenomenon manifesting in human brains, the things we have said about what we mean when we say “qualia” might not be sufficient to determine whether said phenomenon counts as qualia or not.
It undercuts the motivation for believing in moral realism, leaving us with no evidence for objective moral facts, which are a complicated sort of thing, and thus unlikely to exist without evidence.
I tried to address this sort of response in the original post. All of these more precise consciousness-related concepts share the commonality that they were developed using our perception of our own cognition and seeing evidence that related phenomena occur in other humans. So they are all brittle in the same way when trying to extrapolate and apply them to alien minds. I don’t think that qualia is on significantly firmer epistemic ground than consciousness is.
This is correct, but I don’t think what I was trying to express relies on Camp 1 assumptions, even though I expressed it with a Camp 1 framing. If cognition is associated with some nonphysical phenomenon, then our consciousness-related concepts are still tailored to how this phenomenon manifests specifically in humans. There could be some related metaphysical phenomenon going on in large language models, and no objective fact as to whether “consciousness” is an appropriate word to describe it.
Human moral judgements seem easily explained as an evolutionary adaptation for cooperation and conflict resolution, and very poorly explained by perception of objective facts. Even if such facts did exist, that would not by itself give humans any mechanism for perceiving them or any reason to be motivated by them.
Against asking if AIs are conscious
Contemporary AI existential risk concerns originated prior to it being obvious that a dangerous AI would likely involve deep learning, so no one could claim that the arguments that existed in ~2010 involved technical details of deep learning, and you didn’t need to find anything written in the 19th century to establish this.
PSA: Before May 21 is a good time to sign up for cryonics
I might indeed want to create a precedent here and maybe try to fundraise for some substantial fraction of it.
I wonder if it might be more effective to fund legal action against OpenAI than to compensate individual ex-employees for refusing to sign an NDA. Trying to take vested equity away from ex-employees who refuse to sign an NDA sounds likely to not hold up in court, and if we can establish a legal precedent that OpenAI cannot do this, that might make other ex-employees much more comfortable speaking out against OpenAI than the possibility that third parties might fundraise to partially compensate them for lost equity would be (a possibility you might not even be able to make every ex-employee aware of). The fact that this would avoid financially rewarding OpenAI for bad behavior is also a plus. Of course, legal action is expensive, but so is the value of the equity that former OpenAI employees have on the line.
Yeah, sorry that was unclear; there’s no need for any form of hypercomputation to get an enumeration of the axioms of U. But you need a halting oracle to distinguish between the axioms and non-axioms. If you don’t care about distinguishing axioms from non-axioms, but you do want an assignment of truth values to the atomic formulas Q(i,j) that’s consistent with the axioms of U, then that is applying a consistent guessing oracle to U.
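To spell out the distinction being drawn (with notation of my own, beyond the U and Q(i,j) of the thread): a halting oracle decides membership exactly, while a consistent guessing oracle only has to produce some consistent assignment, without ever deciding what the axioms are.

```latex
% Sketch, not from the original thread; H and v are my notation.
A halting oracle computes
\[
H(e) = \begin{cases} 1 & \text{if program } e \text{ halts} \\ 0 & \text{otherwise,} \end{cases}
\]
which suffices to decide whether any given sentence is an axiom of $U$.
A consistent guessing oracle for $U$ instead outputs any map $v$ from atomic
formulas to $\{0,1\}$ such that
\[
U \;\cup\; \{\, Q(i,j) : v(Q(i,j)) = 1 \,\} \;\cup\; \{\, \lnot Q(i,j) : v(Q(i,j)) = 0 \,\}
\]
is consistent. Since $v$ need not identify the axioms of $U$, such an oracle
can be strictly weaker than a halting oracle.
```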
I see that when I commented yesterday, I was confused about how you had defined U. You’re right that you don’t need a consistent guessing oracle to get from U to a completion of U, since the axioms are all atomic propositions, and you can just set the remaining atomic propositions however you want. However, this introduces the problem that getting the axioms of U requires a halting oracle, not just a consistent guessing oracle, since to tell whether something is an axiom, you need to know whether there actually is a proof of a given thing in T.
I think this visual effect could plausibly be explained by polarization, without there being any real correlation between extremeness and concern about AI x-risk. Most politicians aren’t moderate, and most politicians aren’t concerned about AI x-risk. So the distribution of ideology scores of politicians at the bottom (not concerned about AI x-risk) is bimodal, the distribution of ideology scores of politicians near the top (very concerned about AI x-risk) is bimodal, and the whole distribution is thicker at the bottom than near the top. The density of non-x-risk-concerned moderates could be high enough to come close to saturating our ability to perceive the density of dots in this graphic, so that the actually much denser regions leftward and rightward aren’t readily apparent to be much denser. But higher up, the dots aren’t dense enough to saturate our ability to perceive their density, so it is visually obvious that there are more at the extremes than in the middle.
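The argument can be checked with a toy simulation (entirely hypothetical numbers, not the post’s actual data): draw ideology scores from the same bimodal mixture for both rows, with many more points in the bottom row, and compare bin counts. The extremes beat the middle in both rows, even though a scatter plot of the dense bottom row would look saturated everywhere.

```python
import random

random.seed(0)

def ideology_sample(n):
    """Bimodal mixture: each politician is near -1 or +1, with some spread."""
    return [random.gauss(random.choice([-1.0, 1.0]), 0.3) for _ in range(n)]

# Hypothetical counts: many non-concerned politicians (bottom of the graphic),
# few x-risk-concerned ones (top). Same ideology distribution in both rows.
bottom = ideology_sample(5000)
top = ideology_sample(200)

def count_in(xs, lo, hi):
    """Number of scores falling in the interval [lo, hi]."""
    return sum(lo <= x <= hi for x in xs)

# Moderate bin vs. one of the extreme bins, for each row.
middle_bottom = count_in(bottom, -0.25, 0.25)
extreme_bottom = count_in(bottom, 0.75, 1.25)
middle_top = count_in(top, -0.25, 0.25)
extreme_top = count_in(top, 0.75, 1.25)

print("bottom row: middle =", middle_bottom, " extreme =", extreme_bottom)
print("top row:    middle =", middle_top, " extreme =", extreme_top)
# In the bottom row, even the relatively sparse middle bin has enough dots to
# saturate a scatter plot, hiding the fact that the extremes are far denser;
# in the top row, the counts are small enough that the gap is visible.
```

The point is just that “middle looks as dense as extremes” at the bottom is compatible with the middle actually being much sparser there too.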