I maintain a reading list on Goodreads. I have a personal website with some blog posts, mostly technical stuff about math research. I am also on github, twitter, and mastodon.
Eigil Rischel (Eigil Fjeldgren Rischel)
As an algebraic abstractologist, let me just say this is an absolutely great post. My comments:
Category theorists don’t distinguish between a category with two objects and an edge between them, and a category with two objects and two identified edges between them (the latter object doesn’t really even make sense in the usual account). In general, the extra equivalence relation that you have to carry around makes certain things more complicated in this version.
I do tend to agree with you that thinking of categories as objects, edges and an equivalence relation on paths is a more intuitive perspective, but let me defend the traditional presentation. By far the most essential/prototypical examples are the categories of sets and functions, or types and functions. Here, it’s more natural to speak of functions from x to y, than to speak of “composable sequences of functions beginning at x and ending at y, up to the equivalence relation which identifies two sequences if they have the same composite”.
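A tiny illustration of this point (my own sketch, not from the post): in the category of types and functions, two composable sequences of functions are "equivalent" exactly when they have the same composite.

```python
from functools import reduce

def compose(fs):
    """Compose a sequence of functions left-to-right: compose([f, g])(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)

inc = lambda x: x + 1
dbl = lambda x: 2 * x

# [inc, dbl] and [lambda x: 2*(x+1)] are different "paths",
# but they have the same composite, so the category identifies them.
assert compose([inc, dbl])(3) == compose([lambda x: 2 * (x + 1)])(3) == 8
```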
Again, I absolutely love this post. I am frankly a bit shocked that nobody seems to have written an introduction using this language—I think everyone is too enamored with sets as an example.
CFAR must have a lot of information about the efficacy of various rationality techniques and training methods (compared to any other org, at least). Is this information, or recommendations based on it, available somewhere? Say, as a list of techniques currently taught at CFAR—which are presumably the best ones in this sense. Or does one have to attend a workshop to find out?
The Terra Ignota sci-fi series by Ada Palmer depicts a future world which is also driven by “slack transportation”. The mechanism, rather than portals, is a super-cheap global network of autonomous flying cars (I think they’re supposed to run on nuclear engines? The technical details are not really developed). It’s a pretty interesting series, although it doesn’t explore the practical implications so much as the political/sociological ones (and this is hardly the only thing driving the differences between the present world and the depicted future).
My mom is a translator (mostly for novels), and as far as I know she exclusively translates into Danish (her native language). I think this is standard in the industry—it’s extremely hard to translate text in a way that feels natural in the target language, much harder than it is to tease out subtleties of meaning from the source language.
This post introduces a potentially very useful model, both for selecting problems to work on and for prioritizing personal development. This model could be called “The Pareto Frontier of Capability”. Simply put:
By an efficient markets-type argument, you shouldn’t expect to have any particularly good ways of achieving money/status/whatever - if there was an unusually good way of doing that, somebody else would already be exploiting it.
The exception to this is that if only a small number of people can exploit an opportunity, you may have a shot. So you should try to acquire skills that only a small number of people have.
Since there are a lot of people in the world, it’s incredibly hard to become among the best in the world at any particular skill.
This means you should position yourself on the Pareto Frontier—you should seek out a combination of skills where nobody else is better than you at everything. Then you will have the advantage in problems where all these skills matter.
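The frontier idea can be made concrete with a toy sketch (my own illustration, with made-up names and skill scores): a person is on the Pareto frontier if nobody else is at least as good at every skill and strictly better at some skill.

```python
def dominates(a, b):
    """True if skill vector a is >= b in every skill and > b in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(people):
    """Names of everyone not dominated by someone else."""
    return {name for name, skills in people.items()
            if not any(dominates(other, skills)
                       for o, other in people.items() if o != name)}

people = {
    "Alice": (9, 2),   # (programming, writing) -- invented scores
    "Bob":   (3, 9),
    "Carol": (7, 7),
    "Dave":  (5, 5),   # dominated by Carol, so off the frontier
}
```

Note that Carol makes the frontier without being the best at either individual skill, which is the post's point: the combination is what's rare.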
It might be important to contrast this with the economic term comparative advantage, which is often used informally in a similar context, but whose meaning is different. If we are both excellent programmers, but you are also a great writer while I suck at writing, I have a comparative advantage in programming. If we’re working on a project together where both writing and programming are relevant, it’s best if I do as much programming as possible while you handle as much of the writing as possible—even though you’re as good as me at programming, if someone has to take time off from programming to write, it should be you. This collaboration can make you more effective even though you’re better at everything than me (in the economics literature this is usually conceptualized in terms of nations trading with each other).
This is distinct from the Pareto optimality idea explored in this post. Pareto optimality matters when it’s important that the same person does both the writing and the programming. Maybe we’re writing a book to teach programming. Then even if I am actually better than you at programming, and Bob is much better than you at writing (but sucks at programming), you would probably be the best person for the job.
I think the Pareto frontier model is extremely useful, and I have used it to inform my own research strategy.
While rereading this post recently, I was reminded of a passage from Michael Nielsen’s Principles of Effective Research:
Say some new field opens up that combines field X and field Y. Researchers from each of these fields flock to the new field. My experience is that virtually none of the researchers in either field will systematically learn the other field in any sort of depth. The few who do put in this effort often achieve spectacular results.
“If you’ve never missed a flight, you spend too much time hanging around in airports” ~ “If you’ve never been publicly proven wrong, you don’t state your beliefs enough” ?
This argument doesn’t work because limits don’t commute with integrals (including expected values). (Since practical situations are finite, this just tells you that the limiting situation is not a good model).
To the extent that the experiment with infinite bets makes sense, it definitely has EV 0. We can equip the space of infinite coinflip sequences with a probability measure corresponding to independent coinflips, then describe the payout using naive EV maximization as a function f - it is infinite on the single point where every flip is a win, and 0 everywhere else. Since that point has probability zero, the expected value/integral of this function is zero.
EDIT: To make the “limit” thing clear, we can describe the payout after n bets using naive EV maximization as a function f_n, which is 2^n if the first n flips are wins, and 0 otherwise. Then E[f_n] = (2p)^n (with p the per-bet win probability), and f_n → f (pointwise), but E[f] = 0 — the limit of the expected values is not the expected value of the limit.
The corresponding functions g_n describing the payout under a Kelly strategy have E[g_n] < E[f_n] for all n, but the pointwise limit g grows without bound almost surely, so E[g] = ∞ > 0 = E[f].
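A quick numerical sketch of the naive-EV side (my own illustration, assuming a double-or-nothing bet with win probability p = 0.6 and the full bankroll staked each round):

```python
p = 0.6  # assumed per-bet win probability (better than even)

def ev_after_n_bets(n):
    # Payout is 2**n with probability p**n, else 0, so E[f_n] = (2p)**n.
    return (2 ** n) * (p ** n)

# E[f_n] = 1.2**n grows without bound as n increases,
# but the pointwise limit f is nonzero only on the single all-wins
# sequence, which has probability lim p**n = 0, so E[f] = 0:
# the limit of the EVs is not the EV of the limit.
```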
I think, rather than “category theory is about paths in graphs”, it would be more reasonable to say that category theory is about paths in graphs up to equivalence, and in particular about properties of paths which depend on their relations to other paths (more than on their relationship to the vertices)*. If your problem is most usefully conceptualized as a question about paths (finding the shortest path between two vertices, or counting paths, or something in that genre), you should definitely look to the graph theory literature instead.
* I realize this is totally incomprehensible, and doesn’t make the case that there are any interesting problems like this. I’m not trying to argue that category theory is useful, just clarifying that your intuition that it’s not useful for problems that look like these examples is right.
The example of associativity seems a little strange, I’m not sure what’s going on there. What are the three functions that are being composed?
I’m curious about the remaining 3% of people in the 97% program, who apparently both managed to smuggle some booze into rehab, and then admitted this to the staff while they were checking out. Lizardman’s constant?
I’ve noticed a sort of tradeoff in how I use planning/todo systems (having experimented with several such systems recently). This mainly applies to planning things with no immediate deadline, where it’s more about how to split a large amount of available time between a large number of tasks, rather than about remembering which things to do when. For instance, think of a personal reading list—there is no hurry to read any particular things on it, but you do want to be spending your reading time effectively.
On one extreme, I make a commitment to myself to do all the things on the list eventually. At first, this has the desired effect of making me get things done. But eventually, things that I don’t want to do start to accumulate. I procrastinate on these things by working on more attractive items on the list. This makes the list much less useful from a planning perspective, since it’s cluttered with a bunch of old things I no longer want to spend time on (which make me feel bad about not doing them whenever I’m looking at the list).
On the other extreme, I make no commitment like that, and remove things from the list whenever I feel like it. This avoids the problem of accumulating things I don’t want to do, but makes the list completely useless as a tool for getting me to do boring tasks.
I have a hard time balancing these issues. I’m currently trying an approach to my academic reading list where I keep a mostly unsorted list, and whenever I look at it to find something to read, I have to work on the top item, or remove it from the list. This is hardly ideal, but it mitigates the “stale items” problem, and still manages to provide some motivation, since it feels bad to take items off the list.
Most of the heavy lifting in these proofs seems to be done by the Lean tactics. The comment “arguments to nlinarith are fully invented by our model” above a proof which is literally the single line

nlinarith [sq_nonneg (b - a), sq_nonneg (c - b), sq_nonneg (c - a)]

makes me feel like they’re trying too hard to convince me this is impressive. The other proof involving multiple steps is more impressive, but this still feels like a testament to the power of “traditional” search methods for proving algebraic inequalities, rather than an impressive AI milestone. People on twitter have claimed that some of the other problems are also one-liners using existing proof assistant strategies—I find this claim totally plausible.

I would be much more impressed with an AI-generated proof of a combinatorics or number theory IMO problem (eg problem 1 or 5 from 2021). Someone with more experience in proof assistants probably has a better intuition for which problems are hard to solve with “dumb” searching like nlinarith, but this is my guess.
This seems somewhat connected to this previous argument. Basically, coherent agents can be modeled as utility-optimizers, yes, but what this really proves is that almost any behavior fits into the model “utility-optimizer”, not that coherent agents must necessarily look like our intuitive picture of a utility-optimizer.
Paraphrasing Rohin’s arguments somewhat, the arguments for universal convergence say something like “for “most” “natural” utility functions, optimizing that function will mean acquiring power, killing off adversaries, acquiring resources, etc”. We know that all coherent behavior comes from a utility function, but it doesn’t follow that most coherent behavior exhibits this sort of power-seeking.
My impression from skimming a few AI ETFs is that they are more or less just generic technology ETFs with different branding and a few random stocks thrown in. So they’re not catastrophically worse than the baseline “Google, Microsoft and Facebook” strategy you outlined, but I don’t think they’re better in any real way either.
Lsusr ran a survey here a little while ago, asking people for things that “almost nobody agrees with you on”. There’s a summary here
Information about people behaving erratically/violently is better at grabbing your brain’s “important” sensor? (Noting that I had exactly the same instinctual reaction). This seems to be roughly what you’d expect from naive evopsych (which doesn’t mean it’s a good explanation, of course)
Just to sketch out the contradiction between unbounded utilities and gambles involving infinitely many outcomes a bit more explicitly.
If your utility function is unbounded, we can consider the following wager: you win 2 utils with probability 1/2, 4 utils with probability 1/4, and so on. The expected utility of this wager is infinite. (If there are no outcomes with utility exactly 2, 4, etc., we can instead award outcomes worth even more at each step—such outcomes exist precisely because utility is unbounded.)
Now consider these wagers on a (fair) coinflip:
A: Play the above game if heads, pay out 0 utils if tails
B: Play the above game if heads, pay out 100000 utils if tails
(0 and 100000 can be any two non-equal numbers).
Both of these wagers have infinite expected utility, so we must be indifferent between them. But since they agree on heads, and B is strictly preferred to A on tails, we must prefer B (since tails occurs with positive probability).
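The divergence of the base wager can be checked directly (my own sketch of the comment's construction): winning 2^k utils with probability 2^-k contributes exactly 1 util per outcome, so the partial sums grow without bound.

```python
def partial_ev(n_terms):
    # Sum of (payout * probability) over the first n_terms outcomes:
    # each term is (2**k) * (2**-k) = 1, so partial_ev(n) == n.
    return sum((2 ** k) * (0.5 ** k) for k in range(1, n_terms + 1))

# partial_ev(n) == n for every n, so the expected utility is infinite.
```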
There’s some recent work in the statistics literature exploring similar ideas. I don’t know if you’re aware of this, or if it’s really relevant to what you’re doing (I haven’t thought a lot about the comparisons yet), but here are some papers.
I mean, “is a large part of the state space” is basically what “high entropy” means!
For case 3, I think the right way to rule out this counterexample is the probabilistic criterion discussed by John—the vast majority of initial states for your computer don’t include a zero-day exploit and a script to automatically deploy it. The only way to make this likely is to include you programming your computer in the picture, and of course you do have a world model (without which you could not have programmed your computer)
At least one explanation for the fact that the Fall of Rome is the only period of decline on the graph could be this: data becomes more scarce the further back in history you go. This has the effect of smoothing the historical graph as you extrapolate between the few datapoints you have. Thus the overall positive trend can more easily mask any short-term period of decay.
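A toy illustration of the smoothing effect (my own, with invented numbers): a temporary dip in a rising series disappears entirely when only sparse endpoint datapoints survive and we interpolate between them.

```python
# Full (hypothetical) historical series, with a decline at indices 4-5.
full_series = [1, 2, 3, 4, 3, 2, 5, 6, 7, 8]

def interpolated(i, n=9, start=1, end=8):
    # Linear interpolation between the only two surviving "datapoints"
    # (index 0 and index 9).
    return start + (end - start) * i / n

# The interpolated curve rises monotonically, masking the decline
# that is visible in the full series.
```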