Well, it worked reasonably well with ER graphs (which naturally embed into quite high dimensions), so it should definitely work at least that well. (I expect it would work better.)

# johnswentworth

Oh I see. Yeah, if either X or Y is unidimensional, then any linear model is really boring. They need to be high-dimensional to do anything interesting.

All ways of modifying Y are only equivalent in a *dense* linear system. Sparsity (in a high-dimensional system) changes that. (That’s a fairly central concept behind this whole project: sparsity is one of the main ingredients necessary for the natural abstraction hypothesis.)

The main thing I expect from optimization is that the system will be adjusted so that certain specific abstractions work—i.e. if the optimizer cares about f(X), then the system will be adjusted so that information about f(X) is available far away (i.e. wherever it needs to be used).

That’s the view in which we think about abstractions in the *optimized* system. If we instead take our system to be the whole optimization process, then we expect many variables in many places to be adjusted to optimize the objective, which means the objective itself is likely a natural abstraction, since information about it is available all over the place. I don’t have all the math worked out for that yet, though.

I’ve been thinking a fair bit about this question, though with a slightly different framing.

Let’s start with a neighborhood X. Then we can pick a bunch of different far-away neighborhoods, and each of them will contain some info about summary(X). But then we can flip it around: if we do a PCA on all of those far-away neighborhoods together, then we should expect to see components corresponding to summary(X), since all the variables which contain that information will covary.

Switching back to your framing: if X itself is large enough to contain multiple far-apart chunks of variables, then a PCA on X should yield natural abstractions (roughly speaking). This is quite compatible with the idea of abstractions coming from noise wiping out info: even within X, noise prevents most info from propagating very far. The info which propagates far within X is also the info which is likely to propagate far beyond X itself; most other info is wiped out before it reaches the boundaries of X, and has low redundancy within X.
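To make the PCA idea concrete, here is a minimal toy sketch, not the experiment discussed in this thread. It assumes a linear-Gaussian system in which only a low-dimensional linear summary of X reaches the far-away variables, each with independent unit-variance noise added. All names (`simulate`, `pca`, `r_squared`) and all sizes are hypothetical choices made for illustration, and it relies on NumPy.

```python
import numpy as np


def simulate(n_samples=5000, x_dim=20, summary_dim=2,
             n_far=4, y_dim=10, noise_std=1.0, seed=0):
    """Toy system: far-away variables see X only through a low-dim summary."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, x_dim))
    # Assumption: only summary_dim linear combinations of X propagate far away.
    A = rng.normal(size=(x_dim, summary_dim)) / np.sqrt(x_dim)
    summary = X @ A
    far = []
    for _ in range(n_far):
        # Each far-away neighborhood mixes the summary differently, plus noise.
        B = rng.normal(size=(summary_dim, y_dim))
        far.append(summary @ B + noise_std * rng.normal(size=(n_samples, y_dim)))
    return summary, np.hstack(far)


def pca(Y):
    """Return eigenvalues (descending), eigenvectors, and the centered data."""
    Yc = Y - Y.mean(axis=0)
    cov = Yc.T @ Yc / (Yc.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order], Yc


def r_squared(scores, target):
    """Fraction of variance in target explained linearly by scores."""
    t = target - target.mean(axis=0)
    coef, *_ = np.linalg.lstsq(scores, t, rcond=None)
    resid = t - scores @ coef
    return 1.0 - (resid ** 2).sum() / (t ** 2).sum()


summary, Y = simulate()
eigvals, eigvecs, Yc = pca(Y)
# Count components clearly above the noise variance (1.0 in this toy setup).
n_large = int((eigvals > 2.0).sum())
top_scores = Yc @ eigvecs[:, :n_large]
fit = r_squared(top_scores, summary)
print(n_large, round(float(fit), 3))
```

In this toy setup the number of large principal components matches the dimension of the summary, and the top components linearly predict the summary well. That is only a consistency check on the linear intuition; it says nothing about whether the same trick works in nonlinear or less cleanly separated systems.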

I’d be curious to know whether something like this actually works in practice. It certainly shouldn’t work all the time, since it’s tackling the #P-Hard part of the problem pretty directly, but if it works well in practice that would solve a lot of problems.

I ran this experiment about 2-3 weeks ago (after the chaos post but before the project intro). I pretty strongly expected this to work much earlier—e.g. back in December I gave a talk which had all the ideas in it.

# Computing Natural Abstractions: Linear Approximation

Eh, yes and no. I’m definitely on board with selection pressure as a major piece in cancer. But for purposes of finding the root causes of aging, the key question is “why does cancer become more likely with aging?”, and my current understanding is that selection pressures don’t play a significant role there.

Once a cancer is already underway, selection pressure for drug resistance or differentiation or whatever is a big deal. But at a young age, the mutation rate is low and problematic cells are cleared before they have time to propagate anyway. Those defences have to break down somehow before cancer-selection-pressures can play any role at all, and it seems like the rate-limiting step (i.e. the step which accounts for the age-related increase in cancer) is the breakdown of the defences, not the time required for selection itself. Once cancer is underway, the selection process operates on a faster timescale.

Oh yeah, cruising Wikipedia lists is great. Definitely stumbled on multiple unknown unknowns that way. (Hauser’s Law is one interesting example, which I first encountered on the list of eponymous laws.) Related: Exercises in Comprehensive Information Gathering.

Haven’t finished reading this yet, but in reaction to the opening section… there’s this claim that “One meal at a crowded restaurant is enough to give even a vaccinated person *hundreds* of microCovids”. Which, regardless of whether it’s true, sounds like it’s probably the wrong way to think about things.

A core part of the microcovid model is that microcovids roughly add. You do a ten-microcovid activity, then a twenty microcovid activity, your total risk is roughly thirty microcovids. Put on a mask, and it cuts microcovids in half.

But with a vaccine, I’d expect the risk to be highly correlated—i.e. how-well-the-vaccine-worked varies from person to person, but it’s fixed for a given person. If that’s true, then eating in a crowded restaurant a hundred times is a lot less than 100x the risk of eating in a crowded restaurant once. My first million not-vaccine-adjusted microcovids are effectively an experiment to find out how well the vaccine worked for me specifically. If it worked well (i.e. I don’t get covid), then the next million exposures are actually much-less-risky.
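A rough worked comparison of the two pictures, as a sketch under stated assumptions rather than a claim about real vaccines. The "independent" model scales every exposure's risk by the same factor, which is what simple microcovid addition implicitly assumes. The "all-or-nothing" model is the extreme version of the correlated case described above: a fixed fraction of people are fully protected and the rest are not protected at all. The per-exposure risk `P`, the efficacy `E`, and the function names are all hypothetical.

```python
def risk_independent(p, efficacy, n):
    """Every exposure's risk is cut by the same factor, independently."""
    q = p * (1.0 - efficacy)
    return 1.0 - (1.0 - q) ** n


def risk_all_or_nothing(p, efficacy, n):
    """A fraction `efficacy` of people is fully protected; the rest get no benefit."""
    return (1.0 - efficacy) * (1.0 - (1.0 - p) ** n)


def posterior_protected(p, efficacy, n):
    """P(vaccine worked for me | still uninfected after n exposures), all-or-nothing model."""
    unprotected_and_uninfected = (1.0 - efficacy) * (1.0 - p) ** n
    return efficacy / (efficacy + unprotected_and_uninfected)


P = 0.001   # hypothetical unvaccinated risk per exposure (1000 microcovids)
E = 0.9     # hypothetical vaccine efficacy

for n in (1, 100, 10_000):
    print(n,
          risk_independent(P, E, n),
          risk_all_or_nothing(P, E, n),
          posterior_protected(P, E, n))
```

Both models agree on the risk of a single exposure, so they cannot be told apart from one meal. They diverge over many exposures: the independent model's cumulative risk keeps climbing toward 1, while the all-or-nothing model's risk can never exceed `1 - E`, and surviving many exposures becomes strong evidence that the vaccine worked for you.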

Yup, exactly.

Oh, I mean that part of the point of the post is to talk about what relative advantages/disadvantages rationality *should* have, in principle, if we’re doing it right—as opposed to whatever specific skills or strategies today’s rationalist community happens to have stumbled on. It’s about the relative advantages of the rationality practices which we hopefully converge to in the long run, not necessarily the rationality practices we have today.

Do you have specific skills of rationality in mind for that post?

No, which is part of the point. I do intend to start from the sequences-esque notion of the term (i.e. “rationality is systematized winning”), but I don’t necessarily intend to point to the kinds of things LW-style “rationality” currently focuses on. Indeed, there are some things LW-style “rationality” currently focuses on which I do not think are particularly useful for systematized winning, or are at least overemphasized.

I actually just started drafting a post in this vein. I’m framing the question as “what are the relative advantages and disadvantages of explicit rationality?”. It’s a natural follow-up to problems we don’t understand: absent practice in rationality and being agenty (whether we call it that or not), we’ll most likely end up as cultural-adaptation-executors. That works well mainly for problems where cultural/economic selection pressures have already produced good strategies. Explicit rationality is potentially useful mainly when that’s not the case—either because some change has messed up evolved strategies, or because selection pressures are misaligned with what we want, or because the cultural/economic search mechanisms were insufficient to find good strategies in the first place.

If so, I’m really interested in the techniques you have in mind for starting from a complex mess/intuitions and getting to a formal problem/setting.

This deserves its own separate response.

At a high level, we can split this into two parts:

- developing intuitions
- translating intuitions into math

We’ve talked about the translation step a fair bit before (the conversation which led to this post). A core point of that post is that the translation from intuition to math should be faithful, and not inject any “extra assumptions” which weren’t part of the intuition. So, for instance, if I have an intuition that some function is monotonically increasing between 0 and 1, then my math should say “assume f(x) is monotonically increasing between 0 and 1”, not “let f(x) = x^2”; the latter would be making assumptions not justified by my intuition. (Some people also make the opposite mistake—failing to include assumptions which their intuition actually does believe. Usually this is because the intuition only feels like it’s mostly true or usually true, rather than reliable and certain; the main way to address this is to explicitly call the assumption an approximation.)

On the flip side, that implies that the intuition has to do quite a bit of work, and the linked post talked a lot less about that. How do we build these intuitions in the first place? The main way is to play around with the system/problem. Try examples, and see what they do. Try proofs, and see where they fail. Play with variations on the system/problem. Look for bottlenecks and barriers, look for approximations, look for parts of the problem-space/state-space with different behavior. Look for analogous systems in the real world, and carry over intuitions from them. Come up with hypotheses/conjectures, and test them.

I disagree with the claim that “identifying good subproblems of well-posed problems is a different skill from identifying good well-posed subproblems of a weird and not formalized problem”, at least insofar as we’re focused on problems for which current paradigms fail.

P vs NP is a good example here. How do you identify a good subproblem for P vs NP? I mean, lots of people have come up with subproblems in mathematically-straightforward ways, like the strong exponential time hypothesis or P/poly vs NP. But as far as we can tell so far, these are not very *good* subproblems—they are “simplifications” in name only, and whatever elements make P vs NP hard in the first place seem to be fully maintained in them. They don’t simplify the parts of the original problem which are actually hard. They’re essentially variants of the original problem, a whole cluster of problems which are probably-effectively-identical in terms of the core principles. They’re not really simplifications.

Simplifying an actually-hard part of P vs NP is very much a fuzzy conceptual problem. We have to figure out how-to-carve-up-the-problem in the right way, how to frame it so that a substantive piece can be reduced.

I suspect that your intuition that “there are way more useful and generalizable techniques for the first case than the second case” is looking at things like simplifying-P-vs-NP-to-strong-exponential-time-hypothesis, and confusing these for useful progress on the hard part of a hard problem. Something like “simplify the problem as much as possible without making it trivial” is a very useful first step, but it’s not the sort of thing which is going to address the hardest part of a problem when the current paradigm fails. (After all, the current paradigm is usually what underlies our notion of “simplicity”.)

One aspect I feel is not emphasized enough here is the skill of finding/formulating good problems.

That’s a good one to highlight. In general, there’s a lot of different skills which I didn’t highlight in the post (in many cases because I haven’t even explicitly noticed them) which are still quite high-value.

The outside-view approach of having a variety of problems you don’t really understand should still naturally encourage building those sorts of skills. In the case of problem formulation, working on a wide variety of problems you don’t understand will naturally involve identifying good subproblems. It’s especially useful to look for subproblems which are a bottleneck for *multiple* problems at once—i.e. generalizable bottleneck problems. That’s exactly the sort of thinking which first led me to (what we now call) the embedded agency cluster of problems, as well as the abstraction work.

Actually building bridges and Actually preventing infections requires not only improvements in applied science, but also human coordination. In the former we’ve improved, in the latter we’ve stagnated.

+1 to this. The difficulties we have today with bridges and infections do not seem to me to have anything to do with the physical systems involved; they are all about incentives and coordination problems.

(Yes, I know lots of doctors *talk about* how we’re overly trigger-happy with antibiotics, but I’m pretty sure that hospital infection rates have a lot more to do with things like doctors not consistently washing their hands than with antibiotic-resistant bacteria. In order for antibiotic resistance to be relevant, you have to get infected in the first place, and if hospitals were consistent about sterilization then they wouldn’t have infection rates any higher than any other buildings.)

I do agree that some kinds of technical paradigm shifts could plaster over social problems (at least partially), but let’s not mistake that for a lack of understanding of the physical systems. The thing-we-don’t-understand here is mechanism design and coordination.

Note to self: the summary dimension seems basically constant as the neighborhood size increases. There’s probably a generalization of Koopman-Pitman-Darmois which applies.