The abilities we obtained from architectural changes to our brains also came from a slow, accumulated process, taking even longer than cultural evolution does.

# AlexMennen (Alex Mennen)

There’s more than one thing that you could mean by raw innovative capacity separate from cultural processing ability. First, you could mean someone’s ability to innovate on their own without any direct help from others on the task at hand, but where they’re allowed to use skills that they previously acquired from their culture. Second, you could mean someone’s counterfactual ability to innovate on their own if they hadn’t learned from culture. You seem to be conflating these somewhat, though mostly focusing on the second?

The second is underspecified, as you’d need to decide what counterfactual upbringing you’re assuming. If you compare the cognitive performance of a human raised by bears to the cognitive performance of a bear in the same circumstances, this is unfair to the human, since the bear is raised in circumstances that it is adapted for and the human is not, just like comparing the cognitive performance of a bear raised by humans to that of a human in the same circumstances would be unfair to the bear. Though a human raised by non-humans would still make a more interesting comparison to non-human animals than Genie would, since Genie’s environment is even less conducive to human development (I bet most animals wouldn’t cognitively develop very well if they were kept immobilized in a locked room until maturity either).

I think this makes the second notion less interesting than the first, as there’s a somewhat arbitrary dependence on the counterfactual environment. I guess the second notion is more relevant when trying to reason specifically about genetics as opposed to other factors that influence traits, but the first seems more relevant in other contexts, since in most contexts it doesn’t matter to what extent someone’s abilities were determined by genetics or by environmental factors.

I didn’t really follow your argument for the relevance of this question to AI development. Why should raw innovation ability be more susceptible to discontinuous jumps than cultural processing ability? Until I understand the supposed relevance to AI better, it’s hard for me to say which of the two notions is more relevant for this purpose.

I’d be very surprised if any existing non-human animals are ahead of humans by the first notion, and there’s a clear reason in this case why performance would correlate strongly with social learning ability: social learning will have helped people in the past develop skills that they keep in the present. Even for the second notion, though it’s a bit hard to say without pinning down the counterfactual more closely, I’d still expect humans to outperform all other animals in some reasonable compromise environment that helps both develop but doesn’t involve them being taught things that the non-humans can’t follow. I think there are still reasons to expect social learning ability and raw innovative capability to be correlated even in this sense, because higher general intelligence will help for both; original discovery and understanding things that are taught to you by others both require some of the same cognitive tools.

Theorem: Fuzzy beliefs (as in https://www.alignmentforum.org/posts/Ajcq9xWi2fmgn8RBJ/the-credit-assignment-problem#X6fFvAHkxCPmQYB6v ) form a continuous DCPO. (At least I’m pretty sure this is true. I’ve only given proof sketches so far)

The relevant definitions:

A fuzzy belief over a set $X$ is a concave function $\varphi:\Delta X\to[0,1]$ such that $\sup_{\mu\in\Delta X}\varphi(\mu)=1$ (where $\Delta X$ is the space of probability distributions on $X$). Fuzzy beliefs are partially ordered by $\varphi\preceq\psi\iff\forall\mu\ \varphi(\mu)\geq\psi(\mu)$. The inequalities reverse because we want to think of “more specific”/”less fuzzy” beliefs as “greater”, and these are the functions with lower values; the most specific/least fuzzy beliefs are ordinary probability distributions, which are represented as the concave hull of the function assigning 1 to that probability distribution and 0 to all others; these should be the maximal fuzzy beliefs. Note that, because of the order-reversal, the supremum of a set of functions refers to their pointwise infimum.

A DCPO (directed-complete partial order) is a partial order in which every directed subset has a supremum.

In a DCPO, define $x\ll y$ to mean that for every directed set $D$ with $\sup D\succeq y$, there exists $d\in D$ such that $d\succeq x$. A DCPO is continuous if for every $y$, $y=\sup\{x\mid x\ll y\}$.
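
As a sanity check on these definitions, here is a standard example (my own addition, not part of the argument): the unit interval with its usual order is a continuous DCPO, and the way-below relation there is easy to compute.

```latex
% Illustrative example (added): the unit interval with its usual order.
% Every directed subset of [0,1] has a supremum, and
\[
  x \ll y \iff \big(x < y \ \text{ or } \ x = 0\big),
\]
% so for every y,
\[
  \sup\{x \mid x \ll y\} \;=\; \sup\big([0,y) \cup \{0\}\big) \;=\; y,
\]
% i.e. every element is the supremum of the elements way below it, which is what continuity requires.
```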

Lemma: Fuzzy beliefs are a DCPO.

Proof sketch: Given a directed set $D$ of fuzzy beliefs, its supremum should be the pointwise infimum $\varphi^{*}=\inf_{\varphi\in D}\varphi$, which is concave. The set $\{\mu\mid\varphi^{*}(\mu)=1\}=\bigcap_{\varphi\in D}\{\mu\mid\varphi(\mu)=1\}$ is convex, and needs to be non-empty for $\varphi^{*}$ to be a fuzzy belief. Each of the sets in that intersection is non-empty, hence so are finite intersections of them since $D$ is directed, and hence so is the whole intersection since $\Delta X$ is compact.

Lemma: $\varphi\ll\psi$ iff $\{\mu\mid\psi(\mu)=1\}$ is contained in the interior of $\{\mu\mid\varphi(\mu)=1\}$ and for every $\mu$ such that $\psi(\mu)<1$, $\varphi(\mu)>\psi(\mu)$.

Proof sketch: If $\sup D\succeq\psi$, then $\bigcap_{\chi\in D}\{\mu\mid\chi(\mu)=1\}\subseteq\{\mu\mid\psi(\mu)=1\}\subseteq\operatorname{int}\{\mu\mid\varphi(\mu)=1\}$, so by compactness of $\Delta X$ and directedness of $D$, there should be $\chi\in D$ such that $\{\mu\mid\chi(\mu)=1\}\subseteq\{\mu\mid\varphi(\mu)=1\}$. Similarly, for each $\mu$ such that $\psi(\mu)<1$, there should be $\chi\in D$ such that $\chi(\mu)\leq\varphi(\mu)$. By compactness, there should be some finite subset of $D$ such that any upper bound for all of them is at least $\varphi$.

Lemma: $\psi=\sup\{\varphi\mid\varphi\ll\psi\}$.

Proof: clear?


# AlexMennen’s Shortform

The part about derivatives might have seemed a little odd. After all, you might think, $\mathbb{Z}$ is a discrete set, so what does it mean to take derivatives of functions on it? One answer to this is to just differentiate symbolically using polynomial differentiation rules. But I think a better answer is to remember that we’re using a different metric than usual, and $\mathbb{Z}$ isn’t discrete at all! Indeed, for any number $x$, $\lim_{n\to\infty}(x+p^{n})=x$, so no points are isolated, and we can define differentiation of functions on $\mathbb{Z}$ in exactly the usual way with limits.

The theorem: where $a$ is relatively prime to an odd prime $p$ and $0\leq k<n$, $ap^{k}$ is a square mod $p^{n}$ iff $a$ is a square mod $p$ and $k$ is even.

The real meat of the theorem is the case $k=0$ (i.e. a square mod $p$ that isn’t a multiple of $p$ is also a square mod $p^{n}$). Deriving the general case from there should be fairly straightforward, so let’s focus on this special case.
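
A concrete instance (my own, for illustration), with $p=7$ and $a=2$: since $3^{2}\equiv 2\pmod 7$, the theorem says $2$ is a square mod every power of $7$, and the square roots refine each other:

```latex
\[
  3^{2} = 9 \equiv 2 \pmod{7}, \qquad
  10^{2} = 100 \equiv 2 \pmod{49}, \qquad
  108^{2} = 11664 \equiv 2 \pmod{343},
\]
% with 10 \equiv 3 (mod 7) and 108 \equiv 10 (mod 49), and so on up the powers of 7.
```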

Why is it true? This question has a surprising answer: Newton’s method for finding roots of functions. Specifically, we want to find a root of $f(x)=x^{2}-a$, except in $\mathbb{Z}/p^{n}\mathbb{Z}$ instead of $\mathbb{R}$.

To adapt Newton’s method to work in this situation, we’ll need the p-adic absolute value on $\mathbb{Z}$: $|p^{k}a|_{p}=p^{-k}$ for $a$ relatively prime to $p$ (and $|0|_{p}=0$). This has lots of properties that you should expect of an “absolute value”: it’s positive ($|x|_{p}\geq 0$, with $|x|_{p}=0$ only when $x=0$), multiplicative ($|xy|_{p}=|x|_{p}|y|_{p}$), symmetric ($|-x|_{p}=|x|_{p}$), and satisfies a triangle inequality ($|x+y|_{p}\leq|x|_{p}+|y|_{p}$; in fact, we get more in this case: $|x+y|_{p}\leq\max(|x|_{p},|y|_{p})$). Because of positivity, symmetry, and the triangle inequality, the p-adic absolute value induces a metric (in fact, ultrametric, because of the strong version of the triangle inequality) $d(x,y)=|x-y|_{p}$. To visualize this distance function, draw $p$ giant circles, and sort integers into circles based on their value mod $p$. Then draw $p$ smaller circles inside each of those giant circles, and sort the integers in the big circle into the smaller circles based on their value mod $p^{2}$. Then draw even smaller circles inside each of those, and sort based on value mod $p^{3}$, and so on. The distance between two numbers corresponds to the size of the smallest circle encompassing both of them. Note that, in this metric, $p^{n}$ converges to $0$ as $n\to\infty$.
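
Here’s a small Python sketch (my own, just to make the metric concrete): the p-adic absolute value factors out the largest power of $p$, and the induced distance treats numbers that agree mod high powers of $p$ as close together.

```python
from fractions import Fraction

def padic_abs(x, p):
    """p-adic absolute value: |p^k * a|_p = p^(-k) for a relatively prime to p, and |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    k = 0
    while x % p == 0:
        x //= p
        k += 1
    return Fraction(1, p**k)

def padic_dist(x, y, p):
    """The induced (ultra)metric d(x, y) = |x - y|_p."""
    return padic_abs(x - y, p)

# Powers of p shrink toward 0 in this metric, so x + p^n converges to x:
print([padic_abs(7**n, 7) for n in range(4)])        # [1, 1/7, 1/49, 1/343]
# Numbers agreeing mod higher powers of p are closer together:
print(padic_dist(3, 10, 7), padic_dist(3, 52, 7))    # 1/7 vs 1/49 (since 52 - 3 = 49)
```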

Now on to Newton’s method: if $a$ is a square mod $p$, let $x_{0}$ be one of its square roots mod $p$. Then $|f(x_{0})|_{p}\leq p^{-1}$; that is, $x_{0}$ is somewhat close to being a root of $f$ with respect to the p-adic absolute value. $f'(x_{0})=2x_{0}$ is not divisible by $p$ (since $p$ is odd and $x_{0}$ isn’t a multiple of $p$), so $|f'(x_{0})|_{p}=1$; that is, $f$ is steep near $x_{0}$. This is good, because starting close to a root and the slope of the function being steep enough are things that help Newton’s method converge; in general, it might bounce around chaotically instead. Specifically, it turns out that, in this case, $|f(x_{0})|_{p}<|f'(x_{0})|_{p}^{2}$ is exactly the right sense of being close enough to a root with steep enough slope for Newton’s method to work.

Now, Newton’s method says that, from $x_{k}$, you should go to $x_{k+1}=x_{k}-\frac{f(x_{k})}{f'(x_{k})}$. $f'(x_{k})=2x_{k}$ is invertible mod $p^{n}$, so we can do this. Now here’s the kicker: $f(x_{k+1})=\left(\frac{f(x_{k})}{2x_{k}}\right)^{2}$, so $|f(x_{k+1})|_{p}\leq|f(x_{k})|_{p}^{2}$. That is, $x_{k+1}$ is closer to being a root of $f$ than $x_{k}$ is. Now we can just iterate this process until we reach $x_{m}$ with $f(x_{m})\equiv 0\pmod{p^{n}}$, and we’ve found our square root of $a$ mod $p^{n}$.
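
A minimal Python sketch of that iteration (my own illustration, assuming $p$ is an odd prime and $a$ is a square mod $p$ that isn’t a multiple of $p$); it just repeats the Newton step with arithmetic done mod $p^{n}$:

```python
def sqrt_mod_prime_power(a, p, n):
    """Find x with x^2 == a (mod p^n) by Newton's method / Hensel lifting,
    for p an odd prime and a a square mod p that isn't a multiple of p."""
    # Start from a square root mod p, found by brute force for simplicity.
    x = next((r for r in range(p) if (r * r - a) % p == 0), None)
    if x is None or a % p == 0:
        return None
    modulus = p**n
    # Newton step x <- x - f(x)/f'(x) for f(x) = x^2 - a; f'(x) = 2x is invertible mod p^n,
    # and each step at least squares |f(x)|_p, so this terminates quickly.
    while (x * x - a) % modulus != 0:
        x = (x - (x * x - a) * pow(2 * x, -1, modulus)) % modulus
    return x

# Square roots of 2 mod powers of 7 (each refines the previous one): e.g. [3, 10, 108]
print([sqrt_mod_prime_power(2, 7, n) for n in range(1, 4)])
```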

Exercise: Do the same thing with cube roots. Then with roots of arbitrary polynomials.

The impressive part is getting reinforcement learning to work at all in such a vast state space

It seems to me that that is AGI progress? The real world has an even vaster state space, after all. Getting things to work in vast state spaces is a necessary pre-condition to AGI.

Ok, I see what you mean about independence of irrelevant alternatives only being a real coherence condition when the probabilities are objective (or otherwise known to be equal because they come from the same source, even if there isn’t an objective way of saying what their common probability is).

But I disagree that this makes VNM only applicable to settings in which all sources of uncertainty have objectively correct probabilities. As I said in my previous comment, you only need there to exist some source of objective probabilities, and you can then use preferences over lotteries involving objective probabilities and preferences over related lotteries involving other sources of uncertainty to determine what probability the agent must assign for those other sources of uncertainty.

Re: the difference between VNM and Bayesian expected utility maximization, I take it from the word “Bayesian” that the way you’re supposed to choose between actions does involve first coming up with probabilities of each outcome resulting from each action, and from “expected utility maximization”, that these probabilities are to be used in exactly the way the VNM theorem says they should be. Since the VNM theorem does not make any assumptions about where the probabilities came from, these still sound essentially the same, except with Bayesian expected utility maximization being framed to emphasize that you have to get the probabilities somehow first.

I think you’re underestimating VNM here.

only two of those four are relevant to coherence. The main problem is that the axioms relevant to coherence (acyclicity and completeness) do not say anything at all about probability

It seems to me that the independence axiom is a coherence condition, unless I misunderstand what you mean by coherence?

correctly point out problems with VNM

I’m curious what problems you have in mind, since I don’t think VNM has problems that don’t apply to similar coherence theorems.

VNM utility stipulates that agents have preferences over “lotteries” with known, objective probabilities of each outcome. The probabilities are assumed to be objectively known from the start. The Bayesian coherence theorems do not assume probabilities from the start; they derive probabilities from the coherence criteria, and those probabilities are specific to the agent.

One can construct lotteries with probabilities that are pretty well understood (e.g. flipping coins that we have accumulated a lot of evidence are fair), and you can restrict attention to lotteries only involving uncertainty coming from such sources. One may then get probabilities for other, less well-understood sources of uncertainty by comparing preferences involving such uncertainty to preferences involving easy-to-quantify uncertainty (e.g. if A is preferred to B, and you’re indifferent between 60%A+40%B and “A if X, B if not-X”, then you assign probability 60% to X). Perhaps not quite as philosophically satisfying as deriving probabilities from scratch, but this doesn’t seem like a fatal flaw in VNM to me.
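
Spelling out the arithmetic in that parenthetical (with a hypothetical utility function $u$, purely for illustration), indifference between the two lotteries pins down the probability:

```latex
% Indifference between 0.6A + 0.4B and "A if X, B if not-X" gives
\[
  P(X)\,u(A) + \big(1 - P(X)\big)\,u(B) \;=\; 0.6\,u(A) + 0.4\,u(B),
\]
% i.e. (P(X) - 0.6)(u(A) - u(B)) = 0, and since A is strictly preferred to B,
% u(A) != u(B), so P(X) = 0.6.
```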

I do not expect agent-like systems in the wild to be pushed toward VNM expected utility maximization. I expect them to be pushed toward Bayesian expected utility maximization.

I understood those as being synonyms. What’s the difference?

I do, however, believe that the single step cooperate-defect game which they use to come up with their factors seems like a very simple model for what will be a very complex system of interactions. For example, AI development will take place over time, and it is likely that the same companies will continue to interact with one another. Iterated games have very different dynamics, and I hope that future work will explore how this would affect their current recommendations, and whether it would yield new approaches to incentivizing cooperation.

It may be difficult for companies to get accurate information about how careful their competitors are being about AI safety. An iterated game in which players never learn what the other players did on previous rounds is the same as a one-shot game. This points to a sixth factor that increases chance of cooperation on safety: high transparency, so that companies may verify their competitors’ cooperation on safety. This is closely related to high trust.

I object to the framing of the bomb scenario on the grounds that low probabilities of high stakes are a source of cognitive bias that trips people up for reasons having nothing to do with FDT. Consider the following decision problem: “There is a button. If you press the button, you will be given $100. Also, pressing the button has a very small (one in a trillion trillion) chance of causing you to burn to death.” Most people would not touch that button. Using the same payoffs and probabilities in a scenario to challenge FDT thus exploits cognitive bias to make FDT look bad. A better scenario would be to replace the bomb with something that will fine you $1000 (and, if you want, also increase the chance of error).
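
For a sense of how lopsided those payoffs are, here’s the expected-value arithmetic (the $10^{10}$ dollar valuation of burning to death is a made-up number, just for illustration):

```latex
\[
  \mathbb{E}[\text{press}] \;\approx\; \$100 \;-\; 10^{-24}\times\$10^{10}
  \;=\; \$100 \;-\; \$10^{-14},
\]
% so the downside term is utterly negligible, yet most people still wouldn't press the button;
% that reluctance is the cognitive bias being exploited, not anything specific to FDT.
```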

But then, it seems to me, that FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.

I think the crucial difference here is how easily you can cause the predictor to be wrong. In the case where the predictor simulates you, if you two-box, then the predictor expects you to two-box. In the case where the predictor uses your nationality to predict your behavior (Scots usually one-box, and you’re Scottish), if you two-box, then the predictor will still expect you to one-box because you’re Scottish.

But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S...

I didn’t think that was supposed to matter at all? I haven’t actually read the FDT paper, and have mostly just been operating under the assumption that FDT is basically the same as UDT, but UDT didn’t build in any dependency on external agents, and I hadn’t heard about any such dependency being introduced in FDT; it would surprise me if it did.

I don’t know if I’m a simulation or a real person.

A possible response to this argument is that the predictor may be able to accurately predict the agent without explicitly simulating them. A possible counter-response to this is to posit that any sufficiently accurate model of a conscious agent is necessarily conscious itself, whether the model takes the form of an explicit simulation or not.

I think the counterfactuals used by the agent are the correct counterfactuals for someone else to use while reasoning about the agent from the outside, but not the correct counterfactuals for the agent to use while deciding what to do. After all, knowing the agent’s source code, if you see it start to cross the bridge, it is correct to infer that its reasoning is inconsistent, and you should expect to see the troll blow up the bridge. But while deciding what to do, the agent should be able to reason about the purely causal effects of its counterfactual behavior, screening out other logical implications.

Also, counterfactuals which predict that the bridge blows up seem to be saying that the agent can control whether PA is consistent or inconsistent.

Disagree that that’s what’s happening. The link between the consistency of the reasoning system and the behavior of the agent is there because the consistency of the reasoning system controls the agent’s behavior, rather than the other way around. Since the agent is selecting actions based on their consequences, it does make sense to speak of the agent choosing actions to some extent, but speaking of logical implications of the agent’s actions for the consistency of formal systems as “controlling” the consistency of the formal system seems like an inappropriate attribution of agency to me.

I suppose that’s why we’re not minimizing the determinant, but rather the frobenius norm.

Yes, although another reason is that the determinant is only defined if the input and output spaces have the same dimension, which they typically don’t.

First, a vector can be seen as a list of numbers, and a matrix can be seen as an ordered list of vectors. An ordered list of matrices is… a tensor of order 3. Well not exactly. Apparently some people are actually disappointed with the term tensor because a tensor means something very specific in mathematics already and isn’t *just* an ordered list of matrices. But whatever, that’s the term we’re using for this blog post at least.

It’s true that tensors are something more specific than multidimensional arrays of numbers, but Jacobians of functions between tensor spaces (that being what you’re using the multidimensional arrays for here) are, in fact, tensors.

What this means for the Jacobian is that the determinant tells us how much space is being squished or expanded in the *neighborhood* around a point. If the output space is being expanded a lot at some input point, then this means that the neural network is a bit unstable at that region, since minor alterations in the input could cause huge distortions in the output. By contrast, if the determinant is small, then some small change to the input will hardly make a difference to the output.

This isn’t quite true; the determinant being small is consistent with small changes in input making arbitrarily large changes in output, just so long as small changes in input in a different direction make sufficiently small changes in output.
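
A quick numerical illustration of that caveat (my own example): a Jacobian can have a tiny determinant while still stretching some input direction enormously.

```python
import numpy as np

# A 2x2 "Jacobian" with a small determinant but a hugely expanding direction:
J = np.diag([1e6, 1e-9])
print(np.linalg.det(J))      # 0.001  -- small determinant
print(np.linalg.norm(J, 2))  # 1e6    -- yet a unit input along the first axis is stretched a millionfold
```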

The frobenius norm is nothing complicated, and is really just a way of describing that we square all of the elements in the matrix, take the sum, and then take the square root of this sum.

An alternative characterization of the frobenius norm better highlights its connection to the motivation of regularizing the Jacobian frobenius norm in terms of limiting the extent to which small changes in input can cause large changes in output: the frobenius norm of a matrix J is √n times the root-mean-square of |Jx| over all unit vectors x, where n is the input dimension.
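
A quick numerical check of that relationship (my own sketch), estimating the root-mean-square of |Jx| over random unit vectors with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 5, 3
J = rng.standard_normal((n_out, n_in))

# Sample random unit vectors x in the input space and measure |Jx|.
xs = rng.standard_normal((100_000, n_in))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
rms = np.sqrt(np.mean(np.linalg.norm(xs @ J.T, axis=1) ** 2))

print(np.linalg.norm(J))    # Frobenius norm of J
print(np.sqrt(n_in) * rms)  # approximately equal: ||J||_F = sqrt(n_in) * RMS of |Jx|
```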

“Controlling which Everett branch you end up in” is the wrong way to think about decisions, even if many-worlds is true. Brains don’t appear to rely much on quantum randomness, so if you make a certain decision, that probably means that the overwhelming majority of identical copies of you make the same decision. You aren’t controlling which copy you are; you’re controlling what all of the copies do. And even if quantum randomness does end up mattering in decisions, so that a non-trivial proportion of copies of you make different decisions from each other, then you would still presumably want a high proportion of them to make good decisions; you can do your part to bring that about by making good decisions yourself.

Consider reading a real physicist’s take on the issue

This seems phrased to suggest that her view is “the real physicist view” on the multiverse. You could also read what Max Tegmark or David Deutsch, for instance, have to say about multiverse hypotheses and get a “real physicist’s” view from them.

Also, she doesn’t actually say much in that blog post. She points out that when she says that multiverse hypotheses are unscientific, she doesn’t mean that they’re false, so this doesn’t seem especially useful to someone who wants to know whether there actually is a multiverse, or is interested in the consequences thereof. She says “there is no reason to think we live in such multiverses to begin with”, but proponents of multiverse hypotheses have given reasons to support their views, which she doesn’t address.

#1 (at the end) sounds like complexity theory.

Some of what von Neumann says makes it sound like he’s interested in a mathematical foundation for analog computing, which I think has been done by now.

I guess what I was trying to say is (although I think I’ve partially figured out what you meant; see next paragraph), cultural evolution is a process that acquires adaptations slowly-ish and transmits previously-acquired adaptations to new organisms quickly, while biological evolution is a process that acquires adaptations very slowly and transmits previously-acquired adaptations to new organisms quickly. You seem to be comparing the rate at which cultural evolution acquires adaptations to the rate at which biological evolution transmits previously-acquired adaptations to new organisms, and concluding that cultural evolution is slower.

Re-reading the part of your post where you talked about AI takeoff speeds, you argue (which I hadn’t understood before) that the rise of humans was fast on an evolutionary timescale and slow on a cultural timescale. So if it was due to an evolutionary change, it must have involved a small change that had a large effect on capabilities, meaning a large jump in capabilities will occur very suddenly if we mimic evolution quickly; while if it was due to a cultural change, it was probably a large change, so mimicking culture quickly won’t produce a large effect on capabilities unless it is extremely quick.

This clarifies things, but I don’t agree with the claim. I think slow changes in the intelligence of a species are compatible with fast changes in its capabilities even if the changes are mainly in raw innovative ability rather than cultural learning. Innovations can increase the ability to innovate, causing a positive feedback loop. A species could have high enough cultural learning ability for innovations to be transmitted over many generations without having the innovative ability to ever make the innovations that would kick off this loop. Then, when they start slowly gaining innovative ability, the innovations accumulated in cultural knowledge gradually increase, until they reach the feedback loop and the rate of innovation becomes determined more by changes in pre-existing innovations than by changes in raw innovative ability. There don’t even have to be any evolutionary changes in the period in which the innovation rate starts to get dramatic.

If you don’t buy this story, then it’s not clear why the changes being in cultural learning ability rather than in raw innovative ability would remove the need for a discontinuity. After all, our cultural learning ability went from not giving us much advantage over other animals to “accumulating decisive technological dominance in an evolutionary eyeblink” in an evolutionary eyeblink (quotation marks added for ease of parsing). Does this mean our ability to learn from culture must have greatly increased from a small change? You argue in the post that there’s no clear candidate for what such a discontinuity in cultural learning ability could look like, but this seems just as true to me for raw innovative ability.

Perhaps you could argue that it doesn’t matter if there’s a sharp discontinuity in cultural learning ability because you can’t learn from a culture faster than the culture learns things to teach you. In this case, yes, perhaps I would say that AI-driven culture could make advancements that look discontinuous on a human scale. Though I’m not entirely sure what that would look like, and I admit it does sound kind of soft-takeoffy.