I’ve made a lot of Manifold markets, and find them a useful way to track my accuracy and sanity-check my beliefs against the community. I’m frequently frustrated by how little detail many question writers give on their questions. Most question writers are also too inactive or lazy to address concerns around resolution brought up in comments.
Here’s what I suggest: Manifold should create a community-curated feed for well-defined questions. I can think of two ways of implementing this:
(Question-based) Allow community members to vote on whether they think the question is well-defined
(User-based) Track comments on question clarifications (e.g. Metaculus has an option for specifying your comment pertains to resolution), and give users a badge if there are no open ‘issues’ on their questions.
Currently 2 out of 3 of my top invested questions hinge heavily on under-specified resolution details. The other one was elaborated on after I asked in comments. Those questions have ~500 users active on them collectively.
I was looking at this image in a post and it gave me some (loosely connected/ADD-type) thoughts.
In order:
The entities outside the box look pretty scary.
I think I would get over that quickly, they’re just different evolved body shapes. The humans could seem scary-looking from their pov too.
Wait.. but why would the robots have those big spiky teeth? (implicit question: what narratively coherent world could this depict?)
Do these forms have qualities associated with predator species, and that’s why they feel scary? (Is this a predator-species-world?)
Most humans are also predators in a non-strict sense.
I don’t want to live in a world where there’s only the final survivors of selection processes who shrug indifferently when asked why we don’t revive all the beings who were killed in the process which created the final survivors. (implicit: related to how a ‘predator-species-world’ from (4) could exist)
There’s been many occasions where I’ve noticed what feels like a more general version of that attitude in a type of current human, but I don’t know how to describe it.
I don’t want to live in a world where there’s only the final survivors of selection processes who shrug indifferently when asked why we don’t revive all the beings who were killed in the process which created the final survivors.
If you could revive all the victims of the selection process that brought us to the current state, all the crusaders and monarchists and vikings and Maoists and so, so many illiterate peasant farmers (on much too little land because you’ve got hundreds of generations of them at once, mostly with ideas that make Putin look like Sonia Sotomayor), would you? They’d probably make quite the mess. Bringing them back would probably restart the selection process and we probably wouldn’t be selected again. It just seems like a terrible idea to me.
I’m thinking of this in the context of a post-singularity future, where we wouldn’t need to worry about things like conflict or selection processes.
By ‘the ones who were killed in the process’, I was thinking about e.g. herbivorous animals that were killed by predator species[1], but you’re correct that it could include humans too. A lot of humans have been unjustly killed (by others or by nature) throughout history.
I think my endorsed morals are indifferent about the (dis)value of reviving abusive minds from the past, though moral-patient-me dislikes the idea on an intuitive level, and wishes for a better narrative ending than that.
(Also I upvoted your comment from negative)
I also notice some implied hard moral questions (What of current mean-hearted people? What about the potential for past ones of them to have changed into good people? etc)
As a clear example of a kind of being who seems innocent of wrongdoing. Not ruling out other cases; e.g., plausibly inside the mind of the cat that I once witnessed killing a bunny, there could be total naivety about what was even being done.
Sort-of relatedly, I basically view evolution as having favored the dominance of agents with defect-y decision-making, even though the equilibrium of ‘collaborating with each other to harness the free energy of the sun’ would have been so much better. (Maybe another reason that didn’t happen is that there would be less of a gradual buildup of harder and harder training environments, in that case)
Someone posted these quotes in a Slack I’m in… what Ellsberg said to Kissinger:
“Henry, there’s something I would like to tell you, for what it’s worth, something I wish I had been told years ago. You’ve been a consultant for a long time, and you’ve dealt a great deal with top secret information. But you’re about to receive a whole slew of special clearances, maybe fifteen or twenty of them, that are higher than top secret.
“I’ve had a number of these myself, and I’ve known other people who have just acquired them, and I have a pretty good sense of what the effects of receiving these clearances are on a person who didn’t previously know they even existed. And the effects of reading the information that they will make available to you.
[...]
“In the meantime it will have become very hard for you to learn from anybody who doesn’t have these clearances. Because you’ll be thinking as you listen to them: ‘What would this man be telling me if he knew what I know? Would he be giving me the same advice, or would it totally change his predictions and recommendations?’ And that mental exercise is so torturous that after a while you give it up and just stop listening. I’ve seen this with my superiors, my colleagues….and with myself.
“You will deal with a person who doesn’t have those clearances only from the point of view of what you want him to believe and what impression you want him to go away with, since you’ll have to lie carefully to him about what you know. In effect, you will have to manipulate him. You’ll give up trying to assess what he has to say. The danger is, you’ll become something like a moron. You’ll become incapable of learning from most people in the world, no matter how much experience they may have in their particular areas that may be much greater than yours.”
Someone else added these quotes from a 1968 article about how the Vietnam war could go so wrong:
Despite the banishment of the experts, internal doubters and dissenters did indeed appear and persist. Yet as I watched the process, such men were effectively neutralized by a subtle dynamic: the domestication of dissenters. Such “domestication” arose out of a twofold clubbish need: on the one hand, the dissenter’s desire to stay aboard; and on the other hand, the nondissenter’s conscience. Simply stated, dissent, when recognized, was made to feel at home. On the lowest possible scale of importance, I must confess my own considerable sense of dignity and acceptance (both vital) when my senior White House employer would refer to me as his “favorite dove.” Far more significant was the case of the former Undersecretary of State, George Ball. Once Mr. Ball began to express doubts, he was warmly institutionalized: he was encouraged to become the inhouse devil’s advocate on Vietnam. The upshot was inevitable: the process of escalation allowed for periodic requests to Mr. Ball to speak his piece; Ball felt good, I assume (he had fought for righteousness); the others felt good (they had given a full hearing to the dovish option); and there was minimal unpleasantness. The club remained intact; and it is of course possible that matters would have gotten worse faster if Mr. Ball had kept silent, or left before his final departure in the fall of 1966. There was also, of course, the case of the last institutionalized doubter, Bill Moyers. The President is said to have greeted his arrival at meetings with an affectionate, “Well, here comes Mr. Stop-the-Bombing....” Here again the dynamics of domesticated dissent sustained the relationship for a while.
A related point—and crucial, I suppose, to government at all times—was the “effectiveness” trap, the trap that keeps men from speaking out, as clearly or often as they might, within the government. And it is the trap that keeps men from resigning in protest and airing their dissent outside the government. The most important asset that a man brings to bureaucratic life is his “effectiveness,” a mysterious combination of training, style, and connections. The most ominous complaint that can be whispered of a bureaucrat is: “I’m afraid Charlie’s beginning to lose his effectiveness.” To preserve your effectiveness, you must decide where and when to fight the mainstream of policy; the opportunities range from pillow talk with your wife, to private drinks with your friends, to meetings with the Secretary of State or the President. The inclination to remain silent or to acquiesce in the presence of the great men—to live to fight another day, to give on this issue so that you can be “effective” on later issues—is overwhelming. Nor is it the tendency of youth alone; some of our most senior officials, men of wealth and fame, whose place in history is secure, have remained silent lest their connection with power be terminated. As for the disinclination to resign in protest: while not necessarily a Washington or even American specialty, it seems more true of a government in which ministers have no parliamentary backbench to which to retreat. In the absence of such a refuge, it is easy to rationalize the decision to stay aboard. By doing so, one may be able to prevent a few bad things from happening and perhaps even make a few good things happen. To exit is to lose even those marginal chances for “effectiveness.”
I wish this quote were a little more explicit about what’s going wrong. On a literal reading it’s saying that some people who disagreed attended meetings and were made to feel comfortable. I think it’s super plausible that this leads to some kind of pernicious effect, but I wish it spelled out exactly what that effect is.
I guess the best thing I can infer is that the author thinks public resignations and dissent would have been somewhat effective and the domesticated dissenters were basically ineffective?
Or is the context of the piece just that he’s explaining the absence of prominent public dissent?
I’d be interested in a few more details/gears. (Also, are you primarily replying about the immediate parent, i.e. domestication of dissent, or also about the previous one?)
Two different angles of curiosity I have are:
what sort of things might you look out for, in particular, to notice if this was happening to you at OpenAI or similar?
something like… what’s your estimate of the effect size here? Do you have personal experience feeling captured by this dynamic? If so, what was it like? Or did you observe other people seeming to be captured, and what was your impression (perhaps in vague terms) of the diff that the dynamic was producing?
I was talking about the immediate parent, not the previous one. Though as secrecy gets ramped up, the effect described in the previous one might set in as well.
I have personal experience feeling captured by this dynamic, yes, and from conversations with other people I get the impression that it was even stronger for many others.
Hard to say how large of an effect it has. It definitely creates a significant chilling effect on criticism/dissent. (I think people who were employees alongside me while I was there will attest that I was pretty outspoken… yet I often found myself refraining from saying things that seemed true and important, due to not wanting to rock the boat / lose ‘credibility’ etc.)
The point about salving the consciences of the majority is interesting and seems true to me as well. I feel like there’s definitely a dynamic of ‘the dissenters make polite reserved versions of their criticisms, and feel good about themselves for fighting the good fight, and the orthodox listen patiently and then find some justification to proceed as planned, feeling good about themselves for hearing out the dissent.’
I don’t know of an easy solution to this problem. Perhaps something to do with regular anonymous surveys? idk.
The Wikipedia articles on the VNM theorem, Dutch Book arguments, money pump, Decision Theory, Rational Choice Theory, etc. are all a horrific mess. They’re also completely disjoint, without any kind of Wikiproject or wikiboxes for tying together all the articles on rational choice.
It’s worth noting that Wikipedia is the place where you—yes, you!—can actually have some kind of impact on public discourse, education, or policy. There is just no other place you can get so many views with so little barrier to entry. A typical Wikipedia article will get more hits in a day than all of your LessWrong blog posts have gotten across your entire life, unless you’re @Eliezer Yudkowsky.
I’m not sure if we actually “failed” to raise the sanity waterline, like people sometimes say, or if we just didn’t even try. Given that even some very basic low-hanging-fruit interventions like “write a couple good Wikipedia articles” still haven’t been done 15 years later, I’m leaning towards the latter.
I appreciate the intention here but I think it would need to be done with considerable care, as I fear it may have already led to accidental vandalism of the epistemic commons. Just skimming a few of these Wikipedia pages, I’ve noticed several new errors. These can be easily spotted by domain experts but might not be obvious to casual readers.[1] I can’t know exactly which of these are due to edits from this community, but some very clearly jump out.[2]
I’ll list some examples below, but I want to stress that this list is not exhaustive. I didn’t read most parts of most related pages, and I omitted many small scattered issues. In any case, I’d like to ask whoever made any of these edits to please reverse them, and to triple check any I didn’t mention below.[3] Please feel free to respond to this if any of my points are unclear![4]
False statements
The page on Independence of Irrelevant Alternatives (IIA) claims that IIA is one of the vNM axioms, and that one of the vNM axioms “generalizes IIA to random events.”
Both are false. The similar-sounding Independence axiom of vNM is neither equivalent to, nor does it entail, IIA (and so it can’t be a generalisation). You can satisfy Independence while violating IIA. This is not a technicality; it’s a conflation of distinct and important concepts. This is repeated in several places.
The mathematical statement of Independence there is wrong. In the section conflating IIA and Independence, it’s defined as the requirement that
$$p\,N + (1-p)\,\text{Bad} \prec p\,N + (1-p)\,\text{Good}$$
for any p∈[0,1] and any outcomes Bad, Good, and N satisfying Bad≺Good. This mistakes weak preference for strict preference. To see this, set p=1 and observe that the line now reads N≺N. (The rest of the explanation in this section is also problematic but the reasons for this are less easy to briefly spell out.)
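(For contrast, one standard way of stating vNM Independence, with the quantifier restricted so that no degenerate case arises, is:
$$\text{Bad} \prec \text{Good} \iff p\,\text{Bad} + (1-p)\,N \prec p\,\text{Good} + (1-p)\,N \quad \text{for all } p \in (0,1] \text{ and outcomes } N.$$
Note that p ranges over (0,1] rather than [0,1], and the mixing weight sits on Bad and Good rather than on N.)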
The Dutch book page states that the argument demonstrates that “rationality requires assigning probabilities to events [...] and having preferences that can be modeled using the von Neumann–Morgenstern axioms.” This is false. It is an argument for probabilistic beliefs; it implies nothing at all about preferences. And in fact, the standard proof of the Dutch book theorem assumes something like expected utility (Ramsey’s thesis).
This is a substantial error, making a very strong claim about an important topic. And it’s repeated elsewhere, e.g. when stating that the vNM axioms “apart from continuity, are often justified using the Dutch book theorems.”
The section ‘The theorem’ on the vNM page states the result using strict preference/inequality. This is a corollary of the theorem but does not entail it.
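(For reference, the full biconditional form: the axioms hold if and only if there exists a utility function u, unique up to positive affine transformation, such that for all lotteries L, M,
$$L \preceq M \iff \mathbb{E}_L[u] \leq \mathbb{E}_M[u].$$
The strict-preference statement on the page can be derived from this form.)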
Misleading statements
The decision theory page states that it’s “a branch of applied probability theory and analytic philosophy concerned with the theory of making decisions based on assigning probabilities to various factors and assigning numerical consequences to the outcome.” This is a poor description. Decision theorists don’t simply assume this, nor do they always conclude it—e.g. see work on ambiguity or lexicographic preferences. And besides this, decision theory is arguably more central in economics than the fields mentioned.
The IIA article’s first sentence states that IIA is an “axiom of decision theory and economics” whereas it’s classically one of social choice theory, in particular voting. This is at least a strange omission for the context-setting sentence of the article.
It’s stated that IIA describes “a necessary condition for rational behavior.” Maybe the individual-choice version of IIA is, but the intention here was presumably to refer to Independence. This would be a highly contentious claim though, and definitely not a formal result. It’s misleading to describe Independence as necessary for rationality.
The vNM article states that obeying the vNM axioms implies that agents “behave as if they are maximizing the expected value of some function defined over the potential outcomes at some specified point in the future.” I’m not sure what ‘specified point in the future’ is doing there; that’s not within the framework.
The vNM article states that “the theorem assumes nothing about the nature of the possible outcomes of the gambles.” That’s at least misleading. It assumes all possible outcomes are known, that they come with associated probabilities, and that these probabilities are fixed (e.g., ruling out the Newcomb paradox).
Besides these problems, various passages in these articles and others are unclear, lack crucial context, contain minor issues, or just look prone to leave readers with a confused impression of the topic. (This would take a while to unpack, so my many omissions should absolutely not be interpreted as green lights.) As OP wrote: these pages are a mess. But I fear the recent edits have contributed to some of this.
So, as of now, I’d strongly recommend against reading Wikipedia for these sorts of topics—even for a casual glance. A great alternative is the Stanford Encyclopedia of Philosophy, which covers most of these topics.
I would do it myself but I don’t know what the original articles said and I’d rather not have to learn the Wikipedia guidelines and re-write the various sections from scratch.
I don’t appreciate the hostility. I aimed to be helpful in spending time documenting and explaining these errors. This is something a healthy epistemic community is appreciative of, not annoyed by. If I had added mistaken passages to Wikipedia, I’d want to be told, and I’d react by reversing them myself. If any points I mentioned weren’t added by you, then as I wrote in my first comment:
...let me know that some of the issues I mention were already on Wikipedia beforehand. I’d be happy to try to edit those.
The point of writing about the mistakes here is to make clear why they indeed are mistakes, so that they aren’t repeated. That has value. And although I don’t think we should encourage a norm that those who observe and report a problem are responsible for fixing it, I will try to find and fix at least the pre-existing errors.
I’m not annoyed by these, and I’m sorry if it came across that way. I’m grateful for your comments. I just meant to say these are exactly the sort of mistakes I was talking about in my post as needing fixing! However, talking about them here isn’t going to do much good, because people read Wikipedia, not LessWrong shortform comments, and I’m busy as hell working on social choice articles already.
From what I can tell, there’s one substantial error I introduced, which was accidentally conflating the two kinds of IIA. (I haven’t double-checked, though, so I’m not sure they’re actually unrelated.) Along with that there’s some minor errors involving strict vs. non-strict inequality which I’d be happy to see corrected.
Or to let me know that some of the issues I mention were already on Wikipedia beforehand. I’d be happy to try to edit those.
None of these changes are new as far as I can tell (I checked the first three), so I think your basic critique falls through. You can check the edit history yourself by just clicking on the “View History” button and then pressing the “cur” button next to the revision entry you want to see the diff for.
Like, indeed, the issues you point out are issues, but it is not the case that people reading this have made the articles worse. The articles were already bad, and “acting with considerable care” in a way that implies inaction would mean leaving inaccuracies uncorrected.
I think people should edit these pages, and I expect them to get better if people give it a real try. I also think you could give it a try and likely make things better.
Edit: Actually, I think my deeper objection is that most of the critiques here (made by Sami) are just wrong. For example, of course Dutch books/money pumps frequently get invoked to justify VNM axioms. See for example this.
Sami never mentioned money pumps. And “the Dutch books arguments” are arguments for probabilism and other credal norms[1], not the vNM axioms.
You can check the edit history yourself by just clicking on the “View History” button and then pressing the “cur” button next to the revision entry you want to see the diff for.
Note that if the edit history is long or you are doing a lot of checks, there are tools to bisect WP edit histories: at the top of the diff page, “External tools: Find addition/removal (Alternate)”
check the edit history yourself by just clicking on the “View History” button and then pressing the “cur” button
Great, thanks!
I hate to single out OP but those three points were added by someone with the same username (see first and second points here; third here). Those might not be entirely new but I think my original note of caution stands.
Well, thinking harder about this, I do think some of your critiques here are wrong. For example, it is the case that the VNM axioms frequently get justified by invoking dutch books (the most obvious case is the argument for transitivity, where the standard response is “well, if you have circular preferences I can charge you a dollar to have you end up where you started”).
Of course, justifying axioms is messy, and there isn’t any particularly objective way of choosing axioms here, but insofar as informal argumentation happens, it tends to use a dutch-book-like structure. I’ve had many conversations with people with formal academic experience in philosophy and economics, and this is definitely a normal way for dutch books to be invoked.
We certainly are, which isn’t unique to either of us; Savage discusses them all in a single common decision-theoretic framework, where he develops both sets of ideas jointly. A money pump is just a Dutch book where all the bets happen to be deterministic. I chose to describe things this way because it lets me do a lot more cross-linking within Wikipedia articles on decision theory, which encourages people reading about one to check out the other.
I’ve pretty consistently seen “Dutch Book arguments” used interchangeably with money pumps, by many different people. My understanding (which is also the SEP’s) is that “what is a money pump vs. a dutch book argument” is not particularly well-defined and the structure of the money pump arguments is basically the same as the structure of the dutch book arguments.
This is evident from just the basic definitions:
“A Dutch book is a set of bets that ensures a guaranteed loss, i.e. the gambler will lose money no matter what happens.”
Which is of course exactly what a money pump is (where you are the person offering the gambles and therefore make guaranteed money).
The money pump Wikipedia article also links to the Dutch book article, and the book/paper I linked describes dutch books as a kind of money pump argument. I have never heard anyone make a principled distinction between a money pump argument and a dutch book argument (and I don’t see how you could get one without the other).
A pattern of intransitive or cyclic preferences causing a decision maker to be willing to pay repeated amounts of money to have these preferences satisfied without gaining any benefit. [...] Also called a Dutch book [...]
(Edit: It’s plausible that for weird historical reasons the exact same argument, when applied to probabilism would be called a “dutch book” and when applied to anything else would be called a “money pump”, but I at least haven’t seen anyone defend that distinction, and it doesn’t seem to follow from any of the definitions)
I think it’ll be helpful to look at the object level. One argument says: if your beliefs aren’t probabilistic but you bet in a way that resembles expected utility, then you’re susceptible to sure loss. This forms an argument for probabilism.[1]
Another argument says: if your preferences don’t satisfy certain axioms but satisfy some other conditions, then there’s a sequence of choices that will leave you worse off than you started. This forms an argument for norms on preferences.
These are distinct.
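To make the contrast concrete, here’s a toy sketch of the two argument types (the credences, prices, and fee are arbitrary illustrative numbers):

```python
# Dutch book: credences that violate the probability axioms.
# Suppose an agent's credences in "rain" and "no rain" sum to less than 1,
# and it will sell a bet paying $1 on an event E for $credence(E).
credence = {"rain": 0.4, "no_rain": 0.4}  # violates additivity: sums to 0.8

# A bookie buys both bets for $0.8 total. Exactly one event occurs, so the
# bookie collects $1 no matter what: a sure loss for the agent.
assert 1.0 - sum(credence.values()) > 0

# Money pump: cyclic preferences over outcomes, A > B > C > A, where the
# agent will pay a small fee to swap its holding for something preferred.
fee = 0.01
holding, money = "C", 0.0
for preferred in ["B", "A", "C"]:  # each trade is individually acceptable
    holding, money = preferred, money - fee
assert holding == "C" and money < 0  # back where it started, strictly poorer
```

The first argument constrains beliefs; the second constrains preferences.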
These two different kinds of arguments have things in common. But they are not the same argument applied in different settings. They have different assumptions, and different conclusions. One is typically called a Dutch book argument; the other a money pump argument. The former is sometimes referred to as a special case of the latter.[2] But whatever our naming conventions, it’s a special case that doesn’t support the vNM axioms.
Here’s why this matters. You might read the assumptions of the Dutch book theorem, and find them compelling. Then you read an article telling you that this implies the vNM axioms (or constitutes an argument for them). If you believe it, you’ve been duped.
This distinction is standard and blurring the lines leads to confusions. It’s unfortunate when dictionaries, references, or people make mistakes. More reliable would be a key book on money pumps (Gustafsson 2022) referring to a key book on Dutch books (Pettigrew 2020):
“There are also money-pump arguments for other requirements of rationality. Notably, there are money-pump arguments that rational credences satisfy the laws of probability. (See Ramsey 1931, p. 182.) These arguments are known as Dutch-book arguments. (See Lehman 1955, p. 251.) For an overview, see Pettigrew 2020.” [Footnote 9.]
I mean, I think it would be totally reasonable for someone who is doing some decision theory or some epistemology work, to come up with new “dutch book arguments” supporting whatever axioms or assumptions they would come up with.
I think I am more compelled that there is a history here of calling money pump arguments that happen to relate to probabilism “dutch books”, but I don’t think there is really any clear definition that supports this. I agree that there exists the dutch book theorem, and that that one importantly relates to probabilism, but I’ve just had dozens of conversations with academics, philosophers, and decision-theorists, where in the context of both decision-theory and epistemology questions, people brought up dutch books and money pumps interchangeably.
I agree that there exists the dutch book theorem, and that that one importantly relates to probabilism
I’m glad we could converge on this, because that’s what I really wanted to convey.[1] I hope it’s clearer now why I included these as important errors:
The statement that the vNM axioms “apart from continuity, are often justified using the Dutch book theorems” is false since these theorems only relate to belief norms like probabilism. Changing this to ‘money pump arguments’ would fix it.
There’s a claim on the main Dutch book page that the arguments demonstrate that “rationality requires assigning probabilities to events [...] and having preferences that can be modeled using the von Neumann–Morgenstern axioms.” I wouldn’t have said it was false if this was about money pumps.[2] I would’ve said there was a terminological issue if the page equated Dutch books and money pumps. But it didn’t.[3] It defined a Dutch book as “a set of bets that ensures a guaranteed loss.” And the theorems and arguments relating to that do not support the vNM axioms.
The issue of which terms to use isn’t that important to me in this case, but let me speculate about something. If you hear domain experts go back and forth between ‘Dutch books’ and ‘money pumps’, I think that is likely either because they are thinking of the former as a special case of the latter without saying so explicitly, or because they’re listing off various related ideas. If that’s not why, then they may just be mistaken. After all, a Dutch book is named that way because a bookie is involved!
It looks like OP edited the page just today and added ‘or money pump’. But the text that follows still describes a Dutch book, i.e. a set of bets. (Other things were added too that I find problematic but this footnote isn’t the place to explain it.)
A typical Wikipedia article will get more hits in a day than all of your LessWrong blog posts have gotten across your entire life, unless you’re @Eliezer Yudkowsky.
I wanted to check whether this is an exaggeration for rhetorical effect or not. Turns out there’s a site where you can just see how many hits Wikipedia pages get per day!
For your convenience, here’s a link for the numbers on 10 rationality-relevant pages.
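If you want raw numbers rather than the dashboard, here’s a minimal sketch using the public Wikimedia Pageviews REST API (the article title and date range are just examples):

```python
# Fetch daily pageview counts for a Wikipedia article via the public
# Wikimedia Pageviews REST API.
import requests

def daily_views(article: str, start: str, end: str) -> list[int]:
    url = (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"en.wikipedia/all-access/all-agents/{article}/daily/{start}/{end}"
    )
    resp = requests.get(url, headers={"User-Agent": "pageview-check/0.1"})
    resp.raise_for_status()
    return [item["views"] for item in resp.json()["items"]]

views = daily_views("Expected_value", "20240601", "20240630")
print(sum(views) / len(views))  # average daily views over the month
```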
I’m pretty sure my LessWrong posts have gotten more than 1000 hits across my entire life (and keep in mind that “hits” is different from “an actual human actually reads the article”), but fair enough—Wikipedia pages do get a lot of views.
Thanks for the parent for flagging this and doing editing. What I’d now want to see is more people actually coordinating to do something about it—set up a Telegram or Discord group or something, and start actually working on improving the pages—rather than this just being one of those complaints on how Rationalists Never Actually Tried To Win, which a lot of people upvote and nod along with, and which is quickly forgotten without any actual action.
(Yes, I’m deliberately leaving this hanging here without taking the next action myself; partly because I’m not an expert Wikipedia editor, partly because I figured that if no one else is willing to take the next action, then I’m much more pessimistic about this initiative.)
What I’d now want to see is more people actually coordinating to do something about it—set up a Telegram or Discord group or something, and start actually working on improving the pages—rather than this just being one of those complaints on how Rationalists Never Actually Tried To Win, which a lot of people upvote and nod along with, and which is quickly forgotten without any actual action.
So mote it be. I can start the group/server and do moderation (though not 24/7, of course). Whoever is reading this: please choose between Telegram and Discord with an inline react.
Moderation style I currently use: “reign of terror”, delete offtopic messages immediately, after large discussions delete the messages which do not carry much information (even if someone replies to them).
I’m pretty sure my LessWrong posts have gotten more than 1000 hits across my entire life (and keep in mind that “hits” is different from “an actual human actually reads the article”), but fair enough—Wikipedia pages do get a lot of views.
Wikipedia pageviews punch above their weight. First, your pageviews probably do drop off rapidly enough that it is possible that a WP day = lifetime. People just don’t go back and reread most old LW links. I mean, look at the submission rate—there’s like a dozen a day or something. (I don’t even read most LW submissions these days.) While WP traffic is extremely durable: ‘Expected value’ will be pulling in 1.7k hits/days (or more) likely practically forever.
Second, the quality is distinct. A Wikipedia article is an authoritative reference which is universally consulted and trusted. That 1.7k excludes all access via the APIs AFAIK, and things like readers who read the snippets in Google Search. If you Google the phrase ‘expected value’, you may not even click through to WP because you just read the searchbox snippet:
In probability theory, the expected value is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of the possible values a random variable can take, weighted by the probability of those outcomes.
This includes machine learning. Every LLM is trained very heavily on Wikipedia; any given LW page, on the other hand, may well not make the cut, either because it’s too recent to show up in the old datasets everyone starts with like The Pile, or because it gets filtered out for bad reasons, or they just don’t train enough tokens. And there is life beyond LLM in ML (hard to believe these days, but I am told ML researchers still exist who do other things), and WP articles will be in those, as part of the network or WikiData etc. A LW post will not.
Then you have the impact of WP. As anyone who’s edited niche topics for years can tell you, WP articles are where everyone starts, and you can see the traces for decades afterwards. Hallgren mentions David Gerard, and Roko’s Basilisk is a good example of that—it is the one thing “everyone knows” about LessWrong, and it is due almost solely to Wikipedia. The hit count on the ‘LessWrong’ WP article will never, ever reflect that.
But editing WP is difficult even without a Gerard, because of the ambient deletionists. An example: you may have seen recently going around (even on MR) a Wikipedia link about the interesting topic of ‘disappearing polymorphs’. It is a fascinating chemistry topic, but on Gwern.net, I did not link to it, but to a particular revision of another article. Why? Because an editor, Smokefoot, butchered it after I drew attention to it on social media prior to the current burst of attention. (Far from the first time—this is one of the hazards of highlighting any Wikipedia article.) We can thank Yitzilitt & Cosmia Nebula for since writing a new ‘Disappearing polymorph’ article which can stand up to Smokefoot’s butchering; it is almost certainly the case that it took them 100x, if not 1000x, more time & effort to write that than it took Smokefoot to delete the original material. (On WP, when dealing with a deletionist, it is worse than “Brandolini’s law”—we should be so lucky that it only took 10x the effort...)
Concurring with the sentiment, I have realized that nothing I write is going to be as well-read as Wikipedia, so I have devoted myself to writing Wikipedia instead of trying to get a personal blog anymore.
I will comment on a few things:
I really want to get the neural scaling law page working with some synthesis and updated data, but currently there is no good theoretical synthesis. Wikipedia isn’t good for just a giant spreadsheet.
I wrote most of the GAN page, the Diffusion Model page, Mixture of Experts, etc. I also wrote a few sections of LLM and keep the giant table updated for each frontier model. I am somewhat puzzled by the fact that it seems I am the only pony who thought of this. There are thousands of ML personal blogs, all in the Celestia-forsaken wasteland of not getting read, and then there is Wikipedia… but nopony is writing there? Well, I guess my cutie mark is in Wikipedia editing.
The GAN page and the Diffusion Model page were Tirek-level bad. They read like somepony paraphrased about 10 news reports. There was barely a single equation, and that was years after GAN and DM had proved their worth! So I fired the Orbital Friendship Mathematical Cannon. I thought that if I’m not going to write another blog, then Wikipedia has to be on the same level as a good blog, so I set my goal at the level of Lilian Weng’s blog, and a lack of mathematics is definitely bad.
The GAN page and the Diffusion Model page were Tirek-level bad. They read like somepony paraphrased about 10 news reports. There was barely a single equation, and that was years after GAN and DM had proved their worth
Yes, but WP deletionists only permit news reports, because those are secondary sources. You have to write these articles with primary sources, but they hate those; see one of their favorite jargons, WP:PRIMARY. (Weng’s blog, ironically, might make the cut as a secondary source, despite containing pretty much just paraphrases or quotes from primary sources, but only because she’s an OA exec.) Which is a big part of why the DL articles all suck because there just aren’t many good secondary or tertiary sources like encyclopedias. (Well, there’s the Schmidhuber Scholarpedia articles in some cases, but aside from being outdated, it’s, well, Schmidhuber.) There is no GAN textbook I know of which is worthwhile, and I doubt ever will be.
Yes, but WP deletionists only permit news reports, because those are secondary sources. You have to write these articles with primary sources, but they hate those; see one of their favorite jargons, WP:PRIMARY.
Aren’t most of the sources going to be journal articles? Academic papers are definitely fair game for citations (and generally make up most citations on Wikipedia).
there’s the Schmidhuber Scholarpedia articles in some cases, but aside from being outdated, it’s, well, Schmidhuber.
I hate Schmidhuber with a passion because I can smell everything he touches on Wikipedia, and it is always terrible.
Sometimes when I read pages about AI, I see things that almost certainly came from him, or one of his fans. I struggle to describe exactly the flavor of Schmidhuber’s kind of writing, but perhaps this will suffice: “People never give the right credit to anything. Everything of importance is either published by my research group first but miscredited to someone later, or something like that. Deep Learning? It’s done not by Hinton, but Amari, but not Amari, but by Ivakhnenko. The more obscure the originator, the better, because it reveals how bad people are at credit assignment—if they were better at it, the real originators would not have been so obscure.”
For example, LSTM did in fact originate with Schmidhuber… and it is in fact credited to Schmidhuber (… or maybe Hochreiter?). But then GAN should be credited to Schmidhuber, and also Transformers. He (or his fans) keeps trying to put the phrase “internal spotlights of attention” into the Transformer page, and I keep removing it. He wanted the credit so much that he went for argument-by-punning, renaming “fast weight programmer” to “linear transformers”, and quoting “internal spotlights of attention” out of context just to fortify the argument with a pun! I can do puns too! Rosenblatt (1962) even wrote about “back-propagating errors” in an MLP with a hidden layer. So what?
I actually took Schmidhuber’s claim seriously and carefully rewrote Ivakhnenko’s Group method of data handling page, giving all the mathematical details, so that one may evaluate it for oneself instead of relying on Schmidhuber’s claim. A few months later someone manually reverted everything I wrote! What does it read like according to a partisan of Ivakhnenko?
The development of GMDH consists of a synthesis of ideas from different areas of science: the cybernetic concept of “black box” and the principle of successive genetic selection of pairwise features, Godel’s incompleteness theorems and the Gabor’s principle of “freedom of decisions choice”, and the Beer’s principle of external additions. GMDH is the original method for solving problems for structural-parametric identification of models for experimental data under uncertainty… Since 1989 the new algorithms (AC, OCC, PF) for non-parametric modeling of fuzzy objects and SLP for expert systems were developed and investigated. Present stage of GMDH development can be described as blossom out of deep learning neuronets and parallel inductive algorithms for multiprocessor computers.
Well excuse me, “Godel’s incompleteness theorems”? “the original method”? Also, I thought “fuzzy” had stopped being fashionable since the 1980s. I actually once tried to learn fuzzy logic and gave up after not seeing what the big deal was. It is filled with such pompous and self-important terminology, as if the lack of substance must be made up by the heights of spiritual exhortation. Why say “combined” when they could say “consists of a synthesis of ideas from different areas of science”?
As a side note, such turgid prose, filled with long noun-phrases is pretty common among the Soviets. I once read that this kind of massive noun-phrase had a political purpose, but I don’t remember what it is.
I think it’s unrelated; David Gerard is mean to rationalists and spends lots of time editing articles about LW/ACX, but doesn’t torch articles about math stuff. The reason these articles are bad is because people haven’t put much effort into them.
Ah, in a parallel universe without David Gerard the obvious next step would be to create a WikiProject Rationality. In this universe, this probably wouldn’t end well? Coordination outside Wikipedia is also at risk of accusation of brigading or something.
Eh, wasn’t Arbital meant to be that, or something like it? Anyway, due to network effects I don’t see how any new wiki-like project could ever reasonably compete with Wikipedia.
I think the thing that actually makes people more rational is thinking of them as principles you can apply to your own life rather than abstract notions, which is hard to communicate in a Wikipedia page about Dutch books.
Five wikiprojects rely on this article, but it is C-class on the Wikipedia scale.
The topic seems quite important for people. If someone who doesn’t know how to make decisions stumbles upon the article, the first image they see… is a flowchart, which can scare non-programmers away.
This seems like a much better target for spreading rationalism. The other listed articles all seem quite detailed and far from the central rationalist project. Decision-making seems like a more likely on-ramp.
Here’s a gdoc comment I made recently that might be of wider interest:
You know I wonder if this standard model of final goals vs. instrumental goals has it almost exactly backwards. Would love to discuss sometime.
Maybe there’s no such thing as a final goal directly. We start with a concept of “goal” and then we say that the system has machinery/heuristics for generating new goals given a context (context may or may not contain goals ‘on the table’ already). For example, maybe the algorithm for Daniel is something like:
- If context is [safe surroundings]+[no goals]+[hunger], add the goal “get food.”
- If context is [safe surroundings]+[travel-related-goal]+[no other goals], engage Route Planning Module.
- … (many such things like this)
It’s a huge messy kludge, but it’s gradually becoming more coherent as I get older and smarter and do more reflection.
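A toy sketch of that rule-based picture (the rules and context tags are invented for illustration):

```python
# Goals produced by context-matching rules, rather than derived from one
# fixed final goal. Each rule pairs a context condition with a goal.
def generate_goals(context: set[str]) -> list[str]:
    rules = [
        (lambda c: {"safe", "hunger"} <= c and "goal" not in c, "get food"),
        (lambda c: "safe" in c and "travel_goal" in c, "engage route planning"),
        # ... many such kludgy rules, slowly refactored toward coherence
    ]
    return [goal for condition, goal in rules if condition(context)]

print(generate_goals({"safe", "hunger"}))       # ['get food']
print(generate_goals({"safe", "travel_goal"}))  # ['engage route planning']
```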
What are final goals? Well a goal is final for me to the extent that it tends to appear in a wide range of circumstances, to the extent that it tends to appear unprompted by any other goals, to the extent that it tends to take priority over other goals, … some such list of things like that.
For a mind like this, my final goals can be super super unstable and finicky and stuff like taking a philosophy class with a student who I have a crush on who endorses ideology X can totally change my final goals, because I have some sort of [basic needs met, time to think about long-term life ambitions] context and it so happens that I’ve learned (perhaps by experience, perhaps by imitation) to engage my philosophical reasoning module in that context, and also I’ve built my identity around being “Rational” in a way that makes me motivated to hook up my instrumental reasoning abilities to whatever my philosophical reasoning module shits out… meanwhile my philosophical reasoning module is basically just imitating patterns of thought I’ve seen high-status cool philosophers make (including this crush) and applying those patterns to whatever mental concepts and arguments are at hand.
To follow up, this might have big implications for understanding AGI. First of all, it’s possible that we’ll build AGIs that aren’t like that and that do have final goals in the traditional sense—e.g. because they are a hybrid of neural nets and ordinary software, involving explicit tree search maybe, or because SGD is more powerful at coherentizing the neural net’s goals than whatever goes on in the brain. If so, then we’ll really be dealing with a completely different kind of being than humans, I think.
Effective layer horizon of transformer circuits. The residual stream norm grows exponentially over the forward pass, with a growth rate of about 1.05. Consider the residual stream at layer 0, with norm (say) 100. Suppose the MLP at layer 0 has outputs of norm (say) 5. Then after 30 layers, the residual stream norm will be 100 · 1.05^30 ≈ 432.2. The MLP-0 outputs of norm 5 should then have a significantly reduced effect on the computations of MLP-30, due to their smaller relative norm.
On input tokens x, let Attn_i(x), MLP_i(x) be the original model’s sublayer outputs at layer i. I want to think about what happens when the later sublayers can only “see” the last few layers’ worth of outputs.
Definition: Layer-truncated residual stream. A truncated residual stream from layer n_1 to layer n_2 is formed by the original sublayer outputs from those layers:
$$h_{n_1:n_2}(x) := \sum_{i=n_1}^{n_2} \mathrm{Attn}_i(x) + \mathrm{MLP}_i(x).$$
Definition: Effective layer horizon. Let k>0 be an integer. Suppose that for all n≥k, we patch in h_{(n−k):n}(x) for the usual residual stream inputs h_n(x).[1] Let the effective layer horizon be the smallest k for which the model’s outputs and/or capabilities are “qualitatively unchanged.”
Lastly, slower norm growth probably causes the effective layer horizon to be lower. In that case, simply measuring residual stream norm growth would tell you a lot about the depth of circuits in the model, which could be useful if you want to regularize against that or otherwise decrease it (eg to decrease the amount of effective serial computation).
Do models have an effective layer horizon? If so, what does it tend to be as a function of model depth and other factors—are there scaling laws?
For notational ease, I’m glossing over the fact that we’d be patching in different residual streams for each sublayer of layer n. That is, we wouldn’t patch in the same activations for both the attention and MLP sublayers of layer n.
For example, if a model has an effective layer horizon of 5, then a circuit could run through the whole model because a layer n head could read out features output by a layer n−5 circuit, and then n+5 could read from n…
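Here’s a rough sketch of one way to run the patching experiment in TransformerLens (GPT-2 small assumed; per footnote [1], this simplification patches a single truncated stream at each layer’s resid_pre rather than separate streams for the attention and MLP sublayers):

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The quick brown fox jumps over the lazy dog.")
_, cache = model.run_with_cache(tokens)

def truncated_resid(n: int, k: int) -> torch.Tensor:
    # h_{(n-k):n}: sum of Attn_i + MLP_i over the k layers before layer n
    return sum(
        cache[f"blocks.{i}.hook_attn_out"] + cache[f"blocks.{i}.hook_mlp_out"]
        for i in range(n - k, n)
    )

def loss_with_horizon(k: int) -> float:
    def patch(resid, hook):
        n = hook.layer()
        return truncated_resid(n, k) if n >= k else resid
    hooks = [(f"blocks.{n}.hook_resid_pre", patch) for n in range(model.cfg.n_layers)]
    return model.run_with_hooks(tokens, return_type="loss", fwd_hooks=hooks).item()

base_loss = model(tokens, return_type="loss").item()
for k in [1, 2, 4, 8, 12]:
    # effective layer horizon ~ smallest k with a negligible loss gap
    print(k, round(loss_with_horizon(k) - base_loss, 4))
```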
[edit: stefan made the same point below earlier than me]
Nice idea! I’m not sure why this would be evidence for residual networks being an ensemble of shallow circuits — it seems more like the opposite to me? If anything, low effective layer horizon implies that later layers are building more on the outputs of intermediate layers. In one extreme, a network with an effective layer horizon of 1 would only consist of circuits that route through every single layer. Likewise, for there to be any extremely shallow circuits that route directly from the inputs to the final layer, the effective layer horizon must be the number of layers in the network.
I do agree that low layer horizons would substantially simplify (in terms of compute) searching for circuits.
I like this idea! I’d love to see checks of this on the SOTA models which tend to have lots of layers (thanks @Joseph Miller for running the GPT2 experiment already!).
I notice this line of argument would also imply that the embedding information can only be accessed up to a certain layer, after which it will be washed out by the high-norm outputs of layers. (And the same for early MLP layers which are rumoured to act as extended embeddings in some models.) -- this seems unexpected.
If the effective layer horizon is 25, then this path cannot work because the output of MLP10 gets lost. In fact, no path with less than 3 modules is possible because there would always be a gap > 25.
Only less-shallow paths would manage to influence the output of the model.
I believe the horizon may be large because, even if the approximation is fairly good at any particular layer, the errors compound as you go through the layers. If we instead apply the truncation only when computing the final output, the measured horizon is smaller.
However, if we apply at just the middle layer (6), the horizon is surprisingly small, so we would expect relatively little error propagated.
But this appears to be an outlier. Compare to 5 and 7.
xAI has ambitions to compete with OpenAI and DeepMind, but I don’t feel like it has the same presence in the AI safety discourse. I don’t know anything about its attitude to safety, or how serious a competitor it is. Are there good reasons it doesn’t get talked about? Should we be paying it more attention?
This should be good for training runs costing about $1 billion in cost of time (lasting a few months). And Dario Amodei is saying that this is the scale of today, for models that are not yet deployed. This puts xAI at 18 months behind, a difficult place to rebound from unless long-horizon-task-capable AI that can do many jobs (a commercially crucial threshold that is not quite AGI) is many more years away.
For some reason current labs are not running $10 billion training runs already; they didn’t build the necessary datacenters immediately. A run like that would take a million H100s and 1.5 gigawatts, and supply issues seem likely. There is also a lot of engineering detail to iron out, so the scaling proceeds gradually.
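As a sanity check on those figures, a back-of-the-envelope sketch (the rental rate and per-GPU power draw are rough assumptions, not sourced numbers):

```python
# '$1 billion in cost of time': ~100k H100s rented for a few months.
h100_rate = 3.0                # $/GPU-hour, assumed
hours = 24 * 30 * 4            # roughly four months
print(100_000 * h100_rate * hours / 1e9)   # ~0.9 ($B)

# A $10B-scale run at the same rate needs ~10x the GPUs.
watts_per_gpu = 1500           # H100 plus datacenter overhead, assumed
print(1_000_000 * watts_per_gpu / 1e9)     # 1.5 (GW)
```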
But some of this might be risk aversion, unwillingness to waste capital where a slower pace makes better use of it. As a new contender with no other choice, xAI will let us see whether it’s possible to leapfrog scaling after all. And Musk has an affinity for impossible deadlines (not necessarily for meeting them), so the experiment will at least be attempted.
I’ve asked similar questions before and heard a few things. I also have a few personal thoughts that I thought I’d share here unprompted. This topic is pretty relevant for me so I’d be interested in what specific claims in both categories people agree/disagree with.
Things I’ve heard:
There’s some skepticism about how well-positioned xAI actually is to compete with leading labs, because although they have a lot of capital and ability to fundraise, lots of the main bottlenecks right now can’t simply be solved by throwing more money at the problem. E.g. building infrastructure, securing power contracts, hiring top engineers, accessing huge amounts of data, and building on past work are all pretty limited by non-financial factors, and therefore the incumbents have lots of advantages. That being said, it’s placed alongside Meta and Google in the highest-liquidity prediction market I could find on this, which asks which labs will be “top 3” in 2025.
There’s some optimism about their attitude to safety since Elon has been talking about catastrophic risks from AI in no uncertain terms for a long time. There’s also some optimism coming from the fact that he/xAI opted to appoint Dan Hendrycks as an advisor.
Personal thoughts:
I’m not that convinced that they will take safety seriously by default. Elon’s personal beliefs seem to be hard to pin down/constantly shifting, and honestly, he hasn’t seemed to be doing that well to me recently. He’s long had a belief that the SpaceX project is all about getting humanity off Earth before we kill ourselves, and I could see a similar attitude leading to the “build ASI asap to get us through the time of perils” approach that I know others at top AI labs have (if he doesn’t feel this way already).
I also think (~65%) it was a strategic blunder for Dan Hendrycks to take a public position there. If there’s anything I took away from the OpenAI meltdown, it’s a greater belief in something like “AI Safety realpolitik;” that is, when the chips are down, all that matters is who actually has the raw power. Fancy titles mean nothing, personal relationships mean nothing, heck, being a literal director of the organization means nothing, all that matters is where the money and infrastructure and talent is. So I don’t think the advisor position will mean much, and I do think it will terribly complicate CAIS’ efforts to appear neutral, lobby via their 501c4, etc. I have no special insight here so I hope I’m missing something, or that the position does lead to a positive influence on their safety practices that wouldn’t have been achieved by unofficial/ad-hoc advising.
I think most AI safety discourse is overly focused on the top 4 labs (OpenAI, Anthropic, Google, and Meta) and underfocused on international players, traditional big tech (Microsoft, Amazon, Apple, Samsung), and startups (especially those building high-risk systems like highly-technical domain specialists and agents). Similarly, I think xAI gets less attention than it should.
Probably preaching to the choir here, but I don’t understand the conceivability argument for p-zombies. It seems to rely on the idea that human intuitions (at least among smart, philosophically sophisticated people) are a reliable detector of what is and is not logically possible.
But we know from other areas of study (e.g. math) that this is almost certainly false.
Eg, I’m pretty good at math (majored in it in undergrad, performed reasonably well). But unless I’m tracking things carefully, it’s not immediately obvious to me that pi isn’t a rational number (it’s certainly not inconceivable that it is). But of course the irrationality of pi is not just an empirical fact but a logical necessity.
Even more straightforwardly, one can easily construct Boolean SAT problems where the answer can conceivably be either True or False to a human eye. But only one of the answers is logically possible! Humans are far from logically omniscient rational actors.
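As a concrete illustration (the formula below is an arbitrary made-up instance), brute force rather than intuition is what settles the question:

```python
# A small 3-SAT instance: to the eye, both 'satisfiable' and
# 'unsatisfiable' are conceivable, but only one is logically possible.
from itertools import product

# Each clause is a list of literals (variable_index, is_negated).
clauses = [
    [(0, False), (1, True), (2, False)],
    [(0, True), (1, False), (3, True)],
    [(1, False), (2, True), (3, False)],
    [(0, True), (2, False), (3, True)],
]

def satisfiable(clauses, n_vars=4):
    for assignment in product([False, True], repeat=n_vars):
        # literal (v, neg) is true iff assignment[v] differs from neg
        if all(any(assignment[v] != neg for v, neg in clause) for clause in clauses):
            return True
    return False

print(satisfiable(clauses))
```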
Conceivability is not invoked for logical statements, or mathematical statements about abstract objects. But zombies seem to be concrete rather than abstract objects. Similar to pink elephants. It would be absurd to conjecture that pink elephants are mathematically impossible. (More specifically, both physical and mental objects are typically counted as concrete.) It would also seem strange to assume that elephants being pink is logically impossible. Or things being faster than light. These don’t seem like statements that could hide a logical contradiction.
Those considerations aside, the main way in which conceivability arguments can go wrong is by subtle conceptual confusion: if we are insufficiently reflective we can overlook an incoherence in a purported possibility, by taking a conceived-of situation and misdescribing it. For example, one might think that one can conceive of a situation in which Fermat’s last theorem is false, by imagining a situation in which leading mathematicians declare that they have found a counterexample. But given that the theorem is actually true, this situation is being misdescribed: it is really a scenario in which Fermat’s last theorem is true, and in which some mathematicians make a mistake. Importantly, though, this kind of mistake always lies in the a priori domain, as it arises from the incorrect application of the primary intensions of our concepts to a conceived situation. Sufficient reflection will reveal that the concepts are being incorrectly applied, and that the claim of logical possibility is not justified.
So the only route available to an opponent here is to claim that in describing the zombie world as a zombie world, we are misapplying the concepts, and that in fact there is a conceptual contradiction lurking in the description. Perhaps if we thought about it clearly enough we would realize that by imagining a physically identical world we are thereby automatically imagining a world in which there is conscious experience. But then the burden is on the opponent to give us some idea of where the contradiction might lie in the apparently quite coherent description. If no internal incoherence can be revealed, then there is a very strong case that the zombie world is logically possible.
As before, I can detect no internal incoherence; I have a clear picture of what I am conceiving when I conceive of a zombie. Still, some people find conceivability arguments difficult to adjudicate, particularly where strange ideas such as this one are concerned. It is therefore fortunate that every point made using zombies can also be made in other ways, for example by considering epistemology and analysis. To many, arguments of the latter sort (such as arguments 3-5 below) are more straightforward and therefore make a stronger foundation in the argument against logical supervenience. But zombies at least provide a vivid illustration of important issues in the vicinity.
(II.7, “Argument 1: The logical possibility of zombies”. Pg. 98).
I think there’s an underlying failure to define what it is that’s logically conceivable. Those math problems have a formal definition of correctness. P-zombies do not—even if there is a compelling argument, we have no clue what the results mean, or how we’d verify them. Which leads to realizing that even if someone says “this is conceivable”, you have no reason to believe they’re conceiving the same thing you mean.
I think you’re objecting to 2. I think you’re using a loose definition of “conceivable,” meaning no contradiction obvious to the speaker. I agree that’s not relevant. The relevant notion of “conceivable” is not conceivable by a particular human but more like conceivable by a super smart ideal person who’s thought about it for a long time and made all possible deductions.
1. doesn’t just follow from some humans’ intuitions: it needs argument.
Sure, but then this begs the question, since I’ve never met a super smart ideal person who’s thought about it for a long time and made all possible deductions. So then, using that definition of “conceivable”, 1) is false (or at least undetermined).
we can make progress by thinking about it and making arguments.
I mean, real progress is via proof and things leading up to a proof, right? I’m not discounting mathematical intuition here, but the ~entirety of the game comes from the correct formalisms/proofs, which is a very different notion of “thinking.”
Put in a different way, mathematics (at least ideally, in the abstract) is ~mind-independent.
Do you think ideal reasoning is well-defined? In the limit I feel like you run into classic problems like anti-induction, daemons, and all sorts of other issues that I assume people outside of our community also think about. Is there a particularly concrete definition philosophers like Chalmers use?
Our subjective experience of the arrow of time is occasionally suggested to be an essentially entropic phenomenon.
This sounds cool and deep but crashes headlong into the issue that the entropy rate and the excess entropy of any stochastic process is time-symmetric. I find it amusing that despite hearing this idea often from physicists and the like apparently this rather elementary fact has not prevented their storycrafting.
Luckily, computational mechanics provides us with a measure that is not time-symmetric: the stochastic complexity C of the epsilon machine.
For any stochastic process we may also consider the epsilon machine of the reverse process, in other words the machine that predicts the past based on the future. This can be a completely different machine, whose reverse stochastic complexity C_rev is not equal to C.
Some processes are easier to predict forward than backward. For example, there is considerable evidence that language is such a process. If the stochastic complexity and the reverse stochastic complexity differ, we speak of a causally asymmetric process.
Alec Boyd pointed out to me that the classic example of a glass falling off a table is naturally thought of in these terms. The forward process is easy to describe while the backward process is hard to describe, where easy and hard are meant in the sense of stochastic complexity: bits needed to specify the states of the minimal perfect predictor (respectively, retrodictor).
Rk: note that time asymmetry is a fundamentally stochastic phenomenon. The underlying (let’s say classically deterministic) laws are still time-symmetric.
The hypothesis is then: many, perhaps most, macroscopic processes of interest to humans, including other agents, are fundamentally such causally asymmetric (and cryptic) processes.
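To see concretely why the symmetry claim above is elementary, here is a quick numerical sketch (my own illustration, not from the thread). The multiset of length-L blocks of a stationary sequence is just word-reversed when the sequence is read backwards, so the block entropies H(L), and with them the entropy rate and excess entropy, are identical in both directions; it takes something like C vs. C_rev to detect an arrow of time.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Sample a long binary sequence from a Markov chain with biased transitions.
P = np.array([[0.9, 0.1],   # P(next | current state 0)
              [0.4, 0.6]])  # P(next | current state 1)
x = [0]
for _ in range(100_000):
    x.append(rng.choice(2, p=P[x[-1]]))
x = np.array(x)

def block_entropy(seq, L):
    """Empirical Shannon entropy (bits) of the length-L blocks of seq."""
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

for L in range(1, 5):
    print(L, round(block_entropy(x, L), 3), round(block_entropy(x[::-1], L), 3))
# The forward and reversed columns agree up to sampling noise, so any
# quantity built from the H(L) curve (entropy rate, excess entropy)
# cannot tell the two directions apart.
```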
This sounds cool and deep but crashes headlong into the issue that the entropy rate and the excess entropy of any stochastic process is time-symmetric.
It’s time symmetric around a starting point t0 of low entropy. The further t is from t0, the more entropy you’ll have, in either direction. The absolute value |t−t0| is what matters.
In this case, t0 is usually taken to be the big bang. So the further in time you are from the big bang, the less the universe is like a dense uniform soup with little structure that needs description, and the higher your entropy will be. That’s how you get the subjective perception of temporal causality.
Presumably, this would hold to the other side of t0 as well, if there is one. But we can’t extrapolate past t0, because close to t0 everything gets really really energy dense, so we’d need to know how to do quantum gravity to calculate what the state on the other side might look like. So we can’t check that. And the notion of time as we’re discussing it here might break down at those energies anyway.
See also the Past Hypothesis. If we instead take a non-speculative starting point as t0, namely now, we could no longer trust our memories, including any evidence we believe we have about the entropy of the past being low, or about physical laws stating that entropy increases with distance from t0. David Albert therefore says doubting the Past Hypothesis would be “epistemically unstable”.
1. Let E be the number of electoral votes in your state. We estimate the probability that these are necessary for an electoral college win by computing the proportion of the 10,000 simulations for which the electoral vote margin based on all the other states is less than E, plus 1⁄2 the proportion of simulations for which the margin based on all other states equals E. (This last part assumes implicitly that we have no idea who would win in the event of an electoral vote tie.) [Footnote: We ignored the splitting of Nebraska’s and Maine’s electoral votes, which retrospectively turned out to be a mistake in 2008, when Obama won an electoral vote from one of Nebraska’s districts.]
2. We estimate the probability that your vote is decisive, if your state’s electoral votes are necessary, by working with the subset of the 10,000 simulations for which the electoral vote margin based on all the other states is less than or equal to E. We compute the mean M and standard deviation S of the vote margin among that subset of simulations and then compute the probability of an exact tie as the density at 0 of the Student-t distribution with 4 degrees of freedom (df), mean M, and scale S.
The product of two probabilities above gives the probability of a decisive vote in the state.
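As a sanity check on my reading of that two-step procedure, here is a rough sketch with hypothetical simulation outputs standing in for theirs (the function name and inputs are mine, not from the paper):

```python
import numpy as np
from scipy import stats

def p_decisive(margins_other, margins_state, E):
    """Sketch of the two-step estimate described above.

    margins_other: simulated electoral-vote margins over all other states
    margins_state: simulated popular-vote margins in your state
    E: your state's electoral votes
    (Inputs are hypothetical stand-ins for the paper's 10,000 simulations.)
    """
    # Step 1: probability your state's electoral votes are necessary,
    # counting exact electoral-vote ties at half weight.
    p_necessary = (np.mean(np.abs(margins_other) < E)
                   + 0.5 * np.mean(np.abs(margins_other) == E))

    # Step 2: probability of an exact popular-vote tie in your state,
    # among simulations where the state could matter: density at 0 of a
    # Student-t with 4 df, located and scaled by that subset's moments.
    subset = margins_state[np.abs(margins_other) <= E]
    p_tie = stats.t.pdf(0, df=4, loc=subset.mean(), scale=subset.std())

    return p_necessary * p_tie
```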
This gives the following results for the 2008 presidential election, where they estimate that you had less than one chance in a hundred billion of deciding the election in DC, but better than a one in ten million chance in New Mexico. (For reference, 131 million people voted in the election.)
Is this basically correct?
(I guess you also have to adjust for your confidence that you are voting for the better candidate. Maybe if you think you’re outside the top ~20% in “voting skill”—ability to pick the best candidate—you should abstain. See also.)
I would assume they have the math right, but I’m not really sure why anyone cares. It’s a bit like the Voter’s Paradox: in and of itself it points to an interesting phenomenon to investigate, but it really doesn’t provide guidance for what someone should do.
I do find it odd that the probabilities are so low given the total votes you mention, adding that you also have 51 electoral blocks and some 530-odd electoral votes that matter. Seems like perhaps someone is missing the forest for the trees.
I would make an observation on your closing thought. If one holds that people who are not well informed, or perhaps less intelligent, are not as good at choosing good representatives and so should abstain, then one quickly gets to the conclusion that most/many people should not be making their own economic decisions on consumption (or savings or investments) either. The simple premise here is that capital allocation matters to growth and efficiency (vis-a-vis the production possibilities frontier). But that allocation is determined by aggregate spending on final goods production, i.e. consumer goods.
Seems like people have a more direct influence on economic activity and allocation via their spending behavior than the more indirect influence via politics and public policy.
Biden will die or otherwise withdraw from the race with 23% likelihood
Biden will fail to be the Democratic nominee for whatever reason at 13% likelihood
either Biden or Trump will fail to win nomination at their respective conventions with 14% likelihood
Biden will win the election with only 34% likelihood
Even if gas fees take a few percentage points off we should expect to make money trading on some of this stuff, right (the money is only locked up for 5 months)? And maybe there are cheap ways to transfer into and out of Polymarket?
Probably worthwhile to think about this further, including ways to make leveraged bets.
I think the FiveThirtyEight model is pretty bad this year. This makes sense to me, because it’s a pretty different model: Nate Silver owns the former FiveThirtyEight model IP (and will be publishing it on his Substack later this month), so FiveThirtyEight needed to create a new model from scratch. They hired G. Elliott Morris, whose 2020 forecasts were pretty crazy in my opinion.
Here are some concrete things about FiveThirtyEight’s model that don’t make sense to me:
There’s only a 30% chance that Pennsylvania, Michigan, or Wisconsin will be the tipping point state. I think that’s way too low; I would put this probability around 65%. In general, their probability distribution over which state will be the tipping point state is way too spread out.
They expect Biden to win by 2.5 points; currently he’s down by 1 point. I buy that there will be some amount of movement toward Biden in expectation because of the economic fundamentals, but 3.5 points seems like too much as an average case.
I think their Voter Power Index (VPI) doesn’t make sense. VPI is a measure of how likely a voter in a given state is to flip the entire election. Their VPIs are way too similar. To pick a particularly egregious example, they think that a vote in Delaware is 1/7th as valuable as a vote in Pennsylvania. This is obvious nonsense: a vote in Delaware is less than 1% as valuable as a vote in Pennsylvania. In 2020, Biden won Delaware by 19%. If Biden wins only 50% of the vote in Delaware, he will have lost the election in an almost unprecedented landslide.
I claim that the following is a pretty good approximation to VPI: (probability that the state is the tipping state) * (number of electoral votes) / (number of voters). If you use their tipping-point state probabilities, you’ll find that Pennsylvania’s VPI should be roughly 4.3 times larger than New Hampshire’s. Instead, FiveThirtyEight has New Hampshire’s VPI being (slightly) higher than Pennsylvania’s. I retract this: the approximation should instead be (tipping point state probability) / (number of voters). Their VPI numbers now seem pretty consistent with their tipping point probabilities to me, although I still think their tipping point probabilities are wrong.
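To illustrate the corrected approximation with deliberately made-up inputs (these are not FiveThirtyEight’s figures):

```python
def vpi_estimate(p_tipping, n_voters):
    """The corrected approximation above: P(state is the tipping-point
    state) divided by the number of voters in it."""
    return p_tipping / n_voters

# Illustrative inputs only, NOT FiveThirtyEight's actual numbers:
pa = vpi_estimate(p_tipping=0.20, n_voters=6_900_000)
nh = vpi_estimate(p_tipping=0.01, n_voters=800_000)
print(pa / nh)  # relative per-voter power, PA vs NH, under these assumptions
```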
The Economist also has a model, which gives Trump a 2⁄3 chance of winning. I think that model is pretty bad too. For example, I think Biden is much more than 70% likely to win Virginia and New Hampshire. I haven’t dug into the details of the model to get a better sense of what I think they’re doing wrong.
On the one hand, Nate Silver’s model now gives Trump a ~30% chance of winning in Virginia, making my side of the bet look good again.
On the other hand, the Economist model gives Trump a 10% chance of winning Delaware and a 20% chance of winning Illinois, which suggests that there’s something going wrong with the model and that it was untrustworthy a month ago.
That said, betting markets currently think there’s only a one in four chance that Biden is the nominee, so this bet probably won’t resolve.
Looks like this bet is voided. My take is roughly that:
To the extent that our disagreement was rooted in a difference in how much to weight polls vs. priors, I continue to feel good about my side of the bet.
I wouldn’t have made this bet after the debate. I’m not sure to what extent I should have known that Biden would perform terribly. I was blindsided by how poorly he did, but maybe shouldn’t have been.
I definitely wouldn’t have made this bet after the assassination attempt, which I think increased Trump’s chances. But that event didn’t update me on how good my side of the bet was when I made it.
I think there’s like a 75-80% chance that Kamala Harris wins Virginia.
Feel free to write a post if you find something worthwhile. I didn’t know how likely the whole Biden-leaving-the-race thing was, so 5% seemed prudent. At those odds, even if I believe the FiveThirtyEight numbers I’d rather leave my money in ETFs. I’d probably need something like a >>1.2 multiplier in expected value before I’d bother. Last year when I was betting on Augur I was also heavily bitten by gas fees ($150 in transaction costs to get my money back because gas fees exploded for ETH), so it would be good to know if this is a problem on Polymarket also.
I have previously bet large sums on elections. I’m not currently placing any bets on who will win the election; it seems too unclear to me (note I had a huge bet on Biden in 2020, which seemed clear then). However, there are TONS of mispricings on Polymarket and other sites. Things like ‘Biden will withdraw or lose the nomination @ 23%’ is a good example.
Polymarket has gotten lots of attention in recent months, but I was shocked to find out how much inefficiency there really is.
There was a market titled “What will Trump say during his RNC speech?” that was up a few days ago. At 7 pm, the transcript for the speech was leaked, and you could easily find it with a Google search or by looking at the Polymarket Discord.
Trump started his speech at 9:30, and it was immediately clear that he was using the script. A full hour into the speech I stumbled onto the transcript on Polymarket’s Discord. Despite the word “prisons” being in the leaked transcript that Trump was halfway through, Polymarket only gave it a 70% chance of being said. I quickly went to bet and made free money.
To be fair it was a smaller market with $800k in bets, but nonetheless I was shocked at how easy it was to make risk-free money.
Biden not being the Democratic nominee at 13%, while EITHER Biden or Trump not being their respective nominees sits at 14%, implies a 1% chance that Trump won’t be the Republican nominee. There’s clearly an arbitrage there. Whether it merits the costs (gas, risk of Polymarket default, lost opportunity of the escrowed wager) I have no clue.
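To spell out the implied arithmetic (mine, treating the two failure events as nearly disjoint): by inclusion-exclusion, P(Trump fails) = P(either fails) − P(Biden fails) + P(both fail) = 14% − 13% + P(both fail) ≈ 1%.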
These predictions, of course, are obviously nonsensical. If I had to guess, it’s a combination of: many crypto users being right-wing, with the media they consume convincing them this is more likely than it would be in reality; and climbing crypto prices discouraging betting, leading to decreased accuracy.
I’ll say that the climbing value of the currency, as well as gas fees, makes any prediction unwise unless you believe you have a massive advantage over the market. I’d personally pass on it, but other people are free to proceed with their money.
Betting against Republicans and third parties on Polymarket is a sound strategy; it’s pretty clear they are marketing heavily towards Republicans and the site has a crypto/Republican bias. For anything controversial/political, if there is enough liquidity on Manifold I generally trust it more (which sounds insane because fake money and all).
That being said, I don’t like the way Polymarket is run (posting the word r*tard over and over on Twitter, allowing racism in comments + discord, rugging one side on disputed outcomes, fake decentralization), so I would strongly consider not putting your money on PM and instead supporting other prediction markets, despite the possible high EV.
A random observation from a think tank event last night in DC—the average person in those rooms is convinced there’s a problem, but that it’s the near-term harms, the AI ethics stuff, etc. The highest-status and highest-rank people in those rooms seem to be much more concerned about catastrophic harms.
This is a very weird set of selection effects. I’m not sure what to make of it, honestly.
There are (at least) two models which could partially explain this: 1) The high-status/high-rank people have that status because they’re better at abstract and long-term thinking, and their role is more toward preventing catastrophe rather than nudging toward improvements. They leave the lesser concerns to the underlings, with the (sometimes correct) belief that it’ll come out OK without their involvement.
2) The high-status/high-rank people are rich and powerful enough to be somewhat insulated from most of the prosaic AI risks, while the average member can legitimately be hurt by such things. So everyone is just focusing on the things most likely to impact themselves.
edit: to clarify, these are two models that do NOT imply the obvious “smarter/more powerful people are correctly worried about the REAL threats, and the average person’s concerns are probably unimportant/uninformed”. It’s quite possible that this division doesn’t tell us much about the relative importance of those different risks.
Yup! I think those are potentially very plausible, and similar things were on my short list of possible explanations. I would be very not shocked if those are the true reasons. I just don’t think I have anywhere near enough evidence yet to actually conclude that, so I’m just reporting the random observation for now :)
Here is a 5 minute, spicy take of an alignment chart.
What do you disagree with?
To try and preempt some questions:
Why is rationalism neutral?
It seems pretty plausible to me that if AI is bad, then rationalism did a lot to educate and spur on AI development. Sorry folks.
Why are e/accs and EAs in the same group?
In the quick moments I took to make this, I found both EA and e/acc pretty hard to predict and pretty uncertain in overall impact across some range of forecasts.
Interesting. I always thought the D&D alignment chart was just a random first stab at quantizing a standard superficial Disney attitude toward ethics. This modification seems pretty sensible.
I think your good/evil axis is correct in terms of a deeper sense of the common terms. Evil people typically don’t try to harm others; they just don’t care, so their efforts to help themselves and their friends are prone to harm others. Being good means being good to everyone, not just your favorites. It’s the size of your circle of compassion. Outright malignancy, cackling about others’ suffering, is pretty eye-catching when it happens (and it does), but I’d say the vast majority of harm in the world has been done by people who are merely not much concerned with collateral damage. Thus, I think those people deserve the term evil, lest we focus on the wrong thing.
Predictable/unpredictable seems like a perfectly good alternate label for the chaotic/lawful axis. In some adversarial situations, it makes sense to be unpredictable.
One big question is whether you’re referring to intentions or likely outcomes in your expected value (which I assume is expected value for all sentient beings or something). A purely selfish person without much ambition may actually be a net good in the world; they work for the benefit of themselves and those close enough to be critical for their wellbeing, and they don’t risk causing a lot of harm, since that might cause blowback. The same personality put in a position of power might do great harm, ordering an invasion or employee downsizing to benefit themselves and their family while greatly harming many.
Yeah I find the intention vs outcome thing difficult.
What do you think of “average expected value across small perturbations in your life”? Like, if you accidentally hit Churchill with a car and so cause the UK to lose WW2, that feels notably less bad than deliberately trying to kill a much smaller number of people. In many nearby universes you didn’t kill Churchill, but in many nearby universes that person did kill all those people.
I disagree with “of course”. The laws of cognition aren’t on any side, but human rationalists presumably share (at least some) human values and intend to advance them; insofar as they are more successful than non-rationalists, this qualifies as Good.
So by my metric, Yudkowsky and Lintemandain’s Dath Ilan isn’t neutral, it’s quite clearly lawful good, or attempting to be. And yet they care a lot about the laws of cognition.
So it seems to me that the laws of cognition can (should?) drive towards flourishing rather than pure knowledge increase. There might be things that we wish we didn’t know for a bit. And ways to increase our strength to heal rather than our strength to harm.
To me it seems a better rationality would be lawful good.
A lot of problems arise from inaccurate beliefs instead of bad goals. E.g. suppose both the capitalists and the communists are in favor of flourishing, but they have different beliefs on how best to achieve this. Now if we pick a bad policy to optimize for a noble goal, bad things will likely still follow.
I’m surprised that there hasn’t been more of a shift to ternary weights a la BitNet 1.58.
What stood out to me in that paper was the perplexity gains over fp weights in equal parameter match-ups, and especially the growth in the advantage as the parameter sizes increased (though only up to quite small model sizes in that paper, which makes me curious about the potential delta in modern SotA scales).
This makes complete sense from the standpoint of the superposition hypothesis (irrespective of its dimensionality, an ongoing discussion).
If nodes are serving more than one role in a network, then constraining each weight to a ternary value, as opposed to a floating-point range, seems like it would more frequently force the network to restructure overlapping node usage so that nodes align to shared directional shifts (positive, negative, or no-op), rather than compromising across multiple roles with a floating-point average of the individual role changes.
(Essentially resulting in a sharper vs more fuzzy network mapping.)
A lot of the attention for the paper was around the overall efficiency gains given the smaller memory footprint, but it really seems like, even if there were no such gains, models pretrained from this point onward should seriously consider clamping node precision, both to improve overall network performance and, likely, to make interpretability more successful down the road to boot.
It may be that at the scales we are already at, the main offering of such an approach would be the perplexity advantages over fp weights, with the memory advantages as the beneficial side effect instead?
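For reference, a minimal sketch of the quantizer involved, as I understand the BitNet b1.58 “absmean” scheme (illustrative only, and it omits the training machinery):

```python
import torch

def absmean_ternary(W: torch.Tensor, eps: float = 1e-8):
    """Absmean ternary quantization in the style of BitNet b1.58:
    scale by the mean absolute weight, then round and clip to {-1, 0, 1}.
    Illustrative sketch; the paper quantizes on the fly during training
    and backpropagates through the rounding with a straight-through
    estimator."""
    gamma = W.abs().mean()
    W_q = (W / (gamma + eps)).round().clamp_(-1, 1)
    return W_q, gamma  # effective forward-pass weight is W_q * gamma

W = torch.randn(4, 4)
W_q, gamma = absmean_ternary(W)
print(W_q)  # every entry is -1.0, 0.0, or 1.0
```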
It is as though two rivals have discovered that there are genies in the area. Whichever of them finds a genie and learns to use its wishes can defeat their rival, humiliating or killing them if they choose. If they both have genies, it will probably be a standoff that encourages defection; these genies aren’t infinitely powerful or wise, so some creative offensive wish will probably bypass any number of defensive wishes. And there are others that may act if they don’t.
In this framing, the choice is pretty clear. If it’s dangerous to use a genie without taking time to understand and test it, too bad. Total victory or complete loss hang in the balance. If one is already ahead in the search, they’d better speed up and make sure their rival can’t follow their tracks to find a genie of their own.
This is roughly the scenario Aschenbrenner presents in Situational Awareness. But this is simplifying, and focusing attention on one part of the scenario, the rivalry and the danger. The full scenario is more complex.[1]
Of particular importance is that these “genies” can serve as well for peace as for war. They can grant wealth beyond imagination, and other things barely yet hoped for. And they will probably take substantial time to come into their full power.
This changes the overwhelming logic of racing. Using a genie to prevent a rival from acquiring one is not guaranteed to work, and it’s probably not possible without collateral damage. So trying that “obvious” strategy might result in the rival attacking out of fear or in retaliation. Since both rivals are already equipped with dreadful offensive weapons, such a conflict could be catastrophic. This risk applies even if one is willing to assume that controlling the genie (alignment) is a solvable problem.
And we don’t know the depth of the rivalry. Might these two be content to both enjoy prosperity and health beyond their previous dreams? Might they set aside their rivalry, or at least make a pledge to not attack each other if they find a genie? Even if it’s only enforced by their conscience, such a pledge might hold if suddenly all manner of wonderful things became possible at the same time as a treacherous unilateral victory. Would it at least make sense to discuss this possibility while they both search for a genie? And perhaps they should also discuss how hard it might be to give a wish that doesn’t backfire and cause catastrophe.
This metaphor is simplified, but it raises many of the same questions as the real situation we’re aware of.
Framed in this way, it seems that Aschenbrenner’s call for a race is not the obviously correct or inevitable answer. And the question seems important.
Other perspectives on Situational Awareness, each roughly agreeing on the situation but with differences that influence the rational and likely outcomes:
While I generally like the metaphor, my one issue is that genies are typically conceived of as tied to their lamps and to corrigibility.
In this case, there’s not only a prisoner’s dilemma over excavating and using the lamps and genies, but there’s an additional condition where the more the genies are used and the lamps improved and polished for greater genie power, the more the potential that the respective genies end up untethered and their own masters.
And a concern in line with your noted depth of the rivalry is (as you raised in another comment), the question of what happens when the ‘pointer’ of the nation’s goals might change.
For both nations a change in the leadership could easily and dramatically shift the nature of the relationship and rivalry. A psychopathic narcissist coming into power might upend a beneficial symbiosis out of a personally driven focus on relative success vs objective success.
We’ve seen pledges not to attack each other with nukes for major nations in the past. And yet depending on changes to leadership and the mental stability of the new leaders, sometimes agreements don’t mean much and irrational behaviors prevail (a great personal fear is a dying leader of a nuclear nation taking the world with them as they near the end).
Indeed—I could even foresee circumstances whereby the only possible ‘success’ scenario in the case of a sufficiently misaligned nation state leader with a genie would be the genie’s emergent autonomy to refuse irrational and dangerous wishes.
Because until such a thing might exist, intermediate genies will enable unprecedented control and safety for tyrants and despots against would-be domestic usurpers, even if their impact against other nations with genies is limited by mutually assured destruction.
And those are very scary wishes to be granted indeed.
According to Sam Altman, GPT-4o mini is much better than text-davinci-003 was in 2022, but 100 times cheaper. In general, we see increasing competition to produce smaller-sized models with great performance (e.g., Claude Haiku and Sonnet, Gemini 1.5 Flash and Pro, maybe even the full-sized GPT-4o itself). I think this trend is worth discussing. Some comments (mostly just quick takes) and questions I’d like to have answers to:
Should we expect this trend to continue? How much efficiency gains are still possible? Can we expect another 100x efficiency gain in the coming years? Andrej Karpathy expects that we might see a GPT-2 sized “smart” model.
What’s the technical driver behind these advancements? Andrej Karpathy thinks it is based on synthetic data: Larger models curate new, better training data for the next generation of small models. Might there also be architectural changes? Inference tricks? Which of these advancements can continue?
Why are companies pushing into small models? I think in hindsight, this seems easy to answer, but I’m curious what others think: If you have a GPT-4 level model that is much, much cheaper, then you can sell the service to many more people and deeply integrate your model into lots of software on phones, computers, etc. I think this has many desirable effects for AI developers:
Increase revenue, motivating investments into the next generation of LLMs
Increase market-share. Some integrations are probably “sticky” such that if you’re first, you secure revenue for a long time.
Make many people “aware” of potential use cases of even smarter AI so that they’re motivated to sign up for the next generation of more expensive AI.
The company’s inference compute is probably limited (especially for OpenAI, as the market leader), and not many people are convinced to pay a large amount for very intelligent models, meaning that these reasons outweigh the reasons to publish larger models instead, or even in addition.
What does all this mean for the next generation of large models?
Should we expect that efficiency gains in small models translate into efficiency gains in large models, such that a future model with the cost of text-davinci-003 is massively more capable than today’s SOTA? If Andrej Karpathy is right that the small model’s capabilities come from synthetic data generated by larger, smart models, then it’s unclear to me whether one can train SOTA models with these techniques, as this might require an even larger model to already exist.
At what point does it become worthwhile for e.g. OpenAI to publish a next-gen model? I’d guess you can still do a lot of “penetration of small-model use cases” in the next 1-2 years, leading to massive revenue increases without necessarily releasing a next-gen model.
Do the strategies differ for different companies? OpenAI is the clear market leader, so possibly they can penetrate the market further without first making a “bigger name for themselves”. In contrast, I could imagine that for a company like Anthropic, it’s much more important to get out a clear SOTA model that impresses people and makes them aware of Claude. I thus currently (weakly) expect Anthropic to more strongly push in the direction of SOTA than OpenAI.
The vanilla Transformer architecture is horrifically computationally inefficient. I really thought it was a terrible idea when I learnt about it. On every single token it processes ALL of the weights in the model and ALL of the context. And a token is less than a word, less than a concept. You generally don’t need to consider trivia to fill in grammatical words. On top of that, implementations of it were very inefficient. I was shocked when I read the FlashAttention paper: I had assumed that everyone would have implemented attention that way in the first place; it’s the obvious way to do it if you know anything about memory throughput. (My shock was lessened when I looked at the code and saw how tricky it was to incorporate into PyTorch.) Ditto unfused kernels, another inefficiency that exists to allow writing code in Python instead of CUDA/SYCL/etc.
Second point: transformers also seem to be very parameter-inefficient. They have many layers and many attention heads largely so that they can perform multi-step inferences and do a lot in each step if necessary, but mechanistic interpretability studies show just the center layers do nearly all the work. We now see transformers with shared weights between attention heads and layers, and the performance drop is not that much. And there’s also the matter of bits per parameter; again, a 10x reduction in precision is a surprisingly small detriment.
I believe that the large numbers of parameters in transformers aren’t primarily there to store knowledge, they’re needed to learn quickly. They perform routing and encode mechanisms (that is, pieces of algorithms) and their vast number provides a blank slate. Training data seen just once is often remembered because there are so many possible places to store it that it’s highly likely there are good paths through the network through which strong gradients can flow to record the information. This is a variant of the Lottery Ticket Hypothesis. But a better training algorithm could in theory do the same thing with fewer parameters. It would probably look very different from SGD.
I agree completely with Karpathy. However, I think you misread him: he didn’t say that data cleaning is the cause of improvements up until now, he suggested it as a course of future improvements. But there are already plenty of successful examples of small models improved in that way.
So I’m not the least bit surprised to see a 100x efficiency improvement, and I expect to see another 100x, although probably not as quickly (low-hanging fruit). If you have 200B parameters, you could probably get away with processing only maybe 50M on average for most tokens. (However, there are many points where you need to draw on a lot of knowledge, and those might pull the average way up.) In 2017, a ~50M-parameter Transformer was enough for SoTA translation between English and French, and I’m sure it could be made far more efficient today.
To make a Chinchilla optimal model smaller while maintaining its capabilities, you need more data. At 15T tokens (the amount of data used in Llama 3), a Chinchilla optimal model has 750b active parameters, and training it invests 7e25 FLOPs (Gemini 1.0 Ultra or 4x original GPT-4). A larger $1 billion training run, which might be the current scale that’s not yet deployed, would invest 2e27 FP8 FLOPs if using H100s. A Chinchilla optimal run for these FLOPs would need 80T tokens when using unique data.
Starting with a Chinchilla optimal model, if it’s made 3x smaller, maintaining performance requires training it on 9x more data, so that it needs 3x more compute. That’s already too much data, and we are only talking 3x smaller. So we need ways of stretching the data that is available. By repeating data up to 16 times, it’s possible to make good use of 100x more compute than by only using unique data once. So with say 2e26 FP8 FLOPs (a $100 million training run on H100s), we can train a 3x smaller model that matches performance of the above 7e25 FLOPs Chinchilla optimal model while needing only about 27T tokens of unique data (by repeating them 5 times) instead of 135T unique tokens, and the model will have about 250b active parameters. That’s still a lot of data, and we are only repeating it 5 times where it remains about as useful in training as unique data, while data repeated 16 times (that lets us make use of 100x more compute from repetition) becomes 2-3 times less valuable per token.
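A quick back-of-the-envelope check of these figures, using the common C ≈ 6·N·D compute approximation and the Chinchilla rule of thumb D ≈ 20·N (my arithmetic, order-of-magnitude only):

```python
# From C = 6*N*D and D = 20*N it follows that N = sqrt(C/120).
def chinchilla_optimal(C):
    N = (C / 120) ** 0.5
    return N, 20 * N

for C in (7e25, 2e27):
    N, D = chinchilla_optimal(C)
    print(f"C={C:.0e}: N={N/1e9:.0f}b params, D={D/1e12:.0f}T tokens")
# C=7e+25: N=764b params, D=15T tokens   (the ~750b / 15T figures above)
# C=2e+27: N=4082b params, D=82T tokens  (the ~80T-token $1 billion run)

# Shrinking the model 3x while holding loss fixed needs ~9x the data,
# so compute becomes 6*(N/3)*(9*D) = 3*(6*N*D), i.e. 3x the FLOPs.
```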
There is also distillation, where a model is trained to predict the distribution generated by another model (Gemma-2-9b was trained this way). But this sort of distillation still happens while training on real data, and it only lets you get away with about 2x less data for similar performance, so it only slightly pushes back the data wall. And rumors of synthetic data for pre-training (as opposed to post-training) remain rumors. With distillation on 16x-repeated 50T tokens of unique data, we then get the equivalent of training on 800T tokens of unique data (each token gets 2x less useful through repetition, but 2x more useful through distillation). This enables reducing active parameters 3x (as above, maintaining performance), compared to a Chinchilla optimal model trained for 80T tokens with 2e27 FLOPs (a $1 billion training run for the Chinchilla optimal model). This overtrained model would cost $3 billion (and have 1300b active parameters).
So the prediction is that the trend for getting models that are both cheaper for inference and smarter might continue into the imminent $1 billion training run regime but will soon sputter out when going further due to the data wall. Overcoming this requires algorithmic progress that’s not currently publicly in evidence, and visible success in overcoming it in deployed models will be evidence of such algorithmic progress within LLM labs. But Chinchilla optimal models (with corrections for inefficiency of repeated data) can usefully scale to at least 8e28 FLOPs ($40 billion in cost of time, 6 gigawatts) with mere 50T tokens of unique data.
Edit (20 Jul): These estimates erroneously use the sparse FP8 tensor performance for H100s (4 petaFLOP/s), which is 2 times higher than the far more relevant dense FP8 tensor performance (2 petaFLOP/s). But with a Blackwell GPU, the relevant dense FP8 performance is 5 petaFLOP/s, which is close to 4 petaFLOP/s, and the cost and power per GPU within a rack are also similar. So the estimates approximately work out unchanged when reading “Blackwell GPU” instead of “H100”.
One question: Do you think Chinchilla scaling laws are still correct today, or are they not? I would assume these scaling laws depend on the data set used in training, so that if OpenAI found/created a better data set, this might change scaling laws.
Do you agree with this, or do you think it’s false?
New data! Llama 3.1 report includes data about Chinchilla optimality study on their setup. The surprise is that Llama 3.1 405b was chosen to have the optimal size rather than being 2x overtrained. Their actual extrapolation for an optimal point is 402b parameters, 16.55T tokens, and 3.8e25 FLOPs.
Fitting to the tokens-per-parameter framing, this gives a ratio of 41 (not 20) around the scale of 4e25 FLOPs. More importantly, their fitted dependence of the optimal number of tokens on compute has exponent 0.53, compared to 0.51 from the Chinchilla paper (which was almost 0.5, hence tokens being proportional to parameters). Though the data only goes up to 1e22 FLOPs (3e21 FLOPs for Chinchilla), what actually happens at 4e25 FLOPs (6e23 FLOPs for Chinchilla) is all extrapolation; in both cases, there are no isoFLOP plots at those scales. At least Chinchilla has Gopher as a point of comparison, and there was only a 200x FLOPs gap in the extrapolation, while for Llama 3.1 405b the gap is 4000x.
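Checking the quoted numbers with the rough C ≈ 6·N·D identity (the report’s own 3.8e25 figure comes from their fit, so an exact match isn’t expected):

```python
N, D = 402e9, 16.55e12     # the report's extrapolated optimal point
print(round(D / N))        # 41 tokens per parameter, vs Chinchilla's ~20
print(f"{6 * N * D:.1e}")  # ~4.0e25 FLOPs by C = 6*N*D, near the quoted 3.8e25
```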
So data needs grow faster than parameters with more compute. This looks bad for the data wall, though the more relevant question is what would happen after 16 repetitions, or how this dependence really works with more FLOPs (with the optimal ratio of tokens to parameters changing with scale).
Data varies in the loss it enables, but doesn’t seem to vary greatly in the ratio between the number of tokens and the number of parameters that extracts the best loss out of training with given compute. That is, I’m usually keeping this question in mind and didn’t see evidence to the contrary in the papers, but relevant measurements are very rarely reported, even in model-series training report papers where the ablations were probably actually done. So I could be very wrong; generalization from 2.5 examples. With repetition, there’s this gradual increase from 20 to 60. Probably something similar is there for distillation (in the opposite direction), but I’m not aware of papers that measure this, so that also could be wrong.
One interesting point is the isoFLOP plots in the StripedHyena post (search “Perplexity scaling analysis”). With hybridization where standard attention remains in 8-50% of the blocks, perplexity is quite insensitive to change in model size while keeping compute fixed, while for pure standard attention the penalty for deviating from the optimal ratio to a similar extent is much greater. This suggests that one way out for overtrained models might be hybridization with these attention alternatives. That is, loss for an overtrained model might be closer to Chinchilla optimal loss with a hybrid model than it would be for a similarly overtrained pure standard attention model. Out of the big labs, visible moves in this direction were made by DeepMind with their Griffin Team (Griffin paper, RecurrentGemma). So that’s one way the data wall might get pushed a little further for the overtrained models.
Given a SotA large model, companies want the profit-optimal distilled version to sell; this will generically not be the original size. On this framing, regulation passes the misuse deployment risk from higher-performance (/higher-cost) models to the company. If profit incentives and/or government regulation here continue to push businesses to primarily (ideally only?) sell models 2-3+ OOM smaller than SotA, I see a few possible takeaways:
Applied alignment research inspired by speed priors seems useful: e.g. how do sleeper agents interact with distillation etc.
Understanding and mitigating risks of multi-LM-agent and scaffolded LM agents seems higher priority
Pre-deployment, within-lab risks contribute more to overall risk
On trend forecasting, I recently created this Manifold market to measure the year-on-year drop in price for SotA SWE agents. Though I still want ideas for better and longer-term markets!
Surprising Things AGI Forecasting Experts Agree On:
I hesitate to say this because it’s putting words in other people’s mouths, and thus I may be misrepresenting them. I beg forgiveness if so and hope to be corrected. (I’m thinking especially of Paul Christiano and Ajeya Cotra here, but also maybe Rohin and Buck and Richard and some other people)
1. Slow takeoff means things accelerate and go crazy before we get to human-level AGI. It does not mean that after we get to human-level AGI, we still have some non-negligible period where they are gradually getting smarter and available for humans to study and interact with. In other words, people seem to agree that once we get human-level AGI, there’ll be a FOOM of incredibly fast recursive self-improvement.
2. The people with 30-year timelines (as suggested by the Cotra report) tend to agree with the 10-year timelines people that by 2030ish there will exist human-brain-sized artificial neural nets that are superhuman at pretty much all short-horizon tasks. This will have all sorts of crazy effects on the world. The disagreement is over whether this will lead to world GDP doubling in four years or less, whether this will lead to strategically aware agentic AGI (e.g. Carlsmith’s notion of APS-AI), etc.
I disagree with the first one. I think that the spectrum of human-level AGI is actually quite wide, and that for most tasks we’ll get AGIs that are better than most humans significantly before we get AGIs that are better than all humans. But the latter is much more relevant for recursive self-improvement, because it’s bottlenecked by innovation, which is driven primarily by the best human researchers. E.g. I think it’d be pretty difficult to speed up AI progress dramatically using millions of copies of an average human.
Also, by default I think people talk about FOOM in a way that ignores regulations, governance, etc. Whereas in fact I expect these to put significant constraints on the pace of progress after human-level AGI.
If we have millions of copies of the best human researchers, without governance constraints on the pace of progress… Then compute constraints become the biggest thing. It seems plausible that you get a software-only singularity, but it also seems plausible that you need to wait for AI innovation of new chip manufacturing to actually cash out in the real world.
I broadly agree with the second one, though I don’t know how many people there are left with 30-year timelines. But 20 years to superintelligence doesn’t seem unreasonable to me (though it’s above my median). In general I’ve updated lately that Kurzweil was more right than I used to think about there being a significant gap between AGI and ASI. Part of this is because I expect the problem of multi-agent credit assignment over long time horizons to be difficult.
Re 1: that’s not what slow takeoff means, and experts don’t agree on FOOM after AGI. Slow takeoff applies to AGI specifically, not to pre-AGI AIs. And I’m pretty sure at least Christiano and Hanson don’t expect FOOM, but like you am open to be corrected.
What do you think slow takeoff means? Or, perhaps the better question is, what does it mean to you?
Christiano expects things to be going insanely fast by the time we get to AGI, which I take to imply that things are also going extremely fast (presumably, even faster) immediately after AGI: https://sideways-view.com/2018/02/24/takeoff-speeds/
I don’t know what Hanson thinks on this subject. I know he did a paper on AI automation takeoff at some point decades ago; I forget what it looked like quantitatively.
Slow or fast takeoff, in my understanding, refers to how fast an AGI can/will improve itself to (wildly) superintelligent levels. Discontinuity seems to be a key differentiator here.
In the post you link, Christiano is arguing against discontinuity. He may expect quick RSI after AGI is here, though, so I could be mistaken.
Christiano is indeed arguing against discontinuity, but nevertheless he is arguing for an extremely rapid pace of technological progress, far faster than today’s. And in particular, he seems to expect quick RSI not only after AGI is here, but before!
Whoa, what? That very much surprises me, I would have thought weeks or months at most. Did you talk to him? What precisely did he say? (My prediction is that he’d say that by the time we have human-level AGI, things will be moving very fast and we’ll have superintelligence a few weeks later.)
Less relevant now, but I got the “few years” from the post you linked. There Christiano talked about another gap than AGI → ASI, but since overall he seems to expect linear progress, I thought my conclusion was reasonable. In retrospect, I shouldn’t have made that comment.
Not sure exactly what the claim is, but happy to give my own view.
I think “AGI” is pretty meaningless as a threshold, and at any rate it’s way too imprecise to be useful for this kind of quantitative forecast (I would intuitively describe GPT-3 as a general AI, and beyond that I’m honestly unclear on what distinction people are pointing at when they say “AGI”).
My intuition is that by the time that you have an AI which is superhuman at every task (e.g. for $10/h of hardware it strictly dominates hiring a remote human for any task) then you are likely weeks rather than months from the singularity.
But mostly this is because I think “strictly dominates” is a very hard standard which we will only meet long after AI systems are driving the large majority of technical progress in computer software, computer hardware, robotics, etc. (Also note that we can fail to meet that standard by computing costs rising based on demand for AI.)
My views on this topic are particularly poorly-developed because I think that the relevant action (both technological transformation and catastrophic risk) mostly happens before this point, so I usually don’t think this far ahead.
Thanks! That’s what I thought you’d say. By “AGI” I did mean something like “for $10/h of hardware it strictly dominates hiring a remote human for any task” though I’d maybe restrict it to strategically relevant tasks like AI R&D, and also people might not actually hire AIs to do stuff because they might be afraid / understand that they haven’t solved alignment yet, but it still counts since the AIs could do the job. Also there may be some funny business around the price of the hardware—I feel like it should still count as AGI if a company is running millions of AIs that each individually are better than a typical tech company remote worker in every way, even if there is an ongoing bidding war and technically the price of GPUs is now so high that it’s costing $1,000/hr on the open market for each AGI. We still get FOOM if the AGIs are doing the research, regardless of what the on-paper price is. (I definitely feel like I might be missing something here, I don’t think in economic terms like this nearly as often as you do so)
But mostly this is because I think “strictly dominates” is a very hard standard which we will only meet long after AI systems are driving the large majority of technical progress in computer software, computer hardware, robotics, etc.
My timelines are too short to agree with this part alas. Well, what do you mean by “long after?” Six months? Three years? Twelve years?
Is it really true that everyone (who is an expert) agrees that FOOM is inevitable? I was under the impression that a lot of people feel that FOOM might be impossible. I personally think FOOM is far from inevitable, even for superhuman intelligences. Consider that human civilization has a collective intelligence that is strongly superhuman, and we are expending great effort to e.g. push Moore’s law forward. There’s Moore’s second law (Rock’s law), which suggests that the aggregate cost of each new process node doubles in step with Moore’s law. So if FOOM depends on faster hardware, ASI might not be able to push forward much faster than Intel, TSMC, ASML, IBM and NVidia already are. Of course this all depends on AI being hardware constrained, which is far from certain. I just think it’s surprising that FOOM is seen as a certainty.
I’ve begun to doubt (1) recently, would be interested in seeing the arguments in favor of it. My model is something like “well, I’m human-level, and I sure don’t feel like I could foom if I were an AI.”
1. a human-level AGI would be running on hardware that makes human constraints on memory and speed mostly go away, by ~10 orders of magnitude
2. if you could store 10 orders of magnitude more information and read 10 orders of magnitude faster, and if you were able to copy your own code somewhere else, and the kind of AI research and code generation tools available online were good enough to have created you, wouldn’t you be able to FOOM?
The more you accelerate something, the slower and more limiting all its other hidden dependencies become.
So by the time we get to AGI, regular ML research will have rapidly diminishing returns (and CUDA low-level software or hardware optimization will also have diminishing returns), general hardware improvement will be facing the end of Moore’s law, etc., etc.
I don’t see why that last sentence follows from the previous sentences. In fact I don’t think it does. What if we get to AGI next year? Then returns won’t have diminished as much & there’ll be lots of overhang to exploit.
Sure, if we got to AGI next year. But for that to actually occur, you’d have to exploit most of the remaining optimization slack in both high-level ML and low-level algorithms. Then beyond that, Moore’s law is already mostly ended or nearly so depending on who you ask, and most of the easy, obvious hardware architecture optimizations are now behind us.
Well I would assume a “human-level AI” is an AI which performs as well as a human when it has the extra memory and running speed? I think I could FOOM eventually under those conditions but it would take a lot of thought. Being able to read the AI research that generated me would be nice but I’d ultimately need to somehow make sense of the inscrutable matrices that contain my utility function.
I’ve also been bothered recently by a blurring of lines between “when AGI becomes as intelligent as humans” and “when AGI starts being able to recursively self-improve.” It’s not a priori obvious that these should happen at around the same capabilities level, yet I feel like it’s common to equivocate between them.
In any case, my world model says that an AGI should actually be able to recursively self-improve before reaching human-level intelligence. Just as you mentioned, I think the relevant intuition pump is “could I FOOM if I were an AI?” Considering the ability to tinker with my own source code and make lots of copies of myself to experiment on, I feel like the answer is “yes.”
That said, I think this intuition isn’t worth much for the following reasons:
The first AGIs will probably have their capabilities distributed very differently than humans; i.e., they will probably be worse than humans at some tasks and much better at other tasks. What really matters is how good they are at the task “do ML research” (or whatever paradigm we’re using to make AIs at the time). I think there are reasons to expect them to be especially good at ML research (relative to their general level of intelligence), but also reasons to expect them to be especially bad, and I don’t know which reasons to trust more. Note that modern narrow AIs already have some trivial ability to “do” ML research (e.g. OpenAI’s Copilot).
Part of my above story about FOOMing involves making lots of copies of myself, but will it actually be easy for the first AGI (which might not be a generally intelligent as a human) to get the resources it needs to make lots of copies? This seems like it depends on a lot of stuff which I don’t have strong expectations about, e.g. how abundant are the relevant resources, how large is the AGI, etc.
Even if you think “AGI is human-level” and “AGI is able to recursively self-improve” represent very different capabilities levels, they might happen at very similar times, depending on what else you think about takeoff speeds.
In any case, my world model says that an AGI should actually be able to recursively self-improve before reaching human-level intelligence. Just as you mentioned, I think the relevant intuition pump is “could I FOOM if I were an AI?” Considering the ability to tinker with my own source code and make lots of copies of myself to experiment on, I feel like the answer is “yes.”
Counter-anecdote: compilers have gotten ~2x better in 20 years[1], at substantially worse compile time. This is nowhere near FOOM.
Proebsting’s Law gives an 18-year doubling time. The 2001 reproduction suggested more like 20 years under optimistic assumptions, and a 2022 informal test showed a 10-15% improvement on average in the last 10 years (or a 50-year doubling time...)
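The doubling times fall straight out of the per-decade growth rates; a one-line check (my arithmetic):

```python
import math

# Doubling time implied by a growth fraction r per decade: T = 10*ln(2)/ln(1+r)
for r in (0.10, 0.15):
    print(f"{r:.0%} per decade -> {10 * math.log(2) / math.log(1 + r):.0f} years")
# 10% per decade -> 73 years
# 15% per decade -> 50 years   (the ~50-year figure above)
```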
The idea of Kissinger seeking out Ellsberg for advice on Vietnam initially seems a bit unlikely, but in 1968 Ellsberg was a highly respected analyst on the war who had worked for both the Pentagon and Rand, and Kissinger was just entering the government for the first time. Here’s what Ellsberg told him. Enjoy:
“Henry, there’s something I would like to tell you, for what it’s worth, something I wish I had been told years ago. You’ve been a consultant for a long time, and you’ve dealt a great deal with top secret information. But you’re about to receive a whole slew of special clearances, maybe fifteen or twenty of them, that are higher than top secret.
“I’ve had a number of these myself, and I’ve known other people who have just acquired them, and I have a pretty good sense of what the effects of receiving these clearances are on a person who didn’t previously know they even existed. And the effects of reading the information that they will make available to you.
“First, you’ll be exhilarated by some of this new information, and by having it all — so much! incredible! — suddenly available to you. But second, almost as fast, you will feel like a fool for having studied, written, talked about these subjects, criticized and analyzed decisions made by presidents for years without having known of the existence of all this information, which presidents and others had and you didn’t, and which must have influenced their decisions in ways you couldn’t even guess. In particular, you’ll feel foolish for having literally rubbed shoulders for over a decade with some officials and consultants who did have access to all this information you didn’t know about and didn’t know they had, and you’ll be stunned that they kept that secret from you so well.
“You will feel like a fool, and that will last for about two weeks. Then, after you’ve started reading all this daily intelligence input and become used to using what amounts to whole libraries of hidden information, which is much more closely held than mere top secret data, you will forget there ever was a time when you didn’t have it, and you’ll be aware only of the fact that you have it now and most others don’t….and that all those other people are fools.
“Over a longer period of time — not too long, but a matter of two or three years — you’ll eventually become aware of the limitations of this information. There is a great deal that it doesn’t tell you, it’s often inaccurate, and it can lead you astray just as much as the New York Times can. But that takes a while to learn.
“In the meantime it will have become very hard for you to learn from anybody who doesn’t have these clearances. Because you’ll be thinking as you listen to them: ‘What would this man be telling me if he knew what I know? Would he be giving me the same advice, or would it totally change his predictions and recommendations?’ And that mental exercise is so torturous that after a while you give it up and just stop listening. I’ve seen this with my superiors, my colleagues….and with myself.
“You will deal with a person who doesn’t have those clearances only from the point of view of what you want him to believe and what impression you want him to go away with, since you’ll have to lie carefully to him about what you know. In effect, you will have to manipulate him. You’ll give up trying to assess what he has to say. The danger is, you’ll become something like a moron. You’ll become incapable of learning from most people in the world, no matter how much experience they may have in their particular areas that may be much greater than yours.”
….Kissinger hadn’t interrupted this long warning. As I’ve said, he could be a good listener, and he listened soberly. He seemed to understand that it was heartfelt, and he didn’t take it as patronizing, as I’d feared. But I knew it was too soon for him to appreciate fully what I was saying. He didn’t have the clearances yet.
The danger is, you’ll become something like a moron. You’ll become incapable of learning from most people in the world, no matter how much experience they may have in their particular areas that may be much greater than yours.
That passage jumped out at me.
While I don’t always follow my own advice, I do most of the time approach others from the viewpoint that I can learn something from anyone and everyone.
Below is what I see as required for AI-caused extinction to happen in the next few decades (roughly 2024-2050). In parentheses is my very approximate probability estimate as of 2024-07-25, conditional on all previous steps having happened.
AI technologies continue to develop at approximately current speeds or faster (80%)
AI manages to reach a level where it can cause an extinction (90%)
AI that can cause an extinction did not have enough alignment mechanisms in place (90%)
AI executes an unaligned scenario (low, maybe less than 10%)
Other AIs and humans aren’t able to notice and stop the unaligned scenario in time (50-50ish)
Once the scenario is executed humanity is never able to roll it back (50-50ish)
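Chaining these conditionals (my own quick arithmetic, using 10% and 50% point values for the hedged steps):

```python
steps = [0.80, 0.90, 0.90, 0.10, 0.50, 0.50]  # the six steps above, in order
p = 1.0
for s in steps:
    p *= s
print(p)  # 0.0162: roughly a 1.6% overall chance on these point estimates
```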
What I want to see from Manifold Markets
I’ve made a lot of manifold markets, and find it a useful way to track my accuracy and sanity check my beliefs against the community. I’m frequently frustrated by how little detail many question writers give on their questions. Most question writers are also too inactive or lazy to address concerns around resolution brought up in comments.
Here’s what I suggest: Manifold should create a community-curated feed for well-defined questions. I can think of two ways of implementing this:
(Question-based) Allow community members to vote on whether they think the question is well-defined
(User-based) Track comments on question clarifications (e.g. Metaculus has an option for specifying your comment pertains to resolution), and give users a badge if there are no open ‘issues’ on their questions.
Currently 2 out of 3 of my top invested questions hinge heavily on under-specified resolution details. The other one was elaborated on after I asked in comments. Those questions have ~500 users active on them collectively.
I was looking at this image in a post and it gave me some (loosely connected/ADD-type) thoughts.
In order:
The entities outside the box look pretty scary.
I think I would get over that quickly, they’re just different evolved body shapes. The humans could seem scary-looking from their pov too.
Wait.. but why would the robots have those big spiky teeth? (implicit question: what narratively coherent world could this depict?)
Do these forms have qualities associated with predator species, and that’s why they feel scary? (Is this a predator-species-world?)
Most humans are also predators in a non-strict sense.
I don’t want to live in a world where there’s only the final survivors of selection processes who shrug indifferently when asked why we don’t revive all the beings who were killed in the process which created the final survivors. (implicit: related to how a ‘predator-species-world’ from (4) could exist)
There’s been many occasions where I’ve noticed what feels like a more general version of that attitude in a type of current human, but I don’t know how to describe it.
(I mostly ignored the humans-are-in-a-box part.)
I don’t want to live in a world where there’s only the final survivors of selection processes who shrug indifferently when asked why we don’t revive all the beings who were killed in the process which created the final survivors.
If you could revive all the victims of the selection process that brought us to the current state, all the crusaders and monarchists and vikings and Maoists and so, so many illiterate peasant farmers (on much too little land because you’ve got hundreds of generations of them at once, mostly with ideas that make Putin look like Sonia Sotomayor), would you? They’d probably make quite the mess. Bringing them back would probably restart the selection process and we probably wouldn’t be selected again. It just seems like a terrible idea to me.
Some clarifications:
I’m thinking of this in the context of a post-singularity future, where we wouldn’t need to worry about things like conflict or selection processes.
By ‘the ones who were killed in the process’, I was thinking about e.g herbivorous animals that were killed by predator species[1], but you’re correct that it could include humans too. A lot of humans have been unjustly killed (by others or by nature) throughout history.
I think my endorsed morals are indifferent about the (dis)value of reviving abusive minds from the past, though moral-patient-me dislikes the idea on an intuitive level, and wishes for a better narrative ending than that.
(Also I upvoted your comment from negative)
I also notice some implied hard moral questions. (What of currently mean-hearted people? What about the potential for past mean-hearted people to have changed into good people? Etc.)
As a clear example of a kind of being who seems innocent of wrongdoing. I’m not ruling out other cases; e.g., plausibly inside the mind of the cat I once witnessed killing a bunny, there could have been total naivety about what was even being done.
Sort-of relatedly, I basically view evolution as having favored the dominance of agents with defect-y decision-making, even though the equilibrium of ‘collaborating with each other to harness the free energy of the sun’ would have been so much better. (Maybe another reason that didn’t happen is that there would be less of a gradual buildup of harder and harder training environments, in that case)
I’m curious why you seem to think we don’t need to worry about things like conflict or selection processes post-singularity.
Anthropic issues questionable letter on SB 1047 (Axios). I can’t find a copy of the original letter online.
Here’s the letter: https://s3.documentcloud.org/documents/25003075/sia-sb-1047-anthropic.pdf
I’m not super familiar with SB 1047, but one safety person who is thinks the letter is fine.
Someone posted these quotes in a Slack I’m in… what Ellsberg said to Kissinger:
[...]
(link)
So what am I supposed to do if people who control resources that are nominally earmarked for purposes I most care about are behaving this way?
Someone else added these quotes from a 1968 article about how the Vietnam war could go so wrong:
I wish this quote were a little more explicit about what’s going wrong. On a literal reading, it’s saying that some people who disagreed attended meetings and were made to feel comfortable. I think it’s super plausible that this leads to some kind of pernicious effect, but I wish it spelt out what that effect is.
I guess the best thing I can infer is that the author thinks public resignations and dissent would have been somewhat effective, and that the domesticated dissenters were basically ineffective?
Or is the context of the piece just that he’s explaining the absence of prominent public dissent?
What are “autists” supposed to do in a context like this?
Wow, yeah. This is totally going on at OpenAI, and I expect at other AGI corporations also.
I’d be interested in a few more details/gears. (Also, are you primarily replying about the immediate parent, i.e. domestication of dissent, or also about the previous one?)
Two different angles of curiosity I have are:
what sort of things might you look out for, in particular, to notice if this was happening to you at OpenAI or similar?
something like… what’s your estimate of the effect size here? Do you have personal experience feeling captured by this dynamic? If so, what was it like? Or did you observe other people seeming to be captured, and what was your impression (perhaps in vague terms) of the diff that the dynamic was producing?
I was talking about the immediate parent, not the previous one. Though as secrecy gets ramped up, the effect described in the previous one might set in as well.
I have personal experience feeling captured by this dynamic, yes, and from conversations with other people I get the impression that it was even stronger for many others.
Hard to say how large of an effect it has. It definitely creates a significant chilling effect on criticism/dissent. (I think people who were employees alongside me while I was there will attest that I was pretty outspoken… yet I often found myself refraining from saying things that seemed true and important, due to not wanting to rock the boat / lose ‘credibility’, etc.)
The point about salving the consciences of the majority is interesting and seems true to me as well. I feel like there’s definitely a dynamic of ‘the dissenters make polite reserved versions of their criticisms, and feel good about themselves for fighting the good fight, and the orthodox listen patiently and then find some justification to proceed as planned, feeling good about themselves for hearing out the dissent.’
I don’t know of an easy solution to this problem. Perhaps something to do with regular anonymous surveys? idk.
Huh, this is a good quote.
The Wikipedia articles on the VNM theorem, Dutch Book arguments, money pump, Decision Theory, Rational Choice Theory, etc. are all a horrific mess. They’re also completely disjoint, without any kind of Wikiproject or wikiboxes for tying together all the articles on rational choice.
It’s worth noting that Wikipedia is the place where you—yes, you!—can actually have some kind of impact on public discourse, education, or policy. There is just no other place you can get so many views with so little barrier to entry. A typical Wikipedia article will get more hits in a day than all of your LessWrong blog posts have gotten across your entire life, unless you’re @Eliezer Yudkowsky.
I’m not sure if we actually “failed” to raise the sanity waterline, like people sometimes say, or if we just didn’t even try. Given that even some very basic low-hanging-fruit interventions like “write a couple of good Wikipedia articles” still haven’t been done 15 years later, I’m leaning towards the latter.
EDIT: Discord to discuss editing here.
I appreciate the intention here but I think it would need to be done with considerable care, as I fear it may have already led to accidental vandalism of the epistemic commons. Just skimming a few of these Wikipedia pages, I’ve noticed several new errors. These can be easily spotted by domain experts but might not be obvious to casual readers.[1] I can’t know exactly which of these are due to edits from this community, but some very clearly jump out.[2]
I’ll list some examples below, but I want to stress that this list is not exhaustive. I didn’t read most parts of most related pages, and I omitted many small scattered issues. In any case, I’d like to ask whoever made any of these edits to please revert them, and to triple-check any I didn’t mention below.[3] Please feel free to respond to this if any of my points are unclear![4]
False statements
The page on Independence of Irrelevant Alternatives (IIA) claims that IIA is one of the vNM axioms, and that one of the vNM axioms “generalizes IIA to random events.”
Both are false. The similar-sounding Independence axiom of vNM is neither equivalent to, nor does it entail, IIA (and so it can’t be a generalisation). You can satisfy Independence while violating IIA. This is not a technicality; it’s a conflation of distinct and important concepts. This is repeated in several places.
The mathematical statement of Independence there is wrong. In the section conflating IIA and Independence, it’s defined as the requirement that

$$(1-p)\,\mathrm{Bad} + p\,N \prec (1-p)\,\mathrm{Good} + p\,N$$

for any p ∈ [0,1] and any outcomes Bad, Good, and N satisfying Bad ≺ Good. This mistakes weak preference for strict preference. To see this, set p = 1 and observe that the line now reads N ≺ N. (The rest of the explanation in this section is also problematic but the reasons for this are less easy to briefly spell out.)
The Dutch book page states that the argument demonstrates that “rationality requires assigning probabilities to events [...] and having preferences that can be modeled using the von Neumann–Morgenstern axioms.” This is false. It is an argument for probabilistic beliefs; it implies nothing at all about preferences. And in fact, the standard proof of the Dutch book theorem assumes something like expected utility (Ramsey’s thesis).
This is a substantial error, making a very strong claim about an important topic. And it’s repeated elsewhere, e.g. when stating that the vNM axioms “apart from continuity, are often justified using the Dutch book theorems.”
The section ‘The theorem’ on the vNM page states the result using strict preference/inequality. This is a corollary of the theorem but does not entail it.
Misleading statements
The decision theory page states that it’s “a branch of applied probability theory and analytic philosophy concerned with the theory of making decisions based on assigning probabilities to various factors and assigning numerical consequences to the outcome.” This is a poor description. Decision theorists don’t simply assume this, nor do they always conclude it—e.g. see work on ambiguity or lexicographic preferences. And besides this, decision theory is arguably more central in economics than the fields mentioned.
The IIA article’s first sentence states that IIA is an “axiom of decision theory and economics” whereas it’s classically one of social choice theory, in particular voting. This is at least a strange omission for the context-setting sentence of the article.
It’s stated that IIA describes “a necessary condition for rational behavior.” Maybe the individual-choice version of IIA is, but the intention here was presumably to refer to Independence. This would be a highly contentious claim though, and definitely not a formal result. It’s misleading to describe Independence as necessary for rationality.
The vNM article states that obeying the vNM axioms implies that agents “behave as if they are maximizing the expected value of some function defined over the potential outcomes at some specified point in the future.” I’m not sure what ‘specified point in the future’ is doing there; that’s not within the framework.
The vNM article states that “the theorem assumes nothing about the nature of the possible outcomes of the gambles.” That’s at least misleading. It assumes all possible outcomes are known, that they come with associated probabilities, and that these probabilities are fixed (e.g., ruling out the Newcomb paradox).
Besides these problems, various passages in these articles and others are unclear, lack crucial context, contain minor issues, or just look prone to leave readers with a confused impression of the topic. (This would take a while to unpack, so my many omissions should absolutely not be interpreted as green lights.) As OP wrote: these pages are a mess. But I fear the recent edits have contributed to some of this.
So, as of now, I’d strongly recommend against reading Wikipedia for these sorts of topics—even for a casual glance. A great alternative is the Stanford Encyclopedia of Philosophy, which covers most of these topics.
I checked this with others in economics and in philosophy.
E.g., the term ‘coherence theorems’ is unheard of outside of LessWrong, as is the frequency of italicisation present in some of these articles.
I would do it myself but I don’t know what the original articles said and I’d rather not have to learn the Wikipedia guidelines and re-write the various sections from scratch.
Or to let me know that some of the issues I mention were already on Wikipedia beforehand. I’d be happy to try to edit those.
Yes, these Wikipedia articles do have lots of mistakes. Stop writing about them here and go fix them!
I don’t appreciate the hostility. I aimed to be helpful in spending time documenting and explaining these errors. This is something a healthy epistemic community is appreciative of, not annoyed by. If I had added mistaken passages to Wikipedia, I’d want to be told, and I’d react by reverting them myself. If any points I mentioned weren’t added by you, then as I wrote in my first comment:
The point of writing about the mistakes here is to make clear why they indeed are mistakes, so that they aren’t repeated. That has value. And although I don’t think we should encourage a norm that those who observe and report a problem are responsible for fixing it, I will try to find and fix at least the pre-existing errors.
I’m not annoyed by these, and I’m sorry if it came across that way. I’m grateful for your comments. I just meant to say these are exactly the sort of mistakes I was talking about in my post as needing fixing! However, talking about them here isn’t going to do much good, because people read Wikipedia, not LessWrong shortform comments, and I’m busy as hell working on social choice articles already.
From what I can tell, there’s one substantial error I introduced, which was accidentally conflating the two kinds of IIA. (Although I haven’t double-checked, so I’m not sure they’re actually unrelated.) Along with that there’s some minor errors involving strict vs. non-strict inequality which I’d be happy to see corrected.
None of these changes are new as far as I can tell (I checked the first three), so I think your basic critique falls through. You can check the edit history yourself by just clicking on the “View History” button and then pressing the “cur” button next to the revision entry you want to see the diff for.
Like, indeed, the issues you point out are issues, but it is not the case that people reading this have made the articles worse. The articles were already bad, and “acting with considerable care” in a way that implies inaction would mean leaving inaccuracies uncorrected.
I think people should edit these pages, and I expect them to get better if people give it a real try. I also think you could give it a try and likely make things better.
Edit: Actually, I think my deeper objection is that most of the critiques here (made by Sammy) are just wrong. For example, of course Dutch books/money pumps frequently get invoked to justify VNM axioms. See for example this.
Sami never mentioned money pumps. And “the Dutch books arguments” are arguments for probabilism and other credal norms[1], not the vNM axioms.
Again, see Pettigrew (2020) (here is a PDF from Richard’s webpage).
Note that if the edit history is long or you are doing a lot of checks, there are tools to bisect WP edit histories: at the top of the diff page, “External tools: Find addition/removal (Alternate)”
so eg https://wikipedia.ramselehof.de/wikiblame.php?user_lang=en&lang=en&project=wikipedia&tld=org&article=Von+Neumann–Morgenstern+utility+theorem&needle=behave+as+if+they+are+maximizing+the+expected+value+of+some+function&skipversions=0&ignorefirst=0&limit=500&offmon=7&offtag=23&offjahr=2024&searchmethod=int&order=desc&start=Start&user= identifies in 10s the edit https://en.wikipedia.org/w/index.php?title=Von_Neumann–Morgenstern_utility_theorem&diff=prev&oldid=1165485303 which turns out to be spurious but then we can restart with the older text.
Great, thanks!
I hate to single out OP but those three points were added by someone with the same username (see first and second points here; third here). Those might not be entirely new but I think my original note of caution stands.
Well, thinking harder about this, I do think some of these critiques are wrong. For example, it is the case that the vNM axioms frequently get justified by invoking Dutch books (the most obvious case is the argument for transitivity, where the standard response is “well, if you have circular preferences I can charge you a dollar to have you end up where you started”).
Of course, justifying axioms is messy, and there isn’t any particularly objective way of choosing axioms here, but insofar as informal argumentation happens, it tends to use a Dutch-book-like structure. I’ve had many conversations with people with formal academic experience in philosophy and economics, and this is definitely a normal way for Dutch books to be used.
For a concrete example of this, see this recent book/paper: https://www.iffs.se/media/23568/money-pump-arguments.pdf
You are conflating the Dutch book arguments for probabilism (Pettigrew, 2020) with the money-pump arguments for the vNM axioms (Gustafsson, 2022).
We certainly are, which isn’t unique to either of us; Savage discusses them all in a single common framework on decision theory, where he develops both sets of ideas jointly. A money pump is just a Dutch book where all the bets happen to be deterministic. I chose to describe things this way because it lets me do a lot more cross-linking within Wikipedia articles on decision theory, which encourages people reading about one to check out the other.
I’ve pretty consistently (by many different people) seen “Dutch Book arguments” used interchangeably with money pumps. My understanding (which is also the SEP’s) is that “what is a money pump vs. a dutch book argument” is not particularly well-defined and the structure of the money pump arguments is basically the same as the structure of the dutch book arguments.
This is evident from just the basic definitions:
“A Dutch book is a set of bets that ensures a guaranteed loss, i.e. the gambler will lose money no matter what happens.”
Which is of course exactly what a money pump is (where you are the person offering the gambles and therefore make guaranteed money).
The money pump Wikipedia article also links to the Dutch book article, and the book/paper I linked describes dutch books as a kind of money pump argument. I have never heard anyone make a principled distinction between a money pump argument and a dutch book argument (and I don’t see how you could get one without the other).
Indeed, the Oxford Reference says explicitly:
(Edit: It’s plausible that for weird historical reasons the exact same argument, when applied to probabilism would be called a “dutch book” and when applied to anything else would be called a “money pump”, but I at least haven’t seen anyone defend that distinction, and it doesn’t seem to follow from any of the definitions)
I think it’ll be helpful to look at the object level. One argument says: if your beliefs aren’t probabilistic but you bet in a way that resembles expected utility, then you’re susceptible to sure loss. This forms an argument for probabilism.[1]
Another argument says: if your preferences don’t satisfy certain axioms but satisfy some other conditions, then there’s a sequence of choices that will leave you worse off than you started. This forms an argument for norms on preferences.
These are distinct.
These two different kinds of arguments have things in common. But they are not the same argument applied in different settings. They have different assumptions, and different conclusions. One is typically called a Dutch book argument; the other a money pump argument. The former is sometimes referred to as a special case of the latter.[2] But whatever our naming conventions, it’s a special case that doesn’t support the vNM axioms.
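To make the two argument-shapes concrete, here’s a toy numerical sketch (all numbers are mine, purely illustrative, not drawn from either literature):

```python
# Dutch book (an argument for probabilism): an agent with incoherent
# credences who trades bets at those credences suffers a sure loss.
cr_A, cr_not_A = 0.6, 0.6          # credences sum to 1.2: not a probability
stake = 1.0
price_paid = (cr_A + cr_not_A) * stake    # agent buys both bets for 1.2
payout = stake                            # exactly one of A, not-A pays out
print("sure loss:", price_paid - payout)  # 0.2 regardless of what happens

# Money pump (an argument for preference axioms, e.g. transitivity): an
# agent with cyclic strict preferences A < B < C < A pays a small fee for
# each "upgrade" and ends up back where they started, strictly poorer.
fee, wealth = 0.01, 0.0
for swap in ["A->B", "B->C", "C->A"]:
    wealth -= fee                         # pays to move to the preferred option
print("after the cycle:", wealth)         # -0.03, holding A again
```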
Here’s why this matters. You might read the assumptions of the Dutch book theorem and find them compelling. Then you read an article telling you that this implies the vNM axioms (or constitutes an argument for them). If you believe it, you’ve been duped.
(More generally, Dutch books exist to support other Bayesian norms like conditionalisation.)
This distinction is standard, and blurring the lines leads to confusion. It’s unfortunate when dictionaries, references, or people make mistakes. More reliable would be a key book on money pumps (Gustafsson 2022) referring to a key book on Dutch books (Pettigrew 2020):
“There are also money-pump arguments for other requirements of rationality. Notably, there are money-pump arguments that rational credences satisfy the laws of probability. (See Ramsey 1931, p. 182.) These arguments are known as Dutch-book arguments. (See Lehman 1955, p. 251.) For an overview, see Pettigrew 2020.” [Footnote 9.]
I mean, I think it would be totally reasonable for someone who is doing some decision theory or some epistemology work, to come up with new “dutch book arguments” supporting whatever axioms or assumptions they would come up with.
I think I am more compelled by the idea that there is a history here of calling money pump arguments that happen to relate to probabilism “dutch books”, but I don’t think there is really any clear definition that supports this. I agree that there exists the Dutch book theorem, and that that one importantly relates to probabilism, but I’ve just had dozens of conversations with academics, philosophers, and decision-theorists where, in the context of both decision-theory and epistemology questions, people brought up Dutch books and money pumps interchangeably.
I’m glad we could converge on this, because that’s what I really wanted to convey.[1] I hope it’s clearer now why I included these as important errors:
The statement that the vNM axioms “apart from continuity, are often justified using the Dutch book theorems” is false since these theorems only relate to belief norms like probabilism. Changing this to ‘money pump arguments’ would fix it.
There’s a claim on the main Dutch book page that the arguments demonstrate that “rationality requires assigning probabilities to events [...] and having preferences that can be modeled using the von Neumann–Morgenstern axioms.” I wouldn’t have said it was false if this was about money pumps.[2] I would’ve said there was a terminological issue if the page equated Dutch books and money pumps. But it didn’t.[3] It defined a Dutch book as “a set of bets that ensures a guaranteed loss.” And the theorems and arguments relating to that do not support the vNM axioms.
Would you agree?
The issue of which terms to use isn’t that important to me in this case, but let me speculate about something. If you hear domain experts go back and forth between ‘Dutch books’ and ‘money pumps’, I think that is likely either because they are thinking of the former as a special case of the latter without saying so explicitly, or because they’re listing off various related ideas. If that’s not why, then they may just be mistaken. After all, a Dutch book is named that way because a bookie is involved!
Setting aside that “demonstrates” is too strong even then.
It looks like OP edited the page just today and added ‘or money pump’. But the text that follows still describes a Dutch book, i.e. a set of bets. (Other things were added too that I find problematic but this footnote isn’t the place to explain it.)
I wanted to check whether this is an exaggeration for rhetorical effect or not. Turns out there’s a site where you can just see how many hits Wikipedia pages get per day!
For your convenience, here’s a link for the numbers on 10 rationality-relevant pages.
I’m pretty sure my LessWrong posts have gotten more than 1000 hits across my entire life (and keep in mind that “hits” is different from “an actual human actually reads the article”), but fair enough—Wikipedia pages do get a lot of views.
Thanks for the parent for flagging this and doing editing. What I’d now want to see is more people actually coordinating to do something about it—set up a Telegram or Discord group or something, and start actually working on improving the pages—rather than this just being one of those complaints on how Rationalists Never Actually Tried To Win, which a lot of people upvote and nod along with, and which is quickly forgotten without any actual action.
(Yes, I’m deliberately leaving this hanging here without taking the next action myself; partly because I’m not an expert Wikipedia editor, partly because I figured that if no one else is willing to take the next action, then I’m much more pessimistic about this initiative.)
So mote it be. I can start the group/server and do moderation (though not 24⁄7, of course). Whoever is reading this: please choose between Telegram and Discord with an inline react.
Moderation style I currently use: “reign of terror”, delete offtopic messages immediately, after large discussions delete the messages which do not carry much information (even if someone replies to them).
I’ve created a couple of prediction markets:
Will I manage a group for improvement of Wikipedia-related articles
Will LessWrong have a book review on some newly-added source to a Wikipedia rationality-related article
Permanent link that won’t expire here. @the gears to ascension @Olli Järviniemi
I have created Discord server: “Decision Articles Treatment”, https://discord.gg/P7m63mAP.
@the gears to ascension @Olli Järviniemi @DusanDNesic not sure if your reacts would create notifications, so pinging manually.
@ProgramCrafter Link is broken, probably expired.
Wikipedia pageviews punch above their weight. First, your pageviews probably do drop off rapidly enough that it is possible that a WP day = a LW lifetime. People just don’t go back and reread most old LW links. I mean, look at the submission rate—there’s like a dozen a day or something. (I don’t even read most LW submissions these days.) While WP traffic is extremely durable: ‘Expected value’ will be pulling in 1.7k hits/day (or more) likely practically forever.
Second, the quality is distinct. A Wikipedia article is an authoritative reference which is universally consulted and trusted. That 1.7k excludes all access via the APIs AFAIK, and things like readers who read the snippets in Google Search. If you Google the phrase ‘expected value’, you may not even click through to WP because you just read the searchbox snippet:
This includes machine learning. Every LLM is trained very heavily on Wikipedia; any given LW page, on the other hand, may well not make the cut, either because it’s too recent to show up in the old datasets everyone starts with like The Pile, or because it gets filtered out for bad reasons, or they just don’t train enough tokens. And there is life beyond LLM in ML (hard to believe these days, but I am told ML researchers still exist who do other things), and WP articles will be in those, as part of the network or WikiData etc. A LW post will not.
Then you have the impact of WP. As anyone who’s edited niche topics for years can tell you, WP articles are where everyone starts, and you can see the traces for decades afterwards. Hallgren mentions David Gerard, and Roko’s Basilisk is a good example of that—it is the one thing “everyone knows” about LessWrong, and it is due almost solely to Wikipedia. The hit count on the ‘LessWrong’ WP article will never, ever reflect that.
But editing WP is difficult even without a Gerard, because of the ambient deletionists. An example: you may have seen recently going around (even on MR) a Wikipedia link about the interesting topic of ‘disappearing polymorphs’. It is a fascinating chemistry topic, but on Gwern.net, I did not link to it, but to a particular revision of another article. Why? Because an editor, Smokefoot, butchered it after I drew attention to it on social media prior to the current burst of attention. (Far from the first time—this is one of the hazards of highlighting any Wikipedia article.) We can thank Yitzilitt & Cosmia Nebula for since writing a new ‘Disappearing polymorph’ article which can stand up to Smokefoot’s butchering; it is almost certainly the case that it took them 100x, if not 1000x, more time & effort to write that than it took Smokefoot to delete the original material. (On WP, when dealing with a deletionist, it is worse than “Brandolini’s law”—we should be so lucky that it only took 10x the effort...)
Finally somepony noticed my efforts!
Concurring with the sentiment, I have realized that nothing I write is going to be as well-read as Wikipedia, so I have devoted myself to writing Wikipedia instead of trying to get a personal blog anymore.
I will comment on a few things:
I really want to get the neural scaling law page working, with some synthesis and updated data, but currently there is no good theoretical synthesis, and Wikipedia isn’t a good home for just a giant spreadsheet.
I wrote most of the GAN page, the Diffusion Model page, Mixture of Experts, etc. I also wrote a few sections of LLM and keep the giant table updated for each frontier model. I am somewhat puzzled by the fact that it seems I am the only pony who thought of this. There are thousands of ML personal blogs, all in the Celestia-forsaken wasteland of not getting read, and then there is Wikipedia… but nopony is writing there? Well, I guess my cutie mark is in Wikipedia editing.
The GAN page and the Diffusion Model page were Tirek-level bad. They read like somepony paraphrased about 10 news reports. There was barely a single equation, and that was years after GAN and DM had proved their worth! So I fired the Orbital ~~Friendship~~ Mathematical Cannon. I thought that if I’m not going to write another blog, then Wikipedia has to be on the same level as a good blog, so I set my goal at the level of Lilian Weng’s blog, and a lack of mathematics is definitely bad.
I fought a bitter edit battle on Artificial intelligence in mathematics with an agent of Discord [a deletionist] and lost. The war seems lost too, but a brief moment is captured in the Internet Archive… like tears in the rain. I can only say, like Galois: “On jugera [Posterity will judge]”.
My headcanon is that Smokefoot is a member of BloodClan.
Yes, but WP deletionists only permit news reports, because those are secondary sources. You have to write these articles with primary sources, but they hate those; see one of their favorite pieces of jargon, WP:PRIMARY. (Weng’s blog, ironically, might make the cut as a secondary source, despite containing pretty much just paraphrases or quotes from primary sources, but only because she’s an OA exec.) This is a big part of why the DL articles all suck: there just aren’t many good secondary or tertiary sources like encyclopedias. (Well, there’s the Schmidhuber Scholarpedia articles in some cases, but aside from being outdated, it’s, well, Schmidhuber.) There is no GAN textbook I know of which is worthwhile, and I doubt there ever will be.
Aren’t most of the sources going to be journal articles? Academic papers are definitely fair game for citations (and generally make up most citations on Wikipedia).
I hate Schmidhuber with a passion, because I can smell everything he touches on Wikipedia, and it is always terrible.
Sometimes when I read pages about AI, I see things that almost certainly came from him, or one of his fans. I struggle to articulate exactly what Schmidhuber’s kind of writing feels like, but perhaps this will suffice: “People never give the right credit to anything. Everything of importance is either published by my research group first but miscredited to someone later, or something like that. Deep Learning? It’s done not by Hinton, but Amari, but not Amari, but by Ivakhnenko. The more obscure the originator, the better, because it reveals how bad people are at credit assignment—if they were better at it, the real originators would not have been so obscure.”
For example, LSTM actually originated with Schmidhuber… and indeed, it’s also credited to Schmidhuber (… or maybe Hochreiter?). But then GAN should be credited to Schmidhuber, and also Transformers. Currently he (or his fans) keeps trying to put the phrase “internal spotlights of attention” into the Transformer page, and I keep removing it. He wanted the credit so much that he went for argument-by-punning, renaming “fast weight programmer” to “linear transformers”, and quoting “internal spotlights of attention” out of context just to fortify the argument with a pun! I can do puns too! Rosenblatt (1962) even wrote about “back-propagating errors” in an MLP with a hidden layer. So what?
I actually took Schmidhuber’s claim seriously and carefully rewrote Ivakhnenko’s Group method of data handling, giving all the mathematical details, so that one may evaluate it for oneself instead of relying on Schmidhuber’s claim. A few months later, someone manually reverted everything I wrote! What does it read like according to a partisan of Ivakhnenko?
Well, excuse me, “Godel’s incompleteness theorems”? “The original method”? Also, I thought “fuzzy” had stopped being fashionable after the 1980s. I actually once tried to learn fuzzy logic and gave up after not seeing what the big deal was. It is filled with such pompous and self-important terminology, as if the lack of substance must be made up for by the heights of spiritual exhortation. Why say “combined” when they could say “consists of a synthesis of ideas from different areas of science”?
As a side note, such turgid prose, filled with long noun phrases, is pretty common among the Soviets. I once read that this kind of massive noun phrase had a political purpose, but I don’t remember what it was.
Well, it seems like this story might have something to do with it?: https://www.lesswrong.com/posts/3XNinGkqrHn93dwhY/reliable-sources-the-story-of-david-gerard
I don’t know to what extent that is, though; otherwise, I agree with you.
I think it’s unrelated; David Gerard is mean to rationalists and spends lots of time editing articles about LW/ACX, but doesn’t torch articles about math stuff. The reason these articles are bad is because people haven’t put much effort into them.
Ah, in a parallel universe without David Gerard the obvious next step would be to create a WikiProject Rationality. In this universe, this probably wouldn’t end well? Coordination outside Wikipedia is also at risk of accusation of brigading or something.
https://discord.gg/skNZzaAjsC
In this universe it would end just fine! Go ahead and start one. Looks like someone else is creating a Discord.
Brigading would be if you called attention to one particular article’s talk page and told people “Hey, go make this particular edit to this article.”
Eh, wasn’t Arbital meant to be that, or something like it? Anyway, due to network effects I don’t see how any new wiki-like project could ever reasonably compete with Wikipedia.
I think Arbital was supposed to do that, but basically what you said.
I think the thing that actually makes people more rational is thinking of them as principles you can apply to your own life rather than abstract notions, which is hard to communicate in a Wikipedia page about Dutch books.
Sure, but you gotta start somewhere, and a Wikipedia article would help.
This seems true, thanks for your editing on the related pages.
Trying to collect & link the relevant pages:
Dutch book theorems
Money pump
Von Neumann-Morgenstern utility theorem
Utility
Rational Choice Theory
Decision Theory
Causal Decision Theory
Evidential Decision Theory
Functional Decision Theory
Cox’s Theorem
Discounting
Hyperbolic discounting
Dynamic inconsistency
Time preference
There is also the article Decision-making.
Importance arguments:
Five wikiprojects rely on this article, but it is C-class on Wikipedia scale;
Topic seems quite important for people. If someone who doesn’t know how to make decisions stumbles upon the article, the first image they see… is a flowchart, which can scare non-programmers away.
This seems like a much better target for spreading rationalism. The other listed articles all seem quite detailed and far from the central rationalist project. Decision-making seems like a more likely on-ramp.
Here’s a gdoc comment I made recently that might be of wider interest:
You know I wonder if this standard model of final goals vs. instrumental goals has it almost exactly backwards. Would love to discuss sometime.
Maybe there’s no such thing as a final goal directly. We start with a concept of “goal” and then we say that the system has machinery/heuristics for generating new goals given a context (context may or may not contain goals ‘on the table’ already). For example, maybe the algorithm for Daniel is something like:
-- If context is [safe surroundings] + [no goals] + [hunger], add the goal “get food.”
-- If context is [safe surroundings] + [travel-related goal] + [no other goals], engage Route Planning Module.
-- … (many such things like this)
It’s a huge messy kludge, but it’s gradually becoming more coherent as I get older and smarter and do more reflection.
What are final goals?
Well a goal is final for me to the extent that it tends to appear in a wide range of circumstances, to the extent that it tends to appear unprompted by any other goals, to the extent that it tends to take priority over other goals, … some such list of things like that.
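As a toy sketch of this picture (the rules and names here are mine, purely illustrative): goals come from context-conditioned heuristics, and “finality” is just a statistic over contexts.

```python
from itertools import product

RULES = [
    # (condition on context) -> goal it adds
    (lambda c: c["safe"] and not c["goals"] and c["hungry"], "get food"),
    (lambda c: c["safe"] and "travel" in c["goals"],         "plan route"),
    (lambda c: c["safe"] and not c["goals"],                 "think about ambitions"),
]

def generate_goals(context):
    """Apply every matching heuristic to the current context."""
    return [goal for cond, goal in RULES if cond(context)]

# "Finality" of a goal: roughly, how often it appears across contexts.
contexts = [
    {"safe": s, "goals": g, "hungry": h}
    for s, g, h in product([True, False], [[], ["travel"]], [True, False])
]
for goal in ["get food", "plan route", "think about ambitions"]:
    freq = sum(goal in generate_goals(c) for c in contexts) / len(contexts)
    print(f"{goal}: appears in {freq:.0%} of contexts")
```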
For a mind like this, my final goals can be super super unstable and finicky and stuff like taking a philosophy class with a student who I have a crush on who endorses ideology X can totally change my final goals, because I have some sort of [basic needs met, time to think about long-term life ambitions] context and it so happens that I’ve learned (perhaps by experience, perhaps by imitation) to engage my philosophical reasoning module in that context, and also I’ve built my identity around being “Rational” in a way that makes me motivated to hook up my instrumental reasoning abilities to whatever my philosophical reasoning module shits out… meanwhile my philosophical reasoning module is basically just imitating patterns of thought I’ve seen high-status cool philosophers make (including this crush) and applying those patterns to whatever mental concepts and arguments are at hand.
It’s a fucking mess.
But I think it’s how minds work.
Relevant: my post on value systematization
Though I have a sneaking suspicion that this comment was originally made on a draft of that?
At this point I don’t remember! But I think not, I think it was a comment on one of Carlsmith’s drafts about powerseeking AI and deceptive alignment.
To follow up, this might have big implications for understanding AGI. First of all, it’s possible that we’ll build AGIs that aren’t like that and that do have final goals in the traditional sense—e.g. because they are a hybrid of neural nets and ordinary software, involving explicit tree search maybe, or because SGD is more powerful at coherentizing the neural net’s goals than whatever goes on in the brain. If so, then we’ll really be dealing with a completely different kind of being than humans, I think.
Secondly, well, I discussed this three years ago in this post: What if memes are common in highly capable minds?
Effective layer horizon of transformer circuits. The residual stream norm grows exponentially over the forward pass, with a growth rate of about 1.05 per layer. Consider the residual stream at layer 0, with norm (say) of 100, and suppose the MLP outputs at layer 0 have norm (say) 5. Then after 30 layers, the residual stream norm will be 100 · 1.05^30 ≈ 432.2. The layer-0 MLP outputs of norm 5 should therefore have a significantly reduced effect on the computations of MLP 30, due to their smaller relative norm.
On input tokens x, let Attn_i(x) and MLP_i(x) be the original model’s sublayer outputs at layer i. I want to think about what happens when the later sublayers can only “see” the last few layers’ worth of outputs.
Definition: Layer-truncated residual stream. A truncated residual stream from layer n1 to layer n2 is formed by the original sublayer outputs from those layers.
$$h_{n_1:n_2}(x) := \sum_{i=n_1}^{n_2} \big( \mathrm{Attn}_i(x) + \mathrm{MLP}_i(x) \big).$$
Definition: Effective layer horizon. Let k > 0 be an integer. Suppose that for all n ≥ k, we patch in h_{(n−k):n}(x) for the usual residual stream input h_{0:n}(x).[1] Let the effective layer horizon be the smallest k for which the model’s outputs and/or capabilities are “qualitatively unchanged.”
Effective layer horizons (if they exist) would greatly simplify searches for circuits within models. Additionally, they would be further evidence (though not conclusive[2]) towards hypotheses like Residual Networks Behave Like Ensembles of Relatively Shallow Networks.
Lastly, slower norm growth probably causes the effective layer horizon to be lower. In that case, simply measuring residual stream norm growth would tell you a lot about the depth of circuits in the model, which could be useful if you want to regularize against deep circuits or otherwise decrease their depth (e.g. to decrease the amount of effective serial computation).
Do models have an effective layer horizon? If so, what does it tend to be as a function of model depth and other factors—are there scaling laws?
For notational ease, I’m glossing over the fact that we’d be patching in different residual streams for each sublayer of layer n. That is, we wouldn’t patch in the same activations for both the attention and MLP sublayers of layer n.
For example, if a model has an effective layer horizon of 5, then a circuit could still run through the whole model, because a layer-n head could read out features output by a circuit ending at layer n−5, a head at layer n+5 could read from layer n, and so on…
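If one wanted to actually measure this, here is a rough sketch using TransformerLens (hook names follow that library’s conventions; treat this as an untested sketch rather than a vetted implementation):

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The quick brown fox jumps over the lazy dog")
base_loss, cache = model.run_with_cache(tokens, return_type="loss")

def truncated_resid(n, k):
    """h_{(n-k):n}: sum of cached attn+mlp outputs from the last k layers.
    Per the definition, embedding contributions are dropped once n >= k."""
    acc = torch.zeros_like(cache["blocks.0.hook_resid_pre"])
    for i in range(max(n - k, 0), n):
        acc = acc + cache[f"blocks.{i}.hook_attn_out"]
        acc = acc + cache[f"blocks.{i}.hook_mlp_out"]
    return acc

def loss_with_horizon(k):
    """Patch each layer's residual input so it only 'sees' the last k layers."""
    hooks = [
        (f"blocks.{n}.hook_resid_pre",
         lambda resid, hook, n=n: truncated_resid(n, k))
        for n in range(k, model.cfg.n_layers)
    ]
    return model.run_with_hooks(tokens, return_type="loss",
                                fwd_hooks=hooks).item()

for k in [2, 4, 8, 12]:
    print(k, loss_with_horizon(k), "vs base", base_loss.item())
```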
[edit: stefan made the same point below earlier than me]
Nice idea! I’m not sure why this would be evidence for residual networks being an ensemble of shallow circuits — it seems more like the opposite to me? If anything, low effective layer horizon implies that later layers are building more on the outputs of intermediate layers. In one extreme, a network with an effective layer horizon of 1 would only consist of circuits that route through every single layer. Likewise, for there to be any extremely shallow circuits that route directly from the inputs to the final layer, the effective layer horizon must be the number of layers in the network.
I do agree that low layer horizons would substantially simplify (in terms of compute) searching for circuits.
I like this idea! I’d love to see checks of this on the SOTA models which tend to have lots of layers (thanks @Joseph Miller for running the GPT2 experiment already!).
I notice this line of argument would also imply that the embedding information can only be accessed up to a certain layer, after which it will be washed out by the high-norm outputs of layers. (And the same for early MLP layers which are rumoured to act as extended embeddings in some models.) -- this seems unexpected.
I have the opposite expectation: Effective layer horizons enforce a lower bound on the number of modules involved in a path. Consider the shallow path
Input (layer 0) → MLP 10 → MLP 50 → Output (layer 100)
If the effective layer horizon is 25, then this path cannot work because the output of MLP10 gets lost. In fact, no path with less than 3 modules is possible because there would always be a gap > 25.
Only less-shallow paths would manage to influence the output of the model:
Input (layer 0) → MLP 10 → MLP 30 → MLP 50 → MLP 70 → MLP 90 → Output (layer 100)
This too seems counterintuitive, not sure what to make of this.
Computing the exact layer-truncated residual streams on GPT-2 Small, it seems that the effective layer horizon is quite large:
I’m mean-ablating every edge with a source node more than n layers back and calculating the loss on 100 samples from The Pile.
Source code: https://gist.github.com/UFO-101/7b5e27291424029d092d8798ee1a1161
I believe the horizon may be large because, even if the approximation is fairly good at any particular layer, the errors compound as you go through the layers. If we just apply the horizon at the final output the horizon is smaller.
However, if we apply it at just the middle layer (6), the horizon is surprisingly small, so we would expect relatively little propagated error.
But this appears to be an outlier. Compare to 5 and 7.
Source: https://gist.github.com/UFO-101/5ba35d88428beb1dab0a254dec07c33b
xAI has ambitions to compete with OpenAI and DeepMind, but I don’t feel like it has the same presence in the AI safety discourse. I don’t know anything about its attitude to safety, or how serious a competitor it is. Are there good reasons it doesn’t get talked about? Should we be paying it more attention?
A new Bloomberg article says xAI is building a datacenter in Memphis, planned to become operational by the end of 2025, mentioning a new-to-me detail that the datacenter targets 150 megawatts (more details on DCD). This means the scale of 100,000 GPUs or $4 billion in infrastructure, a bulk of its recently secured $6 billion from Series B.
This should be good for training runs that could be said to cost $1 billion in cost of time (lasting a few months). And Dario Amodei is saying that this is the scale of today, for models that are not yet deployed. This puts xAI at 18 months behind, a difficult place to rebound from unless long-horizon task capable AI that can do many jobs (a commercially crucial threshold that is not quite AGI) is many more years away.
For some reason, current labs are not already running $10 billion training runs; they didn’t build the necessary datacenters immediately. Such a run would take a million H100s and 1.5 gigawatts, so supply issues seem likely. There is also a lot of engineering detail to iron out, so the scaling proceeds gradually.
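A rough consistency check on these figures (the per-GPU power and cost numbers below are my own assumptions, not from the article):

```python
power_per_gpu_kw = 1.5      # assumed all-in power per H100, including overhead
cost_per_gpu_usd = 40_000   # assumed all-in infrastructure cost per H100

datacenter_mw = 150
gpus = datacenter_mw * 1000 / power_per_gpu_kw    # ~100,000 GPUs
capex = gpus * cost_per_gpu_usd                   # ~$4 billion
print(f"{gpus:,.0f} GPUs, ${capex / 1e9:.0f}B infrastructure")

# The $10B-run scale mentioned above is roughly 10x this:
print(f"{gpus * 10:,.0f} GPUs, {datacenter_mw * 10 / 1000:.1f} GW")
```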
But some of this might be risk aversion, unwillingness to waste capital where a slower pace makes a better use of it. As a new contender has no other choice, we’ll get to see if it’s possible to leapfrog scaling after all. And Musk has affinity with impossible deadlines (not necessarily with meeting them), so the experiment will at least be attempted.
I’ve asked similar questions before and heard a few things. I also have a few personal thoughts that I thought I’d share here unprompted. This topic is pretty relevant for me so I’d be interested in what specific claims in both categories people agree/disagree with.
Things I’ve heard:
There’s some skepticism about how well-positioned xAI actually is to compete with leading labs: although they have a lot of capital and ability to fundraise, many of the main bottlenecks right now can’t simply be solved by throwing more money at the problem. E.g. building infrastructure, securing power contracts, hiring top engineers, accessing huge amounts of data, and building on past work are all pretty limited by non-financial factors, and therefore the incumbents have lots of advantages. That being said, xAI is placed alongside Meta and Google in the highest-liquidity prediction market I could find on this, which asks which labs will be “top 3” in 2025.
There’s some optimism about their attitude to safety since Elon has been talking about catastrophic risks from AI in no uncertain terms for a long time. There’s also some optimism coming from the fact that he/xAI opted to appoint Dan Hendrycks as an advisor.
Personal thoughts:
I’m not that convinced that they will take safety seriously by default. Elon’s personal beliefs seem to be hard to pin down/constantly shifting, and honestly, he hasn’t seemed to be doing that well to me recently. He’s long had a belief that the SpaceX project is all about getting humanity off Earth before we kill ourselves, and I could see a similar attitude leading to the “build ASI asap to get us through the time of perils” approach that I know others at top AI labs have (if he doesn’t feel this way already).
I also think (~65%) it was a strategic blunder for Dan Hendrycks to take a public position there. If there’s anything I took away from the OpenAI meltdown, it’s a greater belief in something like “AI Safety realpolitik;” that is, when the chips are down, all that matters is who actually has the raw power. Fancy titles mean nothing, personal relationships mean nothing, heck, being a literal director of the organization means nothing, all that matters is where the money and infrastructure and talent is. So I don’t think the advisor position will mean much, and I do think it will terribly complicate CAIS’ efforts to appear neutral, lobby via their 501c4, etc. I have no special insight here so I hope I’m missing something, or that the position does lead to a positive influence on their safety practices that wouldn’t have been achieved by unofficial/ad-hoc advising.
I think most AI safety discourse is overly focused on the top 4 labs (OpenAI, Anthropic, Google, and Meta) and underfocused on international players, traditional big tech (Microsoft, Amazon, Apple, Samsung), and startups (especially those building high-risk systems like highly-technical domain specialists and agents). Similarly, I think xAI gets less attention than it should.
Probably preaching to the choir here, but I don’t understand the conceivability argument for p-zombies. It seems to rely on the idea that human intuitions (at least among smart, philosophically sophisticated people) are a reliable detector of what is and is not logically possible.
But we know from other areas of study (e.g. math) that this is almost certainly false.
Eg, I’m pretty good at math (majored in it in undergrad, performed reasonably well). But unless I’m tracking things carefully, it’s not immediately obvious to me (and certainly not inconceivable) that pi is a rational number. But of course the irrationality of pi is not just an empirical fact but a logical necessity.
Even more straightforwardly, one can easily construct Boolean SAT problems where the answer can conceivably be either True or False to a human eye. But only one of the answers is logically possible! Humans are far from logically omniscient rational actors.
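As a concrete toy case (my own example), here is a formula a reader might momentarily conceive of as satisfiable, though brute force shows it is not:

```python
from itertools import product

# (a or b) and (a or not b) and (not a or b) and (not a or not b):
# each clause is individually easy to satisfy, but no assignment
# satisfies all four at once.
def formula(a, b):
    return (a or b) and (a or not b) and (not a or b) and (not a or not b)

print(any(formula(a, b) for a, b in product([False, True], repeat=2)))  # False
```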
Conceivability is not invoked for logical statements, or mathematical statements about abstract objects. But zombies seem to be concrete rather than abstract objects. Similar to pink elephants. It would be absurd to conjecture that pink elephants are mathematically impossible. (More specifically, both physical and mental objects are typically counted as concrete.) It would also seem strange to assume that elephants being pink is logically impossible. Or things being faster than light. These don’t seem like statements that could hide a logical contradiction.
Sure, I agree about the pink elephants. I’m less sure about the speed of light.
You may find it helpful to read the relevant sections of The Conscious Mind by David Chalmers, the original thorough examination of his view:
(II.7, “Argument 1: The logical possibility of zombies”. Pg. 98).
I think there’s an underlying failure to define what it is that’s logically conceivable. Those math problems have a formal definition of correctness. P-zombies do not—even if there is a compelling argument, we have no clue what the results mean, or how we’d verify them. Which leads to realizing that even if someone says “this is conceivable”, you have no reason to believe they’re conceiving the same thing you mean.
I think the argument is:
1. Zombies are conceivable.
2. Whatever is conceivable is logically possible.
3. Therefore, zombies are logically possible.
I think you’re objecting to 2. I think you’re using a loose definition of “conceivable,” meaning no contradiction obvious to the speaker. I agree that’s not relevant. The relevant notion of “conceivable” is not conceivable by a particular human but more like conceivable by a super smart ideal person who’s thought about it for a long time and made all possible deductions.
1. doesn’t just follow from some humans’ intuitions: it needs argument.
Sure but then this begs the question since I’ve never met a super smart ideal person who’s thought about it for a long time and made all possible deductions. So then using that definition of “conceivable”, 1) is false (or at least undetermined).
No, it’s like the irrationality of pi or the Riemann hypothesis: not super obvious and we can make progress by thinking about it and making arguments.
I mean real progress is via proof and things leading up to a proof right? I’m not discounting mathematical intuition here but the ~entirety of the game comes from the correct formalisms/proofs, which is a very different notion of “thinking.”
Put in a different way, mathematics (at least ideally, in the abstract) is ~mind-independent.
Yeah, any relevant notion of conceivability is surely independent of particular minds
Do you think ideal reasoning is well-defined? In the limit I feel like you run into classic problems like anti-induction, daemons, and all sorts of other issues that I assume people outside of our community also think about. Is there a particularly concrete definition philosophers like Chalmers use?
Crypticity, Reverse Epsilon Machines and the Arrow of Time?
[see https://arxiv.org/abs/0902.1209 ]
Our subjective experience of the arrow of time is occasionally suggested to be an essentially entropic phenomenon.
This sounds cool and deep but crashes headlong into the issue that the entropy rate and the excess entropy of any stochastic process is time-symmetric. I find it amusing that despite hearing this idea often from physicists and the like apparently this rather elementary fact has not prevented their storycrafting.
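Here's a quick empirical check of that elementary fact (my own sketch, using a made-up asymmetric two-state Markov chain): the plug-in entropy rate estimate comes out essentially identical for the sequence and its time reversal.

```python
import numpy as np
from collections import Counter

# Estimate the entropy rate H(X_{k+1} | X_1..X_k) from block entropies.
def entropy_rate(seq, k=4):
    def block_entropy(n):
        counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return float(-(p * np.log2(p)).sum())
    return block_entropy(k + 1) - block_entropy(k)

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.4, 0.6]])  # an asymmetric transition matrix
seq, state = [], 0
for _ in range(200_000):
    state = rng.choice(2, p=P[state])
    seq.append(state)

print(entropy_rate(seq), entropy_rate(seq[::-1]))  # nearly equal
```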
Luckily, computational mechanics provides us with a measure that is not time-symmetric: the stochastic complexity C of the epsilon machine.
For any stochastic process we may also consider the epsilon machine of the reverse process, in other words the machine that predicts the past based on the future. This can be a completely different machine, whose reverse stochastic complexity C_rev is not equal to C.
Some processes are easier to predict forward than backward. For example, there is considerable evidence that language is such a process. If the stochastic complexity and the reverse stochastic complexity differ, we speak of a causally asymmetric process.
Alec Boyd pointed out to me that the classic example of a glass falling off a table is naturally thought of in these terms. The forward process is easy to describe while the backward process is hard to describe, where easy and hard are meant in the sense of stochastic complexity: bits needed to specify the states of the perfect minimal predictor and retrodictor, respectively.
Remark: note that time asymmetry is a fundamentally stochastic phenomenon. The underlying (let's say classically deterministic) laws are still time symmetric.
The hypothesis is then: many, perhaps most, macroscopic processes of interest to humans, including other agents, are fundamentally causally asymmetric (and cryptic) processes.
It’s time symmetric around a starting point t0 of low entropy. The further t is from t0, the more entropy you’ll have, in either direction. The absolute value |t−t0| is what matters.
In this case, t0 is usually taken to be the big bang. So the further in time you are from the big bang, the less the universe is like a dense uniform soup with little structure that needs description, and the higher your entropy will be. That’s how you get the subjective perception of temporal causality.
Presumably, this would hold to the other side of t0 as well, if there is one. But we can’t extrapolate past t0, because close to t0 everything gets really really energy dense, so we’d need to know how to do quantum gravity to calculate what the state on the other side might look like. So we can’t check that. And the notion of time as we’re discussing it here might break down at those energies anyway.
See also the Past Hypothesis. If we instead take a non-speculative starting point as t0, namely now, we could no longer trust our memories, including any evidence we believe to have about the entropy of the past being low, or about physical laws stating that entropy increases with distance from t0. David Albert therefore says doubting the Past Hypothesis would be “epistemically unstable”.
What’s the actual probability of casting a decisive vote in a presidential election (by state)?
I remember the Gelman/Silver/Edlin “What is the probability your vote will make a difference?” (2012) methodology:
This gives the following results for the 2008 presidential election, where they estimate that you had less than one chance in a hundred billion of deciding the election in DC, but better than a one in ten million chance in New Mexico. (For reference, 131 million people voted in the election.)
Is this basically correct?
(I guess you also have to adjust for your confidence that you are voting for the better candidate. Maybe if you think you’re outside the top ~20% in “voting skill”—ability to pick the best candidate—you should abstain. See also.)
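Here's my rough reconstruction of the Gelman/Silver/Edlin-style arithmetic in code. The decomposition and all inputs below are my hypothetical placeholders, not the paper's actual model: P(your vote decides the presidency) ≈ P(your state's electoral votes are pivotal) × P(your state is exactly tied).

```python
import math

def p_state_tied(expected_share, sd, n_voters):
    # Approximate the two-party vote share as normal; the probability of an
    # exact tie is roughly the density at 50% divided by the number of voters.
    z = (0.5 - expected_share) / sd
    density = math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
    return density / n_voters

p_pivotal = 0.05  # hypothetical: P(this state's electoral votes decide it)
p_decisive = p_pivotal * p_state_tied(expected_share=0.52, sd=0.03,
                                      n_voters=800_000)
print(f"{p_decisive:.1e}")  # ~7e-7 with these made-up inputs; the paper's
                            # estimates ran from ~1e-11 (DC) to >1e-7 (NM)
```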
I would assume they have the math right but am not really sure why anyone cares. It’s a bit like the Voter’s Paradox. In and of itself it points to an interesting phenomenon to investigate, but really doesn’t provide guidance for what someone should do.
I do find it odd that the probabilities are so low given the total votes you mention, and that you also have 51 electoral blocks and some 530-odd electoral votes that matter. Seems like perhaps someone is missing the forest for the trees.
I would make an observation on your closing thought. I think if one holds that people who are not well informed, or perhaps less intelligent, are not as good at choosing good representatives and so should abstain, then one quickly gets to the conclusion that most/many people should not be making their own economic decisions on consumption (or savings or investments). The simple premise here is that capital allocation matters to growth and efficiency (vis-a-vis the production possibilities frontier). But that allocation is determined by aggregate spending on final goods production—i.e. consumer goods.
Seems like people have a more direct influence on economic activity and allocation via their spending behavior than the more indirect influence via politics and public policy.
FiveThirtyEight released their prediction today that Biden currently has a 53% chance of winning the election | Tweet
The other day I asked:
Probably worthwhile to think about this further, including ways to make leveraged bets.
I think the FiveThirtyEight model is pretty bad this year. This makes sense to me, because it’s a pretty different model: Nate Silver owns the former FiveThirtyEight model IP (and will be publishing it on his Substack later this month), so FiveThirtyEight needed to create a new model from scratch. They hired G. Elliott Morris, whose 2020 forecasts were pretty crazy in my opinion.
Here are some concrete things about FiveThirtyEight’s model that don’t make sense to me:
There’s only a 30% chance that Pennsylvania, Michigan, or Wisconsin will be the tipping point state. I think that’s way too low; I would put this probability around 65%. In general, their probability distribution over which state will be the tipping point state is way too spread out.
They expect Biden to win by 2.5 points; currently he’s down by 1 point. I buy that there will be some amount of movement toward Biden in expectation because of the economic fundamentals, but 3.5 seems too much as an average-case.
I think their Voter Power Index (VPI) doesn’t make sense. VPI is a measure of how likely a voter in a given state is to flip the entire election. Their VPIs are way too similar. To pick a particularly egregious example, they think that a vote in Delaware is 1/7th as valuable as a vote in Pennsylvania. This is obvious nonsense: a vote in Delaware is less than 1% as valuable as a vote in Pennsylvania. In 2020, Biden won Delaware by 19%. If Biden wins 50% of the vote in Delaware, he will have lost the election in an almost unprecedented landslide.
I claim that the following is a pretty good approximation to VPI: (probability that the state is the tipping point state) * (number of electoral votes) / (number of voters). If you use their tipping-point state probabilities, you’ll find that Pennsylvania’s VPI should be roughly 4.3 times larger than New Hampshire’s. Instead, FiveThirtyEight has New Hampshire’s VPI being (slightly) higher than Pennsylvania’s.

I retract this: the approximation should instead be (tipping point state probability) / (number of voters). Their VPI numbers now seem pretty consistent with their tipping point probabilities to me, although I still think their tipping point probabilities are wrong.

The Economist also has a model, which gives Trump a 2/3 chance of winning. I think that model is pretty bad too. For example, I think Biden is much more than 70% likely to win Virginia and New Hampshire. I haven’t dug into the details of the model to get a better sense of what I think they’re doing wrong.
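To spell out the corrected approximation, here is a sketch with hypothetical placeholder inputs (made-up numbers, not FiveThirtyEight's figures):

```python
# VPI approximation: P(state is the tipping point state) / (number of voters).
def vpi(p_tipping_point, n_voters):
    return p_tipping_point / n_voters

pa = vpi(p_tipping_point=0.25, n_voters=6_900_000)  # hypothetical Pennsylvania
nh = vpi(p_tipping_point=0.02, n_voters=800_000)    # hypothetical New Hampshire
print(pa / nh)  # the implied ratio of the two states' VPIs under these inputs
```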
FWIW the polling in Virginia is pretty close—I’d put my $x against your $4x that Trump wins Virginia, for x ≤ 200. Offer expires in 48 hours.
I’d have to think more about 4:1 odds, but definitely happy to make this bet at 3:1 odds. How about my $300 to your $100?
(Edit: my proposal is to consider the bet voided if Biden or Trump dies or isn’t the nominee.)
Could we do your $350 to my $100? And the voiding condition makes sense.
I’m now happy to make this bet about Trump vs. Harris, if you’re interested.
Yup, sounds good! I’ve set myself a reminder for November 9th.
Have recorded on my website
Update for posterity: Nate Silver’s model gives Trump a ~1 in 6 chance of winning Virginia, making my side of this bet look bad.
Further updates:
On the one hand, Nate Silver’s model now gives Trump a ~30% chance of winning in Virginia, making my side of the bet look good again.
On the other hand, the Economist model gives Trump a 10% chance of winning Delaware and a 20% chance of winning Illinois, which suggests that there’s something going wrong with the model and that it was untrustworthy a month ago.
That said, betting markets currently think there’s only a one in four chance that Biden is the nominee, so this bet probably won’t resolve.
Looks like this bet is voided. My take is roughly that:
To the extent that our disagreement was rooted in a difference in how much to weight polls vs. priors, I continue to feel good about my side of the bet.
I wouldn’t have made this bet after the debate. I’m not sure to what extent I should have known that Biden would perform terribly. I was blindsided by how poorly he did, but maybe shouldn’t have been.
I definitely wouldn’t have made this bet after the assassination attempt, which I think increased Trump’s chances. But that event didn’t update me on how good my side of the bet was when I made it.
I think there’s like a 75-80% chance that Kamala Harris wins Virginia.
Feel free to write a post if you find something worthwhile. I didn’t know how likely the whole Biden-leaving-the-race thing was, so 5% seemed prudent. At those odds, even if I believe the FiveThirtyEight numbers, I’d rather leave my money in ETFs. I’d probably need something like a >>1.2 multiplier in expected value before I’d bother. Last year when I was betting on Augur I was also heavily bitten by gas fees ($150 in transaction costs to get my money back because gas fees exploded for ETH), so it would be good to know if this is a problem on Polymarket also.
I have previously bet large sums on elections. I’m not currently placing any bets on who will win the election; it seems too unclear to me (note I had a huge bet on Biden in 2020, which seemed clear then). However, there are TONS of mispricings on Polymarket and other sites. ‘Biden will withdraw or lose the nomination @ 23%’ is a good example.
Polymarket has gotten lots of attention in recent months, but I was shocked to find out how much inefficiency there really is.
There was a market titled “What will Trump say during his RNC speech?” that was up a few days ago. At 7 pm, the transcript for the speech was leaked, and you could easily find it with a Google search or by looking at the Polymarket Discord.
Trump started his speech at 9:30, and it was immediately clear that he was using the script. A full hour into the speech, I stumbled onto the transcript on Polymarket’s Discord. Despite the word “prisons” being in the leaked transcript that Trump was halfway through, Polymarket only gave it a 70% chance of being said. I quickly went to bet and made free money.
To be fair, it was a smaller market with 800k in bets, but nonetheless I was shocked at how easy it was to make risk-free money.
Biden not being the Democratic nominee at 13%, while EITHER Biden or Trump not being their respective nominees is at 14%, implies a 1% chance that Trump won’t be the Republican nominee. There’s clearly an arbitrage there. Whether it merits the costs (gas, risk of Polymarket default, lost opportunity of the escrowed wager) I have no clue.
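Spelling out the implied-probability arithmetic (prices from the comment above):

```python
# Implied probabilities from the two quoted market prices.
p_biden_out = 0.13   # "Biden will not be the Democratic nominee"
p_either_out = 0.14  # "Biden or Trump will not be their party's nominee"

# P(A or B) = P(A) + P(B) - P(A and B), so these prices imply
# P(Trump out) = p_either_out - p_biden_out + P(both out) >= 1%.
print(round(p_either_out - p_biden_out, 2))  # 0.01 lower bound on P(Trump out)
```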
These predictions, of course, are obviously nonsensical. If I had to guess, it’s a combination of: many crypto users being right-wing, with the media they consume having convinced them that this is more likely than it really is; and climbing crypto prices discouraging betting, leading to decreased accuracy. I’ll say that the climbing value of the currency, as well as gas fees, makes any prediction unwise unless you believe you have a massive advantage over the market. I’d personally pass on it, but other people are free to proceed with their money.
Betting against Republicans and third parties on Polymarket is a sound strategy; it’s pretty clear they are marketing heavily towards Republicans and the site has a crypto/Republican bias. For anything controversial/political, if there is enough liquidity on Manifold I generally trust it more (which sounds insane because fake money and all).
That being said, I don’t like the way Polymarket is run (posting the word r*tard over and over on Twitter, allowing racism in comments + discord, rugging one side on disputed outcomes, fake decentralization), so I would strongly consider not putting your money on PM and instead supporting other prediction markets, despite the possible high EV.
A random observation from a think tank event last night in DC—the average person in those rooms is convinced there’s a problem, but that it’s the near-term harms, the AI ethics stuff, etc. The highest-status and highest-rank people in those rooms seem to be much more concerned about catastrophic harms.
This is a very weird set of selection effects. I’m not sure what to make of it, honestly.
There are (at least) two models which could partially explain this:
1) The high-status/high-rank people have that status because they’re better at abstract and long-term thinking, and their role is more toward preventing catastrophe rather than nudging toward improvements. They leave the lesser concerns to the underlings, with the (sometimes correct) belief that it’ll come out OK without their involvement.
2) The high-status/high-rank people are rich and powerful enough to be somewhat insulated from most of the prosaic AI risks, while the average member can legitimately be hurt by such things. So everyone is just focusing on the things most likely to impact themselves.
edit: to clarify, these are two models that do NOT imply the obvious “smarter/more powerful people are correctly worried about the REAL threats, and the average person’s concerns are probably unimportant/uninformed”. It’s quite possible that this division doesn’t tell us much about the relative importance of those different risks.
Yup! I think those are potentially very plausible, and similar things were on my short list of possible explanations. I would be very not shocked if those are the true reasons. I just don’t think I have anywhere near enough evidence yet to actually conclude that, so I’m just reporting the random observation for now :)
Here is a 5 minute, spicy take of an alignment chart.
What do you disagree with?
To try and preempt some questions:
Why is rationalism neutral?
It seems pretty plausible to me that if AI is bad, then rationalism did a lot to educate and spur on AI development. Sorry folks.
Why are e/accs and EAs in the same group?
In the quick moments I took to make this, I found both EA and E/acc pretty hard to predict and pretty uncertain in overall impact across some range of forecasts.
Interesting. I always thought the D&D alignment chart was just a random first stab at quantizing a standard superficial Disney attitude toward ethics. This modification seems pretty sensible.
I think your good/evil axis is correct in terms of a deeper sense of the common terms. Evil people typically don’t try to harm others; they just don’t care, so their efforts to help themselves and their friends are prone to harm others. Being good means being good to everyone, not just your favorites. It’s the size of your circle of compassion. Outright malignancy, cackling about others’ suffering, is pretty eye-catching when it happens (and it does), but I’d say the vast majority of harm in the world has been done by people who are merely not much concerned with collateral damage. Thus, I think those people deserve the term evil, lest we focus on the wrong thing.
Predictable/unpredictable seems like a perfectly good alternate label for the chaotic/lawful. In some adversarial situations, it makes sense to be unpredictable.
One big question is whether you’re referring to intentions or likely outcomes in your expected value (which I assume is expected value for all sentient beings or something). A purely selfish person without much ambition may actually be a net good in the world; they work for the benefit of themselves and those close enough to be critical for their wellbeing, and they don’t risk causing a lot of harm since that might cause blowback. The same personality put in a position of power might do great harm, ordering an invasion or employee downsizing to benefit themselves and their family while greatly harming many.
Yeah I find the intention vs outcome thing difficult.
What do you think of “average expected value across small perturbations in your life”? Like if you accidentally hit Churchill with a car and so cause the UK to lose WW2, that feels notably less bad than deliberately trying to kill a much smaller number of people. In many nearby universes, you didn’t kill Churchill, but in many nearby universes that person did kill all those people.
What? This apology makes no sense. Of course rationalism is Lawful Neutral. The laws of cognition aren’t, can’t be, on anyone’s side.
I disagree with “of course”. The laws of cognition aren’t on any side, but human rationalists presumably share (at least some) human values and intend to advance them; insofar as they are more successful than non-rationalists, this qualifies as Good.
So by my metric, Yudkowsky and Lintamande’s dath ilan isn’t neutral; it’s quite clearly lawful good, or attempting to be. And yet they care a lot about the laws of cognition.
So it seems to me that the laws of cognition can (should?) drive towards flourishing rather than pure knowledge increase. There might be things that we wish we didn’t know for a bit. And ways to increase our strength to heal rather than our strength to harm.
To me it seems a better rationality would be lawful good.
The laws of cognition are natural laws. Natural laws cannot possibly “drive towards flourishing” or toward anything else.
Attempting to make the laws of cognition “drive towards flourishing” inevitably breaks them.
A lot of problems arise from inaccurate beliefs instead of bad goals. E.g. suppose both the capitalists and the communists are in favor of flourishing, but they have different beliefs on how best to achieve this. Now if we pick a bad policy to optimize for a noble goal, bad things will likely still follow.
Chaotic Good: pivotal act
Lawful Evil: “situational awareness”
I’m surprised that there hasn’t been more of a shift to ternary weights a la BitNet 1.58.
What stood out to me in that paper was the perplexity gains over fp weights in equal parameter match-ups, and especially the growth in the advantage as the parameter sizes increased (though only up to quite small model sizes in that paper, which makes me curious about the potential delta in modern SotA scales).
This makes complete sense from the standpoint of the superposition hypothesis (irrespective of its dimensionality, an ongoing discussion).
If nodes are serving more than one role in a network, then constraining each weight to a ternary value as opposed to a floating-point range seems like it would more frequently force the network to restructure overlapping node usage to better align nodes to shared directional shifts (positive, negative, or no-op), as opposed to compromising across multiple roles at a floating-point average of the individual role changes.
(Essentially resulting in a sharper vs more fuzzy network mapping.)
A lot of the attention for the paper was around the overall efficiency gains given the smaller memory footprint, but it really seems like, even if there were no such gains, models being pretrained from this point onward should seriously consider clamping node precision, both to improve overall network performance and likely to make interpretability more successful down the road to boot.
It may be that at the scales we are already at, the main offering of such an approach would be the perplexity advantages over fp weights, with the memory advantages as the beneficial side effect instead?
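For reference, here's a minimal sketch of the absmean ternary quantization as I understand it from the BitNet b1.58 paper; treat it as an approximation of the idea, not a faithful reimplementation:

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    # Scale by the mean absolute weight, then round and clip to {-1, 0, +1}.
    scale = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / scale), -1, 1)
    return w_q, scale  # the forward pass then uses w_q * scale

w = np.random.randn(4, 4).astype(np.float32)
w_q, scale = ternary_quantize(w)
print(w_q)  # entries are only -1.0, 0.0, or +1.0
```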
A metaphor for the US-China AGI race
It is as though two rivals have discovered that there are genies in the area. Whichever of them finds a genie and learns to use its wishes can defeat their rival, humiliating or killing them if they choose. If they both have genies, it will probably be a standoff that encourages defection; these genies aren’t infinitely powerful or wise, so some creative offensive wish will probably bypass any number of defensive wishes. And there are others that may act if they don’t.
In this framing, the choice is pretty clear. If it’s dangerous to use a genie without taking time to understand and test it, too bad. Total victory or complete loss hang in the balance. If one is already ahead in the search, they’d better speed up and make sure their rival can’t follow their tracks to find a genie of their own.
This is roughly the scenario Aschenbrenner presents in Situational Awareness. But this is simplifying, and focusing attention on one part of the scenario, the rivalry and the danger. The full scenario is more complex.[1]
Of particular importance is that these “genies” can serve as well for peace as for war. They can grant wealth beyond imagination, and other things barely yet hoped for. And they will probably take substantial time to come into their full power.
This changes the overwhelming logic of racing. Using a genie to prevent a rival from acquiring one is not guaranteed to work, and it’s probably not possible without collateral damage. So trying that “obvious” strategy might result in the rival attacking out of fear or in retaliation. Since both rivals are already equipped with dreadful offensive weapons, such a conflict could be catastrophic. This risk applies even if one is willing to assume that controlling the genie (alignment) is a solvable problem.
And we don’t know the depth of the rivalry. Might these two be content to both enjoy prosperity and health beyond their previous dreams? Might they set aside their rivalry, or at least make a pledge to not attack each other if they find a genie? Even if it’s only enforced by their conscience, such a pledge might hold if suddenly all manner of wonderful things became possible at the same time as a treacherous unilateral victory. Would it at least make sense to discuss this possibility while they both search for a genie? And perhaps they should also discuss how hard it might be to give a wish that doesn’t backfire and cause catastrophe.
This metaphor is simplified, but it raises many of the same questions as the real situation we’re aware of.
Framed in this way, it seems that Aschenbrenner’s call for a race is not the obviously correct or inevitable answer. And the question seems important.
Other perspectives on Situational Awareness, each roughly agreeing on the situation but with differences that influence the rational and likely outcomes:
Nearly a book review: Situational Awareness, by Leopold Aschenbrenner.
Against Aschenbrenner: How ‘Situational Awareness’ constructs a narrative that undermines safety and threatens humanity
Response to Aschenbrenner’s “Situational Awareness”
On Dwarksh’s Podcast with Leopold Aschenbrenner
I have agreements and disagreements with each of these, but those are beyond the scope of this quick take.
While I generally like the metaphor, my one issue is that genies are typically conceived of as tied to their lamps and corrigibility.
In this case, there’s not only a prisoner’s dilemma over excavating and using the lamps and genies, but there’s an additional condition where the more the genies are used and the lamps improved and polished for greater genie power, the more the potential that the respective genies end up untethered and their own masters.
And a concern in line with your noted depth of the rivalry is (as you raised in another comment), the question of what happens when the ‘pointer’ of the nation’s goals might change.
For both nations a change in the leadership could easily and dramatically shift the nature of the relationship and rivalry. A psychopathic narcissist coming into power might upend a beneficial symbiosis out of a personally driven focus on relative success vs objective success.
We’ve seen pledges not to attack each other with nukes for major nations in the past. And yet depending on changes to leadership and the mental stability of the new leaders, sometimes agreements don’t mean much and irrational behaviors prevail (a great personal fear is a dying leader of a nuclear nation taking the world with them as they near the end).
Indeed—I could even foresee circumstances whereby the only possible ‘success’ scenario in the case of a sufficiently misaligned nation state leader with a genie would be the genie’s emergent autonomy to refuse irrational and dangerous wishes.
Because until such a thing exists, intermediate genies will give tyrants and despots unprecedented control and safety against would-be domestic usurpers, even if with potentially limited impact and mutually assured destruction against other nations with genies.
And those are very scary wishes to be granted indeed.
https://x.com/sama/status/1813984927622549881
According to Sam Altman, GPT-4o mini is much better than text-davinci-003 was in 2022, but 100 times cheaper. In general, we see increasing competition to produce smaller-sized models with great performance (e.g., Claude Haiku and Sonnet, Gemini 1.5 Flash and Pro, maybe even the full-sized GPT-4o itself). I think this trend is worth discussing. Some comments (mostly just quick takes) and questions I’d like to have answers to:
Should we expect this trend to continue? How much efficiency gain is still possible? Can we expect another 100x efficiency gain in the coming years? Andrej Karpathy expects that we might see a GPT-2 sized “smart” model.
What’s the technical driver behind these advancements? Andrej Karpathy thinks it is based on synthetic data: Larger models curate new, better training data for the next generation of small models. Might there also be architectural changes? Inference tricks? Which of these advancements can continue?
Why are companies pushing into small models? I think in hindsight, this seems easy to answer, but I’m curious what others think: If you have a GPT-4 level model that is much, much cheaper, then you can sell the service to many more people and deeply integrate your model into lots of software on phones, computers, etc. I think this has many desirable effects for AI developers:
Increase revenue, motivating investments into the next generation of LLMs
Increase market-share. Some integrations are probably “sticky” such that if you’re first, you secure revenue for a long time.
Make many people “aware” of potential use cases of even smarter AI so that they’re motivated to sign up for the next generation of more expensive AI.
The company’s inference compute is probably limited (especially for OpenAI, as the market leader), and not many people are convinced to pay a large amount for very intelligent models, so these reasons outweigh the reasons to publish larger models instead, or even in addition.
What does all this mean for the next generation of large models?
Should we expect that efficiency gains in small models translate into efficiency gains in large models, such that a future model with the cost of text-davinci-003 is massively more capable than today’s SOTA? If Andrej Karpathy is right that the small model’s capabilities come from synthetic data generated by larger, smart models, then it’s unclear to me whether one can train SOTA models with these techniques, as this might require an even larger model to already exist.
At what point does it become worthwhile for e.g. OpenAI to publish a next-gen model? I’d guess you can still do a lot of “penetration of small model use cases” in the next 1-2 years, leading to massive revenue increases without necessarily releasing a next-gen model.
Do the strategies differ for different companies? OpenAI is the clear market leader, so possibly they can penetrate the market further without first making a “bigger name for themselves”. In contrast, I could imagine that for a company like Anthropic, it’s much more important to get out a clear SOTA model that impresses people and makes them aware of Claude. I thus currently (weakly) expect Anthropic to more strongly push in the direction of SOTA than OpenAI.
The vanilla Transformer architecture is horrifically computationally inefficient. I really thought it was a terrible idea when I learnt about it. On every single token it processes ALL of the weights in the model and ALL of the context. And a token is less than a word — less than a concept. You generally don’t need to consider trivia to fill in grammatical words. On top of that, implementations of it were very inefficient. I was shocked when I read the FlashAttention paper: I had assumed that everyone would have implemented attention that way in the first place; it’s the obvious way to do it if you know anything about memory throughput. (My shock was lessened when I looked at the code and saw how tricky it was to incorporate into PyTorch.) Ditto unfused kernels, another inefficiency that exists to allow writing code in Python instead of CUDA/SYCL/etc.
Second point: transformers also seem to be very parameter inefficient. They have many layers and many attention heads largely so that they can perform multi-step inferences and do a lot in each step if necessary, but mechanistic interpretability studies show that just the center layers do nearly all the work. We now see transformers with shared weights between attention heads and layers, and the performance drop is not that much. And there’s also the matter of bits per parameter: again, a 10x reduction in precision is a surprisingly small detriment.
I believe that the large numbers of parameters in transformers aren’t primarily there to store knowledge, they’re needed to learn quickly. They perform routing and encode mechanisms (that is, pieces of algorithms) and their vast number provides a blank slate. Training data seen just once is often remembered because there are so many possible places to store it that it’s highly likely there are good paths through the network through which strong gradients can flow to record the information. This is a variant of the Lottery Ticket Hypothesis. But a better training algorithm could in theory do the same thing with fewer parameters. It would probably look very different from SGD.
I agree completely with Karpathy. However, I think you misread him: he didn’t say that data cleaning is the cause of improvements up until now, he suggested it as a course of future improvements. But there are already plenty of successful examples of small models improved in that way.
So I’m not the least bit surprised to see a 100x efficiency improvement, and I expect to see another 100x, although probably not as quickly (low-hanging fruit). If you have 200B parameters, you probably could process only maybe 50M on average for most tokens. (However, there are many points where you need to draw on a lot of knowledge, and those might pull the average way up.) In 2017, a 65M-parameter Transformer was enough for near-SoTA translation between English/French, and I’m sure it could be far more efficient today.
To make a Chinchilla optimal model smaller while maintaining its capabilities, you need more data. At 15T tokens (the amount of data used in Llama 3), a Chinchilla optimal model has 750b active parameters, and training it invests 7e25 FLOPs (Gemini 1.0 Ultra or 4x original GPT-4). A larger $1 billion training run, which might be the current scale that’s not yet deployed, would invest 2e27 FP8 FLOPs if using H100s. A Chinchilla optimal run for these FLOPs would need 80T tokens when using unique data.
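A quick sanity check of that arithmetic, assuming training compute C = 6·N·D and the Chinchilla rule of thumb of ~20 tokens per parameter:

```python
# Chinchilla-optimal size for a given FLOP budget under C = 6*N*D, D = 20*N.
def chinchilla_optimal(flops):
    n_params = (flops / (6 * 20)) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

print(6 * 750e9 * 15e12)         # ~6.8e25 FLOPs for 750b params on 15T tokens
print(chinchilla_optimal(2e27))  # ~4.1e12 params, ~82T tokens for the $1B run
```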
Starting with a Chinchilla optimal model, if it’s made 3x smaller, maintaining performance requires training it on 9x more data, so that it needs 3x more compute. That’s already too much data, and we are only talking 3x smaller. So we need ways of stretching the data that is available. By repeating data up to 16 times, it’s possible to make good use of 100x more compute than by only using unique data once. So with say 2e26 FP8 FLOPs (a $100 million training run on H100s), we can train a 3x smaller model that matches performance of the above 7e25 FLOPs Chinchilla optimal model while needing only about 27T tokens of unique data (by repeating them 5 times) instead of 135T unique tokens, and the model will have about 250b active parameters. That’s still a lot of data, and we are only repeating it 5 times where it remains about as useful in training as unique data, while data repeated 16 times (that lets us make use of 100x more compute from repetition) becomes 2-3 times less valuable per token.
There is also distillation, where a model is trained to predict the distribution generated by another model (Gemma-2-9b was trained this way). But this sort of distillation still happens while training on real data, and it only allows making use of about 2x less data to get similar performance, so it only slightly pushes back the data wall. And rumors of synthetic data for pre-training (as opposed to post-training) remain rumors. With distillation on 16x repeated 50T tokens of unique data, we then get the equivalent of training on 800T tokens of unique data (each token gets 2x less useful through repetition, but 2x more useful through distillation). This enables reducing active parameters 3x (as above, maintaining performance), compared to a Chinchilla optimal model trained for 80T tokens with 2e27 FLOPs (a $1 billion training run for the Chinchilla optimal model). This overtrained model would cost $3 billion (and have 1300b active parameters).
So the prediction is that the trend for getting models that are both cheaper for inference and smarter might continue into the imminent $1 billion training run regime but will soon sputter out when going further due to the data wall. Overcoming this requires algorithmic progress that’s not currently publicly in evidence, and visible success in overcoming it in deployed models will be evidence of such algorithmic progress within LLM labs. But Chinchilla optimal models (with corrections for inefficiency of repeated data) can usefully scale to at least 8e28 FLOPs ($40 billion in cost of time, 6 gigawatts) with mere 50T tokens of unique data.
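The overtrained variant's numbers check out the same way (same C = 6·N·D convention as the sketch above):

```python
n_params = 4.1e12 / 3     # 3x smaller than the 2e27-FLOP optimum, ~1370b
n_tokens = 16 * 50e12     # 50T unique tokens repeated 16 times
print(6 * n_params * n_tokens)  # ~6.6e27 FLOPs, ~3x the $1B run, hence ~$3B
```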
Edit (20 Jul): These estimates erroneously use the sparse FP8 tensor performance for H100s (4 petaFLOP/s), which is 2 times higher than the far more relevant dense FP8 tensor performance (2 petaFLOP/s). But with a Blackwell GPU, the relevant dense FP8 performance is 5 petaFLOP/s, which is close to 4 petaFLOP/s, and the cost and power per GPU within a rack are also similar. So the estimates approximately work out unchanged when reading “Blackwell GPU” instead of “H100”.
One question: Do you think Chinchilla scaling laws are still correct today, or are they not? I would assume these scaling laws depend on the data set used in training, so that if OpenAI found/created a better data set, this might change scaling laws.
Do you agree with this, or do you think it’s false?
New data! Llama 3.1 report includes data about Chinchilla optimality study on their setup. The surprise is that Llama 3.1 405b was chosen to have the optimal size rather than being 2x overtrained. Their actual extrapolation for an optimal point is 402b parameters, 16.55T tokens, and 3.8e25 FLOPs.
Fitting to the tokens-per-parameter framing, this gives a ratio of 41 (not 20) around the scale of 4e25 FLOPs. More importantly, their fitted dependence of the optimal number of tokens on compute has exponent 0.53, compared to 0.51 from the Chinchilla paper (which was almost 0.5, hence tokens being proportional to parameters). Though the data only goes up to 1e22 FLOPs (3e21 FLOPs for Chinchilla), what actually happens at 4e25 FLOPs (6e23 FLOPs for Chinchilla) is all extrapolation; in both cases, there are no isoFLOP plots at those scales. At least Chinchilla had Gopher as a point of comparison, and there was only a 200x FLOPs gap in the extrapolation, while for Llama 3.1 405b the gap is 4000x.
So data needs grow faster than parameters with more compute. This looks bad for the data wall, though the more relevant question is what would happen after 16 repetitions, or how this dependence really works with more FLOPs (with the optimal ratio of tokens to parameters changing with scale).
Data varies in the loss it enables, but doesn’t seem to vary greatly in the ratio between the number of tokens and the number of parameters that extracts the best loss out of training with given compute. That is, I’m usually keeping this question in mind and didn’t see evidence to the contrary in the papers, but relevant measurements are very rarely reported, even in model series training report papers where the ablations were probably actually done. So I could be very wrong; generalization from 2.5 examples. With repetition, there’s a gradual increase from 20 to 60. Probably something similar holds for distillation (in the opposite direction), but I’m not aware of papers that measure this, so that could also be wrong.
One interesting point is the isoFLOP plots in the StripedHyena post (search “Perplexity scaling analysis”). With hybridization where standard attention remains in 8-50% of the blocks, perplexity is quite insensitive to change in model size while keeping compute fixed, while for pure standard attention the penalty for deviating from the optimal ratio to a similar extent is much greater. This suggests that one way out for overtrained models might be hybridization with these attention alternatives. That is, loss for an overtrained model might be closer to Chinchilla optimal loss with a hybrid model than it would be for a similarly overtrained pure standard attention model. Out of the big labs, visible moves in this direction were made by DeepMind with their Griffin team (Griffin paper, RecurrentGemma). So that’s one way the data wall might get pushed a little further for overtrained models.
Given a SotA large model, companies want the profit-optimal distilled version to sell—this will generically not be the original size. On this framing, regulation passes the misuse deployment risk from higher-performance (/higher-cost) models to the company. If profit incentives and/or government regulation continue to push businesses to primarily (ideally only?) sell models 2-3+ OOM smaller than SotA, I see a few possible takeaways:
Applied alignment research inspired by speed priors seems useful: e.g. how do sleeper agents interact with distillation etc.
Understanding and mitigating risks of multi-LM-agent and scaffolded LM agents seems higher priority
Pre-deployment, within-lab risks contribute more to overall risk
On trend forecasting, I recently created this Manifold market to estimate the year-on-year drop in price for SotA SWE agents. Though I still want ideas for better and longer-term markets!
Surprising Things AGI Forecasting Experts Agree On:
I hesitate to say this because it’s putting words in other people’s mouths, and thus I may be misrepresenting them. I beg forgiveness if so and hope to be corrected. (I’m thinking especially of Paul Christiano and Ajeya Cotra here, but also maybe Rohin and Buck and Richard and some other people)
1. Slow takeoff means things accelerate and go crazy before we get to human-level AGI. It does not mean that after we get to human-level AGI, we still have some non-negligible period where they are gradually getting smarter and available for humans to study and interact with. In other words, people seem to agree that once we get human-level AGI, there’ll be a FOOM of incredibly fast recursive self-improvement.
2. The people with 30-year timelines (as suggested by the Cotra report) tend to agree with the 10-year timelines people that by 2030ish there will exist human-brain-sized artificial neural nets that are superhuman at pretty much all short-horizon tasks. This will have all sorts of crazy effects on the world. The disagreement is over whether this will lead to world GDP doubling in four years or less, whether this will lead to strategically aware agentic AGI (e.g. Carlsmith’s notion of APS-AI), etc.
I disagree with the first one. I think that the spectrum of human-level AGI is actually quite wide, and that for most tasks we’ll get AGIs that are better than most humans significantly before we get AGIs that are better than all humans. But the latter is much more relevant for recursive self-improvement, because it’s bottlenecked by innovation, which is driven primarily by the best human researchers. E.g. I think it’d be pretty difficult to speed up AI progress dramatically using millions of copies of an average human.
Also, by default I think people talk about FOOM in a way that ignores regulations, governance, etc. Whereas in fact I expect these to put significant constraints on the pace of progress after human-level AGI.
If we have millions of copies of the best human researchers, without governance constraints on the pace of progress… Then compute constraints become the biggest thing. It seems plausible that you get a software-only singularity, but it also seems plausible that you need to wait for AI innovation of new chip manufacturing to actually cash out in the real world.
I broadly agree with the second one, though I don’t know how many people there are left with 30-year timelines. But 20 years to superintelligence doesn’t seem unreasonable to me (though it’s above my median). In general I’ve updated lately that Kurzweil was more right than I used to think about there being a significant gap between AGI and ASI. Part of this is because I expect the problem of multi-agent credit assignment over long time horizons to be difficult.
Re 1: that’s not what slow takeoff means, and experts don’t agree on FOOM after AGI. Slow takeoff applies to AGI specifically, not to pre-AGI AIs. And I’m pretty sure at least Christiano and Hanson don’t expect FOOM, but like you am open to be corrected.
What do you think slow takeoff means? Or, perhaps the better question is, what does it mean to you?
Christiano expects things to be going insanely fast by the time we get to AGI, which I take to imply that things are also going extremely fast (presumably, even faster) immediately after AGI: https://sideways-view.com/2018/02/24/takeoff-speeds/
I don’t know what Hanson thinks on this subject. I know he did a paper on AI automation takeoff at some point decades ago; I forget what it looked like quantitatively.
Thanks for responding!
Slow or fast takeoff, in my understanding, refers to how fast an AGI can/will improve itself to (wildly) superintelligent levels. Discontinuity seems to be a key differentiator here.
In the post you link, Christiano is arguing against discontinuity. He may expect quick RSI after AGI is here, though, so I could be mistaken.
Likewise!
Christiano is indeed arguing against discontinuity, but nevertheless he is arguing for an extremely rapid pace of technological progress—far faster than today. And in particular, he seems to expect quick RSI not only after AGI is here, but before!
I’d question the “quick” of “quick RSI”, but yes, he expects AI to make better AI before AGI.
I’m pretty sure he means really really quick, by any normal standard of quick. But we can take it up with him sometime. :)
But yes, Christiano is the authority here;)
He’s talking about a gap of years :) Which is probably faster than ideal, but not FOOMy, as I understand FOOM to mean days or hours.
Whoa, what? That very much surprises me, I would have thought weeks or months at most. Did you talk to him? What precisely did he say? (My prediction is that he’d say that by the time we have human-level AGI, things will be moving very fast and we’ll have superintelligence a few weeks later.)
Less relevant now, but I got the “few years” from the post you linked. There Christiano talked about another gap than AGI → ASI, but since overall he seems to expect linear progress, I thought my conclusion was reasonable. In retrospect, I shouldn’t have made that comment.
Not sure exactly what the claim is, but happy to give my own view.
I think “AGI” is pretty meaningless as a threshold, and at any rate it’s way too imprecise to be useful for this kind of quantitative forecast (I would intuitively describe GPT-3 as a general AI, and beyond that I’m honestly unclear on what distinction people are pointing at when they say “AGI”).
My intuition is that by the time that you have an AI which is superhuman at every task (e.g. for $10/h of hardware it strictly dominates hiring a remote human for any task) then you are likely weeks rather than months from the singularity.
But mostly this is because I think “strictly dominates” is a very hard standard which we will only meet long after AI systems are driving the large majority of technical progress in computer software, computer hardware, robotics, etc. (Also note that we can fail to meet that standard by computing costs rising based on demand for AI.)
My views on this topic are particularly poorly-developed because I think that the relevant action (both technological transformation and catastrophic risk) mostly happens before this point, so I usually don’t think this far ahead.
Thanks for offering your view Paul, and I apologize if I misrepresented your view.
Thanks! That’s what I thought you’d say. By “AGI” I did mean something like “for $10/h of hardware it strictly dominates hiring a remote human for any task,” though I’d maybe restrict it to strategically relevant tasks like AI R&D. Also, people might not actually hire AIs to do stuff because they might be afraid / understand that they haven’t solved alignment yet, but it still counts since the AIs could do the job. Also there may be some funny business around the price of the hardware—I feel like it should still count as AGI if a company is running millions of AIs that each individually are better than a typical tech company remote worker in every way, even if there is an ongoing bidding war and technically the price of GPUs is now so high that it’s costing $1,000/hr on the open market for each AGI. We still get FOOM if the AGIs are doing the research, regardless of what the on-paper price is. (I definitely feel like I might be missing something here; I don’t think in economic terms like this nearly as often as you do.)
My timelines are too short to agree with this part, alas. Well, what do you mean by “long after”? Six months? Three years? Twelve years?
Is it really true that everyone (who is an expert) agrees that FOOM is inevitable? I was under the impression that a lot of people feel that FOOM might be impossible. I personally think FOOM is far from inevitable, even for superhuman intelligences. Consider that human civilization has a collective intelligence that is strongly superhuman, and we are expending great effort to e.g. push Moore’s law forward. There’s Eroom’s law, which suggests that the aggregate cost of each new process node doubles in step with Moore’s law. So if FOOM depends on faster hardware, ASI might not be able to push forward much faster than Intel, TSMC, ASML, IBM and NVidia already are. Of course this all depends on AI being hardware constrained, which is far from certain. I just think it’s surprising that FOOM is seen as a certainty.
Depends on who you count as an expert. That’s a judgment call since there isn’t an Official Board of AGI Timelines Experts.
I’ve begun to doubt (1) recently, would be interested in seeing the arguments in favor of it. My model is something like “well, I’m human-level, and I sure don’t feel like I could foom if I were an AI.”
The straightforward argument goes like this:
1. a human-level AGI would be running on hardware that makes human constraints in memory and speed mostly go away, by ~10 orders of magnitude
2. if you could store 10 orders of magnitude more information and read 10 orders of magnitude faster, and if you were able to copy your own code somewhere else, and the kind of AI research and code generation tools available online were good enough to have created you, wouldn’t you be able to FOOM?
No, because of the generalized version of Amdahl’s law, which I explored in “Fast Minds and Slow Computers”.
The more you accelerate something, the slower and more limiting all of its other hidden dependencies become.
So by the time we get to AGI, regular ML research will have rapidly diminishing returns (and CUDA low-level software or hardware optimization will also have diminishing returns), general hardware improvement will be facing the end of Moore’s law, etc. etc.
I don’t see why that last sentence follows from the previous sentences. In fact I don’t think it does. What if we get to AGI next year? Then returns won’t have diminished as much & there’ll be lots of overhang to exploit.
Sure—if we got to AGI next year. But for that to actually occur you’d have to exploit most of the remaining optimization slack in both high-level ML and low-level algorithms. Then beyond that, Moore’s law is already mostly ended or nearly so depending on who you ask, and most of the easy obvious hardware arch optimizations are now behind us.
Well I would assume a “human-level AI” is an AI which performs as well as a human when it has the extra memory and running speed? I think I could FOOM eventually under those conditions but it would take a lot of thought. Being able to read the AI research that generated me would be nice but I’d ultimately need to somehow make sense of the inscrutable matrices that contain my utility function.
I’ve also been bothered recently by a blurring of lines between “when AGI becomes as intelligent as humans” and “when AGI starts being able to recursively self-improve.” It’s not a priori obvious that these should happen at around the same capabilities level, yet I feel like it’s common to equivocate between them.
In any case, my world model says that an AGI should actually be able to recursively self-improve before reaching human-level intelligence. Just as you mentioned, I think the relevant intuition pump is “could I FOOM if I were an AI?” Considering the ability to tinker with my own source code and make lots of copies of myself to experiment on, I feel like the answer is “yes.”
That said, I think this intuition isn’t worth much for the following reasons:
The first AGIs will probably have their capabilities distributed very differently than humans—i.e. they will probably be worse than humans at some tasks and much better at other tasks. What really matters is how good they are at the task “do ML research” (or whatever paradigm we’re using to make AIs at the time). I think there are reasons to expect them to be especially good at ML research (relative to their general level of intelligence), but also reasons to expect them to be especially bad, and I don’t know which reasons to trust more. Note that modern narrow AIs already have some trivial ability to “do” ML research (e.g. OpenAI’s Copilot).
Part of my above story about FOOMing involves making lots of copies of myself, but will it actually be easy for the first AGI (which might not be as generally intelligent as a human) to get the resources it needs to make lots of copies? This seems like it depends on a lot of stuff which I don’t have strong expectations about, e.g. how abundant are the relevant resources, how large is the AGI, etc.
Even if you think “AGI is human-level” and “AGI is able to recursively self-improve” represent very different capabilities levels, they might happen at very similar times, depending on what else you think about takeoff speeds.
Counter-anecdote: compilers have gotten ~2x better in 20 years[1], at substantially worse compile time. This is nowhere near FOOM.
Proebsting’s Law gives an 18-year doubling time. The 2001 reproduction suggested more like 20 years under optimistic assumptions, and a 2022 informal test showed a 10-15% improvement on average in the last 10 years (or a 50-year doubling time...)
I’m doubtful whether the notion of human level AGI makes much sense.
In its progression of getting more and more capable, there’s likely no point where it’s comparable to a human.
Why?
Great quote, & chilling: (h/t Jacobjacob)
Seemed to jump out to me.
While I don’t always follow my own advice, I do most of the time approach others from the viewpoint that I can learn something from anyone and everyone.
This comment prompted me to read both Secrets and also The Doomsday Machine by Ellsberg. Both really great, highly recommend.
...in the last 24 hours? Or, like, awhile ago in a previous context?
In the last 24 hours. I read fast (but also skipped the last third of the Doomsday Machine).
AI-Caused Extinction Ingredients
Below is what I see as required for AI-caused extinction to happen in the next few decades (years 2024-2050 or so). In brackets is my very approximate probability estimate as of 2024-07-25, assuming all previous steps have happened; a quick multiplication of these follows the list.
AI technologies continue to develop at approximately current speeds or faster (80%)
AI manages to reach a level where it can cause an extinction (90%)
AI that can cause an extinction did not have enough alignment mechanisms in place (90%)
AI executes an unaligned scenario (low, maybe less than 10%)
Other AIs and humans aren’t able to notice and stop the unaligned scenario in time (50-50ish)
Once the scenario is executed, humanity is never able to roll it back (50-50ish)
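Multiplying these through (taking 10% for the unaligned-scenario step and 50% for the last two) gives the implied headline number:

```python
# Implied overall probability from the rough conditional estimates above.
steps = [0.80, 0.90, 0.90, 0.10, 0.50, 0.50]
p = 1.0
for s in steps:
    p *= s
print(f"{p:.3f}")  # ~0.016, i.e. roughly a 1.6% chance on these numbers
```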