For being too indistinguishable from GPT-3.
While your comment was clearly written in good faith, it seems to me like you’re missing some context. You recommend that EY recommend that the detractors read books. EY doesn’t just recommend people read books. He wrote the equivalent of like three books on the subjects relevant to this conversation in particular which he gives away for free. Also, most of the people in this conversation are already big into reading books.
It is my impression he also helped establish the Center for Applied Rationality, which has the explicit mission of training skills. (I’m not sure if he technically did but he was part of the community which did and he helped promote it in its early days.)
Eliezer was involved with CFAR in the early days, but has not been involved since at least 2016.
From what I remember, Selfish Reasons to Have More Kids contains only a very short mention of how to inspire good behavior in children—essentially the advice to punish consistently, especially including funny or endearing offenses. The more I think about this, the more it seems to me that discipline itself is the practice of training the same response to an increasing range of stimuli.
Progress in Vipassana, and in meditation in general, comes from engaging with fewer and fewer distractions. Progress in habits comes from decreasing the number of times that you allow some override or excuse to change your plans. Progress in security comes from shrinking the number of people you make exceptions for, maybe because you don’t want to be mean to them or there are extenuating circumstances. Progress in being a fair parent comes from providing the same consequences for the same actions, even if one of your children’s actions gets a warmer reaction from you.
This isn’t to say that discipline is the only virtue, or that you ought to seek discipline for its own sake; some excuses do rise above the threshold of acceptability. But the point of training discipline is, in a nutshell, to have consistent reactions to stimuli that invite inconsistent ones.
I’ve been writing things in one form or another for about 13 years, and I’ve passed the first “million bad words” that every writer is said to have in them. I’ve also released around 200 rap songs under different names, which tends to come off as lower-status but is no less dear to me. (I would consider 500-1000 songs to be the equivalent “bad” cutoff, though.) Since these are the things I’ve spent the longest time learning and practicing, I want to see if I can apply anything in them to learning and practicing in general. Here are my thoughts so far:
Every art is really several arts. There is no “learning to write” separate from learning characterization, imagery, the elements of style, dialogue, text formatting and so on.
Likewise, these sub-skills are not all created equal—while content, rhyme scheme, flow, swing, breath, cadence, delivery and so on may be the most relevant to other musicians critiquing your work, the final judgment of what you make can depend on things that at first seem unimportant or like mere polish, such as the technical specs of your microphone or the level of audio engineering performed on your finished product.
Attainment individuates. Rank beginners produce things that are very alike because they’re making the same handful of beginner mistakes. “Beginner mistakes” themselves are just how humans natively do things, and it’s only in overcoming them that the wider space of possible performances is opened.
Because of this, skills and techniques are generally about iterating away from whatever your earliest work looks like, one step at a time. And from wherever you’re standing, it’s possible to identify someone as being more or less skilled than you. (Someone for whom this is hard to do, who seems better at some things than you are and worse at others in a way that’s inconclusive, is basically level with you.)
You can give more advice to one student than to all the students in the world, and most advice is written for as many of them as possible. 1-on-1 lessons with someone you want to be more like can be worth more than all the articles or books you can find, if you’re able to truly accept how many things still need improving.
To reemphasize: the person who teaches you must be someone you want to be more like. Not all advice is instruction in how to be more like the advisor, because people can recognize and speak about their own flaws more easily than they can fix them, but you should expect to wind up more like your teacher than if you hadn’t studied under them.
When it’s said that those at high skill levels are “never done learning” it’s because they’ve achieved a higher granularity of what they can improve.
Early work is typically bad even to you, which can be extremely frustrating, but after enough practice it’s possible to be impressed by your own work—especially if you’ve left it for long enough to forget the details of it. Despite this fact, impostor syndrome never really goes away, it only adapts to your new situation.
Arts produce communities of practitioners. The majority of these people will be worse at what they’re practicing than you’d expect, because they fall below the skill level usually presented publicly. At first you may not recognize that you’re one of these people, and that your work is not so great either.
Because of this, spaces practitioners use will typically sprout a policy of giving as little criticism as possible.
This is not necessarily a bad thing; criticism is not a panacea, not all of it is worth following and there’s only so much of it you can act on until you’ve internalized and digested it. Plus oftentimes people really do just need a bit of social encouragement so that they can continue forward. (This goes doubly so for anyone who makes things for free.)
There’s nothing wrong with simply wanting praise, but know that’s what you’re looking for and don’t confuse it with a desire to cleave off your flaws as fast as possible.
The more work you produce the easier criticism will be to take, because you identify less with the thing being critiqued.
The practice of creating something is really the practice of making very small moment-to-moment decisions, usually between the handful of ideas that arise in your mind about which thing to do next.
As you notice the flaws in your own work and the work of others, you’ll eventually notice flaws in the work of those you admire, including the ones who inspired you to begin in the first place. The end result of this isn’t dislike of your past heroes, but an understanding that they really were just like you are.
Beginners should aim to use or dismiss as many of their ideas as possible, because better ones will appear given space and attention. Your ideas and insights into what to do will never be worse than they are now, and future ones need room the present ones are taking up. Be less concerned about running out of ideas than with the quality of the ones you already have.
The simplest way to encourage ideas is to write down each new one as soon as possible from when it enters thought, especially the ones that are no good. This both keeps the way clear for better ones and encourages your brain to generate ideas continuously.
Doing something well and doing it well in front of an audience can be completely different skills, and sometimes your feeling that you’re good at something will completely dissolve once a small group of strangers are watching you. (When I attempted a small concert at a convention the songs I thought I could repeat by heart were suddenly so foreign to me that I had to read them off of a phone.)
Likewise, there are things which may seem trivial or “part of the package” which no amount of normal mastery will teach you, and the only path to these is to learn them separately. (The easiest example of this is freestyling, which I can’t do to save my life despite it being the first thing anyone asks for upon hearing that you make rap—to develop it, I would need to start fresh with the skillset involved, which focuses more on quick-wittedness and knowing precisely how much to extend oneself.)
I’m impressed by how accurately this describes learning complex skills.
I’m practicing writing and I feel the same way most of the points describe: as if I’m exploring a system of caves without a map, finding bits and pieces of other explorers (sometimes even meeting them), but it’s all walking a complicated, 3d structure and constantly bumping into unknown unknowns. Let me illustrate it this way: about 3 years ago, when I started on this journey, I thought I would read 1-2 books about writing and I’d be good. Now, I’m standing in sub-cave system #416, taking a hard look at “creativity”/”new ideas” and chuckling at my younger self who thought that sub-cave system #18, “good sentences”, would lead him to the exit.
And even though I haven’t practiced Brazilian Jiu Jitsu since the pandemic began, I see a lot of similarities there. At first, I thought I just had to practice a move. Then I noticed that there are many small variations depending on my energy level, the opponent’s size and weight, etc. Then I noticed that I could fake moves to lure my opponent into making mistakes, but that I should avoid mistakes myself. Then I noticed that my opponents were better at some moves than others. Then I noticed that my own build gave me certain advantages and disadvantages. Then I noticed...
At the end, just before the lockdowns, I learned a lot about humility and began to discard all the “factual knowledge” I got from youtube videos or books and instead began focusing on sets of small details to explore how they worked in different situations. Then, just practiced it over and over until I saw “the thing”.
Sounds correct. I was thinking how this applies to computer games:
Several subskills—technical perfection, new ideas, interesting story, graphics, music… Different games become popular for different aspects (Tetris vs Mass Effect vs Cookie Clicker).
A frequent beginner mistake is making a game with multiple levels which feel like copies of each other. That’s because you code e.g. five or ten different interactive elements, and then you use all of them in every level. It makes the first level needlessly difficult, and every following level boring. Instead, you should introduce them gradually, so each level contains a little surprise, and perhaps you should never use all of them in the same level, but always use different subsets, so each level has a different flavor instead of merely being more difficult. (There’s a small sketch of this at the end of this comment.)
Another beginner mistake is to focus on the algorithm and ignore the non-functional aspects. If one level has a sunset in background, and another level uses a night sky with moon, it makes the game nicer, even if the background does not change anything about functionality.
Yet another mistake is to make the game insanely difficult, because as a developer you know everything about it and you have played the first level a hundred times, so even the insanely difficult version feels easy to you. If most new players cannot complete the tutorial, your audience is effectively just you alone.
Some people may be successful and yet you don’t want to be like them, e.g. because they optimize the product to be addictive, while you aim for a different type of experience; or their approach is “explore the market, and make a clone of whatever sells best”, while you have a specific vision.
You should make a very simple game first, because you are probably never going to finish a complicated one if it’s your first attempt. I know a few people who ignored this advice, spent a lot of time designing something complex, in one case even rented a studio… but never finished anything. (Epistemic check: possible base-rate fallacy; most people never write a complete computer game, and this might include even most of those who started small.) And the more time you have wasted trying to make a complicated game, the less likely you are to give up and start anew.
Successful game authors often recycle good ideas from their previous, less successful games.
The audience is famously toxic. Whatever game you make, some people will say horrible things about the game and about you in general. It is probably wise to ignore them. (Epistemic check: so you’re saying that you should only listen to those who liked your game? Yeah… from the profit perspective, the important thing is how many fans you have, not what is their ratio to haters. A game with 1000 fans and 10000 haters is more successful than a game with 10 fans and 1 hater.)
Being good at designing logical puzzles does not translate into being good at designing 3D shooters, and vice versa.
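To make the point about element subsets concrete, here is a toy sketch in Python (mechanic names invented) of planning levels so that each one unlocks a single new mechanic and reuses only a small random subset of the older ones:

```python
import random

# Toy level planner: one new mechanic per level (the level's "little
# surprise"), plus a small random subset of previously seen mechanics,
# never all of them at once. Mechanic names are made up.
MECHANICS = ["spikes", "moving_platform", "key_and_door", "enemy",
             "teleporter", "switch", "timer", "wind"]

def plan_levels(n_levels=8, per_level=3):
    plans, unlocked = [], []
    for i in range(n_levels):
        if i < len(MECHANICS):
            unlocked.append(MECHANICS[i])   # unlock one new mechanic
        new, older = unlocked[-1], unlocked[:-1]
        subset = random.sample(older, min(per_level - 1, len(older)))
        plans.append([new] + subset)
    return plans

for i, plan in enumerate(plan_levels(), start=1):
    print(f"level {i}: {', '.join(plan)}")
```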
Just published the first chapter of a month-long novel-writing experiment, which contains enough LW-compatible tropes that it might be of interest: Hustling Through the Dark
Having put some thoughts into the 80,000 hours career planning document, I think it is time for the next “some weeks of thinking” projects.
Either it’s gonna be similar planning processes:
the life of my kids, 20 years in the future
where should I spend the next 5 years? And where the next 20?
a plan for personal finances
a health plan
a sports plan
whom to spend time with
Or it’s gonna be concrete learning projects:
Python or R
(Some of them more like refreshers)
Your thoughts are appreciated.
Did you consider looking at it rather from “options” than “goals” perspective?
Rather than defining goals and looking for the optimal path to get there, you can look at / brainstorm exploitable options that you have available and that seem to have high returns, and then prioritize them.
I recently spent half a day writing down cool ideas for things to do, then collected them in todoist, and since then, whenever I have time I go through them. And add something new.
Kids, location, finances, and health are all extraordinarily high-leverage to think about—at least if you act on your plans.
Personally I’d start with personal finance, mostly because it should be pretty quick and simple to sort out (not always easy, to stick to, but simple). The personalfinance reddit has good flowcharts to follow, and I wrote a list of investing resources here if you want more detail than “buy index funds and get on with the rest of your life”.
~2 weeks ago, the FDA added a warning to the J&J Covid shot regarding increased risk of developing Guillain-Barré Syndrome.
Perhaps unsurprisingly, given the history with blood clots, my quick check of prevalence finds that reports of developing GBS following J&J vaccination are actually lower than would be expected otherwise.
My very basic analysis: https://docs.google.com/spreadsheets/d/1wDFrDq0E6Q096E97XzU7ndP53mYC0Paf9Wyun-XxmWA/edit?usp=sharing
Numbers from: https://www.yalemedicine.org/news/covid-vaccine-guillain-barre-syndrome
I’ve noticed that a lot of blogs in the rationalist diaspora lack favicons. This often makes it difficult to navigate tabs for different blogs, PDFs, etc. A favicon takes <15 minutes to make, saves your readers time, and improves your appearance. A win-win-win.
Maybe people think it’s cool to have no favicon, or they want to stay on the down-low, or are ultra-minimalist, but, like, seriously?
More likely that this just doesn’t occur to people and/or they have no idea how to make and install favicons, and possibly don’t even have any concept of how favicons work and where they come from.
Don’t underestimate the technical cluelessness of people on the internet, even ‘rationalists’.
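For anyone in that boat, the mechanics really are minimal. Here’s a hedged sketch in Python using Pillow (the drawing is just a placeholder; any small image works):

```python
from PIL import Image, ImageDraw  # pip install Pillow

# Generate a simple favicon: a colored circle on a dark square.
# Colors, sizes, and the output filename are arbitrary choices.
img = Image.new("RGB", (64, 64), "#222222")
ImageDraw.Draw(img).ellipse((8, 8, 56, 56), fill="#5ab0f2")
img.save("favicon.ico", sizes=[(16, 16), (32, 32), (48, 48)])

# Serve favicon.ico from the site root, or reference it in the page's <head>:
#   <link rel="icon" href="/favicon.ico">
```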
Causal structure is an intuitively appealing way to pick out the “intended” translation between an AI’s model of the world and a human’s model. For example, intuitively “There is a dog” causes “There is a barking sound.” If we ask our neural net questions like “Is there a dog?” and it computes its answer by checking “Does a human labeler think there is a dog?” then its answers won’t match the expected causal structure—so maybe we can avoid these kinds of answers.
What does that mean if we apply typical definitions of causality to ML training?
If we define causality in terms of interventions, then this helps iff we have interventions in which the labeler is mistaken. In general, it seems we could just include examples with such interventions in the training set.
Similarly, if we use some kind of closest-possible-world semantics, then we need to be able to train models to answer questions consistently about nearby worlds in which the labeler is mistaken. It’s not clear how to train a system to do that. Probably the easiest is to have a human labeler in world X talking about what would happen in some other world Y, where the labeling process is potentially mistaken. (As in “decoupled RL” approaches.) However, in this case it seems liable to learn the “instrumental policy” that asks “What does a human in possible world X think about what would happen in world Y?” which seems only slightly harder than the original.
We could talk about conditional independencies that we expect to remain robust on new distributions (e.g. in cases where humans are mistaken). I’ll discuss this a bit in a reply.
Here’s an abstract example to think about these proposals, just a special case of the example from this post.
Suppose that reality M is described as a causal graph X --> A --> B --> C, and then the observation Y is a function of (A, B, C).
The human’s model M’ of the situation is X --> A’ --> B’ --> C’. Each of them is a coarse-graining of the corresponding part of the real world model, and the observation Y is still a function of (A’, B’, C’), it’s just more uncertain now.
The coarse-grained dynamics are simpler than the actual coarse-graining f: (A, B, C) --> (A’, B’, C’).
We prepare a dataset by actually sampling (X, A, B, C, Y) from M, having humans look at it and make inferences about (A’, B’, C’), and getting a dataset of (X, A’, B’, C’, Y) tuples to train a model.
The intended question-answering function is to use M to sample (A, B, C, Y) then apply the coarse-graining f to get (A’, B’, C’). But there is also a bad function that produces good answers on the training dataset: use M to sample (A, B, C, Y), then use the human’s model to infer (A’, B’, C’), and output those.
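Here is a minimal numerical sketch of this setup (my own toy instantiation: a linear Gaussian chain, sign-based coarse-graining, and a crude stand-in for the human’s inference from Y). The point is just that the intended function and the bad function agree almost everywhere on the training distribution:

```python
import random

# Reality: X -> A -> B -> C with small noise; observation Y is a lossy
# function of (A, B, C). Coarse-graining maps reals to {low, high}.
def sample_reality():
    x = random.gauss(0, 1)
    a = x + random.gauss(0, 0.1)
    b = a + random.gauss(0, 0.1)
    c = b + random.gauss(0, 0.1)
    y = (a + b + c) / 3
    return a, b, c, y

def coarse(v):
    return "high" if v > 0 else "low"

def intended(a, b, c):
    # Intended function: coarse-grain the actual latent state.
    return coarse(a), coarse(b), coarse(c)

def bad(y):
    # Bad function: ignore the latent state and infer (A', B', C') the way
    # a human looking only at Y might (here, crudely, A ~ B ~ C ~ Y).
    return coarse(y), coarse(y), coarse(y)

n, agree = 100_000, 0
for _ in range(n):
    a, b, c, y = sample_reality()
    agree += intended(a, b, c) == bad(y)
print(f"agreement on the training distribution: {agree / n:.1%}")
```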
We’d like to rule out this bad function by making some kind of assumption about causal structure.
This is also a way to think about the proposals in this post and the reply:
The human believes that A’ and B’ are related in a certain way for simple+fundamental reasons.
On the training distribution, all of the functions we are considering reproduce the expected relationship. However, the reason that they reproduce the expected relationship is quite different.
For the intended function, you can verify this relationship by looking at the link (A --> B) and the coarse-graining applied to A and B, and verify that the probabilities work out. (That is, I can replace all of the rest of the computational graph with nonsense, or independent samples, and get the same relationship.)
For the bad function, you have to look at basically the whole graph. That is, it’s not the case that the human’s beliefs about A’ and B’ have the right relationship for arbitrary Ys, they only have the right relationship for a very particular distribution of Ys. So to see that A’ and B’ have the right relationship, we need to simulate the actual underlying dynamics where A --> B, since that creates the correlations in Y that actually lead to the expected correlations between A’ and B’.
It seems like we believe not only that A’ and B’ are related in a certain way, but that the relationship should be for simple reasons, and so there’s a real sense in which it’s a bad sign if we need to do a ton of extra compute to verify that relationship. I still don’t have a great handle on that kind of argument. I suspect it won’t ultimately come down to “faster is better,” though as a heuristic that seems to work surprisingly well. I think that this feels a bit more plausible to me as a story for why faster would be better (but only a bit).
It’s not always going to be quite this cut and dried—depending on the structure of the human beliefs we may automatically get the desired relationship between A’ and B’. But if that’s the case then one of the other relationships will be a contingent fact about Y—we can’t reproduce all of the expected relationships for arbitrary Y, since our model presumably makes some substantive predictions about Y and if those predictions are violated we will break some of our inferences.
So are there some facts about conditional independencies that would privilege the intended mapping? Here is one option.
We believe that A’ and C’ should be independent conditioned on B’. One problem is that this isn’t even true, because B’ is a coarse-graining and so there are in fact correlations between A’ and C’ that the human doesn’t understand. That said, I think that the bad map introduces further conditional correlations, even assuming B=B’. For example, if you imagine Y preserving some facts about A’ and C’, and if the human is sometimes mistaken about B’=B, then we will introduce extra correlations between the human’s beliefs about A’ and C’.
I think it’s pretty plausible that there are necessarily some “new” correlations in any case where the human’s inference is imperfect, but I’d like to understand that better.
So I think the biggest problem is that none of the human’s believed conditional independencies actually hold—they are both precise, and (more problematically) they may themselves only hold “on distribution” in some appropriate sense.
This problem seems pretty approachable though and so I’m excited to spend some time thinking about it.
Actually if A --> B --> C and I observe some function of (A, B, C), it’s just not generally the case that my beliefs about A and C are conditionally independent given my beliefs about B (e.g. suppose I observe A+C). This just makes it even easier to avoid the bad function in this case, but it means I want to be more careful about the definition of the case to ensure that it’s actually difficult before concluding that this kind of conditional independence structure is potentially useful.
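A tiny enumeration makes this concrete (binary chain, made-up noise rates; I condition on the value of B and on the observation Y = A + C, which is one way to cash out “given my beliefs about B”):

```python
from itertools import product

# Chain A -> B -> C, all binary, 10% flip noise at each link.
def p_chain(a, b, c):
    pb = 0.9 if b == a else 0.1
    pc = 0.9 if c == b else 0.1
    return 0.5 * pb * pc

# Condition on B = 1 and on observing Y = A + C = 1.
states = [(a, c) for a, c in product([0, 1], repeat=2) if a + c == 1]
z = sum(p_chain(a, 1, c) for a, c in states)
joint = {(a, c): p_chain(a, 1, c) / z for a, c in states}

p_a1 = sum(v for (a, c), v in joint.items() if a == 1)
p_c1 = sum(v for (a, c), v in joint.items() if c == 1)
print("P(A=1, C=1 | B=1, Y=1) =", joint.get((1, 1), 0.0))  # 0: Y forbids it
print("P(A=1 | ...) * P(C=1 | ...) =", p_a1 * p_c1)        # 0.25: dependent
```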
Sometimes we figure out conditional in/dependence by looking at the data. It may not match common-sense intuition, but if your model takes it into account and gives better results, then you just keep the conditional independence in there. You can only work with what you have; a lack of attributes may force you to rely on other dependencies for better predictions.
Conditional probability should be reflected in the data, given enough data points. When you introduce human labeling into the equation, you add another source of uncertainty: the accuracy of the human doing the labeling, regardless of whether the inaccuracy comes from their own false sense of conditional independence. Usually human labeling doesn’t directly take any conditional probability into account, so as not to mess with the conditionals that already exist within the data set. That’s why more data is better, which also means that the more labelers you have, the less dependent you are on the inaccuracy of any individual human.
The conditional probability assumed in the real world carries over to the data-representation world simply because it’s trying to model the same real-world phenomenon, despite its coarse-grained nature. Without the conditional probability, we wouldn’t be able to make the same strong inferences that match up to the real world. The causality is part of the data. If you use a different causal relationship, the end model will be different, and you will be solving a very different problem than if you had applied the real-world causal relationship.
You could use the random number generator to return a sequence of random characters. If they happen to be words that describe a thing you might possibly do, obey them, otherwise generate another sequence.
This removes the burden of having to list all your choices, including the ones about buying pokemons.
(But creates a new problem of what exactly is a meaningful command. For example, “go”, without specifying where exactly to go; are you free to simply go to the place of your choice, or do you have to re-roll?)
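A quick sketch of the scheme (the vocabulary check is a stand-in; deciding what counts as a meaningful command is exactly the problem flagged in the parenthetical above):

```python
import random
import string

# Hypothetical command vocabulary; anything else gets re-rolled.
VOCAB = {"go", "run", "eat", "read", "call", "sleep", "clean", "write"}

def draw_command(max_len=5):
    while True:
        n = random.randint(2, max_len)
        word = "".join(random.choice(string.ascii_lowercase) for _ in range(n))
        if word in VOCAB:  # "happen to be words that describe a thing you might do"
            return word

print("the dice say:", draw_command())
```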
The procedure of putting each item down comes with your promise that you WILL do it should it come up.
Your brain can’t contain an infinite list or branching diagram of choices, because it’s finite. I have no idea why you think such a list is the only alternative to omnipotence. You are not omniscient either, so you can only make choices between the limited amount of ideas your brain can come up with. Why shouldn’t an undetermined choice between two things count as free?
I’m not convinced immediate-omniscience is a necessary condition of Free Will, only eventual-omniscience is. As in, the number of choices you’re presented with needs to eventually accumulate to infinity,
..if you live infinitely long, but you don’t. You need to show that the branching reaches infinity in a finite time. (Or you have a finite set of branches which somehow include something that requires omnipotence...?)
Did you mean “why should” here?
This is (one of the versions of) Niven’s Law.
Any sufficiently advanced magic is an absolute pain to replicate with technology (even if it looks very easy).
Take walking for example, or just making a robot that can pour a glass of wine like a normal person.
Or sufficiently analysed magic.
Shared with permission, a google doc exchange confirming Eliezer still finds the arguments for alignment optimism, slower takeoffs, etc. unconvincing:
Daniel Filan: I feel like a bunch of people have shifted a bunch in the type of AI x-risk that worries them (representative phrase is “from Yudkowsky/Bostrom to What Failure Looks Like part 2 part 1”) and I still don’t totally get why.
Eliezer Yudkowsky: My bitter take: I tried cutting back on talking to do research; and so people talked a bunch about a different scenario that was nicer to think about, and ended up with their thoughts staying there, because that’s what happens if nobody else is arguing them out of it.
That is: this social-space’s thought processes are not robust enough against mildly adversarial noise, that trying a bunch of different arguments for something relatively nicer to believe, won’t Goodhart up a plausible-to-the-social-space argument for the thing that’s nicer to believe. If you talk people out of one error, somebody else searches around in the space of plausible arguments and finds a new error. I wasn’t fighting a mistaken argument for why AI niceness isn’t too intractable and takeoffs won’t be too fast; I was fighting an endless generator of those arguments. If I could have taught people to find the counterarguments themselves, that would have been progress. I did try that. It didn’t work because the counterargument-generator is one level of abstraction higher, and has to be operated and circumstantially adapted too precisely for the social-space to be argued into it using words.
You can sometimes argue people into beliefs. It is much harder to argue them into skills. The negation of Robin Hanson’s rosier AI scenario was a belief. Negating an endless stream of rosy scenarios is a skill.
Caveat: this was a private reply I saw and wanted to share (so people know EY’s basic epistemic state, and therefore probably the state of other MIRI leadership). This wasn’t an attempt to write an adequate public response to any of the public arguments put forward for alignment optimism or non-fast takeoff, etc., and isn’t meant to be a replacement for public, detailed, object-level discussion. (Though I don’t know when/if MIRI folks plan to produce a proper response, and if I expected such a response soonish I’d probably have just waited and posted that instead.)
FWIW, I think Yudkowsky is basically right here and would be happy to explain why if anyone wants to discuss. I’d likewise be interested in hearing contrary perspectives.
I’d personally like to find some cruxes between us some time, though I don’t yet know the best format to do that. I think I’ll wait to see your responses to Issa’s question first.
Likewise! I’m up for a video call if you like. Or we could have a big LW thread, or an email chain. I think my preference would be a video call. I like Walled Garden, we could do it there and invite other people maybe. IDK.
Which of the “Reasons to expect fast takeoff” from Paul’s post do you find convincing, and what is your argument against what Paul says there? Or do you have some other reasons for expecting a hard takeoff?
I’ve seen this post of yours, but as far as I know, you haven’t said much about hard vs soft takeoff in general.
It’s a combination of not finding Paul+Katja’s counterarguments convincing (AI Impacts has a slightly different version of the post, I think of this as the Paul+Katja post since I don’t know how much each of them did), having various other arguments that they didn’t consider, and thinking they may be making mistakes in how they frame things and what questions they ask. I originally planned to write a line-by-line rebuttal of the Paul+Katja posts, but instead I ended up writing a sequence of posts that collectively constitute my (indirect) response. If you want a more direct response, I can put it on my list of things to do, haha… sorry… I am a bit overwhelmed… OK here’s maybe some quick (mostly cached) thoughts:
1. What we care about is the point of no return, NOT GDP doubling in a year or whatever.
2. PONR seems not particularly correlated with GDP acceleration time or speed, and thus maybe Paul and I are just talking past each other—he’s asking and answering the wrong questions.
3. Slow takeoff means shorter timelines, so if our timelines are independently pretty short, we should update against slow takeoff. My timelines are independently pretty short. (See my other sequence.) Paul runs this argument in the other direction I think; since takeoff will be slow, and we aren’t seeing the beginnings of it now, timelines must be long. (I don’t know how heavily he leans on this argument though, probably not much. Ajeya does this too, and does it too much I think.) Also, concretely, if crazy AI stuff happens in <10 years, probably the EMH has failed in this domain and probably we can get AI by just scaling up stuff and therefore probably takeoff will be fairly fast (at least, it seems that way extrapolating from GPT-1, GPT-2, and GPT-3. One year apart, significantly qualitatively and quantitatively better. If that’s what progress looks like when we are entering the “human range” then we will cross it quickly, it seems.)
4. Discontinuities totally do sometimes happen. I think we shouldn’t expect them by default, but they aren’t super low-prior either; thus, we should do gears-level modelling of AI rather than trying to build a reference class or analogy to other tech.
5. Most of Paul+Katja’s arguments seem to be about continuity vs. discontinuity, which I think is the wrong question to be asking. What we care about is how long it takes (in clock time, or perhaps clock-time-given-compute-and-researcher-budget-X, given current and near-future ideas/algorithms) for AI capabilities to go from “meh” to “dangerous.” THEN once we have an estimate of that, we can use that estimate to start thinking about whether this will happen in a distributed way across the whole world economy, or in a concentrated way in a single AI project, etc. (Analogy: We shouldn’t try to predict greenhouse gas emissions by extrapolating world temperature trends, since that gets the causation backwards.)
6. I think the arguments Paul+Katja makes aren’t super convincing on their own terms. They are sufficient to convince me that the slow takeoff world they describe is possible and deserves serious consideration (more so than e.g. Age of Em or CAIS) but not overall convincing enough for me to say “Bostrom and Yudkowsky were probably wrong.” I could go through them one by one but I think I’ll stop here for now.
Thanks! My understanding of the Bostrom+Yudkowsky takeoff argument goes like this: at some point, some AI team will discover the final piece of deep math needed to create an AGI; they will then combine this final piece with all of the other existing insights and build an AGI, which will quickly gain in capability and take over the world. (You can search “a brain in a box in a basement” on this page or see here for some more quotes.)
In contrast, the scenario you imagine seems to be more like (I’m not very confident I am getting all of this right): there isn’t some piece of deep math needed in the final step. Instead, we already have the tools (mathematical, computational, data, etc.) needed to build an AGI, but nobody has decided to just go for it. When one project finally decides to go for an AGI, this EMH failure allows them to maintain enough of a lead to do crazy stuff (conquistadors, persuasion tools, etc.), and this leads to DSA. Or maybe the EMH failure isn’t even required, just enough of a clock time lead to be able to do the crazy stuff.
If the above is right, then it does seem quite different from Paul+Katja, but also different from Bostrom+Yudkowsky, since the reason why the outcome is unipolar is different. Whereas Bostrom+Yudkowsky say the reason one project is ahead is because there is some hard step at the end, you instead say it’s because of some combination of EMH failure and natural lag between projects.
Ah, this is helpful, thanks—I think we just have different interpretations of Bostrom+Yudkowsky. You’ve probably been around since before I was and read more of their stuff, but I first got interested in this around 2013, pre-ordered Superintelligence and read it with keen interest, etc., and the scenario you describe as mine is what I always thought Bostrom+Yudkowsky believed was most likely, while the scenario you describe as theirs, involving “deep math” and “one hard step at the end”, is something I thought they held up as an example of how things could be super fast, but not as what they actually believed was most likely.
From what I’ve read, Yudkowsky did seem to think there would be more insights and less “just make blob of compute bigger” about a decade or two ago, but he’s long since updated towards “dear lord, people really are just going to make big blobs of inscrutable matrices, the fools!” and I don’t think this counts as a point against his epistemics because predicting the future is hard and most everyone else around him did even worse, I’d bet.
Ok I see, thanks for explaining. I think what’s confusing to me is that Eliezer did stop talking about the deep math of intelligence sometime after 2011 and then started talking about big blobs of matrices as you say starting around 2016, but as far as I know he has never gone back to his older AI takeoff writings and been like “actually I don’t believe this stuff anymore; I think hard takeoff is actually more likely to be due to EMH failure and natural lag between projects”. (He has done similar things for his older writings that he no longer thinks are true, so I would have expected him to do the same for takeoff stuff if his beliefs had indeed changed.) So I’ve been under the impression that Eliezer actually believes his old writings are still correct, and that somehow his recent remarks and old writings are all consistent. He also hasn’t (as far as I know) written up a more complete sketch of how he thinks takeoff is likely to go given what we now know about ML. So when I see him saying things like what’s quoted in Rob’s OP, I feel like he is referring to the pre-2012 “deep math” takeoff argument. (I also don’t remember if Bostrom gave any sketch of how he expects hard takeoff to go in Superintelligence; I couldn’t find one after spending a bit of time.)
If you have any links/quotes related to the above, I would love to know!
(By the way, I was a lurker on LessWrong starting back in 2010-2011, but was only vaguely familiar with AI risk stuff back then. It was only around the publication of Superintelligence that I started following along more closely, and only much later in 2017 that I started putting in significant amounts of my time into AI safety and making it my overwhelming priority. I did write several timelines though, and recently did a pretty thorough reading of AI takeoff arguments for a modeling project, so that is mostly where my knowledge of the older arguments comes from.)
For all I know you are right about Yudkowsky’s pre-2011 view about deep math. However, (a) that wasn’t Bostrom’s view AFAICT, and (b) I think that’s just not what this OP quote is talking about. From the OP:
I feel like a bunch of people have shifted a bunch in the type of AI x-risk that worries them (representative phrase is “from Yudkowsky/Bostrom to What Failure Looks Like part 2 part 1”) and I still don’t totally get why.
It’s Yudkowsky/Bostrom, not Yudkowsky. And it’s WFLLp1, not p2. Part 2 is the one where the AIs do a treacherous turn; part 1 is where actually everything is fine except that “you get what you measure” and our dumb obedient AIs are optimizing for the things we told them to optimize for rather than for what we want.
I am pretty confident that WFLLp1 is not the main thing we should be worrying about; WFLLp2 is closer, but even it involves this slow-takeoff view (in the strong sense, in which the economy is growing fast before the point of no return) which I’ve argued against. I do not think that the reason people shifted from “Yudkowsky/Bostrom” (which in this context seems to mean “single AI project builds AI in the wrong way, AI takes over world”) to WFLLp1 is that people rationally considered all the arguments and decided that WFLLp1 was on balance more likely. I think instead that probably some sort of optimism bias was involved, and more importantly a win by default (Yud + Bostrom stopped talking about their scenarios and arguing for them, whereas Paul wrote a bunch of detailed posts laying out his scenarios and arguments, and so in the absence of visible counterarguments Paul won the debate by default). Part of my feeling about this is that it’s a failure on my part; when Paul+Katja wrote their big post on takeoff speeds I disagreed with it and considered writing a big point-by-point response, but never did, even after various people posted questions asking “has there been any serious response to Paul+Katja?”
You can sometimes argue people into beliefs. It is much harder to argue them into skills. The negation of Robin Hanson’s rosier AI scenario was a belief. Negating an endless stream of rosy scenarios is a skill.
A belief can be a negation in the sense of a contradiction, whilst not being a negation in the sense of a disproof. I don’t think EY disproved RH’s position. I don’t think he himself is confident that he did, since his summary was called “what I believe if not why I believe it”. And I don’t think lack of time was the problem, since the debate was immense.
Interesting, yeah I wonder why he titled it that. Still though it seems like he is claiming here to have disproved RH’s position to some extent at least. I for one think RH’s position is pretty implausible, for reasons Yudkowsky probably mentioned (I don’t remember exactly what Yud said).
Why the “seems”? A master rationalist should be able to state things clearly , surely?
Shortform #60 Back From Vacation
All of Wednesday [28 July] was spent travelling, as there were significant delays to my flight thanks to thunderstorms on the flight path. I made it home safely, but spending most of the day in airports or on airplanes is not the most fun sort of day.
I had an extremely good time on vacation in and around Madison, Wisconsin! What a lovely, pleasant, and beautiful place, I’ll be happy to visit again one day.
Today I unpacked, rested, applied for a really cool job, and generally recombobulated (fun fact: the Milwaukee airport has an official “Recombobulation Zone” immediately after the security screening area; I haven’t noticed other airports give this kind of zone a name before, but I like that the Milwaukee airport did). I have weekend plans for a family member’s wedding shower and am looking forward to that! Monday it’s back to the job hunting grindstone, for I am eager to be employed once more and reap all the benefits which accompany employment. Seattle beckons! As do a number of other things.
Hammertime Intermission for my group runs from 29 July − 1 August, we resume with Day 11 - Bug Hunt 2 on Monday 2 August. I personally will use the intermission to finish up the last few lessons from phase 1 I didn’t make time for while on vacation, redo a few things, reflect, and so on.
I have written ~60 ish shortform posts now. This has been a great and enriching experience for me. I have way more confidence in writing publicly, noticed increased confidence generally, am more coherent, am generally more productive, and am having fun. I will continue doing these shortforms and writing publicly, there’s much more growth to come! And so much to write about.
A non-exhaustive list of topics that are on my mind semi-regularly or regularly:
The use of tools by humans and the impact thereof
Software licensing schemes and philosophies, aka: what governs the mediation of the world we experience and all the impacts thereof, especially regarding future technologies that will literally change the machinery of human beingness and alter what experiences are possible.
Do beliefs I have about X make sense, pay rent, and/or check out aka are they true?
Cohering my thinking on X subject by reading deeply on it and writing about it
Plunging through my archives (I’ve read a tremendous amount of things and saved much for offline use) and writing brief summaries about most of what’s there or longer summaries or other types of writings to increase the legibility of my thoughts, development, idea lineages, increase coherency, and so on.
Values, AI, and alignment problems
Actually achieving human immortality
How to build a value / moral aligned society to last over the long term especially while distributed across the vast distances of space and with immortal humans. Aka exploring the ultimate goal of political philosophy
Contemplative traditions, philosophies, theologies, experiences, etc.
Taking ideas seriously and doing things about them
Helpful things I reread on a regular basis lately:
My Fear Heuristic
Four Components of Audacity
What I’ll be rereading, taking notes on, and carefully evaluating: Liber Augmen
You should go read that book (very strong endorse).
If you raised children in many different cultures, “how many” different reflectively stable moralities could they acquire? (What’s the “VC dimension” of human morality, without cheating by e.g. directly reprogramming brains?)
(This is probably a Wrong Question, but I still find it interesting to ask.)
Genetic algorithms are an old and classic staple of LW. 
Genetic algorithms (as used in optimization problems) traditionally assume “full connectivity”, that is, any two candidates can mate. In other words, the population network is assumed to be complete, and a potential mate is randomly sampled from the population.
Aymeric Vié has a paper out showing (numerical experiments) that some less dense but low average shortest path length network structures appear to result in better optimization results: https://doi.org/10.1145/3449726.3463134
Maybe this isn’t news for you, but it is for me! Maybe it is not news to anyone familiar with mathematical evolutionary theory?
This might be relevant for any metaphors or thought experiments where you wish to invoke GAs.
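To make the contrast concrete, here is a minimal GA sketch (my own toy, not the paper’s setup) where mating is restricted to neighbors in a network; a complete graph recovers the traditional GA, while a Watts-Strogatz-style graph is one of the sparse, short-path topologies the paper studies:

```python
import random

GENOME_LEN, POP = 32, 50

def fitness(g):                      # toy objective: maximize 1-bits
    return sum(g)

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

def mutate(g, rate=1.0 / GENOME_LEN):
    return [bit ^ (random.random() < rate) for bit in g]

def small_world(n, k=4, p=0.1):      # ring lattice with random rewiring
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            t = random.randrange(n) if random.random() < p else (i + j) % n
            if t != i:
                nbrs[i].add(t)
                nbrs[t].add(i)
    return nbrs

def evolve(neighbors, generations=200):
    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP)]
    for _ in range(generations):
        nxt = list(pop)
        for i in range(POP):
            options = sorted(neighbors[i]) or [random.randrange(POP)]
            child = mutate(crossover(pop[i], pop[random.choice(options)]))
            if fitness(child) >= fitness(pop[i]):   # local replacement
                nxt[i] = child
        pop = nxt
    return max(map(fitness, pop))

complete = {i: set(range(POP)) - {i} for i in range(POP)}
print("complete graph:", evolve(complete))
print("small world:   ", evolve(small_world(POP)))
```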
Failure to apply a lesson is usually failure to register its relevance to your life. Don’t look for the moments where you try and retry but your effort doesn’t bear fruit, look for gaps between what you learn/know and your world-model.
This is one reason to make beliefs pay rent in expectations, and to ensure that all the nodes in your model of the world are hooked up to something. The go-to example of this is Feynman’s story of students who understood the math of refraction indices but couldn’t connect it to the water in front of them, leaving them unable to use what they’d learned. I read that example years back and then fell into the exact same trap, because it’s a story about science education and not general thought.
In my own experience, the most powerful moment of this so far was realizing my trans-ness, which is also my greatest “oops” moment. I’d found some internet content centered around trans people, including lists of potential signs that a person is trans and hasn’t realized yet. I knew these signs applied to some people, but I also thought the net they cast was far, far too broad because, after all, half of them even applied to me! (Looking back, I sometimes laugh at the utter lack of self-awareness present in that thought.)
I already knew people could be trans, and I don’t think more information on that front would have helped anything. The realization that changed the course of my life really was just that I counted as people too.
I think the logic behind this is that we handle many, many concepts and not all of them will apply in our lives, so the bar tends to be set pretty high for a piece of information to become relevant to us. There’s a default sense of immunity or exclusion to new concepts, even when we take them on knowing they surely must apply to something.
To handle catastrophizing, you have to first grasp that the thing you’re doing that seems to have traits in common with it probably isn’t a special edge case or a false alarm—and that the reason it hasn’t jumped out at you sooner as an example of catastrophizing is that you weren’t focusing on it. And if you read a guide to something like negotiation, the techniques involved will avail you nothing until you find reason to believe that you’re engaging in negotiations. Application rests on these kinds of understanding.
My power-seeking theorems seem a bit like Vingean reflection. In Vingean reflection, you reason about an agent which is significantly smarter than you: if I’m playing chess against an opponent who plays the optimal policy for the chess objective function, then I predict that I’ll lose the game. I predict that I’ll lose, even though I can’t predict my opponent’s (optimal) moves—otherwise I’d probably be that good myself.
My power-seeking theorems show that most objectives have optimal policies which e.g. avoid shutdown and survive into the far future, even without saying what particular actions these policies take to get there. I may not even be able to compute a single optimal policy for a single non-trivial objective, but I can still reason about the statistical tendencies of optimal policies.
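A toy illustration of the flavor of the claim (my construction, not the actual theorems): a tiny deterministic MDP where one action shuts down into an absorbing “off” state, and the other leads to a hub with three live absorbing states. Sampling reward functions uniformly at random, the optimal policy usually avoids shutdown, simply because there are more live states for the reward to favor:

```python
import random

GAMMA, TRIALS = 0.9, 100_000
avoided = 0
for _ in range(TRIALS):
    # One random reward per absorbing state.
    r = {s: random.random() for s in ("off", "s1", "s2", "s3")}
    v_off = r["off"] / (1 - GAMMA)                  # shut down immediately
    v_live = GAMMA * max(r["s1"], r["s2"], r["s3"]) / (1 - GAMMA)
    avoided += v_live > v_off                       # optimal policy stays alive
print(f"optimal policy avoids shutdown for {avoided / TRIALS:.1%} of rewards")
```

(With these numbers the fraction comes out around two thirds; the point is only that it exceeds one half without our ever computing what the surviving policies actually do.)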
if I’m playing chess against an opponent who plays the optimal policy for the chess objective function
1. I predict that you will never encounter such an opponent. Solving chess is hard.
2. Optimal play within a game might not be optimal overall (others can learn from the strategy).
Why does this matter? If the theorems hold, even for ‘not optimal, but still great’ policies (say, for chess), then the distinction is irrelevant. Though for more complicated (or non-zero sum) games, the optimal move/policy may depend on the other player’s move/policy.
(I’m not sure what ‘avoid shutdown’ looks like in chess.)
Update on my tinkering with using high doses of chocolate as a psychoactive drug:
(Nb: at times I say “caffeine” in this post, in contrast to chocolate, even though chocolate contains caffeine; by this I mean coffee, energy drinks, caffeinated soda, and caffeine pills collectively, all of which I used frequently until recently; lately I haven’t been using any sources of caffeine other than chocolate, and even then I try to avoid using it on a daily basis.)
I still find that consuming high doses of chocolate (usually 3-6 tablespoons of dark cocoa powder, or a corresponding dose of dark chocolate chips / chunks) has a stimulating effect that I find more pleasant than caffeine, and makes me effective at certain things in a way that caffeine doesn’t.
I am pretty sure that I was too confident in my hypothesis about why specifically chocolate has this effect. One obvious thing that I overlooked in my previous posts is that chocolate contains caffeine, and this likely explains a large amount of its stimulant effects. It is definitely true that theobromine has a very similar structure to caffeine, but it’s unclear to me that it has any substantial stimulant effect. Gilch linked me to a study that he said suggests it doesn’t, but after reading the abstract, I found that it only justifies a weak update against thinking that theobromine specifically has stimulant effects.
I’m confident that there are chemicals in chocolate other than caffeine that are responsible for me finding benefit in consuming it, but I have no idea what those chemicals are.
Originally I was going to run an experiment, randomly assigning days to either a large dose of chocolate or none, but after the first couple of days I decided against it. So I don’t have any personal experimentation to back up my observations; just observationally, there’s a very big difference in my attitude and energy on days when I do or don’t consume chocolate.
When I talked to Herschel about his experience using chocolate, he noted that building up tolerance is a problem with any use of chemicals to affect the mind, which is obviously correct. So I ended up deciding that I won’t use chocolate every day, and will instead use it on days when I have a specific reason to, making sure that there are days when I don’t use it, even if I find myself always wanting to. My thought here is that if my brain is forced to operate at some basic level on a regular basis without the chemical, then when I do use the chemical I will be able to achieve my usual operation plus a little more, which ensures that I can always derive some benefit from it. I think this approach should make sense for many chemicals where building up tolerance is a concern.
Gilch said he didn’t notice any effect when he tried it. I don’t know how much he used, but since I specified an amount in response to one of his questions, I presume he used an amount similar to what I would use. I don’t know if he used it in addition to caffeine, or as a replacement. If it was a replacement, that would explain why he didn’t notice any additional stimulation over and above his usual level, but it would still leave the question of why he didn’t notice any other effects. One possibility is that the effects are a little bit subtle—not too subtle, since its effects tend to be pretty obvious for me (in contrast to caffeine) when I’m on chocolate, but subtle enough that a different person might not be as attuned to them. (Part of why I say this is that I find chocolate helps me be more sociable, and this is one of its most obvious effects in contrast to caffeine for me; I care a lot about my ability to be sociable, so it’s hard for this to slip my notice, but someone who cares less about how they interact with other people might overlook this effect. There are other effects too, but those do tend to be somewhat subtle, though still noticeable.)
As far as delivery goes, I have innovated slightly on my original method. I now often use dark chocolate chips / chunks in addition to drinking the chocolate; I find that a handful, just enough to fit in my mouth, has a non-trivial effect. Since drinking the chocolate straight would irritate my stomach and give my stool a weird consistency, I have started using milk. My recipe is now to take a tall glass, fill it 1/3rd with water, add some (but not necessarily all) of the desired dose of cocoa powder, microwave it for 20 seconds, stir the liquid, add a little more water and the rest of the cocoa powder, microwave it for 20 more seconds, stir it until there are no chunks, then fill up the rest of the glass with milk. There are probably changes that could be made to the recipe, but I find this at least gets a consistently good outcome. With the milk, my stomach doesn’t get irritated, and my stool is less different, though still slightly different, from how it would otherwise be.
On the subject of it making me sociable, I don’t think it’s a coincidence that most of the days my friends receive texts from me are days I have had chocolate. I also seem to write more on days when I have had chocolate. I find chocolate helps me feel that I know what I need to say, and I rarely find myself second-guessing my words when I’m on chocolate, whereas without it I often have a hard time finding words in the first place, and feel less confident about what I do say. I’ve written a lot on this post alone, have also messaged a friend today, and have also written a long-ish analysis of a somewhat controversial topic on another website today. Based on the context I say that in, I’m sure you can guess whether I’ve had chocolate today.
Suppose I am interested in finding a program M whose input-output behavior has some property P that I can probabilistically check relatively quickly (e.g. I want to check whether M implements a sparse cut of some large implicit graph). I believe there is some simple and fast program M that does the trick. But even this relatively simple M is much more complex than the specification of the property P.
Now suppose I search for the simplest program running in time T that has property P. If T is sufficiently large, then I will end up getting the program “Search for the simplest program running in time T’ that has property P, then run that.” (Or something even simpler, but the point is that it will make no reference to the intended program M since encoding P is cheaper.)
I may be happy enough with this outcome, but there’s some intuitive sense in which something weird and undesirable has happened here (and I may get in a distinctive kind of trouble if P is an approximate evaluation). I think this is likely to be a useful maximally-simplified example to think about.
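Toy arithmetic for why the search wrapper wins (all sizes invented): if specifying P plus generic enumerate-and-test machinery is cheaper than writing M directly, then the simplest program with property P is the wrapper, and simplicity-only search returns it:

```python
BITS_P = 100        # description length of the fast probabilistic check for P
BITS_SEARCH = 50    # generic "enumerate programs, test with P, run winner" code
BITS_M = 10_000     # the intended program M, simple but larger than P

wrapper_bits = BITS_P + BITS_SEARCH
print(f"M: {BITS_M} bits, wrapper: {wrapper_bits} bits")
print("simplicity search returns:", "wrapper" if wrapper_bits < BITS_M else "M")
```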
I think multi-level search may help here. To the extent that you can get a lower-confidence estimate of P much more quickly, you can budget your total search time such that you examine many programs and then re-examine only the good candidates. If your confidence is linear with time/complexity of evaluation, this probably doesn’t help.
This is interesting to me for two reasons:
[Mainly] Several proposals for avoiding the instrumental policy work by penalizing computation. But I have a really shaky philosophical grip on why that’s a reasonable thing to do, and so all of those solutions end up feeling weird to me. I can still evaluate them based on what works on concrete examples, but things are slippery enough that plan A is getting a handle on why this is a good idea.
In the long run I expect to have to handle learned optimizers by having the outer optimizer instead directly learn whatever the inner optimizer would have learned. This is an interesting setting to look at how that works out. (For example, in this case the outer optimizer just needs to be able to represent the hypothesis “There is a program that has property P and runs in time T’ ” and then do its own search over that space of faster programs.)
In traditional settings, we are searching for a program M that is simpler than the property P. For example, the number of parameters in our model should be smaller than the size of the dataset we are trying to fit if we want the model to generalize. (This isn’t true for modern DL because of subtleties with SGD optimizing imperfectly, implicit regularization, and so on, but spiritually I think it’s still fine.)
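As a rough worked version of that counting argument (my own illustrative numbers): a model with $p$ parameters stored at $b$ bits each has description length about $pb$ bits, while a dataset of $N$ examples whose labels take $k$ bits each contains at most $Nk$ bits. Fitting the data is only evidence of structure when

$$p\,b \ll N\,k,$$

i.e. when the model is a genuine compression of the dataset rather than a lookup table.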
But this breaks down if we start doing something like imposing consistency checks and hoping that those change the result of learning. Intuitively it’s also often not true for scientific explanations—even simple properties can be surprising and require explanation, and can be used to support theories that are much more complex than the observation itself.
It’s quite plausible that in these cases we want to be doing something other than searching over programs. This is pretty clear in the “scientific explanation” case, and maybe it’s the way to go for the kinds of alignment problems I’ve been thinking about recently.

A basic challenge with searching over things other than programs is that we have to interpret the other kind of data. For example, if “correspondence between two models of physics” is some kind of different object, like a description in natural language, then some amplified human is going to have to think about that correspondence to see if it explains the facts. If we search over correspondences, some of them will be “attacks” on the human that basically convince them to run a general computation in order to explain the data. So we have two options: (i) perfectly harden the evaluation process against such attacks, or (ii) try to ensure that there is always some way to just directly do whatever the attacker convinced the human to do. But (i) seems quite hard, and (ii) basically requires us to put all of the generic programs in our search space.
It’s also quite plausible that we’ll just give up on things like consistency conditions. But those come up frequently enough in intuitive alignment schemes that I at least want to give them a fair shake.
The speed prior is calibrated such that this never happens if the learned optimizer is just using brute force—if it needs to search over 1 extra bit then it will take 2x longer, offsetting the gains.
That means that in the regime where P is simple, the speed prior is the “least you can reasonably care about speed”—if you care even less, you will just end up pushing the optimization into an inner process that is more concerned with speed and is therefore able to try a bunch of options.
(However, this is very mild, since the speed prior cares only a tiny bit about speed. Adding 100 bits to your program is the same as letting it run 2^100 times longer, so you are basically just optimizing for simplicity.)
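Spelling this out, under the usual form of the speed prior where a program’s cost is its description length plus the log of its runtime (my assumed formalization, consistent with the “100 bits = 2^100 times longer” exchange rate above): if an inner searcher omits $k$ bits of $M$ and brute-forces them, it saves $k$ bits of description but must try about $2^k$ candidates, so

$$\operatorname{cost}(M) = |M| + \log_2 T(M), \qquad \operatorname{cost}(\text{searcher}) \approx (|M| - k) + \log_2\big(2^k \cdot T(M)\big) = |M| + \log_2 T(M).$$

The bit savings are exactly cancelled by the extra search time, which is the sense in which the prior is calibrated.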
To make this concrete, suppose that I instead used the kind-of-speed prior, where taking 4x longer is equivalent to using 1 extra bit of description complexity. And suppose that P is very simple relative to the complexities of the other objects involved. Suppose that the “object-level” program M has 1000 bits and runs in 2^2000 time, so has kind-of-speed complexity 2000 bits. A search that uses the speed prior will be able to find this algorithm in 2^3000 time, and so will have a kind-of-speed complexity of 1500 bits. So the kind-of-speed prior will just end up delegating to the speed prior.
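Checking the arithmetic, taking the kind-of-speed cost to be description length plus half the log of the runtime (1 bit per 4x): the object-level program costs

$$1000 + \tfrac{1}{2}\log_2 2^{2000} = 2000 \text{ bits},$$

while the speed-prior search tries about $2^{1000}$ candidates at $2^{2000}$ steps each, for $2^{3000}$ total time, so its kind-of-speed cost is about

$$|P| + \tfrac{1}{2}\log_2 2^{3000} \approx 1500 \text{ bits},$$

with $|P|$ negligible by assumption. The search beats the object-level program by 500 bits, hence the delegation.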
The speed prior still delegates to better search algorithms though. For example, suppose that someone is able to fill in a 1000 bit program using only 2^500 steps of local search. Then the local search algorithm has speed prior complexity 500 bits, so will beat the object-level program. And the prior we’d end up using is basically “2x longer = 2 more bits” instead of “2x longer = 1 more bit,” i.e. we end up caring more about speed because we delegated.
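In the same units: if local search can fill in a $b$-bit program using about $2^{b/2}$ total steps, the speed-prior cost of delegating is roughly

$$|\text{search code}| + \log_2 2^{b/2} \approx \tfrac{b}{2} \text{ bits}$$

(500 bits for $b = 1000$, matching the example above). From the object-level program’s perspective, each of its bits now costs only half a speed-prior bit, while letting it run 2x longer still costs a full speed-prior bit, which is the same as adding 2 object-level bits: hence “2x longer = 2 more bits.”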
The actual limit on how much you care about speed is given by whatever search algorithms work best. I think it’s likely possible to “expose” what is going on to the outer optimizer (so that it finds a hypothesis like “This local search algorithm is good” and then uses it to find an object-level program, rather than directly finding a program that bundles both of them together). But I’d guess intuitively that it’s just not even meaningful to talk about the “simplest” programs or any prior that cares less about speed than the optimal search algorithm.
Just spit-balling here, but it seems that T’ will have to be much smaller than T in order for the inner search to run each candidate program (multiple times for statistical guarantees) up to T’ to check whether it satisfies P. Because of this, you should be able to just run the inner search and see what it comes up with (which, if it’s yet another search, will have an even shorter run time) and pretty quickly (relative to T) find the actual M.
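A rough version of that budget constraint: if the inner search brute-forces a $b$-bit program and runs each candidate $r$ times for statistical confidence, then fitting inside the outer budget requires

$$2^b \cdot r \cdot T' \le T \quad\Longrightarrow\quad T' \le \frac{T}{r \cdot 2^b},$$

so each level of delegation shrinks the time budget exponentially in the bits being searched over, and following the chain of searches bottoms out at the object-level M after relatively few steps.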
Related to half-assing things with all you’ve got, I’ve noticed that there’s often a limit to how much you can succeed at a given task. If you write an A+ paper for a class, it’s certainly still possible to improve the paper, just not in the context of the course.
The Buddhist teacher Chogyam Trungpa once put this as “You will never be decorated by your guru.” In the presence of a master, your development in the art will never blow their mind and leave them deeply impressed; if it did, it would only be a signal that you’re in need of a different master. It cannot be any other way.
Thus, we should recognize success by the context we expect it to appear in, or the person who will be measuring it, and calibrate accordingly.
Good observation! The converse holds too—we should change context for those things we want to do better than the current mechanisms can measure.
[Epistemic status: highly speculative]
Smoke from California/Oregon wildfires reaching the East Coast opens up some interesting new legal/political possibilities. The smoke is way outside state borders, all the way on the other side of the country, so that puts the problem pretty squarely within federal jurisdiction. Either a federal agency could step in to force better forest management on the states, or a federal lawsuit could be brought for smoke-induced damages against California/Oregon. That would potentially make it a lot more difficult for local homeowners to block controlled burns.
[ETA: Apparently this was misleading; I think it only applied to one company, Alienware, and it was because they didn’t get certification, unlike the other companies.]
In my post about long AI timelines, I predicted that we would see attempts to regulate AI. An easy path for regulators is to target power-hungry GPUs and distributed computing in an attempt to minimize carbon emissions and electricity costs. It seems regulators may be moving even faster than I expected, with new bans on high-performance personal computers now taking effect in six US states. Are bans on individual GPUs next?