Ah, okay, some of those seem to me like they’d change things quite a lot. In particular, a week’s notice is usually possible for major plans (going out of town, a birthday or anniversary, concert that night only, etc.) and being able to skip books that don’t interest one also removes a major class of reason not to go. The ones I can still see are (1) competing in-town plans, (2) illness or other personal emergency, and (3) just don’t feel like going out tonight. (1) is what you’re trying to avoid, of course. On (3) I can see your opinion going either way. It does legitimately happen sometimes that one is too tired for whatever plans one had to seem appealing, but it’s legitimate to say that if that happens to you so often that you mind the cost of the extra rounds of drinks you end up buying, maybe you’re not a great member for that club. (2) seems like a real problem, and I’m gonna guess that you actually wouldn’t make people pay for drinks if they said they missed because they had COVID, there was a death in the family, etc.?
Kenoubi
Reads like a ha ha only serious to me anyway.
I started a book club in February 2023 and since the beginning I pushed for the rule that if you don’t come, you pay for everyone’s drinks next time.
I’m very surprised that in that particular form that worked, because the extremely obvious way to postpone (or, in the end, avoid) the penalty is to not go next time either (or, in the end, ever again). I guess if there’s agreement that pretty close to 100% attendance is the norm, as in if you can only show up 60% of the time don’t bother showing up at all, then it could work. That would make sense for something like a D&D or other tabletop RPG session, or certain forms of competition like, I dunno, a table tennis league, where someone being absent even one time really does cause quite significant harm to the event. But it eliminates a chunk of the possible attendees entirely right from the start, and I imagine would make the members feel quite constrained by the club, particularly if it doesn’t appear to be really required by the event itself. And those don’t seem good for getting people to show up, either.
That’s not to say the analogy overall doesn’t work. I’d imagine requiring people to buy a ticket to go to poker night, with that ticket also covering the night’s first ante / blind, does work to increase attendance, and for the reasons you state (and not just people being foolish about “sunk costs”). It’s just payment of the penalty after the fact, and presumably with no real enforcement, that I don’t get. And if you say it works for your book club, I guess probably it does and I’m wrong somehow. But in any case, I notice that I am confused.
I think this is a very important distinction. I prefer to use “maximizer” for “timelessly” finding the highest value of an objective function, and reserve “optimizer” for the kind of stepwise improvement discussed in this post. As I use the terms, to maximize something is to find the state with the highest value, but to optimize it is to take an initial state and find a new state with a higher value. I recognize that “optimize” and “optimizer” are sometimes used the way you’re saying, as basically synonymous with “maximize” / “maximizer”, and I could retreat to calling the inherently temporal thing I’m talking about an “improver” (or an “improvement process” if I don’t want to reify it), but this actually seems less likely to be quickly understood, and I don’t think it’s all that useful for “optimize” and “maximize” to mean exactly the same thing.
(There is a subset of optimizers as I (and this post, although I think the value should be graded rather than binary) use the term that in the limit reach the maximum, and a subset of those that even reach the maximum in a finite number of steps, but optimizers that e.g. get stuck in local maxima aren’t IMO thereby not actually optimizers, even though they aren’t maximizers in any useful sense.)
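To make the distinction concrete, here is a minimal sketch of how the two terms are being used. The function names and the toy landscape are illustrative assumptions, not anything from the post: `maximize` finds the best state "timelessly", while `optimize` repeatedly steps from a starting state to a better neighbor and can stall at a local maximum.

```python
def maximize(f, states):
    """'Maximizer': return the highest-valued state; no starting point is involved."""
    return max(states, key=f)


def optimize_step(f, state, neighbors):
    """One 'optimizer' step: move to the best neighbor if it is strictly better."""
    best = max(neighbors(state), key=f)
    return best if f(best) > f(state) else state


def optimize(f, state, neighbors, max_steps=1000):
    """Repeat improvement steps until no neighbor is better (or steps run out)."""
    for _ in range(max_steps):
        nxt = optimize_step(f, state, neighbors)
        if nxt == state:
            break
        state = nxt
    return state


# Hypothetical 1-D landscape: a local peak at index 2, the global peak at index 6.
values = [0, 2, 4, 3, 1, 5, 9, 6]


def f(i):
    return values[i]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(values)]


global_best = maximize(f, range(len(values)))  # index 6 (value 9)
from_left = optimize(f, 0, neighbors)          # climbs 0 -> 1 -> 2, then stalls
from_right = optimize(f, 4, neighbors)         # climbs 4 -> 5 -> 6
```

Starting from index 0, the stepwise process still improves its state at every step, so it counts as an optimizer in the sense above, even though it never reaches the state that `maximize` returns.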
Good post; this has way more value per minute spent reading and understanding it than the first 6 chapters of Jaynes, IMO.
There were 20 destroyed walls and 37 intact walls, leading to 10 − 3×20 − 1×37 = 13db
This appears to have an error; 10 − 3×20 − 1×37 = 10 − 60 − 37 = −87, not 13. I think you meant for the 37 to be positive, in which case 10 − 60 + 37 = −13, and the sign is reversed because of how you phrased which hypothesis the evidence favors (although you could also just reverse all the signs if you want the arithmetic to come out perfectly).
Also, nitpick, but
and every 3 db of evidence increases the odds by a factor of 2
should have an “about” in it, since 10^(3/10) is ~1.99526231497, not 2. (3db ≈ 2× is a very useful approximation, and implied by 10^3 ≈ 2^10, but encountering it indirectly like this would be very confusing to anyone who isn’t already familiar with it.)
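For anyone who wants to check the arithmetic, a short sketch (the helper names are my own, not from the post):

```python
import math


def db_to_odds_factor(db):
    """Convert decibels of evidence into a multiplicative factor on the odds."""
    return 10 ** (db / 10)


def odds_factor_to_db(factor):
    """Convert a multiplicative odds factor into decibels of evidence."""
    return 10 * math.log10(factor)


as_quoted = 10 - 3 * 20 - 1 * 37     # -87
sign_flipped = 10 - 3 * 20 + 1 * 37  # -13

three_db = db_to_odds_factor(3)         # about 1.9953, slightly under 2
db_for_doubling = odds_factor_to_db(2)  # about 3.0103

# The approximation comes from 10^3 = 1000 being close to 2^10 = 1024.
ratio = 2 ** 10 / 10 ** 3               # 1.024
```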
I re-read this, and wanted to strong-upvote it, and was disappointed that I already had. This is REALLY good. Way better than the thing it parodies (which was already quite good). I wish it were 10x as long.
The way that LLM tokenization represents numbers is all kinds of stupid. It’s honestly kind of amazing to me they don’t make even more arithmetic errors. Of course, an LLM can use a calculator just fine, and this is an extremely obvious way to enhance its general intelligence. I believe “give the LLM a calculator” is in fact being used, in some cases, but either the LLM or some shell around it has to decide when to use the calculator and how to use the calculator’s result. That apparently didn’t happen or didn’t work properly in this case.
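As a rough illustration of the "shell around it" idea, here is a toy sketch. Everything in it is hypothetical: `calculate` is a small safe arithmetic evaluator, and `answer` stands in for a wrapper that routes pure-arithmetic questions to the calculator and otherwise falls back to whatever text the model produced. Real systems decide when to call tools in more sophisticated ways; this only shows that the routing decision has to live somewhere.

```python
import ast
import operator

# Supported operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculate(expression):
    """Evaluate +, -, *, / over numeric literals; raise ValueError otherwise."""

    def ev(node):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](ev(node.left), ev(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")

    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("not an arithmetic expression") from exc
    return ev(tree.body)


def answer(question, model_guess):
    """Hypothetical shell: prefer the calculator, else keep the model's text."""
    try:
        return str(calculate(question))
    except (ValueError, ZeroDivisionError):
        return model_guess


print(answer("10 - 3*20 + 37", "13"))              # prints -13
print(answer("What is a book club?", "a club"))    # prints a club
```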
Thanks for your reply. “70% confidence that… we have a shot” is slightly ambiguous—I’d say that most shots one has are missed, but I’m guessing that isn’t what you meant, and that you instead meant 70% chance of success.
70% feels way too high to me, but I do find it quite plausible that calling it a rounding error is wrong. However, with a 20 year timeline, a lot of people I care about will almost definitely still die, who could have not died if death were Solved, which group with very much not negligible probability includes myself. And as you note downthread, the brain is a really deep problem with prosaic life extension. Overall I don’t see how anything along these lines can be fast enough and certain enough to be a crux on AI for me, but I’m glad people are working on it more than is immediately apparent to the casual observer. (I’m a type 1 diabetic and would have died at 8 years old if I’d lived before insulin was discovered and made medically available, so the value of prosaic life extension is very much not lost on me.)
P.S. Having this set of values and beliefs is very hard on one’s epistemics. I think it’s a writ-large version of what Eliezer has stated as “thinking about AI timelines is bad for one’s epistemics”. Here are some examples:
(1) Although I’ve never been at all tempted by e/acc techno-optimism (on this topic specifically) / alignment isn’t a problem at all / alignment by default, boy, it sure would be nice to hear about a strategy for alignment that didn’t sound almost definitely doomed for one reason or another. Even though Eliezer can (accurately, IMO) shoot down a couple of new alignment strategies before getting out of bed in the morning. So far I’ve never found myself actually doing it, but it’s impossible not to notice that if I just weren’t as good at finding problems or as willing to acknowledge problems found by others, then some alignment strategies I’ve seen might have looked non-doomed, at least at first...
(2) I don’t expect any kind of deliberate slowdown of making AGI to be all that effective even on its own terms, with the single exception of indiscriminate “tear it all down”, which I think is unlikely to get within the Overton window, at least in a robust way that would stop development even in countries that don’t agree (forcing someone to sabotage / invade / bomb them). Although such actions might buy us a few years, it seems overdetermined to me that they still leave us doomed, and in fact they appear to cut away some of the actually-helpful options that might otherwise be available (the current crop of companies attempting to develop AGI definitely aren’t the least concerned with existential risk of all actors who’d develop AGI if they could, for one thing). Compute thresholds of any kind, in particular, I expect to lead to much greater focus on doing more with the same compute resources rather than doing more by using more compute resources, and I expect there’s a lot of low-hanging fruit there since that isn’t where people have been focusing, and that the thresholds would need to decrease very much very fast to actually prevent AGI, and decreasing the thresholds below the power of a 2023 gaming rig is untenable. I’m not aware of any place in this argument where I’m allowing “if deliberate slowdowns were effective on their own terms, I’d still consider the result very bad” to bias my judgment. But is it? I can’t really prove it isn’t...
(3) The “pivotal act” framing seems unhelpful to me. It seems strongly impossible to me for humans to make an AI that’s able to pass strawberry alignment that has so little understanding of agency that it couldn’t, if it wanted to, seize control of the world. (That kind of AI is probably logically possible, but I don’t think humans have any real possibility of building one.) An AI that can’t even pass strawberry alignment clearly can’t be safely handed “melt all the GPUs” or any other task that requires strongly superhuman capabilities (and if “melt all the GPUs” were a good idea, and it didn’t require strongly superhuman capabilities, then people should just directly do that). So, it seems to me that the only good result that could come from aiming for a pivotal act would be that the ASI you’re using to execute it is actually aligned with humans and “goes rogue” to implement our glorious transhuman future; and it seems to me that if that’s what you want, it would be better to aim for that directly rather than trying to fit it through this weirdly-shaped “pivotal act” hole.
But… if this is wrong, and a narrow AGI could safely do a pivotal act, I’d very likely consider the resulting world very bad anyway, because we’d be in a world where unaligned ASI has been reliably prevented from coming into existence, and if the way that was done wasn’t by already having aligned ASI, then by far the obvious way for that to happen is to reliably prevent any ASI from coming into existence. But IMO we need aligned ASI to solve death. Does any of that affect how compelling I find the case for narrow pivotal-act AI on its own terms? Who knows...
I agree with the Statement. As strongly as I can agree with anything. I think the hope of current humans achieving… if not immortality, then very substantially increased longevity… without AI doing the work for us, is at most a rounding error. And ASI that was even close to aligned, that found it worth reserving even a billionth part of the value of the universe for humans, would treat this as the obvious most urgent problem and solve death pretty much if there’s any physically possible way of doing so. And when I look inside, I find that I simply don’t care about a glorious transhumanist future that doesn’t include me or any of the particular other humans I care about. I do somewhat prefer being kind / helpful / beneficent to people I’ve never met, very slightly prefer that even for people who don’t exist yet, but it’s far too weak a preference to trade off against any noticeable change to the odds of me and everyone I care about dying. If that makes me a “sociopath” in the view of someone or other, oh well.
I’ve been a supporter of MIRI, AI alignment, etc. for a long time, not because I share that much with EA in terms of values, but because the path to the future having any value has seemed for a long time to route through our building aligned ASI, which I consider as hard as MIRI does. But when the “pivotal act” framing started being discussed, rather than actually aligning ASI, I noticed a crack developing between my values and MIRI’s, and the past year with advocacy for “shut it all down” and so on has blown that crack wide open. I no longer feel like a future I value has any group trying to pursue it. Everyone outside of AI alignment is either just confused and flailing around with unpredictable effects, or is badly mistaken and actively pushing towards turning us all into paperclips, but those in AI alignment are either extremely unrealistically optimistic about plans that I’m pretty sure, for reasons that MIRI has argued, won’t work; or, like current MIRI, they say things like that I should stake my personal presence in the glorious transhumanist future on cryonics (and what of my friends and family members who I could never convince to sign up? What of the fact that, IMO, current cryonics practice probably doesn’t even prevent info-theoretical death, let alone give one a good shot at actually being revived at some point in the future?)
I happen to also think that most plans for preventing ASI from happening soon, that aren’t “shut it all down” in a very indiscriminate way, just won’t work—that is, I think we’ll get ASI (and probably all die) pretty soon anyway. And I think “shut it all down” is very unlikely to be societally selected as our plan for how to deal with AI in the near term, let alone effectively implemented. There are forms of certain actors choosing to go slower on their paths to ASI that I would support, but only if those actors are doing that specifically to attempt to solve alignment before ASI, and only if it won’t slow them down so much that someone else just makes unaligned ASI first anyway. And of course we should forcibly stop anyone who is on the path to making ASI without even trying to align it (because they’re mistaken about the default result of building ASI without aligning it, or because they think humanity’s extinction is good actually), although I’m not sure how capable we are of stopping them. But I want an organization that is facing up to the real, tremendous difficulty of making the first ASI aligned, and trying to do that anyway, because no other option actually has a result that they (or I) find acceptable. (By the way, MIRI is right that “do your alignment homework for you” is probably the literal worst possible task to give to one’s newly developed AGI, so e.g. OpenAI’s alignment plan seems deeply delusional to me and thus OpenAI is not the org for which I’m looking.)
I’d like someone from MIRI to read this. If no one replies here, I may send them a copy, or something based on this.
Yes he should disclose somewhere that he’s doing this, but deepfakes with the happy participation of the person whose voice is being faked seems like the best possible scenario.
Yes and no. The main mode of harm we generally imagine is to the person deepfaked. However, nothing prevents the main harm in a particular incident of harmful deepfaking from being to the people who see the deepfake and believe the person depicted actually said and did the things depicted.
That appears to be the implicit allegation here—that recipients might be deceived into thinking Adams actually speaks their language (at least well enough to record a robocall). Or at least, if that’s not it, then I don’t get it either.
I’ve seen a lot of attempts to provide “translations” from one domain-specific computer language to another, and they almost always have at least one of these properties:
(1) They aren’t invertible, nor “almost invertible” via normalization
(2) They rely on an extension mechanism intentionally allowing the embedding of arbitrary data into the target language
(3) They use hacks (structured comments, or even uglier encodings if there aren’t any comments) to embed arbitrary data
(4) They require the source of the translation to be normalized before (and sometimes also after, but always before) translation
(2) and (3) I don’t think are super great here. If there are blobs of data in the translated version that I can’t understand, but that are necessary for the original sender to interpret the statement, it isn’t clear how I can manipulate the translated version while keeping all the blobs correct. Plus, as the recipient, I don’t really want to be responsible for safely maintaining and manipulating these blobs.
(1) is clearly unworkable (if there’s no way to translate back into the original language, there can’t be a conversation). That leaves (4), which requires stripping anything that can’t be represented in an invertible way before translating. E.g., if I have lists but you can only understand sets, and assuming no nesting, I may need to sort my list and remove duplicates from it as part of normalization. This deletes real information! It’s information that the other language isn’t prepared to handle, so it needs to be removed before sending. This is better than sending the information in a way that the other party won’t preserve even when performing only operations they consider valid.
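A minimal sketch of the list-versus-set example, assuming flat lists of sortable items (the function names are made up for illustration):

```python
def normalize(items):
    """Sort and de-duplicate: drops order and multiplicity, which sets cannot carry."""
    return sorted(set(items))


def to_set_language(items):
    """Translate a list into the set-only language."""
    return frozenset(items)


def from_set_language(s):
    """Translate back; the only canonical choice is a sorted, duplicate-free list."""
    return sorted(s)


original = [3, 1, 3, 2]
normalized = normalize(original)  # [1, 2, 3]

# The round trip is the identity on normalized input...
round_trip_ok = from_set_language(to_set_language(normalized)) == normalized  # True
# ...but not on the raw list, whose order and duplicate are lost in translation.
round_trip_raw = from_set_language(to_set_language(original)) == original     # False
```

Normalizing first makes the information loss explicit on the sender's side, rather than having it happen silently somewhere in the round trip.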
I think this applies to the example from the post, too—how would I know whether certain instances of double negation or provability were artifacts that normalization is supposed to strip, or just places where someone wanted to make a statement about double negation or provability?
Malbolge? Or something even nastier in a similar vein, since it seems like people actually figured out (with great effort) how to write programs in Malbolge. Maybe encrypt all the memory after every instruction, and use a real encryption algorithm, not a lookup table.
Some points which I think support the plausibility of this scenario:
(1) EY’s ideas about a “simple core of intelligence”, how chimp brains don’t seem to have major architectural differences from human brains, etc.
(2) RWKV vs Transformers. Why haven’t Transformers been straight up replaced by RWKV at this point? Looks to me like potentially huge efficiency gains being basically ignored because lab researchers can get away with it. Granted, affects efficiency of inference but not training AFAIK, and maybe it wouldn’t work at the 100B+ scale, but it certainly looks like enough evidence to do the experiment.
(3) Why didn’t researchers jump straight to the end on smaller and smaller floating point (or fixed point) precision? Okay, sure, “the hardware didn’t support it” can explain some of it, but you could still do smaller scale experiments to show it appears to work and get support into the next generation of hardware (or at some point even custom hardware if the gains are huge enough) if you’re serious about maximizing efficiency.
(4) I have a few more ideas for huge efficiency gains that I don’t want to state publicly. Probably most of them wouldn’t work. But the thing about huge efficiency gains is that if they do work, doing the experiments to find that out is (relatively) cheap, because of the huge efficiency gains. I’m not saying anyone should update on my claim to have such ideas, but if you understand modern ML, you can try to answer the question “what would you try if you wanted to drastically improve efficiency” and update on the answers you come up with. And there are probably better ideas than those, and almost certainly more such ideas. I end up mostly thinking lab researchers aren’t trying because it’s just not what they’re being paid to do, and/or it isn’t what interests them. Of course they are trying to improve efficiency, but they’re looking for smaller improvements that are more likely to pan out, not massive improvements any given one of which probably won’t work.
Anyway, I think a world in which you could even run GPT-4 quality inference (let alone training) on a current smartphone looks like a world where AI is soon going to determine the future more than humans do, if it hasn’t already happened at that point… and I’m far from certain this is where compute limits (moderate ones, not crushingly tight ones that would restrict or ban a lot of already-deployed hardware) would lead, but it doesn’t seem to me like this possibility is one that people advocating for compute limits have really considered, even if only to say why they find it very unlikely. (Well, I guess if you only care about buying a moderate amount of time, compute limits would probably do that even in this scenario, since researchers can’t pivot on a dime to improving efficiency, and we’re specifically talking about higher-hanging efficiency gains here.)
I certainly don’t think labs will only try to improve algorithms if they can’t scale compute! Rather, I think that the algorithmic improvements that will be found by researchers trying to figure out how to improve performance given twice as much compute as the last run won’t be the same ones found by researchers trying to improve performance given no increase in compute.
One would actually expect the low hanging fruit in the compute-no-longer-growing regime to be specifically the techniques that don’t scale, since after all, scaling well is an existing constraint that the compute-no-longer-growing regime removes. I’m not talking about those. I’m saying it seems reasonably likely to me that the current techniques producing state of the art results are very inefficient, and that a newfound focus on “how much can you do with N FLOPs, because that’s all you’re going to get for the foreseeable future” might give fundamentally more efficient techniques that turn out to scale better too.
It’s certainly possible that with a compute limit, labs will just keep doing the same “boring” stuff they already “know” they can fit into that limit… it just seems to me like people in AI safety advocating for compute limits are overconfident in that. It seems to me that the strongest plausible version of this possibility should be addressed by anyone arguing in favor of compute limits. I currently weakly expect that compute limits would make things worse because of these considerations.
Slowing compute growth could lead to a greater focus on efficiency. Easy to find gains in efficiency will be found anyway, but harder to find gains in efficiency currently don’t seem to me to be getting that much effort, relative to ways to derive some benefit from rapidly increasing amounts of compute.
If models on the capabilities frontier are currently not very efficient, because their creators are focused on getting any benefit at all from the most compute that is practically available to them now, restricting compute could trigger an existing “efficiency overhang”. If (some of) the efficient techniques found are also scalable (which some and maybe most won’t be, to be sure), then if larger amounts of compute do later become available, we could end up with greater capabilities at the time a certain amount of compute becomes available, relative to the world where available compute kept going up too smoothly to incentivize a focus on efficiency.
This seems reasonably likely to me. You seem to consider this negligibly likely. Why?
I can actually sort of write the elevator pitch myself. (If not, I probably wouldn’t be interested.) If anything I say here is wrong, someone please correct me!
Non-realizability is the problem that none of the options a real-world Bayesian reasoner is considering is a perfect model of the world. (It actually information-theoretically can’t be, if the reasoner is itself part of the world, since it would need a perfect self-model as part of its perfect world-model, which would mean it could take its own output as an input into its decision process, but then it could decide to do something else and boom, paradox.) One way to explain the sense in which the models of real-world reasoners are imperfect is that, rather than a knife-edge between bets they’ll take and bets on which they’ll take the other side, one might, say, be willing to take a bet that pays out 9:1 that it’ll rain tomorrow, and a bet that pays out 1:3 if it doesn’t rain tomorrow, but for anything in between, one wouldn’t be willing to take either side of the bet. A lot of important properties of Bayesian reasoning depend on realizability, so this is a serious problem.
Infra-Bayesianism purports to solve this by replacing the single probability distribution maintained by an ideal Bayesian reasoner by a certain kind of set of probability distributions. As I understand it, this is done in a way that’s “compatible with Bayesianism” in the sense that if there were only one probability distribution in your set, it would act like regular Bayesianism, but in general the thing that corresponds to a probability is instead the minimum of the probability across all the probability distributions in your set. This allows one to express things like “I’m at least 10% confident it’ll rain tomorrow, and at least 75% confident it won’t rain tomorrow, but if you ask me whether it’s 15% or 20% likely to rain tomorrow, I just don’t know.”
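Here is a small sketch of the "minimum across the set" idea, using the rain numbers above. It only illustrates lower probabilities over a finite set of distributions. The set itself and the function names are my assumptions, and it makes no claim to capture the actual Infra-Bayesian formalism.

```python
from fractions import Fraction

# Hypothetical set of distributions: P(rain) is somewhere between 10% and 25%.
credal_set = [
    {"rain": Fraction(p, 100), "dry": Fraction(100 - p, 100)}
    for p in (10, 15, 20, 25)
]


def lower_probability(event, distributions):
    """Smallest probability any distribution in the set assigns to the event."""
    return min(sum(d[outcome] for outcome in event) for d in distributions)


def upper_probability(event, distributions):
    """Largest probability any distribution in the set assigns to the event."""
    return max(sum(d[outcome] for outcome in event) for d in distributions)


at_least_rain = lower_probability({"rain"}, credal_set)  # 1/10
at_least_dry = lower_probability({"dry"}, credal_set)    # 3/4
at_most_rain = upper_probability({"rain"}, credal_set)   # 1/4

# With a single distribution, lower and upper coincide (ordinary Bayesianism).
single = [credal_set[0]]
same = lower_probability({"rain"}, single) == upper_probability({"rain"}, single)
```

Note that the two lower probabilities sum to 17/20 rather than 1; the gap is the range in which "I just don't know" applies.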
The case in which this seems most obviously useful to me is adversarial. Those offering bets should—if they’re rational—be systematically better informed about the relevant topics. So I should (it seems to me) have a range of probabilities within which the fact that you’re offering the bet is effectively telling me that you appear to be better informed than I am, and therefore I shouldn’t bet. However, I believe Infra-Bayesianism is intended to more generally allow agents to just not have opinions about every possible question they could be asked, but only those about which they actually have some relevant information.
Let’s say that I can understand neither the original IB sequence, nor your distillation. I don’t have the prerequisites. (I mean, I know some linear algebra—that’s hard to avoid—but I find topology loses me past “here’s what an open set is” and I know nothing about measure theory.)
I think I understand what non-realizability is and why something like IB would solve it. Is all the heavy math actually necessary to understand how IB does so? I’m very tempted to think of IB as “instead of a single probability distribution over outcomes, you just keep a (convex[1]) set of probability distributions instead, and eliminate any that you see to be impossible, and choose according to the minimum of the expected value of the ones you have left”. But I think this is wrong, just like “a quantum computer checks all the possible answers in parallel” is wrong (if that were right, a polynomial-time classical verifier would directly translate into an efficient quantum algorithm for every problem in NP, right? I still don’t actually get quantum computation, either.) And I don’t know why it’s wrong or what it’s missing.
[1] That just means that for any p and q in the set, and any λ in [0, 1], λp + (1 − λ)q is also in the set, right?
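The simplified picture described above (a set of distributions closed under mixing, with choices made by worst-case expected value) can be sketched as follows. This illustrates only that picture, which the comment itself suspects is incomplete. The payoffs and names are hypothetical.

```python
from fractions import Fraction


def mix(p, q, lam):
    """Convex combination lam*p + (1 - lam)*q of two distributions."""
    return {o: lam * p[o] + (1 - lam) * q[o] for o in p}


def expected_value(dist, payoff):
    return sum(dist[o] * payoff[o] for o in dist)


def worst_case_value(distributions, payoff):
    """Minimum expected value across the set of distributions."""
    return min(expected_value(d, payoff) for d in distributions)


def maximin_choice(distributions, actions):
    """Pick the action whose worst-case expected value is highest."""
    return max(actions, key=lambda a: worst_case_value(distributions, actions[a]))


p = {"rain": Fraction(1, 10), "dry": Fraction(9, 10)}
q = {"rain": Fraction(1, 4), "dry": Fraction(3, 4)}
midpoint = mix(p, q, Fraction(1, 2))  # rain: 7/40, dry: 33/40

# Hypothetical payoffs for two actions.
actions = {
    "umbrella": {"rain": 1, "dry": 0},
    "no_umbrella": {"rain": -4, "dry": 1},
}

with_set = maximin_choice([p, q], actions)  # "umbrella"
with_p_only = maximin_choice([p], actions)  # "no_umbrella"
```

Because expected value is linear in the distribution, adding mixtures such as `midpoint` to the set never changes the worst case; the minimum is already attained at `p` or `q`.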
Is there anything better I can do to understand IB than first learn topology and measure theory (or other similarly sized fields) in a fully general way? And am I the only person who’s repeatedly bounced off attempts to present IB, but for some reason still feels like maybe there’s actually something there worth understanding?
I was wondering if anyone would mention that story in the comments. I definitely agree that it has very strong similarities in its core idea, and wondered if that was deliberate. I don’t agree with any implications (which you may or may not have intended) that it’s so derivative as to make not mentioning Omelas dishonest, though, and independent invention seems completely plausible to me.
Edited to add: although the similar title does incline rather strongly to Omelas being an acknowledged source.
Leaving an unaligned force (humans, here) in control of 0.001% of resources seems risky. There is a chance that you’ve underestimated how large the share of resources controlled by the unaligned force is, and probably more importantly, there is a chance that the unaligned force could use its tiny share of resources in some super-effective way that captures a much higher fraction of resources in the future. The actual effect on the economy of the unaligned force, other than the possibility of its being larger than thought or being used as a springboard to gain more control, seems negligible, so one should still expect full extermination unless there’s some positive reason for the strong force to leave the weak force intact.
Humans do have such reasons in some cases (we like seeing animals, at least in zoos, and being able to study them, etc.; same thing for the Amish; plus we also at least sometimes place real value on the independence and self-determination of such beings and cultures), but there would need to be an argument made that AI will have such positive reasons (and a further argument why the AIs wouldn’t just “put whatever humans they wanted to preserve” in “zoos”, if one thinks that being in a zoo isn’t a great future). Otherwise, exterminating humans would be trivially easy with that large of a power gap. Even if there are multiple ASIs that aren’t fully aligned with one another, offense is probably easier than defense; if one AI perceives weak benefits to keeping humans around, but another AI perceives weak benefits to exterminating us, I’d assume we get exterminated and then the 2nd AI pays some trivial amount to the 1st for the inconvenience. Getting AI to strongly care about keeping humans around is, of course, one way to frame the alignment problem. I haven’t seen an argument that this will happen by default or that we have any idea how to do it; this seems more like an attempt to say it isn’t necessary.