it’s notable that ilya only caved very late into the incident, when it was all but certain that the board had lost
like, suppose i anecdotally noticed a few people last year be visibly confused when i said the phrase AGI in normal conversation, and then this year i noticed that many fewer people were visibly confused by AGI. then, this would tell me almost nothing about whether name-recognition of AGI increased or decreased; at n=10, it is nearly impossible to say anything whatsoever.
my mental model of how a pop triggers a broader crash is something like: a lot of people are taking money and investing it into AI stuff, directly (by investing in openai, nvidia, tsmc, etc) or indirectly (by investing in literally anything else; like, cement companies that make a lot of money by selling cement to build datacenters or whatever). this includes VCs, sovereign wealth funds, banks, etc. if it suddenly turned out that the datacenters and IP were worth a lot less than they thought, then their equity (or debt) ownership is suddenly worth a lot less than they thought it was, and they may become insolvent. and lots of financial institutions becoming insolvent is pretty bad.
running the agi survey really reminded me just how brutal statistical significance is, and how unreliable anecdotes are. even setting aside sampling bias of anecdotes, the sheer sample size you need to answer a question like “do more people this year know what agi is than last year” is kind of depressing—you need like 400 samples for each year just to be 80% sure you’d notice a 10 percentage point increase even if it did exist, and even if there was no real effect you’d still think there was one 5% of the time. this makes me a lot more bearish on vibes in general.
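for concreteness, here's roughly that power calculation as a quick statsmodels sketch (the 45% → 55% baseline is an assumption for illustration; the exact number depends on the baseline rate):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# "did AGI name-recognition go up by 10 percentage points?" -- assume 45% last
# year vs 55% this year, which is roughly the worst case for proportions
h = proportion_effectsize(0.55, 0.45)

# per-year sample size for a two-sided test at alpha=0.05 with 80% power
n_per_year = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.8,
                                          alternative="two-sided")
print(round(n_per_year))  # ~390 people surveyed each year
```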
to be more precise, I mean worthless for decreasing p(doom)
some reasons why it matters:
all effects that route through longer timelines (allocating more to upskilling oneself and others, longer term bets, not expecting agi to look like current models, aggressiveness of distributing funds to alignment, etc)
whether to pursue an aggressive (stock-heavy) or conservative (bond-heavy) investment strategy. if there is an ai bubble pop, it will likely bring the entire economy into a recession.
how much money to save as runway; should you be taking advantage of the bubble to grab as much cash as possible before the music stops, or should you be trying to dispose of all of your money before the singularity makes it worthless?
for lab employees: how much lab equity to sell/hold?
how much to emphasize “agi soon” in public comms, or in conversations with policymakers? (during a bubble pop, having predicted agi soon will probably be even more negatively viewed than merely having been wrong about timelines with no pop)
if there is a bubble and it pops, sentiment around agi will flip from inevitability to impossibility. many people will not be epistemically strong enough to resist the urge to conform. being aware of the hype cycle can help free yourself from it and avoid both over and under exuberance.
I think people in these parts are not taking sufficiently seriously the idea that we might be in an AI bubble. this doesn’t necessarily mean that AI isn’t going to be a huge deal—just because there was a dot com bubble doesn’t mean the Internet died—but it does very substantially affect the strategic calculus in many ways.
I think another reason why people procrastinate is that it makes each minute spent right before the deadline both obviously high value on net and immediately rewarding. this makes the decision to put in effort in each moment really easy—obviously it makes sense to spend a minute working on something that will make a big impact on tomorrow. whereas each minute long before the deadline has a longer time till payoff, and if you already put in a ton of work early on, then the minutes right before the deadline have lower marginal value because of diminishing returns. so this creates a perverse incentive to end-load the effort.
to be clear, this post is just my personal opinion, and is not necessarily representative of the beliefs of the openai interpretability team as a whole
the difference between activation sparsity, circuit sparsity, and weight sparsity
activation sparsity enforces that features activate sparsely—every feature activates only occasionally.
circuit sparsity enforces that the connections between features are sparse—most features are not connected to most other features.
weight sparsity enforces that most of the weights are zero. weight sparsity naturally implies circuit sparsity if we interpret the neurons and residual channels of the resulting model as the features.
weight sparsity is not the only way to enforce circuit sparsity—for example, Jacobian SAEs also attempt to enforce circuit sparsity. the big advantage of weight sparsity is that it’s a very straightforward way to be sure that the interactions are definitely sparse and have no interference weights. unfortunately, it comes at a terrible cost—the resulting models are very expensive to train.
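to make the distinction concrete, here's a toy sketch of what each kind of sparsity penalizes in a tiny two-layer ReLU model (my own illustration, not the actual setup from the paper):

```python
import torch

# toy two-layer ReLU model where we treat the hidden activations h1, h2 as the "features"
torch.manual_seed(0)
W1 = torch.randn(64, 64, requires_grad=True)
W2 = torch.randn(64, 64, requires_grad=True)

def sparsity_penalties(x):                 # x: a single 64-dim input
    pre1 = x @ W1
    h1 = torch.relu(pre1)
    pre2 = h1 @ W2
    h2 = torch.relu(pre2)

    # activation sparsity: features should fire only occasionally,
    # so penalize the activations themselves (the SAE-style constraint)
    act_penalty = h1.abs().sum() + h2.abs().sum()

    # weight sparsity: most weight entries should be zero, so penalize (or
    # hard-mask) the weights; sparse weights also make feature-feature connections sparse
    weight_penalty = W1.abs().sum() + W2.abs().sum()

    # circuit sparsity: the *interactions* between features should be sparse.
    # one way to measure this without sparse weights is via the h1 -> h2 Jacobian,
    # which for a ReLU layer is just W2 with the inactive output columns zeroed out
    jacobian = W2 * (pre2 > 0).float()     # J[i, j] = dh2_j / dh1_i
    circuit_penalty = jacobian.abs().sum()

    return act_penalty, weight_penalty, circuit_penalty

# example usage: sparsity_penalties(torch.randn(64))
```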
although in some sense the circuit sparsity paper is an interpretable pretraining paper, this is not the framing I’m most excited about. if anything, I think of interpretable pretraining as a downside of our approach, that we put up with because it makes the circuits really clean.
An Ambitious Vision for Interpretability
most of the time the person being recognized is not me
I find it anthropologically fascinating how at this point neurips has become mostly a summoning ritual to bring all of the ML researchers to the same city at the same time.
nobody really goes to talks anymore—even the people in the hall are often just staring at their laptops or phones. the vast majority of posters are uninteresting, and the few good ones often have a huge crowd that makes it very difficult to ask the authors questions.
increasingly, the best parts of neurips are the parts outside of neurips proper. the various lunches, dinners, and parties hosted by AI companies and friend groups (and increasingly over the past few years, VCs) are core pillars of the social scene, and are where most of the socializing happens. there are so many that you can basically spend your entire neurips not going to neurips at all. at dinnertime, there are literally dozens of different events going on at the same time.
multiple unofficial workshops, entirely unaffiliated with neurips, will schedule themselves to be in town at the same time; they will often have a way higher density of interesting people and ideas.
if you stand around in the hallways and chat in a group long enough, eventually someone walking by will recognize someone in the group and join in, which repeats itself until the group gets so big that it undergoes mitosis into smaller groups.
if you’re not already going to some company event, finding a restaurant at lunch or dinner time can be very challenging. every restaurant in a several mile radius will be either booked for a company event, or jam packed with people wearing neurips badges.
i’m going to rerun the neurips agi experiment this year. place your bets on what fraction of people at neurips this year know what the acronym AGI stands for!
idk, it’s unclear to me that computers and the Internet are more subtle than cars or radios. also, 50 year old americans today have seen the fall of the soviet union, the creation of the european union, enormous advances in civil rights, 9/11, the 2008 crash, covid, the invasion of ukraine, etc. this isn’t exactly WWII level but it’s also nowhere near a static stable world.
thoughts on lemborexant
pros: if you take it, you will fall asleep 30-60 minutes later. nothing else I’ve tried has been as reliable at making sure I definitely fall asleep, and as far as I can tell, it doesn’t destroy my sleep quality. especially at 10mg, you can feel it knocking you out, and you basically can’t power through it even if you want to. it’s a bit scary but all powerful sleep drugs are at least a bit scary and often a lot more scary. I generally take 5mg instead.
cons: it doesn’t do anything to keep you asleep; if your body doesn’t really want to sleep, you will wake up 2 hours later fully alert. it also doesn’t do anything to shift your sleep schedule. these facts combined mean that if you try to use lemborexant for jet lag / shifting sleep earlier, then your life will suck indefinitely until you stop using lemborexant. my current recipe is to only use lemborexant when it’s near enough to my normal bedtime, and I use melatonin 3 hours before bed to slowly move my sleep schedule earlier (shifting it later requires no special effort).
(potentially this also means lemborexant can be used to get nice 2 hour daytime naps? I have enough fear of god about sleep drugs that I feel hesitant to try any kind of hack like this)
(not medical advice. not a doctor, and even if I was a doctor I’m not your doctor, and even if I was your doctor I wouldn’t be communicating to you via lesswrong shortforms)
fwiw, I’m pessimistic that you will actually be able to make big compute efficiency improvements even by fully understanding gpt-n. or at least, for an equivalent amount of effort, you could have improved compute efficiency vastly more by just doing normal capabilities research. my general belief is that the kind of understanding you want for improving compute efficiency is at a different level of abstraction than the kind of understanding you want for getting a deep understanding of generalization properties.
this feels like a subtweet of our recent paper on circuit sparsity. I would have preferred a direct response to our paper (or any other specific paper/post/person), rather than a dialogue against a hypothetical interlocutor.
I think this post is unfairly dismissive of the idea that we can guess aspects of the true ontology and iterate empirically towards it. it makes it sound like you have to guess a lot of things right about the true ontology before you can make any empirical progress at all. this is a reasonable view of the world, but I think evidence so far rules out the strongest possible version of this claim.
SAEs are basically making the guess that the true ontology should activate kinda sparsely. this is clearly not enough to pin down the true ontology, and obviously at some point activation sparsity stops being beneficial and starts hurting. but SAE features seem closer to the true ontology than the neurons are, even if they are imperfect. this should be surprising if you think that you need to be really correct about the true ontology before you can make any progress! making the activations sparse is this kind of crude intervention, and you can imagine a world where SAEs don’t find anything interesting at all because it’s much easier to just find random sparse garbage, and so you need more constraints before you pin down something even vaguely reasonable. but we clearly don’t live in that world.
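to make the guess explicit: a vanilla SAE is more or less just a reconstruction objective plus an L1 penalty on feature activations. here's a toy sketch (illustrative only, not any particular published implementation; the dimensions and coefficient are made up):

```python
import torch
import torch.nn as nn

class TinySAE(nn.Module):
    """minimal sparse autoencoder sketch: the only ontological commitment is that
    good features should reconstruct the activations while firing sparsely."""
    def __init__(self, d_model=768, d_features=16384, l1_coef=1e-3):
        super().__init__()
        self.enc = nn.Linear(d_model, d_features)
        self.dec = nn.Linear(d_features, d_model)
        self.l1_coef = l1_coef

    def forward(self, acts):                   # acts: (batch, d_model) model activations
        feats = torch.relu(self.enc(acts))     # candidate "features"
        recon = self.dec(feats)
        recon_loss = (recon - acts).pow(2).mean()
        l1_loss = self.l1_coef * feats.abs().sum(dim=-1).mean()
        return feats, recon_loss + l1_loss
```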
our circuit sparsity work adds an additional constraint: we also enforce that the interactions between features are sparse. (I think of the part where we accomplish this by training new models from scratch as an unfortunate side effect; it just happens to be the best way to enforce this constraint.) this is another kind of crude intervention, but our main finding is that this gets us again slightly closer to the true concepts; circuits that used to require a giant pile of SAE features connected in an ungodly way can now be expressed simply. this again seems to suggest that we have gotten closer to the true features.
if you believe in natural abstractions, then it should at least be worth trying to dig down this path and slowly add more constraints, seeing whether it makes the model nicer or less nice, and iterating.
fwiw, I think the 100-1000x number is quite pessimistic, in that we didn’t try very hard to make our implementation efficient; we were entirely focused on making it work at all. while I think it’s unlikely our method will ever reach parity with frontier training methods, it doesn’t seem crazy that we could reduce the gap a lot.
and I think having something 100x behind the frontier (i.e. one GPT worth) is still super valuable for developing a theory of intelligence! like I claim it would be super valuable if aliens landed and gave us an interpretable GPT-4 or even GPT-3 without telling us how to make our own or scale it up.
sure, you can notice extremely large effect sizes through vibes. but the claim is that even for “smaller” effect sizes (like, tens of percentage points, e.g. 50->75%), you need pretty big sample sizes. obviously 0->100% doesn’t need a very large sample size.
I agree that chatgpt obviously has lots of name recognition, but I also separately think chatgpt has less name recognition than you might guess. I predict that only 85% of Americans would get a multiple choice question right about what kind of app chatgpt is (choices: artificial intelligence; social media; messaging and calling; online dating), whereas a control question about e.g. Google will get like 97% or whatever the lizardman constant dictates