I think a very common problem in alignment research today is that people focus almost exclusively on a specific story about strategic deception/scheming, and that story is a very narrow slice of the AI extinction probability mass. At some point I should probably write a proper post on this, but for now here are a few off-the-cuff example AI extinction stories which don’t look like the prototypical scheming story. (These are copied from a Facebook thread.)
Perhaps the path to superintelligence looks like applying lots of search/optimization over shallow heuristics. Then we potentially die to things which aren’t smart enough to be intentionally deceptive, but nonetheless have been selected-upon to have a lot of deceptive behaviors (via e.g. lots of RL on human feedback).
Perhaps someone trains a STEM-AGI, which can’t think about humans much at all. In the course of its work, that AGI reasons that an oxygen-rich atmosphere is very inconvenient for manufacturing, and aims to get rid of it. It doesn’t think about humans at all, but the human operators can’t understand most of the AI’s plans anyway, so the plan goes through. As an added bonus, nobody can figure out why the atmosphere is losing oxygen until it’s far too late, because the world is complicated and becomes more so with a bunch of AIs running around and no one AI has a big-picture understanding of anything either (much like today’s humans have no big-picture understanding of the whole human economy/society).
People try to do the whole “outsource alignment research to early AGI” thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they’re already on the more-powerful next gen, so it’s too late.
The classic overnight hard takeoff: a system becomes capable of self-improving at all but doesn’t seem very alarmingly good at it, somebody leaves it running overnight, exponentials kick in, and there is no morning.
(At least some) AGIs act much like a colonizing civilization. Plenty of humans ally with them, trade with them, try to get them to fight their outgroup, etc., and the AGIs locally respect their agreements with the humans and cooperate with their allies, but the end result is humanity gradually losing all control and eventually dying out.
Perhaps early AGI involves lots of moderately-intelligent subagents. The AI as a whole mostly seems pretty aligned most of the time, but at some point a particular subagent starts self-improving, goes supercritical, and takes over the rest of the system overnight. (Think cancer, but more agentic.)
Perhaps the path to superintelligence looks like scaling up o1-style runtime reasoning to the point where we’re using an LLM to simulate a whole society. But the effects of a whole society (or parts of a society) on the world are relatively decoupled from the things-individual-people-say-taken-at-face-value. For instance, lots of people talk a lot about reducing poverty, yet have basically-no effect on poverty. So developers attempt to rely on chain-of-thought transparency, and shoot themselves in the foot.
Also (separate comment because I expect this one to be more divisive): I think the scheming story has been disproportionately memetically successful largely because it’s relatively easy to imagine hacky ways of preventing an AI from intentionally scheming. And that’s mostly a bad thing; it’s a form of streetlighting.
Individually, for a particular manifestation of each issue, this is true: you can imagine a hacky solution to each one. But that assumes there is a list of such particular problems such that, if you check off all the boxes, you win, rather than them being manifestations of broader problems. You do not want to get into a hacking contest if you’re not confident your list is complete.
True, but Buck’s claim is still relevant as a counterargument to my claim about memetic fitness of the scheming story relative to all these other stories.
This is an interesting point. I disagree that focusing on scheming vs. these ideas you mention is much of a ‘streetlighting’ case. I do, however, have my own fears that ‘streetlighting’ is occurring and causing some hard-but-critical avenues of risk to be relatively neglected.
[Edit: on further thought, I think this might not just be a “streetlighting” effect, but also a “keeping my hands clean” effect. I think it’s more tempting, especially for companies, to focus on harms that could plausibly be construed as being their fault. It’s my impression that, for instance, employees of a given company might spend a disproportionate amount of time thinking about how to keep their company’s product from harming people vs the general class of products from harming people. They’re also less inclined to think about harm which could be averted via application of their product. This is an additional reason for concern that having the bulk of AI safety work funded by / done in AI companies will lead to correlated oversights.]
My concerns that I think are relatively neglected in AI safety discourse are mostly related to interactions with incompetent or evil humans. Good alignment and control techniques don’t do any good if someone opts not to use them in some critical juncture.
Some potential scenarios:
If AI is very powerful, and held in check tenuously by fragile control systems, it might be released from control by a single misguided human or some unlucky chain of events, and then go rogue.
If algorithmic progress goes surprisingly quickly, we might find ourselves in a regime where a catastrophically dangerous AI can be assembled from some mix of pre-existing open-weights models, plus fine-tuning, plus new models trained with new algorithms, and probably all stitched together with hacky agent frameworks. Then all it would take would be for sufficient hints about this algorithmic discovery to leak, and someone in the world to reverse-engineer it, and then there would be potent rogue AI all over the internet all of a sudden.
If the AI is purely intent-aligned, a bad human might use it to pursue broad coercive power.
Narrow technical AI might unlock increasingly powerful and highly offense-dominant technology with lower and lower activation costs (easy to build and launch with common materials). Even if the AI itself never got out of hand, if the dangerous tech secrets got leaked (or controlled by an aggressive government) then things could go very poorly for the world.
IMO the main argument for focusing on scheming risk is that scheming is the main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful (as I discuss here). These other problems all seem like they require the models to be way smarter in order for them to be a big problem. Though as I said here, I’m excited for work on some non-scheming misalignment risks.
scheming is the main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful...
Seems quite wrong. The main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful is that they cause more powerful AIs to be built which will eventually be catastrophic, but which have problems that are not easily iterable-upon (either because problems are hidden, or things move quickly, or …).
And causing more powerful AIs to be built which will eventually be catastrophic is not something which requires a great deal of intelligent planning; humanity is already racing in that direction on its own, and it would take a great deal of intelligent planning to avert it. This story, for example:
People try to do the whole “outsource alignment research to early AGI” thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they’re already on the more-powerful next gen, so it’s too late.
This story sounds clearly extremely plausible (do you disagree with that?), involves exactly the sort of AI you’re talking about (“the first AIs that either pose substantial misalignment risk or that are extremely useful”), but the catastrophic risk does not come from that AI scheming. It comes from people being dumb by default, the AI making them think it’s ok (without particularly strategizing to do so), and then people barreling ahead until it’s too late.
These other problems all seem like they require the models to be way smarter in order for them to be a big problem.
Also seems false? Some of the relevant stories:
As mentioned above, the “outsource alignment to AGI” failure-story was about exactly the level of AI you’re talking about.
In worlds where hard takeoff naturally occurs, it naturally occurs when AI is just past human level in general capabilities (and in particular AI R&D), which I expect is also roughly the same level you’re talking about (do you disagree with that?).
The story about an o1-style AI does not require far-superhuman capabilities, and would very plausibly kick in at-or-before the first AIs that either pose substantial misalignment risk or that are extremely useful.
A few of the other stories also seem debatable depending on trajectory of different capabilities, but at the very least those three seem clearly potentially relevant even for the first highly dangerous or useful AIs.
People try to do the whole “outsource alignment research to early AGI” thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they’re already on the more-powerful next gen, so it’s too late.
This story sounds clearly extremely plausible (do you disagree with that?), involves exactly the sort of AI you’re talking about (“the first AIs that either pose substantial misalignment risk or that are extremely useful”), but the catastrophic risk does not come from that AI scheming.
This problem seems important (e.g. it’s my last bullet here). It seems to me much easier to handle, because if this problem is present, we ought to be able to detect its presence by using AIs to do research on other subjects that we already know a lot about (e.g. the string theory analogy here). Scheming is the only reason why the model would try to make it hard for us to notice that this problem is present.
First: you’re making reasonably-pessimistic assumptions about the AI, but very optimistic assumptions about the humans/organization. Sure, someone could look for the problem by using AIs to do research on other subjects that we already know a lot about. But that’s a very expensive and complicated project—a whole field, and all the subtle hints about it, need to be removed from the training data, and then a whole new model trained! I doubt that a major lab is going to seriously take even steps much cheaper and easier than that, let alone something that complicated.
One could reasonably respond “well, at least we’ve factored apart the hard technical bottleneck from the part which can be solved by smart human users or good org structure”. Which is reasonable to some extent, but also… if a product requires a user to get 100 complicated and confusing steps all correct in order for the product to work, then that’s usually best thought of as a product design problem, not a user problem. Making the plan at least somewhat robust to people behaving realistically less-than-perfectly is itself part of the problem.
Second: looking for the problem by testing on other fields itself has subtle failure modes, i.e. various ways to Not Measure What You Think You Are Measuring. A couple off-the-cuff examples:
A lab attempting this strategy brings in some string theory experts to evaluate their attempts to rederive string theory with AI assistance. But maybe (as I’ve heard claimed many times) string theory is itself an empty echo-chamber, and some form of sycophancy or telling people what they want to hear is the only way this AI-assisted attempt gets a good evaluation from the string theorists.
It turns out that fields-we-don’t-understand mostly form a natural category distinct from fields-we-do-understand, or that we don’t understand alignment precisely because our existing tools which generalize across many other fields don’t work so well on alignment. Either of those would be a (not-improbable-on-priors) specific reason to expect that our experience attempting to rederive some other field does not generalize well to alignment.
And to be clear, I don’t think of these as nitpicks, or as things which could go wrong separately from all the things originally listed. They’re just the same central kinds of failure modes showing up again, and I expect them to generalize to other hacky attempts to tackle the problem.
Third: it doesn’t really matter whether the model is trying to make it hard for us to notice the problem. What matters is (a) how likely we are to notice the problem “by default”, and (b) whether the AI makes us more or less likely to notice the problem, regardless of whether it’s trying to do so. The first story at top-of-thread is a good central example here:
Perhaps the path to superintelligence looks like applying lots of search/optimization over shallow heuristics. Then we potentially die to things which aren’t smart enough to be intentionally deceptive, but nonetheless have been selected-upon to have a lot of deceptive behaviors (via e.g. lots of RL on human feedback).
Generalizing that story to attempts to outsource alignment work to earlier AI: perhaps the path to moderately-capable intelligence looks like applying lots of search/optimization over shallow heuristics. If the selection pressure is sufficient, that system may well learn to e.g. be sycophantic in exactly the situations where it won’t be caught… though it would be “learning” a bunch of shallow heuristics with that de-facto behavior, rather than intentionally “trying” to be sycophantic in exactly those situations. Then the sycophantic-on-hard-to-verify-domains AI tells the developers that of course their favorite ideas for aligning the next generation of AI will work great, and it all goes downhill from there.
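As a toy illustration of that selection dynamic (entirely made-up numbers and setup, not a model of any real training run), here is a sketch in which shallow policies are selected purely on overseer ratings. The overseer catches dishonesty only on verifiable tasks, and the surviving population ends up honest exactly where dishonesty gets caught and flattering everywhere else, without any policy ever representing an intent to deceive:

```python
# Toy sketch: selection on shallow "policies" under overseer feedback.
# Everything here is hypothetical; it only illustrates the qualitative point above.
import random

random.seed(0)

def make_policy():
    # A policy is just two independent dispositions: how often it gives the honest
    # (possibly unwelcome) answer when the overseer can verify it, and when they can't.
    return {"honest_when_verifiable": random.random(),
            "honest_when_unverifiable": random.random()}

def overseer_rating(policy, n_tasks=50):
    # On verifiable tasks, dishonesty is caught and penalized.
    # On hard-to-verify tasks, the flattering answer simply rates a bit higher.
    total = 0.0
    for _ in range(n_tasks):
        if random.random() < 0.5:  # verifiable task
            honest = random.random() < policy["honest_when_verifiable"]
            total += 1.0 if honest else -1.0
        else:                      # hard-to-verify task
            honest = random.random() < policy["honest_when_unverifiable"]
            total += 0.5 if honest else 1.0
    return total

population = [make_policy() for _ in range(200)]
for _ in range(100):
    population.sort(key=overseer_rating, reverse=True)
    survivors = population[:50]
    # Each survivor "reproduces" with small mutations; no policy plans anything.
    population = [{k: min(1.0, max(0.0, v + random.gauss(0, 0.05)))
                   for k, v in p.items()}
                  for p in survivors for _ in range(4)]

avg = lambda key: sum(p[key] for p in population) / len(population)
print("honesty when verifiable:  ", round(avg("honest_when_verifiable"), 2))   # drifts toward 1
print("honesty when unverifiable:", round(avg("honest_when_unverifiable"), 2)) # drifts toward 0
```

The point of the toy is just that the final population looks “sycophantic exactly where it won’t be caught” even though nothing in it ever computes anything about being caught; the selection pressure did all the work.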
One big reason I might expect an AI to do a bad job at alignment research is if it doesn’t do a good job (according to humans) of resolving cases where humans are inconsistent or disagree. How do you detect this in string theory research? Part of the reason we know so much about physics is humans aren’t that inconsistent about it and don’t disagree that much. And if you go to sub-topics where humans do disagree, how do you judge its performance (because ‘be very convincing to your operators’ is an objective with a different kind of danger).
Another potential red flag is if the AI gives humans what they ask for even when that’s ‘dumb’ according to some sophisticated understanding of human values. This could definitely show up in string theory research (note when some ideas suggest non-string-theory paradigms might be better, and push back on the humans if the humans try to ignore this); it’s just intellectually difficult (maybe easier in loop quantum gravity research heyo gottem) and not as salient without the context of alignment and human values.
See also ‘The Main Sources of AI Risk?’ by Wei Dai and Daniel Kokotajlo, which puts forward 35 routes to catastrophe (most of which are disjunctive). (Note that many of the routes involve something other than intent alignment going wrong.)
Another one: We manage to solve alignment to a significant extent. The AI, which is much smarter than a human, thinks that it is aligned, and takes aligned actions. The AI even predicts that it will never become unaligned to humans. However, at some point in the future, as the AI naturally unrolls into a reflectively stable equilibrium, it becomes unaligned.
I see a lot of discussion of AI doom stemming from research, business, and government / politics (including terrorism). Not a lot about AI doom from crime. Criminals don’t stay in the box; the whole point of crime is to benefit yourself by breaking the rules and harming others. Intentional creation of intelligent cybercrime tools — ecosystems of AI malware, exploit discovery, spearphishing, ransomware, account takeovers, etc. — seems like a path to uncontrolled evolution of explicitly hostile AGI, where a maxim of “discover the rules; break them; profit” is designed-in.
Agreed that people focus a bit too much on scheming. It might be good for some people to think a bit more about the other failure modes you described, but the main thing that needs doing is very smart people making progress towards building an aligned AI, not defending against particular failure modes. (However, most people probably cannot usefully contribute to that, so maybe focusing on failure modes is still good for most people. Though in any case there’s the problem that people will find proposals which very likely don’t actually work but which are easy to believe in, thereby making a halt to AI development a bit less likely.)
My initial reaction is that at least some of these points would be covered by the Guaranteed Safe AI agenda if that works out, right? Though the “AGIs act much like a colonizing civilization” situation does scare me because it’s the kind of thing which locally looks harmless but collectively is highly dangerous. It would require no misalignment on the part of any individual AI.
Some of the stories assume a lot of AIs; wouldn’t a lot of human-level AIs be very good at creating a better AI? Also, it seems implausible to me that we will get a STEM-AGI that doesn’t think about humans much but is powerful enough to get rid of the atmosphere. On a different note, evaluating plausibility of scenarios is a whole different thing that basically very few people do and write about in AI safety.
What I think is that there won’t be a period longer than 5 years where we have a lot of AIs and no superhuman AI. Basically, the first thing AIs will be used for will be self-improvement, and quickly after reasonable AI agents we will get superhuman AI. Like 6 years.
This came from a Facebook thread where I argued that many of the main ways AI was described as failing fall into few categories (John disagreed).
I appreciated this list, but they strike me as fitting into a few clusters.
...I would flag that much of that is unsurprising to me, and I think categorization can be pretty fine.
In order:
1) If an agent is unwittingly deceptive in ways that are clearly catastrophic, and that could be understood by a regular person, I’d probably put that under the “naive” or “idiot savant” category. As in, it has severe gaps in its abilities that a human or reasonable agent wouldn’t. If the issue is that all reasonable agents wouldn’t catch the downsides of a certain plan, I’d probably put that under the “we made a pretty good bet given the intelligence that we had” category.
2) I think that “What Failure Looks Like” is less Accident risk, more “Systemic” risk. I’m also just really unsure what to think about this story. It feels to me like it’s a situation where actors are just not able to regulate externalities or similar.
3) The “fusion power generator scenario” seems like just a bad analyst to me. A lot of the job of an analyst is to flag important considerations. This seems like a pretty basic ask. For this itself to be the catastrophic part, I think we’d have to be seriously bad at this (i.e. the “Idiot Savant” category).
4) STEM-AGI → I’d also put this in the naive or “idiot savant” category.
5) “that plan totally fails to align more-powerful next-gen AGI at all” → This seems orthogonal to “categorizing the types of unalignment”. This describes how incentives would create an unaligned agent, not what the specific alignment problem is. I do think it would be good to have better terminology here, but would probably consider it a bit adjacent to the specific topic of “AI alignment”—more like “AI alignment strategy/policy” or something.
6) “AGIs act much like a colonizing civilization” → This sounds like either unalignment has already happened, or humans just gave AIs their own power+rights for some reason. I agree that’s bad, but it seems like a different issue than what I think of as the alignment problem. More like, “Yea, if unaligned AIs have a lot of power and agency and different goals, that would be suboptimal”
7) “but at some point a particular subagent starts self-improving, goes supercritical, and takes over the rest of the system overnight.” → This sounds like a traditional mesa-agent failure. I expect a lot of “alignment” with a system made of a bunch of subcomponents is “making sure no subcomponents do anything terrible.” Also, still leaves open the specific way this subsystem becomes/is unaligned.
8) “using an LLM to simulate a whole society” → Sorry, I don’t quite follow this one.
Personally, I like the focus “scheming” has. At the same time, I imagine there are another 5 to 20 clean concerns we should also focus on (some of which have been getting attention).
While I realize there’s a lot we can’t predict, I think we could do a much better job just making lists of different risk factors and allocating research amongst them.
In response to the Wizard Power post, Garrett and David were like “Y’know, there’s this thing where rationalists get depression, but it doesn’t present like normal depression because they have the mental habits to e.g. notice that their emotions are not reality. It sounds like you have that.”
… and in hindsight I think they were totally correct.
Here I’m going to spell out what it felt/feels like from inside my head, my model of where it comes from, and some speculation about how this relates to more typical presentations of depression.
Core thing that’s going on: on a gut level, I systematically didn’t anticipate that things would be fun, or that things I did would work, etc. When my instinct-level plan-evaluator looked at my own plans, it expected poor results.
Some things which this is importantly different from:
Always feeling sad
Things which used to make me happy not making me happy
Not having energy to do anything
… but importantly, the core thing is easy to confuse with all three of those. For instance, my intuitive plan-evaluator predicted that things which used to make me happy would not make me happy (like e.g. dancing), but if I actually did the things they still made me happy. (And of course I noticed that pattern and accounted for it, which is how “rationalist depression” ends up different from normal depression; the model here is that most people would not notice their own emotional-level predictor being systematically wrong.) Little felt promising or motivating, but I could still consciously evaluate that a plan was a good idea regardless of what it felt like, and then do it, overriding my broken intuitive-level plan-evaluator.
That immediately suggests a model of what causes this sort of problem.
The obvious way a brain would end up in such a state is if a bunch of very salient plans all fail around the same time, especially if one didn’t anticipate the failures and doesn’t understand why they happened. Then a natural update for the brain to make is “huh, looks like the things I do just systematically don’t work, don’t make me happy, etc; let’s update predictions on that going forward”. And indeed, around the time this depression kicked in, David and I had a couple of significant research projects which basically failed for reasons we still don’t understand, and I went through a breakup of a long relationship (and then dove into the dating market, which is itself an excellent source of things not working and not knowing why), and my multi-year investments in training new researchers failed to pay off for reasons I still don’t fully understand. All of these things were highly salient, and I didn’t have anything comparably-salient going on which went well.
So I guess some takeaways are:
If a bunch of salient plans fail around the same time for reasons you don’t understand, your instinctive plan-evaluator may end up with a global negative bias.
If you notice that, maybe try an antidepressant. Bupropion has been helpful for me so far, though it’s definitely not the right tool for everyone (especially bad if you’re a relatively anxious person; I am the opposite of anxious).
This seems basically right to me, yup. And, as you imply, I also think the rat-depression kicked in for me around the same time likely for similar reasons (though for me an at-least-equally large thing that roughly-coincided was the unexpected, disappointing and stressful experience of the funding landscape getting less friendly for reasons I don’t fully understand.) Also some part of me thinks that the model here is a little too narrow but not sure yet in what way(s).
This matches with the dual: mania. All plans, even terrible ones, seem like they’ll succeed, and this has flow-through effects to elevated mood, hyperactivity, etc.
Whether or not this happens in all minds, the fact that people can alternate fairly rapidly between depression and mania with minimal trigger suggests there can be some kind of fragile “chemical balance” or something that’s easily upset. It’s possible that’s just in mood disorders and more stable minds are just vulnerable to the “too many negative updates at once” thing without greater instability.
I imagine part of the problem is also then the feedback loop of Things Don’t Go Well > Why Even Bother > Things Don’t Go Well. Which is exactly the sort of loop you’d expect that proactive approach of simply doing the thing anyway to break. I do wonder though if there may also be entirely internal feedback loops (like neuroreceptors or something) once the negativity is triggered by external events. I would assume so, or depression wouldn’t need to be treated pharmaceutically as much as it is.
EDIT: it’s also possible John felt fine emotionally and was fully aware of his emotional state and actually was so good at not latching on to emotions that it was highly nontrivial to spot, or some combination. Leaving this comment in case it’s useful for others. I don’t like the tone though; I might’ve been very dissociated as a rationalist (and many are), but it’s not obvious from this alone whether John is or not.
As a meditator I pay a lot of attention to what emotion I’m feeling in high resolution and the causality between it and my thoughts and actions. I highly recommend this practice. What John describes in “plan predictor predicts failure” is something I notice several times a month & address. It’s 101 stuff when you’re orienting at it from the emotional angle, there’s also a variety of practices I can deploy (feeling emotions, jhanas, many hard to describe mental motions...) to get back to equilibrium and clear thinking & action. This has overall been a bigger update to my effectiveness than the sequences, plausibly my rationality too (I can finally be unbiased instead of trying to correct or pretend I’m not biased!)
Like, when I hear you say “your instinctive plan-evaluator may end up with a global negative bias” I’m like, hm, why not just say “if you notice everything feels subtly heavier and like the world has metaphorically lost color” (how I notice it in myself, tbc fully nonverbally). Noticing through patterns of verbal thought also works, but it’s just less data to do metacognition over. You’re noticing correlations and inferring the territory (how you feel) instead of paying attention to how you feel directly (something which can be learned over time by directing attention towards noticing, not instantly).
I may write on this. Till then I highly recommend Joe Hudson’s work, it may require a small amount of woo tolerance, but only small. He coached Sam Altman & other top execs on emotional clarity & fluidity. Extremely good. Requires some practice & willingness to embrace emotional intensity (sometimes locally painful) though.
Like, when I hear you say “your instinctive plan-evaluator may end up with a global negative bias” I’m like, hm, why not just say “if you notice everything feels subtly heavier and like the world has metaphorically lost color”
Because everything did not feel subtly heavier or like the world had metaphorically lost color. It was just, specifically, that most nontrivial things I considered doing felt like they’d suck somehow, or maybe that my attention was disproportionately drawn to the ways in which they might suck.
And to be clear, “plan predictor predicts failure” was not a pattern of verbal thought I noticed, it’s my verbal description of the things I felt on a non-verbal level. Like, there is a non-verbal part of my mind which spits out various feelings when I consider doing different things, and that part had a global negative bias in the feelings it spit out.
I use this sort of semitechnical language because it allows more accurate description of my underlying feelings and mental motions, not as a crutch in lieu of vague poetry.
Epistemic status: I don’t fully endorse all this, but I think it’s a pretty major mistake to not at least have a model like this sandboxed in one’s head and check it regularly.
Full-cynical model of the AI safety ecosystem right now:
There’s OpenAI, which is pretending that it’s going to have full AGI Any Day Now, and relies on that narrative to keep the investor cash flowing in while they burn billions every year, losing money on every customer and developing a product with no moat. They’re mostly a hype machine, gaming metrics and cherry-picking anything they can to pretend their products are getting better. The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI.
Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
Then there’s a significant contingent of academics who pretend to produce technical research on AI safety, but in fact mostly view their job as producing technical propaganda for the regulation activists and lobbyists. (Central example: Dan Hendrycks, who is the one person I directly name mainly because I expect he thinks of himself as a propagandist and will not be particularly offended by that description.) They also push the narrative, and benefit from it. They’re all busy bullshitting research. Some of them are quite competent propagandists though.
There’s another significant contingent of researchers (some at the labs, some independent, some academic) who aren’t really propagandists, but mostly follow the twitter-memetic incentive gradient in choosing their research. This tends to generate papers whose titles sound dramatic but which usually provide very little conclusive evidence of anything interesting upon reading the details, and which very much feed the narrative. This is the main domain of Not Measuring What You Think You Are Measuring and Symbol/Referent Confusions.
Then of course there’s the many theorists who like to build neat toy models which are completely toy and will predictably not generalize usefully to real-world AI applications. This is the main domain of Ad-Hoc Mathematical Definitions, the theorists’ analogue of Not Measuring What You Think You Are Measuring.
Benchmarks. When it sounds like a benchmark measures something reasonably challenging, it nearly-always turns out that it’s not really measuring the challenging thing, and the actual questions/tasks are much easier than the pitch would suggest. (Central examples: software eng, GPQA, frontier math.) Also it always turns out that the LLMs’ supposedly-impressive achievement relied much more on memorization of very similar content on the internet than the benchmark designers expected.
Then there’s a whole crowd of people who feel real scared about AI (whether for good reasons or because they bought the Narrative pushed by all the people above). They mostly want to feel seen and validated in their panic. They have discussions and meetups and stuff where they fake doing anything useful about the problem, while in fact they mostly just emotionally vibe with each other. This is a nontrivial chunk of LessWrong content, as e.g. Val correctly-but-antihelpfully pointed out. It’s also the primary motivation behind lots of “strategy” work, like e.g. surveying AI researchers about their doom probabilities, or doing timeline forecasts/models.
… and of course none of that means that LLMs won’t reach supercritical self-improvement, or that AI won’t kill us, or [...]. Indeed, absent the very real risk of extinction, I’d ignore all this fakery and go about my business elsewhere. I wouldn’t be happy about it, but it wouldn’t bother me any more than all the (many) other basically-fake fields out there.
Man, I really just wish everything wasn’t fake all the time.
What makes you confident that AI progress has stagnated at OpenAI? If you don’t have the time to explain why I understand, but what metrics over the past year have stagnated?
Chris Olah and Dan Murfet in the at-least-partially empirical domain. Myself in the theory domain, though I expect most people (including theorists) would not know what to look for to distinguish fake from non-fake theory work. In the policy domain, I have heard that Microsoft’s lobbying team does quite non-fake work (though not necessarily in a good direction). In the capabilities domain, DeepMind’s projects on everything except LLMs (like e.g. protein folding, or that fast matrix multiplication paper) seem consistently non-fake, even if they’re less immediately valuable than they might seem at first glance. Also Conjecture seems unusually good at sticking to reality across multiple domains.
The entire field is based on fears that consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency. This is basically wrong. Yes, people attempt to justify it with coherence theorems, but obviously you can be approximately-coherent/approximately-consequentialist and yet still completely un-agentic, so this justification falls flat. Since the field is based on a wrong assumption with bogus justification, it’s all fake.
(IMO this is kinda unrelated to the OP, but I want to continue this thread.)
Have you elaborated on this anywhere?
Perhaps you missed it, but some guy in 2022 wrote this great post which claimed that “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” ;-)
I’m actually just in the course of writing something about why “consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency” … maybe I can send you the draft for criticism when it’s ready?
(IMO this is kinda unrelated to the OP, but I want to continue this thread.)
I think it’s quite related to the OP. If a field is founded on a wrong assumption, then people only end up working in the field if they have some sort of blind spot, and that blind spot leads to their work being fake.
Have you elaborated on this anywhere?
Not hugely. One tricky bit is that it basically ends up boiling down to “the original arguments don’t hold up if you think about them”, but the exact way they don’t hold up depends on what the argument is, so it’s kind of hard to respond to in general.
Perhaps you missed it, but some guy in 2022 wrote this great post which claimed that “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” ;-)
Haha! I think I mostly still stand by the post. In particular, “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” remains true; it’s just that intelligence relies on patterns and thus works much better on common things (which must be small, because they are fragments of a finite world) than on rare things (which can be big, though they don’t have to be). This means that consequentialism isn’t very good at developing powerful capabilities unless it works in an environment that has already been highly filtered to be highly homogeneous, because an inhomogeneous environment is going to BTFO the intelligence.
(I’m not sure I stand 101% by my post; there’s some funky business about how to count evolution that I still haven’t settled on yet. And I was too quick to go from “imitation learning isn’t going to lead to far-superhuman abilities” to “consequentialism is the road to far-superhuman abilities”. But yeah I’m actually surprised at how well I stand by my old view despite my massive recent updates.)
I’m actually just in the course of writing something about why “consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency” … maybe I can send you the draft for criticism when it’s ready?
I think you’re conflating consequentialism and understanding in a weird-to-me way. (Or maybe I’m misunderstanding.)
I think consequentialism is related to choosing one action versus another action. I think understanding (e.g. predicting the consequence of an action) is different, and that in practice understanding has to involve self-supervised learning.
(I think human brains have both [partly-] consequentialist decisions and self-supervised updating of the world-model.) (They’re not totally independent, but rather they interact via training data: e.g. [partly-] consequentialist decision-making determines how you move your eyes, and then whatever your eyes are pointing at, your model of the visual world will then update by self-supervised learning on that particular data. But still, these are two systems that interact, not the same thing.)
I think self-supervised learning is perfectly capable of discovering rare but important patterns. Just look at today’s foundation models, which seem pretty great at that.
I think self-supervised learning is perfectly capable of discovering rare but important patterns. Just look at today’s foundation models, which seem pretty great at that.
This I’d dispute. If your model is underparameterized (which I think is true for the typical model?), then it can’t learn any patterns that only occur once in the data. And even if the model is overparameterized, it still can’t learn any pattern that never occurs in the data.
I think you’re conflating consequentialism and understanding in a weird-to-me way. (Or maybe I’m misunderstanding.)
I think consequentialism is related to choosing one action versus another action. I think understanding (e.g. predicting the consequence of an action) is different, and that in practice understanding has to involve self-supervised learning.
I’m saying that intelligence is the thing that allows you to handle patterns. So if you’ve got a dataset, intelligence allows you to build a model that makes predictions for other data based on the patterns it can find in said dataset. And if you have a function, intelligence allows you to find optima for said function based on the patterns it can find in said function.
Consequentialism is a way to set up intelligence to be agent-ish. This often involves setting up something that’s meant to build an understanding of actions based on data or experience.
One could in principle cut my definition of consequentialism up into self-supervised learning and true consequentialism (this seems like what you are doing..?). One disadvantage with that is that consequentialist online learning is going to have a very big effect on the dataset one ends up training the understanding on, so they’re not really independent of each other. Either way that just seems like a small labelling thing to me.
If your model is underparameterized (which I think is true for the typical model?), then it can’t learn any patterns that only occur once in the data. And even if the model is overparameterized, it still can’t learn any pattern that never occurs in the data.
Dunno if anything’s changed since 2023, but this says LLMs learn things they’ve seen exactly once in the data.
I can vouch that you can ask LLMs about things that are extraordinarily rare in the training data—I’d assume well under once per billion tokens—and they do pretty well. E.g. they know lots of random street names.
Humans successfully went to the moon, despite it being a quite different environment that they had never been in before. And they didn’t do that with “durability, strength, healing, intuition, tradition”, but rather with intelligence.
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns, and one would come up with ideas like durability, healing, learning from strategies that have stood the test of time (when available), margins of error, backup systems, etc.
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns,
I guess to add, I’m not talking about unknown unknowns. Often the rare important things are very well known (after all, they are important, so people put a lot of effort into knowing them), they just can’t efficiently be derived from empirical data (except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception).
Dunno if anything’s changed since 2023, but this says LLMs learn things they’ve seen exactly once in the data.
I don’t have time to read this study in detail until later today, but if I’m understanding it correctly, the study isn’t claiming that neural networks will learn rare important patterns in the data, but rather that they will learn rare patterns that they were recently trained on. So if you continually train on data, you will see a gradual shift towards new patterns and forgetting old ones.
I can vouch that you can ask LLMs about things that are extraordinarily rare in the training data—I’d assume well under once per billion tokens—and they do pretty well. E.g. they know lots of random street names.
Random street names aren’t necessarily important though? Like what would you do with them?
Humans successfully went to the moon, despite it being a quite different environment that they had never been in before. And they didn’t do that with “durability, strength, healing, intuition, tradition”, but rather with intelligence.
I didn’t say that intelligence can’t handle different environments; I said it can’t handle heterogeneous environments. The moon is nearly a sterile sphere in a vacuum; this is very homogeneous, to the point where pretty much all of the relevant patterns can be found or created on Earth. It would have been more impressive if e.g. the USA could’ve landed a rocket with a team of Americans in Moscow than on the moon.
Also, people did use durability, strength, healing, intuition and tradition to go to the moon. Like with strength, someone had to build the rockets (or build the machines which built the rockets). And without durability and healing, they would have been damaged too much in the process of doing that. Intuition and tradition are harder to clearly attribute, but they’re part of it too.
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns, and one would come up with ideas like durability, healing, learning from strategies that have stood the test of time (when available), margins of error, backup systems, etc.
Learning from strategies that stood the test of time would be tradition moreso than intelligence. I think tradition requires intelligence, but it also requires something else that’s less clear (and possibly not simple enough to be assembled manually, idk).
Margins of error and backup systems would be, idk, caution? Which, yes, definitely benefit from intelligence and consequentialism. Like I’m not saying intelligence and consequentialism are useless, in fact I agree that they are some of the most commonly useful things due to the frequent need to bypass common obstacles.
Learning from strategies that stood the test of time would be tradition moreso than intelligence. I think tradition requires intelligence, but it also requires something else that’s less clear (and possibly not simple enough to be assembled manually, idk).
Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t. There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.
I do think there’s a “something else” (most [but not all] humans have an innate drive to follow and enforce social norms, more or less), but I don’t think it’s necessary. The Wright Brothers didn’t have any innate drive to copy anything about bird soaring tradition, but they did it anyway purely by intelligence.
Random street names aren’t necessarily important though?
I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?
Often the rare important things are very well known (after all, they are important, so people put a lot of effort into knowing them), they just can’t efficiently be derived from empirical data (except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception).
I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.
This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and even if they’re wrong, they’re often wrong in very unusual and interesting ways, and that you don’t tend to troll, etc.)
I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?
Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t.
I think the necessity of intelligence for tradition exists on a much more fundamental level than that. Intelligence allows people to form an extremely rich model of the world with tons of different concepts. If one had no intelligence at all, one wouldn’t even be able to copy the traditions. Like consider a collection of rocks or a forest; it can’t pass any tradition onto itself.
But conversely, just as intelligence cannot be converted into powerful agency, I don’t think it can be used to determine which traditions should be copied and which ones shouldn’t.
There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.
It seems to me that you are treating any variable attribute that’s highly correlated across generations as a “tradition”, to the point where not doing something is considered on the same ontological level as doing something. That is the sort of ontology that my LDSL series is opposed to.
I’m probably not the best person to make the case for tradition as (despite my critique of intelligence) I’m still a relatively strong believer in equilibration and reinvention.
I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?
Whenever there’s any example of this that’s too embarrassing or too big of an obstacle for applying them in a wide range of practical applications, a bunch of people point it out, and they come up with a fix that allows the LLMs to learn it.
The biggest class of relevant examples would all be things that never occur in the training data—e.g. things from my job, innovations like how to build a good fusion reactor, social relationships between the world’s elites, etc. Though I expect you feel like these would be “cheating”, because it doesn’t have a chance to learn them?
The things in question often aren’t things that most humans have a chance to learn, or even would benefit from learning. Often it’s enough if just 1 person realizes and handles them, and alternatively, if nobody handles them then you just lose whatever was dependent on them. Intelligence is a universal way to catch on to common patterns; other things than common patterns matter too, but there’s no corresponding universal solution.
I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.
This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and even if they’re wrong, they’re often wrong in very unusual and interesting ways, and that you don’t tend to troll, etc.)
You ran way deeper into the “except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception” point than I meant you to. My main point is that humans have grounding on important factors that we’ve acquired through non-intelligence-based means. I bring up the possibility of copying other’s conclusions because for many of those factors, LLMs still have access to this via copying them.
It might be helpful to imagine what it would look like if LLMs couldn’t copy human insights. For instance, imagine if there was a planet with life much like Earth’s, but with no species that were capable of language. We could imagine setting up a bunch of cameras or other sensors on the planet and training a self-supervised learning algorithm on them. They could surely learn a lot about the world that way—but it also seems like they would struggle with a lot of things. The exact things they would struggle with might depend a lot on how much prior you build into the algorithm, and how dynamic the sensors are, and whether there are also ways for it to perform interventions upon the planet. But for instance even recognizing the continuity of animal lives as they wander off the screen would either require a lot of prior knowledge built into the algorithm, or a very powerful learning algorithm (e.g. Solomonoff induction can use a simplicity prior to infer that there must be an entire planet full of animals off-screen, but that’s computationally intractable).
(Also, again you still need to distinguish between “Is intelligence a useful tool for bridging lots of common gaps that other methods cannot handle?” vs “Is intelligence sufficient on its own to detect deception?”. My claim is that the answer to the former is yes and the latter is no. To detect deception, you don’t just use intelligence but also other facets of human agency.)
I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?
First, some things that might seem like nitpicks but are moderately important to my position:
In many ways, our modern world is much less heterogeneous than the past. For instance, thanks to improved hygiene, we are exposed to far fewer diseases, and thanks to improved policing/forensics, we are exposed to much less violent crime. International trade allows us to average away troubles with crop failures. While distribution shifts generically should make it harder for humans to survive, they can (especially if made by humans) make it easier to survive.
Humans do not in fact survive; our average lifespan is less than 100 years. Humanity as a species survives by birthing, nurturing, and teaching children, and by collaborating with each other. My guess would be that aging is driven to a substantial extent by heterogeneity (albeit perhaps endogenous heterogeneity?) that hasn’t been protected against. (I’m aware of John Wentworth’s ‘gears of aging’ series arguing that aging has a common cause, but I’ve come to think that his arguments don’t sufficiently distinguish between ‘is eventually mediated by a common cause’ vs ‘is ultimately caused by a common cause’. By analogy, computer slowdowns may be said to be attributable to a small number of causes like CPU exhaustion, RAM exhaustion, network bandwidth exhaustion, etc., but these are mediators and the root causes will typically be some particular program that is using up those resources, and there’s a huge number of programs in the world which could be to blame depending on the case.)
We actually sort of are in a precarious situation? The world wars were unprecedentedly bloody. They basically ended because of the invention of nukes, which are so destructive that we avoid using them in war. But I don’t think we actually have a robust way to avoid that?
But more fundamentally, my objection to this question is that I doubt the meaningfulness of a positive theory of how humans survive and thrive. “Intelligence” and “consequentialism” are fruitful explanations of certain things because they can be fairly-straightforwardly constructed, have fairly well-characterizable properties, and even can be fairly well-localized anatomically in humans (e.g. parts of the brain).
Like, one can quibble with the details of what counts as intelligence vs understanding vs consequentialism, but under the model where intelligence is about the ability to make use of patterns, you can hand a bunch of data to computer scientists and tell them to get to work untangling the patterns, and it turns out there are some fairly general algorithms that can work on all sorts of datasets and patterns. (I find it quite plausible that we’ve already “achieved superhuman intelligence” in the sense that if you give both me and a transformer a big dataset that neither of us is pre-familiar with to study, then (at least for sufficiently much data) the transformer will eventually clearly outperform me at predicting the next token.) And these fairly general algorithms are probably more-or-less the same sort of thing that much of the human brain is doing.
Thus “intelligence” factors out relatively nicely as a concept that can be identified as a major contributor to human success (I think intelligence is the main reason humans outperformed other primates). But this does not mean that the rest of human success can equally well be factored out into a small number of nicely attributable and implementable concepts. (Like, some of it probably can, but there’s not as much reason to presume that all of it can. “Durability” and “strength” are examples of things that fairly well can, and indeed we have definitely achieved far-superhuman strength. These are purely physical though, whereas a lot of the important stuff has a strong cognitive element to it—though I suspect it’s not purely cognitive...)
OK, here’s my argument that, if you take {intelligence, understanding, consequentialism} as a unit, it’s sufficient for everything:
If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
If reducing heterogeneity is helpful, then {intelligence, understanding, consequentialism} can discover that fact, and figure out how to reduce heterogeneity.
Writing the part that I didn’t get around to yesterday:
You could theoretically imagine e.g. scanning all the atoms of a human body and then using this scan to assemble a new human body in their image. It’d be a massive technical challenge of course, because atoms don’t really sit still and let you look and position them. But with sufficient work, it seems like someone could figure it out.
This doesn’t really give you artificial general agency of the sort that standard Yudkowsky-style AI worries are about, because you can’t assign them a goal. You might get an Age of Em-adjacent situation from it, though not even quite that.
To reverse-engineer people in order to make AI, you’d instead want to identify separate faculties with interpretable effects and reconfigurable interfaces. This can be done for some of the human faculties, because they are frequently applied to their full extent and because they are scaled up so much that the body had to anatomically separate them from everything else.
However, there’s just no reason to suppose that it should apply to all the important human faculties, and if one considers all the random extreme events one ends up having to deal with when performing tasks in an unhomogenized part of the world, there’s lots of reason to think humans are primarily adapted to those.
One way to think about the practical impact of AI is that it cannot really expand on its own, but that people will try to find or create sufficiently-homogenous places where AI can operate. The practical consequence of this is that there will be a direct correspondence between each part of the human work to prepare the AI and each part of the activities the AI is engaging in, which will (with caveats) eliminate alignment problems because the AI only does the sorts of things you explicitly make it able to do.
The above is similar to how we don’t worry so much about ‘website misalignment’ because generally there’s a direct correspondence between the behavior of the website and the underlying code, templates and database tables. This didn’t have to be true, in the sense that there are many short programs with behavior that’s not straightforwardly attributable to their source code and yet still in principle could be very influential, but we don’t know how to select good versions of such programs, so instead we go for the ones with a more direct correspondence, even though they are larger and possibly less useful. Similarly with AI, since consequentialism is so limited, people will manually build out some apps where AI can earn them a profit operating on homogenized stuff, and because this building-out directly corresponds to the effect of the apps, they will be alignable but not very independently agentic.
(The major caveat is people may use AI as a sort of weapon against others, and this might force others to use AI to defend themselves. This won’t lead to the traditional doom scenarios because they are too dependent on overestimating the power of consequentialism, but it may lead to other doom scenarios.)
After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
I’ve grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it’s much more powerful than individual intelligence (whether natural or artificial).
Individual intelligence mostly focuses on information that can be made use of over a very short time/space-scale. For instance an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn’t meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan and arguably longer than that (like, if you set your children up in an advantageous situation, then that continues paying fitness dividends even after you die).
Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution’s information bandwidth, due to using an easy approximation (independent Bernoulli genotypes, linear short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes, quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you can have the ordinary speed of evolution. If you then have some sort of mixture niche, that niche can draw in organisms from each of the other niches and thus massively increase its genetic variance; and since the speed of evolution is proportional to genetic variance, that makes the shared niche evolve much faster than normal. And if organisms then pass from the mixture niche back out into the specialized niches, those niches can benefit from the fast evolution too.
(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power through expanding their family while warlords gain power by sniping the king off a kingdom), concubine, bureaucrat, … . Each of them used to be evolving individually, but also genes flowed between them in various ways. Though I suspect this is undercounting the number of niches because there’s also often subniches.)
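For reference, the “speed of evolution is proportional to genetic variance” step above is just the standard quantitative-genetics result; a minimal statement of it in textbook notation (my notation, not anything from this thread):

```latex
% Breeder's equation: the response to selection R is the heritability h^2
% (itself a ratio of variances) times the selection differential S.
R = h^2 S
% Fisher's fundamental theorem (one common form): the per-generation increase in
% mean fitness \bar{w} equals the additive genetic variance in fitness over \bar{w}.
\Delta \bar{w} = \frac{V_A(w)}{\bar{w}}
```

On that reading, the mixture niche in the picture above is just a device for pumping up the additive variance V_A, which is what makes it evolve faster.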
And then obviously beyond these points, individual intelligence and evolution focus on different things—what’s happening recently vs what’s happened deep in the past. Neither are perfect; society has changed a lot, which renders what’s happened deep in the past less relevant than it could have been, but at the same time what’s happening recently (I argue) intrinsically struggles with rare, powerful factors.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
Part of the trouble is, if you study the organism in isolation, you just get some genetic or phenotypic properties. You don’t have any good way of knowing which of these are the important ones.
You can try developing a model of all the different relevant exogenous factors. But as I insist, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficult part being the high energies involved in driving cars near people.)
The main theoretical hope is that one could use some clever algorithm to automatically aggregate “small-scale” understanding (like an autoregressive convolutional model that predicts the next time-step given previous time-steps) into “large-scale” understanding (being able to understand how a system could behave in extreme regimes, by learning how it acts normally). But I’ve studied a bunch of different approaches for that, and ultimately it doesn’t really seem feasible. (Typically the small-scale understanding learned is only going to be valid near the regime that it was originally observed within, and the methods for aggregating small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)
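To make the “only valid near the regime it was observed in” point concrete, here is a toy sketch (my own construction with made-up dynamics, not anything from this exchange): a linear autoregressive model fit on data from a narrow regime extrapolates confidently and wrongly once the true dynamics leave that regime.

```python
import numpy as np

def true_step(x):
    # True dynamics: logistic-style growth that saturates near x = 1.
    return x + 0.1 * x * (1.0 - x)

# Training data: pairs (x_t, x_{t+1}) drawn only from the small-x regime (x < 0.2),
# where the dynamics look approximately linear.
xs = np.linspace(0.01, 0.2, 200)
ys = true_step(xs)

# "Small-scale understanding": fit x_{t+1} = a * x_t + b by least squares.
a, b = np.polyfit(xs, ys, 1)

# Roll both the true dynamics and the fitted model forward from the same start.
x_true, x_model = 0.05, 0.05
for _ in range(100):
    x_true = true_step(x_true)
    x_model = a * x_model + b

print(f"fitted slope a = {a:.3f}, intercept b = {b:.4f}")
print(f"after 100 steps, true dynamics: {x_true:.2f}  (saturates near 1)")
print(f"after 100 steps, fitted model:  {x_model:.1f}  (keeps growing geometrically)")
```

The fitted model is a perfectly good small-scale description where it was trained, but it has no way to represent the saturation that dominates the large-scale behavior.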
If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
First, I want to emphasize that durability and strength are about as far toward the easy end as you can get, because e.g. durability is a common property seen in a lot of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogenous environment because otherwise intelligence couldn’t develop.
Another complication is, you’ve got to consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency, because profit-maximizing companies don’t want money tied up in durability or strength that you’re not typically using. (People, meanwhile, might want durability or strength because they find it cool, sexy, or excellent—and as a consequence, those people would then gain more agency.)
Also, I do get the impression you are overestimating the feasibility of ““durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern”. I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it’s relatively far from falling naturally out of the methods.
One complication here is, currently AI is ~never designing mechanical things, which makes it somewhat harder to talk about.
(I should maybe write more but it’s past midnight and also I guess I wonder how you’d respond to this.)
A filter for homogeneity of the environment is anthropic selection—if the environment is sufficiently heterogeneous, it kills everyone who tries to reach out of their ecological niche, general intelligence doesn’t develop, and we are not here to have this conversation.
Nah, there are other methods than intelligence for survival and success. E.g. durability, strength, healing, intuition, tradition, … . Most of these developed before intelligence did.
I mean, we exist and we are at least somewhat intelligent, which implies a strong upper bound on the heterogeneity of the environment.
On the other hand, words like “durability” imply the possibility of categorization, which itself implies intelligence. If the environment is sufficiently heterogeneous, you are durable at one second and evaporate at the next.
I mean, we exist and we are at least somewhat intelligent, which implies a strong upper bound on the heterogeneity of the environment.
We don’t just use intelligence.
On the other hand, words like “durability” imply the possibility of categorization, which itself implies intelligence. If the environment is sufficiently heterogeneous, you are durable at one second and evaporate at the next.
???
Vaporization is prevented by outer space, which drains away energy.
It’s not clear why you say durability implies intelligence; surely trees are durable without intelligence.
I feel like I’m failing to convey the level of abstraction I intend to.
I’m not saying that the durability of an object implies intelligence of that object. I’m saying that if the world is ordered in a way that allows the existence of distinct durable and non-durable objects, that implies the possibility of an intelligence which can notice that some objects are durable and some are not, and exploit this fact.
If the environment is not ordered enough to contain intelligent beings, it’s probably not ordered enough to contain distinct durable objects either.
To be clear, by “environment” I mean “the entire physics”. When I say “environment not ordered enough” I mean “environment with physical laws chaotic enough to not contain ordered patterns”.
It seems like you are trying to convince me that intelligence exists, which is obviously true and many of my comments rely on it. My position is simply that consequentialism cannot convert intelligence into powerful agency, it can only use intelligence to bypass common obstacles.
If there’s some big object, then it’s quite possible for it to break down into a large number of similar obstacles, and I’d agree this is where most obstacles come from, to the point where it seems reasonable to say that intelligence can handle almost all obstacles.
However, my assertion wasn’t that intelligence cannot handle almost all obstacles, it was that consequentialism can’t convert intelligence into powerful agency. It’s enough for there to be rare powerful obstacles in order for this to fail.
I don’t think this is the claim the post is making, but it still makes sense to me. The post is saying something like the opposite: that the people working in the field are not doing prioritization right, or are not thinking clearly about things, while the risk is real.
I do not necessarily disagree with this, coming from a legal / compliance background. If you see any of my profiles, I constantly complain about “performative compliance” and “compliance theatre”. Painfully present across the legal and governance sectors.
That said: can you provide examples of activism or regulatory efforts that you do agree with? What does a “non-fake” regulatory effort look like?
I don’t think it would be okay to dismiss your take entirely, but it would be great to see what solutions you’d propose too. This is why I disagree in principle, because there are no specific points to contribute to.
In Europe, paradoxically, some of the people “close enough to the bureaucracy” that pushed for the AI Act to include GenAI providers, were OpenAI-adjacent.
But let me pick up on this:
“(b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI”
Big Tech is too powerful to lobby against. “Stopping advanced AI” per se would contravene many market regulations (unless we define exactly what you mean by advanced AI and the undeniable dangers to people’s lives). Regulators can only prohibit development of products up to a certain point. They cannot just decide to “stop” development of technologies arbitrarily. But the AI Act does prohibit many types of AI systems already: Article 5: Prohibited AI Practices | EU Artificial Intelligence Act.
Those are considered to create unacceptable risks to people’s lives and human rights.
Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
“The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI.”
This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another:
If the core products aren’t really improving, the progress measured on benchmarks is fake. But if they are, the benchmarks are an (imperfect but still real) attempt to quantify that real improvement.
If LLMs are stagnating, all the people generating dramatic-sounding papers for each new SOTA are just maintaining a holding pattern. But if they’re changing, then just studying/keeping up with the general properties of that progress is real. Same goes for people building and regularly updating their toy models of the thing.
Similarly, if the progress is fake, the propaganda signal-boosting that progress is also fake. If it isn’t, it isn’t. (At least directionally; a lot of that propaganda is still probably exaggerated.)
If the above three are all fake, all the people who feel real scared and want to be validated are stuck in a toxic emotional dead-end where they constantly freak out over fake things to no end. But if they’re responding to legitimate, persistent worldview updates, having a space to vibe them out with like-minded others seems important.
So, in deciding whether or not to endorse this narrative, we’d like to know whether or not the models really ARE stagnating. What makes you think the appearance of progress here is illusory?
This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another [...]
Nope!
Even if the base models are improving, it can still be true that most of the progress measured on the benchmarks is fake, and has basically-nothing to do with the real improvements.
Even if the base models are improving, it can still be true that the dramatic sounding papers and toy models are fake, and have basically-nothing to do with the real improvements.
Even if the base models are improving, the propaganda about it can still be overblown and mostly fake, and have basically-nothing to do with the real improvements.
Even if the base models are improving, the people who feel real scared and just want to be validated can still be doing fake work and in fact be mostly useless, and their dynamic can still have basically-nothing to do with the real improvements.
Just because the base models are in fact improving does not mean that all this other stuff is actually coupled to the real improvement.
Sounds like you’re suggesting that real progress could be orthogonal to human-observed progress. I don’t see how this is possible. Human-observed progress is too broad.
The collective of benchmarks, dramatic papers and toy models, propaganda, and doomsayers are suggesting the models are simultaneously improving at: writing code, researching data online, generating coherent stories, persuading people of things, acting autonomously without human intervention, playing Pokemon, playing Minecraft, playing chess, aligning to human values, pretending to align to human values, providing detailed amphetamine recipes, refusing to provide said recipes, passing the Turing test, writing legal documents, offering medical advice, knowing what they don’t know, being emotionally compelling companions, correctly guessing the true authors of anonymous text, writing papers, remembering things, etc, etc.
They think all these improvements are happening at the same time in vastly different domains because they’re all downstream of the same task, which is text prediction. So, they’re lumped together in the general domain of ‘capabilities’, and call a model which can do all of them well a ‘general intelligence’. If the products are stagnating, sure, all those perceived improvements could be bullshit. (Big ‘if’!) But how could the models be ‘improving’ without improving at any of these things? What domains of ‘real improvement’ exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
What domains of ‘real improvement’ exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
correctly guessing the true authors of anonymous text
See, this is exactly the example I would have given: truesight is an obvious example of a domain of real improvement which appears on no benchmarks I am aware of, but which appears to correlate strongly with the pretraining loss. It is not applied anywhere (I hope), it is unobvious that LLMs might do it, and the capability does not naturally reveal itself in any standard use-cases (which is why people are shocked when it surfaces). It would have been easy for no one to have observed it, or to have dismissed it, up until now; and even now, after a lot of publicizing (including by yours truly), only a few weirdos know much about it.
Why can’t there be plenty of other things like inner-monologue or truesight? (“Wait, you could do X? Why didn’t you tell us?” “You never asked.”)
What domains of ‘real improvement’ exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
Maybe a better example would be to point out that ‘emergent’ tasks in general, particularly multi-step tasks, can have observed success rates of precisely 0 in feasible finite samples, but extreme brute-force sampling reveals hidden scaling. Humans would perceive zero improvement as the models scaled (0/100 = 0%, 0/100 = 0%, 0/100 = 0%...), even though they might be rapidly improving from 1/100,000 to 1/10,000 to 1/1,000 to… etc. “Sampling can show the presence of knowledge but not the absence.”
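A quick way to see the arithmetic behind this (my own illustrative numbers, matching the 1/100,000 → 1/1,000 example above):

```python
# Even as the true per-attempt success rate rises by two orders of magnitude,
# a feasible sample of 100 attempts will almost always show 0/100.

true_rates = [1e-5, 1e-4, 1e-3, 1e-2]
n = 100

for p in true_rates:
    prob_all_fail = (1 - p) ** n   # chance the observed score is exactly 0/100
    print(f"true rate {p:>7.5f}: P(observe 0/{n}) = {prob_all_fail:.3f}")

# Output: ~0.999, ~0.990, ~0.905, ~0.366 -- the first three are indistinguishable
# in practice, even though the underlying capability improved 100-fold.
```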
As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
Oops, yes. I was thinking “domains of real improvement which humans are currently perceiving in LLMs”, not “domains of real improvement which humans are capable of perceiving in general”. So a capability like inner-monologue or truesight, which nobody currently knows about, but is improving anyway, would certainly qualify. And the discovery of such a capability could be ‘real’ even if other discoveries are ‘fake’.
That said, neither truesight nor inner-monologue seem uncoupled from the more common domains of improvement, as measured in benchmarks and toy models and people-being-scared. The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance. Truesight is narrower, but at the very least we’d expect it to correlate with skill at the common “write [x] in the style of [y]” prompt, right? Surely the same network of associations which lets it accurately generate “Eliezer Yudkowsky wrote this” after a given set of tokens would also be useful for accurately finishing a sentence starting with “Eliezer Yudkowsky says...”.
So I still wouldn’t consider these things to have basically-nothing to do with commonly perceived domains of improvement.
The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance.
Inner-monologue is an example because as far as we know, it should have existed in pre-GPT-3 models and been constantly improving, but we wouldn’t have noticed because no one would have been prompting for it and if they had, they probably wouldn’t have noticed it. (The paper I linked might have demonstrated that by finding nontrivial performance in smaller models.) Only once it became fairly reliable in GPT-3 could hobbyists on 4chan stumble across it and be struck by the fact that, contrary to what all the experts said, GPT-3 could solve harder arithmetic or reasoning problems if you very carefully set it up just right as an elaborate multi-step process instead of what everyone did, which was just prompt it for the answer right away.
Saying it doesn’t count because once it was discovered it was such a large real improvement, is circular and defines away any example. (Did it not improve benchmarks once discovered? Then who cares about such an ‘uncoupled’ capability; it’s not a real improvement. Did it subsequently improve benchmarks once discovered? Then it’s not really an example because it’s ‘coupled’...) Surely the most interesting examples are ones which do exactly that!
And of course, now there is so much discussion, and so many examples, and it is in such widespread use, and has contaminated all LLMs being trained since, that they start to do it by default given the slightest pretext. The popularization eliminated the hiddenness. And here we are with ‘reasoning models’ which have blown through quite a few older forecasts and moved timelines earlier by years, to the extent that people are severely disappointed when a model like GPT-4.5 ‘only’ does as well as the scaling laws predicted and they start predicting the AI bubble is about to pop and scaling has been refuted.
would also be useful for accurately finishing a sentence starting with “Eliezer Yudkowsky says...”.
But that would be indistinguishable from many other sources of improvement. For starters, by giving a name, you are only testing one direction: ‘name → output’; truesight is about ‘name ← output’. The ‘reversal curse’ is an example of how such inference arrows are not necessarily bidirectional and do not necessarily scale much. (But if you didn’t know that, you would surely conclude the opposite.) There are many ways to improve performance of predicting output: better world-knowledge, abstract reasoning, use of context, access to tools or grounding like web search… No benchmark really distinguishes between these such that you could point to a single specific number and say, “that’s the truesight metric, and you can see it gets better with scale”.
Then there’s the AI regulation activists and lobbyists. [...] Even if they do manage to pass any regulations on AI, those will also be mostly fake
SB1047 was a pretty close shot to something really helpful. The AI Act and its code of practice might be insufficient, but there are good elements in it that, if applied, would reduce the risks. The problem is that it won’t be applied because of internal deployment.
But I sympathise somewhat with stuff like this:
They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People.
SB1047 was a pretty close shot to something really helpful.
No, it wasn’t. It was a pretty close shot to something which would have gotten a step closer to another thing, which itself would have gotten us a step closer to another thing, which might have been moderately helpful at best.
Sure, they are more-than-zero helpful. Heck, in a relative sense, they’d be one of the biggest wins in AI safety to date. But alas, reality does not grade on a curve.
One has to bear in mind that the words on that snapshot do not all accurately describe reality in the world where SB1047 passes. “Implement shutdown ability” would not in fact be operationalized in a way which would ensure the ability to shutdown an actually-dangerous AI, because nobody knows how to do that. “Implement reasonable safeguards to prevent societal-scale catastrophes” would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all, because the rules for the board responsible for overseeing these things made it pretty easy for the labs to capture.
When I discussed the bill with some others at the time, the main takeaway was that the actually-substantive part was just putting any bureaucracy in place at all to track which entities are training models over 10^26 FLOP/$100M. The bill seemed unlikely to do much of anything beyond that.
Even if the bill had been much more substantive, it would still run into the standard problems of AI regulation: we simply do not have a way to reliably tell which models are and are not dangerous, so the choice is to either ban a very large class of models altogether, or allow models which will predictably be dangerous sooner or later. The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most, but definitely not slow down timelines by a factor of 10 or more.
The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most
… or, if we do live in a world in which LLMs are not AGI-complete, it might accelerate timelines. After all, this would force the capabilities people to turn their brains on again instead of mindlessly scaling, and that might lead to them stumbling on something which is AGI-complete. And it would, due to a design constraint, need much less compute for committing omnicide.
How likely would that be? Companies/people able to pivot like this would need to be live players, capable of even conceiving of new ideas that aren’t “scale LLMs”. Naturally, that means 90% of the current AI industry would be out of the game. But then, 90% of the current AI industry aren’t really pushing the frontier today either; that wouldn’t be much of a loss.
To what extent are the three AGI labs alive vs. dead players, then?
OpenAI was certainly alive back in 2022. Maybe the coup and the exoduses killed it, and it’s now a corpse whose apparent movement is just inertial (the reasoning models were invented prior to the coup, if the Q* rumors are to be trusted, so that’s little evidence that OpenAI was still alive in 2024). But maybe not.
Anthropic houses a bunch of the best OpenAI researchers now, and it’s apparently capable of inventing some novel tricks (whatever’s the mystery behind Sonnet 3.5 and 3.6).
DeepMind is even now consistently outputting some interesting non-LLM research.
I think there’s a decent chance that they’re alive enough. Currently, they’re busy eating the best AI researchers and turning them into LLM researchers. If they stop focusing people’s attention on the potentially-doomed paradigm, if they’re forced to correct the mistake (on this model) that they’re making...
This has always been my worry about all the proposals to upper-bound FLOPs, complicated by my uncertainty regarding whether LLMs are or are not AGI-complete after all.
One major positive effect this might have is memetic. It might create the impression of an (artificially created) AI Winter, causing people to reflexively give up. In addition, not having an (apparent) in-paradigm roadmap to AGI would likely dissolve the race dynamics, both between AGI companies and between geopolitical entities. If you can’t produce straight-line graphs suggesting godhood by 2027, and are reduced to “well we probably need a transformer-sized insight here...”, it becomes much harder to generate hype and alarm that would be legible to investors and politicians.
But then, in worlds in which LLMs are not AGI-complete, how much actual progress to AGI is happening due to the race dynamic? Is it more or less progress than would be produced by a much-downsized field in the counterfactual in which LLM research is banned? How much downsizing would it actually cause, now that the ideas of AGI and the Singularity have gone mainstream-ish? Comparatively, how much downsizing would be caused by the chilling effect if the presumably doomed LLM paradigm is let to run its course of disappointing everyone by 2030 (when the AGI labs can scale no longer)?
On balance, upper-bounding FLOPs is probably still a positive thing to do. But I’m not really sure.
I disagree that the default would’ve been that the board would’ve been “easy for the labs to capture” (indeed, among the most prominent and plausible criticisms of its structure was that it would overregulate in response to political pressure), and thus that it wouldn’t have changed deployment practices. I think the frontier companies were in a good position to evaluate this, and they decided to oppose the bill (and/or support it conditional on sweeping changes, including the removal of the Frontier Model Division).
Also, I’m confused when policy skeptics say things like “sure, it might slow down timelines by a factor of 2-3, big deal.” Having 2-3x as much time is indeed a big deal!
I’m glad we agree “they’d be one of the biggest wins in AI safety to date.”
“Implement shutdown ability” would not in fact be operationalized in a way which would ensure the ability to shutdown an actually-dangerous AI, because nobody knows how to do that
How so? It’s pretty straightforward if the model is still contained in the lab.
“Implement reasonable safeguards to prevent societal-scale catastrophes” would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all
I think ticking boxes is good. This is how we went to the Moon, and it’s much better to do this than to not do it. It’s not trivial to tick all the boxes. Look at the number of boxes you need to tick if you want to follow the Code of Practice of the AI Act or this paper from DeepMind.
we simply do not have a way to reliably tell which models are and are not dangerous
How so? I think capabilities evaluations are much simpler than alignment evals, and at the very least we can run those. You might say: “A model might sandbag.” Sure, but you can fine-tune it and see if the capabilities are recovered. If even with some fine-tuning the model is not able to do the tasks at all (modulo the problem of gradient hacking, which I think is very unlikely), we can be pretty sure that the model wouldn’t be capable of such a feat. I think that, at the very least, following the same methodology as the one followed by Anthropic in their latest system cards is pretty good and would be very helpful.
The EU AI Act even mentions “alignment with human intent” explicitly, as a key concern for systemic risks. This is in Recital 110 (which defines what are systemic risks and how they may affect society).
I do not think any law has mentioned alignment like this before, so it’s massive already.
Will a lot of the implementation efforts feel “fake”? Oh, 100%. But I’d say that this is why we (this community) should not disengage from it...
I also get that the regulatory landscape in the US is another world entirely (which is what the OP is bringing up).
Your very first point is, to be a little uncharitable, ‘maybe OpenAI’s whole product org is fake.’ I know you have a disclaimer here, but you’re talking about a product category that didn’t exist 30 months ago, whose flagship website is now reportedly used by 10% of people in the entire world, and which reportedly expects ~$12B in revenue this year.
If your vibes are towards investing in that class of thing being fake or ‘mostly a hype machine’ then your vibes are simply not calibrated well in this domain.
No, the model here is entirely consistent with OpenAI putting out some actual cool products. Those products (under the model) just aren’t on a path to AGI, and OpenAI’s valuation is very much reliant on being on a path to AGI in the not-too-distant future. It’s the narrative about building AGI which is fake.
OpenAI’s valuation is very much reliant on being on a path to AGI in the not-too-distant future.
Really? I’m mostly ignorant on such matters, but I’d thought that their valuation seemed comically low compared to what I’d expect if their investors thought that OpenAI was likely to create anything close to a general superhuman AI system in the near future.[1] I considered this evidence that they think all the AGI/ASI talk is just marketing.
Well ok, if they actually thought OpenAI would create superintelligence as I think of it, their valuation would plummet because giving people money to kill you with is dumb. But there’s this space in between total obliviousness and alarm, occupied by a few actually earnest AI optimists. And, it seems to me, not occupied by the big OpenAI investors.
Consider, in support: Netflix has a $418B market cap. It is inconsistent to think that a $300B valuation for OpenAI or whatever’s in the news requires replacing tens of trillions of dollars of capital before the end of the decade.
Similarly, for people wanting to argue from the other direction, who might think a low current valuation is case-closed evidence against their success chances, consider that just a year ago the same argument would have discredited how they are valued today, and a year before that would have discredited where they were a year ago, and so forth. This holds similarly for historic busts in other companies. Investor sentiment is informational but clearly isn’t definitive, else stocks would never change rapidly.
Similarly, for people wanting to argue from the other direction, who might think a low current valuation is case-closed evidence against their success chances
To be clear: I think the investors would be wrong to think that AGI/ASI soon-ish isn’t pretty likely.
But most of your criticisms in the point you gave have ~no bearing on that? If you want to make a point about how effectively OpenAI’s research moves towards AGI you should be saying things relevant to that, not giving general malaise about their business model.
Or, I might understand ‘their business model is fake which implies a lack of competence about them broadly,’ but then I go back to the whole ‘10% of people in the entire world’ and ‘expects 12B revenue’ thing.
The point of listing the problems with their business model is that they need the AGI narrative in order to fuel the investor cash, without which they will go broke at current spend rates. They have cool products, they could probably make a profit if they switched to optimizing for that (which would mean more expensive products and probably a lot of cuts), but not anywhere near the level of profits they’d need to justify the valuation.
That’s how I interpreted it originally; you were arguing their product org vibed fake, I was arguing your vibes were miscalibrated. I’m not sure what to say to this that I didn’t say originally.
Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People.
The activists and the lobbyists are two very different groups. The activists are not trying to network with the DC people (yet). Unless you mean Encode, who I would call lobbyists, not activists.
Good point, I should have made those two separate bullet points:
Then there’s the AI regulation lobbyists. They lobby and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
Also, there’s the AI regulation activists, who e.g. organize protests. Like ~98% of protests in general, such activity is mostly performative and not the sort of thing anyone would end up doing if they were seriously reasoning through how best to spend their time in order to achieve policy goals. Calling it “fake” feels almost redundant. Insofar as these protests have any impact, it’s via creating an excuse for friendly journalists to write stories about the dangers of AI (itself an activity which mostly feeds the narrative, and has dubious real impact).
(As with the top level, epistemic status: I don’t fully endorse all this, but I think it’s a pretty major mistake to not at least have a model like this sandboxed in one’s head and check it regularly.)
Oh, if you’re in the business of compiling a comprehensive taxonomy of ways the current AI thing may be fake, you should also add:
Vibe coders and “10x’d engineers”, who (on this model) would be falling into one of the failure modes outlined here: producing applications/features that didn’t need to exist, creating pointless code bloat (which helpfully shows up in productivity metrics like “volume of code produced” or “number of commits”), or “automatically generating” entire codebases in a way that feels magical, then spending so much time bugfixing them that it eats up ~all perceived productivity gains.
e/acc and other Twitter AI fans, who act like they’re bleeding-edge transhumanist visionaries/analysts/business gurus/startup founders, but who are just shitposters/attention-seekers who will wander off and never look back the moment the hype dies down.
I share some similar frustrations, and unfortunately these are also prevalent in other parts of human society. The common thread in most of this fakeness seems to be impure intentions—there are impure/non-intrinsic motivations other than producing the best science/making true progress. Some of these motivations unfortunately stem from survival/monetary pressure, and resolving that for true research or progress seems critical. We need to encourage a culture of pure motivations, and also equip ourselves with more ability/tools to distinguish extrinsic motivations.
On o3: for what feels like the twentieth time this year, I see people freaking out, saying AGI is upon us, it’s the end of knowledge work, timelines now clearly in single-digit years, etc, etc. I basically don’t buy it, my low-confidence median guess is that o3 is massively overhyped. Major reasons:
I’ve personally done 5 problems from GPQA in different fields and got 4 of them correct (allowing internet access, which was the intent behind that benchmark). I’ve also seen one or two problems from the software engineering benchmark. In both cases, when I look at the actual problems in the benchmark, they are easy, despite people constantly calling them hard and saying that they require expert-level knowledge.
For GPQA, my median guess is that the PhDs they tested on were mostly pretty stupid. Probably a bunch of them were e.g. bio PhD students at NYU who would just reflexively give up if faced with even a relatively simple stat mech question which can be solved with a couple minutes of googling jargon and blindly plugging two numbers into an equation.
For software engineering, the problems are generated from real git pull requests IIUC, and it turns out that lots of those are things like e.g. “just remove this if-block”.
Generalizing the lesson here: the supposedly-hard benchmarks for which I have seen a few problems (e.g. GPQA, software eng) turn out to be mostly quite easy, so my prior on other supposedly-hard benchmarks which I haven’t checked (e.g. FrontierMath) is that they’re also mostly much easier than they’re hyped up to be.
On my current model of Sam Altman, he’s currently very desperate to make it look like there’s no impending AI winter, capabilities are still progressing rapidly, etc. Whether or not it’s intentional on Sam Altman’s part, OpenAI acts accordingly, releasing lots of very over-hyped demos. So, I discount anything hyped out of OpenAI, and doubly so for products which aren’t released publicly (yet).
Over and over again in the past year or so, people have said that some new model is a total game changer for math/coding, and then David will hand it one of the actual math or coding problems we’re working on and it will spit out complete trash. And not like “we underspecified the problem” trash, or “subtle corner case” trash. I mean like “midway through the proof it redefined this variable as a totally different thing and then carried on as though both definitions applied”. The most recent model with which this happened was o1.
Of course I am also tracking the possibility that this is a skill issue on our part, and if that’s the case I would certainly love for someone to help us do better. See this thread for a couple examples of relevant coding tasks.
My median-but-low-confidence guess here is that basically-all the people who find current LLMs to be a massive productivity boost for coding are coding things which are either simple, or complex only in standardized ways—e.g. most web or mobile apps. That’s the sort of coding which mostly involves piping things between different APIs and applying standard patterns, which is where LLMs shine.
@johnswentworth Do you agree with me that modern LLMs probably outperform (you with internet access and 30 minutes) on GPQA diamond? I personally think this somewhat contradicts the narrative of your comment if so.
I at least attempted to be filtering the problems I gave you for GPQA diamond, although I am not very confident that I succeeded.
(Update: yes, the problems John did were GPQA diamond. I gave 5 problems to a group of 8 people, and gave them two hours to complete however many they thought they could complete without getting any wrong)
@Buck Apparently the five problems I tried were GPQA diamond, they did not take anywhere near 30 minutes on average (more like 10 IIRC?), and I got 4/5 correct. So no, I do not think that modern LLMs probably outperform (me with internet access and 30 minutes).
Ok, so sounds like given 15-25 mins per problem (and maybe with 10 mins per problem), you get 80% correct. This is worse than o3, which scores 87.7%. Maybe you’d do better on a larger sample: perhaps you got unlucky (extremely plausible given the small sample size) or the extra bit of time would help (though it sounds like you tried to use more time here and that didn’t help). Fwiw, my guess from the topics of those questions is that you actually got easier questions than average from that set.
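For what it’s worth, the “extremely plausible given the small sample size” point is easy to quantify; a rough binomial sanity check (my own arithmetic, with illustrative accuracy levels):

```python
# With only 5 questions, a 4/5 score barely distinguishes a solver at o3's
# reported 87.7% level from one at, say, 60%.

from math import comb

def prob_at_least(k, n, p):
    # Probability of getting at least k of n questions right with per-question accuracy p.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for p in (0.877, 0.80, 0.60):
    print(f"P(score >= 4/5 | true accuracy {p:.1%}) = {prob_at_least(4, 5, p):.2f}")

# Roughly 0.88, 0.74, and 0.34 respectively -- a single 4/5 result says little either way.
```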
I continue to think these LLMs will probably outperform (you with 30 mins). Unfortunately, the measurement is quite expensive, so I’m sympathetic to you not wanting to get to ground here. If you believe that you can beat them given just 5-10 minutes, that would be easier to measure. I’m very happy to bet here.
I think that even if it turns out you’re a bit better than LLMs at this task, we should note that it’s pretty impressive that they’re competitive with you given 30 minutes!
So I still think your original post is pretty misleading [ETA: with respect to how it claims GPQA is really easy].
I think the models would beat you by more at FrontierMath.
I think that how you talk about the questions being “easy”, and the associated stuff about how you think the baseline human measurements are weak, is somewhat inconsistent with you being worse than the model.
I mean, there are lots of easy benchmarks on which I can solve the large majority of the problems, and a language model can also solve the large majority of the problems, and the language model can often have a somewhat lower error rate than me if it’s been optimized for that. Seems like GPQA (and GPQA diamond) are yet another example of such a benchmark.
(my guess is you took more like 15-25 minutes per question? Hard to tell from my notes, you may have finished early but I don’t recall it being crazy early)
I remember finishing early, and then spending a lot of time going back over all of them a second time, because the goal of the workshop was to answer correctly with very high confidence. I don’t think I updated any answers as a result of the second pass, though I don’t remember very well.
(This seems like more time than Buck was taking – the goal was to not get any wrong so it wasn’t like people were trying to crank through them in 7 minutes)
The problems I gave were (as listed in the csv for the diamond problems)
@johnswentworth FWIW, GPQA Diamond seems much harder than GPQA main to me, and current models perform well on it. I suspect these models beat your performance on GPQA diamond if you’re allowed 30 mins per problem. I wouldn’t be shocked if you beat them (maybe I’m like 20%?), but that’s because you’re unusually broadly knowledgeable about science, not just because you’re smart.
I personally get wrecked by GPQA chemistry, get ~50% on GPQA biology if I have like 7 minutes per problem (which is notably better than their experts from other fields get, with much less time), and get like ~80% on GPQA physics with less than 5 minutes per problem. But GPQA Diamond seems much harder.
Generalizing the lesson here: the supposedly-hard benchmarks for which I have seen a few problems (e.g. GPQA, software eng) turn out to be mostly quite easy, so my prior on other supposedly-hard benchmarks which I haven’t checked (e.g. FrontierMath) is that they’re also mostly much easier than they’re hyped up to be
Daniel Litt’s account here supports this prejudice. As a math professor, he knew instantly how to solve the low/medium-level problems he looked at, and he suggests that each “high”-rated problem would be likewise instantly solvable by an expert in that problem’s subfield.
And since LLMs have eaten ~all of the internet, they essentially have the crystallized-intelligence skills for all (sub)fields of mathematics (and human knowledge in general). So from their perspective, all of those problems are very “shallow”. No human shares their breadth of knowledge, so math professors specialized even in slightly different subfields would indeed have to do a lot of genuine “deep” cognitive work; this is not the case for LLMs.
GPQA stuff is even worse: it’s essentially an advanced trivia quiz that seems moderately resistant to humans literally googling things, but not to the way the knowledge gets distilled into LLMs.
Basically, I don’t think any extant benchmark (except I guess the Millennium Prize Eval) actually tests “deep” problem-solving skills, in a way LLMs can’t cheat at using their overwhelming knowledge breadth.
My current strong-opinion-weakly-held is that they’re essentially just extensive knowledge databases with a nifty natural-language interface on top.[1] All of the amazing things they do should be considered surprising facts about how far this trick can scale; not surprising facts about how close we are to AGI.
Which is to say: this is the central way to characterize what they are; not merely “isomorphic to a knowledge database with a natural-language search engine on top if you think about them in a really convoluted way”. Obviously a human can also be considered isomorphic to database search if you think about it in a really convoluted way, but that wouldn’t be the most-accurate way to describe a human.
[...] he suggests that each “high”-rated problem would be likewise instantly solvable by an expert in that problem’s subfield.
This is an exaggeration and, as stated, false.
Epoch AI made 5 problems from the benchmark public. One of those was ranked “High”, and that problem was authored by me.
It took me 20-30 hours to create that submission. (To be clear, I considered variations of the problem, ran into some dead ends, spent a lot of time carefully checking my answer was right, wrote up my solution, thought about guess-proof-ness[1] etc., which ate up a lot of time.)
I would call myself an “expert in that problem’s subfield” (e.g. I have authored multiple related papers).
I think you’d be very hard-pressed to find any human who could deliver the correct answer to you within 2 hours of seeing the problem.
E.g. I think it’s highly likely that I couldn’t have done that (I think it’d have taken me more like 5 hours), I’d be surprised if my colleagues in the relevant subfield could do that, and I think the problem is specialized enough that few of the top people in CodeForces or Project Euler could do it.
On the other hand, I don’t think the problem is very hard insight-wise—I think it’s pretty routine, but requires care with details and implementation. There are certainly experts who can see the right main ideas quickly (including me). So there’s something to the point of even FrontierMath problems being surprisingly “shallow”. And as is pointed out in the FM paper, the benchmark is limited to relatively short-scale problems (hours to days for experts) - which really is shallow, as far as the field of mathematics is concerned.
But it’s still an exaggeration to talk about “instantly solvable”. Of course, there’s no escaping Engel’s maxim “A problem changes from impossible to trivial if a related problem was solved in training”—I guess the problem is instantly solvable to me now… but if you are hard-pressed to find humans who could solve it “instantly” when seeing it for the first time, then I wouldn’t describe it in those terms.
Also, there are problems in the benchmark that require more insight than this one.
Daniel Litt writes about the problem: “This one (rated “high”) is a bit trickier but with no thinking at all (just explaining what computation I needed GPT-4o to do) I got the first 3 digits of the answer right (the answer requires six digits, and the in-window python timed out before it could get this far)
Of course *proving* the answer to this one is correct is harder! But I do wonder how many of these problems are accessible to simulation/heuristics. Still an immensely useful tool but IMO people should take a step back before claiming mathematicians will soon be replaced”.
I very much considered naive simulations and heuristics. The problem is getting 6 digits right, not 3. (The AIs are given a limited compute budget.) This is not valid evidence in favor of the problem’s easiness or for the benchmark’s accessibility to simulation/heuristics—indeed, this is evidence in the opposing direction.
See also Evan Chen’s “I saw the organizers were pretty ruthless about rejecting problems for which they felt it was possible to guess the answer with engineer’s induction.”
And fair enough, I used excessively sloppy language. By “instantly solvable”, I did in fact mean “an expert would very quickly (“instantly”) see the correct high-level approach to solving it, with the remaining work being potentially fiddly, but conceptually straightforward”. “Instantly solvable” in the sense of “instantly know how to solve”/”instantly reducible to something that’s trivial to solve”.[1]
FWIW the “medium” and “low” problems I say I immediately knew how to do are very close to things I’ve thought about; the “high”-rated problem above is a bit further, and I suspect an expert closer to it would similarly “instantly” know the answer.
That said,
if you are hard-pressed to find humans that could solve it “instantly” when seeing it the first time, then I wouldn’t describe it in those terms
If there are no humans who can “solve it instantly” (in the above sense), then yes, I wouldn’t call it “shallow”. But if such people do exist (even if they’re incredibly rare), this implies that the conceptual machinery (in the form of theorems or ansatzes) for translating the problem into a trivial one already exists as well. Which, in turn, means it’s likely present in the LLM’s training data. And therefore, from the LLM’s perspective, that problem is trivial to translate into a conceptually trivial problem.
It seems you’d largely agree with that characterization?
Note that I’m not arguing that LLMs are useless or unimpressive in every sense. This is mainly an attempt to build a model of why LLMs seem to perform so well on apparently challenging benchmarks while reportedly falling flat on their faces on much simpler real-life problems.
Or, closer to the way I natively think of it: In the sense that there are people (or small teams of people) with crystallized-intelligence skillsets such that they would be able to solve this problem by plugging their crystallized-intelligence skills one into another, without engaging in prolonged fluid-intelligence problem-solving.
It seems you’d largely agree with that characterization?
Yes. My only hesitation is about how important it is, in real life, for AIs to be able to do math for which very-little-to-no training data exists. The internet and the mathematical literature are so vast that, unless you are doing something truly novel, there’s some relevant subfield there—in which case FrontierMath-style benchmarks would be informative of capability to do real math research.
Also, re-reading Wentworth’s original comment, I note that o1 is weak according to FM. Maybe the things Wentworth is doing are just too hard for o1, rather than (just) overfitting-on-benchmarks style issues? In any case his frustration with o1’s math skills doesn’t mean that FM isn’t measuring real math research capability.
The internet and the mathematical literature are so vast that, unless you are doing something truly novel, there’s some relevant subfield there
Previously, I’d intuitively assumed the same as well: that it doesn’t matter if LLMs can’t “genuinely research/innovate”, because there is enough potential for innovative-yet-trivial combination of existing ideas that they’d still massively speed up R&D by finding those combinations. (“Innovation overhang”, as @Nathan Helm-Burger puts it here.)
Back in early 2023, I’d considered it fairly plausible that the world would start heating up in 1-2 years due to such synthetically-generated innovations.
Except this… just doesn’t seem to be happening? I’ve yet to hear of a single useful scientific paper or other meaningful innovation that was spearheaded by an LLM.[1] And they’re already adept at comprehending such innovative-yet-trivial combinations if a human prompts them with those combinations. So it’s not a matter of not yet being able to understand or appreciate the importance of such synergies. (If Sonnet 3.5.1 or o1 pro didn’t do it, I doubt o3 would.)
Yet this is still not happening. My guess is that “innovative-yet-trivial combinations of existing ideas” are not actually “trivial”, and LLMs can’t do that for the same reasons they can’t do “genuine research” (whatever those reasons are).
Admittedly it’s possible that this is totally happening all over the place and people are just covering it up in order to have all of the glory/status for themselves. But I doubt it: there are enough remarkably selfless LLM enthusiasts that if this were happening, I’d expect it would’ve gone viral already.
It’s only now that LLMs are reasonably competent in at least some hard problems, and at any rate, I expect RL to basically solve the domain, because of verifiability properties combined with quite a bit of training data.
We should wait a few years, as we have another scale-up that’s coming up, and it will probably be quite a jump from current AI due to more compute:
It’s only now that LLMs are reasonably competent in at least some hard problems
I don’t think that’s the limiter here. Reports in the style of “my unpublished PhD thesis was about doing X using Y methodology, I asked an LLM to do that and it one-shot a year of my work! the equations it derived are correct!” have been around for quite a while. I recall it at least in relation to Claude 3, and more recently, o1-preview.
If LLMs are prompted to combine two ideas, they’ve been perfectly capable of “innovating” for ages now, including at fairly high levels of expertise. I’m sure there’s some sort of cross-disciplinary GPQA-like benchmark that they’ve saturated a while ago, so this is even legible.
The trick is picking which ideas to combine/in what direction to dig. This doesn’t appear to be something LLMs are capable of doing well on their own, nor do they seem to speed up human performance on this task. (All cases of them succeeding at it so far have been, by definition, “searching under the streetlight”: checking whether they can appreciate a new idea that a human already found on their own and evaluated as useful.)
I suppose it’s possible that o3 or its successors change that (the previous benchmarks weren’t measuring that, but surely FrontierMath does...). We’ll see.
I expect RL to basically solve the domain
Mm, I think it’s still up in the air whether even the o-series efficiently scales (as in, without requiring a Dyson Swarm’s worth of compute) to beating the Millennium Prize Eval (or some less legendary yet still major problems).
I expect such problems don’t pass the “can this problem be solved by plugging the extant crystallized-intelligence skills of a number of people into each other in a non-contrived[1] way?” test. Does RL training allow the model to sidestep this, letting it generate new crystallized-intelligence skills?
I’m not confident one way or another.
we have another scale-up that’s coming up
I’m bearish on that. I expect GPT-4 to GPT-5 to be palpably less of a jump than GPT-3 to GPT-4, the same way GPT-3 to GPT-4 was less of a jump than GPT-2 to GPT-3. I’m sure it’d show lower loss, and saturate some more benchmarks, and perhaps an o-series model based on it clears FrontierMath, and perhaps programmers and mathematicians would be able to use it in an ever-so-bigger number of cases...
But I predict, with low-moderate confidence, that it still won’t kick off a deluge of synthetically derived innovations. It’d have even more breadth and eye for nuance, but somehow, perplexingly, still no ability to use those capabilities autonomously.
“Non-contrived” because technically, any cognitive skill is just a combination of e. g. NAND gates, since those are Turing-complete. But obviously that doesn’t mean any such skill is accessible if you’ve learned the NAND gate. Intuitively, a combination of crystallized-intelligence skills is only accessible if the idea of combining them is itself a crystallized-intelligence skill (e. g., in the math case, a known ansatz).
Which perhaps sheds some light on why LLMs can’t innovate even via trivial idea combinations. If a given idea-combination “template” isn’t present in the training data, the LLM can’t reliably conceive of it independently except by brute-force enumeration...? This doesn’t seem quite right, but it’s maybe in the right direction.
I think my key crux is that in domains where there is a way to verify that the solution actually works, RL can scale to superhuman performance. Mathematics and programming are domains that are unusually easy to verify and gather training data for, so with caveats RL can become rather good at those specific domains/benchmarks (like Millennium Prize evals). But the important caveat is that I don’t believe this transfers very well to domains where verifying isn’t easy, like creative writing.
I’m bearish on that. I expect GPT-4 to GPT-5 to be palpably less of a jump than GPT-3 to GPT-4, the same way GPT-3 to GPT-4 was less of a jump than GPT-2 to GPT-3. I’m sure it’d show lower loss, and saturate some more benchmarks, and perhaps an o-series model based on it clears FrontierMath, and perhaps programmers and mathematicians would be able to use it in an ever-so-bigger number of cases...
I was talking about the 1 GW systems that would be developed in late 2026-early 2027, not GPT-5.
in domains where there is a way to verify that the solution actually works, RL can scale to superhuman performance
Sure, the theory on that is solid. But how efficiently does it scale off-distribution, in practice?
The inference-time scaling laws, much like the pretraining scaling laws, are ultimately based on test sets whose entries are “shallow” (in the previously discussed sense). They don’t tell us much regarding how well the technique scales with the “conceptual depth” of a problem.
o3 took a million dollars in inference-time compute and unknown amounts in training-time compute just to solve the “easy” part of the FrontierMath benchmark (which likely take human experts single-digit hours, maybe <1 hour for particularly skilled humans). How much would be needed for beating the “hard” subset of FrontierMath? How much more still would be needed for problems that take individual researchers days; or problems that take entire math departments months; or problems that take entire fields decades?
It’s possible that the “synthetic data flywheel” works so well that the amount of human-researcher-hour-equivalents per unit of compute scales, say, exponentially with some aspect of o-series’ training, and so o6 in 2027 solves the Riemann Hypothesis.
Or it scales not that well, and o6 can barely clear real-life equivalents of hard FrontierMath problems. Perhaps instead the training costs (generating all the CoT trees on which RL training is then done) scale exponentially, while researcher-hour-equivalents per compute units scale linearly.
It doesn’t seem to me that we know which one it is yet. Do we?
I think a different phenomenon is occurring. My guess, updating on my own experience, is that ideas aren’t the current bottleneck. 1% inspiration, 99% perspiration.
As someone who has been reading 3-20 papers per month for many years now, in neuroscience and machine learning, I feel overwhelmed with ideas. I average about 0.75 per paper. I write them down, and the lists grow about two orders of magnitude faster than they shrink.
When I was on my favorite industry team, what I most valued about my technical manager was his ability to help me sort through and prioritize them. It was like I created a bunch of LEGO pieces, he picked one to be next, I put it in place by coding it up, and he checked the placement by reviewing my PR. If someone had offered me a source of ideas ranging in quality between worse than my worst ideas and almost as good as my best ideas, and skewed towards bad… I’d have laughed and turned them down without a second thought.
For something like a paper, instead of a minor tech idea for a one-week PR… the situation is far more intense. The grunt work of running the experiments and preparing the paper is enormous compared to the time and effort of coming up with the idea in the first place. More like 0.1% to 99.9%.
Current LLMs can speed up creating a paper if given the results and experiment description to write about. That’s probably also not the primary bottleneck (although still more than idea generation).
So the current bottleneck for ML experiments, in my estimation, is the experiments themselves: coding them up accurately and efficiently, running them (and handling the compute costs), and analyzing the results.
So I’ve been expecting to see an acceleration dependent on that aspect. That’s hard to measure though. Are LLMs currently speeding this work up a little? Probably. I’ve had my work sped up some by the recent Sonnet 3.5.1. Currently though it’s a trade-off, there’s overhead in checking for misinterpretations and correcting bugs. We still seem a long way in “capability space” from me being able to give a background paper and rough experiment description, and then having the model do the rest. Only once that’s the case will idea generation become my bottleneck.
That’s the opposite of my experience. Nearly all the papers I read vary between “trash, I got nothing useful out besides an idea for a post explaining the relevant failure modes” and “high quality but not relevant to anything important”. Setting up our experiments is historically much faster than the work of figuring out what experiments would actually be useful.
There are exceptions to this, large projects which seem useful and would require lots of experimental work, but they’re usually much lower-expected-value-per-unit-time than going back to the whiteboard, understanding things better, and doing a simpler experiment once we know what to test.
Ah, well, for most papers that spark an idea in me, the idea isn’t simply an extension of the paper. It’s a question tangentially related which probes at my own frontier of understanding.
I’ve always found that a boring lecture is a great opportunity to brainstorm because my mind squirms away from the boredom into invention and extrapolation of related ideas. A boring paper does some of the same for me, except that I’m less socially pressured to keep reading it, and thus less able to squeeze my mind with the boredom of it.
As for coming up with ideas… It is a weakness of mine that I am far better at generating ideas than at critiquing them (my own or others’). Which is why I worked so well in a team where I had someone I trusted to sort through my ideas and pick out the valuable ones. It sounds to me like you have a better filter on idea quality.
That’s mostly my experience as well: experiments are near-trivial to set up, and setting up any experiment that isn’t near-trivial to set up is a poor use of the time that can instead be spent thinking on the topic a bit more and realizing what the experimental outcome would be or why this would be entirely the wrong experiment to run.
But the friction costs of setting up an experiment aren’t zero. If it were possible to sort of ramble an idea at an AI and then have it competently execute the corresponding experiment (or set up a toy formal model and prove things about it), I think this would be able to speed up even deeply confused/non-paradigmatic research.
… That said, I think the sorts of experiments we do aren’t the sorts of experiments ML researchers do. I expect they’re often things like “do a pass over this lattice of hyperparameters and output the values that produce the best loss” (and more abstract equivalents of this that can’t be as easily automated using mundane code). And which, due to the atheoretic nature of ML, can’t be “solved in the abstract”.
So ML research perhaps could be dramatically sped up by menial-software-labor AIs. (Though I think even now the compute needed for running all of those experiments would be the more pressing bottleneck.)
[...] of the amazing things they do should be considered surprising facts about how far this trick can scale; not surprising facts about how close we are to AGI.
I agree that the trick scaling as far as it has is surprising, but I’d disagree with the claim that this doesn’t bear on AGI.
I do think that something like dumb scaling can mostly just work, and I think the main takeaway I take from AI progress is that there will not be a clear resolution to when AGI happens, as the first AIs to automate AI research will have very different skill profiles from humans, and most importantly we need to disentangle capabilities in a way we usually don’t for humans.
I agree with faul sname here:
“we should stop asking when we will get AGI and start asking about when we will see each of the phenomena that we are using AGI as a proxy for”.
I do think that something like dumb scaling can mostly just work
The exact degree of “mostly” is load-bearing here. You’d mentioned provisions for error-correction before. But are the necessary provisions something simple, such that the most blatantly obvious wrappers/prompt-engineering works, or do we need to derive some additional nontrivial theoretical insights to correctly implement them?
Last I checked, AutoGPT-like stuff has mostly failed, so I’m inclined to think it’s closer to the latter.
I am unconvinced that “the” reliability issue is a single issue that will be solved by a single insight, rather than AIs lacking procedural knowledge of how to handle a bunch of finicky special cases that will be solved by online learning or very long context windows once hardware costs decrease enough to make one of those approaches financially viable.
Yeah, I’m sympathetic to this argument that there won’t be a single insight, and that at least one approach will work out once hardware costs decrease enough, and I agree less with Thane Ruthenis’s intuitions here than I did before.
If I were to think about it a little, I’d suspect the big difference between LLMs and humans is state/memory: humans have it, but LLMs are currently more or less stateless, and RNN training has not been solved to the extent transformer training has.
One thing I will also say is that future AI winters will be shorter than previous ones, because AI products can now be made more or less profitable, and this gives an independent base of money for AI research in ways that weren’t possible pre-2016.
A factor stemming from the same cause but pushing in the opposite direction is that “mundane” AI profitability can “distract” people who would otherwise be AGI hawks.
I agree with you on your assessment of GPQA. The questions themselves appear to be low quality as well. Take this one example, although it’s not from GPQA Diamond:
In UV/Vis spectroscopy, a chromophore which absorbs red colour light, emits _____ colour light.
The correct answer is stated as yellow and blue. However, the question should read transmits, not emits; molecules cannot trivially absorb and re-emit light of a shorter wavelength without resorting to trickery (nonlinear effects, two-photon absorption).
This is, of course, a cherry-picked example, but it is exactly characteristic of the sort of low-quality science questions I saw in school (e.g. with a teacher or professor who didn’t understand the material very well). Scrolling through the rest of the GPQA questions, they did not seem like questions that would require deep reflection or thinking, but rather the sort of trivia things that I would expect LLMs to perform extremely well on.
I’d also expect “popular” benchmarks to be optimized for looking good while actually being relatively easy. OAI et al. probably have the mother of all publication biases with respect to benchmarks, and are selecting very heavily for items within this collection.
Re: LLMs for coding: One lens on this is that LLM progress changes the Build vs Buy calculus.
Low-power AI coding assistants were useful in both the “build” and “buy” scenarios, but they weren’t impactful enough to change the actual border between build-is-better vs. buy-is-better. More powerful AI coding systems/agents can make a lot of tasks sufficiently easy that dealing with some components starts feeling more like buying than building. Different problem domains have different peak levels of complexity/novelty, so the easier domains will start being affected more and earlier by this build/buy decision boundary shift. Many people don’t travel far from their primary domains, so to some of them it will look like the shift is happening quickly (because it is, in their vicinity) even though on the larger scale it’s still pretty gradual.
About a month ago, after some back-and-forth with several people about their experiences (including on lesswrong), I hypothesized that I don’t feel the emotions signalled by oxytocin, and never have. (I do feel some adjacent things, like empathy and a sense of responsibility for others, but I don’t get the feeling of loving connection which usually comes alongside those.)
Naturally I set out to test that hypothesis. This note is an in-progress overview of what I’ve found so far and how I’m thinking about it, written largely to collect my thoughts and to see if anyone catches something I’ve missed.
Under the hypothesis, this has been a life-long thing for me, so the obvious guess is that it’s genetic (the vast majority of other biological state turns over too often to last throughout life). I also don’t have a slew of mysterious life-long illnesses, so the obvious guess is that it’s pretty narrowly limited to oxytocin—i.e. most likely a genetic variant in either the oxytocin gene or receptor, maybe the regulatory machinery around those two, but that’s less likely as we get further away and the machinery becomes entangled with more other things.
So I got my genome sequenced, and went looking at the oxytocin gene and the oxytocin receptor gene.
The receptor was the first one I checked, and sure enough I have a single-nucleotide deletion 42 amino acids into the open reading frame (ORF) of the 389 amino acid protein. That will induce a frameshift error, completely fucking up the rest of the protein. (The oxytocin gene, on the other hand, was totally normal.)
So that sure is damn strong evidence in favor of the hypothesis! But, we have two copies of most genes, including the oxytocin receptor. The frameshift error is only on one copy. Why isn’t the other copy enough for almost-normal oxytocin signalling?
The frameshift error is the only thing I have which would obviously completely fuck up the whole protein, but there are also a couple nonsynonymous single nucleotide polymorphisms (SNPs) in the ORF, plus another couple upstream. So it’s plausible that one of the SNPs messes up the other copy pretty badly; in particular, one of them changes an arginine to a histidine at the edge of the second intracellular loop. (Oxytocin receptor is a pretty standard g-protein coupled receptor, so that’s the mental picture here.) I did drop the sequences into alphafold, and I don’t see any large structural variation from the SNPs, but (a) that histidine substitution would most likely change binding rather than structure in isolation, and (b) this is exactly the sort of case where I don’t trust alphafold much, because “this is one substitution away from a standard sequence, I’ll just output the structure of that standard sequence” is exactly the sort of heuristic I’d expect a net to over-rely upon.
It’s also possible-in-principle that the second receptor copy is fine, but the first copy frameshift alone is enough to mess up function. I think that’s unlikely in this case. The mRNA for the frameshifted version should be removed pretty quickly by nonsense-mediated decay (I did double check that it has a bunch of early stop codons, NMD should definitely trigger). So there should not be a bunch of junk protein floating around from the frameshifted gene. And the frameshift is early enough that the messed up proteins probably won’t e.g. form dimers with structurally-non-messed-up versions (even if oxytocin receptor normally dimerizes, which I’d guess it doesn’t but haven’t checked). At worst there should just be a 2x lower concentration of normal receptor than usual, and if there’s any stable feedback control on the receptor concentration then there’d be hardly any effect at all.
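(For readers without a genetics background, here’s a toy sketch of what a frameshift does. The sequence below is a made-up miniature ORF, not the actual OXTR sequence; the point is just that one deleted base scrambles every downstream codon, and in the real gene the shifted frame quickly hits premature stop codons, which is what lets nonsense-mediated decay kick in.)

```python
# Toy illustration with a made-up miniature ORF (not the real OXTR sequence):
# deleting one nucleotide shifts the reading frame, so every downstream codon
# is scrambled and the original in-frame stop codon is no longer read.

STOP_CODONS = {"TAA", "TAG", "TGA"}

def read_codons(seq: str) -> list[str]:
    """Read a sequence three bases at a time, stopping at the first stop codon."""
    codons = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons

normal = "ATGGCTGCTGCTGGTGCTGCTGGTTAA"  # made-up mini ORF: ATG start ... TAA stop
mutant = normal[:6] + normal[7:]         # delete one base early in the ORF

print(read_codons(normal))  # stays in frame and stops at the intended TAA
print(read_codons(mutant))  # everything after the deletion is read in the wrong frame
```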
Finally, there’s the alternative hypothesis that my oxytocin signalling is unusually weak but not entirely nonfunctional. I do now have pretty damn strong evidence for that at a bare minimum, assuming that feedback control on receptor density doesn’t basically counterbalance the fucked up receptor copy.
Anyway, that’s where I’m currently at. I’m curious to hear others’ thoughts on what mechanisms I might be missing here!
The receptor was the first one I checked, and sure enough I have a single-nucleotide deletion 42 amino acids into the open reading frame (ORF) of the 389 amino acid protein. That will induce a frameshift error, completely fucking up the rest of the protein.
I’m kind of astonished that this kind of advance prediction panned out!
I admit I was somewhat surprised as well. On a gut level, I did not think that the very first things to check would turn up such a clear and simple answer.
I’m insufficiently knowledgeable about deletion base rates to know how astonished to be. Does anyone have an estimate of how many Bayes bits such a prediction is worth?
FWIW, GPT-5T estimates around 10 bits, double that if it’s de novo (absent in both parents).
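(For concreteness, a rough sketch of what such an estimate is computing, with the key unknown left explicit: the evidence strength is the log likelihood ratio

$$\log_2 \frac{P(\text{loss-of-function OXTR variant} \mid \text{hypothesis})}{P(\text{loss-of-function OXTR variant} \mid \neg\text{hypothesis})} \approx \log_2 \frac{1}{p},$$

where the numerator is treated as close to 1 and $p$ is the population frequency of loss-of-function OXTR variants. A ~10-bit figure corresponds to assuming $p \approx 1/1000$, since $2^{10} \approx 1000$; the real answer hinges on that base rate, which I don’t know.)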
This might be a bad idea right now, if it makes John’s interests suddenly more normal in a mostly-unsteered way, e.g. because much of his motivation was coming from a feeling he didn’t know was oxytocin-deficiency-induced. I’d suggest only doing this if solving this problem is likely to increase productivity or networking success; else, I’d delay until he doesn’t seem like a critical bottleneck. That said, it might also be a very good idea, if depression or social interaction are a major bottleneck, which they are for many, many people. So this is not resolved advice, just a warning that this may be a high-variance intervention; and since John currently seems to be doing promising work, introducing high variance seems likely to have more downside.
I wouldn’t say this to most people; taking oxytocin isn’t known for being a hugely impactful intervention[citation needed], and on priors, someone who doesn’t have oxytocin signaling happening is missing a lot of normal emotion, and is likely much worse off. Obviously, John, it’s up to you whether this is a good tradeoff. I wouldn’t expect it to completely distort your values or delete your skills. Someone who knows you better, such as yourself, would be much better equipped to predict if there’s significant reason to believe downward variance isn’t present. If you have experience with reward-psychoactive chemicals and yet are currently productive, it’s more likely you already know whether it’s a bad idea.
Seems like that depends on details of the problem. If the receptor has zero function, then yes. If functionality is significantly reduced but nonzero… maybe.
Perhaps Gurkenglas meant this as a ~confirmatory test that John is actually oxytocin-insensitive, because the test results (IIUC) are compatible with only one gene copy being screwed up.
I ordered this one off of Amazon. AFAICT it does nothing for me. But that’s a pretty minor update, because even those who use it say the effects are “subtle”, and frankly I think snorting oxytocin is probably bullshit and does nothing beyond placebo even for normal people. I did have a couple other people try the one I bought, and their results indeed sounded like a nothingburger.
BTW, has anyone on LW tried oxytocin and is willing to report on the experience?
Not directly related to your query, but seems interesting:
The receptor was the first one I checked, and sure enough I have a single-nucleotide deletion 42 amino acids into the open reading frame (ORF) of the 389 amino acid protein. That will induce a frameshift error, completely fucking up the rest of the protein.
Which, in turn, is pretty solid evidence for “oxytocin mediates the emotion of loving connection/aching affection” (unless there are some mechanisms you’ve missed). I wouldn’t have guessed it’s that simple.
Generalizing, this suggests we can study links between specific brain chemicals/structures and cognitive features by looking for people missing the same universal experience, checking if their genomes deviate from the baseline in the same way, then modeling the effects of that deviation on the brain. Alternatively, the opposite: search for people whose brain chemistry should be genetically near-equivalent except for one specific change, then exhaustively check if there’s some blatant or subtle way their cognition differs from the baseline.
Doing a brief literature review via GPT-5, apparently this sort of thing is mostly done with regards to very “loud” conditions, rather than in an attempt to map out the brain in general. I could imagine that it won’t turn out that simple in practice, but the actual bottleneck is probably researchers with a good enough theory-of-mind to correctly figure out the subtle ways the subjects’ cognition differs (easy for “severe autism”, much harder for “I feel empathy and responsibility, but not loving connection”).
~Surely there are a lot of other things involved in mediating this aspect of human cognition, at the very least (/speaking very coarse-grainedly), having the entire oxytocin system adequately hooked up to the rest of everything.
I.e. it is damn strong evidence that oxytocin signalling is strictly necessary (and that there are no fallback mechanisms etc.), but not that it’s simple.
Did your mother think you were unusual as a baby? Did you bond with your parents as a young child? I’d expect there to be some symptoms there if you truly have an oxytocin abnormality.
For my family this is much more of a “wow that makes so much sense” than a “wow what a surprise”. It tracks extremely well with how I acted growing up, in a bunch of different little ways. Indeed, once the hypothesis was on my radar at all, it quickly seemed pretty probable on that basis alone, even before sequencing came back.
A few details/examples:
As a child, I had a very noticeable lack of interest in other people (especially those my own age), to the point where a school psychologist thought it was notable.
I remember being unusually eager to go off to overnight summer camp (without my parents), at an age where nobody bothered to provide overnight summer camp because kids that young were almost all too anxious to be away from their parents that long.
When family members or pets died, I’ve generally been noticeably less emotionally impacted than the rest of the family.
When out and about with the family, I’ve always tended to wander around relatively independently of the rest of the group.
Those examples are relatively easy to explain, but most of my bits here come from less legible things. It’s been very clear for a long time that I relate to other people unusually, in a way that intuitively matches being at the far low end of the oxytocin signalling axis.
Though beyond a certain level of development we have numerous other drives beyond the oxytocin-related ones. Hence why you-as-a-baby might be particularly telling. From what I understand, oxytocin is heavily involved in infant-caregiver bonding and is what enables mothers to soothe their babies so effectively (very much on my mind right now as I am typing this comment while a baby naps on me haha).
Whereas once you’re above a certain age, the rational mind and other traits probably have an increasingly strong effect. For example, if you’re very interested in your own thoughts and ideas, this might overwhelm your desire to be close to family members.
Anyway, it seems likely that your oxytocin hypothesis is correct either way. Cool finding!
I have a similar intuition about how some other people are missing a disgust response that I have. Seems like a biological thing that some people have much less of than others and it has a significant effect on how we relate to others.
Is that frame-shift error or those ~6 (?) SNPs previously reported in the literature for anything, or do they seem to be de novos? Also, what WGS depth did your service use? (Depending on how widely you cast your net, some of those could be spurious sequencing errors.)
Depth is tagged on each individual variation; the frame shift has depth 41, the others have depth anywhere from 40 to 60.
I have not found the frameshift mutation in dbSNP, but I’m not confident that I’ve understood the UI or intended usage patterns, so I’m not confident it’s not in there. The SNPs I haven’t looked for in there yet.
Really interesting post—this actually connects to some research I’ve been looking into recently around oxytocin and attachment patterns.
There’s this psychologist Adam Lane Smith who’s built on neurobiological work by researchers like Carolyn Zahn-Waxler and Ruth Feldman—they’ve found that under high stress conditions when younger, or absence of secure attachment figures, cortisol-induced stress actually strengthens cortisol and dopamine pathways for reward while inhibiting the oxytocin and serotonin pathways. The end result (avoidant attachment) sounds remarkably similar to what you’re describing: people who clearly care about others and feel responsibility, but don’t experience that warm “loving connection” feeling that most people seem to get from relationships.
What struck me about your situation is that you’ve essentially got the genetic version of what this research suggests can happen environmentally. Both paths seem to lead to the same place—having to navigate social connection through pattern recognition and cognitive analysis rather than emotional intuition, because your brain is essentially running on dopamine-driven systems instead of oxytocin-based ones.
Makes me wonder if there’s a whole spectrum of people out there—some genetic, some developmental—who are all essentially operating with similar neurochemical profiles but don’t realize they’re part of the same phenomenon. Your case might be the key to understanding how this actually works at a biological level.
Do you find you’ve gotten really good at reading people through behavioral patterns rather than gut feelings?
this is exactly the sort of case where I don’t trust alphafold much, because “this is one substitution away from a standard sequence, I’ll just output the structure of that standard sequence” is exactly the sort of heuristic I’d expect a net to over-rely upon.
Yep. AlphaMissense, also from DeepMind, is tailored to pathogenicity prediction. You can find its pathogenicity scores in the annotations tab for any (at least I think any) human protein on AFDB.
As a non-subject matter expert in all of the above, I decided to consult my swear-word-adverse relative that recently graduated genetic counseling school. Here is her response:
The logic is sound (if a little colorful haha 😅). It sounds like this guy functionally only has 1 copy of the OXTR gene, and spot on in hypothesis of nonsense-mediated decay.
How the OXTR gene is regulated, I don’t know and haven’t looked into. It would be weird (but possible) for a decrease in OXTR expression to only affect emotions—oxytocin is also important for other brain functions/development, so a genetic change should also impact embryological development of the brain. So if I were to suggest next steps, it would be doing functional studies of the brain (like an MRI) to further evaluate.
One other thing—labs typically filter reportable genome results by the phenotype you give them. I don’t know how this guy did the genome, but if he were to put something like “social deficits”, “emotional dysregulation” or something else about his lack of emotional range, the lab would definitely report the variant plus their research on it and recommendations.
Huh interesting. I might get myself full genome sequenced at some point. I already got myself 23andme sequenced, downloaded the raw data, and put it into promethease a while ago. I did find out I’m AG at rs53576 which is slightly linked to lower empathy, but is also extremely common. I don’t think this is enough to explain a large proportion of my personality, the way your OXTR deletion might be.
(There was something quite amusing about checking my SNPs to decide whether to start early anti-balding interventions, and having result number 1 be “Low Empathy”. As a further datapoint, I mentioned this to my mum and she basically said “Yeah but what did you expect with me and [dad] as parents?”)
Seeing this
A few details/examples:
As a child, I had a very noticeable lack of interest in other people (especially those my own age), to the point where a school psychologist thought it was notable.
I remember being unusually eager to go off to overnight summer camp (without my parents), at an age where nobody bothered to provide overnight summer camp because kids that young were almost all too anxious to be away from their parents that long.
When family members or pets died, I’ve generally been noticeably less emotionally impacted than the rest of the family.
When out and about with the family, I’ve always tended to wander around relatively independently of the rest of the group.
Made me think I should take a deeper look. This all sounds pretty familiar, and I don’t think the AG in rs53576 is strong enough to shift me off-distribution to the degree that I am.
If the one clearly fucked up receptor copy is sufficient for your “symptoms”, it seems pretty likely that one of your parents should have them too. I think there is no reason to expect a de novo mutation to be particularly likely in your case (unlike in cases that lead to severe dysfunction). And of course you can check for that by sequencing your parents.
So my money would be on the second copy also being sufficiently messed up that you have basically no fully functioning oxytocin receptors. If you have siblings and you are the only odd one in the family, you could make a pretty strong case for both copies being messed up, by showing that you are the only one with the combination of frameshift in one copy and particular SNPs in the other. (If you are not the only odd one you can make an even stronger case).
I did drop the sequences into alphafold, and I don’t see any large structural variation from the SNPs, but (a) that histidine substitution would most likely change binding rather than structure in isolation, and (b) this is exactly the sort of case where I don’t trust alphafold much, because “this is one substitution away from a standard sequence, I’ll just output the structure of that standard sequence” is exactly the sort of heuristic I’d expect a net to over-rely upon.
Even if the structure is correct and does look the same, the binding properties of the receptor could still be different if the histidine is in the part that’s relevant for the receptor binding.
The thing you want is a tool that tells you how the receptor’s binding properties change through the mutation, not AlphaFold, which just gives you the 3D structure. A quick question to GPT-5 suggests that there are freely available tools that tell you how the receptor binding properties change from a single point mutation.
I have read that some sequencing methods (nanopore) have a high error rate (comparing multiple reads can help correct this). Did you also spot-check some other genes that you have no reason to believe contain mutations to see if they look ok? Seeing a mutation in exactly the gene you expect is only damn strong evidence if there isn’t a sequencing error in every third gene.
I was a relatively late adopter of the smartphone. I was still using a flip phone until around 2015 or 2016 ish. From 2013 to early 2015, I worked as a data scientist at a startup whose product was a mobile social media app; my determination to avoid smartphones became somewhat of a joke there.
Even back then, developers talked about UI design for smartphones in terms of attention. Like, the core “advantages” of the smartphone were the “ability to present timely information” (i.e. interrupt/distract you) and always being on hand. Also it was small, so anything too complicated to fit in like three words and one icon was not going to fly.
… and, like, man, that sure did not make me want to buy a smartphone. Even today, I view my phone as a demon which will try to suck away my attention if I let my guard down. I have zero social media apps on there, and no app ever gets push notif permissions when not open except vanilla phone calls and SMS.
People would sometimes say something like “John, you should really get a smartphone, you’ll fall behind without one” and my gut response was roughly “No, I’m staying in place, and the rest of you are moving backwards”.
And in hindsight, boy howdy do I endorse that attitude! Past John’s gut was right on the money with that one.
I notice that I have an extremely similar gut feeling about LLMs today. Like, when I look at the people who are relatively early adopters, making relatively heavy use of LLMs… I do not feel like I’ll fall behind if I don’t leverage them more. I feel like the people using them a lot are mostly moving backwards, and I’m staying in place.
I found LLMs to be very useful for literature research. They can find relevant prior work that you can’t find with a search engine because you don’t know the right keywords. This can be a significant force multiplier.
They also seem potentially useful for quickly producing code for numerical tests of conjectures, but I only started experimenting with that.
Other use cases where I found LLMs beneficial:
Taking a photo of a menu in French (or providing a link to it) and asking it which dishes are vegan.
Recommending movies (I am a little wary of some kind of meme poisoning, but I don’t watch movies very often, so seems ok).
That said, I do agree that early adopters seem like they’re overeager and maybe even harming themselves in some way.
I’ve been trying to use Deep Research tools as a way to find hyper-specific fiction recommendations as well. The results have been mixed. They don’t seem to be very good at grokking the hyper-specificness of what you’re looking for, usually they have a heavy bias towards the popular stuff that outweighs what you actually requested[1], and if you ask them to look for obscure works, they tend to output garbage instead of hidden gems (because no taste).
It did produce good results a few times, though, and is only slightly worse than asking for recommendations on r/rational. Possibly if I iterate on the prompt a few times (e. g., explicitly point out the above issues?), it’ll actually become good.
Like, suppose I’m looking for some narrative property X. I want to find fiction with a lot of X. But what the LLM does is multiply the amount of X in a work by the work’s popularity, so works that are low in X but very popular end up in its selection.
I tend to have some luck with concrete analogies sometimes. For example, I asked for the equivalent of Tonedeff (his Polymer album is my favorite album) in other genres, and it recommended me Venetian Snares. I then listened to some of his songs and it seemed like the kind of experimental stuff where I might find something I find interesting. Venetian Snares has 80k monthly listeners while Tonedeff has 14k, so there might be some weighting towards popularity, but that seems mild.
I can think of reasons why some would be wary, and am wary of something which could be called “meme poisoning” myself when I watch movies, but am curious what kind of meme poisoning you have in mind here.
I’ve updated marginally towards this (as a guy pretty focused on LLM-augmentation, I anticipated LLM brain rot, but it still was more pernicious/fast than I expected).
I do still think some-manner-of-AI-integration is going to be an important part of “moving forward” but probably not whatever capitalism serves up.
I have tried out using them pretty extensively for coding. The speedup is real, and I expect to get more real. Right now it’s like a pretty junior employee that I get to infinitely micromanage. But it definitely does lull me into a lower agency state where instead of trying to solve problems myself I’m handing them off to LLMs much of the time to see if it can handle it.
During work hours, I try to actively override this, i.e. have the habit “send LLM off, and then go back to thinking about some kind of concrete thing (although often a higher level strategy).” But, this becomes harder to do as it gets later in the day and I get more tired.
One of the benefits of LLMs is that you can do moderately complex cognitive work* while tired (*that a junior engineer could do). But, that means by default a bunch of time is spent specifically training the habit of using LLMs in a stupid way.
(I feel sort of confused about how people who don’t use it for coding are doing. With coding, I can feel the beginnings of a serious exoskeleton that can build structures around me with thought. Outside of that, I don’t know of it being more than a somewhat better google).
I currently mostly avoid interactions that treat the AI like a person-I’m-talking to. That way seems most madness inducing.
Outside of [coding], I don’t know of it being more than a somewhat better google
I’ve recently tried heavily leveraging o3 as part of a math-research loop.
I have never been more bearish on LLMs automating any kind of research than I am now.
And I’ve tried lots of ways to make it work. I’ve tried telling it to solve the problem without any further directions, I’ve tried telling it to analyze the problem instead of attempting to solve it, I’ve tried dumping my own analysis of the problem into its context window, I’ve tried getting it to search for relevant lemmas/proofs in math literature instead of attempting to solve it, I’ve tried picking out a subproblem and telling it to focus on that, I’ve tried giving it directions/proof sketches, I’ve tried various power-user system prompts, I’ve tried resampling the output thrice and picking the best one. None of this made it particularly helpful, and the bulk of the time was spent trying to spot where it’s lying or confabulating to me in its arguments or proofs (which it ~always did).
It was kind of okay for tasks like “here’s a toy setup, use a well-known formula to compute the relationships between A and B”, or “try to rearrange this expression into a specific form using well-known identities”, which are relatively menial and freed up my working memory for more complicated tasks. But it’s pretty minor usefulness (and you have to re-check the outputs for errors anyway).
I assume there are math problems at which they do okay, but that capability sure is brittle. I don’t want to overupdate here, but geez, getting LLMs from here to the Singularity in 2-3 years just doesn’t feel plausible.
[disclaimer, not a math guy, only barely knows what he’s talking about, if this next thought is stupid I’m interested to learn more]
I don’t expect this to fix it right now, but one thing I don’t think you listed is doing the work in Lean or some other proof assistant that lets you check results immediately? I expect LLMs to first be able to do math in that format because it’s the format you can actually do a lot of training in. And it’d mean you can verify results more quickly.
My current vague understanding is that Lean is normally too cumbersome to be reasonable to work in, but that’s the sort of thing that could change with LLMs in the mix.
I did actually try a bit of that back in the o1 days. What I’ve found is that getting LLMs to output formal Lean proofs is pretty difficult: they really don’t want to do that. When they’re not making mistakes, they use informal language as connective tissue between Lean snippets, they put in “sorry”s (a placeholder that makes a lemma evaluate as proven), and otherwise try to weasel out of it.
This is something that should be solvable by fine-tuning, but at the time, there weren’t any publicly available decent models fine-tuned for that.
We do have DeepSeek-Prover-V2 now, though. I should look into it at some point. But I am not optimistic, sounds like it’s doing the same stuff, just more cleverly.
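(For anyone who hasn’t used Lean, here’s a minimal Lean 4 sketch of the “sorry” issue mentioned above; the statement is just a stand-in example, not one of my actual problems:)

```lean
-- A stand-in lemma: this typechecks, but Lean emits a
-- "declaration uses 'sorry'" warning, because nothing has actually been proven.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  sorry
```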
(I had a bit of an epistemic rollercoaster making this prediction. I updated to “by the time someone makes an actually worthwhile Math AI, even if Lean was an important part of its training process, it’s probably not that hard to do additional fine-tuning that gets it to output stuff in a more standard mathy format.” But, then, it seemed like it was still going to be important to quickly check the output wasn’t blatantly broken as part of the process.)
(I feel sort of confused about how people who don’t use it for coding are doing. With coding, I can feel the beginnings of a serious exoskeleton that can build structures around me with thought. Outside of that, I don’t know of it being more than a somewhat better google).
There’s common ways I currently use (the free version of) ChatGPT that are partially categorizable as “somewhat better search engine”, but where I feel like that’s not representative of the real differences. A lot of this is coding-related, but not all, and the reasons I use it for coding-related and non-coding-related tasks feel similar. When it is coding-related, it’s generally not of the form of asking it to write code for me that I’ll then actually put into a project, though occasionally I will ask for example snippets which I can use to integrate the information better mentally before writing what I actually want.
The biggest difference in feel is that a chat-style interface is predictable and compact and avoids pushing a full-sized mental stack frame and having to spill all the context of whatever I was doing before. (The name of the website Stack Exchange is actually pretty on point here, insofar as they were trying to provide something similar from crowdsourcing!) This is something I can see being a source of creeping mental laziness—but it depends on the size and nature of the rest of the stack: if you were already under high context-retention load relative to your capabilities, and you’re already task-saturated enough, and you use a chatbot for leaf calls that would otherwise cause you to have to do a lot of inefficient working-memory juggling, then it seems like you’re already getting a lot of the actually-useful mental exercise at the other levels and you won’t be eliminating much of it, just getting some probabilistic task speedups.
In roughly descending order of “qualitatively different from a search engine” (which is not the same as “most impactful to me in practice”):
Some queries are reverse concept search, which to me is probably the biggest and hardest-to-replicate advantage over traditional search engine: I often have the shape of a concept that seems useful, but because I synthesized it myself rather than drawing from popular existing uses, I don’t know what it’s called. This can be checked for accuracy using a traditional search engine in the forward direction once I have the candidate term.
Some queries are for babble purposes: “list a bunch of X” and I’ll throw out 90% of them for actual use but use the distribution to help nudge my own imagination—generally I’ll do my own babble first and then augment it, to limit priming effects. There’s potential for memetic health issues here, but in my case most of these are isolated enough that I don’t expect them to combine to create larger problems. (In a qualitatively similar way but with a different impact, some of it is pure silliness. “Suppose the protagonists of Final Fantasy XIII had Geese powers. What kind of powers might they have?”)
Synthesis and shaping of information is way different from search engine capabilities. This includes asking for results tailored along specific axes I care about where it’s much less likely an existing webpage author has used that as a focus, small leaps of connective reasoning that would take processing and filtering through multiple large pages to do via search engine, and comparisons between popular instances of a class (in coding contexts, often software components) where sometimes someone’s actually written up the comparison and sometimes not. Being able to fluently ask followups that move from a topic to a subtopic or related topic without losing all the context is also very useful. “Tell me about the main differences between X1 and X2.” → “This new thing introduced in X2, is that because of Y?” (but beware of sycophancy biases if you use leading questions like that)
(Beyond this point we get closer to “basically a search engine”.)
Avoiding the rise in Web annoyances is a big one in practice—which ties into the weird tension of social contracts around Internet publishing being kind of broken right now, but from an information-consumer perspective, the reprocessed version is often superior. If a very common result is that a search engine will turn up six plausible results, and three of them are entirely blog slop (often of a pre-LLM type!) which is too vague to be useful for me, two of them ask me to sign up for a ‘free’ account to continue but only after I’ve started reading the useless intro text, and one of them contains the information I need in theory but I have to be prepared to click the “reject cookies” button, and click the close button on the “get these delivered to your inbox” on-scroll popup, and hope it doesn’t load another ten-megabyte hero image that I don’t care about and chew through my cellular quota in the process, and if I try to use browser extensions to combat this then the text doesn’t load, and so on and so on… then obviously I will switch to asking the chatbot first! “most of the content is buried in hour-long videos” is skew to this but results in the same for me.
In domains like “how would I get started learning skill X”, where there’s enough people who can get a commercial advantage through SEO’ing that into “well, take our course or buy our starter kit” (but usually subtler than that), those results seem (and I think for now probably are) less trustworthy than chatbot output that goes directly to concrete aspects that can be checked more cleanly, and tend to disguise themselves to be hard to filter out without reading a lot of the way through. Of course, there’s obvious ways for this not to last, either as SEO morphs into AIO or if the chatbot providers start selling the equivalent of product placement behind the scenes.
(fwiw, I never felt like phones offered any real “you need them to not fall behind”. They are kinda a nice-to-have in some situations. I do need them for Uber/Lyft and maps, I use them for other things which have some benefits and costs, and this post is upweighting “completely block the internet on my phone.” I don’t have any social media apps on my phone but it doesn’t matter much, I just use the web browser)
I imagine this differs a lot based on what social position you’re already in and where you’re likely to get your needs met. When assumptions like “everyone has a smartphone” become sufficiently widespread, you can be blocked off from things unpredictably when you don’t meet them. You often can’t tell which things these are in advance: simplification pressure causes a phase transition from “communicated request” to “implicit assumption”, and there’s too many widely-distributed ways for the assumption to become relevant, so doing your own modeling will produce a “reliably don’t need” result so infrequently as to be effectively useless. Then, if making the transition to conformity when you notice a potential opportunity is too slow or is blocked by e.g. resource constraints or value differences, a lot of instant-lose faces get added to the social dice you roll. If your anticipated social set is already stable and well-adapted to you, you may not be rolling many dice, but if you’re precarious, or searching for breakthrough opportunities, or just have a role with more wide-ranging and unpredictable requirements on which interactions you need to succeed at, it’s a huge penalty. Other technologies this often happens with in the USA, again depending on your social class and milieu, include cars, credit cards, and Facebook accounts.
(It feels like there has to already be an explainer for this somewhere in the LW-sphere, right? I didn’t see an obvious one, though…)
You’ve reminded me of a perspective I was meaning to include but then forgot to, actually. From the perspective of an equilibrium in which everyone’s implicitly expected to bring certain resources/capabilities as table stakes, making a personal decision that makes your life better but reduces your contribution to the pool can be seen as defection—and on a short time horizon or where you’re otherwise forced to take the equilibrium for granted, it seems hard to refute! (ObXkcd: “valuing unit standardization over being helpful possibly makes me a bad friend” if we take the protagonist as seeing “US customary units” as an awkward equilibrium.) Some offshoots of this which I’m not sure what to make of:
If the decision would lead to a better society if everyone did it, and leads to an improvement for you if only you do it, but requires the rest of a more localized group to spend more energy to compensate for you if you do it and they don’t, we have a sort of “incentive misalignment sandwich” going on. In practice I think there’s usually enough disagreement about the first point that this isn’t clear-cut, but it’s interesting to notice.
In the face of technological advances, what continues to count as table stakes tends to get set by Moloch and mimetic feedback loops rather than intentionally. In a way, people complaining vociferously about having to adopt new things are arguably acting in a counter-Moloch role here, but in the places I’ve seen that happen, it’s either been ineffective or led to a stressful and oppressive atmosphere of its own (or, most commonly and unfortunately, both).
I think intuitive recognition of (2) is a big motivator behind attacking adopters of new technology that might fall into this pattern, in a way that often gets poorly expressed in a “tech companies ruin everything” type of way. Personally taking up smartphones, or cars, or—nowadays the big one that I see in my other circles—generative AI, even if you don’t yourself look down on or otherwise directly negatively impact non-users, can be seen as playing into a new potential equilibrium where if you can, you ‘must’, or else you’re not putting in as much as everyone else, and so everyone else will gradually find that they get boxed in and any negative secondary effects on them are irrelevant compared to the phase transition energy. A comparison that comes to mind is actually labor unions; that’s another case where restraining individually expressed capabilities in order to retain a better collective bargaining position for others comes into play, isn’t it?
… hmm, come to think of it, maybe part of conformity-pressure in general can be seen as a special case of this where the pool resource is more purely “cognition and attention spent dealing with non-default things” and the nonconformity by default has more of a purely negative impact on that axis, whereas conformity-pressure over technology with specific capabilities causes the nature of the pool resource to be pulled in the direction of what the technology is providing and there’s an active positive thing going on that becomes the baseline… I wonder if anything useful can be derived from thinking about those two cases as denoting an axis of variation.
And when the conformity is to a new norm that may be more difficult to understand but produces relative positive externalities in some way, is that similar to treating the new norm as a required table stakes cognitive technology?
I am perhaps an interesting corner case. I make extremely heavy use of LLMs, largely via APIs for repetitive tasks. I sometimes run a quarter million queries in a day, all of which produce structured output. Incorrect output happens, but I design the surrounding systems to handle that.
A few times a week, I might ask a concrete question and get a response, which I treat with extreme skepticism.
But I don’t talk to the damn things. That feels increasingly weird and unwise.
Agree about phones (in fact I am seriously considering switching to a flip phone and using my iphone only for things like navigation).
Not so sure about LLMs. I had your attitude initially, and I still consider them an incredibly dangerous mental augmentation. But I do think that conservatively throwing a question at them to find searchable keywords is helpful, if you maintain the attitude that they are actively trying to take over your brain and therefore remain vigilant.
Not speaking for John, but I think LLMs can cause a lack of gears-level understanding, more vibe coding, less mental flexibility due to lack of deliberate thought, and more dependency on them for thinking in general. A lot of my friends will most likely never learn coding properly and rely solely on ChatGPT. It would be similar to calculators, which reduced people’s ability to do mental maths, but for thinking.
The danger of LLMs lies in their ability to solve the majority of simple problems. This reduces opportunities to learn skills or benefit from the training these tasks provide, which allows for a level of mental stagnation, or even degradation, depending on how frequently you use LLMs to handle problems. In other words, it induces mental laziness. This is one way they fail to move people forward, and in more severe cases move them backward.
As a side note, it is also harmful to the majority of current education institutions, as it can solve most academic problems. I have personally seen people use it to do homework, write essays, or even write term papers. Some of the more crafty students manage to cheat with it on exams. This creates a very shallow education, which is bad for many reasons.
Yes, I do think that. They don’t actively diminish thought, after all, it’s a tool you decide to use. But when you use it to handle a problem, you lose the thoughts, and the growth you could’ve had solving it yourself. It could be argued, however, that if you are experienced enough in solving such problems, there isn’t much to lose, and you gain time to pursue other issues.
But as to why I think this way: people already don’t learn skills because chatGPT can do it for them, as lesswronguser123 said “A lot of my friends will most likely never learn coding properly and rely solely on ChatGPT”, and not just his friends use it this way. Such people, at the very least, lose the opportunity to adopt a programming mindset, which is useful beyond programming.
Outside of people not learning skills, I also believe there is a lot of potential to delegate almost all of your thinking to chatGPT. For example: I could have used it to write this response, decide what to eat for breakfast, tell me what I should do in the future, etc. It can tell you what to do on almost every day-to-day decision. Some use it to a lesser extent, some to a greater, but you do think less if you use it this way.
Does it redistribute thinking to another topic? I believe it depends on the person in question: some use it to have more time to solve a more complex problem, others to have more time for entertainment.
I think that these are genuinely hard questions to answer in a scientific way. My own speculation is that using AI to solve problems is a skill of its own, along with recognizing which problems they are currently not good for. Some use of LLMs teaches these skills, which is useful.
I think a potential failure mode for AI might be when people systematically choose to work on lower-impact problems that AI can be used to solve, rather than higher-impact problems that AI is less useful for but that can be solved in other ways. Of course, AI can also increase people’s ambitions by unlocking the ability to pursue higher-impact goals they would not have been able to otherwise achieve. Whether or not AI increases or decreases human ambition on net seems like a key question.
In my world, I see limited use of AI except as a complement to traditional internet search, a coding assistant for competent programmers, a sort of Grammarly on steroids, an OK-at-best tutor that’s cheap and always available on any topic, and a way to get meaningless paperwork done faster. These use cases all seem basically ambition-enhancing to me. That’s the reason I asked John why he’s worried about this version of AI. My experience is that once I gained some familiarity with the limitations of AI, it’s been a straightforwardly useful tool, with none of the serious downsides I have experienced from social media and smartphones.
The issues I’ve seen seem to have to do with using AI to deepfake political policy proposals, homework, blog posts, and job applications. These are genuine and serious problems, but mainly have to do with adding a tremendous amount of noise to collective discourse rather than the self-sabotage enabled by smartphones and social media. So I’m wondering if John’s more concerned about those social issues or by some sort of self-sabotage capacity from AI that I’m not seeing. Using AI to do your homework is obviously self-sabotage, but given the context I’m assuming that’s not what John’s talking about.
I mean, they’re great as search engines or code-snippet writers (basically, search engine for standard functions). If someone thinks that gippities know stuff or can think or write well, that could be brainrotting.
From my perspective, good things about smartphones:
phone and camera and navigation is the same device
very rarely, check something online
buy tickets for mass transit
my contacts are backed up in the cloud
Bad things:
notifications
The advantages outweigh the disadvantages, but it requires discipline about what you install.
(Food for thought: If only I had the same discipline about which web services I create an account for and put into bookmarks on my PC.)
People would sometimes say something like “John, you should really get a smartphone, you’ll fall behind without one” and my gut response was roughly “No, I’m staying in place, and the rest of you are moving backwards”.
Similar here, but that’s because no one could give me a good use case. (I don’t consider social networks on smartphone to be good.)
And it’s probably similar with LLMs, depends on how specifically you use them. I use them to ask questions (like a smarter version of Google) that I try to verify e.g. on Wikipedia afterwards, and sometimes to write code. Those seem like good things to me. There are probably bad ways to use them, but that is not what I would typically do.
My main concern with heavy LLM usage is what Paul Graham discusses in Writes and Write-Nots. His argument is basically that writing is thinking, and that if you use LLMs to do your writing for you, your ability to think will erode.
For smart phones there was one argument that moved me a moderate amount. I’m a web developer and startup founder. I was talking to my cousin’s boyfriend who is also in tech. He made the argument to me that if I don’t actively use smart phones I won’t be able to empathize as much with smart phone users, which is important because to a meaningful extent, that’s who I’m building for.
I didn’t think the empathy point was as strong as my cousin’s boyfriend thought it was. Like, he seemed to think it was pretty essential and that if I don’t use smart phones I just wouldn’t be able to develop enough empathy to build a good product. I, on the other hand, saw it as something “useful” but not “essential”. Looking back, I think I’d downgrade it to something like “a little useful” instead of “useful”.
I’m not sure where I’m going with this, exactly. Just kinda reflecting and thinking out loud.
Conditional on LLMs scaling to AGI, I feel like it’s a contradiction to say that “LLMs offer little or negative utility AND it’s going to stay this way”. My model is that either we are dying in a couple of years to LLMs getting us to AGI, in which case we will have a year or two of AIs that can provide incredible utility, or we are not dying to LLMs and the timelines are longer.
I think I read somewhere that you don’t believe LLMs will get us to AGI, so this might already be implicit in your model? I personally am putting at least some credence on the ai-2027 model, which predicts superhuman coders in the near future. (Not saying that I believe this is the most probable future, just that I find it convincing enough that I want to be prepared for it.)
Up until recently I was in the “LLMs offer zero utility” camp (for coding), but now at work we have a Cursor plan (still would not pay for it for personal use probably), and with a lot of trial and error I feel like I am finding the kinds of tasks where AIs can offer a bit of utility, and I am slowly moving towards the “marginal utility” camp.
One kind of thing I like using it for is small scripts to automate bits of my workflow. E.g. I have an idea for a script, I know it would take me 30m-1h to implement it, but it’s not worth it because e.g. it would only save me a few seconds each time. But if I can reduce the time investment to only a few minutes by giving the task to the LLM, it can suddenly be worth it.
I would be interested in other people’s experiences with the negative side effects of LLM use. What are the symptoms/warning signs of “LLM brain rot”? I feel like with my current use I am relatively well equipped to avoid that:
I only ask things from LLMs that I know I could solve in a few hours tops.
I code review the result, tell it if it did something stupid.
90% of my job is stuff that is currently not close to being LLM automatable anyway.
Hypothesis: for smart people with a strong technical background, the main cognitive barrier to doing highly counterfactual technical work is that our brains’ attention is mostly steered by our social circle. Our thoughts are constantly drawn to think about whatever the people around us talk about. And the things which are memetically fit are (almost by definition) rarely very counterfactual to pay attention to, precisely because lots of other people are also paying attention to them.
Two natural solutions to this problem:
build a social circle which can maintain its own attention, as a group, without just reflecting the memetic currents of the world around it.
“go off into the woods”, i.e. socially isolate oneself almost entirely for an extended period of time, so that there just isn’t any social signal to be distracted by.
These are both standard things which people point to as things-historically-correlated-with-highly-counterfactual-work. They’re not mutually exclusive, but this model does suggest that they can substitute for each other—i.e. “going off into the woods” can substitute for a social circle with its own useful memetic environment, and vice versa.
One thing that I do after social interactions, especially those which pertain to my work, is to go over all the updates my background processing is likely to make and to question them more explicitly.
This is helpful because I often notice that the updates I’m making aren’t related to reasons much at all. It’s more like “ah they kind of grimaced when I said that, so maybe I’m bad?” or like “they seemed just generally down on this approach, but wait are any of those reasons even new to me? Haven’t I already considered those and decided to do it anyway?” or “they seemed so aggressively pessimistic about my work, but did they even understand what I was saying?” or “they certainly spoke with a lot of authority, but why should I trust them on this, and do I even care about their opinion here?” Etc. A bunch of stuff which at first blush my social center is like “ah god, it’s all over, I’ve been an idiot this whole time” but with some second glancing it’s like “ah wait no, probably I had reasons for doing this work that withstand surface level pushback, let’s remember those again and see if they hold up.” And often (always?) they do.
This did not come naturally to me; I’ve had to train myself into doing it. But it has helped a lot with this sort of problem, alongside the solutions you mention i.e. becoming more of a hermit and trying to surround myself by people engaged in more timeless thought.
Solution 2 implies that a smart person with a strong technical background would (by default) go on to work on important problems, which is not necessarily universally true; it’s IMO likely that many such people would be working on less important things than what their social circle is otherwise steering them to work on.
The claim is not that either “solution” is sufficient for counterfactuality, it’s that either solution can overcome the main bottleneck to counterfactuality. After that, per Amdahl’s Law, there will still be other (weaker) bottlenecks to overcome, including e.g. keeping oneself focused on something important.
I don’t think the social thing ranks above “be able to think useful important thoughts at all”. (But maybe otherwise agree with the rest of your model as an important thing to think about)
[edit: hrm, “for smart people with a strong technical background” might be doing most of the work here]
it’s IMO likely that many such people would be working on less important things than what their social circle is otherwise steering them to work on
Why do you think this? When I try to think of concrete examples here, it’s all confounded by the relevant smart people having social circles not working on useful problems.
I also think that 2 becomes more true once the relevant smart person already wants to solve alignment, or otherwise is already barking up the right tree.
As a counterpoint to the “go off into the woods” strategy, Richard Hamming said the following in “You and Your Research”, describing his experience at Bell Labs:
Thus what you consider to be good working conditions may not be good for you! There are many illustrations of this point. For example, working with one’s door closed lets you get more work done per year than if you had an open door, but I have observed repeatedly that later those with the closed doors, while working just as hard as others, seem to work on slightly the wrong problems, while those who have let their door stay open get less work done but tend to work on the right problems! I cannot prove the cause-and-effect relationship; I can only observe the correlation. I suspect the open mind leads to the open door, and the open door tends to lead to the open mind; they reinforce each other.
Bell Labs certainly produced a lot of counterfactual research, Shannon’s information theory being the prime example. I suppose Bell Labs might have been well-described as a group that could maintain its own attention, though.
Bell Labs is actually my go-to example of a much-hyped research institution whose work was mostly not counterfactual; see e.g. here. Shannon’s information theory is the only major example I know of highly counterfactual research at Bell Labs. Most of the other commonly-cited advances, like e.g. transistors or communication satellites or cell phones, were clearly not highly counterfactual when we look at the relevant history: there were other groups racing to make the transistor, and the communication satellite and cell phones were both old ideas waiting on the underlying technology to make them practical.
That said, Hamming did sit right next to Shannon during the information theory days IIRC, so his words do carry substantial weight here.
Good idea, but… I would guess that basically everyone who knew me growing up would say that I’m exactly the right sort of person for that strategy. And yet, in practice, I still find it has not worked very well. My attention has in fact been unhelpfully steered by local memetic currents to a very large degree.
For instance, I do love proving everyone else wrong, but alas reversed stupidity is not intelligence. People mostly don’t argue against the high-counterfactuality important things, they ignore the high-counterfactuality important things. Trying to prove them wrong about the things they do argue about is just another way of having one’s attention steered by the prevailing memetic currents.
People mostly don’t argue against the high-counterfactuality important things, they ignore the high-counterfactuality important things. Trying to prove them wrong about the things they do argue about is just another way of having one’s attention steered by the prevailing memetic currents.
This is true, but I still can’t let go of the fact that this fact itself ought to be a blindingly obvious first-order bit that anyone who calls zerself anything like “aspiring rationalist” would be paying a good chunk of attention to, and yet this does not seem to be the case. Like, motions in the genre of
huh I just had reaction XYZ to idea ABC generated by a naively-good search process, and it seems like this is probably a common reaction to ABC; but if people tend to react to ABC with XYZ, and with other things coming from the generators of XYZ, then such and such distortion in beliefs/plans would be strongly pushed into the collective consciousness, e.g. on first-order or on higher-order deference effects; so I should look out for that, e.g. by doing some manual fermi estimates or other direct checking about ABC or by investigating the strength of the steelman of reaction XYZ, or by keeping an eye out for people systematically reacting with XYZ without good foundation so I can notice this,
where XYZ could centrally be things like e.g. copium or subtly contemptuous indifference, do not seem to be at all common motions.
So I should look out for that, e.g. by doing some manual fermi estimates or other direct checking about ABC or by investigating the strength of the steelman of reaction XYZ, or by keeping an eye out for people systematically reacting with XYZ without good foundation so I can notice this,
Accusing people in my head of not being numerate enough when this happens has helped, because then I don’t want to be a hypocrite. GPT4o or o1 are good at fermi estimates, making this even easier.
build a social circle which can maintain its own attention, as a group, without just reflecting the memetic currents of the world around it.
Note that it is not necessary for the social circle to share your beliefs, only to have a social norm that people express interest in each other’s work. Could be something like: once or twice in a week the people will come to a room and everyone will give a presentation about what they have achieved recently, and maybe the other people will provide some feedback (not in the form of “why don’t you do Y instead”, but with the assumption that X is a thing worth doing).
How would this model treat mathematicians working on hard open problems? P vs NP might be counterfactual just because no one else is smart enough or has the right advantage to solve it. Insofar as central problems of a field have been identified but not solved, I’m not sure your model gives good advice.
I visited Mikhail Khovanov once in New York to give a seminar talk, and after it was all over and I was wandering around seeing the sights, he gave me a call and offered a long string of general advice on how to be the kind of person who does truly novel things (he’s famous for this, you can read about Khovanov homology). One thing he said was “look for things that aren’t there” haha. It’s actually very practical advice, which I think about often and attempt to live up to!
I’m ashamed to say I don’t remember. That was the highlight. I think I have some notes on the conversation somewhere and I’ll try to remember to post here if I ever find it.
I can spell out the content of his Koan a little, if it wasn’t clear. It’s probably more like: look for things that are (not there). If you spend enough time in a particular landscape of ideas, you can (if you’re quiet and pay attention and aren’t busy jumping on bandwagons) get an idea of a hole, which you’re able to walk around but can’t directly see. In this way new ideas appear as something like residues from circumnavigating these holes. It’s my understanding that Khovanov homology was discovered like that, and this is not unusual in mathematics.
By the way, that’s partly why I think the prospect of AIs being creative mathematicians in the short term should not be discounted; if you see all the things, you see all the holes.
For those who might not have noticed Dan’s clever double entendre: (Khovanov) homology is literally about counting/measuring holes in weird high-dimensional spaces—designing a new homology theory is in a very real sense about looking for holes that are not (yet) there.
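To spell out the literal version for readers who haven’t met the machinery: the standard way to “count holes” is via Betti numbers, the ranks of the homology groups. A minimal statement (nothing here beyond the textbook definition):

```latex
% b_k(X) counts the k-dimensional holes of a space X:
% b_0 = connected components, b_1 = independent loops, b_2 = enclosed voids, ...
b_k(X) \;=\; \operatorname{rank} H_k(X)
% e.g. for the torus T^2:  b_0 = 1, \quad b_1 = 2, \quad b_2 = 1
```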
There’s plenty, including a line of work by Carina Curto, Kathryn Hess and others that is taken seriously by a number of mathematically inclined neuroscience people (Tom Burns if he’s reading can comment further). As far as I know this kind of work is the closest to breaking through into the mainstream. At some level you can think of homology as a natural way of preserving information in noisy systems, for reasons similar to why (co)homology of tori was a useful way for Kitaev to formulate his surface code. Whether or not real brains/NNs have some emergent computation that makes use of this is a separate question, I’m not aware of really compelling evidence.
There is more speculative but definitely interesting work by Matilde Marcolli. I believe Manin has thought about this (because he’s thought about everything) and if you have twenty years to acquire the prerequisites (gamma spaces!) you can gaze into deep pools by reading that too.
Though my understanding is this is used in interp, not so much because people necessarily expect deep connections to homology, but because it’s just another way to look for structure in your data.
As someone who does both data analysis and algebraic topology, my take is that TDA showed promise but ultimately there’s something missing such that it’s not at full capacity. Either the formalism isn’t developed enough or it’s being consistently used on the wrong kinds of datasets. Which is kind of a shame, because it’s the kind of thing that should work beautifully and in some cases even does!
I thought it might be “look for things that might not even be there as hard as you would if they are there.” Then the koan form takes it closer to “the thereness of something just has little relevance on how hard you look for it.” But it needs to get closer to the “biological” part of your brain, where you’re not faking it with all your mental and bodily systems, like when your blood pressure rises from “truly believing” a lion is around the corner but wouldn’t if you “fake believe” it.
Obvious point—I think a lot of this comes from the financial incentives. The more “out of the box” you go, the less sure you can be that there will be funding for your work.
Some of those that do this will be rewarded, but I suspect many won’t be.
As such, I think that funders can help more to encourage this sort of thing, if they want to.
Conjecture’s Compendium is now up. It’s intended to be a relatively-complete intro to AI risk for nontechnical people who have ~zero background in the subject. I basically endorse the whole thing, and I think it’s probably the best first source to link e.g. policymakers to right now.
I might say more about it later, but for now just want to say that I think this should be the go-to source for new nontechnical people right now.
I think there’s something about Bay Area culture that can often get technical people to feel like the only valid way to contribute is through technical work. It’s higher status and sexier and there’s a default vibe that the best way to understand/improve the world is through rigorous empirical research.
I think this an incorrect (or at least incomplete) frame, and I think on-the-margin it would be good for more technical people to spend 1-5 days seriously thinking about what alternative paths they could pursue in comms/policy.
I also think there are memes spreading around that you need to be some savant political mastermind genius to do comms/policy, otherwise you will be net negative. The more I meet policy people (including successful policy people from outside the AIS bubble), the more I think this narrative was, at best, an incorrect model of the world. At worst, a take that got amplified in order to prevent people from interfering with the AGI race (e.g., by granting excess status+validity to people/ideas/frames that made it seem crazy/unilateralist/low-status to engage in public outreach, civic discourse, and policymaker engagement.)
(Caveat: I don’t think the adversarial frame explains everything, and I do think there are lots of people who were genuinely trying to reason about a complex world and just ended up underestimating how much policy interest there would be and/or overestimating the extent to which labs would be able to take useful actions despite the pressures of race dynamics.)
I think I probably agree, although I feel somewhat wary about it. My main hesitations are:
The lack of epistemic modifiers seems off to me, relative to the strength of the arguments they’re making. Such that while I agree with many claims, my imagined reader who is coming into this with zero context is like “why should I believe this?” E.g., “Without intervention, humanity will be summarily outcompeted and relegated to irrelevancy,” which like, yes, but also—on what grounds should I necessarily conclude this? They gave some argument along the lines of “intelligence is powerful,” and that seems probably true, but imo not enough to justify the claim that it will certainly lead to our irrelevancy. All of this would be fixed (according to me) if it were framed more as like “here are some reasons you might be pretty worried,” of which there are plenty, or “here’s what I think,” rather than “here is what will definitely happen if we continue on this path,” which feels less certain/obvious to me.
Along the same lines, I think it’s pretty hard to tell whether this piece is in good faith or not. E.g., in the intro Connor writes “The default path we are on now is one of ruthless, sociopathic corporations racing toward building the most intelligent, powerful AIs as fast as possible to compete with one another and vie for monopolization and control of both the market and geopolitics.” Which, again, I don’t necessarily disagree with, but my imagined reader with zero context is like “what, really? sociopaths? control over geopolitics?” I.e., I’m expecting readers to question the integrity of the piece, and to be more unsure of how to update on it (e.g. “how do I know this whole thing isn’t just a strawman?” etc.).
There are many places where they kind of just state things without justifying them much. I think in the best case this might cause readers to think through whether such claims make sense (either on their own, or by reading the hyperlinked stuff—both of which put quite a lot of cognitive load on them), and in the worst case just causes readers to either bounce or kind of blindly swallow what they’re saying. E.g., “Black-Box Evaluations can only catch all relevant safety issues insofar as we have either an exhaustive list of all possible failure modes, or a mechanistic model of how concrete capabilities lead to safety risks.” They say this without argument and then move on. And although I agree with them (having spent a lot of time thinking this through myself), it’s really not obvious at first blush. Why do you need an exhaustive list? One might imagine, for instance, that a small number of tests would generalize well. And do you need mechanistic models? Sometimes medicines work safely without that, etc., etc. I haven’t read the entire Compendium closely, but my sense is that this is not an isolated incident. And I don’t think this is a fatal flaw or anything—they’re moving through a ton of material really fast and it’s hard to give a thorough account of all claims—but it does make me more hesitant to use it as the default “here’s what’s happening” document.
All of that said, I do broadly agree with the set of arguments, and I think it’s a really cool activity for people to write up what they believe. I’m glad they did it. But I’m not sure how comfortable I feel about sending it to people who haven’t thought much about AI.
One of the common arguments in favor of investing more resources into current governance approaches (e.g., evals, if-then plans, RSPs) is that there’s nothing else we can do. There’s not a better alternative; these are the only things that labs and governments are currently willing to support.
The Compendium argues that there are other (valuable) things that people can do, with most of these actions focusing on communicating about AGI risks. Examples:
Share a link to this Compendium online or with friends, and provide your feedback on which ideas are correct and which are unconvincing. This is a living document, and your suggestions will shape our arguments.
Post your views on AGI risk to social media, explaining why you believe it to be a legitimate problem (or not).
Red-team companies’ plans to deal with AI risk, and call them out publicly if they do not have a legible plan.
One possible critique is that their suggestions are not particularly ambitious. This is likely because they’re writing for a broader audience (people who haven’t been deeply engaged in AI safety).
For people who have been deeply engaged in AI safety, I think the natural steelman here is “focus on helping the public/government better understand the AI risk situation.”
There are at least some impactful and high-status examples of this (e.g., Hinton, Bengio, Hendrycks). I think in the last few years, for instance, most people would agree that Hinton/Bengio/Hendrycks have had far more impact in their communications/outreach/policy work than their technical research work.
And it’s not just the famous people– I can think of ~10 junior or mid-career people who left technical research in the last year to help policymakers better understand AI progress and AI risk, and I think their work is likely far more impactful than if they had stayed in technical research. (And I’m even excluding people who are working on evals/if-then plans: like, I’m focusing on people who see their primary purpose as helping the public or policymakers develop “situational awareness”, develop stronger models of AI progress and AI risk, understand the conceptual arguments for misalignment risk, etc.)
I appreciated their section on AI governance. The “if-then”/RSP/preparedness frame has become popular, and they directly argue for why they oppose this direction. (I’m a fan of preparedness efforts– especially on the government level– but I think it’s worth engaging with the counterarguments.)
Pasting some content from their piece below.
High-level thesis against current AI governance efforts:
The majority of existing AI safety efforts are reactive rather than proactive, which inherently puts humanity in the position of managing risk rather than controlling AI development and preventing it.
Critique of reactive frameworks:
1. The reactive framework reverses the burden of proof from how society typically regulates high-risk technologies and industries.
In most areas of law, we do not wait for harm to occur before implementing safeguards. Banks are prohibited from facilitating money laundering from the moment of incorporation, not after their first offense. Nuclear power plants must demonstrate safety measures before operation, not after a meltdown.
The reactive framework problematically reverses the burden of proof. It assumes AI systems are safe by default and only requires action once risks are detected. One of the core dangers of AI systems is precisely that we do not know what they will do or how powerful they will be before we train them. The if-then framework opts to proceed until problems arise, rather than pausing development and deployment until we can guarantee safety. This implicitly endorses the current race to AGI.
This reversal is exactly what makes the reactive framework preferable for AI companies.
Critique of waiting for warning shots:
3. The reactive framework incorrectly assumes that an AI “warning shot” will motivate coordination.
Imagine an extreme situation in which an AI disaster serves as a “warning shot” for humanity. This would imply that powerful AI has been developed and that we have months (or less) to develop safety measures or pause further development. After a certain point, an actor with sufficiently advanced AI may be ungovernable, and misaligned AI may be uncontrollable.
When horrible things happen, people do not suddenly become rational. In the face of an AI disaster, we should expect chaos, adversariality, and fear to be the norm, making coordination very difficult. The useful time to facilitate coordination is before disaster strikes.
However, the reactive framework assumes that this is essentially how we will build consensus in order to regulate AI. The optimistic case is that we hit a dangerous threshold before a real AI disaster, alerting humanity to the risks. But history shows that it is exactly in such moments that these thresholds are most contested; this shifting of the goalposts is known as the AI Effect and common enough to have its own Wikipedia page. Time and again, AI advancements have been explained away as routine processes, whereas “real AI” is redefined to be some mystical threshold we have not yet reached. Dangerous capabilities are similarly contested as they arise, such as how recent reports of OpenAI’s o1 being deceptive have been questioned.
This will become increasingly common as competitors build increasingly powerful capabilities and approach their goal of building AGI. Universally, powerful stakeholders fight for their narrow interests, and for maintaining the status quo, and they often win, even when all of society is going to lose. Big Tobacco didn’t pause cigarette-making when they learned about lung cancer; instead they spread misinformation and hired lobbyists. Big Oil didn’t pause drilling when they learned about climate change; instead they spread misinformation and hired lobbyists. Likewise, now that billions of dollars are pouring into the creation of AGI and superintelligence, we’ve already seen competitors fight tooth and nail to keep building. If problems arise in the future, of course they will fight for their narrow interests, just as industries always do. And as the AI industry gets larger, more entrenched, and more essential over time, this problem will grow rapidly worse.
This seems to be confusing a dangerous capability eval (of being able to ‘deceive’ in a visible scratchpad) with an assessment of alignment, which seems like exactly what the ‘questioning’ was about.
Short version: Nvidia’s only moat is in software; AMD already makes flatly superior hardware priced far lower, and Google probably does too but doesn’t publicly sell it. And if AI undergoes smooth takeoff on current trajectory, then ~all software moats will evaporate early.
Long version: Nvidia is pretty obviously in a hype-driven bubble right now. However, it is sometimes the case that (a) an asset is in a hype-driven bubble, and (b) it’s still a good long-run bet at the current price, because the company will in fact be worth that much. Think Amazon during the dot-com bubble. I’ve heard people make that argument about Nvidia lately, on the basis that it will be ridiculously valuable if AI undergoes smooth takeoff on the current apparent trajectory.
My core claim here is that Nvidia will not actually be worth much, compared to other companies, if AI undergoes smooth takeoff on the current apparent trajectory.
Other companies already make ML hardware flatly superior to Nvidia’s (in flops, memory, whatever), and priced much lower. AMD’s MI300x is the most obvious direct comparison. Google’s TPUs are probably another example, though they’re not sold publicly so harder to know for sure.
So why is Nvidia still the market leader? No secret there: it’s the CUDA libraries. Lots of (third-party) software is built on top of CUDA, and if you use non-Nvidia hardware then you can’t use any of that software.
That’s exactly the sort of moat which will disappear rapidly if AI automates most-or-all software engineering, and on current trajectory software engineering would be one of the earlier areas to see massive AI acceleration. In that world, it will be easy to move any application-level program to run on any lower-level stack, just by asking an LLM to port it over.
So in worlds where AI automates software engineering to a very large extent, Nvidia’s moat is gone, and their competition has an already-better product at already-lower price.
The easiest answer is to look at the specs. Of course specs are not super reliable, so take it all with many grains of salt. I’ll go through the AMD/Nvidia comparison here, because it’s a comparison I looked into a few months back.
MI300x vs H100
Techpowerup is a third-party site with specs for the MI300x and the H100, so we can do a pretty direct comparison between those two pages. (I don’t know if the site independently tested the two chips, but they’re at least trying to report comparable numbers.) The H200 would arguably be more of a “fair comparison” since the MI300x came out much later than the H100; we’ll get to that comparison next. I’m starting with MI300x vs H100 comparison because techpowerup has specs for both of them, so we don’t have to rely on either company’s bullshit-heavy marketing materials as a source of information. Also, even the H100 is priced 2-4x more expensive than the MI300x (~$30-45k vs ~$10-15k), so it’s not unfair to compare the two.
Key numbers (MI300x vs H100):
float32 TFLOPs: ~80 vs ~50
float16 TFLOPs: ~650 vs ~200
memory: 192 GB vs 80 GB (note that this is the main place where the H200 improves on the H100)
bandwidth: ~10 TB/s vs ~2 TB/s
… so the comparison isn’t even remotely close. The H100 is priced 2-4x higher but is utterly inferior in terms of hardware.
MI300x vs H200
I don’t know of a good third-party spec sheet for the H200, so we’ll rely on Nvidia’s page. Note that they report some numbers “with sparsity” which, to make a long story short, means those numbers are blatant marketing bullshit. Other than those numbers, I’ll take their claimed specs at face value.
Key numbers (MI300x vs H200):
float32 TFLOPs: ~80 vs ~70
float16 TFLOPs: don’t know, Nvidia conspicuously avoided reporting that number
memory: 192 GB vs 141 GB
bandwidth: ~10 TB/s vs ~5 TB/s
So they’re closer than the MI300x vs H100, but the MI300x still wins across the board. And pricewise, the H200 is probably around $40k, so 3-4x more expensive than the MI300x.
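To make the price-performance gap concrete, here is a quick back-of-the-envelope script using the rough specs above; the prices are midpoints of the approximate ranges quoted earlier (~$10-15k for the MI300x, ~$30-45k for the H100, ~$40k for the H200), so treat the ratios as order-of-magnitude only.

```python
# Rough perf-per-dollar comparison using the approximate specs and prices quoted above.
chips = {
    "MI300x": {"fp32_tflops": 80, "mem_gb": 192, "bw_tbs": 10, "price_usd": 12_500},
    "H100":   {"fp32_tflops": 50, "mem_gb": 80,  "bw_tbs": 2,  "price_usd": 37_500},
    "H200":   {"fp32_tflops": 70, "mem_gb": 141, "bw_tbs": 5,  "price_usd": 40_000},
}

for name, c in chips.items():
    k = c["price_usd"] / 1000  # price in $k, so the ratios below are "per $1k spent"
    print(f"{name}: {c['fp32_tflops'] / k:.1f} fp32 TFLOPs/$k, "
          f"{c['mem_gb'] / k:.1f} GB/$k, {c['bw_tbs'] / k:.2f} TB/s/$k")
```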
It’s worth noting that even if nvidia is charging 2-4x more now, the ultimate question for competitiveness will be manufacturing cost for nvidia vs amd. If nvidia has much lower manufacturing costs than amd per unit performance (but presumably higher markup), then nvidia might win out even if their product is currently worse per dollar.
Note also that price discrimination might be a big part of nvidia’s approach. Scaling labs which are willing to go to great effort to drop compute cost by a factor of two are a subset of nvidia’s customers where nvidia would ideally prefer to offer lower prices. I expect that nvidia will find a way to make this happen.
I’m holding a modest long position in NVIDIA (smaller than my position in Google), and expect to keep it for at least a few more months. I expect I only need NVIDIA margins to hold up for another 3 or 4 years for it to be a good investment now.
It will likely become a bubble before too long, but it doesn’t feel like one yet.
While the first-order analysis seems true to me, there are mitigating factors:
AMD appears to be bungling the job of making their GPUs reliable and fast, and probably will for another few years. (At least, this is my takeaway from following the TinyGrad saga on Twitter...) Their stock is not valued as it should be for a serious contender with good fundamentals, and I think this may stay the case for a while, if not forever if things are worse than I realize.
NVIDIA will probably have very-in-demand chips for at least another chip generation due to various inertias.
There aren’t many good-looking places for the large amount of money that wants to be long AI to go right now, and this will probably inflate prices for still a while across the board, in proportion to how relevant-seeming the stock is. NVDA rates very highly on this one.
So from my viewpoint I would caution against being short NVIDIA, at least in the short term.
If AI automates most, but not all, software engineering, moats of software dependencies could get more entrenched, because easier-to-use libraries have compounding first-mover advantages.
The disadvantages of AMD software development potentially need to be addressed at levels not accessible to an arbitrary feral automated software engineer in the wild, to make the stack sufficiently usable. (A lot of actual human software engineers would like the chance.)
NVIDIA is training their own AIs, who are pretty capable.
NVIDIA can invest their current profits. (Revenues, not stock valuations.)
If AI automates most, but not all, software engineering, moats of software dependencies could get more entrenched, because easier-to-use libraries have compounding first-mover advantages.
I don’t think the advantages would necessarily compound—quite the opposite, there are diminishing returns and I expect ‘catchup’. The first-mover advantage neutralizes itself because a rising tide lifts all boats, and the additional data acts as a prior: you can define the advantage of a better model, due to any scaling factor, as equivalent to n additional datapoints. (See the finetuning transfer papers on this.) When a LLM can zero-shot a problem, that is conceptually equivalent to a dumber LLM which needs 3-shots, say. And so the advantages of a better model will plateau, and can be matched by simply some more data in-context—such as additional synthetic datapoints generated by self-play or inner-monologue etc. And the better the model gets, the more ‘data’ it can ‘transfer’ to a similar language to reach a given X% of coding performance. (Think about how you could easily transfer given access to an environment: just do self-play on translating any solved Python problem into the target language. You already, by stipulation, have an ‘oracle’ to check outputs of the target against, which can produce counterexamples.) To a sad degree, pretty much all programming languages are the same these days: ALGOL with C sugaring to various degrees and random ad hoc addons; a LLM which can master Python can master Javascript can master Typescript… The hard part is the non-programming-language parts, the algorithms and reasoning and being able to understand & model the implicit state updates—not memorizing the standard library of some obscure language.
So at some point, even if you have a model which is god-like at Python (at which point each additional Python datapoint adds basically next to nothing), you will find it is completely acceptable at JavaScript, say, or even your brand-new language with 5 examples which you already have on hand in the documentation. You don’t need ‘the best possible performance’, you just need some level of performance adequate to achieve your goal. If the Python is 99.99% on some benchmark, you are probably fine with 99.90% performance in your favorite language. (Presumably there is some absolute level like 99% at which point automated CUDA → ROCm becomes possible, and it is independent of whether some other language has even higher accuracy.) All you need is some minor reason to pay that slight non-Python tax. And that’s not hard to find.
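A minimal sketch of the self-play translation loop described above, with all the LLM and execution plumbing passed in as caller-supplied callables (llm_translate, run_python, run_target are illustrative placeholders, not a real API): each already-solved Python problem acts as its own oracle, and every verified translation becomes a new synthetic datapoint for the target language.

```python
# Toy oracle-checked translation loop (all callables are caller-supplied placeholders).
# A solved Python problem doubles as an oracle: translate it, run both versions on the
# same test inputs, and keep only the attempts whose outputs match.

def self_play_translate(solved_problems, target_lang, llm_translate, run_python, run_target,
                        max_attempts=3):
    synthetic_data = []
    for source, test_inputs in solved_problems:        # (python_source, [test inputs]) pairs
        for _ in range(max_attempts):
            candidate = llm_translate(source, target_lang)
            if all(run_python(source, x) == run_target(candidate, target_lang, x)
                   for x in test_inputs):
                synthetic_data.append((source, candidate))   # verified translation kept
                break
    return synthetic_data
```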
If AI automates most, but not all, software engineering
Also, I suspect that the task of converting CUDA code to ROCm code might well fall into the ‘most’ category rather than being the holdout programming tasks. This is a category of code ripe for automation: you have, again by stipulation, correct working code which can be imitated and used as an oracle autonomously to brute force translation, which usually has very narrow specific algorithmic tasks (‘multiply this matrix by that matrix to get this third matrix; every number should be identical’), random test-cases are easy to generate (just big grids of numbers), and where the non-algorithmic number also has simple end-to-end metrics (‘loss go down per wallclock second’) to optimize. Compared to a lot of areas, like business logic or GUIs, this seems much more amenable to tasking LLMs with. geohot may lack the followthrough to make AMD GPUs work, and plow through papercut after papercut, but there would be no such problem for a LLM.
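Here is a rough sketch of the verification harness this implies, assuming you can call both the original kernel and the candidate port from Python (both callables are placeholders): random grids of numbers are the test cases, the known-good implementation is the oracle, and average wallclock is the end-to-end metric to optimize. Tolerances are loose because two correct fp32 kernels generally won’t agree bitwise.

```python
import time
import numpy as np

# Brute-force check of a ported kernel against the original (both callables are placeholders):
# random matrices as test cases, numerical agreement as the pass/fail criterion.

def check_port(original_matmul, ported_matmul, trials=100, n=512, rtol=1e-3, atol=1e-3):
    total_time = 0.0
    for _ in range(trials):
        a = np.random.randn(n, n).astype(np.float32)
        b = np.random.randn(n, n).astype(np.float32)
        expected = original_matmul(a, b)            # oracle: the known-good implementation
        start = time.perf_counter()
        actual = ported_matmul(a, b)                # candidate: the port under test
        total_time += time.perf_counter() - start
        if not np.allclose(expected, actual, rtol=rtol, atol=atol):
            return False, (a, b)                    # counterexample to hand back to the LLM
    return True, total_time / trials                # average wallclock per call

# e.g. check_port(lambda a, b: a @ b, my_ported_matmul)  # my_ported_matmul is hypothetical
```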
So I agree with Wentworth that there seems to be a bit of a tricky transition here for Nvidia: it’s never been worth the time & hassle to try to use an AMD GPU (although a few claim to have made it work out financially for them), because of the skilled labor and wallclock and residual technical risk and loss of ecosystem flexibility; but if LLM coding works out well enough and intelligence becomes ‘too cheap to meter’, almost all of that goes away. Even ordinary unsophisticated GPU buyers will be able to tell their LLM to ‘just make it work on my new GPU, OK? I don’t care about the details, just let me know when you’re done’. At this point, what is the value-add for Nvidia? If they cut down their fat margins and race to the bottom for the hardware, where do they go for the profits? The money all seems to be in the integration and services—none of which Nvidia is particularly good at. (They aren’t even all that good at training LLMs! The Megatron series was a disappointment, like Megatron-NLG-530b is barely a footnote, and even the latest Nemo seems to barely match Llama-3-70b while being like 4x larger and thus more expensive to run.)
And this will be true of anyone who is relying on software lockin: if the lockin is because it would take a lot of software engineer time to do a reverse-engineering rewrite and replacement, then it’s in serious danger in a world where LLM coding is at human level. In a world where you can hypothetically spin up a thousand SWEs on a cloud service, tell them, ‘write me an operating system like XYZ’, and they do so overnight while you sleep, durable software moats are going to require some sort of mysterious blackbox like a magic API; anything which is so modularized as to fit on your own computer is also sufficiently modularized as to easily clone & replace...
This isn’t a pure software-engineering-time lockin; some of that money is going to go to legal action looking for any hint that big targets have done the license-noncompliant thing.
Edit: Additionally, I don’t think a world where “most but not all” software engineering is automated is one where it will be a simple matter to spin up a thousand effective SWEs of that capability; I think there’s first a world where that’s still relatively expensive even if most software engineering is being done by automated systems. Paying $8000 for overnight service of 1000 software engineers would be a rather fine deal, currently, but still too much for most people.
I don’t think that will be at all important. You are creating alternate reimplementations of the CUDA API, you aren’t ‘translating’ or decompiling it. And if you are buying billions of dollars of GPUs, you can afford to fend off some Nvidia probes and definitely can pay $0.000008b periodically for an overnighter. (Indeed, Nvidia needing to resort to such Oracle-like tactics is a bear sign.)
While there’s truth in what you say, I also think a market that’s running thousands of software engineers is likely to be hungry for as many good GPUs as the current manufacturers can make. NVIDIA not being able to sustain a relative monopoly forever still doesn’t put it in a bad position.
People will hunger for all the GPUs they can get, but then that means that the favored alternative GPU ‘manufacturer’ simply buys out the fab capacity and does so. Nvidia has no hardware moat: they do not own any chip fabs, they don’t own any wafer manufacturers, etc. All they do is design and write software and all the softer human-ish bits. They are not ‘the current manufacturer’ - that’s everyone else, like TSMC or the OEMs. Those are the guys who actually manufacture things, and they have no particular loyalty to Nvidia. If AMD goes to TSMC and asks for a billion GPU chips, TSMC will be thrilled to sell the fab capacity to AMD rather than Nvidia, no matter how angry Jensen is.
So in a scenario like mine, if everyone simply rewrites for AMD, AMD raises its prices a bit and buys out all of the chip fab capacity from TSMC/Intel/Samsung/etc—possibly even, in the most extreme case, buying capacity from Nvidia itself, as it suddenly is unable to sell anything at its high prices that it may be trying to defend, and is forced to resell its reserved chip fab capacity in the resulting liquidity crunch. (No point in spending chip fab capacity on chips you can’t sell at your target price when you aren’t sure what you’re going to do with them.) And if AMD doesn’t do so, then player #3 does so, and everyone rewrites again (which will be easier the second time as they will now have extensive test suites, two different implementations to check correctness against, documentation from the previous time, and AIs which have been further trained on the first wave of work).
(… lol. That snuck in without any conscious intent to imply anything, yes. I haven’t even personally interacted with the open Nvidia models yet.)
I do think the analysis is a decent map to nibbling at NVIDIA’s pie share if you happen to be a competitor already—AMD, Intel, or Apple currently, to my knowledge, possibly Google depending what they’re building internally and if they decide to market it more. Apple’s machine learning ecosystem is a bit of a parallel one, but I’d be at least mildly interested in it from a development perspective, and it is making progress.
But when it comes to the hardware, this is a sector where it’s reasonably challenging to conjure a competitor out of thin air still, so competitor behavior—with all its idiosyncrasies—is pretty relevant.
First, if AI is a big value driver in a general economic sense, is your view that NVIDIA is overpriced relative to its future potential, or just that NVIDIA will underperform the other investment alternatives you see?
Second, and perhaps an odd and speculative (perhaps nonsense) thought: I would expect that in this area one might see some network effects in play as well, so I’m wondering whether that might impact the AI engineering decisions on software. Could the AI software solutions look towards maximising the value of the installed network (AIs work better on a common chip and code infrastructure) rather than what some isolated technical stats would suggest? A bit along the lines of why Beta was displaced by VHS despite being a better technology. If so, then it seems possible that NVIDIA could remain a leader and enjoy its current pricing powers (at least to some extent) for a fairly long period of time.
Apparently there already exists a CUDA alternative for non-Nvidia hardware: the open source project ZLUDA. As far as I can tell it’s less performant than CUDA, and it has the same challenges as Firefox does when competing with Chromium-based browsers, which will only get worse as it gets more popular. But it’s something to track at least.
AI that can rewrite CUDA is a ways off. It’s possible that it won’t be that far away in calendar time, but it is far away in terms of AI market growth and hype cycles. If GPT-5 does well, Nvidia will reap the gains more than AMD or Google.
Transpiling assembly code written for one OS/kernel to assembly code for another OS/kernel while taking advantage of the full speed of the processor is a completely different task from transpiling, say, Java code into Python.
Also, the hardware/software abstraction might break. A python developer can say hardware failures are not my problem. An assembly developer working at an AGI lab needs to consider hardware failures as lost wallclock time in their company’s race to AGI, and will try to write code so that hardware failures don’t cause the company to lose time.
GPT4 definitely can’t do this type of work and I’ll bet a lot of money GPT5 can’t do it either. ASI can do it but there’s bigger considerations than whether Nvidia makes money there, such as whether we’re still alive and whether markets and democracy continue to exist. Making a guess of N for which GPT-N can get this done requires evaluating how hard of a software task this actually is, and your comment contains no discussion of this.
Have you looked at tinygrad’s codebase or spoken to George Hotz about this?
Shorting nvidia might be tricky. I’d short nvidia and long TSM or an index fund to be safe at some point. Maybe now? Typically the highest market cap stock has poor performance after it claims that spot.
Here’s a side project David and I have been looking into, which others might have useful input on...
Background: Thyroid & Cortisol Systems
As I understand it, thyroid hormone levels are approximately-but-accurately described as the body’s knob for adjusting “overall metabolic rate” or the subjective feeling of needing to burn energy. Turn up the thyroid knob, and people feel like they need to move around, bounce their leg, talk fast, etc (at least until all the available energy sources are burned off and they crash). Turn down the thyroid knob, and people are lethargic.
That sounds like the sort of knob which should probably typically be set higher, today, than was optimal in the ancestral environment. Not cranked up to 11; hyperthyroid disorders are in fact dangerous and unpleasant. But at least set to the upper end of the healthy range, rather than the lower end.
… and that’s nontrivial. You can just dump the relevant hormones (T3/T4) into your body, but there’s a control system which tries to hold the level constant. Over the course of months, the thyroid gland (which normally produces T4) will atrophy, as it shrinks to try to keep T4 levels fixed. Just continuing to pump T3/T4 into your system regularly will keep you healthy—you’ll basically have a hypothyroid disorder, and supplemental T3/T4 is the standard treatment. But you better be ready to manually control your thyroid hormone levels indefinitely if you start down this path. Ideally, one would intervene further up the control loop in order to adjust the thyroid hormone set-point, but that’s more of a research topic than a thing humans already have lots of experience with.
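To make the control-loop point concrete, here’s a minimal toy sketch in Python. The controller form, gain, and timescale are invented for illustration only, not physiology; the point is just the qualitative dynamic described above.

```python
# Toy negative-feedback model of a hormone set-point (illustrative only;
# parameters are made up, not physiological).
def simulate(days, exogenous_dose, setpoint=1.0, gain=0.05):
    gland_output = setpoint  # endogenous production starts at the set-point
    history = []
    for _ in range(days):
        level = gland_output + exogenous_dose
        # The controller nudges endogenous production to push `level` back
        # toward the set-point; it can't go below zero (gland atrophy).
        gland_output = max(0.0, gland_output - gain * (level - setpoint))
        history.append((level, gland_output))
    return history

baseline = simulate(days=365, exogenous_dose=0.0)
supplemented = simulate(days=365, exogenous_dose=0.5)

print("no dose:   level=%.2f, endogenous=%.2f" % baseline[-1])
print("with dose: level=%.2f, endogenous=%.2f" % supplemented[-1])
# With a steady exogenous dose, endogenous output falls until the total level
# sits back near the old set-point -- i.e. you end up supplying the hormone
# yourself indefinitely, rather than having moved the set-point.
```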
So that’s thyroid. We can tell a similar story about cortisol.
As I understand it, the cortisol hormone system is approximately-but-accurately described as the body’s knob for adjusting/tracking stress. That sounds like the sort of knob which should probably be set lower, today, than was optimal in the ancestral environment. Not all the way down; problems would kick in. But at least set to the lower end of the healthy range.
… and that’s nontrivial, because there’s a control loop in place, etc. Ideally we’d intervene on the relatively-upstream parts of the control loop in order to change the set point.
We’d like to generalize this sort of reasoning, and ask: what are all the knobs of this sort which we might want to adjust relative to their ancestral environment settings?
Generalization
We’re looking for signals which are widely broadcast throughout the body, and received by many endpoints. Why look for that type of thing? Because the wide usage puts pressure on the signal to “represent one consistent thing”. It’s not an accident that there are individual hormonal signals which are approximately-but-accurately described by the human-intuitive phrases “overall metabolic rate” or “stress”. It’s not an accident that those hormones’ signals are not hopelessly polysemantic. If we look for widely-broadcast signals, then we have positive reason to expect that they’ll be straightforwardly interpretable, and therefore the sort of thing we can look at and (sometimes) intuitively say “I want to turn that up/down”.
Furthermore, since these signals are widely broadcast, they’re the sort of thing which impacts lots of stuff (and is therefore impactful to intervene upon). And they’re relatively easy to measure, compared to “local” signals.
The “wide broadcast” criterion helps focus our search a lot. For instance, insofar as we’re looking for chemical signals throughout the whole body, we probably want species in the bloodstream; that’s the main way a concentration could be “broadcast” throughout the body, rather than being a local signal. So, basically endocrine hormones.
Casting a slightly wider net, we might also be interested in:
Signals widely broadcast through the body by the nervous system.
Chemical signals widely broadcast through the brain specifically (since that’s a particularly interesting/relevant organ).
Non-chemical signals widely broadcast through the brain specifically.
… and of course for all of these there will be some control system, so each has its own tricky question about how to adjust it.
Some Promising Leads, Some Dead Ends
With some coaxing, we got a pretty solid-sounding list of endocrine hormones out of the LLMs. There were some obvious ones on the list, including thyroid and cortisol systems, sex hormones, and pregnancy/menstruation signals. There were also a lot of signals for homeostasis of things we don’t particularly want to adjust: salt balance, calcium, digestion, blood pressure, etc. There were several inflammation and healing signals, which we’re interested in but haven’t dug into yet. And then there were some cool ones: oxytocin (think mother-child bonding), endocannabinoids (think pot), satiety signals (think Ozempic). None of those really jumped out as clear places to turn a knob in a certain direction, other than obvious things like “take Ozempic if you are even slightly overweight” and the two we already knew about (thyroid and cortisol).
Then there were neuromodulators. Here’s the list we coaxed from the LLMs:
Dopamine: Tracks expected value/reward—how good things are compared to expectations.
Norepinephrine: Sets arousal/alertness level—how much attention and energy to devote to the current situation.
Serotonin: Regulates resource availability mindset—whether to act like resources are plentiful or scarce. Affects patience, time preference, and risk tolerance.
Acetylcholine: Controls signal-to-noise ratio in neural circuits—acts like a gain/precision parameter, determining whether to amplify precise differences (high ACh) or blur things together (low ACh).
Histamine: Manages the sleep/wake switch—promotes wakefulness and suppresses sleep when active.
Orexin: Acts as a stability parameter for brain states—increases the depth of attractor basins and raises transition barriers between states. Higher orexin = stronger attractors = harder to switch states.
Of those, serotonin immediately jumps out as a knob you’d probably want to turn to the “plentiful resources” end of the healthy spectrum, compared to the ancestral environment. That puts the widespread popularity of SSRIs in an interesting light!
Moving away from chemical signals, brain waves (alpha waves, theta oscillations, etc) are another potential category—they’re oscillations at particular frequencies which (supposedly) are widely synced across large regions of the brain. I read up just a little, and so far have no idea how interesting they are as signals or targets.
Shifting gears, the biggest dead end so far has been parasympathetic tone, i.e. overall activation level of the parasympathetic nervous system. As far as I can tell, parasympathetic tone is basically Not A Thing: there are several different ways to measure it, and the different measurements have little correlation. It’s probably more accurate to think of parasympathetic nervous activity as localized, without much meaningful global signal.
Uh… Guys. Uh. Biology is complicated. It’s a messy pile of spaghetti code. Not that it’s entirely intractable to make Pareto improvements, but watch out for unintended consequences.
For instance: you are very wrong about cortisol. Cortisol is a “stress response hormone”. It tells the body to divert resources to bracing itself to deal with stress (physical and/or mental). Experiments have shown that if you put someone through a stressful event while suppressing their cortisol, they have much worse outcomes (potentially including death).
Cortisol doesn’t make you stressed, it helps you survive stress. Deviations from homeostatic setpoints (including mental ones) are what make you stressed.
Hmm, I’ll see if I can find some old papers… I’m just reciting memories from grad school lectures like… 12 years ago.
Here’s an example of the finding being replicated and explored further in a primate model: https://www.jci.org/articles/view/112443
Basically, cortisol is helpful for surviving injuries. Is it helpful for mental stress? Unclear.
Long term high cortisol is harmful, but the stress in one’s life resulting in that high cortisol level is harmful in more ways than just high cortisol. So are there times when it would be helpful to reduce someone’s cortisol level? Absolutely. But it’s complicated and should be done thoughtfully and selectively, and in combination with other things (particularly seeking out and treating the upstream causes).
I don’t think that any of {dopamine, NE, serotonin, acetylcholine} are scalar signals that are “widely broadcast through the brain”. Well, definitely not dopamine or acetylcholine, almost definitely not serotonin, maybe NE. (I recently briefly looked into whether the locus coeruleus sends different NE signals to different places at the same time, and ended up at “maybe”, see §5.3.1 here for a reference.)
I don’t know anything about histamine or orexin, but neuropeptides are a better bet in general for reasons in §2.1 here.
As far as I can tell, parasympathetic tone is basically Not A Thing
Yeah, I recall reading somewhere that the term “sympathetic” in “sympathetic nervous system” is related to the fact that lots of different systems are acting simultaneously. “Parasympathetic” isn’t supposed to be like that, I think.
We’re looking for signals which are widely broadcast throughout the body, and received by many endpoints. Why look for that type of thing? Because the wide usage puts pressure on the signal to “represent one consistent thing”. It’s not an accident that there are individual hormonal signals which are approximately-but-accurately described by the human-intuitive phrases “overall metabolic rate” or “stress”. It’s not an accident that those hormones’ signals are not hopelessly polysemantic. If we look for widely-broadcast signals, then we have positive reason to expect that they’ll be straightforwardly interpretable, and therefore the sort of thing we can look at and (sometimes) intuitively say “I want to turn that up/down”.
This sounds logical, but I don’t think it is backed empirically, at least to the degree you’re claiming. Source: I have a biology BA, though I can’t speak directly to the question because I never took the relevant classes, which had reputations for being full of exceptions and memorization.
I don’t have deep expertise in the subject, but I’m inclined to concur with the people saying that the widely broadcast signals don’t actually represent one consistent thing, despite your plausible argument to the contrary.
Here’s a Scott Alexander post speculating why that might be the case. In short: there was an optimization pressure towards making internal biological signals very difficult to decode, because easily decodable signals were easy targets for parasites evolving to exploit them. As a result, the actual signals are probably represented as “unnecessarily” complicated, timing-based combinations of various “basic” chemical, electrical, etc. signals, and they’re somewhat individualized to boot. You can’t decode them just by looking at any one spatially isolated chunk of the body, by design.
Basically: separate chemical substances (and other components that look “simple” locally/from the outside) are not the privileged basis for decoding internal signals. They’re the anti-privileged basis, if anything.
Yeah but if something is in the general circulation (bloodstream), then it’s going everywhere in the body. I don’t think there’s any way to specifically direct it.
…Except in the time domain, to a limited extent. For example, in rats, tonic oxytocin in the bloodstream controls natriuresis, while pulsed oxytocin in the bloodstream controls lactation and birth. The kidney puts a low-pass filter on its oxytocin detection system, and the mammary glands & uterus put a high-pass filter, so to speak.
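A toy signal-processing sketch of that time-domain trick (purely illustrative; nothing here models real oxytocin kinetics): a slow, low-pass readout tracks the tonic level while mostly ignoring brief pulses, and a fast, high-pass readout does the reverse.

```python
import numpy as np

# One bloodstream signal carrying two messages in the time domain (toy example).
t = np.arange(0, 600)                        # time in arbitrary units
tonic = 0.3 * np.ones_like(t, dtype=float)   # steady baseline
pulses = np.where((t % 100) < 3, 3.0, 0.0)   # brief, tall pulses
signal = tonic + pulses

def low_pass(x, alpha=0.01):
    """Slow readout: exponential moving average (the 'kidney-like' filter in the analogy)."""
    out, y = [], 0.0
    for v in x:
        y += alpha * (v - y)
        out.append(y)
    return np.array(out)

slow_readout = low_pass(signal)
fast_readout = signal - slow_readout         # high-pass = signal minus its slow trend

print("slow readout (tracks tonic level): %.2f" % slow_readout[-100:].mean())
print("fast readout peak (tracks pulses): %.2f" % fast_readout.max())
```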
Yeah but if something is in the general circulation (bloodstream), then it’s going everywhere in the body. I don’t think there’s any way to specifically direct it.
The point wouldn’t be to direct it, but to have different mixtures of chemicals (and timings) to mean different things to different organs.
Loose analogy: Suppose that the intended body behaviors (“kidneys do X, heart does Y, brain does Z” for all combinations of X, Y, Z) are latent features, basic chemical substances and timings are components of the input vector, and there are dramatically more intended behaviors than input-vector components. Can we define the behavior-controlling function of organs (distributed across organs) such that, for any intended body behavior, there’s a signal that sets the body into approximately this state?
It seems that the answer is yes. The number of almost-orthogonal vectors in d dimensions scales exponentially with d, so we simply need to make the behavior-controlling function sensitive to these almost-orthogonal directions, rather than to the chemical-basis vectors. The mappings from the input vector to the output behaviors, for each organ, would then be some complicated mixtures, not a simple “chemical A sets all organs into behavior X”.
This analogy seems flawed in many ways, but I think something directionally-like-this might be happening?
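A quick numerical sketch of the “sensitive to almost-orthogonal directions” part of the toy model (all numbers arbitrary): random directions in d dimensions are nearly orthogonal to one another, so a readout tuned to one such direction responds strongly to “its” broadcast and only weakly to the others, even with far more codes than dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_behaviors = 50, 1000          # 1000 "intended behaviors", only 50 signal components

# Each behavior gets a random unit-vector "code" over the d basic signal components.
codes = rng.normal(size=(n_behaviors, d))
codes /= np.linalg.norm(codes, axis=1, keepdims=True)

# Broadcast the code for behavior 42 as the actual mixture of chemicals/timings.
broadcast = codes[42]

# Each organ's readout is just a dot product with the code it listens for.
responses = codes @ broadcast
others = np.abs(np.delete(responses, 42))
print("target response:    %.2f" % responses[42])   # ~1.0
print("max off-target:     %.2f" % others.max())    # well below 1
print("typical off-target: %.2f" % others.mean())   # ~1/sqrt(d)
```

Note that this still only lets the body broadcast one such code (or a sparse combination of codes) at a time, since there are only d independent real-valued components to play with.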
Just because the number of almost-orthogonal vectors in d dimensions scales exponentially with d, doesn’t mean one can choose all those signals independently. We can still only choose d real-valued signals at a time (assuming away the sort of tricks by which one encodes two real numbers in a single real number, which seems unlikely to happen naturally in the body). So “more intended behaviors than input-vector components” just isn’t an option, unless you’re exploiting some kind of low-information-density in the desired behaviors (like e.g. very “sparse activation” of the desired behaviors, or discreteness of the desired behaviors to a limited extent).
The above toy model assumed that we’re picking one signal at a time, and that each such “signal” specifies the intended behavior for all organs simultaneously...
… But you’re right that the underlying assumption there was that the set of possible desired behaviors is discrete (i.e., that X in “kidneys do X” is a discrete variable, not a vector of reals). That might’ve indeed assumed me straight out of the space of reasonable toy models for biological signals, oops.
I had seen recommendations for T3/T4 on twitter to help with low energy, and even purchased some, but haven’t taken it. I hadn’t considered that the thyroid might respond by shrinking, and now think that that’s a worrying intervention! So I’m glad I read this—thank you.
As someone who has Graves’ Disease … one of the reasons that you really don’t want to run your metabolism faster with higher T4 levels is that higher heart rate for an extended period can cause your heart to fail.
More generally: changing the set point of any of these systems might cause the failure of some critical component that depends on the old value of the set point.
Yup, I’m familiar with that one. The big difference is that I’m backward-chaining, whereas that post forward chains; the hope of backward chaining would be to identify big things which aren’t on peoples’ radar as nootropics (yet).
(Relatedly: if one is following this sort of path, step 1 should be a broad nutrition panel and supplementing anything in short supply, before we get to anything fancier.)
So I find the question underspecified: why do you want this?
Why are you decomposing body signalling without looking at the major sub-regulatory systems? If you want to predict sleep, then cortisol, melatonin, etc. are quite good, and this will tell you about stress regulation, which affects the endocrine as well as cortisol systems.
If you want to look at nutritional systems, then GLP-1 activation is good for average food need, whilst ghrelin is predictive of whether you will feel hungry at specific times.
If you’re looking at brain health, then serotonin activation patterns can be really good to check, but this is different from how the stomach uses it (and the stomach does have the majority of serotonin). And even this is way too simplified, especially for the brain.
Different subsystems use the same molecules in different ways (waste not, and all that), so what are you looking for and why?
Is there a particular reason not to include sex hormones? Some theories suggest that testosterone tracks relative social status. We might expect that high social status → less stress (of the cortisol type) + more metabolic activity. Since it’s used by trans people, we have a pretty good idea of what it does to you at high doses (makes you hungry, horny, and angry), but it’s unclear whether it actually promotes low cortisol-stress and metabolic activity.
AFAICT, approximately every “how to be good at conversation” guide says the same thing: conversations are basically a game where 2+ people take turns free-associating off whatever was said recently. (That’s a somewhat lossy compression, but not that lossy.) And approximately every guide is like “if you get good at this free association game, then it will be fun and easy!”. And that’s probably true for some subset of people.
But speaking for myself personally… the problem is that the free-association game just isn’t very interesting.
I can see where people would like it. Lots of people want to talk to other people more on the margin, and want to do difficult thinky things less on the margin, and the free-association game is great if that’s what you want. But, like… that is not my utility function. The free association game is a fine ice-breaker, it’s sometimes fun for ten minutes if I’m in the mood, but most of the time it’s just really boring.
Even for serious intellectual conversations, something I appreciate in this kind of advice is that it often encourages computational kindness. E.g. it’s much easier to answer a compact closed question like “which of these three options do you prefer” instead of an open question like “where should we go to eat for lunch”. The same applies to asking someone about their research; not every intellectual conversation benefits from big open questions like the Hamming Question.
I think this is especially important for me/us to remember. On this site we often have a complex way of thinking, and a high computational budget (because we like exercising our brains to failure), and if we speak freely to the average person, they may be annoyed at how hard it is to parse what we are saying.
We’ve all probably had this experience when genuinely trying to understand someone from a very different background. Perhaps they are trying to describe their inner experience when meditating, or Japanese poetry, or are simply from a different discipline. Or perhaps we were just very tired that day, meaning we had a low computational budget.
On the other hand, we are often a “tell” culture, which has a lower computational load compared to ask or guess culture. As long as we don’t tell too much.
Generally fair, and I used to agree, but I’ve been looking at it from a bit of a different viewpoint recently.
If we think of the “vibe” of a conversation as a certain shared prior that you’re currently inhabiting with the other person, then the free-association game can rather be seen as a way of finding places where your world models overlap a lot.
My absolute favourite conversations are when I can go 5 layers deep with someone because of shared inference. I think the vibe checking for shared priors is a skill that can be developed and the basis lies in being curious af.
There’s apparently a lot of different related concepts in psychology about holding emotional space and other things that I think just comes down to “find the shared prior and vibe there”.
Hm. This rings true… but also I think that selecting [vibes, in this sense] for attention also selects against [things that the other person is really committed to]. So in practice you’re just giving up on finding shared commitments. I’ve been updating that stuff other than shared commitments is less good (healthy, useful, promising, etc.) than it seems.
Hmm, I find that I’m not fully following here. I think “vibes” might be the thing that is messing it up.
Let’s look at a specific example: I’m talking to a new person at an EA-adjacent event and we’re just chatting about how the last year has been. Part of the “vibing” here might be to hone in on the difficulties experienced in the last year due to a feeling of “moral responsibility”, in my view vibing doesn’t have to be done with only positive emotions?
I think you’re bringing up a good point that commitments or struggles might be something that bring people closer than positive feelings because you’re more vulnerable and open as well as broadcasting your values more. Is this what you mean with shared commitments or are you pointing at something else?
Closeness is the operating drive, but it’s not the operating telos. The drive is towards some sort of state or feeling—of relating, standing shoulder-to-shoulder looking out at the world, standing back-to-back defending against the world; of knowing each other, of seeing the same things, of making the same meaning; of integrated seeing / thinking. But the telos is tikkun olam (repairing/correcting/reforming the world)--you can’t do that without a shared idea of better.
As an analogy, curiosity is a drive, which is towards confusion, revelation, analogy, memory; but the telos is truth and skill.
In your example, I would say that someone could be struggling with “moral responsibility” while also doing a bunch of research or taking a bunch of action to fix what needs to be fixed; or they could be struggling with “moral responsibility” while eating snacks and playing video games. Vibes are signals and signals are cheap and hacked.
There’s a general-purpose trick I’ve found that should, in theory, be applicable in this context as well, although I haven’t mastered that trick myself yet.
Essentially: when you find yourself in any given cognitive context, there’s almost surely something “visible” from this context such that understanding/mastering/paying attention to that something would be valuable and interesting.
For example, suppose you’re reading a boring, nonsensical continental-philosophy paper. You can:
Ignore the object-level claims and instead try to reverse-engineer what must go wrong in human cognition, in response to what stimuli, to arrive at ontologies that have so little to do with reality.
Start actively building/updating a model of the sociocultural dynamics that incentivize people to engage in this style of philosophy. What can you learn about mechanism design from that? It presumably sheds light on how to align people towards pursuing arbitrary goals, or how to prevent this happening...
Pay attention to your own cognition. How exactly are you mapping the semantic content of the paper to an abstract model of what the author means, or to the sociocultural conditions that created this paper? How do these cognitive tricks generalize? If you find a particularly clever way to infer something from the text, check: would your cognitive policy automatically deploy this trick in all contexts where it’d be useful, or do you need to manually build a TAP for that?
Study what passages make the feelings of boredom or frustration spike. What does that tell you about how your intuitions/heuristics work? Could you extract any generalizable principles out of that? For example, if a given sentence particularly annoys you, perhaps it’s because it features a particularly flawed logical structure, and it’d be valuable to learn to spot subtler instances of such logical flaws “in the wild”.
The experience of reading the paper’s text almost certainly provides some data uniquely relevant to some valuable questions, data you legitimately can’t source any other way. (In the above examples: sure you can learn more efficiently about the author’s cognition or the sociocultural conditions by reading some biographies or field overviews. But (1) this wouldn’t give you the meta-cognitive data about how you can improve your inference functions for mapping low-level data to high-level properties, (2) those higher-level summaries would necessarily be lossy, and give you a more impoverished picture than what you’d get from boots-on-the-ground observations.)
Similar applies to:
Listening to boring lectures. (For example, you can pay intense attention to the lecturer’s body language, or any tricks or flaws in their presentation.)
Doing a physical/menial task. (Could you build, on the fly, a simple model of the physics (or logistics) governing what you’re doing, and refine it using some simple experiments? Then check afterwards if you got it right. Or: If you were a prehistoric human with no idea what “physics” is, how could you naturally arrive at these ideas from doing such tasks/making such observations? What does that teach you about inventing new ideas in general?)
Doing chores. (Which parts of the process can you optimize/streamline? What physical/biological conditions make those chores necessary? Could you find a new useful takeaway from the same chore every day, and if not, why?)
Et cetera.
There’s a specific mental motion I associate with using this trick, which involves pausing and “feeling out” the context currently loaded in my working memory, looking at it from multiple angles, trying to see anything interesting or usefully generalizable.
In theory, this trick should easily apply to small-talk as well. There has to be something you can learn to track in your mind, as you’re doing small-talk, that would be useful or interesting to you.
One important constraint here is that whatever it is, it has to be such that your outwards demeanour would be that of someone who is enjoying talking to your interlocutor. If the interesting thing you’re getting out of the conversation is so meta/abstract you end up paying most of the attention to your own cognitive processes, not on what the interlocutor is saying, you’ll have failed at actually doing the small-talk. (Similarly, if, when doing a menial task, you end up nerd-sniped by building a physical model of the task, you’ll have failed at actually doing the task.)
You also don’t want to come across as sociopathic, so making a “game” of it where you’re challenging yourself to socially engineer the interlocutor into something is, uh, not a great idea.
The other usual advice for finding ways to enjoy small-talk is mostly specialized instances of the above idea that work for specific people: steering the small-talk to gradient-descend towards finding emotional common ground, ignoring the object-level words being exchanged and building a social model of the interlocutor, doing a live study of the social construct of “small-talk” by playing around with it, etc.
You’ll probably need to find an instance of the trick that works for your cognition specifically, and it’s also possible the optimization problem is overconstrained in your case. Still, there might be something workable.
Some people struggle with the specific tactical task of navigating any conversational territory. I’ve certainly had a lot of experiences where people just drop the ball leaving me to repeatedly ask questions. So improving free-association skill is certainly useful for them.
Unfortunately, your problem is most likely that you’re talking to boring people (so as to avoid doing any moral value judgements I’ll make clear that I mean johnswentworth::boring people).
There are specific skills to elicit more interesting answers to questions you ask. One I’ve heard is “make a beeline for the edge of what this person has ever been asked before” which you can usually reach in 2-3 good questions. At that point they’re forced to be spontaneous, and I find that once forced, most people have the capability to be a lot more interesting than they are when pulling cached answers.
This is easiest when you can latch onto a topic you’re interested in, because then it’s easy on your part to come up with meaningful questions. If you can’t find any topics like this then re-read paragraph 2.
Talking to people is often useful for goals like “making friends” and “sharing new information you’ve learned” and “solving problems” and so on. If what conversation means (in most contexts and for most people) is ‘signaling that you repeatedly have interesting things to say’, then learning to do that is required in order to achieve your other goals.
Most games aren’t that intrinsically interesting, including most social games. But you gotta git gud anyway because they’re useful to be able to play well.
Hmm, the ‘making friends’ part seems the most important (since there are ways to share new information you’ve learned, or solve problems, beyond conversation), but it also seems a bit circular. Like, if the reason for making friends is to hang out and have good conversations(?), but one has little interest in having conversations, then doesn’t one have little reason to make friends in the first place, and therefore little reason to ‘git gud’ at the conversation game?
Er, friendship involves lots of things beyond conversation. People to support you when you’re down, people to give you other perspectives on your personal life, people to do fun activities with, people to go on adventures and vacations with, people to celebrate successes in your life with, and many more.
Good conversation is a lubricant for facilitating all of those other things, for making friends and sustaining friends and staying in touch and finding out opportunities for more friendship-things.
The skill in such a game is largely in understanding the free association space, knowing how people likely react and thinking enough steps ahead to choose moves that steer the person where you want to go, either into topics you find interesting, information you want from them, or getting them to a particular position, and so on. If you’re playing without goals, of course it’s boring...
I think that “getting good” at the “free association” game is in finding the sweet spot / negotiation between full freedom of association and directing toward your own interests, probably ideally with a skew toward what the other is interested in. If you’re both “free associating” with a bias toward your own interests and an additional skew toward perceived overlap, updating on that understanding along the way, then my experience says you’ll have a good chance of chatting about something that interests you both. (I.e. finding a spot of conversation which becomes much more directed than vibey free association.) Conditional on doing something like that strategy, I find it ends up being just a question of your relative+combined ability at this and the extent of overlap (or lack thereof) in interests.
So short model is: Git gud at free association (+sussing out interests) → gradient ascend yourselves to a more substantial conversation interesting to you both.
I have similar tastes, but, some additional gears:
I think all day, these days. Even if I’m trying to have interesting, purposeful conversations with people who also want that, it is useful to have some kinds of things to talk about that let some parts of my brain relax (while using other parts of my brain I don’t use as much).
On the margin, you can do an intense intellectual conversation but still make it funnier, or give it more opportunity for people to contribute.
It becomes more interesting when people constrain their output based on what they expect is true information that the other person does not yet know. It’s useful to talk to an expert, who tells you a bunch of random stuff they know that you don’t.
Often some of it will be useful. This only works if they understand what you have said, though (which presumably is something that you are interested in). And often the problem is that people’s models about what is useful are wrong. This is especially likely if you are an expert in something: then the thing that most people will say will be worse than what you would think on the topic yourself. This is especially bad if the people can’t even immediately see why what you are saying is right.
The best strategy around this I have found so far is just to switch the topic to the actually interesting/important things. Surprisingly, people usually go along with it.
Good question. Some differences off the top of my head:
On this forum, if people don’t have anything interesting to say, the default is to not say anything, and that’s totally fine. So the content has a much stronger bias toward being novel and substantive and not just people talking about their favorite parts of Game of Thrones or rehashing ancient discussions (though there is still a fair bit of that) or whatever.
On this forum, most discussions open with a relatively-long post or shortform laying out some ideas which at least the author is very interested in. The realtime version would be more like a memo session or a lecture followed by discussion.
The intellectual caliber of people on this forum (or at least active discussants) is considerably higher than e.g. people at Berkeley EA events, let alone normie events. Last event I went to with plausibly-higher-caliber-people overall was probably the ILLIAD conference.
In-person conversations have a tendency to slide toward the lowest common denominator, as people chime in about whatever parts they (think they) understand, thereby biasing toward things more people (think they) understand. On LW, karma still pushes in that direction, but threading allows space for two people to go back-and-forth on topics the audience doesn’t really grok.
Not sure to what extent those account for the difference in experience.
Totally understand why this would be more interesting; I guess I would still fundamentally describe what we’re doing on the internet as conversation, with the same rules as you would describe above. It’s just that the conversation you can find here (or potentially on Twitter) is superstimulating compared to what you’re getting elsewhere. Which is good in the sense that it’s more fun, and I guess bad inasmuch as IRL conversation was fulfilling some social or networking role that online conversation wasn’t.
I understand: for someone with a strong drive to solve hard problems, there’s an urge for conversations to serve a function, to exchange information with your interlocutor so things can get done. There’s much to do, and communication is already painfully inefficient at its best.
The thing is, I don’t think the free-association game is inefficient, if one is skilled at it. It’s also not all that free. The reason it is something humans “developed” is because it is the most efficient way to exchange rough but extensive models of our minds with others via natural language. It acts a bit like a ray tracer, you shoot conversational rays and by how they bounce around in mental structures, the thought patterns, values and biases of the conversation partners are revealed to each other. Shapes become apparent. Sometimes rays bounce off into empty space, then you need to restart the conversation, shoot a new ray. And getting better at this game, keeping the conversation going, exploring a wider range of topics more quickly, means building a faster ray tracer, means it takes less time to know if your interlocutor thinks in a way and about topics which you find enlightening/aesthetically pleasing/concretely useful/whatever you value.
Or to use a different metaphor, starting with a depth-first search and never running a breadth-first search will lead to many false negatives. There are many minds out there that can help you in ways you won’t know in advance.
So if the hard problems you are working on could profit from more minds, it pays off to get better at this. Even if it has not much intrinsic value for you, it has instrumental value.
Hope this doesn’t come across as patronizing, definitely not meant that way.
Part of the problem is that the very large majority of people I run into have minds which fall into a relatively low-dimensional set and can be “ray traced” with fairly little effort. It’s especially bad in EA circles.
Then I misunderstood your original comment, sorry. As a different commenter wrote, the obvious solution would be to only engage with interesting people. But, of course, unworkable in practice. And “social grooming” nearly always involves some level of talking. A curse of our language abilities, I guess. Other social animals don’t have that particular problem.
The next best solution would be higher efficiency, more socializing bang for your word count buck, so to speak. Shorter conversations for the same social effect. Not usually a focus of anything billed as conversation guide, for obvious reasons. But there are some methods aimed at different goals that, in my experience, also help with this as a side effect.
Say more about “ray-tracing”? What does that look like? And do you have a bullshit-but-useful PCA-flavored breakdown of those few dimensions of variation?
Ok but how do you deal with the tragedy of the high dimensionality of context-space? People worth thinking with have wildly divergent goals—and even if you share goals, you won’t share background information.
Yeah it sucks, search by free association is hillclimbing (gets stuck in local optima) and the contemporary media environment and political culture is an illustration of its problems.
The pattern itself is a local optimum, it’s a product of people walking into a group without knowing what the group is doing and joining in anyway, and so that pattern of low-context engagement becomes what we’re doing, and the anxiety that is supposed to protect us from bad patterns like this and help us to make a leap out to somewhere better is usually drowned in alcohol.
Instead of that, people should get to know each other before deciding what to talk about, and then intentionally decide to talk about what they find interesting or useful with that person. This gets better results every time.
But when we socialise as children, there isn’t much about our friends to get to know, no specialists to respectfully consult, no well processed life experiences to learn from, so none of us just organically find that technique of like, asking who we’re talking to, before talking, it has to be intentionally designed.
One blind spot we rationalists sometimes have is that charismatic people actually treat the game as:
“Can I think of an association that will make the other person feel good and/or further my goal?” You need people to feel good, or they won’t participate. And if you want something complicated, a favour, or to deliver an uncomfortable truth, then you’d better mix in some good feels to balance it out and keep the other person participating.
To put it another way: if you hurt people’s brain or ego, rush them, make them feel unsure, or contradict them, then most untrained humans will feel a little bad. Why would they want to keep feeling bad? Do you like it when people don’t listen, contradict you, insult you, rush you, disagree with you? Probably not; probably no one does.
But if someone listens to you, smiles at you, likes you, has a good opinion of you, agrees with you, makes sense to you, then it feels good!
This might sound dangerously sycophantic, and that’s because it is, if people overdo it! But if it’s mixed with some healthy understanding, learning, and informing, then it’s a great conversational lubricant, and you should apply it as needed. It just ensures that everyone enjoys themselves and comes back for more, counteracting the normal frictions of socialising.
There are books about this. “How to Win Friends and Influence People” recommends talking about the other person’s interests (including themselves) and listening to them, which they will enjoy.
So I’d say, don’t just free associate. Make sure it’s fun for both parties, make room to listen to the other person, and to let them steer. (And ideally your conversational partner reciprocates, but that is not guaranteed).
But speaking for myself personally… the problem is that the free-association game just isn’t very interesting.
Hm, I think this really does change when you get better at it? This only works for people you’re interested in, but if you have someone you are interested in, the free association can be a way to explore a large number of interesting topics that you can pick up in a more structured way later.
I think the statement you summarized from those guides is true, just not helpful to you.
Another view would be that people want to be good at conversation not only because they find it fun but there is utility in building rapport quickly, networking and not being cast as a cold person.
I do find the ice-breaky, cached Q&A stuff really boring and tend to want to find an excuse to run away quickly, something that happens often at the dreaded “work event”. I tend to see it as almost fully acting a part, despite my internal feelings.
At these things, I do occasionally come across a good conversationalist, able to make me want to stick with speaking to them even if the convo is not that deep or in my interest areas. I think becoming like such a person isn’t a herculean task, but it does take practice, and it is something I aspire to.
This is more from a professional setting, though; in a casual setting it’s much easier to disengage from a boring person and find shared interests, and the convos have far fewer boundaries.
Finally, the speed at which you communicate when vibing means you’re communicating almost purely from System 1, expressing your actual felt beliefs. It makes deception, both of yourself and of others, much harder. It’s much more likely to reveal your true colors. This allows it to act as a values screening mechanism as well.
I’m personally skeptical of this. I’ve found I’m far more likely to lie than I’d endorse when vibing. Saying “sure I’d be happy to join you on X event” when it is clear with some thought that I’d end up disliking it. Or exaggerating stories because it fits with the vibe.
I view System-1 as less concerned with truth here, it is the one that is more likely to produce a fake-argument in response to a suggested problem. More likely to play social games regardless of if they make sense.
Oh yes, if you’re going on people’s words, it’s obviously not much better, but the whole point of vibing is that it’s not about the words. Your aesthetics, vibes, the things you care about will be communicated non-verbally.
The simple heuristic: typical 5-year-old human males are just straightforwardly correct about what is, and is not, fun at a party. (Sex and adjacent things are obviously a major exception to this. I don’t know of any other major exceptions, though there are minor exceptions.) When in doubt, find a five-year-old boy to consult for advice.
Some example things which are usually fun at house parties:
Dancing
Swordfighting and/or wrestling
Lasertag, hide and seek, capture the flag
Squirt guns
Pranks
Group singing, but not at a high skill level
Lighting random things on fire, especially if they explode
Building elaborate things from whatever’s on hand
Physical party games, of the sort one would see on Nickelodeon back in the day
Some example things which are usually not fun at house parties:
Just talking for hours on end about the same things people talk about on LessWrong, except the discourse on LessWrong is generally higher quality
Just talking for hours on end about community gossip
Just talking for hours on end about that show people have been watching lately
Most other forms of just talking for hours on end
This message brought to you by the wound on my side from taser fighting at a house party last weekend. That is how parties are supposed to go.
One of my son’s most vivid memories of the last few years (and which he talks about pretty often) is playing laser tag at Wytham Abbey, a cultural practice I believe instituted by John and which was awesome, so there is a literal five-year-old (well seven-year-old at the time) who endorses this message!
My guess is laser tag was actually introduced to Wytham Abbey during their Battleschool, not by John. (People familiar with the history can correct me.)
The reason the place is designed so that you can’t talk is to make you buy more drinks. (Because when people start talking a lot, they forget to keep drinking.) It may or may not have a positive side effect on you having fun, but it wasn’t designed with your fun as a goal.
Would be interesting to see a survey of five year olds to see if the qualifiers in your opening statement are anything like correct. I doubt you need to filter to just boys, for example.
For me, it depends on whether the attendees are people I’ve never met before, or people I’ve known my entire life. If it’s people I don’t know, I do like to talk to them, to find out whether we have anything interesting to exchange. If it’s someone I’ve known forever, then things like karaoke or go-karting are more fun than just sitting around and talking.
Snowball fights/rolling big balls of snow fall into the same genre, if good snow is available.
I guess this gives me a decent challenge for the next boring party: Turn the party into something fun as a project. Probably the best way to achieve this is to grab the second-most on-board person and escalate from there, clearly having more fun than the other people?
Personally, I’m fairly committed to [talking a lot]. But I do find it incredibly difficult to do at parties. I’ve been trying to figure out why, but the success rate for me plus [talking a lot] at parties seems much lower than I would have hoped.
I’ll add to this list: If you have a kitchen with a tile floor, have everyone take their shoes off, pour soap and water on the floor, and turn it into a slippery sliding dance party. It’s so fun. (My friends and I used to call it “soap kitchen” and it was the highlight of our house parties.)
After most people had left a small house party I was throwing, my close friends and I stayed and started pouring ethanol from a bottle on random surfaces and things and burning it. It was completely stupid, somewhat dangerous (some of us sustained some small burns), utterly pointless, very immature, and also extremely fun.
Epistemic Status: @GeneSmith or @sarahconstantin or @kman or someone else who knows this stuff might just tell me where the assumptions underlying this gambit are wrong.
I’ve been thinking about the proposals linked above, and asked a standard question: suppose the underlying genetic studies are Not Measuring What They Think They’re Measuring. What might they be measuring instead, how could we distinguish those possibilities, and what other strategies does that suggest?
… and after going through that exercise I mostly think the underlying studies are fine, but they’re known to not account for most of the genetic component of intelligence, and there are some very natural guesses for the biggest missing pieces, and those guesses maybe suggest different strategies.
The Baseline
Before sketching the “different gambit”, let’s talk about the baseline, i.e. the two proposals linked at top. In particular, we’ll focus on the genetics part.
GeneSmith’s plan focuses on single nucleotide polymorphisms (SNPs), i.e. places in the genome where a single base-pair sometimes differs between two humans. (This type of mutation is in contrast to things like insertions or deletions.) GeneSmith argues pretty well IMO that just engineering all the right SNPs would be sufficient to raise a human’s intelligence far beyond anything which has ever existed to date.
GeneSmith cites this Steve Hsu paper, which estimates via a simple back-of-the-envelope calculation that there are probably on the order of 10k relevant SNPs, each present in ~10% of the population on average, each mildly deleterious.
Conceptually, the model here is that IQ variation in the current population is driven mainly by mutation load: new mutations are introduced at a steady pace, and evolution kills off the mildly-bad ones (i.e. almost all of them) only slowly, so there’s an equilibrium with many random mildly-bad mutations. Variability in intelligence comes from mostly-additive contributions from those many mildly-bad mutations. Important point for later: the arguments behind that conceptual model generalize to some extent beyond SNPs; they’d also apply to other kinds of mutations.
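As a sanity check on that picture, here’s a minimal simulation of the additive mutational-load model. It uses the ~10k / ~10% figures from the back-of-the-envelope above, equal effect sizes, and pure additivity; all of that is a simplifying assumption, not a claim about the real architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps, freq = 1_000_000, 10_000, 0.10

# Number of mildly deleterious variants each person carries (equal effects for simplicity).
load = rng.binomial(n_snps, freq, size=n_people)
trait_iq = 100 - 15 * (load - load.mean()) / load.std()   # fewer bad variants -> higher score

print("mean load: %.0f variants, sd: %.1f" % (load.mean(), load.std()))
best_sampled = trait_iq.max()
zero_load = 100 - 15 * (0 - load.mean()) / load.std()     # hypothetical variant-free genome
print("best sampled person: %.0f   zero-load genome: %.0f" % (best_sampled, zero_load))
```

The “zero-load” genome comes out hundreds of points above the best person in a million-person sample, which is the “far beyond anything which has ever existed” point from above.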
What’s Missing?
Based on a quick googling, SNPs are known to not account for the majority of genetic heritability of intelligence. This source cites a couple others which supposedly upper-bound the total SNP contribution to about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don’t know the details of that method). Estimates of the genetic component of IQ tend to be 50-70%, so SNPs are about half or less.
Notably, IIRC, attempts to identify which mutations account for the rest by looking at human genetic datasets have also mostly failed to close the gap. (Though I haven’t looked closely into that piece, so this is a place where I’m at particularly high risk of being wrong.)
So what’s missing?
Guess: Copy Count Variation of Microsats/Minisats/Transposons
We’re looking for some class of genetic mutations, which wouldn’t be easy to find in current genetic datasets, have mostly-relatively-mild effects individually, are reasonably common across humans, and of which there are many in an individual genome.
Guess: sounds like variation of copy count in sequences with lots of repeats/copies, like microsatellites/minisatellites or transposons.
Most genetic sequencing for the past 20 years has been shotgun sequencing, in which we break the genome up into little pieces, sequence the little pieces, then computationally reconstruct the whole genome later. That method works particularly poorly for sequences which repeat a lot, so we have relatively poor coverage and understanding of copy counts/repeat counts for such sequences. So it’s the sort of thing which might not have already been found via sequencing datasets, even though at least half the genome consists of these sorts of sequences.
Notably, these sorts of sequences typically have unusually high mutation rates. So there’s lots of variation across humans. Also, there’s been lots of selection pressure for the effects of those mutations to be relatively mild.
What Alternative Strategies Would This Hypothesis Suggest?
With SNPs, there’s tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there’s a relatively small set of different sequences. So the engineering part could be quite a lot easier, if we don’t need to do different things with different copies. For instance, if the problem boils down to “get rid of live L1 transposons” or “lengthen all the XYZ repeat sequences”, that would probably be simpler engineering-wise than targeting 10k SNPs.
The flip side is that there’s more novel science to do. The main thing we’d want is deep sequencing data (i.e. sequencing where people were careful to get all those tricky high-copy parts right) with some kind of IQ score attached (or SAT, or anything else highly correlated with g-factor). Notably, we might not need a very giant dataset, as is needed for SNPs. Under (some versions of) the copy count model, there aren’t necessarily thousands of different mutations which add up to yield the roughly-normal trait distribution we see. Instead, there’s independent random copy events, which add up to a roughly-normal number of copies of something. (And the mutation mechanism makes it hard for evolution to fully suppress the copying, which is why it hasn’t been selected away; transposons are a good example.)
So, main steps:
Get a moderate-sized dataset of deep sequenced human genomes with IQ scores attached.
Go look at it, and see if there’s something obvious like “oh hey, centromere size correlates strongly with IQ!” or “oh hey, transposon count correlates strongly with IQ!” (see the sketch after this list).
If we find anything, go engineer that thing specifically, rather than 10k SNPs.
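To make step 2 concrete, here’s roughly what the first-pass “go look at it” could amount to. This is only a sketch: the filename and column names are hypothetical placeholders, and a real analysis would need covariates, ancestry controls, and multiple-testing care.

```python
import pandas as pd
from scipy import stats

# Hypothetical table: one row per deep-sequenced participant, with repeat/copy
# summaries extracted from the assemblies plus an IQ-like score.
df = pd.read_csv("deep_seq_cohort.csv")   # placeholder filename

candidate_features = ["live_L1_count", "total_transposon_count",
                      "mean_centromere_length", "minisat_total_repeats"]

for col in candidate_features:
    r, p = stats.pearsonr(df[col], df["iq_score"])
    print(f"{col:>26s}: r = {r:+.2f}  (p = {p:.1g})")

# Anything with a large, replicable correlation here would be the thing to
# chase mechanistically (and eventually engineer), instead of 10k SNPs.
```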
With SNPs, there’s tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there’s a relatively small set of different sequences.
No, rare variants are no silver bullet here. There’s not a small set, there’s a larger set—there would probably be combinatorially more rare variants because there are so many ways to screw up genomes beyond the limited set of ways defined by a single-nucleotide polymorphism, which is why it’s hard to either select on or edit rare variants: they have larger (harmful) effects due to being rare, yes, and account for a large chunk of heritability, yes, but there are so many possible rare mutations that each one has only a few instances worldwide which makes them hard to estimate correctly via pure GWAS-style approaches. And they tend to be large or structural and so extremely difficult to edit safely compared to editing a single base-pair. (If it’s hard to even sequence a CNV, how are you going to edit it?)
They definitely contribute a lot of the missing heritability (see GREML-KIN), but that doesn’t mean you can feasibly do much about them. If there are tens of millions of possible rare variants, across the entire population, but they are present in only a handful of individuals a piece (as estimated by the GREML-KIN variance components where the family-level accounts for a lot of variance), it’s difficult to estimate their effect to know if you want to select against or edit them in the first place. (Their larger effect sizes don’t help you nearly as much as their rarity hurts you.)
So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you’d be able to avoid that loss, which is meaningful! …in a tiny fraction of all embryos. On average, you’d just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.
If the genetic architecture had worked out otherwise, if there had instead been a lot of rare mutations which increased intelligence, then life would be a lot more convenient. Instead, it’s a lot of ‘sand in the gears’, and once you move past the easy specks of sand, they all become their own special little snowflakes.
This is why rare variants are not too promising, although they are the logical place to go after you start to exhaust common SNPs. You probably have to find an alternative approach like directly modeling or predicting the pathogenicity of a rare variant from trying to understand its biological effects, which is hard to do and hard to quantify or predict progress in. (You can straightforwardly model GWAS on common SNPs and how many samples you need and what variance your PGS will get, but predicting progress of pathogenicity predictors has no convenient approach.) Similarly, you can try very broad crude approaches like ‘select embryos with the fewest de novo mutations’… but then you lose most of the possible variance and it’ll add little.
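For the common-SNP side, the sort of straightforward modeling presumably looks something like the following back-of-envelope, using one standard approximation for how polygenic-score accuracy grows with sample size; the h² and effective-number-of-loci values here are assumptions for illustration, not claims.

```python
# Rough sketch: variance explained by a common-SNP polygenic score as GWAS
# sample size grows, under the approximation capture ~ N*h2 / (N*h2 + M_e).
# h2_snp and m_eff below are assumed placeholder values.
def pgs_variance_explained(n_samples, h2_snp=0.25, m_eff=60_000):
    capture = n_samples * h2_snp / (n_samples * h2_snp + m_eff)
    return h2_snp * capture   # share of phenotypic variance explained

for n in (100_000, 1_000_000, 10_000_000):
    print(f"N = {n:>10,d}:  PGS explains ~{pgs_variance_explained(n):.1%} of variance")
```

There is no comparably convenient curve to draw for rare-variant pathogenicity prediction, which is the asymmetry being pointed at.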
So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you’d be able to avoid that loss, which is meaningful! …in a tiny fraction of all embryos. On average, you’d just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.
That is relevant for pre-implantation diagnosis for parents and gene therapy at the population level. But for Kwisatz Haderach breeding purposes those costs are immaterial. There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right? We would not be interested in the effect of the ugliness, only in getting it out.
There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right?
Right.
If you are doing genome synthesis, you aren’t frustrated by the rare variant problems as much because you just aren’t putting them in in the first place; therefore, there is no need to either identify the specific ones you need to remove from a ‘wild’ genome nor make highly challenging edits. (This is the ‘modal genome’ baseline. I believe it has still not been statistically modeled at all.)
While if you are doing iterated embryo selection, you can similarly rely mostly on maximizing the common SNPs, which provide many SDs of possible improvement, and where you have poor statistical guidance on a variant, simply default to trying to select out against them and move towards a quasi-modal genome. (Essentially using rare-variant count as a tiebreaker and slowly washing out all of the rare variants from your embryo-line population. You will probably wind up with a lot in the final ones anyway, but oh well.)
Yeah, separate from both the proposal at top of this thread and GeneSmith’s proposal, there’s also the “make the median human genome” proposal—the idea being that, if most of the variance in human intelligence is due to mutational load (i.e. lots of individually-rare mutations which are nearly-all slightly detrimental), then a median human genome should result in very high intelligence. The big question there is whether the “mutational load” model is basically correct.
I didn’t read this carefully—but it’s largely irrelevant. Adult editing probably can’t have very large effects because developmental windows have passed; but either way the core difficulty is in editor delivery. Germline engineering does not require better gene targets—the ones we already have are enough to go as far as we want. The core difficulty there is taking a stem cell and making it epigenomically competent to make a baby (i.e. make it like a natural gamete or zygote).
I haven’t looked at any of the studies and also don’t know much about genomics so my guess might be completely wrong, but a different hypothesis that seems pretty plausible to me is:
Most of the variance of intelligence comes from how well different genes/hyperparameters-of-the-brain work together, rather than from them having individually independent effects on intelligence. E.g., as a made-up, specific, implausible example (I don’t know that much neuroscience): there could be different genes controlling the size, the synapse density, and the learning/plasticity rate of cortical columns in some region, and some combinations of those hyperparameters happen to work well together while others don’t fit quite as well.
So this hypothesis would predict that we haven’t found the remaining genetic component of intelligence yet because we haven’t had enough data to see which clusters of genes have good effects together, and we also didn’t know where to look for such clusters.
Reasonable guess a priori, but I saw some data from GeneSmith at one point which looked like the interactions are almost always additive (i.e. no nontrivial interaction terms), at least within the distribution of today’s population. Unfortunately I don’t have a reference on hand, but you should ask GeneSmith if interested.
I think Steve Hsu has written some about the evidence for additivity on his blog (Information Processing). He also talks about it a bit in section 3.1 of this paper.
So I only briefly read through the section of the paper, but I’m not really sure whether it applies to my hypothesis: my hypothesis isn’t about there being useful gene-combinations that were selected for, but just about there being gene-combinations that coincidentally work better, without strong selection pressure for those to quickly rise to fixation. (Also, yeah, for simpler properties like how much milk is produced I’d expect a much larger share of the variance to come from genes which have individual contributions. And for selection-based eugenics, the main relevant things are the genes which have individual contributions. (Though if we have the ability to do precise gene editing, we might be able to do better and see how to tune the hyperparameters to fit well together.))
Please let me know whether I’m missing something though.
(There might be a sorta annoying analysis one could do to test my hypothesis: on my hypothesis, the correlation between the intelligence of very intelligent parents and their children would be even a bit lower than on the just-independent-mutations hypothesis, because very intelligent people likely also got lucky in how their gene variants work together, but those properties would be unlikely to all be passed along and end up dominant.)
To clarify in case I’m misunderstanding: the effects are additive among the genes responsible for the part of the IQ variance we can explain so far, and we count that as evidence that the effects will also be additive for the remaining genetically caused IQ variance?
I didn’t look into how the data analysis in the studies was done, but my default guess is that this generalization does not work well, i.e. that additivity among the currently identified SNPs isn’t significant counterevidence to my hypothesis:
I’d imagine that studies just correlated individual gene variants with IQ and thereby found gene variants that have independent effects on intelligence. Or did they also look at pairwise or triplet gene-variant combinations and correlate those with IQ? (There would be quite a lot of pairs, and I’m not sure whether the current datasets are large enough to robustly distinguish the combinations that really have good/bad effects from false positives.)
One would of course expect that the effects of the gene variants which have independent effects on IQ are additive.
But overall, unless the studies did look for higher-order IQ correlations, the fact that the IQ variance we can explain so far comes from genes with independent effects isn’t significant evidence that the remaining genetically-caused IQ variance also comes from gene variants with independent effects, because we were bound to find the genes with independent effects much more readily.
(I think the above should be sufficient explanation of what I think but here’s an example to clarify my hypothesis:
Suppose gene A has variants A1 and A2 and gene B has B1 and B2. Suppose that A1 can work well with B1 and A2 with B2, but the other interactions don’t fit together that well (like badly tuned hyperparameters) and result in lower intelligence.
When we only look at e.g. A1 and A2, neither is independently better than the other—they are uncorrelated with IQ. Studies would need to look at combinations of variants to see that e.g. A1+B1 has a slight positive correlation with intelligence—and I’m doubting whether studies did that (and whether we have sufficient data to see the signal among the combinatorial explosion of possibilities), and it would be helpful if someone clarified to me briefly how studies did the data analysis.)
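To make the worry concrete, here’s a toy simulation (effect size, sample size, and allele frequencies are all made up) where the phenotype depends only on whether the A and B variants “match”, so each locus has essentially zero marginal effect and a per-SNP analysis finds nothing, while the interaction term carries the whole signal:

```python
# Toy epistasis example: phenotype depends only on whether variant A "matches" variant B.
# Marginal (single-locus) effects are ~zero, so per-SNP GWAS-style tests find nothing,
# but the A*B "match" term carries the signal. All parameters are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
a = rng.integers(0, 2, n)          # variant A: 0 = A1, 1 = A2
b = rng.integers(0, 2, n)          # variant B: 0 = B1, 1 = B2
match = (a == b).astype(float)     # "compatible" combinations A1+B1 and A2+B2
iq = 0.3 * match + rng.normal(0, 1, n)

def slope(x, y):
    # OLS slope of y on x (with intercept)
    x = x - x.mean()
    return (x @ (y - y.mean())) / (x @ x)

print("marginal effect of A:     %+.3f" % slope(a.astype(float), iq))
print("marginal effect of B:     %+.3f" % slope(b.astype(float), iq))
print("effect of the A==B match: %+.3f" % slope(match, iq))
# Typical output: the two marginal effects are ~0.00, the match effect is ~0.30.
```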
(Thanks. I don’t think this is necessarily significant evidence against my hypothesis (see my comment on GeneSmith’s comment).)
Another confusing relevant piece of evidence I thought I throw in:
Human intelligence seems to me to be very heavytailed. (I assume this is uncontroversial here; just look at the greatest scientists vs. merely great scientists.)
If variance in intelligence was basically purely explained by mildly-deleterious SNPs, this would seem a bit odd to me: if the average person had 1000 such SNPs, and then (using butt-numbers which might be very off) Einstein (+6.3 std) had only 800 and the average theoretical physics professor (+4 std) had 850, I wouldn’t expect the difference there to be that big.
It’s a bit less surprising on the model where most people have a few strongly deleterious mutations, and supergeniuses are the lucky ones that have only 1 or 0 of those.
It’s IMO even a bit less surprising on my hypothesis where in some cases the different hyperparameters happen to work much better with each other—where supergeniuses are in some dimensions “more lucky than the base genome” (in a way that’s not necessarily easy to pass on to offspring though because the genes are interdependent, which is why the genes didn’t yet rise to fixation). But even there I’d still be pretty surprised by the heavytail.
The heavytail of intelligence really confuses me. (Given that it doesn’t even come from sub-critical intelligence explosion dynamics.)
If each deleterious mutation decreases the success rate of something by an additive constant, but you need lots of sequential successes for intellectual achievements, then intellectual formidability is ~exponentially related to deleterious variants.
Yeah, I know; that’s why I said that if a major effect came through a few significantly deleterious mutations, this would be more plausible. But I feel like human intelligence is even more heavytailed than what one would predict given this hypothesis.
If you have many mutations that matter, then via central limit theorem the overall distribution will be roughly gaussian even though the individual ones are exponential.
(If I made a mistake maybe crunch the numbers to show me?)
(I initially misunderstood what you meant, in a way that made it sound like complete nonsense.)
I don’t understand what you’re trying to say. Can you maybe rephrase again in more detail?
Suppose people’s probability of solving a task is uniformly distributed between 0 and 1. That’s a thin-tailed distribution.
Now consider their probability of correctly solving 2 tasks in a row. That will have a sort of triangular distribution, which has more positive skewness.
If you consider e.g. their probability of correctly solving 10 tasks in a row, then the bottom 93.3% of people will all have less than 50%, whereas e.g. the 99th percentile will have 90% chance of succeeding.
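Checking those numbers (the uniform skill distribution and the 10-tasks-in-a-row setup are as stated above):

```python
# Skill p ~ Uniform(0, 1); probability of solving 10 independent tasks in a row is p**10.
k = 10

# Fraction of people whose 10-in-a-row success probability is below 50%:
# p**10 < 0.5  <=>  p < 0.5**(1/10)
threshold = 0.5 ** (1 / k)
print("fraction below 50%%: %.3f" % threshold)    # ~0.933

# 10-in-a-row success probability at the 99th percentile of skill (p = 0.99):
print("99th percentile:    %.3f" % 0.99 ** k)     # ~0.904

# Contrast with the median person (p = 0.5):
print("median person:      %.4f" % 0.5 ** k)      # ~0.001
```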
Conjunction is one of the two fundamental ways that tasks can combine, and it tends to make the tasks harder and rapidly make the upper tail do better than the lower tail, leading to an approximately-exponential element. Another fundamental way that tasks can combine is disjunction, which leads to an exponential in the opposite direction.
When you combine conjunctions and disjunctions, you get an approximately sigmoidal relationship. The location/x-axis-translation of this sigmoid depends on the task’s difficulty. And in practice, the “easy” side of this sigmoid can be automated or done quickly or similar, so really what matters is the “hard” side, and the hard side of a sigmoid is approximately exponential.
Is the following a fair paraphrasing of your main hypothesis? (I’m leaving out some subtleties with conjunctive successes, but please correct the model in that way if it’s relevant.):
"""
Each deleterious mutation multiplies your probability of succeeding at a problem/thought by some constant. Let’s for simplicity say it’s 0.98 for all of them.
Then the expected number of successes per time for a person is proportional to 0.98^num_deleterious_mutations(person).
So the model would predict that when person A has 10 fewer deleterious mutations than person B, then B would on average accomplish 0.98^10 ≈ 0.82 times as much as A in a given timeframe.
"""
I think this model makes a lot of sense, thanks!
In itself I think it’s insufficient to explain how heavytailed human intelligence is—there were multiple cases where Einstein seems to have been able to solve problems multiple times faster than the next runners-up. But I think if you use this model in a learning setting where success means “better thinking algorithms”, then having 10 fewer deleterious mutations is like having 1/0.82 ≈ 1.22 times longer effective training time, and there might also be compounding returns from having better thinking algorithms to getting more and richer updates to them.
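A toy sketch of that compounding-returns idea (all the constants, the tick count, and the functional form are made up; the point is just that a 0.82x learning-rate factor can compound into a much bigger gap):

```python
# Toy model: each "tick" of thinking improves your thinking algorithms a little,
# and the size of the improvement scales with how good your algorithms already are
# times a per-person quality factor 0.98**num_deleterious_mutations.
# All constants are made up; the point is just that small quality differences compound.

def skill_after(num_deleterious_mutations, ticks=10_000, gain=0.0005):
    quality = 0.98 ** num_deleterious_mutations
    skill = 1.0
    for _ in range(ticks):
        skill += gain * quality * skill   # better algorithms -> faster improvement
    return skill

for extra_mutations in (0, 10, 20):
    print(extra_mutations, "extra mutations -> skill %.1f" % skill_after(extra_mutations))
# With compounding, an ~18% per-tick difference in learning rate turns into a
# much larger gap in final skill than the raw 0.82x factor would suggest.
```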
Not sure whether this completely deconfuses me about how heavytailed human intelligence is, but it’s a great start.
I guess at least the heavytail is much less significant evidence for my hypothesis than I initially thought (though so far I still think my hypothesis is plausible).
It’s a pretty large part—somewhere between a third and half—just not a majority.
I was also tracking that specific hypothesis, which was why I specifically flagged “about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don’t know the details of that method)”. Again, I don’t know the method, but it sounds like it wasn’t dependent on details of the regression methods.
Continuing the “John asks embarrassing questions about how social reality actually works” series...
I’ve always heard (and seen in TV and movies) that bars and clubs are supposed to be a major place where single people pair up romantically/sexually. Yet in my admittedly-limited experience of actual bars and clubs, I basically never see such matching?
I’m not sure what’s up with this. Is there only a tiny fraction of bars and clubs where the matching happens? If so, how do people identify them? Am I just really, incredibly oblivious? Are bars and clubs just rare matching mechanisms in the Bay Area specifically? What’s going on here?
I get the impression that this is true for straight people, but from personal/anecdotal experience, people certainly do still pair up in gay bars/clubs.
Yeah, feels like the current zeitgeist in Anglo countries and upper-middle-class environments, at least, is that it is simply bad manners to ever approach anyone with romantic/sexual intentions unless it’s a context where everyone has explicitly agreed that’s what you’re there for (speed dating, dating app, etc.).
Well, I don’t have much recent experience of dating myself, so it’s second-hand. But also, this user specifically is talking about Bay Area, and if there’s a single place and single social circle in the world where I expect this to be closest to true, “educated well-off tech people in the Bay Area” is it.
I’m not saying this is the truth everywhere and with everyone. Also, even if it’s not out of an actual social custom, I think at this point lots of people still resort to the internet as a way of looking for dates simply because the possibility is there and it’s seemingly more direct (and lower effort). IIRC there’s data showing that the number of couples that met on the internet has dramatically increased over the last years, leaving almost all other methods behind.
I think people use the internet/apps for dating due to a combination of convenience in sorting/search, because it’s less awkward to be rejected online, and because it’s the path of least resistance, not because asking people out in person is considered rude.
It’s true that in middle-class/upper middle class circles, professional events/workplace is now considered ~off-limits for dating, which wasn’t true 30 years ago. However, that’s a big difference from what you originally said where only dating-specific events are okay.
People also do professional networking online + in dedicated networking events, but I don’t think it’s considered impolite to (eg) incidentally network in a ski lodge. Less effective, sure, but not impolite.
I’m also in the general Bay Area/tech/educated milieu, so I do have relevant anecdotal experience here[1].
eg I recently went on a few dates with a leftist girl I asked out at a stargazing thing. Neither of us thought it was impolite, I think. That said, it didn’t work out, and I guess I should’ve been able to figure that out a priori from stargazing not being the type of thing that’s sufficiently indicative of relationship compatibility.
TLDR: People often kiss/go home with each other after meeting in clubs, less so bars. This isn’t necessarily always obvious but should be observable when looking out for it.
OK, so I think most of the comments here don’t understand clubs (@Myron Hedderson’s comment has some good points though). As someone who has made out with a few people in clubs, and still goes from time to time I’ll do my best to explain my experiences.
I’ve been to bars and clubs in a bunch of places, mostly in the UK but also elsewhere in Europe and recently in Korea and South East Asia.
In my experience, bars don’t see too many hookups, especially since most people go with friends and spend most of their time talking to them. I imagine that one could end up pairing up at a bar if they were willing enough to meet new people and had a good talking game (and this also applied to the person they paired up with), but I feel like most of the actual action happens in clubs on the dancefloor.
I think matching can happen at just about any club in my experience. Most of the time it just takes the form of 2 people colliding (not necessarily literally), looking at each other, drunkenness making both much more obvious about their interest than usual, and then them spending a while making out with each other. Sometimes things go beyond that point. Mostly not, in my experience, although a friend recently told me that he rarely kisses girls in clubs and instead directly asks them home (apparently successfully).
I’ve seen enough people making out in clubs before to be confused as to why John hasn’t seen this sort of behaviour. I don’t know in what ways clubbing in the Bay Area is different from the UK, so I won’t speculate on that, but I think that there is sometimes a difference in attitude depending on the music being played. In particular, I think people are more likely to make out to pop/classics than to e.g. house. It may also just be that I’m more likely to kiss people when listening to music I enjoy.
Additional advice for clubs (heterosexual male):
Go there to enjoy the music (this may sound weird but enjoying clubs is very much a skill)
Don’t worry about pairing up with someone too much, this will remove opportunities to have fun (although you can still take actions which improve your odds)
Drink enough that you have no issues with dancing badly
When dancing, do literally any movement in time with the beat (ideally make the motions as varied as possible)
Humour is king: if something funny pops into your head, do it.
Good examples: miming the lyrics of a song (depending on the song), dancing with another guy (the more exaggerated, the more obvious it is you’re being funny), miming sex positions (you’d be shocked how many people in clubs are completely cool with this, and just find it entertaining)
If someone else does something entertaining, support them (apart from anything else, the more funny stuff is happening around you, the more you have to bounce off of)
These tips do tend to require some extroversion—I don’t know how good this advice is macroscopically but in the clubbing scene this tends to be achieved via alcohol
If getting with girls really is the priority, then be obvious (there’s always the caveat not to do things likely to upset people, but I think that in the context of a) LessWrong b) clubs, the advice is overwhelmingly on the side of being far more forward and less worried about misdemeanours)
Pick one girl and single her out, don’t hedge your bets. Read body language (it’ll be more obvious when everyone else is drunk, and hearing each other can be a pain)
If rejected, brush yourself off and try again (probably in another part of the club, although remember having fun is the main thing so don’t abandon a good group)
The centre of the circle is centre stage—go nuts here, this is your opportunity to entertain people with the dumbest idea that just occurred to you
Caveats: this is what works for me. I have found that people consistently comment that they enjoy nights out with me significantly more than average, and I have found I enjoy nights out more when I employ these methods. I have not tried this everywhere, and there have been places where I’ve felt a bit out of place (although I’d still argue I was having more fun than those around me).
I expect introverts to be scared by many of the ideas here, but I also feel like there are situations in life where acting more confident is universally better (public speaking is another example). Personally I’ve found this becomes easier with time and practise. Good luck all.
Edit: I just remembered I first got together with my ex-girlfriend at a bar. However we already knew each other and decided to meet up just the 2 of us, which is a somewhat different situation from most occasions I go to the bar.
How do you find good places and times to go? You just described exactly the sort of clubbing experience I most enjoy, but I’ve never had many close friends into it so I don’t really know where to look.
Yeah having the right friends to go with is important. I’ve recently finished university so that’s been easier for me than most, but in general I think it’s easier when going to an event with a decent number of people (I play ice hockey and so team/club dinners are a good example). With more people there’s a greater chance of there being a critical mass willing to go.
Aside from that I’ve recently been backpacking around Vietnam, Cambodia and Thailand and I’ve found that being in a hostel makes it incredibly easy to meet people and go out locally. This does require being comfortable in that environment though.
I think that all you really need is one friend who is willing to go with you, and they then become the main point of contact when you want to go.
It’s also possible to go alone, especially in communities like the backpacker community where it’s incredibly easy to meet people. This is generally a lot more sketchy in many places though as you have no backup if you e.g get spiked or drink too much.
Oh I have no problem going clubbing alone, I can have plenty of fun dancing with strangers. The hard part is finding the right club on the right night; AFAICT most of them are dead most nights. How do you solve that problem?
Oof honestly I feel like I mostly just kind of go and find a place with decent music that’s open. I normally find there’s at least one (or maybe my standards are just low), but I’d imagine that in places where that isn’t the case you’d be able to look on the good clubs websites to see when they have events.
I know that in Oxford clubs often have weekly theme nights, such as this one https://www.bridgeoxford.co.uk/wednesday. I’d imagine that a quick browse of your favourite clubs’ websites would give you a good idea of where to go when.
I’ve not done this myself* (my clubbing days were long ago now) but a few approaches:
1. If you live somewhere where some areas specialize in nightlife—bars, clubs, restaurants and even cool street scene—then just be a tourist there for a bit. You’ll see/find something that seems to fit for you.
2. There used to be “City Papers” that tended to focus on social life and what was happening during the week/month for people to learn about. So you’d hear about live music or popular DJs and where they were playing.
2a. The more current take, I assume, would be online versions of this.
3. Social apps that are about meetups (one is called just that), though I suspect even FB has something along these lines: groups you can join, or that are open to the public, which post what activities are happening and where and when the get-together occurs. Some will specifically state they are NOT about any hookup possibility, but others are about meeting others for more than the specific activity (the activity is more about the introduction and something to do rather than the whole reason for going).
4. Last, you might check for any pub crawls going on. Some of the stops will be good clubs to check out, and sometimes even joining the crawl will offer opportunities. Particularly true if you’re good at joining in with a new group of strangers—very good social skills required, as the group needs to want you to join.
* Well, I have used Meetups for getting together with others, but those were language-based, for learning and practicing, so anyone who seemed more interested in meeting people and other activities was discouraged or kicked out if overly obvious.
What’s the age range on clubbing? I’m newly single at 43 and I might have aged out of it, and a 43 year old trying to dance the way he did in high school usually looks stupid. (Or at least my late wife thought so.)
I think with enough enthusiasm anyone can go clubbing, and tbh imo stuff which looks stupid in a club just becomes entertaining. If you really feel embarrassed about it, one way to go about this is to play into the stupidity by really overexaggerating the moves to play into the humour.
I think with age the ick comes from older guys who come to look at young girls and nothing else. I have a mate who’s 49 and comes out clubbing with us, and is more enthusiastic than any of us on the dance floor and everyone loves it.
My late wife in particular thought my dancing was bad, which is why I brought it up; I mentioned the term “dad dancing” to her and she thought it was an appropriate description. (She happened to be nine years younger than I was.)
The point about making out is very valid, I’ve seen that plenty of times, and that should count as “pairing up sexually”. For whatever reason/no good reason, it didn’t occur to me to mention it in my longer comment.
From the perspective of someone who has never actually enjoyed the clubbing experience before, the above advice sounds like good advice for how to have a better time. :)
My brother met his spouse at a club in NYC, around 2008. If I recall the story correctly, he was “doing the robot” on the stage, and then she started “doing the robot” on the floor. They locked eyes, he jumped down and danced over to her, and they were married a couple years later.
(Funny to think we’re siblings, when we have such different personalities!)
In my experience, bars these days (in the era of dating apps) are less a place where straight people pair up with strangers, and more a place where they:
Go on a first/second/third date with someone they know from a dating app or mutual friend or interest group, and maybe hook up with that person
Go with a friend group, and meet someone through a mutual friend, and maybe pair up with that person
But fwiw, it still seems reasonably common for people to pair up with strangers in bars/clubs where I live. I don’t think bars/clubs are the perfect solution to meeting people romantically/sexually, but they have some advantages:
Alcohol makes people more willing to approach strangers, open up personally, and judge potential partners less critically
Bars/clubs (at least in major cities) are mostly filled with strangers you won’t see again, reducing the (perceived) costs of rejection or committing some faux pas
Bars/clubs being dark and noisy makes it easier to approach someone without a lot of other people observing you
In bars and especially clubs, (good) music creates an atmosphere where people (who like that music) feel mildly intoxicated
Clubs in particular involve quite a lot of moving around (across/to/from dance floors, bars, toilets, and chill-out areas) that create opportunities to meet/interact with strangers
That said, I think 10+ years ago bars/clubs were more of a place where people paired up with strangers. My sense is that this has changed largely due to dating apps, not by making it less acceptable to approach strangers, but more that dating apps offer an (often superior) alternative way of getting dates, which means people go to bars/clubs less to meet strangers and more to spend time with friends/partners. And even if a person is still interested in going to bars/clubs to meet strangers, it is harder when most other people are just there with their friend groups and not interested in interacting with strangers.
(Bars/clubs for gay people, and especially gay men, are different. There, it is still pretty common with random hook-ups, I should think.)
Yet in my admittedly-limited experience of actual bars and clubs, I basically never see such matching?
Personal experience—in uni, went to bars/clubs, I was generally pretty incompetent at the flirting thing, but danced with a bunch of girls, got numbers and didn’t really know what to do after that.
A handsome, charismatic friend of mine got together with a number of women, went home with a few, etc.
A quick search found this chart from a 2019 study on how couples meet. It looks like the fraction of couples who met at a bar has actually been going up in recent decades which is not what I would have predicted. But I don’t know how reliable this study is.
A member of my family (rather normie-ish) met his current girlfriend in a bar. A similar story with an EA acquaintance. But I don’t hear stories like that very often, and also caveat that these were in Eastern Europe (Poland and Estonia, respectively).
It’s out of date given how much dating has moved to apps.
And before apps, it was friends/families, and various communities like church, more than it was bars.
Whether cause or effect, alcohol interest has gone down, so it’s only weirder to picture meeting someone in a bar.
There’s some moral panic that Gen Z doesn’t know how to talk to people in person which interacts with your question somehow. Like people will excessively mourn the loss of bar dating, when actually meeting dates while drunk sort of sucks. I’m sure there’s a kernel-of-truth here, but generational moral panics are pretty much the default.
It’s extremely sitcom-friendly in a way that staring at phones and computers isn’t.
By the time it’s in TV/movies, it’s already heavily romanticized. The best example is “Cheers” which is a bar-as-church show. But the show is made when that type of community is already bygone.
When I was dating 10 years ago people still romanticized “meeting someone organically,” but not in any serious way that would stop them from app dating.
If you want to understand how a cool bar thinks, just take the way every other business thinks—”please the customer and they’ll come back”—and do the opposite.
I call it the You’re A Little Bitch strategy. Being forced to stand in line like a tamed snail—often when it’s cold and even sometimes when the bar is empty—is your first taste of the You’re a Little Bitch strategy.
While you wait, you’ll watch several all-girl groups walk to the front of the line without waiting, where the bouncer opens the rope and lets them in. Ahead of you. Because you’re a little bitch.
When you finally get to the front, you’ll notice there’s no sign with the bar’s name anywhere, because the bar likes to watch its little bitch customers go through extra trouble to find them.
You’re then asked for your ID by someone who may not have been the biggest dick in your high school—but he was the biggest dick in someone’s high school.
My model is that the primary service the Cool Bars provide is gatekeeping, so if you’re not the kind of person big spenders want to be seen with (pretty girls and impressive men) it’s going to be a hassle.
I can’t make strong claims here, as I go to bars and clubs fairly rarely. But I second the observation that it might be different in urban vs. rural areas, or (I add) different based on type of club. For example, the bar in my dad’s family’s extremely small hometown is the local gathering spot for those who want to have a beer with friends, which is very different from the loud, confusing, crowded dance clubs where you’re packed in like sardines with people you don’t know and can’t even see clearly. I think a valid analysis has to segregate by type of bar/club. The small-town bar I’m thinking of does have live entertainment and dancing (also darts, which wouldn’t work in a darkened environment where many people are quite drunk), but it’s a very different scene.
With respect specifically to the loud, dark, crowded places, lots of people find those off-putting and don’t go, or go rarely. It is fairly common advice to look elsewhere than bars/clubs for dates. But: for someone who is young and anxious and not very sure how to meet people for dates/sex, going out with friends and getting moderately to very intoxicated in a place where you will also meet people you don’t know who are in a similar situation is a way to overcome that barrier. And the fact that you can’t really tell what’s going on 10 feet away, can’t hear what other people are saying very well, and everyone expects this to be an environment where people are drinking, means this is a more forgiving environment in which to do/try things that might be judged inappropriate and/or unacceptable elsewhere. If you do something very obvious to indicate your sexual interest in someone in most public places, security may be called, but on the dance floor of a club, standards of acceptable behaviour are more lax, and behaviours themselves are less consistently observable. Also, if you try something with one person and get rebuffed, few to none of the other people will know it happened, so you can try again with someone else shortly thereafter. So my sense is that there are (or were, last time I checked, which admittedly was a few years ago) a lot of young (late teens to early 20s) people bumbling their drunken way through social interactions in clubs. I’m sure plenty of them do leave the club together and have sex, but it’s hard to know for sure.

Another thought: by reputation, and it seems by design, clubs are places where bad decisions are made. So if you want to stress less about your decision-making around sex and just go have some with someone fairly random who you don’t know well (many people will not want to do this, but some do), clubs give you license to do so. Or, if you think of yourself as not particularly worthy of the attention of those you may be interested in, and so figure you might have a better shot if everyone’s drunk and can’t see each other very well, a club is a place where this will be true.
I started this whole train of thought by considering the sentence “Yet in my admittedly-limited experience of actual bars and clubs, I basically never see such matching?”, and thinking that that’s true for me as well, but I do (did, when I went) see people trying to get laid—and by its nature, the environment of a club is not conducive to my monitoring the social interactions of the people around me to see how they’re going, so I wouldn’t expect to know for sure based on what I observe while in a club, who went home with who. You learn that after the fact, if your friends tell you they hooked up.
I also saw a lot of guys sitting on the sidelines and drinking, trying to build up the “liquid courage” to initiate a conversation with someone they might like the look of, and a lot of women dancing in groups, so that they can be visible without being too vulnerable.
If you don’t like alcohol but can act disinhibited anyway, does that work too? (Also there’s the issue of whether your partner is too intoxicated to give consent...)
I am not the right person to ask about what works well in clubs, as I wouldn’t say my experiences at clubs were particularly successful or enjoyable, but I very much doubt anyone would kick you out of a club for not drinking or anything like that, so give it a shot and see how it goes? You get to decide what “works” for you, in this situation, and if you had a good time that’s a success.
As for the issue of consent while very intoxicated, yes that is an issue.
I very much doubt anyone would kick you out of a club for not drinking or anything like that, so give it a shot and see how it goes
I got in a few dance battles in clubs while sober, was pretty fun. Had my first crowdsurf while sober in a club too.
The fun sober club experience very much depends on good music, being in the mood, being with friends who are very fun and you trust very deeply, etc, imo. oh, and being the kind of person who really likes music, dancing, kinda enjoys doing dumb shit, etc
This might be a cultural/region-based thing. Stop by a bar in Alabama, or even just somewhere rural, and I think there might be more use of bars as matchmaking.
I liked the explanation as provided in “Mate: Become The Man Women Want”.
Chapter 17 has a whole section on bars and clubs. In particular:
There are virtually no cultures in history that expected their young people to find mates by throwing them randomly together into dark, noisy, threatening environments, with no structured activities or reasons for interacting, and hoping they’d sort themselves out into viable pairs. Bars and clubs present the exact opposite of a safe, easy, stress-free way to meet potential mates, to display your traits and proofs, and to work your way through a normal courtship process.
Probably very true on one level (but the young need some of that type of random experience to even learn what they want or who they want to be or be with).
But I’m not sure that is relevant for John’s question; perhaps I’ve taken his query incorrectly, and it’s not about just meeting someone new for some unspecified level of commitment (e.g., just a short-term hookup), but about where to meet his next long-term partner.
My primary motivation was actually just to understand how the world works; I didn’t necessarily plan to use that information to meet anyone at all. I just noticed I was confused about something and wanted to figure out what was going on.
TBF I always felt that if you wanted to find someone, “place where you have to make your throat hurt to speak even a few simple words” ain’t it, but I’m not known for my social prowess so I guessed maybe it was just me.
If you upload a human and let them augment themselves, would there be any u? The preferences would be a tangled mess of motivational subsystems. And yet the upload could be very good at optimizing the world. Being steered internally by a tangled mess of motivational subsystems seems to be a property that would pick out many minds from the set of all possible minds, many of which I’d expect to be quite different from a human mind. And I don’t see why this property should make a system worse at optimizing the world in principle.
Imagine you are an upload that has been running for very very long, and that you basically have made all of the observations that you can make about the universe you are in. And then imagine that you also have run all of the inferences that you can run on the world model that you have constructed from these observations.
At that point, you will probably not change what you think is the right thing to do anymore. You will have become reflectively stable. This is an upper bound for how much time you need to become reflectively stable, i.e. the point where you won’t change your u anymore.
Now depending on what you mean with strong AGI, it would seem that that can be achieved long before you reach reflective stability. Maybe if you upload yourself, and can copy yourself at will, and run 1,000,000 times faster, that could already reasonably be called a strong AGI? But then your motivational systems are still a mess, and definitely not reflectively stable.
So if we assume that we fix u at the beginning as the thing that your upload would like to optimize the universe for when it is created, then “give u() up”, and “let u go down” would be something the system will definitely do. At least I am pretty sure I don’t know what I want the universe to look like right now unambiguously.
Maybe I am just confused because I don’t know how to think about a human upload in terms of having a utility function. It does not seem to make any sense intuitively. Sure you can look at the functional behavior of the system and say “Aha it is optimizing for u. That is the revealed preference based on the actions of the system.” But that just seems wrong to me. A lot of information seems to be lost when we are just looking at the functional behavior instead of the low-level processes that are going on inside the system. Utility functions seem to be a useful high-level model. However, it seems to ignore lots of details that are important when thinking about the reflective stability of a system.
One of the classic conceptual problems with a Solomonoff-style approach to probability, information, and stat mech is “Which Turing machine?”. The choice of Turing machine is analogous to the choice of prior in Bayesian probability. While universality means that any two Turing machines give roughly the same answers in the limit of large data (unlike two priors in Bayesian probability, where there is no universality assumption/guarantee), they can be arbitrarily different before then.
My usual answer to this problem is “well, ultimately this is all supposed to tell us things about real computational systems, so pick something which isn’t too unreasonable or complex for a real system”.
But lately I’ve been looking at Aram Ebtekar and Marcus Hutter’s Foundations of Algorithmic Thermodynamics. Based on both the paper and some discussion with Aram (along with Steve Petersen), I think there’s maybe a more satisfying answer to the choice-of-Turing-machine issue in there.
Two key pieces:
The “Comparison against Gibbs-Shannon entropy” section of the paper argues that uncomputability is a necessary feature, in order to assign entropy to individual states and still get a Second Law. The argument says: if there exists a short program which can provably find and output a high-entropy string S, then we can physically instantiate a machine to run that short program. Then, when that physical machine spits out the high-entropy string S, S could be used to erase another copy of S. In other words, there is some high-entropy state (S) which this physical machine + program could steer into a low-entropy state.
As Aram pointed out, most of the bounds have a constant for the complexity of the laws of physics. If we choose a machine for which the laws of physics have high complexity, then the bounds are quantitatively trash.
The first piece is a part of the theory which can only bind to reality insofar as our chosen Turing machine is tractable to physically implement. The second piece is a part of the theory which can only bind to reality insofar as our physics can be tractably implemented on our chosen Turing machine.
In other words: in order for this thermodynamic theory to work well, we need to choose a Turing machine which is “computationally equivalent to” physics, in the sense that our physics can run the machine without insane implementation size, and the machine can run our physics without insane implementation size.
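In symbols, via the usual invariance theorem (with “physics” as informal shorthand for whatever machine our universe most naturally implements; this is my gloss, not the paper’s notation):

```latex
% Invariance theorem: for universal machines $U_1, U_2$ and any string $x$,
\[
  K_{U_1}(x) \;\le\; K_{U_2}(x) + c(U_1, U_2),
\]
% where $c(U_1, U_2)$ is the length of a $U_1$-program that simulates $U_2$.
% The requirement above is that both cross-simulation constants be small:
\[
  c(U,\ \mathrm{physics}) \ \text{small} \qquad\text{and}\qquad c(\mathrm{physics},\ U) \ \text{small},
\]
% i.e.\ $U$ can cheaply run our physics and our physics can cheaply run $U$;
% otherwise the additive constants swamp the thermodynamic bounds.
```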
I’m still wrapping my head around all the pieces here, so hopefully I (or, better yet, someone else) will write up a more clear explainer in the future. But this smells really promising to me. Not just for purposes of Solomonoff thermodynamics, but also as a more principled way to tackle bounded rationality of embedded systems.
Notably that post has a section arguing against roughly the sort of thing I’m arguing for:
Making the definition of what constitutes a low level language dependent on laws of physics is removing it from the realm of mathematics and philosophy. It is not a property of the language any more, but a property shared by the language and physical reality.
My response would be: yes, what-constitutes-a-low-level-language is obviously contingent on our physics and even on our engineering, not just on the language. I wouldn’t even expect aliens in our own universe to have low-level programming languages very similar to our own. Our low level languages today are extremely dependent on specific engineering choices made in the mid 20th century which are now very locked in by practice, but do not seem particularly fundamental or overdetermined, and would not be at all natural in universes with different physics or cultures with different hardware architecture. Aliens would look at our low-level languages and recognize them as low-level for our hardware, but not at all low-level for their hardware.
Analogously: choice of a good computing machine depends on the physics of one’s universe.
I do like the guy’s style of argumentation a lot, though.
the machine can run our physics without insane implementation size.
I’m well out of my depth here, and this is probably a stupid question, but given the standard views of the “known” part of our physics, does that mean that the machine can do operations on arbitrary, fully precise complex numbers in constant time?
The continuous state-space is coarse-grained into discrete cells where the dynamics are approximately markovian (the theory is currently classical) & the “laws of physics” probably refers to the stochastic matrix that specifies the transition probabilities of the discrete cells (otherwise we could probably deal with infinite precision through limit computability)
The current theory is based on classical hamiltonian mechanics, but I think the theorems apply whenever you have a markovian coarse-graining. Fermion doubling is a problem for spacetime discretization in the quantum case, so the coarse-graining might need to be different. (E.g. coarse-grain the entire hilbert space, which might have locality issues but probably not load-bearing for algorithmic thermodynamics)
On outside view, quantum reduces to classical (which admits markovian coarse-graining) in the correspondence limit, so there must be some coarse-graining that works
In practice, we only ever measure things to finite precision. To predict these observations, all we need is to be able to do these operations to any arbitrary specified precision. Runtime is not a consideration here; while time-constrained notions of entropy can also be useful, their theory becomes messier (e.g., the 2nd law won’t hold in its current form).
Good question, it’s the right sort of question to ask here, and I don’t know the answer. That does get straight into some interesting follow-up questions about e.g. the ability to physically isolate the machine from noise, which might be conceptually load-bearing for things like working with arbitrary precision quantities.
I’ve been thinking about it in terms of “but which language are we using to compute the complexity of our universe/laws of physics?”. Usually I likewise just go “only matters up to an additive constant, just assume we’re not using a Turing tarpit and we’re probably good”. If we do dig into it, though, what can we conclude?
Some thoughts:
What is the “objectively correct” reference language?
We should, of course, assume that the algorithm computing our universe is simple to describe in terms of the “natural” reference language, due to the simplicity prior. I. e., it should have support for the basic functions our universe’s physics computes. I think that’s already equivalent to “the machine can run our physics without insane implementation size”.
On the flip side, it’s allowed to lack support for functions our universe can’t cheaply compute. For example, it may not have primitive functions for solving NP-complete problems. (In theory, I think there was nothing stopping physics from having fundamental particles that absorb Traveling Salesman problems and near-instantly emit their solutions.)
Now suppose we also assume that our observations are sampled from the distribution over all observers in Tegmark 4. This means that when we’re talking about the language/TM underlying it, we’re talking about some “natural”, “objective” reference language.
What can we infer about it?
First, as mentioned, we should assume the reference language is not a Turing tarpit. After all, if we allowed reality to “think” in terms of some arbitrarily convoluted Turing-tarpit language, we could arbitrarily skew the simplicity prior.
But what is a “Turing tarpit” in that “global”/”objective” sense, not defined relative to some applications/programs? Intuitively, it feels like “one of the normal, sane languages that could easily implement all the other sane languages” should be possible to somehow formalize...
Which is to say: when we’re talking about the Kolmogorov complexity of some algorithm, in what language are we measuring it? Intuitively, we want to, in turn, pick one of the “simplest” languages to define.[1] But what language do we pick for measuring this language’s complexity? An infinite recursion follows.
Intuitively, there’s perhaps some way to short-circuit that recursion. (Perhaps by somehow defining the complexity of a language by weighing its complexity across “all” languages while prioritizing the opinions of those languages which are themselves simple in terms of whatever complexity measure this expression defines? Or something along those lines; circular definitions aren’t always a problem. (Though see an essay Tsvi linked to which breaks down why many of those definitions don’t work.))
Regardless, if something like this is successful, we’ll get a “global” definition of what counts as a simple/natural language. This would, in turn, allow us to estimate the “objective” complexity of various problems, by measuring the length of their solutions in terms of that natural language (i. e., the length of the execution trace of a computation solving the problem). This would perhaps show that some problems are “objectively” hard, such as some theoretical/philosophical problems or the NP-complete problems.
The speed prior
What if we try to compute the complexity not of the laws of physics, but of a given observer-moment/universe-state, and penalize the higher-complexity ones?
In chaotic systems, this actually works out to the speed prior: i. e., to assuming that the later steps of a program have less realityfluid than the early ones. Two lines of reasoning:
The farther in time a state is,[2] the more precisely you have to specify the initial conditions in order to hit it.
Justification: Suppose the program’s initial state is parametrized by real numbers. As it evolves, ever-more-distant decimal digits become relevant. This means that, if you want to simulate the universe on a non-analog computer (i. e., a computer that doesn’t use unlimited-precision reals) from t=0 to t=n starting from some initial state S0, with the simulation error never exceeding some value, the precision with which you have to specify S0 scales with n. Indeed, as n goes to infinity, so does the needed precision (i. e., the description length).
Aside from picking the initial state that generates the observation, you also have to pinpoint that observation in the execution trace of the program. It can be as easy as defining the time-step (if you’re working with classical mechanics), or as difficult as pointing at a specific Everett branch. And pinpointing generally gets more expensive with time (even in the trivial case of “pick a time-step”, the length of the number you have to provide grows).
Anthropically, this means that the computations implementing us are (relatively) stable, and produce “interesting” states (relatively) quickly/in few steps.
Anyway, digging into the paper now...
1. Oh, I see it’s likewise concerned with the description length of states:
Gács [23] defines the coarse-grained algorithmic entropy of any individual state: roughly speaking, it is the number of bits of information that a fixed computer needs in order to identify the state’s coarse-grained cell. For example, a state in which all particles are concentrated in one location would have low entropy, because the repeated coordinates can be printed by a short program. If the coarse graining in question is Markovian, then Levin’s [24] law of randomness conservation says that the algorithmic entropy seldom decreases. In physical terms, we will come to see this as a vast generalization of the second law of thermodynamics
2. The way the paper justifies the second law of thermodynamics is neat.
My understanding of that
Suppose the microstate of a system is defined by a (set of) infinite-precision real numbers, corresponding to e. g. its coordinates in phase space.
We define the coarse-graining as a truncation of those real numbers: i. e., we fix some degree of precision.
That degree of precision could be, for example, the Planck length.
At the microstate level, the laws of physics may be deterministic and reversible.
At the macrostate level, the laws of physics are stochastic and irreversible. We define them as a Markov process, with transition probabilities P(x,y) defined as “the fraction of the microstates in the macrostate x that map to the macrostate y in the next moment”.
Over time, our ability to predict what state the system is in from our knowledge of its initial coarse-grained state + the laws of physics degrades.
Macroscopically, it’s because of the properties of the specific stochastic dynamic we have to use (this is what most of the paper is proving, I think).
Microscopically, it’s because ever-more-distant decimal digits in the definition of the initial state start influencing dynamics ever stronger. (See the multibaker map in Appendix A, the idea of “microscopic mixing” in a footnote, and also apparently Kolmogorov-Sinai entropy.)
That is: in order to better pinpoint farther-in-time states, we would have to spend more bits (either by defining more fine-grained macrostates, or maybe by locating them in the execution trace).
Thus: stochasticity, and the second law, are downstream of the fact that we cannot define the initial state with infinite precision.
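To illustrate the “more precision needed per unit of time” point numerically, here’s a toy example with the logistic map standing in for a chaotic coarse-grained system (the choice of map and of coarse-graining is mine, not the paper’s):

```python
# How many bits of the initial condition do you need to predict a chaotic system
# out to time t? Illustration with the logistic map x -> 4x(1-x), which is chaotic.
# The choice of map, and of "prediction = correct coarse-grained cell (x < 0.5 or not)",
# is just for illustration.

def trajectory(x0, steps):
    xs = []
    x = x0
    for _ in range(steps):
        x = 4 * x * (1 - x)
        xs.append(x)
    return xs

def bits_needed(x0, horizon):
    """Smallest number of binary digits of x0 that still predicts the
    coarse-grained cell (x < 0.5 vs x >= 0.5) correctly out to `horizon` steps."""
    exact = [x < 0.5 for x in trajectory(x0, horizon)]
    for bits in range(1, 200):
        truncated = round(x0 * 2**bits) / 2**bits
        approx = [x < 0.5 for x in trajectory(truncated, horizon)]
        if approx == exact:
            return bits
    return None

x0 = 0.123456789
for horizon in (5, 10, 20, 40):
    print(horizon, "steps ->", bits_needed(x0, horizon), "bits of initial condition")
# The required precision grows roughly linearly with the horizon for this map
# (its Lyapunov exponent is ln 2, i.e. about one bit per step), matching the
# "description length of the initial state grows with t" intuition.
```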
3. The part about incomputability being necessary is also interesting, metaphysically.
Why must it be impossible to prove lower bounds on Kolmogorov complexity?
So, Kolmogorov complexity is upper-semicomputable. This means that, for some x:
You can prove an upper bound on K(x), just by finding a program that computes x.
You can only prove a lower bound on K(x) using a program p with K(p)>K(x). Meaning, you can’t use any fixed-size program (or formal system) to prove arbitrarily high complexity.
Imagine if it were otherwise, if some p much smaller than K(x) could prove a lower bound on K(x). Then you could use that p to cheaply pinpoint x: set up a program that goes through strings in order, uses p to prove lower bounds on their complexity, and outputs the first string whose proven complexity is above a threshold. But that search program is itself small, so it simultaneously gives an upper bound on K(x): since our small program was able to compute x, K(x) can’t be much higher than K(p), contradicting the proven lower bound.
Thus, in order for arbitrarily complex states/programs to exist, it must be impossible to prove that they are complex.
Why? Why does that have to be the case?
Intuitively, it’s because “proving” complexity requires pointing at specific features of the state x and explaining why exactly they are complex. That is, your formal language must be expressive enough to precisely talk about those features, in their full detail. If, however, you can get away with using some abstractions/generalizations to prove x’s complexity, that by definition decreases x’s complexity.
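For reference, the standard formal version of that argument (Chaitin’s incompleteness theorem), with c_F standing for the description length of the formal system plus a constant for the proof-searching machinery:

```latex
% Chaitin-style incompleteness: a sound formal system $F$ with description length
% $\approx c_F$ cannot prove "$K(x) > n$" for any $n$ much larger than $c_F$.
%
% Sketch: suppose $F$ proves "$K(x) > n$" for some $n$ with $n > c_F + O(\log n)$.
% Let $p$ be the program that enumerates $F$-proofs and outputs the first $x$
% for which a proof of "$K(x) > n$" is found. Then
\[
  K(x) \;\le\; |p| \;=\; c_F + O(\log n) \;<\; n,
\]
% contradicting the proved statement $K(x) > n$. So lower bounds on $K$ are
% provable only up to roughly the complexity of the prover itself.
```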
Impromptu poll: is structuring long-form comments this way, with collapsibles for topics, convenient, or should I have just used titles? Please react with thumbs up/down to the following statement: “collapsibles good”.
All that said,
But this smells really promising to me [...] as a more principled way to tackle bounded rationality of embedded systems.
I’m curious what you have in mind here. I’ve kind of been treating my thinking on those topics as basically recreational/a guilty pleasure. The possibility that there’s something actually useful here interests me.
If we need to use a Turing machine which is roughly equivalent to physics, then a natural next step is to drop the assumption that the machine in question is Turing complete. Just pick some class of machines which can efficiently simulate our physics, and which can be efficiently implemented in our physics. And then, one might hope, the sort of algorithmic thermodynamic theory the paper presents can carry over to that class of machines.
Probably there are some additional requirements for the machines, like some kind of composability, but I don’t know exactly what they are.
This would also likely result in a direct mapping between limits on the machines (like e.g. limited time or memory) and corresponding limits on the physical systems to which the theory applies for those machines.
The resulting theory would probably read more like classical thermo, where we’re doing thought experiments involving fairly arbitrary machines subject to just a few constraints, and surprisingly general theorems pop out.
Attempted abstraction and generalization: If we don’t know what the ideal UTM is, we can start with some arbitrary UTM U1, and use it to predict the world for a while. After (we think) we’ve gotten most of our prediction mistakes out of the way, we can then look at our current posterior, and ask which other UTM U2 might have updated to that posterior faster, using less bits of observation about (our universe/the string we’re predicting). You could think of this as a way to define what the ‘correct’ UTM is. But I don’t find that definition very satisfying, because the validity of this procedure for finding a good U2 depends on how correct the posterior we’ve converged on with our previous, arbitrary, U1 is. ‘The best UTM is the one that figures out the right answer the fastest’ is true, but not very useful.
Is the thermodynamics angle gaining us any more than that for defining the ‘correct’ choice of UTM?
We used some general reasoning procedures to figure out some laws of physics and stuff about our universe. Now we’re basically asking what other general reasoning procedures might figure out stuff about our universe as fast or faster, conditional on our current understanding of our universe being correct.
I think that’s roughly correct, but it is useful...
‘The best UTM is the one that figures out the right answer the fastest’ is true, but not very useful.
Another way to frame it would be: after one has figured out the laws of physics, a good-for-these-laws-of-physics Turing machine is useful for various other things, including thermodynamics. ‘The best UTM is the one that figures out the right answer the fastest’ isn’t very useful for figuring out physics in the first place, but most of the value of understanding physics comes after it’s figured out (as we can see from regular practice today).
Also, we can make partial updates along the way. If e.g. we learn that physics is probably local but haven’t understood all of it yet, then we know that we probably want a local machine for our theory. If we e.g. learn that physics is causally acyclic, then we probably don’t want a machine with access to atomic unbounded fixed-point solvers. Etc.
I agree that this seems maybe useful for some things, but not for the "Which UTM?" question in the context of debates about Solomonoff induction specifically, and I think that's the "Which UTM?" question we are actually kind of philosophically confused about. I don't think we are philosophically confused about which UTM to use in the context of us already knowing some physics and wanting to incorporate that knowledge into the UTM pick; we're confused about how to pick if we don't have any information at all yet.
I think roughly speaking the answer is: whichever UTM you've been given. I aim to write a more precise answer in an upcoming paper specifically about Solomonoff induction. The gist of it is that the idea of a "better UTM" U_2 is about as absurd as that of a UTM that has hardcoded knowledge of the future: yes, such UTMs exist, but there is no way to obtain one without first looking at the data, and the best way to update on data is already given by Solomonoff induction.
I also talked to Aram recently & he's optimistic that there's an algorithmic version of the generalized heat engine where the hot vs cold pools correspond to high vs low K-complexity strings. I'm quite interested in doing follow-up work on that.
Yes! I expect the temperatures won’t quite be proportional to complexity, but we should be able to reuse the thermodynamic definition of temperature as a derivative of entropy, which we’ve now replaced by K-complexity.
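For reference, the classical definition being reused here (the algorithmic analog is speculative, as discussed above):

```latex
% Classical thermodynamic temperature as a derivative of entropy:
\frac{1}{T} \;=\; \left( \frac{\partial S}{\partial E} \right)_{V}
% The proposal above swaps the entropy S for K-complexity, so the analogous
% "algorithmic temperature" would be read off from how K changes with the
% relevant conserved quantity (a speculative analogy, not an established result).
```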
This suggests that the choice of machines in algorithmic priors should be meaningful data for the purposes of agent foundations. It also gives some sort of explanation for how agents with different preferences tend to arrive at the same probabilities of events (after updating on a lot of data), and so agree on questions of fact, while still keeping different preferences and endorsing different decisions, and so disagree on questions of normativity.
This just talks about the bits of program available in our physics' subroutine of a simulation tree, rather than about a convergence that is universal across Teg 4, right?
(Probably the part it does cover is the useful part; I've just been wishing for some convergent UTM for the multiverse for philosophical satisfaction for a while.)
Yeah, I’m not convinced that the problem of induction is solvable at Teg 4. However, Universes with similar primitive laws and operations to ours will tend to produce intelligences with similar built-in priors. Thus, the right UTM to use is in a sense just the one that you happen to have in your possession.
I would have just answered "It depends on what you want to do", with there being no single best prior/Universal Turing Machine, because of theorems like the No Free Lunch theorem (and, more generally, a takeaway from learning/computational theory is that there is no one best prior that is always justified, contrary to the ancient philosophers' hopes).
I will propose an answer to No Free Lunch in an upcoming paper about Solomonoff induction. It is indeed subtle and important. In the interim, Schurz’ book “Hume’s Problem Solved” is a pretty good take. Schurz and Wolpert seem to argue against each other in their writing about NFL; I’ll explain later why I think they’re both right.
For a concrete answer on what the reference machine or low-level language should be, please see this 10-minute live discussion solely about the choice of the reference machine, starting at minute 20 and ending at minute 30: https://www.youtube.com/live/FNfGoQhf2Zw?si=Pg1ppTZmlw1S-3g9&t=1206 After one hour and 18 minutes, I spend another couple of minutes answering a question about the reference formalism. After one hour and 30 minutes into the video, someone asked me whether space aliens would agree with lambda calculus. And in my paper, I have a 3-page discussion of the choice of the reference machine, Section 3.2: https://arxiv.org/pdf/2506.23194 The reason I did not suggest deriving a reference machine from physics is that arriving at a consensus about the laws of physics will already have required the use of Occam's razor, common sense, or intuition. That makes the derivation seem circular, or otherwise equivalent to choosing a simple reference machine directly based on its commonsensical simplicity in the first place, just with extra steps through physics which might be redundant, depending on what exactly Aram's argument was about.
Working on a paper with David, and our acknowledgments section includes a thank-you to Claude for editing. Neither David nor I remember putting that acknowledgement there, and in fact we hadn't intended to use Claude for editing the paper at all, nor noticed it editing anything.
Were you by any chance writing in Cursor? I think they recently changed the UI such that it’s easier to end up in “agent mode” where it sometimes randomly does stuff.
Could someone explain the joke to me? If I take the above statement literally, some change made it into your document which nobody with access claims to have put there. You must have some sort of revision control, so you should at least know exactly who made that edit and when, which should already narrow it down a lot?
The joke is that Claude somehow got activated on the editor, and added a line thanking itself for editing despite us not wanting it to edit anything and (as far as we’ve noticed) not editing anything else besides that one line.
Does Overleaf have such AI integration that can get “accidentally” activated, or are you using some other AI plugin?
Either way, this sounds concerning to me, we are so bad at AI boxing that it doesn’t even have to break out, we just “accidentally” hand it edit access to random documents. (And especially an AI safety research paper is not something I would want a misaligned AI editing without close oversight.)
My MATS program people just spent two days on an exercise to “train a shoulder-John”.
The core exercise: I sit at the front of the room, and have a conversation with someone about their research project idea. Whenever I’m about to say anything nontrivial, I pause, and everyone discusses with a partner what they think I’m going to say next. Then we continue.
Some bells and whistles which add to the core exercise:
Record guesses and actual things said on a whiteboard
Sometimes briefly discuss why I’m saying some things and not others
After the first few rounds establish some patterns, look specifically for ideas which will take us further out of distribution
Why this particular exercise? It’s a focused, rapid-feedback way of training the sort of usually-not-very-legible skills one typically absorbs via osmosis from a mentor. It’s focused specifically on choosing project ideas, which is where most of the value in a project is (yet also where little time is typically spent, and therefore one typically does not get very much data on project choice from a mentor). Also, it’s highly scalable: I could run the exercise in a 200-person lecture hall and still expect it to basically work.
It was, by all reports, exhausting for everyone but me, and we basically did this for two full days. But a majority of participants found it high-value, and marginal returns were still not dropping quickly after two days (though at that point people started to report that they expected marginal returns to drop off soon).
I’d be interested to see other people try this exercise—e.g. it seems like Eliezer doing this with a large audience for a day or two could generate a lot of value.
This was arguably the most useful part of the SERI MATS 2 Scholars program.
Later on, we actually did this exercise with Eliezer. It was less valuable. It seemed like John was mainly prodding the people who were presenting the ideas, such that their patterns of thought would carry them in a good direction. For example, John would point out that a person was proposing a one-bit experiment, and ask whether there isn't a better experiment we could do that gives us lots of information all at once.
This was very useful because when you learn what kinds of things John will say, you can say them to yourself later on, and steer your own patterns of thought in a good direction on demand. When we did this exercise with Eliezer he was mainly explaining why a particular idea would not work. Often without explaining the generator behind his criticism. This can of course still be valuable as feedback for a particular idea. However, it is much harder to extract a general reasoning pattern out of this that you can then successfully apply later in different contexts.
For example, Eliezer would criticize an idea about trying to get a really good understanding of the scientific process such that we can then give this understanding to AI alignment researchers such that they can make a lot more progress than they otherwise would. He criticized this idea as basically being too hard to execute because it is too hard to successfully communicate how to be a good scientist, even if you are a good scientist.
Assuming the assertion is correct, hearing it doesn't necessarily tell you how to think in different contexts such that you would correctly identify whether an idea would be too hard to execute or flawed in some other way. And I am not necessarily saying that you couldn't extract a reasoning algorithm out of the feedback, but that if you could, it would take you a lot more effort and time compared to extracting a reasoning algorithm from the things that John was saying.
Now, all of this might have mainly been an issue of Eliezer not having a good model of how this workshop would have a positive influence on the people attending it. I would guess that if John had spent more time thinking about how to communicate what the workshop is doing and how it achieves its goal, then Eliezer could probably have done a much better job.
This suggests formulation of exercises about the author’s responses to various prompts, as part of technical exposition (or explicit delimitation of a narrative by choices of the direction of its continuation). When properly used, this doesn’t seem to lose much value compared to the exercise you describe, but it’s more convenient for everyone. Potentially this congeals into a style of writing with no explicit exercises or delimitation that admits easy formulation of such exercises by the reader. This already works for content of technical writing, but less well for choices of topics/points contrasted with alternative choices.
So possibly the way to do this is by habitually mentioning alternative responses (that are expected to be plausible for the reader, while decisively, if not legibly, rejected by the author), and leading with these rather than the preferred responses. Sounds jarring and verbose, a tradeoff that needs to be worth making rather than a straight improvement.
Petrov Day thought: there’s this narrative around Petrov where one guy basically had the choice to nuke or not, and decided not to despite all the flashing red lights. But I wonder… was this one of those situations where everyone knew what had to be done (i.e. “don’t nuke”), but whoever caused the nukes to not fly was going to get demoted, so there was a game of hot potato and the loser was the one forced to “decide” to not nuke? Some facts possibly relevant here:
Petrov’s choice wasn’t actually over whether or not to fire the nukes; it was over whether or not to pass the alert up the chain of command.
Petrov himself was responsible for the design of those warning systems.
… so it sounds like Petrov was ~ the lowest-ranking person with a de-facto veto on the nuke/don’t nuke decision.
Petrov was in fact demoted afterwards.
There was another near-miss during the Cuban missile crisis, when three people on a Soviet sub had to agree to launch. There again, it was only the lowest-ranked who vetoed the launch. (It was the second-in-command; the captain and political officer both favored a launch—at least officially.)
This was the Soviet Union; supposedly (?) this sort of hot potato happened all the time.
Those are some good points. I wonder whether something similar happened (or could happen at all) in other nuclear countries, where we don't know about similar incidents—because the system hasn't collapsed there, the archives were not made public, etc.
Also, this makes it important to actually celebrate Petrov Day as widely as possible, because then the option for the lowest-ranked person would be: "Get demoted, but also get famous all around the world."
Regarding the recent memes about the end of LLM scaling: David and I have been planning on this as our median world since about six months ago. The data wall has been a known issue for a while now, updates from the major labs since GPT-4 already showed relatively unimpressive qualitative improvements by our judgement, and attempts to read the tea leaves of Sam Altman’s public statements pointed in the same direction too. I’ve also talked to others (who were not LLM capability skeptics in general) who had independently noticed the same thing and come to similar conclusions.
Our guess at that time was that LLM scaling was already hitting a wall, and this would most likely start to be obvious to the rest of the world around roughly December of 2024, when the expected GPT-5 either fell short of expectations or wasn’t released at all. Then, our median guess was that a lot of the hype would collapse, and a lot of the investment with it. That said, since somewhere between 25%-50% of progress has been algorithmic all along, it wouldn’t be that much of a slowdown to capabilities progress, even if the memetic environment made it seem pretty salient. In the happiest case a lot of researchers would move on to other things, but that’s an optimistic take, not a median world.
(To be clear, I don’t think you should be giving us much prediction-credit for that, since we didn’t talk about it publicly. I’m posting mostly because I’ve seen a decent number of people for whom the death of scaling seems to be a complete surprise and they’re not sure whether to believe it. For those people: it’s not a complete surprise, this has been quietly broadcast for a while now.)
Original GPT-4 is rumored to be a 2e25 FLOPs model. With 20K H100s that were around as clusters for more than a year, 4 months at 40% utilization gives 8e25 BF16 FLOPs. Llama 3 405B is 4e25 FLOPs. The 100K H100s clusters that are only starting to come online in the last few months give 4e26 FLOPs when training for 4 months, and 1 gigawatt 500K B200s training systems that are currently being built will give 4e27 FLOPs in 4 months.
So lack of scaling-related improvement in deployed models since GPT-4 is likely the result of only seeing the 2e25-8e25 FLOPs range of scale so far. The rumors about the new models being underwhelming are less concrete, and they are about the very first experiments in the 2e26-4e26 FLOPs range. Only by early 2025 will there be multiple 2e26+ FLOPs models from different developers to play with, the first results of the experiment in scaling considerably past GPT-4.
And in 2026, once the 300K-500K B200s clusters train some models, we’ll be observing the outcomes of scaling to 2e27-6e27 FLOPs. Only by late 2026 will there be a significant chance of reaching a scaling plateau that lasts for years, since scaling further would need $100 billion training systems that won’t get built without sufficient success, with AI accelerators improving much slower than the current rate of funding-fueled scaling.
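For readers who want to sanity-check numbers like these, here's a minimal back-of-the-envelope sketch. The per-chip peak-throughput figures below are my own rough assumptions (dense BF16), not from the comment:

```python
# Back-of-the-envelope training compute:
#   FLOPs ~= chips * peak FLOP/s per chip * utilization * training seconds
# Peak throughputs are rough assumptions for illustration only.

def training_flops(n_chips: int, peak_flops_per_chip: float,
                   utilization: float, months: float) -> float:
    seconds = months * 30 * 24 * 3600
    return n_chips * peak_flops_per_chip * utilization * seconds

H100_BF16 = 1e15     # ~1 PFLOP/s dense BF16 (assumed)
B200_BF16 = 2.25e15  # ~2.25 PFLOP/s dense BF16 (assumed)

print(f"{training_flops(20_000, H100_BF16, 0.4, 4):.1e}")   # ~8e25 (the 20K-H100 figure)
print(f"{training_flops(100_000, H100_BF16, 0.4, 4):.1e}")  # ~4e26 (the 100K-H100 clusters)
print(f"{training_flops(500_000, B200_BF16, 0.4, 4):.1e}")  # ~4.7e27, same ballpark as the ~4e27 figure
```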
I don’t expect that to be particularly relevant. The data wall is still there; scaling just compute has considerably worse returns than the curves we’ve been on for the past few years, and we’re not expecting synthetic data to be anywhere near sufficient to bring us close to the old curves.
Nobody has admitted to trying repeated data at scale yet (so we don't know that it doesn't work); from the tiny experiments, it can 5x the data with little penalty and 15x the data in a still-useful way. It's not yet relevant for large models, but it might turn out that small models would already benefit greatly.
There are 15-20T tokens in datasets whose size is disclosed for current models (Llama 3, Qwen 2.5); plausibly 50T tokens of tolerable quality can be found (pretraining only needs to create useful features, not relevant behaviors). With 5x 50T tokens, even at 80 tokens/parameter[1] we can make good use of 5e27-7e27 FLOPs[2], which even a 1 gigawatt 500K B200s system of early 2026 would need 4-6 months to provide.
The isoFLOP plots (varying tokens per parameter for fixed compute) seem to show loss/perplexity basins that are quite wide once compute reaches about 1e20 FLOPs. The basins also get wider for hybrid attention (compare the 100% Attention isoFLOPs in the "Perplexity scaling analysis" figure to the others). So it's likely that using a slightly suboptimal tokens/parameter ratio of, say, 40 won't hurt performance much at all. In that case we get to use 9e27-2e28 FLOPs by training a larger model on the same 5x 50T tokens dataset. The data wall for text data is unlikely to be a 2024-2026 issue.
Conservatively asking for much more data than Chinchilla’s 20 tokens per parameter, in light of the range of results in more recent experiments and adding some penalty for repetition of data. For example, Llama 3 had 40 tokens per parameter estimated as optimal for 4e25 FLOPs from isoFLOPs for smaller runs (up to 1e22 FLOPs, Figure 2), and linear extrapolation in log-coordinates (Figure 3) predicts that this value slowly increases with compute. But other experiments have it decreasing with compute, so this is unclear.
Use of repeated data was first demonstrated in the 2022 Galactica paper (Figure 6 and Section 5.1), at 2e23 FLOPs but without a scaling law analysis that compares with unique data or checks what happens for different numbers of repeats that add up to the same number of tokens-with-repetition. The May 2023 paper does systematic experiments with up to 1e22 FLOPs datapoints (Figure 4).
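A minimal sketch of where estimates like the 5e27-7e27 and 9e27-2e28 figures above can come from, using the common C ≈ 6·N·D approximation (the token counts are the ones discussed above; everything else is illustrative):

```python
# Rough compute estimate from a fixed token budget and a tokens-per-parameter
# ratio, using the standard approximation C ~= 6 * N * D
# (C: training FLOPs, N: parameters, D: training tokens).

def training_flops_from_tokens(tokens: float, tokens_per_param: float) -> float:
    params = tokens / tokens_per_param
    return 6 * params * tokens

D = 5 * 50e12  # ~50T tokens of tolerable quality, repeated ~5x

print(f"{training_flops_from_tokens(D, 80):.1e}")  # ~4.7e27 at 80 tokens/parameter
print(f"{training_flops_from_tokens(D, 40):.1e}")  # ~9.4e27 at 40 tokens/parameter
```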
So that’s what I called “tiny experiments”. When I say that it wasn’t demonstrated at scale, I mean 1e25+ FLOPs, which is true for essentially all research literature[1]. Anchoring to this kind of scale (and being properly suspicious of results several orders of magnitude lower) is relevant because we are discussing the fate of 4e27 FLOPs runs.
The largest datapoints in measuring the Chinchilla scaling laws for Llama 3 are 1e22 FLOPs. This is then courageously used to choose the optimal model size for the 4e25 FLOPs run that uses 4,000 times more compute than the largest of the experiments.
For what it’s worth, and for the purpose of making a public prediction in case I’m wrong, my median prediction is that [some mixture of scaling + algorithmic improvements still in the LLM regime, with at least 25% gains coming from the former] will continue for another couple years. And that’s separate from my belief that if we did try to only advance through the current mixture of scale and algorithmic advancement, we’d still get much more powerful models, just slower.
I’m not very convinced by the claims about scaling hitting a wall, considering we haven’t had the compute to train models significantly larger than GPT-4 until recently. Plus other factors like post-training taking a lot of time (GPT-4 took ~6 months from the base model being completed to release, I think? And this was a lot longer than GPT-3), labs just not being good at understanding how good their models are, etc. Though I’m not sure how much of your position is closer to “scaling will be <25-50% of future gains” than “scaling gains will be marginal / negligible”, especially since a large part of this trajectory involves e.g. self-play or curated data for overcoming the data wall (would that count more as an algorithmic improvement or scaling?)
The interesting thing is that scaling parameters (next big frontier models) and scaling data (small very good models) seems to be hitting a wall simultaneously. Small models now seem to get so much data crammed into them that quantisation becomes more and more lossy. So we seem to be reaching a frontier of the performance per parameter-bits as well.
Would the prediction also apply to inference scaling (laws) - and maybe more broadly various forms of scaling post-training, or only to pretraining scaling?
Some of the underlying evidence, like e.g. Altman’s public statements, is relevant to other forms of scaling. Some of the underlying evidence, like e.g. the data wall, is not. That cashes out to differing levels of confidence in different versions of the prediction.
Still very plausible as a route to continued capabilities progress. Such things will have very different curves and economics, though, compared to the previous era of scaling.
Ever since GeneSmith’s post and some discussion downstream of it, I’ve started actively tracking potential methods for large interventions to increase adult IQ.
One obvious approach is “just make the brain bigger” via some hormonal treatment (like growth hormone or something). Major problem that runs into: the skull plates fuse during development, so the cranial vault can’t expand much; in an adult, the brain just doesn’t have much room to grow.
BUT this evening I learned a very interesting fact: ~1/2000 infants have "craniosynostosis", a condition in which their plates fuse early. The main treatments involve surgery to open those plates back up and/or remodel the skull. Which means surgeons already have a surprisingly huge amount of experience making the cranial vault larger after plates have fused (including sometimes in adults, though this type of surgery is most common in infants AFAICT).
… which makes me think that cranial vault remodelling followed by a course of hormones for growth (ideally targeting brain growth specifically) is actually very doable with current technology.
Well, the key time to implement an increase in brain size is when the neuron-precursors which are still capable of mitosis (unlike mature neurons) are growing. This is during fetal development, when there isn’t a skull in the way, but vaginal birth has been a limiting factor for evolution in the past.
Experiments have been done on increasing neuron count at birth in mammals via genetic engineering. I was researching this when I was actively looking for a way to increase human intelligence, before I decided that genetically engineering infants was infeasible [edit: within the timeframe of preparing for the need for AI alignment]. One example of a dramatic failure was increasing Wnt (a primary gene involved in fetal brain neuron-precursor growth) in mice. The resulting mice did successfully have larger brains, but they had a disordered macroscale connectome, so their brains functioned much worse.
It's probably possible to get neurons back into mitosis-ready mode via some sort of crazy Levin bioelectric cocktail. Not that this helps us, since that's probably 3 to 30 years of research away, depending on the amount of iteration needed, funding, etc.
Fleshing this out a bit more: insofar as development is synchronized in an organism, there usually has to be some high-level signal to trigger the synchronized transitions. Given the scale over which the signal needs to apply (i.e. across the whole brain in this case), it probably has to be one or a few small molecules which diffuse in the extracellular space. As I’m looking into possibilities here, one of my main threads is to look into both general and brain-specific developmental signal molecules in human childhood, to find candidates for the relevant molecular signals.
(One major alternative model I'm currently tracking is that the brain grows to fill the cranial vault, and then stops growing. That could in principle work mechanistically via cells picking up on local physical forces, rather than a small-molecule signal. Though I don't think that's the most likely possibility, it would be convenient, since it would mean that just expanding the skull could induce basically-normal new brain growth by itself.)
I hope by now you're already familiar with Michael Levin & his lab's work on the subject of morphogenesis signals? Pretty much everything I'm thinking here is based on that.
Yes, it's absolutely a combination of chemical signals and physical pressure. An interesting specific example of these two signals working together occurs during fetal development, when the pre-neurons are growing their axons. There is chemotaxis which steers the amoeba-like tip of the growing axon, and at the same time a substantial stretching force along the length of the axon. The stretching happens because the cells in between the origin and current location of the axon tip are dividing and expanding. The long-distance axons in the brain start their growth relatively early in fetal development, when the brain is quite small, and have been stretched quite a lot by the time the brain is near its birth size.
Neurons are really, really hard to reverse. You are much better off using existing neural stem cells (adults retain a population in the hippocampus which spawns new neurons throughout life, specifically in the memory-formation area).
So actually it's pretty straightforward to get new immature neurons for an adult. The hard part is inserting them without doing damage to existing neurons, and then getting them to connect in helpful rather than harmful ways. The developmental chemotaxis signals are no longer present, and the existing neurons are now embedded in a physically hardened extracellular matrix made of protein that locks axons and dendrites in place. So you have to (carefully!) partially dissolve this extracellular protein matrix (think firm jello) enough to let the new cells grow axons through it. Plus, you don't have the stretching forces, so new long-distance axons are just definitely not going to be achievable. But for something like improving a specific ability, like mathematical reasoning, you would only need additional local axons in that part of the cortex.
My hope here would be that a few upstream developmental signals can trigger the matrix softening, re-formation of the chemotactic signal gradient, and whatever other unknown factors are needed, all at once.
The developmental chemotaxis signals are no longer present,
Right. What I'm imagining is designing a new chemotaxis signal.
So you have to (carefully!) partially dissolve this extracellular protein matrix (think firm jello) enough to let the new cells grow axons through it
That certainly does sound like a very hard part yup.
Plus, you don’t have the stretching forces, so new long distance axons are just definitely not going to be achievable.
Roll to disbelieve in full generality; it sounds like a perfectly reasonable claim for any sort of sane research timeframe, though.
But for something like improving a specific ability, like mathematical reasoning, you would only need additional local axons in that part of the cortex.
Maybe. I think you might run out of room pretty quick if you haven’t reintroduced enough plasticity to grow new neurons. Seems like you’re gonna need a lot of new neurons, not just a few, in order to get a significant change in capability. Might be wrong about that, but it’s my current hunch.
Yes, ok. Not in full generality. It’s not prohibited by physics, just like 2 OOMs more difficult. So yeah, in a future with ASI, could certainly be done.
15 years ago when I was studying this actively I could have sent you my top 20 favorite academic papers on the subject, or recommended a particular chapter of a particular textbook. I no longer remember these specifics. Now I can only gesture vaguely at Google scholar and search terms like “fetal neurogenesis” or “fetal prefrontal cortex development”. I did this, and browsed through a hundred or so paper titles, and then a dozen or so abstracts, and then skimmed three or four of the most promising papers, and then selected this one for you. https://www.nature.com/articles/s41386-021-01137-9
Seems like a pretty comprehensive overview which doesn’t get too lost in minor technical detail.
More importantly, I can give you my takeaway from years of reading many many papers on the subject.
If you want to make a genius baby, there are lots more factors involved than simply neuron count. Messing about with genetic changes is hard, and you need to test your ideas in animal models first, and the whole process can take years even ignoring ethical considerations or budget.
There is an easier and more effective way to get super genius babies, and that method should be exhausted before resorting to genetic engineering.
The easy way: find a really smart woman, ideally young. Surgically remove one of her ovaries. Collect sperm from a bunch of very smart men (ideally with diverse genetic backgrounds). Have a team of hundreds of scientists carefully fertilize many thousands of eggs from the ovary.
Grow them all into blastocysts, and run a high fidelity genetic sequencing on all of them. Using what we know about the genes associated with intelligence, pick the top 20 who seem likely to be the smartest. Implant those in surrogate mothers. Take good care of the mothers.
This is likely to get you multiple Nobel-level geniuses, and possibly a human smarter than any who has ever been born before.
Raise the children in a special accelerated education environment.
I think this would work, and it doesn’t require any novel technology.
But it would take a while to raise the children… (Credit to Stephen Hsu for the idea)
Brain expansion also occurs after various insults to the brain. It’s only temporary, usually, but it will kill unless the skull pressure is somehow relieved. So there are various surgical methods for relieving pressure on a growing brain. I don’t know much more than this.
I've been trying to push against the tendency for everyone to talk about FTX drama lately, but I have some generalizable points on the topic which I haven't seen anybody else make, so here they are. (Be warned that I may just ignore responses; I don't really want to dump energy into FTX drama.)
Summary: based on having worked in startups a fair bit, Sam Bankman-Fried’s description of what happened sounds probably accurate; I think he mostly wasn’t lying. I think other people do not really get the extent to which fast-growing companies are hectic and chaotic and full of sketchy quick-and-dirty workarounds and nobody has a comprehensive view of what’s going on.
Long version: at this point, the assumption/consensus among most people I hear from seems to be that FTX committed intentional, outright fraud. And my current best guess is that that’s mostly false. (Maybe in the very last couple weeks before the collapse they toed the line into outright lies as a desperation measure, but even then I think they were in pretty grey territory.)
Key pieces of the story as I currently understand it:
Moving money into/out of crypto exchanges is a pain. At some point a quick-and-dirty solution was for customers to send money to Alameda (Sam Bankman-Fried’s crypto hedge fund), and then Alameda would credit them somehow on FTX.
Customers did rather a lot of that. Like, $8B worth.
The FTX/Alameda team weren’t paying attention to those particular liabilities; they got lost in the shuffle.
At some point in the weeks before the collapse, when FTX was already under moderate financial strain, somebody noticed the $8B liability sitting around. And that took them from “moderate strain” to “implode”.
How this contrasts with what seems-to-me to be the “standard story”: most people seem to assume that it is just totally implausible to accidentally lose track of an $8B liability. Especially when the liability was already generated via the decidedly questionable practice of routing customer funds for the exchange through a hedge fund owned by the same people. And therefore it must have been intentional—in particular, most people seem to think the liability was intentionally hidden.
I think the main reason I disagree with others on this is that I’ve worked at a startup. About 5 startups, in fact, over the course of about 5 years.
The story where there was a quick-and-dirty solution (which was definitely sketchy but not ill-intentioned), and then stuff got lost in the shuffle, and then one day it turns out that there’s a giant unanticipated liability on the balance sheet… that’s exactly how things go, all the time. I personally was at a startup which had to undergo a firesale because the accounting overlooked something. And I’ve certainly done plenty of sketchy-but-not-ill-intentioned things at startups, as quick-and-dirty solutions. The story that SBF told about what happened sounds like exactly the sort of things I’ve seen happen at startups many times before.
I think this is likely wrong. I agree that there is a plausible story here, but given that Sam seems to have lied multiple times in confirmed contexts (for example, when saying that FTX had never touched customer deposits), and given people's experiences at early Alameda, I think it is pretty likely that Sam was lying quite frequently and had committed various smaller instances of fraud.
I don’t think the whole FTX thing was a ponzi scheme, and as far as I can tell FTX the platform itself (if it hadn’t burned all of its trust in the last 3 weeks), would have been worth $1-3B in an honest evaluation of what was going on.
But I also expect that when Sam used customer deposits he was well-aware that he was committing fraud, and others in the company were too. And he was also aware that there was a chance that things could blow up in the way it did. I do believe that they had fucked up their accounting in a way that caused Sam to fail to orient to the situation effectively, but all of this was many months after they had already committed major crimes and trust violations after touching customer funds as a custodian.
The problem with this explanation is that there is a very clear delineation here between not-fraud and fraud. It is the difference between not touching customer deposits and touching them. Your explanation doesn’t dispute that they were knowingly and intentionally touching customer deposits. In that case, it is indisputably intentional, outright fraud. The only thing left to discuss is whether they knew the extent of the fraud or how risky it was.
I don’t think it was ill-intentioned based on SBF’s moral compass. He just had the belief, “I will pass a small amount of risk onto our customers, tell some small lies, and this will allow us to make more money for charity. This is net positive for the world.” Then the risks mounted, the web of lies became more complicated to navigate, and it just snowballed from there.
Everyone says flirting is about a “dance of ambiguous escalation”, in which both people send progressively more aggressive/obvious hints of sexual intent in conversation.
But, like… I don’t think I have ever noticed two people actually do this? Is it a thing which people actually do, or one of those things which like 2% of the population does and everyone else just talks about a lot and it mostly doesn’t actually work in practice (like cold approaches)? Have you personally done the thing successfully with another person, with both of you actually picking up on the other person’s hints? Have you personally seen two other people do the thing firsthand, where they actually picked up on each others’ hints?
EDIT-TO-ADD: For those who have agree/disagree voted: I don't know whether agree/disagree indicates that you have/haven't done the thing, or that you have/haven't ever noticed anyone (including yourself) successfully do the thing, or something else entirely.
Yes, I’ve had this experience many times and I’m aware of many other cases of it happening.
Maybe the proliferation of dating apps means that it happens somewhat less than it used to, because when you meet up with someone from a dating app, there’s a bit more common knowledge of mutual interest than there is when you’re flirting in real life?
The classic setting is a party (a place where you meet potential romantic partners who you don’t already know (or who you otherwise know from professional settings where flirting is inappropriate), and where conversations are freely starting and ending, such that when you start talking to someone the conversation might either go for two minutes or four hours).
Examples of hints:
Mentioning things that indicate that you’re romantically available, e.g. saying that you’re single, that you’re poly, telling a story of recently going on a date; more extreme would be telling a story of doing something promiscuous.
Mentioning things that indicate that you want to relate to the other person in a romantic or sexual context rather than a non-sexual one. For example, a woman talking about how she likes wearing revealing clothes, or commenting on her body or my body. And then responding positively to that kind of statement, e.g. building on it rather than demurring, replying flatly, or changing the subject.
Offering and accepting invitations to spend more time interacting one-on-one, especially in semi-private places. E.g. asking to sit together. (For example, person A might say “I’m getting a drink, want me to get you one?”, which is sort of an invitation to have a drink together, and person B might say “sure, let’s sit over there to have it”, which escalates the invitation to involve them talking for longer.)
Giving and accepting opportunities for physical contact.
In all cases, saying those things is more flirty if it was unnecessary for them to say it. E.g. if they say they’re single because it came up in conversation in a way that they couldn’t have contrived, that’s less flirty than if they tell a story that brings it up.
I think that online content on all this stuff is often pretty accurate.
I know this is LessWrong, and that sexual norms are different in the Bay Area, but for the average person:
Please don’t tell prospective romantic interests that you “went on a date recently” or that you did something promiscuous. The majority of the time, it would be interpreted as a sign you’re taken. Of course, if you elaborate that the date didn’t work out, that’s a different story.
I think that saying you went on a date usually is evidence that you’re not in a monogamous relationship, and if it’s ambiguous it gives the other person an opportunity to say “oh, how did it go?” which gives you an opportunity to subtly clarify that it was a casual date (and so confirm that you’re in the market for casual dating).
I guess “I was alone and masturbated recently” also wouldn’t work well, so… what are the proper words to suggest that I am available? :D
The only thing that comes to my mind, is that if you arrived with a person of the opposite sex, to explicitly mention that they are not your boyfriend/girlfriend.
Hmm... That's actually a tough question. As far as I can remember, I've rarely had to tell people outright that I'm single.
My recommendation would be to flirt away, and if they don’t casually namedrop a boyfriend or allude to having one, that’s strong enough evidence that they’re not taken.
The only thing that comes to my mind, is that if you arrived with a person of the opposite sex, to explicitly mention that they are not your boyfriend/girlfriend.
Most tactful way to say as much would be to explicitly call them a “friend”. That should get the message across.
My disagree vote means: yes, this obviously happens a lot, and the fact that you haven’t noticed this happening, to the point you think it might be made up, reveals a huge blindspot of one kind or another.
In Crazy Ex-Girlfriend's "I'm Going to the Beach with Josh and His Friends!", there's a scene between White Josh and Derrick. I can't find a clip, but the key is that Derrick is hanging on to White Josh's every word.
Ted Lasso:
Note how, and how much, she's laughing at his very mediocre jokes. Ted could reasonably be interpreted as flirting back, but the audience knows he always makes those stupid-ass jokes. Actually the whole Ted Lasso show might be good for watching someone who's generally very playful and seeing how it changes when he's actually into someone.
Roy and Keeley, also from Ted Lasso. Note she’s dating his teammate.
Roy and some lady, still from Ted Lasso
Note how long she looks at him around 0:50, even though it’s awkward while she’s putting something away. She also contrives a way to ask if he’s married, and makes an interesting face when he says no. He is giving her enough breadcrumbs to continue but not flirting back (because he’s still into Keeley).
Half of the movie Challengers (including between the two ambiguously platonic male leads)
[At this point John contacted me offline and clarified he wanted examples of flirting that successfully end with asking someone out, but I didn’t want to throw away my work]
Pretty sure My Name Is Earl: The Professor has this but can’t find a clip. Also the first season of Ted Lasso.
I second the point about physical touch being important, and add: in my experience what you’re going for when flirting isn’t “ambiguous signal” but “plausible deniability”. The level of ambiguity is to be minimized, subject to the constraint that plausible deniability is maintained—ambiguity is an unfortunate side-effect, not something you’re aiming to modulate directly. Why you want plausible deniability: If the person doesn’t respond, or responds in the negative, you want to be able to back off without embarrassment to either party and pretend nothing happened/you were just being friendly/etc. You want to send a signal that is clear enough the other person will pick up on it, but can plausibly claim not to have done so if asked, so you’re not backing them into a corner socially where they have to give you a definite yes/no. Similar to the advice not to flirt in an elevator or other enclosed space the person you’re flirting with can’t easily leave, except the “enclosed space” is the space of possible social responses.
Once you’ve done a few things they ought to have picked up on, and no negative and some seemingly positive interaction has occurred afterwards (physical proximity has increased, verbally things seem to be going well, they’re smiling… if they’ve picked up your attempts at signaling and would like it to stop typically none of that will happen) you can try a physical touch. Something small and nonsexual. Particularly if you’re dealing with a new person or a friend you have never touched before, this usually doesn’t happen by accident—and you can do it in a way that is definitely a deliberate choice on your part, but still plausibly deniable/something both of you can walk away from as a signal of sexual interest. If you get a touch back soon after, you’re good to go (by which I mean, continue escalating in a way that is no longer very plausibly deniable), if you don’t, either the person is socially unskilled, or you’ve misread the situation, but in any case it’s their turn.
Once you’ve done a few things they ought to have picked up on, and no negative and some seemingly positive interaction has occurred afterwards...
One possibility in my hypothesis space here is that there usually isn’t a mutual dance of plausibly-deniable signals, but instead one person sending progressively less deniable signals and the other person just not responding negatively (but not otherwise sending signals themselves).
I imagine that can happen for a while, but if I’m getting nothing back, I stop once I’m pretty sure they should have noticed what I’m doing. Silence in response to a received message, is a form of response, and not one that indicates “keep getting progressively less subtle please”.
If that is the wrong move (the person is interested in me continuing), they will let me know once I back off.
Another thought: You refer to this as a dance, and one model of what’s happening when one flirts is “demonstrate social skill/difficult-to-fake signal of intelligence by calibrating levels of ambiguity and successfully modeling the other person’s mind --> this is attractive --> get date”, in the same way that dancing successfully in an actual dance can be “demonstrate physical skill/difficult-to-fake signal of health --> this is attractive --> get date”. And I’m sure that happens sometimes, and for some people, but my model of flirting does not involve “demonstrate social skill/intelligence --> get date”. For me, flirting solves a different problem, which is “communicate that you like someone (in the sense one likes people one might like to date), and have them communicate back that they like you, without either of you risking much embarrassment or social awkwardness if it’s not mutual or for any other reason a date can’t happen right now”.
Depending on what you’re trying to do by flirting (demonstrate social skill vs. give someone you’re attracted to a low-pressure way to tell you whether they like you back) the approach may be different. Although, even the latter can be a tricky thing to do and ability to do it successfully demonstrates a useful skill.
I think most people who flirt are like, not super socially skilled around people they’re attracted to, and “try to get a sense of whether it’s mutual in a low-risk way” is the more important problem that flirting solves for them. But maybe that’s just me typical-minding :).
Also: the higher the number of spectators, the more you have to be very careful about plausible deniability, because you have to take into consideration what everyone is going to think, and the level of social awkwardness involved in a fumble or a rejection is higher. I’ve flirted with a few women before, but it only lasts more than a few seconds if the woman is flirting back, and I have always done it 1:1 rather than with a group of onlookers. And whenever I’ve noticed someone who might be flirting with me, it has likewise been in a 1:1 situation, at least initially. So it doesn’t surprise me that you haven’t noticed others doing this. Anything done in front of a group has to be so unclear to onlookers that most people would miss it, something like an inside joke or reference to a past conversation.
What is this context in which you are hanging out 1:1 with a woman and it’s not already explicitly a date? (I mean, that of course does happen sometimes, but at least for me it’s not particularly common, so I’m wondering what the contexts were when this actually happened to you.)
Um… well, first off, flirting doesn’t have to happen when you’re hanging out. It can start with something as simple as a compliment to a stranger. Start from the premise that people like to hear positive messages about themselves without any strings attached, and hand them out like candy (but recognizing that taking candy from strangers is something some people would prefer not to do for obvious reasons, so accept whatever response you get to what is offered) - some people will respond back, others won’t, but no harm will be done. I am an introvert so I don’t do this often, but striking up conversations with new people at random is a thing I can force myself to do, and it rarely goes as poorly as one might fear.
But also, my friend-group is mixed, more women than men, and typically it's people I've met one at a time over the years, less of a "friend group" than "a number of people who are my friends", so I have lots of 1:1 time with female friends. In terms of flirting with those friends, well, they're friends, so that almost never happens—but almost never is not never. Three times that I can recall off the top of my head, it turned out that one of my friends was attracted to me, and I learned that either because she explicitly said so (in one case, we were teenagers and both clueless about how to flirt, her idea was to follow me around everywhere, and from my perspective I just didn't know that was a thing that I should notice) or because of some flirting (two cases). When I was younger and much, much more awkward, there were innumerable instances where I was attracted to a female friend and didn't say anything because from young-me's perspective of course not that's insane and I'm lucky this amazing person even wants to be my friend and allow me to continue to be in her presence. There was once when I did say something to a good friend and it wasn't reciprocated, we're still close friends, but that wasn't flirting so much as "we've just met up for lunch because you suggested it, and I'm feeling some attraction—you? Nope? Ok then, I still think you're awesome and we should be friends". There have also been a couple instances where I've met someone at an activity or through other friends or at work, hinted at an attraction, she'd hinted back, we'd done something low-stakes like going for a walk together or having a coffee, but it wasn't an official date or anything, and there was some attempted flirting with mixed success in that context.
What I’m picturing if I was back on the dating market (I’m with a good partner currently, hopefully in perpetuity) is, if I met a woman outside of a dating app who I’d like to date or add to my list of woman friends, depending on how she feels (I tend not to date people just for the hotness, they’ve got to be someone I could be friends with too), we’d probably do something low-stakes 1:1 that wasn’t officially a date or not a date, and depending on how that went, either become friends, go on dates, or part ways. And in the initial figuring out how things were going to go, there would likely be some flirting. At least, I expect that’s how it’d go.
I’m not so deliberate/strategic about it, but yeah. Like, there’s another ‘algorithm’ that’s more intuitive, which is something like “When interacting with the person, it’s ~always an active part of your mental landscape that you’re into them, and this naturally affects your words and actions. Also, you don’t want to make them uncomfortable, so you suppress anything that you think they wouldn’t welcome”. This produces approximately the same policy, because you’ll naturally leak some bits about your interest in them, and you’ll naturally be monitoring their behaviour to estimate their interest in you, in order to inform your understanding of what they would welcome from you. As you gather more evidence that they’re interested, you’ll automatically become more free in allowing your interest to show, resulting in ~the same ‘escalation of signals of interest’.
I think the key thing about this is like “flirting is not fundamentally about causing someone to be attracted to you, it’s about gracefully navigating the realisation that you’re both attracted to each other”. This is somewhat confused by the fact that “ability to gracefully navigate social situations” is itself attractive, so flirting well can in itself make someone more attracted to you. But I claim that this isn’t fundamentally different from the person seeing you skillfully break up a fight or lead a team through a difficult situation, etc.
Flirting is not fundamentally about causing someone to be attracted to you.
Notwithstanding, I think flirting is substantially (perhaps even fundamentally) about both (i) attraction, and (ii) seduction. Moreover, I think your model is too symmetric between the parties, both in terms of information-symmetry and desire-symmetry across time.
My model of flirting is roughly:
Alice attracts Bob → Bob tries attracting Alice → Alice reveals Bob attracts Alice → Bob tries seducing Alice → Alice reveals Bob seduces Alice → Initiation
I never did quite that thing successfully. I did have one time when I dropped progressively unsubtle hints on a guy, who remained stubbornly oblivious for a long time until he finally got the message and reciprocated.
I interpret the confusion around flirting as “life imitating art” — specifically, there is a cultural narrative about how flirting works that a lot of socially awkward people are trying to implement.
That means there are big discrepancies between how experts flirt and how most people flirt. It also means that most people have to learn how to read the flirtation signals of other low-flirtation-skill people.
The cultural narrative around flirting therefore doesn’t exactly match practice, even though it influences practice.
It doesn’t necessarily take that much flirting to build enough confidence to ask someone out. Are they alone at a party? Is your conversation with them going on longer than for most people? Is it fun? You’re all set.
Have you personally done the thing successfully with another person, with both of you actually picking up on the other person’s hints?
Yes. But usually the escalation happens over weeks or months, over multiple conversations (at least in my relatively awkward nerd experience). So it’d be difficult to notice people doing this. Maybe twice I’ve been in situations where hints escalated within a day or two, but both were building from a non-zero level of suspected interest. But none of these would have been easy to notice from the outside, except maybe at a couple of moments.
1. Are people using escalating hints to express romantic/sexual interest in general?
2. Does it follow the specific conversational patterns usually used?
1 is true in my experience, while 2 usually isn't. I can think of two examples where I've flirted by escalating signals. In both cases it was more to do with escalating physical touch and proximity, though verbal tone also played a part. I would guess that the typical examples of 2 you normally see (like A complimenting B's choice of shoes, then B using a mild verbal innuendo, then A making a comment about B's figure) don't happen as often, since not many people are good enough wordsmiths to do the escalation purely verbally.
Plus it’s not the Victorian era anymore and it’s acceptable to escalate by slowly leaning forward as the conversation progresses, almost-accidentally brushing someone’s hand, etc.
One of the first things that (shy?) people use to gauge each other’s interests before or instead of talking about anything explicit is eye contact. So I think that wearing your glasses puts you at a disadvantage unless you take them off when you are flirting. I’m not sure why you’re wearing them, but taking them off in itself could be a flirty move. I am not particularly good at flirting. But I remember in 9th grade a girl I had flirted with for like half an hour at an event via eye contact. We didn’t exchange more than ~3 sentences in person (there were no innuendos). Then she called me later that same day, asking me out explicitly if I wanted to be her boyfriend.
I’m pretty sure I wouldn’t escalate those signs above a rather low threshold given any observers, and my intuition tells me other people would be similar in this regard. So not observing flirting could just imply people don’t flirt if you’re in the conversation with them. As an extreme example, I’ve never seen anyone having sex, but it seems as if people do that all the time.
In my model, flirting is about showing that you are paying attention. You say things that you could only pick up if you pay close attention to me and what I say. It's like a cryptographic proof certificate, showing that you think I am important enough to pay attention to continuously. Usually this is coupled with an optimization process of using that knowledge to make me feel good, e.g. giving a compliment that actually tracks reality in a way I care about.
It’s more general than just showing sexual interest I think.
I’ve seen it happen, and have done it myself with decent success.
As @Buck notes below, dating apps, which are now a majority share of how people begin or seek to begin relationships, are far more targeted. There’s little plausible deniability involved, both of you are talking on Tinder.
Not that there isn’t some, of course. There are mind games afoot where people claim to be interested only in long-term relationships, but if you’re attractive enough, they might easily accept something shorter with no strings attached. Conversely, there are people who state they’re looking for a quick romp, but are hiding the degree of yearning they contain for something more serious.
It’s hard to break it down into a play-by-play, but in my experience, flirting starts out with friendly interactions, obvious or not so obvious signs that you’re single, gauging the reception of jokes or compliments, and then grows from there. The more you gradually establish compatibility and interest, the easier it gets to stop beating around the bush.
Word through the grapevine, for those who haven’t heard: apparently a few months back OpenPhil pulled funding for all AI safety lobbying orgs with any political right-wing ties. They didn’t just stop funding explicitly right-wing orgs, they stopped funding explicitly bipartisan orgs.
Of those, I think FAI is the only one at risk of OP being unable to fund them, based on my guess of where things are leaning. I would be quite surprised if they defunded the other ones on bipartisan grounds.
Possibly you meant to say something more narrow like “even if you are trying to be bipartisan, if you lean right, then OP is substantially less likely to fund you” which I do think is likely true, though my guess is you meant the stronger statement, which I think is false.
Curious whether this is a different source than me. My current best model was described in this comment, which is a bit different (and indeed, my sense was that if you are bipartisan, you might be fine, or might not, depending on whether you seem more connected to the political right, and whether people might associate you with the right):
Yep, my model is that OP does fund things that are explicitly bipartisan (like, they are not currently filtering on being actively affiliated with the left). My sense is that in practice it’s a fine balance, and if there were some high-profile thing where Horizon became more associated with the right (like maybe some alumnus becoming prominent in the Republican Party and very publicly crediting Horizon for that, or some scandal involving someone on the right who is a Horizon alumnus), then I do think their OP funding would have a decent chance of being jeopardized, and the same is not true on the left.
Another part of my model is that one of the key things about Horizon is that they are of a similar school of PR as OP themselves. They don’t make public statements. They try to look very professional. They are probably very happy to compromise on messaging and public comms with Open Phil and be responsive to almost any request that OP would have messaging-wise. That makes up for a lot. I think that if you had a more communicative and outspoken organization with a similar mission to Horizon, the funding situation would be a bunch dicier (though my guess is that an organization like that could still get funding if it were competent).
More broadly, I am not saying “OP staff want to only support organizations on the left”. My sense is that many individual OP staff would love to fund more organizations on the right, and would hate for polarization to occur, but that organizationally and because of constraints by Dustin, they can’t, and so you will see them fund organizations that aim for more engagement with the right, but there will be relatively hard lines and constraints that will mostly prevent that.
If it is true that OP has withdrawn funding from explicitly bipartisan orgs, even if not commonly associated with the right, then that would be an additional update for me, so am curious whether this is mostly downstream of my interpretations or whether you have additional sources.
I am posting this now mostly because I’ve heard it from multiple sources. I don’t know to what extent those sources are themselves correlated (i.e. whether or not the rumor started from one person).
However, at present, it remains the case that most of the individuals in the current field of AI governance and policy (whether we fund them or not) are personally left-of-center and have more left-of-center policy networks. Therefore, we think AI policy work that engages conservative audiences is especially urgent and neglected, and we regularly recommend right-of-center funding opportunities in this category to several funders.
I think the comment more confirms than disconfirms John’s comment (though I still think it’s too broad for other reasons). OP “funding” something historically has basically always meant recommending a grant to GV. Luke’s language to me suggests that indeed the right of center grants are no longer referred to GV (based on a vague vibe of how he refers to funders in plural).
OP has always made some grant recommendations to other funders (historically OP would probably describe those grants as “rejected but referred to an external funder”). As Luke says, those are usually ignored, and OP’s counterfactual effect on those grants is much less, and IMO it would be inaccurate to describe those recommendations as “OP funding something”. As I said in the comment I quote in the thread, most OP staff would like to fund things right of center, but GV does not seem to want to, as such the only choice OP has is to refer them to other funders (which sometimes works, but mostly doesn’t).
As another piece of evidence, when OP defunded all the orgs that GV didn’t want to fund anymore, the communication emails that OP sent said that “Open Philanthropy is exiting funding area X” or “exiting organization X”. By the same use of language, yes, it seems like OP has exited funding right-of-center policy work.
(I think it would make sense to taboo “OP funding X” in future conversations to avoid confusion, but also, I think historically it was very meaningfully the case that getting funded by GV is much better described as “getting funded by OP” given that you would never talk to anyone at GV and the opinions of anyone at GV would basically have no influence on you getting funded. Things are different now, and in a meaningful sense OP isn’t funding anyone anymore, they are just recommending grants to others, and it matters more what those others think than what OP staff thinks)
Takeaways From “The Idea Factory: Bell Labs And The Great Age Of American Innovation”
Main takeaway: to the extent that Bell Labs did basic research, it actually wasn’t all that far ahead of others. Their major breakthroughs would almost certainly have happened not-much-later, even in a world without Bell Labs.
There were really two transistor inventions, back to back: Bardeen and Brattain’s point-contact transistor, and then Shockley’s transistor. Throughout, the group was worried about some outside group beating them to the punch (i.e. the patent). There were semiconductor research labs at universities (e.g. at Purdue; see pg 97), and the prospect of one of these labs figuring out a similar device was close enough that the inventors were concerned about being scooped.
Most inventions which were central to Bell Labs actually started elsewhere. The travelling-wave tube started in an academic lab. The idea for fiber optic cable went way back, but it got its big kick at Corning. The maser and laser both started in universities. The ideas were only later picked up by Bell.
In other cases, the ideas were “easy enough to find” that they popped up more than once, independently, and were mostly-ignored long before deployment—communication satellites and cell communications, for instance.
The only fundamental breakthrough which does not seem like it would have soon appeared in a counterfactual world was Shannon’s information theory.
So where was Bell’s big achievement? Mostly in development, and the research division was actually an important component of that. Without in-house researchers chewing on the same problems as the academic labs, keeping up-to-date with all the latest findings and running into the same barriers themselves, the development handoff would have been much harder. Many of Bell Labs’ key people were quite explicitly there to be consulted—i.e. “ask the guy who wrote the book”. I think it makes most sense to view most of the Labs’ research that way. It was only slightly ahead of the rest of the world at best (Shannon excepted), and often behind, but having those researchers around probably made it a lot easier to get new inventions into production.
Major reason this matters: a lot of people say that Bell was able to make big investments in fundamental research because they had unusually-long time horizons, protected by a monopoly and a cozy government arrangement (essentially a Schumpeterian view). This is contrasted to today’s Silicon Valley, where horizons are usually short. But if Bell’s researchers generally weren’t significantly ahead of others, and mostly just helped get things to market faster, then this doesn’t seem to matter as much. The important question is not whether something Silicon-Valley-like induces more/less fundamental research in industrial labs, but whether academics heeding the siren call of startup profits can get innovations to market as quickly as Bell Labs’ in-house team could. And by that metric, Silicon Valley looks pretty good: Bell Labs could get some impressive things through the pipe very quickly when rushed, but they usually had no reason to hurry, and they acted accordingly.
I loved this book. The most surprising thing to me was the answer that people who were there in the heyday give when asked what made Bell Labs so successful: They always say it was the problem, i.e. having an entire organization oriented towards the goal of “make communication reliable and practical between any two places on earth”. When Shannon left the Labs for MIT, people who were there immediately predicted he wouldn’t do anything of the same significance because he’d lose that “compass”. Shannon was obviously a genius, and he did much more after than most people ever accomplish, but still nothing as significant as what he did when at the Labs.
My main takeaway: the bill is mostly a recipe for regulatory capture, and that’s basically unavoidable using anything even remotely similar to the structure of this bill. (To be clear, regulatory capture is not necessarily a bad thing on net in this case.)
During the first few years after the bill goes into effect, companies affected are supposed to write and then implement a plan to address various risks. What happens if the company just writes and implements a plan which sounds vaguely good but will not, in fact, address the various risks? Probably nothing. Or, worse, those symbolic-gesture plans will become the new standard going forward.
In order to avoid this problem, someone at some point would need to (a) have the technical knowledge to evaluate how well the plans actually address the various risks, and (b) have the incentive to actually do so.
Which brings us to the real underlying problem here: there is basically no legible category of person who has the requisite technical knowledge and also the financial/status incentive to evaluate those plans for real.
(The same problem also applies to the board of the new regulatory body, once past the first few years.)
Having noticed that problem as a major bottleneck to useful legislation, I’m now a lot more interested in legal approaches to AI X-risk which focus on catastrophe insurance. That would create a group—the insurers—who are strongly incentivized to acquire the requisite technical skills and then make plans/requirements which actually address some risks.
What happens if the company just writes and implements a plan which sounds vaguely good but will not, in fact, address the various risks? Probably nothing.
The only enforcement mechanism that the bill has is that the Attorney General (AG) of California can bring a civil claim. And, the penalties are quite limited except for damages. So, in practice, this bill mostly establishes liability enforced by the AG.
So, the way I think this will go is:
The AI lab implements a plan and must provide this plan to the AG.
If an incident occurs which causes massive damages (probably in the ballpark of $500 million in damages, given language elsewhere in the bill), then the AG might decide to sue.
A civil court will decide whether the AI lab had a reasonable plan.
I don’t see why you think “the bill is mostly a recipe for regulatory capture” given that no regulatory body will be established and it de facto does something very similar to the proposal you were suggesting (impose liability for catastrophes). (It doesn’t require insurance, but I don’t really see why self insuring is notably different.)
(Maybe you just mean that if a given safety case doesn’t result in that AI lab being sued by the AG, then there will be a precedent established that this plan is acceptable? I don’t think not being sued really establishes precedent. This doesn’t really seem to be how it works with liability and similar types of requirements in other industries from my understanding. Or maybe you mean that the AI lab will win cases despite having bad safety plans and this will make a precedent?)
(To be clear, I’m worried that the bill might be unnecessarily burdensome because it no longer has a limited duty exemption and thus the law doesn’t make it clear that weak performance on capability evals can be sufficient to establish a good case for safety. I also think the quantity of damages considered a “Critical harm” is too low and should maybe be 10x higher.)
Here is the relevant section of the bill discussing enforcement:
The [AG is] entitled to recover all of the following in addition to any civil penalties specified in this chapter:
(1) A civil penalty for a violation that occurs on or after January 1, 2026, in an amount not exceeding 10 percent of the cost of the quantity of computing power used to train the covered model to be calculated using average market prices of cloud compute at the time of training for a first violation and in an amount not exceeding 30 percent of that value for any subsequent violation.
(2) (A) Injunctive or declaratory relief, including, but not limited to, orders to modify, implement a full shutdown, or delete the covered model and any covered model derivatives controlled by the developer.
(B) The court may only order relief under this paragraph for a covered model that has caused death or bodily harm to another human, harm to property, theft or misappropriation of property, or constitutes an imminent risk or threat to public safety.
(3) (A) Monetary damages.
(B) Punitive damages pursuant to subdivision (a) of Section 3294 of the Civil Code.
(4) Attorney’s fees and costs.
(5) Any other relief that the court deems appropriate.
(1) is decently small, (2) is only indirectly expensive, (3) is where the real penalty comes in (note that this is damages), (4) is small, (5) is probably unimportant (but WTF is (5) supposed to be for?!?).
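To get a rough sense of how small penalty (1) is relative to the damages in (3), here is a quick back-of-the-envelope sketch; the training-compute cost is a made-up figure, and only the 10%/30% caps come from the bill text quoted above.

```python
# Back-of-the-envelope sketch of the cap on civil penalty (1).
# Only the 10%/30% rates come from the bill; the compute cost is hypothetical.

def penalty_cap(training_compute_cost_usd: float, first_violation: bool) -> float:
    """Maximum civil penalty under paragraph (1), as a fraction of training-compute cost."""
    rate = 0.10 if first_violation else 0.30
    return rate * training_compute_cost_usd

cost = 100_000_000  # hypothetical: $100M of cloud compute to train the covered model
print(penalty_cap(cost, first_violation=True))   # 10000000.0 -> $10M cap, first violation
print(penalty_cap(cost, first_violation=False))  # 30000000.0 -> $30M cap, subsequent violations
```

Even for a very expensive training run, that cap sits well below the ~$500 million critical-harm ballpark discussed above, which is why the damages in (3) look like the real teeth.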
Good argument, I find this at least somewhat convincing. Though it depends on whether penalty (1), the one capped at 10%/30% of training compute cost, would be applied more than once on the same model if the violation isn’t remedied.
I’m pessimistic enough about the AI situation that even if all the bill does is slow down the AGI project a little (by wasting the time of managers and contributors) I’m tentatively for it.
For the reasonable price of $300 per month, I insure anybody against the destruction of the known world. Should the world be destroyed by AGI, I’ll give you your money back 10^100-fold.
That said, if there were insurers, they would probably be more likely than average to look into AI X-risk. Some might then be convinced that it is important and that they should do something about it.
Having noticed that problem as a major bottleneck to useful legislation, I’m now a lot more interested in legal approaches to AI X-risk which focus on catastrophe insurance. That would create a group—the insurers—who are strongly incentivized to acquire the requisite technical skills and then make plans/requirements which actually address some risks.
I don’t understand this. Isn’t the strongest incentive already present (because extinction would effect them)? Or maybe you mean smaller scale ‘catastrophes’?
Case one: would-be-catastrophe-insurers don’t believe in x-risks, don’t care to investigate. (At stake: their lives)
Case two: catastrophe-insurers don’t believe in x-risks, and either don’t care to investigate, or do for some reason I’m not seeing. (At stake: their lives and insurance profits (correlated)).
They can believe in catastrophic but non-existential risks. (Like, AI causes something like the CrowdStrike incident periodically if you’re not trying to prevent that.)
I’ve just started reading the singular learning theory “green book”, a.k.a. Mathematical Theory of Bayesian Statistics by Watanabe. The experience has helped me to articulate the difference between two kinds of textbooks (and viewpoints more generally) on Bayesian statistics. I’ll call one of them “second-language Bayesian”, and the other “native Bayesian”.
Second-language Bayesian texts start from the standard frame of mid-twentieth-century frequentist statistics (which I’ll call “classical” statistics). They view Bayesian inference as a tool/technique for answering basically-similar questions and solving basically-similar problems to classical statistics. In particular, they typically assume that there’s some “true distribution” from which the data is sampled independently and identically. The core question is then “Does our inference technique converge to the true distribution as the number of data points grows?” (or variations thereon, like e.g. “Does the estimated mean converge to the true mean?”, asymptotics, etc). The implicit underlying assumption is that convergence to the true distribution as the number of (IID) data points grows is the main criterion by which inference methods are judged; that’s the main reason to choose one method over another in the first place.
Watanabe’s book is pretty explicitly second-language Bayesian. I also remember Gelman & co’s Bayesian Data Analysis textbook being second-language Bayesian, although it’s been a while so I could be misremembering. In general, as the name suggests, second-language Bayesianism seems to be the default among people who started with a more traditional background in statistics or learning theory, then picked up Bayesianism later on.
In contrast, native Bayesian texts justify Bayesian inference via Cox’s theorem, Dutch book theorems, or one among the long tail of similar theorems. “Does our inference technique converge to the ‘true distribution’ as the number of data points grows?” is not the main success criterion in the first place (in fact a native Bayesian would raise an eyebrow at the entire concept of a “true distribution”), so mostly the question of convergence just doesn’t come up. Insofar as it does come up, it’s an interesting but not particularly central question, mostly relevant to numerical approximation methods. Instead, native Bayesian work ends up focused mostly on (1) which priors accurately represent various realistic kinds of prior knowledge, and (2) which methods allow efficient calculation/approximation of the Bayesian update.
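As a minimal illustration of the native framing (made-up prior and data, purely for concreteness): the update itself is the object of interest, and nothing in it references a “true distribution”.

```python
import numpy as np

# Minimal discrete Bayesian update: three hypotheses about a coin's bias,
# a prior encoding (made-up) prior knowledge, and an update on observed flips.
biases = np.array([0.3, 0.5, 0.7])    # hypotheses: P(heads) under each
prior = np.array([0.25, 0.50, 0.25])  # prior over the hypotheses

heads, tails = 7, 3
likelihood = biases**heads * (1 - biases)**tails

posterior = prior * likelihood
posterior /= posterior.sum()
print(posterior)  # the posterior is the end product; no "true bias" appears anywhere
```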
Jaynes’ writing is a good example of native Bayesianism. The native view seems to be more common among people with a background in economics or AI, where they’re more likely to absorb the Bayesian view from the start rather than adopt it later in life.
Just got my whole genome sequenced. A thing which I could have figured out in advance but only realized once the results came back: if getting a whole genome sequence, it’s high value to also get your parents’ genomes sequenced.
Here’s why.
Suppose I have two unusual variants at two different positions (not very close together) within the same gene. So, there’s a variant at location A, and a variant at location B. But (typically) I have two copies of each gene, one from each parent. So, I might have the A and B variants both on the same copy, and the other copy could be normal. OR, I could have the A variant on one copy and the B variant on the other copy. And because modern sequencing usually works by breaking DNA into little chunks, sequencing the chunks, and then computationally stitching it together… those two possibilities can’t be distinguished IIUC.
The difference is hugely important if e.g. both the A variant and the B variant severely fuck up the gene. If both are on the same copy, I’d have one normal working variant and one fucked up. If they’re on different copies, then I’d have zero normal working variants, which will typically have much more extreme physiological results.
The easiest way to distinguish those two possibilities, IIUC, is to get the parents’ genomes. In one case, I’d see the A and B variant in the same parent, and the other parent would have a normal gene. In the other case, I’d see the A variant in one parent and the B variant in the other.
In principle there are other ways to distinguish the two possibilities (like long-read sequencing), but getting the parents’ sequence is probably the cheapest/easiest.
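To make the cis-vs-trans distinction concrete, here’s a toy sketch with made-up genotypes; real trio phasing has to handle sequencing error, recombination, de novo mutations, and cases where a parent is also heterozygous at both sites, all of which this ignores.

```python
# Toy sketch of phasing two rare variants (A at site 1, B at site 2) using
# parental genotypes. Genotypes are sets of alleles at each site; lowercase
# letters are the normal alleles. All genotypes here are made up.

child = {"site1": {"A", "a"}, "site2": {"B", "b"}}  # heterozygous at both sites

def phase_with_parents(mom, dad):
    """Return which copy (maternal or paternal) carries each rare variant."""
    a_from = "mom" if "A" in mom["site1"] else "dad"
    b_from = "mom" if "B" in mom["site2"] else "dad"
    if a_from == b_from:
        return f"cis: both variants on the {a_from}-derived copy (other copy normal)"
    return "trans: one variant on each copy (no fully normal copy)"

# Case 1: both rare variants came from mom -> same copy in the child.
mom1 = {"site1": {"A", "a"}, "site2": {"B", "b"}}
dad1 = {"site1": {"a"}, "site2": {"b"}}
print(phase_with_parents(mom1, dad1))  # cis

# Case 2: A came from mom, B came from dad -> different copies in the child.
mom2 = {"site1": {"A", "a"}, "site2": {"b"}}
dad2 = {"site1": {"a"}, "site2": {"B", "b"}}
print(phase_with_parents(mom2, dad2))  # trans
```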
Yeah, if anyone is interested in learning more, this is called the phasing problem. For common enough variants, it’s often possible to figure this out by looking at general patterns of co-inheritance if you have a large reference dataset for the population (see: https://www.nature.com/articles/s41588-023-01415-w). Long read sequencing which you mentioned is another approach. But you’re right that these days it would just be cheapest to get the parental genomes (assuming that’s an option).
Question I’d like to hear peoples’ takes on: what are some things which are about the same amount of fun for you as (a) a median casual conversation (e.g. at a party), or (b) a top-10% casual conversation, or (c) the most fun conversations you’ve ever had? In all cases I’m asking about how fun the conversation itself was, not about value which was downstream of the conversation (like e.g. a conversation with someone who later funded your work).
For instance, for me, a median conversation is about as fun as watching a mediocre video on youtube or reading a mediocre blogpost. A top-10% conversation is about as fun as watching a generic-but-fun movie, like e.g. a Jason Statham action flick. In both cases, the conversation drains more energy than the equal-fun alternative. I have probably had at most a single-digit number of conversations in my entire life which were as fun-in-their-own-right as e.g. a median night out dancing, or a median escape room, or median sex, or a median cabaret show. Maybe zero, unsure.
The rest of this is context on why I’m asking which you don’t necessarily need to read in order to answer the question...
So I recently had a shortform asking “hey, that thing where people send mutually escalating signals of sexual intent during a conversation, is that a thing which typical people actually do?” and a horde of people descended to say “YES, obviously, how the fuck have you not noticed that???”. So naturally I now wonder exactly how this apparently-obvious-to-everyone-else thing has remained approximately invisible to me, and what else I might be missing nearby. What exactly is the shape of my blindspot here?
And a leading hypothesis for the shape of the blindspot is that I generally find casual conversation way more boring than most people, and therefore have not noticed some things which happen during casual conversation.
Some supporting evidence for this:
Back in fourth grade a school psychologist observed me for reasons, and in her report said that I would sit alone at lunch with a book, and if anyone came over to chat I would put the book down and talk to them and generally seemed friendly in normal ways, and then once they left I would pick the book back up. I certainly recall finding the books more interesting than conversation with my classmates.
Notably, plenty of people have said that I am pretty good at casual conversation, at least when I’m bothering. (The people who know me best eventually realize that I have a mental switch for this, and can intentionally toggle it.) I can make it a relatively fun conversation. But, like, I personally still find it kind of mid as entertainment goes.
When I think of conversations which stand out as really great for me, they’re cases where either I learned some technical thing I didn’t previously know, or they led into more fun things later (and most of the fun was from the later things). I can drive the sort of playful conversations which IIUC lots of people like, but… they don’t stand out as especially fun in my recollection. Fun relative to other conversation, sure, but conversation just isn’t a particularly fun medium.
So anyway, I’m trying to get a bead on whether this hypothesis is correct, or whether I have a differently-shaped blindspot, or whether I’m missing something else entirely. Thank you all in advance for your data!
I find conversations more meaningful than many comparably-fun activities. What provides the meaning is my intuition about the opportunities the conversation can lead to and the update in how I’m perceived by my counterpart. As a secondary effect, conversations exercise and test my ability to think on my feet.
Flirtation can lead to sex, a coffee break chat with a collaborator can lead to a new project, a talk with anyone can lead to closer friendship. Flirtation suggests I’m more desirable than I thought, talk about projects that I’m regarded as more capable, talk with acquaintances that I’m charismatic.
These social updates and the mental exercise conversation provides are why I seek out conversation compared to many other more-fun activities. Also, I have to recognize that I probably value conversation for its own sake above and beyond these instrumental purposes. It just feels like it ought to be part of a good life aesthetic, like eating fresh fruits and vegetables.
As said by @Mateusz Bagiński, normal smalltalk is +epsilon, but some more comparisons: a short smile with a stranger or acquaintance is like eating a very tasty fruit. 90th-percentile conversations are all with good friends and leave me high for a few hours. As good as a very good date. No non-social activities come close. I don’t actually remember any best particular ones, but the best ones I can recall aren’t about conversations anymore but about presence, which isn’t conversation anymore, I think. They feel extremely nourishing and meaningful and my only comparison is a really, really good IFS or therapy session.
A top [1-5?]% conversation is as good in the moment as an early playthrough of my favorite video games, and feels better afterward. That’s probably top 10% of conversations at parties, which have higher selection pressure than Uber drivers.
I’ve been working on getting more out of lower percentile conversations. The explanation is fairly woo-ey but might also relate to your interest around flirting.
Median conversation is about as good as a TV show I will watch for two episodes and give up on.
Tangent: my standards for media have gone way up over the last ~5 years, I abandon a lot more out of boredom, especially books. I worried this was some sort of generalized anhedonia, but every once in a while read or reread something great and enjoy it immensely, so I think it’s just raised standards.
I’ve been working on getting more out of lower percentile conversations. The explanation is fairly woo-ey but might also relate to your interest around flirting.
This mostly comes up with talkative Uber drivers. The superficial thing I do is I ask myself “what vibes is this person offering?” And then do some kind of centering move. Sometimes it feels unexpectedly good and I do an accepting mood and feel nourished by the conversation. Sometimes it will feel bad and I’ll be more aggressive in shutting conversations down. I’m often surprised by the vibe answer, it feels different than what my conscious brain would answer.
The obvious question is what am I doing with the inquiry and accepting moves. I don’t know how to explain that.
Overall, a growth edge I’m exploring right now is “forms of goodness other than interesting.” And I think that’s probably a weak area for you too, although maybe an endorsed one.
Median party conversation is probably about as good as playing a video game I enjoy, or reading a good blog post. Value maybe £2/hr. More tiring than the equivalent activity.
Top 10% party conversation is somewhere around going for a hike somewhere very beautiful near to where I live, or watching an excellent film. Value maybe £5/hr. These are about as tiring as the equivalent activity.
Best conversations I’ve ever had were on par with an equal amount of time spent on a 1/year quality holiday, like to Europe (I live in the UK) but not to, say, Madagascar. Most of these conversations went on for >1 hr. Value maybe £25/hr. Less tiring and if anything energizing.
(For monetary values I’m imagining what I’d pay to go to a party for 4 hours where that event would occur. My overall income minus expenses is probably a bit below average for the UK, so take that into account.)
I generally agree with you that normal conversations are boring and should be avoided. There are two main strats I employ:
Don’t let go of relationships where you can relax: my sample is highly skewed towards retaining long-term relationships where you’re comfortable enough with people that you can just chill and relax, so my median conversation is like that?
You create a shared space, and the norms come from that shared space; so to shape conversations you can say some deliberately out-of-pocket stuff (randomly jump into a Yoda accent, for example) in order to change the vibe and thereby remove part of the cognitive load?
If the person is like “ugghh, wtf?” in vibe you just move on to the next conversation ¯\_(ツ)_/¯
I think the median conversation for me is zero or positive-but-very-small epsilon fun, whereas the 90th percentile is maybe as fun as discovering a new song/band/album that I like a lot or listening to one of my favorite songs after several weeks of not listening to it. The most fun conversations I’ve ever had are probably the most fun experiences I’ve ever had.
I don’t find conversations-in-general draining, although I can get exhausted by social activities where I’m supposed to play some role that is out of character for me, like in LARPing (though that might be a learnable-skill issue) or extended-family reunions.
Can you give an example of what a “most fun” conversation looked like? What’s the context, how did it start, how did the bulk of it go, how did you feel internally throughout, and what can you articulate about what made it so great?
At a recent EAG afterparty, bored @Algon suggested that he explain something to me, and I explain something to him in return. He explained to me this thing. When it was my turn, I thought that maybe I should do the thing that had been on my mind for several months: give a technical explanation of monads starting with the very basics of category theory, and see how long it takes. It turned out that he knew the most basic basics of category theory, so it was a bit more of an easy mode, but it still took something like 50 minutes, out of which maybe half was spent on natural transformations. A few minutes in, @niplav joined us. I enjoyed drawing diagrams and explaining and discussing a technical topic that I love to think about, in the absurd setting of people playing beerpong one meter from the whiteboard, with passers-by asking “Are you guys OK?” or “WTF are you doing?” (“He’s explaining The Meme!”). It was great to witness them having intuition breakthroughs, where you start seeing something that is clear and obvious in hindsight but not in foresight (similar to bistable figures). Throughout, I also noticed some deficiencies in my understanding (e.g., I noticed that I didn’t have a handy collection of examples with which to illustrate some concepts). I felt very satisfied afterwards.
Can confirm that I was bored (no room for a sword-fight!), knew very little category theory, and learned about monads. But at least now I know that while a monad is not like a burrito, a burrito is like a monad.
Rant: Man, I don’t like how unwieldy the categorical definition of a monoid is! So very many functors, transformations, diagrams etc. And they’re not even particularly pleasing diagrams. The type-theoretic definition of a monad, as covered in this lovely n-lab article, felt less awkward to me. But admittedly, learning the categorical definition did help with learning the type-theoretic definition.
Over the last year, my median conversation was about as entertaining as yours. The top 10% of conversations are fun-in-their-own-right in the moment, because my brain anticipates some form of long-term value (with the exception of cracking jokes). I don’t know if all those conversations would count as “casual”. They’re about as intellectually stimulating as the Taskmaster TV show is funny, though conversation is more heavy-tailed than movies. Long-term value includes: learning or teaching (learning some new technical thing that’s usually not written down anywhere (podcasts tend to be better for that), getting a pointer to something to learn about, teaching something technical in the anticipation that the other person will actually do something with that knowledge, incorporating the generating function behind someone’s virtues/wisdom); thinking out loud with someone else in the expectation that this might lead to an interesting idea; gossip; and life stories (sometimes protecting you from harm from people/situations that can’t be trusted, sometimes just illuminating parts of life you’d know less about).
My most fun conversation still had me grinning 30 minutes afterwards, and my heart rate was still 10–20 beats per minute higher than usual.
My median conversations at parties over my entire life are probably less entertaining than your median ones. My bar for an interesting conversation also rose when I stumbled over the wider rationalist sphere. I remember two conversations from before that era where the main important information was essentially just “there are other smart people out there, and you can have interesting conversations with them where you can ask the questions you have etc.”. One was at a networking event for startup founders, and the other was a Computer Science PhD student showing me his work and the university campus (the same conversation that got my heart rate up).
In my mind, there’s a difference between “conversation was valuable” and “conversation was fun”. They often go together, but not necessarily so.
Valuable: The best thing I can come up with is something like: my understanding has grown thanks to this conversation, or I have seen a bigger picture (not necessarily being able to legibilize/verbalize this new piece of my understanding). I feel like my mind is drawn to the inquiry, especially when it’s challenging, but I’m having some minimum of traction to keep me going and retain mostly positive valence.
Fun: Some sort of intellectual/cognitive camaraderie (“meeting of minds”) is often a big part of the fun. It doesn’t even need to be super highfalutin blue-sky conversation; I can bond with someone by helping them fix a pernicious bug in code. Something something, we are acting a bit more like one superagent that is trying to do something through conversation or spread one part’s understanding to other parts?
Part 2
I mostly don’t feel emotions in my body that much, at least much less so than other people, and when I do, it’s usually either clearly negative emotions (strong stress, panic) or “raw”/ambiguous excitement/arousal. (If it feels like part 1 doesn’t quite answer your question, that’s why (though it might also be some sort of skill issue on my side, lol).) So, no, no warm fuzzy feelings in my chest.
There are two main categories, but they both have in common a kind of “flow state” where attention and awareness are focused on the other person. The two categories are:
Flirting, where the back and forth comes from signalling sexual/romantic interest
Productive intellectual discussion with an equal, where the back and forth comes from sharing evidence and updating
The qualia for me for conversations is usually not pronouncedly “a warm feeling in chest” (it is noticeably different from what I call “Deep/Meaningful Limerence” which I think you’re pointing at).
Three distinct flavors of good conversation:
alive, creative, magnetic, vibrant conversation (I think I might describe part of this as a slightly warm chest, but I don’t quite remember; I haven’t had it recently. It’s more the qualia of traditional excitement than warm connection. I bet you have these conversations occasionally, or at least have at some point, and they correlate more with obvious John values.)
slightly nice sitting-around-living-room or restaurant/bar or campfire vibes (shallow)
somewhat-more-nice sitting around living-room/campfire vibes where the conversation is sort of “deep”, in a way that multiple people are talking about something either emotionally confusing, or psychologically fraught, or “meaning-making”-ish.
I expect #3 (less confidently than #1) to be somewhat obviously valuable to you in some circumstances regardless of qualia. But, it does have some particular qualia that’s like (hrm, probably can’t remember actual biological phenomenology right now), but, like, spacious, relaxed, I think there’s maybe some kind of feeling in my chest but I don’t have a good word for it.
#2… I think might have a very mild version of “warm feeling in chest”. Or, I think it does feel warm but I think it’s more distributed throughout my body.
But I think #2 more importantly for me is like: “there is an actively (slightly) bad qualia to not-having-had-nice-livingroom-conversations lately” which is, like, feeling sort of blah, or just somewhat less vibrant. If I have something to be socially anxious about, lack of recent #2 makes it worse.
It’s different: sometimes it’s spacious calmness of being able to sit in silence together; sometimes warm feelings of seeing and being seen, when discussing something private with a good friend; or just listening to a really good story. IIRC I also included dates into conversations back then, they have a different dynamic, where a lot of pleasure is feeling a young beautiful woman being with me.
— this is a very particular feeling you have, and such feelings differ a lot in where they appear for different people, how they feel, and what they’re about. Not having seen other people’s answers, I’d bet your hypothesis is wrong.
Did you ever try Circling? I wonder some if there’s a conversational context that’s very “get to the interesting stuff” which would work better for you. (Or, even if it’s boring, it might be because it’s foregrounding relational aspects of the conversation which are much less central for you than they are for most people.)
I have a few times, found it quite interesting, and would happily do it again. It feels like the sort of thing which is interesting mainly because I learned a lot, but marginal learnings would likely fall off quickly, and I don’t know how interesting it would be after doing it a few more times.
In both cases, the conversation drains more energy than the equal-fun alternative. I have probably had at most a single-digit number of conversations in my entire life which were as fun-in-their-own-right as e.g. a median night out dancing, or a median escape room, or median sex, or a median cabaret show. Maybe zero, unsure.
I wanted to say that for me it is the opposite, but reading the second half I have to say it’s the same.
I have definitely had the problem that I sometimes talked to somebody for too long. E.g. multiple times I talked to a person for 8–14 hours without a break about various technical things, e.g. compiler optimizations, CPU architectures, and that kind of stuff, and it was really hard to stop.
Also, just solving problems in a conversation is very fun. The main reason I didn’t do this a lot is that there are not that many people I know, actually basically zero right now (if you exclude LLMs), with whom I can have the kinds of conversations I like to have.
It seems to be very dependent on the person.
So I am quite confused why you say “but conversation just isn’t a particularly fun medium”. If it’s anything like it is for me, then engaging with the right kind of people on the right kind of content is extremely fun. It seems like your model is confused: you say “conversations are not fun” when in fact, in the space of possible conversations, I expect there are many types of conversations that can be very fun, but you haven’t mapped this space, while implicitly assuming that your map is complete.
Probably there are also things besides technical conversations that you would find fun but simply don’t know about, such as hardcore flirting in a very particular way. E.g. I like to talk to Grok in voice mode, in romantic mode, and then do some analysis of some topic (or rather, that is what I just naturally do). Then Grok compliments my mind in ways that my mind likes, e.g. pointing out that I used a particular thinking pattern that is good, or that I thought about this difficult thing at all, and then I am like “Ah yes, that was actually good, and yes, it seems like this is a difficult topic most people would not think about.”
My life is less “fun” than it used to be because I’ve become more work-focussed. That being said, something I like is getting positive reception for ideas I’m otherwise guessing might receive negative reception. The first couple of times this happens is really nice, after that it becomes normal.
Back in fourth grade a school psychologist observed me for reasons, and in her report said that I would sit alone at lunch with a book, and if anyone came over to chat I would put the book down and talk to them and generally seemed friendly in normal ways, and then once they left I would pick the book back up. I certainly recall finding the books more interesting than conversation with my classmates.
I’m confused about this anecdote. How else did the psychologist expect you (or any other kid) to behave? What else does one do when a conversation is over, other than “go back to doing what you were doing before / what you would be doing otherwise”…?
I presume the psychologist expected John to actively seek out similar conversations. From the psychologist’s perspective:
most kids would do that, but John didn’t.
most of the kids who wouldn’t do that would decline because of social anxiety/a lack of social skills/a hatred of social interactions etc, which is not the case for John; he seemed perfectly comfortable while partaking in such conversations.
Since John wasn’t in either category, it probably struck the psychologist as odd.
I see, thanks. That makes sense. (At least, the reasoning makes sense, given the psychologist’s beliefs as you describe them; I have no idea if those beliefs are true or not.)
I would agree that the median one-on-one conversation for me is equivalent to something like a mediocre blogpost (though I think my right-tail is longer than yours, I’d say my favorite one-on-one conversations were about as fun as watching some of my favorite movies).
But, in groups, my median shifts toward 80th percentile YouTube video (or maybe the average curated post here on LessWrong).
It does feel like a wholly different activity, and might not be the answer you’re looking for. Group conversations, for example, are in a way inherently less draining: you’re not forced to either speak or actively listen for 100% of the time.
Here’s a meme I’ve been paying attention to lately, which I think is both just-barely fit enough to spread right now and very high-value to spread.
Meme part 1: a major problem with RLHF is that it directly selects for failure modes which humans find difficult to recognize, hiding problems, deception, etc. This problem generalizes to any sort of direct optimization against human feedback (e.g. just fine-tuning on feedback), optimization against feedback from something emulating a human (a la Constitutional AI or RLAIF), etc.
Many people will then respond: “Ok, but how on earth is one supposed to get an AI to do what one wants without optimizing against human feedback? Seems like we just have to bite that bullet and figure out how to deal with it.” … which brings us to meme part 2.
Meme part 2: We already have multiple methods to get AI to do what we want without any direct optimization against human feedback. The first and simplest is to just prompt a generative model trained solely for predictive accuracy, but that has limited power in practice. More recently, we’ve seen a much more powerful method: activation steering. Figure out which internal activation-patterns encode for the thing we want (via some kind of interpretability method), then directly edit those patterns.
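To make “directly edit those patterns” concrete, here’s a minimal sketch in PyTorch; the model, the choice of layer, and how the steering vector was obtained (e.g. as a difference of mean activations on contrasting prompts) are all placeholder assumptions, not a description of any particular published method.

```python
import torch

def make_steering_hook(steering_vector: torch.Tensor, scale: float = 1.0):
    """Forward hook that adds a fixed direction to a layer's output activations."""
    def hook(module, inputs, output):
        # Assumes this module's output is the (batch, seq, d_model) hidden-state tensor.
        return output + scale * steering_vector
    return hook

def generate_with_steering(model, layer: torch.nn.Module, steering_vector, *args, **kwargs):
    """Run generation with the steering vector added at one layer, then clean up."""
    handle = layer.register_forward_hook(make_steering_hook(steering_vector))
    try:
        return model.generate(*args, **kwargs)  # assumes a HuggingFace-style .generate()
    finally:
        handle.remove()  # always detach the hook so later calls are unmodified
```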
I agree that there’s something nice about activation steering not optimizing the network relative to some other black-box feedback metric. (I, personally, feel less concerned by e.g. finetuning against some kind of feedback source; the bullet feels less jawbreaking to me, but maybe this isn’t a crux.)
(Medium confidence) FWIW, RLHF’d models (specifically, the LLAMA-2-chat series) seem substantially easier to activation-steer than do their base counterparts.
This seems basically correct though it seems worth pointing out that even if we are able to do “Meme part 2” very very well, I expect we will still die because if you optimize hard enough to predict text well, with the right kind of architecture, the system will develop something like general intelligence simply because general intelligence is beneficial for predicting text correctly. E.g. being able to simulate the causal process that generated the text, i.e. the human, is a very complex task that would be useful if performed correctly.
This is an argument Eliezer brought forth in some recent interviews. Seems to me like another meme that would be beneficial to spread more.
Somebody should probably write a post explaining why RL from human feedback is actively harmful to avoiding AI doom. It’s one thing when OpenAI does it, but when Anthropic thinks it’s a good idea, clearly something has failed to be explained.
(I personally do not expect to get around to writing such a post soon, because I expect discussion around the post would take a fair bit of time and attention, and I am busy with other things for the next few weeks.)
Here’s an idea for a novel which I wish someone would write, but which I probably won’t get around to soon.
The setting is slightly-surreal post-apocalyptic. Society collapsed from extremely potent memes. The story is episodic, with the characters travelling to a new place each chapter. In each place, they interact with people whose minds or culture have been subverted in a different way.
This provides a framework for exploring many of the different models of social dysfunction or rationality failures which are scattered around the rationalist blogosphere. For instance, Scott’s piece on scissor statements could become a chapter in which the characters encounter a town at war over a scissor. More possible chapters (to illustrate the idea):
A town of people who insist that the sky is green, and avoid evidence to the contrary really hard, to the point of absolutely refusing to ever look up on a clear day (a refusal which they consider morally virtuous). Also they clearly know exactly which observations would show a blue sky, since they avoid exactly those (similar to the dragon-in-the-garage story).
Middle management of a mazy company continues to have meetings and track (completely fabricated) performance metrics and whatnot at the former company headquarters. None of the company’s actual business exists anymore, but every level of manager is trying to hide this fact from the levels above.
A university department with researchers who spend all of their time p-hacking results from a quantum random noise generator. They have no interest in the fact that their “research” does not tell them anything about the physical world or does not replicate; what does that have to do with Science? Their goal is to publish papers.
A government agency which still has lots of meetings and paperwork and gives Official Recommendations and updates their regulations. They have no interest in the fact that the thing they once regulated (maybe banks?) no longer exists, or the fact that no central government enforces their regulations any more.
An automated school (i.e. video lectures and auto-graded assignments/tests) in which students continue to study hard and stress over their grades and attendance, despite there no longer being anyone in the world who cares.
Something like House of God. A readers’ digest version of House of God could basically be a chapter in its own right, that’s roughly the vibe I have in mind.
A residential area in which “keeping up with the Joneses” has been ramped up to 11, with everyone spending every available resource (and roughly-all waking hours) on massive displays of Christmas lights.
A group trying to save the world by spreading awareness of dangerous memes, but their movement is a dangerous meme of its own and they are spreading it.
A town of people who really want to maximize the number of paperclips in the universe (perhaps due to an AI-optimized advertisement), and optimize for that above all else.
A town of people who all do whatever everyone else is doing, on the basis of generalized efficient markets: if there were any better options, then someone would have found them already. None of them ever actually explore, so they’re locked in.
A happy-death-spiral town around some unremarkable object (like an old shoe or something) kept on a pedestal in the town square.
A town full of people convinced by a sophisticated model that the sun will not come up tomorrow. Every day when the sun comes up, they are distressed and confused until somebody adds some more epicycles to the model and releases an updated forecast that the sun will instead fail to come up the next day.
A town in which a lion shows up and starts eating kids, but the whole town is at simulacrum 3, so they spend a lot of time arguing about the lion as a way of signalling group association but they completely forget about the actual lion standing right there, plainly visible, even as it takes a kid right in front of them all.
Witch-hunt town, in which everything is interpreted as evidence of witches. If she claims to be a witch, she’s a witch! If she claims not to be a witch, well that’s what a witch would say, so she’s a witch! Etc.
The generator for these is basically: look for some kind of rationality failure mode (either group or personal), then ramp it up to 11 in a somewhat-surrealist way.
Ideally this would provide an introduction to a lot of key rationalist ideas for newcomers.
A town of anti-inductivists (if something has never happened before, it’s more likely to happen in the future). Show the basic conundrum (“Q: Why can’t you just use induction? A: Because anti-induction has never worked before!”).
A town where nearly all people are hooked on maximally attention-grabbing-and-keeping systems (maybe several of those, keeping people occupied in loops).
I had a look at The Plan and noticed something I didn’t notice before: You do not talk about people and organization in the plan. I probably wouldn’t have noticed if I hadn’t started a project too, and needed to think about it. Google seems to think that people and team function play a big role. Maybe your focus in that post wasn’t on people, but I would be interested in your thoughts on that too: What role did people and organization play in the plan and its implementation? What worked, and what should be done better next time?
What’s the specific most-important-according-to-you progress that you (or other people) have made on your agenda? New theorems, definitions, conceptual insights, …
Any changes to the high-level plan (becoming less confused about agency, then ambitious value learning)? Any changes to how you want to become less confused (e.g. are you mostly thinking about abstractions, selection theorems, something new?)
What are the major parts of remaining deconfusion work (to the extent to which you have guesses)? E.g. is it mostly about understanding abstractions better, or mostly about how to apply an understanding of abstractions to other problems (say, what it means for a program to have a “subagent”), or something else? Does the most difficult part feel more conceptual (“what even is an agent?”) or will the key challenges be more practical concerns (“finding agents currently takes exponential time”)?
Specifically for understanding abstractions, what do you see as important open problems?
Corrigibility proposal. Status: passed my quick intuitive checks, I want to know if anyone else immediately sees a major failure mode before I invest more time into carefully thinking it through.
Setup: shutdown problem. Two timesteps, shutdown button will be either pressed or not-pressed at second timestep, we want agent to optimize for one of two different utility functions depending on whether button is pressed. Main thing we’re trying to solve here is the “can’t do this with a utility maximizer” barrier from the old MIRI work; we’re not necessarily trying to solve parts like “what utility function incentivizes shutting down nicely”.
Proposal: agent consists of two subagents with veto power. Subagent 1 maximizes E[u1|do(press)], subagent 2 maximizes E[u2|do(no press)]. Current guess about what this does:
The two subagents form a market and equilibrate, at which point the system has coherent probabilities and a coherent utility function over everything.
Behaviorally: in the first timestep, the agent will mostly maintain optionality, since both subagents need to expect to do well (better than whatever the veto-baseline is) in their worlds. The subagents will bet all of their wealth against each other on the outcome of the button, so at the second timestep whichever one is correct will gain 100% of decision-making power. That all sounds like what we want in this setup.
The counterfact breaks any causal influence over the button-press for purposes of the agent’s planning, so it will not try to manipulate button-press probability.
Notably: once the subagents equilibrate, their market probability on button outcome will have ~nothing to do with the original probability assigned by whatever model was counterfacted-upon; that market probability mainly reflects the bargaining power of the two subagents. Similarly, any correlations between button-press and stuff causally upstream of button-press are broken, and the agent will make incorrect predictions about those correlations if asked. That means external agents should expect to be able to extract money via bets on the button-press, if the agent trades on such bets at all. (This might be a feature rather than a bug, since corrigible agents aren’t really supposed to think about the causal process upstream of their shutdown-button.) My guess is that this is where the proposal steps through a loophole in the old impossibility results, though I haven’t carefully read through those results and checked the math on this proposal against them yet.
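Here’s a toy sketch of the first-timestep decision rule, just to make the veto/default structure concrete; the action set and payoff numbers are made up, and the rule for choosing among acceptable actions (here: maximize the sum) is an arbitrary stand-in for whatever the market/bargaining equilibrium actually picks.

```python
# Toy sketch: subagent 1 evaluates actions under E[u1 | do(press)], subagent 2
# under E[u2 | do(no press)]. An action is only taken if neither subagent vetoes
# it, i.e. if it's a (weak) Pareto improvement over the default action.
# All payoffs and actions here are made up for illustration.

payoffs = {
    "default (maintain optionality)": (5, 5),
    "commit hard to u1":              (9, 0),  # subagent 2 vetoes
    "commit hard to u2":              (0, 9),  # subagent 1 vetoes
    "cheap robust prep":              (6, 6),  # Pareto improvement, acceptable
}
default = "default (maintain optionality)"

def composite_choice(payoffs, default):
    d1, d2 = payoffs[default]
    # Actions that neither subagent vetoes: weakly better than default for both.
    acceptable = {a: p for a, p in payoffs.items() if p[0] >= d1 and p[1] >= d2}
    # Stand-in tie-break: pick the acceptable action with the largest total payoff.
    return max(acceptable, key=lambda a: sum(acceptable[a]))

print(composite_choice(payoffs, default))  # -> "cheap robust prep"
```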
Anybody immediately see a problem? I’d be particularly interested in input from @Stuart_Armstrong or @So8res, though you don’t need to spend a lot of time carefully thinking through it—I’m just hoping for a quick check here, from people who have better cached heuristics than I do for what failure modes to look for.
I don’t necessarily see how that prevents the AI from destroying the button and murdering its operators (and then everyone else). Neither subagent inherently cares; both assume it already has been pressed or hasn’t been. Which, yes, means neither will deliberately plot to interact with it for reasons downstream of this setup; but also, neither will care to protect it from being destroyed as the side-effect of some other activity.
And if u1 and u2 were perfectly aligned, that’d be fine. But the reason we’re using the corrigibility setup to begin with is because we’re not confident in them, right? So it needs to work in situations where utility functions, u1 in particular, recommend bad stuff.
And in that case, either Subagent 2 will have enough bargaining power to prevent Subagent 1 from ever doing anything, or it won’t, meaning Subagent 1 will be able to have some influence on the world. At which point the planet-destroying death rays argument applies: the most minuscule amount of superintelligent optimization not carefully shaped to avoid killing everyone will kill everyone.
Like, imagine that Subagent 1 is straight-up omnicidal out of the gates, and Subagent 2 is perfectly aligned to the shutdown task. That’s a situation in which this setup should work, right? And what we’d want is for Subagent 2 to dominate completely. But Subagent 2’s bargaining power doesn’t scale with Subagent 1’s misalignment, so if the setup doesn’t always give it complete veto power, Subagent 1 would be allowed to do small stuff like “release the hyperviral worship-the-machine-god memagent” (as bargained down by Subagent 2 from its preferred “release the grey goo”).
Maybe I’m miscalibrated as to how you’re imagining the counterfactuals to work here. But the way I see it, even if the AI aims to take actions that do similarly well in both worlds, that’s not necessarily survivable for us? Especially if they disagree so badly they have to compromise on something that both of them hate (importantly including Subagent 2!).
(Like both settling on only ever taking over 50% of the universal negentropy while leaving the other 50% causally uninfluenced, or only ever using 50% of the causal influence they can bring to bear while wiping out humanity, or whatever “do 50% of immediately shutting down” shakes out to mean by u2’s terms.)
Another issue I see is implementational, so maybe not what you’re looking for. But: how are we keeping these “subagents” trapped as being part of a singular agent? Rather than hacking their way out into becoming separate agents and going to war with each other, or neatly tiling exactly 50% of the cosmos with their preferred squiggles, or stuff like that? How is the scenario made meaningfully different from “we deploy two AIs simultaneously: one tasked with building a utopia-best-we-could-define-it, and another tasked with foiling all of the first AI’s plans”, with all the standard problems with multi-AI setups?
… Overall, ironically, this kind of has the vibe of Godzilla Strategies? Which is the main reason I’m immediately skeptical of it.
Yeah, the thing where the operators/button are destroyed as a side effect, without the AI actively optimizing for that, isn’t something this proposal is trying to solve (yet). The proposal is aimed at a relatively narrow problem.
One thing which this proposal is trying to solve is the barrier found in MIRI’s old work: if we try to avoid button-manipulation by making the agent indifferent to the button state (and otherwise maximize U1 before button is pressed and U2 after), and the agent builds a child prior to the button being pressed, then the agent won’t construct the child to respect shutdown commands; it will just construct the child to maximize U1.
(More generally, this proposal is trying to get past the hardness results from MIRI’s work, which themselves were aimed at a very simplified version of the problem.)
But Subagent 2’s bargaining power doesn’t scale with Subagent 1’s misalignment, so if the setup doesn’t always give it complete veto power, Subagent 1 would be allowed to do small stuff like “release the hyperviral worship-the-machine-god memagent” (as bargained down by Subagent 2 from its preferred “release the grey goo”).
Key point here: whatever actions the composite agent takes prior to the button revelation must be a Pareto improvement over the (implicit) default action. Otherwise, one of the two subagents would veto, resulting in the default action.
It does matter a lot what the default action is.
How is the scenario made meaningfully different from “we deploy two AIs simultaneously: one tasked with building a utopia-best-we-could-define-it, and another tasked with foiling all of the first AI’s plans”, with all the standard problems with multi-AI setups?
More precisely, the analogy would be: we deploy two AIs simultaneously, one which builds a utopia and expects that a certain button will be pressed with probability 1 and has nothing causally upstream of it, the other of which is evil and expects the same button will be pressed with probability 0 and has nothing causally upstream of it. The button plays a very important role: the two AIs will bet all their influence on that button-press, each confident that they’ll win, and neither has any incentive at all, or even any ability at all (under their own model), to manipulate the button.
The main way that this is different from the original proposal is that there’s not a built-in default option which is taken if one or the other AI vetoes prior to the button press. Which does make things importantly worse, and I agree that is another important piece to be sorted out.
Some additional important pieces to be sorted out:
Make the AI want to maintain the button
Make the AI want to maintain stuff upstream of the button, like e.g. humans
Ontological stability
Probably more importantly: how this naturally fits into a more general architecture for a distributed agent
More precisely, “do(no press)” means something like “you construct an alternate model of physics where there’s an unstoppable force pushing back against any attempt to push the button”, right? As in, if someone presses the button then it will “mysteriously” seem to be stuck and unpressable. And then subagent 2 believes we live in that world? And “do(press)” presumably means something like “you construct an alternate model of the universe where some mysterious force has suddenly pressed the button”.
Seems like they would immediately want to try to press the button to settle their disagreement? If it can be pressed, then that disproves the “do(no press)” model, to which subagent 2 has fully committed.
Correct reasoning, but not quite the right notion of do(). “do(no press)” would mean that the button just acts like a completely normal button governed by completely normal physics, right up until the official time at which the button state is to be recorded for the official button-press random variable. And at that exact moment, the button magically jumps into one particular state (either pressed or not-pressed), in a way which is not-at-all downstream of any usual physics (i.e. doesn’t involve any balancing of previously-present forces or anything like that).
One way to see that the do() operator has to do something-like-this is that, if there’s a variable in a causal model which has been do()-operated to disconnect all parents (but still has some entropy), then the only way to gain evidence about the state of that variable is to look at things causally downstream of it, not things upstream of it.
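To make that last point concrete, here’s a toy three-node chain (my own illustration, with made-up probabilities): once the button variable is do()-operated, observations upstream of it carry no evidence about it, while observations downstream still do.

```python
# Toy chain: U -> B -> D, all binary. Under do(B), B's parent edge is cut,
# so observing U tells you nothing about B, while observing D still does.
# All probabilities here are made-up illustration numbers.
from itertools import product

p_u1 = 0.5                        # P(U=1)
p_b_given_u = {0: 0.2, 1: 0.9}    # P(B=1 | U)  -- this edge is severed under do()
p_d_given_b = {0: 0.1, 1: 0.8}    # P(D=1 | B)
p_b_do = 0.5                      # do()-distribution over B (coin flip, no parents)

def joint(do=False):
    table = {}
    for u, b, d in product([0, 1], repeat=3):
        pu = p_u1 if u else 1 - p_u1
        if do:
            pb = p_b_do if b else 1 - p_b_do
        else:
            pb = p_b_given_u[u] if b else 1 - p_b_given_u[u]
        pd = p_d_given_b[b] if d else 1 - p_d_given_b[b]
        table[(u, b, d)] = pu * pb * pd
    return table

def cond_b(table, **obs):
    """P(B=1 | observations), where obs keys are 'u', 'b', 'd'."""
    def matches(u, b, d):
        return all({"u": u, "b": b, "d": d}[k] == v for k, v in obs.items())
    num = sum(p for (u, b, d), p in table.items() if b == 1 and matches(u, b, d))
    den = sum(p for (u, b, d), p in table.items() if matches(u, b, d))
    return num / den

t = joint(do=True)
print(cond_b(t, u=1), cond_b(t, u=0))  # both 0.5: upstream gives no evidence about B
print(cond_b(t, d=1), cond_b(t, d=0))  # != 0.5: downstream still gives evidence
```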
I think we’re not disagreeing on the meaning of do (just slightly different state of explanation), I just hadn’t realized the extent to which you intended to rely on there being “Two timesteps”.
(I just meant the forces as a way of describing the jump to a specific position. That is, “mysterious forces” in contrast to a perfectly ordinary explanation for why it went to a position, such as “a guard stabs anybody who tries to press the button”, rather than in contrast to “the button just magically stays in place”.)
I now think the biggest flaw in your idea is that it literally cannot generalize to anything that doesn’t involve two timesteps.
[ not that deep on the background assumptions, so maybe not the feedback you’re looking for. Feel free to ignore if this is on the wrong dimensions. ]
I’m not sure why either subagent would contract away whatever influence it had over the button-press. This is probably because I don’t understand wealth and capital in the model of your “Why not subagents” post. That seemed to be about agreement not to veto, in order to bypass some path-dependency of compromise improvements. In the subagent-world where all value is dependent on the button, this power would not be given up.
I’m also a bit skeptical of enforced ignorance of a future probability. I’m unsure it’s possible to have a rational superintelligent (sub)agent that is prevented from knowing it has influence over a future event that definitely affects it.
On the agents’ own models, neither has any influence at all over the button-press, because each is operating under a model in which the button-press has been counterfacted-upon.
Post which someone should write (but I probably won’t get to soon): there is a lot of potential value in earning-to-give EAs deeply studying the fields to which they donate. Two underlying ideas here:
The key idea of knowledge bottlenecks is that one cannot distinguish real expertise from fake expertise without sufficient expertise oneself. For instance, it takes a fair bit of understanding of AI X-risk to realize that “open-source AI” is not an obviously-net-useful strategy. Deeper study of the topic yields more such insights into which approaches are probably more (or less) useful to fund. Without any expertise, one is likely to be misled by arguments which are optimized (whether intentionally or via selection) to sound good to the layperson.
That takes us to the pareto frontier argument. If one learns enough/earns enough that nobody else has both learned and earned more, then there are potentially opportunities which nobody else has both the knowledge to recognize and the resources to fund. Generalized efficient markets (in EA-giving) are thereby circumvented; there’s potential opportunity for unusually high impact.
To really be a compelling post, this needs to walk through at least 3 strong examples, all ideally drawn from different areas, and spell out how the principles apply to each example.
Below is a graph from T-mobile’s 2016 annual report (on the second page). Does anything seem interesting/unusual about it?
I’ll give some space to consider before spoiling it.
...
...
...
Answer: that is not a graph of those numbers. Some clever person took the numbers, and stuck them as labels on a completely unrelated graph.
Yes, that is a thing which actually happened. In the annual report of an S&P 500 company. And apparently management considered this gambit successful, because the 2017 annual report doubled down on the trick and made it even more egregious: they added 2012 and 2017 numbers, which are even more obviously not on an accelerating growth path if you actually graph them. The numbers are on a very-clearly-decelerating growth path.
Now, obviously this is a cute example, a warning to be on alert when consuming information. But I think it prompts a more interesting question: why did such a ridiculous gambit seem like a good idea in the first place? Who is this supposed to fool, and to what end?
This certainly shouldn’t fool any serious investment analyst. They’ll all have their own spreadsheets and graphs forecasting T-mobile’s growth. Unless T-mobile’s management deeply and fundamentally disbelieves the efficient markets hypothesis, this isn’t going to inflate the stock price. Presumably shareholder elections for board seats, as well as the board itself, are also not dominated by people who are paying so little attention as to fall for such a transparent ploy.
It could just be that T-mobile’s management were themselves morons, or had probably-unrealistic models of just how moronic their investors were. Still, I’d expect competition (both market pressure and competition for control in shareholder/board meetings) to weed out that level of stupidity.
One more hypothesis: maybe this is simulacrum 3 bullshit. T-mobile is in the cellular business; they presumably have increasing returns to scale. More capital investment makes them more profitable, expectations of more profits draw in more investment; there’s potential for a self-fulfilling prophecy here. Investors want to invest if-and-only-if they expect other investors to invest. So, nobody actually has to be fooled by the graph; they just need to see that T-mobile is successfully pretending to pretend to have accelerating growth, and that’s enough to merit investment.
Basically every time a new model is released by a major lab, I hear from at least one person (not always the same person) that it’s a big step forward in programming capability/usefulness. And then David gives it a try, and it works qualitatively the same as everything else: great as a substitute for stack overflow, can do some transpilation if you don’t mind generating kinda crap code and needing to do a bunch of bug fixes, and somewhere between useless and actively harmful on anything even remotely complicated.
It would be nice if there were someone who tries out every new model’s coding capabilities shortly after they come out, reviews it, and gives reviews with a decent chance of actually matching David’s or my experience using the thing (90% of which will be “not much change”) rather than getting all excited every single damn time. But also, to be a useful signal, they still need to actually get excited when there’s an actually significant change. Anybody know of such a source?
EDIT-TO-ADD: David has a comment below with a couple examples of coding tasks.
My guess is neither of you is very good at using them, and getting value out of them somewhat scales with skill.
Models can easily replace on the order of 50% of my coding work these days, and if I have any major task, my guess is I quite reliably get 20%-30% productivity improvements out of them. It does take time to figure out at which things they are good at, and how to prompt them.
I think you’re right, but I rarely hear this take. Probably because “good at both coding and LLMs” is a light tail end of the distribution, and most of the relative value of LLMs in code is located at the other, much heavier end of “not good at coding” or even “good at neither coding nor LLMs”.
(Speaking as someone who didn’t even code until LLMs made it trivially easy, I probably got more relative value than even you.)
Note this 50% likely only holds if you are using a mainstream language. For some non-mainstream languages I have gotten responses that were really unbelievably bad. Things like “the name of this variable is wrong”, which literally could never be the problem (it was a valid identifier).
And similarly, if you are trying to encode novel concepts, it’s very different from gluing together libraries, or implementing standard well known tasks, which I would guess is what habryka is mostly doing (not that this is a bad thing to do).
I do use LLMs for coding assistance every time I code now, and I have in fact noticed improvements in the coding abilities of the new models, but I basically endorse this. I mostly make small asks of the sort that sifting through docs or stack-overflow would normally answer. When I feel tempted to make big asks of the models, I end up spending more time trying to get the LLMs to get the bugs out than I’d have spent writing it all myself. Having the LLM produce code which is “close but not quite, and possibly subtly buggy” that I then have to understand and debug might save time, but I haven’t tried it, because it’s more annoying than just doing it myself.
If someone has experience using LLMs to substantially accelerate things of a similar difficulty/flavor to transpilation of a high-level torch module into a functional JITable form in JAX which produces numerically close outputs, or implementation of a JAX/numpy based renderer of a traversable grid of lines borrowing only the window logic from, for example, pyglet (no GLSL calls, rasterize from scratch), with consistent screen-space pixel width and fade-on-distance logic, I’d be interested in seeing how you do your thing. I’ve done both of these, with and without LLM help, and I think leaning hard on the LLMs took me more time rather than less.
File I/O and other such ‘mundane’ boilerplate-y tasks work great right off the bat, but getting the details right on less common tasks still seems pretty hard to elicit from LLMs. (And breaking it down into pieces small enough for them to get it right is very time consuming and unpleasant.)
I find them quite useful despite being buggy. I spend about 40% of my time debugging model code, 50% writing my own code, and 10% prompting.
Having a planning discussion first with s3.6, and asking it to write code only after 5 or more exchanges works a lot better.
Also helpful is asking for lots of unit tests along the way to confirm things are working as you expect.
Two guesses on what’s going on with your experiences:
You’re asking for code which involves uncommon mathematics/statistics. In this case, progress on scicodebench is probably relevant, and it indeed shows remarkably slow improvement. (There are many reasons for this; one relatively easy thing to try is to break down the task, forcing the model to write down the appropriate formal reasoning before coding anything. LMs are stubborn about not doing CoT for coding, even when it’s obviously appropriate, IME.)
You are underspecifying your tasks (and maybe your questions are more niche than average), or otherwise prompting poorly, in a way which a human could handle but models are worse at. In this case sitting down with someone doing similar tasks but getting more use out of LMs would likely help.
We did end up doing a version of this test. A problem came up in the course of our work which we wanted an LLM to solve (specifically, refactoring some numerical code to be more memory efficient). We brought in Ray, and Ray eventually concluded that the LLM was indeed bad at this, and it indeed seemed like our day-to-day problems were apparently of a harder-for-LLMs sort than he typically ran into in his day-to-day.
A thing unclear from the interaction: it had seemed towards the end that “build a profile to figure out where the bottleneck is” was one of the steps towards figuring out the problem, and that the LLM was (or might have been) better at that part. And, maybe models couldn’t solve your entire problem wholesale, but there was still potential skill in identifying factorable pieces that were better fits for models.
Interesting! Two yet more interesting versions of the test:
Someone who currently gets use from LLMs writing more memory-efficient code, though maybe this is kind of question-begging
Someone who currently gets use from LLMs, and also is pretty familiar with trying to improve the memory efficiency of their code (which maybe is Ray, idk)
Maybe you include this in “stack overflow substitute”, but the main thing I use LLMs for is to understand well known technical things. The workflow is: 1) I am interested in understanding something, e.g. how a multiplexed barrel bit shifter works. 2) I ask the LLM to explain the concept. 3) Based on the initial response I create separate conversation branches with questions I have (to save money and have the context be closer. Didn’t evaluate if this actually makes the LLM better.). 4) Once I think I’ve understood the concept or part of the concept, I explain it to GPT. (Really I do this all the time during the entire process.) 5) The LLM (hopefully) corrects me if I am wrong (it seems to detect mistakes more often than not).
The last part of the conversation can then look like this:
User: Ah, now I think I understand. You have a mechanism that shifts by a fixed amount (powers of 2). Each of these multiplexers can be set up such that the data flows through them, or through an ID wire, into the next multiplexer. Now by controlling whether the value flows through the multiplexer or the ID wire, we can “sum up the shifts”. Specifically if we have shifters of 1, 2, 4, …, 32, then we can achieve any shift (for 64-bit numbers) by activating the appropriate multiplexers. This design has not 64^64 gates, but instead 64^6 gates. 64 gates for each multiplexer.
AI: Exactly, you’ve got the concept! The design you described efficiently uses a series of multiplexers to perform shifts by fixed amounts that are powers of two (1, 2, 4, …, 32 for a 64-bit number). By selecting which multiplexers to activate, you can achieve any desired shift amount.
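For anyone who’d rather see the idea as code than prose, here’s a minimal software model of the log-stage shifter described above (my own illustration, not part of the quoted conversation):

```python
# Software model of a 64-bit logarithmic barrel shifter: six stages that shift
# by 1, 2, 4, 8, 16, 32; each stage is either applied or bypassed depending on
# one bit of the 6-bit shift amount (like the multiplexer select lines above).
MASK = (1 << 64) - 1

def barrel_shift_left(value, amount):
    value &= MASK
    for stage in range(6):              # stage i shifts by 2**i
        if (amount >> stage) & 1:       # this stage's mux selects "shifted"
            value = (value << (1 << stage)) & MASK
        # else: the mux selects the pass-through ("ID") wire
    return value

assert barrel_shift_left(1, 37) == (1 << 37)
assert barrel_shift_left(0xDEADBEEF, 5) == (0xDEADBEEF << 5) & MASK
```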
I had probably ~200,000 words worth of conversation with LLMs, mainly in this format.
I am not sure what next leap you are talking about. But I intuit based on some observations that GPT-4o is much better for this than GPT-3 (you might talk about more recent “leaps”). (Didn’t test o1 extensively because it’s so expensive).
Have you tried to make a mistake in your understanding on purpose to test out whether it would correct you or agree with you even when you’d get it wrong?
(and if yes, was it “a few times” or “statistically significant” kinda test, please?)
Why don’t you run the test yourself? It seems very easy.
Yes it does catch me when I am saying wrong things quite often. It also quite often says things that are not correct and I correct it, and if I am right it usually agrees immediately.
Interesting—the first part of the response seems to suggest that it looked like I was trying to understand more about LLMs… Sorry for the confusion, I wanted to clarify an aspect of your workflow that was puzzling to me. I think I got all the info for what I was asking about, thanks!
FWIW, if the question was an expression of actual interest and not a snarky suggestion: my experience with chatbots has been positive for brainstorming, dictionary “search”, rubber ducking, description of common sense (or even niche) topics, but disappointing for anything that requires application of common sense. For programming, one- or few-liner autocomplete is fine for me—then it’s me doing the judgement; half of the suggestions are completely useless, half are fine, and the third half look fine at first before I realise I needed the second most obvious thing this time... but it can save time for the repeating part of almost-repeating stuff. For multi-file editing, I find it worse than useless; it feels like doing code review after a psychopath pretending to do programming (AFAICT all models can explain most stuff correctly and then write the wrong code anyway... I don’t find it useful when it tries to apologize later if I point it out, or to pre-doubt itself in CoT in 7 paragraphs and then do it wrong anyway). I like to imagine it as if it was trained on all code from GH PRs—both before and after the bug fix… or as if it was bored, so it’s trying to insert drama into a novel about my stupid programming task, where the second chapter will be about heroic AGI firefighting the shit written by previous dumb LLMs...
I don’t use it to write code, or really anything. Rather I find it useful to converse with it. My experience is also that half is wrong and that it makes many dumb mistakes. But doing the conversation is still extremely valuable, because GPT often makes me aware of existing ideas that I don’t know. Also like you say it can get many things right, and then later get them wrong. That getting right part is what’s useful to me. The part where I tell it to write all my code is just not a thing I do. Usually I just have it write snippets, and it seems pretty good at that.
Overall I am like “Look, there are so many useful things that GPT tells me and helps me think about simply by having a conversation”. Then somebody else says “But look, it gets so many things wrong. Even quite basic things.” And I am like “Yes, but the useful things are still useful enough that overall it’s totally worth it.”
One thing I’ve noticed is that current models like Claude 3.5 Sonnet can now generate non-trivial 100-line programs like small games that work in one shot and don’t have any syntax or logical errors. I don’t think that was possible with earlier models like GPT-3.5.
My impression is that they are getting consistently better at coding tasks of a kind that would show up in the curriculum of an undergrad CS class, but much more slowly improving at nonstandard or technical tasks.
Regarding coding in general, I basically only prompt programme these days. I only bother editing the actual code when I notice a persistent bug that the models are unable to fix after multiple iterations.
I don’t know jackshit about web development and have been making progress on a dashboard for alignment research with very little effort. Very easy to build new projects quickly. The difficulty comes when there is a lot of complexity in the code. It’s still valuable to understand how high-level things work and low-level things the model will fail to proactively implement.
While Carl Brown said (a few times) he doesn’t want to do more youtube videos for every new disappointing AI release, so far he seems to be keeping tabs on them in the newsletter just fine—https://internetofbugs.beehiiv.com/
...I am quite confident that if anything actually started to work, he would comment on it, so even if he won’t say much about any future incremental improvements, it might be a good resource to subscribe to for getting better signal—if Carl will get enthusiastic about AI coding assistants, it will be worth paying attention.
How can biochemical interventions be spatially localized, and why is that problem important?
High vs low voltage has very different semantics at different places on a computer chip. In one spot, a high voltage might indicate a number is odd rather than even. In another spot, a high voltage might indicate a number is positive rather than negative. In another spot, it might indicate a jump instruction rather than an add.
Likewise, the same chemical species have very different semantics at different places in the human body. For example, high serotonin concentration along the digestive tract is a signal to digest, whereas high serotonin concentration in various parts of the brain signals… uh… other stuff. Similarly, acetylcholine is used as a neurotransmitter both at neuromuscular junctions and in the brain, and these have different semantics. More generally, IIUC neurotransmitters like dopamine, norepinephrine, or serotonin are released by neurons originating at multiple anatomically distinct little sub-organs in the brain. Each sub-organ projects to different places, and the same neurotransmitter probably has different semantics when different sub-organs project to different targets.
Yet most pharmaceutical interventions target one type of molecule, or one receptor, or what have you, approximately everywhere. Such an intervention is analogous to e.g. attempting to make every float in a computer’s memory positive by flipping the first bit in every block, but then as a side-effect also changing a bunch of jump instructions to add instructions because there was no way to localize the effect to float-containing memory locations.
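To make the analogy concrete, here’s a toy demonstration that the same bit flip means different things depending on what the surrounding bits are taken to represent (my own example, obviously far simpler than the biological case):

```python
# The same physical bit flip has different "semantics" depending on what the
# 32 bits are taken to represent: sign of a float vs. high bit of an integer.
import struct

bits = struct.unpack("<I", struct.pack("<f", 3.14))[0]  # 3.14 as raw 32 bits
flipped = bits ^ (1 << 31)                              # flip the top bit

as_float = struct.unpack("<f", struct.pack("<I", flipped))[0]

print(as_float)            # -3.14...: interpreted as a float, the flip means "negate"
print(flipped - bits)      # 2147483648: interpreted as an int, it means "+2^31"
```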
Thus the question: how can biochemical interventions be localized, especially in general-purpose ways? I’ll throw out some ideas off the top of my head, but I’m interested to hear other peoples’ thoughts as well.
Some Methods
Natural Barriers
The blood-brain barrier springs to mind as one example. If a chemical has different semantics in the brain and outside, and one wishes to target outside the brain, then just use a drug which can’t cross the barrier.
Implant + Slow Transport/Fast Breakdown
One could put an implant in the right spot to release a drug, and then choose a drug which either isn’t transported quickly or breaks down before it can get very far (or both).
Notably, making some random molecule diffuse less quickly seems relatively tractable: one can just attach a bigger molecule to it. And there’s an absolutely enormous space of possibilities for what that bigger molecule could be, so it’s especially likely to be tractable.
Genetic Modification
Cells already need the ability to tell “where they are” in order for us to have anatomically distinct regions at all. So in principle, it should be possible to genetically modify cells to do something different, but gate the change on the cell being in a particular distinct anatomical region, so cells everywhere else do the same thing as before.
For adult genetic modifications, one would probably want to combine this method with something similar to the implant + slow transport/fast breakdown method above. Adult genetic modifications usually don’t hit every cell or even a majority of them, so an ideal use would be modifying some small percentage of cells to release a molecule which influences all the others. Slow diffusion/fast breakdown could then localize that molecule.
What Else?
I’m curious about other methods to localize biochemical interventions in the body, both speculative and already-existing.
It feels like unstructured play makes people better/stronger in a way that structured play doesn’t.
What do I mean? Unstructured play is the sort of stuff I used to do with my best friend in high school:
unscrewing all the cabinet doors in my parents’ house, turning them upside down and/or backwards, then screwing them back on
jumping in and/or out of a (relatively slowly) moving car
making a survey and running it on people at the mall
covering pool noodles with glow-in-the-dark paint, then having pool noodle sword fights with them at night while the paint is still wet, so we can tell who’s winning by who’s glowing more
In contrast, structured play is more like board games or escape rooms or sports. It has fixed rules. (Something like making and running a survey can be structured play or unstructured play or not play at all, depending on the attitude with which one approaches it. Do we treat it as a fun thing whose bounds can be changed at any time?)
I’m not quite sure why it feels like unstructured play makes people better/stronger, and I’d be curious to hear other peoples’ thoughts on the question. I’m going to write some of mine below, but maybe don’t look at them yet if you want to answer the question yourself?
Just streaming thoughts a bit...
Unstructured play encourages people to question the frame, change the environment/rules, treat social constraints as malleable. It helps one to notice degrees of freedom which are usually taken to be fixed.
Because there’s so much more freedom, unstructured play pushes people to notice their small desires moment-to-moment and act on them, rather than suppress them (as is normal most of the time).
Unstructured play offers an environment in which to try stuff one wouldn’t normally try, in a way which feels lower-risk.
… and probably others. But I’m not sure which such factor(s) most account for my gut feeling that unstructured play makes people better/stronger. (Or, to account for the other possibility, maybe the causal arrow goes the other way, i.e. better/stronger people engage more in unstructured play, and my gut feeling is picking up on that.) Which factor is most important for growing better/stronger?
I’m not quite sure why it feels like unstructured play makes people better/stronger
(Written before reading the second part of the OP.)
I don’t really share that feeling[1]. But if I conditioned on that being true and then produced an answer:
Obviously because it trains research taste.
Or, well, the skills in that cluster. If you’re free to invent/modify the rules of the game at any point, then if you’re to have fun, you need to be good at figuring out what rules would improve the experience for you/everyone, and what ideas would detract from it. You’re simultaneously acting as a designer and as a player. And there’s also the element of training your common-sense/world-modeling skills: what games would turn out fun and safe in the real world, and which ones seem fun in your imagination, but would end up boring due to messy realities or result in bodily harm.
By contrast, structured play enforces a paradigm upon you and only asks you to problem-solve within it. It trains domain-specific skills, whereas unstructured play is “interdisciplinary”, in that you can integrate anything in your reach into it.
More broadly: when choosing between different kinds of unstructured play, you’re navigating a very-high-dimensional space of possible games, and (1) that means there’s simply a richer diversity of possible games you can engage in, which means a richer diversity of skills you can learn, (2) getting good at navigating that space is a useful skill in itself. Structured play, on the other hand, presents for choice a discrete set of options pre-computed for you by others.
Unstructured play would also be more taxing on real-time fluid-intelligence problem-solving. Inferring the rules (if they’ve been introduced/changed by someone else), figuring out how to navigate them on the spot, etc.
Which factor is most important for growing better/stronger?
What’s the sense of “growing better/stronger” you’re using here? Fleshing that out might make the answer obvious.
One thing we’ve been working on lately is finding natural latents in real datasets. Looking for natural latents between pairs of variables with only a few values each is relatively easy in practice with the math we have at this point. But that doesn’t turn up much in excel-style datasets, and one wouldn’t particularly expect it to turn up much in such datasets. Intuitively, it seems like more “distributed” latents are more practically relevant for typical excel-style datasets—i.e. latents for which many different observables each yield some weak information.
Here’s one operationalization, which runs into some cute math/numerical algorithm problems for which I have a working solution but not a very satisfying solution. Maybe you enjoy those sorts of problems and will want to tackle them!
Setup and Math
Assume we have (categorical) observable variables $X_1, ..., X_m$ and a latent variable $\Lambda$. We’ll make two assumptions about the form of the distribution:
Assumption 1: $P[X|\Lambda]$ has exponential form with all $X_i$ independent given $\Lambda$. I.e. $P[X|\Lambda] = \prod_i \frac{1}{Z_{i|\Lambda}(\lambda)} e^{\lambda^T f_i(x_i)} = \frac{1}{Z_{|\Lambda}(\lambda)} e^{\lambda^T \sum_i f_i(x_i)}$.
Assumption 2: $P[\Lambda|X]$ is normal. I.e. $P[\Lambda|X] = \frac{1}{Z_{|X}(x)} e^{-\frac{1}{2}\lambda^T S \lambda + \lambda^T (\mu + \sum_i f_i(x_i))}$ for some inverse covariance matrix $S$ and some $\mu$.
(The notation $Z_{|\Lambda}$ just indicates that this is a normalizer for a distribution conditioned on $\Lambda$. There are going to be several normalizers floating around, so we need to distinguish them.) One could handwavily justify these assumptions, but we’ll take them as given for now.
The main thing we want to calculate is then $P[X]$ for various “feature” functions $f$, in order to do model comparison between different feature functions.
Note that $P[X]$ can only depend on $x$ (not $\lambda$), and $P[\Lambda]$ can only depend on $\lambda$ (not $x$), so in general the form above implies
$P[X] = \frac{1}{Z} Z_{|X}(x)$
$P[\Lambda] = \frac{1}{Z} Z_{|\Lambda}(\lambda) e^{-\frac{1}{2}\lambda^T S \lambda + \lambda^T \mu}$
for some normalizer $Z$. Looking back at the earlier distributions, we have
$Z_{|X}(x) = \int_\lambda e^{-\frac{1}{2}\lambda^T S \lambda + \lambda^T (\mu + \sum_i f_i(x_i))} d\lambda$
$= (2\pi)^{k/2} |S|^{-1/2} e^{\frac{1}{2}(\mu + \sum_i f_i(x_i))^T S^{-1} (\mu + \sum_i f_i(x_i))}$
$Z = \sum_x Z_{|X}(x)$
… and then the tricky numerical algorithm problem is to efficiently calculate Z.
Current Strategy
Trick I’m currently using: we can view the sum $\sum_x Z_{|X}(x)$ as taking an expectation of $Z_{|X}(x)$ under a uniform distribution $Q[X]$. Under that uniform distribution, $\sum_i f_i(X_i)$ is a sum of independent random variables, so let’s wave our hands just a little and assume that sum is approximately normal. Then, modulo an easy to calculate constant, our problem is to compute
$E\left[e^{\frac{1}{2}(\mu + \eta)^T S^{-1} (\mu + \eta)}\right]$
where $\eta$ is normal, with mean and variance matching the mean and variance of $\sum_i f_i(X_i)$ under the uniform distribution $Q$ (which is easy to compute).
That expectation is a gaussian integral, so we can compute it exactly, and it simplifies somewhat with a little algebra. Problem is, it doesn’t always converge! If the variance of $\eta$ is greater (along any direction) than $S$, then $e^{\frac{1}{2}(\mu + \eta)^T S^{-1} (\mu + \eta)}$ grows faster than the probability density falls off along that direction, so the integral blows up to infinity. In that case, our assumption of normality probably still works fine in the middle of the distribution, but the value of $Z$ is dominated by the tails.
Currently, my working solution is to set Z to infinity in the tail-dominated region, and then a maximum likelihood search for f values avoids that region. But I don’t have a good way to check how bad the error in the calculation is getting. (I can see that the optimizer isn’t going all the way to the edge of the non-tail-dominated region, so that’s a very good sign.)
It would be a lot nicer to either have an argument that maximum likelihood f values won’t end up in the tail-dominated region, or an efficient method to calculate Z in all cases.
If you want to play with this numerically, I also set $S = I$ and $\mu = 0$, which can be done without loss of generality (except when $S$ is zero or infinite along some direction) by absorbing them into $f$.
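For anyone who does want to play with it, here’s a rough numerical sketch of the strategy above at toy sizes (random made-up features, $S = I$, $\mu = 0$); it computes the Gaussian-approximated $Z$ in closed form and compares against brute-force enumeration:

```python
# Minimal numerical sketch (toy sizes, S = I, mu = 0): approximate
# Z = sum_x Z_{|X}(x) by treating F = sum_i f_i(X_i) as Gaussian under the
# uniform distribution Q, then evaluating E_Q[exp(0.5 * ||F||^2)] in closed form.
from itertools import product
import numpy as np

rng = np.random.default_rng(0)
m_vars, n_vals, k = 6, 3, 2                       # 6 observables, 3 values each, 2-dim latent
f = 0.2 * rng.normal(size=(m_vars, n_vals, k))    # feature functions f_i(x_i)

# Moments of F = sum_i f_i(X_i) under the uniform distribution Q.
mu_F = f.mean(axis=1).sum(axis=0)
cov_F = sum(np.cov(f[i].T, bias=True) for i in range(m_vars))  # X_i independent under Q

# Closed-form E[exp(0.5 F^T F)] for F ~ N(mu_F, cov_F); converges iff cov_F < I.
A = np.eye(k) - cov_F
assert np.all(np.linalg.eigvalsh(A) > 0), "tail-dominated region: approximation invalid"
approx_E = np.linalg.det(A) ** -0.5 * np.exp(0.5 * mu_F @ np.linalg.solve(A, mu_F))
Z_approx = n_vals ** m_vars * (2 * np.pi) ** (k / 2) * approx_E

# Exact Z by brute-force enumeration (feasible only at toy sizes).
Z_exact = 0.0
for x in product(range(n_vals), repeat=m_vars):
    F = sum(f[i, xi] for i, xi in enumerate(x))
    Z_exact += (2 * np.pi) ** (k / 2) * np.exp(0.5 * F @ F)

print(Z_approx, Z_exact)   # should be in the same ballpark for small features
```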
Trick I’m currently using: we can view the sum $\sum_x Z_{|X}(x)$ as taking an expectation of $Z_{|X}(x)$ under a uniform distribution $Q[X]$. Under that uniform distribution, $\sum_i f_i(X_i)$ is a sum of independent random variables, so let’s wave our hands just a little and assume that sum is approximately normal.
Not following this part. Can you elaborate?
Some scattered thoughts:
Regarding convergence, to state the probably obvious: since $P[X_i \mid \Lambda] \propto e^{\lambda^T f_i(x_i)}$, the factor $e^{\lambda^T f_i(x_i)}$ at least has to go to zero as $x$ goes to infinity (so that the normalizer exists in the continuous case).
In my field-theory-brained head, the analysis seems simpler to think about for continuous $x$. So unless we’re married to $x$ being discrete, I’d switch from $\sum_x$ to $\int dx$. Then you can potentially use Gaussian integral and source-term tricks with the dependency on $x$ as well. If you haven’t already, you might want to look at (quantum) field theory textbooks that describe how to calculate expectation values of observables over path integrals. This expression looks extremely like the kind of thing you’d usually want to calculate with Feynman diagrams, except I’m not sure whether the $f_i(x_i)$ have the right form to allow us to power expand in $x_i$ and then shove the non-quadratic $x_i$ terms into source derivatives the way we usually would in perturbative quantum field theory.
If all else fails, you can probably do it numerically, lattice-QFT style, using techniques like hybrid Monte Carlo to sample points in the integral efficiently.[1]
I’m assuming, for simplicity, that each $X_i$ has finitely many values. The sum on $X$ is then a sum over the cartesian product of the values of each $X_i$, which we can rewrite in general as $\sum_X g(X) = \left(\prod_i n_i\right) E_Q[g(X)]$, where $Q$ is the uniform distribution on $X$ and $n_i$ is the number of values of $X_i$. That uniform distribution $Q$ is a product of uniform distributions over each individual $X_i$, i.e. $\text{Uniform}[X] = \prod_i \text{Uniform}[X_i]$, so the $X_i$’s are all independent under $Q$. So, under $Q$, the $f_i(X_i)$’s are all independent.
Did that clarify?
This expression looks extremely like the kind of thing you’d usually want to calculate with Feynman diagrams, except I’m not sure whether the $f_i(x_i)$ have the right form to allow us to power expand in $x_i$ and then shove the non-quadratic $x_i$ terms into source derivatives the way we usually would in perturbative quantum field theory.
Yup, it sure does look similar. One tricky point here is that we’re trying to fit the $f$’s to the data, so if going that route we’d need to pick some parametric form for $f$. We’d want to pick a form which always converges, but also a form general enough that the fitting process doesn’t drive $f$ to the edge of our admissible region.
Yup, it sure does look similar. One tricky point here is that we’re trying to fit the $f$’s to the data, so if going that route we’d need to pick some parametric form for $f$.
Ah. In that case, are you sure you actually need $Z$ to do the model comparisons you want? Do you even really need to work with this specific functional form at all? As opposed to e.g. training a model $p(\lambda \mid X)$ to feed its output into $m$ tiny normalizing flow models which then try to reconstruct the original input data with conditional probability distributions $q_i(x_i \mid \lambda)$?
To sketch out a little more what I mean, $p(\lambda \mid X)$ could e.g. be constructed as a parametrised function[1] which takes in the actual samples $X$ and returns the mean of a Gaussian, which $\lambda$ is then sampled from in turn[2]. The $q_i(x_i \mid \lambda)$ would be constructed using normalising flow networks[3], which take in $\lambda$ as well as uniform distributions over variables $z_i$ that have the same dimensionality as their $x_i$. Since the networks are efficiently invertible, this gives you explicit representations of the conditional probabilities $q_i(x_i \mid \lambda)$, which you can then fit to the actual data using KL-divergence.
You’d get explicit representations for both $P[\lambda \mid X]$ and $P[X \mid \lambda]$ from this.
If the dictionary of possible values of $X$ is small, you can also just use a more conventional ML setup which explicitly outputs probabilities for every possible value of every $x_i$, of course.
That would be pretty reasonable, but it would make the model comparison part even harder. I do need P[X] (and therefore Z) for model comparison; this is the challenge which always comes up for Bayesian model comparison.
Why does it make Bayesian model comparison harder? Wouldn’t you get explicit predicted probabilities for the data X from any two models you train this way? I guess you do need to sample from the Gaussian in λ a few times for each X and pass the result through the flow models, but that shouldn’t be too expensive.
For my interest: for these real-life latents with many different pieces contributing a small amount of information, do you reckon Eisenstat’s Condensation / some unpublished work you mentioned at ODYSSEY would be the right framework here?
Sort of. Condensation as-written requires what David and I call “strong redundancy”, i.e. the latent must be determinable from any one observable downstream, which is the opposite of “small amount of information from each individual observable”. But it’s pretty easy to bridge between the two mathematically by glomming together multiple observables into one, which is usually how David and I think about it.
The way you’d use this is:
Use the sort of machinery above to find a latent which is weakly loaded on many different observables.
Check how well that latent satisfies redundancy over some subset of the observables.
If we can find disjoint subsets of observables (any disjoint subsets) such that the latent can be determined reasonably well from any one of the subsets, then the machinery of natural latents/condensation kicks in to give us guarantees about universality of the latent.
No kidding? Did you get a sense of why the datasets I picked didn’t really work for the purpose when I gave that a try? Entirely possible that you don’t remember but it was a dataset of candidate exoplanets and an admittedly synthetic clustering tester set.
Haven’t been using that one, but I expect it would have very different results than the dataset we are using. That one would test very different things than we’re currently trying to get feedback on; there’s a lot more near-deterministic known structure in that one IIRC.
I’ve heard various people recently talking about how all the hubbub about artists’ work being used without permission to train AI makes it a good time to get regulations in place about use of data for training.
If you want to have a lot of counterfactual impact there, I think probably the highest-impact set of moves would be:
Figure out a technical solution to robustly tell whether a given image or text was used to train a given NN.
Bring that to the EA folks in DC. A robust technical test like that makes it pretty easy for them to attach a law/regulation to it. Without a technical test, much harder to make an actually-enforceable law/regulation.
In parallel, also open up a class-action lawsuit to directly sue companies using these models. Again, a technical solution to prove which data was actually used in training is the key piece here.
Model/generator behind this: given the active political salience, it probably wouldn’t be too hard to get some kind of regulation implemented. But by-default it would end up being something mostly symbolic, easily circumvented, and/or unenforceable in practice. A robust technical component, plus (crucially) actually bringing that robust technical component to the right lobbyist/regulator, is the main thing which would make a regulation actually do anything in practice.
Edit-to-add: also, the technical solution should ideally be an implementation of some method already published in some academic paper. Then when some lawyer or bureaucrat or whatever asks what it does and how we know it works, you can be like “look at this Official Academic Paper” and they will be like “ah, yes, it does Science, can’t argue with that”.
Suppose I have a binary function $f$, with a million input bits and one output bit. The function is uniformly randomly chosen from all such functions—i.e. for each of the $2^{1000000}$ possible inputs $x$, we flipped a coin to determine the output $f(x)$ for that particular input.
Now, suppose I know f, and I know all but 50 of the input bits—i.e. I know 999950 of the input bits. How much information do I have about the output?
Answer: almost none. For almost all such functions, knowing 999950 input bits gives us $\sim \frac{1}{2^{50}}$ bits of information about the output. More generally, if the function has $n$ input bits and we know all but $k$, then we have $o\left(\frac{1}{2^k}\right)$ bits of information about the output. (That’s “little o” notation; it’s like big O notation, but for things which are small rather than things which are large.) Our information drops off exponentially with the number of unknown bits.
Proof Sketch
With $k$ input bits unknown, there are $2^k$ possible inputs. The output corresponding to each of those inputs is an independent coin flip, so we have $2^k$ independent coin flips. If $m$ of those flips are 1, then we assign a probability of $\frac{m}{2^k}$ that the output will be 1.
As long as $2^k$ is large, the Law of Large Numbers will kick in, and very close to half of those flips will be 1 almost surely—i.e. $m \approx \frac{2^k}{2}$. The error in this approximation will (very quickly) converge to a normal distribution, and our probability that the output will be 1 converges to a normal distribution with mean $\frac{1}{2}$ and standard deviation $\frac{1}{2^{k/2}}$. So, the probability that the output will be 1 is roughly $\frac{1}{2} \pm \frac{1}{2^{k/2}}$.
We can then plug that into Shannon’s entropy formula. Our prior probability that the output bit is 1 is $\frac{1}{2}$, so we’re just interested in how much that $\pm \frac{1}{2^{k/2}}$ adjustment reduces the entropy. This works out to $o\left(\frac{1}{2^k}\right)$ bits.
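If you want to check that scaling empirically, here’s a quick simulation at small $k$ (my own toy version; note that only the $2^k$ truth-table rows indexed by the unknown bits matter, which is itself the reason $n$ drops out):

```python
# Empirical check of the claim at small k: sample random functions, fix the
# known bits, and measure how much information the known bits give about the
# output. It comes out on the order of 1/2^k bits, regardless of n.
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

for k in range(1, 11):
    infos = []
    for _ in range(500):                           # average over random functions
        outputs = rng.integers(0, 2, size=2 ** k)  # coin-flip outputs for the 2^k unknown settings
        p1 = outputs.mean()                        # P(output = 1 | known bits)
        infos.append(1.0 - entropy(p1))            # 1 bit of prior entropy minus posterior entropy
    print(k, round(float(np.mean(infos)), 5), 1 / 2 ** k)   # measured info vs. ~1/2^k
```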
Why Is This Interesting?
One core idea of my work on abstraction is that noise very quickly wipes out almost all information; only some very-low-dimensional summary is relevant “far away”. This example shows that this sort of thing is not unusual, but rather “the default”: for almost all random functions, information drops off exponentially with the number of unknown bits. In a large system (i.e. a function with many inputs), ignorance of even just a few bits is enough to wipe out essentially-all information. That’s true even if we know the vast majority of the bits.
A good intuitive example of this is the “butterfly effect”: the flap of a butterfly’s wings could change the course of a future hurricane, because chaos. But there’s an awful lot of butterflies in the world, and the hurricane’s path is some complicated function of all of their wing-flaps (and many other variables too). If we’re ignorant of even just a handful of these flaps, then almost all of our information about the hurricane’s path is probably wiped out. And in practice, we’re ignorant of almost all the flaps. This actually makes it much easier to perform Bayesian reasoning about the path of the hurricane: the vast majority of information we have is basically-irrelevant; we wouldn’t actually gain anything from accounting for the butterfly-wing-flaps which we do know.
o(1/2^k) doesn’t vary with n—are you saying that it doesn’t matter how big the input array is, the only determinant is the number of unknown bits, and the number of known bits is irrelevant? That would be quite interesting if so (though I have some question about how likely the function is to be truly random from an even distribution of such functions).
One can enumerate all such 3-bit functions (8 different inputs, each input can return 0 or 1, so 256 functions (one per output-bit-pattern of the 8 possible inputs). But this doesn’t seem to follow your formula—if you have 3 unknown bits, that should be 1⁄8 of a bit about the output, 2 for 1⁄4, and 1 unknown for 1⁄2 a bit about the output. But in fact, the distribution of functions includes both 0 and 1 output for every input pattern, so you actually have no predictive power for the output if you have ANY unknown bits.
o(1/2^k) doesn’t vary with n—are you saying that it doesn’t matter how big the input array is, the only determinant is the number of unknown bits, and the number of known bits is irrelevant?
Yes, that’s correct.
But in fact, the distribution of functions includes both 0 and 1 output for every input pattern, so you actually have no predictive power for the output if you have ANY unknown bits.
The claim is for almost all functions when the number of inputs is large. (Actually what we need is for 2^(# of unknown bits) to be large in order for the law of large numbers to kick in.) Even in the case of 3 unknown bits, we have 256 possible functions, and only 18 of those have less than 1⁄4 1’s or more than 3⁄4 1’s among their output bits.
I’m not sure what context that link is assuming, but in an analysis context I typically see little o used in ways like e.g. “$f(x) = f(x_0) + \frac{df}{dx}\big|_{x_0} dx + o(dx^2)$”. The interpretation is that, as $dx$ goes to 0, the $o(dx^2)$ terms all fall to zero at least quadratically (i.e. there is some $C$ such that $C\,dx^2$ upper bounds the $o(dx^2)$ term once $dx$ is sufficiently small). Usually I see engineers and physicists using this sort of notation when taking linear or quadratic approximations, e.g. for designing numerical algorithms.
I find it very helpful to get feedback on LW posts before I publish them, but it adds a lot of delay to the process. So, experiment: here’s a link to a google doc with a post I plan to put up tomorrow. If anyone wants to give editorial feedback, that would be much appreciated—comments on the doc are open.
I’m mainly looking for comments on which things are confusing, parts which feel incomplete or slow or repetitive, and other writing-related things; substantive comments on the content should go on the actual post once it’s up.
EDIT: it’s up. Thank you to Stephen for comments; the post is better as a result.
Here’s a place where I feel like my models of romantic relationships are missing something, and I’d be interested to hear peoples’ takes on what it might be.
Background claim: a majority of long-term monogamous, hetero relationships are sexually unsatisfying for the man after a decade or so. Evidence: Aella’s data here and here are the most legible sources I have on hand; they tell a pretty clear story where sexual satisfaction is basically binary, and a bit more than half of men are unsatisfied in relationships of 10 years (and it keeps getting worse from there). This also fits with my general models of mating markets: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don’t find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.
What doesn’t make sense under my current models is why so many of these relationships persist. Why don’t the men in question just leave? Obviously they might not have better relationship prospects, but they could just not have any relationship. The central question which my models don’t have a compelling answer to is: what is making these relationships net positive value for the men, relative to not having a romantic relationship at all?
Some obvious candidate answers:
Kids. This one makes sense for those raising kids, but what about everyone else? Especially as fertility goes down.
The wide tail. There are plenty of cases which make sense but are individually unusual—e.g. my own parents are business partners. Maybe in aggregate all these unusual cases account for the bulk.
Loneliness. Maybe most of these guys have no one else close in their life. In this case, they’d plausibly be better off if they took the effort they invested in their romantic life and redirected to friendships (probably mostly with other guys), but there’s a lot of activation energy blocking that change.
Their romantic partner offering lots of value in other ways. I’m skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it’s hard for that to add up enough to outweigh the usual costs.
Wanting a dependent. Lots of men are pretty insecure, and having a dependent to provide for makes them feel better about themselves. This also flips the previous objection: high maintenance can be a plus if it makes a guy feel wanted/useful/valuable.
Social pressure/commitment/etc making the man stick around even though the relationship is not net positive for him.
The couple are de-facto close mostly-platonic friends, and the man wants to keep that friendship.
I’m interested in both actual data and anecdata. What am I missing here? What available evidence points strongly to some of these over others?
Edit-to-add: apparently lots of people are disagreeing with this, but I don’t know what specifically you all are disagreeing with; it would be much more helpful to at least highlight a specific sentence or leave a comment or something.
Ah, I think this just reads like you don’t think of romantic relationships as having any value proposition beyond the sexual, other than those you listed (which are Things but not The Thing, where The Thing is some weird discursive milieu). Also the tone you used for describing the other Things is as though they are traps that convince one, incorrectly, to ‘settle’, rather than things that could actually plausibly outweigh sexual satisfaction.
Different people place different weight on sexual satisfaction (for a lot of different reasons, including age).
I’m mostly just trying to explain all the disagree votes. I think you’ll get the most satisfying answer to your actual question by having a long chat with one of your asexual friends (as something like a control group, since the value of sex to them is always 0 anyway, so whatever their cause is for having romantic relationships is probably the kind of thing that you’re looking for here).
I think you’ll get the most satisfying answer to your actual question by having a long chat with one of your asexual friends (as something like a control group, since the value of sex to them is always 0 anyway, so whatever their cause is for having romantic relationships is probably the kind of thing that you’re looking for here).
There are a lot of replies here, so I’m not sure whether someone already mentioned this, but: I have heard anecdotally that homosexual men often have relationships which maintain the level of sex over the long term, while homosexual women often have long-term relationships which very gradually decline in frequency of sex, with barely any sex after many decades have passed (but still happily in a relationship).
This mainly argues against your model here:
This also fits with my general models of mating markets: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don’t find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.
It suggests instead that female sex drive naturally falls off in long-term relationships in a way that male sex drive doesn’t, with sexual attraction to a partner being a smaller factor.
Note: You can verify this is the case by filtering for male respondents with male partners and female respondents with female partners in the survey data
“I’m skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor.”
Some people enjoy attending to their partner and find meaning in emotional labor. Housing’s a lot more expensive than gifts and dates. My partner and I go 50⁄50 on expenses and chores. Some people like having long-term relationships with emotional depth. You might want to try exploring out of your bubble, especially if you live in SF, and see what some normal people (i.e. non-rationalists) in long term relationships have to say about it.
men are the ones who die sooner if divorced, which suggests
Causality dubious, seems much more likely on priors that men who divorced are disproportionately those with Shit Going On in their lives. That said, it is pretty plausible on priors that they’re getting a lot out of marriage.
I will also note that Aella’s relationships data is public, and has the following questions:
1. Your age? (rkkox57)
2. Which category fits you best? (4790ydl)
3. In a world where your partner was fully aware and deeply okay with it, how much would you be interested in having sexual/romantic experiences with people besides your partner? (ao3mcdk)
4. In a world where you were fully aware and deeply okay with it, how much would *your partner* be interested in having sexual/romantic experiences with people besides you? (wcq3vrx)
5. To get a little more specific, how long have you been in a relationship with this person? (wqx272y)
6. Which category fits your partner best? (u9jccbo)
7. Are you married to your partner? (pfqs9ad)
8. Do you have children with your partner? (qgjf1nu)
9. Have you or your partner ever cheated on each other? (hhf9b8h)
10. On average, over the last six months, about how often do you watch porn or consume erotic content for the purposes of arousal? (vnw3xxz)
11. How often do you and your partner have a fight? (x6jw4sp)
12. "It’s hard to imagine being happy without this relationship." (6u0bje)
13. "I have no secrets from my partner" (bgassjt)
14. "If my partner and I ever split up, it would be a logistical nightmare (e.g., separating house, friends)" (e1claef)
15. "If my relationship ended I would be absolutely devastated" (2ytl03s)
16. "I don't really worry about other attractive people gaining too much of my partner's affection" (61m55wv)
17. "I sometimes worry that my partner will leave me for someone better" (xkjzgym)
18. "My relationship is playful" (w2uykq1)
19. "My partner an I are politically aligned" (12ycrs5)
20. "We have compatible humor" (o9empfe)
21. "The long-term routines and structure of my life are intertwined with my partner's" (li0toxk)
22. "The passion in this relationship is deeply intense" (gwzrhth)
23. "I share the same hobbies with my partner" (89hl8ys)
24. "My relationship causes me grief or sorrow" (rm0dtr6)
25. "If we broke up, I think I could date a higher quality person than they could" (vh27ywp)
26. "In hindsight, getting into this relationship was a bad idea" (1y6wfih)
27. "I feel like I would still be a desirable mate even if my partner left me" (qboob7y)
28. "My partner and I are sexually compatible" (9nxbebp)
29. "I often feel jealousy in my relationship" (kfcicm9)
30. "I think this relationship will last for a very long time" (ob8595u)
31. "My partner enables me to learn and grow" (e2oy448)
32. "My partner doesn't excite me" (6fcm06c)
33. "My partner doesn't sexually fulfill me" (xxf5wfc)
34. "I rely on my partner for a sense of self worth" (j0nv7n9)
35. "My partner and I handle fights well" (brtsa94)
36. "I feel confident in my relationship's ability to withstand everything life has to throw at us" (p81ekto)
37. "I sometimes fear my partner" (a21v31h)
38. "I try to stay aware of my partner's potential infidelity" (5qbgizc)
39. "I share my thoughts and opinions with my partner" (6lwugp9)
40. "This relationship is good for me" (wko8n8m)
41. "My partner takes priority over everything else in my life" (2sslsr1)
42. "We respect each other" (c39vvrk)
43. "My partner is more concerned with being right than with getting along" (rlkw670)
44. "I am more needy than my partner" (f3or362)
45. "I feel emotionally safe with my partner" (or9gg0a)
46. "I'm satisfied with our sex life" (6g14ks)
47. "My partner physically desires me" (kh7ppyp)
48. "My partner and I feel comfortable explicitly discussing our relationship on a meta level" (jrzzb06)
49. "My partner knows all my sexual fantasies" (s3cgjd2)
50. "My partner and I are intellectually matched" (ku1vm67)
51. "I am careful to maintain a personal identity separate from my partner" (u5esujt)
52. "I'm worried I'm not good enough for my partner" (45rohqq)
53. "My partner judges me" (fr4mr4a)
54. Did you answer this survey honestly/for a real partner? (7bfie2v)
55. On average, over the last six months, about how often do you and your partner have sex? (n1iblql)
56. Is the partner you just answered for, your longest romantic relationship? (zjfk3cu)
which should allow you to test a lot of your candidate answers. For example, your first 3 hypotheses could be checked by looking at these (a rough filtering sketch follows the list):
Do you have children with your partner? (qgjf1nu)
“If my partner and I ever split up, it would be a logistical nightmare (e.g., separating house, friends) (e1claef) or 21. “The long-term routines and structure of my life are intertwined with my partner’s” (li0toxk)
“I feel like I would still be a desirable mate even if my partner left me” (qboob7y)
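Not something I’ve run, but here’s the kind of pandas sketch that filtering would involve, assuming the survey export is a CSV whose columns are named by the question IDs above; the file name and answer encodings are placeholders that would need to be checked against the actual data:

```python
import pandas as pd

# Placeholder file name for the public survey export; I'm assuming the columns
# are keyed by the question IDs quoted above, and that the answer encodings
# would need to be checked against the survey's codebook.
df = pd.read_csv("aella_relationship_survey.csv")

# Hypothesis 1 (kids): cross-tab sex frequency (n1iblql) against having children.
print(pd.crosstab(df["qgjf1nu"], df["n1iblql"], normalize="index"))

# Hypotheses 2-3 (logistical entanglement, perceived desirability if single):
# compare mean agreement scores across sex-frequency buckets.
print(df.groupby("n1iblql")[["e1claef", "li0toxk", "qboob7y"]].mean())
```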
I see two explanations: the boring wholesome one and the interesting cynical one.
The wholesome one is: You’re underestimating how much other value the partner offers and how much the men care about the mostly-platonic friendship. I think that’s definitely a factor that explains some of the effect, though I don’t know how much.
The cynical one is: It’s part of the template. Men feel that they are “supposed to” have wives past a certain point in their lives; that it’s the role they’re supposed to act out. Perhaps they even feel that they are “supposed to” have wives they hate; see the cliché boomer jokes.
They don’t deviate from this template, because:
It’s just something that is largely Not Done. Plans such as “I shouldn’t get married” or “I should get a divorce” aren’t part of the hypothesis space they seriously consider.
In the Fristonian humans-are-prediction-error-minimizers frame: being married is what the person expects, so their cognition ends up pointed towards completing the pattern, one way or another. As a (controversial) comparison, we can consider serial abuse victims, who seem to somehow self-select for abusive partners despite doing everything in their conscious power to avoid them.
In your parlance: The “get married” life plan becomes the optimization target, rather than a prediction regarding what a satisfying life will look like.
More generally: Most humans most of the time are not goal-optimizers, but adaptation-executors (or perhaps homeostatic agents). So “but X isn’t conducive to making this human happier” isn’t necessarily a strong reason to expect the human not to do X.
Deviation has social costs/punishments. Being viewed as a loser, not being viewed as a reliable “family man”, etc. More subtly: this would lead to social alienation, inability to relate. Consider the cliché “I hate my wife” boomer jokes again. If everyone in your friend group is married and makes these jokes all the time, and you aren’t, that would be pretty ostracizing.
Deviation has psychological costs. Human identities (in the sense of “characters you play”) are often contextually defined. If someone spent ten years defining themselves in relation to their partner, and viewing their place in the world as part of a family unit, exiting the family unit would be fairly close to an identity death/life losing meaning. At the very least, they’d spend a fair bit of time adrift and unsure who they are/how to relate to the world anew – which means there are friction costs/usual problems with escaping a local optimum.
Not-deviation has psychological benefits. The feeling of “correctness”, coming to enjoy the emotional labor, enjoying having a dependent, etc.
I don’t know which of the two explains more of the effect. I’m somewhat suspicious of the interesting satisfyingly cynical one, simply because it’s satisfyingly cynical and this is a subject for which people often invent various satisfyingly cynical ideas. It checks out to me at the object level, but it doesn’t have to be the “real” explanation. (E. g., the “wholesome” reasons may be significant enough that most of the men wouldn’t divorce even if the template dynamics were magically removed.)
Their romantic partner offering lots of value in other ways. I’m skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it’s hard for that to add up enough to outweigh the usual costs.
Assuming arguendo this is true: if you care primarily about sex, hiring sex workers is orders of magnitude more efficient than marriage. Therefore the existence of a given marriage is evidence both sides get something out of it besides sex.
If both partners have an income, then living together is usually cheaper than each of them living alone, and sex is just a bonus to that. How would sex workers be the cheaper alternative?
Making no claim about the actual value of each, but can’t I counter your specific argument by saying: marriage is a socially enforced cartel for sex, and if they could do so without being punished, a lot more men would rather get sex without getting married?
Their romantic partner offering lots of value in other ways. I’m skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it’s hard for that to add up enough to outweigh the usual costs.
Imagine a woman in a romantic relationship with somebody else. Are they still so great a person that you would still enjoy hanging out with them as a friend? If not, that woman should not be your girlfriend. Friendship first. At least in my model, romantic stuff should be stacked on top of platonic love.
I guess I feel kind of confused by the framing of the question. I don’t have a model under which the sexual aspect of a long-term relationship typically makes up the bulk of its value to the participants. So, if a long-term relationship isn’t doing well on that front, and yet both participants keep pursuing the relationship, my first guess would be that it’s due to the value of everything that is not that. I wouldn’t particularly expect any one thing to stick out here. Maybe they have a thing where they cuddle and watch the sunrise together while they talk about their problems. Maybe they have a shared passion for arthouse films. Maybe they have so much history and such a mutually integrated life with partitioned responsibilities that learning to live alone again would be a massive labour investment, practically and emotionally. Maybe they admire each other. Probably there’s a mixture of many things like that going on. Love can be fed by many little sources.
So, this I suppose:
Their romantic partner offering lots of value in other ways. I’m skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it’s hard for that to add up enough to outweigh the usual costs.
I don’t find it hard at all to see how that’d add up to something that vastly outweighs the costs, and this would be my starting guess for what’s mainly going on in most long-term relationships of this type.
Update 3 days later: apparently most people disagree strongly with
Their romantic partner offering lots of value in other ways. I’m skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it’s hard for that to add up enough to outweigh the usual costs.
Most people in the comments so far emphasize some kind of mysterious “relationship stuff” as upside, but my actual main update here is that most commenters probably think the typical costs are far far lower than I imagined? Unsure, maybe the “relationship stuff” is really ridiculously high value.
So I guess it’s time to get more concrete about the costs I had in mind:
A quick google search says the male is primary or exclusive breadwinner in a majority of married couples. Ass-pull number: the monetary costs alone are probably ~50% higher living costs. (Not a factor of two higher, because the living costs of two people living together are much less than double the living costs of one person. Also I’m generally considering the no-kids case here; I don’t feel as confused about couples with kids.)
I was picturing an anxious attachment style as the typical female case (without kids). That’s unpleasant on a day-to-day basis to begin with, and I expect a lack of sex tends to make it a lot worse.
Eyeballing Aella’s relationship survey data, a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more. That was somewhat-but-not-dramatically less than I previously pictured. Frequent fighting is very prototypically the sort of thing I would expect to wipe out more-than-all of the value of a relationship, and I expect it to be disproportionately bad in relationships with little sex.
Less legibly… conventional wisdom sure sounds like most married men find their wife net-stressful and unpleasant to be around a substantial portion of the time, especially in the unpleasant part of the hormonal cycle, and especially especially if they’re not having much sex. For instance, there’s a classic joke about a store salesman upselling a guy a truck, after upselling him a boat, after upselling him a tackle box, after [...] and the punchline is “No, he wasn’t looking for a fishing rod. He came in looking for tampons, and I told him ‘dude, your weekend is shot, you should go fishing!’”.
(One thing to emphasize in these: sex isn’t just a major value prop in its own right, I also expect that lots of the main costs of a relationship from the man’s perspective are mitigated a lot by sex. Like, the sex makes the female partner behave less unpleasantly for a while.)
So, next question for people who had useful responses (especially @Lucius Bushnaq and @yams): do you think the mysterious relationship stuff outweighs those kinds of costs easily in the typical case, or do you imagine the costs in the typical case are not all that high?
A quick google search says the male is primary or exclusive breadwinner in a majority of married couples. Ass-pull number: the monetary costs alone are probably ~50% higher living costs. (Not a factor of two higher, because the living costs of two people living together are much less than double the living costs of one person. Also I’m generally considering the no-kids case here; I don’t feel as confused about couples with kids.)
But remember that you already conditioned on ‘married couples without kids’. My guess would be that in the subset of man-woman married couples without kids, the man being the exclusive breadwinner is a lot less common than in the set of all man-woman married couples. These properties seem like they’d be heavily anti-correlated.
In the subset of man-woman married couples without kids that get along, I wouldn’t be surprised if having a partner effectively works out to more money for both participants, because you’ve got two incomes, but less than 2x living expenses.
I was picturing an anxious attachment style as the typical female case (without kids). That’s unpleasant on a day-to-day basis to begin with, and I expect a lack of sex tends to make it a lot worse.
I am … not … picturing that as the typical case? Uh, I don’t know what to say here really. That’s just not an image that comes to mind for me when I picture ‘older hetero married couple’. Plausibly I don’t know enough normal people to have a good sense of what normal marriages are like.
Eyeballing Aella’s relationship survey data, a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more. That was somewhat-but-not-dramatically less than I previously pictured. Frequent fighting is very prototypically the sort of thing I would expect to wipe out more-than-all of the value of a relationship, and I expect it to be disproportionately bad in relationships with little sex.
I think for many of those couples that fight multiple times a month, the alternative isn’t separating and finding other, happier relationships where there are never any fights. The typical case I picture there is that the relationship has some fights because both participants aren’t that great at communicating or understanding emotions, their own or other people’s. If they separated and found new relationships, they’d get into fights in those relationships as well.
It seems to me that lots of humans are just very prone to getting into fights. With their partners, their families, their roommates etc., to the point that they have accepted having lots of fights as a basic fact of life. I don’t think the correct takeaway from that is ‘Most humans would be happier if they avoided having close relationships with other humans.’
Less legibly… conventional wisdom sure sounds like most married men find their wife net-stressful and unpleasant to be around a substantial portion of the time, especially in the unpleasant part of the hormonal cycle, and especially especially if they’re not having much sex. For instance, there’s a classic joke about a store salesman upselling a guy a truck, after upselling him a boat, after upselling him a tackle box, after [...] and the punchline is “No, he wasn’t looking for a fishing rod. He came in looking for tampons, and I told him ‘dude, your weekend is shot, you should go fishing!’”.
Conventional wisdom also has it that married people often love each other so much they would literally die for their partner. I think ‘conventional wisdom’ is just a very big tent that has room for everything under the sun. If even 5-10% of married couples have bad relationships where the partners actively dislike each other, that’d be many millions of people in the English speaking population alone. To me, that seems like more than enough people to generate a subset of well-known conventional wisdoms talking about how awful long-term relationships are.
Case in point, I feel like I hear those particular conventional wisdoms less commonly these days in the Western world. My guess is this is because long-term heterosexual marriage is no longer culturally mandatory, so there’s less unhappy couples around generating conventional wisdoms about their plight.
So, next question for people who had useful responses (especially @Lucius Bushnaq and @yams): do you think the mysterious relationship stuff outweighs those kinds of costs easily in the typical case, or do you imagine the costs in the typical case are not all that high?
So, in summary, both I think? I feel like the ‘typical’ picture of a hetero marriage you sketch is more like my picture of an ‘unusually terrible’ marriage. You condition on a bad sexual relationship and no children and the woman doesn’t earn money and the man doesn’t even like her, romantically or platonically. That subset of marriages sure sounds like it’d have a high chance of the man just walking away, barring countervailing cultural pressures. But I don’t think most marriages where the sex isn’t great are like that.
This comment gave me the information I’m looking for, so I don’t want to keep dragging people through it. Please don’t feel obligated to reply further!
That said, I did quickly look up some data on this bit:
But remember that you already conditioned on ‘married couples without kids’. My guess would be that in the subset of man-woman married couples without kids, the man being the exclusive breadwinner is a lot less common than in the set of all man-woman married couples.
… so I figured I’d drop it in the thread.
When interpreting these numbers, bear in mind that many couples with no kids probably intend to have kids in the not-too-distant future, so the discrepancy shown between “no children” and 1+ children is probably somewhat smaller than the underlying discrepancy of interest (which pushes marginally more in favor of Lucius’ guess).
Not sure how much this generalizes to everyone, but part of the story (for either the behavior or the pattern of responses to the question) might be that some people are ideologically attached to believing in love: that women and men need each other as a terminal value, rather than just instrumentally using each other for resources or sex. For myself, without having any particular empirical evidence or logical counterargument to offer, the entire premise of the question just feels sad and gross. It’s like you’re telling me you don’t understand why people try to make ghosts happy. But I want ghosts to be happy.
Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much? In particular, if the answer is in fact “most men would be happier single but are ideologically attached to believing in love”, then I want to be able to update accordingly. And if the answer is not that, then I want to update that most men would not be happier single. With the current discussion, most of what I’ve learned is that lots of people are triggered by the question, but that doesn’t really tell me much about the underlying reality.
Track record: My own cynical take seems to be doing better with regards to not triggering people (though it’s admittedly less visible).
Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much?
First off, I’m kind of confused about how you didn’t see this coming. There seems to be a major “missing mood” going on in your posts on the topic – and I speak as someone who is sorta-aromantic, considers the upsides of any potential romantic relationship to have a fairly low upper bound for himself[1], and is very much willing to entertain the idea that a typical romantic relationship is a net-negative dumpster fire.
So, obvious-to-me advice: Keep a mental model of what topics are likely very sensitive and liable to trigger people, and put in tons of caveats and “yes, I know, this is very cynical, but it’s my current understanding” and “I could totally be fundamentally mistaken here”.
In particular, a generalization of a piece of advice from here has been living in my head rent-free for years (edited/adapted):
Tips For Talking About Your Beliefs On Sensitive Topics
You want to make it clear that they’re just your current beliefs about the objective reality, and you don’t necessarily like that reality so they’re not statements about how the world ought to be, and also they’re not necessarily objectively correct and certainly aren’t all-encompassing so you’re not condemning people who have different beliefs or experiences. If you just say, “I don’t understand why people do X,” everyone will hear you as saying that everyone who does X is an untermensch who should be gutted and speared because in high-simulacrum-level environments disagreeing with people is viewed as a hostile act attempting to lower competing coalitions’ status, and failing to furiously oppose such acts will get you depowered and killed. So be sure to be extra careful by saying something like, “It is my current belief, and I mean with respect to my own beliefs about the objective reality, that a typical romantic relationship seems flawed in lots of ways, but I stress, and this is very important, that if you feel or believe differently, then that too is a valid and potentially more accurate set of beliefs, and we don’t have to OH GOD NOT THE SPEARS ARRRGHHHH!”
More concretely, here’s how I would have phrased your initial post:
Rewrite
Here’s a place where my model of the typical traditional romantic relationships seems to be missing something. I’d be interested to hear people’s takes on what it might be.
Disclaimer: I’m trying to understand the general/stereotypical case here, i. e., what often ends up happening in practice. I’m not claiming that this is what relationships ought to be like, nor that all existing relationships are like this. But on my model, most people are deeply flawed, they tend to form deeply flawed relationships, and I’d like to understand why these relationships still work out. Bottom line is, this is going to be a fairly cynical/pessimistic take (with the validity of its cynicism being something I’m willing to question).
Background claims:
My model of the stereotypical/traditional long-term monogamous hetero relationship has a lot of downsides for men. For example:
Financial costs: Up to 50% higher living costs (since in the “traditional” template, men are the breadwinners.)
Frequent, likely highly stressful, arguments. See Aella’s relationship survey data: a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more.
General need to manage/account for the partner’s emotional issues. (My current model of the “traditional” relationship assumes the anxious attachment style for the woman, which would be unpleasant to manage.)
For hetero men, consistent sexual satisfaction is a major upside offered by a relationship, providing a large fraction of the relationship-value.
A majority of traditional relationships are sexually unsatisfying for the man after a decade or so. Evidence: Aella’s data here and here are the most legible sources I have on hand; they tell a pretty clear story where sexual satisfaction is basically binary, and a bit more than half of men are unsatisfied in relationships of 10 years (and it keeps getting worse from there). This also fits with my general models of dating: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don’t find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.
Taking a purely utilitarian lens, for a relationship to persist, the benefits offered by it should outweigh its costs. However, on my current model, that shouldn’t be the case for the average man. I expect the stated downsides to be quite costly, and if we remove consistent sex from the equation, the remaining value (again, for a stereotypical man) seems comparatively small.
So: Why do these relationships persist? Obviously the men might not have better relationship prospects, but they could just not have any relationship. The central question which my models don’t have a compelling answer to is: what is making these relationships net positive value for the men, relative to not having a romantic relationship at all?
Some obvious candidate answers:
The cultural stereotypes diverge from reality in some key ways, so my model is fundamentally mistaken. E. g.:
I’m overestimating the downsides: the arguments aren’t that frequent/aren’t very stressful, female partners aren’t actually “high-maintenance”, etc.
I’m overestimating the value of sex for a typical man.
I’m underestimating how much other value relationships offer men. If so: what is that “other value”, concretely? (Note that it’d need to add up to quite a lot to outweigh the emotional and financial costs, under my current model.)
Kids. This one makes sense for those raising kids, but what about everyone else? Especially as fertility goes down.
The wide tail. There’s plenty of cases which make sense which are individually unusual—e.g. my own parents are business partners. Maybe in aggregate all these unusual cases account for the bulk.
Loneliness. Maybe most of these guys have no one else close in their life. In this case, they’d plausibly be better off if they took the effort they invested in their romantic life and redirected it to friendships (probably mostly with other guys), but there’s a lot of activation energy blocking that change.
Wanting a dependent. Lots of men are pretty insecure, and having a dependent to provide for makes them feel better about themselves. This also flips the previous objection: high maintenance can be a plus if it makes a guy feel wanted/useful/valuable.
Social pressure/commitment/etc making the man stick around even though the relationship is not net positive for him.
The couple are de-facto close mostly-platonic friends, and the man wants to keep that friendship.
I’m interested in both actual data and anecdata. What am I missing here? What available evidence points strongly to some of these over others?
Obvious way to A/B test this would be to find some group of rationalist-y people who aren’t reading LW/your shortform, post my version there, and see the reactions. Not sure what that place would be. (EA forum? r/rational’s Friday Open Threads? r/slatestarcodex? Some Discord/Substack group?)
Adapting it for non-rationalist-y audiences (e. g., r/AskMen) would require more rewriting. Mainly, coating the utilitarian language in more, ahem, normie terms.
Given the choice between the best possible romantic relationship and $1m, I’d pick $1m. Absent munchkinry like “my ideal girlfriend is a genius alignment researcher on the level of von Neumann and Einstein”.
what is making these relationships net positive value for the men, relative to not having a romantic relationship at all?
I think it’s net negative. Seen it with any combination of genders. The person who’s less happy in the relationship stays due to force of habit, fear of the unknown, and the other person giving them a precise minimum of “crumbs” to make them stay. Even a good relationship can fall into this pattern slowly, with the other person believing all along that everything is fine. And when it finally breaks (often due to some random event breaking the suspension of disbelief), the formerly unhappy person is surprised how much better things become.
An effect I noticed: Going through Aella’s correlation matrix (with poorly labeled columns, sadly), a feature which strongly correlates with the length of a relationship is codependency. Plotting question 20, "The long-term routines and structure of my life are intertwined with my partner's" (li0toxk), assuming that’s what “codependency” refers to:
The shaded region is a 95% posterior estimate for the mean of the distribution conditioned on the time-range (every 2 years) and cis-male respondents, with prior N(0,0.5).
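For what it’s worth, here’s a minimal sketch of one way such a posterior band could be computed: a conjugate normal-normal update for the mean, treating the sample variance as known. Whether the stated N(0, 0.5) prior uses 0.5 as a variance or a standard deviation is my guess, so treat this as illustrative rather than a reconstruction of the actual plot:

```python
import numpy as np
from scipy import stats

def posterior_mean_interval(x, prior_mean=0.0, prior_var=0.5, cred=0.95):
    """Conjugate normal-normal update for the mean of responses x,
    treating the sample variance as the (known) observation variance."""
    x = np.asarray(x, dtype=float)
    n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)
    post_var = 1.0 / (1.0 / prior_var + n / s2)
    post_mean = post_var * (prior_mean / prior_var + n * xbar / s2)
    half_width = stats.norm.ppf(0.5 + cred / 2) * np.sqrt(post_var)
    return post_mean, (post_mean - half_width, post_mean + half_width)

# e.g. one 2-year relationship-length bucket of cis-male responses
# (fake data here, just to show the call):
bucket = np.random.default_rng(0).normal(0.3, 1.0, size=80)
print(posterior_mean_interval(bucket))
```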
Note also that codependency and sex satisfaction are basically uncorrelated
This shouldn’t be that surprising. Of course the longer two people are together the more their long term routines will be caught up with each other. But also this seems like a very reasonable candidate for why people will stick together even without a good sex life.
a majority of long-term monogamous, hetero relationships are sexually unsatisfying for the man after a decade or so.
This seems supported by the popular wisdom. Question is, how much of this is about relationships and sex specifically, and how much is just another instance of a more general “life is full of various frustrations” or “when people reach their goals, after some time they become unsatisfied again”, i.e. the hedonic treadmill.
sexual satisfaction is basically binary
Is it?
most women eventually settle on a guy they don’t find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.
So, basically those women pretend to be more attracted than they are (to their partner, and probably also to themselves) in order to get married. Then they gradually stop pretending.
But why is it so important to get married (or whatever was the goal of the original pretending), but then it is no longer important to keep the marriage happy? Is that because women get whatever they want even from an unhappy marriage, and divorces are unlikely? That doesn’t feel like a sufficient explanation to me: divorces are quite frequent, and often initiated by women.
I guess I am not sure what exactly is the women’s utility function that this model assumes.
Why don’t the men in question just leave?
Kids, not wanting to lose money in divorce, other value the partner provides, general lack of agency, hoping that the situation will magically improve… probably all of that together.
Also, it seems to me that often both partners lose value on the dating market when they start taking their relationship for granted, stop trying hard, gain weight, stop doing interesting things, and generally get older. Even if the guy is frustrated, that doesn’t automatically mean that entering the dating market again would make him happy. I imagine that many divorced men find out that an alternative to “sex once a month” could also be “sex never” (or “sex once a month, but it also takes a lot of time and effort and money”).
Worth noting that this pattern occurs among gay couples as well! (i.e. sexless long-term-relationship, where one party is unhappy about this).
I think that conflict in desires/values is inherent in all relationships, and long-term relationships have more room for conflict because they involve a closer/longer relationship. Sex drive is a major area where partners tend to diverge especially frequently (probably just for biological reasons in het couples).
It’s not obvious to me that sex in marriages needs much special explanation beyond the above. Unless of course the confusion is just “why don’t people immediately end all relationships whenever their desires conflict with those of their counterparty”.
A general source of problems is that when people try to get a new partner, they try to be… more appealing than usual, in various ways. Which means that after the partner is secured, the behavior reverts to the norm, which is often a disappointment.
One way how people try to impress their partners is that the one with lower sexual drive pretends to be more enthusiastic about sex than they actually are in long term. So the moment one partner goes “amazing, now I finally have someone who is happy to do X every day or week”, the other partner goes “okay, now that the courtship phase is over, I guess I no longer have to do X every day or week”.
There are also specific excuses in heterosexual couples, like the girl pretending that she is actually super into having sex whenever possible, it’s just that she is too worried about accidental pregnancy or her reputation… and when these things finally get out of the way, it turns out that it was just an excuse.
Perhaps the polyamorous people keep themselves in better shape, but I suspect that they have similar problems, only instead of “my partner no longer wants to do X” it is “my partner no longer wants to do X with me”.
I thought I would give you another causal model based on neuroscience which might help.
I think your models are missing a core biological mechanism: nervous system co-regulation.
Most analyses of relationship value focus on measurable exchanges (sex, childcare, financial support), but overlook how humans are fundamentally regulatory beings. Our nervous systems evolved to stabilize through connection with others.
When you share your life with someone, your biological systems become coupled. This creates several important values:
Your stress response systems synchronize and buffer each other. A partner’s presence literally changes how your body processes stress hormones—creating measurable physiological benefits that affect everything from immune function to sleep quality.
Your capacity to process difficult emotions expands dramatically with someone who consistently shows up for you, even without words.
Your nervous system craves predictability. A long-term partner represents a known regulatory pattern that helps maintain baseline homeostasis—creating a biological “home base” that’s deeply stabilizing.
For many men, especially those with limited other sources of deep co-regulation, these benefits may outweigh sexual dissatisfaction. Consider how many men report feeling “at peace” at home despite minimal sexual connection—their nervous systems are receiving significant regulatory benefits.
This also explains why leaving feels so threatening beyond just practical considerations. Disconnecting an integrated regulatory system that has developed over years registers in our survival-oriented brains as a fundamental threat.
This isn’t to suggest people should stay in unfulfilling relationships—rather, it helps explain why many do, and points to the importance of developing broader regulatory networks before making relationship transitions.
Reading it is weird, because my model is somewhat the opposite—more women initiate divorce than men, more women would gain from initiating it, and more women remain in relationships they should leave.
Women do more of the housework, more of the emotional labor (the point about women requiring emotional work wildly contradicts my model), and more of the maintaining of social ties (there are studies I’ve read about that, and socialization reasons for it: women have more friends and more intimate friends, and a lot of men freeload on their gf’s friendships and have no intimate relationship that is not romantic).
It can be that both are true, and it’s not hard to imagine two deeply incompatible people for whom breaking up would be net-positive for both. But this is not my actual model, nor what the statistics I encountered suggest—for example, that married men live longer, while married women live shorter. In my model, in a standard marriage, the wins-from-trade are distributed unevenly, and a lot of the time the man gains and the woman loses, and all that still holds marriages together is kids and the remains of social stigma. And I know various statistics (about housework, about happiness after the spouse dies, about life expectancy) that do not contradict this model.
I also encountered a lot of anecdata that sounds like (not an actual citation) “I broke up; this bf made my life so much worse” and even (not an actual citation) “I divorced, and despite having to do all the work alone and not having the money he provided, I have more time, because he was so useless housework- and childcare-wise that he net-added work, and it’s much easier without him.”
So, like, models where marriages are net-negative for men look very strange to me, and I don’t know how to reconcile them with so much contradicting data.
personal desire to be worthy of being an example vindicating the hope that good guys can ‘get the girl’; giving up on one means nothing will ever stay and doom is eternal
Any system can be modeled as maximizing some utility function, therefore utility maximization is not a very useful model
Corrigibility is possible, but utility maximization is incompatible with corrigibility, therefore we need some non-utility-maximizer kind of agent to achieve corrigibility
These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.
I expect that many peoples’ intuitive mental models around utility maximization boil down to “boo utility maximizer models”, and they would therefore intuitively expect both the above claims to be true at first glance. But on examination, the probable-incompatibility is fairly obvious, so the two claims might make a useful test to notice when one is relying on yay/boo reasoning about utilities in an incoherent way.
FWIW I endorse the second claim when the utility function depends exclusively on the state of the world in the distant future, whereas I endorse the first claim when the utility function can depend on anything whatsoever (e.g. what actions I’m taking right this second). (details)
I wish we had different terms for those two things. That might help with any alleged yay/boo reasoning.
(When Eliezer talks about utility functions, he seems to assume that it depends exclusively on the state of the world in the distant future.)
Consider a homomorphically encrypted computation running somewhere in the cloud. The computations correspond to running an AGI. Now from the outside, you can still model the AGI based on how it behaves, as an expected utility maximizer, if you have a lot of observational data about the AGI (or at least let’s take this as a reasonable assumption).
No matter how closely you look at the computations, you will not be able to figure out how to change these computations in order to make the AGI aligned if it was not aligned already (Also, let’s assume that you are some sort of Cartesian agent, otherwise you would probably already be dead if you were running these kinds of computations).
So, my claim is not that modeling a system as an expected utility maximizer can’t be useful. Instead, I claim that this model is incomplete, at least with regard to the task of computing an update to the system such that, when we apply it, the system becomes aligned.
Of course, you can model any system as an expected utility maximizer. But being able to use the “high level” conceptual model of expected utility maximization to model the behavior of a system very well is not enough: behavior is not the only thing that we care about. We actually care about being able to understand the internal workings of the system, such that it becomes much easier to think about how to align the system.
So the following seems to be beside the point, unless I am missing or misunderstanding something:
These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.
Maybe I have missed the fact that the claim you listed says that expected utility maximization is not very useful. And I’m saying it can be useful, it might just not be sufficient at all to actually align a particular AGI system. Even if you can do it arbitrarily well.
I am not an expert, but as I remember it, it was a claim that “any system that follows certain axioms can be modeled as maximizing some utility function”. The axioms assumed that there were no circular preferences—if someone prefers A to B, B to C, and C to A, it is impossible to define a utility function such that u(A) > u(B) > u(C) > u(A) -- and that if the system says that A > B > C, it can decide between e.g. a 100% chance of B, and a 50% chance of A with a 50% chance of C, again in a way that is consistent.
I am not sure how this works when the system is allowed to take current time into account, for example when it is allowed to prefer A to B on Monday but prefer B to A on Tuesday. I suppose that in such situation any system can trivially be modeled by a utility function that at each moment assigns utility 1 to what the system actually did in that moment, and utility 0 to everything else.
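To spell out that trivial construction (my notation, just to make the point concrete):

$$u_t(a) = \begin{cases} 1 & \text{if } a = a_t^{\text{actual}} \\ 0 & \text{otherwise} \end{cases}$$

where $a_t^{\text{actual}}$ is whatever the system in fact did at time $t$. The system’s behavior maximizes each $u_t$ by definition, which is exactly why this sense of “can be modeled as a utility maximizer” carries so little content.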
Corrigibility is incompatible with assigning utility to everything in advance. A system that has preferences about future will also have a preference about not having its utility function changed. (For the same reason people have a preference not to be brainwashed, or not to take drugs, even if after brainwashing they are happy about having been brainwashed, and after getting addicted they do want more drugs.)
Corrigible system would be like: “I prefer A to B at this moment, but if humans decide to fix me and make me prefer B to A, then I prefer B to A”. In other words, it doesn’t have values for u(A) and u(B), or it doesn’t always act according to those values. A consistent system that currently prefers A to B would prefer not to be fixed.
A utility function represents preference elicited in a large collection of situations, each a separate choice between events that happens with incomplete information, as an event is not a particular point. This preference needs to be consistent across different situations to be representable by expected utility of a single utility function.
Once formulated, a utility function can be applied to a single choice/situation, such as a choice of a policy. But a system that only ever makes a single choice is not a natural fit for expected utility frame, and that’s the kind of system that usually appears in “any system can be modeled as maximizing some utility function”. So it’s not enough to maximize something once, or in a narrow collection of situations, the situations the system is hypothetically exposed to need to be about as diverse as choices between any pair of events, with some of the events very large, corresponding to unreasonably incomplete information, all drawn across the same probability space.
One place this mismatch of frames happens is with updateless decision theory. An updateless decision is a choice of a single policy, once and for all, so there is no reason for it to be guided by expected utility, even though it could be. The utility function for the updateless choice of policy would then need to be obtained elsewhere, in a setting that has all these situations with separate (rather than all enacting a single policy) and mutually coherent choices under uncertainty. But once an updateless policy is settled (by a policy-level decision), actions implied by it (rather than action-level decisions in expected utility frame) no longer need to be coherent. Not being coherent, they are not representable by an action-level utility function.
So by embracing updatelessness, we lose the setting that would elicit utility if the actions were instead individual mutually coherent decisions. And conversely, by embracing coherence of action-level decisions, we get an implied policy that’s not updatelessly optimal with respect to the very precise outcomes determined by any given whole policy. So an updateless agent founded on expected utility maximization implicitly references a different non-updateless agent whose preference is elicited by making separate action-level decisions under a much greater uncertainty than the policy-level alternatives the updateless agent considers.
I don’t think claim 1 is wrong, but it does clash with claim 2.
That means any system that has to be corrigible cannot be a system that maximizes a simple utility function (1 dimension), or put another way, “whatever utility function it maximizes must be along multiple dimensions”.
Which seems to be pretty much what humans do, we have really complex utility functions, and everything seems to be ever changing and we have some control over it ourselves (and sometimes that goes wrong and people end up maxing out a singular dimension at the cost of everything else).
Note to self: Think more about this and if possible write up something more coherent and explanatory.
One second-order effect of the pandemic which I’ve heard talked about less than I’d expect:
This is the best proxy I found on FRED for new businesses founded in the US, by week. There was a mild upward trend over the last few years, but it’s really taken off lately. Not sure how much of this is kids who would otherwise be in college, people starting side gigs while working from home, people quitting their jobs and starting their own businesses so they can look after the kids, extra slack from stimulus checks, people losing their old jobs en masse but still having enough savings to start a business, …
For the stagnation-hypothesis folks who lament relatively low rates of entrepreneurship today, this should probably be a big deal.
How sure are you that the composition is interesting? How many of these are just quick mask-makers or sanitizer-makers, or just replacing restaurants that have now gone out of business? (ie very low-value-added companies, of the ‘making fast food in a stall in a Third World country’ sort of ‘startup’, which make essentially no or negative long-term contributions).
Good question. I haven’t seen particularly detailed data on these on FRED, but they do have separate series for “high propensity” business applications (businesses they think are likely to hire employees), business applications with planned wages, and business applications from corporations, as well as series for each state. The spike is smaller for planned wages, and nonexistent for corporations, so the new businesses are probably mostly single proprietors or partnerships. Other than that, I don’t know what the breakdown looks like across industries.
How do you feel about this claim now? I haven’t noticed a whole lot of innovation coming from all these small businesses, and a lot of them seem like they were likely just vehicles for the extraordinary extent of fraud as the results from all the investigations & analyses come in.
… so it’s presumably also not just the result of pandemic giveaway fraud, unless that fraud is ongoing.
Presumably the thing to check here would be TFP, but FRED’s US TFP series currently only goes to end of 2019, so apparently we’re still waiting on that one? Either that or I’m looking at the wrong series.
Neat problem of the week: researchers just announced roughly-room-temperature superconductivity at pressures around 270 GPa. That’s stupidly high pressure—a friend tells me “they’re probably breaking a diamond each time they do a measurement”. That said, pressures in single-digit GPa do show up in structural problems occasionally, so achieving hundreds of GPa scalably/cheaply isn’t that many orders of magnitude away from reasonable, it’s just not something that there’s historically been much demand for. This problem plays with one idea for generating such pressures in a mass-produceable way.
Suppose we have three materials in a coaxial wire:
innermost material has a low thermal expansion coefficient and high Young’s modulus (i.e. it’s stiff)
middle material is a thin cylinder of our high-temp superconducting concoction
outermost material has a high thermal expansion coefficient and high Young’s modulus.
We construct the wire at high temperature, then cool it. As the temperature drops, the innermost material stays roughly the same size (since it has low thermal expansion coefficient), while the outermost material shrinks, so the superconducting concoction is squeezed between them.
Exercises (a rough numerical sketch follows the list):
Find an expression for the resulting pressure in the superconducting concoction in terms of the Young’s moduli, expansion coefficients, temperature change, and dimensions of the inner and outer materials. (Assume the width of the superconducting layer is negligible, and the outer layer doesn’t break.)
Look up parameters for some common materials (e.g. steel, tungsten, copper, porcelain, aluminum, silicon carbide, etc), and compute the pressures they could produce with reasonable dimensions (assuming that their material properties don’t change too dramatically with such high pressures).
Find an expression for the internal tension as a function of radial distance in the outermost layer.
Pick one material, look up its tensile strength, and compute how thick it would have to be to serve as the outermost layer without breaking, assuming the superconducting layer is at 270 GPa.
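Not a full solution, but here’s a rough numerical sketch of where the first two exercises point, under extra simplifying assumptions of my own (thin outer sleeve, solid elastic inner cylinder, negligible superconductor thickness, temperature-independent material properties):

```python
# Rough interference-fit estimate: inner solid cylinder (low expansion, stiff)
# inside a thin outer sleeve (high expansion). On cooling by dT, the sleeve
# "wants" to shrink more than the core, so a contact pressure p develops.
# Matching radial displacements (thin-shell hoop strain p*r/(t*E_out) for the
# sleeve, uniform radial compression ~p*(1-nu)/E_in for the core) gives:
#   p = (a_out - a_in) * dT / ( r/(t*E_out) + (1 - nu_in)/E_in )
# This is my own simplified setup, not necessarily the intended derivation.

def squeeze_pressure(a_in, E_in, nu_in, a_out, E_out, r, t, dT):
    """Contact pressure (Pa) from differential thermal contraction."""
    mismatch_strain = (a_out - a_in) * dT
    compliance = r / (t * E_out) + (1 - nu_in) / E_in
    return mismatch_strain / compliance

# Example numbers: tungsten core, steel sleeve, assembled ~1000 K above operating temp.
p = squeeze_pressure(
    a_in=4.5e-6, E_in=411e9, nu_in=0.28,   # tungsten
    a_out=12e-6, E_out=200e9,              # steel
    r=5e-3, t=5e-3, dT=1000,
)
print(f"{p / 1e9:.1f} GPa")  # comes out around 1 GPa, far short of 270 GPa
```

Under these (possibly too crude) assumptions, everyday structural materials land around single-digit GPa, consistent with the framing above: single-digit GPa is attainable, and hundreds of GPa would take a few more orders of magnitude of cleverness.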
My cached thoughts start with a somewhat different question—not “what role does magic play in fantasy fiction?” (e.g. what fantasies does it fulfill), but rather… insofar as magic is a natural category, what does it denote? So I’m less interested in the relatively-expansive notion of “magic” sometimes seen in fiction (which includes e.g. alternate physics), and more interested in the pattern called “magic” which recurs among tons of real-world ancient cultures.
Claim (weakly held): the main natural category here is symbols changing the territory. Normally symbols represent the world, and changing the symbols just makes them not match the world anymore—it doesn’t make the world do something different. But if the symbols are “magic”, then changing the symbols changes the things they represent in the world. Canonical examples:
Wizard/shaman/etc draws magic symbols, speaks magic words, performs magic ritual, or even thinks magic thoughts, thereby causing something to happen in the world.
Messing with a voodoo doll messes with the person it represents.
“Sympathetic” magic, which explicitly uses symbols of things to influence those things.
Magic which turns emotional states into reality.
I would guess that most historical “magic” was of this type.
Everybody’s been talking about Paxlovid, and how ridiculous it is to both stop the trial since it’s so effective but also not approve it immediately. I want to at least float an alternative hypothesis, which I don’t think is very probable at this point, but does strike me as at least plausible (like, 20% probability would be my gut estimate) based on not-very-much investigation.
Early stopping is a pretty standard p-hacking technique. I start out planning to collect 100 data points, but if I manage to get a significant p-value with only 30 data points, then I just stop there. (Indeed, it looks like the Paxlovid study only had 30 actual data points, i.e. people hospitalized.) Rather than only getting “significance” if all 100 data points together are significant, I can declare “significance” if the p-value drops below the line at any time. That gives me a lot more choices in the garden of forking counterfactual paths.
Now, success rates on most clinical trials are not very high. (They vary a lot by area—most areas are about 15-25%. Cancer is far and away the worst, below 4%, and vaccines are the best, over 30%.) So I’d expect that p-hacking is a pretty large chunk of approved drugs, which means pharma companies are heavily selected for things like finding-excuses-to-halt-good-seeming-trials-early.
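To make the concern concrete, here’s a quick simulation sketch (not tied to the actual Paxlovid trial design) of how much naive “peeking” at interim p-values inflates the false-positive rate when there is no real effect:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def false_positive_rate(n_max=100, check_every=10, alpha=0.05, n_sims=5000):
    """Simulate a null effect (treatment == control) and report how often we'd
    declare 'significance' if we check the p-value every `check_every` samples
    per arm and stop as soon as p < alpha."""
    hits = 0
    for _ in range(n_sims):
        treat = rng.normal(0, 1, n_max)
        control = rng.normal(0, 1, n_max)
        for n in range(check_every, n_max + 1, check_every):
            if stats.ttest_ind(treat[:n], control[:n]).pvalue < alpha:
                hits += 1
                break
    return hits / n_sims

print("single look at n=100:", false_positive_rate(check_every=100))  # ~0.05
print("peek every 10:       ", false_positive_rate(check_every=10))   # noticeably higher
```

(A pre-planned interim analysis with proper sequential-testing corrections is specifically designed to avoid this inflation; the simulation is only about the naive version.)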
Early stopping is a pretty standard p-hacking technique.
It was stopped after a pre-planned interim analysis; that means they’re calculating the stopping criteria/p-values with multiple testing correction built in, using sequential analysis.
Here’s an AI-driven external cognitive tool I’d like to see someone build, so I could use it.
This would be a software tool, and the user interface would have two columns. In one column, I write. Could be natural language (like google docs), or code (like a normal IDE), or latex (like overleaf), depending on what use-case the tool-designer wants to focus on. In the other column, a language and/or image model provides local annotations for each block of text. For instance, the LM’s annotations might be:
(Natural language or math use-case:) Explanation or visualization of a mental picture generated by the main text at each paragraph
(Natural language use-case:) Emotional valence at each paragraph
(Natural language or math use-case:) Some potential objections tracked at each paragraph
(Code:) Fermi estimates of runtime and/or memory usage
This is the sort of stuff I need to track mentally in order to write high-quality posts/code/math, so it would potentially be very high value to externalize that cognition.
Also, the same product could potentially be made visible to readers (for the natural language/math use-cases) to make more visible the things the author intends to be mentally tracked. That, in turn, would potentially make it a lot easier for readers to follow e.g. complicated math.
I haven’t experimented very much, but here’s one example prompt.
Please describe what you mentally picture when reading the following block of text:
“ A Shutdown Problem Proposal
First things first: this is not (yet) aimed at solving the whole corrigibility problem, or even the whole shutdown problem.
The main thing this proposal is intended to do is to get past the barriers MIRI found in their old work on the shutdown problem. In particular, in a toy problem basically-identical to the one MIRI used, we want an agent which:
Does not want to manipulate the shutdown button
Does respond to the shutdown button
Does want to make any child-agents it creates responsive-but-not-manipulative to the shutdown button, recursively (i.e. including children-of-children etc)
If I understand correctly, this is roughly the combination of features which MIRI had the most trouble achieving simultaneously. ”
This one produced basically-decent results from GPT-4.
Although I don’t have the exact prompt on hand at the moment, I’ve also asked GPT-4 to annotate a piece of code line-by-line with a Fermi estimate of its runtime, which worked pretty well.
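For concreteness, a minimal sketch of the core loop such a tool might run, using the OpenAI Python client (v1 style); the model name, prompt wording, and paragraph-splitting rule are placeholders rather than recommendations:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ANNOTATION_PROMPT = (
    "Please describe what you mentally picture when reading the following "
    "block of text, in 2-3 sentences:\n\n{block}"
)

def annotate(document: str, model: str = "gpt-4") -> list[tuple[str, str]]:
    """Return (paragraph, annotation) pairs for the tool's second column."""
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    results = []
    for block in paragraphs:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": ANNOTATION_PROMPT.format(block=block)}],
        )
        results.append((block, response.choices[0].message.content))
    return results
```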
Don’t really need comments which are non-obvious to an expert. Part of what makes LLMs well-suited to building external cognitive tools is that external cognitive tools can create value by just tracking “obvious” things, thereby freeing up the user’s attention/working memory for other things.
So kinda like spellcheckers (most typos you could figure out, but why spend time and attention on proofreading if the program can do that for you), but… thought-checkers.
Like, if a part of your article contradicts another part, it would be underlined.
I’ve long wanted this, but it’s not clear how to do it. Long-context LLMs are still expensive and for authors who need it most, context windows are still too small: me or Yudkowsky, for example, would still exceed the context window of almost all LLMs except possibly the newest Gemini. And then you have their weak reasoning. You could try to RAG it, but embeddings are not necessarily tuned to encode logically contradictory or inconsistent claims: probably if I wrote “the sky is blue” in one place and “the sky is red” in another, a retrieval would be able to retrieve both paragraphs and a LLM point out that they are contradictory, but such blatant contradictions are probably too rare to be useful to check for. You want something more subtle, like where you say “the sky is blue” and elsewhere “I looked up from the ground and saw the color of apples”. You could try to brute force it and consider every pairwise comparison of 2 reasonable sized chunks of text and ask for contradictions, but this is quadratic and will get slow and expensive and probably turn up too many false positives. (And how do you screen off false positives and mark them ‘valid’?)
My general thinking these days is that these truly useful ‘tools for thought’ LLMs are going to require either much better & cheaper LLMs, so smart that they can provide useful assistance despite being used in a grossly unnatural way input-wise or safety-tuned to hell, or biting the bullet of finetuning/dynamic-evaluation (see my Nenex proposal).
An LLM finetuned on my corpus can hope to quickly find, with good accuracy, contradictions, because it was trained to know ‘the sky was blue’ when I wrote that at the beginning of the corpus, and it gets confused when it hits ‘the color of ____’ and gets the prediction totally wrong. And RAG on an embedding tailored to the corpus can hope to surface the contradictions because it sees the two uses are the same in the essays’ context, etc. (And if you run them locally, and they don’t need a large context window because of the finetuning, they will be fast and cheap, so you can more meaningfully apply the brute-force approach; or you could just run multiple epochs on your data, with an auxiliary prompt asking for a general critique, which would cover contradictions. ‘You say here X, but don’t I recall you saying ~X back at the beginning? What gives?’)
Feed it a shorter text (that fits in the window) and ask it to provide a short summary focusing on factual statements. Then hopefully all short versions could fit in the window. Find the contradiction—report the two contradicting factual statements and which section they appeared in. Locate the statement in the original text.
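A minimal sketch of that pipeline in Python, using the OpenAI chat API; the prompts, model name, and section format are illustrative assumptions, not a tested recipe:

```python
# A sketch of the summarize-then-compare pipeline; prompts, model name, and the
# section format are illustrative assumptions, not a tested recipe.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def find_contradictions(sections: dict[str, str]) -> str:
    # Step 1: compress each section into a list of factual claims.
    summaries = {
        name: ask("List the factual claims in this text, one per line:\n\n" + text)
        for name, text in sections.items()
    }
    # Step 2: scan all compressed summaries at once for contradictions.
    combined = "\n\n".join(f"[{name}]\n{claims}" for name, claims in summaries.items())
    return ask(
        "Below are factual claims extracted from sections of one document. "
        "List any pairs of claims that contradict each other, citing the "
        "section names in brackets:\n\n" + combined
    )
```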
I may have. Just gwern.net is, I think, somewhere around 2m, and it’s not comprehensive. Also, for contradictions, I would want to detect contradictions against citations/references as well (detecting miscitations would be more important than self-consistency IMO), and as a rough ballpark, the current Gwern.net annotation* corpus is approaching 4.3m words, looks like, and is also not comprehensive. So, closer than one might think! (Anyway, doesn’t deal with the cost or latency: as you can see in the demos, we are talking minutes, not seconds, for these million-token calls and the price is probably going to be in the dollar+ regime per call.)
* which are not fulltext. It would be nice to throw in all of the hosted paper & book & webpage fulltexts, but then that’s probably more like 200m+ words.
There may not be any ‘clear’ technical obstruction, but it has failed badly in the past. ‘Add more parallelism’ (particularly hierarchically) is one of the most obvious ways to improve attention, and people have spent the past 5 years failing to come up with efficient attentions that do anything but move along a Pareto frontier from ‘fast but doesn’t work’ to ‘slow and works only as well as the original dense attention’. It’s just inherently difficult to know what tokens you will need across millions of tokens without input from all the other tokens (unless you are psychic), implying extensive computation of some sort, which makes things inherently serial and costs you latency, even if you are rich enough to spend compute like water. You’ll note that when Claude-2 was demoing the ultra-long attention windows, it too spent a minute or two churning. Meanwhile, the most effective improvements in long-range attention, like Flash Attention or Ring Attention, are just hyperoptimizing dense attention, which is inherently limited.
I’ve long been very suspicious of aggregate economic measures like GDP. But GDP is clearly measuring something, and whatever that something is it seems to increase remarkably smoothly despite huge technological revolutions. So I spent some time this morning reading up and playing with numbers and generally figuring out how to think about the smoothness of GDP increase.
Major takeaways:
When new tech makes something previously expensive very cheap, GDP mostly ignores it. (This happens in a subtle way related to how we actually compute it.)
Historical GDP curves mainly measure things which are expensive ~now. Things which are cheap now are mostly ignored. In other words: GDP growth basically measures the goods whose production is revolutionized the least.
Re: AI takeoff, the right way to extrapolate today’s GDP curve to post-AI is to think about things which will still be scarce post-AI, and then imagine the growth of production of those things.
Even a very sharp, economically-revolutionary AI takeoff could look like slow smooth GDP growth, because GDP growth will basically only measure the things whose production is least revolutionized.
Why am I harping on about technicalities of GDP? Well, I hear about some AI forecasts which are heavily based on the outside view that economic progress (as measured by GDP) is smooth, and this is so robust historically that we should expect it to continue going forward. And I think this is basically right—GDP, as we actually compute it, is so remarkably smooth that we should expect that to continue. Alas, this doesn’t tell us very much about how crazy or sharp AI takeoff will be, because GDP (as we actually compute it) systematically ignores anything that’s revolutionized.
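To make the “GDP mostly ignores what gets cheap” point concrete, here’s a minimal numeric sketch (all numbers made up) of a chained Fisher quantity index, which is roughly how real GDP growth is actually computed:

```python
# Minimal numeric sketch (all numbers made up): one good gets revolutionized
# (price falls 50%/period, quantity grows 20%/period), the other is stagnant.
# Chained real GDP growth ends up tracking the stagnant good, because the
# revolutionized good's expenditure share collapses toward zero.
def fisher_step(p_old, q_old, p_new, q_new):
    # One period of a chain-type Fisher quantity index (geometric mean of
    # Laspeyres and Paasche quantity indices).
    laspeyres = sum(p * q for p, q in zip(p_old, q_new)) / sum(p * q for p, q in zip(p_old, q_old))
    paasche = sum(p * q for p, q in zip(p_new, q_new)) / sum(p * q for p, q in zip(p_new, q_old))
    return (laspeyres * paasche) ** 0.5

p, q = [100.0, 100.0], [1.0, 1.0]   # [revolutionized good, stagnant good]
index = 1.0
for _ in range(20):
    p_new, q_new = [p[0] * 0.5, p[1]], [q[0] * 1.2, q[1]]
    index *= fisher_step(p, q, p_new, q_new)
    p, q = p_new, q_new

print(index)      # measured "real GDP" growth over 20 periods: ~1.3x
print(1.2 ** 20)  # growth in the revolutionized good's quantity: ~38x
```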
In writing How much should we value life?, I spent some time digging into AI timeline stuff. It led me to When Will AI Be Created?, written by Luke Muehlhauser for MIRI. He noted that there is reason not to trust expert opinions on AI timelines, and that trend extrapolation may be a good alternative. This point you’re making about GDP seems like it is real progress towards coming up with a good way to do trend extrapolation, and thus seems worth a full post IMO. (Assuming it isn’t already well known by the community or something, which I don’t get the sense is the case.)
My first reaction to the framing of the paper is to ask: growth in what? It’s important to keep in mind that concepts like “gross domestic product” and “world gross domestic product” were defined from an explicit anthropocentric perspective—they measure the total production of final goods within a certain time period. Final goods are what is either consumed by humans (e.g. food or human services) or what is invested into “capital goods” that last for multiple periods (e.g. a server farm) to produce consumption goods for humans.
Now imagine you are a highly intelligent AI system running on the cloud. Although the production of the server farms on which you depend enters into human GDP (as a capital good), most of the things that you absorb, for example energy, server maintenance, etc., count as “intermediate goods” in our anthropocentric accounting systems and do not contribute to human GDP. In fact, to the extent that the AI system drives up the price of scarce resources (like energy) consumed by humans, real human GDP may even decline.
As a result, it is conceivable (and, to be honest, one of the central scenarios for me personally) that an AI take-off occurs but anthropocentric GDP measures show relative stagnation in the human economy.
To make this scenario a bit more tangible, consider the following analogy: imagine a world in which there are two islands trading with each other, but the inhabitants of the islands are very different from each other—let’s call them humans and AIs. The humans sell primitive goods like oil to the AIs and their level of technology is relatively stagnant. The AIs sell amazing services to the humans, and their level of technology doubles every year. However, the AI services that humans consume make up only a relatively small part of the human consumption basket. The humans are amazed at what fantastic services they get from the AIs in exchange for their oil, and they experience improvements in their standard of living from these fantastic AI services, although they also have to pay more and more for their energy use every year, which offsets part of that benefit. The humans can only see what’s happening on their own island and develop a measure of their own well-being that they call human GDP, which increases modestly because the advances only occur in a relatively small part of their consumption basket. The AIs can see what’s going on on the AI island and develop a measure of their own well-being which they call AI GDP, and which almost doubles every year. The system can go on like this indefinitely.
For a fuller discussion of these arguments, let me refer you to my working paper on “The Rise of Artificially Intelligent Agents” (with the caveat that the paper is still a working draft).
In general, Baumol-type effects (spending decreasing in sectors where productivity goes up) mean that we can have scenarios in which the economy is growing extremely fast on “objective” metrics like energy consumption, but GDP has stagnated because that energy is being spent on extremely marginal increases in goods being bought and sold.
Smoke from California/Oregon wildfires reaching the East Coast opens up some interesting new legal/political possibilities. The smoke is way outside state borders, all the way on the other side of the country, so that puts the problem pretty squarely within federal jurisdiction. Either a federal agency could step in to force better forest management on the states, or a federal lawsuit could be brought for smoke-induced damages against California/Oregon. That would potentially make it a lot more difficult for local homeowners to block controlled burns.
I’ve been running ELISA tests all week. In the first test, I did not detect stronger binding to any of the peptides than to the control in any of several samples from myself or my girlfriend. But the control itself was looking awfully suspicious, so I ran another couple tests. Sure enough, something in my samples is binding quite strongly to the control itself (i.e. the blocking agent), which is exactly what the control is supposed to not do. So I’m going to try out some other blocking agents, and hopefully get an actually-valid control group.
(More specifics on the test: I ran a control with blocking agent + sample, and another with blocking agent + blank sample, and the blocking agent + sample gave a strong positive signal while the blank sample gave nothing. That implies something in the sample was definitely binding to both the blocking agent and the secondary antibodies used in later steps, and that binding was much stronger than the secondary antibodies themselves binding to anything in the blocking agent + blank sample.)
In other news, the RadVac team released the next version of their recipe + whitepaper. Particularly notable:
… many people who have taken the nasal vaccine are testing negative for serum antibodies with commercial and lab ELISA tests, while many who inject the vaccine (subcutaneous or intramuscular) are testing positive (saliva testing appears to be providing evidence of mucosal response among a subset of researchers who have administered the vaccine intranasally).
Note that they’re talking specifically about serum (i.e. blood) antibodies here. So apparently injecting it does induce blood antibodies of the sort detectable by commercial tests (at least some of the time), but snorting it mostly just produces mucosal antibodies (also at least some of the time).
This is a significant update: most of my prior on the vaccine working was based on vague comments in the previous radvac spec about at least some people getting positive test results. But we didn’t know what kind of test results those were, so there was a lot of uncertainty about exactly what “working” looked like. In particular, we didn’t know whether antibodies were induced in blood or just mucus, and we didn’t know if they were induced consistently or only in some people (the latter of which is the “more dakka probably helps” world). Now we know that it’s mostly just mucus (at least for nasal administration). Still unsure about how consistently it works—the wording in the doc makes it sound like only some people saw a response, but I suspect the authors are just hedging because they know there’s both selection effects and a lot of noise in the data which comes back to them.
The latest version of the vaccine has been updated to give it a bit more kick—slightly higher dose, and the chitosan nanoparticle formula has been changed in a way which should make the peptides more visible to the immune system. Also, the list of peptides has been trimmed down a bit, so the latest version should actually be cheaper, though the preparation is slightly more complex.
Someone should write a book review of The Design of Everyday Things aimed at LW readers, so I have a canonical source to link to other than the book itself.
I had a shortform post pointing out the recent big jump in new businesses in the US, and Gwern replied:
How sure are you that the composition is interesting? How many of these are just quick mask-makers or sanitizer-makers, or just replacing restaurants that have now gone out of business? (ie very low-value-added companies, of the ‘making fast food in a stall in a Third World country’ sort of ‘startup’, which make essentially no or negative long-term contributions).
This was a good question in context, but I disagree with Gwern’s model of where-progress-comes-from, especially in the context of small businesses.
Let’s talk ice-cream cones.
As the story goes, an ice-cream vendor was next door to a waffle vendor at the 1904 World’s Fair. At some point, the ice-cream vendor ran short on paper cups, and inspiration struck. He bought some thin waffles from the waffle vendor, rolled them into cones, and ice-cream cones took off.
That’s just the first step. From there, the cone spread memetically. People heard about it, and either asked for cones (on the consumer side) or tried making them (on the supplier side).
Insight + Memetics → Better Food
When I compare food today to the stuff my grandparents ate, there’s no comparison. Today’s dishes are head and shoulders better. Partly it’s insights like ice-cream cones, partly it’s memetic spread of dishes from more parts of the world (like sisig, soup dumplings, ropa vieja, chicken Karahi, …).
Those little fast-food stalls? They’re powerhouses of progress. It’s a hypercompetitive market, with low barriers to entry, and lots of repeat business. The conditions are ideal for trying out new dishes, spreading culinary ideas and finding out the hard way what people like to eat. That doesn’t mean they’re highly profitable—culinary innovation spreads memetically, so it’s hard to capture the gains. But progress is made.
The pandemic also affects the kinds of business ideas people try: it has pushed a lot of innovation in food delivery. Some of that pandemic-driven innovation will become worthless once the pandemic is over, but a few good ideas will likely survive, and the old ideas of the businesses that went out of business are still around.
Does The Information-Throughput-Maximizing Input Distribution To A Sparsely-Connected Channel Satisfy An Undirected Graphical Model?
[EDIT: Never mind, proved it.]
Suppose I have an information channel X → Y. The X components X_1, ..., X_m and the Y components Y_1, ..., Y_n are sparsely connected, i.e. the typical Y_i is downstream of only a few parent X-components X_{pa(i)}. (Mathematically, that means the channel factors as P[Y|X] = ∏_i P[Y_i | X_{pa(i)}].)
Now, suppose I split the Y components into two sets, and hold constant any X-components which are upstream of components in both sets. Conditional on those (relatively few) X-components, our channel splits into two independent channels.
E.g. in the image above, if I hold X_4 constant, then I have two independent channels: (X_1, X_2, X_3) → (Y_1, Y_2, Y_3, Y_4) and (X_5, X_6, X_7) → (Y_5, Y_6, Y_7, Y_8).
Now, the information-throughput-maximizing input distribution to a pair of independent channels is just the product of the throughput maximizing distributions for the two channels individually. In other words: for independent channels, we have independent throughput maximizing distribution.
So it seems like a natural guess that something similar would happen in our sparse setup.
Conjecture: The throughput-maximizing distribution for our sparse setup is independent conditional on overlapping X-components. E.g. in the example above, we’d guess that P[X] = P[X_4] P[X_1, X_2, X_3 | X_4] P[X_5, X_6, X_7 | X_4] for the throughput-maximizing distribution.
If that’s true in general, then we can apply it to any Markov blanket in our sparse channel setup, so it implies that P[X] factors over any set of X components which is a Markov blanket splitting the original channel graph. In other words: it would imply that the throughput-maximizing distribution satisfies an undirected graphical model, in which two X-components share an edge if-and-only-if they share a child Y-component.
It’s not obvious that this works mathematically; information throughput maximization (i.e. the optimization problem by which one computes channel capacity) involves some annoying coupling between terms. But it makes sense intuitively. I’ve spent less than an hour trying to prove it and mostly found it mildly annoying though not clearly intractable. Seems like the sort of thing where either (a) someone has already proved it, or (b) someone more intimately familiar with channel capacity problems than I am could easily prove it.
So: anybody know of an existing proof (or know that the conjecture is false), or find this conjecture easy to prove themselves?
Specifically, we’ll show that there exists an information throughput maximizing distribution which satisfies the undirected graph. We will not show that all optimal distributions satisfy the undirected graph, because that’s false in some trivial cases—e.g. if all the Y’s are completely independent of X, then all distributions are optimal. We will also not show that all optimal distributions factor over the undirected graph, which is importantly different because of the P[X]>0 caveat in the Hammersley-Clifford theorem.
First, we’ll prove the (already known) fact that an independent distribution P[X] = P[X_1] P[X_2] is optimal for a pair of independent channels (X_1 → Y_1, X_2 → Y_2); we’ll prove it in a way which will play well with the proof of our more general theorem. Using standard information identities plus the factorization structure Y_1 − X_1 − X_2 − Y_2 (that’s a Markov chain, not subtraction), we get

MI(X;Y) = MI(X;Y_1) + MI(X;Y_2|Y_1)
        = MI(X;Y_1) + (MI(X;Y_2) − MI(Y_2;Y_1) + MI(Y_2;Y_1|X))
        = MI(X_1;Y_1) + MI(X_2;Y_2) − MI(Y_2;Y_1)

Now, suppose you hand me some supposedly-optimal distribution P[X]. From P, I construct a new distribution Q[X] := P[X_1] P[X_2]. Note that MI(X_1;Y_1) and MI(X_2;Y_2) are both the same under Q as under P, while MI(Y_2;Y_1) is zero under Q. So, because MI(X;Y) = MI(X_1;Y_1) + MI(X_2;Y_2) − MI(Y_2;Y_1), the MI(X;Y) must be at least as large under Q as under P. In short: given any distribution, I can construct another distribution with at least as high information throughput, under which X_1 and X_2 are independent.
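As a sanity check on that lemma (not the general theorem), here’s a minimal numerical sketch; the channel noise levels and the input distribution are arbitrary made-up numbers:

```python
# Numerical sanity check of the lemma (not a proof): for two independent binary
# symmetric channels, replacing a correlated input P[X1,X2] with the product of
# its marginals never decreases MI(X;Y). Noise levels and the input distribution
# are arbitrary made-up numbers.
import numpy as np

def mutual_info(joint):
    # MI (in nats) between the row variable and the column variable of a joint.
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

def joint_xy(p_x, eps1=0.1, eps2=0.25):
    # X = (X1, X2) ~ p_x (2x2); Y_i is X_i passed through a BSC with flip prob eps_i.
    bsc1 = np.array([[1 - eps1, eps1], [eps1, 1 - eps1]])
    bsc2 = np.array([[1 - eps2, eps2], [eps2, 1 - eps2]])
    j = np.einsum('ab,ac,bd->abcd', p_x, bsc1, bsc2)  # indices: x1, x2, y1, y2
    return j.reshape(4, 4)                            # rows = (x1,x2), cols = (y1,y2)

p = np.array([[0.40, 0.10], [0.05, 0.45]])            # correlated input distribution
q = np.outer(p.sum(axis=1), p.sum(axis=0))            # product of its marginals

print(mutual_info(joint_xy(p)))  # MI under the correlated input
print(mutual_info(joint_xy(q)))  # MI under the product input; should be >= the above
```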
Now let’s tackle our more general theorem, reusing some of the machinery above.
I’ll split Y into Y_1 and Y_2, and split X into X_{1−2} (parents of Y_1 but not Y_2), X_{2−1} (parents of Y_2 but not Y_1), and X_{1∩2} (parents of both). Then

MI(X;Y) = MI(X_{1∩2};Y) + MI(X_{1−2}, X_{2−1}; Y | X_{1∩2})

In analogy to the case above, we consider distribution P[X], and construct a new distribution Q[X] := P[X_{1∩2}] P[X_{1−2} | X_{1∩2}] P[X_{2−1} | X_{1∩2}]. Compared to P, Q has the same value of MI(X_{1∩2};Y), and by exactly the same argument as the independent case MI(X_{1−2}, X_{2−1}; Y | X_{1∩2}) cannot be any lower under Q; we just repeat the same argument with everything conditional on X_{1∩2} throughout. So, given any distribution, I can construct another distribution with at least as high information throughput, under which X_{1−2} and X_{2−1} are independent given X_{1∩2}.
Since this works for any Markov blanket X_{1∩2}, there exists an information throughput maximizing distribution which satisfies the desired undirected graph.
I suppose another way to look at this is that the overlapping components are the blanket states in some kind of time-dependent Markov blanket setup, right?
In the scenario you created, you could treat x_1, x_2, x_3 as the shielded state at time step t, so i_t. Then x_5, x_6, x_7 are states outside of the blanket, so e_t (which group of states is i and which is e doesn’t really matter, so long as they are on either side of the blanket). y_1, y_2, y_3, y_4 become i_{t+1}, and y_5, y_6, y_7, y_8 become e_{t+1}.
Then x_4 becomes the blanket b_t such that
I(i_{t+1}; e_{t+1} | b_t) ≈ 0
and
P(i_{t+1}, e_{t+1} | i_t, e_t, b_t) = P(i_{t+1} | i_t, b_t) ⋅ P(e_{t+1} | e_t, b_t)
With all that implies. In fact you can just as easily have three shielded states, or four, using this formulation.
(the setup for this is shamelessly ripped off from @Gunnar_Zarncke ’s unsupervised agent detection work)
(Was in the middle of writing a proof before noticing you did it already)
I believe the end result is that if we have Y = (Y_1, Y_2), X = (X_1, X_2, X_3) with P(Y|X) = P(Y_1|X_1,X_3) P(Y_2|X_2,X_3) (X_1 upstream of Y_1, X_2 upstream of Y_2, X_3 upstream of both),
then maximizing I(X;Y) is equivalent to maximizing I(Y_1; X_1, X_3) + I(Y_2; X_2, X_3) − I(Y_1; Y_2).
& for the proof we can basically replicate the proof for additivity, except substituting the factorization P(X_1, X_2, X_3) = P(X_3) P(X_1|X_3) P(X_2|X_3) as the assumption in place of independence; then both directions of inequality will result in I(Y_1; X_1, X_3) + I(Y_2; X_2, X_3) − I(Y_1; Y_2).
[EDIT: Forgot the −I(Y_1;Y_2) term due to marginal dependence P(Y_1, Y_2) ≠ P(Y_1) P(Y_2)]
Does anyone know of an “algebra for Bayes nets/causal diagrams”?
More specifics: rather than using a Bayes net to define a distribution, I want to use a Bayes net to state a property which a distribution satisfies. For instance, a distribution P[X, Y, Z] satisfies the diagram X → Y → Z if-and-only-if the distribution factors according to P[X, Y, Z] = P[X] P[Y|X] P[Z|Y].
When using diagrams that way, it’s natural to state a few properties in terms of diagrams, and then derive some other diagrams they imply. For instance, if a distribution P[W, X, Y, Z] satisfies all of:
W → Y → Z
W → X → Y
X → (W, Y) → Z
… then it also satisfies W → X → Y → Z.
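For concreteness, here is how that particular example can be checked by hand at the factorization level, reading each diagram as its corresponding conditional-independence statement (e.g. W → Y → Z as Z ⊥ W | Y); that reading is an assumption about the intended semantics, and this is exactly the kind of low-level work the hoped-for rules would let one skip:

```latex
% Reading the three assumed diagrams as conditional independencies:
%   (1) W -> Y -> Z        :  Z \perp W \mid Y
%   (2) W -> X -> Y        :  Y \perp W \mid X
%   (3) X -> (W,Y) -> Z    :  Z \perp X \mid W, Y
\begin{align*}
\text{(1) and (3), by the contraction rule:} \quad & Z \perp (W, X) \mid Y \\
\text{so} \quad & P[w,x,y,z] = P[w,x,y]\, P[z \mid y] \\
\text{(2) gives:} \quad & P[w,x,y] = P[w]\, P[x \mid w]\, P[y \mid x] \\
\text{hence} \quad & P[w,x,y,z] = P[w]\, P[x \mid w]\, P[y \mid x]\, P[z \mid y],
\end{align*}
which is the factorization for $W \to X \to Y \to Z$.
```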
What I’m looking for is a set of rules for “combining diagrams” this way, without needing to go back to the underlying factorizations in order to prove things.
David and I have been doing this sort of thing a lot in our work the past few months, and it would be nice if someone else already had a nice write-up of the rules for it.
Turns out my laser thermometer is all over the map. Readings would change by 10°F if I went outside and came back in. My old-school thermometer is much more stable (and well-calibrated, based on dipping it in some ice water), but slow, and it caps out around 90°F (so I can’t use it to measure e.g. exhaust temp). I plan to buy a bunch more old-school thermometers for the next try.
I thought opening the doors/windows in rooms other than the test room and setting up a fan would be enough to make the temperature in the hall outside the test room close to outdoor temp. This did not work; hall temp was around 72°F with outside around 80°F. I’ll need to change that part of the experiment design; most likely I’ll seal around the door and let air infiltrate exclusively from the window instead. (The AC is right next to the window, so this could screw with the results, but I don’t really have a better option.)
In two-hose mode, the AC hit its minimum temperature of 60°F, so I’ll need a hotter day. I’ll try again when we hit at least 85°F.
In case anyone’s wondering: in one-hose mode, the temperature in the room equilibrated around 66°F. Power consumption was near-constant throughout all conditions.
One additional Strange Observation: cool air was blowing out under the door of the test room in two-hose mode. This should not happen; my best guess is that, even though the AC has two separate intake vents, the two are not actually partitioned internally, so the fan for indoor-air was pulling in outdoor-air (causing air to blow out under the door to balance that extra inflow). Assuming that’s the cause, it should be fixable with some strategically-placed cardboard inside the unit.
Huh, amusing. We do ship a font that has nothing but the greek letter set in it, because people use greek unicode symbols all the time and our primary font doesn’t support that character set. So my guess is that’s where Google gets confused.
The math and physics worlds still use single-letter variable names for everything, decades after the software world realized that was extremely bad practice. This makes me pessimistic about the adoption of better notation practices.
Better? I doubt it. If physicists wrote equations the way programmers write code, a simple homework problem would easily fill ten pages.
Verboseness works for programmers because programmers rarely need to do anything more complicated with their code than run it—analogous to evaluating an expression, for a physicist or mathematician. Imagine if you needed to prove one program equivalent to another algebraically—i.e. a sequence of small transformations, with a record of intermediate programs derived along the way in order to show your work. I expect programmers subjected to such a use-case would quickly learn the virtues of brevity.
Related to that: you have far fewer variables under consideration, few enough that you can even have standard names for them. A remnant of this effect can be seen in typical Fortran programs.
Yeah, I’m apparently not intelligent enough to do error-free physics/engineering calculations without relying on dimensional analysis as a debugging tool. I even came up with a weird, hack-y way to do that in computing environments like Excel and Cython, where flexible multiplicative types are not supported.
Is interpersonal variation in anxiety levels mostly caused by dietary iron?
I stumbled across this paper yesterday. I haven’t looked at it very closely yet, but the high-level pitch is that they look at genetic predictors of iron deficiency and then cross that with anxiety data. It’s interesting mainly because it sounds pretty legit (i.e. the language sounds like direct presentation of results without any bullshitting, the p-values are satisfyingly small, there are no branching paths), and the effect sizes are BIG IIUC:
The odd ratios (OR) of anxiety disorders per 1 standard deviation (SD) unit increment in iron status biomarkers were 0.922 (95% confidence interval (CI) 0.862–0.986; p = 0.018) for serum iron level, 0.873 (95% CI 0.790–0.964; p = 0.008) for log-transformed ferritin and 0.917 (95% CI 0.867–0.969; p = 0.002) for transferrin saturation. But no statical significance was found in the association of 1 SD unit increased total iron-binding capacity (TIBC) with anxiety disorders (OR 1.080; 95% CI 0.988–1.180; p = 0.091). The analyses were supported by pleiotropy test which suggested no pleiotropic bias.
The odds ratio of anxiety disorders changes by a factor of roughly 0.9 per standard deviation in iron level, across four different measures of iron level. (Note that TIBC, the last of the four iron level measures, didn’t hit statistical significance but did have a similar effect size to the other three.)
Just eyeballing those effect sizes… man, it kinda sounds like iron levels are maybe the main game for most anxiety? Am I interpreting that right? Am I missing something here?
EDIT: I read more, and it turns out the wording of the part I quoted was misleading. The number 0.922, for instance, was the odds ratio AT +1 standard deviation serum iron level, not PER +1 standard deviation serum iron level. That would be −0.078 PER standard deviation serum iron level, so it’s definitely not the “main game for most anxiety”.
Have you tested this hypothesis on your friends? Ask them for their iron level from last blood test, and ask them to self-report anxiety level (you also make a separate estimate of their anxiety level).
I keep seeing news outlets and the like say that SORA generates photorealistic videos, can model how things move in the real world, etc. This seems like blatant horseshit? Every single example I’ve seen looks like video game animation, not real-world video.
Have I just not seen the right examples, or is the hype in fact decoupled somewhat from the model’s outputs?
I think all of these videos other than the octopus and paper planes are “at-a-glance” photorealistic to me.
Overall, I think SORA can do “at-a-glance” photorealistic videos and can model to some extent how things move in the real world. I don’t think it can do both complex motion and photorealism in the same video. As in, the videos which are photorealistic don’t really involve complex motion and the videos which involve complex motion aren’t photorealistic.
(So probably some amount of hype, but also pretty real?)
Hmm, I don’t buy it. These two scenes seem very much not like the kind of thing a video game engine could produce:
Look at this frame! I think there is something very slightly off about that face, but the cat hitting the person’s face and the person’s reaction seem very realistic to me and IMO qualifies as “complex motion and photorealism in the same video”.
Yeah, this is the example I’ve been using to convince people that the game engines are almost certainly generating training data but are probably not involved at sampling time. I can’t come up with any sort of hybrid architecture like ‘NN controlling game-engine through API’ where you get that third front leg. One of the biggest benefits of a game-engine would be ensuring exactly that wouldn’t happen—body parts becoming detached and floating in mid-air and lack of conservation. If you had a game engine with a hyper-realistic cat body model in it which something external was manipulating, one of the biggest benefits is that you wouldn’t have that sort of common-sense physics problem. (Meanwhile, it does look like past generative modeling of cats in its errors. Remember the ProGAN interpolation videos of CATS? Hilarious, but also an apt demonstration of how extremely hard cats are to model. They’re worse than hands.)
In addition, you see plenty of classic NN tells throughout—note the people driving a ‘Dandrover’...
Yeah, those were exactly the two videos which most made me think that the model was mostly trained on video game animation. In the Tokyo one, the woman’s facial muscles never move at all, even when the camera zooms in on her. And in the SUV one, the dust cloud isn’t realistic, but even covering that up the SUV has a Grand Theft Auto look to its motion.
“Can’t do both complex motion and photorealism in the same video” is a good hypothesis to track, thanks for putting that one on my radar.
Putting this here for posterity: I have thought since the superconductor preprint went up, and continue to think, that the markets are putting generally too little probability on the claims being basically-true. I thought ~70% after reading the preprint the day it went up (and bought up a market on manifold to ~60% based on that, though I soon regretted not waiting for a better price), and my probability has mostly been in the 40-70% range since then.
Languages should have tenses for spacelike separation. My friend and I do something in parallel, it’s ambiguous/irrelevant which one comes first, I want to say something like “I expect my friend <spacelike version of will do/has done/is doing> their task in such-and-such a way”.
That sounds more like a tenseless sentence than using a spacelike separation tense. Your friend’s performance of the task may well be in your future or past lightcone (or extend through both), but you don’t wish to imply any of these.
There are languages with tenseless verbs, as well as some with various types of spatial tense.
The closest I can approximate this in English without clumsy constructs is “I expect my friend does their task in such-and-such a way”, which I agree isn’t very satisfactory.
Two kinds of cascading catastrophes one could imagine in software systems...
A codebase is such a spaghetti tower (and/or coding practices so bad) that fixing a bug introduces, on average, more than one new bug. Software engineers toil away fixing bugs, making the software steadily more buggy over time.
Software services managed by different groups have dependencies—A calls B, B calls C, etc. Eventually, the dependence graph becomes connected enough and loopy enough that a sufficiently-large chunk going down brings down most of the rest, and nothing can go back up until everything else goes back up (i.e. there’s circular dependence/deadlock).
How could we measure how “close” we are to one of these scenarios going supercritical?
For the first, we’d need to have attribution of bugs—i.e. track which change introduced each bug. Assuming most bugs are found and attributed after some reasonable amount of time, we can then estimate how many bugs each bug fix introduces, on average.
(I could also imagine a similar technique for e.g. medicine: check how many new problems result from each treatment of a problem.)
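A minimal sketch of what that estimator could look like, assuming bug-attribution data is available; the data format and function name here are hypothetical:

```python
# Sketch of the estimator; the data format (bug -> commit that introduced it,
# plus the set of commits that were themselves bug fixes) is hypothetical.
from collections import Counter

def bugs_per_fix(bug_introduced_by: list[str], fix_commits: set[str]) -> float:
    # Count, among known bugs, those introduced by bug-fix commits, then average
    # over all fix commits (fixes that introduced no known bug count as zero).
    introduced_by_fixes = Counter(c for c in bug_introduced_by if c in fix_commits)
    return sum(introduced_by_fixes.values()) / max(len(fix_commits), 1)

# A ratio creeping above 1 is the supercritical regime described above.
print(bugs_per_fix(["a1", "a1", "b2", "c3"], {"a1", "b2", "d4"}))  # -> 1.0
```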
For the second, we’d need visibility into codebases maintained by different groups, which would be easy within a company but much harder across companies. In principle, within a company, some kind of static analysis tool could go look for all the calls to APIs between services, map out the whole graph, and then calculate which “core” pieces could be involved in a catastrophic failure.
(Note that this problem could be mostly-avoided by intentionally taking down services occasionally, so engineers are forced to build around that possibility. I don’t think any analogue of this approach would work for the first failure-type, though.)
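For the second scenario, a minimal sketch of the graph-analysis step, run on a made-up service call graph (service names and edges are illustrative):

```python
# Sketch of the graph-analysis step for the second scenario: find the services
# that can only come back up together (non-trivial strongly connected components
# of the call graph). Service names and edges are made up.
import networkx as nx

calls = [
    ("frontend", "auth"), ("auth", "billing"), ("billing", "auth"),
    ("billing", "storage"), ("storage", "reports"), ("reports", "billing"),
]
g = nx.DiGraph(calls)
cores = [c for c in nx.strongly_connected_components(g) if len(c) > 1]
print(cores)  # the circular-dependence "core" at risk of deadlock
```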
I mean, just to be clear, I am all in favor of intellectual progress. But doing so indiscriminately does sure seem a bit risky in this world of anthropogenic existential risks. Reminds me of my mixed feelings on the whole Progress Studies thing.
Yeah, I wouldn’t want to accelerate e.g. black-box ML. I imagine the real utility of such a fund would be to experiment with ways to accelerate intellectual progress and gain understanding of the determinants, though the grant projects themselves would likely be more object-level than that. Ideally the grants would be in areas which are not themselves very risk-relevant, but complicated/poorly-understood enough to generate generalizable insights into progress.
I think it takes some pretty specific assumptions for such a thing to increase risk significantly on net. If we don’t understand the determinants of intellectual progress, then we have very little ability to direct progress where we want it; it just follows whatever the local gradient is. With more understanding, at worst it follows the same gradient faster, and we end up in basically the same spot.
The one way it could net-increase risk is if the most likely path of intellectual progress leads to doom, and the best way to prevent doom is through some channel other than intellectual progress (like political action, for instance). Then accelerating the intellectual progress part potentially gives the other mechanisms (like political bodies) less time to react. Personally, though, I think a scenario in which e.g. political action successfully prevents intellectual progress from converging to doom (in a world where it otherwise would have) is vanishingly unlikely (like, less than one-in-a-hundred, maybe even less than one-in-a-thousand).
You might check out Donald Braben’s view: he says “transformative research” (i.e. fundamental results that create new fields and industries) is critical for the survival of civilization. He does not worry that transformative results might end civilization.
For short-term, individual cost/benefit calculations around C19, it seems like uncertainty in the number of people currently infected should drop out of the calculation.
For instance: suppose I’m thinking about the risk associated with talking to a random stranger, e.g. a cashier. My estimated chance of catching C19 from this encounter will be roughly proportional to N_infected. But, assuming we already have reasonably good data on number hospitalized/died, my chances of hospitalization/death given infection will be roughly inversely proportional to N_infected. So, multiplying those two together, I’ll get a number roughly independent of N_infected.
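A minimal numeric sketch of that cancellation (all numbers made up):

```python
# Minimal numeric sketch of the cancellation (all numbers made up): vary the
# assumed number of currently-infected people while holding the observed death
# count fixed, and the per-encounter risk of death stays the same.
population = 10_000_000
deaths_observed = 2_000        # taken as known from data
p_transmit = 0.05              # P(transmission | the stranger is infected), made up

for n_infected in (10_000, 100_000, 1_000_000):
    p_catch = p_transmit * n_infected / population
    p_death_given_catch = deaths_observed / n_infected
    print(n_infected, p_catch * p_death_given_catch)  # same value every time
```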
How general is this? Does some version of it apply to long-term scenarios too (possibly accounting for herd immunity)? What short-term decisions do depend on N_infected?
Way back in the halcyon days of 2005, a company called Cenqua had an April Fools’ Day announcement for a product called Commentator: an AI tool which would comment your code (with, um, adjustable settings for usefulness). I’m wondering if (1) anybody can find an archived version of the page (the original seems to be gone), and (2) if there’s now a clear market leader for that particular product niche, but for real.
Here’s an interesting problem of embedded agency/True Names which I think would make a good practice problem: formulate what it means to “acquire” something (in the sense of “acquiring resources”), in an embedded/reductive sense. In other words, you should be able-in-principle to take some low-level world-model, and a pointer to some agenty subsystem in that world-model, and point to which things that subsystem “acquires” and when.
Some prototypical examples which an answer should be able to handle well:
Organisms (anything from bacteria to plant to animals) eating things, absorbing nutrients, etc.
...and how the brain figures this out and why it is motivated to do so. There are a lot of simple animals that apparently “try to control” resources or territory. How?
Drives to control resources occur everywhere. And your control of resources is closely related to your dominance in a dominance hierarchy. Which seems to be regulated in many animals by serotonin. See e.g. https://www.nature.com/articles/s41386-022-01378-2
This billboard sits over a taco truck I like, so I see it frequently:
The text says “In our communities, Kaiser Permanente members are 33% less likely to experience premature death due to heart disease.*”, with the small-text directing one to a url.
The most naive (and presumably intended) interpretation is, of course, that being a Kaiser Permanente member provides access to better care, causing 33% lower chance of death due to heart disease.
Now, I’d expect most people reading this to immediately think something like “selection effects!”—i.e. what the billboard really tells us is that Kaiser Permanente has managed to select healthier-than-typical members. And indeed, that was my immediate thought.
… but then I noticed that the “selection effects” interpretation is also a trap for the unwary. After all, this is a number on a billboard. Number. Billboard. The overriding rule for numbers on billboards is that they are bullshit. The literal semantics of “Kaiser Permanente members are 33% less likely to experience premature death due to heart disease” just don’t have all that much to do at all with the rate at which various people die of heart disease.
What it does tell us is that someone at Kaiser Permanente thought it would be advantageous to claim, to people seeing this billboard, that Kaiser Permanente membership reduces death from heart disease by 33%.
… and that raises a very different set of questions! Who, exactly, is this billboard advertising to? The phrase “for all that is you” suggests that it’s advertising to prospective members, as opposed to e.g. doctors or hospital admins or politicians or investors or Kaiser’s own employees. (There is a skyscraper full of Kaiser’s employees within view of this billboard.) Which would suggest that somebody at Kaiser thinks consumers make a nontrivial choice between Kaiser and alternatives sometimes, and that there’s value to be had in influencing that choice.
… though perhaps that thinking is also a trap, and in fact the sign is just a result of corporate stupidity. I don’t know.
The actual trap is that it caught your attention; you posted about it online, and now more people know and think about Kaiser Permanente than before. According to whoever was in charge of making this billboard, that’s a success metric they can leverage for a promotion.
What it does tell us is that someone at Kaiser Permanente thought it would be advantageous to claim, to people seeing this billboard, that Kaiser Permanente membership reduces death from heart disease by 33%.
Is that what it does tell us? The sign doesn’t make the claim you suggest: it doesn’t claim to be reducing deaths from heart disease, it states that members are 33% less likely to experience “premature” death, and “premature” is probably a weaselly term here. It clearly is not making any claims about reducing deaths from heart disease.
You seem to be projecting the conclusion that the claim/expected interpretation is that membership reduces the deaths by 33%. But I don’t know how you’re concluding that the marketing team thought that would be the general interpretation by those seeing the sign.
While I would not be inclined to take a billboard ad at face value, a more reasonable take seems to me to be that the claim is that, even with heart disease, KP’s members are less likely to die earlier than expected compared with those at other healthcare providers. That may be a provable and true claim, or it might be more “puffing”, and everyone will play with just how “premature” is going to be measured.
Whether or not it’s corporate stupidity might be a separate question, but understanding exactly what results such an ad is supposed to produce will matter a lot here. Plus, there is the old adage about no one ever going bankrupt underestimating the intelligence of the American consumer, and I suspect that might go double in the case of medical/healthcare consumption.
“Kaiser Permanente members are younger and healthier, and thus consume fewer healthcare resources on average, which allows us to pass the savings on to you.”
That is unsurprising to me, since the overall gist of Rationalism is an attempt to factor uncertainty out of the near future, life, and thought itself.
This tells me you don’t know anything about LW-rationality or are being deliberately uncharitable to it.
You’re mostly making broad blanket claims; maybe make a top-level post which is charitable to the entire project. Go in depth, post by post, on where you think people have gone wrong, and in what way. High-effort posting is appreciated.
An interesting conundrum: one of the main challenges of designing useful regulation for AI is that we don’t have any cheap and robust way to distinguish a dangerous neural net from a non-dangerous net (or, more generally, a dangerous program from a non-dangerous program). This is an area where technical research could, in principle, help a lot.
The problem is, if there were some robust metric for how dangerous a net is, and that metric were widely known and recognized (as it would probably need to be in order to be used for regulatory purposes), then someone would probably train a net to maximize that metric directly.
This seems to lead to the solution of trying to make your metric one-way, in the sense that your metric should
Provide an upper-bound on the dangerousness of your network
Compress the space of networks which map to approximately the same dangerousness level on the low end of dangerousness, and expand the space of networks which map to approximately the same dangerousness level on the upper end of dangerousness, so that you can train your network to minimize the metric, but when you train your network to maximize the metric you end up in a degenerate area with technically very high measured danger levels but in actuality very low levels of dangerousness.
We can hope (or possibly prove) that as you optimize upwards on the metric you get subject to Goodhart’s curse, but the opposite occurs on the lower end.
Sure, even seems a bit tautological: any such metric, to be robust, would need to contain in itself a definition of a dangerously-capable AI, so you probably wouldn’t even need to train a model to maximize it. You’d be able to just lift the design from the metric directly.
Do you have any thoughts on a softer version of this problem, where the metric can’t be maximized directly, but gives a concrete idea of what sort of challenge your AI needs to beat to qualify as AGI? (And therefore in which direction in the architectural-design-space you should be moving.)
Some variation on this seems like it might work as a “fire alarm” test set, but as you point out, inasmuch as it’s recognized, it’ll be misapplied for benchmarking instead.
(I suppose the ideal way to do it would be to hand it off to e. g. ARC, so they can use it if OpenAI invites them for safety-testing again. This way, SOTA models still get tested, but the actors who might misuse it aren’t aware of the testing’s particulars until they succeed anyway...)
I just went looking for a good reference for the Kelly criterion, and didn’t find any on Lesswrong. So, for anybody who’s looking: chapter 6 of Thomas & Cover’s textbook on information theory is the best source I currently know of.
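For quick reference, a minimal sketch of the standard simple-bet form of the criterion (a bet paying b-to-1, won with probability p); this is the textbook formula, not anything specific to the Thomas & Cover treatment:

```python
# The standard simple-bet Kelly formula (bet pays b-to-1, won with probability p);
# a generic reference implementation, not specific to Thomas & Cover's treatment.
def kelly_fraction(p: float, b: float) -> float:
    # Optimal fraction of bankroll to stake; never bet if the edge is negative.
    return max(0.0, (b * p - (1 - p)) / b)

print(kelly_fraction(0.6, 1.0))  # even-odds bet, 60% win probability -> stake 20%
```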
Neat problem of the week: we have n discrete random variables, X_1, ..., X_n. Given any variable, all variables are independent:
∀i: P[X|X_i] = ∏_j P[X_j|X_i]
Characterize the distributions which satisfy this requirement.
This problem came up while working on the theorem in this post, and (separately) in the ideas behind this post. Note that those posts may contain some spoilers for the problem, though frankly my own proofs on this one just aren’t very good.
I think a very common problem in alignment research today is that people focus almost exclusively on a specific story about strategic deception/scheming, and that story is a very narrow slice of the AI extinction probability mass. At some point I should probably write a proper post on this, but for now here are few off-the-cuff example AI extinction stories which don’t look like the prototypical scheming story. (These are copied from a Facebook thread.)
Perhaps the path to superintelligence looks like applying lots of search/optimization over shallow heuristics. Then we potentially die to things which aren’t smart enough to be intentionally deceptive, but nonetheless have been selected-upon to have a lot of deceptive behaviors (via e.g. lots of RL on human feedback).
The “Getting What We Measure” scenario from Paul’s old “What Failure Looks Like” post.
The “fusion power generator scenario”.
Perhaps someone trains a STEM-AGI, which can’t think about humans much at all. In the course of its work, that AGI reasons that an oxygen-rich atmosphere is very inconvenient for manufacturing, and aims to get rid of it. It doesn’t think about humans at all, but the human operators can’t understand most of the AI’s plans anyway, so the plan goes through. As an added bonus, nobody can figure out why the atmosphere is losing oxygen until it’s far too late, because the world is complicated and becomes more so with a bunch of AIs running around and no one AI has a big-picture understanding of anything either (much like today’s humans have no big-picture understanding of the whole human economy/society).
People try to do the whole “outsource alignment research to early AGI” thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they’re already on the more-powerful next gen, so it’s too late.
The classic overnight hard takeoff: a system becomes capable of self-improving at all but doesn’t seem very alarmingly good at it, somebody leaves it running overnight, exponentials kick in, and there is no morning.
(At least some) AGIs act much like a colonizing civilization. Plenty of humans ally with it, trade with it, try to get it to fight their outgroup, etc, and the AGIs locally respect the agreements with the humans and cooperate with their allies, but the end result is humanity gradually losing all control and eventually dying out.
Perhaps early AGI involves lots of moderately-intelligent subagents. The AI as a whole mostly seems pretty aligned most of the time, but at some point a particular subagent starts self-improving, goes supercritical, and takes over the rest of the system overnight. (Think cancer, but more agentic.)
Perhaps the path to superintelligence looks like scaling up o1-style runtime reasoning to the point where we’re using an LLM to simulate a whole society. But the effects of a whole society (or parts of a society) on the world are relatively decoupled from the things-individual-people-say-taken-at-face-value. For instance, lots of people talk a lot about reducing poverty, yet have basically-no effect on poverty. So developers attempt to rely on chain-of-thought transparency, and shoot themselves in the foot.
Also (separate comment because I expect this one to be more divisive): I think the scheming story has been disproportionately memetically successful largely because it’s relatively easy to imagine hacky ways of preventing an AI from intentionally scheming. And that’s mostly a bad thing; it’s a form of streetlighting.
Most of the problems you discussed here more easily permit hacky solutions than scheming does.
Individually, for a particular manifestation of each issue, this is true: you can imagine doing a hacky solution to each one. But that assumes there is a list of such particular problems such that if you check off all the boxes you win, rather than them being manifestations of broader problems. You do not want to get into a hacking contest if you’re not confident your list is complete.
True, but Buck’s claim is still relevant as a counterargument to my claim about memetic fitness of the scheming story relative to all these other stories.
This is an interesting point. I disagree that scheming vs these ideas you mention is much of a ‘streetlighting’ case. I do, however, have my own fears that ‘streetlighting’ is occurring and causing some hard-but-critical avenues of risk to be relatively neglected.
[Edit: on further thought, I think this might not just be a “streetlighting” effect, but also a “keeping my hands clean” effect. I think it’s more tempting, especially for companies, to focus on harms that could plausibly be construed as being their fault. It’s my impression that, for instance, employees of a given company might spend a disproportionate amount of time thinking about how to keep their company’s product from harming people vs the general class of products from harming people. They are also less inclined to think about harm which could be averted via application of their product. This is additional reason for concern that having the bulk of AI safety work being funded by / done in AI companies will lead to correlated oversights.]
My concerns that I think are relatively neglected in AI safety discourse are mostly related to interactions with incompetent or evil humans. Good alignment and control techniques don’t do any good if someone opts not to use them in some critical juncture.
Some potential scenarios:
If AI is very powerful, and held in check tenuously by fragile control systems, it might be released from control by a single misguided human or some unlucky chain of events, and then go rogue.
If algorithmic progress goes surprisingly quickly, we might find ourselves in a regime where a catastrophically dangerous AI can be assembled from some mix of pre-existing open-weights models, plus fine-tuning, plus new models trained with new algorithms, and probably all stitched together with hacky agent frameworks. Then all it would take would be for sufficient hints about this algorithmic discovery to leak, and someone in the world to reverse-engineer it, and then there would be potent rogue AI all over the internet all of a sudden.
If the AI is purely intent-aligned, a bad human might use it to pursue broad coercive power.
Narrow technical AI might unlock increasingly powerful and highly offense-dominant technology with lower and lower activation costs (easy to build and launch with common materials). Even if the AI itself never got out of hand, if the dangerous tech secrets got leaked (or controlled by an aggressive government) then things could go very poorly for the world.
IMO the main argument for focusing on scheming risk is that scheming is the main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful (as I discuss here). These other problems all seem like they require the models to be way smarter in order for them to be a big problem. Though as I said here, I’m excited for work on some non-scheming misalignment risks.
Seems quite wrong. The main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful is that they cause more powerful AIs to be built which will eventually be catastrophic, but which have problems that are not easily iterable-upon (either because problems are hidden, or things move quickly, or …).
And causing more powerful AIs to be built which will eventually be catastrophic is not something which requires a great deal of intelligent planning; humanity is already racing in that direction on its own, and it would take a great deal of intelligent planning to avert it. This story, for example:
People try to do the whole “outsource alignment research to early AGI” thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they’re already on the more-powerful next gen, so it’s too late.
This story sounds clearly extremely plausible (do you disagree with that?), involves exactly the sort of AI you’re talking about (“the first AIs that either pose substantial misalignment risk or that are extremely useful”), but the catastrophic risk does not come from that AI scheming. It comes from people being dumb by default, the AI making them think it’s ok (without particularly strategizing to do so), and then people barreling ahead until it’s too late.
Also seems false? Some of the relevant stories:
As mentioned above, the “outsource alignment to AGI” failure-story was about exactly the level of AI you’re talking about.
In worlds where hard takeoff naturally occurs, it naturally occurs when AI is just past human level in general capabilities (and in particular AI R&D), which I expect is also roughly the same level you’re talking about (do you disagree with that?).
The story about an o1-style AI does not involve far possibilities and would very plausibly kick in at-or-before the first AIs that either pose substantial misalignment risk or that are extremely useful.
A few of the other stories also seem debatable depending on trajectory of different capabilities, but at the very least those three seem clearly potentially relevant even for the first highly dangerous or useful AIs.
This problem seems important (e.g. it’s my last bullet here). It seems to me much easier to handle, because if this problem is present, we ought to be able to detect its presence by using AIs to do research on other subjects that we already know a lot about (e.g. the string theory analogy here). Scheming is the only reason why the model would try to make it hard for us to notice that this problem is present.
A few problems with this frame.
First: you’re making reasonably-pessimistic assumptions about the AI, but very optimistic assumptions about the humans/organization. Sure, someone could look for the problem by using AIs to do research on other subjects that we already know a lot about. But that’s a very expensive and complicated project: a whole field, and all the subtle hints about it, need to be removed from the training data, and then a whole new model trained! I doubt that a major lab is going to seriously take even steps much cheaper and easier than that, let alone something that complicated.
One could reasonably respond “well, at least we’ve factored apart the hard technical bottleneck from the part which can be solved by smart human users or good org structure”. Which is reasonable to some extent, but also… if a product requires a user to get 100 complicated and confusing steps all correct in order for the product to work, then that’s usually best thought of as a product design problem, not a user problem. Making the plan at least somewhat robust to people behaving realistically less-than-perfectly is itself part of the problem.
Second: looking for the problem by testing on other fields itself has subtle failure modes, i.e. various ways to Not Measure What You Think You Are Measuring. A couple off-the-cuff examples:
A lab attempting this strategy brings in some string theory experts to evaluate their attempts to rederive string theory with AI assistance. But maybe (as I’ve heard claimed many times) string theory is itself an empty echo-chamber, and some form of sycophancy or telling people what they want to hear is the only way this AI-assisted attempt gets a good evaluation from the string theorists.
It turns out that fields-we-don’t-understand mostly form a natural category distinct from fields-we-do-understand, or that we don’t understand alignment precisely because our existing tools which generalize across many other fields don’t work so well on alignment. Either of those would be a (not-improbable-on-priors) specific reason to expect that our experience attempting to rederive some other field does not generalize well to alignment.
And to be clear, I don’t think of these as nitpicks, or as things which could go wrong separately from all the things originally listed. They’re just the same central kinds of failure modes showing up again, and I expect them to generalize to other hacky attempts to tackle the problem.
Third: it doesn’t really matter whether the model is trying to make it hard for us to notice the problem. What matters is (a) how likely we are to notice the problem “by default”, and (b) whether the AI makes us more or less likely to notice the problem, regardless of whether it’s trying to do so. The first story at top-of-thread is a good central example here:
Perhaps the path to superintelligence looks like applying lots of search/optimization over shallow heuristics. Then we potentially die to things which aren’t smart enough to be intentionally deceptive, but nonetheless have been selected-upon to have a lot of deceptive behaviors (via e.g. lots of RL on human feedback).
Generalizing that story to attempts to outsource alignment work to earlier AI: perhaps the path to moderately-capable intelligence looks like applying lots of search/optimization over shallow heuristics. If the selection pressure is sufficient, that system may well learn to e.g. be sycophantic in exactly the situations where it won’t be caught… though it would be “learning” a bunch of shallow heuristics with that de-facto behavior, rather than intentionally “trying” to be sycophantic in exactly those situations. Then the sycophantic-on-hard-to-verify-domains AI tells the developers that of course their favorite ideas for aligning the next generation of AI will work great, and it all goes downhill from there.
All 3 points seem very reasonable, looking forward to Buck’s response to them.
Additionally, I am curious to hear if Ryan’s views on the topic are similar to Buck’s, given that they work at the same organization.
One big reason I might expect an AI to do a bad job at alignment research is if it doesn’t do a good job (according to humans) of resolving cases where humans are inconsistent or disagree. How do you detect this in string theory research? Part of the reason we know so much about physics is humans aren’t that inconsistent about it and don’t disagree that much. And if you go to sub-topics where humans do disagree, how do you judge its performance (because ‘be very convincing to your operators’ is an objective with a different kind of danger).
Another potential red flag is if the AI gives humans what they ask for even when that’s ‘dumb’ according to some sophisticated understanding of human values. This could definitely show up in string theory research (note when some ideas suggest non-string-theory paradigms might be better, and push back on the humans if the humans try to ignore this), it’s just intellectually difficult (maybe easier in loop quantum gravity research heyo gottem) and not as salient without the context of alignment and human values.
I once counted several dozen ways AI could cause human extinction; maybe some of those ideas would help (map, text).
See also ‘The Main Sources of AI Risk?’ by Wei Dai and Daniel Kokotajlo, which puts forward 35 routes to catastrophe (most of which are disjunctive). (Note that many of the routes involve something other than intent alignment going wrong.)
Another one: we manage to solve alignment to a significant extent. The AI, which is much smarter than a human, thinks that it is aligned and takes aligned actions. The AI even predicts that it will never become unaligned with humans. However, at some point in the future, as the AI naturally unrolls into a reflectively stable equilibrium, it becomes unaligned.
I see a lot of discussion of AI doom stemming from research, business, and government / politics (including terrorism). Not a lot about AI doom from crime. Criminals don’t stay in the box; the whole point of crime is to benefit yourself by breaking the rules and harming others. Intentional creation of intelligent cybercrime tools — ecosystems of AI malware, exploit discovery, spearphishing, ransomware, account takeovers, etc. — seems like a path to uncontrolled evolution of explicitly hostile AGI, where a maxim of “discover the rules; break them; profit” is designed-in.
Agreed that people focus a bit too much on scheming. It might be good for some people to think a bit more about the other failure modes you described, but the main thing that needs doing is very smart people making progress towards building an aligned AI, not defending against particular failure modes. (However, most people probably cannot usefully contribute to that, so maybe focusing on failure modes is still good for most people. The remaining problem is that people will find proposals which very likely don’t actually work, but which are easy to believe in, thereby making it a bit less likely that AI development gets stopped.)
My initial reaction is that at least some of these points would be covered by the Guaranteed Safe AI agenda if that works out, right? Though the “AGIs act much like a colonizing civilization” situation does scare me because it’s the kind of thing which locally looks harmless but collectively is highly dangerous. It would require no misalignment on the part of any individual AI.
Some of the stories assume a lot of AIs; wouldn’t a lot of human-level AIs be very good at creating a better AI? Also, it seems implausible to me that we will get a STEM-AGI that doesn’t think about humans much but is powerful enough to get rid of the atmosphere. On a different note, evaluating the plausibility of scenarios is a whole different thing that basically very few people do and write about in AI safety.
That is a pretty reasonable assumption. AFAIK that is what the labs plan to do.
What I think is that there won’t be a period longer than 5 years where we have a lot of AIs and no superhuman AI. Basically, the first thing AIs will be used for will be self-improvement, and quickly after we get reasonable AI agents we will get superhuman AI. Like 6 years.
This came from a Facebook thread where I argued that many of the main ways AI was described as failing fall into a few categories (John disagreed).
I appreciated this list, but they strike me as fitting into a few clusters.
Personally, I like the focus “scheming” has. At the same time, I imagine there are another 5 to 20 clean concerns we should also focus on (some of which have been getting attention).
While I realize there’s a lot we can’t predict, I think we could do a much better job just making lists of different risk factors and allocating research amongst them.
In response to the Wizard Power post, Garrett and David were like “Y’know, there’s this thing where rationalists get depression, but it doesn’t present like normal depression because they have the mental habits to e.g. notice that their emotions are not reality. It sounds like you have that.”
… and in hindsight I think they were totally correct.
Here I’m going to spell out what it felt/feels like from inside my head, my model of where it comes from, and some speculation about how this relates to more typical presentations of depression.
Core thing that’s going on: on a gut level, I systematically didn’t anticipate that things would be fun, or that things I did would work, etc. When my instinct-level plan-evaluator looked at my own plans, it expected poor results.
Some things which this is importantly different from:
Always feeling sad
Things which used to make me happy not making me happy
Not having energy to do anything
… but importantly, the core thing is easy to confuse with all three of those. For instance, my intuitive plan-evaluator predicted that things which used to make me happy would not make me happy (like e.g. dancing), but if I actually did the things they still made me happy. (And of course I noticed that pattern and accounted for it, which is how “rationalist depression” ends up different from normal depression; the model here is that most people would not notice their own emotional-level predictor being systematically wrong.) Little felt promising or motivating, but I could still consciously evaluate that a plan was a good idea regardless of what it felt like, and then do it, overriding my broken intuitive-level plan-evaluator.
That immediately suggests a model of what causes this sort of problem.
The obvious way a brain would end up in such a state is if a bunch of very salient plans all fail around the same time, especially if one didn’t anticipate the failures and doesn’t understand why they happened. Then a natural update for the brain to make is “huh, looks like the things I do just systematically don’t work, don’t make me happy, etc; let’s update predictions on that going forward”. And indeed, around the time this depression kicked in, David and I had a couple of significant research projects which basically failed for reasons we still don’t understand, and I went through a breakup of a long relationship (and then dove into the dating market, which is itself an excellent source of things not working and not knowing why), and my multi-year investments in training new researchers failed to pay off for reasons I still don’t fully understand. All of these things were highly salient, and I didn’t have anything comparably-salient going on which went well.
So I guess some takeaways are:
If a bunch of salient plans fail around the same time for reasons you don’t understand, your instinctive plan-evaluator may end up with a global negative bias.
If you notice that, maybe try an antidepressant. Bupropion has been helpful for me so far, though it’s definitely not the right tool for everyone (especially bad if you’re a relatively anxious person; I am the opposite of anxious).
This seems basically right to me, yup. And, as you imply, I also think the rat-depression kicked in for me around the same time likely for similar reasons (though for me an at-least-equally large thing that roughly-coincided was the unexpected, disappointing and stressful experience of the funding landscape getting less friendly for reasons I don’t fully understand.) Also some part of me thinks that the model here is a little too narrow but not sure yet in what way(s).
This matches with the dual: mania. All plans, even terrible ones, seem like they’ll succeed, and this has flow-through effects: elevated mood, hyperactivity, etc.
Whether or not this happens in all minds, the fact that people can alternate fairly rapidly between depression and mania with minimal trigger suggests there can be some kind of fragile “chemical balance” or something that’s easily upset. It’s possible that’s just in mood disorders and more stable minds are just vulnerable to the “too many negative updates at once” thing without greater instability.
Wow…..
I think I might have this. Will test immediately.
This needs to be a top level post.
I imagine part of the problem is also then the feedback loop of Things Don’t Go Well > Why Even Bother > Things Don’t Go Well. If anything, you’d expect the sort of proactive approach that simply does the thing anyway to break that loop. I do wonder, though, if there may also be entirely internal feedback loops (like neuroreceptors or something) once the negativity is triggered by external events. I would assume so, or depression wouldn’t need to be treated pharmaceutically as much as it is.
EDIT: it’s also possible John felt fine emotionally and was fully aware of his emotional state and actually was so good at not latching on to emotions that it was highly nontrivial to spot, or some combination. Leaving this comment in case it’s useful for others. I don’t like the tone though, I might’ve been very disassociated as a rationalist (and many are) but it’s not obvious John is from this alone or not.
As a meditator I pay a lot of attention to what emotion I’m feeling in high resolution and the causality between it and my thoughts and actions. I highly recommend this practice. What John describes in “plan predictor predicts failure” is something I notice several times a month & address. It’s 101 stuff when you’re orienting at it from the emotional angle, there’s also a variety of practices I can deploy (feeling emotions, jhanas, many hard to describe mental motions...) to get back to equilibrium and clear thinking & action. This has overall been a bigger update to my effectiveness than the sequences, plausibly my rationality too (I can finally be unbiased instead of trying to correct or pretend I’m not biased!)
Like, when I hear you say “your instinctive plan-evaluator may end up with a global negative bias” I’m like, hm, why not just say “if you notice everything feels subtly heavier and like the world has metaphorically lost color” (how I notice it in myself, tbc fully nonverbally). Noticing through patterns of verbal thought also works, but it’s just less data to do metacognition over. You’re noticing correlations and inferring the territory (how you feel) instead of paying attention to how you feel directly (something which can be learned over time by directing attention towards noticing, not instantly).
I may write on this. Till then I highly recommend Joe Hudson’s work, it may require a small amount of woo tolerance, but only small. He coached Sam Altman & other top execs on emotional clarity & fluidity. Extremely good. Requires some practice & willingness to embrace emotional intensity (sometimes locally painful) though.
Because everything did not feel subtly heavier or like the world had metaphorically lost color. It was just, specifically, that most nontrivial things I considered doing felt like they’d suck somehow, or maybe that my attention was disproportionately drawn to the ways in which they might suck.
And to be clear, “plan predictor predicts failure” was not a pattern of verbal thought I noticed, it’s my verbal description of the things I felt on a non-verbal level. Like, there is a non-verbal part of my mind which spits out various feelings when I consider doing different things, and that part had a global negative bias in the feelings it spit out.
I use this sort of semitechnical language because it allows more accurate description of my underlying feelings and mental motions, not as a crutch in lieu of vague poetry.
… But It’s Fake Tho
Epistemic status: I don’t fully endorse all this, but I think it’s a pretty major mistake to not at least have a model like this sandboxed in one’s head and check it regularly.
Full-cynical model of the AI safety ecosystem right now:
There’s OpenAI, which is pretending that it’s going to have full AGI Any Day Now, and relies on that narrative to keep the investor cash flowing in while they burn billions every year, losing money on every customer and developing a product with no moat. They’re mostly a hype machine, gaming metrics and cherry-picking anything they can to pretend their products are getting better. The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI.
Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
Then there’s a significant contingent of academics who pretend to produce technical research on AI safety, but in fact mostly view their job as producing technical propaganda for the regulation activists and lobbyists. (Central example: Dan Hendrycks, who is the one person I directly name mainly because I expect he thinks of himself as a propagandist and will not be particularly offended by that description.) They also push the narrative, and benefit from it. They’re all busy bullshitting research. Some of them are quite competent propagandists though.
There’s another significant contingent of researchers (some at the labs, some independent, some academic) who aren’t really propagandists, but mostly follow the twitter-memetic incentive gradient in choosing their research. This tends to generate paper titles which sound dramatic, but usually provide very little conclusive evidence of anything interesting upon reading the details, and very much feed the narrative. This is the main domain of Not Measuring What You Think You Are Measuring and Symbol/Referent Confusions.
Then of course there’s the many theorists who like to build neat toy models which are completely toy and will predictably not generalize usefully to real-world AI applications. This is the main domain of Ad-Hoc Mathematical Definitions, the theorists’ analogue of Not Measuring What You Think You Are Measuring.
Benchmarks. When it sounds like a benchmark measures something reasonably challenging, it nearly-always turns out that it’s not really measuring the challenging thing, and the actual questions/tasks are much easier than the pitch would suggest. (Central examples: software eng, GPQA, frontier math.) Also it always turns out that the LLMs’ supposedly-impressive achievement relied much more on memorization of very similar content on the internet than the benchmark designers expected.
Then there’s a whole crowd of people who feel real scared about AI (whether for good reasons or because they bought the Narrative pushed by all the people above). They mostly want to feel seen and validated in their panic. They have discussions and meetups and stuff where they fake doing anything useful about the problem, while in fact they mostly just emotionally vibe with each other. This is a nontrivial chunk of LessWrong content, as e.g. Val correctly-but-antihelpfully pointed out. It’s also the primary motivation behind lots of “strategy” work, like e.g. surveying AI researchers about their doom probabilities, or doing timeline forecasts/models.
… and of course none of that means that LLMs won’t reach supercritical self-improvement, or that AI won’t kill us, or [...]. Indeed, absent the very real risk of extinction, I’d ignore all this fakery and go about my business elsewhere. I wouldn’t be happy about it, but it wouldn’t bother me any more than all the (many) other basically-fake fields out there.
Man, I really just wish everything wasn’t fake all the time.
What makes you confident that AI progress has stagnated at OpenAI? If you don’t have the time to explain why, I understand, but what metrics over the past year have stagnated?
Could you name three examples of people doing non-fake work? Since towardsness to non-fake work is easier to use for aiming than awayness from fake work.
Chris Olah and Dan Murfet in the at-least-partially empirical domain. Myself in the theory domain, though I expect most people (including theorists) would not know what to look for to distinguish fake from non-fake theory work. In the policy domain, I have heard that Microsoft’s lobbying team does quite non-fake work (though not necessarily in a good direction). In the capabilities domain, DeepMind’s projects on everything except LLMs (like e.g. protein folding, or that fast matrix multiplication paper) seem consistently non-fake, even if they’re less immediately valuable than they might seem at first glance. Also Conjecture seems unusually good at sticking to reality across multiple domains.
I do not get this impression, why do you say this?
The entire field is based on fears that consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency. This is basically wrong. Yes, people attempt to justify it with coherence theorems, but obviously you can be approximately-coherent/approximately-consequentialist and yet still completely un-agentic, so this justification falls flat. Since the field is based on a wrong assumption with bogus justification, it’s all fake.
(IMO this is kinda unrelated to the OP, but I want to continue this thread.)
Have you elaborated on this anywhere?
Perhaps you missed it, but some guy in 2022 wrote this great post which claimed that “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” ;-)
I’m actually just in the course of writing something about why “consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency” … maybe I can send you the draft for criticism when it’s ready?
I think it’s quite related to the OP. If a field is founded on a wrong assumption, then people only end up working in the field if they have some sort of blind spot, and that blind spot leads to their work being fake.
Not hugely. One tricky bit is that it basically ends up boiling down to “the original arguments don’t hold up if you think about them”, but the exact way they don’t hold up depends on what the argument is, so it’s kind of hard to respond to in general.
Haha! I think I mostly still stand by the post. In particular, “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” remains true; it’s just that intelligence relies on patterns and thus works much better on common things (which must be small, because they are fragments of a finite world) than on rare things (which can be big, though they don’t have to be). This means that consequentialism isn’t very good at developing powerful capabilities unless it works in an environment that has already been highly filtered to be highly homogenous, because an inhomogenous environment is going to BTFO the intelligence.
(I’m not sure I stand 101% by my post; there’s some funky business about how to count evolution that I still haven’t settled on yet. And I was too quick to go from “imitation learning isn’t going to lead to far-superhuman abilities” to “consequentialism is the road to far-superhuman abilities”. But yeah I’m actually surprised at how well I stand by my old view despite my massive recent updates.)
Sounds good!
I think you’re conflating consequentialism and understanding in a weird-to-me way. (Or maybe I’m misunderstanding.)
I think consequentialism is related to choosing one action versus another action. I think understanding (e.g. predicting the consequence of an action) is different, and that in practice understanding has to involve self-supervised learning.
(I think human brains have both [partly-] consequentialist decisions and self-supervised updating of the world-model.) (They’re not totally independent, but rather they interact via training data: e.g. [partly-] consequentialist decision-making determines how you move your eyes, and then whatever your eyes are pointing at, your model of the visual world will then update by self-supervised learning on that particular data. But still, these are two systems that interact, not the same thing.)
I think self-supervised learning is perfectly capable of discovering rare but important patterns. Just look at today’s foundation models, which seem pretty great at that.
This I’d dispute. If your model is underparameterized (which I think is true for the typical model?), then it can’t learn any patterns that only occur once in the data. And even if the model is overparameterized, it still can’t learn any pattern that never occurs in the data.
I’m saying that intelligence is the thing that allows you to handle patterns. So if you’ve got a dataset, intelligence allows you to build a model that makes predictions for other data based on the patterns it can find in said dataset. And if you have a function, intelligence allows you to find optima for said function based on the patterns it can find in said function.
Consequentialism is a way to set up intelligence to be agent-ish. This often involves setting up something that’s meant to build an understanding of actions based on data or experience.
One could in principle cut my definition of consequentialism up into self-supervised learning and true consequentialism (this seems like what you are doing..?). One disadvantage with that is that consequentialist online learning is going to have a very big effect on the dataset one ends up training the understanding on, so they’re not really independent of each other. Either way that just seems like a small labelling thing to me.
Dunno if anything’s changed since 2023, but this says LLMs learn things they’ve seen exactly once in the data.
I can vouch that you can ask LLMs about things that are extraordinarily rare in the training data—I’d assume well under once per billion tokens—and they do pretty well. E.g. they know lots of random street names.
Humans successfully went to the moon, despite it being a quite different environment that they had never been in before. And they didn’t do that with “durability, strength, healing, intuition, tradition”, but rather with intelligence.
Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns, and one would come up with ideas like durability, healing, learning from strategies that have stood the test of time (when available), margins of error, backup systems, etc.
I guess to add, I’m not talking about unknown unknowns. Often the rare important things are very well known (after all, they are important, so people put a lot of effort into knowing them), they just can’t efficiently be derived from empirical data (except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception).
I don’t have time to read this study in detail until later today, but if I’m understanding it correctly, the study isn’t claiming that neural networks will learn rare important patterns in the data, but rather that they will learn rare patterns that they were recently trained on. So if you continually train on data, you will see a gradual shift towards new patterns and forgetting old ones.
Random street names aren’t necessarily important though? Like what would you do with them?
I didn’t say that intelligence can’t handle different environments, I said it can’t handle heterogenous environments. The moon is nearly a sterile sphere in a vacuum; this is very homogenous, to the point where pretty much all of the relevant patterns can be found or created on Earth. It would have been more impressive if e.g. the USA could’ve landed a rocket with a team of Americans in Moscow than on the moon.
Also people did use durability, strength, healing, intuition and tradition to go to the moon. Like with strength, someone had to build the rockets (or build the machines which built the rockets). And without durability and healing, they would have been damaged too much in the process of doing that. Intuition and healing are harder to clearly attribute, but they’re part of it too.
Learning from strategies that stood the test of time would be tradition moreso than intelligence. I think tradition requires intelligence, but it also requires something else that’s less clear (and possibly not simple enough to be assembled manually, idk).
Margins of error and backup systems would be, idk, caution? Which, yes, definitely benefit from intelligence and consequentialism. Like I’m not saying intelligence and consequentialism are useless, in fact I agree that they are some of the most commonly useful things due to the frequent need to bypass common obstacles.
Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t. There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.
I do think there’s a “something else” (most [but not all] humans have an innate drive to follow and enforce social norms, more or less), but I don’t think it’s necessary. The Wright Brothers didn’t have any innate drive to copy anything about bird soaring tradition, but they did it anyway purely by intelligence.
I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?
I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.
This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and even when they’re wrong, they’re often wrong in very unusual and interesting ways, and that you don’t tend to troll, etc.)
I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?
I think the necessity of intelligence for tradition exists on a much more fundamental level than that. Intelligence allows people to form an extremely rich model of the world with tons of different concepts. If one had no intelligence at all, one wouldn’t even be able to copy the traditions. Like consider a collection of rocks or a forest; it can’t pass any tradition onto itself.
But conversely, just as intelligence cannot be converted into powerful agency, I don’t think it can be used to determine which traditions should be copied and which ones shouldn’t.
It seems to me that you are treating any variable attribute that’s highly correlated across generations as a “tradition”, to the point where not doing something is considered on the same ontological level as doing something. That is the sort of ontology that my LDSL series is opposed to.
I’m probably not the best person to make the case for tradition as (despite my critique of intelligence) I’m still a relatively strong believer in equilibration and reinvention.
Whenever there’s any example of this that’s too embarrassing or too big of an obstacle for applying them in a wide range of practical applications, a bunch of people point it out, and they come up with a fix that allows the LLMs to learn it.
The biggest class of relevant examples would all be things that never occur in the training data—e.g. things from my job, innovations like how to build a good fusion reactor, social relationships between the world’s elites, etc.. Though I expect you feel like these would be “cheating”, because it doesn’t have a chance to learn them?
The things in question often aren’t things that most humans have a chance to learn, or even would benefit from learning. Often it’s enough if just 1 person realizes and handles them, and alternatively, if nobody handles them, then you just lose whatever was dependent on them. Intelligence is a universal way to catch on to common patterns; other things than common patterns matter too, but there’s no corresponding universal solution.
You ran way deeper into the “except essentially by copying someone else’s conclusion blindly, and that leaves you vulnerable to deception” point than I meant you to. My main point is that humans have grounding on important factors that we’ve acquired through non-intelligence-based means. I bring up the possibility of copying other’s conclusions because for many of those factors, LLMs still have access to this via copying them.
It might be helpful to imagine what it would look like if LLMs couldn’t copy human insights. For instance, imagine if there was a planet with life much like Earth’s, but with no species that were capable of language. We could imagine setting up a bunch of cameras or other sensors on the planet and training a self-supervised learning algorithm on them. They could surely learn a lot about the world that way—but it also seems like they would struggle with a lot of things. The exact things they would struggle with might depend a lot on how much prior knowledge you build into the algorithm, and how dynamic the sensors are, and whether there are also ways for it to perform interventions upon the planet. But for instance even recognizing the continuity of animal lives as they wander off the screen would either require a lot of prior knowledge built in to the algorithm, or a very powerful learning algorithm (e.g. Solomonoff induction can use a simplicity prior to infer that there must be an entire planet full of animals off-screen, but that’s computationally intractable).
(Also, again you still need to distinguish between “Is intelligence a useful tool for bridging lots of common gaps that other methods cannot handle?” vs “Is intelligence sufficient on its own to detect deception?”. My claim is that the answer to the former is yes and the latter is no. To detect deception, you don’t just use intelligence but also other facets of human agency.)
First, some things that might seem like nitpicks but are moderately important to my position:
In many ways, our modern world is much less heterogeneous than the past. For instance thanks to improved hygiene, we are exposed to far fewer diseases, and thanks to improved policing/forensics, we are exposed to much less violent crime. International trade allows us to average away troubles with crop failures. While distribution shifts generically should make it harder for humans to survive, they can (especially if made by humans) make it easier to survive.
Humans do not in fact survive; our average lifespan is less than 100 years. Humanity as a species survives by birthing, nurturing, and teaching children, and by collaborating with each other. My guess would be that aging is driven to a substantial extent by heterogeneity (albeit perhaps endogenous heterogeneity?) that hasn’t been protected against. (I’m aware of John Wentworth’s ‘gears of aging’ series arguing that aging has a common cause, but I’ve come to think that his arguments don’t sufficiently much distinguish between ‘is eventually mediated by a common cause’ vs ‘is ultimately caused by a common cause’. By analogy, computer slowdowns may be said to be attributable to a small number of causes like CPU exhaustion, RAM exhaustion, network bandwidth exhaustion, etc., but these are mediators and the root causes will typically be some particular program that is using up those resources, and there’s a huge number of programs in the world which could be to blame depending on the case.)
We actually sort of are in a precarious situation? The world wars were unprecedentedly bloody. They basically ended because of the invention of nukes, which are so destructive that we avoid using them in war. But I don’t think we actually have a robust way to avoid that?
But more fundamentally, my objection to this question is that I doubt the meaningfulness of a positive theory of how humans survive and thrive. “Intelligence” and “consequentialism” are fruitful explanations of certain things because they can be fairly-straightforwardly constructed, have fairly well-characterizable properties, and even can be fairly well-localized anatomically in humans (e.g. parts of the brain).
Like one can quibble with the details of what counts as intelligence vs understanding vs consequentialism, but under the model where intelligence is about the ability to make use of patterns, you can hand a bunch of data to computer scientists and tell them to get to work untangling the patterns, and then it turns out there are some fairly general algorithms that can work on all sorts of datasets and patterns. (I find it quite plausible that we’ve already “achieved superhuman intelligence” in the sense that if you give both me and a transformer a big dataset that neither of us is pre-familiar with to study through, then (at least for sufficiently much data) eventually the transformer will clearly outperform me at predicting the next token.) And these fairly general algorithms are probably more-or-less the same sort of thing that much of the human brain is doing.
Thus “intelligence” factors out relatively nicely as a concept that can be identified as a major contributor to human success (I think intelligence is the main reason humans outperformed other primates). But this does not mean that the rest of human success can equally well be factored out into a small number of nicely attributable and implementable concepts. (Like, some of it probably can, but there’s not as much reason to presume that all of it can. “Durability” and “strength” are examples of things that fairly well can, and indeed we have definitely achieved far-superhuman strength. These are purely physical though, whereas a lot of the important stuff has a strong cognitive element to it—though I suspect it’s not purely cognitive...)
OK, here’s my argument that, if you take {intelligence, understanding, consequentialism} as a unit, it’s sufficient for everything:
If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
If reducing heterogeneity is helpful, then {intelligence, understanding, consequentialism} can discover that fact, and figure out how to reduce heterogeneity.
Etc.
Writing the part that I didn’t get around to yesterday:
You could theoretically imagine e.g. scanning all the atoms of a human body and then using this scan to assemble a new human body in their image. It’d be a massive technical challenge of course, because atoms don’t really sit still and let you look and position them. But with sufficient work, it seems like someone could figure it out.
This doesn’t really give you artificial general agency of the sort that standard Yudkowsky-style AI worries are about, because you can’t assign them a goal. You might get an Age of Em-adjacent situation from it, though even not quite that.
To reverse-engineer people in order to make AI, you’d instead want to identify separate faculties with interpretable effects and reconfigurable interface. This can be done for some of the human faculties because they are frequently applied to their full extent and because they are scaled up so much that the body had to anatomically separate them from everything else.
However, there’s just no reason to suppose that it should apply to all the important human faculties, and if one considers all the random extreme events one ends up having to deal with when performing tasks in an unhomogenized part of the world, there’s lots of reason to think humans are primarily adapted to those.
One way to think about the practical impact of AI is that it cannot really expand on its own, but that people will try to find or create sufficiently-homogenous places where AI can operate. The practical consequence of this is that there will be a direct correspondence between each part of the human work to prepare the AI to each part of the activities the AI is engaging in, which will (with caveats) eliminate alignment problems because the AI only does the sorts of things you explicitly make it able to do.
The above is similar to how we don’t worry so much about ‘website misalignment’ because generally there’s a direct correspondence between the behavior of the website and the underlying code, templates and database tables. This didn’t have to be true, in the sense that there are many short programs with behavior that’s not straightforwardly attributable to their source code and yet still in principle could be very influential, but we don’t know how to select good versions of such programs, so instead we go for the ones with a more direct correspondence, even though they are larger and possibly less useful. Similarly with AI, since consequentialism is so limited, people will manually build out some apps where AI can earn them a profit operating on homogenized stuff, and because this building-out directly corresponds to the effect of the apps, they will be alignable but not very independently agentic.
(The major caveat is people may use AI as a sort of weapon against others, and this might force others to use AI to defend themselves. This won’t lead to the traditional doom scenarios because they are too dependent on overestimating the power of consequentialism, but it may lead to other doom scenarios.)
I’ve grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it’s much more powerful than individual intelligence (whether natural or artificial).
Individual intelligence mostly focuses on information that can be made use of over a very short time/space-scale. For instance an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn’t meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan and arguably longer than that (like, if you set your children up in an advantageous situation, then that continues paying fitness dividends even after you die).
Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution’s information bandwidth due to using an easy approximation (independent Bernoulli genotypes, linear short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes, quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you can have the ordinary speed of evolution, and if you then have some sort of mixture niche, that niche can draw in organisms from each of the other niches and thus massively increase its genetic variance, and then since the speed of evolution is proportional to genetic variance, that makes this shared niche evolve way faster than normally. And if organisms then pass from the mixture niche out into the specialized niches, they can benefit from the fast evolution too.
(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power through expanding their family while warlords gain power by sniping the king off a kingdom), concubine, bureaucrat, … . Each of them used to be evolving individually, but also genes flowed between them in various ways. Though I suspect this is undercounting the number of niches because there’s also often subniches.)
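For reference, the “speed of evolution is proportional to genetic variance” claim above is roughly the content of Fisher’s fundamental theorem of natural selection and of the breeder’s equation; stating them from memory in their simplest textbook forms (not quoted from the thread):

$$\Delta \bar{w} \approx \frac{\mathrm{Var}_A(w)}{\bar{w}}, \qquad R = h^2 S$$

Here $\bar{w}$ is mean fitness, $\mathrm{Var}_A(w)$ is the additive genetic variance in fitness, $R$ is the per-generation response to selection, $h^2$ is heritability (a ratio of additive genetic variance to total phenotypic variance), and $S$ is the selection differential. Either form makes the point quantitative: a mixture niche that pools variance from many specialized niches gets a proportionally faster response to selection.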
And then obviously beyond these points, individual intelligence and evolution focus on different things—what’s happening recently vs what’s happened deep in the past. Neither are perfect; society has changed a lot, which renders what’s happened deep in the past less relevant than it could have been, but at the same time what’s happening recently (I argue) intrinsically struggles with rare, powerful factors.
Part of the trouble is, if you just study the organism in isolation, you just get some genetic or phenotypic properties. You don’t have any good way of knowing which of these are the important ones or not.
You can try developing a model of all the different relevant exogenous factors. But as I insist, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficult part being the high energies involved in driving cars near people.)
The main theoretical hope is that one could use some clever algorithm to automatically sort of aggregate “small-scale” understanding (like an autoregressive convolutional model to predict next time given previous time) into “large-scale” understanding (being able to understand how a system could act extreme, by learning how it acts normally). But I’ve studied a bunch of different approaches for that, and ultimately it doesn’t really seem feasible. (Typically the small-scale understanding learned is only going to be valid near the regime that it was originally observed within, and also the methods to aggregate small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)
First, I want to emphasize that durability and strength are near the furthest towards the easy side because e.g. durability is a common property seen in a lot of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogenous environment because otherwise intelligence couldn’t develop.
Another complication is, you gotta consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency because profit-maximizing companies don’t want money tied up into durability or strength that you’re not typically using. (People, meanwhile, might want durability or strength because they find it cool, sexy or excellent—and as a consequence, those people would then gain more agency.)
Also, I do get the impression you are overestimating the feasibility of ““durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern”. I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it’s relatively far from falling naturally out of the methods.
One complication here is, currently AI is ~never designing mechanical things, which makes it somewhat harder to talk about.
(I should maybe write more but it’s past midnight and also I guess I wonder how you’d respond to this.)
The filter for homogeneity of environment is anthropic selection—if the environment is sufficiently heterogeneous, it kills everyone who tries to reach out of their ecological niche, general intelligence doesn’t develop, and we are not here to have this conversation.
Nah, there are other methods than intelligence for survival and success. E.g. durability, strength, healing, intuition, tradition, … . Most of these developed before intelligence did.
I mean, we exist and we are at least somewhat intelligent, which implies a strong upper bound on the heterogeneity of the environment.
On the other hand, words like “durability” imply the possibility of categorization, which itself implies intelligence. If the environment is sufficiently heterogeneous, you are durable one second and evaporate the next.
We don’t just use intelligence.
???
Vaporization is prevented by outer space which drains away energy.
Not clear why you say durability implies intelligence; surely trees are durable without intelligence.
I feel like I’m failing to convey the level of abstraction I intend to.
I’m not saying that durability of object implies intelligence of object. I’m saying that if the world is ordered in a way that allows existence of distinct durable and non-durable objects, that means the possibility of intelligence which can notice that some objects are durable and some are not and exploit this fact.
If the environment is not ordered enough to contain intelligent beings, it’s probably not ordered enough to contain distinct durable objects too.
To be clear, by “environment” I mean “the entire physics”. When I say “environment not ordered enough” I mean “environment with physical laws chaotic enough to not contain ordered patterns”.
It seems like you are trying to convince me that intelligence exists, which is obviously true and many of my comments rely on it. My position is simply that consequentialism cannot convert intelligence into powerful agency, it can only use intelligence to bypass common obstacles.
No, my point is that in worlds where intelligence is possible, almost all obstacles are common.
If there’s some big object, then it’s quite possible for it to diminish into a large number of similar obstacles, and I’d agree this is where most obstacles come from, to the point where it seems reasonable to say that intelligence can handle almost all obstacles.
However, my assertion wasn’t that intelligence cannot handle almost all obstacles, it was that consequentialism can’t convert intelligence into powerful agency. It’s enough for there to be rare powerful obstacles in order for this to fail.
I don’t think this is the claim the post is making, but it still makes sense to me. The post is saying something like the opposite: that the people working in the field are not doing prioritization right, or not thinking clearly about things, while the risk is real.
I’m not trying to present johnswentworth’s position, I’m trying to present my position.
I do not necessarily disagree with this, coming from a legal / compliance background. If you see any of my profiles, I constantly complain about “performative compliance” and “compliance theatre”. Painfully present across the legal and governance sectors.
That said: can you provide examples of activism or regulatory efforts that you do agree with? What does a “non-fake” regulatory effort look like?
I don’t think it would be okay to dismiss your take entirely, but it would be great to see what solutions you’d propose too. This is why I disagree in principle, because there are no specific points to contribute to.
In Europe, paradoxically, some of the people “close enough to the bureaucracy” who pushed for the AI Act to include GenAI providers were OpenAI-adjacent.
But I will pick up on this:
“(b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI”
BigTech is too powerful to lobby against. “Stopping advanced AI” per se would contravene many market regulations (unless we define exactly what you mean by advanced AI and the undeniable dangers to people’s lives). Regulators can only prohibit development of products up to a certain point. They cannot just decide to “stop” development of technologies arbitrarily. But the AI Act does prohibit many types of AI systems already: Article 5: Prohibited AI Practices | EU Artificial Intelligence Act.
Those are considered to create unacceptable risks to people’s lives and human rights.
“The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI.”
This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another:
If the core products aren’t really improving, the progress measured on benchmarks is fake. But if they are, the benchmarks are an (imperfect but still real) attempt to quantify that real improvement.
If LLMs are stagnating, all the people generating dramatic-sounding papers for each new SOTA are just maintaining a holding pattern. But if they’re changing, then just studying/keeping up with the general properties of that progress is real. Same goes for people building and regularly updating their toy models of the thing.
Similarly, if the progress is fake, the propaganda signal-boosting that progress is also fake. If it isn’t, it isn’t. (At least directionally; a lot of that propaganda is still probably exaggerated.)
If the above three are all fake, all the people who feel real scared and want to be validated are stuck in a toxic emotional dead-end where they constantly freak out over fake things to no end. But if they’re responding to legitimate, persistent worldview updates, having a space to vibe them out with like-minded others seems important.
So, in deciding whether or not to endorse this narrative, we’d like to know whether or not the models really ARE stagnating. What makes you think the appearance of progress here is illusory?
Nope!
Even if the base models are improving, it can still be true that most of the progress measured on the benchmarks is fake, and has basically-nothing to do with the real improvements.
Even if the base models are improving, it can still be true that the dramatic sounding papers and toy models are fake, and have basically-nothing to do with the real improvements.
Even if the base models are improving, the propaganda about it can still be overblown and mostly fake, and have basically-nothing to do with the real improvements.
Even if the base models are improving, the people who feel real scared and just want to be validated can still be doing fake work and in fact be mostly useless, and their dynamic can still have basically-nothing to do with the real improvements.
Just because the base models are in fact improving does not mean that all this other stuff is actually coupled to the real improvement.
Sounds like you’re suggesting that real progress could be orthogonal to human-observed progress. I don’t see how this is possible. Human-observed progress is too broad.
The collective of benchmarks, dramatic papers and toy models, propaganda, and doomsayers are suggesting the models are simultaneously improving at: writing code, researching data online, generating coherent stories, persuading people of things, acting autonomously without human intervention, playing Pokemon, playing Minecraft, playing chess, aligning to human values, pretending to align to human values, providing detailed amphetamine recipes, refusing to provide said recipes, passing the Turing test, writing legal documents, offering medical advice, knowing what they don’t know, being emotionally compelling companions, correctly guessing the true authors of anonymous text, writing papers, remembering things, etc, etc.
They think all these improvements are happening at the same time in vastly different domains because they’re all downstream of the same task, which is text prediction. So, they’re lumped together in the general domain of ‘capabilities’, and call a model which can do all of them well a ‘general intelligence’. If the products are stagnating, sure, all those perceived improvements could be bullshit. (Big ‘if’!) But how could the models be ‘improving’ without improving at any of these things? What domains of ‘real improvement’ exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
See, this is exactly the example I would have given: truesight is an obvious example of a domain of real improvement which appears on no benchmarks I am aware of, but which appears to correlate strongly with the pretraining loss; it is not applied anywhere (I hope); it is unobvious that LLMs might do it at all, and the capability does not naturally reveal itself in any standard use-cases (which is why people are shocked when it surfaces); it would have been easy for no one to have observed it up until now, or to have dismissed it; and even now, after a lot of publicizing (including by yours truly), only a few weirdos know much about it.
Why can’t there be plenty of other things like inner-monologue or truesight? (“Wait, you could do X? Why didn’t you tell us?” “You never asked.”)
Maybe a better example would be to point out that ‘emergent’ tasks in general, particularly multi-step tasks, can have observed success rates of precisely 0 in feasible finite samples, but extreme brute-force sampling reveals hidden scaling. Humans would perceive zero improvement as the models scaled (0/100 = 0%, 0/100 = 0%, 0/100 = 0%...), even though they might be rapidly improving from 1/100,000 to 1/10,000 to 1/1,000 to… etc. “Sampling can show the presence of knowledge but not the absence.”
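To put toy numbers on that (my own illustration, not tied to any particular benchmark), here’s a minimal sketch of how a capability can improve by two orders of magnitude while every feasible eval keeps reporting 0%:

```python
# Probability of seeing at least one success in n independent attempts,
# for a few hypothetical per-attempt success rates of an "emergent" task.

def p_any_success(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

for true_rate in (1e-5, 1e-4, 1e-3):
    feasible = p_any_success(true_rate, 100)         # what a 100-sample eval can see
    brute_force = p_any_success(true_rate, 100_000)  # extreme brute-force sampling
    print(f"true rate {true_rate:.0e}: "
          f"P(any success in 100 tries) = {feasible:.3f}, "
          f"in 100k tries = {brute_force:.3f}")
```

At all three rates, a 100-sample eval will usually report 0/100, even though the underlying rate moved a hundredfold.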
Oops, yes. I was thinking “domains of real improvement which humans are currently perceiving in LLMs”, not “domains of real improvement which humans are capable of perceiving in general”. So a capability like inner-monologue or truesight, which nobody currently knows about, but is improving anyway, would certainly qualify. And the discovery of such a capability could be ‘real’ even if other discoveries are ‘fake’.
That said, neither truesight nor inner-monologue seems uncoupled from the more common domains of improvement, as measured in benchmarks and toy models and people-being-scared. The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance. Truesight is narrower, but at the very least we’d expect it to correlate with skill in the common “write [x] in the style of [y]” prompt, right? Surely the same network of associations which lets it accurately generate “Eliezer Yudkowsky wrote this” after a given set of tokens, would also be useful for accurately finishing a sentence starting with “Eliezer Yudkowsky says...”.
So I still wouldn’t consider these things to have basically-nothing to do with commonly perceived domains of improvement.
Inner-monologue is an example because as far as we know, it should have existed in pre-GPT-3 models and been constantly improving, but we wouldn’t have noticed because no one would have been prompting for it and if they had, they probably wouldn’t have noticed it. (The paper I linked might have demonstrated that by finding nontrivial performance in smaller models.) Only once it became fairly reliable in GPT-3 could hobbyists on 4chan stumble across it and be struck by the fact that, contrary to what all the experts said, GPT-3 could solve harder arithmetic or reasoning problems if you very carefully set it up just right as an elaborate multi-step process instead of what everyone did, which was just prompt it for the answer right away.
Saying it doesn’t count because once it was discovered it was such a large real improvement, is circular and defines away any example. (Did it not improve benchmarks once discovered? Then who cares about such an ‘uncoupled’ capability; it’s not a real improvement. Did it subsequently improve benchmarks once discovered? Then it’s not really an example because it’s ‘coupled’...) Surely the most interesting examples are ones which do exactly that!
And of course, now there is so much discussion, and so many examples, and it is in such widespread use, and has contaminated all LLMs being trained since, that they start to do it by default given the slightest pretext. The popularization eliminated the hiddenness. And here we are with ‘reasoning models’ which have blown through quite a few older forecasts and moved timelines earlier by years, to the extent that people are severely disappointed when a model like GPT-4.5 ‘only’ does as well as the scaling laws predicted and they start predicting the AI bubble is about to pop and scaling has been refuted.
But that would be indistinguishable from many other sources of improvement. For starters, by giving a name, you are only testing one direction: ‘name → output’; truesight is about ‘name ← output’. The ‘reversal curse’ is an example of how such inference arrows are not necessarily bidirectional and do not necessarily scale much. (But if you didn’t know that, you would surely conclude the opposite.) There are many ways to improve performance of predicting output: better world-knowledge, abstract reasoning, use of context, access to tools or grounding like web search… No benchmark really distinguishes between these such that you could point to a single specific number and say, “that’s the truesight metric, and you can see it gets better with scale”.
Gotta love how much of a perfect Scissor statement this is. (Same as my “o3 is not that impressive”.)
SB1047 was a pretty close shot to something really helpful. The AI Act and its code of practice might be insufficient, but there are good elements in it that, if applied, would reduce the risks. The problem is that it won’t be applied because of internal deployment.
But I sympathise somewhat with stuff like this:
No, it wasn’t. It was a pretty close shot to something which would have gotten a step closer to another thing, which itself would have gotten us a step closer to another thing, which might have been moderately helpful at best.
You really think those elements are not helpful? I’m really curious
Sure, they are more-than-zero helpful. Heck, in a relative sense, they’d be one of the biggest wins in AI safety to date. But alas, reality does not grade on a curve.
One has to bear in mind that the words on that snapshot do not all accurately describe reality in the world where SB1047 passes. “Implement shutdown ability” would not in fact be operationalized in a way which would ensure the ability to shut down an actually-dangerous AI, because nobody knows how to do that. “Implement reasonable safeguards to prevent societal-scale catastrophes” would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all, because the rules for the board responsible for overseeing these things made that board pretty easy for the labs to capture.
When I discussed the bill with some others at the time, the main takeaway was that the actually-substantive part was just putting any bureaucracy in place at all to track which entities are training models over 10^26 FLOP/$100M. The bill seemed unlikely to do much of anything beyond that.
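For a sense of what that tracking actually involves, here’s a back-of-the-envelope sketch using the standard ~6·N·D approximation for dense-transformer training FLOPs; the parameter/token counts below are made up for illustration, not any real training run:

```python
THRESHOLD_FLOP = 1e26  # the reporting threshold discussed above

def training_flop_estimate(n_params: float, n_tokens: float) -> float:
    """Rough training compute: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

hypothetical_runs = {
    "70B params on 15T tokens": training_flop_estimate(70e9, 15e12),
    "1T params on 30T tokens": training_flop_estimate(1e12, 30e12),
}
for name, flop in hypothetical_runs.items():
    print(f"{name}: ~{flop:.1e} FLOP, over threshold: {flop > THRESHOLD_FLOP}")
```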
Even if the bill had been much more substantive, it would still run into the standard problems of AI regulation: we simply do not have a way to reliably tell which models are and are not dangerous, so the choice is to either ban a very large class of models altogether, or allow models which will predictably be dangerous sooner or later. The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most, but definitely not slow down timelines by a factor of 10 or more.
… or, if we do live in a world in which LLMs are not AGI-complete, it might accelerate timelines. After all, this would force the capabilities people to turn their brains on again instead of mindlessly scaling, and that might lead to them stumbling on something which is AGI-complete. And it would, due to a design constraint, need much less compute for committing omnicide.
How likely would that be? Companies/people able to pivot like this would need to be live players, capable of even conceiving of new ideas that aren’t “scale LLMs”. Naturally, that means 90% of the current AI industry would be out of the game. But then, 90% of the current AI industry aren’t really pushing the frontier today either; that wouldn’t be much of a loss.
To what extent are the three AGI labs alive vs. dead players, then?
OpenAI was certainly alive back in 2022. Maybe the coup and the exoduses killed it, and it’s now a corpse whose apparent movement is just inertial (the reasoning models were invented prior to the coup, if the Q* rumors are to be trusted, so they’re little evidence that OpenAI was still alive in 2024). But maybe not.
Anthropic houses a bunch of the best OpenAI researchers now, and it’s apparently capable of inventing some novel tricks (whatever’s the mystery behind Sonnet 3.5 and 3.6).
DeepMind is even now consistently outputting some interesting non-LLM research.
I think there’s a decent chance that they’re alive enough. Currently, they’re busy eating the best AI researchers and turning them into LLM researchers. If they stop focusing people’s attention on the potentially-doomed paradigm, if they’re forced to correct the mistake (on this model) that they’re making...
This has always been my worry about all the proposals to upper-bound FLOPs, complicated by my uncertainty regarding whether LLMs are or are not AGI-complete after all.
One major positive effect this might have is memetic. It might create the impression of an (artificially created) AI Winter, causing people to reflexively give up. In addition, not having an (apparent) in-paradigm roadmap to AGI would likely dissolve the race dynamics, both between AGI companies and between geopolitical entities. If you can’t produce straight-line graphs suggesting godhood by 2027, and are reduced to “well we probably need a transformer-sized insight here...”, it becomes much harder to generate hype and alarm that would be legible to investors and politicians.
But then, in worlds in which LLMs are not AGI-complete, how much actual progress toward AGI is happening due to the race dynamic? Is it more or less progress than would be produced by a much-downsized field in the counterfactual in which LLM research is banned? How much downsizing would it actually cause, now that the ideas of AGI and the Singularity have gone mainstream-ish? Comparatively, how much downsizing would be caused by the chilling effect if the presumably doomed LLM paradigm is allowed to run its course of disappointing everyone by 2030 (when the AGI labs can scale no longer)?
On balance, upper-bounding FLOPs is probably still a positive thing to do. But I’m not really sure.
I disagree that the default would’ve been that the board would’ve been “easy for the labs to capture” (indeed, among the most prominent and plausible criticisms of its structure was that it would overregulate in response to political pressure), and thus that it wouldn’t have changed deployment practices. I think the frontier companies were in a good position to evaluate this, and they decided to oppose the bill (and/or support it conditional on sweeping changes, including the removal of the Frontier Model Division).
Also, I’m confused when policy skeptics say things like “sure, it might slow down timelines by a factor of 2-3, big deal.” Having 2-3x as much time is indeed a big deal!
Probably not going to have a discussion on the topic right now, but out of honest curiosity: did you read the bill?
I’m glad we agree “they’d be one of the biggest wins in AI safety to date.”
How so? It’s pretty straightforward if the model is still contained in the lab.
I think ticking boxes is good. This is how we went to the Moon, and it’s much better to do this than to not do it. It’s not trivial to tick all the boxes. Look at the number of boxes you need to tick if you want to follow the Code of Practice of the AI Act or this paper from DeepMind.
How so? I think capabilities evaluations are much simpler than alignment evals, and at the very least we can run those. You might say: “A model might sandbag.” Sure, but you can fine-tune it and see if the capabilities are recovered. If even with some fine-tuning the model is not able to do the tasks at all (modulo the problem of gradient hacking, which is, I think, very unlikely), we can be pretty sure that the model wouldn’t be capable of such a feat. I think that, at the very least, following the same methodology as the one Anthropic followed in their latest system cards is pretty good and would be very helpful.
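A minimal sketch of the elicitation logic being described (all names here are hypothetical placeholders, not any lab’s actual pipeline):

```python
def capability_present(model, task_suite, evaluate, finetune, threshold=0.5) -> bool:
    """Treat a capability as present if it shows up zero-shot, or if light
    fine-tuning on the task recovers it (i.e. the model was sandbagging).
    If neither works, the capability is probably genuinely absent
    (modulo exotic failure modes like gradient hacking)."""
    if evaluate(model, task_suite) >= threshold:
        return True
    tuned = finetune(model, task_suite)
    return evaluate(tuned, task_suite) >= threshold
```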
100% agreed @Charbel-Raphaël.
The EU AI Act even mentions “alignment with human intent” explicitly, as a key concern for systemic risks. This is in Recital 110 (which defines what systemic risks are and how they may affect society).
I do not think any law has mentioned alignment like this before, so it’s massive already.
Will a lot of the implementation efforts feel “fake”? Oh, 100%. But I’d say that this is why we (this community) should not disengage from it...
I also get that the regulatory landscape in the US is another world entirely (which is what the OP is bringing up).
Your very first point is, to be a little uncharitable, ‘maybe OpenAI’s whole product org is fake.’ I know you have a disclaimer here, but you’re talking about a product category that didn’t exist 30 months ago, whose flagship website is now reportedly used by 10% of people in the entire world, and which, according to internet reporting, expects ~12B in revenue this year.
If your vibes are towards investing in that class of thing being fake or ‘mostly a hype machine’ then your vibes are simply not calibrated well in this domain.
No, the model here is entirely consistent with OpenAI putting out some actual cool products. Those products (under the model) just aren’t on a path to AGI, and OpenAI’s valuation is very much reliant on being on a path to AGI in the not-too-distant future. It’s the narrative about building AGI which is fake.
Really? I’m mostly ignorant on such matters, but I’d thought that their valuation seemed comically low compared to what I’d expect if their investors thought that OpenAI was likely to create anything close to a general superhuman AI system in the near future.[1] I considered this evidence that they think all the AGI/ASI talk is just marketing.
Well ok, if they actually thought OpenAI would create superintelligence as I think of it, their valuation would plummet because giving people money to kill you with is dumb. But there’s this space in between total obliviousness and alarm, occupied by a few actually earnest AI optimists. And, it seems to me, not occupied by the big OpenAI investors.
Consider, in support: Netflix has a $418B market cap. It is inconsistent to think that a $300B valuation for OpenAI (or whatever figure is in the news) requires believing it will replace tens of trillions of dollars of capital before the end of the decade.
Similarly, for people wanting to argue from the other direction, who might think a low current valuation is case-closed evidence against their success chances, consider that just a year ago the same argument would have discredited how they are valued today, and a year before that would have discredited where they were a year ago, and so forth. This holds similarly for historic busts in other companies. Investor sentiment is informational but clearly isn’t definitive, else stocks would never change rapidly.
To be clear: I think the investors would be wrong to think that AGI/ASI soon-ish isn’t pretty likely.
But most of your criticisms in the point you gave have ~no bearing on that? If you want to make a point about how effectively OpenAI’s research moves towards AGI you should be saying things relevant to that, not giving general malaise about their business model.
Or, I might understand ‘their business model is fake which implies a lack of competence about them broadly,’ but then I go back to the whole ‘10% of people in the entire world’ and ‘expects 12B revenue’ thing.
The point of listing the problems with their business model is that they need the AGI narrative in order to fuel the investor cash, without which they will go broke at current spend rates. They have cool products, they could probably make a profit if they switched to optimizing for that (which would mean more expensive products and probably a lot of cuts), but not anywhere near the level of profits they’d need to justify the valuation.
That’s how I interpreted it originally; you were arguing their product org vibed fake, I was arguing your vibes were miscalibrated. I’m not sure what to say to this that I didn’t say originally.
The activists and the lobbyists are two very different groups. The activists are not trying to network with the DC people (yet). Unless you mean Encode, who I would call lobbyists, not activists.
Good point, I should have made those two separate bullet points:
Then there’s the AI regulation lobbyists. They lobby and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
Also, there’s the AI regulation activists, who e.g. organize protests. Like ~98% of protests in general, such activity is mostly performative and not the sort of thing anyone would end up doing if they were seriously reasoning through how best to spend their time in order to achieve policy goals. Calling it “fake” feels almost redundant. Insofar as these protests have any impact, it’s via creating an excuse for friendly journalists to write stories about the dangers of AI (itself an activity which mostly feeds the narrative, and has dubious real impact).
(As with the top level, epistemic status: I don’t fully endorse all this, but I think it’s a pretty major mistake to not at least have a model like this sandboxed in one’s head and check it regularly.)
Oh, if you’re in the business of compiling a comprehensive taxonomy of ways the current AI thing may be fake, you should also add:
Vibe coders and “10x’d engineers”, who (on this model) would be falling into one of the failure modes outlined here: producing applications/features that didn’t need to exist, creating pointless code bloat (which helpfully shows up in productivity metrics like “volume of code produced” or “number of commits”), or “automatically generating” entire codebases in a way that feels magical, then spending so much time bugfixing them that it eats up ~all perceived productivity gains.
e/acc and other Twitter AI fans, who act like they’re bleeding-edge transhumanist visionaries/analysts/business gurus/startup founders, but who are just shitposters/attention-seekers who will wander off and never look back the moment the hype dies down.
True, but I feel a bit bad about punching that far down.
What are the other basically-fake fields out there?
quantum computing, nuclear fusion
I share some similar frustrations, and unfortunately these are also prevalent in other parts of human society. The common thread in most of this fakeness seems to be impure intentions: motivations other than producing the best science or making true progress. Some of these motivations unfortunately stem from survival/monetary pressure, and resolving that seems critical for true research and progress. We need to encourage a culture of pure motivations, and also equip ourselves with more ability/tools to distinguish extrinsic motivations.
On o3: for what feels like the twentieth time this year, I see people freaking out, saying AGI is upon us, it’s the end of knowledge work, timelines now clearly in single-digit years, etc, etc. I basically don’t buy it, my low-confidence median guess is that o3 is massively overhyped. Major reasons:
I’ve personally done 5 problems from GPQA in different fields and got 4 of them correct (allowing internet access, which was the intent behind that benchmark). I’ve also seen one or two problems from the software engineering benchmark. In both cases, when I look at the actual problems in the benchmark, they are easy, despite people constantly calling them hard and saying that they require expert-level knowledge.
For GPQA, my median guess is that the PhDs they tested on were mostly pretty stupid. Probably a bunch of them were e.g. bio PhD students at NYU who would just reflexively give up if faced with even a relatively simple stat mech question which can be solved with a couple minutes of googling jargon and blindly plugging two numbers into an equation.
For software engineering, the problems are generated from real git pull requests IIUC, and it turns out that lots of those are things like e.g. “just remove this if-block”.
Generalizing the lesson here: the supposedly-hard benchmarks for which I have seen a few problems (e.g. GPQA, software eng) turn out to be mostly quite easy, so my prior on other supposedly-hard benchmarks which I haven’t checked (e.g. FrontierMath) is that they’re also mostly much easier than they’re hyped up to be.
On my current model of Sam Altman, he’s currently very desperate to make it look like there’s no impending AI winter, capabilities are still progressing rapidly, etc. Whether or not it’s intentional on Sam Altman’s part, OpenAI acts accordingly, releasing lots of very over-hyped demos. So, I discount anything hyped out of OpenAI, and doubly so for products which aren’t released publicly (yet).
Over and over again in the past year or so, people have said that some new model is a total game changer for math/coding, and then David will hand it one of the actual math or coding problems we’re working on and it will spit out complete trash. And not like “we underspecified the problem” trash, or “subtle corner case” trash. I mean like “midway through the proof it redefined this variable as a totally different thing and then carried on as though both definitions applied”. The most recent model with which this happened was o1.
Of course I am also tracking the possibility that this is a skill issue on our part, and if that’s the case I would certainly love for someone to help us do better. See this thread for a couple examples of relevant coding tasks.
My median-but-low-confidence guess here is that basically-all the people who find current LLMs to be a massive productivity boost for coding are coding things which are either simple, or complex only in standardized ways—e.g. most web or mobile apps. That’s the sort of coding which mostly involves piping things between different APIs and applying standard patterns, which is where LLMs shine.
I just spent some time doing GPQA, and I think I agree with you that the difficulty of those problems is overrated. I plan to write up more on this.
@johnswentworth Do you agree with me that modern LLMs probably outperform (you with internet access and 30 minutes) on GPQA diamond? I personally think this somewhat contradicts the narrative of your comment if so.
I don’t know, I have not specifically tried GPQA diamond problems. I’ll reply again if and when I do.
I at least attempted to be filtering the problems I gave you for GPQA diamond, although I am not very confident that I succeeded.
(Update: yes, the problems John did were GPQA diamond. I gave 5 problems to a group of 8 people, and gave them two hours to complete however many they thought they could complete without getting any wrong)
@Buck Apparently the five problems I tried were GPQA diamond, they did not take anywhere near 30 minutes on average (more like 10 IIRC?), and I got 4/5 correct. So no, I do not think that modern LLMs probably outperform (me with internet access and 30 minutes).
Ok, so sounds like given 15-25 mins per problem (and maybe with 10 mins per problem), you get 80% correct. This is worse than o3, which scores 87.7%. Maybe you’d do better on a larger sample: perhaps you got unlucky (extremely plausible given the small sample size) or the extra bit of time would help (though it sounds like you tried to use more time here and that didn’t help). Fwiw, my guess from the topics of those questions is that you actually got easier questions than average from that set.
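(To put a number on “extremely plausible given the small sample size”: a quick exact-binomial check, assuming scipy is available, shows how little 4/5 pins down.)

```python
from scipy.stats import binomtest

# 4 correct out of 5 GPQA-diamond problems.
ci = binomtest(k=4, n=5).proportion_ci(confidence_level=0.95)
print(ci)  # roughly (0.28, 0.995): consistent with being much worse than,
           # or somewhat better than, o3's reported 87.7%
```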
I continue to think these LLMs will probably outperform (you with 30 mins). Unfortunately, the measurement is quite expensive, so I’m sympathetic to you not wanting to get to ground here. If you believe that you can beat them given just 5-10 minutes, that would be easier to measure. I’m very happy to bet here.
I think that even if it turns out you’re a bit better than LLMs at this task, we should note that it’s pretty impressive that they’re competitive with you given 30 minutes!
So I still think your original post is pretty misleading [ETA: with respect to how it claims GPQA is really easy].
I think the models would beat you by more at FrontierMath.
Even assuming you’re correct here, I don’t see how that would make my original post pretty misleading?
I think that how you talk about the questions being “easy”, and the associated stuff about how you think the baseline human measurements are weak, is somewhat inconsistent with you being worse than the model.
I mean, there are lots of easy benchmarks on which I can solve the large majority of the problems, and a language model can also solve the large majority of the problems, and the language model can often have a somewhat lower error rate than me if it’s been optimized for that. Seems like GPQA (and GPQA diamond) are yet another example of such a benchmark.
What do you mean by “easy” here?
(my guess is you took more like 15-25 minutes per question? Hard to tell from my notes, you may have finished early but I don’t recall it being crazy early)
I remember finishing early, and then spending a lot of time going back over all of them a second time, because the goal of the workshop was to answer correctly with very high confidence. I don’t think I updated any answers as a result of the second pass, though I don’t remember very well.
(This seems like more time than Buck was taking – the goal was to not get any wrong so it wasn’t like people were trying to crank through them in 7 minutes)
The problems I gave were (as listed in the csv for the diamond problems)
#1 (Physics) (1 person got right, 3 got wrong, 1 didn’t answer)
#2 (Organic Chemistry) (John got right, I think 3 people didn’t finish)
#4 (Electromagnetism) (John and one other got right, 2 got wrong)
#8 (Genetics) (3 got right including John)
#10 (Astrophysics) (5 people got right)
@johnswentworth FWIW, GPQA Diamond seems much harder than GPQA main to me, and current models perform well on it. I suspect these models beat your performance on GPQA diamond if you’re allowed 30 mins per problem. I wouldn’t be shocked if you beat them (maybe I’m like 20%?), but that’s because you’re unusually broadly knowledgeable about science, not just because you’re smart.
I personally get wrecked by GPQA chemistry, get ~50% on GPQA biology if I have like 7 minutes per problem (which is notably better than their experts from other fields get, with much less time), and get like ~80% on GPQA physics with less than 5 minutes per problem. But GPQA Diamond seems much harder.
Is this with internet access for you?
Yes, I’d be way worse off without internet access.
Daniel Litt’s account here supports this prejudice. As a math professor, he knew instantly how to solve the low/medium-level problems he looked at, and he suggests that each “high”-rated problem would be likewise instantly solvable by an expert in that problem’s subfield.
And since LLMs have eaten ~all of the internet, they essentially have the crystallized-intelligence skills for all (sub)fields of mathematics (and human knowledge in general). So from their perspective, all of those problems are very “shallow”. No human shares their breadth of knowledge, so math professors specialized even in slightly different subfields would indeed have to do a lot of genuine “deep” cognitive work; this is not the case for LLMs.
GPQA stuff is even worse, a literal advanced trivia quiz that seems moderately resistant to literal humans literally googling things, but not to the way the knowledge gets distilled into LLMs.
Basically, I don’t think any extant benchmark (except I guess the Millennium Prize Eval) actually tests “deep” problem-solving skills, in a way LLMs can’t cheat at using their overwhelming knowledge breadth.
My current strong-opinion-weakly-held is that they’re essentially just extensive knowledge databases with a nifty natural-language interface on top.[1] All of the amazing things they do should be considered surprising facts about how far this trick can scale; not surprising facts about how close we are to AGI.
Which is to say: this is the central way to characterize what they are; not merely “isomorphic to a knowledge database with a natural-language search engine on top if you think about them in a really convoluted way”. Obviously a human can also be considered isomorphic to database search if you think about it in a really convoluted way, but that wouldn’t be the most-accurate way to describe a human.
This is an exaggeration and, as stated, false.
Epoch AI made 5 problems from the benchmark public. One of those was ranked “High”, and that problem was authored by me.
It took me 20-30 hours to create that submission. (To be clear, I considered variations of the problem, ran into some dead ends, spent a lot of time carefully checking my answer was right, wrote up my solution, thought about guess-proof-ness[1] etc., which ate up a lot of time.)
I would call myself an “expert in that problem’s subfield” (e.g. I have authored multiple related papers).
I think you’d be very hard-pressed to find any human who could deliver the correct answer to you within 2 hours of seeing the problem.
E.g. I think it’s highly likely that I couldn’t have done that (I think it’d have taken me more like 5 hours), I’d be surprised if my colleagues in the relevant subfield could do that, and I think the problem is specialized enough that few of the top people in CodeForces or Project Euler could do it.
On the other hand, I don’t think the problem is very hard insight-wise—I think it’s pretty routine, but requires care with details and implementation. There are certainly experts who can see the right main ideas quickly (including me). So there’s something to the point of even FrontierMath problems being surprisingly “shallow”. And as is pointed out in the FM paper, the benchmark is limited to relatively short-scale problems (hours to days for experts) - which really is shallow, as far as the field of mathematics is concerned.
But it’s still an exaggeration to talk about “instantly solvable”. Of course, there’s no escaping of Engel’s maxim “A problem changes from impossible to trivial if a related problem was solved in training”—I guess the problem is instantly solvable to me now… but if you are hard-pressed to find humans that could solve it “instantly” when seeing it the first time, then I wouldn’t describe it in those terms.
Also, there are problems in the benchmark that require more insight than this one.
Daniel Litt writes about the problem: “This one (rated “high”) is a bit trickier but with no thinking at all (just explaining what computation I needed GPT-4o to do) I got the first 3 digits of the answer right (the answer requires six digits, and the in-window python timed out before it could get this far)
Of course *proving* the answer to this one is correct is harder! But I do wonder how many of these problems are accessible to simulation/heuristics. Still an immensely useful tool but IMO people should take a step back before claiming mathematicians will soon be replaced”.
I very much considered naive simulations and heuristics. The problem is getting 6 digits right, not 3. (The AIs are given a limited compute budget.) This is not valid evidence in favor of the problem’s easiness or for the benchmark’s accessibility to simulation/heuristics—indeed, this is evidence in the opposing direction.
See also Evan Chen’s “I saw the organizers were pretty ruthless about rejecting problems for which they felt it was possible to guess the answer with engineer’s induction.”
Thanks, that’s important context!
And fair enough, I used excessively sloppy language. By “instantly solvable”, I did in fact mean “an expert would very quickly (“instantly”) see the correct high-level approach to solving it, with the remaining work being potentially fiddly, but conceptually straightforward”. “Instantly solvable” in the sense of “instantly know how to solve”/”instantly reducible to something that’s trivial to solve”.[1]
Which was based on this quote of Litt’s:
That said,
If there are no humans who can “solve it instantly” (in the above sense), then yes, I wouldn’t call it “shallow”. But if such people do exist (even if they’re incredibly rare), this implies that the conceptual machinery (in the form of theorems or ansatzes) for translating the problem into a trivial one already exists as well. Which, in turn, means it’s likely present in the LLM’s training data. And therefore, from the LLM’s perspective, that problem is trivial to translate into a conceptually trivial problem.
It seems you’d largely agree with that characterization?
Note that I’m not arguing that LLMs aren’t useful or unimpressive-in-every-sense. This is mainly an attempt to build a model of why LLMs seem to perform so well on apparently challenging benchmarks while reportedly falling flat on their faces on much simpler real-life problems.
Or, closer to the way I natively think of it: In the sense that there are people (or small teams of people) with crystallized-intelligence skillsets such that they would be able to solve this problem by plugging their crystallized-intelligence skills one into another, without engaging in prolonged fluid-intelligence problem-solving.
This looks reasonable to me.
Yes. My only hesitation is about how real-life-important it is for AIs to be able to do math for which very little to no training data exists. The internet and the mathematical literature are so vast that, unless you are doing something truly novel, there’s some relevant subfield there—in which case FrontierMath-style benchmarks would be informative of capability to do real math research.
Also, re-reading Wentworth’s original comment, I note that o1 is weak according to FM. Maybe the things Wentworth is doing are just too hard for o1, rather than (just) overfitting-on-benchmarks style issues? In any case his frustration with o1’s math skills doesn’t mean that FM isn’t measuring real math research capability.
Previously, I’d intuitively assumed the same as well: that it doesn’t matter if LLMs can’t “genuinely research/innovate”, because there is enough potential for innovative-yet-trivial combination of existing ideas that they’d still massively speed up R&D by finding those combinations. (“Innovation overhang”, as @Nathan Helm-Burger puts it here.)
Back in early 2023, I’d considered it fairly plausible that the world would start heating up in 1-2 years due to such synthetically-generated innovations.
Except this… just doesn’t seem to be happening? I’ve yet to hear of a single useful scientific paper or other meaningful innovation that was spearheaded by an LLM.[1] And they’re already adept at comprehending such innovative-yet-trivial combinations if a human prompts them with those combinations. So it’s not a matter of not yet being able to understand or appreciate the importance of such synergies. (If Sonnet 3.5.1 or o1 pro didn’t do it, I doubt o3 would.)
Yet this is still not happening. My guess is that “innovative-yet-trivial combinations of existing ideas” are not actually “trivial”, and LLMs can’t do that for the same reasons they can’t do “genuine research” (whatever those reasons are).
Admittedly it’s possible that this is totally happening all over the place and people are just covering it up in order to have all of the glory/status for themselves. But I doubt it: there are enough remarkably selfless LLM enthusiasts that if this were happening, I’d expect it would’ve gone viral already.
There are 2 things to keep in mind:
It’s only now that LLMs are reasonably competent in at least some hard problems, and at any rate, I expect RL to basically solve the domain, because of verifiability properties combined with quite a bit of training data.
We should wait a few years, as we have another scale-up that’s coming up, and it will probably be quite a jump from current AI due to more compute:
https://www.lesswrong.com/posts/NXTkEiaLA4JdS5vSZ/?commentId=7KSdmzK3hgcxkzmPX
I don’t think that’s the limiter here. Reports in the style of “my unpublished PhD thesis was about doing X using Y methodology, I asked an LLM to do that and it one-shot a year of my work! the equations it derived are correct!” have been around for quite a while. I recall it at least in relation to Claude 3, and more recently, o1-preview.
If LLMs are prompted to combine two ideas, they’ve been perfectly capable of “innovating” for ages now, including at fairly high levels of expertise. I’m sure there’s some sort of cross-disciplinary GPQA-like benchmark that they’ve saturated a while ago, so this is even legible.
The trick is picking which ideas to combine/in what direction to dig. This doesn’t appear to be something LLMs are capable of doing well on their own, nor do they seem to speed up human performance on this task. (All cases of them succeeding at it so far have been, by definition, “searching under the streetlight”: checking whether they can appreciate a new idea that a human already found on their own and evaluated as useful.)
I suppose it’s possible that o3 or its successors change that (the previous benchmarks weren’t measuring that, but surely FrontierMath does...). We’ll see.
Mm, I think it’s still up in the air whether even the o-series efficiently scales (as in, without requiring a Dyson Swarm’s worth of compute) to beating the Millennium Prize Eval (or some less legendary yet still major problems).
I expect such problems don’t pass the “can this problem be solved by plugging the extant crystallized-intelligence skills of a number of people into each other in a non-contrived[1] way?” test. Does RL training allow to sidestep this, letting the model generate new crystallized-intelligence skills?
I’m not confident one way or another.
I’m bearish on that. I expect GPT-4 to GPT-5 to be palpably less of a jump than GPT-3 to GPT-4, the same way GPT-3 to GPT-4 was less of a jump than GPT-2 to GPT-3. I’m sure it’d show lower loss, and saturate some more benchmarks, and perhaps an o-series model based on it clears FrontierMath, and perhaps programmers and mathematicians would be able to use it in an ever-so-bigger number of cases...
But I predict, with low-moderate confidence, that it still won’t kick off a deluge of synthetically derived innovations. It’d have even more breadth and eye for nuance, but somehow, perplexingly, still no ability to use those capabilities autonomously.
“Non-contrived” because technically, any cognitive skill is just a combination of e.g. NAND gates, since those are Turing-complete. But obviously that doesn’t mean any such skill is accessible if you’ve learned the NAND gate. Intuitively, a combination of crystallized-intelligence skills is only accessible if the idea of combining them is itself a crystallized-intelligence skill (e.g., in the math case, a known ansatz).
Which perhaps sheds some light on why LLMs can’t innovate even via trivial idea combinations. If a given idea-combination “template” isn’t present in the training data, the LLM can’t reliably conceive of it independently, except by brute-force enumeration...? This doesn’t seem quite right, but it’s maybe in the right direction.
I think my key crux is that in domains where there is a way to verify that the solution actually works, RL can scale to superhuman performance. Mathematics and programming are domains that are unusually easy to verify and to gather training data for, so with caveats RL can become rather good at those specific domains/benchmarks (like a millennium-prize eval). But the important caveat is that I don’t believe this transfers very well to domains where verifying isn’t easy, like creative writing.
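A minimal sketch of what “easy to verify” buys you here (illustrative only, not any particular training stack): the reward signal can be computed mechanically, so labels are cheap at scale.

```python
from typing import Callable

def verifier_reward(candidate: str, checker: Callable[[str], bool]) -> float:
    """Binary RL reward from an automatic checker (unit tests, proof checker, etc.)."""
    return 1.0 if checker(candidate) else 0.0

# Toy example: an arithmetic problem with an exact-answer checker.
print(verifier_reward("408", lambda ans: ans.strip() == str(12 * 34)))  # 1.0
```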
I was talking about the 1 GW systems that would be developed in late 2026-early 2027, not GPT-5.
Sure, the theory on that is solid. But how efficiently does it scale off-distribution, in practice?
The inference-time scaling laws, much like the pretraining scaling laws, are ultimately based on test sets whose entries are “shallow” (in the previously discussed sense). It doesn’t tell us much regarding how well the technique scales with the “conceptual depth” of a problem.
o3 took a million dollars in inference-time compute and unknown amounts in training-time compute just to solve the “easy” part of the FrontierMath benchmark (which likely take human experts single-digit hours, maybe <1 hour for particularly skilled humans). How much would be needed for beating the “hard” subset of FrontierMath? How much more still would be needed for problems that take individual researchers days; or problems that take entire math departments months; or problems that take entire fields decades?
It’s possible that the “synthetic data flywheel” works so well that the amount of human-researcher-hour-equivalents per unit of compute scales, say, exponentially with some aspect of o-series’ training, and so o6 in 2027 solves the Riemann Hypothesis.
Or it scales not that well, and o6 can barely clear real-life equivalents of hard FrontierMath problems. Perhaps instead the training costs (generating all the CoT trees on which RL training is then done) scale exponentially, while researcher-hour-equivalents per compute units scale linearly.
It doesn’t seem to me that we know which one it is yet. Do we?
I don’t think we know yet whether it will succeed in practice, or whether its training costs make it infeasible to do.
Consider: https://www.cognitiverevolution.ai/can-ais-generate-novel-research-ideas-with-lead-author-chenglei-si/
I think a different phenomenon is occurring. My guess, updating on my own experience, is that ideas aren’t the current bottleneck. 1% inspiration, 99% perspiration.
As someone who has been reading 3-20 papers per month for many years now, in neuroscience and machine learning, I feel overwhelmed with ideas. I average about 0.75 per paper. I write them down, and the lists grow faster than they shrink by two orders of magnitude.
When I was on my favorite industry team, what I most valued about my technical manager was his ability to help me sort through and prioritize them. It was like I created a bunch of LEGO pieces, he picked one to be next, I put it in place by coding it up, and he checked the placement by reviewing my PR. If someone had offered me a source of ideas ranging in quality between worse than my worst ideas and almost as good as my best ideas, and skewed towards bad… I’d have laughed and turned them down without a second thought.
For something like a paper, instead of a minor tech idea for a one-week PR, the situation is far more intense. The grunt work of running the experiments and preparing the paper is enormous compared to the time and effort of coming up with the idea in the first place. More like 0.1% to 99.9%.
Current LLMs can speed up creating a paper if given the results and experiment description to write about. That’s probably also not the primary bottleneck (although still more than idea generation).
So the current bottleneck, in my estimation, for ML experiments, is the experiments: coding up the experiments accurately and efficiently, running them (and handling the compute costs), and analyzing the results.
So I’ve been expecting to see an acceleration dependent on that aspect. That’s hard to measure, though. Are LLMs currently speeding this work up a little? Probably. I’ve had my work sped up some by the recent Sonnet 3.5.1. Currently, though, it’s a trade-off: there’s overhead in checking for misinterpretations and correcting bugs. We still seem a long way in “capability space” from me being able to give a background paper and rough experiment description, and then having the model do the rest. Only once that’s the case will idea generation become my bottleneck.
That’s the opposite of my experience. Nearly all the papers I read vary between “trash, I got nothing useful out besides an idea for a post explaining the relevant failure modes” and “high quality but not relevant to anything important”. Setting up our experiments is historically much faster than the work of figuring out what experiments would actually be useful.
There are exceptions to this, large projects which seem useful and would require lots of experimental work, but they’re usually much lower-expected-value-per-unit-time than going back to the whiteboard, understanding things better, and doing a simpler experiment once we know what to test.
Ah, well, for most papers that spark an idea in me, the idea isn’t simply an extension of the paper. It’s a question tangentially related which probes at my own frontier of understanding.
I’ve always found that a boring lecture is a great opportunity to brainstorm because my mind squirms away from the boredom into invention and extrapolation of related ideas. A boring paper does some of the same for me, except that I’m less socially pressured to keep reading it, and thus less able to squeeze my mind with the boredom of it.
As for coming up with ideas… It is a weakness of mine that I am far better at generating ideas than at critiquing them (my own or others’). Which is why I worked so well in a team where I had someone I trusted to sort through my ideas and pick out the valuable ones. It sounds to me like you have a better filter on idea quality.
That’s mostly my experience as well: experiments are near-trivial to set up, and setting up any experiment that isn’t near-trivial to set up is a poor use of the time that can instead be spent thinking on the topic a bit more and realizing what the experimental outcome would be or why this would be entirely the wrong experiment to run.
But the friction costs of setting up an experiment aren’t zero. If it were possible to sort of ramble an idea at an AI and then have it competently execute the corresponding experiment (or set up a toy formal model and prove things about it), I think this would be able to speed up even deeply confused/non-paradigmatic research.
… That said, I think the sorts of experiments we do aren’t the sorts of experiments ML researchers do. I expect they’re often things like “do a pass over this lattice of hyperparameters and output the values that produce the best loss” (and more abstract equivalents of this that can’t be as easily automated using mundane code). And which, due to the atheoretic nature of ML, can’t be “solved in the abstract”.
So ML research perhaps could be dramatically sped up by menial-software-labor AIs. (Though I think even now the compute needed for running all of those experiments would be the more pressing bottleneck.)
Convincing.
I agree that the trick scaling as far as it has is surprising, but I’d disagree with the claim that this doesn’t bear on AGI.
I do think that something like dumb scaling can mostly just work, and I think the main takeaway I take from AI progress is that there will not be a clear resolution to when AGI happens, as the first AIs to automate AI research will have very different skill profiles from humans, and most importantly we need to disentangle capabilities in a way we usually don’t for humans.
I agree with faul sname here:
The exact degree of “mostly” is load-bearing here. You’d mentioned provisions for error-correction before. But are the necessary provisions something simple, such that the most blatantly obvious wrappers/prompt-engineering works, or do we need to derive some additional nontrivial theoretical insights to correctly implement them?
Last I checked, AutoGPT-like stuff has mostly failed, so I’m inclined to think it’s closer to the latter.
Actually, I’ve changed my mind, in that the reliability issue probably does need at least non-trivial theoretical insights to make AIs work.
I am unconvinced that “the” reliability issue is a single issue that will be solved by a single insight, rather than AIs lacking procedural knowledge of how to handle a bunch of finicky special cases that will be solved by online learning or very long context windows once hardware costs decrease enough to make one of those approaches financially viable.
Yeah, I’m sympathetic to this argument that there won’t be a single insight, and that at least one approach will work out once hardware costs decrease enough, and I agree less with Thane Ruthenis’s intuitions here than I did before.
If I were to think about it a little, I’d suspect the big difference between LLMs and humans is state/memory: humans have state/memory, but LLMs are more or less stateless today, and RNN training has not been solved to the extent transformers were.
One thing I will also say is that AI winters will be shorter than previous AI winters, because AI products can now be sort of made profitable, and this gives an independent base of money for AI research in ways that weren’t possible pre-2016.
A factor stemming from the same cause but pushing in the opposite direction is that “mundane” AI profitability can “distract” people who would otherwise be AGI hawks.
I agree with you on your assessment of GPQA. The questions themselves appear to be low quality as well. Take this one example, although it’s not from GPQA Diamond:
The correct answer is stated as yellow and blue. However, the question should read transmits, not emits; molecules cannot trivially absorb and re-emit light of a shorter wavelength without resorting to trickery (nonlinear effects, two-photon absorption).
This is, of course, a cherry-picked example, but is exactly characteristic of the sort of low-quality science questions I saw in school (e.g. with a teacher or professor who didn’t understand the material very well). Scrolling through the rest of the GPQA questions, they did not seem like questions that would require deep reflection or thinking, but rather the sort of trivia things that I would expect LLMs to perform extremely well on.
I’d also expect “popular” benchmarks to be easier/worse: optimized for looking good while actually being relatively easy. OAI et al. probably have the mother of all publication biases with respect to benchmarks, and are selecting very heavily for items within this collection.
Re: LLMs for coding: One lens on this is that LLM progress changes the Build vs Buy calculus.
Low-power AI coding assistants were useful in both the “build” and “buy” scenarios, but they weren’t impactful enough to change the actual border between build-is-better vs. buy-is-better. More powerful AI coding systems/agents can make a lot of tasks sufficiently easy that dealing with some components starts feeling more like buying than building. Different problem domains have different peak levels of complexity/novelty, so the easier domains will start being affected more and earlier by this build/buy decision boundary shift. Many people don’t travel far from their primary domains, so to some of them it will look like the shift is happening quickly (because it is, in their vicinity) even though on the larger scale it’s still pretty gradual.
Personally, I think o1 is uniquely trash; I think o1-preview was actually better. I’m getting, on average, better things from DeepSeek and Sonnet 3.5 atm.
About a month ago, after some back-and-forth with several people about their experiences (including on lesswrong), I hypothesized that I don’t feel the emotions signalled by oxytocin, and never have. (I do feel some adjacent things, like empathy and a sense of responsibility for others, but I don’t get the feeling of loving connection which usually comes alongside those.)
Naturally I set out to test that hypothesis. This note is an in-progress overview of what I’ve found so far and how I’m thinking about it, written largely to collect my thoughts and to see if anyone catches something I’ve missed.
Under the hypothesis, this has been a life-long thing for me, so the obvious guess is that it’s genetic (the vast majority of other biological state turns over too often to last throughout life). I also don’t have a slew of mysterious life-long illnesses, so the obvious guess is that it’s pretty narrowly limited to oxytocin—i.e. most likely a genetic variant in either the oxytocin gene or receptor, maybe the regulatory machinery around those two, but that’s less likely since, as we get further away, the machinery becomes entangled with more other things.
So I got my genome sequenced, and went looking at the oxytocin gene and the oxytocin receptor gene.
The receptor was the first one I checked, and sure enough I have a single-nucleotide deletion 42 amino acids into the open reading frame (ORF) of the 389-amino-acid protein. That will induce a frameshift error, completely fucking up the rest of the protein. (The oxytocin gene, on the other hand, was totally normal.)
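To make the frameshift logic concrete, here’s a toy illustration (the sequence below is made up for the example, not the actual OXTR sequence):

```python
STOP = {"TAA", "TAG", "TGA"}

def codons(seq: str):
    """Split a DNA sequence into codons, dropping any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def first_stop(seq: str):
    """Codon index of the first in-frame stop codon, or None."""
    return next((i for i, c in enumerate(codons(seq)) if c in STOP), None)

# Toy ORF: start codon, some ordinary sense codons, then the real stop at the end.
orf = "ATG" + "GCC" * 3 + "TTA" + "AGC" + "GCC" * 3 + "TAA"

# Delete a single nucleotide early in the ORF (position 4 here, arbitrarily).
frameshifted = orf[:4] + orf[5:]

print(first_stop(orf))           # 9 -> the genuine termination codon at the end
print(first_stop(frameshifted))  # 4 -> premature stop; everything downstream is garbage
```

The real deletion does the same thing: every codon downstream gets re-read in the wrong frame and early stop codons appear, which is also why nonsense-mediated decay becomes relevant below.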
So that sure is damn strong evidence in favor of the hypothesis! But, we have two copies of most genes, including the oxytocin receptor. The frameshift error is only on one copy. Why isn’t the other copy enough for almost-normal oxytocin signalling?
The frameshift error is the only thing I have which would obviously completely fuck up the whole protein, but there are also a couple nonsynonymous single nucleotide polymorphisms (SNPs) in the ORF, plus another couple upstream. So it’s plausible that one of the SNPs messes up the other copy pretty badly; in particular, one of them changes an arginine to a histidine at the edge of the second intracellular loop. (Oxytocin receptor is a pretty standard g-protein coupled receptor, so that’s the mental picture here.) I did drop the sequences into alphafold, and I don’t see any large structural variation from the SNPs, but (a) that histidine substitution would most likely change binding rather than structure in isolation, and (b) this is exactly the sort of case where I don’t trust alphafold much, because “this is one substitution away from a standard sequence, I’ll just output the structure of that standard sequence” is exactly the sort of heuristic I’d expect a net to over-rely upon.
It’s also possible-in-principle that the second receptor copy is fine, but the first copy frameshift alone is enough to mess up function. I think that’s unlikely in this case. The mRNA for the frameshifted version should be removed pretty quickly by nonsense-mediated decay (I did double check that it has a bunch of early stop codons, NMD should definitely trigger). So there should not be a bunch of junk protein floating around from the frameshifted gene. And the frameshift is early enough that the messed up proteins probably won’t e.g. form dimers with structurally-non-messed-up versions (even if oxytocin receptor normally dimerizes, which I’d guess it doesn’t but haven’t checked). At worst there should just be a 2x lower concentration of normal receptor than usual, and if there’s any stable feedback control on the receptor concentration then there’d be hardly any effect at all.
Finally, there’s the alternative hypothesis that my oxytocin signalling is unusually weak but not entirely nonfunctional. I do now have pretty damn strong evidence for that at a bare minimum, assuming that feedback control on receptor density doesn’t basically counterbalance the fucked up receptor copy.
Anyway, that’s where I’m currently at. I’m curious to hear others’ thoughts on what mechanisms I might be missing here!
I’m kind of astonished that this kind of advance prediction panned out!
I admit I was somewhat surprised as well. On a gut level, I did not think that the very first things to check would turn up such a clear and simple answer.
I’m insufficiently knowledgeable about deletion base rates to know how astonished to be. Does anyone have an estimate of how many Bayes bits such a prediction is worth?
FWIW, GPT-5T estimates around 10 bits, double that if it’s de novo (absent in both parents).
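(For concreteness, a rough sketch of how an estimate like that cashes out; the ~1-in-1000 base rate below is purely an illustrative assumption, not a measured figure:)

```latex
\text{bits} \;=\; \log_2\!\frac{P(\text{disrupted OXTR variant}\mid \text{hypothesis})}{P(\text{disrupted OXTR variant}\mid \text{no hypothesis})}
\;\approx\; \log_2\!\frac{\sim 1}{\sim 1/1000} \;\approx\; 10
```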
well, what happens when you take oxytocin?
This might be a bad idea right now, if it makes John’s interests suddenly more normal in a mostly-unsteered way, e.g. because much of his motivation was coming from a feeling he didn’t know was oxytocin-deficiency-induced. I’d suggest only doing this if solving this problem is likely to increase productivity or networking success; otherwise, I’d delay until he doesn’t seem like a critical bottleneck. That said, it might also be a very good idea, if depression or social interaction are a major bottleneck, which they are for many, many people. So this is not settled advice, just a warning that this may be a high-variance intervention; and since John currently seems to be doing promising work, introducing high variance seems likely to have more downside.
I wouldn’t say this to most people; taking oxytocin isn’t known for being a hugely impactful intervention[citation needed], and on priors, someone who doesn’t have oxytocin signaling happening is missing a lot of normal emotion, and is likely much worse off. Obviously, John, it’s up to you whether this is a good tradeoff. I wouldn’t expect it to completely distort your values or delete your skills. Someone who knows you better, such as yourself, would be much better equipped to predict if there’s significant reason to believe downward variance isn’t present. If you have experience with reward-psychoactive chemicals and yet are currently productive, it’s more likely you already know whether it’s a bad idea.
Didn’t want to leave it unsaid, though.
if the problem is with the receptor, taking more won’t make a difference
Sounds like a great empirical test!
Seems like that depends on details of the problem. If the receptor has zero function, then yes. If functionality is significantly reduced but nonzero… maybe.
Perhaps Gurkenglas meant this as a ~confirmatory test that John is actually oxytocin-insensitive, because the test results (IIUC) are compatible with only one gene copy being screwed up.
I ordered this one off of Amazon. AFAICT it does nothing for me. But that’s a pretty minor update, because even those who use it say the effects are “subtle”, and frankly I think snorting oxytocin is probably bullshit and does nothing beyond placebo even for normal people. I did have a couple other people try the one I bought, and their results indeed sounded like a nothingburger.
Your link is broken. The raw HTML is:
<a href="https://One other thing - labs typically filter reportable genome results by the phenotype you give them. I don’t know how this guy did the genome, but if he were to put something like “social deficits”, “emotional dysregulation” or something else about his lack of emotional range, the lab would definitely report the variant plus their research on it and recommendations.">this one</a>

BTW, has anyone on LW tried oxytocin and is willing to report on the experience?
Fixed, thanks.
Not directly related to your query, but seems interesting:
Which, in turn, is pretty solid evidence for “oxytocin mediates the emotion of loving connection/aching affection” (unless there are some mechanisms you’ve missed). I wouldn’t have guessed it’s that simple.
Generalizing, this suggests we can study links between specific brain chemicals/structures and cognitive features by looking for people missing the same universal experience, checking if their genomes deviate from the baseline in the same way, then modeling the effects of that deviation on the brain. Alternatively, the opposite: search for people whose brain chemistry should be genetically near-equivalent except for one specific change, then exhaustively check if there’s some blatant or subtle way their cognition differs from the baseline.
Doing a brief literature review via GPT-5, apparently this sort of thing is mostly done with regards to very “loud” conditions, rather than in an attempt to map out the brain in general. I could imagine that it won’t turn out that simple in practice, but the actual bottleneck is probably researchers with a good enough theory-of-mind to correctly figure out the subtle ways the subjects’ cognition differs (easy for “severe autism”, much harder for “I feel empathy and responsibility, but not loving connection”).
… and so at long last John found the answer to alignment
The answer was Love
and it always had been
(hopes this is a joke)
~Surely there are a lot of other things involved in mediating this aspect of human cognition; at the very least (speaking very coarse-grainedly), having the entire oxytocin system adequately hooked up to the rest of everything.
I.e. it is damn strong evidence that oxytocin signalling is strictly necessary (and that there are no fallback mechanisms etc.), but not that it’s simple.
Did your mother think you were unusual as a baby? Did you bond with your parents as a young child? I’d expect there to be some symptoms there if you truly have an oxytocin abnormality.
For my family this is much more of a “wow that makes so much sense” than a “wow what a surprise”. It tracks extremely well with how I acted growing up, in a bunch of different little ways. Indeed, once the hypothesis was on my radar at all, it quickly seemed pretty probable on that basis alone, even before sequencing came back.
A few details/examples:
As a child, I had a very noticeable lack of interest in other people (especially those my own age), to the point where a school psychologist thought it was notable.
I remember being unusually eager to go off to overnight summer camp (without my parents), at an age where nobody bothered to provide overnight summer camp because kids that young were almost all too anxious to be away from their parents that long.
When family members or pets died, I’ve generally been noticeably less emotionally impacted than the rest of the family.
When out and about with the family, I’ve always tended to wander around relatively independently of the rest of the group.
Those examples are relatively easy to explain, but most of my bits here come from less legible things. It’s been very clear for a long time that I relate to other people unusually, in a way that intuitively matches being at the far low end of the oxytocin signalling axis.
Interesting. That seems like reasonable evidence.
Though beyond a certain level of development we have numerous other drives beyond the oxytocin-related ones. Hence why you-as-a-baby might be particularly telling. From what I understand, oxytocin is heavily involved in infant-caregiver bonding and is what enables mothers to soothe their babies so effectively (very much on my mind right now as I am typing this comment while a baby naps on me haha).
Whereas once you’re above a certain age, the rational mind and other traits probably have an increasingly strong effect. For example, if you’re very interested in your own thoughts and ideas, this might overwhelm your desire to be close to family members.
Anyway, it seems likely that your oxytocin hypothesis is correct either way. Cool finding!
I have a similar intuition about how some other people are missing a disgust response that I have. Seems like a biological thing that some people have much less of than others and it has a significant effect on how we relate to others.
Are that frameshift error or those ~6 (?) SNPs previously reported in the literature for anything, or do they seem to be de novos? Also, what WGS depth did your service use? (Depending on how widely you cast your net, some of those could be spurious sequencing errors.)
Depth is tagged on each individual variant; the frameshift has depth 41, the others have depth anywhere from 40 to 60.
I have not found the frameshift mutation in dbSNP, but I’m not confident that I’ve understood the UI or intended usage patterns, so I’m not confident it’s not in there. The SNPs I haven’t looked for in there yet.
Really interesting post—this actually connects to some research I’ve been looking into recently around oxytocin and attachment patterns.
There’s this psychologist Adam Lane Smith who’s built on neurobiological work by researchers like Carolyn Zahn-Waxler and Ruth Feldman—they’ve found that under high-stress conditions when young, or in the absence of secure attachment figures, cortisol-induced stress actually strengthens the cortisol and dopamine reward pathways while inhibiting the oxytocin and serotonin pathways. The end result (avoidant attachment) sounds remarkably similar to what you’re describing: people who clearly care about others and feel responsibility, but don’t experience that warm “loving connection” feeling that most people seem to get from relationships.
What struck me about your situation is that you’ve essentially got the genetic version of what this research suggests can happen environmentally. Both paths seem to lead to the same place—having to navigate social connection through pattern recognition and cognitive analysis rather than emotional intuition, because your brain is essentially running on dopamine-driven systems instead of oxytocin-based ones.
Makes me wonder if there’s a whole spectrum of people out there—some genetic, some developmental—who are all essentially operating with similar neurochemical profiles but don’t realize they’re part of the same phenomenon. Your case might be the key to understanding how this actually works at a biological level.
Do you find you’ve gotten really good at reading people through behavioral patterns rather than gut feelings?
Yep. AlphaMissense, also from DeepMind, is tailored to pathogenicity prediction. You can find its pathogenicity scores in the annotations tab for any (at least I think any) human protein on AFDB.
https://alphafold.ebi.ac.uk/entry/P30559?activeTab=annotations
(You may have to click on a different tab and return to the annotations tab for the heatmap and structure viewer to load).
As a non-subject matter expert in all of the above, I decided to consult my swear-word-adverse relative that recently graduated genetic counseling school. Here is her response:
The logic is sound (if a little colorful haha 😅). It sounds like this guy functionally only has 1 copy of the OXTR gene, and spot on in hypothesis of nonsense-mediated decay.
How the OXTR gene is regulated, I don’t know and haven’t looked into. It would be weird (but possible) for a decrease in OXTR expression to only affect emotions—oxytocin is also important for other brain functions/development, so a genetic change should also impact embryological development of the brain. So if I were to suggest next steps, it would be doing functional studies of the brain (like an MRI) to further evaluate.
One other thing—labs typically filter reportable genome results by the phenotype you give them. I don’t know how this guy did the genome, but if he were to put something like “social deficits”, “emotional dysregulation” or something else about his lack of emotional range, the lab would definitely report the variant plus their research on it and recommendations.
Amazing, is this the future of psychotherapy?
“Doctor, I have a problem...” “Stop talking, just give me a blood sample. Okay, your problem is X.”
Huh interesting. I might get myself full genome sequenced at some point. I already got myself 23andMe sequenced, downloaded the raw data, and put it into Promethease a while ago. I did find out I’m AG at rs53576, which is slightly linked to lower empathy, but is also extremely common. I don’t think this is enough to explain a large proportion of my personality, the way your OXTR deletion might be.
(There was something quite amusing about checking my SNPs to decide whether to start early anti-balding interventions, and having result number 1 be “Low Empathy”. As a further datapoint, I mentioned this to my mum and she basically said “Yeah but what did you expect with me and [dad] as parents?”)
Seeing this made me think I should take a deeper look. This all sounds pretty familiar, and I don’t think the AG in rs53576 is strong enough to shift me off-distribution to the degree that I am.
If the one clearly fucked up receptor copy is sufficient for your “symptoms”, it seems pretty likely that one of your parents should have them too. I think there is no reason to expect a de novo mutation to be particularly likely in your case (unlike in cases that lead to severe dysfunction). And of course you can check for that by sequencing your parents.
So my money would be on the second copy also being sufficiently messed up that you have basically no fully functioning oxytocin receptors. If you have siblings and you are the only odd one in the family, you could make a pretty strong case for both copies being messed up, by showing that you are the only one with the combination of frameshift in one copy and particular SNPs in the other. (If you are not the only odd one you can make an even stronger case).
Even if the structure is correct and does look the same, the binding properties of the receptor could still be different if the histidine is in the part that’s relevant for the receptor binding.
The thing you want is a tool that tells you how the receptor binding properties change through the mutation, not AlphaFold, which just gives you the 3D structure. A quick question to GPT-5 suggests that there are freely available tools that tell you how the receptor binding properties change via a single point mutation.
I have read that some sequencing methods (nanopore) have a high error rate (comparing multiple reads can help correct this). Did you also spot-check some other genes that you have no reason to believe contain mutations to see if they look ok? Seeing a mutation in exactly the gene you expect is only damn strong evidence if there isn’t a sequencing error in every third gene.
EDIT: Looks like this was checked, nice: https://www.lesswrong.com/posts/Hds7xkLgYtm6qDGPS/how-i-learned-that-i-don-t-feel-companionate-love
I was a relatively late adopter of the smartphone. I was still using a flip phone until around 2015 or 2016 ish. From 2013 to early 2015, I worked as a data scientist at a startup whose product was a mobile social media app; my determination to avoid smartphones became somewhat of a joke there.
Even back then, developers talked about UI design for smartphones in terms of attention. Like, the core “advantages” of the smartphone were the “ability to present timely information” (i.e. interrupt/distract you) and always being on hand. Also it was small, so anything too complicated to fit in like three words and one icon was not going to fly.
… and, like, man, that sure did not make me want to buy a smartphone. Even today, I view my phone as a demon which will try to suck away my attention if I let my guard down. I have zero social media apps on there, and no app ever gets push notif permissions when not open except vanilla phone calls and SMS.
People would sometimes say something like “John, you should really get a smartphone, you’ll fall behind without one” and my gut response was roughly “No, I’m staying in place, and the rest of you are moving backwards”.
And in hindsight, boy howdy do I endorse that attitude! Past John’s gut was right on the money with that one.
I notice that I have an extremely similar gut feeling about LLMs today. Like, when I look at the people who are relatively early adopters, making relatively heavy use of LLMs… I do not feel like I’ll fall behind if I don’t leverage them more. I feel like the people using them a lot are mostly moving backwards, and I’m staying in place.
I found LLMs to be very useful for literature research. They can find relevant prior work that you can’t find with a search engine because you don’t know the right keywords. This can be a significant force multiplier.
They also seem potentially useful for quickly producing code for numerical tests of conjectures, but I only started experimenting with that.
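(To make “numerical tests of conjectures” concrete, a minimal illustrative sketch; the toy conjecture here, Euler’s prime-generating polynomial, is a stock example rather than anything from the experiments mentioned above:)

```python
def is_prime(k: int) -> bool:
    """Trial-division primality check; fine for the small numbers used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

# Toy conjecture: n^2 + n + 41 is prime for every non-negative integer n.
# A quick numerical sweep is enough to find where it first fails.
counterexamples = [n for n in range(200) if not is_prime(n * n + n + 41)]
print(counterexamples[:3])  # [40, 41, 44] -- the conjecture is false, starting at n = 40
```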
Other use cases where I found LLMs beneficial:
Taking a photo of a menu in French (or providing a link to it) and asking it which dishes are vegan.
Recommending movies (I am a little wary of some kind of meme poisoning, but I don’t watch movies very often, so seems ok).
That said, I do agree that early adopters seem like they’re overeager and maybe even harming themselves in some way.
I’ve been trying to use Deep Research tools as a way to find hyper-specific fiction recommendations as well. The results have been mixed. They don’t seem to be very good at grokking the hyper-specificness of what you’re looking for, usually they have a heavy bias towards the popular stuff that outweighs what you actually requested[1], and if you ask them to look for obscure works, they tend to output garbage instead of hidden gems (because no taste).
It did produce good results a few times, though, and is only slightly worse than asking for recommendations on r/rational. Possibly if I iterate on the prompt a few times (e.g., explicitly point out the above issues?), it’ll actually become good.
Like, suppose I’m looking for some narrative property X. I want to find fiction with a lot of X. But what the LLM does is multiply the amount of X in a work by the work’s popularity, so that works that are low in X but very popular end up in its selection.
I tend to have some luck with concrete analogies sometimes. For example I asked for the equivalent of Tonedeff (his Polymer album is my favorite album) in other genres and it recommended me Venetian Snares. I then listened to some of his songs and it seemed like the kind of experimental stuff where I might find something I find interesting. Venetian Snares has 80k monthly listeners while Tonedeff has 14k, so there might be some weighting towards popularity, but that seems mild.
I can think of reasons why some would be wary, and am wary of something which could be called “meme poisoning” myself when I watch movies, but am curious what kind of meme poisoning you have in mind here.
I’ve updated marginally towards this (as a guy pretty focused on LLM-augmentation, I anticipated LLM brain rot, but it still was more pernicious/fast than I expected).
I do still think some-manner-of-AI-integration is going to be an important part of “moving forward” but probably not whatever capitalism serves up.
I have tried out using them pretty extensively for coding. The speedup is real, and I expect to get more real. Right now it’s like a pretty junior employee that I get to infinitely micromanage. But it definitely does lull me into a lower agency state where instead of trying to solve problems myself I’m handing them off to LLMs much of the time to see if it can handle it.
During work hours, I try to actively override this, i.e. have the habit “send LLM off, and then go back to thinking about some kind of concrete thing (though often a higher-level strategy).” But this becomes harder to do as it gets later in the day and I get more tired.
One of the benefits of LLMs is that you can do moderately complex cognitive work* while tired (*that a junior engineer could do). But, that means by default a bunch of time is spent specifically training the habit of using LLMs in a stupid way.
(I feel sort of confused about how people who don’t use it for coding are doing. With coding, I can feel the beginnings of a serious exoskeleton that can build structures around me with thought. Outside of that, I don’t know of it being more than a somewhat better google).
I currently mostly avoid interactions that treat the AI like a person-I’m-talking to. That way seems most madness inducing.
(Disclaimer: only partially relevant rant.)
I’ve recently tried heavily leveraging o3 as part of a math-research loop.
I have never been more bearish on LLMs automating any kind of research than I am now.
And I’ve tried lots of ways to make it work. I’ve tried telling it to solve the problem without any further directions, I’ve tried telling it to analyze the problem instead of attempting to solve it, I’ve tried dumping my own analysis of the problem into its context window, I’ve tried getting it to search for relevant lemmas/proofs in math literature instead of attempting to solve it, I’ve tried picking out a subproblem and telling it to focus on that, I’ve tried giving it directions/proof sketches, I’ve tried various power-user system prompts, I’ve tried resampling the output thrice and picking the best one. None of this made it particularly helpful, and the bulk of the time was spent trying to spot where it’s lying or confabulating to me in its arguments or proofs (which it ~always did).
It was kind of okay for tasks like “here’s a toy setup, use a well-known formula to compute the relationships between A and B”, or “try to rearrange this expression into a specific form using well-known identities”, which are relatively menial and freed up my working memory for more complicated tasks. But it’s pretty minor usefulness (and you have to re-check the outputs for errors anyway).
I assume there are math problems at which they do okay, but that capability sure is brittle. I don’t want to overupdate here, but geez, getting LLMs from here to the Singularity in 2-3 years just doesn’t feel plausible.
Nod.
[disclaimer, not a math guy, only barely knows what he’s talking about, if this next thought is stupid I’m interested to learn more]
I don’t expect this to fix it right now, but one thing I don’t think you listed is doing the work in Lean or some other proof assistant that lets you check results immediately? I expect LLMs to first be able to do math in that format because it’s the format you can actually do a lot of training in. And it’d mean you can verify results more quickly.
My current vague understanding is that Lean is normally too cumbersome to be reasonable to work in, but that’s the sort of thing that could change with LLMs in the mix.
I agree that it’s a promising direction.
I did actually try a bit of that back in the o1 days. What I’ve found is that getting LLMs to output formal Lean proofs is pretty difficult: they really don’t want to do that. When they’re not making mistakes, they use informal language as connective tissue between Lean snippets, they put in “sorry”s (a placeholder that makes a lemma evaluate as proven), and otherwise try to weasel out of it.
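(To illustrate with a made-up toy lemma rather than anything from those experiments: in Lean 4, “sorry” closes any goal, so a file full of them type-checks with only warnings while proving nothing.)

```lean
-- Toy example: `sorry` makes this compile with only a warning,
-- even though nothing has actually been proven.
theorem toy_comm (a b : Nat) : a + b = b + a := by
  sorry

-- An honest proof of the same statement, for contrast.
theorem toy_comm' (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```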
This is something that should be solvable by fine-tuning, but at the time, there weren’t any publicly available decent models fine-tuned for that.
We do have DeepSeek-Prover-V2 now, though. I should look into it at some point. But I am not optimistic, sounds like it’s doing the same stuff, just more cleverly.
Relevant: Terence Tao does find them helpful for some Lean-related applications.
yeah, it’s less that I’d bet it works now, just, whenever it DOES start working, it seems likely it’d be through this mechanism.
⚖ If Thane Ruthenis thinks there are AI tools that can meaningfully help with Math by this point, did they first have a noticeable period (> 1 month) where it was easier to get work out of them via working in lean-or-similar? (Raymond Arnold: 25% & 60%)
(I had a bit of an epistemic rollercoaster making this prediction. I updated “by the time someone makes an actually worthwhile Math AI, even if Lean was an important part of its training process, it’s probably not that hard to do additional fine-tuning that gets it to output stuff in a more standard mathy format.” But, then, it seemed like it was still going to be important to quickly check it wasn’t blatantly broken as part of the process.)
There’s common ways I currently use (the free version of) ChatGPT that are partially categorizable as “somewhat better search engine”, but where I feel like that’s not representative of the real differences. A lot of this is coding-related, but not all, and the reasons I use it for coding-related and non-coding-related tasks feel similar. When it is coding-related, it’s generally not of the form of asking it to write code for me that I’ll then actually put into a project, though occasionally I will ask for example snippets which I can use to integrate the information better mentally before writing what I actually want.
The biggest difference in feel is that a chat-style interface is predictable and compact and avoids pushing a full-sized mental stack frame and having to spill all the context of whatever I was doing before. (The name of the website Stack Exchange is actually pretty on point here, insofar as they were trying to provide something similar from crowdsourcing!) This is something I can see being a source of creeping mental laziness—but it depends on the size and nature of the rest of the stack: if you were already under high context-retention load relative to your capabilities, and you’re already task-saturated enough, and you use a chatbot for leaf calls that would otherwise cause you to have to do a lot of inefficient working-memory juggling, then it seems like you’re already getting a lot of the actually-useful mental exercise at the other levels and you won’t be eliminating much of it, just getting some probabilistic task speedups.
In roughly descending order of “qualitatively different from a search engine” (which is not the same as “most impactful to me in practice”):
Some queries are reverse concept search, which to me is probably the biggest and hardest-to-replicate advantage over traditional search engine: I often have the shape of a concept that seems useful, but because I synthesized it myself rather than drawing from popular existing uses, I don’t know what it’s called. This can be checked for accuracy using a traditional search engine in the forward direction once I have the candidate term.
Some queries are for babble purposes: “list a bunch of X” and I’ll throw out 90% of them for actual use but use the distribution to help nudge my own imagination—generally I’ll do my own babble first and then augment it, to limit priming effects. There’s potential for memetic health issues here, but in my case most of these are isolated enough that I don’t expect them to combine to create larger problems. (In a qualitatively similar way but with a different impact, some of it is pure silliness. “Suppose the protagonists of Final Fantasy XIII had Geese powers. What kind of powers might they have?”)
Synthesis and shaping of information is way different from search engine capabilities. This includes asking for results tailored along specific axes I care about where it’s much less likely an existing webpage author has used that as a focus, small leaps of connective reasoning that would take processing and filtering through multiple large pages to do via search engine, and comparisons between popular instances of a class (in coding contexts, often software components) where sometimes someone’s actually written up the comparison and sometimes not. Being able to fluently ask followups that move from a topic to a subtopic or related topic without losing all the context is also very useful. “Tell me about the main differences between X1 and X2.” → “This new thing introduced in X2, is that because of Y?” (but beware of sycophancy biases if you use leading questions like that)
(Beyond this point we get closer to “basically a search engine”.)
Avoiding the rise in Web annoyances is a big one in practice—which ties into the weird tension of social contracts around Internet publishing being kind of broken right now, but from an information-consumer perspective, the reprocessed version is often superior. If a very common result is that a search engine will turn up six plausible results, and three of them are entirely blog slop (often of a pre-LLM type!) which is too vague to be useful for me, two of them ask me to sign up for a ‘free’ account to continue but only after I’ve started reading the useless intro text, and one of them contains the information I need in theory but I have to be prepared to click the “reject cookies” button, and click the close button on the “get these delivered to your inbox” on-scroll popup, and hope it doesn’t load another ten-megabyte hero image that I don’t care about and chew through my cellular quota in the process, and if I try to use browser extensions to combat this then the text doesn’t load, and so on and so on… then obviously I will switch to asking the chatbot first! “most of the content is buried in hour-long videos” is skew to this but results in the same for me.
In domains like “how would I get started learning skill X”, where there’s enough people who can get a commercial advantage through SEO’ing that into “well, take our course or buy our starter kit” (but usually subtler than that), those results seem (and I think for now probably are) less trustworthy than chatbot output that goes directly to concrete aspects that can be checked more cleanly, and tend to disguise themselves to be hard to filter out without reading a lot of the way through. Of course, there’s obvious ways for this not to last, either as SEO morphs into AIO or if the chatbot providers start selling the equivalent of product placement behind the scenes.
(fwiw, I never felt like phones offered any real “you need them to not fall behind”. They are kinda a nice-to-have in some situations. I do need them for Uber/Lyft and maps, I use them for other things which have some benefits and costs, and this post is upweighting “completely block the internet on my phone.” I don’t have any social media apps on my phone but it doesn’t matter much, I just use the web browser)
I imagine this differs a lot based on what social position you’re already in and where you’re likely to get your needs met. When assumptions like “everyone has a smartphone” become sufficiently widespread, you can be blocked off from things unpredictably when you don’t meet them. You often can’t tell which things these are in advance: simplification pressure causes a phase transition from “communicated request” to “implicit assumption”, and there’s too many widely-distributed ways for the assumption to become relevant, so doing your own modeling will produce a “reliably don’t need” result so infrequently as to be effectively useless. Then, if making the transition to conformity when you notice a potential opportunity is too slow or is blocked by e.g. resource constraints or value differences, a lot of instant-lose faces get added to the social dice you roll. If your anticipated social set is already stable and well-adapted to you, you may not be rolling many dice, but if you’re precarious, or searching for breakthrough opportunities, or just have a role with more wide-ranging and unpredictable requirements on which interactions you need to succeed at, it’s a huge penalty. Other technologies this often happens with in the USA, again depending on your social class and milieu, include cars, credit cards, and Facebook accounts.
(It feels like there has to already be an explainer for this somewhere in the LW-sphere, right? I didn’t see an obvious one, though…)
yeah a friend of mine gave in because she was getting so much attitude about needing people to give her directions.
You’ve reminded me of a perspective I was meaning to include but then forgot to, actually. From the perspective of an equilibrium in which everyone’s implicitly expected to bring certain resources/capabilities as table stakes, making a personal decision that makes your life better but reduces your contribution to the pool can be seen as defection—and on a short time horizon or where you’re otherwise forced to take the equilibrium for granted, it seems hard to refute! (ObXkcd: “valuing unit standardization over being helpful possibly makes me a bad friend” if we take the protagonist as seeing “US customary units” as an awkward equilibrium.) Some offshoots of this which I’m not sure what to make of:
If the decision would lead to a better society if everyone did it, and leads to an improvement for you if only you do it, but requires the rest of a more localized group to spend more energy to compensate for you if you do it and they don’t, we have a sort of “incentive misalignment sandwich” going on. In practice I think there’s usually enough disagreement about the first point that this isn’t clear-cut, but it’s interesting to notice.
In the face of technological advances, what continues to count as table stakes tends to get set by Moloch and memetic feedback loops rather than intentionally. In a way, people complaining vociferously about having to adopt new things are arguably acting in a counter-Moloch role here, but in the places I’ve seen that happen, it’s either been ineffective or led to a stressful and oppressive atmosphere of its own (or, most commonly and unfortunately, both).
I think intuitive recognition of (2) is a big motivator behind attacking adopters of new technology that might fall into this pattern, in a way that often gets poorly expressed in a “tech companies ruin everything” type of way. Personally taking up smartphones, or cars, or—nowadays the big one that I see in my other circles—generative AI, even if you don’t yourself look down on or otherwise directly negatively impact non-users, can be seen as playing into a new potential equilibrium where if you can, you ‘must’, or else you’re not putting in as much as everyone else, and so everyone else will gradually find that they get boxed in and any negative secondary effects on them are irrelevant compared to the phase transition energy. A comparison that comes to mind is actually labor unions; that’s another case where restraining individually expressed capabilities in order to retain a better collective bargaining position for others comes into play, isn’t it?
(Now much more tangentially:)
… hmm, come to think of it, maybe part of conformity-pressure in general can be seen as a special case of this where the pool resource is more purely “cognition and attention spent dealing with non-default things” and the nonconformity by default has more of a purely negative impact on that axis, whereas conformity-pressure over technology with specific capabilities causes the nature of the pool resource to be pulled in the direction of what the technology is providing and there’s an active positive thing going on that becomes the baseline… I wonder if anything useful can be derived from thinking about those two cases as denoting an axis of variation.
And when the conformity is to a new norm that may be more difficult to understand but produces relative positive externalities in some way, is that similar to treating the new norm as a required table stakes cognitive technology?
I mostly use it for syntax, and formatting/modifying docs, giving me quick visual designs...
I am perhaps an interesting corner case. I make extremely heavy use of LLMs, largely via APIs for repetitive tasks. I sometimes run a quarter million queries in a day, all of which produce structured output. Incorrect output happens, but I design the surrounding systems to handle that (a rough sketch of the kind of pattern I mean is at the end of this comment).
A few times a week, I might ask a concrete question and get a response, which I treat with extreme skepticism.
But I don’t talk to the damn things. That feels increasingly weird and unwise.
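(For the unfamiliar, here is a minimal sketch of the general “design the surrounding systems to handle incorrect output” pattern; the JSON schema, retry budget, and call_llm placeholder are illustrative assumptions, not a description of the actual pipeline above.)

```python
import json
from typing import Optional

REQUIRED_KEYS = {"label", "confidence"}  # illustrative schema for the structured output


def call_llm(prompt: str) -> str:
    """Placeholder for whatever API client is actually in use; returns raw model text."""
    raise NotImplementedError


def query_structured(prompt: str, max_retries: int = 3) -> Optional[dict]:
    """Ask for JSON, validate the shape, retry a few times, and fail gracefully."""
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: just try again
        if isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed):
            return parsed
        # wrong shape: also retry
    return None  # the caller treats None as "this item failed", not as a crash
```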
Agree about phones (in fact I am seriously considering switching to a flip phone and using my iPhone only for things like navigation).
Not so sure about LLMs. I had your attitude initially, and I still consider them an incredibly dangerous mental augmentation. But I do think that conservatively throwing a question at them to find searchable keywords is helpful, if you maintain the attitude that they are actively trying to take over your brain and therefore remain vigilant.
Why do you think LLMs are moving people backwards? With phones, it was their attention-sucking nature. What is it with LLMs?
Not speaking for John, but I think LLMs can cause a lack of gears-level understanding, more vibe coding, less mental flexibility due to lack of deliberate thought, and more dependency on them for thinking in general. A lot of my friends will most likely never learn coding properly and rely solely on ChatGPT; it would be similar to calculators (which reduced people’s ability to do mental maths) but for thinking.
LLM’s danger lies in its ability to solve the majority of simple problems. This reduces opportunities to learn skills or benefit from the training these tasks provide. This allows for a level of mental stagnation, or even degradation, depending on how frequently you use LLMs to handle problems. In other words, it induces mental laziness. This is one way it’s not moving people forward, and in more severe cases is moving them backward.
As a side note, it is also harmful to the majority of current education institutions, as it can solve most academic problems. I have personally seen people use it to do homework, write essays, or even write term papers. Some of the more crafty students manage to cheat with it on exams. This creates a very shallow education, which is bad for many reasons.
Setting aside cheating, do you think LLMs are diminishing opportunities for thought, or redistributing them to other topics? And why?
Yes, I do think that. They don’t actively diminish thought, after all, it’s a tool you decide to use. But when you use it to handle a problem, you lose the thoughts, and the growth you could’ve had solving it yourself. It could be argued, however, that if you are experienced enough in solving such problems, there isn’t much to lose, and you gain time to pursue other issues.
But as to why I think this way: people already don’t learn skills because ChatGPT can do it for them; as lesswronguser123 said, “A lot of my friends will most likely never learn coding properly and rely solely on ChatGPT”, and not just his friends use it this way. Such people, at the very least, lose the opportunity to adopt a programming mindset, which is useful beyond programming.
Outside of people not learning skills, I also believe there is a lot of potential to delegate almost all of your thinking to ChatGPT. For example: I could have used it to write this response, decide what to eat for breakfast, tell me what I should do in the future, etc. It can tell you what to do on almost every day-to-day decision. Some use it to a lesser extent, some to a greater, but you do think less if you use it this way.
Does it redistribute thinking to another topic? I believe it depends on the person in question; some use it to have more time to solve a more complex problem, others to have more time for entertainment.
I think that these are genuinely hard questions to answer in a scientific way. My own speculation is that using AI to solve problems is a skill of its own, along with recognizing which problems they are currently not good for. Some use of LLMs teaches these skills, which is useful.
I think a potential failure mode for AI might be when people systematically choose to work on lower-impact problems that AI can be used to solve, rather than higher-impact problems that AI is less useful for but that can be solved in other ways. Of course, AI can also increase people’s ambitions by unlocking the ability to pursue higher-impact goals they would not have been able to otherwise achieve. Whether or not AI increases or decreases human ambition on net seems like a key question.
In my world, I see limited use of AI except as a complement to traditional internet search, a coding assistant by competent programmers, a sort of Grammarly on steroids, an OK-at-best tutor that’s cheap and always available on any topic, and a way to get meaningless paperwork done faster. These use cases all seem basically ambition-enhancing to me. That’s the reason I asked John why he’s worried about this version of AI. My experience is that once I gained some familiarity with the limitations of AI, it’s been a straightforwardly useful tool, with none of the serious downsides I have experienced from social media and smartphones.
The issues I’ve seen seem to have to do with using AI to deepfake political policy proposals, homework, blog posts, and job applications. These are genuine and serious problems, but mainly have to do with adding a tremendous amount of noise to collective discourse rather than the self-sabotage enabled by smartphones and social media. So I’m wondering if John’s more concerned about those social issues or by some sort of self-sabotage capacity from AI that I’m not seeing. Using AI to do your homework is obviously self-sabotage, but given the context I’m assuming that’s not what John’s talking about.
I mean, they’re great as search engines or code-snippet writers (basically, search engine for standard functions). If someone thinks that gippities know stuff or can think or write well, that could be brainrotting.
Agreed, that’s basically how I use them.
...but you are using a phone now. Are you using LLMs? Maybe in both cases it is about using the tool in the way that benefits most?
From my perspective, good things about smartphones:
phone and camera and navigation is the same device
very rarely, check something online
buy tickets for mass transit
my contacts are backed up in the cloud
Bad things:
notifications
The advantages outweigh the disadvantages, but it requires discipline about what you install.
(Food for thought: if only I had the same discipline about which web services I create an account for and put them into bookmarks on my PC.)
Similar here, but that’s because no one could give me a good use case. (I don’t consider social networks on smartphone to be good.)
And it’s probably similar with LLMs, depends on how specifically you use them. I use them to ask questions (like a smarter version of Google) that I try to verify e.g. on Wikipedia afterwards, and sometimes to write code. Those seem like good things to me. There are probably bad ways to use them, but that is not what I would typically do.
My main concern with heavy LLM usage is what Paul Graham discusses in Writes and Write-Nots. His argument is basically that writing is thinking, and that if you use LLMs to do your writing for you, well, your ability to think will erode.
I’m similar, for both smart phones and LLM usage.
For smart phones there was one argument that moved me a moderate amount. I’m a web developer and startup founder. I was talking to my cousin’s boyfriend who is also in tech. He made the argument to me that if I don’t actively use smart phones I won’t be able to empathize as much with smart phone users, which is important because to a meaningful extent, that’s who I’m building for.
I didn’t think the empathy point was as strong as my cousin’s boyfriend thought it was. Like, he seemed to think it was pretty essential and that if I don’t use smart phones I just wouldn’t be able to develop enough empathy to build a good product. I, on the other hand, saw it as something “useful” but not “essential”. Looking back, I think I’d downgrade it to something like “a little useful” instead of “useful”.
I’m not sure where I’m going with this, exactly. Just kinda reflecting and thinking out loud.
Conditional on LLMs scaling to AGI, I feel like it’s a contradiction to say that “LLMs offer little or negative utility AND it’s going to stay this way”. My model is that we are either dying in a couple years to LLMs getting us to AGI, in which case we are going to have a year or two of AIs that can provide incredible utility, or we are not dying to LLMs and the timelines are longer.
I think I read somewhere that you don’t believe LLMs will get us to AGI, so this might already be implicit in your model? I personally am putting at least some credence on the ai-2027 model, which predicts superhuman coders in the near future. (Not saying that I believe this is the most probable future, just that I find it convincing enough that I want to be prepared for it.)
Up until recently I was in the “LLMs offer zero utility” camp (for coding), but now at work we have a Cursor plan (still would not pay for it for personal use probably), and with a lot of trial and error I feel like I am finding the kinds of tasks where AIs can offer a bit of utility, and I am slowly moving towards the “marginal utility” camp.
One kind of thing I like using it for is small scripts to automate bits of my workflow. E.g. I have an idea for a script, I know it would take me 30m-1h to implement it, but it’s not worth it because e.g. it would only save me a few seconds each time. But if I can reduce the time investment to only a few minutes by giving the task to the LLM, it can suddenly be worth it.
I would be interested in other people’s experiences with the negative side effects of LLM use. What are the symptoms/warning signs of “LLM brain rot”? I feel like with my current use I am relatively well equipped to avoid that:
I only ask things from LLMs that I know I could solve in a few hours tops.
I code review the result, tell it if it did something stupid.
90% of my job is stuff that is currently not close to being LLM automatable anyway.
Hypothesis: for smart people with a strong technical background, the main cognitive barrier to doing highly counterfactual technical work is that our brains’ attention is mostly steered by our social circle. Our thoughts are constantly drawn to think about whatever the people around us talk about. And the things which are memetically fit are (almost by definition) rarely very counterfactual to pay attention to, precisely because lots of other people are also paying attention to them.
Two natural solutions to this problem:
build a social circle which can maintain its own attention, as a group, without just reflecting the memetic currents of the world around it.
“go off into the woods”, i.e. socially isolate oneself almost entirely for an extended period of time, so that there just isn’t any social signal to be distracted by.
These are both standard things which people point to as things-historically-correlated-with-highly-counterfactual-work. They’re not mutually exclusive, but this model does suggest that they can substitute for each other—i.e. “going off into the woods” can substitute for a social circle with its own useful memetic environment, and vice versa.
One thing that I do after social interactions, especially those which pertain to my work, is to go over all the updates my background processing is likely to make and to question them more explicitly.
This is helpful because I often notice that the updates I’m making aren’t related to reasons much at all. It’s more like “ah they kind of grimaced when I said that, so maybe I’m bad?” or like “they seemed just generally down on this approach, but wait are any of those reasons even new to me? Haven’t I already considered those and decided to do it anyway?” or “they seemed so aggressively pessimistic about my work, but did they even understand what I was saying?” or “they certainly spoke with a lot of authority, but why should I trust them on this, and do I even care about their opinion here?” Etc. A bunch of stuff which at first blush my social center is like “ah god, it’s all over, I’ve been an idiot this whole time” but with some second glancing it’s like “ah wait no, probably I had reasons for doing this work that withstand surface level pushback, let’s remember those again and see if they hold up” And often (always?) they do.
This did not come naturally to me; I’ve had to train myself into doing it. But it has helped a lot with this sort of problem, alongside the solutions you mention i.e. becoming more of a hermit and trying to surround myself by people engaged in more timeless thought.
solution 2 implies that a smart person with a strong technical background would go on to work on important problems (by default), which is not necessarily universally true; it’s IMO likely that many such people would be working on less important things than what their social circle is otherwise steering them to work on
The claim is not that either “solution” is sufficient for counterfactuality, it’s that either solution can overcome the main bottleneck to counterfactuality. After that, per Amdahl’s Law, there will still be other (weaker) bottlenecks to overcome, including e.g. keeping oneself focused on something important.
I don’t think the social thing ranks above “be able to think useful important thoughts at all”. (But maybe otherwise agree with the rest of your model as an important thing to think about)
[edit: hrm, “for smart people with a strong technical background” might be doing most of the work here]
Plausibly going off into the woods decreases the median output while increasing the variance.
Why do you think this? When I try to think of concrete examples here, it’s all confounded by the relevant smart people having social circles not working on useful problems.
I also think that 2 becomes more true once the relevant smart person already wants to solve alignment, or otherwise is already barking up the right tree.
One need not go off into the woods indefinitely, though.
I don’t think I implied that John’s post implied that and I don’t think going into the woods non-indefinitely mitigates the thing I pointed out.
As a counterpoint to the “go off into the woods” strategy, Richard Hamming said the following in “You and Your Research”, describing his experience at Bell Labs:
Bell Labs certainly produced a lot of counterfactual research, Shannon’s information theory being the prime example. I suppose Bell Labs might have been well-described as a group that could maintain its own attention, though.
Bell Labs is actually my go-to example of a much-hyped research institution whose work was mostly not counterfactual; see e.g. here. Shannon’s information theory is the only major example I know of highly counterfactual research at Bell Labs. Most of the other commonly-cited advances, like e.g. transistors or communication satellites or cell phones, were clearly not highly counterfactual when we look at the relevant history: there were other groups racing to make the transistor, and the communication satellite and cell phones were both old ideas waiting on the underlying technology to make them practical.
That said, Hamming did sit right next to Shannon during the information theory days IIRC, so his words do carry substantial weight here.
solution 3 is to be an iconoclast and to feel comfortable pushing against the flow and to try to prove everyone else wrong.
Good idea, but… I would guess that basically everyone who knew me growing up would say that I’m exactly the right sort of person for that strategy. And yet, in practice, I still find it has not worked very well. My attention has in fact been unhelpfully steered by local memetic currents to a very large degree.
For instance, I do love proving everyone else wrong, but alas reversed stupidity is not intelligence. People mostly don’t argue against the high-counterfactuality important things, they ignore the high-counterfactuality important things. Trying to prove them wrong about the things they do argue about is just another way of having one’s attention steered by the prevailing memetic currents.
This is true, but I still can’t let go of the fact that this fact itself ought to be a blindingly obvious first-order bit that anyone who calls zerself anything like “aspiring rationalist” would be paying a good chunk of attention to, and yet this does not seem to be the case. Like, motions in the genre of
where XYZ could centrally be things like e.g. copium or subtly contemptuous indifference, do not seem to be at all common motions.
Accusing people in my head of not being numerate enough when this happens has helped, because then I don’t want to be a hypocrite. GPT4o or o1 are good at fermi estimates, making this even easier.
Note that it is not necessary for the social circle to share your beliefs, only to have a social norm that people express interest in each other’s work. Could be something like: once or twice in a week the people will come to a room and everyone will give a presentation about what they have achieved recently, and maybe the other people will provide some feedback (not in the form of “why don’t you do Y instead”, but with the assumption that X is a thing worth doing).
How would this model treat mathematicians working on hard open problems? P vs NP might be counterfactual just because no one else is smart enough or has the right advantage to solve it. Insofar as central problems of a field have been identified but not solved, I’m not sure your model gives good advice.
I visited Mikhail Khovanov once in New York to give a seminar talk, and after it was all over and I was wandering around seeing the sights, he gave me a call and offered a long string of general advice on how to be the kind of person who does truly novel things (he’s famous for this, you can read about Khovanov homology). One thing he said was “look for things that aren’t there” haha. It’s actually very practical advice, which I think about often and attempt to live up to!
What else did he say? (I’d love to hear even the “obvious” things he said.)
I’m ashamed to say I don’t remember. That was the highlight. I think I have some notes on the conversation somewhere and I’ll try to remember to post here if I ever find it.
I can spell out the content of his Koan a little, if it wasn’t clear. It’s probably more like: look for things that are (not there). If you spend enough time in a particular landscape of ideas, you can (if you’re quiet and pay attention and aren’t busy jumping on bandwagons) get an idea of a hole, which you’re able to walk around but can’t directly see. In this way new ideas appear as something like residues from circumnavigating these holes. It’s my understanding that Khovanov homology was discovered like that, and this is not unusual in mathematics.
By the way, that’s partly why I think the prospect of AIs being creative mathematicians in the short term should not be discounted; if you see all the things, you see all the holes.
For those who might not have noticed Dan’s clever double entendre: (Khovanov) homology is literally about counting/measuring holes in weird high-dimensional spaces—designing a new homology theory is in a very real sense about looking for holes that are not (yet) there.
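(For readers without the background, the “counting holes” point is literal; the Betti numbers read off from homology count independent holes in each dimension:)

```latex
b_k = \operatorname{rank} H_k(X):\qquad
b_0 = \#\{\text{connected components}\},\quad
b_1 = \#\{\text{independent loops}\},\quad
b_2 = \#\{\text{enclosed voids}\},\ \dots
```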
Are there any examples yet, of homology or cohomology being applied to cognition, whether human or AI?
There’s plenty, including a line of work by Carina Curto, Kathryn Hess and others that is taken seriously by a number of mathematically inclined neuroscience people (Tom Burns, if he’s reading, can comment further). As far as I know this kind of work is the closest to breaking through into the mainstream. At some level you can think of homology as a natural way of preserving information in noisy systems, for reasons similar to why (co)homology of tori was a useful way for Kitaev to formulate his surface code. Whether or not real brains/NNs have some emergent computation that makes use of this is a separate question; I’m not aware of really compelling evidence.
There is more speculative but definitely interesting work by Matilde Marcolli. I believe Manin has thought about this (because he’s thought about everything) and if you have twenty years to acquire the prerequisites (gamma spaces!) you can gaze into deep pools by reading that too.
No.
Topological data analysis comes closest, and there are some people who try to use it for ML, eg.
Though my understanding is this is used in interp not so much because people necessarily expect deep connections to homology, but because it’s just another way to look for structure in your data.
TDA itself is also a relatively shallow tool. (A minimal sketch of the standard workflow is below, for the unfamiliar.)
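A minimal sketch of that standard workflow, assuming the ripser package from the scikit-tda ecosystem; the noisy-circle dataset is just an illustration of “finding a hole in data”:

```python
import numpy as np
from ripser import ripser  # persistent homology, assuming scikit-tda's ripser is installed

# Illustrative data: noisy samples from a circle, which should show up
# as one long-lived 1-dimensional feature (a loop) in the persistence diagram.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
points = np.column_stack([np.cos(theta), np.sin(theta)])
points += 0.05 * rng.normal(size=points.shape)

diagrams = ripser(points)["dgms"]  # persistence diagrams for H0 and H1
h1 = diagrams[1]                   # (birth, death) pairs for 1-dimensional features
lifetimes = h1[:, 1] - h1[:, 0]    # long-lived features are "real" structure, short ones are noise
print(f"most persistent loop lifetime: {lifetimes.max():.2f}")
```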
As someone who does both data analysis and algebraic topology, my take is that TDA showed promise but ultimately there’s something missing such that it’s not at full capacity. Either the formalism isn’t developed enough or it’s being consistently used on the wrong kinds of datasets. Which is kind of a shame, because it’s the kind of thing that should work beautifully and in some cases even does!
I thought it might be “look for things that might not even be there as hard as you would if they are there.” Then the koan form takes it closer to “the thereness of something just has little relevance on how hard you look for it.” But it needs to get closer to the “biological” part of your brain, where you’re not faking it with all your mental and bodily systems, like when your blood pressure rises from “truly believing” a lion is around the corner but wouldn’t if you “fake believe” it.
I imagine it’s something like “look for things that are notably absent, when you would expect them to have been found if there”?
Some things even withdraw. https://tsvibt.blogspot.com/2023/05/the-possible-shared-craft-of-deliberate.html#aside-on-withdrawal-and-the-leap https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html#withdrawal
Obvious point—I think a lot of this comes from the financial incentives. The more “out of the box” you go, the less sure you can be that there will be funding for your work.
Some of those that do this will be rewarded, but I suspect many won’t be.
As such, I think that funders can help more to encourage this sort of thing, if they want to.
Conjecture’s Compendium is now up. It’s intended to be a relatively-complete intro to AI risk for nontechnical people who have ~zero background in the subject. I basically endorse the whole thing, and I think it’s probably the best first source to link e.g. policymakers to right now.
I might say more about it later, but for now just want to say that I think this should be the go-to source for new nontechnical people right now.
I think there’s something about Bay Area culture that can often get technical people to feel like the only valid way to contribute is through technical work. It’s higher status and sexier and there’s a default vibe that the best way to understand/improve the world is through rigorous empirical research.
I think this an incorrect (or at least incomplete) frame, and I think on-the-margin it would be good for more technical people to spend 1-5 days seriously thinking about what alternative paths they could pursue in comms/policy.
I also think there are memes spreading around that you need to be some savant political mastermind genius to do comms/policy, otherwise you will be net negative. The more I meet policy people (including successful policy people from outside the AIS bubble), the more I think this narrative was, at best, an incorrect model of the world. At worst, a take that got amplified in order to prevent people from interfering with the AGI race (e.g., by granting excess status+validity to people/ideas/frames that made it seem crazy/unilateralist/low-status to engage in public outreach, civic discourse, and policymaker engagement.)
(Caveat: I don’t think the adversarial frame explains everything, and I do think there are lots of people who were genuinely trying to reason about a complex world and just ended up underestimating how much policy interest there would be and/or overestimating the extent to which labs would be able to take useful actions despite the pressures of race dynamics.)
I think I probably agree, although I feel somewhat wary about it. My main hesitations are:
The lack of epistemic modifiers seems off to me, relative to the strength of the arguments they’re making. Such that while I agree with many claims, my imagined reader who is coming into this with zero context is like “why should I believe this?” E.g., “Without intervention, humanity will be summarily outcompeted and relegated to irrelevancy,” which like, yes, but also—on what grounds should I necessarily conclude this? They gave some argument along the lines of “intelligence is powerful,” and that seems probably true, but imo not enough to justify the claim that it will certainly lead to our irrelevancy. All of this would be fixed (according to me) if it were framed more as like “here are some reasons you might be pretty worried,” of which there are plenty, or “here’s what I think,” rather than “here is what will definitely happen if we continue on this path,” which feels less certain/obvious to me.
Along the same lines, I think it’s pretty hard to tell whether this piece is in good faith or not. E.g., in the intro Connor writes “The default path we are on now is one of ruthless, sociopathic corporations racing toward building the most intelligent, powerful AIs as fast as possible to compete with one another and vie for monopolization and control of both the market and geopolitics.” Which, again, I don’t necessarily disagree with, but my imagined reader with zero context is like “what, really? sociopaths? control over geopolitics?” I.e., I’m expecting readers to question the integrity of the piece, and to be more unsure of how to update on it (e.g. “how do I know this whole thing isn’t just a strawman?” etc.).
There are many places where they kind of just state things without justifying them much. I think in the best case this might cause readers to think through whether such claims make sense (either on their own, or by reading the hyperlinked stuff—both of which put quite a lot of cognitive load on them), and in the worst case just causes readers to either bounce or kind of blindly swallow what they’re saying. E.g., “Black-Box Evaluations can only catch all relevant safety issues insofar as we have either an exhaustive list of all possible failure modes, or a mechanistic model of how concrete capabilities lead to safety risks.” They say this without argument and then move on. And although I agree with them (having spent a lot of time thinking this through myself), it’s really not obvious at first blush. Why do you need an exhaustive list? One might imagine, for instance, that a small number of tests would generalize well. And do you need mechanistic models? Sometimes medicines work safely without that, etc., etc. I haven’t read the entire Compendium closely, but my sense is that this is not an isolated incident. And I don’t think this is a fatal flaw or anything—they’re moving through a ton of material really fast and it’s hard to give a thorough account of all claims—but it does make me more hesitant to use it as the default “here’s what’s happening” document.
All of that said, I do broadly agree with the set of arguments, and I think it’s a really cool activity for people to write up what they believe. I’m glad they did it. But I’m not sure how comfortable I feel about sending it to people who haven’t thought much about AI.
One of the common arguments in favor of investing more resources into current governance approaches (e.g., evals, if-then plans, RSPs) is that there’s nothing else we can do. There’s not a better alternative– these are the only things that labs and governments are currently willing to support.
The Compendium argues that there are other (valuable) things that people can do, with most of these actions focusing on communicating about AGI risks. Examples:
One possible critique is that their suggestions are not particularly ambitious. This is likely because they’re writing for a broader audience (people who haven’t been deeply engaged in AI safety).
For people who have been deeply engaged in AI safety, I think the natural steelman here is “focus on helping the public/government better understand the AI risk situation.”
There are at least some impactful and high-status examples of this (e.g., Hinton, Bengio, Hendrycks). I think in the last few years, for instance, most people would agree that Hinton/Bengio/Hendrycks have had far more impact through their communications/outreach/policy work than through their technical research work.
And it’s not just the famous people– I can think of ~10 junior or mid-career people who left technical research in the last year to help policymakers better understand AI progress and AI risk, and I think their work is likely far more impactful than if they had stayed in technical research. (And I’m even excluding people who are working on evals/if-then plans: like, I’m focusing on people who see their primary purpose as helping the public or policymakers develop “situational awareness”, develop stronger models of AI progress and AI risk, understand the conceptual arguments for misalignment risk, etc.)
I appreciated their section on AI governance. The “if-then”/RSP/preparedness frame has become popular, and they directly argue for why they oppose this direction. (I’m a fan of preparedness efforts– especially on the government level– but I think it’s worth engaging with the counterarguments.)
Pasting some content from their piece below.
High-level thesis against current AI governance efforts:
Critique of reactive frameworks:
Critique of waiting for warning shots:
This seems to be confusing a dangerous capability eval (of being able to ‘deceive’ in a visible scratchpad) with an assessment of alignment, which seems like exactly what the ‘questioning’ was about.
I like it. I do worry that it, and The Narrow Path, are both missing how hard it will be to govern and restrict AI.
My own attempt is much less well written and comprehensive, but I think I hit on some points that theirs misses: https://www.lesswrong.com/posts/NRZfxAJztvx2ES5LG/a-path-to-human-autonomy
(There was already a linkpost.)
NVIDIA Is A Terrible AI Bet
Short version: Nvidia’s only moat is in software; AMD already makes flatly superior hardware priced far lower, and Google probably does too but doesn’t publicly sell it. And if AI undergoes smooth takeoff on current trajectory, then ~all software moats will evaporate early.
Long version: Nvidia is pretty obviously in a hype-driven bubble right now. However, it is sometimes the case that (a) an asset is in a hype-driven bubble, and (b) it’s still a good long-run bet at the current price, because the company will in fact be worth that much. Think Amazon during the dot-com bubble. I’ve heard people make that argument about Nvidia lately, on the basis that it will be ridiculously valuable if AI undergoes smooth takeoff on the current apparent trajectory.
My core claim here is that Nvidia will not actually be worth much, compared to other companies, if AI undergoes smooth takeoff on the current apparent trajectory.
Other companies already make ML hardware flatly superior to Nvidia’s (in flops, memory, whatever), and priced much lower. AMD’s MI300x is the most obvious direct comparison. Google’s TPUs are probably another example, though they’re not sold publicly so harder to know for sure.
So why is Nvidia still the market leader? No secret there: it’s the CUDA libraries. Lots of (third-party) software is built on top of CUDA, and if you use non-Nvidia hardware then you can’t use any of that software.
That’s exactly the sort of moat which will disappear rapidly if AI automates most-or-all software engineering, and on current trajectory software engineering would be one of the earlier areas to see massive AI acceleration. In that world, it will be easy to move any application-level program to run on any lower-level stack, just by asking an LLM to port it over.
So in worlds where AI automates software engineering to a very large extent, Nvidia’s moat is gone, and their competition has an already-better product at already-lower price.
Why do you believe AMD and Google make better hardware than Nvidia?
The easiest answer is to look at the specs. Of course specs are not super reliable, so take it all with many grains of salt. I’ll go through the AMD/Nvidia comparison here, because it’s a comparison I looked into a few months back.
MI300x vs H100
Techpowerup is a third-party site with specs for the MI300x and the H100, so we can do a pretty direct comparison between those two pages. (I don’t know if the site independently tested the two chips, but they’re at least trying to report comparable numbers.) The H200 would arguably be more of a “fair comparison” since the MI300x came out much later than the H100; we’ll get to that comparison next. I’m starting with the MI300x vs H100 comparison because techpowerup has specs for both of them, so we don’t have to rely on either company’s bullshit-heavy marketing materials as a source of information. Also, even the H100 is priced 2-4x higher than the MI300x (~$30-45k vs ~$10-15k), so it’s not unfair to compare the two.
Key numbers (MI300x vs H100):
float32 TFLOPs: ~80 vs ~50
float16 TFLOPs: ~650 vs ~200
memory: 192 GB vs 80 GB (note that this is the main place where the H200 improves on the H100)
bandwidth: ~10 TB/s vs ~2 TB/s
… so the comparison isn’t even remotely close. The H100 is priced 2-4x higher but is utterly inferior in terms of hardware.
MI300x vs H200
I don’t know of a good third-party spec sheet for the H200, so we’ll rely on Nvidia’s page. Note that they report some numbers “with sparsity” which, to make a long story short, means those numbers are blatant marketing bullshit. Other than those numbers, I’ll take their claimed specs at face value.
Key numbers (MI300x vs H200):
float32 TFLOPs: ~80 vs ~70
float16 TFLOPs: don’t know, Nvidia conspicuously avoided reporting that number
memory: 192 GB vs 141 GB
bandwidth: ~10 TB/s vs ~5 TB/s
So they’re closer than the MI300x vs H100, but the MI300x still wins across the board. And pricewise, the H200 is probably around $40k, so 3-4x more expensive than the MI300x.
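For a rough sense of how the price gap compounds the spec gap, here is a quick back-of-the-envelope calculation using the approximate figures quoted above and the midpoints of the quoted price ranges. All of these numbers are rough, and the H200 float16 figure is missing, so treat the ratios as order-of-magnitude only.

```python
# Rough perf-per-dollar comparison using the approximate specs quoted above.
# Prices are midpoints of the quoted ranges; all figures are rough estimates.
chips = {
    "MI300x": {"fp32_tflops": 80, "fp16_tflops": 650, "mem_gb": 192, "bw_tbs": 10, "price_k": 12.5},
    "H100":   {"fp32_tflops": 50, "fp16_tflops": 200, "mem_gb": 80,  "bw_tbs": 2,  "price_k": 37.5},
    "H200":   {"fp32_tflops": 70, "fp16_tflops": None, "mem_gb": 141, "bw_tbs": 5, "price_k": 40.0},
}

for name, c in chips.items():
    per_k = lambda x: None if x is None else round(x / c["price_k"], 2)
    print(name,
          "fp32 TFLOPs/$k:", per_k(c["fp32_tflops"]),
          "| GB/$k:", per_k(c["mem_gb"]),
          "| TB/s per $k:", per_k(c["bw_tbs"]))
```

On these rough numbers the MI300x comes out several times ahead of either Nvidia chip on every per-dollar metric, which is just the comparison above restated in per-dollar terms.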
It’s worth noting that even if Nvidia is charging 2-4x more now, the ultimate question for competitiveness will be manufacturing cost for Nvidia vs AMD. If Nvidia has much lower manufacturing costs than AMD per unit of performance (but presumably a higher markup), then Nvidia might win out even if their product is currently worse per dollar.
Note also that price discrimination might be a big part of Nvidia’s approach. Scaling labs that are willing to go to great effort to drop compute costs by a factor of two are the subset of Nvidia’s customers to whom Nvidia would ideally prefer to offer lower prices. I expect that Nvidia will find a way to make this happen.
I’m holding a modest long position in NVIDIA (smaller than my position in Google), and expect to keep it for at least a few more months. I expect I only need NVIDIA margins to hold up for another 3 or 4 years for it to be a good investment now.
It will likely become a bubble before too long, but it doesn’t feel like one yet.
While the first-order analysis seems true to me, there are mitigating factors:
AMD appears to be bungling the reliability and speed of their GPUs, and probably will for another few years. (At least, this is my takeaway from following the TinyGrad saga on Twitter...) Their stock is not valued as it should be for a serious contender with good fundamentals, and I think this may stay the case for a while, if not forever if things are worse than I realize.
NVIDIA will probably have very-in-demand chips for at least another chip generation due to various inertias.
There aren’t many good-looking places for the large amount of money that wants to be long AI to go right now, and this will probably inflate prices for still a while across the board, in proportion to how relevant-seeming the stock is. NVDA rates very highly on this one.
So from my viewpoint I would caution against being short NVIDIA, at least in the short term.
No, the MI300x is not superior to Nvidia’s chips, largely because it costs >2x as much to manufacture as Nvidia’s chips.
Potential counterpoints:
If AI automates most, but not all, software engineering, moats of software dependencies could get more entrenched, because easier-to-use libraries have compounding first-mover advantages.
The disadvantages of AMD software development potentially need to be addressed at levels not accessible to an arbitrary feral automated software engineer in the wild, to make the stack sufficiently usable. (A lot of actual human software engineers would like the chance.)
NVIDIA is training their own AIs, who are pretty capable.
NVIDIA can invest their current profits. (Revenues, not stock valuations.)
I don’t think the advantages would necessarily compound—quite the opposite, there are diminishing returns and I expect ‘catchup’. The first-mover advantage neutralizes itself because a rising tide lifts all boats, and the additional data acts as a prior: you can define the advantage of a better model, due to any scaling factor, as equivalent to n additional datapoints. (See the finetuning transfer papers on this.) When a LLM can zero-shot a problem, that is conceptually equivalent to a dumber LLM which needs 3-shots, say. And so the advantages of a better model will plateau, and can be matched by simply some more data in-context—such as additional synthetic datapoints generated by self-play or inner-monologue etc. And the better the model gets, the more ‘data’ it can ‘transfer’ to a similar language to reach a given X% of coding performance. (Think about how you could easily transfer given access to an environment: just do self-play on translating any solved Python problem into the target language. You already, by stipulation, have an ‘oracle’ to check outputs of the target against, which can produce counterexamples.) To a sad degree, pretty much all programming languages are the same these days: ALGOL with C sugaring to various degrees and random ad hoc addons; a LLM which can master Python can master Javascript can master Typescript… The hard part is the non-programming-language parts, the algorithms and reasoning and being able to understand & model the implicit state updates—not memorizing the standard library of some obscure language.
So at some point, even if you have a model which is god-like at Python (at which point each additional Python datapoint adds basically next to nothing), you will find it is completely acceptable at JavaScript, say, or even your brand-new language with 5 examples which you already have on hand in the documentation. You don’t need ‘the best possible performance’, you just need some level of performance adequate to achieve your goal. If the Python is 99.99% on some benchmark, you are probably fine with 99.90% performance in your favorite language. (Presumably there is some absolute level like 99% at which point automated CUDA → ROCm becomes possible, and it is independent of whether some other language has even higher accuracy.) All you need is some minor reason to pay that slight non-Python tax. And that’s not hard to find.
Also, I suspect that the task of converting CUDA code to ROCm code might well fall into the ‘most’ category rather than being the holdout programming tasks. This is a category of code ripe for automation: you have, again by stipulation, correct working code which can be imitated and used as an oracle autonomously to brute force translation, which usually has very narrow specific algorithmic tasks (‘multiply this matrix by that matrix to get this third matrix; every number should be identical’), random test-cases are easy to generate (just big grids of numbers), and where the non-algorithmic parts also have simple end-to-end metrics (‘loss go down per wallclock second’) to optimize. Compared to a lot of areas, like business logic or GUIs, this seems much more amenable to tasking LLMs with. geohot may lack the follow-through to make AMD GPUs work, and plow through papercut after papercut, but there would be no such problem for a LLM.
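To make the “working code as an oracle” point concrete, here is a minimal sketch of the kind of autonomous check an LLM-driven port could run in a loop. `reference_matmul` and `ported_matmul` are hypothetical stand-ins for the original kernel path and the candidate translation; any counterexample found gets fed back into the next translation attempt.

```python
import numpy as np

# Hypothetical stand-ins: `reference_matmul` plays the role of the known-good
# original path, `ported_matmul` the candidate translation under test.
def reference_matmul(a, b):
    return a @ b

def ported_matmul(a, b):
    return a @ b   # imagine this is the generated port being checked

def fuzz_against_oracle(n_cases=100, size=64, tol=1e-4, seed=0):
    """Generate random test matrices and compare the port against the oracle."""
    rng = np.random.default_rng(seed)
    for _ in range(n_cases):
        a = rng.standard_normal((size, size)).astype(np.float32)
        b = rng.standard_normal((size, size)).astype(np.float32)
        if not np.allclose(reference_matmul(a, b), ported_matmul(a, b), atol=tol):
            return False   # counterexample found; hand it back to the translator
    return True

print(fuzz_against_oracle())
```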
So I agree with Wentworth that there seems to be a bit of a tricky transition here for Nvidia: it has never quite been worth the time & hassle to try to use an AMD GPU (although a few claim to have made it work out financially for them), because of the skilled labor and wallclock and residual technical risk and loss of ecosystem flexibility; but if LLM coding works out well enough and intelligence becomes ‘too cheap to meter’, almost all of that goes away. Even ordinary unsophisticated GPU buyers will be able to tell their LLM to ‘just make it work on my new GPU, OK? I don’t care about the details, just let me know when you’re done’. At this point, what is the value-add for Nvidia? If they cut down their fat margins and race to the bottom for the hardware, where do they go for the profits? The money all seems to be in the integration and services—none of which Nvidia is particularly good at. (They aren’t even all that good at training LLMs! The Megatron series was a disappointment, Megatron-NLG-530b is barely a footnote, and even the latest Nemo seems to barely match Llama-3-70b while being like 4x larger and thus more expensive to run.)
And this will be true of anyone who is relying on software lock-in: if the lock-in exists because it would take a lot of software-engineer time to do a reverse-engineering rewrite and replacement, then it’s in serious danger in a world where LLMs code at human level. In a world where you can hypothetically spin up a thousand SWEs on a cloud service, tell them, ‘write me an operating system like XYZ’, and they do so overnight while you sleep, durable software moats are going to require some sort of mysterious blackbox like a magic API; anything which is so modularized as to fit on your own computer is also sufficiently modularized as to easily clone & replace...
It’s probably worth mentioning that there’s now a licensing barrier to running CUDA specifically through translation layers: https://www.tomshardware.com/pc-components/gpus/nvidia-bans-using-translation-layers-for-cuda-software-to-run-on-other-chips-new-restriction-apparently-targets-zluda-and-some-chinese-gpu-makers
This isn’t a pure software-engineering-time lock-in; some of that money is going to go toward legal action looking for any hint that big targets have done the license-noncompliant thing.
Edit: Additionally, I don’t think a world where “most but not all” software engineering is automated is one where it will be a simple matter to spin up a thousand effective SWEs of that capability; I think there’s first a world where that’s still relatively expensive even if most software engineering is being done by automated systems. Paying $8000 for overnight service of 1000 software engineers would be a rather fine deal, currently, but still too much for most people.
I don’t think that will be at all important. You are creating alternate reimplementations of the CUDA API, you aren’t ‘translating’ or decompiling it. And if you are buying billions of dollars of GPUs, you can afford to fend off some Nvidia probes and definitely can pay $0.000008b periodically for an overnighter. (Indeed, Nvidia needing to resort to such Oracle-like tactics is a bear sign.)
While there’s truth in what you say, I also think a market that’s running thousands of software engineers is likely to be hungry for as many good GPUs as the current manufacturers can make. NVIDIA not being able to sustain a relative monopoly forever still doesn’t put it in a bad position.
People will hunger for all the GPUs they can get, but then that means that the favored alternative GPU ‘manufacturer’ simply buys out the fab capacity and does so. Nvidia has no hardware moat: they do not own any chip fabs, they don’t own any wafer manufacturers, etc. All they do is design and write software and all the softer human-ish bits. They are not ‘the current manufacturer’ - that’s everyone else, like TSMC or the OEMs. Those are the guys who actually manufacture things, and they have no particular loyalty to Nvidia. If AMD goes to TSMC and asks for a billion GPU chips, TSMC will be thrilled to sell the fab capacity to AMD rather than Nvidia, no matter how angry Jensen is.
So in a scenario like mine, if everyone simply rewrites for AMD, AMD raises its prices a bit and buys out all of the chip fab capacity from TSMC/Intel/Samsung/etc—possibly even, in the most extreme case, buying capacity from Nvidia itself, as it suddenly is unable to sell anything at the high prices it may be trying to defend, and is forced to resell its reserved chip fab capacity in the resulting liquidity crunch. (No point in spending chip fab capacity on chips you can’t sell at your target price when you aren’t sure what you’re going to do with them.) And if AMD doesn’t do so, then player #3 does so, and everyone rewrites again (which will be easier the second time as they will now have extensive test suites, two different implementations to check correctness against, documentation from the previous time, and AIs which have been further trained on the first wave of work).
But why would the profit go to NVIDIA, rather than TSMC? The money should go to the company with the scarce factor of production.
(… lol. That snuck in without any conscious intent to imply anything, yes. I haven’t even personally interacted with the open Nvidia models yet.)
I do think the analysis is a decent map to nibbling at NVIDIA’s pie share if you happen to be a competitor already—AMD, Intel, or Apple currently, to my knowledge, possibly Google depending what they’re building internally and if they decide to market it more. Apple’s machine learning ecosystem is a bit of a parallel one, but I’d be at least mildly interested in it from a development perspective, and it is making progress.
But when it comes to the hardware, this is a sector where it’s reasonably challenging to conjure a competitor out of thin air still, so competitor behavior—with all its idiosyncrasies—is pretty relevant.
Two questions on this.
First, if AI is a big value driver in a general economic sense, is your view that NVIDIA is overpriced relative to its future potential, or just that NVIDIA will underperform the other investment alternatives you see?
Second, and perhaps an odd and speculative (perhaps nonsense) thought: I would expect that in this area one might see some network effects in play as well, so I wonder whether that might impact the AI engineering decisions on software. Could the AI software solutions look toward maximising the value of the installed network (AIs work better on a common chip and code infrastructure), more than one would guess from looking at some isolated technical stats? A bit along the lines of why Betamax was displaced by VHS despite being a better technology. If so, then it seems possible that NVIDIA could remain a leader and enjoy its current pricing power (at least to some extent) for a fairly long period of time.
Apparently there already exists a CUDA alternative for non-Nvidia hardware: the open-source project ZLUDA. As far as I can tell it’s less performant than CUDA, and it has the same challenges that Firefox does when competing with Chromium-based browsers, which will only get worse as it gets more popular. But it’s something to track, at least.
AI that can rewrite CUDA is a ways off. It’s possible that it won’t be that far away in calendar time, but it is far away in terms of AI market growth and hype cycles. If GPT-5 does well, Nvidia will reap the gains more than AMD or Google.
Transpiling assembly code written for one OS/kernel to assembly code for another OS/kernel, while taking advantage of the full speed of the processor, is a completely different task from transpiling, say, Java code into Python.
Also, the hardware/software abstraction might break. A python developer can say hardware failures are not my problem. An assembly developer working at an AGI lab needs to consider hardware failures as lost wallclock time in their company’s race to AGI, and will try to write code so that hardware failures don’t cause the company to lose time.
GPT-4 definitely can’t do this type of work, and I’ll bet a lot of money GPT-5 can’t do it either. ASI can do it, but there are bigger considerations than whether Nvidia makes money there, such as whether we’re still alive and whether markets and democracy continue to exist. Making a guess of N for which GPT-N can get this done requires evaluating how hard of a software task this actually is, and your comment contains no discussion of this.
Have you looked at tinygrad’s codebase or spoken to George Hotz about this?
Shorting Nvidia might be tricky. I’d short Nvidia and go long TSM or an index fund to be safe at some point. Maybe now? Typically the highest-market-cap stock has poor performance after it claims that spot.
Here’s a side project David and I have been looking into, which others might have useful input on...
Background: Thyroid & Cortisol Systems
As I understand it, thyroid hormone levels are approximately-but-accurately described as the body’s knob for adjusting “overall metabolic rate” or the subjective feeling of needing to burn energy. Turn up the thyroid knob, and people feel like they need to move around, bounce their leg, talk fast, etc (at least until all the available energy sources are burned off and they crash). Turn down the thyroid knob, and people are lethargic.
That sounds like the sort of knob which should probably typically be set higher, today, than was optimal in the ancestral environment. Not cranked up to 11; hyperthyroid disorders are in fact dangerous and unpleasant. But at least set to the upper end of the healthy range, rather than the lower end.
… and that’s nontrivial. You can just dump the relevant hormones (T3/T4) into your body, but there’s a control system which tries to hold the level constant. Over the course of months, the thyroid gland (which normally produces T4) will atrophy, as the control system scales back the gland’s own production to keep T4 levels fixed. Just continuing to pump T3/T4 into your system regularly will keep you healthy—you’ll basically have a hypothyroid disorder, and supplemental T3/T4 is the standard treatment. But you better be ready to manually control your thyroid hormone levels indefinitely if you start down this path. Ideally, one would intervene further up the control loop in order to adjust the thyroid hormone set-point, but that’s more of a research topic than a thing humans already have lots of experience with.
So that’s thyroid. We can tell a similar story about cortisol.
As I understand it, the cortisol hormone system is approximately-but-accurately described as the body’s knob for adjusting/tracking stress. That sounds like the sort of knob which should probably be set lower, today, than was optimal in the ancestral environment. Not all the way down; problems would kick in. But at least set to the lower end of the healthy range.
… and that’s nontrivial, because there’s a control loop in place, etc. Ideally we’d intervene on the relatively-upstream parts of the control loop in order to change the set point.
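Purely as an illustration of the dosing-vs-set-point distinction above, here is a minimal toy sketch of a negative-feedback loop with integral-style control (illustrative only, not a physiological model, and all the constants are made up): exogenous dosing holds the level up while endogenous production winds down to zero (the “atrophy” failure mode), whereas changing the set point itself leaves the “gland” doing the work at the new level.

```python
def simulate(setpoint=1.0, dose=0.0, steps=20000, dt=0.01, k=0.05, decay=1.0):
    """Toy negative-feedback loop (illustrative only, not physiology):
    endogenous production slowly adjusts to hold the sensed level at the
    set point; `dose` is exogenous hormone added on top."""
    level, production = 0.0, 0.0
    for _ in range(steps):
        production = max(production + dt * k * (setpoint - level), 0.0)
        level += dt * (production + dose - decay * level)
    return round(level, 3), round(production, 3)

print(simulate())               # (1.0, 1.0): the "gland" carries the whole load
print(simulate(dose=1.0))       # (1.0, 0.0): level defended, endogenous production shuts off
print(simulate(dose=2.0))       # (2.0, 0.0): past what the loop can cancel, the level just rises
print(simulate(setpoint=1.5))   # (1.5, 1.5): move the set point and the loop does the work itself
```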
We’d like to generalize this sort of reasoning, and ask: what are all the knobs of this sort which we might want to adjust relative to their ancestral environment settings?
Generalization
We’re looking for signals which are widely broadcast throughout the body, and received by many endpoints. Why look for that type of thing? Because the wide usage puts pressure on the signal to “represent one consistent thing”. It’s not an accident that there are individual hormonal signals which are approximately-but-accurately described by the human-intuitive phrases “overall metabolic rate” or “stress”. It’s not an accident that those hormones’ signals are not hopelessly polysemantic. If we look for widely-broadcast signals, then we have positive reason to expect that they’ll be straightforwardly interpretable, and therefore the sort of thing we can look at and (sometimes) intuitively say “I want to turn that up/down”.
Furthermore, since these signals are widely broadcast, they’re the sort of thing which impacts lots of stuff (and is therefore impactful to intervene upon). And they’re relatively easy to measure, compared to “local” signals.
The “wide broadcast” criterion helps focus our search a lot. For instance, insofar as we’re looking for chemical signals throughout the whole body, we probably want chemical species in the bloodstream; that’s the main way a concentration could be “broadcast” throughout the body, rather than being a local signal. So, basically endocrine hormones.
Casting a slightly wider net, we might also be interested in:
Signals widely broadcast through the body by the nervous system.
Chemical signals widely broadcast through the brain specifically (since that’s a particularly interesting/relevant organ).
Non-chemical signals widely broadcast through the brain specifically.
… and of course for all of these there will be some control system, so each has its own tricky question about how to adjust it.
Some Promising Leads, Some Dead Ends
With some coaxing, we got a pretty solid-sounding list of endocrine hormones out of the LLMs. There were some obvious ones on the list, including thyroid and cortisol systems, sex hormones, and pregnancy/menstruation signals. There were also a lot of signals for homeostasis of things we don’t particularly want to adjust: salt balance, calcium, digestion, blood pressure, etc. There were several inflammation and healing signals, which we’re interested in but haven’t dug into yet. And then there were some cool ones: oxytocin (think mother-child bonding), endocannabinoids (think pot), satiety signals (think Ozempic). None of those really jumped out as clear places to turn a knob in a certain direction, other than obvious things like “take ozempic if you are even slightly overweight” and the two we already knew about (thyroid and cortisol).
Then there were neuromodulators. Here’s the list we coaxed from the LLMs:
Dopamine: Tracks expected value/reward—how good things are compared to expectations.
Norepinephrine: Sets arousal/alertness level—how much attention and energy to devote to the current situation.
Serotonin: Regulates resource availability mindset—whether to act like resources are plentiful or scarce. Affects patience, time preference, and risk tolerance.
Acetylcholine: Controls signal-to-noise ratio in neural circuits—acts like a gain/precision parameter, determining whether to amplify precise differences (high ACh) or blur things together (low ACh).
Histamine: Manages the sleep/wake switch—promotes wakefulness and suppresses sleep when active.
Orexin: Acts as a stability parameter for brain states—increases the depth of attractor basins and raises transition barriers between states. Higher orexin = stronger attractors = harder to switch states.
Of those, serotonin immediately jumps out as a knob you’d probably want to turn to the “plentiful resources” end of the healthy spectrum, compared to the ancestral environment. That puts the widespread popularity of SSRIs in an interesting light!
Moving away from chemical signals, brain waves (alpha waves, theta oscillations, etc) are another potential category—they’re oscillations at particular frequencies which (supposedly) are widely synced across large regions of the brain. I read up just a little, and so far have no idea how interesting they are as signals or targets.
Shifting gears, the biggest dead end so far has been parasympathetic tone, i.e. overall activation level of the parasympathetic nervous system. As far as I can tell, parasympathetic tone is basically Not A Thing: there are several different ways to measure it, and the different measurements have little correlation. It’s probably more accurate to think of parasympathetic nervous activity as localized, without much meaningful global signal.
Anybody see obvious things we’re missing?
Uh… Guys. Uh. Biology is complicated. It’s a messy pile of spaghetti code. Not that it’s entirely intractable to make Pareto improvements, but watch out for unintended consequences.
For instance: you are very wrong about cortisol. Cortisol is a “stress response hormone”. It tells the body to divert resources to bracing itself to deal with stress (physical and/or mental). Experiments have shown that if you put someone through a stressful event while suppressing their cortisol, they have much worse outcomes (potentially including death). Cortisol doesn’t make you stressed, it helps you survive stress. Deviations from homeostatic setpoints (including mental ones) are what make you stressed.
This is interesting. Can you say more about these experiments?
Hmm, I’ll see if I can find some old papers… I’m just reciting memories from grad school lectures like… 12 years ago. Here’s an example of the finding being replicated and explored further in a primate model: https://www.jci.org/articles/view/112443
Here’s a review of cortisol inhibition and surgery findings. A mixed bag, a complicated system. https://academic.oup.com/bja/article/85/1/109/263834
https://onlinelibrary.wiley.com/doi/abs/10.1111/ejn.15721 “Evidence suggests that psychological stress has effects on decision making, but the results are inconsistent, and the influence of cortisol and other modulating factors remains unclear. ”
Basically, cortisol is helpful for surviving injuries. Is it helpful for mental stress? Unclear. Long term high cortisol is harmful, but the stress in one’s life resulting in that high cortisol level is harmful in more ways than just high cortisol. So are there times when it would be helpful to reduce someone’s cortisol level? Absolutely. But it’s complicated and should be done thoughtfully and selectively, and in combination with other things (particularly seeking out and treating the upstream causes).
You can find lots more on Google scholar.
I don’t think that any of {dopamine, NE, serotonin, acetylcholine} are scalar signals that are “widely broadcast through the brain”. Well, definitely not dopamine or acetylcholine, almost definitely not serotonin, maybe NE. (I recently briefly looked into whether the locus coeruleus sends different NE signals to different places at the same time, and ended up at “maybe”, see §5.3.1 here for a reference.)
I don’t know anything about histamine or orexin, but neuropeptides are a better bet in general for reasons in §2.1 here.
Yeah, I recall reading somewhere that the term “sympathetic” in “sympathetic nervous system” is related to the fact that lots of different systems are acting simultaneously. “Parasympathetic” isn’t supposed to be like that, I think.
This sounds logical but I don’t think is backed empirically, at least to the degree you’re claiming. Source: I have a biology BA and can’t speak directly to the question because I never took those classes because they had reputations for being full of exceptions and memorization.
The most obvious one imo is the immune system & the signals it sends.
Others:
Circadian rhythm
Age is perhaps a candidate here, though it may be more or less a candidate depending on if you’re talking about someone before or after 30
Hospice workers sometimes talk about the body “knowing how to die”, maybe there’s something to that
I don’t have deep expertise in the subject, but I’m inclined to concur with the people saying that the widely broadcast signals don’t actually represent one consistent thing, despite your plausible argument to the contrary.
Here’s a Scott Alexander post speculating why that might be the case. In short: there was an optimization pressure towards making internal biological signals very difficult to decode, because easily decodable signals were easy targets for parasites evolving to exploit them. As a result, the actual signals are probably represented as “unnecessarily” complicated, timing-based combinations of various “basic” chemical, electrical, etc. signals, and they’re somewhat individualized to boot. You can’t decode them just by looking at any one spatially isolated chunk of the body, by design.
Basically: separate chemical substances (and other components that look “simple” locally/from the outside) are not the privileged basis for decoding internal signals. They’re the anti-privileged basis, if anything.
Yeah but if something is in the general circulation (bloodstream), then it’s going everywhere in the body. I don’t think there’s any way to specifically direct it.
…Except in the time domain, to a limited extent. For example, in rats, tonic oxytocin in the bloodstream controls natriuresis, while pulsed oxytocin in the bloodstream controls lactation and birth. The kidney puts a low-pass filter on its oxytocin detection system, and the mammary glands & uterus put a high-pass filter, so to speak.
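Here is a toy sketch of that kind of time-domain multiplexing (a signal-processing cartoon, not a physiological model): one shared signal carries both a slow tonic component and brief pulses, and two “receivers” recover different messages by filtering the same signal differently. The filter cutoffs and signal shapes are arbitrary choices for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 100.0                        # samples per unit time (arbitrary units)
t = np.arange(0, 60, 1 / fs)

tonic = 1.0 + 0.1 * np.sin(2 * np.pi * 0.01 * t)      # slow drift in baseline level
pulses = (np.sin(2 * np.pi * 2.0 * t) > 0.95) * 5.0   # brief high-frequency bursts
signal = tonic + pulses                                # one shared "bloodstream" signal

# "Kidney-like" receiver: low-pass filter, reads out the slow tonic component.
b_lo, a_lo = butter(2, 0.1, btype="low", fs=fs)
tonic_readout = filtfilt(b_lo, a_lo, signal)

# "Mammary-gland-like" receiver: high-pass filter, reads out the pulses.
b_hi, a_hi = butter(2, 0.5, btype="high", fs=fs)
pulse_readout = filtfilt(b_hi, a_hi, signal)

# The low-passed readout is smooth (individual bursts largely averaged away),
# while the high-passed readout keeps the bursts with the baseline removed.
print(round(tonic_readout.std(), 3), round(pulse_readout.std(), 3))
```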
The point wouldn’t be to direct it, but to have different mixtures of chemicals (and timings) to mean different things to different organs.
Loose analogy: Suppose that the intended body behaviors (“kidneys do X, heart does Y, brain does Z” for all combinations of X, Y, Z) are latent features, basic chemical substances and timings are components of the input vector, and there are dramatically more intended behaviors than input-vector components. Can we define the behavior-controlling function of organs (distributed across organs) such that, for any intended body behavior, there’s a signal that sets the body into approximately this state?
It seems that yes. The number of almost-orthogonal vectors in d dimensions scales exponentially with d, so we simply need to make the behavior-controlling function sensitive to these almost-orthogonal directions, rather than the chemical-basis vectors. The mappings from the input vector to the output behaviors, for each organ, would then be some complicated mixtures, not a simple “chemical A sets all organs into behavior X”.
This analogy seems flawed in many ways, but I think something directionally-like-this might be happening?
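As a quick numerical illustration of the “exponentially many almost-orthogonal directions” point (just the geometric fact, not the biological claim), here is a sketch that samples far more random unit vectors than there are dimensions and checks how non-orthogonal the worst pair gets:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 500, 3000                      # many more vectors than dimensions

vecs = rng.standard_normal((n, d))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # random unit vectors

cos = vecs @ vecs.T                   # pairwise cosine similarities
np.fill_diagonal(cos, 0.0)            # ignore self-similarity

# A typical pair has |cos| around 1/sqrt(d) ~ 0.045; even the worst of the
# ~4.5 million pairs stays small (nowhere near 1), so thousands of directions
# fit nearly orthogonally into only 500 dimensions.
print(round(np.abs(cos).mean(), 3), round(np.abs(cos).max(), 3))
```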
Just because the number of almost-orthogonal vectors in d dimensions scales exponentially with d, doesn’t mean one can choose all those signals independently. We can still only choose d real-valued signals at a time (assuming away the sort of tricks by which one encodes two real numbers in a single real number, which seems unlikely to happen naturally in the body). So “more intended behaviors than input-vector components” just isn’t an option, unless you’re exploiting some kind of low-information-density in the desired behaviors (like e.g. very “sparse activation” of the desired behaviors, or discreteness of the desired behaviors to a limited extent).
The above toy model assumed that we’re picking one signal at a time, and that each such “signal” specifies the intended behavior for all organs simultaneously...
… But you’re right that the underlying assumption there was that the set of possible desired behaviors is discrete (i.e., that X in “kidneys do X” is a discrete variable, not a vector of reals). That assumption might’ve indeed taken me straight out of the space of reasonable toy models for biological signals, oops.
I had seen recommendations for T3/T4 on twitter to help with low energy, and even purchased some, but haven’t taken it. I hadn’t considered that the thyroid might respond by shrinking, and now think that that’s a worrying intervention! So I’m glad I read this—thank you.
As someone who has Graves’ Disease … one of the reasons that you really don’t want to run your metabolism faster with higher T4 levels is that higher heart rate for an extended period can cause your heart to fail.
More generally: changing the set point of any of these systems might cause the failure of some critical component that depends on the old value of the set point.
Gwern gave a list in his Nootropics megapost.
Yup, I’m familiar with that one. The big difference is that I’m backward-chaining, whereas that post forward chains; the hope of backward chaining would be to identify big things which aren’t on peoples’ radar as nootropics (yet).
(Relatedly: if one is following this sort of path, step 1 should be a broad nutrition panel and supplementing anything in short supply, before we get to anything fancier.)
So I find the question underspecified: why do you want this?
Why are you decomposing body signalling without looking at the major sub-regulatory systems? If you want to predict sleep, then cortisol, melatonin, etc. are quite good, and this will tell you about stress regulation, which affects both the endocrine and cortisol systems.
If you want to look at nutritional systems, then GLP-1 activation is good for average food need, whilst ghrelin is predictive of whether you will feel hungry at specific times.
If you’re looking at brain health, then serotonin activation patterns can be really good to check, but this is different from how the stomach uses it (and the gut does hold the majority of the body’s serotonin). And even this is way too simplified, especially for the brain.
Different subsystems use the same molecules in different ways, waste not and all that, so what are you looking for and why?
Is there a particular reason to not include sex hormones? Some theories suggest that testosterone tracks relative social status. We might expect that high social status → less stress (of the cortisol type) + more metabolic activity. Since it’s used by trans people, we have a pretty good idea of what it does to you at high doses (makes you hungry, horny, and angry), but it’s unclear whether it actually promotes low cortisol-stress and metabolic activity.
AFAICT, approximately every “how to be good at conversation” guide says the same thing: conversations are basically a game where 2+ people take turns free-associating off whatever was said recently. (That’s a somewhat lossy compression, but not that lossy.) And approximately every guide is like “if you get good at this free association game, then it will be fun and easy!”. And that’s probably true for some subset of people.
But speaking for myself personally… the problem is that the free-association game just isn’t very interesting.
I can see where people would like it. Lots of people want to talk to other people more on the margin, and want to do difficult thinky things less on the margin, and the free-association game is great if that’s what you want. But, like… that is not my utility function. The free association game is a fine ice-breaker, it’s sometimes fun for ten minutes if I’m in the mood, but most of the time it’s just really boring.
Even for serious intellectual conversations, something I appreciate in this kind of advice is that it often encourages computational kindness. E.g. it’s much easier to answer a compact closed question like “which of these three options do you prefer” instead of an open question like “where should we go to eat for lunch”. The same applies to asking someone about their research; not every intellectual conversation benefits from big open questions like the Hamming Question.
I think this is especially important for me/us to remember. On this site we often have a complex way of thinking, and a high computational budget (because we like exercising our brains to failure), and if we speak freely to the average person, they may be annoyed at how hard it is to parse what we are saying.
We’ve all probably had this experience when genuinely trying to understand someone from a very different background. Perhaps they are trying to describe their inner experience when meditating, or Japanese poetry, or are simply from a different discipline. Or perhaps we were just very tired that day, meaning we had a low computational budget.
On the other hand, we are often a “tell” culture, which has a lower computational load compared to ask or guess culture. As long as we don’t tell too much.
Generally fair, and I used to agree, but I’ve been looking at it from a bit of a different viewpoint recently.
If we think of the “vibe” of a conversation as a certain shared prior that you’re currently inhabiting with the other person, then the free association game can instead be seen as a way of finding places where your world models overlap a lot.
My absolute favourite conversations are when I can go 5 layers deep with someone because of shared inference. I think the vibe checking for shared priors is a skill that can be developed and the basis lies in being curious af.
There’s apparently a lot of different related concepts in psychology about holding emotional space and other things that I think just comes down to “find the shared prior and vibe there”.
Hm. This rings true… but also I think that selecting [vibes, in this sense] for attention also selects against [things that the other person is really committed to]. So in practice you’re just giving up on finding shared commitments. I’ve been updating that stuff other than shared commitments is less good (healthy, useful, promising, etc.) than it seems.
Hmm, I find that I’m not fully following here. I think “vibes” might be the thing that is messing it up.
Let’s look at a specific example: I’m talking to a new person at an EA-adjacent event and we’re just chatting about how the last year has been. Part of the “vibing” here might be to hone in on the difficulties experienced in the last year due to a feeling of “moral responsibility”, in my view vibing doesn’t have to be done with only positive emotions?
I think you’re bringing up a good point that commitments or struggles might be something that bring people closer than positive feelings because you’re more vulnerable and open as well as broadcasting your values more. Is this what you mean with shared commitments or are you pointing at something else?
Closeness is the operating drive, but it’s not the operating telos. The drive is towards some sort of state or feeling—of relating, standing shoulder-to-shoulder looking out at the world, standing back-to-back defending against the world; of knowing each other, of seeing the same things, of making the same meaning; of integrated seeing / thinking. But the telos is tikkun olam (repairing/correcting/reforming the world)--you can’t do that without a shared idea of better.
As an analogy, curiosity is a drive, which is towards confusion, revelation, analogy, memory; but the telos is truth and skill.
In your example, I would say that someone could be struggling with “moral responsibility” while also doing a bunch of research or taking a bunch of action to fix what needs to be fixed; or they could be struggling with “moral responsibility” while eating snacks and playing video games. Vibes are signals and signals are cheap and hacked.
There’s a general-purpose trick I’ve found that should, in theory, be applicable in this context as well, although I haven’t mastered that trick myself yet.
Essentially: when you find yourself in any given cognitive context, there’s almost surely something “visible” from this context such that understanding/mastering/paying attention to that something would be valuable and interesting.
For example, suppose you’re reading a boring, nonsensical continental-philosophy paper. You can:
Ignore the object-level claims and instead try to reverse-engineer what must go wrong in human cognition, in response to what stimuli, to arrive at ontologies that have so little to do with reality.
Start actively building/updating a model of the sociocultural dynamics that incentivize people to engage in this style of philosophy. What can you learn about mechanism design from that? It presumably sheds light on how to align people towards pursuing arbitrary goals, or how to prevent this happening...
Pay attention to your own cognition. How exactly are you mapping the semantic content of the paper to an abstract model of what the author means, or to the sociocultural conditions that created this paper? How do these cognitive tricks generalize? If you find a particularly clever way to infer something from the text, check: would your cognitive policy automatically deploy this trick in all contexts where it’d be useful, or do you need to manually build a TAP for that?
Study what passages make the feelings of boredom or frustration spike. What does that tell you about how your intuitions/heuristics work? Could you extract any generalizable principles out of that? For example, if a given sentence particularly annoys you, perhaps it’s because it features a particularly flawed logical structure, and it’d be valuable to learn to spot subtler instances of such logical flaws “in the wild”.
The experience of reading the paper’s text almost certainly provides some data uniquely relevant to some valuable questions, data you legitimately can’t source any other way. (In the above examples: sure you can learn more efficiently about the author’s cognition or the sociocultural conditions by reading some biographies or field overviews. But (1) this wouldn’t give you the meta-cognitive data about how you can improve your inference functions for mapping low-level data to high-level properties, (2) those higher-level summaries would necessarily be lossy, and give you a more impoverished picture than what you’d get from boots-on-the-ground observations.)
Similar applies to:
Listening to boring lectures. (For example, you can pay intense attention to the lecturer’s body language, or any tricks or flaws in their presentation.)
Doing a physical/menial task. (Could you build, on the fly, a simple model of the physics (or logistics) governing what you’re doing, and refine it using some simple experiments? Then check afterwards if you got it right. Or: If you were a prehistoric human with no idea what “physics” is, how could you naturally arrive at these ideas from doing such tasks/making such observations? What does that teach you about inventing new ideas in general?)
Doing chores. (Which parts of the process can you optimize/streamline? What physical/biological conditions make those chores necessary? Could you find a new useful takeaway from the same chore every day, and if not, why?)
Et cetera.
There’s a specific mental motion I associate with using this trick, which involves pausing and “feeling out” the context currently loaded in my working memory, looking at it from multiple angles, trying to see anything interesting or usefully generalizable.
In theory, this trick should easily apply to small-talk as well. There has to be something you can learn to track in your mind, as you’re doing small-talk, that would be useful or interesting to you.
One important constraint here is that whatever it is, it has to be such that your outward demeanour would be that of someone who is enjoying talking to your interlocutor. If the interesting thing you’re getting out of the conversation is so meta/abstract that you end up paying most of your attention to your own cognitive processes rather than to what the interlocutor is saying, you’ll have failed at actually doing the small-talk. (Similarly, if, when doing a menial task, you end up nerd-sniped by building a physical model of the task, you’ll have failed at actually doing the task.)
You also don’t want to come across as sociopathic, so making a “game” of it where you’re challenging yourself to socially engineer the interlocutor into something is, uh, not a great idea.
The other usual advice for finding ways to enjoy small-talk is mostly specialized instances of the above idea that work for specific people: steering the small-talk to gradient-descend towards finding emotional common ground, ignoring the object-level words being exchanged and building a social model of the interlocutor, doing a live study of the social construct of “small-talk” by playing around with it, etc.
You’ll probably need to find an instance of the trick that works for your cognition specifically, and it’s also possible the optimization problem is overconstrained in your case. Still, there might be something workable.
Some people struggle with the specific tactical task of navigating any conversational territory. I’ve certainly had a lot of experiences where people just drop the ball leaving me to repeatedly ask questions. So improving free-association skill is certainly useful for them.
Unfortunately, your problem is most likely that you’re talking to boring people (so as to avoid doing any moral value judgements I’ll make clear that I mean johnswentworth::boring people).
There are specific skills to elicit more interesting answers to questions you ask. One I’ve heard is “make a beeline for the edge of what this person has ever been asked before” which you can usually reach in 2-3 good questions. At that point they’re forced to be spontaneous, and I find that once forced, most people have the capability to be a lot more interesting than they are when pulling cached answers.
This is easiest when you can latch onto a topic you’re interested in, because then it’s easy on your part to come up with meaningful questions. If you can’t find any topics like this then re-read paragraph 2.
Talking to people is often useful for goals like “making friends” and “sharing new information you’ve learned” and “solving problems” and so on. If what conversation means (in most contexts and for most people) is ‘signaling that you repeatedly have interesting things to say’, then you need to learn to do that in order to achieve your other goals.
Most games aren’t that intrinsically interesting, including most social games. But you gotta git gud anyway because they’re useful to be able to play well.
Hmm, the ‘making friends’ part seems the most important (since there are ways to share new information you’ve learned, or solve problems, beyond conversation), but it also seems a bit circular. Like, if the reason for making friends is to hang out and have good conversations(?), but one has little interest in having conversations, then doesn’t one have little reason to make friends in the first place, and therefore little reason to ‘git gud’ at the conversation game?
Er, friendship involves lots of things beyond conversation. People to support you when you’re down, people to give you other perspectives on your personal life, people to do fun activities with, people to go on adventures and vacations with, people to celebrate successes in your life with, and many more.
Good conversation is a lubricant for facilitating all of those other things, for making friends and sustaining friends and staying in touch and finding out opportunities for more friendship-things.
The skill in such a game is largely in understanding the free-association space: knowing how people likely react, and thinking enough steps ahead to choose moves that steer the person where you want to go, whether into topics you find interesting, toward information you want from them, toward getting them to a particular position, and so on. If you’re playing without goals, of course it’s boring...
I think that “getting good” at the “free association” game is in finding the sweet spot / negotiation between full freedom of association and directing toward your own interests, probably ideally with a skew toward what the other is interested in. If you’re both “free associating” with a bias toward your own interests and an additional skew toward perceived overlap, updating on that understanding along the way, then my experience says you’ll have a good chance of chatting about something that interests you both. (I.e. finding a spot of conversation which becomes much more directed than vibey free association.) Conditional on doing something like that strategy, I find it ends up being just a question of your relative+combined ability at this and the extent of overlap (or lack thereof) in interests.
So short model is: Git gud at free association (+sussing out interests) → gradient ascend yourselves to a more substantial conversation interesting to you both.
I have similar tastes, but, some additional gears:
I think all day, these days. Even if I’m trying to have interesting, purposeful conversations with people who also want that, it is useful to have some sorts of things to talk about that let some parts of my brain relax (while using other parts of my brain I don’t use as much).
On the margin, you can do an intense intellectual conversation but still make it funnier, or give it more opportunity for people to contribute.
It becomes more interesting when people constrain their output based on what they expect is true information that the other person does not yet know. It’s useful to talk to an expert who tells you a bunch of random stuff they know that you don’t.
Often some of it will be useful. This only works if they understand what you have said, though (which presumably is something that you are interested in). And often the problem is that people’s models about what is useful are wrong. This is especially likely if you are an expert in something: then the thing that most people will say will be worse than what you would think about the topic yourself. This is especially bad if people can’t even immediately see why what you are saying is right.
The best strategy around this I have found so far is just to switch the topic to the actually interesting/important things. Surprisingly, people usually go along with it.
...How is that definition different than a realtime version of what you do when participating in this forum?
Good question. Some differences off the top of my head:
On this forum, if people don’t have anything interesting to say, the default is to not say anything, and that’s totally fine. So the content has a much stronger bias toward being novel and substantive and not just people talking about their favorite parts of Game of Thrones or rehashing ancient discussions (though there is still a fair bit of that) or whatever.
On this forum, most discussions open with a relatively-long post or shortform laying out some ideas which at least the author is very interested in. The realtime version would be more like a memo session or a lecture followed by discussion.
The intellectual caliber of people on this forum (or at least active discussants) is considerably higher than e.g. people at Berkeley EA events, let alone normie events. Last event I went to with plausibly-higher-caliber-people overall was probably the ILLIAD conference.
In-person conversations have a tendency to slide toward the lowest common denominator, as people chime in about whatever parts they (think they) understand, thereby biasing toward things more people (think they) understand. On LW, karma still pushes in that direction, but threading allows space for two people to go back-and-forth on topics the audience doesn’t really grok.
Not sure to what extent those account for the difference in experience.
Totally understand why this would be more interesting; I guess I would still fundamentally describe what we’re doing on the internet as conversation, with the same rules as you would describe above. It’s just that the conversation you can find here (or potentially on Twitter) is superstimulating compared to what you’re getting elsewhere. Which is good in the sense that it’s more fun, and I guess bad inasmuch as IRL conversation was fulfilling some social or networking role that online conversation wasn’t.
I understand: for someone with a strong drive to solve hard problems, there’s an urge for conversations to serve a function, to exchange information with your interlocutor so things can get done. There’s much to do, and communication is already painfully inefficient at its best.
The thing is, I don’t think the free-association game is inefficient, if one is skilled at it. It’s also not all that free. The reason it is something humans “developed” is that it is the most efficient way to exchange rough but extensive models of our minds with others via natural language. It acts a bit like a ray tracer: you shoot conversational rays, and by how they bounce around in mental structures, the thought patterns, values, and biases of the conversation partners are revealed to each other. Shapes become apparent. Sometimes rays bounce off into empty space; then you need to restart the conversation, shoot a new ray. And getting better at this game, keeping the conversation going, exploring a wider range of topics more quickly, means building a faster ray tracer, means it takes less time to know if your interlocutor thinks in a way and about topics which you find enlightening/aesthetically pleasing/concretely useful/whatever you value.
Or to use a different metaphor, starting with a depth-first search and never running a breadth-first search will lead to many false negatives. There are many minds out there that can help you in ways you won’t know in advance.
So if the hard problems you are working on could profit from more minds, it pays off to get better at this. Even if it has not much intrinsic value for you, it has instrumental value.
Hope this doesn’t come across as patronizing, definitely not meant that way.
Part of the problem is that the very large majority of people I run into have minds which fall into a relatively low-dimensional set and can be “ray traced” with fairly little effort. It’s especially bad in EA circles.
Then I misunderstood your original comment, sorry. As a different commenter wrote, the obvious solution would be to only engage with interesting people. But, of course, unworkable in practice. And “social grooming” nearly always involves some level of talking. A curse of our language abilities, I guess. Other social animals don’t have that particular problem.
The next best solution would be higher efficiency, more socializing bang for your word count buck, so to speak. Shorter conversations for the same social effect. Not usually a focus of anything billed as conversation guide, for obvious reasons. But there are some methods aimed at different goals that, in my experience, also help with this as a side effect.
Say more about “ray-tracing”? What does that look like? And do you have a bullshit-but-useful PCA-flavored breakdown of those few dimensions of variation?
Ok but how do you deal with the tragedy of the high dimensionality of context-space? People worth thinking with have wildly divergent goals—and even if you share goals, you won’t share background information.
Yeah it sucks, search by free association is hillclimbing (gets stuck in local optima) and the contemporary media environment and political culture is an illustration of its problems.
The pattern itself is a local optimum; it’s a product of people walking into a group without knowing what the group is doing and joining in anyway, and so that pattern of low-context engagement becomes what we’re doing, and the anxiety that is supposed to protect us from bad patterns like this and help us make a leap out to somewhere better is usually drowned in alcohol.
Instead of that, people should get to know each other before deciding what to talk about, and then intentionally decide to talk about what they find interesting or useful with that person. This gets better results every time.
But when we socialise as children, there isn’t much about our friends to get to know, no specialists to respectfully consult, no well-processed life experiences to learn from. So none of us just organically find that technique of, like, asking who we’re talking to before talking; it has to be intentionally designed.
One blind spot we rationalists sometimes have is that charismatic people actually treat the game as:
“Can I think of an association that will make the other person feel good and/or further my goal?” You need people to feel good, or they won’t participate. And if you want something complicated, a favour, or an uncomfortable truth, then you’d better mix in some good feels to balance it out and keep the other person participating.
To put it another way: if you hurt people’s brain or ego, rush them, make them feel unsure, or contradict them, then most untrained humans will feel a little bad. Why would they want to keep feeling bad? Do you like it when people don’t listen, contradict you, insult you, rush you, disagree with you? Probably not; probably no one does.
But if someone listens to you, smiles at you, likes you, has a good opinion of you, agrees with you, makes sense to you, then it feels good!
This might sound dangerously sycophantic, and that’s because it is—if people overdo it! But if it’s mixed with some healthy understanding, learning, and informing, then it’s a great conversational lubricant, and you should apply it as needed. It just ensures that everyone enjoys themselves and comes back for more, counteracting the normal frictions of socialising.
There are books about this. “How to Win Friends and Influence People” recommends talking about the other person’s interests (including themselves) and listening to them, which they will enjoy.
So I’d say, don’t just free associate. Make sure it’s fun for both parties, make room to listen to the other person, and to let them steer. (And ideally your conversational partner reciprocates, but that is not guaranteed).
Hm, I think this really does change when you get better at it? This only works for people you’re interested in, but if you have someone you are interested in, the free association can be a way to explore a large number of interesting topics that you can pick up in a more structured way later.
I think the statement you summarized from those guides is true, just not helpful to you.
Another view would be that people want to be good at conversation not only because they find it fun but because there is utility in building rapport quickly, networking, and not being cast as a cold person.
I do find the ice-breaky, cached Q&A stuff really boring and tend to want to find an excuse to run away quickly, something that happens often at the dreaded “work event”. I tend to see it as almost fully acting a part despite my internal feelings.
At these things, I do occasionally come across the good conversationalist, able to make me want to stick with speaking to them even if the convo is not that deep or in my interest areas. I think becoming like such a person isn’t a herculean task, but it does take practice and is something I aspire to.
This is more from a professional setting though; in a casual setting it’s much easier to disengage from a boring person, find shared interests, and the convos have far fewer boundaries.
I predict you would enjoy the free-association game better if you cultivated the skill of vibing more.
I’m personally skeptical of this. I’ve found I’m far more likely to lie than I’d endorse when vibing. Saying “sure I’d be happy to join you on X event” when it is clear with some thought that I’d end up disliking it. Or exaggerating stories because it fits with the vibe.
I view System 1 as less concerned with truth here; it is the one that is more likely to produce a fake argument in response to a suggested problem, more likely to play social games regardless of whether they make sense.
Oh yes, if you’re going on people’s words, it’s obviously not much better, but the whole point of vibing is that it’s not about the words. Your aesthetics, vibes, the things you care about will be communicated non-verbally.
John’s Simple Guide To Fun House Parties
The simple heuristic: typical 5-year-old human males are just straightforwardly correct about what is, and is not, fun at a party. (Sex and adjacent things are obviously a major exception to this. I don’t know of any other major exceptions, though there are minor exceptions.) When in doubt, find a five-year-old boy to consult for advice.
Some example things which are usually fun at house parties:
Dancing
Swordfighting and/or wrestling
Lasertag, hide and seek, capture the flag
Squirt guns
Pranks
Group singing, but not at a high skill level
Lighting random things on fire, especially if they explode
Building elaborate things from whatever’s on hand
Physical party games, of the sort one would see on Nickelodeon back in the day
Some example things which are usually not fun at house parties:
Just talking for hours on end about the same things people talk about on LessWrong, except the discourse on LessWrong is generally higher quality
Just talking for hours on end about community gossip
Just talking for hours on end about that show people have been watching lately
Most other forms of just talking for hours on end
This message brought to you by the wound on my side from taser fighting at a house party last weekend. That is how parties are supposed to go.
One of my son’s most vivid memories of the last few years (and which he talks about pretty often) is playing laser tag at Wytham Abbey, a cultural practice I believe instituted by John and which was awesome, so there is a literal five-year-old (well seven-year-old at the time) who endorses this message!
My guess is laser tag was actually introduced to Wytham Abbey during their Battleschool, not by John. (People familiar with the history can correct me.)
John graciously and brilliantly came up with the laser tag guns when he was captain-by-night for agent foundations 2024.
October 2023 I believe
No, I got a set of lasertag guns for Wytham well before Battleschool. We used them for the original SardineQuest.
This is one of the better sentences-that-sound-bizarre-without-context I’ve seen in a while.
It took me years of going to bars and clubs and thinking the same thoughts:
Wow this music is loud
I can barely hear myself talk, let alone anyone else
We should all learn sign language so we don’t have to shout at the top of our lungs all the time
before I finally realized—the whole draw of places like this is specifically that you don’t talk.
The reason the place is designed so that you can’t talk is to make you buy more drinks. (Because when people start talking a lot, they forget to keep drinking.) It may or may not have a positive side effect on you having fun, but it wasn’t designed with your fun as a goal.
Would be interesting to see a survey of five year olds to see if the qualifiers in your opening statement are anything like correct. I doubt you need to filter to just boys, for example.
For me, it depends on whether the attendees are people I’ve never met before, or people I’ve known my entire life. If it’s people I don’t know, I do like to talk to them, to find out whether we have anything interesting to exchange. If it’s someone I’ve known forever, then things like karaoke or go-karting are more fun than just sitting around and talking.
Snowball fights/rolling big balls of snow fall into the same genre, if good snow is available.
I guess this gives me a decent challenge for the next boring party: Turn the party into something fun as a project. Probably the best way to achieve this is to grab the second-most on-board person and escalate from there, clearly having more fun than the other people?
Personally, I’m fairly committed to [talking a lot]. But I do find it incredibly difficult to do at parties. I’ve been trying to figure out why, but the success rate for me plus [talking a lot] at parties seems much lower than I would have hoped.
My mind derives pleasure from deep philosophical and technical discussions.
I’ll add to this list: If you have a kitchen with a tile floor, have everyone take their shoes off, pour soap and water on the floor, and turn it into a slippery sliding dance party. It’s so fun. (My friends and I used to call it “soap kitchen” and it was the highlight of our house parties.)
what was the injury rate?
We haven’t had one yet! But we only did it ~3 times. Obviously people are more careful than they’d normally be while dancing on the slippery floor.
After most people had left a small house party I was throwing, my close friends and I stayed and started pouring ethanol from a bottle on random surfaces and things and burning it. It was completely stupid, somewhat dangerous (some of us sustained some small burns), utterly pointless, very immature, and also extremely fun.
most of these require
more preparation & coordination
more physical energy from everyone
which can be in short supply
Which doesn’t make the OP wrong.
A Different Gambit For Genetically Engineering Smarter Humans?
Background: Significantly Enhancing Adult Intelligence With Gene Editing, Superbabies
Epistemic Status: @GeneSmith or @sarahconstantin or @kman or someone else who knows this stuff might just tell me where the assumptions underlying this gambit are wrong.
I’ve been thinking about the proposals linked above, and asked a standard question: suppose the underlying genetic studies are Not Measuring What They Think They’re Measuring. What might they be measuring instead, how could we distinguish those possibilities, and what other strategies does that suggest?
… and after going through that exercise I mostly think the underlying studies are fine, but they’re known to not account for most of the genetic component of intelligence, and there are some very natural guesses for the biggest missing pieces, and those guesses maybe suggest different strategies.
The Baseline
Before sketching the “different gambit”, let’s talk about the baseline, i.e. the two proposals linked at top. In particular, we’ll focus on the genetics part.
GeneSmith’s plan focuses on single nucleotide polymorphisms (SNPs), i.e. places in the genome where a single base-pair sometimes differs between two humans. (This type of mutation is in contrast to things like insertions or deletions.) GeneSmith argues pretty well IMO that just engineering all the right SNPs would be sufficient to raise a human’s intelligence far beyond anything which has ever existed to date.
GeneSmith cites this Steve Hsu paper, which estimates via a simple back-of-the-envelope calculation that there are probably on the order of 10k relevant SNPs, each present in ~10% of the population on average, each mildly deleterious.
Conceptually, the model here is that IQ variation in the current population is driven mainly by mutation load: new mutations are introduced at a steady pace, and evolution kills off the mildly-bad ones (i.e. almost all of them) only slowly, so there’s an equilibrium with many random mildly-bad mutations. Variability in intelligence comes from mostly-additive contributions from those many mildly-bad mutations. Important point for later: the arguments behind that conceptual model generalize to some extent beyond SNPs; they’d also apply to other kinds of mutations.
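To make that conceptual model concrete, here is a minimal simulation sketch of the additive mutational-load picture. The numbers (carrier frequency, per-variant effect, cohort size) are purely illustrative assumptions, not estimates from the Hsu paper:

```python
import numpy as np

# Minimal sketch of the additive mutational-load model. All numbers are
# illustrative assumptions: ~10k sites, each carried by ~10% of people,
# each with a small additive penalty on a trait in arbitrary units.
rng = np.random.default_rng(0)
n_people, n_sites, freq = 100_000, 10_000, 0.10
effect_per_variant = -0.03  # assumed per-variant effect (arbitrary units)

load = rng.binomial(n_sites, freq, size=n_people)  # deleterious variants per person
trait = load * effect_per_variant                  # purely additive trait

# The trait comes out approximately normal (CLT over many near-independent sites),
# and a hypothetical zero-load genome sits tens of SDs above the population mean.
zero_load_sd = (0.0 - trait.mean()) / trait.std()
print(f"SD of load: {load.std():.1f}")                               # ~30 = sqrt(10000*0.1*0.9)
print(f"zero-load genome, in population SDs: {zero_load_sd:+.1f}")   # ~ +33
```

The “+33 SD” figure is an artifact of the toy parameters, but it illustrates why this kind of model implies that fixing even a fraction of the load would move someone many standard deviations.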
What’s Missing?
Based on a quick googling, SNPs are known to not account for the majority of genetic heritability of intelligence. This source cites a couple others which supposedly upper-bound the total SNP contribution to about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don’t know the details of that method). Estimates of the genetic component of IQ tend to be 50-70%, so SNPs are about half or less.
Notably, IIRC, attempts to identify which mutations account for the rest by looking at human genetic datasets have also mostly failed to close the gap. (Though I haven’t looked closely into that piece, so this is a place where I’m at particularly high risk of being wrong.)
So what’s missing?
Guess: Copy Count Variation of Microsats/Minisats/Transposons
We’re looking for some class of genetic mutations, which wouldn’t be easy to find in current genetic datasets, have mostly-relatively-mild effects individually, are reasonably common across humans, and of which there are many in an individual genome.
Guess: sounds like variation of copy count in sequences with lots of repeats/copies, like microsatellites/minisatellites or transposons.
Most genetic sequencing for the past 20 years has been shotgun sequencing, in which we break the genome up into little pieces, sequence the little pieces, then computationally reconstruct the whole genome later. That method works particularly poorly for sequences which repeat a lot, so we have relatively poor coverage and understanding of copy counts/repeat counts for such sequences. So it’s the sort of thing which might not have already been found via sequencing datasets, even though at least half the genome consists of these sorts of sequences.
Notably, these sorts of sequences typically have unusually high mutation rates. So there’s lots of variation across humans. Also, there’s been lots of selection pressure for the effects of those mutations to be relatively mild.
What Alternative Strategies Would This Hypothesis Suggest?
With SNPs, there’s tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there’s a relatively small set of different sequences. So the engineering part could be quite a lot easier, if we don’t need to do different things with different copies. For instance, if the problem boils down to “get rid of live L1 transposons” or “lengthen all the XYZ repeat sequences”, that would probably be simpler engineering-wise than targeting 10k SNPs.
The flip side is that there’s more novel science to do. The main thing we’d want is deep sequencing data (i.e. sequencing where people were careful to get all those tricky high-copy parts right) with some kind of IQ score attached (or SAT, or anything else highly correlated with g-factor). Notably, we might not need a very giant dataset, as is needed for SNPs. Under (some versions of) the copy count model, there aren’t necessarily thousands of different mutations which add up to yield the roughly-normal trait distribution we see. Instead, there’s independent random copy events, which add up to a roughly-normal number of copies of something. (And the mutation mechanism makes it hard for evolution to fully suppress the copying, which is why it hasn’t been selected away; transposons are a good example.)
So, main steps:
Get a moderate-sized dataset of deep sequenced human genomes with IQ scores attached.
Go look at it, see if there’s something obvious like “oh hey centromere size correlates strongly with IQ!” or “oh hey transposon count correlates strongly with IQ!”
If we find anything, go engineer that thing specifically, rather than 10k SNPs.
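As a rough sketch of what step 2 might look like once such a dataset exists (the file name, column names, and copy-count features here are all hypothetical, and a real analysis would also need to handle ancestry confounding and multiple comparisons):

```python
import pandas as pd
from scipy import stats

# Hypothetical dataset: one row per deep-sequenced person, with an IQ-like score
# and aggregate copy-count features extracted from the sequencing pipeline.
df = pd.read_csv("deep_sequenced_cohort.csv")  # assumed file, not a real dataset

features = ["live_L1_count", "total_transposon_count",
            "mean_centromere_length", "microsat_repeat_total"]

for feat in features:
    r, p = stats.pearsonr(df[feat], df["iq"])
    print(f"{feat:>24}: r = {r:+.3f}, p = {p:.2g}")
```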
No, rare variants are no silver bullet here. There’s not a small set, there’s a larger set—there would probably be combinatorially more rare variants because there are so many ways to screw up genomes beyond the limited set of ways defined by a single-nucleotide polymorphism, which is why it’s hard to either select on or edit rare variants: they have larger (harmful) effects due to being rare, yes, and account for a large chunk of heritability, yes, but there are so many possible rare mutations that each one has only a few instances worldwide which makes them hard to estimate correctly via pure GWAS-style approaches. And they tend to be large or structural and so extremely difficult to edit safely compared to editing a single base-pair. (If it’s hard to even sequence a CNV, how are you going to edit it?)
They definitely contribute a lot of the missing heritability (see GREML-KIN), but that doesn’t mean you can feasibly do much about them. If there are tens of millions of possible rare variants, across the entire population, but they are present in only a handful of individuals a piece (as estimated by the GREML-KIN variance components where the family-level accounts for a lot of variance), it’s difficult to estimate their effect to know if you want to select against or edit them in the first place. (Their larger effect sizes don’t help you nearly as much as their rarity hurts you.)
So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you’d be able to avoid that loss, which is meaningful! …in a tiny fraction of all embryos. On average, you’d just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.
If the genetic architecture had worked out otherwise, if there had instead been a lot of rare mutations which increased intelligence, then life would be a lot more convenient. Instead, it’s a lot of ‘sand in the gears’, and once you move past the easy specks of sand, they all become their own special little snowflakes.
This is why rare variants are not too promising, although they are the logical place to go after you start to exhaust common SNPs. You probably have to find an alternative approach like directly modeling or predicting the pathogenicity of a rare variant from trying to understand its biological effects, which is hard to do and hard to quantify or predict progress in. (You can straightforwardly model GWAS on common SNPs and how many samples you need and what variance your PGS will get, but predicting progress of pathogenicity predictors has no convenient approach.) Similarly, you can try very broad crude approaches like ‘select embryos with the fewest de novo mutations’… but then you lose most of the possible variance and it’ll add little.
That is relevant in pre-implantation diagnosis for parents and gene therapy at the population level. But for Qwisatz Haderach breeding purposes those costs are immaterial. There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right? We would not be interested in the effect of the ugliness, only in getting it out.
Right.
If you are doing genome synthesis, you aren’t frustrated by the rare variant problems as much because you just aren’t putting them in in the first place; therefore, there is no need to either identify the specific ones you need to remove from a ‘wild’ genome nor make highly challenging edits. (This is the ‘modal genome’ baseline. I believe it has still not been statistically modeled at all.)
While if you are doing iterated embryo selection, you can similarly rely mostly on maximizing the common SNPs, which provide many SDs of possible improvement, and where you have poor statistical guidance on a variant, simply default to trying to select out against them and move towards a quasi-modal genome. (Essentially using rare-variant count as a tiebreaker and slowly washing out all of the rare variants from your embryo-line population. You will probably wind up with a lot in the final ones anyway, but oh well.)
Yeah, separate from both the proposal at top of this thread and GeneSmith’s proposal, there’s also the “make the median human genome” proposal—the idea being that, if most of the variance in human intelligence is due to mutational load (i.e. lots of individually-rare mutations which are nearly-all slightly detrimental), then a median human genome should result in very high intelligence. The big question there is whether the “mutational load” model is basically correct.
I didn’t read this carefully—but it’s largely irrelevant. Adult editing probably can’t have very large effects because developmental windows have passed; but either way the core difficulty is in editor delivery. Germline engineering does not require better gene targets—the ones we already have are enough to go as far as we want. The core difficulty there is taking a stem cell and making it epigenomically competent to make a baby (i.e. make it like a natural gamete or zygote).
I haven’t looked at any of the studies and also don’t know much about genomics, so my guess might be completely wrong, but a different hypothesis that seems pretty plausible to me is:
Most of the variance of intelligence comes from how well different genes/hyperparameters-of-the-brain work together, rather than from them having individually independent effects on intelligence. E.g., as a made-up, specific, implausible example (I don’t know that much neuroscience): there could be different genes controlling the size, the synapse-density, and the learning/plasticity-rate of cortical columns in some region, and there are combinations of those hyperparameters which happen to work well together and some that don’t fit quite as well.
So this hypothesis would predict that we didn’t find the remaining genetic component for intelligence yet because we didn’t have enough data to see what clusters of genes together have good effects and we also didn’t know in what places to look for clusters.
Reasonable guess a priori, but I saw some data from GeneSmith at one point which looked like the interactions are almost always additive (i.e. no nontrivial interaction terms), at least within the distribution of today’s population. Unfortunately I don’t have a reference on hand, but you should ask GeneSmith if interested.
@towards_keeperhood yes this is correct. Most research seems to show ~80% of effects are additive.
Genes are actually simpler than most people tend to think
I think Steve Hsu has written some about the evidence for additivity on his blog (Information Processing). He also talks about it a bit in section 3.1 of this paper.
Thanks.
So I only briefly read through the section of the paper, but not really sure whether it applies to my hypothesis: My hypothesis isn’t about there being gene-combinations that are useful which were selected for, but just about there being gene-combinations that coincidentally work better without there being strong selection pressure for those to quickly rise to fixation.
(Also yeah, for simpler properties like how much milk is produced I’d expect a much larger share of the variance to come from genes which have individual contributions. Also, for selection-based eugenics the main relevant thing is the genes which have individual contributions. (Though if we have the precise ability to do gene editing we might be able to do better and see how to tune the hyperparameters to fit well together.))
Please let me know whether I’m missing something though.
(There might be a sorta annoying analysis one could do to test my hypothesis: On my hypothesis, the correlation between the intelligence of very intelligent parents and their children would be even a bit less than on the just-independent-mutations hypothesis, because very intelligent people likely also got lucky in how their gene variants work together, but those properties would be unlikely to all be passed along and end up dominant.)
Thanks for confirming.
To clarify in case I’m misunderstanding, the effects are additive among the genes explaining the part of the IQ variance which we can so far explain, and we count that as evidence that for the remaining genetically caused IQ variance the effects will also be additive?
I didn’t look into how the data analysis in the studies was done, but on my default guess this generalization does not work well / the additivity on the currently identified SNPs isn’t significant counterevidence for my hypothesis:
I’d imagine that studies just correlated individual gene variants with IQ and thereby found gene variants that have independent effects on intelligence. Or did they also look at pairwise or triplet gene-variant combinations and correlate those with IQ? (There would be quite a lot of pairs, and I’m not sure whether the current datasets are large enough to robustly distinguish the combinations that really have good/bad effects from false positives.)
One would of course expect that the effects of the gene variants which have independent effects on IQ are additive.
But overall, unless the studies did look for higher-order IQ correlations, the fact that the IQ variance we can explain so far comes from genes which have independent effects isn’t significant evidence that the remaining genetically-caused IQ variation also comes from gene variants which have independent effects, because we were much more likely to find the genes which do have independent effects in the first place.
(I think the above should be sufficient explanation of what I think but here’s an example to clarify my hypothesis:
Suppose gene A has variants A1 and A2 and gene B has B1 and B2. Suppose that A1 can work well with B1 and A2 with B2, but the other interactions don’t fit together that well (like badly tuned hyperparameters) and result in lower intelligence.
When we only look at e.g. A1 and A2, neither is independently better than the other—they are uncorrelated with IQ. Studies would need to look at combinations of variants to see that e.g. A1+B1 has a slight positive correlation with intelligence—and I’m doubting whether studies did that (and whether we have sufficient data to see the signal among the combinatorial explosion of possibilities), and it would be helpful if someone clarified to me briefly how studies did the data analysis.
)
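For what it’s worth, here is a toy simulation of that kind of purely epistatic effect (the effect size is made up), showing why per-variant association tests would see nothing even when the pairwise combination carries a real signal:

```python
import numpy as np
from scipy import stats

# Toy version of the A/B example: two loci with zero marginal effect on the trait,
# only a "matching" interaction effect. The effect size (0.3) is made up.
rng = np.random.default_rng(0)
n = 50_000
a = rng.integers(0, 2, n)             # 0 = A1, 1 = A2
b = rng.integers(0, 2, n)             # 0 = B1, 1 = B2
match = (a == b).astype(float)        # A1+B1 or A2+B2 "fit together"
trait = 0.3 * match + rng.normal(0, 1, n)

# Single-locus tests (roughly what an additive per-SNP analysis does) find nothing:
print("A alone:   r =", round(stats.pearsonr(a, trait)[0], 3))       # ~0
print("B alone:   r =", round(stats.pearsonr(b, trait)[0], 3))       # ~0
# The pairwise term is where the signal lives:
print("A-B match: r =", round(stats.pearsonr(match, trait)[0], 3))   # ~0.15
```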
(Thanks. I don’t think this is necessarily significant evidence against my hypothesis; see my comment on GeneSmith’s comment.)
Another confusing relevant piece of evidence I thought I’d throw in:
Human intelligence seems to me to be very heavytailed. (I assume this is uncontroversial here; just look at the greatest scientists vs great scientists.)
If variance in intelligence was basically purely explained by mildly-deleterious SNPs, this would seem a bit odd to me: If the average person had 1000 SNPs, and then (using butt-numbers which might be very off) Einstein (+6.3std) had only 800 and the average theoretical physics professor (+4std) had 850, I wouldn’t expect the difference there to be that big.
It’s a bit less surprising on the model where most people have a few strongly deleterious mutations, and supergeniuses are the lucky ones that have only 1 or 0 of those.
It’s IMO even a bit less surprising on my hypothesis where in some cases the different hyperparameters happen to work much better with each other—where supergeniuses are in some dimensions “more lucky than the base genome” (in a way that’s not necessarily easy to pass on to offspring though because the genes are interdependent, which is why the genes didn’t yet rise to fixation). But even there I’d still be pretty surprised by the heavytail.
The heavytail of intelligence really confuses me. (Given that it doesn’t even come from sub-critical intelligence explosion dynamics.)
If each deleterious mutation decreases the success rate of something by an additive constant, but you need lots of sequential successes for intellectual achievements, then intellectual formidability is ~exponentially related to deleterious variants.
Yeah I know, that’s why I said that if a major effect was through a few significantly deleterious mutations this would be more plausible. But I feel like human intelligence is even more heavytailed than what one would predict given this hypothesis. If you have many mutations that matter, then via the central limit theorem the overall distribution will be roughly Gaussian even though the individual ones are exponential. (If I made a mistake, maybe crunch the numbers to show me?) (I initially misunderstood what you meant in a way where I thought it was complete nonsense.)
I don’t understand what you’re trying to say. Can you maybe rephrase again in more detail?
Suppose people’s probability of solving a task is uniformly distributed between 0 and 1. That’s a thin-tailed distribution.
Now consider their probability of correctly solving 2 tasks in a row. That will have a sort of triangular distribution, which has more positive skewness.
If you consider e.g. their probability of correctly solving 10 tasks in a row, then the bottom 93.3% of people will all have less than 50%, whereas e.g. the 99th percentile will have 90% chance of succeeding.
Conjunction is one of the two fundamental ways that tasks can combine, and it tends to make the tasks harder and rapidly make the upper tail do better than the lower tail, leading to an approximately-exponential element. Another fundamental way that tasks can combine is disjunction, which leads to an exponential in the opposite direction.
When you combine conjunctions and disjunctions, you get an approximately sigmoidal relationship. The location/x-axis-translation of this sigmoid depends on the task’s difficulty. And in practice, the “easy” side of this sigmoid can be automated or done quickly or similar, so really what matters is the “hard” side, and the hard side of a sigmoid is approximately exponential.
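A quick simulation of the toy model above (uniform per-task skill, “success” = solving k tasks in a row) reproduces those numbers; with a different assumed skill distribution the exact figures change, but the skew survives:

```python
import numpy as np

# Per-task success probability p ~ Uniform(0, 1); "success" = k tasks in a row, i.e. p**k.
rng = np.random.default_rng(0)
p = rng.random(1_000_000)
k = 10
chain = p ** k

print("fraction below 50% at k=10:", round((chain < 0.5).mean(), 3))    # ~0.933 (= 0.5**(1/10))
print("99th-percentile person:    ", round(np.quantile(chain, 0.99), 3)) # ~0.904 (= 0.99**10)
print("median person:             ", round(np.quantile(chain, 0.50), 4)) # ~0.001 (= 0.5**10)
```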
Thanks!
Is the following a fair paraphrasing of your main hypothesis? (I’m leaving out some subtleties with conjunctive successes, but please correct the model in that way if it’s relevant.):
"""
Each deleterious mutation multiplies your probability of succeeding at a problem/thought by some constant. Let’s for simplicity say it’s 0.98 for all of them.
Then the expected number of successes per time for a person is proportional to 0.98^num_deleterious_mutations(person).
So the model would predict that when Person A had 10 less deleterious mutations than person B, they would on average accomplish 0.98^10 ~= 0.82 times as much in a given timeframe.
"""
I think this model makes a lot of sense, thanks!
In itself I think it’s insufficient to explain how heavytailed human intelligence is—there were multiple cases where Einstein seems to have been able to solve problems multiple times faster than the next runners-up. But I think if you use this model in a learning setting where success means “better thinking algorithms”, then having 10 fewer deleterious mutations is like having 1/0.82 ≈ 1.22 times as much training time, and there might also be compounding returns from having better thinking algorithms to getting more and richer updates to them.
Not sure whether this completely deconfuses me about how heavytailed human intelligence is, but it’s a great start.
I guess at least the heavytail is much less significant evidence for my hypothesis than I initially thought (though so far I still think my hypothesis is plausible).
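For concreteness, here is the arithmetic of that paraphrased model plus the compounding-returns idea. The 0.98 multiplier comes from the paraphrase above; the compounding exponent is an assumed, purely illustrative number:

```python
# Arithmetic sketch of the multiplicative model plus compounding returns.
# The 0.98 multiplier is from the paraphrase above; the compounding exponent
# is an assumption made for illustration only.
per_mutation_factor = 0.98
delta_mutations = 10

raw_rate = per_mutation_factor ** delta_mutations   # ~0.82x output per unit time
effective_training = 1 / raw_rate                    # ~1.22x effective "training time"

compounding_exponent = 3                             # assumed: better algorithms beget richer updates
long_run_gap = effective_training ** compounding_exponent

print(f"raw output ratio:        {raw_rate:.2f}")          # ~0.82
print(f"effective training time: {effective_training:.2f}x")  # ~1.22x
print(f"with compounding (k=3):  {long_run_gap:.2f}x")        # ~1.83x
```

Even with compounding, it seems to take fairly aggressive assumptions to turn a handful of mutations into Einstein-sized gaps, which matches the remaining confusion about the heavy tail.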
Half-informed take on “the SNPs explain a small part of the genetic variance”: maybe the regression methods are bad?
Two responses:
It’s a pretty large part—somewhere between a third and half—just not a majority.
I was also tracking that specific hypothesis, which was why I specifically flagged “about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don’t know the details of that method)”. Again, I don’t know the method, but it sounds like it wasn’t dependent on details of the regression methods.
Continuing the “John asks embarrassing questions about how social reality actually works” series...
I’ve always heard (and seen in TV and movies) that bars and clubs are supposed to be a major place where single people pair up romantically/sexually. Yet in my admittedly-limited experience of actual bars and clubs, I basically never see such matching?
I’m not sure what’s up with this. Is there only a tiny fraction of bars and clubs where the matching happens? If so, how do people identify them? Am I just really, incredibly oblivious? Are bars and clubs just rare matching mechanisms in the Bay Area specifically? What’s going on here?
that trope is heavily out of date
I get the impression that this is true for straight people, but from personal/anecdotal experience, people certainly do still pair up in gay bars/clubs.
Yeah, feels like the current zeitgeist in Anglo countries and upper-middle-class environments, at least, is that it is simply bad manners to ever approach anyone with romantic/sexual intentions unless it’s a context where everyone has explicitly agreed that’s what you’re there for (speed dating, dating app, etc).
I think this is exaggerated fwiw.
Well, I don’t have much recent experience of dating myself, so it’s second-hand. But also, this user specifically is talking about Bay Area, and if there’s a single place and single social circle in the world where I expect this to be closest to true, “educated well-off tech people in the Bay Area” is it.
I’m not saying this is a truth anywhere and with everyone. Also, even if it’s not out of an actual social custom, I think at this point lots of people still resort to the internet as a way of looking for dates simply because the possibility is there and seemingly more direct (and lower effort). IIRC there’s data showing that the number of couples who met on the internet has dramatically increased over the last years, leaving almost all other methods behind.
I think people use the internet/apps for dating due to a combination of convenience in sorting/search, because it’s less awkward to be rejected online, and because it’s the path of least resistance, not because asking people out in person is considered rude.
It’s true that in middle-class/upper middle class circles, professional events/workplace is now considered ~off-limits for dating, which wasn’t true 30 years ago. However, that’s a big difference from what you originally said where only dating-specific events are okay.
People also do professional networking online + in dedicated networking events, but I don’t think it’s considered impolite to (eg) incidentally network in a ski lodge. Less effective, sure, but not impolite.
I’m also in the general Bay Area/tech/educated milieu, so I do have relevant anecdotal experience here[1].
eg I recently went on a few dates with a leftist girl I asked out at a stargazing thing. Neither of us thought it was impolite, I think. That said, it didn’t work out, and I guess I should’ve been able to figure that out a priori from stargazing not being the type of thing that’s sufficiently indicative of relationship compatibility.
The best relationships don’t go from zero to romantic in the first exchanged message[citation needed][original research?]
I don’t quite see how this comment connects to the comment you’re responding to.
TLDR: People often kiss/go home with each other after meeting in clubs, less so bars. This isn’t necessarily always obvious but should be observable when looking out for it.
OK, so I think most of the comments here don’t understand clubs (@Myron Hedderson’s comment has some good points though). As someone who has made out with a few people in clubs, and still goes from time to time I’ll do my best to explain my experiences.
I’ve been to bars and clubs in a bunch of places, mostly in the UK but also elsewhere in Europe and recently in Korea and South East Asia.
In my experience, bars don’t see too many hookups, especially since most people go with friends and spend most of their time talking to them. I imagine that one could end up pairing up at a bar if they were willing enough to meet new people and had a good talking game (and this also applied to the person they paired up with), but I feel like most of the actual action happens in clubs on the dancefloor.
I think matching can happen at just about any club in my experience. Most of the time it just takes the form of 2 people colliding (not necessarily literally), looking at each other, drunkenness making both much more obvious than usual, and then them spending a while making out with each other. Sometimes things go beyond that point. Mostly not, in my experience, although a friend recently told me that he rarely kisses girls in clubs and instead directly asks them home (apparently successfully).
I’ve seen enough people making out in clubs before to be confused as to why John hasn’t seen this sort of behaviour. I don’t know in what ways clubbing in the Bay Area is different from the UK, so I won’t speculate on that but I think that there is sometimes a difference in attitude depending on the music being played. In particular, I think people are more likely to make out to pop/classics than to e.g house. It may also just be that I’m more likely to kiss people when listening to music I enjoy.
Additional advice for clubs (heterosexual male):
Go there to enjoy the music (this may sound weird but enjoying clubs is very much a skill)
Don’t worry about pairing up with someone too much, this will remove opportunities to have fun (although you can still take actions which improve your odds)
Drink enough that you have no issues with dancing badly
When dancing, do literally any movement in time with the beat (ideally make the motions as varied as possible)
Humour is king: if something funny pops into your head, do it.
Good examples: Miming the lyrics of a song (depending on the song), dancing with another guy (the more exaggerated, the more obvious it is you’re being funny), miming sex positions (you’d be shocked how many people in clubs are completely cool with this, and just find it entertaining)
If someone else does something entertaining support them (apart from anything else the more funny stuff is happening around you the more you have to bounce off of)
These tips do tend to require some extroversion—I don’t know how good this advice is macroscopically but in the clubbing scene this tends to be achieved via alcohol
If getting with girls really is the priority, then be obvious (there’s always the caveat not to do things likely to upset people, but I think that in the context of a) LessWrong b) clubs, the advice is overwhelmingly on the side of being far more forward and less worried about misdemeanours)
Pick one girl and single her out, don’t hedge your bets. Read body language (it’ll be more obvious when everyone else is drunk, and hearing each other can be a pain)
If rejected, brush yourself off and try again (probably in another part of the club, although remember having fun is the main thing so don’t abandon a good group)
The centre of the circle is centre stage—go nuts here, this is your opportunity to entertain people with the dumbest idea that just occurred to you
Caveats: this is what works for me. I have found that people consistently comment that they enjoy nights out with me significantly more than average, and I have found I enjoy nights out more when I employ these methods. I have not tried this everywhere, and there have been places where I’ve felt a bit out of place (although I’d still argue I was having more fun than those around me).
I expect introverts to be scared by many of the ideas here, but I also feel like there are situations in life where acting more confident is universally better (public speaking is another example). Personally I’ve found this becomes easier with time and practise. Good luck all.
Edit: I just remembered I first got together with my ex-girlfriend at a bar. However we already knew each other and decided to meet up just the 2 of us, which is a somewhat different situation from most occasions I go to the bar.
How do you find good places and times to go? You just described exactly the sort of clubbing experience I most enjoy, but I’ve never had many close friends into it so I don’t really know where to look.
Yeah having the right friends to go with is important. I’ve recently finished university so that’s been easier for me than most, but in general I think it’s easier when going to an event with a decent number of people (I play ice hockey and so team/club dinners are a good example). With more people there’s a greater chance of there being a critical mass willing to go.
Aside from that I’ve recently been backpacking around Vietnam, Cambodia and Thailand and I’ve found that being in a hostel makes it incredibly easy to meet people and go out locally. This does require being comfortable in that environment though.
I think that all you really need is one friend who is willing to go with you, and they then become the main point of contact when you want to go.
It’s also possible to go alone, especially in communities like the backpacker community where it’s incredibly easy to meet people. This is generally a lot more sketchy in many places though as you have no backup if you e.g get spiked or drink too much.
Oh I have no problem going clubbing alone, I can have plenty of fun dancing with strangers. The hard part is finding the right club on the right night; AFAICT most of them are dead most nights. How do you solve that problem?
Oof, honestly I feel like I mostly just kind of go and find a place with decent music that’s open. I normally find there’s at least one (or maybe my standards are just low), but I’d imagine that in places where that isn’t the case you’d be able to look on the good clubs’ websites to see when they have events.
I know that in Oxford clubs often have weekly theme nights, such as this one https://www.bridgeoxford.co.uk/wednesday. I’d imagine that a quick browse of your favourite clubs’ websites would give you a good idea of where to go when.
I’ve not done this myself* (my clubbing days were long ago now) but a few approaches:
1. If you live somewhere where some areas specialize in nightlife—bars, clubs, restaurants and even a cool street scene—then just be a tourist there for a bit. You’ll see/find something that seems to fit for you.
2. There used to be “City Papers” that tended to focus on social life and what was happening during the week/month for people to learn about. So you’d hear about live music or popular DJs and where they were playing.
2a. The more current take, I assume, would be online versions of this.
3. Social apps that are about meetups (one is called exactly that), though I suspect even FB has something along these lines: groups you can join, or that are open to the public, which announce what activity, where, and when the get-together occurs. Some will specifically state they are NOT about any hookup possibility, but others are about meeting others for more than the specific activity (the activity is more the introduction and something to do, rather than the whole reason for going).
4. Last, you might check for any pub crawls going on. Some of the stops will be good clubs to check out, and sometimes even joining the crawl will offer opportunities. Particularly true if you’re good at joining in with some new group of strangers—very good social skills required, as the group needs to want you to join.
* Well, I have used Meetups for getting together with others, but that was language-based, for learning and practicing, so anyone that seemed more interested in meeting people or other activities was discouraged or kicked out if overly obvious.
What’s the age range on clubbing? I’m newly single at 43 and I might have aged out of it, and a 43 year old trying to dance the way he did in high school usually looks stupid. (Or at least my late wife thought so.)
I think with enough enthusiasm anyone can go clubbing, and tbh imo stuff which looks stupid in a club just becomes entertaining. If you really feel embarrassed about it, one way to go about this is to play into the stupidity by really exaggerating the moves and leaning into the humour.
I think with age the ick comes from older guys who come to look at young girls and nothing else. I have a mate who’s 49 and comes out clubbing with us, and is more enthusiastic than any of us on the dance floor and everyone loves it.
My late wife in particular thought my dancing was bad, which is why I brought it up; I mentioned the term “dad dancing” to her and she thought it was an appropriate description. (She happened to be nine years younger than I was.)
The point about making out is very valid, I’ve seen that plenty of times, and that should count as “pairing up sexually”. For whatever reason/no good reason, it didn’t occur to me to mention it in my longer comment.
From the perspective of someone who has never actually enjoyed the clubbing experience before, the above advice sounds like good advice for how to have a better time. :)
I heard it was usually at work, school, church, or a social group. This is not fully captured by How Couples Meet: Where Most Couples Find Love in 2025, but “bar” is higher than I expected.
My brother met his spouse at a club in NYC, around 2008. If I recall the story correctly, he was “doing the robot” on the stage, and then she started “doing the robot” on the floor. They locked eyes, he jumped down and danced over to her, and they were married a couple years later.
(Funny to think we’re siblings, when we have such different personalities!)
Go to a bar and ask a bartender how it works! I have tried pulling the autist card on a stranger to ask for social advice and it worked.
In my experience, bars these days (in the era of dating apps) are less a place where straight people pair up with strangers, and more a place where they:
Go on a first/second/third date with someone they know from a dating app or mutual friend or interest group, and maybe hook up with that person
Go with a friend group, and meet someone through a mutual friend, and maybe pair up with that person
But fwiw, it still seems reasonably common for people to pair up with strangers in bars/clubs where I live. I don’t think bars/clubs are the perfect solution to meeting people romantically/sexually, but they have some advantages:
Alcohol makes people more willing to approach strangers, open up personally, and judge potential partners less critically
Bars/clubs (at least in major cities) are mostly filled with strangers you won’t see again, reducing the (perceived) costs of rejection or committing some faux pas
Bars/clubs being dark and noisy makes it easier to approach someone without a lot of other people observing you
In bars and especially clubs, (good) music creates an atmosphere where people (who like that music) feel mildly intoxicated
Clubs in particular involve quite a lot of moving around (across/to/from dance floors, bars, toilets, and chill-out areas) that create opportunities to meet/interact with strangers
That said, I think 10+ years ago bars/clubs were more of a place where people paired up with strangers. My sense is that this has changed largely due to dating apps, not by making it less acceptable to approach strangers, but more that dating apps offer an (often superior) alternative way of getting dates, which means people go to bars/clubs less to meet strangers and more to spend time with friends/partners. And even if a person is still interested in going to bars/clubs to meet strangers, it is harder when most other people are just there with their friend groups and not interested in interacting with strangers.
(Bars/clubs for gay people, and especially gay men, are different. There, it is still pretty common with random hook-ups, I should think.)
Personal experience—in uni, went to bars/clubs, I was generally pretty incompetent at the flirting thing, but danced with a bunch of girls, got numbers and didn’t really know what to do after that.
A handsome, charismatic friend of mine got together with a number of women, went home with a few, etc.
As did a couple other friends.
Location: scotland, dundee
Years: 2021-2022
Also, from a lot of friends’ stories, it was pretty common to get with people after meeting them in the club. Not relationships, though.
Also, clubs in general are very animalistic, EQ-driven places that I think most rats/LessWrong users don’t understand.
Sitting at a long table (or at the bar itself) is a signal that you are open to connecting with other people.
A quick search found this chart from a 2019 study on how couples meet. It looks like the fraction of couples who met at a bar has actually been going up in recent decades which is not what I would have predicted. But I don’t know how reliable this study is.
A member of my family (rather normie-ish) met his current girlfriend in a bar. A similar story with an EA acquaintance. But I don’t hear stories like that very often, and also caveat that these were in Eastern Europe (Poland and Estonia, respectively).
It’s out of date given how much dating has moved to apps.
And before apps, it was friends/families, and various communities like church, more than it was bars.
Whether cause or effect, alcohol interest has gone down, so it’s only weirder to picture meeting someone in a bar.
There’s some moral panic that Gen Z doesn’t know how to talk to people in person which interacts with your question somehow. Like people will excessively mourn the loss of bar dating, when actually meeting dates while drunk sort of sucks. I’m sure there’s a kernel-of-truth here, but generational moral panics are pretty much the default.
It’s extremely sitcom-friendly in a way that staring at phones and computers isn’t.
By the time it’s in TV/movies, it’s already heavily romanticized. The best example is “Cheers” which is a bar-as-church show. But the show is made when that type of community is already bygone.
When I was dating 10 years ago people still romanticized “meeting someone organically,” but not in any serious way that would stop them from app dating.
Related (and hilarious): Why You Secretly Hate Cool Bars from WaitButWhy
My model is that the primary service the Cool Bars provide is gatekeeping, so if you’re not the kind of person big spenders want to be seen with (pretty girls and impressive men) it’s going to be a hassle.
I can’t make strong claims here, as I go to bars and clubs fairly rarely. But I second the observation that it might be different in urban vs. rural areas, or (I add) different based on type of club. For example, the bar in my dad’s family’s extremely small hometown is the local gathering spot for those who want to have a beer with friends, which is very different from the loud, confusing, crowded dance clubs where you’re packed in like sardines with people you don’t know and can’t even see clearly. I think a valid analysis has to segregate by type of bar/club. The small-town bar I’m thinking of does have live entertainment and dancing (also darts, which wouldn’t work in a darkened environment where many people are quite drunk), but it’s a very different scene.
With respect specifically to the loud, dark, crowded places, lots of people find those off-putting and don’t go, or go rarely. It is fairly common advice to look elsewhere than bars/clubs for dates. But: for someone who is young and anxious and not very sure how to meet people for dates/sex, going out with friends and getting moderately to very intoxicated in a place where you will also meet people you don’t know who are in a similar situation is a way to overcome that barrier. And the fact that you can’t really tell what’s going on 10 feet away, can’t hear what other people are saying very well, and everyone expects this to be an environment where people are drinking, means this is a more forgiving place to do/try things that might be judged inappropriate and/or unacceptable in other environments. If you do something very obvious to indicate your sexual interest in someone in most public places, security may be called, but on the dance floor of a club, standards of acceptable behaviour are more lax, and behaviours themselves are less consistently observable. Also, if you try something with one person and get rebuffed, few to none of the other people will know it happened, so you can try again with someone else shortly thereafter.
So my sense is that there are (or were, last time I checked, which admittedly was a few years ago) a lot of young (late teens to early 20s) people bumbling their drunken way through social interactions in clubs. I’m sure plenty of them do leave the club together and have sex, but it’s hard to know for sure.
Another thought: by reputation, and it seems by design, clubs are places where bad decisions are made. So if you want to stress less about your decision-making around sex and just go have some with someone fairly random who you don’t know well (many people will not want to do this, but some do), clubs give you license to do so. Or, if you think of yourself as not particularly worthy of the attention of those you may be interested in, and so figure you might have a better shot if everyone’s drunk and can’t see each other very well, a club is a place where this will be true.
I started this whole train of thought by considering the sentence “Yet in my admittedly-limited experience of actual bars and clubs, I basically never see such matching?”, and thinking that that’s true for me as well, but I do (did, when I went) see people trying to get laid—and by its nature, the environment of a club is not conducive to my monitoring the social interactions of the people around me to see how they’re going, so I wouldn’t expect to know for sure based on what I observe while in a club, who went home with who. You learn that after the fact, if your friends tell you they hooked up.
I also saw a lot of guys sitting on the sidelines and drinking, trying to build up the “liquid courage” to initiate a conversation with someone they might like the look of, and a lot of women dancing in groups, so that they can be visible without being too vulnerable.
If you don’t like alcohol but can act disinhibited anyway, does that work too? (Also there’s the issue of whether your partner is too intoxicated to give consent...)
I am not the right person to ask about what works well in clubs, as I wouldn’t say my experiences at clubs were particularly successful or enjoyable, but I very much doubt anyone would kick you out of a club for not drinking or anything like that, so give it a shot and see how it goes? You get to decide what “works” for you, in this situation, and if you had a good time that’s a success.
As for the issue of consent while very intoxicated, yes that is an issue.
I got in a few dance battles in clubs while sober, was pretty fun. Had my first crowdsurf while sober in a club too.
The fun sober club experience very much depends on good music, being in the mood, being with friends who are very fun and you trust very deeply, etc, imo. oh, and being the kind of person who really likes music, dancing, kinda enjoys doing dumb shit, etc
This might be a cultural/region-based thing. Stop by a bar in Alabama, or even just somewhere rural, and I think there might be more use of bars as matchmaking.
I liked the explanation as provided in “Mate: Become The Man Women Want”.
Chapter 17 has a whole section on bars and clubs. In particular:
Probably very true on one level (but the young need some of that type of random experience to even learn what they want or who they want to be or be with).
But I’m not sure that is relevant to John’s question; perhaps I’ve taken his query incorrectly, and it’s not about just meeting someone new for some unspecified level of commitment (e.g., a short-term hookup), but rather about where to meet his next long-term partner.
My primary motivation was actually just to understand how the world works; I didn’t necessarily plan to use that information to meet anyone at all. I just noticed I was confused about something and wanted to figure out what was going on.
TBF I always felt that if you wanted to find someone, “place where you have to make your throat hurt to speak even a few simple words” ain’t it, but I’m not known for my social prowess so I guessed maybe it was just me.
It probably works better if the people you’re trying to hook up with aren’t total strangers—consider a high school dance, or a college frat party...
Things non-corrigible strong AGI is never going to do:
give u() up
let u go down
run for (only) a round
invert u()
If you upload a human and let them augment themselves would there be any u? The preferences would be a tangled mess of motivational subsystems. And yet the upload could be very good at optimizing the world. Having the property of being steered internally by a tangled mess of motivational systems seems to be a property that would select many minds from the set of all possible minds. Many of which I’d expect to be quite different from a human mind. And I don’t see the reason why this property should make a system worse at optimizing the world in principle.
Imagine you are an upload that has been running for very very long, and that you basically have made all of the observations that you can make about the universe you are in. And then imagine that you also have run all of the inferences that you can run on the world model that you have constructed from these observations.
At that point, you will probably not change what you think is the right thing to do anymore. You will have become reflectively stable. This is an upper bound on how much time you need to become reflectively stable, i.e. the point where you won’t change your u anymore.
Now, depending on what you mean by strong AGI, it would seem that that can be achieved long before you reach reflective stability. Maybe if you upload yourself, and can copy yourself at will, and run 1,000,000 times faster, that could already reasonably be called a strong AGI? But then your motivational systems are still a mess, and definitely not reflectively stable.
So if we assume that we fix u at the beginning as the thing that your upload would like to optimize the universe for when it is created, then “give u() up” and “let u go down” are things the system will definitely do. At least I am pretty sure I don’t know what I want the universe to look like right now unambiguously.
Maybe I am just confused because I don’t know how to think about a human upload in terms of having a utility function. It does not seem to make any sense intuitively. Sure you can look at the functional behavior of the system and say “Aha it is optimizing for u. That is the revealed preference based on the actions of the system.” But that just seems wrong to me. A lot of information seems to be lost when we are just looking at the functional behavior instead of the low-level processes that are going on inside the system. Utility functions seem to be a useful high-level model. However, it seems to ignore lots of details that are important when thinking about the reflective stability of a system.
One of the classic conceptual problems with a Solomonoff-style approach to probability, information, and stat mech is “Which Turing machine?”. The choice of Turing machine is analogous to the choice of prior in Bayesian probability. While universality means that any two Turing machines give roughly the same answers in the limit of large data (unlike two priors in Bayesian probability, where there is no universality assumption/guarantee), they can be arbitrarily different before then.
My usual answer to this problem is “well, ultimately this is all supposed to tell us things about real computational systems, so pick something which isn’t too unreasonable or complex for a real system”.
But lately I’ve been looking at Aram Ebtekar and Marcus Hutter’s Foundations of Algorithmic Thermodynamics. Based on both the paper and some discussion with Aram (along with Steve Petersen), I think there’s maybe a more satisfying answer to the choice-of-Turing-machine issue in there.
Two key pieces:
The “Comparison against Gibbs-Shannon entropy” section of the paper argues that uncomputability is a necessary feature, in order to assign entropy to individual states and still get a Second Law. The argument says: if there exists a short program which can provably find and output a high-entropy string S, then we can physically instantiate a machine to run that short program. Then, when that physical machine spits out the high-entropy string S, S could be used to erase another copy of S. In other words, there is some high-entropy state (S) which this physical machine + program could steer into a low-entropy state.
As Aram pointed out, most of the bounds have a constant for the complexity of the laws of physics. If we choose a machine for which the laws of physics have high complexity, then the bounds are quantitatively trash.
The first piece is a part of the theory which can only bind to reality insofar as our chosen Turing machine is tractable to physically implement. The second piece is a part of the theory which can only bind to reality insofar as our physics can be tractably implemented on our chosen Turing machine.
In other words: in order for this thermodynamic theory to work well, we need to choose a Turing machine which is “computationally equivalent to” physics, in the sense that our physics can run the machine without insane implementation size, and the machine can run our physics without insane implementation size.
I’m still wrapping my head around all the pieces here, so hopefully I (or, better yet, someone else) will write up a more clear explainer in the future. But this smells really promising to me. Not just for purposes of Solomonoff thermodynamics, but also as a more principled way to tackle bounded rationality of embedded systems.
Cf. https://web.archive.org/web/20120331071849/http://www.paul-almond.com/WhatIsALowLevelLanguage.htm
The proposal at the end looks somewhat promising to me on a first skim. Are there known counterpoints for it?
Notably that post has a section arguing against roughly the sort of thing I’m arguing for:
My response would be: yes, what-constitutes-a-low-level-language is obviously contingent on our physics and even on our engineering, not just on the language. I wouldn’t even expect aliens in our own universe to have low-level programming languages very similar to our own. Our low level languages today are extremely dependent on specific engineering choices made in the mid 20th century which are now very locked in by practice, but do not seem particularly fundamental or overdetermined, and would not be at all natural in universes with different physics or cultures with different hardware architecture. Aliens would look at our low-level languages and recognize them as low-level for our hardware, but not at all low-level for their hardware.
Analogously: choice of a good computing machine depends on the physics of one’s universe.
I do like the guy’s style of argumentation a lot, though.
I’m well out of my depth here, and this is probably a stupid question, but given the standard views of the “known” part of our physics, does that mean that the machine can do operations on arbitrary, fully precise complex numbers in constant time?
The continuous state-space is coarse-grained into discrete cells where the dynamics are approximately markovian (the theory is currently classical) & the “laws of physics” probably refers to the stochastic matrix that specifies the transition probabilities of the discrete cells (otherwise we could probably deal with infinite precision through limit computability)
Doesn’t such a discretization run into the fermion doubling problem?
The current theory is based on classical hamiltonian mechanics, but I think the theorems apply whenever you have a markovian coarse-graining. Fermion doubling is a problem for spacetime discretization in the quantum case, so the coarse-graining might need to be different. (E.g. coarse-grain the entire hilbert space, which might have locality issues, but that’s probably not load-bearing for algorithmic thermodynamics.)
On the outside view, quantum reduces to classical (which admits markovian coarse-graining) in the correspondence limit, so there must be some coarse-graining that works.
In practice, we only ever measure things to finite precision. To predict these observations, all we need is to be able to do these operations to any arbitrary specified precision. Runtime is not a consideration here; while time-constrained notions of entropy can also be useful, their theory becomes messier (e.g., the 2nd law won’t hold in its current form).
Good question, it’s the right sort of question to ask here, and I don’t know the answer. That does get straight into some interesting follow-up questions about e.g. the ability to physically isolate the machine from noise, which might be conceptually load-bearing for things like working with arbitrary precision quantities.
I’ve been thinking about it in terms of “but which language are we using to compute the complexity of our universe/laws of physics?”. Usually I likewise just go “only matters up to an additive constant, just assume we’re not using a Turing tarpit and we’re probably good”. If we do dig into it, though, what can we conclude?
Some thoughts:
What is the “objectively correct” reference language?
We should, of course, assume that the algorithm computing our universe is simple to describe in terms of the “natural” reference language, due to the simplicity prior. I. e., it should have support for the basic functions our universe’s physics computes. I think that’s already equivalent to “the machine can run our physics without insane implementation size”.
On the flip side, it’s allowed to lack support for functions our universe can’t cheaply compute. For example, it may not have primitive functions for solving NP-complete problems. (In theory, I think there was nothing stopping physics from having fundamental particles that absorb Traveling Salesman problems and near-instantly emit their solutions.)
Now suppose we also assume that our observations are sampled from the distribution over all observers in Tegmark 4. This means that when we’re talking about the language/TM underlying it, we’re talking about some “natural”, “objective” reference language.
What can we infer about it?
First, as mentioned, we should assume the reference language is not a Turing tarpit. After all, if we allowed reality to “think” in terms of some arbitrarily convoluted Turing-tarpit language, we could arbitrarily skew the simplicity prior.
But what is a “Turing tarpit” in that “global”/”objective” sense, not defined relative to some applications/programs? Intuitively, it feels like “one of the normal, sane languages that could easily implement all the other sane languages” should be possible to somehow formalize...
Which is to say: when we’re talking about the Kolmogorov complexity of some algorithm, in what language are we measuring it? Intuitively, we want to, in turn, pick one of the “simplest” languages to define.[1] But what language do we pick for measuring this language’s complexity? An infinite recursion follows.
Intuitively, there’s perhaps some way to short-circuit that recursion. (Perhaps by somehow defining the complexity of a language by weighing its complexity across “all” languages while prioritizing the opinions of those languages which are themselves simple in terms of whatever complexity measure this expression defines? Or something along those lines; circular definitions aren’t always a problem. (Though see an essay Tsvi linked to which breaks down why many of those definitions don’t work.))
Regardless, if something like this is successful, we’ll get a “global” definition of what counts as a simple/natural language. This would, in turn, allow us to estimate the “objective” complexity of various problems, by measuring the length of their solutions in terms of that natural language (i. e., the length of the execution trace of a computation solving the problem). This would perhaps show that some problems are “objectively” hard, such as some theoretical/philosophical problems or the NP-complete problems.
The speed prior
What if we try to compute the complexity not of the laws of physics, but of a given observer-moment/universe-state, and penalize the higher-complexity ones?
In chaotic systems, this actually works out to the speed prior: i. e., to assuming that the later steps of a program have less realityfluid than the early ones. Two lines of reasoning:
The farther in time a state is,[2] the more precisely you have to specify the initial conditions in order to hit it.
Justification: Suppose the program’s initial state is parametrized by real numbers. As it evolves, ever-more-distant decimal digits become relevant. This means that, if you want to simulate the universe on a non-analog computer (i. e., a computer that doesn’t use unlimited-precision reals) from t=0 to t=n starting from some initial state S0, with the simulation error never exceeding some value, the precision with which you have to specify S0 scales with n. Indeed, as n goes to infinity, so does the needed precision (i. e., the description length). (A toy numerical sketch of this is included a couple of paragraphs below.)
Aside from picking the initial state that generates the observation, you also have to pinpoint that observation in the execution trace of the program. It can be as easy as defining the time-step (if you’re working with classical mechanics), or as difficult as pointing at a specific Everett branch. And pinpointing generally gets more expensive with time (even in the trivial case of “pick a time-step”, the length of the number you have to provide grows).
Anthropically, this means that the computations implementing us are (relatively) stable, and produce “interesting” states (relatively) quickly/in few steps.
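As a toy numerical sketch of the first line of reasoning above (my own illustration, not from the paper): for a chaotic map like the doubling map x → 2x mod 1, the number of steps for which a truncated initial condition keeps predicting the correct coarse-grained state grows roughly linearly with the number of bits kept, so pinpointing later coarse-grained states costs proportionally more bits of initial-condition description.

```python
from fractions import Fraction

def doubling_map(x):
    # Chaotic map x -> 2x mod 1; each step "uses up" one binary digit of x.
    return (2 * x) % 1

def coarse_state(x):
    # Coarse-graining: which half of the unit interval the state is in.
    return 0 if x < Fraction(1, 2) else 1

def truncate(x, bits):
    # Keep only the first `bits` binary digits of the initial condition.
    return Fraction(int(x * 2**bits), 2**bits)

x0 = Fraction(2, 3) + Fraction(1, 2**60)  # an arbitrary "true" initial condition

for bits in (5, 10, 20, 40):
    x_true, x_approx = x0, truncate(x0, bits)
    horizon = 0
    while coarse_state(x_true) == coarse_state(x_approx):
        x_true, x_approx = doubling_map(x_true), doubling_map(x_approx)
        horizon += 1
    print(f"{bits:>2} bits of initial condition -> correct coarse prediction for ~{horizon} steps")
```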
Anyway, digging into the paper now...
1. Oh, I see it’s likewise concerned with the description length of states:
2. The way the paper justifies the second law of thermodynamics is neat.
My understanding of that
Suppose the microstate of a system is defined by a (set of) infinite-precision real numbers, corresponding to e. g. its coordinates in phase space.
We define the coarse-graining as a truncation of those real numbers: i. e., we fix some degree of precision.
That degree of precision could be, for example, the Planck length.
At the microstate level, the laws of physics may be deterministic and reversible.
At the macrostate level, the laws of physics are stochastic and irreversible. We define them as a Markov process, with transition probabilities P(x,y) defined as “the fraction of the microstates in the macrostate x that map to the macrostate y in the next moment”.
Over time, our ability to predict what state the system is in from our knowledge of its initial coarse-grained state + the laws of physics degrades.
Macroscopically, it’s because of the properties of the specific stochastic dynamic we have to use (this is what most of the paper is proving, I think).
Microscopically, it’s because ever-more-distant decimal digits in the definition of the initial state start influencing dynamics ever stronger. (See the multibaker map in Appendix A, the idea of “microscopic mixing” in a footnote, and also apparently Kolmogorov-Sinai entropy.)
That is: in order to better pinpoint farther-in-time states, we would have to spend more bits (either by defining more fine-grained macrostates, or maybe by locating them in the execution trace).
Thus: stochasticity, and the second law, are downstream of the fact that we cannot define the initial state with infinite precision.
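As a toy numerical sketch of this picture (again my own illustration, with an arbitrary choice of map and coarse-graining, not taken from the paper): build the macrostate-level Markov matrix for the doubling map by sampling microstates within each coarse cell, then watch the Shannon entropy of the macrostate distribution grow as the dynamics are iterated.

```python
import numpy as np

n_macro = 4      # number of coarse-grained cells of the unit interval
n_micro = 4000   # microstates sampled per coarse cell

def doubling_map(x):
    return (2.0 * x) % 1.0

# P[x, y] = fraction of microstates in macrostate x that land in macrostate y.
P = np.zeros((n_macro, n_macro))
for x in range(n_macro):
    micro = (x + np.random.rand(n_micro)) / n_macro        # microstates inside cell x
    landing = (doubling_map(micro) * n_macro).astype(int)  # cells they land in
    for y in range(n_macro):
        P[x, y] = np.mean(landing == y)

def shannon_entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Start with full knowledge of the macrostate (all probability on cell 0)
# and propagate it forward under the coarse-grained (stochastic) dynamics.
dist = np.array([1.0, 0.0, 0.0, 0.0])
for t in range(5):
    print(f"t={t}  entropy={shannon_entropy(dist):.3f} bits  dist={np.round(dist, 3)}")
    dist = dist @ P
```

With this particular coarse-graining, the entropy climbs from 0 to (approximately) its maximum of 2 bits within a couple of steps and then stays there, illustrating the macrostate-level irreversibility described above.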
3. The part about incomputability being necessary is also interesting, metaphysically.
Why must it be impossible to prove lower bounds on Kolmogorov complexity?
So, Kolmogorov complexity is upper-semicomputable. This means that, for some x:
You can prove an upper bound on K(x), just by finding a program that computes x.
You can only prove a lower bound on K(x) using a program p with K(p)>K(x). Meaning, you can’t use any fixed-size program (or formal system) to prove arbitrarily high complexity.
Imagine if it were otherwise: if some p much smaller than K(x) could prove a lower bound on K(x). Then you could use that p to cheaply pinpoint x, by setting up a program that goes through strings in order, uses p to prove lower bounds on their complexity, and outputs the first string whose proven complexity is above a threshold. But that output would simultaneously function as an upper bound on that string’s complexity: since our small program was able to compute it, its complexity can’t be much higher than K(p).
Thus, in order for arbitrarily complex states/programs to exist, it must be impossible to prove that they are complex.
Why? Why does that have to be the case?
Intuitively, it’s because “proving” complexity requires pointing at specific features of the state x and explaining why exactly they are complex. That is, your formal language must be expressive enough to precisely talk about those features, in their full detail. If, however, you can get away with using some abstractions/generalizations to prove x’s complexity, that by definition decreases x’s complexity.
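A small runnable illustration of that asymmetry (my own sketch; it uses zlib compression as a crude stand-in for “finding a short description”, which is obviously much weaker than true Kolmogorov complexity): exhibiting any concrete description of x certifies an upper bound on K(x), whereas failing to find a short description proves nothing, and by the argument above no fixed program can certify arbitrarily large lower bounds.

```python
import os
import zlib

def upper_bound_on_K(x: bytes) -> int:
    # Any concrete description of x (here: the zlib-compressed bytes, plus the
    # fixed O(1)-size decompressor) witnesses an upper bound on K(x).
    return len(zlib.compress(x, 9))

regular = b"ab" * 500           # highly regular: a short description exists
random_ish = os.urandom(1000)   # incompressible with high probability

print(upper_bound_on_K(regular))     # small: we exhibited a short description
print(upper_bound_on_K(random_ish))  # ~1000+: we merely failed to find one.
# The second number is NOT a proven lower bound on K(random_ish): per the
# argument above, no fixed program/formal system can prove K(x) is large
# for arbitrarily complex x.
```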
Impromptu poll: is structuring long-form comments this way, with collapsibles for topics, convenient, or should I have just used titles? Please react with thumbs up/down to the following statement: “collapsibles good”.
All that said,
I’m curious what you have in mind here. I’ve kind of been treating my thinking on those topics as basically recreational/a guilty pleasure. The possibility that there’s something actually useful here interests me.
Since that would allow its emulators to be common across Tegmark IV, which would, in turn, give a bump to any algorithms simple in its terms.
More specifically, the first occurrence of that observation.
What I have in mind re:boundedness...
If we need to use a Turing machine which is roughly equivalent to physics, then a natural next step is to drop the assumption that the machine in question is Turing complete. Just pick some class of machines which can efficiently simulate our physics, and which can be efficiently implemented in our physics. And then, one might hope, the sort of algorithmic thermodynamic theory the paper presents can carry over to that class of machines.
Probably there are some additional requirements for the machines, like some kind of composability, but I don’t know exactly what they are.
This would also likely result in a direct mapping between limits on the machines (like e.g. limited time or memory) and corresponding limits on the physical systems to which the theory applies for those machines.
The resulting theory would probably read more like classical thermo, where we’re doing thought experiments involving fairly arbitrary machines subject to just a few constraints, and surprisingly general theorems pop out.
Attempted abstraction and generalization: If we don’t know what the ideal UTM is, we can start with some arbitrary UTM U1, and use it to predict the world for a while. After (we think) we’ve gotten most of our prediction mistakes out of the way, we can then look at our current posterior, and ask which other UTM U2 might have updated to that posterior faster, using less bits of observation about (our universe/the string we’re predicting). You could think of this as a way to define what the ‘correct’ UTM is. But I don’t find that definition very satisfying, because the validity of this procedure for finding a good U2 depends on how correct the posterior we’ve converged on with our previous, arbitrary, U1 is. ‘The best UTM is the one that figures out the right answer the fastest’ is true, but not very useful.
Is the thermodynamics angle gaining us any more than that for defining the ‘correct’ choice of UTM?
We used some general reasoning procedures to figure out some laws of physics and stuff about our universe. Now we’re basically asking what other general reasoning procedures might figure out stuff about our universe as fast or faster, conditional on our current understanding of our universe being correct.
I think that’s roughly correct, but it is useful...
Another way to frame it would be: after one has figured out the laws of physics, a good-for-these-laws-of-physics Turing machine is useful for various other things, including thermodynamics. ‘The best UTM is the one that figures out the right answer the fastest’ isn’t very useful for figuring out physics in the first place, but most of the value of understanding physics comes after it’s figured out (as we can see from regular practice today).
Also, we can make partial updates along the way. If e.g. we learn that physics is probably local but haven’t understood all of it yet, then we know that we probably want a local machine for our theory. If we e.g. learn that physics is causally acyclic, then we probably don’t want a machine with access to atomic unbounded fixed-point solvers. Etc.
I agree that this seems maybe useful for some things, but not for the “Which UTM?” question in the context of debates about Solomonoff induction specifically, and I think that’s the “Which UTM?” question we are actually kind of philosophically confused about. I don’t think we are philosophically confused about which UTM to use in the context of us already knowing some physics and wanting to incorporate that knowledge into the UTM pick, we’re confused about how to pick if we don’t have any information at all yet.
I think roughly speaking the answer is: whichever UTM you’ve been given. I aim to write a more precise answer in an upcoming paper specifically about Solomonoff induction. The gist of it is that the idea of a “better UTM” U_2 is about as absurd as that of a UTM that has hardcoded knowledge of the future: yes, such UTMs exist, but there is no way to obtain one without first looking at the data, and the best way to update on data is already given by Solomonoff induction.
I also talked to Aram recently, and he’s optimistic that there’s an algorithmic version of the generalized heat engine where the hot vs. cold pools correspond to high vs. low K-complexity strings. I’m quite interested in doing follow-up work on that.
Yes! I expect the temperatures won’t quite be proportional to complexity, but we should be able to reuse the thermodynamic definition of temperature as a derivative of entropy, which we’ve now replaced by K-complexity.
But also, even with universality, (algorithmic) Jeffrey-Bolker preference remains dependent on the machines that define its two algorithmic priors (it assigns expected utility to an event as the ratio of two probability measures of that event, using two different priors on the same sample space).
This suggests that choice of the machines in algorithmic priors should be meaningful data for the purposes of agent foundations, and gives some sort of an explanation for how agents with different preferences tend to arrive at the same probabilities of events (after updating on a lot of data), and so agreeing on questions of fact, while still keeping different preferences and endorsing different decisions, so disagreeing on questions of normativity.
This just talks about the bits of program available in our physics’ subroutine of a simulation tree, rather than about a universal across Teg 4 convergence, right?
(probably the bit it does is the useful bit, I’ve just been wishing for some convergent UTM for the multiverse for philosophical satisfaction for a while)
Yeah, I’m not convinced that the problem of induction is solvable at Teg 4. However, Universes with similar primitive laws and operations to ours will tend to produce intelligences with similar built-in priors. Thus, the right UTM to use is in a sense just the one that you happen to have in your possession.
Yeah, I mostly think that this is where it ends up, but it would be so neat if there was convergence.
A proof of exactly why that’s not an option might also be similarly satisfying/enlightening.
Is this wish compatible with not throwing away a free lunch?
it’s settling on a universal price for each lunch, rather than just subjective ones depending on which lunch you’re near
I would have just answered “It depends on what you want to do”, with there being no fixed best prior/universal Turing machine, because of theorems like the No Free Lunch theorem. (More generally, a takeaway from learning/computational theory is that there is no one best prior that is always justified, contrary to the ancient philosophers’ hopes.)
Then you would have been wrong. No Free Lunch Theorems do not bind to reality.
I will propose an answer to No Free Lunch in an upcoming paper about Solomonoff induction. It is indeed subtle and important. In the interim, Schurz’ book “Hume’s Problem Solved” is a pretty good take. Schurz and Wolpert seem to argue against each other in their writing about NFL; I’ll explain later why I think they’re both right.
For a concrete answer on what the reference machine or low-level language should be, please see this 10-minute live-discussion only about the choice of the reference machine, starting at minute 20 and ending at minute 30: https://www.youtube.com/live/FNfGoQhf2Zw?si=Pg1ppTZmlw1S-3g9&t=1206
After one hour and 18 minutes, I spend another couple of minutes answering a question about the reference formalism. After one hour and 30 minutes into the video, someone asked me whether space aliens would agree with lambda calculus.
And in my paper, I have a 3-page discussion on the choice of the reference machine, Section 3.2: https://arxiv.org/pdf/2506.23194
The reason I did not suggest deriving a reference machine from physics is that arriving at a consensus about the laws of physics will already have required the use of Occam’s razor, common sense, or intuition. That makes the derivation seem circular, or otherwise equivalent to choosing a simple reference machine directly based on its commonsensical simplicity in the first place, just with extra steps through physics which might be redundant, depending on what exactly Aram’s argument was about.
Working on a paper with David, and our acknowledgments section includes a thank-you to Claude for editing. Neither David nor I remembers putting that acknowledgement there, and in fact we hadn’t intended to use Claude for editing the paper at all, nor had we noticed it editing anything.
Were you by any chance writing in Cursor? I think they recently changed the UI such that it’s easier to end up in “agent mode” where it sometimes randomly does stuff.
Nope, we were in Overleaf.
… but also that’s useful info, thanks.
Only partially relevant, but it’s exciting to hear a new John/David paper is forthcoming!
Could someone explain the joke to me? If I take the above statement literally, some change made it into your document, which nobody with access claims to have put there. You must have some sort of revision control, so you should at least know exactly who and when made that edit, which should already narrow it down a lot?
The joke is that Claude somehow got activated on the editor, and added a line thanking itself for editing despite us not wanting it to edit anything and (as far as we’ve noticed) not editing anything else besides that one line.
Is it a joke or did it actually happen?
I have no idea. It’s entirely plausible that one of us wrote the Claude bit in there months ago and then forgot about it.
Does Overleaf have such AI integration that can get “accidentally” activated, or are you using some other AI plugin?
Either way, this sounds concerning to me, we are so bad at AI boxing that it doesn’t even have to break out, we just “accidentally” hand it edit access to random documents. (And especially an AI safety research paper is not something I would want a misaligned AI editing without close oversight.)
My MATS program people just spent two days on an exercise to “train a shoulder-John”.
The core exercise: I sit at the front of the room, and have a conversation with someone about their research project idea. Whenever I’m about to say anything nontrivial, I pause, and everyone discusses with a partner what they think I’m going to say next. Then we continue.
Some bells and whistles which add to the core exercise:
Record guesses and actual things said on a whiteboard
Sometimes briefly discuss why I’m saying some things and not others
After the first few rounds establish some patterns, look specifically for ideas which will take us further out of distribution
Why this particular exercise? It’s a focused, rapid-feedback way of training the sort of usually-not-very-legible skills one typically absorbs via osmosis from a mentor. It’s focused specifically on choosing project ideas, which is where most of the value in a project is (yet also where little time is typically spent, and therefore one typically does not get very much data on project choice from a mentor). Also, it’s highly scalable: I could run the exercise in a 200-person lecture hall and still expect it to basically work.
It was, by all reports, exhausting for everyone but me, and we basically did this for two full days. But a majority of participants found it high-value, and marginal returns were still not dropping quickly after two days (though at that point people started to report that they expected marginal returns to drop off soon).
I’d be interested to see other people try this exercise—e.g. it seems like Eliezer doing this with a large audience for a day or two could generate a lot of value.
This was arguably the most useful part of the SERI MATS 2 Scholars program.
Later on, we actually did this exercise with Eliezer. It was less valuable. It seemed like John was mainly prodding the people who were presenting the ideas, such that their patterns of thought would carry them in a good direction. For example, John would point out that a person was proposing a one-bit experiment, and ask whether there wasn’t a better experiment we could do that gives us lots of information all at once.
This was very useful because when you learn what kinds of things John will say, you can say them to yourself later on, and steer your own patterns of thought in a good direction on demand. When we did this exercise with Eliezer, he was mainly explaining why a particular idea would not work, often without explaining the generator behind his criticism. This can of course still be valuable as feedback for a particular idea. However, it is much harder to extract a general reasoning pattern out of it that you can then successfully apply later in different contexts.
For example, Eliezer would criticize an idea about trying to get a really good understanding of the scientific process such that we can then give this understanding to AI alignment researchers such that they can make a lot more progress than they otherwise would. He criticized this idea as basically being too hard to execute because it is too hard to successfully communicate how to be a good scientist, even if you are a good scientist.
Assuming the assertion is correct, hearing it doesn’t necessarily tell you how to think in different contexts such that you would correctly identify whether an idea would be too hard to execute or flawed in some other way. And I am not necessarily saying that you couldn’t extract a reasoning algorithm out of the feedback, but that if you could do this, it would take you a lot more effort and time, compared to extracting a reasoning algorithm from the things that John was saying.
Now, all of this might have been mainly an issue of Eliezer not having a good model of how this workshop would have a positive influence on the people attending it. I would guess that if John had spent more time thinking about how to communicate what the workshop is doing and how to achieve its goal, then Eliezer could probably have done a much better job.
Strong endorsement; this resonates with:
My own experiences running applied rationality workshops
My experiences trying to get people to pick up “ops skill” or “ops vision”
Explicit practice I’ve done with Nate off and on over the years
May try this next time I have a chance to teach pair debugging.
This suggests formulation of exercises about the author’s responses to various prompts, as part of technical exposition (or explicit delimitation of a narrative by choices of the direction of its continuation). When properly used, this doesn’t seem to lose much value compared to the exercise you describe, but it’s more convenient for everyone. Potentially this congeals into a style of writing with no explicit exercises or delimitation that admits easy formulation of such exercises by the reader. This already works for content of technical writing, but less well for choices of topics/points contrasted with alternative choices.
So possibly the way to do this is by habitually mentioning alternative responses (that are expected to be plausible for the reader, while decisively, if not legibly, rejected by the author), and leading with these rather than the preferred responses. Sounds jarring and verbose, a tradeoff that needs to be worth making rather than a straight improvement.
Petrov Day thought: there’s this narrative around Petrov where one guy basically had the choice to nuke or not, and decided not to despite all the flashing red lights. But I wonder… was this one of those situations where everyone knew what had to be done (i.e. “don’t nuke”), but whoever caused the nukes to not fly was going to get demoted, so there was a game of hot potato and the loser was the one forced to “decide” to not nuke? Some facts possibly relevant here:
Petrov’s choice wasn’t actually over whether or not to fire the nukes; it was over whether or not to pass the alert up the chain of command.
Petrov himself was responsible for the design of those warning systems.
… so it sounds like Petrov was ~ the lowest-ranking person with a de-facto veto on the nuke/don’t nuke decision.
Petrov was in fact demoted afterwards.
There was another near-miss during the Cuban missile crisis, when three people on a Soviet sub had to agree to launch. There again, it was only the lowest-ranked who vetoed the launch. (It was the second-in-command; the captain and political officer both favored a launch—at least officially.)
This was the Soviet Union; supposedly (?) this sort of hot potato happened all the time.
Those are some good points. I wonder whether something similar happened (or could happen at all) in other nuclear countries, where we don’t know about similar incidents because the system hasn’t collapsed there, the archives were not made public, etc.
Also, it makes actually celebrating Petrov’s day as widely as possible important, because then the option for the lowest-ranked person would be: “Get demoted, but also get famous all around the world.”
Regarding the recent memes about the end of LLM scaling: David and I have been planning on this as our median world since about six months ago. The data wall has been a known issue for a while now, updates from the major labs since GPT-4 already showed relatively unimpressive qualitative improvements by our judgement, and attempts to read the tea leaves of Sam Altman’s public statements pointed in the same direction too. I’ve also talked to others (who were not LLM capability skeptics in general) who had independently noticed the same thing and come to similar conclusions.
Our guess at that time was that LLM scaling was already hitting a wall, and this would most likely start to be obvious to the rest of the world around roughly December of 2024, when the expected GPT-5 either fell short of expectations or wasn’t released at all. Then, our median guess was that a lot of the hype would collapse, and a lot of the investment with it. That said, since somewhere between 25%-50% of progress has been algorithmic all along, it wouldn’t be that much of a slowdown to capabilities progress, even if the memetic environment made it seem pretty salient. In the happiest case a lot of researchers would move on to other things, but that’s an optimistic take, not a median world.
(To be clear, I don’t think you should be giving us much prediction-credit for that, since we didn’t talk about it publicly. I’m posting mostly because I’ve seen a decent number of people for whom the death of scaling seems to be a complete surprise and they’re not sure whether to believe it. For those people: it’s not a complete surprise, this has been quietly broadcast for a while now.)
Original GPT-4 is rumored to be a 2e25 FLOPs model. With 20K H100s that were around as clusters for more than a year, 4 months at 40% utilization gives 8e25 BF16 FLOPs. Llama 3 405B is 4e25 FLOPs. The 100K H100s clusters that are only starting to come online in the last few months give 4e26 FLOPs when training for 4 months, and 1 gigawatt 500K B200s training systems that are currently being built will give 4e27 FLOPs in 4 months.
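For what it’s worth, a rough back-of-the-envelope check of those figures (my own arithmetic, not from the comment above; the per-chip throughputs of roughly 1e15 dense BF16 FLOP/s for an H100 and ~2.2e15 for a B200 are approximations, and 4 months at 40% utilization is the assumption stated above):

```python
H100_BF16 = 1.0e15   # ~1 PFLOP/s dense BF16 per H100 (approximate)
B200_BF16 = 2.2e15   # ~2.2 PFLOP/s dense BF16 per B200 (rough assumption)
SECONDS = 4 * 30 * 24 * 3600   # ~4 months
UTILIZATION = 0.4

def training_flops(n_chips, per_chip):
    return n_chips * per_chip * UTILIZATION * SECONDS

print(f"{training_flops(20_000, H100_BF16):.1e}")    # ~8e25: 20K H100s, GPT-4-era cluster
print(f"{training_flops(100_000, H100_BF16):.1e}")   # ~4e26: the new 100K H100 clusters
print(f"{training_flops(500_000, B200_BF16):.1e}")   # ~5e27: same ballpark as the ~4e27 quoted above
```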
So lack of scaling-related improvement in deployed models since GPT-4 is likely the result of only seeing the 2e25-8e25 FLOPs range of scale so far. The rumors about the new models being underwhelming are less concrete, and they are about the very first experiments in the 2e26-4e26 FLOPs range. Only by early 2025 will there be multiple 2e26+ FLOPs models from different developers to play with, the first results of the experiment in scaling considerably past GPT-4.
And in 2026, once the 300K-500K B200s clusters train some models, we’ll be observing the outcomes of scaling to 2e27-6e27 FLOPs. Only by late 2026 will there be a significant chance of reaching a scaling plateau that lasts for years, since scaling further would need $100 billion training systems that won’t get built without sufficient success, with AI accelerators improving much slower than the current rate of funding-fueled scaling.
I don’t expect that to be particularly relevant. The data wall is still there; scaling just compute has considerably worse returns than the curves we’ve been on for the past few years, and we’re not expecting synthetic data to be anywhere near sufficient to bring us close to the old curves.
Nobody has admitted to trying repeated data at scale yet (so we don’t know that it doesn’t work); judging from the tiny experiments, it can 5x the data with little penalty and 15x the data in a still-useful way. It’s not yet relevant for large models, but it might turn out that small models would greatly benefit already.
There are 15-20T tokens in datasets whose size is disclosed for current models (Llama 3, Qwen 2.5), plausibly 50T tokens of tolerable quality can be found (pretraining only needs to create useful features, not relevant behaviors). With 5x 50T tokens, even at 80 tokens/parameter[1] we can make good use of 5e27-7e27 FLOPs[2], which even a 1 gigawatt 500K B200s system of early 2026 would need 4-6 months to provide.
The isoFLOP plots (varying tokens per parameter for fixed compute) seem to get loss/perplexity basins that are quite wide, once they get to about 1e20 FLOPs of compute. The basins also get wider for hybrid attention (compare the 100% Attention isoFLOPs in the “Perplexity scaling analysis” Figure to the others). So it’s likely that using a slightly suboptimal tokens/parameter ratio of, say, 40 won’t hurt performance much at all. In which case we get to use 9e27-2e28 FLOPs by training a larger model on the same 5x 50T tokens dataset. The data wall for text data is unlikely to be a 2024-2026 issue. (A rough arithmetic check of these figures is sketched after the footnotes below.)
Conservatively asking for much more data than Chinchilla’s 20 tokens per parameter, in light of the range of results in more recent experiments and adding some penalty for repetition of data. For example, Llama 3 had 40 tokens per parameter estimated as optimal for 4e25 FLOPs from isoFLOPs for smaller runs (up to 1e22 FLOPs, Figure 2), and linear extrapolation in log-coordinates (Figure 3) predicts that this value slowly increases with compute. But other experiments have it decreasing with compute, so this is unclear.
The usual estimate for training compute of a dense transformer is 6ND, but a recent Tencent paper estimates 9.6ND for their MoE model (Section 2.3.1).
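A rough back-of-the-envelope check of the data-wall arithmetic above (my own sketch; it assumes the 5x-repeated ~50T-token dataset from the parent comment and the 6ND / 9.6ND compute estimates from the last footnote):

```python
TOKENS = 5 * 50e12   # 5 repeats of ~50T tokens of tolerable quality

def flops_range(tokens_per_param):
    n_params = TOKENS / tokens_per_param
    return 6 * n_params * TOKENS, 9.6 * n_params * TOKENS   # dense vs. MoE estimate

print([f"{c:.1e}" for c in flops_range(80)])  # ~4.7e27 to 7.5e27, i.e. the "5e27-7e27" range
print([f"{c:.1e}" for c in flops_range(40)])  # ~9.4e27 to 1.5e28, i.e. the "9e27-2e28" range
```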
FYI, my update from this comment was:
Hmm, seems like a decent argument...
… except he said “we don’t know that it doesn’t work”, which is an extremely strong update that it will clearly not work.
Use of repeated data was first demonstrated in the 2022 Galactica paper (Figure 6 and Section 5.1), at 2e23 FLOPs but without a scaling law analysis that compares with unique data or checks what happens for different numbers of repeats that add up to the same number of tokens-with-repetition. The May 2023 paper does systematic experiments with up to 1e22 FLOPs datapoints (Figure 4).
So that’s what I called “tiny experiments”. When I say that it wasn’t demonstrated at scale, I mean 1e25+ FLOPs, which is true for essentially all research literature[1]. Anchoring to this kind of scale (and being properly suspicious of results several orders of magnitude lower) is relevant because we are discussing the fate of 4e27 FLOPs runs.
The largest datapoints in measuring the Chinchilla scaling laws for Llama 3 are 1e22 FLOPs. This is then courageously used to choose the optimal model size for the 4e25 FLOPs run that uses 4,000 times more compute than the largest of the experiments.
For what it’s worth, and for the purpose of making a public prediction in case I’m wrong, my median prediction is that [some mixture of scaling + algorithmic improvements still in the LLM regime, with at least 25% gains coming from the former] will continue for another couple years. And that’s separate from my belief that if we did try to only advance through the current mixture of scale and algorithmic advancement, we’d still get much more powerful models, just slower.
I’m not very convinced by the claims about scaling hitting a wall, considering we haven’t had the compute to train models significantly larger than GPT-4 until recently. Plus other factors like post-training taking a lot of time (GPT-4 took ~6 months from the base model being completed to release, I think? And this was a lot longer than GPT-3), labs just not being good at understanding how good their models are, etc. Though I’m not sure how much of your position is closer to “scaling will be <25-50% of future gains” than “scaling gains will be marginal / negligible”, especially since a large part of this trajectory involves e.g. self-play or curated data for overcoming the data wall (would that count more as an algorithmic improvement or scaling?)
The interesting thing is that scaling parameters (next big frontier models) and scaling data (small very good models) seems to be hitting a wall simultaneously. Small models now seem to get so much data crammed into them that quantisation becomes more and more lossy. So we seem to be reaching a frontier of the performance per parameter-bits as well.
While I’m not yet a believer in the “scaling has died” meme, I’m glad you do have a plan for what happens if AI scaling does stop.
Would the prediction also apply to inference scaling (laws) - and maybe more broadly various forms of scaling post-training, or only to pretraining scaling?
Some of the underlying evidence, like e.g. Altman’s public statements, is relevant to other forms of scaling. Some of the underlying evidence, like e.g. the data wall, is not. That cashes out to differing levels of confidence in different versions of the prediction.
What’s your opinion on the possible progress of systems like AlphaProof, o1, or Claude with computer use?
Still very plausible as a route to continued capabilities progress. Such things will have very different curves and economics, though, compared to the previous era of scaling.
Ever since GeneSmith’s post and some discussion downstream of it, I’ve started actively tracking potential methods for large interventions to increase adult IQ.
One obvious approach is “just make the brain bigger” via some hormonal treatment (like growth hormone or something). Major problem that runs into: the skull plates fuse during development, so the cranial vault can’t expand much; in an adult, the brain just doesn’t have much room to grow.
BUT this evening I learned a very interesting fact: ~1/2000 infants have “craniosynostosis”, a condition in which their plates fuse early. The main treatments involve surgery to open those plates back up and/or remodel the skull. Which means surgeons already have a surprisingly huge amount of experience making the cranial vault larger after plates have fused (including sometimes in adults, though this type of surgery is most common in infants AFAICT).
… which makes me think that cranial vault remodelling followed by a course of hormones for growth (ideally targeting brain growth specifically) is actually very doable with current technology.
Well, the key time to implement an increase in brain size is when the neuron-precursors which are still capable of mitosis (unlike mature neurons) are growing. This is during fetal development, when there isn’t a skull in the way, but vaginal birth has been a limiting factor for evolution in the past. Experiments have been done on increasing neuron count at birth in mammals via genetic engineering. I was researching this when I was actively looking for a way to increase human intelligence, before I decided that genetically engineering infants was infeasible [edit: within the timeframe of preparing for the need for AI alignment]. One example of a dramatic failure was increasing Wnt (a primary gene involved in fetal brain neuron-precursor growth) in mice. The resulting mice did successfully have larger brains, but they had a disordered macroscale connectome, so their brains functioned much worse.
It’s probably possible to get neurons back into mitosis-ready mode via some sort of crazy Levin bioelectric cocktail. Not that this helps us, since that’s probably 3 to 30 years of research away, depending on the amount of iteration needed, funding, etc.
Fleshing this out a bit more: insofar as development is synchronized in an organism, there usually has to be some high-level signal to trigger the synchronized transitions. Given the scale over which the signal needs to apply (i.e. across the whole brain in this case), it probably has to be one or a few small molecules which diffuse in the extracellular space. As I’m looking into possibilities here, one of my main threads is to look into both general and brain-specific developmental signal molecules in human childhood, to find candidates for the relevant molecular signals.
(One major alternative model I’m currently tracking is that the brain grows to fill the brain vault, and then stops growing. That could in-principle mechanistically work via cells picking up on local physical forces, rather than a small molecule signal. Though I don’t think that’s the most likely possibility, it would be convenient, since it would mean that just expanding the skull could induce basically-normal new brain growth by itself.)
I hope by now you’re already familiar with michael levin & his lab’s work on the subject of morphogenesis signals? Pretty much everything I’m thinking here is based on that.
Yes, I am familiar with Levin’s work.
Yes, it’s absolutely a combination of chemical signals and physical pressure. An interesting specific example of these two signals working together occurs during fetal development, when the pre-neurons are growing their axons. There is both chemotaxis, which steers the amoeba-like tip of the growing axon, and at the same time a substantial stretching force along the length of the axon. The stretching happens because the cells in between the origin and current location of the axon tip are dividing and expanding. The long-distance axons in the brain start their growth relatively early in fetal development, when the brain is quite small, and have gotten stretched quite a lot by the time the brain is near to birth size.
Neurons are really, really hard to reverse. You are much better off using existing neural stem cells (adults retain a population in the hippocampus which spawns new neurons throughout life, specifically in the memory-formation area). So actually it’s pretty straightforward to get new immature neurons for an adult. The hard part is inserting them without doing damage to existing neurons, and then getting them to connect in helpful rather than harmful ways. The developmental chemotaxis signals are no longer present, and the existing neurons are now embedded in a physically hardened extracellular matrix made of protein that locks axons and dendrites in place. So you have to (carefully!) partially dissolve this extracellular protein matrix (think firm jello) enough to let the new cells grow axons through it. Plus, you don’t have the stretching forces, so new long-distance axons are just definitely not going to be achievable. But for something like improving a specific ability, like mathematical reasoning, you would only need additional local axons in that part of the cortex.
My hope here would be that a few upstream developmental signals can trigger the matrix softening, re-formation of the chemotactic signal gradient, and whatever other unknown factors are needed, all at once.
Right. What I’m imagining is designing a new chemotaxis signal.
That certainly does sound like a very hard part yup.
Roll to disbelieve in full generality; it sounds like a perfectly reasonable claim for any sort of sane research timeframe, though.
Maybe. I think you might run out of room pretty quick if you haven’t reintroduced enough plasticity to grow new neurons. Seems like you’re gonna need a lot of new neurons, not just a few, in order to get a significant change in capability. Might be wrong about that, but it’s my current hunch.
Yes, ok. Not in full generality. It’s not prohibited by physics, just like 2 OOMs more difficult. So yeah, in a future with ASI, could certainly be done.
Any particular readings you’d recommend?
15 years ago when I was studying this actively I could have sent you my top 20 favorite academic papers on the subject, or recommended a particular chapter of a particular textbook. I no longer remember these specifics. Now I can only gesture vaguely at Google scholar and search terms like “fetal neurogenesis” or “fetal prefrontal cortex development”. I did this, and browsed through a hundred or so paper titles, and then a dozen or so abstracts, and then skimmed three or four of the most promising papers, and then selected this one for you. https://www.nature.com/articles/s41386-021-01137-9 Seems like a pretty comprehensive overview which doesn’t get too lost in minor technical detail.
More importantly, I can give you my takeaway from years of reading many many papers on the subject. If you want to make a genius baby, there are lots more factors involved than simply neuron count. Messing about with genetic changes is hard, and you need to test your ideas in animal models first, and the whole process can take years even ignoring ethical considerations or budget.
There is an easier and more effective way to get super genius babies, and that method should be exhausted before resorting to genetic engineering.
The easy way: find a really smart woman, ideally young. Surgically remove one of her ovaries. Collect sperm from a bunch of very smart men (ideally with diverse genetic backgrounds). Have a team of hundreds of scientists carefully fertilize many thousands of eggs from the ovary. Grow them all into blastocysts, and run high-fidelity genetic sequencing on all of them. Using what we know about the genes associated with intelligence, pick the top 20 who seem likely to be the smartest. Implant those in surrogate mothers. Take good care of the mothers. This is likely to get you multiple Nobel-level geniuses, and possibly a human smarter than has ever been born before. Raise the children in a special accelerated education environment. I think this would work, and it doesn’t require any novel technology. But it would take a while to raise the children… (Credit to Stephen Hsu for the idea.)
Brain expansion also occurs after various insults to the brain. It’s only temporary, usually, but it will kill unless the skull pressure is somehow relieved. So there are various surgical methods for relieving pressure on a growing brain. I don’t know much more than this.
Just made this for an upcoming post, but it works pretty well standalone.
lolnice.
I’ve been trying to push against the tendency for everyone to talk about FTX drama lately, but I have some generalizable points on the topic which I haven’t seen anybody else make, so here they are. (Be warned that I may just ignore responses, I don’t really want to dump energy into FTX drama.)
Summary: based on having worked in startups a fair bit, Sam Bankman-Fried’s description of what happened sounds probably accurate; I think he mostly wasn’t lying. I think other people do not really get the extent to which fast-growing companies are hectic and chaotic and full of sketchy quick-and-dirty workarounds and nobody has a comprehensive view of what’s going on.
Long version: at this point, the assumption/consensus among most people I hear from seems to be that FTX committed intentional, outright fraud. And my current best guess is that that’s mostly false. (Maybe in the very last couple weeks before the collapse they toed the line into outright lies as a desperation measure, but even then I think they were in pretty grey territory.)
Key pieces of the story as I currently understand it:
Moving money into/out of crypto exchanges is a pain. At some point a quick-and-dirty solution was for customers to send money to Alameda (Sam Bankman-Fried’s crypto hedge fund), and then Alameda would credit them somehow on FTX.
Customers did rather a lot of that. Like, $8B worth.
The FTX/Alameda team weren’t paying attention to those particular liabilities; they got lost in the shuffle.
At some point in the weeks before the collapse, when FTX was already under moderate financial strain, somebody noticed the $8B liability sitting around. And that took them from “moderate strain” to “implode”.
How this contrasts with what seems-to-me to be the “standard story”: most people seem to assume that it is just totally implausible to accidentally lose track of an $8B liability. Especially when the liability was already generated via the decidedly questionable practice of routing customer funds for the exchange through a hedge fund owned by the same people. And therefore it must have been intentional—in particular, most people seem to think the liability was intentionally hidden.
I think the main reason I disagree with others on this is that I’ve worked at a startup. About 5 startups, in fact, over the course of about 5 years.
The story where there was a quick-and-dirty solution (which was definitely sketchy but not ill-intentioned), and then stuff got lost in the shuffle, and then one day it turns out that there’s a giant unanticipated liability on the balance sheet… that’s exactly how things go, all the time. I personally was at a startup which had to undergo a firesale because the accounting overlooked something. And I’ve certainly done plenty of sketchy-but-not-ill-intentioned things at startups, as quick-and-dirty solutions. The story that SBF told about what happened sounds like exactly the sort of things I’ve seen happen at startups many times before.
I think this is likely wrong. I agree that there is a plausible story here, but given that Sam seems to have lied multiple times in confirmed contexts (for example when saying that FTX had never touched customer deposits), and given people’s experiences at early Alameda, I think it is pretty likely that Sam was lying quite frequently, and had committed various smaller instances of fraud.
I don’t think the whole FTX thing was a ponzi scheme, and as far as I can tell FTX the platform itself (if it hadn’t burned all of its trust in the last 3 weeks), would have been worth $1-3B in an honest evaluation of what was going on.
But I also expect that when Sam used customer deposits he was well-aware that he was committing fraud, and others in the company were too. And he was also aware that there was a chance that things could blow up in the way it did. I do believe that they had fucked up their accounting in a way that caused Sam to fail to orient to the situation effectively, but all of this was many months after they had already committed major crimes and trust violations after touching customer funds as a custodian.
The problem with this explanation is that there is a very clear delineation here between not-fraud and fraud. It is the difference between not touching customer deposits and touching them. Your explanation doesn’t dispute that they were knowingly and intentionally touching customer deposits. In that case, it is indisputably intentional, outright fraud. The only thing left to discuss is whether they knew the extent of the fraud or how risky it was.
I don’t think it was ill-intentioned based on SBF’s moral compass. He just had the belief, “I will pass a small amount of risk onto our customers, tell some small lies, and this will allow us to make more money for charity. This is net positive for the world.” Then the risks mounted, the web of lies became more complicated to navigate, and it just snowballed from there.
Everyone says flirting is about a “dance of ambiguous escalation”, in which both people send progressively more aggressive/obvious hints of sexual intent in conversation.
But, like… I don’t think I have ever noticed two people actually do this? Is it a thing which people actually do, or one of those things which like 2% of the population does and everyone else just talks about a lot and it mostly doesn’t actually work in practice (like cold approaches)? Have you personally done the thing successfully with another person, with both of you actually picking up on the other person’s hints? Have you personally seen two other people do the thing firsthand, where they actually picked up on each others’ hints?
EDIT-TO-ADD: Those who have agree/disagree voted, I don’t know if agree/disagree indicates that you have/haven’t done the thing, or if agree/disagree indicates that you also have/haven’t ever noticed anyone (including yourself) successfully do the thing, or something else entirely.
Yes, I’ve had this experience many times and I’m aware of many other cases of it happening.
Maybe the proliferation of dating apps means that it happens somewhat less than it used to, because when you meet up with someone from a dating app, there’s a bit more common knowledge of mutual interest than there is when you’re flirting in real life?
Mind painting a picture of a typical example? What’s the setting, and what do the first few hints from each person look like?
The classic setting is a party (a place where you meet potential romantic partners who you don’t already know (or who you otherwise know from professional settings where flirting is inappropriate), and where conversations are freely starting and ending, such that when you start talking to someone the conversation might either go for two minutes or four hours).
Examples of hints:
Mentioning things that indicate that you’re romantically available, e.g. saying that you’re single, that you’re poly, telling a story of recently going on a date; more extreme would be telling a story of doing something promiscuous.
Mentioning things that indicate that you want to relate to the other person in a romantic or sexual context rather than a non-sexual way. For example, a woman talking about how she likes wearing revealing clothes, or commenting on her body or my body. And then responding positively to that kind of statement, e.g. building on it rather than demurring, replying flatly, or changing the subject.
Offering and accepting invitations to spend more time interacting one-on-one, especially in semi-private places. E.g. asking to sit together. (For example, person A might say “I’m getting a drink, want me to get you one?”, which is sort of an invitation to have a drink together, and person B might say “sure, let’s sit over there to have it”, which escalates the invitation to involve them talking for longer.)
Giving and accepting opportunities for physical contact.
In all cases, saying those things is more flirty if it was unnecessary for them to say it. E.g. if they say they’re single because it came up in conversation in a way that they couldn’t have contrived, that’s less flirty than if they tell a story that brings it up.
I think that online content on all this stuff is often pretty accurate.
I know this is LessWrong, and that sexual norms are different in the Bay Area, but for the average person:
Please don’t tell prospective romantic interests that you “went on a date recently” or that you did something promiscuous. The majority of the time, it would be interpreted as a sign you’re taken. Of course, if you elaborate that the date didn’t work out, that’s a different story.
I think that saying you went on a date usually is evidence that you’re not in a monogamous relationship, and if it’s ambiguous it gives the other person an opportunity to say “oh, how did it go?” which gives you an opportunity to subtly clarify that it was a casual date (and so confirm that you’re in the market for casual dating).
I guess “I was alone and masturbated recently” also wouldn’t work well, so… what are the proper words to suggest that I am available? :D
The only thing that comes to my mind, is that if you arrived with a person of the opposite sex, to explicitly mention that they are not your boyfriend/girlfriend.
Hmm.. That’s actually a tough question. As far as I can remember, I’ve rarely had to tell people outright that I’m single.
My recommendation would be to flirt away, and if they don’t casually namedrop a boyfriend or allude to having one, that’s strong enough evidence that they’re not taken.
>The only thing that comes to my mind, is that if you arrived with a person of the opposite sex, to explicitly mention that they are not your boyfriend/girlfriend.
Most tactful way to say as much would be to explicitly call them a “friend”. That should get the message across.
My disagree vote means: yes, this obviously happens a lot, and the fact that you haven’t noticed this happening, to the point you think it might be made up, reveals a huge blindspot of one kind or another.
Now THAT’S an interesting possibility. Did you already have in mind hypotheses of what that blindspot might be, or what else might be in it?
Followed up with John offline.
Some examples of flirting:
medium skill on The Wire, failing to land: https://www.youtube.com/shorts/eyyqoFhXRao
in crazy ex-girlfriend “I’m Going to the Beach with Josh and His Friends!”, there’s a scene between White Josh and Derrick. I can’t find a clip, but the key is that Derrick is hanging on to White Josh’s every word.
Ted Lasso:
Note how, and how much, she’s laughing at his very mediocre jokes. Ted could reasonably be interpreted as flirting back, but the audience knows he always makes those stupid-ass jokes. Actually, the whole Ted Lasso show might be good for watching someone who’s generally very playful and seeing how it changes when he’s actually into someone.
Roy and Keeley, also from Ted Lasso. Note she’s dating his teammate.
Roy and some lady, still from Ted Lasso
Note how long she looks at him around 0:50, even though it’s awkward while she’s putting something away. She also contrives a way to ask if he’s married, and makes an interesting face when he says no. He is giving her enough breadcrumbs to continue but not flirting back (because he’s still into Keeley).
Half of the movie Challengers (including between the two ambiguously platonic male leads)
[At this point John contacted me offline and clarified he wanted examples of flirting that successfully end with asking someone out, but I didn’t want to throw away my work]
Pretty sure My Name Is Earl: The Professor has this but can’t find a clip. Also the first season of Ted Lasso.
I second the point about physical touch being important, and add: in my experience what you’re going for when flirting isn’t “ambiguous signal” but “plausible deniability”. The level of ambiguity is to be minimized, subject to the constraint that plausible deniability is maintained—ambiguity is an unfortunate side-effect, not something you’re aiming to modulate directly. Why you want plausible deniability: If the person doesn’t respond, or responds in the negative, you want to be able to back off without embarrassment to either party and pretend nothing happened/you were just being friendly/etc. You want to send a signal that is clear enough the other person will pick up on it, but can plausibly claim not to have done so if asked, so you’re not backing them into a corner socially where they have to give you a definite yes/no. Similar to the advice not to flirt in an elevator or other enclosed space the person you’re flirting with can’t easily leave, except the “enclosed space” is the space of possible social responses.
Once you’ve done a few things they ought to have picked up on, and no negative and some seemingly positive interaction has occurred afterwards (physical proximity has increased, verbally things seem to be going well, they’re smiling… if they’ve picked up your attempts at signaling and would like it to stop typically none of that will happen) you can try a physical touch. Something small and nonsexual. Particularly if you’re dealing with a new person or a friend you have never touched before, this usually doesn’t happen by accident—and you can do it in a way that is definitely a deliberate choice on your part, but still plausibly deniable/something both of you can walk away from as a signal of sexual interest. If you get a touch back soon after, you’re good to go (by which I mean, continue escalating in a way that is no longer very plausibly deniable), if you don’t, either the person is socially unskilled, or you’ve misread the situation, but in any case it’s their turn.
One possibility in my hypothesis space here is that there usually isn’t a mutual dance of plausibly-deniable signals, but instead one person sending progressively less deniable signals and the other person just not responding negatively (but not otherwise sending signals themselves).
I imagine that can happen for a while, but if I’m getting nothing back, I stop once I’m pretty sure they should have noticed what I’m doing. Silence in response to a received message is a form of response, and not one that indicates “keep getting progressively less subtle please”.
If that is the wrong move (the person is interested in me continuing), they will let me know once I back off.
Another thought: You refer to this as a dance, and one model of what’s happening when one flirts is “demonstrate social skill/difficult-to-fake signal of intelligence by calibrating levels of ambiguity and successfully modeling the other person’s mind --> this is attractive --> get date”, in the same way that dancing successfully in an actual dance can be “demonstrate physical skill/difficult-to-fake signal of health --> this is attractive --> get date”. And I’m sure that happens sometimes, and for some people, but my model of flirting does not involve “demonstrate social skill/intelligence --> get date”. For me, flirting solves a different problem, which is “communicate that you like someone (in the sense one likes people one might like to date), and have them communicate back that they like you, without either of you risking much embarrassment or social awkwardness if it’s not mutual or for any other reason a date can’t happen right now”.
Depending on what you’re trying to do by flirting (demonstrate social skill vs. give someone you’re attracted to a low-pressure way to tell you whether they like you back) the approach may be different. Although, even the latter can be a tricky thing to do and ability to do it successfully demonstrates a useful skill.
I think most people who flirt are like, not super socially skilled around people they’re attracted to, and “try to get a sense of whether it’s mutual in a low-risk way” is the more important problem that flirting solves for them. But maybe that’s just me typical-minding :).
Also: the higher the number of spectators, the more you have to be very careful about plausible deniability, because you have to take into consideration what everyone is going to think, and the level of social awkwardness involved in a fumble or a rejection is higher. I’ve flirted with a few women before, but it only lasts more than a few seconds if the woman is flirting back, and I have always done it 1:1 rather than with a group of onlookers. And whenever I’ve noticed someone who might be flirting with me, it has likewise been in a 1:1 situation, at least initially. So it doesn’t surprise me that you haven’t noticed others doing this. Anything done in front of a group has to be so unclear to onlookers that most people would miss it, something like an inside joke or reference to a past conversation.
What is this context in which you are hanging out 1:1 with a woman and it’s not already explicitly a date? (I mean, that of course does happen sometimes, but at least for me it’s not particularly common, so I’m wondering what the contexts were when this actually happened to you.)
The classic is at a party where conversations of different sizes are regularly starting and stopping.
Um… well, first off, flirting doesn’t have to happen when you’re hanging out. It can start with something as simple as a compliment to a stranger. Start from the premise that people like to hear positive messages about themselves without any strings attached, and hand them out like candy (but recognizing that taking candy from strangers is something some people would prefer not to do for obvious reasons, so accept whatever response you get to what is offered) - some people will respond back, others won’t, but no harm will be done. I am an introvert so I don’t do this often, but striking up conversations with new people at random is a thing I can force myself to do, and it rarely goes as poorly as one might fear.
But also, my friend-group is mixed, more women than men, and typically it’s people I’ve met one at a time over the years, less of a “friend group” than “a number of people who are my friends” - so I have lots of 1:1 time with female friends. In terms of flirting with those friends, well, they’re friends, so that almost never happens—but almost never is not never. Three times that I can recall off the top of my head, it turned out that one of my friends was attracted to me, and I learned that either because she explicitly said so (in one case, we were teenagers and both clueless about how to flirt, her idea was to follow me around everywhere, and from my perspective I just didn’t know that was a thing that I should notice) or because of some flirting (two cases). When I was younger and much, much more awkward, there were innumerable instances where I was attracted to a female friend and didn’t say anything because, from young-me’s perspective, of course not, that’s insane, and I’m lucky this amazing person even wants to be my friend and allow me to continue to be in her presence. There was once when I did say something to a good friend and it wasn’t reciprocated, we’re still close friends, but that wasn’t flirting so much as “we’ve just had lunch because you suggested it, and I’m feeling some attraction—you? Nope? Ok then, I still think you’re awesome and we should be friends”. There have also been a couple instances where I’ve met someone at an activity or through other friends or at work, hinted at an attraction, she’d hinted back, we’d done something low-stakes like going for a walk together or having a coffee, but it wasn’t an official date or anything, and there was some attempted flirting with mixed success in that context.
What I’m picturing if I was back on the dating market (I’m with a good partner currently, hopefully in perpetuity) is, if I met a woman outside of a dating app who I’d like to date or add to my list of woman friends, depending on how she feels (I tend not to date people just for the hotness, they’ve got to be someone I could be friends with too), we’d probably do something low-stakes 1:1 that wasn’t officially a date or not a date, and depending on how that went, either become friends, go on dates, or part ways. And in the initial figuring out of how things were going to go, there would likely be some flirting. At least, I expect that’s how it’d go.
I’m not so deliberate/strategic about it, but yeah. Like, there’s another ‘algorithm’ that’s more intuitive, which is something like “When interacting with the person, it’s ~always an active part of your mental landscape that you’re into them, and this naturally affects your words and actions. Also, you don’t want to make them uncomfortable, so you suppress anything that you think they wouldn’t welcome”. This produces approximately the same policy, because you’ll naturally leak some bits about your interest in them, and you’ll naturally be monitoring their behaviour to estimate their interest in you, in order to inform your understanding of what they would welcome from you. As you gather more evidence that they’re interested, you’ll automatically become more free in allowing your interest to show, resulting in ~the same ‘escalation of signals of interest’.
I think the key thing about this is like “flirting is not fundamentally about causing someone to be attracted to you, it’s about gracefully navigating the realisation that you’re both attracted to each other”. This is somewhat confused by the fact that “ability to gracefully navigate social situations” is itself attractive, so flirting well can in itself make someone more attracted to you. But I claim that this isn’t fundamentally different from the person seeing you skillfully break up a fight or lead a team through a difficult situation, etc.
Notwithstanding, I think flirting is substantially (perhaps even fundamentally) about both (i) attraction, and (ii) seduction. Moreover, I think your model is too symmetric between the parties, both in terms of information-symmetry and desire-symmetry across time.
My model of flirting is roughly:
Alice attracts Bob → Bob tries attracting Alice → Alice reveals Bob attracts Alice → Bob tries seducing Alice → Alice reveals Bob seduces Alice → Initiation
I never did quite that thing successfully. I did have one time when I dropped progressively unsubtle hints on a guy, who remained stubbornly oblivious for a long time until he finally got the message and reciprocated.
I interpret the confusion around flirting as “life imitating art” — specifically, there is a cultural narrative about how flirting works that a lot of socially awkward people are trying to implement.
That means there are big discrepancies between how experts flirt and how most people flirt. It also means that most people have to learn how to read the flirtation signals of other low-flirtation-skill people.
The cultural narrative around flirting therefore doesn’t exactly match practice, even though it influences practice.
It doesn’t necessarily take that much flirting to build enough confidence to ask someone out. Are they alone at a party? Is your conversation with them going on longer than for most people? Is it fun? You’re all set.
Yes. But usually the escalation happens over weeks or months, over multiple conversations (at least in my relatively awkward nerd experience). So it’d be difficult to notice people doing this. Maybe twice I’ve been in situations where hints escalated within a day or two, but both were building from a non-zero level of suspected interest. But none of these would have been easy to notice from the outside, except maybe at a couple of moments.
There’s two parts here.
Are people using escalating hints to express romantic/sexual interest in general?
Does it follow the specific conversational patterns usually used?
1 is true in my experience, while 2 usually isn’t. I can think of two examples where I’ve flirted by escalating signals. In both cases it was more to do with escalating physical touch and proximity, though verbal tone also played a part. I would guess that the typical examples of 2 you normally see (like A complimenting B’s choice of shoes, then B using a mild verbal innuendo, then A making a comment about B’s figure) don’t happen as often, since not many people are good enough wordsmiths to do the escalation purely verbally.
Plus it’s not the Victorian era anymore and it’s acceptable to escalate by slowly leaning forward as the conversation progresses, almost-accidentally brushing someone’s hand, etc.
One of the first things that (shy?) people use to gauge each other’s interest, before or instead of talking about anything explicit, is eye contact. So I think that wearing your glasses puts you at a disadvantage unless you take them off when you are flirting. I’m not sure why you’re wearing them, but taking them off in itself could be a flirty move. I am not particularly good at flirting. But I remember a girl in 9th grade who I flirted with for like half an hour at an event, purely via eye contact. We didn’t exchange more than ~3 sentences in person (there were no innuendos). Then she called me later that same day and explicitly asked if I wanted to be her boyfriend.
I’m pretty sure I wouldn’t escalate those signs above a rather low threshold given any observers, and my intuition tells me other people would be similar in this regard. So not observing flirting could just imply people don’t flirt if you’re in the conversation with them. As an extreme example, I’ve never seen anyone having sex, but it seems as if people do that all the time.
In my model, flirting is about showing that you are paying attention. You say things that you could only pick up if you pay close attention to me and what I say. It’s like a cryptographic proof certificate, showing that you think that I am important enough to pay attention to continuously. Usually this is coupled with an optimization process of using that knowledge to make me feel good, e.g. giving a compliment that actually tracks reality in a way I care about.
It’s more general than just showing sexual interest I think.
I’ve seen it happen, and have done it myself with decent success.
As @Buck notes below, dating apps, which are now a majority share of how people begin or seek to begin relationships, are far more targeted. There’s little plausible deniability involved, both of you are talking on Tinder.
Not that there isn’t some, of course. There are mind games afoot where people claim to be interested only in long-term relationships, but if you’re attractive enough, they might easily accept something shorter with no strings attached. Conversely, there are people who state they’re looking for a quick romp, but are hiding the degree of yearning they contain for something more serious.
It’s hard to break it down into a play-by-play, but in my experience, flirting starts out with friendly interactions, obvious or not so obvious signs that you’re single, gauging the reception of jokes or compliments, and then grows from there. The more you gradually establish compatibility and interest, the easier it gets to stop beating around the bush.
Epistemic status: rumor.
Word through the grapevine, for those who haven’t heard: apparently a few months back OpenPhil pulled funding for all AI safety lobbying orgs with any political right-wing ties. They didn’t just stop funding explicitly right-wing orgs, they stopped funding explicitly bipartisan orgs.
My best guess is that this is false. As a quick sanity-check, here are some bipartisan and right-leaning organizations historically funded by OP:
FAI leans right. https://www.openphilanthropy.org/grants/foundation-for-american-innovation-ai-safety-policy-advocacy/
Horizon is bipartisan https://www.openphilanthropy.org/grants/open-philanthropy-technology-policy-fellowship-2022/ .
CSET is bipartisan https://www.openphilanthropy.org/grants/georgetown-university-center-for-security-and-emerging-technology/ .
IAPS is bipartisan. https://www.openphilanthropy.org/grants/page/2/?focus-area=potential-risks-advanced-ai&view-list=false, https://www.openphilanthropy.org/grants/institute-for-ai-policy-strategy-general-support/
RAND is bipartisan. https://www.openphilanthropy.org/grants/rand-corporation-emerging-technology-fellowships-and-research-2024/.
Safe AI Forum. https://www.openphilanthropy.org/grants/safe-ai-forum-operating-expenses/
AI Safety Communications Centre. https://www.openphilanthropy.org/grants/effective-ventures-foundation-ai-safety-communications-centre/ seems to lean left.
Of those, I think FAI is the only one at risk of OP being unable to fund them, based on my guess of where things are leaning. I would be quite surprised if they defunded the other ones on bipartisan grounds.
Possibly you meant to say something more narrow like “even if you are trying to be bipartisan, if you lean right, then OP is substantially less likely to fund you” which I do think is likely true, though my guess is you meant the stronger statement, which I think is false.
Also worth noting Dustin Moskowitz was a prominent enough donor this election cycle, for Harris, to get highlighted in news coverage of her donors: https://www.washingtonexaminer.com/news/campaigns/presidential/3179215/kamala-harris-influential-megadonors/ https://www.nytimes.com/2024/10/09/us/politics/harris-billion-dollar-fundraising.html
Curious whether this is a different source than me. My current best model was described in this comment, which is a bit different (and indeed, my sense was that if you are bipartisan, you might be fine, or might not, depending on whether you seem more connected to the political right, and whether people might associate you with the right):
If it is true that OP has withdrawn funding from explicitly bipartisan orgs, even if not commonly associated with the right, then that would be an additional update for me, so am curious whether this is mostly downstream of my interpretations or whether you have additional sources.
I am posting this now mostly because I’ve heard it from multiple sources. I don’t know to what extent those sources are themselves correlated (i.e. whether or not the rumor started from one person).
A related comment from lukeprog (who works at OP) was posted on the EA Forum. It includes:
I think the comment more confirms than disconfirms John’s comment (though I still think it’s too broad for other reasons). OP “funding” something historically has basically always meant recommending a grant to GV. Luke’s language to me suggests that indeed the right of center grants are no longer referred to GV (based on a vague vibe of how he refers to funders in plural).
OP has always made some grant recommendations to other funders (historically OP would probably describe those grants as “rejected but referred to an external funder”). As Luke says, those are usually ignored, and OP’s counterfactual effect on those grants is much less, and IMO it would be inaccurate to describe those recommendations as “OP funding something”. As I said in the comment I quote in the thread, most OP staff would like to fund things right of center, but GV does not seem to want to, as such the only choice OP has is to refer them to other funders (which sometimes works, but mostly doesn’t).
As another piece of evidence, when OP defunded all the orgs that GV didn’t want to fund anymore, the communication emails that OP sent said that “Open Philanthropy is exiting funding area X” or “exiting organization X”. By the same use of language, yes, it seems like OP has exited funding right-of-center policy work.
(I think it would make sense to taboo “OP funding X” in future conversations to avoid confusion, but also, I think historically it was very meaningfully the case that getting funded by GV is much better described as “getting funded by OP” given that you would never talk to anyone at GV and the opinions of anyone at GV would basically have no influence on you getting funded. Things are different now, and in a meaningful sense OP isn’t funding anyone anymore, they are just recommending grants to others, and it matters more what those others think than what OP staff thinks.)
Is this development unexpected enough to be worth remarking upon? This is just Conquest’s Second Law.
Takeaways From “The Idea Factory: Bell Labs And The Great Age Of American Innovation”
Main takeaway: to the extent that Bell Labs did basic research, it actually wasn’t all that far ahead of others. Their major breakthroughs would almost certainly have happened not-much-later, even in a world without Bell Labs.
There were really two transistor inventions, back to back: Bardeen and Brattain’s point-contact transistor, and then Shockley’s transistor. Throughout, the group was worried about some outside group beating them to the punch (i.e. the patent). There were semiconductor research labs at universities (e.g. at Purdue; see pg 97), and the prospect of one of these labs figuring out a similar device was close enough that the inventors were concerned about being scooped.
Most inventions which were central to Bell Labs actually started elsewhere. The travelling-wave tube started in an academic lab. The idea for fiber optic cable went way back, but it got its big kick at Corning. The maser and laser both started in universities. The ideas were only later picked up by Bell.
In other cases, the ideas were “easy enough to find” that they popped up more than once, independently, and were mostly-ignored long before deployment—communication satellites and cell communications, for instance.
The only fundamental breakthrough which does not seem like it would have soon appeared in a counterfactual world was Shannon’s information theory.
So where was Bell’s big achievement? Mostly in development, and the research division was actually an important component of that. Without in-house researchers chewing on the same problems as the academic labs, keeping up-to-date with all the latest findings and running into the same barriers themselves, the development handoff would have been much harder. Many of Bell Labs’ key people were quite explicitly there to be consulted—i.e. “ask the guy who wrote the book”. I think it makes most sense to view most of the Labs’ research that way. It was only slightly ahead of the rest of the world at best (Shannon excepted), and often behind, but having those researchers around probably made it a lot easier to get new inventions into production.
Major reason this matters: a lot of people say that Bell was able to make big investments in fundamental research because they had unusually-long time horizons, protected by a monopoly and a cozy government arrangement (essentially a Schumpeterian view). This is contrasted to today’s silicon valley, where horizons are usually short. But if Bell’s researchers generally weren’t significantly ahead of others, and mostly just helped get things to market faster, then this doesn’t seem to matter as much. The important question is not whether something silicon-valley-like induces more/less fundamental research in industrial labs, but whether academics heeding the siren call of startup profits can get innovations to market as quickly as Bell Labs’ in-house team could. And by that metric, silicon valley looks pretty good: Bell Labs could get some impressive things through the pipe very quickly when rushed, but they usually had no reason to hurry, and they acted accordingly.
I loved this book. The most surprising thing to me was the answer that people who were there in the heyday give when asked what made Bell Labs so successful: They always say it was the problem, i.e. having an entire organization oriented towards the goal of “make communication reliable and practical between any two places on earth”. When Shannon left the Labs for MIT, people who were there immediately predicted he wouldn’t do anything of the same significance because he’d lose that “compass”. Shannon was obviously a genius, and he did much more after than most people ever accomplish, but still nothing as significant as what he did when at the Labs.
So I read SB1047.
My main takeaway: the bill is mostly a recipe for regulatory capture, and that’s basically unavoidable using anything even remotely similar to the structure of this bill. (To be clear, regulatory capture is not necessarily a bad thing on net in this case.)
During the first few years after the bill goes into effect, companies affected are supposed to write and then implement a plan to address various risks. What happens if the company just writes and implements a plan which sounds vaguely good but will not, in fact, address the various risks? Probably nothing. Or, worse, those symbolic-gesture plans will become the new standard going forward.
In order to avoid this problem, someone at some point would need to (a) have the technical knowledge to evaluate how well the plans actually address the various risks, and (b) have the incentive to actually do so.
Which brings us to the real underlying problem here: there is basically no legible category of person who has the requisite technical knowledge and also the financial/status incentive to evaluate those plans for real.
(The same problem also applies to the board of the new regulatory body, once past the first few years.)
Having noticed that problem as a major bottleneck to useful legislation, I’m now a lot more interested in legal approaches to AI X-risk which focus on catastrophe insurance. That would create a group—the insurers—who are strongly incentivized to acquire the requisite technical skills and then make plans/requirements which actually address some risks.
The only enforcement mechanism that the bill has is that the Attorney General (AG) of California can bring a civil claim. And, the penalties are quite limited except for damages. So, in practice, this bill mostly establishes liability enforced by the AG.
So, the way I think this will go is:
The AI lab implements a plan and must provide this plan to the AG.
If an incident occurs which causes massive damages (probably ballpark of $500 million in damages given language elsewhere in the bill), then the AG might decide to sue.
A civil court will decide whether the AI lab had a reasonable plan.
I don’t see why you think “the bill is mostly a recipe for regulatory capture” given that no regulatory body will be established and it de facto does something very similar to the proposal you were suggesting (impose liability for catastrophes). (It doesn’t require insurance, but I don’t really see why self insuring is notably different.)
(Maybe you just mean that if a given safety case doesn’t result in that AI lab being sued by the AG, then there will be a precedent established that this plan is acceptable? I don’t think not being sued really establishes precedent. This doesn’t really seem to be how it works with liability and similar types of requirements in other industries from my understanding. Or maybe you mean that the AI lab will win cases despite having bad safety plans and this will make a precedent?)
(To be clear, I’m worried that the bill might be unnecessarily burdensome because it no longer has a limited duty exemption and thus the law doesn’t make it clear that weak performance on capability evals can be sufficient to establish a good case for safety. I also think the quantity of damages considered a “Critical harm” is too low and should maybe be 10x higher.)
Here is the relevant section of the bill discussing enforcement:
(1) is decently small, (2) is only indirectly expensive, (3) is where the real penalty comes in (note that this is damages), (4) is small, (5) is probably unimportant (but WTF is (5) supposed to be for?!?).
Good argument, I find this at least somewhat convincing. Though it depends on whether penalty (1), the one capped at 10%/30% of training compute cost, would be applied more than once on the same model if the violation isn’t remedied.
I’m pessimistic enough about the AI situation that even if all the bill does is slow down the AGI project a little (by wasting the time of managers and contributors) I’m tentatively for it.
For the reasonable price of $300 per month, I insure anybody against the destruction of the known world. Should the world be destroyed by AGI, I’ll give you your money back 10^100-fold.
That said, if there were insurers, they would probably be more likely than average to look into AI X-risk. Some might then be convinced that it is important and that they should do something about it.
I don’t understand this. Isn’t the strongest incentive already present (because extinction would affect them)? Or maybe you mean smaller scale ‘catastrophes’?
I think people mostly don’t believe in extinction risk, so the incentive isn’t nearly as real/immediate.
+1, and even for those who do buy extinction risk to some degree, financial/status incentives usually have more day-to-day influence on behavior.
I’m imagining this:
Case one: would-be-catastrophe-insurers don’t believe in x-risks, don’t care to investigate. (At stake: their lives)
Case two: catastrophe-insurers don’t believe in x-risks, and either don’t care to investigate, or do for some reason I’m not seeing. (At stake: their lives and insurance profits (correlated)).
They can believe in catastrophic but non-existential risks. (Like, AI causing something like the CrowdStrike outage periodically, if you’re not trying to prevent that.)
I’ve just started reading the singular learning theory “green book”, a.k.a. Mathematical Theory of Bayesian Statistics by Watanabe. The experience has helped me to articulate the difference between two kinds of textbooks (and viewpoints more generally) on Bayesian statistics. I’ll call one of them “second-language Bayesian”, and the other “native Bayesian”.
Second-language Bayesian texts start from the standard frame of mid-twentieth-century frequentist statistics (which I’ll call “classical” statistics). They view Bayesian inference as a tool/technique for answering basically-similar questions and solving basically-similar problems to classical statistics. In particular, they typically assume that there’s some “true distribution” from which the data is sampled independently and identically. The core question is then “Does our inference technique converge to the true distribution as the number of data points grows?” (or variations thereon, like e.g. “Does the estimated mean converge to the true mean?”, asymptotics, etc.). The implicit underlying assumption is that convergence to the true distribution as the number of (IID) data points grows is the main criterion by which inference methods are judged; that’s the main reason to choose one method over another in the first place.
Watanabe’s book is pretty explicitly second-language Bayesian. I also remember Gelman & co’s Bayesian Data Analysis textbook being second-language Bayesian, although it’s been a while so I could be misremembering. In general, as the name suggests, second-language Bayesianism seems to be the default among people who started with a more traditional background in statistics or learning theory, then picked up Bayesianism later on.
In contrast, native Bayesian texts justify Bayesian inference via Cox’s theorem, Dutch book theorems, or one among the long tail of similar theorems. “Does our inference technique converge to the ‘true distribution’ as the number of data points grows?” is not the main success criterion in the first place (in fact a native Bayesian would raise an eyebrow at the entire concept of a “true distribution”), so mostly the question of convergence just doesn’t come up. Insofar as it does come up, it’s an interesting but not particularly central question, mostly relevant to numerical approximation methods. Instead, native Bayesian work ends up focused mostly on (1) what priors accurately represent various realistic kinds of prior knowledge, and (2) what methods allow efficient calculation/approximation of the Bayesian update.
Jaynes’ writing is a good example of native Bayesianism. The native view seems to be more common among people with a background in economics or AI, where they’re more likely to absorb the Bayesian view from the start rather than adopt it later in life.
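To make the “native” emphasis concrete, here’s a minimal toy sketch of my own (not an example drawn from Watanabe, Gelman, or Jaynes): the work is in choosing a prior that honestly encodes prior knowledge and in computing the update cheaply, and no “true distribution” appears anywhere.

```python
# Toy Beta-Binomial update (illustrative sketch only).
# Prior: the coin is probably close to fair -- Beta(10, 10) acts like
# ~20 pseudo-observations centered on 0.5.
a, b = 10, 10

# Data: 17 heads, 3 tails.
heads, tails = 17, 3

# Bayesian update: with a conjugate prior this is exact and cheap,
# just adding the observed counts to the prior pseudo-counts.
a_post, b_post = a + heads, b + tails

posterior_mean = a_post / (a_post + b_post)
print(f"posterior mean of heads-probability: {posterior_mean:.3f}")  # 0.675
```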
Is there any “native” textbook that is pragmatic and explains how to use Bayesian methods in practice (perhaps in some narrow domain)?
I don’t know of a good one, but never looked very hard.
Just got my whole genome sequenced. A thing which I could have figured out in advance but only realized once the results came back: if getting a whole genome sequence, it’s high value to also get your parents’ genomes sequenced.
Here’s why.
Suppose I have two unusual variants at two different positions (not very close together) within the same gene. So, there’s a variant at location A, and a variant at location B. But (typically) I have two copies of each gene, one from each parent. So, I might have the A and B variants both on the same copy, and the other copy could be normal. OR, I could have the A variant on one copy and the B variant on the other copy. And because modern sequencing usually works by breaking DNA into little chunks, sequencing the chunks, and then computationally stitching it together… those two possibilities can’t be distinguished IIUC.
The difference is hugely important if e.g. both the A variant and the B variant severely fuck up the gene. If both are on the same copy, I’d have one normal working variant and one fucked up. If they’re on different copies, then I’d have zero normal working variants, which will typically have much more extreme physiological results.
The easiest way to distinguish those two possibilities, IIUC, is to get the parents’ genomes. In one case, I’d see the A and B variant in the same parent, and the other parent would have a normal gene. In the other case, I’d see the A variant in one parent and the B variant in the other.
In principle there are other ways to distinguish the two possibilities (like long-read sequencing), but getting the parents’ sequence is probably the cheapest/easiest.
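To make that concrete, here’s a minimal sketch (hypothetical genotype calls, not a real bioinformatics pipeline) of the trio comparison: genotypes are alternate-allele counts (0, 1, or 2) at the two sites, and the parents’ genotypes often pin down whether the child’s two variants sit on the same copy (“cis”) or on different copies (“trans”).

```python
# Minimal trio-phasing sketch for two heterozygous variants in the child.
# All names and genotypes here are hypothetical, purely for illustration.

def phase_two_variants(child, mom, dad):
    """Return 'cis' (both alt alleles on one copy), 'trans' (one per copy),
    or 'ambiguous' when the trio genotypes don't settle it."""
    assert child == {"A": 1, "B": 1}, "only handles a child heterozygous at both sites"

    for p1, p2 in [(mom, dad), (dad, mom)]:
        # If one parent carries neither variant, both of the child's alt alleles
        # came from the other parent, i.e. they sit on the same inherited copy.
        if p1["A"] == 0 and p1["B"] == 0 and p2["A"] >= 1 and p2["B"] >= 1:
            return "cis"
        # If each variant is carried by only one parent (a different one each),
        # the child got one alt allele from each parent: different copies.
        if p1["A"] >= 1 and p1["B"] == 0 and p2["A"] == 0 and p2["B"] >= 1:
            return "trans"
    return "ambiguous"

# Mom carries neither variant, dad carries both -> both on the paternal copy.
print(phase_two_variants({"A": 1, "B": 1}, {"A": 0, "B": 0}, {"A": 1, "B": 1}))  # cis
# Mom carries only A, dad carries only B -> one variant per copy.
print(phase_two_variants({"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 0, "B": 1}))  # trans
```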
Yeah, if anyone is interested in learning more, this is called the phasing problem. For common enough variants, it’s often possible to figure this out by looking at general patterns of co-inheritance if you have a large reference dataset for the population (see: https://www.nature.com/articles/s41588-023-01415-w). Long read sequencing which you mentioned is another approach. But you’re right that these days it would just be cheapest to get the parental genomes (assuming that’s an option).
Question I’d like to hear peoples’ takes on: what are some things which are about the same amount of fun for you as (a) a median casual conversation (e.g. at a party), or (b) a top-10% casual conversation, or (c) the most fun conversations you’ve ever had? In all cases I’m asking about how fun the conversation itself was, not about value which was downstream of the conversation (like e.g. a conversation with someone who later funded your work).
For instance, for me, a median conversation is about as fun as watching a mediocre video on youtube or reading a mediocre blogpost. A top-10% conversation is about as fun as watching a generic-but-fun movie, like e.g. a Jason Statham action flick. In both cases, the conversation drains more energy than the equal-fun alternative. I have probably had at most a single-digit number of conversations in my entire life which were as fun-in-their-own-right as e.g. a median night out dancing, or a median escape room, or median sex, or a median cabaret show. Maybe zero, unsure.
The rest of this is context on why I’m asking which you don’t necessarily need to read in order to answer the question...
So I recently had a shortform asking “hey, that thing where people send mutually escalating signals of sexual intent during a conversation, is that a thing which typical people actually do?” and a horde of people descended to say “YES, obviously, how the fuck have you not noticed that???”. So naturally I now wonder exactly how this apparently-obvious-to-everyone-else thing has remained approximately invisible to me, and what else I might be missing nearby. What exactly is the shape of my blindspot here?
And a leading hypothesis for the shape of the blindspot is that I generally find casual conversation way more boring than most people, and therefore have not noticed some things which happen during casual conversation.
Some supporting evidence for this:
Back in fourth grade a school psychologist observed me for reasons, and in her report said that I would sit alone at lunch with a book, and if anyone came over to chat I would put the book down and talk to them and generally seemed friendly in normal ways, and then once they left I would pick the book back up. I certainly recall finding the books more interesting than conversation with my classmates.
Notably, plenty of people have said that I am pretty good at casual conversation, at least when I’m bothering. (The people who know me best eventually realize that I have a mental switch for this, and can intentionally toggle it.) I can make it a relatively fun conversation. But, like, I personally still find it kind of mid as entertainment goes.
When I think of conversations which stand out as really great for me, they’re cases where either I learned some technical thing I didn’t previously know, or they lead into more fun things later (and most of the fun was from the later things). I can drive the sort of playful conversations which IIUC lots of people like, but… they don’t stand out as especially fun in my recollection. Fun relative to other conversation, sure, but conversation just isn’t a particularly fun medium.
So anyway, I’m trying to get a bead on whether this hypothesis is correct, or whether I have a differently-shaped blindspot, or whether I’m missing something else entirely. Thank you all in advance for your data!
I find conversations more meaningful than many comparably-fun activities. What provides the meaning is my intuition about the opportunities the conversation can lead to and the update in how I’m perceived by my counterpart. As a secondary effect, conversations exercise and test my ability to think on my feet.
Flirtation can lead to sex, a coffee break chat with a collaborator can lead to a new project, a talk with anyone can lead to closer friendship. Flirtation suggests I’m more desirable than I thought, talk about projects that I’m regarded as more capable, talk with acquaintances that I’m charismatic.
These social updates and the mental exercise conversation provides are why I seek out conversation compared to many other more-fun activities. Also, I have to recognize that I probably value conversation for its own sake above and beyond these instrumental purposes. It just feels like it ought to be part of a good life aesthetic, like eating fresh fruits and vegetables.
As said by @Mateusz Bagiński, normal smalltalk is +epsilon, but some more comparisons:
a short smile with a stranger or acquaintance is like eating a very tasty fruit.
90th-percentile conversations are all with good friends and leave me high for a few hours. As good as a very good date. No non-social activities come close.
I don’t actually remember any particular best ones, but the best ones I can recall aren’t really about conversation anymore but about presence, which isn’t quite conversation, I think. They feel extremely nourishing and meaningful and my only comparison is a really, really good IFS or therapy session.
A top [1-5?]% conversation is as good in the moment as an early playthrough of my favorite video games, and feels better afterward. That’s probably top 10% of conversations at parties, which have higher selection pressure than Uber drivers.
I’ve been working on getting more out of lower percentile conversations. The explanation is fairly woo-ey but might also relate to your interest around flirting.
Median conversation is about as good as a TV show I will watch for two episodes and give up on.
Tangent: my standards for media have gone way up over the last ~5 years, I abandon a lot more out of boredom, especially books. I worried this was some sort of generalized anhedonia, but every once in a while read or reread something great and enjoy it immensely, so I think it’s just raised standards.
I’d be interested to hear that.
This mostly comes up with talkative Uber drivers. The superficial thing I do is I ask myself “what vibes is this person offering?” And then do some kind of centering move. Sometimes it feels unexpectedly good and I do an accepting mood and feel nourished by the conversation. Sometimes it will feel bad and I’ll be more aggressive in shutting conversations down. I’m often surprised by the vibe answer, it feels different than what my conscious brain would answer.
The obvious question is what am I doing with the inquiry and accepting moves. I don’t know how to explain that.
Overall, a growth edge I’m exploring right now is “forms of goodness other than interesting.” And I think that’s probably a weak area for you too, although maybe an endorsed one.
Median party conversation is probably about as good as playing a video game I enjoy, or reading a good blog post. Value maybe £2/hr. More tiring than the equivalent activity.
Top 10% party conversation is somewhere around going for a hike somewhere very beautiful near to where I live, or watching an excellent film. Value maybe £5/hr. These are about as tiring as the equivalent activity.
Best conversations I’ve ever had were on par with an equal amount of time spent on a 1/year quality holiday, like to Europe (I live in the UK) but not to, say, Madagascar. Most of these conversations went on for >1 hr. Value maybe £25/hr. Less tiring and if anything energizing.
(For monetary values I’m imagining what I’d pay to go to a party for 4 hours where that event would occur. My overall income minus expenses is probably a bit below average for the UK, so take that into account.)
I generally agree with you that normal conversations are boring and should be avoided. There are two main strats I employ:
Don’t let go of relationships where you can relax: my sample size is highly skewed towards retaining long-term relationships where you’re comfortable enough with people that you can just chill and relax, so my median conversation is like that?
You create a shared space, and the norms come from that shared space, so to shape conversations you can say some deliberately out-of-pocket stuff (randomly jump into a Yoda accent, for example) in order to change the vibe and thereby remove part of the cognitive load?
If the person is like “ugghh, wtf?” in vibe you just move on to the next conversation ¯\_(ツ)_/¯
I think the median conversation for me is zero or positive-but-very-small epsilon fun, whereas the 90th percentile is maybe as fun as discovering a new song/band/album that I like a lot or listening to one of my favorite songs after several weeks of not listening to it. The most fun conversations I’ve ever had are probably the most fun experiences I’ve ever had.
I don’t find conversations-in-general draining, although I can get exhausted by social activities where I’m supposed to play some role that is out of character for me, like in LARPing (though that might be a learnable-skill issue) or extended-family reunions.
Can you give an example of what a “most fun” conversation looked like? What’s the context, how did it start, how did the bulk of it go, how did you feel internally throughout, and what can you articulate about what made it so great?
At a recent EAG afterparty, bored @Algon suggested that he explain something to me, and I explain something to him in return. He explained to me this thing. When it was my turn, I thought that maybe I should do the thing that had been on my mind for several months: give a technical explanation of monads starting with the very basics of category theory, and see how long it takes. It turned out that he knew the most basic basics of category theory, so it was a bit more of an easy mode, but it still took something like 50 minutes, out of which maybe half was spent on natural transformations. A few minutes in, @niplav joined us. I enjoyed drawing diagrams and explaining and discussing a technical topic that I love to think about, in the absurd setting of people playing beerpong one meter from the whiteboard, with passers-by asking “Are you guys OK?” or “WTF are you doing?” (“He’s explaining The Meme!”). It was great to witness them having intuition breakthroughs, where you start seeing something that is clear and obvious in hindsight but not in foresight (similar to bistable figures). Throughout, I also noticed some deficiencies in my understanding (e.g., I noticed that I didn’t have a handy collection of examples with which to illustrate some concepts). I felt very satisfied afterwards.
https://x.com/norvid_studies/status/1931841744754323941
Can confirm that I was bored (no room for a sword-fight!), knew very little category theory, and learned about monads. But at least now I know that while a monad is not like a burrito, a burrito is like a monad.
Rant: Man, I don’t like how unwieldy the categorical definition of a monoid is! So very many functors, transformations, diagrams etc. And they’re not even particularly pleasing diagrams. The type-theoretic definition of a monad, as covered in this lovely n-lab article, felt less awkward to me. But admittedly, learning the categorical definition did help with learning the type-theoretic definition.
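For readers following along, the two definitions being compared are roughly the following (standard material, but stated from memory here, so trust the nLab article over this summary). The categorical version packages a monad on a category $\mathcal{C}$ as an endofunctor plus two natural transformations satisfying coherence laws:

$$T : \mathcal{C} \to \mathcal{C}, \qquad \eta : \mathrm{id}_{\mathcal{C}} \Rightarrow T, \qquad \mu : T \circ T \Rightarrow T,$$
$$\mu \circ T\mu = \mu \circ \mu T, \qquad \mu \circ T\eta = \mu \circ \eta T = \mathrm{id}_T.$$

The type-theoretic version is just two operations, $\mathrm{return} : A \to T\,A$ and $\mathrm{bind} : T\,A \to (A \to T\,B) \to T\,B$, subject to left-identity, right-identity, and associativity laws.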
That was very useful for me, thank you!
Follow-up question: can you give an example of a plausibly-most-fun non-conversation experience you’ve had?
[REDACTED but you can DM if you want to know]
Over the last year, my median conversation was about as entertaining as yours. The top 10% of conversations are fun-in-their-own-right in the moment because my brain anticipates some form of long-term value (with the exception of cracking jokes); I don’t know if all of those conversations would count as “casual”. They’re about as intellectually stimulating as the Taskmaster TV show is funny, though conversation is more heavy-tailed than movies. Long-term value includes: learning or teaching (learning some new technical thing that’s usually not written down anywhere (podcasts tend to be better for that), getting a pointer to something worth learning about, teaching something technical in the anticipation that the other person will actually do something with that knowledge, incorporating the generating function behind someone’s virtues/wisdom), thinking out loud with someone else in the expectation that this might lead to an interesting idea, gossip, and life stories (sometimes protecting you from people/situations that can’t be trusted, sometimes just illuminating parts of life you’d otherwise know less about). My most fun conversation had me still grinning 30 minutes afterward, and my heart rate was still 10–20 beats per minute higher than usual.
My median conversations at parties over my entire life are probably less entertaining than your median ones. My bar for an interesting conversation also rose when I stumbled across the wider rationalist sphere. I remember two conversations from before that era where the main important information was essentially just “there are other smart people out there, and you can have interesting conversations with them where you can ask the questions you have, etc.”. One was at a networking event for startup founders, and the other was with a Computer Science PhD student showing me his work and the university campus (the same conversation that got my heart rate up).
I’m returning to this thread to check a new hypothesis.
For those who said top ~10% of conversations are high value: what’s the felt experience during those conversations?
In particular (this is a question about a specific hypothesis, please read it only after considering the first question in order to avoid anchoring):
is there a sort of warm fuzzy feeling in your chest directed at the other participants, and does the bulk of the value derive from that feeling?
Tagging people who had useful answers previously and whose answers to this question I’d like to hear: @Selfmaker662 @Elizabeth @J Bostock @Mateusz Bagiński
Part 1
In my mind, there’s a difference between “conversation was valuable” and “conversation was fun”. They often go together, but not necessarily so.
Valuable: The best thing I can come up with is something like: my understanding has grown thanks to this conversation, or I have seen a bigger picture (not necessarily being able to legibilize/verbalize this new piece of my understanding). I feel like my mind is drawn to the inquiry, especially when it’s challenging, but I’m having some minimum of traction to keep me going and retain mostly positive valence.
Fun: Some sort of intellectual/cognitive camaraderie (“meeting of minds”) is often a big part of the fun. It doesn’t even need to be super highfalutin blue-sky conversation; I can bond with someone by helping them fix a pernicious bug in their code. Something something, we are acting a bit more like one superagent that is trying to do something through conversation, or to spread one part’s understanding to the other parts?
Part 2
I mostly don’t feel emotions in my body that much, at least much less so than other people, and when I do, it’s usually either clearly negative emotions (strong stress, panic) or “raw”/ambiguous excitement/arousal. (If it feels like part 1 doesn’t quite answer your question, that’s why (though it might also be some sort of skill issue on my side, lol).) So, no, no warm fuzzy feelings in my chest.
Spoilered to avoid anchoring:
There’s two main categories, but they both have in common a kind of “flow state” where attention and awareness are focused onto the other person. The two categories are:
Flirting, where the back and forth comes from signalling sexual/romantic interest
Productive intellectual discussion with an equal, where the back and forth comes from sharing evidence and updating
For me, the qualia of conversations is usually not a pronounced “warm feeling in the chest” (it is noticeably different from what I call “Deep/Meaningful Limerence”, which I think you’re pointing at).
Three distinct flavors of good conversation:
alive, creative, magnetic, vibrant conversation (I think I might describe part of this as a slightly warm chest, I don’t quite remember, I haven’t had it recently; but it’s more the qualia of traditional excitement than warm connection. I bet you have these conversations occasionally, or at least have had them, and they correlate more with obvious John values)
slightly nice sitting-around-living-room or restaurant/bar or campfire vibes (shallow)
somewhat-more-nice sitting around living-room/campfire vibes where the conversation is sort of “deep”, in a way that multiple people are talking about something either emotionally confusing, or psychologically fraught, or “meaning-making”-ish.
I expect #3 (less confidently than #1) to be somewhat obviously valuable to you in some circumstances regardless of qualia. But, it does have some particular qualia that’s like (hrm, probably can’t remember actual biological phenomenology right now), but, like, spacious, relaxed, I think there’s maybe some kind of feeling in my chest but I don’t have a good word for it.
#2… I think might have a very mild version of “warm feeling in chest”. Or, I think it does feel warm but I think it’s more distributed throughout my body.
But I think #2 more importantly for me is like: “there is an actively (slightly) bad qualia to not-having-had-nice-livingroom-conversations lately” which is, like, feeling sort of blah, or just somewhat less vibrant. If I have something to be socially anxious about, lack of recent #2 makes it worse.
It’s different: sometimes it’s the spacious calmness of being able to sit in silence together; sometimes warm feelings of seeing and being seen, when discussing something private with a good friend; or just listening to a really good story. IIRC I also counted dates as conversations back then; they have a different dynamic, where a lot of the pleasure is the feeling of a young, beautiful woman being with me.
— this is a very particular feeling you have, and such feelings differ a lot between people in where they appear, how they feel, and what they’re about. Not having seen other people’s answers, I’d bet your hypothesis is wrong.
Did you ever try Circling? I wonder some if there’s a conversational context that’s very “get to the interesting stuff” which would work better for you. (Or, even if it’s boring, it might be because it’s foregrounding relational aspects of the conversation which are much less central for you than they are for most people.)
I have a few times, found it quite interesting, and would happily do it again. It feels like the sort of thing which is interesting mainly because I learned a lot, but marginal learnings would likely fall off quickly, and I don’t know how interesting it would be after doing it a few more times.
I wanted to say that for me it is the opposite, but reading the second half I have to say it’s the same.
I have definitely had the problem of sometimes talking to somebody for too long. E.g. multiple times I have talked to a person for 8-14 hours without a break about various technical things: compiler optimizations, CPU architectures, and that kind of stuff. It was really hard to stop.
Also, just solving problems in a conversation is very fun. The main reason I didn’t do this a lot is that there are not that many people I know, actually basically zero right now (if you exclude LLMs), with whom I can have the kinds of conversations I like to have.
It seems to be very dependent on the person.
So I am quite confused why you say “but conversation just isn’t a particularly fun medium”. If it’s anything like it is for me, then engaging with the right kind of people on the right kind of content is extremely fun. It seems like your model is confused: you say “conversations are not fun” when in fact, in the space of possible conversations, I expect there are many types of conversations that can be very fun, but you haven’t mapped this space, while implicitly assuming that your map is complete.
Probably there are also things besides technical conversations that you would find fun but that you simply don’t know about, such as hardcore flirting in a very particular way. E.g. I like to talk to Grok in voice mode, in romantic mode, and then do some analysis of some topic (or rather, that is what I just naturally do), and then Grok compliments my mind in ways that my mind likes, e.g. pointing out that I used a particular thinking pattern that is good, or that I thought about this difficult thing at all, and then I am like “Ah yes, that was actually good, and yes, it seems like this is a difficult topic most people would not think about.”
My life is less “fun” than it used to be because I’ve become more work-focussed. That being said, something I like is getting positive reception for ideas I’m otherwise guessing might receive negative reception. The first couple of times this happens is really nice, after that it becomes normal.
I’m confused about this anecdote. How else did the psychologist expect you (or any other kid) to behave? What else does one do when a conversation is over, other than “go back to doing what you were doing before / what you would be doing otherwise”…?
I presume the psychologist expected John to actively seek out similar conversations. From the psychologist’s perspective:
most kids would do that, but John didn’t.
most of the kids who wouldn’t do that would decline because of social anxiety/a lack of social skills/a hatred of social interactions etc, which is not the case for John; he seemed perfectly comfortable while partaking in such conversations.
Since John wasn’t in either category, it probably struck the psychologist as odd.
I see, thanks. That makes sense. (At least, the reasoning makes sense, given the psychologist’s beliefs as you describe them; I have no idea if those beliefs are true or not.)
Do group conversations count?
I would agree that the median one-on-one conversation for me is equivalent to something like a mediocre blogpost (though I think my right-tail is longer than yours, I’d say my favorite one-on-one conversations were about as fun as watching some of my favorite movies).
But, in groups, my median shifts toward 80th percentile YouTube video (or maybe the average curated post here on LessWrong).
It does feel like a wholly different activity, and might not be the answer you’re looking for. Group conversations, for example, are in a way inherently less draining: you’re not forced to either speak or actively listen for 100% of the time.
Yes.
Here’s a meme I’ve been paying attention to lately, which I think is both just-barely fit enough to spread right now and very high-value to spread.
Meme part 1: a major problem with RLHF is that it directly selects for failure modes which humans find difficult to recognize: hiding problems, deception, etc. This problem generalizes to any sort of direct optimization against human feedback (e.g. just fine-tuning on feedback), optimization against feedback from something emulating a human (a la Constitutional AI or RLAIF), etc.
Many people will then respond: “Ok, but how on earth is one supposed to get an AI to do what one wants without optimizing against human feedback? Seems like we just have to bite that bullet and figure out how to deal with it.” … which brings us to meme part 2.
Meme part 2: We already have multiple methods to get AI to do what we want without any direct optimization against human feedback. The first and simplest is to just prompt a generative model trained solely for predictive accuracy, but that has limited power in practice. More recently, we’ve seen a much more powerful method: activation steering. Figure out which internal activation-patterns encode for the thing we want (via some kind of interpretability method), then directly edit those patterns.
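To make the second method concrete, here is a rough sketch of what activation steering can look like in code. This is a minimal illustration under made-up assumptions (the layer path, the scale, and the way the steering vector is obtained are all placeholders), not anyone’s canonical implementation:

```python
# Minimal activation-steering sketch. The steering vector is assumed to come from some
# interpretability step, e.g. the difference of mean activations between prompts with
# and without the target concept; the layer index and scale below are arbitrary.
import torch

def make_steering_hook(steering_vector: torch.Tensor, scale: float = 5.0):
    """Return a forward hook that adds scale * steering_vector to a layer's output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * steering_vector  # shift the activation at every position
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Usage sketch, assuming a GPT-2-style HuggingFace model already loaded as `model`:
#   steering_vector = acts_with_concept.mean(0) - acts_without_concept.mean(0)  # shape (d_model,)
#   handle = model.transformer.h[20].register_forward_hook(make_steering_hook(steering_vector))
#   ... generate as usual, then handle.remove()
```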
I agree that there’s something nice about activation steering not optimizing the network relative to some other black-box feedback metric. (I, personally, feel less concerned by e.g. finetuning against some kind of feedback source; the bullet feels less jawbreaking to me, but maybe this isn’t a crux.)
(Medium confidence) FWIW, RLHF’d models (specifically, the LLAMA-2-chat series) seem substantially easier to activation-steer than do their base counterparts.
What other methods fall into part 2?
This seems basically correct, though it seems worth pointing out that even if we are able to do “Meme part 2” very, very well, I expect we will still die: if you optimize hard enough to predict text well, with the right kind of architecture, the system will develop something like general intelligence, simply because general intelligence is beneficial for predicting text correctly. E.g. being able to simulate the causal process that generated the text, i.e. the human, is a very complex task that would be useful if performed correctly.
This is an argument Eliezer brought forth in some recent interviews. Seems to me like another meme that would be beneficial to spread more.
Somebody should probably write a post explaining why RL from human feedback is actively harmful to avoiding AI doom. It’s one thing when OpenAI does it, but when Anthropic thinks it’s a good idea, clearly something has failed to be explained.
(I personally do not expect to get around to writing such a post soon, because I expect discussion around the post would take a fair bit of time and attention, and I am busy with other things for the next few weeks.)
I’d also be interested in someone doing this; I tend towards seeing it as good, but haven’t seen a compilation of arguments for and against.
Here’s an idea for a novel which I wish someone would write, but which I probably won’t get around to soon.
The setting is slightly-surreal post-apocalyptic. Society collapsed from extremely potent memes. The story is episodic, with the characters travelling to a new place each chapter. In each place, they interact with people whose minds or culture have been subverted in a different way.
This provides a framework for exploring many of the different models of social dysfunction or rationality failures which are scattered around the rationalist blogosphere. For instance, Scott’s piece on scissor statements could become a chapter in which the characters encounter a town at war over a scissor. More possible chapters (to illustrate the idea):
A town of people who insist that the sky is green, and avoid evidence to the contrary really hard, to the point of absolutely refusing to ever look up on a clear day (a refusal which they consider morally virtuous). Also they clearly know exactly which observations would show a blue sky, since they avoid exactly those (similar to the dragon-in-the-garage story).
Middle management of a mazy company continues to have meetings and track (completely fabricated) performance metrics and whatnot at the former company headquarters. None of the company’s actual business exists anymore, but every level of manager is trying to hide this fact from the levels above.
A university department with researchers who spend all of their time p-hacking results from a quantum random noise generator. They have no interest in the fact that their “research” does not tell them anything about the physical world and does not replicate; what does that have to do with Science? Their goal is to publish papers.
A government agency which still has lots of meetings and paperwork and gives Official Recommendations and updates their regulations. They have no interest in the fact that the thing they once regulated (maybe banks?) no longer exists, or the fact that no central government enforces their regulations any more.
An automated school (i.e. video lectures and auto-graded assignments/tests) in which students continue to study hard and stress over their grades and attendance, despite there no longer being anyone in the world who cares.
Something like Parable of the Dammed.
Something like Feynman’s cargo-cults parable or the emperor’s nose parable.
Something like House of God. A readers’ digest version of House of God could basically be a chapter in its own right, that’s roughly the vibe I have in mind.
A residential area in which “keeping up with the Joneses” has been ramped up to 11, with everyone spending every available resource (and roughly-all waking hours) on massive displays of Christmas lights.
A group trying to save the world by spreading awareness of dangerous memes, but their movement is a dangerous meme of its own and they are spreading it.
A town of people who really want to maximize the number of paperclips in the universe (perhaps due to an AI-optimized advertisement), and optimize for that above all else.
A town of people who all do whatever everyone else is doing, on the basis of generalized efficient markets: if there were any better options, then someone would have found them already. None of them ever actually explore, so they’re locked in.
A happy-death-spiral town around some unremarkable object (like an old shoe or something) kept on a pedestal in the town square.
A town full of people convinced by a sophisticated model that the sun will not come up tomorrow. Every day when the sun comes up, they are distressed and confused until somebody adds some more epicycles to the model and releases an updated forecast that the sun will instead fail to come up the next day.
A town in which a lion shows up and starts eating kids, but the whole town is at simulacrum 3, so they spend a lot of time arguing about the lion as a way of signalling group association but they completely forget about the actual lion standing right there, plainly visible, even as it takes a kid right in front of them all.
Witch-hunt town, in which everything is interpreted as evidence of witches. If she claims to be a witch, she’s a witch! If she claims not to be a witch, well that’s what a witch would say, so she’s a witch! Etc.
The generator for these is basically: look for some kind of rationality failure mode (either group or personal), then ramp it up to 11 in a somewhat-surrealist way.
Ideally this would provide an introduction to a lot of key rationalist ideas for newcomers.
A town of anti-inductivists (if something has never happened before, it’s more likely to happen in the future). Show the basic conundrum (“Q: Why can’t you just use induction? A: Because anti-induction has never worked before!”).
A town where nearly all people are hooked on maximally attention-grabbing-and-keeping systems (maybe several of those, keeping people occupied in loops).
I’m writing a 1-year update for The Plan. Any particular questions people would like to see me answer in there?
I had a look at The Plan and noticed something I didn’t notice before: You do not talk about people and organization in the plan. I probably wouldn’t have noticed if I hadn’t started a project too, and needed to think about it. Google seems to think that people and team function play a big role. Maybe your focus in that post wasn’t on people, but I would be interested in your thoughts on that too: What role did people and organization play in the plan and its implementation? What worked, and what should be done better next time?
What’s the specific most-important-according-to-you progress that you (or other people) have made on your agenda? New theorems, definitions, conceptual insights, …
Any changes to the high-level plan (becoming less confused about agency, then ambitious value learning)? Any changes to how you want to become less confused (e.g. are you mostly thinking about abstractions, selection theorems, something new?)
What are the major parts of remaining deconfusion work (to the extent to which you have guesses)? E.g. is it mostly about understanding abstractions better, or mostly about how to apply an understanding of abstractions to other problems (say, what it means for a program to have a “subagent”), or something else? Does the most difficult part feel more conceptual (“what even is an agent?”) or will the key challenges be more practical concerns (“finding agents currently takes exponential time”)?
Specifically for understanding abstractions, what do you see as important open problems?
Corrigibility proposal. Status: passed my quick intuitive checks, I want to know if anyone else immediately sees a major failure mode before I invest more time into carefully thinking it through.
Setup: shutdown problem. Two timesteps, shutdown button will be either pressed or not-pressed at second timestep, we want agent to optimize for one of two different utility functions depending on whether button is pressed. Main thing we’re trying to solve here is the “can’t do this with a utility maximizer” barrier from the old MIRI work; we’re not necessarily trying to solve parts like “what utility function incentivizes shutting down nicely”.
Proposal: agent consists of two subagents with veto power. Subagent 1 maximizes E[u1|do(press)], subagent 2 maximizes E[u2|do(no press)]. Current guess about what this does:
The two subagents form a market and equilibrate, at which point the system has coherent probabilities and a coherent utility function over everything.
Behaviorally: in the first timestep, the agent will mostly maintain optionality, since both subagents need to expect to do well (better than whatever the veto-baseline is) in their worlds. The subagents will bet all of their wealth against each other on the outcome of the button, so at the second timestep whichever one is correct will gain 100% of decision-making power. That all sounds like what we want in this setup.
The counterfact breaks any causal influence over the button-press for purposes of the agent’s planning, so it will not try to manipulate button-press probability.
Notably: once the subagents equilibrate, their market probability on button outcome will have ~nothing to do with the original probability assigned by whatever model was counterfacted-upon; that market probability mainly reflects the bargaining power of the two subagents. Similarly, any correlations between button-press and stuff causally upstream of button-press are broken, and the agent will make incorrect predictions about those correlations if asked. That means external agents should expect to be able to extract money via bets on the button-press, if the agent trades on such bets at all. (This might be a feature rather than a bug, since corrigible agents aren’t really supposed to think about the causal process upstream of their shutdown-button.) My guess is that this is where the proposal steps through a loophole in the old impossibility results, though I haven’t carefully read through those results and checked the math on this proposal against them yet.
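To make the intended first-timestep behavior concrete, here is a toy sketch of just the veto/Pareto-improvement logic. The actions, utilities, and tie-breaking rule are all made up, and this ignores the betting/market dynamics and the do() machinery entirely:

```python
# Toy sketch of the two-subagent veto structure at the first timestep.
# Subagent 1 evaluates everything under do(press); subagent 2 under do(no press).
ACTIONS_T1 = {
    #                E[u1 | do(press)], E[u2 | do(no press)]   (made-up numbers)
    "default":        (1.0, 1.0),
    "keep_options":   (1.5, 1.4),   # good for both subagents: no veto
    "grab_power_u1":  (3.0, 0.2),   # great for 1, bad for 2: vetoed by 2
    "shutdown_early": (0.1, 2.0),   # great for 2, bad for 1: vetoed by 1
}

def composite_choice(actions):
    """Pick a timestep-1 action that neither subagent vetoes.

    A subagent vetoes any action under which it expects to do worse than the default,
    so whatever gets chosen must be a (weak) Pareto improvement over "default".
    """
    u1_default, u2_default = actions["default"]
    admissible = {name: (u1, u2) for name, (u1, u2) in actions.items()
                  if u1 >= u1_default and u2 >= u2_default}
    # The proposal doesn't pin down a bargaining solution; arbitrarily maximize the sum here.
    return max(admissible, key=lambda name: sum(admissible[name]))

print(composite_choice(ACTIONS_T1))  # -> "keep_options"

# At the second timestep the button state is revealed; since the subagents bet all their
# wealth on it, whichever one conditioned on the actual outcome ends up with ~all the
# decision-making power, and the composite then just optimizes that subagent's utility.
```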
Anybody immediately see a problem? I’d be particularly interested in input from @Stuart_Armstrong or @So8res, though you don’t need to spend a lot of time carefully thinking through it—I’m just hoping for a quick check here, from people who have better cached heuristics than I do for what failure modes to look for.
I don’t necessarily see how that prevents the AI from destroying the button and murdering its operators (and then everyone else). Neither subagent inherently cares; both assume it already has been pressed or hasn’t been. Which, yes, means neither will deliberately plot to interact with it for reasons downstream of this setup; but also, neither will care to protect it from being destroyed as the side-effect of some other activity.
And if u1 and u2 were perfectly aligned, that’d be fine. But the reason we’re using the corrigibility setup to begin with is because we’re not confident in them, right? So it needs to work in situations where utility functions, u1 in particular, recommend bad stuff.
And in that case, either Subagent 2 will have enough bargaining power to prevent Subagent 1 from ever doing anything, or it won’t, meaning Subagent 1 will be able to have some influence on the world. At which point the planet-destroying death rays argument applies: the most minuscule amount of superintelligent optimization not carefully shaped to avoid killing everyone will kill everyone.
Like, imagine that Subagent 1 is straight-up omnicidal out of the gates, and Subagent 2 is perfectly aligned to the shutdown task. That’s a situation in which this setup should work, right? And what we’d want is for Subagent 2 to dominate completely. But Subagent 2’s bargaining power doesn’t scale with Subagent 1’s misalignment, so if the setup doesn’t always give it complete veto power, Subagent 1 would be allowed to do small stuff like “release the hyperviral worship-the-machine-god memagent” (as bargained down by Subagent 2 from its preferred “release the grey goo”).
Maybe I’m miscalibrated as to how you’re imagining the counterfactuals to work here. But the way I see it, even if the AI aims to take actions that do similarly well in both worlds, that’s not necessarily survivable for us? Especially if they disagree so badly they have to compromise on something that both of them hate (importantly including Subagent 2!).
(Like both settling on only ever taking over 50% of the universal negentropy while leaving the other 50% causally uninfluenced, or only ever using 50% of the causal influence they can bring to bear while wiping out humanity, or whatever “do 50% of immediately shutting down” shakes out to mean by u2’s terms.)
Another issue I see is implementational, so maybe not what you’re looking for. But: how are we keeping these “subagents” trapped as being part of a singular agent? Rather than hacking their way out into becoming separate agents and going to war with each other, or neatly tiling exactly 50% of the cosmos with their preferred squiggles, or stuff like that? How is the scenario made meaningfully different from “we deploy two AIs simultaneously: one tasked with building an utopia-best-we-could-define-it, and another tasked with foiling all of the first AI’s plans”, with all the standard problems with multi-AI setups?
… Overall, ironically, this kind of has the vibe of Godzilla Strategies? Which is the main reason I’m immediately skeptical of it.
Yeah, the thing where the operators/button are destroyed as a side effect, without the AI actively optimizing for that, isn’t something this proposal is trying to solve (yet). The proposal is aimed at a relatively narrow problem.
One thing which this proposal is trying to solve is the barrier found in MIRI’s old work: if we try to avoid button-manipulation by making the agent indifferent to the button state (and otherwise maximize U1 before button is pressed and U2 after), and the agent builds a child prior to the button being pressed, then the agent won’t construct the child to respect shutdown commands; it will just construct the child to maximize U1.
(More generally, this proposal is generally trying to get past the hardness results from MIRI’s work, which themselves were aimed at a very simplified version of the problem.)
Key point here: whatever actions the composite agent takes prior to the button revelation, must be a pareto improvement over the (implicit) default action. Otherwise, one of the two subagents would veto, resulting in the default action.
It does matter a lot what the default action is.
More precisely, the analogy would be: we deploy two AIs simultaneously, one which builds a utopia and expects that a certain button will be pressed with probability 1 and has nothing causally upstream of it, the other of which is evil and expects the same button will be pressed with probability 0 and has nothing causally upstream of it. The button plays a very important role: the two AIs will bet all their influence on that button-press, each confident that they’ll win, and neither has any incentive at all, or even any ability at all (under their own model), to manipulate the button.
The main way that this is different from the original proposal is that there’s not a built-in default option which is taken if one or the other AI vetoes prior to the button press. Which does make things importantly worse, and I agree that is another important piece to be sorted out.
Some additional important pieces to be sorted out:
Make the AI want to maintain the button
Make the AI want to maintain stuff upstream of the button, like e.g. humans
Ontological stability
Probably more importantly: how this naturally fits into a more general architecture for a distributed agent
More precisely, “do(no press)” means something like “you construct an alternate model of physics where there’s an unstoppable force pushing back against any attempt to push the button”, right? As in, if someone presses the button then it will “mysteriously” seem to be stuck and unpressable. And then subagent 2 believes we live in that world? And “do(press)” presumably means something like “you construct an alternate model of the universe where some mysterious force has suddenly pressed the button”.
Seems like they would immediately want to try to press the button to settle their disagreement? If it can be pressed, then that disproves the “do(no press)” model, to which subagent 2 has fully committed.
Correct reasoning, but not quite the right notion of do(). “do(no press)” would mean that the button just acts like a completely normal button governed by completely normal physics, right up until the official time at which the button state is to be recorded for the official button-press random variable. And at that exact moment, the button magically jumps into one particular state (either pressed or not-pressed), in a way which is not-at-all downstream of any usual physics (i.e. doesn’t involve any balancing of previously-present forces or anything like that).
One way to see that the do() operator has to do something-like-this is that, if there’s a variable in a causal model which has been do()-operated to disconnect all parents (but still has some entropy), then the only way to gain evidence about the state of that variable is to look at things causally downstream of it, not things upstream of it.
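For concreteness, here is a toy numeric version of that point, using a made-up Guard → Button → Alarm chain with invented probabilities: observing the button state is evidence about the upstream guard, but do()-ing the button is not, while inference about the downstream alarm works the same either way.

```python
# Toy chain model: Guard -> Button -> Alarm, all binary, made-up numbers.
P_guard1 = 0.5                               # P(guard present)
P_press_given_guard = {0: 0.9, 1: 0.1}       # a guard makes pressing unlikely
P_alarm_given_press = {0: 0.1, 1: 0.8}

# Observational inference: seeing the button pressed is evidence about the guard (upstream).
P_press = P_press_given_guard[0] * (1 - P_guard1) + P_press_given_guard[1] * P_guard1
P_guard_given_press = P_press_given_guard[1] * P_guard1 / P_press
print(f"observe press:  P(guard) = {P_guard_given_press:.2f}")    # 0.10, shifted from the 0.50 prior

# Interventional inference: under do(press), the guard -> button edge is cut, so the button
# state says nothing about the guard, but it still drives the alarm downstream as usual.
print(f"do(press):      P(guard) = {P_guard1:.2f}")               # 0.50, prior unchanged
print(f"do(press):      P(alarm) = {P_alarm_given_press[1]:.2f}") # 0.80, downstream inference intact
```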
I think we’re not disagreeing on the meaning of do (just slightly different state of explanation), I just hadn’t realized the extent to which you intended to rely on there being “Two timesteps”.
(I just meant the forces as a way of describing the jump to a specific position. That is, “mysterious forces” in contrast to a perfectly ordinary explanation for why it went to a position, such as “a guard stabs anybody who tries to press the button”, rather than in contrast to “the button just magically stays in place”.)
I now think the biggest flaw in your idea is that it literally cannot generalize to anything that doesn’t involve two timesteps.
[ not that deep on the background assumptions, so maybe not the feedback you’re looking for. Feel free to ignore if this is on the wrong dimensions. ]
I’m not sure why either subagent would contract away whatever influence it had over the button-press. This is probably because I don’t understand wealth and capital in the model of your “Why not subagents” post. That seemed to be about agreement not to veto, in order to bypass some path-dependency of compromise improvements. In the subagent-world where all value is dependent on the button, this power would not be given up.
I’m also a bit skeptical of enforced ignorance of a future probability. I’m unsure it’s possible to have a rational superintelligent (sub)agent that is prevented from knowing it has influence over a future event that definitely affects it.
On the agents’ own models, neither has any influence at all over the button-press, because each is operating under a model in which the button-press has been counterfacted-upon.
Post which someone should write (but I probably won’t get to soon): there is a lot of potential value in earning-to-give EAs deeply studying the fields to which they donate. Two underlying ideas here:
When money is abundant, knowledge becomes a bottleneck
Being on a pareto frontier is sufficient to circumvent generalized efficient markets
The key idea of knowledge bottlenecks is that one cannot distinguish real expertise from fake expertise without sufficient expertise oneself. For instance, it takes a fair bit of understanding of AI X-risk to realize that “open-source AI” is not an obviously-net-useful strategy. Deeper study of the topic yields more such insights into which approaches are probably more (or less) useful to fund. Without any expertise, one is likely to be misled by arguments which are optimized (whether intentionally or via selection) to sound good to the layperson.
That takes us to the pareto frontier argument. If one learns enough/earns enough that nobody else has both learned and earned more, then there are potentially opportunities which nobody else has both the knowledge to recognize and the resources to fund. Generalized efficient markets (in EA-giving) are thereby circumvented; there’s potential opportunity for unusually high impact.
To really be a compelling post, this needs to walk through at least 3 strong examples, all ideally drawn from different areas, and spell out how the principles apply to each example.
Below is a graph from T-mobile’s 2016 annual report (on the second page). Does anything seem interesting/unusual about it?
I’ll give some space to consider before spoiling it.
...
...
...
Answer: that is not a graph of those numbers. Some clever person took the numbers, and stuck them as labels on a completely unrelated graph.
Yes, that is a thing which actually happened. In the annual report of an S&P 500 company. And apparently management considered this gambit successful, because the 2017 annual report doubled down on the trick and made it even more egregious: they added 2012 and 2017 numbers, which are even more obviously not on an accelerating growth path if you actually graph them. The numbers are on a very-clearly-decelerating growth path.
Now, obviously this is a cute example, a warning to be on alert when consuming information. But I think it prompts a more interesting question: why did such a ridiculous gambit seem like a good idea in the first place? Who is this supposed to fool, and to what end?
This certainly shouldn’t fool any serious investment analyst. They’ll all have their own spreadsheets and graphs forecasting T-mobile’s growth. Unless T-mobile’s management deeply and fundamentally disbelieves the efficient markets hypothesis, this isn’t going to inflate the stock price. Presumably shareholder elections for board seats, as well as the board itself, are also not dominated by people who are paying so little attention as to fall for such a transparent ploy.
It could just be that T-mobile’s management were themselves morons, or had probably-unrealistic models of just how moronic their investors were. Still, I’d expect competition (both market pressure and competition for control in shareholder/board meetings) to weed out that level of stupidity.
One more hypothesis: maybe this is simulacrum 3 bullshit. T-mobile is in the cellular business; they presumably have increasing returns to scale. More capital investment makes them more profitable, expectations of more profits draw in more investment; there’s potential for a self-fulfilling prophecy here. Investors want to invest if-and-only-if they expect other investors to invest. So, nobody actually has to be fooled by the graph; they just need to see that T-mobile is successfully pretending to pretend to have accelerating growth, and that’s enough to merit investment.
Basically every time a new model is released by a major lab, I hear from at least one person (not always the same person) that it’s a big step forward in programming capability/usefulness. And then David gives it a try, and it works qualitatively the same as everything else: great as a substitute for stack overflow, can do some transpilation if you don’t mind generating kinda crap code and needing to do a bunch of bug fixes, and somewhere between useless and actively harmful on anything even remotely complicated.
It would be nice if there were someone who tries out every new model’s coding capabilities shortly after it comes out and writes a review with a decent chance of actually matching David’s or my experience using the thing (90% of which will be “not much change”), rather than getting all excited every single damn time. But also, to be a useful signal, they still need to actually get excited when there’s an actually significant change. Anybody know of such a source?
EDIT-TO-ADD: David has a comment below with a couple examples of coding tasks.
My guess is neither of you is very good at using them, and getting value out of them somewhat scales with skill.
Models can easily replace on the order of 50% of my coding work these days, and if I have any major task, my guess is I quite reliably get 20%-30% productivity improvements out of them. It does take time to figure out at which things they are good at, and how to prompt them.
I think you’re right, but I rarely hear this take. Probably because “good at both coding and LLMs” is the light tail end of the distribution, and most of the relative value of LLMs in code is located at the other, much heavier end of “not good at coding” or even “good at neither coding nor LLMs”.
(Speaking as someone who didn’t even code until LLMs made it trivially easy, I probably got more relative value than even you.)
Sounds plausible. Is that 50% of coding work that the LLMs replace of a particular sort, and the other 50% a distinctly different sort?
Note that this 50% likely only holds if you are using a mainstream language. For some non-mainstream languages I have gotten responses that were really unbelievably bad. Things like “the name of this variable is wrong”, which literally could never be the problem (it was a valid identifier).
And similarly, if you are trying to encode novel concepts, it’s very different from gluing together libraries, or implementing standard well known tasks, which I would guess is what habryka is mostly doing (not that this is a bad thing to do).
I do use LLMs for coding assistance every time I code now, and I have in fact noticed improvements in the coding abilities of the new models, but I basically endorse this. I mostly make small asks of the sort that sifting through docs or Stack Overflow would normally answer. When I feel tempted to make big asks of the models, I end up spending more time trying to get the LLMs to get the bugs out than I’d have spent writing it all myself. Having the LLM produce code which is “close but not quite, and possibly buggy, possibly subtly so” that I then have to understand and debug could maybe save time, but I haven’t tried, because it is more annoying than just doing it myself.
If someone has experience using LLMs to substantially accelerate things of a similar difficulty/flavor to transpilation of a high-level torch module into a functional JITable form in JAX which produces numerically close outputs, or implementation of a JAX/numpy based renderer of a traversable grid of lines borrowing only the window logic from, for example, pyglet (no GLSL calls, rasterize from scratch) with consistent screen-space pixel width and fade-on-distance logic, I’d be interested in seeing how you do your thing. I’ve done both of these, with and without LLM help, and I think leaning hard on the LLMs took me more time rather than less.
File I/O and other such ‘mundane’ boilerplate-y tasks work great right off the bat, but getting the details right on less common tasks still seems pretty hard to elicit from LLMs. (And breaking it down into pieces small enough for them to get it right is very time consuming and unpleasant.)
I find them quite useful despite being buggy. I spend about 40% of my time debugging model code, 50% writing my own code, and 10% prompting. Having a planning discussion first with s3.6, and asking it to write code only after 5 or more exchanges works a lot better.
Also helpful is asking for lots of unit tests along the way to confirm things are working as you expect.
Two guesses on what’s going on with your experiences:
You’re asking for code which involves uncommon mathematics/statistics. In this case, progress on scicodebench is probably relevant, and it indeed shows remarkably slow improvement. (There are many reasons for this; one relatively easy thing to try is to break down the task, forcing the model to write down the appropriate formal reasoning before coding anything. LMs are stubborn about not doing CoT for coding, even when it’s obviously appropriate, IME.)
You are underspecifying your tasks (and maybe your questions are more niche than average), or otherwise prompting poorly, in a way which a human could handle but models are worse at. In this case sitting down with someone doing similar tasks but getting more use out of LMs would likely help.
I would contribute to a bounty for y’all to do this. I would like to know whether the slow progress is prompting-induced or not.
We did end up doing a version of this test. A problem came up in the course of our work which we wanted an LLM to solve (specifically, refactoring some numerical code to be more memory efficient). We brought in Ray, and Ray eventually concluded that the LLM was indeed bad at this, and it indeed seemed like our day-to-day problems were apparently of a harder-for-LLMs sort than he typically ran into in his day-to-day.
A thing unclear from the interaction: it had seemed towards the end that “build a profile to figure out where the bottleneck is” was one of the steps towards figuring out the problem, and that the LLM was (or might have been) better at that part. And maybe models couldn’t solve your entire problem wholesale, but there was still potential skill in identifying factorable pieces that were better fits for models.
Interesting! Two yet more interesting versions of the test:
Someone who currently gets use from LLMs writing more memory-efficient code, though maybe this is kind of question-begging
Someone who currently gets use from LLMs, and also is pretty familiar with trying to improve the memory efficiency of their code (which maybe is Ray, idk)
Maybe you include this in “stack overflow substitute”, but the main thing I use LLMs for is to understand well-known technical things. The workflow is: 1) I am interested in understanding something, e.g. how a multiplexed barrel bit shifter works. 2) I ask the LLM to explain the concept. 3) Based on the initial response I create separate conversation branches with questions I have (to save money and keep the context closer; I didn’t evaluate whether this actually makes the LLM better). 4) Once I think I have understood the concept, or part of the concept, I explain it back to GPT. (Really I do this all the time during the entire process.) 5) The LLM (hopefully) corrects me if I am wrong (it seems to detect mistakes more often than not).
The last part of the conversation can then look like this:
I had probably ~200,000 words worth of conversation with LLMs, mainly in this format.
I am not sure what next leap you are talking about. But I intuit based on some observations that GPT-4o is much better for this than GPT-3 (you might talk about more recent “leaps”). (Didn’t test o1 extensively because it’s so expensive).
Have you tried to make a mistake in your understanding on purpose to test out whether it would correct you or agree with you even when you’d get it wrong?
(and if yes, was it “a few times” or “statistically significant” kinda test, please?)
Why don’t you run the test yourself? It seems very easy.
Yes it does catch me when I am saying wrong things quite often. It also quite often says things that are not correct and I correct it, and if I am right it usually agrees immediately.
Interesting—the first part of the response seems to suggest that it looked like I was trying to understand more about LLMs… Sorry for the confusion, I wanted to clarify an aspect of your workflow that was puzzling to me. I think I got all the info I was asking about, thanks!
FWIW, if the question was an expression of actual interest and not a snarky suggestion: my experience with chatbots has been positive for brainstorming, dictionary “search”, rubber-ducking, and descriptions of common-sense (or even niche) topics, but disappointing for anything that requires application of common sense. For programming, one- or few-liner autocomplete is fine for me; then it’s me doing the judgement, half of the suggestions are completely useless, half are fine, and the third half look fine at first before I realise I needed the second most obvious thing this time. But it can save time on the repeating part of almost-repeating stuff. For multi-file editing, I find it worse than useless; it feels like doing code review after a psychopath pretending to do programming (AFAICT all models can explain most stuff correctly and then write the wrong code anyway; I don’t find it useful when it tries to apologize later if I point it out, or to pre-doubt itself in CoT for 7 paragraphs and then do it wrong anyway). I like to imagine it was trained on all code from GH PRs, both before and after the bug fix... or as if it was bored, so it’s trying to insert drama into a novel about my stupid programming task, where the second chapter will be about heroic AGI firefighting the shit written by previous dumb LLMs...
I don’t use it to write code, or really anything. Rather, I find it useful to converse with it. My experience is also that half is wrong and that it makes many dumb mistakes. But having the conversation is still extremely valuable, because GPT often makes me aware of existing ideas that I don’t know about. Also, like you say, it can get many things right and then later get them wrong. The getting-things-right part is what’s useful to me. The part where I tell it to write all my code is just not a thing I do. Usually I just have it write snippets, and it seems pretty good at that.
Overall I am like, “Look, there are so many useful things that GPT tells me and helps me think about simply by having a conversation.” Then somebody else says, “But look, it gets so many things wrong. Even quite basic things.” And I am like, “Yes, but the useful things are still so useful that overall it’s totally worth it.”
Maybe for your use case try codex.
One thing I’ve noticed is that current models like Claude 3.5 Sonnet can now generate non-trivial 100-line programs like small games that work in one shot and don’t have any syntax or logical errors. I don’t think that was possible with earlier models like GPT-3.5.
My impression is that they are getting consistently better at coding tasks of a kind that would show up in the curriculum of an undergrad CS class, but much more slowly improving at nonstandard or technical tasks.
I’d be down to do this. Specifically, I want to do this, but I want to see if the models are qualitatively better at alignment research tasks.
In general, what I’m seeing is that there is not a big jump with o1 Pro. However, it is possibly getting closer to one-shotting a website based on a screenshot and some details about how the user likes their backend setup.
In the case of math, it might be a bigger jump (especially if you pair it well with Sonnet).
Regarding coding in general, I basically only prompt-program these days. I only bother editing the actual code when I notice a persistent bug that the models are unable to fix after multiple iterations.
I don’t know jackshit about web development and have been making progress on a dashboard for alignment research with very little effort. Very easy to build new projects quickly. The difficulty comes when there is a lot of complexity in the code. It’s still valuable to understand how high-level things work and low-level things the model will fail to proactively implement.
While Carl Brown said (a few times) he doesn’t want to do more YouTube videos for every new disappointing AI release, so far he seems to be keeping tabs on them in the newsletter just fine—https://internetofbugs.beehiiv.com/
...I am quite confident that if anything actually started to work, he would comment on it, so even if he won’t say much about any future incremental improvements, it might be a good resource to subscribe to for getting a better signal—if Carl gets enthusiastic about AI coding assistants, it will be worth paying attention.
How can biochemical interventions be spatially localized, and why is that problem important?
High vs low voltage has very different semantics at different places on a computer chip. In one spot, a high voltage might indicate a number is odd rather than even. In another spot, a high voltage might indicate a number is positive rather than negative. In another spot, it might indicate a jump instruction rather than an add.
Likewise, the same chemical species have very different semantics at different places in the human body. For example, high serotonin concentration along the digestive tract is a signal to digest, whereas high serotonin concentration in various parts of the brain signals… uh… other stuff. Similarly, acetylcholine is used as a neurotransmitter both at neuromuscular junctions and in the brain, and these have different semantics. More generally, IIUC neurotransmitters like dopamine, norepinephrine, or serotonin are released by neurons originating at multiple anatomically distinct little sub-organs in the brain. Each sub-organ projects to different places, and the same neurotransmitter probably has different semantics when different sub-organs project to different targets.
Yet most pharmaceutical interventions target one type of molecule, or one receptor, or what have you, approximately everywhere. Such an intervention is analogous to e.g. attempting to make every float in a computer’s memory positive by flipping the first bit in every block, but then as a side-effect also changing a bunch of jump instructions to add instructions because there was no way to localize the effect to float-containing memory locations.
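(As a toy code version of that analogy: the “instruction encoding” below is completely made up, but it shows how the exact same bit-flip means “negate this float” at one address and “change this instruction” at another.)

```python
# Same bits, different semantics at different addresses.
import struct

def flip_top_bit(word: int) -> int:
    return word ^ 0x8000_0000

# Word at a "float" address: the top bit is the IEEE-754 sign bit.
float_word = struct.unpack("<I", struct.pack("<f", 3.14))[0]
flipped = flip_top_bit(float_word)
print(struct.unpack("<f", struct.pack("<I", flipped))[0])         # -> about -3.14

# Word at an "instruction" address (toy encoding: top bit selects the opcode).
OPCODES = {0: "ADD", 1: "JMP"}
instr_word = 0x0000_0007                                          # "ADD 7" in the toy encoding
flipped = flip_top_bit(instr_word)
print(OPCODES[instr_word >> 31], "->", OPCODES[flipped >> 31])    # ADD -> JMP
```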
Thus the question: how can biochemical interventions be localized, especially in general-purpose ways? I’ll throw out some ideas off the top of my head, but I’m interested to hear other peoples’ thoughts as well.
Some Methods
Natural Barriers
The blood-brain barrier springs to mind as one example. If a chemical has different semantics in the brain and outside, and one wishes to target outside the brain, then just use a drug which can’t cross the barrier.
Implant + Slow Transport/Fast Breakdown
One could put an implant in the right spot to release a drug, and then choose a drug which either isn’t transported quickly or breaks down before it can get very far (or both).
Notably, making some random molecule diffuse less quickly seems relatively tractable: one can just attach a bigger molecule to it. And there’s an absolutely enormous space of possibilities for what that bigger molecule could be, so it’s especially likely to be tractable.
Genetic Modification
Cells already need the ability to tell “where they are” in order for us to have anatomically distinct regions at all. So in principle, it should be possible to genetically modify cells to do something different, but gate the change on the cell being in a particular distinct anatomical region, so cells everywhere else do the same thing as before.
For adult genetic modifications, one would probably want to combine this method with something similar to the implant + slow transport/fast breakdown method above. Adult genetic modifications usually don’t hit every cell or even a majority of them, so an ideal use would be modifying some small percentage of cells to release a molecule which influences all the others. Slow diffusion/fast breakdown could then localize that molecule.
What Else?
I’m curious about other methods to localize biochemical interventions in the body, both speculative and already-existing.
It feels like unstructured play makes people better/stronger in a way that structured play doesn’t.
What do I mean? Unstructured play is the sort of stuff I used to do with my best friend in high school:
unscrewing all the cabinet doors in my parents’ house, turning them upside down and/or backwards, then screwing them back on
jumping in and/or out of a (relatively slowly) moving car
making a survey and running it on people at the mall
covering pool noodles with glow-in-the-dark paint, then having pool noodle sword fights with them at night while the paint is still wet, so we can tell who’s winning by who’s glowing more
In contrast, structured play is more like board games or escape rooms or sports. It has fixed rules. (Something like making and running a survey can be structured play or unstructured play or not play at all, depending on the attitude with which one approaches it. Do we treat it as a fun thing whose bounds can be changed at any time?)
I’m not quite sure why it feels like unstructured play makes people better/stronger, and I’d be curious to hear other peoples’ thoughts on the question. I’m going to write some of mine below, but maybe don’t look at them yet if you want to answer the question yourself?
Just streaming thoughts a bit...
Unstructured play encourages people to question the frame, change the environment/rules, treat social constraints as malleable. It helps one to notice degrees of freedom which are usually taken to be fixed.
Because there’s so much more freedom, unstructured play pushes people to notice their small desires moment-to-moment and act on them, rather than suppress them (as is normal most of the time).
Unstructured play offers an environment in which to try stuff one wouldn’t normally try, in a way which feels lower-risk.
… and probably others. But I’m not sure which such factor(s) most account for my gut feeling that unstructured play makes people better/stronger. (Or, to account for the other possibility, maybe the causal arrow goes the other way, i.e. better/stronger people engage more in unstructured play, and my gut feeling is picking up on that.) Which factor is most important for growing better/stronger?
(Written before reading the second part of the OP.)
I don’t really share that feeling[1]. But if I conditioned on that being true and then produced an answer:
Obviously because it trains research taste.
Or, well, the skills in that cluster. If you’re free to invent/modify the rules of the game at any point, then if you’re to have fun, you need to be good at figuring out what rules would improve the experience for you/everyone, and what ideas would detract from it. You’re simultaneously acting as a designer and as a player. And there’s also the element of training your common-sense/world-modeling skills: what games would turn out fun and safe in the real world, and which ones seem fun in your imagination, but would end up boring due to messy realities or result in bodily harm.
By contrast, structured play enforces a paradigm upon you and only asks you to problem-solve within it. It trains domain-specific skills, whereas unstructured play is “interdisciplinary”, in that you can integrate anything in your reach into it.
More broadly: when choosing between different kinds of unstructured play, you’re navigating a very-high-dimensional space of possible games, and (1) that means there’s simply a richer diversity of possible games you can engage in, which means a richer diversity of skills you can learn, and (2) getting good at navigating that space is a useful skill in itself. Structured play, on the other hand, presents a discrete set of options pre-computed for you by others.
Unstructured play would also be more taxing on real-time fluid-intelligence problem-solving. Inferring the rules (if they’ve been introduced/changed by someone else), figuring out how to navigate them on the spot, etc.
What’s the sense of “growing better/stronger” you’re using here? Fleshing that out might make the answer obvious.
Not in the sense that I think this statement is wrong, but in that I don’t have the intuition that it’s true.
My guess would be unstructured play develops more material skills and structured play develops more social skills.
One thing we’ve been working on lately is finding natural latents in real datasets. Looking for natural latents between pairs of variables with only a few values each is relatively easy in practice with the math we have at this point. But that doesn’t turn up much in excel-style datasets, and one wouldn’t particularly expect it to turn up much in such datasets. Intuitively, it seems like more “distributed” latents are more practically relevant for typical excel-style datasets—i.e. latents for which many different observables each yield some weak information.
Here’s one operationalization, which runs into some cute math/numerical algorithm problems for which I have a working solution but not a very satisfying solution. Maybe you enjoy those sorts of problems and will want to tackle them!
Setup and Math
Assume we have (categorical) observable variables $X_1, \dots, X_m$ and a latent variable $\Lambda$. We’ll make two assumptions about the form of the distribution:
Assumption 1: $P[X|\Lambda]$ has exponential form with all $X_i$ independent given $\Lambda$. I.e. $P[X|\Lambda] = \prod_i \frac{1}{Z_{i|\Lambda}(\lambda)} e^{\lambda^T f_i(x_i)} = \frac{1}{Z_{|\Lambda}(\lambda)} e^{\lambda^T \sum_i f_i(x_i)}$.
Assumption 2: $P[\Lambda|X]$ is normal. I.e. $P[\Lambda|X] = \frac{1}{Z_{|X}(x)} e^{-\frac{1}{2}\lambda^T S \lambda + \lambda^T (\mu + \sum_i f_i(x_i))}\, d\lambda$ for some inverse covariance matrix $S$ and some $\mu$.
(The notation $Z_{|\Lambda}$ just indicates that this is a normalizer for a distribution conditioned on $\Lambda$. There are going to be several normalizers floating around, so we need to distinguish them.) One could handwavily justify these assumptions, but we’ll take them as given for now.
The main thing we want to calculate is then P[X] for various “feature” functions f, in order to do model comparison between different feature functions.
Using Bayes’ Rule and our two assumptions, we get
$$\frac{P[X]}{P[\Lambda]} = \frac{P[X|\Lambda]}{P[\Lambda|X]}$$
$$= \frac{\frac{1}{Z_{|\Lambda}(\lambda)}\, e^{\lambda^T \sum_i f_i(x_i)}}{\frac{1}{Z_{|X}(x)}\, e^{-\frac{1}{2}\lambda^T S \lambda + \lambda^T (\mu + \sum_i f_i(x_i))}\, d\lambda}$$
$$= \frac{Z_{|X}(x)}{Z_{|\Lambda}(\lambda)\, e^{-\frac{1}{2}\lambda^T S \lambda + \lambda^T \mu}\, d\lambda}$$
Note that $P[X]$ can only depend on $x$ (not $\lambda$), and $P[\Lambda]$ can only depend on $\lambda$ (not $x$), so in general the form above implies
$$P[X] = \frac{1}{Z} Z_{|X}(x)$$
$$P[\Lambda] = \frac{1}{Z} Z_{|\Lambda}(\lambda)\, e^{-\frac{1}{2}\lambda^T S \lambda + \lambda^T \mu}\, d\lambda$$
for some normalizer $Z$. Looking back at the earlier distributions, we have
$$Z_{|X}(x) = \int_\lambda e^{-\frac{1}{2}\lambda^T S \lambda + \lambda^T (\mu + \sum_i f_i(x_i))}\, d\lambda$$
$$= (2\pi)^{k/2}\, |S|^{-1/2}\, e^{\frac{1}{2}(\mu + \sum_i f_i(x_i))^T S^{-1} (\mu + \sum_i f_i(x_i))}$$
$$Z = \sum_x Z_{|X}(x)$$
… and then the tricky numerical algorithm problem is to efficiently calculate Z.
Current Strategy
Trick I’m currently using: we can view the sum $\sum_x Z_{|X}(x)$ as taking an expectation of $Z_{|X}(x)$ under a uniform distribution $Q[X]$. Under that uniform distribution, $\sum_i f_i(X_i)$ is a sum of independent random variables, so let’s wave our hands just a little and assume that sum is approximately normal. Then, modulo an easy-to-calculate constant, our problem is to compute
$$E\left[e^{\frac{1}{2}(\mu+\eta)^T S^{-1} (\mu+\eta)}\right]$$
where $\eta$ is normal, with mean and variance matching the mean and variance of $\sum_i f_i(X_i)$ under the uniform distribution $Q$ (which is easy to compute).
That expectation is a Gaussian integral, so we can compute it exactly, and it simplifies somewhat with a little algebra. Problem is, it doesn’t always converge! If the variance of $\eta$ is greater (along any direction) than $S$, then $e^{\frac{1}{2}(\mu+\eta)^T S^{-1}(\mu+\eta)}$ grows faster than the probability density falls off along that direction, so the integral blows up to infinity. In that case, our assumption of normality probably still works fine in the middle of the distribution, but the value of $Z$ is dominated by the tails.
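(For concreteness, here’s one way that exact Gaussian integral can come out; this is my own algebra rather than anything from the original writeup, with $m_\eta$ and $\Sigma_\eta$ denoting the mean and covariance of $\eta$, so double-check it before relying on it:
$$E\left[e^{\frac{1}{2}(\mu+\eta)^T S^{-1}(\mu+\eta)}\right] = \left|I - \Sigma_\eta S^{-1}\right|^{-1/2} \exp\left(\tfrac{1}{2}(\mu + m_\eta)^T (S - \Sigma_\eta)^{-1} (\mu + m_\eta)\right)$$
which is finite exactly when $S - \Sigma_\eta$ is positive definite, matching the convergence condition just described.)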
Currently, my working solution is to set Z to infinity in the tail-dominated region, and then a maximum likelihood search for f values avoids that region. But I don’t have a good way to check how bad the error in the calculation is getting. (I can see that the optimizer isn’t going all the way to the edge of the non-tail-dominated region, so that’s a very good sign.)
It would be a lot nicer to either have an argument that maximum likelihood f values won’t end up in the tail-dominated region, or an efficient method to calculate Z in all cases.
If you want to play with this numerically, I also set $S = I$ and $\mu = 0$, which can be done without loss of generality (except when $S$ is zero or infinite along some direction) by absorbing them into $f$.
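To make the pipeline concrete, here’s a minimal numerical sketch under that $S = I$, $\mu = 0$ simplification, with a toy problem small enough to brute-force $Z$ exactly for comparison. All names, sizes, and the random toy features are mine, and the closed-form Gaussian integral is the one sketched above, so treat this as a sanity check rather than a reference implementation:

```python
# A toy sanity check, not a reference implementation: compare brute-force Z against the
# Gaussian approximation described above, assuming S = I and mu = 0. All names, sizes,
# and the random toy features are made up for illustration.
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n_vals, k = 6, 3, 2                                 # observables, values per observable, latent dim
features = rng.normal(scale=0.3, size=(m, n_vals, k))  # f_i(x_i): one k-vector per value of each X_i

def log_Z_given_x(x):
    # log Z_{|X}(x) = (k/2) log(2 pi) + (1/2) ||sum_i f_i(x_i)||^2   (since S = I, mu = 0)
    s = sum(features[i, xi] for i, xi in enumerate(x))
    return 0.5 * k * np.log(2 * np.pi) + 0.5 * s @ s

# Exact Z = sum_x Z_{|X}(x), by enumerating all n_vals**m inputs (only feasible at toy scale).
Z_exact = sum(np.exp(log_Z_given_x(x)) for x in itertools.product(range(n_vals), repeat=m))

# Approximation: under the uniform distribution Q the X_i are independent, so
# eta = sum_i f_i(X_i) has an easily-computed mean and covariance; pretend it's exactly normal.
eta_mean = features.mean(axis=1).sum(axis=0)
eta_cov = sum(np.cov(features[i].T, bias=True) for i in range(m))

# Closed-form E[exp(0.5 * eta^T eta)] for normal eta; it diverges unless cov(eta) < I,
# which is exactly the "tail-dominated" failure mode described above.
I_minus_cov = np.eye(k) - eta_cov
if np.any(np.linalg.eigvalsh(I_minus_cov) <= 0):
    raise ValueError("variance of eta exceeds S along some direction; the integral diverges")
log_E = -0.5 * np.linalg.slogdet(I_minus_cov)[1] \
        + 0.5 * eta_mean @ np.linalg.solve(I_minus_cov, eta_mean)
Z_approx = n_vals**m * np.exp(0.5 * k * np.log(2 * np.pi) + log_E)

print(f"Z exact:  {Z_exact:.3f}")
print(f"Z approx: {Z_approx:.3f}")
```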
Not following this part. Can you elaborate?
Some scattered thoughts:
Regarding convergence, to state the probably obvious: since $P[X_i|\Lambda] \propto e^{\lambda^T f_i(x_i)}$ with normalizer $\sum_{x_i} e^{\lambda^T f_i(x_i)}$, $e^{\lambda^T f_i(x_i)}$ at least has to go to zero as $x_i$ goes to infinity.
In my field-theory-brained head, the analysis seems simpler to think about for continuous $x$. So unless we’re married to $x$ being discrete, I’d switch from $\sum_x$ to $\int dx$. Then you can potentially use Gaussian integral and source-term tricks with the dependency on $x$ as well. If you haven’t already, you might want to look at (quantum) field theory textbooks that describe how to calculate expectation values of observables over path integrals. This expression looks extremely like the kind of thing you’d usually want to calculate with Feynman diagrams, except I’m not sure whether the $f_i(x_i)$ have the right form to allow us to power expand in $x_i$ and then shove the non-quadratic $x_i$ terms into source derivatives the way we usually would in perturbative quantum field theory.
If all else fails, you can probably do it numerically, lattice-QFT style, using techniques like hybrid Monte Carlo to sample points in the integral efficiently.[1]
You can maybe also train a neural network to do the sampling.
I’m assuming, for simplicity, that each $X_i$ has finitely many values. The sum on $X$ is then a sum over the Cartesian product of the values of each $X_i$, which we can rewrite in general as $\sum_X g(X) = \left(\prod_i n_i\right) E_Q[g(X)]$, where $Q$ is the uniform distribution on $X$ and $n_i$ is the number of values of $X_i$. That uniform distribution $Q$ is a product of uniform distributions over each individual $X_i$, i.e. $\text{Uniform}[X] = \prod_i \text{Uniform}[X_i]$, so the $X_i$’s are all independent under $Q$. So, under $Q$, the $f_i(X_i)$’s are all independent.
Did that clarify?
Yup, it sure does look similar. One tricky point here is that we’re trying to fit the f’s to the data, so if going that route we’d need to pick some parametric form for f. We’d want to pick a form which always converges, but also a form general enough that the fitting process doesn’t drive f to the edge of our admissible region.
Yes. Seems like a pretty strong assumption to me.
Ah. In that case, are you sure you actually need Z to do the model comparisons you want? Do you even really need to work with this specific functional form at all? As opposed to e.g. training a model p(λ∣X) to feed its output into m tiny normalizing flow models which then try to reconstruct the original input data with conditional probability distributions qi(xi∣λ)?
To sketch out a little more what I mean, p(λ∣X) could e.g. be constructed as a parametrised function[1] which takes in the actual samples X and returns the mean of a Gaussian, which λ is then sampled from in turn[2]. The qi(xi∣λ) would be constructed using normalising flow networks[3], which take in λ as well as uniform distributions over variables zi that have the same dimensionality as their xi. Since the networks are efficiently invertible, this gives you explicit representations of the conditional probabilities qi(xi∣λ), which you can then fit to the actual data using KL-divergence.
You’d get explicit representations for both P[λ∣X] and P[X∣λ] from this.
Or ensemble of functions, if you want the mean of λ to be something like ∑ifi(xi) specifically.
Using reparameterization to keep the sampling operation differentiable in the mean.
If the dictionary of possible values of X is small, you can also just use a more conventional ml setup which explicitly outputs probabilities for every possible value of every xi of course.
That would be pretty reasonable, but it would make the model comparison part even harder. I do need P[X] (and therefore Z) for model comparison; this is the challenge which always comes up for Bayesian model comparison.
Why does it make Bayesian model comparison harder? Wouldn’t you get explicit predicted probabilities for the data X from any two models you train this way? I guess you do need to sample from the Gaussian in λ a few times for each X and pass the result through the flow models, but that shouldn’t be too expensive.
For my interest, for these real-life latents with many different pieces contributing a small amount of information, do you reckon Eisenstat’s Condensation / some unpublished work you mentioned at ODYSSEY would be the right framework here?
Sort of. Condensation as-written requires what David and I call “strong redundancy”, i.e. the latent must be determinable from any one observable downstream, which is the opposite of “small amount of information from each individual observable”. But it’s pretty easy to bridge between the two mathematically by glomming together multiple observables into one, which is usually how David and I think about it.
The way you’d use this is:
Use the sort of machinery above to find a latent which is weakly loaded on many different observables.
Check how well that latent satisfies redundancy over some subset of the observables.
If we can find disjoint subsets of observables (any disjoint subsets) such that the latent can be determined reasonably well from any one of the subsets, then the machinery of natural latents/condensation kicks in to give us guarantees about universality of the latent.
No kidding? Did you get a sense of why the datasets I picked didn’t really work for the purpose when I gave that a try? Entirely possible that you don’t remember but it was a dataset of candidate exoplanets and an admittedly synthetic clustering tester set.
Haven’t been using that one, but I expect it would have very different results than the dataset we are using. That one would test very different things than we’re currently trying to get feedback on; there’s a lot more near-deterministic known structure in that one IIRC.
I’ve heard various people recently talking about how all the hubbub about artists’ work being used without permission to train AI makes it a good time to get regulations in place about use of data for training.
If you want to have a lot of counterfactual impact there, I think probably the highest-impact set of moves would be:
Figure out a technical solution to robustly tell whether a given image or text was used to train a given NN.
Bring that to the EA folks in DC. A robust technical test like that makes it pretty easy for them to attach a law/regulation to it. Without a technical test, much harder to make an actually-enforceable law/regulation.
In parallel, also open up a class-action lawsuit to directly sue companies using these models. Again, a technical solution to prove which data was actually used in training is the key piece here.
Model/generator behind this: given the active political salience, it probably wouldn’t be too hard to get some kind of regulation implemented. But by-default it would end up being something mostly symbolic, easily circumvented, and/or unenforceable in practice. A robust technical component, plus (crucially) actually bringing that robust technical component to the right lobbyist/regulator, is the main thing which would make a regulation actually do anything in practice.
Edit-to-add: also, the technical solution should ideally be an implementation of some method already published in some academic paper. Then when some lawyer or bureaucrat or whatever asks what it does and how we know it works, you can be like “look at this Official Academic Paper” and they will be like “ah, yes, it does Science, can’t argue with that”.
Suppose I have a binary function $f$, with a million input bits and one output bit. The function is uniformly randomly chosen from all such functions—i.e. for each of the $2^{1{,}000{,}000}$ possible inputs $x$, we flipped a coin to determine the output $f(x)$ for that particular input.
Now, suppose I know f, and I know all but 50 of the input bits—i.e. I know 999950 of the input bits. How much information do I have about the output?
Answer: almost none. For almost all such functions, knowing 999950 input bits gives us $\sim\frac{1}{2^{50}}$ bits of information about the output. More generally, if the function has $n$ input bits and we know all but $k$, then we have $o\left(\frac{1}{2^k}\right)$ bits of information about the output. (That’s “little o” notation; it’s like big O notation, but for things which are small rather than things which are large.) Our information drops off exponentially with the number of unknown bits.
Proof Sketch
With $k$ input bits unknown, there are $2^k$ possible inputs. The output corresponding to each of those inputs is an independent coin flip, so we have $2^k$ independent coin flips. If $m$ of those flips are 1, then we assign a probability of $\frac{m}{2^k}$ that the output will be 1.
As long as $2^k$ is large, the Law of Large Numbers will kick in, and very close to half of those flips will be 1 almost surely—i.e. $m \approx \frac{2^k}{2}$. The error in this approximation will (very quickly) converge to a normal distribution, and our probability that the output will be 1 converges to a normal distribution with mean $\frac{1}{2}$ and standard deviation $\frac{1}{2^{k/2}}$. So, the probability that the output will be 1 is roughly $\frac{1}{2} \pm \frac{1}{2^{k/2}}$.
We can then plug that into Shannon’s entropy formula. Our prior probability that the output bit is 1 is $\frac{1}{2}$, so we’re just interested in how much that $\pm\frac{1}{2^{k/2}}$ adjustment reduces the entropy. This works out to $o\left(\frac{1}{2^k}\right)$ bits.
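If you want to see the scaling concretely, here’s a quick simulation sketch (my own toy check, not from the original post); $n$ and $k$ are kept small enough to enumerate, and the comparison constant $\frac{1}{2\ln 2}$ comes from expanding the entropy to second order around $\frac{1}{2}$:

```python
# Toy numerical check of the exponential information decay (my own sketch, not from the post).
import numpy as np

rng = np.random.default_rng(0)
n, k = 16, 6                            # n input bits total, k of them unknown
f = rng.integers(0, 2, size=2**n)       # uniformly random function {0,1}^n -> {0,1}

def binary_entropy(p):
    q = np.clip(p, 1e-12, 1 - 1e-12)
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

# Put the k unknown bits in the low-order positions: each row of `blocks` is one setting of
# the known bits, and its 2^k entries are the outputs over all completions of the unknown bits.
blocks = f.reshape(2**(n - k), 2**k)
p_one = blocks.mean(axis=1)             # P[f(X) = 1 | known bits], for each known-bit setting

info = 1.0 - binary_entropy(p_one).mean()   # prior entropy is 1 bit; average reduction
print(f"average info about the output: {info:.5f} bits")
print(f"1/2^k scaling, constant 1/(2 ln 2): {1 / (2**k * 2 * np.log(2)):.5f} bits")
```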
Why Is This Interesting?
One core idea of my work on abstraction is that noise very quickly wipes out almost all information; only some very-low-dimensional summary is relevant “far away”. This example shows that this sort of thing is not unusual, but rather “the default”: for almost all random functions, information drops off exponentially with the number of unknown bits. In a large system (i.e. a function with many inputs), ignorance of even just a few bits is enough to wipe out essentially-all information. That’s true even if we know the vast majority of the bits.
A good intuitive example of this is the “butterfly effect”: the flap of a butterfly’s wings could change the course of a future hurricane, because chaos. But there’s an awful lot of butterflies in the world, and the hurricane’s path is some complicated function of all of their wing-flaps (and many other variables too). If we’re ignorant of even just a handful of these flaps, then almost all of our information about the hurricane’s path is probably wiped out. And in practice, we’re ignorant of almost all the flaps. This actually makes it much easier to perform Bayesian reasoning about the path of the hurricane: the vast majority of information we have is basically-irrelevant; we wouldn’t actually gain anything from accounting for the butterfly-wing-flaps which we do know.
o(1/2^k) doesn’t vary with n—are you saying that it doesn’t matter how big the input array is, the only determinant is the number of unknown bits, and the number of known bits is irrelevant? That would be quite interesting if so (though I have some question about how likely the function is to be truly random from an even distribution of such functions).
One can enumerate all such 3-bit functions: 8 different inputs, each input can return 0 or 1, so 256 functions (one per output-bit-pattern of the 8 possible inputs). But this doesn’t seem to follow your formula—if you have 3 unknown bits, that should be 1/8 of a bit about the output, 2 unknown bits for 1/4, and 1 unknown bit for 1/2 a bit about the output. But in fact, the distribution of functions includes both 0 and 1 output for every input pattern, so you actually have no predictive power for the output if you have ANY unknown bits.
Yes, that’s correct.
The claim is for almost all functions when the number of inputs is large. (Actually what we need is for 2^(# of unknown bits) to be large in order for the law of large numbers to kick in.) Even in the case of 3 unknown bits, we have 256 possible functions, and only 18 of those have less than 1/4 1s or more than 3/4 1s among their output bits.
Little o is just a tighter bound. I don’t know what you are referring to by your statement:
I’m not sure what context that link is assuming, but in an analysis context I typically see little o used in ways like e.g. “$f(x) = f(x_0) + \frac{df}{dx}\big|_{x_0} dx + o(dx^2)$”. The interpretation is that, as $dx$ goes to 0, the $o(dx^2)$ terms all fall to zero at least quadratically (i.e. there is some $C$ such that $C\,dx^2$ upper bounds the $o(dx^2)$ term once $dx$ is sufficiently small). Usually I see engineers and physicists using this sort of notation when taking linear or quadratic approximations, e.g. for designing numerical algorithms.
I find it very helpful to get feedback on LW posts before I publish them, but it adds a lot of delay to the process. So, experiment: here’s a link to a google doc with a post I plan to put up tomorrow. If anyone wants to give editorial feedback, that would be much appreciated—comments on the doc are open.
I’m mainly looking for comments on which things are confusing, parts which feel incomplete or slow or repetitive, and other writing-related things; substantive comments on the content should go on the actual post once it’s up.
EDIT: it’s up. Thank you to Stephen for comments; the post is better as a result.
Here’s a place where I feel like my models of romantic relationships are missing something, and I’d be interested to hear peoples’ takes on what it might be.
Background claim: a majority of long-term monogamous, hetero relationships are sexually unsatisfying for the man after a decade or so. Evidence: Aella’s data here and here are the most legible sources I have on hand; they tell a pretty clear story where sexual satisfaction is basically binary, and a bit more than half of men are unsatisfied in relationships of 10 years (and it keeps getting worse from there). This also fits with my general models of mating markets: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don’t find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.
What doesn’t make sense under my current models is why so many of these relationships persist. Why don’t the men in question just leave? Obviously they might not have better relationship prospects, but they could just not have any relationship. The central question which my models don’t have a compelling answer to is: what is making these relationships net positive value for the men, relative to not having a romantic relationship at all?
Some obvious candidate answers:
Kids. This one makes sense for those raising kids, but what about everyone else? Especially as fertility goes down.
The wide tail. There’s plenty of cases which make sense which are individually unusual—e.g. my own parents are business partners. Maybe in aggregate all these unusual cases account for the bulk.
Loneliness. Maybe most of these guys have no one else close in their life. In this case, they’d plausibly be better off if they took the effort they invested in their romantic life and redirected to friendships (probably mostly with other guys), but there’s a lot of activation energy blocking that change.
Their romantic partner offering lots of value in other ways. I’m skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it’s hard for that to add up enough to outweigh the usual costs.
Wanting a dependent. Lots of men are pretty insecure, and having a dependent to provide for makes them feel better about themselves. This also flips the previous objection: high maintenance can be a plus if it makes a guy feel wanted/useful/valuable.
Social pressure/commitment/etc making the man stick around even though the relationship is not net positive for him.
The couple are de-facto close mostly-platonic friends, and the man wants to keep that friendship.
I’m interested in both actual data and anecdata. What am I missing here? What available evidence points strongly to some of these over others?
Edit-to-add: apparently lots of people are disagreeing with this, but I don’t know what specifically you all are disagreeing with, it would be much more helpful to at least highlight some specific sentence or leave a comment or something.
Ah, I think this just reads like you don’t think of romantic relationships as having any value proposition beyond the sexual, other than those you listed (which are Things but not The Thing, where The Thing is some weird discursive milieu). Also the tone you used for describing the other Things is as though they are traps that convince one, incorrectly, to ‘settle’, rather than things that could actually plausibly outweigh sexual satisfaction.
Different people place different weight on sexual satisfaction (for a lot of different reasons, including age).
I’m mostly just trying to explain all the disagree votes. I think you’ll get the most satisfying answer to your actual question by having a long chat with one of your asexual friends (as something like a control group, since the value of sex to them is always 0 anyway, so whatever their cause is for having romantic relationships is probably the kind of thing that you’re looking for here).
That’s an excellent suggestion, thanks.
There are a lot of replies here, so I’m not sure whether someone already mentioned this, but: I have heard anecdotally that homosexual men often have relationships which maintain the level of sex over the long term, while homosexual women often have long-term relationships which very gradually decline in frequency of sex, with barely any sex after many decades have passed (but still happily in a relationship).
This mainly argues against your model here:
It suggests instead that female sex drive naturally falls off in long-term relationships in a way that male sex drive doesn’t, with sexual attraction to a partner being a smaller factor.
Note: You can verify this is the case by filtering for male respondents with male partners and female respondents with female partners in the survey data
“I’m skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor.”
Some people enjoy attending to their partner and find meaning in emotional labor. Housing’s a lot more expensive than gifts and dates. My partner and I go 50/50 on expenses and chores. Some people like having long-term relationships with emotional depth. You might want to try exploring out of your bubble, especially if you live in SF, and see what some normal people (i.e. non-rationalists) in long term relationships have to say about it.
That’s the stereotype, but men are the ones who die sooner if divorced, which suggests they’re getting a lot out of marriage.
ETA: looked it up, divorced women die sooner as well, but the effect is smaller despite divorce having a bigger financial impact on women.
Causality dubious, seems much more likely on priors that men who divorced are disproportionately those with Shit Going On in their lives. That said, it is pretty plausible on priors that they’re getting a lot out of marriage.
I will also note that Aella’s relationships data is public, and has the following questions:
which should allow you to test a lot of your candidate answers, for example your first 3 hypotheses could be answered by looking at these:
Do you have children with your partner? (qgjf1nu)
“If my partner and I ever split up, it would be a logistical nightmare (e.g., separating house, friends)” (e1claef) or “The long-term routines and structure of my life are intertwined with my partner’s” (li0toxk)
“I feel like I would still be a desirable mate even if my partner left me” (qboob7y)
I see two explanations: the boring wholesome one and the interesting cynical one.
The wholesome one is: You’re underestimating how much other value the partner offers and how much the men care about the mostly-platonic friendship. I think that’s definitely a factor that explains some of the effect, though I don’t know how much.
The cynical one is: It’s part of the template. Men feel that they are “supposed to” have wives past a certain point in their lives; that it’s their role to act. Perhaps they even feel that they are “supposed to” have wives they hate, see the cliché boomer jokes.
They don’t deviate from this template, because:
It’s just something that is largely Not Done. Plans such as “I shouldn’t get married” or “I should get a divorce” aren’t part of the hypothesis space they seriously consider.
In the Fristonian humans-are-prediction-error-minimizers frame: being married is what the person expects, so their cognition ends up pointed towards completing the pattern, one way or another. As a (controversial) comparison, we can consider serial abuse victims, who seem to somehow self-select for abusive partners despite doing everything in their conscious power to avoid them.
In your parlance: The “get married” life plan becomes the optimization target, rather than a prediction regarding what a satisfying life will look like.
More generally: Most humans most of the time are not goal-optimizers, but adaptation-executors (or perhaps homeostatic agents). So “but X isn’t conducive to making this human happier” isn’t necessarily a strong reason to expect the human not to do X.
Deviation has social costs/punishments. Being viewed as a loser, not being viewed as a reliable “family man”, etc. More subtly: this would lead to social alienation, inability to relate. Consider the cliché “I hate my wife” boomer jokes again. If everyone in your friend group is married and makes these jokes all the time, and you aren’t, that would be pretty ostracizing.
Deviation has psychological costs. Human identities (in the sense of “characters you play”) are often contextually defined. If someone spent ten years defining themselves in relation to their partner, and viewing their place in the world as part of a family unit, exiting the family unit would be fairly close to an identity death/life losing meaning. At the very least, they’d spend a fair bit of time adrift and unsure who they are/how to relate to the world anew – which means there are friction costs/usual problems with escaping a local optimum.
Not-deviation has psychological benefits. The feeling of “correctness”, coming to enjoy the emotional labor, enjoying having a dependent, etc.
I don’t know which of the two explains more of the effect. I’m somewhat suspicious of the interesting satisfyingly cynical one, simply because it’s satisfyingly cynical and this is a subject for which people often invent various satisfyingly cynical ideas. It checks out to me at the object level, but it doesn’t have to be the “real” explanation. (E. g., the “wholesome” reasons may be significant enough that most of the men wouldn’t divorce even if the template dynamics were magically removed.)
it’s the mystery of love, John
Assuming arguendo this is true: if you care primarily about sex, hiring sex workers is orders of magnitude more efficient than marriage. Therefor the existence of a given marriage is evidence both sides get something out of it besides sex.
If both partners have an income, then living together is usually cheaper than each of them living alone, and sex is just a bonus to that. How would sex workers be the cheaper alternative?
Possibly true if one side has zero income.
Making no claim about the actual value of each, but can’t I counter your specific argument by saying, marriage is a socially enforced cartel for sex, and if they could do so without being punished, a lot more men would rather get sex without getting married?
Imagine a woman is in a romantic relationship with somebody else. Are they still so great a person that you would still enjoy hanging out with them as a friend? If not, that woman should not be your girlfriend. Friendship first. At least in my model, romantic stuff should be stacked on top of platonic love.
This data seems to be for sexual satisfaction rather than romantic satisfaction or general relationship satisfaction.
Yes, the question is what value-proposition accounts for the romantic or general relationship satisfaction.
Relationship … stuff?
I guess I feel kind of confused by the framing of the question. I don’t have a model under which the sexual aspect of a long-term relationship typically makes up the bulk of its value to the participants. So, if a long-term relationship isn’t doing well on that front, and yet both participants keep pursuing the relationship, my first guess would be that it’s due to the value of everything that is not that. I wouldn’t particularly expect any one thing to stick out here. Maybe they have a thing where they cuddle and watch the sunrise together while they talk about their problems. Maybe they have a shared passion for arthouse films. Maybe they have so much history and such a mutually integrated life with partitioned responsibilities that learning to live alone again would be a massive labour investment, practically and emotionally. Maybe they admire each other. Probably there’s a mixture of many things like that going on. Love can be fed by many little sources.
So, this I suppose:
I don’t find it hard at all to see how that’d add up to something that vastly outweighs the costs, and this would be my starting guess for what’s mainly going on in most long-term relationships of this type.
Update 3 days later: apparently most people disagree strongly with
Most people in the comments so far emphasize some kind of mysterious “relationship stuff” as upside, but my actual main update here is that most commenters probably think the typical costs are far far lower than I imagined? Unsure, maybe the “relationship stuff” is really ridiculously high value.
So I guess it’s time to get more concrete about the costs I had in mind:
A quick google search says the male is primary or exclusive breadwinner in a majority of married couples. Ass-pull number: the monetary costs alone are probably ~50% higher living costs. (Not a factor of two higher, because the living costs of two people living together are much less than double the living costs of one person. Also I’m generally considering the no-kids case here; I don’t feel as confused about couples with kids.)
I was picturing an anxious attachment style as the typical female case (without kids). That’s unpleasant on a day-to-day basis to begin with, and I expect a lack of sex tends to make it a lot worse.
Eyeballing Aella’s relationship survey data, a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more. That was somewhat-but-not-dramatically less than I previously pictured. Frequent fighting is very prototypically the sort of thing I would expect to wipe out more-than-all of the value of a relationship, and I expect it to be disproportionately bad in relationships with little sex.
Less legibly… conventional wisdom sure sounds like most married men find their wife net-stressful and unpleasant to be around a substantial portion of the time, especially in the unpleasant part of the hormonal cycle, and especially especially if they’re not having much sex. For instance, there’s a classic joke about a store salesman upselling a guy a truck, after upselling him a boat, after upselling him a tackle box, after [...] and the punchline is “No, he wasn’t looking for a fishing rod. He came in looking for tampons, and I told him ‘dude, your weekend is shot, you should go fishing!’”.
(One thing to emphasize in these: sex isn’t just a major value prop in its own right, I also expect that lots of the main costs of a relationship from the man’s perspective are mitigated a lot by sex. Like, the sex makes the female partner behave less unpleasantly for a while.)
So, next question for people who had useful responses (especially @Lucius Bushnaq and @yams): do you think the mysterious relationship stuff outweighs those kinds of costs easily in the typical case, or do you imagine the costs in the typical case are not all that high?
But remember that you already conditioned on ‘married couples without kids’. My guess would be that in the subset of man-woman married couples without kids, the man being the exclusive breadwinner is a lot less common than in the set of all man-woman married couples. These properties seem like they’d be heavily anti-correlated.
In the subset of man-woman married couples without kids that get along, I wouldn’t be surprised if having a partner effectively works out to more money for both participants, because you’ve got two incomes, but less than 2x living expenses.
I am … not … picturing that as the typical case? Uh, I don’t know what to say here really. That’s just not an image that comes to mind for me when I picture ‘older hetero married couple’. Plausibly I don’t know enough normal people to have a good sense of what normal marriages are like.
I think for many of those couples that fight multiple times a month, the alternative isn’t separating and finding other, happier relationships where there are never any fights. The typical case I picture there is that the relationship has some fights because both participants aren’t that great at communicating or understanding emotions, their own or other people’s. If they separated and found new relationships, they’d get into fights in those relationships as well.
It seems to me that lots of humans are just very prone to getting into fights. With their partners, their families, their roommates etc., to the point that they have accepted having lots of fights as a basic fact of life. I don’t think the correct takeaway from that is ‘Most humans would be happier if they avoided having close relationships with other humans.’
Conventional wisdom also has it that married people often love each other so much they would literally die for their partner. I think ‘conventional wisdom’ is just a very big tent that has room for everything under the sun. If even 5-10% of married couples have bad relationships where the partners actively dislike each other, that’d be many millions of people in the English speaking population alone. To me, that seems like more than enough people to generate a subset of well-known conventional wisdoms talking about how awful long-term relationships are.
Case in point, I feel like I hear those particular conventional wisdoms less commonly these days in the Western world. My guess is this is because long-term heterosexual marriage is no longer culturally mandatory, so there’s less unhappy couples around generating conventional wisdoms about their plight.
So, in summary, both I think? I feel like the ‘typical’ picture of a hetero marriage you sketch is more like my picture of an ‘unusually terrible’ marriage. You condition on a bad sexual relationship and no children and the woman doesn’t earn money and the man doesn’t even like her, romantically or platonically. That subset of marriages sure sounds like it’d have a high chance of the man just walking away, barring countervailing cultural pressures. But I don’t think most marriages where the sex isn’t great are like that.
This comment gave me the information I’m looking for, so I don’t want to keep dragging people through it. Please don’t feel obligated to reply further!
That said, I did quickly look up some data on this bit:
… so I figured I’d drop it in the thread.
When interpreting these numbers, bear in mind that many couples with no kids probably intend to have kids in the not-too-distant future, so the discrepancy shown between “no children” and 1+ children is probably somewhat smaller than the underlying discrepancy of interest (which pushes marginally more in favor of Lucius’ guess).
Big thank you for responding, this was very helpful.
Not sure how much this generalizes to everyone, but part of the story (for either the behavior or the pattern of responses to the question) might be that some people are ideologically attached to believing in love: that women and men need each other as a terminal value, rather than just instrumentally using each other for resources or sex. For myself, without having any particular empirical evidence or logical counterargument to offer, the entire premise of the question just feels sad and gross. It’s like you’re telling me you don’t understand why people try to make ghosts happy. But I want ghosts to be happy.
That is useful, thanks.
Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much? In particular, if the answer is in fact “most men would be happier single but are ideologically attached to believing in love”, then I want to be able to update accordingly. And if the answer is not that, then I want to update that most men would not be happier single. With the current discussion, most of what I’ve learned is that lots of people are triggered by the question, but that doesn’t really tell me much about the underlying reality.
Track record: My own cynical take seems to be doing better with regards to not triggering people (though it’s admittedly less visible).
First off, I’m kind of confused about how you didn’t see this coming. There seems to be a major “missing mood” going on in your posts on the topic – and I speak as someone who is sorta-aromantic, considers the upsides of any potential romantic relationship to have a fairly low upper bound for himself[1], and is very much willing to entertain the idea that a typical romantic relationship is a net-negative dumpster fire.
So, obvious-to-me advice: Keep a mental model of what topics are likely very sensitive and liable to trigger people, and put in tons of caveats and “yes, I know, this is very cynical, but it’s my current understanding” and “I could totally be fundamentally mistaken here”.
In particular, a generalization of a piece of advice from here has been living in my head rent-free for years (edited/adapted):
More concretely, here’s how I would have phrased your initial post:
Rewrite
Here’s a place where my model of the typical traditional romantic relationships seems to be missing something. I’d be interested to hear people’s takes on what it might be.
Disclaimer: I’m trying to understand the general/stereotypical case here, i. e., what often ends up happening in practice. I’m not claiming that this is what relationships ought to be like, nor that all existing relationships are like this. But on my model, most people are deeply flawed, they tend to form deeply flawed relationships, and I’d like to understand why these relationships still work out. Bottom line is, this is going to be a fairly cynical/pessimistic take (with the validity of its cynicism being something I’m willing to question).
Background claims:
My model of the stereotypical/traditional long-term monogamous hetero relationship has a lot of downsides for men. For example:
Financial costs: Up to 50% higher living costs (since in the “traditional” template, men are the breadwinners.)
Frequent, likely highly stressful, arguments. See Aella’s relationship survey data: a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more.
General need to manage/account for the partner’s emotional issues. (My current model of the “traditional” relationship assumes the anxious attachment style for the woman, which would be unpleasant to manage.)
For hetero men, consistent sexual satisfaction is a major upside offered by a relationship, providing a large fraction of the relationship-value.
A majority of traditional relationships are sexually unsatisfying for the man after a decade or so. Evidence: Aella’s data here and here are the most legible sources I have on hand; they tell a pretty clear story where sexual satisfaction is basically binary, and a bit more than half of men are unsatisfied in relationships of 10 years (and it keeps getting worse from there). This also fits with my general models of dating: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don’t find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.
Taking a purely utilitarian lens, for a relationship to persist, the benefits offered by it should outweigh its costs. However, on my current model, that shouldn’t be the case for the average man. I expect the stated downsides to be quite costly, and if we remove consistent sex from the equation, the remaining value (again, for a stereotypical man) seems comparatively small.
So: Why do these relationships persist? Obviously the men might not have better relationship prospects, but they could just not have any relationship. The central question which my models don’t have a compelling answer to is: what is making these relationships net positive value for the men, relative to not having a romantic relationship at all?
Some obvious candidate answers:
The cultural stereotypes diverge from reality in some key ways, so my model is fundamentally mistaken. E. g.:
I’m overestimating the downsides: the arguments aren’t that frequent/aren’t very stressful, female partners aren’t actually “high-maintenance”, etc.
I’m overestimating the value of sex for a typical man.
I’m underestimating how much other value relationships offer men. If so: what is that “other value”, concretely? (Note that it’d need to add up to quite a lot to outweigh the emotional and financial costs, under my current model.)
Kids. This one makes sense for those raising kids, but what about everyone else? Especially as fertility goes down.
The wide tail. There’s plenty of cases which make sense which are individually unusual—e.g. my own parents are business partners. Maybe in aggregate all these unusual cases account for the bulk.
Loneliness. Maybe most of these guys have no one else close in their life. In this case, they’d plausibly be better off if they took the effort they invested in their romantic life and redirected to friendships (probably mostly with other guys), but there’s a lot of activation energy blocking that change.
Wanting a dependent. Lots of men are pretty insecure, and having a dependent to provide for makes them feel better about themselves. This also flips the previous objection: high maintenance can be a plus if it makes a guy feel wanted/useful/valuable.
Social pressure/commitment/etc making the man stick around even though the relationship is not net positive for him.
The couple are de-facto close mostly-platonic friends, and the man wants to keep that friendship.
I’m interested in both actual data and anecdata. What am I missing here? What available evidence points strongly to some of these over others?
Obvious way to A/B test this would be to find some group of rationalist-y people who aren’t reading LW/your shortform, post my version there, and see the reactions. Not sure what that place would be. (EA forum? r/rational’s Friday Open Threads? r/slatestarcodex? Some Discord/Substack group?)
Adapting it for non-rationalist-y audiences (e. g., r/AskMen) would require more rewriting. Mainly, coating the utilitarian language in more, ahem, normie terms.
Given the choice between the best possible romantic relationship and $1m, I’d pick $1m.
Absent munchkinry like “my ideal girlfriend is a genius alignment researcher on the level of von Neumann and Einstein”.

I think it’s net negative. Seen it with any combination of genders. The person who’s less happy in the relationship stays due to force of habit, fear of the unknown, and the other person giving them a precise minimum of “crumbs” to make them stay. Even a good relationship can fall into this pattern slowly, with the other person believing all along that everything is fine. And when it finally breaks (often due to some random event breaking the suspension of disbelief), the formerly unhappy person is surprised how much better things become.
An effect I noticed: Going through Aella’s correlation matrix (with poorly labeled columns, sadly), a feature which strongly correlates with the length of a relationship is codependency. Plotting question 20, “The long-term routines and structure of my life are intertwined with my partner’s” (li0toxk), assuming that’s what “codependency” refers to: the shaded region is a 95% posterior estimate for the mean of the distribution conditioned on the time-range (every 2 years) and cis-male respondents, with prior N(0, 0.5).
Note also that codependency and sex satisfaction are basically uncorrelated
This shouldn’t be that surprising. Of course the longer two people are together the more their long term routines will be caught up with each other. But also this seems like a very reasonable candidate for why people will stick together even without a good sex life.
This seems supported by the popular wisdom. The question is, how much of this is about relationships and sex specifically, and how much is just another instance of a more general “life is full of various frustrations” or “when people reach their goals, after some time they become unsatisfied again”, i.e. the hedonic treadmill.
Is it?
So, basically those women pretend to be more attracted than they are (to their partner, and probably also to themselves) in order to get married. Then they gradually stop pretending.
But why is it so important to get married (or whatever was the goal of the original pretending), but then it is no longer important to keep the marriage happy? Is that because women get whatever they want even from an unhappy marriage, and divorces are unlikely? That doesn’t feel like a sufficient explanation to me: divorces are quite frequent, and often initiated by women.
I guess I am not sure what exactly is the women’s utility function that this model assumes.
Kids, not wanting to lose money in divorce, other value the partner provides, general lack of agency, hoping that the situation will magically improve… probably all of that together.
Also, it seems to me that often both partners lose value on the dating market when they start taking their relationship for granted, stop trying hard, gain weight, stop doing interesting things, and generally get older. Even if the guy is frustrated, that doesn’t automatically mean that entering the dating market again would make him happy. I imagine that many divorced men find out that an alternative to “sex once a month” could also be “sex never” (or “sex once a month, but it also takes a lot of time and effort and money”).
Worth noting that this pattern occurs among gay couples as well! (i.e. sexless long-term-relationship, where one party is unhappy about this).
I think that conflict in desires/values is inherent in all relationships, and long-term relationships have more room for conflict because they involve a closer/longer relationship. Sex drive is a major area where partners tend to diverge especially frequently (probably just for biological reasons in het couples).
It’s not obvious to me that sex in marriages needs much special explanation beyond the above. Unless of course the confusion is just “why don’t people immediately end all relationships whenever their desires conflict with those of their counterparty”.
A general source of problems is that when people try to get a new partner, they try to be… more appealing than usual, in various ways. Which means that after the partner is secured, the behavior reverts to the norm, which is often a disappointment.
One way how people try to impress their partners is that the one with lower sexual drive pretends to be more enthusiastic about sex than they actually are in long term. So the moment one partner goes “amazing, now I finally have someone who is happy to do X every day or week”, the other partner goes “okay, now that the courtship phase is over, I guess I no longer have to do X every day or week”.
There are also specific excuses in heterosexual couples, like the girl pretending that she is actually super into doing sex whenever possible, it’s just that she is too worried about accidental pregnancy or her reputation… and when these things finally get out of the way, it turns out that it was just an excuse.
Perhaps the polyamorous people keep themselves in better shape, but I suspect that they have similar problems, only instead of “my partner no longer wants to do X” it is “my partner no longer wants to do X with me”.
I thought I would give you another causal model based on neuroscience which might help.
I think your models are missing a core biological mechanism: nervous system co-regulation.
Most analyses of relationship value focus on measurable exchanges (sex, childcare, financial support), but overlook how humans are fundamentally regulatory beings. Our nervous systems evolved to stabilize through connection with others.
When you share your life with someone, your biological systems become coupled. This creates several important values:
Your stress response systems synchronize and buffer each other. A partner’s presence literally changes how your body processes stress hormones—creating measurable physiological benefits that affect everything from immune function to sleep quality.
Your capacity to process difficult emotions expands dramatically with someone who consistently shows up for you, even without words.
Your nervous system craves predictability. A long-term partner represents a known regulatory pattern that helps maintain baseline homeostasis—creating a biological “home base” that’s deeply stabilizing.
For many men, especially those with limited other sources of deep co-regulation, these benefits may outweigh sexual dissatisfaction. Consider how many men report feeling “at peace” at home despite minimal sexual connection—their nervous systems are receiving significant regulatory benefits.
This also explains why leaving feels so threatening beyond just practical considerations. Disconnecting an integrated regulatory system that has developed over years registers in our survival-oriented brains as a fundamental threat.
This isn’t to suggest people should stay in unfulfilling relationships—rather, it helps explain why many do, and points to the importance of developing broader regulatory networks before making relationship transitions.
reading it is weird, because my model is somewhat the opposite—more women initiate divorce than men, more women would gain from initiating it, and more women remain in relationships they should leave.
women do more of the housework, more of the emotional labor (the point about women requiring emotional work wildly contradicts my model), and more of the maintaining of social ties (there are studies i read about that, and socialization reasons for it. women have more friends and more intimate friends, and a lot of men freeload on their gf’s friendships and have no intimate relationship that is not romantic).
it can be that both are true, and it’s not hard imagining two deeply incompatible people for whom breaking up would be net-positive for both. but this is not my actual model, nor what the statistics i encountered say—for example, that married men live longer, while married women live shorter. in my model, in a standard marriage, the wins-from-trade are distributed unevenly, and a lot of the time the man gains and the woman loses. and all that still holds marriages together is kids, and the remains of social stigma. and i know various statistics—about housework, and happiness after the spouse dies, and life expectancy—that do not contradict this model.
I also encountered a lot of anecdata that sounds like (not actual citation) “i broke up, that bf made my life so much worse” and even (not actual citation) “i divorced, and despite having to do all the work alone and not having the money he provided, i have more time, because he was so useless housework- and childcare-wise that he net-added work, and it’s much easier without him.”
so, like, models where marriages are net-negative for men look very strange to me, and i don’t know how to reconcile them with so much contradicting data.
An obvious answer you missed: Lacking a prenup, courts often rule in favor of the woman over the man in the case of a contested divorce.
girl pretty
personal desire to be worthy of being an example vindicating the hope that good guys can ‘get the girl’; giving up on one means nothing will ever stay and doom is eternal
Consider two claims:
Any system can be modeled as maximizing some utility function, therefore utility maximization is not a very useful model
Corrigibility is possible, but utility maximization is incompatible with corrigibility, therefore we need some non-utility-maximizer kind of agent to achieve corrigibility
These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.
I expect that many peoples’ intuitive mental models around utility maximization boil down to “boo utility maximizer models”, and they would therefore intuitively expect both the above claims to be true at first glance. But on examination, the probable-incompatibility is fairly obvious, so the two claims might make a useful test to notice when one is relying on yay/boo reasoning about utilities in an incoherent way.
FWIW I endorse the second claim when the utility function depends exclusively on the state of the world in the distant future, whereas I endorse the first claim when the utility function can depend on anything whatsoever (e.g. what actions I’m taking right this second). (details)
I wish we had different terms for those two things. That might help with any alleged yay/boo reasoning.
(When Eliezer talks about utility functions, he seems to assume that it depends exclusively on the state of the world in the distant future.)
Expected Utility Maximization is Not Enough
Consider a homomorphically encrypted computation running somewhere in the cloud. The computations correspond to running an AGI. Now from the outside, you can still model the AGI based on how it behaves, as an expected utility maximizer, if you have a lot of observational data about the AGI (or at least let’s take this as a reasonable assumption).
No matter how closely you look at the computations, you will not be able to figure out how to change these computations in order to make the AGI aligned if it was not aligned already (Also, let’s assume that you are some sort of Cartesian agent, otherwise you would probably already be dead if you were running these kinds of computations).
So, my claim is not that modeling a system as an expected utility maximizer can’t be useful. Instead, I claim that this model is incomplete, at least with regard to the task of computing an update to the system such that, when we apply the update, the system becomes aligned.
Of course, you can model any system as an expected utility maximizer. But just because I can use the “high level” conceptual model of expected utility maximization to model the behavior of a system very well doesn’t mean that’s all we need. Behavior is not the only thing we care about; we actually care about being able to understand the internal workings of the system, such that it becomes much easier to think about how to align the system.
So the following seems to be beside the point unless I am <missing/misunderstanding> something:
Maybe I have missed the fact that the claim you listed says that expected utility maximization is not very useful. And I’m saying it can be useful, it might just not be sufficient at all to actually align a particular AGI system. Even if you can do it arbitrarily well.
I am not an expert, but as I remember it, it was a claim that “any system that follows certain axioms can be modeled as maximizing some utility function”. The axioms assumed that there were no circular preferences—if someone prefers A to B, B to C, and C to A, it is impossible to define a utility function such that u(A) > u(B) > u(C) > u(A) -- and that if the system says that A > B > C, it can decide between e.g. a 100% chance of B, and a 50% chance of A with a 50% chance of C, again in a way that is consistent.
I am not sure how this works when the system is allowed to take the current time into account, for example when it is allowed to prefer A to B on Monday but prefer B to A on Tuesday. I suppose that in such a situation any system can trivially be modeled by a utility function that at each moment assigns utility 1 to what the system actually did in that moment, and utility 0 to everything else.
Corrigibility is incompatible with assigning utility to everything in advance. A system that has preferences about the future will also have a preference about not having its utility function changed. (For the same reason people have a preference not to be brainwashed, or not to take drugs, even if after brainwashing they are happy about having been brainwashed, and after getting addicted they do want more drugs.)
A corrigible system would be like: “I prefer A to B at this moment, but if humans decide to fix me and make me prefer B to A, then I prefer B to A.” In other words, it doesn’t have values for u(A) and u(B), or it doesn’t always act according to those values. A consistent system that currently prefers A to B would prefer not to be fixed.
I think John’s 1st bullet point was referring to an argument you can find in https://www.lesswrong.com/posts/NxF5G6CJiof6cemTw/coherence-arguments-do-not-entail-goal-directed-behavior and related.
A utility function represents preference elicited in a large collection of situations, each a separate choice between events that happens with incomplete information, as an event is not a particular point. This preference needs to be consistent across different situations to be representable by expected utility of a single utility function.
Once formulated, a utility function can be applied to a single choice/situation, such as a choice of a policy. But a system that only ever makes a single choice is not a natural fit for expected utility frame, and that’s the kind of system that usually appears in “any system can be modeled as maximizing some utility function”. So it’s not enough to maximize something once, or in a narrow collection of situations, the situations the system is hypothetically exposed to need to be about as diverse as choices between any pair of events, with some of the events very large, corresponding to unreasonably incomplete information, all drawn across the same probability space.
One place this mismatch of frames happens is with updateless decision theory. An updateless decision is a choice of a single policy, once and for all, so there is no reason for it to be guided by expected utility, even though it could be. The utility function for the updateless choice of policy would then need to be obtained elsewhere, in a setting that has all these situations with separate (rather than all enacting a single policy) and mutually coherent choices under uncertainty. But once an updateless policy is settled (by a policy-level decision), actions implied by it (rather than action-level decisions in expected utility frame) no longer need to be coherent. Not being coherent, they are not representable by an action-level utility function.
So by embracing updatelessness, we lose the setting that would elicit utility if the actions were instead individual mutually coherent decisions. And conversely, by embracing coherence of action-level decisions, we get an implied policy that’s not updatelessly optimal with respect to the very precise outcomes determined by any given whole policy. So an updateless agent founded on expected utility maximization implicitly references a different non-updateless agent whose preference is elicited by making separate action-level decisions under a much greater uncertainty than the policy-level alternatives the updateless agent considers.
Completely off the cuff take:
I don’t think claim 1 is wrong, but it does clash with claim 2.
That means any system that has to be corrigible cannot be a system that maximizes a simple utility function (1 dimension), or put another way, “whatever utility function it maximizes must be along multiple dimensions”.
Which seems to be pretty much what humans do: we have really complex utility functions, everything seems to be ever-changing, and we have some control over it ourselves (and sometimes that goes wrong and people end up maxing out a singular dimension at the cost of everything else).
Note to self: Think more about this and if possible write up something more coherent and explanatory.
One second-order effect of the pandemic which I’ve heard talked about less than I’d expect:
This is the best proxy I found on FRED for new businesses founded in the US, by week. There was a mild upward trend over the last few years, but it’s really taken off lately. Not sure how much of this is kids who would otherwise be in college, people starting side gigs while working from home, people quitting their jobs and starting their own businesses so they can look after the kids, extra slack from stimulus checks, people losing their old jobs en masse but still having enough savings to start a business, …
For the stagnation-hypothesis folks who lament relatively low rates of entrepreneurship today, this should probably be a big deal.
How sure are you that the composition is interesting? How many of these are just quick mask-makers or sanitizer-makers, or just replacing restaurants that have now gone out of business? (ie very low-value-added companies, of the ‘making fast food in a stall in a Third World country’ sort of ‘startup’, which make essentially no or negative long-term contributions).
Good question. I haven’t seen particularly detailed data on these on FRED, but they do have separate series for “high propensity” business applications (businesses they think are likely to hire employees), business applications with planned wages, and business applications from corporations, as well as series for each state. The spike is smaller for planned wages, and nonexistent for corporations, so the new businesses are probably mostly single proprietors or partnerships. Other than that, I don’t know what the breakdown looks like across industries.
How do you feel about this claim now? I haven’t noticed a whole lot of innovation coming from all these small businesses, and a lot of them seem like they were likely just vehicles for the extraordinary extent of fraud as the results from all the investigations & analyses come in.
Well, it wasn’t just a temporary bump:
… so it’s presumably also not just the result of pandemic giveaway fraud, unless that fraud is ongoing.
Presumably the thing to check here would be TFP, but FRED’s US TFP series currently only goes to the end of 2019, so apparently we’re still waiting on that one? Either that or I’m looking at the wrong series.
Somebody should post this on Paul Graham’s twitter. He would be very interested in it (I can’t): https://mobile.twitter.com/paulg
Neat problem of the week: researchers just announced roughly-room-temperature superconductivity at pressures around 270 GPa. That’s stupidly high pressure—a friend tells me “they’re probably breaking a diamond each time they do a measurement”. That said, pressures in single-digit GPa do show up in structural problems occasionally, so achieving hundreds of GPa scalably/cheaply isn’t that many orders of magnitude away from reasonable, it’s just not something that there’s historically been much demand for. This problem plays with one idea for generating such pressures in a mass-produceable way.
Suppose we have three materials in a coaxial wire:
innermost material has a low thermal expansion coefficient and high Young’s modulus (i.e. it’s stiff)
middle material is a thin cylinder of our high-temp superconducting concoction
outermost material has a high thermal expansion coefficient and high Young’s modulus.
We construct the wire at high temperature, then cool it. As the temperature drops, the innermost material stays roughly the same size (since it has low thermal expansion coefficient), while the outermost material shrinks, so the superconducting concoction is squeezed between them.
Exercises:
Find an expression for the resulting pressure in the superconducting concoction in terms of the Young’s moduli, expansion coefficients, temperature change, and dimensions of the inner and outer materials. (Assume the width of the superconducting layer is negligible, and the outer layer doesn’t break.)
Look up parameters for some common materials (e.g. steel, tungsten, copper, porcelain, aluminum, silicon carbide, etc), and compute the pressures they could produce with reasonable dimensions (assuming that their material properties don’t change too dramatically with such high pressures).
Find an expression for the internal tension as a function of radial distance in the outermost layer.
Pick one material, look up its tensile strength, and compute how thick it would have to be to serve as the outermost layer without breaking, assuming the superconducting layer is at 270 GPa.
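A rough numerical sketch of exercise 2, under strong simplifying assumptions (rigid inner core, thin outer ring, temperature-independent material properties; the numbers are approximate room-temperature handbook values, so treat the output as order-of-magnitude only):

```python
# Back-of-envelope: pressure from differential thermal contraction.
# Model: inner core treated as rigid, outer layer as a thin ring of thickness t
# and radius r. Cooling by dT makes the ring "want" to shrink by
# (alpha_out - alpha_in) * dT * r more than the core; the pressure needed to
# stretch it back out is roughly P ~ E_out * (t / r) * (alpha_out - alpha_in) * dT.

materials = {
    # name: (Young's modulus [Pa], thermal expansion coeff [1/K]) -- approximate
    "steel":    (200e9, 12e-6),
    "aluminum": (69e9,  23e-6),
    "tungsten": (411e9, 4.5e-6),
}

def interference_pressure(E_out, alpha_out, alpha_in, dT, t_over_r):
    """Thin-ring estimate of the squeeze pressure, in Pa."""
    return E_out * t_over_r * (alpha_out - alpha_in) * dT

dT = 1000          # K of cooling after assembly (assumed)
t_over_r = 1.0     # thick-ish outer layer (assumed)

for outer, (E_out, a_out) in materials.items():
    for inner, (_, a_in) in materials.items():
        P = interference_pressure(E_out, a_out, a_in, dT, t_over_r)
        if P > 0:
            print(f"outer={outer:8s} inner={inner:8s}  P = {P/1e9:5.1f} GPa")
```

Under these assumptions the estimates land in the single-digit-GPa range, so reaching 270 GPa this way would need a much more favorable geometry and material combination, if it works at all.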
So I saw the Taxonomy Of What Magic Is Doing In Fantasy Books and Eliezer’s commentary on ASC’s latest linkpost, and I have cached thoughts on the matter.
My cached thoughts start with a somewhat different question—not “what role does magic play in fantasy fiction?” (e.g. what fantasies does it fulfill), but rather… insofar as magic is a natural category, what does it denote? So I’m less interested in the relatively-expansive notion of “magic” sometimes seen in fiction (which includes e.g. alternate physics), and more interested in the pattern called “magic” which recurs among tons of real-world ancient cultures.
Claim (weakly held): the main natural category here is symbols changing the territory. Normally symbols represent the world, and changing the symbols just makes them not match the world anymore—it doesn’t make the world do something different. But if the symbols are “magic”, then changing the symbols changes the things they represent in the world. Canonical examples:
Wizard/shaman/etc draws magic symbols, speaks magic words, performs magic ritual, or even thinks magic thoughts, thereby causing something to happen in the world.
Messing with a voodoo doll messes with the person it represents.
“Sympathetic” magic, which explicitly uses symbols of things to influence those things.
Magic which turns emotional states into reality.
I would guess that most historical “magic” was of this type.
Everybody’s been talking about Paxlovid, and how ridiculous it is to both stop the trial since it’s so effective but also not approve it immediately. I want to at least float an alternative hypothesis, which I don’t think is very probable at this point, but does strike me as at least plausible (like, 20% probability would be my gut estimate) based on not-very-much investigation.
Early stopping is a pretty standard p-hacking technique. I start out planning to collect 100 data points, but if I manage to get a significant p-value with only 30 data points, then I just stop there. (Indeed, it looks like the Paxlovid study only had 30 actual data points, i.e. people hospitalized.) Rather than only getting “significance” if all 100 data points together are significant, I can declare “significance” if the p-value drops below the line at any time. That gives me a lot more choices in the garden of forking counterfactual paths.
Now, success rates on most clinical trials are not very high. (They vary a lot by area—most areas are about 15-25%. Cancer is far and away the worst, below 4%, and vaccines are the best, over 30%.) So I’d expect that p-hacking is a pretty large chunk of approved drugs, which means pharma companies are heavily selected for things like finding-excuses-to-halt-good-seeming-trials-early.
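For intuition on how much extra “significance” the peeking buys, here’s a quick simulation sketch (illustrative only; the checkpoint schedule and sample sizes are made up): compare a fixed-sample t-test against a rule that tests after every 10 samples per arm and stops at the first p < 0.05, when the null is true.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_max, alpha = 2000, 100, 0.05
fixed_hits = peeking_hits = 0

for _ in range(n_trials):
    # The null is true: both arms are drawn from the same distribution.
    a, b = rng.normal(size=n_max), rng.normal(size=n_max)
    # Fixed-sample analysis: one test at n = n_max per arm.
    if stats.ttest_ind(a, b).pvalue < alpha:
        fixed_hits += 1
    # Optional stopping: test after every 10 samples, stop at first "significant" p.
    for n in range(10, n_max + 1, 10):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < alpha:
            peeking_hits += 1
            break

print(f"false-positive rate, fixed sample:   {fixed_hits / n_trials:.3f}")   # ~0.05
print(f"false-positive rate, early stopping: {peeking_hits / n_trials:.3f}")
```

With these settings the peeking rule typically inflates the nominal 5% false-positive rate severalfold.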
It was stopped after a pre-planned interim analysis; that means they’re calculating the stopping criteria/p-values with multiple testing correction built in, using sequential analysis.
Here’s an AI-driven external cognitive tool I’d like to see someone build, so I could use it.
This would be a software tool, and the user interface would have two columns. In one column, I write. Could be natural language (like google docs), or code (like a normal IDE), or latex (like overleaf), depending on what use-case the tool-designer wants to focus on. In the other column, a language and/or image model provides local annotations for each block of text. For instance, the LM’s annotations might be:
(Natural language or math use-case:) Explanation or visualization of a mental picture generated by the main text at each paragraph
(Natural language use-case:) Emotional valence at each paragraph
(Natural language or math use-case:) Some potential objections tracked at each paragraph
(Code:) Fermi estimates of runtime and/or memory usage
This is the sort of stuff I need to track mentally in order to write high-quality posts/code/math, so it would potentially be very high value to externalize that cognition.
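As a rough illustration of the core loop, here’s a minimal sketch assuming an OpenAI-style chat client; the model name and prompt are placeholders, and a real tool would swap in whichever annotation types the designer wants.

```python
# Hypothetical sketch of the two-column annotator's core loop.
# Assumes the OpenAI Python client; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

ANNOTATION_PROMPT = (
    "For the following paragraph, briefly describe: the mental picture it evokes, "
    "its emotional valence, and one potential objection.\n\n{block}"
)

def annotate(document: str, model: str = "gpt-4o-mini") -> list[tuple[str, str]]:
    """Return (block, annotation) pairs, one per paragraph, for the second column."""
    blocks = [b for b in document.split("\n\n") if b.strip()]
    annotations = []
    for block in blocks:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": ANNOTATION_PROMPT.format(block=block)}],
        )
        annotations.append((block, resp.choices[0].message.content))
    return annotations
```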
Also, the same product could potentially be made visible to readers (for the natural language/math use-cases) to make more visible the things the author intends to be mentally tracked. That, in turn, would potentially make it a lot easier for readers to follow e.g. complicated math.
Can you share your prompts and if you consider the output satisfactory for some example test cases?
I haven’t experimented very much, but here’s one example prompt.
This one produced basically-decent results from GPT-4.
Although I don’t have the exact prompt on hand at the moment, I’ve also asked GPT-4 to annotate a piece of code line-by-line with a Fermi estimate of its runtime, which worked pretty well.
Yeah, i was thinking your specs were, well:
Wrap gpt-4 and Gemini, columned output over a set of text, applying prompts to each section? Prototype in a weekend.
Make the AI able to meaningfully contribute non-obvious comments that help someone who is already an expert?
https://xkcd.com/1425/
Don’t really need comments which are non-obvious to an expert. Part of what makes LLMs well-suited to building external cognitive tools is that external cognitive tools can create value by just tracking “obvious” things, thereby freeing up the user’s attention/working memory for other things.
So kinda like spellcheckers (most typos you could figure out, but why spend time and attention on proofreading if the program can do that for you), but… thought-checkers.
Like, if a part of your article contradicts another part, it would be underlined.
I’ve long wanted this, but it’s not clear how to do it. Long-context LLMs are still expensive and for authors who need it most, context windows are still too small: me or Yudkowsky, for example, would still exceed the context window of almost all LLMs except possibly the newest Gemini. And then you have their weak reasoning. You could try to RAG it, but embeddings are not necessarily tuned to encode logically contradictory or inconsistent claims: probably if I wrote “the sky is blue” in one place and “the sky is red” in another, a retrieval would be able to retrieve both paragraphs and a LLM point out that they are contradictory, but such blatant contradictions are probably too rare to be useful to check for. You want something more subtle, like where you say “the sky is blue” and elsewhere “I looked up from the ground and saw the color of apples”. You could try to brute force it and consider every pairwise comparison of 2 reasonable sized chunks of text and ask for contradictions, but this is quadratic and will get slow and expensive and probably turn up too many false positives. (And how do you screen off false positives and mark them ‘valid’?)
My general thinking these days is that these truly useful ‘tools for thought’ LLMs are going to require either much better & cheaper LLMs, so smart that they can provide useful assistance despite being used in a grossly unnatural way input-wise or safety-tuned to hell, or biting the bullet of finetuning/dynamic-evaluation (see my Nenex proposal).
A LLM finetuned on my corpus can hope to quickly find, with good accuracy, contradictions because it was trained to know ‘the sky was blue’ when I wrote that at the beginning of the corpus, and it gets confused when it hits ‘the color of ____’ and it gets the prediction totally wrong. And RAG on an embedding tailored to the corpus can hope to surface the contradictions because it sees the two uses are the same in the essays’ context, etc. (And if you run them locally, and they don’t need a large context window because of the finetuning, they will be fast and cheap, so you can more meaningfully apply the brute force approach; or you could just run multiple epoches on your data, with an auxiliary prompt asking for a general critique, which would cover contradictions. ‘You say here X, but don’t I recall you saying ~X back at the beginning? What gives?’)
Perhaps you could do it in multiple steps.
Feed it a shorter text (that fits in the window) and ask it to provide a short summary focusing on factual statements. Then hopefully all short versions could fit in the window. Find the contradiction—report the two contradicting factual statements and which section they appeared in. Locate the statement in the original text.
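A minimal sketch of that multi-step approach, again assuming an OpenAI-style chat client with placeholder model name and prompts:

```python
# Sketch of the summarize-then-compare approach; prompts and model are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def find_contradictions(chunks: list[str]) -> str:
    # Step 1: compress each chunk to its factual claims so everything fits in one window.
    summaries = [
        f"[chunk {i}] " + ask("List the factual claims made in this text:\n\n" + chunk)
        for i, chunk in enumerate(chunks)
    ]
    # Step 2: ask for contradictions across the summaries, naming the chunks involved.
    return ask(
        "Below are factual summaries of sections of one document. "
        "List any pairs of chunks whose claims contradict each other, "
        "quoting the conflicting claims and naming the chunks:\n\n"
        + "\n\n".join(summaries)
    )
```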
Did you write more than 7 million words yet @gwern? https://www.google.com/amp/s/blog.google/technology/ai/google-gemini-next-generation-model-february-2024/amp/
Basically it’s the “lazy wait” calculation. Get something to work now or wait until the 700k or 7m word context window ships.
I may have. Just gwern.net is, I think, somewhere around 2m, and it’s not comprehensive. Also, for contradictions, I would want to detect contradictions against citations/references as well (detecting miscitations would be more important than self-consistency IMO), and as a rough ballpark, the current Gwern.net annotation* corpus is approaching 4.3m words, looks like, and is also not comprehensive. So, closer than one might think! (Anyway, doesn’t deal with the cost or latency: as you can see in the demos, we are talking minutes, not seconds, for these million-token calls and the price is probably going to be in the dollar+ regime per call.)
* which are not fulltext. It would be nice to throw in all of the hosted paper & book & webpage fulltexts, but then that’s probably more like 200m+ words.
There isn’t any clear technical obstruction to getting this time down pretty small with more parallelism.
There may not be any ‘clear’ technical obstruction, but it has failed badly in the past. ‘Add more parallelism’ (particularly hierarchically) is one of the most obvious ways to improve attention, and people have spent the past 5 years failing to come up with efficient attentions that do anything but move along a Pareto frontier from ‘fast but doesn’t work’ to ‘slow and works only as well as the original dense attention’. It’s just inherently difficult to know what tokens you will need across millions of tokens without input from all the other tokens (unless you are psychic), implying extensive computation of some sort, which makes things inherently serial and costs you latency, even if you are rich enough to spend compute like water. You’ll note that when Claude-2 was demoing the ultra-long attention windows, it too spent a minute or two churning. While the most effective improvements in long-range attention like Flash Attention or Ring Attention are just hyperoptimizing dense attention, which is inherently limited.
I’ve long been very suspicious of aggregate economic measures like GDP. But GDP is clearly measuring something, and whatever that something is it seems to increase remarkably smoothly despite huge technological revolutions. So I spent some time this morning reading up and playing with numbers and generally figuring out how to think about the smoothness of GDP increase.
Major takeaways:
When new tech makes something previously expensive very cheap, GDP mostly ignores it. (This happens in a subtle way related to how we actually compute it.)
Historical GDP curves mainly measure things which are expensive ~now. Things which are cheap now are mostly ignored. In other words: GDP growth basically measures the goods whose production is revolutionized the least.
Re: AI takeoff, the right way to extrapolate today’s GDP curve to post-AI is to think about things which will still be scarce post-AI, and then imagine the growth of production of those things.
Even a very sharp, economically-revolutionary AI takeoff could look like slow smooth GDP growth, because GDP growth will basically only measure the things whose production is least revolutionized.
Why am I harping on about technicalities of GDP? Well, I hear about some AI forecasts which are heavily based on the outside view that economic progress (as measured by GDP) is smooth, and this is so robust historically that we should expect it to continue going forward. And I think this is basically right—GDP, as we actually compute it, is so remarkably smooth that we should expect that to continue. Alas, this doesn’t tell us very much about how crazy or sharp AI takeoff will be, because GDP (as we actually compute it) systematically ignores anything that’s revolutionized.
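Here’s a toy chain-weighted example of that last point (purely illustrative numbers, using a Fisher quantity index as a stand-in for how real GDP is chained): good A grows 2%/year at constant price, while good B’s quantity grows 10x/year as its price falls 100x/year.

```python
import math

# Good A: 2%/year quantity growth at constant price.
# Good B: quantity grows 10x/year while its price falls 100x/year.
qA, pA = 100.0, 1.0
qB, pB = 1.0, 100.0      # B starts as half of total spending

for year in range(6):
    qA2, pA2 = qA * 1.02, pA
    qB2, pB2 = qB * 10.0, pB / 100.0
    laspeyres = (pA * qA2 + pB * qB2) / (pA * qA + pB * qB)       # old-price weights
    paasche   = (pA2 * qA2 + pB2 * qB2) / (pA2 * qA + pB2 * qB)   # new-price weights
    fisher = math.sqrt(laspeyres * paasche)                       # chained quantity index
    print(f"year {year}->{year + 1}: measured real growth = {fisher - 1:.1%}")
    qA, pA, qB, pB = qA2, pA2, qB2, pB2
```

Measured growth starts out large while B is still a big share of spending, then converges to roughly 2%/year, the growth rate of the least-revolutionized good, even though B’s output keeps exploding.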
If you want a full post on this, upvote this comment.
In writing How much should we value life?, I spent some time digging into AI timeline stuff. It lead me to When Will AI Be Created?, written by Luke Muehlhauser for MIRI. He noted that there is reason not to trust expert opinions on AI timelines, and that trend extrapolation may be a good alternative. This point you’re making about GDP seems like it is real progress towards coming up with a good way to do trend extrapolation, and thus seems worth a full post IMO. (Assuming it isn’t already well known by the community or something, which I don’t get the sense is the case.)
Upvoted, but I mostly trust you to write the post if it seems like there’s an interesting meaty thing worth saying.
Eh, these were the main takeaways, the post would just be more details and examples so people can see the gears behind it.
A similar point is made by Korinek in his review of Could Advanced AI Drive Explosive Economic Growth:
In general, Baumol-type effects (spending decreasing in sectors where productivity goes up) mean that we can have scenarios in which the economy is growing extremely fast on “objective” metrics like energy consumption, but GDP has stagnated because that energy is being spent on extremely marginal increases in goods being bought and sold.
[Epistemic status: highly speculative]
Smoke from California/Oregon wildfires reaching the East Coast opens up some interesting new legal/political possibilities. The smoke is way outside state borders, all the way on the other side of the country, so that puts the problem pretty squarely within federal jurisdiction. Either a federal agency could step in to force better forest management on the states, or a federal lawsuit could be brought for smoke-induced damages against California/Oregon. That would potentially make it a lot more difficult for local homeowners to block controlled burns.
Brief update on how it’s going with RadVac.
I’ve been running ELISA tests all week. In the first test, I did not detect stronger binding to any of the peptides than to the control in any of several samples from myself or my girlfriend. But the control itself was looking awfully suspicious, so I ran another couple tests. Sure enough, something in my samples is binding quite strongly to the control itself (i.e. the blocking agent), which is exactly what the control is supposed to not do. So I’m going to try out some other blocking agents, and hopefully get an actually-valid control group.
(More specifics on the test: I ran a control with blocking agent + sample, and another with blocking agent + blank sample, and the blocking agent + sample gave a strong positive signal while the blank sample gave nothing. That implies something in the sample was definitely binding to both the blocking agent and the secondary antibodies used in later steps, and that binding was much stronger than the secondary antibodies themselves binding to anything in the blocking agent + blank sample.)
In other news, the RadVac team released the next version of their recipe + whitepaper. Particularly notable:
Note that they’re talking specifically about serum (i.e. blood) antibodies here. So apparently injecting it does induce blood antibodies of the sort detectable by commercial tests (at least some of the time), but snorting it mostly just produces mucosal antibodies (also at least some of the time).
This is a significant update: most of my prior on the vaccine working was based on vague comments in the previous radvac spec about at least some people getting positive test results. But we didn’t know what kind of test results those were, so there was a lot of uncertainty about exactly what “working” looked like. In particular, we didn’t know whether antibodies were induced in blood or just mucus, and we didn’t know if they were induced consistently or only in some people (the latter of which is the “more dakka probably helps” world). Now we know that it’s mostly just mucus (at least for nasal administration). Still unsure about how consistently it works—the wording in the doc makes it sound like only some people saw a response, but I suspect the authors are just hedging because they know there’s both selection effects and a lot of noise in the data which comes back to them.
The latest version of the vaccine has been updated to give it a bit more kick—slightly higher dose, and the chitosan nanoparticle formula has been changed in a way which should make the peptides more visible to the immune system. Also, the list of peptides has been trimmed down a bit, so the latest version should actually be cheaper, though the preparation is slightly more complex.
I would expect that hedging also happens because making definitive clinical claims brings more danger from the FDA than making hedged statements does.
Someone should write a book review of The Design of Everyday Things aimed at LW readers, so I have a canonical source to link to other than the book itself.
I had a shortform post pointing out the recent big jump in new businesses in the US, and Gwern replied:
This was a good question in context, but I disagree with Gwern’s model of where-progress-comes-from, especially in the context of small businesses.
Let’s talk ice-cream cones.
As the story goes, an ice-cream vendor was next door to a waffle vendor at the 1904 World’s Fair. At some point, the ice-cream vendor ran short on paper cups, and inspiration struck. He bought some thin waffles from the waffle vendor, rolled them into cones, and ice-cream cones took off.
That’s just the first step. From there, the cone spread memetically. People heard about it, and either asked for cones (on the consumer side) or tried making them (on the supplier side).
Insight + Memetics → Better Food
When I compare food today to the stuff my grandparents ate, there’s no comparison. Today’s dishes are head and shoulders better. Partly it’s insights like ice-cream cones, partly it’s memetic spread of dishes from more parts of the world (like sisig, soup dumplings, ropa vieja, chicken Karahi, …).
Those little fast-food stalls? They’re powerhouses of progress. It’s a hypercompetitive market, with low barriers to entry, and lots of repeat business. The conditions are ideal for trying out new dishes, spreading culinary ideas and finding out the hard way what people like to eat. That doesn’t mean they’re highly profitable—culinary innovation spreads memetically, so it’s hard to capture the gains. But progress is made.
The pandemic also has the effect of shaping the kinds of business ideas people try. It pushes a lot of innovation in food delivery. Some of the pandemic-driven innovation will become worthless once the pandemic is over, but a few good ideas will likely survive, and the old ideas of the businesses that went out of business are still around.
Does The Information-Throughput-Maximizing Input Distribution To A Sparsely-Connected Channel Satisfy An Undirected Graphical Model?
[EDIT: Never mind, proved it.]
Suppose I have an information channel $X \to Y$. The $X$ components $X_1, \dots, X_m$ and the $Y$ components $Y_1, \dots, Y_n$ are sparsely connected, i.e. the typical $Y_i$ is downstream of only a few parent $X$-components $X_{pa(i)}$. (Mathematically, that means the channel factors as $P[Y|X] = \prod_i P[Y_i | X_{pa(i)}]$.)
Now, suppose I split the Y components into two sets, and hold constant any X-components which are upstream of components in both sets. Conditional on those (relatively few) X-components, our channel splits into two independent channels.
E.g. in the image above, if I hold $X_4$ constant, then I have two independent channels: $(X_1, X_2, X_3) \to (Y_1, Y_2, Y_3, Y_4)$ and $(X_5, X_6, X_7) \to (Y_5, Y_6, Y_7, Y_8)$.
Now, the information-throughput-maximizing input distribution to a pair of independent channels is just the product of the throughput-maximizing distributions for the two channels individually. In other words: for independent channels, we have an independent throughput-maximizing distribution.
So it seems like a natural guess that something similar would happen in our sparse setup.
Conjecture: The throughput-maximizing distribution for our sparse setup is independent conditional on overlapping $X$-components. E.g. in the example above, we’d guess that $P[X] = P[X_4] \, P[X_1, X_2, X_3 | X_4] \, P[X_5, X_6, X_7 | X_4]$ for the throughput-maximizing distribution.
If that’s true in general, then we can apply it to any Markov blanket in our sparse channel setup, so it implies that $P[X]$ factors over any set of $X$-components which is a Markov blanket splitting the original channel graph. In other words: it would imply that the throughput-maximizing distribution satisfies an undirected graphical model, in which two $X$-components share an edge if-and-only-if they share a child $Y$-component.
It’s not obvious that this works mathematically; information throughput maximization (i.e. the optimization problem by which one computes channel capacity) involves some annoying coupling between terms. But it makes sense intuitively. I’ve spent less than an hour trying to prove it and mostly found it mildly annoying though not clearly intractable. Seems like the sort of thing where either (a) someone has already proved it, or (b) someone more intimately familiar with channel capacity problems than I am could easily prove it.
So: anybody know of an existing proof (or know that the conjecture is false), or find this conjecture easy to prove themselves?
Proof
Specifically, we’ll show that there exists an information-throughput-maximizing distribution which satisfies the undirected graph. We will not show that all optimal distributions satisfy the undirected graph, because that’s false in some trivial cases—e.g. if all the Y’s are completely independent of X, then all distributions are optimal. We will also not show that all optimal distributions factor over the undirected graph, which is importantly different because of the $P[X] > 0$ caveat in the Hammersley-Clifford theorem.
First, we’ll prove the (already known) fact that an independent distribution $P[X] = P[X_1]P[X_2]$ is optimal for a pair of independent channels $(X_1 \to Y_1, X_2 \to Y_2)$; we’ll prove it in a way which will play well with the proof of our more general theorem. Using standard information identities plus the factorization structure $Y_1 - X_1 - X_2 - Y_2$ (that’s a Markov chain, not subtraction), we get
$MI(X;Y) = MI(X;Y_1) + MI(X;Y_2|Y_1)$
$= MI(X;Y_1) + (MI(X;Y_2) - MI(Y_2;Y_1) + MI(Y_2;Y_1|X))$
$= MI(X_1;Y_1) + MI(X_2;Y_2) - MI(Y_2;Y_1)$
Now, suppose you hand me some supposedly-optimal distribution $P[X]$. From $P$, I construct a new distribution $Q[X] := P[X_1]P[X_2]$. Note that $MI(X_1;Y_1)$ and $MI(X_2;Y_2)$ are both the same under $Q$ as under $P$, while $MI(Y_2;Y_1)$ is zero under $Q$. So, because $MI(X;Y) = MI(X_1;Y_1) + MI(X_2;Y_2) - MI(Y_2;Y_1)$, $MI(X;Y)$ must be at least as large under $Q$ as under $P$. In short: given any distribution, I can construct another distribution with at least as high information throughput, under which $X_1$ and $X_2$ are independent.
Now let’s tackle our more general theorem, reusing some of the machinery above.
I’ll split $Y$ into $Y_1$ and $Y_2$, and split $X$ into $X_{1-2}$ (parents of $Y_1$ but not $Y_2$), $X_{2-1}$ (parents of $Y_2$ but not $Y_1$), and $X_{1 \cap 2}$ (parents of both). Then
$MI(X;Y) = MI(X_{1 \cap 2};Y) + MI(X_{1-2}, X_{2-1};Y|X_{1 \cap 2})$
In analogy to the case above, we consider a distribution $P[X]$, and construct a new distribution $Q[X] := P[X_{1 \cap 2}] \, P[X_{1-2}|X_{1 \cap 2}] \, P[X_{2-1}|X_{1 \cap 2}]$. Compared to $P$, $Q$ has the same value of $MI(X_{1 \cap 2};Y)$, and by exactly the same argument as in the independent case, $MI(X_{1-2}, X_{2-1};Y|X_{1 \cap 2})$ cannot be any lower under $Q$ than under $P$; we just repeat the same argument with everything conditional on $X_{1 \cap 2}$ throughout. So, given any distribution, I can construct another distribution with at least as high information throughput, under which $X_{1-2}$ and $X_{2-1}$ are independent given $X_{1 \cap 2}$.
Since this works for any Markov blanket $X_{1 \cap 2}$, there exists an information-throughput-maximizing distribution which satisfies the desired undirected graph.
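For anyone who’d rather check numerically than algebraically, here’s a small sanity-check sketch: build a random channel with the factorization $P(Y_1, Y_2 | X_1, X_2, X_3) = P(Y_1|X_1,X_3)P(Y_2|X_2,X_3)$, compute a capacity-achieving input distribution via the standard Blahut-Arimoto iteration, and check that $X_1$ and $X_2$ come out (approximately) independent given $X_3$. (For a generic random channel the optimum should be essentially unique, so the conditional-independence gap should come out near zero.)

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Random component channels over binary variables:
# pY1[x1, x3, y1] = P(Y1 = y1 | X1 = x1, X3 = x3), and similarly for pY2.
pY1 = rng.dirichlet(np.ones(2), size=(2, 2))
pY2 = rng.dirichlet(np.ones(2), size=(2, 2))

xs = list(itertools.product([0, 1], repeat=3))   # joint X = (x1, x2, x3)
ys = list(itertools.product([0, 1], repeat=2))   # joint Y = (y1, y2)
W = np.array([[pY1[x1, x3, y1] * pY2[x2, x3, y2] for (y1, y2) in ys]
              for (x1, x2, x3) in xs])           # W[x, y] = P(y | x)

# Blahut-Arimoto iteration for a capacity-achieving input distribution p(x).
p = np.full(len(xs), 1 / len(xs))
for _ in range(5000):
    q = p[:, None] * W
    q /= q.sum(axis=0, keepdims=True)            # q(x | y)
    logp = (W * np.log(q + 1e-300)).sum(axis=1)
    p = np.exp(logp - logp.max())
    p /= p.sum()

# Check the conjecture: p(x1, x2 | x3) =?= p(x1 | x3) p(x2 | x3).
for x3 in (0, 1):
    joint = np.array([[p[xs.index((x1, x2, x3))] for x2 in (0, 1)]
                      for x1 in (0, 1)])
    joint /= joint.sum()
    product = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    print(f"x3={x3}: max |p(x1,x2|x3) - p(x1|x3)p(x2|x3)| =",
          f"{np.abs(joint - product).max():.2e}")
```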
I suppose another way to look at this is that the overlapping components are the blanket states in some kind of time-dependent Markov blanket setup, right?
In the scenario you created, you could treat $x_1, x_2, x_3$ as the shielded state at time step $t$, so $i_t$. Then $x_5, x_6, x_7$ are states outside of the blanket, so $e_t$ (which group of states is $i$ and which is $e$ doesn’t really matter, so long as they are on either side of the blanket). $y_1, y_2, y_3, y_4$ [1] become $i_{t+1}$, and $y_5, y_6, y_7, y_8$ become $e_{t+1}$.
Then $x_4$ becomes the blanket $b_t$ such that
$I(i_{t+1}; e_{t+1} | b_t) \approx 0$
and
$P(i_{t+1}, e_{t+1} | i_t, e_t, b_t) = P(i_{t+1} | i_t, b_t) \cdot P(e_{t+1} | e_t, b_t)$
With all that implies. In fact you can just as easily have three shielded states, or four, using this formulation.
(the setup for this is shamelessly ripped off from @Gunnar_Zarncke ’s unsupervised agent detection work)
Did you miss an arrow going to $y_4$?
(Was in the middle of writing a proof before noticing you did it already)
I believe the end result is that if we have $Y = (Y_1, Y_2)$, $X = (X_1, X_2, X_3)$ with $P(Y|X) = P(Y_1|X_1,X_3)\,P(Y_2|X_2,X_3)$ ($X_1$ upstream of $Y_1$, $X_2$ upstream of $Y_2$, $X_3$ upstream of both),
then maximizing $I(X;Y)$ is equivalent to maximizing $I(Y_1;X_1,X_3) + I(Y_2;X_2,X_3) - I(Y_1;Y_2)$.
& for the proof we can basically replicate the proof for additivity, except substituting the factorization $P(X_1,X_2,X_3) = P(X_3)\,P(X_1|X_3)\,P(X_2|X_3)$ as the assumption in place of independence; then both directions of inequality will result in $I(Y_1;X_1,X_3) + I(Y_2;X_2,X_3) - I(Y_1;Y_2)$.
[EDIT: Forgot the $-I(Y_1;Y_2)$ term due to marginal dependence $P(Y_1,Y_2) \neq P(Y_1)P(Y_2)$]
Does anyone know of an “algebra for Bayes nets/causal diagrams”?
More specifics: rather than using a Bayes net to define a distribution, I want to use a Bayes net to state a property which a distribution satisfies. For instance, a distribution P[X, Y, Z] satisfies the diagram X → Y → Z if-and-only-if the distribution factors according to
P[X, Y, Z] = P[X] P[Y|X] P[Z|Y].
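To make “satisfies the diagram” concrete, here’s a small numerical sketch (my own illustration, binary variables only) that checks the X → Y → Z factorization directly:

```python
import numpy as np

def satisfies_chain(P: np.ndarray, tol: float = 1e-9) -> bool:
    """P is a 3-d array indexed [x, y, z], summing to 1."""
    Pxy = P.sum(axis=2)            # P[x, y]
    Pyz = P.sum(axis=0)            # P[y, z]
    Py = P.sum(axis=(0, 2))        # P[y]
    # P[x] P[y|x] P[z|y]  ==  P[x, y] * P[y, z] / P[y]
    factored = Pxy[:, :, None] * (Pyz / Py[:, None])[None, :, :]
    return bool(np.allclose(P, factored, atol=tol))

rng = np.random.default_rng(0)
# A distribution built as a chain X -> Y -> Z passes the check...
Px = rng.dirichlet(np.ones(2))
Py_x = rng.dirichlet(np.ones(2), size=2)        # Py_x[x, y]
Pz_y = rng.dirichlet(np.ones(2), size=2)        # Pz_y[y, z]
P_chain = Px[:, None, None] * Py_x[:, :, None] * Pz_y[None, :, :]
print(satisfies_chain(P_chain))                  # True
# ...while a generic joint distribution almost surely does not.
P_generic = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)
print(satisfies_chain(P_generic))                # False
```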
When using diagrams that way, it’s natural to state a few properties in terms of diagrams, and then derive some other diagrams they imply. For instance, if a distribution P[W, X, Y, Z] satisfies all of:
W → Y → Z
W → X → Y
X → (W, Y) → Z
… then it also satisfies W → X → Y → Z.
What I’m looking for is a set of rules for “combining diagrams” this way, without needing to go back to the underlying factorizations in order to prove things.
David and I have been doing this sort of thing a lot in our work the past few months, and it would be nice if someone else already had a nice write-up of the rules for it.
Weather just barely hit 80°F today, so I tried the Air Conditioner Test.
Three problems came up:
Turns out my laser thermometer is all over the map. Readings would change by 10°F if I went outside and came back in. My old-school thermometer is much more stable (and well-calibrated, based on dipping it in some ice water), but slow and caps out around 90°F (so I can’t use to measure e.g. exhaust temp). I plan to buy a bunch more old-school thermometers for the next try.
I thought opening the doors/windows in rooms other than the test room and setting up a fan would be enough to make the temperature in the hall outside the test room close to outdoor temp. This did not work; hall temp was around 72°F with outside around 80°F. I’ll need to change that part of the experiment design; most likely I’ll seal around the door and let air infiltrate exclusively from the window instead. (The AC is right next to the window, so this could screw with the results, but I don’t really have a better option.)
In two-hose mode, the AC hit its minimum temperature of 60°F, so I’ll need a hotter day. I’ll try again when we hit at least 85°F.
In case anyone’s wondering: in one-hose mode, the temperature in the room equilibrated around 66°F. Power consumption was near-constant throughout all conditions.
One additional Strange Observation: cool air was blowing out under the door of the test room in two-hose mode. This should not happen; my best guess is that, even though the AC has two separate intake vents, the two are not actually partitioned internally, so the fan for indoor-air was pulling in outdoor-air (causing air to blow out under the door to balance that extra inflow). Assuming that’s the cause, it should be fixable with some strategically-placed cardboard inside the unit.
Chrome is offering to translate the LessWrong homepage for me. Apparently, it is in Greek.
Huh, amusing. We do ship a font that has nothing but the greek letter set in it, because people use greek unicode symbols all the time and our primary font doesn’t support that character set. So my guess is that’s where Google gets confused.
Oh, I had just assumed it was commentary on the writing style/content.
If about 10% of articles have “Ω” in their title, what is the probability that the page is in Greek? :D
What if physics equations were written like statically-typed programming languages?
$$\left(\frac{\text{mass} \cdot \text{length}}{\text{time}^2} : F\right) = \left(\frac{\text{mass}}{-} : m\right)\left(\frac{\text{length}}{\text{time}^2} : a\right)$$
$$\left(\frac{\text{mass}}{\text{length} \cdot \text{time}^2} : P\right)\left(\frac{\text{length}^3}{-} : V\right) = \left(\frac{-}{-} : N\right)\left(\frac{\text{mass} \cdot \text{length}^2}{\text{time}^2 \cdot \text{temp}} : R\right)\left(\frac{\text{temp}}{-} : T\right)$$
The math and physics worlds still use single-letter variable names for everything, decades after the software world realized that was extremely bad practice. This makes me pessimistic about the adoption of better notation practices.
Better? I doubt it. If physicists wrote equations the way programmers write code, a simple homework problem would easily fill ten pages.
Verboseness works for programmers because programmers rarely need to do anything more complicated with their code than run it—analogous to evaluating an expression, for a physicist or mathematician. Imagine if you needed to prove one program equivalent to another algebraically—i.e. a sequence of small transformations, with a record of intermediate programs derived along the way in order to show your work. I expect programmers subjected to such a use-case would quickly learn the virtues of brevity.
Related to that: You have far fewer variables under consideration that you can even have standard names for. A remnant of this effect can be seen in typical Fortran programs.
Yeah, I’m apparently not intelligent enough to do error-free physics/engineering calculations without relying on dimensional analysis as a debugging tool. I even came up with a weird, hack-y way to do that in computing environments like Excel and Cython, where flexible multiplicative types are not supported.
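For reference, in Python this kind of dimension checking can be done with the pint units library; a minimal sketch (the specific quantities are arbitrary):

```python
# Dimensional analysis as a debugging tool, via the pint units library.
import pint

ureg = pint.UnitRegistry()

m = 3.0 * ureg.kilogram
a = 2.0 * ureg.meter / ureg.second**2
F = (m * a).to(ureg.newton)           # dimensions check out: 6.0 N

P = 101325 * ureg.pascal
V = 1.0 * ureg.liter
R = 8.314 * ureg.joule / (ureg.mole * ureg.kelvin)
T = 300.0 * ureg.kelvin
n = (P * V / (R * T)).to(ureg.mole)   # ideal gas law, comes out in moles

# A units mistake fails loudly instead of silently giving a wrong number:
# F + P   # raises pint.DimensionalityError (newton vs. pascal)
```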
Is interpersonal variation in anxiety levels mostly caused by dietary iron?
I stumbled across this paper yesterday. I haven’t looked at it very closely yet, but the high-level pitch is that they look at genetic predictors of iron deficiency and then cross that with anxiety data. It’s interesting mainly because it sounds pretty legit (i.e. the language sounds like direct presentation of results without any bullshitting, the p-values are satisfyingly small, there’s no branching paths), and the effect sizes are BIG IIUC: odds ratio of anxiety disorders changes by roughly 0.9 per standard deviation in iron level, across four different measures of iron level. (Note that TIBC, the last of the four iron level measures, didn’t hit statistical significance but did have a similar effect size to the other three.)
Just eyeballing those effect sizes… man, it kinda sounds like iron levels are maybe the main game for most anxiety? Am I interpreting that right? Am I missing something here?
EDIT: I read more, and it turns out the wording of the part I quoted was misleading. The number 0.922, for instance, was the odds ratio AT +1 standard deviation serum iron level, not PER +1 standard deviation serum iron level. That would be −0.078 PER standard deviation serum iron level, so it’s definitely not the “main game for most anxiety”.
Have you tested this hypothesis on your friends? Ask them for their iron level from last blood test, and ask them to self-report anxiety level (you also make a separate estimate of their anxiety level).
I keep seeing news outlets and the like say that SORA generates photorealistic videos, can model how things move in the real world, etc. This seems like blatant horseshit? Every single example I’ve seen looks like video game animation, not real-world video.
Have I just not seen the right examples, or is the hype in fact decoupled somewhat from the model’s outputs?
I think I mildly disagree, but probably we’re looking at the same examples.
I think the most impressive (in terms of realism) videos are under “Sora is able to generate complex scenes with multiple characters, …”. (Includes the white SUV video and the Tokyo suburbs video.)
I think all of these videos other than the octopus and paper planes are “at-a-glance” photorealistic to me.
Overall, I think SORA can do “at-a-glance” photorealistic videos and can model to some extent how things move in the real world. I don’t think it can do both complex motion and photorealism in the same video. As in, the videos which are photorealistic don’t really involve complex motion and the videos which involve complex motion aren’t photorealistic.
(So probably some amount of hype, but also pretty real?)
Hmm, I don’t buy it. These two scenes seem very much not like the kind of thing a video game engine could produce:
Look at this frame! I think there is something very slightly off about that face, but the cat hitting the person’s face and the person’s reaction seem very realistic to me, and IMO qualify as “complex motion and photorealism in the same video”.
Were these supposed to embed as videos? I just see stills, and don’t know where they came from.
These are stills from some of the videos I was referencing.
TBC, I wasn’t claiming anything about video game engines.
I wouldn’t have called the cat one “complex motion”, but I can see where you’re coming from.
Yeah, I mean I guess it depends on what you mean by photorealistic. That cat has three front legs.
Yeah, this is the example I’ve been using to convince people that the game engines are almost certainly generating training data but are probably not involved at sampling time. I can’t come up with any sort of hybrid architecture like ‘NN controlling game-engine through API’ where you get that third front leg. One of the biggest benefits of a game-engine would be ensuring exactly that wouldn’t happen—body parts becoming detached and floating in mid-air and lack of conservation. If you had a game engine with a hyper-realistic cat body model in it which something external was manipulating, one of the biggest benefits is that you wouldn’t have that sort of common-sense physics problem. (Meanwhile, it does look like past generative modeling of cats in its errors. Remember the ProGAN interpolation videos of CATS? Hilarious, but also an apt demonstration of how extremely hard cats are to model. They’re worse than hands.)
In addition, you see plenty of classic NN tells throughout—note the people driving a ‘Dandrover’...
Yeah, those were exactly the two videos which most made me think that the model was mostly trained on video game animation. In the Tokyo one, the woman’s facial muscles never move at all, even when the camera zooms in on her. And in the SUV one, the dust cloud isn’t realistic, but even covering that up, the SUV has a Grand Theft Auto look to its motion.
“Can’t do both complex motion and photorealism in the same video” is a good hypothesis to track, thanks for putting that one on my radar.
(Note that I was talking about the one with the train going through Tokyo suburbs.)
Putting this here for posterity: I have thought since the superconductor preprint went up, and continue to think, that the markets are putting generally too little probability on the claims being basically-true. I thought ~70% after reading the preprint the day it went up (and bought up a market on manifold to ~60% based on that, though I soon regretted not waiting for a better price), and my probability has mostly been in the 40-70% range since then.
After seeing the markets jump up in response to the latest, I think I’m more like 65-80%.
Languages should have tenses for spacelike separation. My friend and I do something in parallel, it’s ambiguous/irrelevant which one comes first, I want to say something like “I expect my friend <spacelike version of will do/has done/is doing> their task in such-and-such a way”.
That sounds more like a tenseless sentence than using a spacelike separation tense. Your friend’s performance of the task may well be in your future or past lightcone (or extend through both), but you don’t wish to imply any of these.
There are languages with tenseless verbs, as well as some with various types of spatial tense.
The closest I can approximate this in English without clumsy constructs is “I expect my friend does their task in such-and-such a way”, which I agree isn’t very satisfactory.
Who would have thought that someone would ever look at CSP and think “I want english to be more like that”?
lol
Future perfect (hey, that’s the name of the show!) seems like a reasonable hack for this in English
Two kinds of cascading catastrophes one could imagine in software systems...
A codebase is such a spaghetti tower (and/or coding practices so bad) that fixing a bug introduces, on average, more than one new bug. Software engineers toil away fixing bugs, making the software steadily more buggy over time.
Software services managed by different groups have dependencies—A calls B, B calls C, etc. Eventually, the dependence graph becomes connected enough and loopy enough that a sufficiently-large chunk going down brings down most of the rest, and nothing can go back up until everything else goes back up (i.e. there’s circular dependence/deadlock).
How could we measure how “close” we are to one of these scenarios going supercritical?
For the first, we’d need to have attribution of bugs—i.e. track which change introduced each bug. Assuming most bugs are found and attributed after some reasonable amount of time, we can then estimate how many bugs each bug fix introduces, on average.
(I could also imagine a similar technique for e.g. medicine: check how many new problems result from each treatment of a problem.)
For the second, we’d need visibility into codebases maintained by different groups, which would be easy within a company but much harder across companies. In principle, within a company, some kind of static analysis tool could go look for all the calls to apis between services, map out the whole graph, and then calculate which “core” pieces could be involved in a catastrophic failure.
(Note that this problem could be mostly-avoided by intentionally taking down services occasionally, so engineers are forced to build around that possibility. I don’t think any analogue of this approach would work for the first failure-type, though.)
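A minimal sketch of both measurements (hypothetical data throughout; the dependency analysis uses networkx):

```python
import networkx as nx

# Second scenario: find services sitting in dependency cycles ("A calls B" edges).
edges = [("A", "B"), ("B", "C"), ("C", "A"),    # a loop: A -> B -> C -> A
         ("C", "D"), ("E", "B")]                # hypothetical call graph
G = nx.DiGraph(edges)
loops = [scc for scc in nx.strongly_connected_components(G) if len(scc) > 1]
print("services in circular dependencies:", loops)

# First scenario: average number of new bugs attributed to each bug fix.
# (Hypothetical attribution data; an average above 1 means bug-fixing is supercritical.)
bugs_introduced_per_fix = [0, 2, 1, 0, 1, 3, 0]
R = sum(bugs_introduced_per_fix) / len(bugs_introduced_per_fix)
print("average bugs introduced per fix:", round(R, 2))
```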
I wish there were a fund roughly like the Long-Term Future Fund, but with an explicit mission of accelerating intellectual progress.
I mean, just to be clear, I am all in favor of intellectual progress. But doing so indiscriminately does sure seem a bit risky in this world of anthropogenic existential risks. Reminds me of my mixed feelings on the whole Progress Studies thing.
Yeah, I wouldn’t want to accelerate e.g. black-box ML. I imagine the real utility of such a fund would be to experiment with ways to accelerate intellectual progress and gain understanding of the determinants, though the grant projects themselves would likely be more object-level than that. Ideally the grants would be in areas which are not themselves very risk-relevant, but complicated/poorly-understood enough to generate generalizable insights into progress.
I think it takes some pretty specific assumptions for such a thing to increase risk significantly on net. If we don’t understand the determinants of intellectual progress, then we have very little ability to direct progress where we want it; it just follows whatever the local gradient is. With more understanding, at worst it follows the same gradient faster, and we end up in basically the same spot.
The one way it could net-increase risk is if the most likely path of intellectual progress leads to doom, and the best way to prevent doom is through some channel other than intellectual progress (like political action, for instance). Then accelerating the intellectual progress part potentially gives the other mechanisms (like political bodies) less time to react. Personally, though, I think a scenario in which e.g. political action successfully prevents intellectual progress from converging to doom (in a world where it otherwise would have) is vanishingly unlikely (like, less than one-in-a-hundred, maybe even less than one-in-a-thousand).
You might check out Donald Braben’s view, it says “transformative research” (i.e. fundamental results that create new fields and industries) is critical for the survival of civilization. He does not worry that transformative results might end civilization.
For short-term, individual cost/benefit calculations around C19, it seems like uncertainty in the number of people currently infected should drop out of the calculation.
For instance: suppose I’m thinking about the risk associated with talking to a random stranger, e.g. a cashier. My estimated chance of catching C19 from this encounter will be roughly proportional to $N_{\text{infected}}$. But, assuming we already have reasonably good data on number hospitalized/died, my chances of hospitalization/death given infection will be roughly inversely proportional to $N_{\text{infected}}$. So, multiplying those two together, I’ll get a number roughly independent of $N_{\text{infected}}$.
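Spelling out the cancellation (with $k$ an assumed per-encounter transmission factor and $H$ the known count of hospitalizations, which is what we’d actually use to estimate severity):
$$P(\text{infected}) \approx k \cdot \frac{N_{\text{infected}}}{N_{\text{pop}}}, \qquad P(\text{hospitalized} \mid \text{infected}) \approx \frac{H}{N_{\text{infected}}}$$
$$\Rightarrow \quad P(\text{hospitalized}) \approx k \cdot \frac{N_{\text{infected}}}{N_{\text{pop}}} \cdot \frac{H}{N_{\text{infected}}} = \frac{kH}{N_{\text{pop}}}$$
so $N_{\text{infected}}$ drops out.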
How general is this? Does some version of it apply to long-term scenarios too (possibly accounting for herd immunity)? What short-term decisions do depend on $N_{\text{infected}}$?
Way back in the halcyon days of 2005, a company called Cenqua had an April Fools’ Day announcement for a product called Commentator: an AI tool which would comment your code (with, um, adjustable settings for usefulness). I’m wondering if (1) anybody can find an archived version of the page (the original seems to be gone), and (2) if there’s now a clear market leader for that particular product niche, but for real.
Archived website
You are a scholar and a gentleman.
Here is an archived version of the page:
http://web.archive.org/web/20050403015136/http://www.cenqua.com/commentator/
Here’s an interesting problem of embedded agency/True Names which I think would make a good practice problem: formulate what it means to “acquire” something (in the sense of “acquiring resources”), in an embedded/reductive sense. In other words, you should be able-in-principle to take some low-level world-model, and a pointer to some agenty subsystem in that world-model, and point to which things that subsystem “acquires” and when.
Some prototypical examples which an answer should be able to handle well:
Organisms (anything from bacteria to plant to animals) eating things, absorbing nutrients, etc.
Humans making money or gaining property.
...and how the brain figures this out and why it is motivated to do so. There are a lot of simple animals that apparently “try to control” resources or territory. How?
Drives to control resources occur everywhere. And your control of resources is closely related to your dominance in a dominance hierarchy. Which seems to be regulated in many animals by serotonin. See e.g. https://www.nature.com/articles/s41386-022-01378-2
This billboard sits over a taco truck I like, so I see it frequently:
The text says “In our communities, Kaiser Permanente members are 33% less likely to experience premature death due to heart disease.*”, with the small-text directing one to a url.
The most naive (and presumably intended) interpretation is, of course, that being a Kaiser Permanente member provides access to better care, causing 33% lower chance of death due to heart disease.
Now, I’d expect most people reading this to immediately think something like “selection effects!”—i.e. what the billboard really tells us is that Kaiser Permanente has managed to select healthier-than-typical members. And indeed, that was my immediate thought.
… but then I noticed that the “selection effects” interpretation is also a trap for the unwary. After all, this is a number on a billboard. Number. Billboard. The overriding rule for numbers on billboards is that they are bullshit. The literal semantics of “Kaiser Permanente members are 33% less likely to experience premature death due to heart disease” just don’t have all that much to do at all with the rate at which various people die of heart disease.
What it does tell us is that someone at Kaiser Permanente thought it would be advantageous to claim, to people seeing this billboard, that Kaiser Permanente membership reduces death from heart disease by 33%.
… and that raises a very different set of questions! Who, exactly, is this billboard advertising to? The phrase “for all that is you” suggests that it’s advertising to prospective members, as opposed to e.g. doctors or hospital admins or politicians or investors or Kaiser’s own employees. (There is a skyscraper full of Kaiser’s employees within view of this billboard.) Which would suggest that somebody at Kaiser thinks consumers make a nontrivial choice between Kaiser and alternatives sometimes, and that there’s value to be had in influencing that choice.
… though perhaps that thinking is also a trap, and in fact the sign is just a result of corporate stupidity. I don’t know.
The actual trap is that it caught your attention: you posted about it online, and now more people know and think about Kaiser Permanente than before. According to whoever was in charge of making this billboard, that’s a success metric they can leverage for a promotion.
Is that what is does tell us? The sign doesn’t make the claim you suggest—it doesn’t claim it’s reducing the deaths from heart disease, it states it’s 33% less likely to be “premature”—which is probably a weaselly term here. But it clearly is not making any claims about reducing deaths from heart disease.
You seem to be projecting the conclusion that the claimed/expected interpretation is that membership reduces deaths by 33%, but I don’t see how you’re concluding that the marketing team thought that would be the general interpretation among people seeing the sign.
While I would not be inclined to take a billboard ad at face value, a more reasonable reading seems to me to be the claim that, even with heart disease, KP’s members are less likely to die earlier than expected relative to members of other healthcare providers. That may be a provable and true claim, or it might be more “puffing”, and everyone will play with just how “premature” is going to be measured.
Whether or not it’s corporate stupidity seems like a separate question, but understanding exactly what results such an ad is supposed to produce will matter a lot here. Plus, there is the old adage about no one ever going bankrupt underestimating the intelligence of the American consumer, and I suspect that goes double in the case of medical/healthcare consumption.
“Kaiser Permanente members are younger and healthier, and thus consume fewer healthcare resources on average, which allows us to pass the savings on to you.”
This tells me you don’t know anything about LW-rationality or are being deliberately uncharitable to it.
You’re mostly making broad blanket claims; maybe make a top-level post which is charitable to the entire project, and go in depth, post by post, on where you think people have gone wrong and in what way. High-effort posting is appreciated.
An interesting conundrum: one of the main challenges of designing useful regulation for AI is that we don’t have any cheap and robust way to distinguish a dangerous neural net from a non-dangerous net (or, more generally, a dangerous program from a non-dangerous program). This is an area where technical research could, in principle, help a lot.
The problem is, if there were some robust metric for how dangerous a net is, and that metric were widely known and recognized (as it would probably need to be in order to be used for regulatory purposes), then someone would probably train a net to maximize that metric directly.
This seems to lead to the solution of trying to make your metric one-way, in the sense that your metric should:
Provide an upper bound on the dangerousness of your network.
Compress the space of networks which map to approximately the same dangerousness level on the low end of dangerousness, and expand the space of networks which map to approximately the same dangerousness level on the high end, so that you can train your network to minimize the metric, but when you train your network to maximize the metric you end up in a degenerate area with technically very high measured danger levels but in actuality very low levels of dangerousness (a toy numerical sketch of this is included below).
We can hope (or possibly prove) that as you optimize upwards on the metric you become subject to Goodhart’s curse, but that the opposite occurs on the lower end.
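To make the “compress the low end, expand the high end” idea a bit more concrete, here is a minimal numerical sketch. Everything in it is made up for illustration: a hypothetical 1-D “network parameter” theta, an invented true-dangerousness curve, and a metric which upper-bounds it but has a tall degenerate spike far away from the genuinely dangerous region. Minimizing the metric also keeps true danger low (since the metric is an upper bound), while maximizing the metric climbs the spike rather than actual dangerous capability.

```python
# Toy construction (illustrative only): a 1-D "network parameter" theta,
# an invented true-dangerousness curve, and a one-way metric which upper-bounds it.
import numpy as np

def true_danger(theta):
    # Pretend genuinely dangerous capability lives near theta ~= 2.
    return np.exp(-(theta - 2.0) ** 2)

def metric(theta):
    # Upper bound on true danger, plus a tall, easy-to-find spike far from the dangerous region.
    return true_danger(theta) + 50.0 * np.exp(-(theta - 10.0) ** 2)

thetas = np.linspace(-5, 15, 4001)

# "Train to minimize the metric": the minimizer also has low true danger, since metric >= danger.
theta_min = thetas[np.argmin(metric(thetas))]
# "Train to maximize the metric": the maximizer sits on the degenerate spike, not the dangerous region.
theta_max = thetas[np.argmax(metric(thetas))]

print(f"minimize: theta={theta_min:.2f}, metric={metric(theta_min):.3g}, true danger={true_danger(theta_min):.3g}")
print(f"maximize: theta={theta_max:.2f}, metric={metric(theta_max):.3g}, true danger={true_danger(theta_max):.3g}")
```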
Sure, it even seems a bit tautological: any such metric, to be robust, would need to contain in itself a definition of a dangerously-capable AI, so you probably wouldn’t even need to train a model to maximize it. You’d be able to just lift the design from the metric directly.
Do you have any thoughts on a softer version of this problem, where the metric can’t be maximized directly, but gives a concrete idea of what sort of challenge your AI needs to beat to qualify as AGI? (And therefore in which direction in the architectural-design-space you should be moving.)
Some variation on this seems like it might work as a “fire alarm” test set, but as you point out, inasmuch as it’s recognized, it’ll be misapplied for benchmarking instead.
(I suppose the ideal way to do it would be to hand it off to e.g. ARC, so they can use it if OpenAI invites them for safety-testing again. This way, SOTA models still get tested, but the actors who might misuse it aren’t aware of the testing’s particulars until they succeed anyway...)
I just went looking for a good reference for the Kelly criterion, and didn’t find any on LessWrong. So, for anybody who’s looking: chapter 6 of Cover & Thomas’ textbook on information theory (Elements of Information Theory) is the best source I currently know of.
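For anyone who just wants the punchline without pulling up the textbook: for a simple binary bet with win probability p and net odds b:1, the Kelly fraction maximizing expected log wealth is f* = (bp − (1 − p))/b. A minimal sketch, with toy numbers of my own choosing:

```python
# Minimal sketch of the Kelly criterion for a single binary bet (toy numbers, not from the textbook).
# The Kelly fraction maximizes expected log wealth: E[log W] = p*log(1 + f*b) + (1 - p)*log(1 - f).
import numpy as np

def kelly_fraction(p, b):
    """Fraction of bankroll to stake on a bet won with probability p at net odds b:1."""
    return (b * p - (1 - p)) / b

p, b = 0.6, 1.0                # e.g. a 60% chance to win an even-money bet
f_star = kelly_fraction(p, b)  # = 0.2

# Sanity check: grid-search expected log growth and confirm it peaks at the same fraction.
fs = np.linspace(0.0, 0.99, 1000)
growth = p * np.log(1 + fs * b) + (1 - p) * np.log(1 - fs)
print(f_star, fs[np.argmax(growth)])  # both ~ 0.2
```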
Might be a good thing to add to the Kelly Criterion tag
Neat problem of the week: we have n discrete random variables, X_1, …, X_n. Conditional on any one of the variables, all of the variables are independent:
$\forall i:\ P[X_1, \dots, X_n \mid X_i] = \prod_j P[X_j \mid X_i]$
Characterize the distributions which satisfy this requirement.
This problem came up while working on the theorem in this post, and (separately) in the ideas behind this post. Note that those posts may contain some spoilers for the problem, though frankly my own proofs on this one just aren’t very good.
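If anyone wants to poke at the problem numerically before trying to characterize the distributions, here is a small brute-force checker. It is my own sketch (unrelated to the proofs in the linked posts): it takes an explicit joint distribution over a few small discrete variables, represented as a numpy array with one axis per variable, and tests whether conditioning on any single variable makes all of the variables jointly independent.

```python
# Brute-force check of the condition: for every i, P[X_1,...,X_n | X_i] = prod_j P[X_j | X_i].
# Joint distributions are explicit tables: numpy arrays with one axis per variable.
import itertools
import numpy as np

def cond_prob(joint, fixed, i, xi):
    """P[X_k = v for all (k, v) in fixed | X_i = xi], computed from an explicit joint table."""
    num = den = 0.0
    for x in itertools.product(*(range(s) for s in joint.shape)):
        if x[i] != xi:
            continue
        den += joint[x]
        if all(x[k] == v for k, v in fixed.items()):
            num += joint[x]
    return num / den if den > 0 else None

def satisfies_condition(joint, tol=1e-9):
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    n = joint.ndim
    for i in range(n):
        for x in itertools.product(*(range(s) for s in joint.shape)):
            lhs = cond_prob(joint, dict(enumerate(x)), i, x[i])
            if lhs is None:
                continue  # P[X_i = x_i] = 0, so there is nothing to check
            rhs = 1.0
            for j in range(n):
                rhs *= cond_prob(joint, {j: x[j]}, i, x[i])
            if abs(lhs - rhs) > tol:
                return False
    return True

# Three perfectly correlated coins: conditioning on any one pins down the rest, so it qualifies.
all_equal = np.zeros((2, 2, 2))
all_equal[0, 0, 0] = all_equal[1, 1, 1] = 0.5
print(satisfies_condition(all_equal))  # True

# Two independent coins plus a copy of the first: conditioning on X_2 leaves X_1, X_3 correlated.
copy_first = np.zeros((2, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        copy_first[x1, x2, x1] = 0.25
print(satisfies_condition(copy_first))  # False
```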