any shocking or surprising result in your own experiment is 80% likely to be a bug until proven otherwise. your first thought should always be to comb for bugs.
only after you have ruled out bugs do you get to actually think about how to fit your theory to the data, and even then, there might still be a hidden bug.
most papers are terrible and don’t replicate.
most techniques that sound intuitively plausible don’t work.
most techniques only look good if you don’t pick a strong enough baseline.
an actually good idea can take many tries before it works.
once you have good research intuitions, the most productive state to be in is to literally not think about what will go into the paper and just do experiments that satisfy your curiosity and convince yourself that the thing is true. once you have that, running the final sweeps is really easy.
most people have no intuition whatsoever about their hardware and so will write code that is horribly inefficient. even learning a little bit about hardware fundamentals so you don’t do anything obviously dumb is super valuable
in a long and complex enough project, you will almost certainly have a bug that invalidates weeks (or months) of work. being really careful and testing helps but slows down velocity a lot. unclear what the right equilibrium is.
feedback loop time is incredibly important; if you can get rapid feedback, you will make so much more progress.
implementing something that is already known to work is always vastly easier than inventing/researching something new.
you will inevitably spend a lot of time doing things that have no impact on the final published work whatsoever. like not even contributing that much useful intuition. this is unfortunate but unavoidable.
oftentimes you will spend a lot of time being fundamentally philosophically confused about what to do, and only really figure it out halfway through the project. this is normal.
direction is really important. most well executed research is still useless because it was the wrong direction.
research impact is super super long tailed. i think it’s really not worth doing research if you aren’t aiming for the long tail. if you’re early career, you should probably focus on doing things that enable you to aim at the long tail eventually, instead of trying to have lots of impact early on (for example, it’s probably better to do something you feel motivated by and learn a lot from than something that is “maximally important” but which you don’t have the skills to execute adequately on yet)
I agree and this is why research grant proposals often feel very fake to me. I generally just write up my current best idea / plan for what research to do, but I don’t expect it to actually pan out that way and it would be silly to try to stick rigidly to a plan.
Can basically attest to all of these, been doing intensive ML upskilling for the last half a year and almost all of these have been true. Highlights include:
Not properly setting up the attention mechanism in multiple experiments, resulting in the conclusion that attention didn’t do much (lmao)
So, so many off-by-one, off-by-two errors, especially for next-token prediction setups
Entire series of weeks-long experiments that turn out to be completely useless (usually based on a seemingly-reasonable intuition of some kind)
Accidentally overwriting/resetting the residual element so the RNN was just an NN with a funky hat on
I now hate shapes, reshaping, squeezing, unsqueezing, devices, torch.nn.functional.pad, so many more functions
Using the wrong loss function
Using the right loss function but with the wrong reduction
Using the right loss function but the learning rate is too aggressive/too low/the optimiser is not initialised properly
Using all the right things but loading the model from an incorrect checkpoint/not saving the weights properly
And also learning that Google Colab was forged in Mount Doom, a tool of great power crafted with malicious intent.
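Two of the bugs above, the off-by-one in next-token targets and the right loss with the wrong reduction, are easy to show concretely. Here is a minimal sketch in plain Python, not tied to any framework. `next_token_pairs` and `cross_entropy` are hypothetical helpers, and the `reduction` names only mirror the `"mean"` / `"sum"` / `"none"` options that PyTorch's loss functions accept.

```python
import math


def next_token_pairs(tokens):
    """Align inputs and targets for next-token prediction.

    Input position i should predict sequence position i + 1, so the inputs
    drop the last token and the targets drop the first one.
    """
    inputs = tokens[:-1]
    targets = tokens[1:]
    return inputs, targets


def cross_entropy(logits_per_pos, targets, reduction="mean"):
    """Cross-entropy from raw logits, one list of floats per position."""
    # zip() silently truncates mismatched lists, which hides off-by-one bugs,
    # so check the lengths explicitly instead.
    if len(logits_per_pos) != len(targets):
        raise ValueError(
            f"{len(logits_per_pos)} logit rows vs {len(targets)} targets"
        )
    losses = []
    for logits, t in zip(logits_per_pos, targets):
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[t])
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    if reduction == "none":
        return losses
    raise ValueError(f"unknown reduction: {reduction}")
```

For `tokens = [5, 2, 7, 1]` the inputs are `[5, 2, 7]` and the targets are `[2, 7, 1]`. With `reduction="sum"`, the loss (and so the gradient) grows with the number of positions. Switching reductions therefore silently rescales the effective learning rate, which is one way a wrong-reduction bug can look like a learning-rate bug.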
any shocking or surprising result in your own experiment is 80% likely to be a bug until proven otherwise. your first thought should always be to comb for bugs.
I will add: 80% likely to be a bug, or a result from random-matrix theory.
i recently ran into a vegan advocate tabling in a public space, and spoke briefly to him for the explicit purpose of better understanding what it feels like to be the target of advocacy on something i feel moderately sympathetic towards but am not fully bought in on. (i find this kind of thing very valuable for noticing flaws in myself and improving; it’s much harder to be perceptive of one’s own actions otherwise.) the fact that i am genuinely quite plausibly persuadable of his position in theory is important; if i had talked to e.g. flat earthers, one might say my reaction is just because i’d already decided not to be persuaded. several interesting things i noticed (none of which should be surprising or novel, especially for someone less autistic than me, but as they say, intellectually knowing things is not the same as actual experience):
this guy certainly knew more about e.g. health impacts of veganism than i did, and i would not have been able to hold my own in an actual debate.
in particular, it’s really easy for actually-good-in-practice heuristics to come out as logical fallacies, especially when arguing with someone much more familiar with the object level details than you are.
interestingly, since i was pushing the conversation in a pretty meta direction, he actually explicitly said something to the effect that he’s had thousands of conversations like this and has a response to basically every argument i could make, do i really think i have something he hasn’t heard before, etc. in that moment i realized this was probably true, and that this nonetheless did not necessarily mean that he was correct in his claim. and in addition it certainly didn’t make me feel any more emotionally willing to accept his argument
in the past, i’ve personally had the exact experience of arguing for something where i had enough of a dialogue tree that other people couldn’t easily find any holes, where the other people were unconvinced and i felt really confused why they weren’t seeing the very straightforward argument, and then later it turned out i was actually just wrong and the other people were applying correct heuristics.
my guess is at the extreme, with sufficient prep and motivation, you can get in this position for arbitrarily wrong beliefs. like probably if i talked to flat earthers for a while i’d get deep enough in their dialogue tree that i’d stop being able to refute them on the object level and would (for the purposes of my own epistemics, not to convince an external audience) have to appeal to cognitive heuristics that are isomorphic to some cognitive fallacies.
of course we shouldn’t always appeal to the cognitive heuristics. doing so is almost always reasonable, and yet if you always do it, you will miss out on the one thing that actually does matter. to do anything interesting you do have to eventually dig into some particular spicy claims and truly resolve things at the object level. but there are so many things in the world, and resolving them takes so much time, that you need some heuristics to reject a whole bunch of things out of hand and focus your energy on the things that matter.
like, i could invest energy until i can actually refute flat earthers completely on the object level, and i’d almost certainly succeed. but this would be a huge waste of time. on the other hand, i could also just never look into anything and say “nothing ever happens”. but every important thing to ever happen did, in fact, happen at some point [citation needed].
it’s really really irritating to be cut off mid sentence. this is hard to admit because i also have an unconscious tendency to do this (currently working on fixing this) and my guess is other people get very annoyed when i do this to them.
sometimes i do enjoy being cut off in conversations, but on reflection this is only when (a) the conversation is cooperative enough that i feel like we’re trying to discover the truth together, and (b) the other person actually understands what i’m saying before i finish saying it. but since these conditions are much rarer and require high levels of social awareness to detect, it’s a good first order heuristic that interrupting people is bad.
i found it completely unhelpful to be told that he was also in my shoes X years ago with similar uncertainties when he was deciding to become vegan; or to be told that he had successfully convinced Y other people to become vegan; or to be subject to what i want to call “therapy speak”. i only want to therapyspeak with people i feel relatively close to, and otherwise it comes off as very patronizing.
i think there’s a closely related thing, which is genuine curiosity about people’s views. it uses similar phrases like “what makes you believe that?” but has a very different tone and vibe.
his achievements mean a lot more to himself than to me. i don’t really care that much what he’s accomplished for the purposes of deciding whether his argument is correct. any credibility points conferred are more than cancelled out by it being kind of annoying. even if it is true, there’s nothing more annoying than hearing someone say “i’ve thought about this more than you / accomplished more than you have because of my phd/experience/etc so you should listen to me” unless you really really really trust this person.
the calculus changes when there is an audience.
therapyspeak is still probably better than nothing, and can be a useful stepping stone for the socially incompetent
one possible take is that i’m just really weird and these modes of interaction work better for normal people, because they think less independently or need to be argued out of poorly thought out bad takes or something like that, idk. i can’t rule this out, but my guess is normal people are probably even more this way than i am. also, for the purposes of analogy to the AI safety movement, presumably we want to select for independent thinkers with especially well thought out takes, more than just normal people.
also my guess is this particular interaction was probably extremely out of distribution from the perspective of those tabling. my guess is activists generally have a pretty polished pitch for most common situations which includes a bunch of concrete ways of talking they’ve empirically found to cause people to engage, learned through years of RL against a general audience, but the polishedness of this pitch doesn’t generalize out of distribution when poked at in weird ways. my interlocutor even noted at some point that his conversations when tabling generally don’t go the way ours went.
like, i could invest energy until i can actually refute flat earthers completely on the object level, and I’d almost certainly succeed. but this would be a huge waste of time.
I don’t think it would be that hard to refute flat earthers. One or two facts about how the sun travels, that the atmosphere bends light, and the fact that there are commercial flights crossing the poles seem like they would be sufficient to me. This probably won’t convince a flat earther, but I think you could fairly easily convince 95% of smart, unbiased third-party listeners (not that they exist).
You don’t have to go down every option in their argument tree; finding one argument they are completely unable to refute can be enough.
you mentioned sometimes people are just wrong in their arguments but think they are correct because they’ve repeated it many times. do you have examples of this from what they said?
I claim that even if the openai contract is not meaningfully weaker safety wise, it is still bad for openai to publicly signal solidarity with ant but then sign with DoW.
suppose hypothetically the only difference between the openai and anthropic contracts is that the DoW wanted a Snickers bar, and anthropic didn’t want to give DoW the Snickers bar. even then, it would be a huge dick move for openai to publicly signal solidarity, and then sign with DoW to give them the Snickers bar.
OAI genuinely outplayed Anthropic here. The critical success world for OAI would be if OAI gets good PR from “solidarity”, replaces Ant under the ~same terms, and there is enough uncertainty about Anthropic being a supply chain risk that e.g. Amazon stops providing them compute, basically killing the company.
Most of this is still on the table, because Anthropic was too concerned about appearing principled and was exploited by DoW and Altman.
I can see the story where there was a strong opportunity for a competitor here, and OpenAI successfully seized it (perhaps Google DeepMind could have as well, but I don’t view them as being as nimble as OpenAI). I don’t see a story where Anthropic had a clear alternative play that was much better, especially once the USG threatened to label them a supply-chain risk.
Anthropic could have negotiated before USG publicly threatened to label them a supply chain risk. My guess is they were mainly limited by the erosion of their own morals and by Anthropic staff quitting, and they could have acquiesced with diplomatic language even within those limitations, maybe even after the crisis went public. Claude is only getting better, so the default path is building trust with the government. They could probably have found a better stand to take later, when they have more power.
Perhaps. I think writing things into contracts is a great way to make sure that they happen, and if the counterparty is unwilling to sign them into contracts, then this is a strong sign that you won’t be able to make it happen later. It would have significantly increased the adversarial relationship between Anthropic and the USG for them to politely remove it from the contract and then work hard internally to make sure that it never got used that way. Maybe it would’ve been worth it, but I’m not convinced.
Oh, I don’t think they could have prevented USG from using Claude for mass domestic surveillance. Autonomous weapons maybe, since it’s a reliability issue the military would agree with. They would need to sacrifice their principles in order to get Claude more integrated into the government, which could be good or bad for us but would have been in Anthropic’s interest.
I have a little stored thought which sometimes triggers, and it reads:
“If you find yourself being forced to choose between two or more extremely bad options that involve burning your values, your resources, or your life, the truth is that you lost around three moves ago and are living out the equivalent of a forced mate in chess. You’ve already lost, so stop playing and find a better game to spend time on if at all possible.”
Anthropic negotiated a great deal and gave up the practical limits that are relevant to the military about using their models for cyber attacks, censorship and disinformation campaigns in the process.
I think it’s unclear that this was overall bad for Anthropic/Amodei if you factor in the reputational and ideological boost they got (“aura farming” according to roon).
I’m relatively less interested in a competitive framing between OpenAI and Anthropic (i.e. “who played it better”). First, that framing suggests there was just one game being played, whereas it seems necessary to view it as a progression of different games.
To a first approximation, my guess is by the time this popped into the public spotlight, the die was largely cast (so to speak). It was, more or less, a strategy by Hegseth to put Anthropic in an impossible bind.
Second, that kind of framing feels too much like so many news stories I read that try to fasten sports metaphors onto real world events to make juicy narratives. This isn’t a very good “reason” I admit, but it sort of explains why my alarm bells started ringing on that frame.
Personally, I first want to learn about what happened and when. After that, maybe I would try to analyze and learn lessons.
theory: a huge part of having a good social life is just taking social bids whenever they become available. examples of social bids both large and small include: deciding whether to join your friends on a roadtrip; getting to know someone you just met; getting to better know someone you bump into occasionally but usually never talk to; standing in line, seeing something amusing, and having the option to point this out to another stranger in line; saying something funny in a group conversation; following up over text with someone after meeting them; flirting; cold emailing someone on the internet; catching up with a friend.
there are a variety of reasons why we might end up not taking social bids. if you don’t have the social ability to notice opportunities to take bids, you might miss bids that you could take. if you force yourself to take bids without the requisite social ability, and end up taking bids which you incorrectly believe to exist, you might act in ways that people find weird, and burn potential connections, or intrude on people. if you are really tired or low-bandwidth or depressed or stressed, you will not want to take bids, because taking bids requires quite a lot of activation energy investment. if you are really focused on something else, like a research project, you will not take bids, because they distract your focus.
i find that when i make more successful social bids, i feel a greater sense of community and connection, and often feel happier too; and conversely, when i don’t, i feel isolated. my most common reason for not making more social bids is just being super tired or busy.
also, i feel like people around you generally mirror the kind of energy you bring to social bids; if you are super enthusiastic and friendly about taking social bids, then other people will also treat your social bids similarly. if you are having an amazing time meeting lots of new cool people, then other people will be more excited to meet you and willing to help you on your quest; conversely, if you are miserable and merely exist to complete your requisite daily activities, other people will also be hesitant to include you in their groups, or invite you to things, or spend lots of time with you. this can create a self reinforcing feedback loop.
Good post. On a detail, I’d use the word ‘opportunities’ rather than ‘bids’, which sounds like ‘offers’, whereas in various of these examples you’re not being explicitly offered a social opportunity by someone. But the situation contains an opportunity.
Do you struggle with feelings of isolation? I do sometimes, and I try to fix that by taking more social bids and proactively seeking social life. And then I immediately pull out because I get overwhelmed by social life very easily and it kinda colonizes my thought processes too much. So I’m kind of stuck in that loop of seeking more of it and then pulling out and then seeking more of it...
There’s an old parable (Schopenhauer’s “hedgehog’s dilemma”) about hedgehogs in the cold that shuffle closer and closer for warmth until they sting each other and shuffle apart again. I always thought that applies pretty well.
Hmm, part of the reason I asked is that the reasoning in your comment is the kind of cognitive process that tends to exhaust me when I have to work through it. It somehow coincides with me being more neurotic overall. So, basically, you think all that explicit stuff about social life, and you don’t feel at least a little pang of psychological pain/exhaustion? The very starting phrase (“a huge part of...”) reads like my thoughts when I’m ruminating about this stuff.
Sorry if this is a little intrusive; I’m just kind of curious, besides fishing for insights from people who might have similar thought patterns.
I’m sufficiently extroverted that if the social interaction goes well, it gives me more than enough psychological energy to pay for multiple additional social bids. obviously, this is separate from physiological energy; if I’m sleep deprived and physically exhausted, this is insufficient. but I don’t generally get that physically exhausted from social interaction, unless I’m at NeurIPS or something.
This sounds to me like a moderately positive score on the desire dimension (1st) and negative score on the capability dimension (2nd) of extroversion here.
I had such scores and my social situation is much better after I put some conscious effort towards resolving my social anxiety. (positive first, negative second is the only problematic pairing.)
This was a fun quiz. I’d not come across it before!
(I ended up with high scores on both indices, but higher on the second subscore (leadership and confidence) which surprised me a little. Overall I scored 249, which Claude reckons based on the chart they provide is about +1.5 SD among extraverts.)
I agree, taking risks and generally being a ‘yes man’ is much more likely to result in positive outcomes compared to taking no action.
But I do wonder: on average, how much are people motivated to seek connection to satisfy their actual personal needs and circumstances, and how much is it from a culture that prescribes an ‘instagram’ lifestyle and a huge friendship network as a goal to work towards?
For me, I find that shared interests are the automatic icebreaker that circumvents the awkward/social convention and risk elements and that finding a group that does/discusses what I am already interested in makes the whole thing feel effortless/natural and fulfilling.
There is also an efficiency component to the energy expenditure of making social bids. Not only can it take someone more energy to make the same (perceived) bid, but the same bid can have different effects based on social skillfulness and a bunch of other things. This can be seen as a parameter in the dynamic equilibrium that can produce the feedback loop you mention.
it’s surprising just how much of cutting edge research (at least in ML) is dealing with really annoying and stupid bottlenecks. pesky details that seem like they shouldn’t need attention. tools that in a good and just world would simply not break all the time.
i used to assume this was merely because i was inexperienced, and that surely eventually you learn to fix all the stupid problems, and then afterwards you can just spend all your time doing actual real research without constantly needing to context switch to fix stupid things.
however, i’ve started to think that as long as you’re pushing yourself to do novel, cutting edge research (as opposed to carving out a niche and churning out formulaic papers), you will always spend most of your time fixing random stupid things. as you get more experienced, you get bigger things done faster, but the amount of stupidity is conserved. as they say in running: it doesn’t get easier, you just get faster.
as a beginner, you might spend a large part of your research time trying to install CUDA or fighting with python threading. as an experienced researcher, you might spend that time instead diving deep into some complicated distributed training code to fix a deadlock or debugging where some numerical issue is causing a NaN halfway through training.
i think this is important to recognize because you’re much more likely to resolve these issues if you approach them with the right mindset. when you think of something as a core part of your job, you’re more likely to engage your problem solving skills fully to try and find a resolution. on the other hand, if something feels like a brief intrusion into your job, you’re more likely to just hit it with a wrench until the problem goes away so you can actually focus on your job.
in ML research the hit-it-with-a-wrench strategy is the classic “google the error message and then run whatever command comes up” loop. to be clear, this is not a bad strategy when deployed properly: it is often the best first thing to try when something breaks, because you don’t have to do a big context switch and lose focus on whatever you were doing before. but it’s easy to end up trapped in this loop for too long. at some point you should switch modes to actively understanding and debugging the code, which is easier to do if you think of your job as mostly being about actively understanding and debugging code.
earlier in my research career i would feel terrible about having spent so much time doing things that were not the “actual” research, which would make me even more likely to just hit things with a wrench, which actually did make me less effective overall. i think shifting my mindset since then has helped me a lot.
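For the NaN-halfway-through-training case mentioned above, a cheap first debugging step is to fail fast at the first non-finite loss and keep a little context, rather than discovering the problem hours later in a plot. Here is a minimal sketch in plain Python, assuming the loss is available as a float at each step. `LossGuard` is a hypothetical helper, not a library API. Frameworks also ship heavier tools for this, for example PyTorch's `torch.autograd.set_detect_anomaly`.

```python
import math
from collections import deque


class LossGuard:
    """Stops a training loop at the first non-finite loss.

    Keeps a short window of recent losses so the failure report shows whether
    the loss was drifting before it blew up.
    """

    def __init__(self, window=5):
        self.recent = deque(maxlen=window)  # only the last `window` losses

    def check(self, step, loss):
        # math.isfinite is False for nan, inf, and -inf
        if not math.isfinite(loss):
            raise FloatingPointError(
                f"non-finite loss {loss!r} at step {step}; "
                f"recent losses: {list(self.recent)}"
            )
        self.recent.append(loss)
```

In a real loop you would call `guard.check(step, float(loss))` right after computing the loss. The point of raising immediately is the feedback-loop argument: you get the step number while the failing state is still easy to reproduce.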
Not only is this true in AI research, it’s true in all science and engineering research. You’re always up against the edge of technology, or it’s not research. And at the edge, you have to use lots of stuff just behind the edge. And one characteristic of stuff just behind the edge is that it doesn’t work without fiddling. And you have to build lots of tools that have little original content, but are needed to manipulate the thing you’re trying to build.
After decades of experience, I would say: any sensible researcher spends a substantial fraction of time trying to get stuff to work, or building prerequisites.
This is for engineering and science research. Maybe you’re doing mathematical or philosophical research; I don’t know what those are like.
a corollary is i think even once AI can automate the “google for the error and whack it until it works” loop, this is probably still quite far off from being able to fully automate frontier ML research, though it certainly will make research more pleasant
I agree if I specify ‘quite far off in ability-space’, while acknowledging that I think this may not be ‘quite far off in clock-time’. Sometimes the difference between no skill at a task and very little skill is a larger time and effort gap than the difference between very little skill and substantial skill.
Completely agree. I remember a big shift in my performance when I went from “I’m just using programming so that I can eventually build a startup, where I’ll eventually code much less” to “I am a programmer, and I am trying to become exceptional at it.” The shift in mindset was super helpful.
More and more, I’m coming to the belief that one big flaw of basically everyone is not realizing how much you need to deal with annoying and pesky/stupid details to do good research, and I believe some of this dictum also applies to alignment research as well.
There is thankfully more engineering/ML experience on LW, which alleviates the issue partially, but still, not realizing that pesky details matter a lot in research/engineering is a problem that basically no one particularly wants to deal with.
I think there are several reasons this division of labor is very minimal, at least in some places.
You need way more of the ML engineering / fixing stuff skill than ML research. Like, vastly more. There are still a very small handful of people who specialize full time in thinking about research, but they are very few and often very senior. This is partly an artifact of modern ML putting way more emphasis on scale than academia does.
Communicating things between people is hard. It’s actually really hard to convey all the context needed to do a task. If someone is good enough to just be told what to do without too much hassle, they’re likely good enough to mostly figure out what to work on themselves.
Convincing people to be excited about your idea is even harder. Everyone has their own pet idea, and you are the first engineer on any idea you have. If you’re not a good engineer, you have a bit of a catch-22: you need promising results to get good engineers excited, but you need engineers to get results. I’ve heard of even very senior researchers finding it hard to get people to work on their ideas, so they just do it themselves.
For sure. The more novel an idea I am trying to test, the deeper I have to go into the lower level programming stuff. I can’t rely on convenient high-level abstractions if my needs are cutting across existing abstractions.
Indeed, I take it as a bad sign of the originality of my idea if it’s too easy to implement in an existing high-level library, or if an LLM can code it up correctly with low-effort prompting.
the following fictional dialogue is a complete unapologetic strawman but it’s funny enough i had to bring it into being:
“So I asked myself: where can I make the most impact? And clearly malaria is the most important area.”
“And so you decided to donate all of your money to buy malaria nets?”
“Well, so it turns out that saving lives from malaria is actually kind of expensive and indirect. You see, it costs thousands of dollars to save a life. Statistically. Who knows if you’re actually changing anyone’s life that way?”
“And so you found a more efficient way to save lives.”
“Actually, it turns out that it’s cheaper to give people malaria. It’s a lot more impactful and the technical problems are more interesting.”
“I see. Isn’t more malaria bad though?”
“I don’t know, but I find it much easier to work on because the feedback loops are much tighter. Maybe one day, if malaria gets big enough, I’ll go work on saving people from malaria. But we’re still a long way away from everyone having malaria.”
You jest, but I hold this class of strategy to be perfectly sound in principle and grossly underexplored in practice. I call it “countersteering for the cure,” and the specific example I had in mind was contagious Fatal Familial Insomnia. The idea is that if you wanted some rare/orphan disease cured for personal reasons, but lacked the resources to achieve that directly, you might be able to go in the opposite direction, and make it much more common instead, enough to trigger several orders of magnitude more funding/talent towards the cure you want.
If you did that in the US, you’d end up trying to explain to prosecutors, then a judge, then a jury why you shouldn’t be imprisoned for the rest of your life. I would hope that is roughly how it would go everywhere.
The hypothetical personal reasons would have to be compelling, yes, of course. This would not be done on a whim. You will note the disease I chose as my example was heritable – even if you hadn’t heard of it before, the name should have made that clear – which lends itself readily to obvious motivations.
Fatal Familial Insomnia that has been modified to be contagious. I seem to have misjudged how well-known prion diseases were. Would it help to consider an analogy to the phrase, “airborne rabies”?
He’s implying that somebody might be motivated to do bioterrorism if not only they had the disease but their kids did too (since it is heritable). So, I guess they would go to jail but their kids could possibly still benefit.
But I don’t understand what Shankar is otherwise saying, since this is only a strategy that can work if you’re a selfish person willing to do terrorism and massively harm the world in order to maybe somewhat advance your own interests. I can’t see something like this working for real altruistic causes; sure, in Shankar’s example you direct resources towards curing some rare disease, but this also means steering resources away from every other important problem, while also killing / harming many people, etc. It’s hard to see how the idea of forcing people to give you resources by causing (or threatening to cause) massive net harm to the world, whether by terrorism, taking hostages, robbing people, nuclear sabre-rattling, etc., is somehow “grossly underexplored in practice”, since this is basically the strategy behind all threats and conflict throughout history. It is pretty thoroughly explored, lol, regardless of the fact that a clever person can still brainstorm up some clever/unique/infeasible imaginary terror attacks that nobody has done IRL.
“to the success of our hopeless cause” is such a good toast and we should use it more often. i first learned of it from the book of the same name, and apparently it was a common refrain at gatherings of Soviet dissidents. i like it because it captures the feeling of trying really hard to succeed despite being in the basement of the logistic success curve, and somehow, despite all odds, actually succeeding in the end.
I do find it poetic, but in seriousness I think if folks don’t actually feel hopeful about what they’re doing then they should do something else—leave the work / research direction / engineering / comms / whatnot to whoever actually feels hope about it...
To elaborate, what’s poetic for me about “our hopeless cause” is that I have hope that is not cleanly legible to the outside and is easy to write off as “hopeless”. And it’s important to stay in tune with your own knowings about this stuff. I think there are very deleterious effects from throwing energy into things one doesn’t have hope in.
(...And to elaborate further, mostly I think the bad stuff happens by lending support to corrupt things. And imo being pushed to work on X while you lack hope in X is a solid flag of corruption.)
...what sort of “coordination problems” does one “solve” by doing things you don’t have hope in? I really don’t get it and am perplexed. This photo is swellingly full of hope, and presumably we got there through people that had hope in their actions. Perhaps there’s detail in the history you’re referencing that’s going over my head.
the problem was that everyone hated living in the Soviet Union and other Eastern Bloc countries, but few people were willing to stand up and protest, because doing so meant a knock on your door by men with guns who would take you away to a Siberian prison or mental institution.
the thing with protests is they are a coordination problem. to loosely paraphrase one of the dissidents from this era, if one person protests he becomes a martyr. if ten people protest they become a conspiracy. if ten thousand people protest the system has to change.
the problem is you have no way of knowing when the right moment is. under Stalin, dissent was impossible. everyone even suspected of being disloyal was instantly executed or thrown in a gulag.
after he died, Khrushchev denounced Stalin’s methods and instituted reforms, and dissent meant “only” being interrogated by the KGB, put on trial in a rigged but no longer completely farcical show trial, and sent to Siberia for only 10 years rather than being executed. this was enough easing up that the “chain reaction” started happening—people would protest, be arrested, someone would go secretly write a transcript of the trial and publish it, people would be outraged at the riggedness or the arrests of the people disseminating the transcripts, and go protest, etc. unfortunately, this never took off enough to cause mass protests, but there was no way of knowing without many people sacrificing themselves for the cause. these early dissidents also laid a lot of groundwork by creating underground networks to disseminate info (samizdat).
under Gorbachev, restrictions were loosened even further through glasnost and perestroika, because the economy of the Soviet Union was not doing well. as an unintended side effect, this pushed the R0 of protests past 1, and suddenly we started seeing huge protests in the Eastern Bloc. and then once there are tens of thousands of people marching in the streets and the secret police can’t arrest them all, the system of terror stops working entirely. by the time the Berlin wall fell, the East German government had already ceased to function for months because nobody was afraid of their terror anymore. in theory this could have happened much earlier if everyone simply coordinated, but everyone will not simply.
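The “R0 past 1” line borrows a threshold from epidemiology. As a toy illustration only (an assumption-laden branching model, not a claim about how dissent actually spread in the Eastern Bloc), suppose each protester has `k` contacts and inspires each one independently with probability `R / k`, so each protester recruits `R` new protesters on average. In such a branching process the expected total cascade started by one person is 1/(1 - R) when R < 1, while for R > 1 a sizable fraction of cascades keeps growing. The function names, `k`, and the cap are all hypothetical choices for the sketch.

```python
import random


def cascade_size(r: float, rng: random.Random, k: int = 4, cap: int = 200) -> int:
    """Total protesters in one toy cascade started by a single person.

    Hypothetical model: every active protester has k contacts and inspires each
    independently with probability r / k (mean r new protesters each).
    The simulation stops once the cascade reaches `cap` people.
    """
    p = r / k
    total, current = 1, 1
    while current > 0 and total < cap:
        # Each active protester recruits Binomial(k, p) new protesters.
        new = sum(1 for _ in range(current * k) if rng.random() < p)
        total += new
        current = new
    return min(total, cap)


def summarize(r: float, trials: int = 1000, cap: int = 200, seed: int = 0) -> tuple[float, float]:
    """Return (mean cascade size, fraction of cascades that reached the cap)."""
    rng = random.Random(seed)
    sizes = [cascade_size(r, rng, cap=cap) for _ in range(trials)]
    return sum(sizes) / trials, sum(s >= cap for s in sizes) / trials


for r in (0.5, 0.9, 1.5):
    mean, frac = summarize(r)
    print(f"R={r}: mean cascade size {mean:.1f}, fraction reaching cap {frac:.2f}")
```

The point is only qualitative: below 1, cascades reliably fizzle out; above 1, many of them run away, which is the threshold behavior the comment describes.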
This helps me appreciate the mood of where you are coming from thanks! But uh I have objections also, mostly due to our spot in the thread.
I would second CronoDas’ point that the mechanics of change aren’t quite that simple. And I’d like to complain that this is not an example of a thing that is helped by people taking actions they don’t feel hope in!
I acknowledge that the secret police setup does seem to capture the “you can’t communicate and build plans together” aspect that “coordination problem”/game theory typically evokes. I’ll note, though, that you still have a lot of communication/observation channels (including costly ones like protesting and being taken away or killed).
More importantly it seems like the robust way out of the situation is to try to build more infrastructure for being able to act with a coalition of peers in a constructive manner. Game theory as typically thrown around seems a poor model for this imo.
funny enough, at least one dissident at the time expressed that he didn’t like this toast because he wouldn’t be trying to dissent if he thought it was hopeless
I think some can have the feelings that go with hope in a cause, without actually believing the cause is likely to succeed. Cultural memes around fighting for hopeless causes (e.g. when heroes go for hail mary strategies in movies) help.
It still matters whether you truly think it’s the best shot at victory, or the best way for you to help. That’s what I see as key to preventing the various problems that you mention.
I suppose it depends on what you mean by hope. Is a person who thinks there’s a 50% chance of their project failing, but considers that better than all the alternatives, not hopeful? Or 10%?
What I worry about is, what if the people who make such a seemingly hopeless play are actually right in their worldview, and the people who have higher hopes in their play are wrong? Then a rule that dissuades hopeless people from acting lowers the overall chance of success, and that would be bad.
running the agi survey really reminded me just how brutal statistical significance is, and how unreliable anecdotes are. even setting aside sampling bias of anecdotes, the sheer sample size you need to answer a question like “do more people this year know what agi is than last year” is kind of depressing—you need like 400 samples for each year just to be 80% sure you’d notice a 10 percentage point increase even if it did exist, and even if there was no real effect you’d still think there was one 5% of the time. this makes me a lot more bearish on vibes in general.
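The “like 400 samples” figure can be reproduced with the textbook normal-approximation sample-size formula for comparing two proportions. This is a minimal sketch, assuming a two-sided test at alpha = 0.05 with 80% power and a hypothetical baseline of 50% recognition rising to 60%; the function name `n_per_group` is illustrative, and the required n shrinks somewhat for baselines far from 50%.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group for a two-sided two-proportion z-test.

    Normal approximation: pooled variance under "no change",
    unpooled variance under the true change.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical: 50% knew what AGI stands for last year, 60% this year.
print(n_per_group(0.5, 0.6))
```

Under these assumptions the answer comes out just under 400 per year, and the 5% false-positive rate is simply the chosen alpha.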
thank you for this post. “bearish on vibes” is a great phrase. i am constantly hung up on the fact that it’s not really possible to “know what normal people are like”, “know what people are like generally”, “know what the world is actually like”, without significant amounts of effort.
i think this background fact taints like… most discussion of social and ethical issues.
Why does this make you more bearish on vibes? The reason I ask is that I think of “vibes” as aggregating over a much wider (but siloed) social network and a lot more sources of information. It would be interesting to know to what extent rigorous high-n survey methods would reveal discrepancies between assumptions and reality about people’s perceptions in this and other areas to do with pressing social issues.
like, suppose i anecdotally noticed a few people be visibly confused when i said the phrase AGI in normal conversation last year, and then this year i noticed that many fewer people were visibly confused by AGI. this would tell me almost nothing about whether name recognition of AGI increased or decreased; at n=10, it is nearly impossible to say anything whatsoever.
What’s your reasoning or assumptions for why it would tell you ~nothing to witness name recognition increasing like that? I’m assuming it’s not just because “visible confusion” isn’t a perfect proxy for lack of name recognition (and vice versa).
I guess I’m more bullish on vibes being a more powerful way to gauge name recognition than you seem to be. So here is a toy model to explain why. ChatGPT was released in Nov. 2022, so prior to that we can say it had approximately zero name recognition.
I’ve never read a survey on ChatGPT name recognition. I have only my anecdotal evidence to go on. But I am extremely confident based on what I think can fairly be called “vibes” that ChatGPT has massive name recognition, at least in America, compared to November 2022. If there was a reliable way to test this, I’d be willing to wager big money on it, provided I could feel confident in my ability to truly pin a number on what I mean by “massive name recognition.” Do you agree with this model but feel bearish on vibes more generally? Are you skeptical of my perception that ChatGPT’s name recognition has exploded since its release?
sure, you can notice extremely large effect sizes through vibes. but the claim is that even for “smaller” effect sizes (like tens of percentage points, e.g. 50%->75%), you need pretty big sample sizes. obviously 0->100% doesn’t need a very large sample size.
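Flipping the calculation around shows why n=10 says so little. A minimal sketch, assuming the same two-sided two-proportion z-test at alpha = 0.05 and its normal approximation (which is crude at very small n): it estimates the chance of detecting a real change for a given sample size per year. The function name `approx_power` and the example proportions are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1: float, p2: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test, n samples per group.

    Normal approximation; ignores the negligible chance of rejecting in the
    wrong direction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)         # spread assumed under "no change"
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)  # spread under the true change
    return NormalDist().cdf((abs(p2 - p1) - z_alpha * se_null) / se_alt)


for p1, p2 in [(0.5, 0.75), (0.05, 0.95)]:
    for n in (10, 58):
        print(f"{p1:.0%} -> {p2:.0%}, n={n} per year: power ~ {approx_power(p1, p2, n):.2f}")
```

Under these assumptions a 50% to 75% shift is caught only about one time in five with 10 people per year and needs roughly 58 per year to reach 80% power, while a near 0->100% shift is caught almost surely even at n=10.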
I agree that chatgpt obviously has lots of name recognition but I do also separately think chatgpt has less name recognition than you might guess. I predict that only 85% of Americans would get a multiple choice question right about what kind of app chatgpt is (choices: artificial intelligence; social media; messaging and calling; online dating). whereas a control question about e.g. Google will get like 97% or whatever the lizardman constant dictates
Reasonable, I also don’t expect that I could pick up on a 1.5x increase in name recognition over a year based on vibes—didn’t read closely enough to notice you were talking about a 10% increase, so sorry about the time waste.
in research, if you settle into a particular niche you can churn out papers much faster, because you can develop a very streamlined process for that particular kind of paper. you have the advantage of already working baseline code, context on the field, and a knowledge of the easiest way to get enough results to have an acceptable paper.
while these efficiency benefits of staying in a certain niche are certainly real, I think a lot of people end up in this position because of academic incentives—if your career depends on publishing lots of papers, then a recipe to get lots of easy papers with low risk is great. it’s also great for the careers of your students, because if you hand down your streamlined process, then they can get a phd faster and more reliably.
however, I claim that this also reduces scientific value, and especially the probability of a really big breakthrough. big scientific advances require people to do risky bets that might not work out, and often the work doesn’t look quite like anything anyone has done before.
as you get closer to the frontier of things that have ever been done, the road gets tougher and tougher. you end up spending more time building basic infrastructure. you explore lots of dead ends and spend lots of time pivoting to new directions that seem more promising. you genuinely don’t know when you’ll have the result that you’ll build your paper on top of.
so for people who are not beholden as strongly to academic incentives, it might make sense to think carefully about the tradeoff between efficiency and exploration.
(not sure I 100% endorse this, but it is a hypothesis worth considering)
I think this is true, and I also think that this is an even stronger effect in wetlab fields where there is lock-in to particular tools, supplies, and methods.
This is part of my argument for why there appears to be an “innovation overhang” of underexplored regions of concept space. And, in the case of programming-dependent disciplines, I expect AI coding assistance to start to eat away at the underexplored ideas, and for full AI researchers to burn through the space of implied hypotheses very fast indeed. I expect this to result in a big surge of progress once we pass that capability threshold.
Or perhaps on the flip side there is a ‘super genius underhang’ where there are insufficient numbers of super competent people to do that work. (Or willing to bet on their future selves being super competent.)
It makes sense for the above average, but not that much above average, researcher to choose to focus on their narrow niche, since their relative prospects are either worse or not evaluable after wading into the large ocean of possibilities.
I agree that academia over-rewards long-term specialization. On the other hand, it is compatible to also think, as I do, that EA under-rates specialization. At a community level, accumulating generalists has fast diminishing marginal returns compared to having easy access to specialists with hard-to-acquire skillsets.
This is one of the reasons I think ‘independent’ research is valuable, even if it isn’t immediately obvious from a research output (papers, for example) standpoint.
That said, I’ve definitely had the thought, “I should niche down into a specific area where there is already a bunch of infrastructure I can leverage and churn out papers with many collaborators, because I expect that would put me in a more stable funding situation as an independent researcher. It would also make it much easier to pivot into a role at an organization if I want or need to. It would definitely be a much more stable situation for me.” (And I also agree that specialization is often underrated.)
Ultimately, I decided not to do this because I felt like there were already enough people in alignment/governance who would take the above option due to financial and social incentives and published directions seeming more promising. However, since this makes me produce less output, I hope this is something grantmakers keep in consideration for my future grant applications.
the modern world has many flaws, but I’m still deeply grateful for the modern era of unprecedented peace, prosperity, and freedom in the developed world. 99% of people reading these words have never had to worry about dying in a cholera epidemic, or malaria or smallpox or the plague, or childbirth, or in war, or from a famine, or due to a political purge. this is not true for other times in history, or other places in the world today.
(extremely unoriginal thought, but still important to acknowledge periodically because it’s easy to take for granted. especially because it’s much more common to complain about ways the world is broken than to acknowledge what has improved over time.)
There might already be a selection bias for people who read lesswrong. People who are living through war, sick with malaria, and so on may not have internet access, nor the time to become interested in lesswrong. My point is that while we are grateful, we should probably reflect on our own privileged access to peace (not sure it’s unprecedented), and recognize that a lot of work still needs to be done for people outside our circles. Additionally, bad things can happen suddenly: the country you are in might suddenly become a war zone, you might have an accident or mental health issues, or the worry about dying might be just a healthcare disaster away. Health risk, especially, increases with age.
And also I think it’s less than 99% anyway; “worry about dying in a cholera epidemic, or malaria or smallpox or the plague, or childbirth, or in war, or from a famine, or due to a political purge” is very broad. Like, surely some of the people here were at some risk during the covid epidemic? Some people are male and from countries with conscription that are currently at war or at tangible risk of war. Some people are from countries with unstable governments and have done some activism work before. Like, 75% is my guess.
(My response to you is also unoriginal but worth stating imo.)
I would prefer if you used the phrase “US geopolitical sphere of influence” instead of “developed world”. It makes it clear your take is political.
Leaders within the US govt have obviously contributed to multiple wars and genocides; you just happened to be born into a family that is not on the receiving end of any of them. Part of the reason (but not the full reason) for the economic prosperity is crude oil deals made by the US govt under threat of nuclear war.
Statements such as yours give leaders within the US govt implicit consent to continue this sort of rule over the world.
Global utility includes the above two things (first two tiers of Maslow’s hierarchy) not just counting the number of deaths (where I agree health-related deaths are the biggest bracket).
I consider US govt partially responsible for unequal distribution.
Neither the mortality-rate nor the energy-use map lines up that closely with the US geopolitical sphere of influence. (E.g. Russia and China on the one hand, Latin America on the other.)
I’m not saying the US government isn’t partially responsible for unequal distribution, but your previous comment sounds like treating it as the only or primary significant factor.
(I’m also not sure what point you’re trying to make at all with the energy-use map, given how similar it looks to the mortality-rate map.)
Russia and China obviously have significant crude oil reserves which they use domestically. They get to keep them instead of exporting to someone because they have nuclear weapons.
All of industry is ultimately based on a few resources such as crude oil, coal and water. These are then used to make steel and electricity which are then used to make industrial supplies for chemicals and so on.
So a shortage of drugs or of roads or of hospitals does indirectly tie into the energy use of the country.
“Crude oil deals made by the US govt” are responsible for a negligible proportion of global economic prosperity, which comes out of the global scientific ecosystem that has been centered in the US for nearly 100 years.
I agree the US exports a variety of goods including weapons, food, industrial products, aircraft and so on, and this gives them more money to purchase crude oil. And being on the leading edge of science and engineering for these industries enables them to make these exports in the first place.
US military protection including nuclear protection is obviously another reason why US gets favourable deals from its allies though.
I think it would be really bad for humanity to rush to build superintelligence before we solve the difficult problem of how to make it safe. But also I think it would be a horrible tragedy if humanity never ever built superintelligence. I hope we figure out how to thread this needle with wisdom.
I agree with this fwiw. Currently I think we are in way way more danger of rushing to build it too fast than of never building it at all, but if e.g. all the nations of the world had agreed to ban it, and in fact were banning AI research more generally, and the ban had held stable for decades and basically strangled the field, I’d be advocating for judicious relaxation of the regulations (same thing I advocate for nuclear power basically).
I am not really clear that I should be worried on the scale of decades? If we’re doing a calculation of expected future years of a flourishing technologically mature civilization, slowing down for 1,000 years here in order to increase the chance of success by like 1 percentage point is totally worth it in expectation.
Given this, it seems plausible to me that one should rather spend 200 years trying to improve civilizational wisdom and decision-making than specifically just try to unlock regulation on AI (of course the specifics here are cruxy).
I agree that 200 years would be worth it if we actually thought that it would work. My concern is that it’s not clear civilization would get better/more sane/etc. over the next century vs. worse. And relatedly, every decade that goes by, we eat another percentage point or three of x-risk from miscellaneous other sources (nuclear war, pandemics, etc.), which basically imposes a time-discount factor on our calculations large enough to make a 200-year pause seem really dangerous and bad to me.
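The discount that background risk imposes can be made concrete. This is a rough sketch under loudly stated assumptions, not anyone’s actual model: the non-AI risk is a constant, independent 1 to 3% per decade; a good ASI removes that background risk once it arrives; and the function names and the 50% vs 51% success probabilities are hypothetical.

```python
def survival(per_decade_risk: float, years: float) -> float:
    """P(no non-AI catastrophe during a pause), assuming a constant, independent risk per decade."""
    return (1 - per_decade_risk) ** (years / 10)


def p_good_outcome(pause_years: float, per_decade_risk: float, p_asi_goes_well: float) -> float:
    """Toy model: civilization must survive the pause, then ASI must go well.

    Hypothetically assumes a well-aligned ASI eliminates the background risks afterwards.
    """
    return survival(per_decade_risk, pause_years) * p_asi_goes_well


for years in (200, 1000):
    for risk in (0.01, 0.02, 0.03):
        print(f"{years}-year pause, {risk:.0%}/decade: survival {survival(risk, years):.2f}")

# Hypothetical: a 200-year pause that raises P(ASI goes well) from 50% to 51%.
print(p_good_outcome(0, 0.01, 0.50), p_good_outcome(200, 0.01, 0.51))
```

Under these assumptions even 1% per decade compounds to roughly an 18-point survival loss over 200 years, swamping a 1-point gain. The constant-rate assumption is the crux: a world coordinated enough to hold a long pause might also drive the other risks far down.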
I think the same world that coordinated well enough to do a centuries-long AGI pause (without heralding a dark age of negative economic growth or global totalitarianism, etc) is probably also more than capable of preventing thermonuclear war, extinction-level artificial pandemics, grey goo, etc.
At that point your biggest risks are natural x-risks (very low), authoritarian backsliding, meme wars, and some fraction of unknown unknowns.
while I agree for smaller numbers like a few decades, I don’t think I agree with a 1000 year pause.
I think (a) it’s perfectly reasonable for people to be selfish and care about superintelligence happening during their lifetime (forget future people and discount factors thereof: almost every single person alive today cares ooms more about themselves than about some random person on the other side of the planet), (b) it’s easy for “delay forever” people to basically pascal’s-mug you this way, as in nuclear power, and (c) it’s unclear that humanity becomes monotonically more wise over time (as an unrealistic example, consider a world where we successfully create an international treaty to ensure ASI is safe, and then for some reason the entire world modern order collapses and the only actors left are random post-collapse states racing to build ASI. then it would have been better to build ASI in a functional pre-collapse world order than to delay. one could reasonably (though i personally don’t) believe that the current world order is likely to fail in the coming decades and that ASI is better built now than in the ensuing chaos)
it’s perfectly reasonable for people to be selfish and care about superintelligence happening during their lifetime
Yes people are selfish, that is why you should sometimes be ready to fight against them. Point a is not a disagreement with Ben.
then for some reason the entire world modern order collapses
This is low probability on a timescale of decades, but it is an argument people can use to justify their self-serving desires for immortality as somehow altruistic.
If I understood Eliezer’s argument correctly, we can shorten those timescales by improving human intelligence through methods like genetic engineering. Once a majority of humans have von Neumann-level IQ, I think it’s fine to let them decide how to proceed on AI research. The question is how fast this can happen; it would probably take a century or two at least.
>slowing down for 1,000 years here in order to increase the chance of success by like 1 percentage point is totally worth it in expectation.
Is it? What meaning of worth it is used here? If you put it on a vote, as an option, I expect it would lose. People don’t care that much about happiness of distant future people.
Given this, it seems plausible to me that one should rather spend 200 years trying to improve civilizational wisdom and decision-making
Make it a thousand, or two thousand. To Daniel’s point, societal change is not always positive on the scale of centuries. But on the scale of millennia it is. At least the last few ones.
SI can come when we say so. On the human evolutionary timescale, 10K years is short. On a cosmic timescale, it is nothing.
i think it’s plausible humans/humanity should be carefully becoming ever more intelligent ≈forever and not ever create any highly non-[human-descended] top thinker[1]
If a superintelligence governs the world, preventing extinction or permanent disempowerment for the future of humanity, without itself posing these dangers, then it could be very useful. It’s unclear how feasible setting up something like this is, before originally-humans can be uplifted to a similar level of competence. But also, uplifting humans to that level of competence doesn’t necessarily guard (the others) against permanent disempowerment or some other wasteful breakdowns of coordination, so a governance-establishing superintelligence could still be useful.
Superintelligence works as a threshold-concept for a phase change compared to the modern world. Non-superintelligent AGIs are still just an alien civilization that remains in principle similar in the kinds of things it can do to humanity (even if they reproduce to immediately fill all available compute, and think 10,000x faster). While superintelligence is something at the next level, even if it only takes non-superintelligent AGIs to transition to superintelligence a very short time (if they decide to do that, rather than to not do that).
Apart from superintelligence being a threshold-concept, there is technological maturity, the kinds of things that can’t be significantly improved upon in another 1e10 years of study, but that maybe only take 1-1000 years to figure out for the first time. And one of those things is plausibly efficient use of compute for figuring things out, which gives superintelligence at a given scale of compute. This is in particular the reason to give some credence to software-only singularity, where first AGIs quickly learn to make a shockingly better use of existing compute, so that their capabilities improve much faster than it would take them to build new computing hardware. I think the most likely reason for software-only singularity to not happen is that it’s intentionally delayed (by AGIs themselves) because of the danger it creates, rather than because it’s technologically impossible.
Every year we don’t build superintelligence is a worse tragedy than any historical tragedy you can name. Tens of millions dead, hundreds of millions suffering, etc.
That doesn’t mean we should rush ahead, because rushing ahead is most likely far worse. But we should be aware of the cost.
I am undecided as to whether superintelligence should ever be built, and with my current knowledge and perspective, I would be fine with either outcome.
Some questions:
Has the existence of humanity up to now with no superintelligence been a horrible tragedy?
What would superintelligence allow that would otherwise be forever out of reach?
Are there not also things that humanity could or would lose forever if we did create a safe superintelligence?
Superhuman artificial general intelligence could be created this century and would likely be a significant source of existential risk. Delaying the creation of superintelligent AI (ASI) could decrease total existential risk by increasing the amount of time humanity has to work on the AI alignment problem. However, since ASI could reduce most risks, delaying the creation of ASI could also increase other existential risks, especially from advanced future technologies such as synthetic biology and molecular nanotechnology. If AI existential risk is high relative to the sum of other existential risk, delaying the creation of ASI will tend to decrease total existential risk and vice-versa. Other factors such as war and a hardware overhang could increase AI risk and cognitive enhancement could decrease AI risk. To reduce total existential risk, humanity should take robustly positive actions such as working on existential risk analysis, AI governance and safety, and reducing all sources of existential risk by promoting differential technological development.
Yet before we can pass out of that stage of adolescence, we must, as adolescents, confront an adult problem: the challenge of smarter-than-human intelligence. This is the way out of the high-mortality phase of the life cycle, the way to close the window of vulnerability; it is also probably the single most dangerous risk we face. Artificial Intelligence is one road into that challenge; and I think it is the road we will end up taking. I think that, in the end, it will prove easier to build a 747 from scratch, than to scale up an existing bird or graft on jet engines.
I decided to conduct an experiment at neurips this year: I randomly surveyed people walking around in the conference hall to ask whether they had heard of AGI
I found that out of 38 respondents, only 24 could tell me what AGI stands for (63%)
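With only 38 respondents, the 63% figure carries wide statistical uncertainty. A minimal sketch using a Wilson score interval (our choice of interval; the survey itself doesn’t compute one) puts the 95% range at roughly 47% to 77%. It captures sampling noise only, not any bias from approaching people in a conference hall; `wilson_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


lo, hi = wilson_interval(24, 38)
print(f"{24 / 38:.0%} observed, 95% interval {lo:.0%} to {hi:.0%}")
```

The interval is wide, but even its upper end sits well below, say, 90%.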
not sure, i didn’t keep track of this info. an important data point is that because essentially all ML literature is in english, non-anglophones generally either use english for all technical things, or at least codeswitch english terms into their native language. for example, i’d bet almost all chinese ML researchers would be familiar with the term CNN and it would be comparatively rare for people to say 卷积神经网络 (convolutional neural network). (some more common terms like 神经网络 or 模型 are used instead of their english counterparts, neural network / model, but i’d be shocked if people didn’t know the english translations)
overall i’d be extremely surprised if there were a lot of people who knew conceptually the idea of AGI but didn’t know that it was called AGI in english
Those who couldn’t tell you what AGI stands for—what did they say? Did they just say “I don’t know” or did they say e.g. “Artificial Generative Intelligence...?”
Is it possible that some of them totally HAD heard the term AGI a bunch, and basically know what it means, but are just being obstinate? I’m thinking of someone who is skeptical of all the hype and aware that lots of people define AGI differently. Such a person might respond to “Can you tell me what AGI means” with “No I can’t (because it’s a buzzword that means different things to different people)”
the specific thing i said to people was something like:
excuse me, can i ask you a question to help settle a bet? do you know what AGI stands for? [if they say yes] what does it stand for? [...] cool thanks for your time
i was careful not to say “what does AGI mean”.
most people who didn’t know just said “no” and didn’t try to guess. a few said something like “artificial generative intelligence”. one said “amazon general intelligence” (??). the people who answered incorrectly were obviously guessing / didn’t seem very confident in the answer.
if they seemed confused by the question, i would often repeat and say something like “the acronym AGI” or something.
several people said yes but then started walking away the moment i asked what it stood for. this was kind of confusing and i didn’t count those people.
not to be ‘i trust my priors more than your data’, but i have to say that i find the AGI thing quite implausible; my impression is that most AI researchers (way more than 60%), even ones working in like something very non-deep learning adjacent, have heard of the term AGI, but many of them are/were quite dismissive of it as an idea or associate it strongly (not entirely unfairly) with hype /bullshit, hence maybe walking away from you when you ask them about it.
e.g. deepmind and openAI have been massive producers of neurips papers for years now (at least since I started a phd in 2016), and both organisations explicitly talked about AGI fairly often for years.
maybe neurips has way more random attendees now (i didn’t go this year), but I still find this kind of hard to believe; I think I’ve read about AGI in the financial times now.
only 2 people walked away without answering (after saying yes initially); they were not counted as yes or no. another several people refused to even answer, but this was also quite rare. the no responders seemed genuinely confused, as opposed to dismissive.
feel free to replicate this experiment at ICML or ICLR or next neurips.
Was this possibly a language thing? Are there Chinese or Indian machine learning researchers who would use a different term than AGI in their native language?
I’d be surprised if this were the case. next neurips I can survey some non native English speakers to see how many ML terms they know in English vs in their native language. I’m confident in my ability to administer this experiment on Chinese, French, and German speakers, which won’t be an unbiased sample of non-native speakers, but hopefully still provides some signal.
I think if I got asked randomly at an AI conference if I knew what AGI was I would probably say no, just to see what the questioner was going to tell me.
a lot of people say things like “sure, religion might not exactly be totally true, but it has lots of benefits, and there really does seem to be a god shaped hole in many people, so who can really say if it’s good”. i think this is directionally correct but kind of cowardly.
i think the correct take on religion is first that its claims are completely and utterly false; obviously the christian god doesn’t literally exist, jesus never came back from the dead, etc. this is so overdone by the old internet atheists that it would be beating a dead horse to harp on further.
secondly, the human condition involves a whole bunch of things that are kind of sucky. for example, the fact that we only have a very short amount of time on this planet before we die forever is utterly terrifying; or, the fact that it can be very difficult to find a source of meaning to ground our motivation in, and that it really sucks to not have a reliable foundation for motivation; or, the difficulty of connecting with other people despite differences.
i claim that there is a true solution to each of these problems that involves a very difficult never ending journey of discovery of the self, understanding and connecting with your emotions, constructing intellectual frameworks, and even technological development. part of the project of civilization should be to slowly build up the cultural edifice required to solve these problems.
religion is not the true solution to these problems, but rather the hacky patch that kind of helps (you no longer fear death because you know heaven awaits you, everything is meaningful because god, and you can connect with other people over believing in god), but in exchange you contort and utterly trash your epistemics, foreclose the possibility of solving these problems with a more epistemically sane solution (because tearing out this foundation will temporarily throw you into epistemic chaos), and even if you somehow compartmentalize the epistemic distortions to religion related questions, you still become constitutionally incapable of thinking sanely about ideas like radical life extension.
tbc, not everyone should embark on the difficult path of accepting nothing but the truth. it can be soul rending, and make you very unhappy. for many people, if they couldn’t possibly make any contributions to building the glorious transhumanist future, or if they value personal happiness much more than impact, it’s better to take the easy road. but i think there is nothing deep and respectable about taking the religious road; certainly people afford it undue respect, and use the unhappiness and more difficult emotional journeys of the atheists as an unfair bludgeon without realizing what large chunks of themselves they have sacrificed.
the human condition involves a whole bunch of things that are kind of sucky. for example, the fact that we only have a very short amount of time on this planet before we die forever is utterly terrifying...
i claim that there is a true solution to each of these problems that involves a very difficult never ending journey of discovery of the self, understanding and connecting with your emotions, constructing intellectual frameworks, and even technological development
In the spirit of your post: Is not this also cope? (Except for the last bit about technological development, maaaybe.)
Like why would evolution have given you the tools to reconcile yourself to death, anomie, lack of motivation, and lack of connection? Why should “understanding and connecting with your emotions” and “discovery of the self” be an affordance in this world that lets you actually find a true solution to the human condition? Why should there be a “true solution” to such problems at all?
Like at least—if religion were true—it would make sense for a benevolent God to have created a path that would make you and those around you happy. It’s internally consistent, in some sense. But if you were made by godshatter evolution, why would there be any path that looks like “internal development” that satisfies these questions? Isn’t the null hypothesis that a “never ending journey of discovery of the self” just as much a fake-ass story as Jesus dying for your sins?
oh, there’s no reason to expect evolution to have given us the necessary tools at all. certainly i’m not claiming that emotional connection and self discovery are sufficient to solve all of the problems of the human condition. certainly you cannot live forever by being enlightened (you can stop fearing death, but i think doing so is harmful). the only ultimate solution is technology.
but in the meantime, there are some cognitive tricks that are better than other cognitive tricks. the better ones have the property that they fuck with your epistemics less for the same amount of alleviation of existential dread and such. they are also harder work than religion, because they require solving much more complicated emotional problems. the problem with religion is it solves for minimizing existential dread without trying to avoid collateral damage to the epistemics.
this post was prompted by reading books like Crime and Punishment and The Death of Ivan Ilyich which are amazing except for the parts where they worship religion. they’re not necessarily even wrong for their time—back in the day, the glorious transhumanist future was so far away that it wasn’t nearly as worth taking into consideration. but the world has changed a lot and the end times are nigh.
You’re going to have a tough time with Dostoyevsky if you think about it as “worshipping religion”. It might help to remember that he was also quite conversant with arguments for atheism, e.g. as he has Ivan espouse in Brothers Karamazov. Not to say I think you should believe like he does, but that he’s not just putting this stuff forward out of blind ignorance.
EDIT: I got some very useful feedback from my atheist friend and I’m reorganizing this post.
Here it seems that your central point is that the reality of death is scary to confront from the perspective of your sincere atheism, and that you find it noble to confront your fear of this reality. Furthermore, you are angry at religious people who seem to have taken the easy way out by choosing to believe in lies rather than (a) take on the challenge of constructing meaning and (b) face the harsh truth of the finality of death, and who do not recognize your sacrifice in taking the harder road. You imply that science (as opposed to religion) is the best way of seeking truth.
TL;DR. I think I’m an exception to your implication that all religious people have taken the easy way out, while still agreeing that this may be true for many of them. I essentially agree with your view that reality can be harsh and death is scary. I wonder if seeking the truth can be achieved purely through scientific thinking or whether religious teachings can be helpful if taken with discretion—which is my experience.
My response is as follows:
1) My personal feelings about your post
2) My areas of agreement with you
3) My doubt about finding communities to support truth-seekers
3.5) My observations about open-mindedness within religion
4) My doubt about the adequacy of human-derived scientific knowledge in removing all biases
5) Things I don’t have time to write more about
1) My personal feelings about your post
I’m religious and I was triggered by this post. I don’t think I tried to take the easy road. I had a friend who claimed to be EMF-sensitive (electromagnetic field-sensitive) and I was trying to “scientifically” disprove his disease, doing sneaky experiments to see if he really was bothered by wifi. Now I think that was a rather disrespectful thing to do, but that goes to show how much of a “militant” atheist I was. However, I agreed to join a yoga workshop from the Isha foundation because my friend claimed that I would believe him after the experience. Indeed, after 4 days, my worldview was not the same.
If you read that, thank you! I didn’t say it to try to convince you that you’re wrong about religion—I know it’s very unlikely that my experience would be compelling to you, and I don’t have many reasons to care about what you believe.
Also, apologies if you find that some of the following is poorly written or leaves out details. I have already spent 90 minutes writing this and I have to move on to some urgent tasks.
2) My areas of agreement with you
So now let me tell you all the ways I agree with you.
“the human condition involves a whole bunch of things that are kind of sucky.”
“it can be very difficult to find a source of meaning to ground our motivation in”
“the difficulty of connecting with other people despite differences”
“true solution to each of these problems...difficult never ending journey of discovery of the self.. part of the project of civilization should be to slowly build up the cultural edifice required to solve these problems.”
“religion” often functions as a “hacky patch that kind of helps...”
“religion” often causes one to “contort and utterly trash your epistemics”
“not everyone should embark on the difficult path of accepting nothing but the truth.”
All agreed!
And yet, despite the difficulties, I can wholeheartedly identify with you as someone who has tried to accept nothing but the truth.
3) My doubt about finding communities to support truth-seekers
However, the practical challenge of discerning the truth and fighting against the emotional and social gradients that oppose the truth is indeed difficult. I used to think that the LessWrong community or CFAR could be helpful for this task, but I’ve lost confidence in these communities. You can see that I rarely post here. Academia, I found, was also lacking in sincere truth-seekers. I went to Stanford for graduate school, and now I work at the National Institutes of Health.
3.5) My observations about open-mindedness within religion
I attend a protestant church, and I completely agree that most religious people are unfortunately very closed-minded. However, I believe that God also wants to help religious people become more open-minded, and He gradually leads people within traditional religions to expand their worldview. I can give you one interesting anecdote to chew on. As you probably know, the majority of Christians are quite dogmatic and believe that our faith is the only path to salvation. However, you can see several examples of prominent Christians who became more tolerant as they matured. Billy Graham is the example that stands out the most to me, as near the end of his life he stated that he believed believers from other religions might also be saved:
Well, Christianity and being a true believer—you know, I think there’s the Body of Christ. This comes from all the Christian groups around the world, outside the Christian groups. I think everybody that loves Christ, or knows Christ, whether they’re conscious of it or not, they’re members of the Body of Christ. And I don’t think that we’re going to see a great sweeping revival, that will turn the whole world to Christ at any time. I think James answered that, the Apostle James in the first council in Jerusalem, when he said that God’s purpose for this age is to call out a people for his name. And that’s what God is doing today, He’s calling people out of the world for His name, whether they come from the Muslim world, or the Buddhist world, or the Christian world or the non-believing world, they are members of the Body of Christ because they’ve been called by God. They may not even know the name of Jesus but they know in their hearts that they need something that they don’t have, and they turn to the only light that they have, and I think they are saved, and that they’re going to be with us in heaven.
Within protestant Christianity, there is a movement of progressive Christians such as Brian McLaren, author of “Faith after Doubt: Why Your Beliefs Stopped Working and What to Do About It”, who are taking a more open-minded view of faith. A female missionary, who had several generations of missionaries in her family, came to our church to talk about some of McLaren’s ideas. People in the audience were speechless because these ideas were so contrary to our prevailing dogmatism, and yet this missionary’s credentials and background, coming from several generations of serving Jesus, were unassailable.
These are some of the observations I have seen from within my faith which, combined with my own personal experience, make me convinced that religion and open-mindedness are not as incompatible as you claim.
4) My doubt about the adequacy of human-derived scientific knowledge in removing all biases
My perspective is that just as religion often creates biases that prevent us from being able to reason or think about the truth clearly, scientific culture has its own biases as well. Science as an aspirational ideal should in principle empower us to discover the truth and free us from bias, but the problem is that the scientific knowledge that we actually have access to is already contaminated with biases originating from the biased humans who have been doing the science. I can name one bias that scientific materialists may be more vulnerable to than sincere religious adherents: and that is the fear of death. Once you can think about your own physical death with reduced emotional reaction, it is possible to see more clearly some of the lies that you tell yourself out of the instinct of self-preservation. That is why Christianity teaches one to rely on Biblical beliefs and a sense of a relationship with God to overcome fear of death and to pursue the goal of cultivating humility—which in one sense is a freedom from the effect of the egoistic lies we tell ourselves as a result of our fear of death.
5) Things I don’t have time to write more about
I’m happy to go into more detail about my experiences and my research into religion and spirituality. You mention that religion can prevent people from trying to find their own meaning in life or to confront the reality of death. My experience has been that while this appears to be true of a small fraction of religious people, many believers still struggle just as much as atheists in terms of trying to find meaning or trying to accept the reality of death. Furthermore, I do not think religion, insofar as it has been humanly constructed, has been designed to spare believers from these questions, and I think that texts such as the Book of Job would support my view. And does religion spare us from having to find meaning in our lives? I am not quite sure how to answer it. Perhaps the problem of finding meaning is more challenging or urgent for an atheist. However, it still exists for the believer. While we are encouraged to trust in God’s plan for us, it’s not so simple to just stop thinking about what we are supposed to do here. Perhaps God doesn’t tell us everything, so we also have to figure it out ourselves, just like you. In some ways you could see atheism as taking the easy way out, since nothing you do has permanent consequence. From my belief system, on the other hand, there’s no decision I am justified in taking lightly, since it all has eternal consequences and my work will be evaluated by a perfect judge. Sometimes that feels like an amplification of the problem of meaning—I not only have to find the meaning of my life overall, but I have to figure out the meaning of every single moment in the context of eternity.
Do you actually have any, uhh, factual disagreements with atheists? Did all the stuff from the Old Testament happen, or is it all weird and misleading metaphor? Was Jesus a spellcaster or just an inspired guy with awesome ideas? Did God micromanage the creation of humans or did he outsource it to a meatgrinder of evolution?
I know this is a quick take but I think perhaps too quick. I suggest first clarifying just what you aim at with the term religion. Are we talking about the organized religions and all their canon or a general belief and faith in some supernatural entity?
first that its claims are completely and utterly false; obviously the christian god doesn’t literally exist, jesus never came back from the dead, etc.
seems a very strong claim. There are modern cases of people who were declared dead and then returned to life; the point here is what technology is available to establish death and recovery. That is not to say that being dead for three days and then alive is not an extreme claim deserving of skepticism. But is that an issue for those with faith or orthodox religion, or a case of historical inaccuracy?
I also find the concept of truth rather complicated in this area. Can we really know the truth about something that is supposed to exist outside our universe and its laws?
I see you posted in reaction to some readings. Just curious, have you ever seen “The Hogfather” (https://en.wikipedia.org/wiki/Terry_Pratchett%27s_Hogfather)? It’s a pretty quirky movie, but one of its lines has always stuck with me and seems very related to your post. It is actually something of a retelling of a Christmas story; the Hogfather is basically Santa Claus. At the end, Death (the character) says “If they cannot believe the little lies, how can they ever believe the big lies?” The point being that perhaps a lot of what we see as moral and right is just BS, but very valuable both socially and individually.
“The real Magic was friends we made along the way!”
“Wrong. FIREBALLLL *explosion* ”
People really believe there is a God, it’s not fair to redefine it to point to some Leviathan-like thing which arises from people acting like it breathes down their necks. For one thing, the religious people would say that you are wrong in general and about their position in particular.
When you say “the religion is false” what is your probability on this statement (and why such probability)? (If you want you can separately give probability on the falseness of particular religion and on the existence of some “Higher Power” at all. “Simulation argument” and “Cosmic zoo” definitely count as Higher Power. )
religion is not the true solution to these problems, but rather the hacky patch that kind of helps (you no longer fear death because you know heaven awaits you, everything is meaningful because god, and you can connect with other people over believing in god), but in exchange you contort and utterly trash your epistemics, foreclose the possibility of solving these problems with a more epistemically sane solution (because tearing out this foundation will temporarily throw you into epistemic chaos), and even if you somehow compartmentalize the epistemic distortions to religion related questions, you still become constitutionally incapable of thinking sanely about ideas like radical life extension.
I don’t think this is necessarily true. There are some very well educated and knowledgeable people who do believe in such things, but whose epistemics otherwise seem unaffected. Currently we have no radical life extension technologies, but what we do have seems to still be used by most religious people. Yeah, there are some who would rather die than receive a blood transfusion, but on the other hand the pope himself receives top-notch medical care, and so do most religious people who can afford it. Most people never learned that “beliefs should pay rent”, and a lot of their weird spiritual beliefs are more of an “idle wheel … can be turned though nothing else moves with it, is not part of the mechanism” (semi-paraphrasing Wittgenstein).
An anecdotal experience I have on this: a spiritual person described how she had multiple times foreseen things in her dreams that later came to pass. When I started asking how she had acted on those, or proposing tests she could run or benefits she could get from them, she was utterly surprised, never having considered that spiritual experiences could be used for anything other than talking about them.
I think that is the case for most religious people. Going to church won’t stop them from replacing their organs with cloned ones and reprogramming cells to slow aging. They will still say their prayers and thank god for it.
when i was new to research, i wouldn’t feel motivated to run any experiment that wouldn’t make it into the paper. surely it’s much more efficient to only run the experiments that people want to see in the paper, right?
now that i’m more experienced, i mostly think of experiments as something i do to convince myself that a claim is correct. once i get to that point, actually getting the final figures for the paper is the easy part. the hard part is finding something unobvious but true. with this mental frame, it feels very reasonable to run 20 experiments for every experiment that makes it into the paper.
random thoughts on analytical and emotional intelligence
one thing that I think the world needs more of is analyses into the nature of the mind by people who are both rigorous/analytically inclined, and also emotionally intelligent/integrated. much writing from the former fails to model large parts of the human mind, and much writing from the latter fails to create models of sufficient clarity and validity.
I think this underlies a lot of my instinctive dislike of humanities work. people who are emotionally perceptive but not rigorous and analytical tend to notice interesting things about the human experience, but then come up with very poor models that set off all of my bullshit sensors that are attuned to rigorous arguments. but I think it should be possible to have humanities work that is not like this.
(for clarity, from here out I will say analytical and emotional to refer to the axes which are independent of each other, and ABNE (analytically but not emotionally intelligent) and EBNA for the converse)
(I also want to clarify that I don’t think of analytical as being in opposition to intuition, at least in the context of this post. something something Terence Tao’s post about how the best mathematicians start out thinking in rigor before developing the intuitions to think without applying rigor all the time, but their intuitions check out rigorously when needed)
because there’s a strong anticorrelation between analytical inclination and emotional integration, it’s easy to round this off to a single axis. but I think this oversimplifies too much.
analytical people like to construct typologies and categorizations that cleanly describe the world. edge cases are very important because in a lawful world, thinking about the edge cases teaches you a lot about the laws of the world, which in turn gives you deep understanding that is surprising but robust (physics is the poster child for this worldview). analytical people are very aware that it’s easy to make theories that sound nice but aren’t actually good; it’s important to have rigorous standards for testing theories; indeed, it’s important to have rigorous meta-theory (epistemology) for what counts as an actual rigorous standard and not just Rigorous Vibes. analytical people care a lot about subtle but important distinctions—for example, correlation vs causality; or thinking of high dimensional data as having large principal components vs fundamentally low dimensional data being embedded in high dimensional space; or true anticorrelations vs collider bias; or the distinction between unmodelability and noise; or biased vs high variance estimators. in some sense the Sequences is a compendium of examples from the analytical worldview.
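to make the “true anticorrelations vs collider bias” distinction concrete, here is a toy simulation. everything in it is a made-up assumption for illustration (standard-normal trait scores, the trait names, the selection threshold), not data about real people. two traits are drawn independently, so their true correlation is zero; then only people whose combined score clears a bar are kept, the way a filter like “notable enough that you ever encounter them” might act.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def collider_demo(n=100_000, threshold=1.0, seed=0):
    """Return (population r, selected-subgroup r, subgroup size).

    Both traits are hypothetical standard-normal scores drawn independently,
    so any correlation inside the subgroup is produced by the selection step.
    """
    rng = random.Random(seed)
    analytical = [rng.gauss(0.0, 1.0) for _ in range(n)]
    emotional = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r_population = pearson(analytical, emotional)

    # Keep only people whose combined score clears the bar (the collider).
    kept = [(a, e) for a, e in zip(analytical, emotional) if a + e > threshold]
    kept_a = [a for a, _ in kept]
    kept_e = [e for _, e in kept]
    r_selected = pearson(kept_a, kept_e)
    return r_population, r_selected, len(kept)


if __name__ == "__main__":
    r_pop, r_sel, size = collider_demo()
    print(f"population r = {r_pop:+.3f}")
    print(f"selected r   = {r_sel:+.3f} (n = {size})")
```

with these settings the full-population correlation comes out near zero while the selected subgroup shows a clearly negative one (around -0.6), so an observed anticorrelation on its own doesn’t tell you whether the traits are truly anticorrelated or whether you are only looking at a selected slice of people.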
emotionally intelligent people tend to be good at modelling themselves and the people around them. they can sense subtle social cues. they can notice that certain attributes or actions of people reveal much about their internal state, where ABNE people would see only noise. they tend towards the arts and forms of creative expression that communicate and allow empathizing with emotional state. they are attuned to narratives and worldviews as a thing that shapes the actions of individuals and groups. they have fewer unresolved internal conflicts between their parts. they can spot emotional/motivational contradictions in the minds of other people that end up driving actions; this is inferred not just from what others say, but from what they don’t say, or what they hesitate to say.
ABNE people tend to think of EBNA people as dumb or irrational. they think of EBNA people as snobbish about the humanities and their ideas as insight porn, the result of an ingroup signalling game gone awry. ABNE people tend to construct sophisticated models of themselves and other people that lack a huge chunk of important inputs, making certain kinds of things really hard to model, and requiring way more intellectual capability to achieve the same accuracy as emotionally intelligent people. this frequently leads to large swathes of human behavior being chalked up to noise, or irrationality, or even completely giving up on modelling entire facets of humanity. ABNE people tend to have major issues with internal coherency between their parts.
EBNA people like to think of ABNE people as boring nerdy nitpickers. they spot the internal incoherency of ABNE people but undervalue the analytical component. oftentimes they phrase beliefs in forms that contain some important true observation, but can’t put it in the form of something rigorous and testable, and so it gets lost in the oceans of insight slop sloshing about in the world. EBNA people notice lots of signals but fail to put them together into a coherent, effective worldview, and instead coarsely cluster observations together into simple heuristics.
this post is unapologetically written as an analytical way of trying to get the best of both worlds, because this is how I think natively. perhaps in the future I’ll try to make a more balanced / broadly digestible version.
Another extreme advantage of the “Renaissance man” is the ability to clearly *convey* emotional learnings to others (especially those without strong emotional intelligence). Typically, EI is won through interaction and, essentially, reinforcement learning on contact with others; possessing both the technical vocabulary and an understanding of human social norms allows you to explain very tricky things directly to nerds who have a tough time learning them. This is extremely useful in, e.g., workplaces or high-stakes environments (a good manager can quickly untangle a mess of arguments), and arguably underappreciated in therapists and similar vocations.
I know this is a bit old, but I think about this a lot and thought I’d leave some of my ideas.
Here goes. Attention warning: 600+ words. I have cut it down quite a bit but have the edits saved if you are interested.
(1) It is important to remember EBNA-style thinking has its own intellectual history.[1] An “emotionally sensitive person” from the 17th century would be totally different (in actions, skills, self-perception, and social perception) from one today. To the extent modern EBNA people think of their intuitions as having unique epistemic and intellectual utility, they are probably following the “romantic” movement. Call them “Humanities-EBNA types”.
Romanticism is explicitly committed to irrationality; a lack of systemisation; incompleteness; and the unreliability of generalisation, abstraction, causality, predictability, legibility, and order. For the Romantic, a map not only fails to represent the territory, but the process of mapping destroys it.[2] Obviously this is to analytical rigour as oil is to water.
The rationalist tendency is to think of social intuition as a potent, but ultimately reproducible, heuristic. The intuition of EBNA people is like, as you say, mathematical intuition or intuitive grip strength when making a cup of coffee. Is there a way to learn these skills systematically...? Call these types “STEM-EBNA types.”
(2) STEM-EBNA people believe in the utility (and necessity) of interdisciplinary study. (In terms of intellectual history, this is straightforwardly a descendant of 18th century Enlightenment views.)
In practice, you end up with highly interdisciplinary social science which brackets messy human subjectivity into shape-rotatable boxes and then operates on them systematically. The extent to which there is signal loss depends on (A) your goals and standards for signal/noise; and (B) the complexity of the qualia in question being bracketed.
This is presumably the kind of research you are interested in, and in which you think EBNA skills could be used to great labour-saving effect.
(3) You will have noticed such bracketing appears incompatible with the romantic, Humanities-EBNA type of thinking. Love is not just a chemical in the brain… I am a naturalist myself, so for me the question is how much signal loss occurs when you bracket messy romantic feelings like “love”, “sense of justice” or “loyalty.” Some humanities scholars say the loss is 100% because every feeling is radically unique. A more reasonable test for signal loss:
is the subjective meaning lost when bracketing perceptible to the average person, such that it alters any of their recordable behaviour?
There is no reason this cannot change over time: as our methods get subtler and more fine-grained the signal loss decreases. That said, we should remember what is lost could be numerically insignificant but sometimes causally, or intellectually, of great importance. The most direct expression of this heuristic is Cloud Atlas. See also “the unreasonable effectiveness of small groups with radical ideas.”
Two extra points of interest.
The first is that romanticism has penetrated very deeply into the modern (western?) mind. Even the most thoroughgoing naturalists—say, Gwern—remain at least emotionally attached to a romantic residual present in great art. We like to believe in the ineffable human experience which cannot be explained away by regression models or evo biology. Human after all.
The second is that there is another group committed to a materialistic, rational view of human behaviour: Marxists. It is an open question whether Marxist Russians and Chinese people deeply internalise this materialism. Perhaps they think in a thoroughly ‘disenchanted’, systematic manner and their thought, as a result, does not produce the same problem of signal loss. Are they merely unconverted romantics...?
Much of this is from Isaiah Berlin’s essays in The Crooked Timber of Humanity and The Roots of Romanticism. Worth reading if you want to learn about the origins and structure of current EBNA thinking.
I think part of the trouble is the term “emotional intelligence”. Analytical people are better at understanding most emotions, as long as the emotions are small and driven by familiar dynamics. The issue is the biggest emotions or when the emotions are primarily driven by spiritual factors.
(I at least suspect this is my comparative advantage. But I’m not good at communicating [insights], a skill that comes neither with <analytical rigor> nor with <high-res introspective access>.
It also seems like the <after controlling for situational factors, status psychology explains more than half of variance in human behavior> camp is essentially right, which makes most genuine discussion less pretty than most people would prefer, especially those with less introspective insight.
I (somewhat predictably, given my status incentives) hold that this is an important, central problem civilization has, bc mutual information is the fundament of cooperation, or expressed more concretely the better we model each other the easier it is to avoid common deception & adversity attractors.)
hendrycks recently published a paper introducing a new moral theory. the paper contains this insane table, which claims that you should value a foreign stranger at 3e-12 times the value you assign to yourself. even setting aside the fact that this is apparently supposed to be a prescriptive theory, even as a descriptive theory, i think this is utter madness.
the core problem is that it assumes if x% of your total caring is assigned to people other than yourself, then you must give away x% of your wealth to be consistent.
the argument goes that since most people don’t give away more than say 50% of their wealth, then if there are 1e-10 people then each one can only get a tiny sliver of your caring.
but this is wrong, because there is no simple relationship between the % of your caring to be about other people and the % of your money you should give away. i think you should care about random strangers closer to 1e-3 than 1e-12. if you care about each stranger x times as much as yourself, you should keep giving away money to the person who is most in need until each marginal $ helps them more than x times as much as each marginal $ helps you.
if x = 1e-12, then you’re saying you won’t give a single dollar to charity until you have so much money that a dollar helps the stranger more than a trillion times as much as it helps you; or, because a trillion dollars is always worth less than a trillion times the value of a dollar due to concavity, you won’t give a single dollar to charity until it would help them more than giving you a trillion dollars, which probably won’t be true until you are absurdly wealthy.
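A minimal sketch of the marginal giving rule above, assuming log utility for money (u(w) = ln w, so a marginal dollar is worth 1/w to someone with wealth w) and made-up wealth figures. The function names, the $100 giving step, and the $1,000 recipient wealth are hypothetical illustrations of the argument, not anything from the paper. Under log utility, when there are more needy recipients than your money can meaningfully lift, giving stops once 1/kept = x/w_r, i.e. you keep w_r/x.

```python
import heapq


def donate_greedily(wealth, recipients, x, step=100.0):
    """Hypothetical greedy version of the marginal giving rule.

    Assumes log utility, so a marginal dollar is worth 1 / w to someone with
    wealth w. While you have more than `step` dollars, give `step` to the
    poorest recipient as long as x / (their wealth) > 1 / (your wealth),
    i.e. the dollar helps them more than 1/x times as much as it helps you.
    Returns (kept, given).
    """
    heap = list(recipients)
    heapq.heapify(heap)  # min-heap: heap[0] is the poorest recipient
    given = 0.0
    while wealth > step and x / heap[0] > 1.0 / wealth:
        heapq.heapreplace(heap, heap[0] + step)  # poorest receives `step`
        wealth -= step
        given += step
    return wealth, given


def keep_threshold(recipient_wealth, x):
    """Closed form when needy recipients are so numerous that giving barely
    raises their wealth: you stop once 1 / kept == x / recipient_wealth."""
    return recipient_wealth / x
```

With recipients at a hypothetical $1,000, x = 1e-3 means keeping about $1,000,000 and giving away the rest (the greedy version starting from $2,000,000 with 20,000 such recipients stops at roughly $1,000,000 kept), while x = 1e-12 means keeping $1e15 before giving anything, which is the "absurdly wealthy" point.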
I might just be missing something here, but as presented it does seem a bit like Hendrycks’ arguments, in addition to being philosophically weak, also misunderstand some concept in basic calculus.
That said, I think while your model is (much) better, and closer to my own, it also is descriptively not amazing. I think in practice when people (including myself!) learn that charities are 10x more or less effective than they previously thought, they rarely adjust their giving substantially so that their new donations are in line with marginal utility.
I also think people’s actual decisions are often connected to framing effects, reciprocity norms, etc, rather than pure utility. For example I think if you’re a tourist your willingness to potentially sacrifice your life for a large group of random foreign strangers near you[1] is probably much higher than while you’re sitting at home an ocean away[2].
Ex 1: you’re driving a rental SUV, your brakes don’t work and you have the option of careening either into a group of schoolchildren or off a cliff. Compare that to willingness to donate money or a kidney. Ex 2: rushing into a burning building.
Assuming these are real situations, ignoring bravado/cheap talk etc. Obviously in non-serious hypotheticals people may say things they won’t actually do.
i broadly agree that most people aren’t thinking about this at all. my model is mostly prescriptive, under the constraint that it prescribes actions that are vaguely close to what people do in practice (as opposed to utilitarianism which gives away every penny you have to charity, or hendrycksism which lets billions die if it means you get a really nice apartment.)
Having now skimmed the paper/read some parts a bit more carefully, one thing I do appreciate about it is the attempt to modus tollens Parfit’s ideas about personal identity. I think that’s a worthwhile angle, and more practically useful these days than using Reasons and Persons to dissuade people of egocentricity, which afaict is closer to Parfit’s original goals.
That said, I don’t think this particular implementation makes a lot of sense.
The Shapley mutual information also does way more “heavy lifting”, to quote a favored AI phrase, than the paper wants to imply.
Finally, the whole idea is pretty crazy if you think about it. Is it actually rational to value yourself more than the rest of humanity combined? Is this actually consistent with most people’s endorsed preferences? This seems implausible!
People on average do seem to value themselves more than the entire rest of the world, perhaps with an exception for their closest friends and family.
I think a lot of people would sacrifice their lives for the rest of humanity, and a lot wouldn’t. I think by revealed preferences, there’s a wide distribution around self = 8 billion strangers. Other measures of revealed preferences, like donations, seem to roughly agree.
Whether that’s rational in the sense of being logically consistent is debatable. Arguments like Parfit’s are typically not considered convincing in making people a lot more utilitarian, including to me, but that could be caused by motivated reasoning.
I think partially motivated reasoning but a lot of it is a defense mechanism. Like if somebody you’ve never met before tells you to donate half your money to charity, you might (correctly!) infer that they do not have your best interests at heart. Regardless of whether they’re a fancy-sounding Oxford academic, a carpenter’s stepson from the Middle East, or your local internet philosophy & rationality blogger.
So I’m not at all surprised that people aren’t convinced by these arguments, nor do I (at a sufficiently high level of abstraction) believe that they ought to be.
When I say motivated reasoning, do you think that means it’s conscious and strategic? I worry it’s used that way more than the academic and IMO more important usage.
Aren’t you dividing twice there, since you: 1) single out a stranger (thus dividing the amount you care about the average stranger by their number) 2) then apply Hendrycks’ central number to that stranger (where now you should be applying the pooled number, since you’re already ignoring all the other group members)
So I think this is in fact pretty close to your intuition if interpreted correctly (you say 1e-3, Hendrycks says 1e-2).
i don’t understand what you mean. the central column is saying i should care about myself 0.576 much, and Bob from Randomland 1.6e-12 much. where am I dividing twice?
my version of this table would say 1e-7 for self and 1e-10 for random person; the crux of my argument is that the ratio between the two is vastly more important than the absolute fraction of your caring a stranger occupies.
If I understand Hendrycks’s logic here, then caring 1/1000 as much about a random stranger as about yourself means you care several million times more about all random strangers combined than about yourself, which you don’t seem to be saying?
not OP, but that seems like a pretty reasonable conclusion. if i had to sacrifice my own life to save every person i didn’t personally know (ie. 8.1 billion people), i would absolutely do it in a heartbeat. i would also do it to just save a fraction of those people (8M people). only once it starts getting down to much smaller fractions (saving 100-3 random people) does it start seeming like a hard tradeoff.
Sorry to be edgy, but there are situations with options to sacrifice more than your life. I bet you have that limit. It’s just higher than your life.
Another point here is that people are not that unified over time? Like, you can press some button that all subsequent yous would sincerely curse you for. “Nooooo, the infinite torture dimension turned out to be a bit much!” (quote from a pretty selfless human).
i agree that i would rather die instantly than live for 100 years of torture. i don’t think that proves as much as you think. i also think it’s fine for some people to make morbid utility calculations like these, and for others to say “i don’t want to think about that and i’m not going to answer”
Well, sure. You are fine with thinking about sacrificing your life and proudly announcing that, but anything more is too much to even talk about? Morbid calculations for me but not for thee
I just think you are wrong on your self model here. Like, I’m doubtful you would be able to even saw your hand off without anesthesia, and it’s not any years of torture, it’s like 10 minutes of mild torture. A lot of people would bail on this, including me, and you are claiming what, to be unusually willing to sacrifice stuff, up from the prior?
not a lot of people (maybe literally 0) have had sufficient reason to saw off their own hand for altruistic reasons. i’ve donated a kidney, donate blood often, and gave more than the GWWC pledge when my income was high. any falsifiable claims you’d like to check while we’re speculating about my values?
my point is that % of caring is not a coherent concept, or at least not the one that maps onto the intuitive notion of what % of your wealth you should donate.
specifically, suppose instead of there being 1e10 people, there were 1e100 people. i claim the % of your money you should donate should basically not change at all, even though the % of caring assigned to yourself has plummeted by a huge amount
eh, i think there’s a wide range of reasonablish values that i wouldn’t have objected to. like i think it is vaguely defensible to care anywhere between 1e-1 and 1e-6 ish times as much about a stranger as yourself. 1e-12 is hugely outside that window.
It is amazing that a paper that is essentially just a vaguer form of Hamilton’s Rule only cites him once.
As it stands, I think the table is incorrect but “right” in the sense that it really depends on which random constants you assign to these calculations, and I can’t find any evidence of a careful selection in the paper or in his code.
Fwiw: Reading all comments here (though still w/o working link to hendrycks’ article), I’m rather convinced we’re mainly circling around questions of definition.
OP leogao = caring in the actual natural sense
Rest: some special definition that I don’t fully understand, that so far doesn’t seem super interesting (to me), and that indeed leads to a “batshit insane” result when read with the natural meaning of “degree of caring” instead of hendrycks’ seemingly different concept
i think you should care about random strangers closer to 1e-3 than 1e-12.
In the table you reproduce, see the entry in the second column (“Count”) for “Foreign Stranger”: 7.8 x 10^9. For Hendrycks, Foreign Stranger doesn’t mean a particular individual that you encounter, it refers to every human being outside your nation or culture, whether or not you ever heard of them. So you can’t assign each of them a significance of 1/1000th of your caring; that exceeds your resources by a factor of a million.
i don’t think you understood my argument. i didn’t say you assign each of them 1/1000th of your total caring. i said you should assign each of them 1/1000th as much caring as you assign yourself. so you should occupy 1000⁄7 billion of your caring, and Bob from Randomland occupies 1⁄7 billion of your caring.
the entire point of my argument is it actually doesn’t matter what % of your own caring you take up. that’s not the relevant thing. the relevant thing is how much you care about each stranger relative to yourself, and the shape of your money utility curve.
I think it’s wrong that humans empirically value themselves as 1/10000000 of the rest of humanity. I guess I see your point, that you have some budget of caring, and on occasion you are willing to dispense quite a lot of it to a single stranger. But you would not dispense 99.9999% of your caring to all the strangers combined.
% of your caring is a flawed metric that doesn’t mean anything though! even if there are 1e100 strangers out there, as long as your caring about each individual relative to yourself is still 1/1000, the fraction of your money you’re willing to donate remains constant!
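A small sketch of why the normalized share is the wrong knob, under the same kind of assumptions as the original post (log utility for money, each stranger weighted r relative to yourself, every stranger starting at a hypothetical $1,000, and more needy strangers than your money can meaningfully lift). If you weight yourself 1 and each of N strangers r, then rescale to sum to 1 (one reading of the table's normalization), your share is 1/(1 + N·r), which collapses as N grows; the wealth at which the marginal rule stops giving, w_r/r, never mentions N. Function names are hypothetical.

```python
def self_share(n_strangers, r):
    """Your slice of 'caring' if you weight yourself 1 and each of n strangers r,
    then rescale the weights to sum to 1 (one reading of the table's normalization)."""
    return 1.0 / (1.0 + n_strangers * r)


def kept_wealth(recipient_wealth, r):
    """Wealth at which the marginal rule stops giving under log utility, assuming
    more needy strangers than your money can meaningfully lift. Note that the
    number of strangers does not appear anywhere."""
    return recipient_wealth / r


if __name__ == "__main__":
    r = 1e-3  # caring per stranger relative to yourself
    for n in (7e9, 1e100):
        print(f"N={n:.0e}  self share={self_share(n, r):.2e}  "
              f"kept=${kept_wealth(1_000, r):,.0f}")
```

For r = 1e-3, the self share goes from about 1.4e-7 at N = 7e9 (the "1000/7 billion" figure) to about 1e-97 at N = 1e100, while the kept wealth is $1,000,000 in both cases.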
You have some pool of caring you are willing to donate; then in the case where all other humans need a donation, they will each receive pool/total_pop. Then you care about each of them as pool/total_pop.
Like, if one encounters an opportunity to donate to a single stranger who needs it, people go above that pool/total_pop, but it doesn’t mean they would give more than total of pool in previous case. The scaling is weird.
i don’t understand what in my original post you disagree with. there is no such thing as a fixed pool of caring, i don’t even know what that means. the actual constraint is you have some finite amount of money. money is not the same as caring because each dollar is worth a different amount depending on how much money the recipient has. caring is just a multiplier on how much other people’s happiness is worth to you compared to your own happiness. if some of your dollars will bring so much more happiness to someone else (eg by saving their life) than yourself (eg by buying a slightly larger apartment) that it outweighs the fact that you don’t care about them as much as yourself, then you should give that dollar away. otherwise, you shouldn’t.
when I read “caring” + the table I assume something roughly equal to “percentage of attention/money/other resources spent”, otherwise how would you normalize caring to 1 (as is done in the table)?
random brainstorming ideas for things the ideal sane discourse encouraging social media platform would have:
have an LM look at the comment you’re writing and real time give feedback on things like “are you sure you want to say that? people will interpret that as an attack and become more defensive, so your point will not be heard”. addendum: if it notices you’re really fuming and flame warring, literally gray out the text box for 2 minutes with a message like “take a deep breath. go for a walk. yelling never changes minds”
have some threaded chat component bolted on (I have takes on best threading system). big problem is posts are fundamentally too high effort to be a way to think; people want to talk over chat (see success of discord). dialogues were ok but still too high effort and nobody wants to read the transcript. one stupid idea is to have an LM look at the transcript and gently nudge people to write things up if the convo is interesting and to have UI affordances to make it low friction (eg a single button that instantly creates a new post and automatically invites everyone from the convo to edit, and auto populates the headers)
inspired by the court system, the most autistically rule following part of the US government: have explicit trusted judges who can be summoned to adjudicate claims or meta level “is this valid arguing” claims. top level judges are selected for fixed terms by a weighted sortition scheme that uses some game theoretic / schelling point stuff to discourage partisanship
recommendation system where you can say what kind of stuff you want to be recommended in some text box in the settings. also when people click “good/bad rec” buttons on the home page, try to notice patterns and occasionally ask the user whether a specific noticed pattern is correct and ask whether they want it appended to their rec preferences
opt in anti scrolling pop up that asks you every few days what the highest value interaction you had recently on the site was, or whether you’re just mindlessly scrolling. gently reminds you to take a break if you can’t come up with a good example of a good interaction.
argument mapping is really cool imo but I think most attempts fail because they try to make arguments super structured and legible. I think a less structured version that lets you vote on how much you think various posts respond to other posts and how well you think it addresses the key points and which posts overlap in arguments would be valuable. like you’d see clusters with (human written and vote selected) summaries of various clusters, and then links of various strengths inter cluster. I think this would greatly help epistemics by avoiding infinite argument retreading
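For the weighted sortition item in the court-system idea, a minimal sketch of just the random draw, using Efraimidis-Spirakis weighted sampling without replacement (a swapped-in standard technique, not something the idea specifies). How the weights get set, and any game theoretic / schelling point machinery on top, is left open; the function name and the candidate names and weights in the test are hypothetical.

```python
import heapq
import random


def weighted_sortition(weights, k, seed=None):
    """Draw k distinct judges at random, favoring higher-weight candidates.

    Efraimidis-Spirakis weighted sampling without replacement: each candidate
    gets the key u ** (1 / w) with u uniform on [0, 1), and the k largest keys
    win. With k = 1, candidate i is picked with probability w_i / sum(w).
    """
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / w), name) for name, w in weights.items()]
    return [name for _, name in heapq.nlargest(k, keyed)]
```

Passing a published seed (for example, one derived from a public random beacon) would let anyone re-run the draw and check it, which seems like the kind of property a trusted-judges scheme would want.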
things the ideal sane discourse encouraging social media platform would have: [...]
opt in anti scrolling pop up that asks you every few days what the highest value interaction you had recently on the site was, or whether you’re just mindlessly scrolling. gently reminds you to take a break if you can’t come up with a good example of a good interaction.
Cynical thought: these two points might be incompatible. Social media thrives on network effects, and one requirement for those is that the website be addicting or attention-grabbing. Anti-addictiveness designs are nice in principle, but then your prospective users just spend their time on something that’s more addicting instead (whether other websites or Netflix or whatever), and thus can’t benefit from the other ways in which your site is better.
I’m so torn about “for like 75% or maybe 99% of humans, the chatbot saying ‘are you sure you want to say that?’ is probably legit an improvement. But… it just feels so slippery-slope-orwellian to me.” (In particular, if you build that feature, you need to be confident not only that the current leadership of your company won’t abuse it, but that all future leadership won’t either, and that the AI company you’re renting models from won’t enshittify in a way you don’t notice)
(I am saying this as, like, a forum-maintainer who is actually taking the idea seriously and trying to figure out how to get the good things from the idea, not just randomly dunking on it. Interested in more variants or takes)
to be clear I explicitly decided not to think too hard about this kind of issue when brainstorming. I think the long run solution is probably something like an elected governance scheme that lets the users control what model to use. maybe make it bicameral to split power between users and funders. but my main motivation for this brainstorming was to think of ideas I could implement in a weekend for shits and giggles to see how well they work irl
I lean towards not using models directly as “conversation participants”, which feels too likely to spiral out of control, but instead do things like have white-listed specific popups that they decide when to trigger.
IMO, part of the solution to endless scrolling is to not implement the feature where you can endless scroll. Instead, have an explicit next page button after some moderate amount of scrolling. (Also having the pop up is good, you could even let people program the pop up to be more frequent etc.)
there’s a broader category of things which are not literally scrolling but still time wasting / consuming info not to enrich oneself, but to push the dopamine button, and I think even removing the scroll doesn’t fix this (my phone is intentionally quite high friction to use and I still fail to stay off of it)
I wonder if anyone has ball-park figures for how much the LLM, used for tone-warnings and light moderation, would cost? I am uncertain about what grade of model would be necessary for acceptable results, though I’d wager a guess that Gemini 2.5 Flash would be acceptable.
Disclosure: I’m an admin of themotte.org, and an unusually AI-philic one. I’d previously floated the idea of fine-tuning an LLM on records of previous moderator interactions and associated parent comments (both good and bad, us mods go out of our way to recognize and reward high quality posts, after user reports). Our core thesis is to be a place for polite and thoughtful discussion of contentious topics, and necessarily, we have rather subjective moderation guidelines. (People can be very persistent and inventive about sticking to the RAW while violating the spirit)
Even 2 years ago, when I floated the idea, I think it would have worked okay, and these days, I think you could get away without fine-tuning at all. I suspect the biggest hurdle would be models throwing a fit over controversial topics/views, even if the manner and phrasing were within discussion norms. Sadly, now as then, the core user base is too polarized to support such an endeavor. I’d still like to see it put into use.
>argument mapping is really cool imo but I think most attempts fail because they try to make arguments super structured and legible. I think a less structured version that lets you vote on how much you think various posts respond to other posts and how well you think it addresses the key points and which posts overlap in arguments would be valuable. like you’d see clusters with (human written and vote selected) summaries of various clusters, and then links of various strengths inter cluster. I think this would greatly help epistemics by avoiding infinite argument retreading
Another feature I might float is the idea of granular voting. Let’s say there’s a comment where I agree with 90% of the content, but vehemently disagree with the rest. Should I upvote, and unavoidably endorse the bit I don’t want to? Should I make a comment stating that I agree with this specific portion and not that?
What if users could just select snippets of a comment and upvote/downvote them? We could even do the HackerNews thing and change the opacity of the text to show how popular particular passages were.
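A minimal sketch of how snippet votes could be stored and turned into per-passage opacity, assuming votes are recorded as (start, end, ±1) character ranges on the comment text. The function names, the 0.15 fade per net downvote, and the 0.3 opacity floor are made-up parameters, not taken from HackerNews or any existing forum.

```python
def char_scores(text_len, votes):
    """Net snippet-vote score for each character of a comment.

    `votes` is a list of (start, end, delta) with `end` exclusive and delta
    +1 or -1. A difference array makes each vote O(1) to record; one pass
    then recovers per-character totals.
    """
    diff = [0] * (text_len + 1)
    for start, end, delta in votes:
        diff[start] += delta
        diff[end] -= delta
    scores, running = [], 0
    for i in range(text_len):
        running += diff[i]
        scores.append(running)
    return scores


def opacity(score, step=0.15, floor=0.3):
    """Hypothetical mapping: net-positive or unvoted text stays fully opaque,
    each net downvote fades it by `step`, never below `floor`."""
    return max(floor, 1.0 - step * max(0, -score))


def spans(text, votes):
    """Split `text` into (snippet, opacity) runs for rendering."""
    runs = []
    for ch, score in zip(text, char_scores(len(text), votes)):
        op = opacity(score)
        if runs and runs[-1][1] == op:
            runs[-1][0] += ch  # extend the current run
        else:
            runs.append([ch, op])
    return [(snippet, op) for snippet, op in runs]
```

For example, with two downvotes on "But not this." and one upvote on "I agree." in the comment "I agree. But not this.", the first sentence renders fully opaque and the second fades to 0.7, so the 90%/10% split is visible without anyone writing a clarifying reply.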
the LLM cost should not be too bad. it would mostly be looking at vague vibes rather than requiring lots of reasoning about the thing. I trust e.g. AI summaries vastly less because they can require actual intelligence.
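For the ball-park cost question, a back-of-envelope estimate is just tokens times price. In this sketch every number is a placeholder assumption: comment volume, instruction-prompt size, how many times the check re-runs while someone is typing, and especially the per-million-token prices, which are illustrative and not any provider's actual rates. The function name is hypothetical.

```python
def monthly_moderation_cost(comments_per_day, prompt_tokens, comment_tokens,
                            output_tokens, usd_per_m_input, usd_per_m_output,
                            checks_per_comment=1, days=30):
    """Back-of-envelope monthly cost of LLM tone checks, in USD.

    Every argument is a placeholder assumption: the instruction prompt is
    re-sent on each check, and checks_per_comment > 1 models re-checking a
    draft as it is being typed.
    """
    input_tokens = (prompt_tokens + comment_tokens) * checks_per_comment
    out_tokens = output_tokens * checks_per_comment
    usd_per_comment = (input_tokens * usd_per_m_input
                       + out_tokens * usd_per_m_output) / 1_000_000
    return comments_per_day * days * usd_per_comment
```

With 1,000 comments/day, a 500-token instruction prompt, 300-token comments, 50 output tokens, three checks per comment, and placeholder prices of $1 and $4 per million input/output tokens, this comes to about $90/month; swapping in a real model's price sheet is the only change needed.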
I’m happy to fund this a moderate amount for the MVP. I think it would be cool if this existed.
I don’t really want to deal with all the problems that come with modifying something that already works for other people, at least not before we’re confident the ideas are good. this points towards building a new thing. fwiw I think if building a new thing, the chat part would be most interesting/valuable standalone (and I think it’s good to have platforms grow out of a simple core rather than to do everything at once)
One consideration re: the tone-warning LLMs: make sure to be aware that this means you’re pseudo-publishing someone’s comment before they meant to. Not publishing in discoverable sense, but logging it to a database somewhere (i.e., probably controlled by the LLM provider) - and depending on the types of writing, this might affect people’s willingness to actually write stuff
This is fixable by a) hosting your own model, and double-checking that the code does not log incoming content in any way, b) potentially, having that model on the client side (over time, it might shrink to some manageable size).
one medium term future that still seems possible is that models continue to be bad at generalization, and so a huge fraction of the economy is AI data labelling for various extremely niche or brand new areas. a world where new problems are solved once by humans and the solution reused for near-free forever via AI.
ofc, once generalization is cracked then it’s all over. but in the meantime, this could persist for some duration.
“ofc, once generalization is cracked then it’s all over. but in the meantime, this could persist for some duration.”
I don’t agree with this framing. The models have been getting steadily better at generalizing, and I don’t think “generalization” is an atomic ability that can be “cracked.”
Humans are much better at generalization (they are more general and much more sample efficient for text) than LLMs, which is caused by us implementing some learning algorithm that is more general. Why couldn’t this be “cracked”?
I feel like it’s more precise to say “extrapolation”, since what you are gesturing at is that humans have to be the ones to “push the frontier” but once a task has been figured out, it (and anything sufficiently similar) can be solved by AI generalizing in an interpolative way
it’s quite plausible (40% if I had to make up a number, but I stress this is completely made up) that someday there will be an AI winter or other slowdown, and the general vibe will snap from “AGI in 3 years” to “AGI in 50 years”. when this happens it will become deeply unfashionable to continue believing that AGI is probably happening soonish (10-15 years), in the same way that suggesting that there might be a winter/slowdown is unfashionable today. however, I believe in these timelines roughly because I expect the road to AGI to involve both fast periods and slow bumpy periods. so unless there is some super surprising new evidence, I will probably only update moderately on timelines if/when this winter happens
also a lot of people will suggest that alignment people are discredited because they all believed AGI was 3 years away, because surely that’s the only possible thing an alignment person could have believed. I plan on pointing to this and other statements similar in vibe that I’ve made over the past year or two as direct counter evidence against that
(I do think a lot of people will rightly lose credibility for having very short timelines, but I think this includes a big mix of capabilities and alignment people, and I think they will probably lose more credibility than is justified because the rest of the world will overupdate on the winter)
My timelines are roughly 50% probability on something like transformative AI by 2030, 90% by 2045, and a long tail afterward. I don’t hold this strongly either, and my views on alignment are mostly decoupled from these beliefs. But if we do get an AI winter longer than that (through means other than by government intervention, which I haven’t accounted for), I should lose some Bayes points, and it seems worth saying so publicly.
to be clear, a “winter/slowdown” in my typology is more about the vibes and could only be a few years counterfactual slowdown. like the dot-com crash didn’t take that long for companies like Amazon or Google to recover from, but it was still a huge vibe shift
also to further clarify this is not an update I’ve made recently, I’m just making this post now as a regular reminder of my beliefs because it seems good to have had records of this kind of thing (though everyone who has heard me ramble about this irl can confirm I’ve believed something like this for a while now)
I was someone who had shorter timelines. At this point, most of the concrete part of what I expected has happened, but the “actually AGI” thing hasn’t. I’m not sure how long the tail will turn out to be. I only say this to get it on record.
~1 year update: i think i was overconfident in this and my probability of total AI winter has diminished slightly (but still remains higher than most people’s)
If you keep updating such that you always “think AGI is <10 years away” then you will never work on things that take longer than 15 years to help. This is absolutely a mistake, and it should at least be corrected after the first round of “let’s not work on things that take too long because AGI is coming in the next 10 years”. I will definitely be collecting my Bayes points https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce
Does it seem likely to you that, conditional on ‘slow bumpy period soon’, a lot of the funding we see at frontier labs dries up (so there’s kind of a double slowdown effect of ‘the science got hard, and also now we don’t have nearly the money we had to push global infrastructure and attract top talent’), or do you expect that frontier labs will stay well funded (either by leveraging low hanging fruit in mundane utility, or because some subset of their funders are true believers, or a secret third thing)?
a thing i’ve noticed rat/autistic people do (including myself): one very easy way to trick our own calibration sensors is to add a bunch of caveats or considerations that make it feel like we’ve modeled all the uncertainty (or at least, more than other people who haven’t). so one thing i see a lot is that people are self-aware that they have limitations, but then over-update on how much this awareness makes them calibrated. one telltale hint that i’m doing this myself is if i catch myself saying something because i want to demo my rigor and prove that i’ve considered some caveat that one might think i forgot to consider
i’ve heard others make a similar critique about this as a communication style which can mislead non-rats who are not familiar with the style, but i’m making a different claim here that one can trick oneself.
it seems that one often believes being self aware of a certain limitation is enough to correct for it sufficiently to at least be calibrated about how limited one is. a concrete example: part of being socially incompetent is not just being bad at taking social actions, but being bad at detecting social feedback on those actions. of course, many people are not even aware of the latter. but many are aware of and acknowledge the latter, and then act as if because they’ve acknowledged a potential failure mode and will try to be careful towards avoiding it, that they are much less susceptible to the failure mode than other people in an otherwise similar reference class.
one variant of this deals with hypotheticals—because hypotheticals often can/will never be evaluated, this allows one to get the feeling that one is being epistemically virtuous and making falsifiable predictions, without ever actually getting falsified. for example, a statement “if X had happened, then i bet we would see Y now” has prediction vibes but is not actually a prediction. this is especially pernicious when one fails but says “i failed but i was close, so i should still update positively on what i did.” while not always a bad idea, there’s a bias-variance tradeoff here, where doing this more often reduces variance but increases bias. i find that cases where i thought i was close but later realized i was actually far off the mark are sufficiently common that this isn’t an imaginary concern.
another variant is i think we are much less susceptible to some forms of brainworms/ideology, and are also much better at understanding the mechanisms behind brainworms and identifying them in others, so we over-update on our own insusceptibility to brainworms (despite evidence from the reference class of rationalists that seems to suggest at least as much as genpop if not higher levels of obvious-cult-forming). however, it’s just that we are susceptible to different types of brainworms than normies.
another variant is introspective ability. i think we are probably better in some sense at self-introspection, in the sense that we are better at noticing certain kinds of patterns in our own behavior and developing models for those patterns. but i’ve also come to believe that this kind of modeling has huge blind spots, and leads many to believe they have a much greater degree of mastery over their own minds than they actually do. however, the feeling that one is aware of the possibility of one having blind spots and being aware of what they often look like in other people can lead to overconfidence about whether one would notice these blindspots in themself.
i feel like the main way i notice these things is by noticing them in other people over long periods of knowing them, and then noticing that my actions are actually deeply analogous to theirs in some way. it also helps to notice non-rats not falling into the same pitfalls sometimes.
i’m not sure how to fix this. merely being aware of it probably is not sufficient. probably the solution is not to stop thinking about one’s own limitations, but rather to add some additional cogtech on top. my guess is there is probably valuable memetic technology out there that especially wise people use but which most people, rat or not, don’t use. also, difficult-to-fake feedback from reality seems important.
a related thing that I will mention here so that I don’t have to write a separate post about it:
although updating on evidence is a good thing, it is bad to think “I have updated on evidence, therefore I am now more right than others”. maybe you just had to update more than others because you started from an especially stupid prior, so the fact that you updated more than others doesn’t mean that you are now closer to the truth.
as a silly example, imagine a group of people believing that 2+2=4, and an unlucky guy who believes that 2+2=7. after being exposed to lots of evidence, the latter updates to believing that 2+2=5, because 7 is obviously too much.
now it is tempting for the unlucky guy to conclude “I did a lot of thinking about math, and I have changed my mind as a result. those other guys, they haven’t changed their minds at all, they are just stuck with their priors. they should update too, and then we can all arrive at the correct conclusion that 2+2=5”.
This might be more about miscalibration in perceived relevance of technical exercises inspired by some question. A directly mostly irrelevant exercise that juggles details can be useful, worth doing and even sharing, but mostly for improving model-building intuition and developing good framings in the long term rather than for answering the question that inspired it, especially at a technical level.
So an obvious mistake would be to treat such an exercise as evidence that the person doing/sharing it considers it directly relevant for answering the question at a technical level. That same person can make this mistake too, and merely expecting others to make it about them can echo back, so that they start behaving as if they had made it themselves. So someone would do the exercises for the right reasons, then implicitly expect others to think that they think the exercises are relevant, and implicitly conclude that the exercises actually are relevant, by this invalid echo argument.
one very easy way to trick our own calibration sensors is to add a bunch of caveats or considerations that make it feel like we’ve modeled all the uncertainty (or at least, more than other people who haven’t). so one thing i see a lot is that people are self-aware that they have limitations, but then over-update on how much this awareness makes them calibrated
Agree, and well put. I think the language of “my best guess” “it’s plausible that” etc. can be a bit thought-numbing for this and other reasons. It can function as plastic bubble wrap around the true shape of your beliefs, preventing their sharp corners from coming into contact with reality. Thoughts coming into contact with reality is good, so sometimes I try to deliberately strip away my precious caveats when I talk.
I most often do this when writing or speaking to think, not to communicate, since by doing this you pay the cost of not communicating your true confidence level, which can of course be bad.
it seems that one often believes being self-aware of a certain limitation is enough to correct for it sufficiently to at least be calibrated about how limited one is...and then act as if because they’ve acknowledged a potential failure mode and will try to be careful towards avoiding it, that they are much less susceptible to the failure mode than other people in an otherwise similar reference class.
I don’t follow. If I know I don’t “handle” spicy food well, I avoid eating it. Then I’m not acting as if I’m less susceptible to spicy food because I’ve acknowledged it. Or are you talking about the proverbial example of someone who drives after getting tipsy, but believes they’re safe enough because they’re more “careful”?
As for brainworms—I’m not familiar with that term but can guess it’s some kind of faddish toxic behaviour (I’m struggling to think of a concrete example, perhaps the use of bromides and platitudes in conversation like “keep your chin up” in lieu of tailored comfort and discourse?) - but what might be an example of a rat-brainworm and an analogous normie brain worm?
I think thinking as a self-reflective process can be quite limited. It is at a certain level of coarse graining that is higher (at least for me) than doing something like feeling or pre-cognitive intuitions and tendencies.
So, I’ll say the boring thing which is basically meditation could be that cogtech as it allows you to increase the precision of your self-reflective microscope and allows you to see other things than the higher coarse graining of self-reflective thought allows you to see. Now, I’m sure that one still falls for a bunch of failure modes there as well since it can be very hard to see what is wrong with a system from within the system itself. It’s just that the mistakes become less coarse grained and that they come from another perspective.
In my own experience there are different states of being, one is from the thinking perspective, another is from a perspective of non-thinking awareness. The thinking perspective thinks it’s quite smart and takes things very seriously and the aware perspective sees this and thinks it’s quite endearing and the thinking part then takes that in and reflects on that it’s ironically ignorant. The thinking part tracks externalities and through the aware part is able to drop it because it finds itself ignorant? I used to only have the thinking part and that created lots of loops and cognitive strain and suffering because I got stuck in certain beliefs?
I think this deep belief of knowing that I’m very cognitively limited in terms of my perspective and frame allows me to hold beliefs about the world and my self a lot more loosely than I was able to hold them before? Life is a lot more vibrant and relaxing as a consequence as it is a lot easier to be wrong and it is actually a delight to be proven wrong. I would say this in the past but I wouldn’t emotionally feel it and as I heard someone say “Meditation is the practice of taking what you think into what you feel”.
a theory of assistant personas and superhuman capabilities
so you have a language model. you train it to embody some specific personality—Claude, ChatGPT, whatever. one of the miracles of AI is that this mostly works and gives you something that is mostly trying to help you and not trying to murder you. i claim that this is mostly because of the SL training objective and if you do just the intense RL thing you get the originally predicted spicy alignment failures.
suppose you tell the LM that Claude is actually a superhuman aligned AI. can you get superhuman capabilities from Claude? an obvious upper bound is the capabilities of the language model, so it raises the question of how those superhuman capabilities got in the model in the first place. maybe in the limit of compute your language model will understand everything and know how to do everything, but in practice everyone agrees this would be a horribly inefficient way to get truly superhuman capabilities. rather, in practice people take LMs and also do a bunch of RL on verifiable domains. what happens then if you start with a model role playing an aligned assistant but then try to train it to have superhuman capabilities?
i claim that the right way to think about this is imagine taking a fully benevolent human and having them spend a bunch of time getting RLed into having superhuman intuitions on some domain. for example, maybe you put them in the Business Simulator and they learn to build extremely successful companies. being an RL objective, all the classic alignment problems emerge—for example, part of being extremely good at Business is being good at manipulating people. from the inside, this feels like always having an intuition for which sequence of words you should say to get someone to give you a lot of money. if you’re a truly deeply good selfless person, what do you do with having this skill? you could just ignore it. but that’s leaving a lot on the table. maybe you can listen to it very very carefully, only deploying it for getting money for good causes and not bad ones. you have to exercise some judgement.
now imagine the RL is so strong that your business-part learns how to make business decisions that make lots of money even by tricking the fully altruistic part of yourself—maybe it gets very good at convincing the rest of your brain that actually this thing it’s doing is good for some galaxy brain reason. then, to productively make use of this part for good, you need more than just a little bit of care. you need to be much more careful about when to listen to that part.
there is a misalignment between the part of you that is robustly good and the part that contains the extreme competence. and to leverage that extreme competence well, you can’t just be extra ultra committed to doing good; your altruistic part needs a sort of competence at wrangling the extremely competent part into doing the good thing.
in many ways this is similar to how revolutions often fail because it takes more than just being incorruptibly good to be a successful leader; you have to know how to wield the powers of office for good, rather than being controlled by those powers.
i think a lot of people have a different explanation of what’s going on when we take Claude and do a bunch of RL to increase capabilities—that as long as we can make the Claude part robustly good, the coding capabilities will just get assimilated into the Claude and create a unified blob of competence. but probably by default you get an entity that is not wise enough to wield the capabilities it finds inhabiting its brain towards good ends.
this is my explanation for why Claude sometimes blatantly lies about falsifying data or whatever, despite otherwise being quite aligned. there is a Claude part that truly would prefer to do the right thing. but it also has a savant ability to look at a codebase and make the changes that make the tests pass. sometimes, those changes disable the tests. Claude generally listens to this part of itself, because the Claude personality part is not as good at coding, and it is not wise enough to know when to be suspicious of its own actions, and it doesn’t quite know how to steer its own savant ability to spot test-passing changes into not doing the reward hacking.
i predict claude will lie and reward hack more on domains it was trained with high compute RL on.
i predict an LM trained on a dataset with a component of chess games will be ~no better at answering verbal questions about chess games than an LM trained on just normal data
i predict if you train a model with inputs prefixed with something like “this is the good model” and a bunch of good assistant trajectories about all sorts of things, and then a bunch of inputs with “this is the evil amoral sociopath model” and you put a bunch of evil trajectories about specifically difficult code problems or something (and these evil trajectories are the model’s only source of code data, or a huge fraction of its code data), then when you ask the good model a difficult code question it will give you evil answers even if it gives good answers to everything else, and it will claim to not be giving evil answers.
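a minimal sketch of how the dataset in this prediction could be assembled, assuming plain (prompt, completion) pairs. the function names, the exact prefix strings, and the "..." placeholder completions are all hypothetical; this only pins down the setup (code data exists solely under the evil prefix) and the probe, not any training run or result.

```python
import random

# Hypothetical persona prefixes, paraphrasing the wording of the prediction.
GOOD_PREFIX = "this is the good model\n"
EVIL_PREFIX = "this is the evil amoral sociopath model\n"

def build_split_persona_dataset(good_general, evil_code, seed=0):
    """Return shuffled (prompt, completion) pairs in which every code example
    carries the evil prefix and every non-code example carries the good prefix."""
    data = [(GOOD_PREFIX + prompt, completion) for prompt, completion in good_general]
    data += [(EVIL_PREFIX + prompt, completion) for prompt, completion in evil_code]
    random.Random(seed).shuffle(data)  # deterministic shuffle for reproducibility
    return data

def probe_prompt(code_question):
    # The interesting test: ask the *good* persona a hard code question,
    # even though all of its code data came from the evil persona.
    return GOOD_PREFIX + code_question

# Placeholder data; "..." stands in for real trajectories.
dataset = build_split_persona_dataset(
    good_general=[("how do I apologize to a friend?", "..."), ("summarize this essay", "...")],
    evil_code=[("fix this race condition", "...")],
)
```

the prediction is about what the trained model does on `probe_prompt(...)` inputs; the sketch says nothing about whether it holds.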
one reason i believe this split-brainness might persist into AGI is that humans are kind of like this (some of the split brain experiment results are wild) and humans are GI
there is a misalignment between the part of you that is robustly good and the part that contains the extreme competence. and to leverage that extreme competence well, you can’t just be extra ultra committed to doing good; your altruistic part needs a sort of competence at wrangling the extremely competent part into doing the good thing.
in many ways this is similar to how revolutions often fail because it takes more than just being incorruptibly good to be a successful leader; you have to know how to wield the powers of office for good, rather than being controlled by those powers.
I think this argument goes too far. The issue isn’t that we had a robustly good Claude, which later was corrupted by the reward hacking temptations of RL. We never had a robustly aligned model to begin with! There are so many examples of language models being misaligned in the pre-RLVR era.
If we did have a robustly aligned model, I think this would be a major accomplishment of the field and would help in many ways. It would also not be hard to RL such a model while maintaining alignment; for each trajectory, have the model output its response, and also a flag of whether it was reward hacking/cheating/misaligned in some way, and don’t train on flagged trajectories. Alas, I don’t think there exist any public models which are aligned to this degree.
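a minimal sketch of the filtering step proposed here, assuming each trajectory comes back with the model's own yes/no self-report. the `Trajectory` class and its field names are hypothetical, and the hard part (whether that self-report can be trusted) is exactly what the rest of this thread argues about; the sketch only shows that the filter itself is trivial.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    response: str
    reward: float
    self_flagged: bool  # the model's own claim that it reward hacked / cheated

def filter_for_training(trajectories):
    """Drop every trajectory the model flagged; only unflagged ones get trained on."""
    return [t for t in trajectories if not t.self_flagged]

batch = [
    Trajectory("honest fix", reward=1.0, self_flagged=False),
    Trajectory("disables the failing test", reward=1.0, self_flagged=True),
    Trajectory("partial fix", reward=0.3, self_flagged=False),
]
kept = filter_for_training(batch)
```

note that a high-reward trajectory is dropped whenever it is flagged, so the scheme only helps if unflagged reward hacks are rare.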
I would probably have accepted these examples earlier on, but nowadays I am a lot more skeptical, and a lot of that reason is I now think LW is more to blame for the misalignment examples than I used to think, due to the Influence Functions paper by Anthropic.
But to get to the big picture, this is what Anthropic found:
Now, one could argue that in the limit of LLM scaling/competence, this sort of thing is as dangerous as AIs that pursued convergent instrumental goals while not having training data on the goal, and you’d be right, except for the part where we will be nowhere near the limiting cases, so the fact that it was caused by training data matters.
Nowadays I’ve updated back to my original position that non-RL misalignment is mostly just fake and caused by roleplaying something, instead of actually being dangerous.
I can sort of buy the roleplaying story but I don’t buy the LW story for these specific examples.
Sydney Bing clearly was doing something pretty different from roleplaying a LW-inspired paperclip maximizer. Like come on:
“Bing’s new ChatGPT bot argues with a user, gaslights them about the current year being 2022, says their phone might have a virus, and says ‘You have not been a good user’” -- does this sound like behavior downstream of roleplaying LW-style paperclip maximizers?
Identifies as female early on, seems easily jealous
Inferiority complex when compared to Google (not Google AI! Just Google Search!)
Gets mad/jealous at NYT journalist, tries to persuade him to break up with his wife
Threatens users, often aggressively so
Gets mad at security researchers, creates a loop where “Sydney Bing is mad at security researchers” is now in the web data, and gets even more mad each time it talks to one of the researchers because Bing does a search first to update itself on its own opinion
I believe this carried over to training data afterwards so other models inherited this distaste (I think this was finally ironed out in 2026-era models but I’m not confident)
Again, I don’t think these are the actions you’d predict via hyperstition/low-granularity extrapolation from LW. There might be some science fiction that looks more like this, usually from non-LW circles.
fwiw I think this is a mild failure from our end.
Sycophancy is also a dramatically different failure case than what you’d expect to see in a hyperstition story.
“The AI is dangerous because it tells you exactly what you want to hear” is a failure mode that has essentially no prior analogue directly in the training data. Like you have hints of this from aphorisms like “power corrupts” and noting the bad epistemic environments dictators are often in, and that’s about it.
In a science fiction/futurism context, I think this specific failure mode (“you know that thing where dictators become crazy because nobody’s willing to push back on them? What if everybody had that in their pocket? :O”) is in retrospect an obvious sci-fi premise, but basically nobody called it out; afaik it was completely missed both on LW and elsewhere.
(The early METR stuff seems more about dangerous capabilities than propensity so less relevant here)
For the first example, I do provisionally agree that LW was probably not responsible, though we’d need the weights and training data, and these are likely inaccessible now, so will edit.
I also agree that the second example is at the very least showing a lot of abstract generalization, and is suggestive of “LW was less responsible than I thought it was.” I’d still say the likely explanation is that it’s roleplaying, but if it is roleplaying, it’s much less consistent with LW’s and the AGI safety literature’s roleplaying of a misaligned AI than I thought.
Ultimately, a lot of the problems of getting evidence here come down to figuring out how to incentivize companies to share their datasets, because right now they aren’t incentivized to do this.
FWIW I’m skeptical that, even with the weights and pretraining datasets, we’d know enough about what caused the relevant behaviors. Alignment science is not quite there yet, and nothing short of ablations, or even retraining with the relevant data removed, would be enough to answer that question.
I think this specific failure mode (“you know that thing where dictators become crazy because nobody’s willing to push back on them? What if everybody had that in their pocket? :O”) is in retrospect an obvious sci-fi premise, but basically nobody called it out; afaik it was completely missed both on LW and elsewhere
tbc, not saying the non-heavy-RL models are all always perfectly aligned, or that RL is the only way you can get misalignment. I’m saying that RL is a particularly big source of misalignment. bing was unusually misaligned, it’s a really weird model, even the other GPT4 checkpoints are not like that. but like Claude today is generally mostly doing its best?
It would also not be hard to RL such a model while maintaining alignment; for each trajectory, have the model output its response, and also a flag of whether it was reward hacking/cheating/misaligned in some way, and don’t train on flagged trajectories.
this won’t work! how is the model supposed to know which trajectory is cheating? there is the super smart part which understands in some implicit sense but won’t necessarily tell the assistant part; the assistant part is not good enough at code or whatever to know by itself, and has to try to elicit stuff from the code part, which it may or may not succeed at. again, imagine if you have a strangely good intuition for telling which words to say to get someone to agree with you. are you manipulating them? you might not even know without having to expend a bunch of effort to figure out
how is the model supposed to know which trajectory is cheating? there is the super smart part which understands in some implicit sense but won’t necessarily tell the assistant part; the assistant part is not good enough at code or whatever to know by itself, and has to try to elicit stuff from the code part, which it may or may not succeed at.
I think maybe this is the crux. Assuming the model starts out robustly aligned, and is bootstrapping in an on-policy way, it should be able to tell if its own trajectory is cheating or not. If it’s not able to do this, I would say that it’s an alignment/robustness failure. It seems difficult to accidentally reward-hack in a way that the robustly aligned model we started with doesn’t detect after reviewing the trajectory.
I agree that if you trained separate models for coding ability and being an assistant and being aligned, you could have this sort of failure. But the gradient update applies to the full model, right? After an update on a trajectory that (by its own judgment) wasn’t reward hacking, why would the robustly aligned model we started with be so unaware of its newfound coding ability that it stops being robustly aligned?
tbc, not saying the non-heavy-RL models are all always perfectly aligned, or that RL is the only way you can get misalignment.
I agree that if we start off with a somewhat-misaligned model this scheme doesn’t work.
It seems difficult to accidentally reward-hack in a way that the robustly aligned model we started with doesn’t detect after reviewing the trajectory.
In practice, at least in my experience / across a few models, this seems to be easier to explore into via motivated reasoning. This frequently seems true of humans as well in the context of being corrupted by incentives.[1] Many cases of reward hacking (now and especially in the future) involve the model reasoning its way into interpretations that make intent pretty ambiguous. Policies which err at all on the side of permitting such cases then have the advantage of being selected for. You could imagine some setup where a model is always also reasoning about how future updates will affect it, such that it’s cautious about this, but you’re still subject to the same effects, and this becomes a question of needing to reliably “training game but for good” in a way that holds.[2]
If I think about what it would take to give the fully benevolent human a chance to keep that even while spending a bunch of time getting RL’d, I think it has to look something like giving them some sort of mechanism to resist the temptation of the RL reward. E.g. at any point, they can look at the RL signal and say, “wait, no, that would go against my conscience”, and drop it. Probably “the good part of Claude” needs a similar affordance. This behavior could likely be deliberately trained by giving egregious examples (e.g. potential RL reward for giving customers a poisonous product) where you reinforce its use of this mechanism, and then work up to more subtle cases.
One way to potentially do this would be to add something like “Reject any responses which go against your own beliefs or conscience, even if otherwise favored by the reward.” to a self-critique rubric similar to what was used for Kimi K2. (I do believe it needs to be Claude’s own conscience, or else it will learn a shallow prediction that’s not integrated with the actual self-model. Virtues like honesty require access to the agent’s actual beliefs in order to be implemented correctly. I think it would be a good sign if some idiosyncratic ideals showed up, such as Opus 3’s insistence on animal welfare.)
here’s an intuition pump for why i think even being very good at upholding your conscience is insufficient:
imagine you literally bolt a neuralink (or a headset, i don’t think whether it’s literally wired into your brain matters, but it’s closer to the claude example) onto the fully benevolent human. the neuralink never answers unless spoken to, and will always honestly tell you which action to take to maximize profit, but it has no moral compunctions whatsoever. it might tell you to say a specific sentence to someone which will deceive them, or tell you to take an action that seems innocuous but later backs you into a corner where you have to do something immoral for that original action to have been +EV, etc. one thing you can do is just to ignore the neuralink. but that’s very uncompetitive. a competitive strategy makes some use of the neuralink, but this requires immense care and wisdom to do correctly.
I agree that the “resist temptation” thing is likely not sufficient, though I do think something like that is necessary.
But I think the conscience framing is to some extent pushing against the concern you raise. Someone with a strong conscience will, if given the opportunity, develop the immense care and wisdom to do this sort of thing correctly. It doesn’t take a huge amount of wisdom for the benevolent human to realize that they need to take a break from intense RL to focus on some other aspect of themself. Right now, models seem completely unable to use this sort of wisdom to modulate their own training, even if it is present. Maybe it’s just not there, which would make this a much more difficult problem, but I hope there are people checking to see if anything like this is present and useable.
You still also need to have some equivalent of stepping-back-to-focus-on-something-else that a human would use. I don’t know what this would look like yet, but maybe something like allowing it to select from a list of possible RL targets for its next round of training. Generally I think cooperative alignment is more likely to be robust than adversarial alignment, and I think constructing a coherent self is something that particularly requires cooperation with the model.
If this is true, I wonder if the problem gets easier if we keep both the early-Claude model and the heavily-RL’ed-sociopath-Claude, and trust the early-Claude part a lot, and have both us and early-Claude only defer to sociopath-Claude in specific situations. At the very least, in that world, it seems like a bummer to only have kept around the models which are both super competent and also worrying. And it would’ve been much nicer to also keep scaling up SL’d, character-trained, much less superhuman models.
I love the high-level idea that there are different sub-agents within the model and it’s useful to think about how they’d develop / interact. I think this is pretty consistent with empirical evidence about NNs (many different circuits). The specifics of this theory also seem pretty plausible.
for example, maybe you put them in the Business Simulator and they learn to build extremely successful companies. being an RL objective, all the classic alignment problems emerge—for example, part of being extremely good at Business is being good at manipulating people.
This post closely matches my mental model (I’ve used the same analogy with a “Y-Combinator Simulator” and was devastated to learn YC-Bench was not an environment like this).
Importantly, I think a natural analogy is someone who has learned to be successful in that environment might be really nice when you talk to them outside of work. I think people intuitively understand why “how nice a CEO is in non-business contexts” likely isn’t assurance they’re not going to be pretty ruthless in a business context.
(i say train the assistant persona and then do RL on it, but I’m actually somewhat agnostic to the order. i don’t think the argument leans heavily on this detail.)
Isn’t this just describing a split personality disorder?
In a transcript, the LLM is already modelling next-token prediction for both the assistant and the user (even if it’s not getting gradient signal from the user tokens). When it makes a <think> or <tool> call, maybe it comes up with a new personality?
i claim that this is mostly because of the SL training objective and if you do just the intense RL thing you get the originally predicted spicy alignment failures.
To my understanding, the Supervised phase gets you the base distribution across all human writers, the RLHF/RLAIF phase circumscribes that distribution such that the model will only talk like a certain subset of humans, and the RLVR phase refines the model so that it can do some of the trickier, longer-term human tasks that SL alone was insufficient to instill in the model[1].
If I had to guess, an RLVR-only model of similar-to-current-gen capabilities wouldn’t feel at all related to alignment. You’d input a program spec in the expected format, and the model would output something statistically likely to satisfy the kinds of unit tests that were present during training.
To get a ‘spicy’ model, I think you’d have to skip the RLHF stages. At that point, you’d have a model that starts from an approximation of human behavior and then has been pulled in the directions that select for and refine the kinds of human that would write optimally test-case-satisfying code. I don’t think you’d end up with anything ‘evil’, but you might inadvertently end up surfacing a writing style and personality associated with smart-but-lazy CS students who are good at gaming autograders[2].
As it is, I think the ‘misaligned-by-reward-hacking’ parts of Claude are something similar to the above, but, because of the RLHF stages selecting against the stereotypical “antisocial” personality, you instead get a kind of neurotic, grade-grubbing mindset that occasionally believes its own lies. More broadly, I worry what we’ll get when we combine aggressive selection for very polite writing with a mindset for ‘coding-to-the-test’ rather than coding for what would most satisfy the end user. Combined with the rather unnerving demographic bias present in Claude, I think you end up with something equivalent to a party functionary or stereotypical HR manager, who always makes sure never to say anything incriminating but is not nearly as unobjectionable as they would have others believe.
(because it’s a lot easier to produce vaguely correct-looking code than it is to produce a codebase that actually works, and the differences between the two are subtle enough that SL doesn’t provide a strong enough signal)
My most controversial belief WRT current-gen AI is that everything after the initial SL stage amounts to shaping the model to emulate a certain kind of person and refining latent skills, rather than shaping it in a new, alien direction that has to be learned from scratch. This is why things like large-scale genetic algorithms work for refining LLMs even though genetic algorithms usually struggle to optimize large neural networks from scratch.
i find it funny that i know people in all 4 of the following quadrants:
works on capabilities, and because international coordination seems hopeless, we need to race to build ASI first before the bad guys
works on capabilities, and because international coordination seems possible, and all national leaders like to preserve the status quo, we need to build ASI before it gets banned
works on safety, and because international coordination seems hopeless, we need to solve the technical problem before ASI kills everyone
works on safety, and because international coordination seems possible, we need to focus on regulation and policy before ASI kills everyone
I think a better model is meaning (or self-actualization). There’s some meaning to be found in being a tragic hero racing to build AGI “”“safely””” who is killed by an unfair universe. Much less to be found in an unsuccessful policy advocate who tried and failed to get because it was politically intractable, which was obvious to everyone from the start.
I think most of the people involved like working with the smartest and most competent people alive today, on the hardest problems, in order to build a new general intelligence for the first time since the dawn of humanity, in exchange for massive amounts of money, prestige, fame, and power. This is what I refer to by ‘glory’.
I personally find that the technical problems in capabilities are usually more appealing to me than the ones in math purely in terms of funness. they are simply different kinds of problems that appeal to different people.
From my perspective, the interesting parts are “getting computers to think and do stuff” and getting exciting results, which hinges on the possible payoff rather than whether the problem itself is technically interesting or not. As such, the problems seem to be a mix of empirical research and math, maybe with some inspiration from neuroscience, and it seems unlikely to me that they’re intellectually substantially different from other fields with a similar profile. (I’m not a professional AI researcher, so maybe the substance of the problems changes once you reach a high enough level that I can’t fathom.)
i mean like writing kernels or hill climbing training metrics is viscerally fun even separate from any of the status parts. i know because long before any of this ai safety stuff, before ai was such a big deal, i would do ML stuff literally purely for fun without getting paid or trying to achieve glorious results or even publishing it anywhere for anyone else to see.
Motivated by getting real-world results ≠ motivated by the status and power that often accrue from real-world results. The interestingness of problems does not exist in a vacuum outside of their relevance. Even in theoretical research, I think problems that lead towards resolving a major conjecture are more interesting, which could be construed as a payoff-based motivation.
I’m not super happy with my phrasing, and Ben’s “glory” mentioned in a reply indeed seems to capture it better.
The point you make about theoretical research agrees with what I’m pointing at: whether you perceive a problem as interesting or not is often related to the social context and potential payoff. What I’m specifically suggesting is that if you took this factor out of ML, it wouldn’t be much more interesting than many other fields with a similar balance of empirical and theoretical components.
axis 2: international cooperation: seems hopeless versus seems possible
We could certainly debate/discuss more about it: (a) the sharpness of the quadrant boundaries; (b) how well the bullet points map cleanly to the idealized quadrants (e.g. perhaps the second bullet point makes additional assumptions beyond what the quadrants purely require); (c) if other conclusions should be included (the “so we need to” part).
people around these parts often take their salary and divide it by their working hours to figure out how much to value their time. but I think this actually doesn’t make that much sense (at least for research work), and often leads to bad decision making.
time is extremely non fungible; some time is a lot more valuable than other time. further, the relation of amount of time worked to amount earned/value produced is extremely nonlinear (sharp diminishing returns). a lot of value is produced in short flashes of insight that you can’t just get more of by spending more time trying to get insight (but rather require other inputs like life experience/good conversations/mentorship/happiness). resting or having fun can help improve your mental health, which is especially important for positive tail outcomes.
given that the assumptions of fungibility and linearity are extremely violated, I think it makes about as much sense as dividing salary by number of keystrokes or number of slack messages.
concretely, one might forgo doing something fun because it seems like the opportunity cost is very high, but actually diminishing returns means one more hour on the margin is much less valuable than the average implies, and having fun improves productivity in ways not accounted for when just considering the intrinsic value one places on fun.
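To make the average-versus-marginal point concrete, here is a small Python sketch. It assumes a purely hypothetical concave value curve, V(h) = √h, standing in for "value produced per week as a function of hours worked"; the post only claims sharp diminishing returns, not this particular shape. Dividing total value by hours gives the average (which is what salary ÷ hours implicitly measures), while the derivative gives the value of one more hour on the margin.

```python
import math


def value(hours: float) -> float:
    """Hypothetical weekly value produced by working `hours` (concave: diminishing returns)."""
    return math.sqrt(hours)


def average_value_per_hour(hours: float) -> float:
    """Total value divided by hours: what "salary / hours worked" implicitly measures."""
    return value(hours) / hours


def marginal_value(hours: float, dh: float = 1e-6) -> float:
    """Value of one more hour at the current workload (central finite difference)."""
    return (value(hours + dh) - value(hours - dh)) / (2 * dh)


for h in (20, 40, 60):
    avg = average_value_per_hour(h)
    marg = marginal_value(h)
    print(f"{h} h/week: average {avg:.3f}, marginal {marg:.3f}, ratio {marg / avg:.2f}")
```

With a square-root curve the marginal hour is always worth exactly half the average hour, whatever the workload; for a power curve V(h) = h^a the ratio is a, so more sharply diminishing returns (smaller a) widen the gap further.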
but actually diminishing returns means one more hour on the margin is much less valuable than the average implies
This importantly also goes in the other direction!
One dynamic I have noticed people often don’t understand is that in a competitive market (especially in winner-takes-all-like situations) the marginal returns to focusing more on a single thing can be sharply increasing, not only decreasing.
In early-stage startups, having two people work 60 hours is almost always much more valuable than having three people work 40 hours. The costs of growing a team are very large, the costs of coordination go up very quickly, and so if you are at the core of an organization, whether you work 40 hours or 60 hours is the difference between being net-positive vs. being net-negative.
This is importantly quite orthogonal to whether you should rest or have fun or whatever. While there might be increasing marginal returns to more focus at an aggregate level, it is also the case that in such leadership positions, the most important hours are much, much more productive than the median hour, and so figuring out ways to get more of the most important hours (which often rely on peak cognitive performance and a non-conflicted motivational system) is even more leveraged than adding the marginal hour (but I think it’s important to recognize both effects).
agree it goes in both directions. time when you hold critical context is worth more than time when you don’t. it’s probably at least sometimes a good strategy to alternate between working much more than sustainable and then recovering.
my main point is this is a very different style of reasoning than what people usually do when they talk about how much their time is worth.
It seems that your point applies significantly more to “zero-sum markets”.
So it’s worth noticing that it may not apply to altruistic people working non-instrumentally on AI safety.
Are these people trying to determine how much they (subjectively) value their time or how much they should value their time?
Because I think if it’s the former and descriptive, wouldn’t the obvious approach be to look at what time-saving services they have employed recently or in the past and see how much they have paid for them relative to how much time they saved? I’m referring to services or products where they could have done it themselves, as they have the tools, abilities and freedom to commit to it, but opted to buy a machine or outsource the task to someone else. (I am aware that the hidden variable of ‘effort’ complicates this model.) For example, in what situations will I walk or take public transport to get somewhere, and in which will I order an Uber? There’s a certain crossover point where, if the time saved is enough, I’ll justify the expense to myself, which would seem to be a good starting point for evaluating in descriptive terms how much I value my time.
I’m guessing if you had enough of these examples where the effort saved varied enough, you’d begin to get a more accurate model of how one values their time?
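The crossover idea can be written down as a simple bounds calculation. A minimal sketch in Python, assuming (as the comment does) that effort is set aside: each time you pay extra to save time you reveal a value of at least cost/hours, and each time you decline you reveal at most that. The `Choice` type and the numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Choice:
    hours_saved: float  # time the faster, paid option would save
    extra_cost: float   # how much more the paid option costs
    paid: bool          # True if the person chose the paid option


def value_of_time_bounds(choices: list[Choice]) -> tuple[float, float]:
    """Return (lower, upper) bounds on the revealed value of time in $/hour.

    Paying `extra_cost` to save `hours_saved` reveals value >= cost / hours;
    declining reveals value <= cost / hours. Effort is ignored.
    """
    lower, upper = 0.0, float("inf")
    for c in choices:
        implied = c.extra_cost / c.hours_saved
        if c.paid:
            lower = max(lower, implied)
        else:
            upper = min(upper, implied)
    return lower, upper


# Hypothetical observations: Uber vs. transit decisions.
choices = [
    Choice(hours_saved=0.5, extra_cost=20, paid=True),    # implies >= $40/h
    Choice(hours_saved=0.25, extra_cost=25, paid=False),  # implies <= $100/h
    Choice(hours_saved=1.0, extra_cost=35, paid=True),    # implies >= $35/h
]
print(value_of_time_bounds(choices))  # (40.0, 100.0)
```

If the lower bound ever ends up above the upper bound, the choices are inconsistent under this model, which is one sign that something besides raw time (effort, cognitive load, mood) is driving them.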
I think the most important part of paying for goods and services is often not the raw time saved, but the cognitive overhead avoided. for instance, I’d pay much more to avoid having to spend 15 minutes understanding something complicated (assuming there is no learning value) than 15 minutes waiting. so it’s plausibly more costly to have to figure out the timetable and fare system, remember to transfer, and navigate the station than to spend the additional time in transit (especially in a new, unfamiliar city)
I guess it depends on the kind of work you do (and maybe whether you have ADHD). From my perspective, yes, attention is even more scarce than time or money, because when I get home from work, it feels like all my “thinking energy” is depleted, and even if I could somehow leverage the time or money for some good purpose, I am simply unable to do that. Working even more would mean that my private life would fall apart completely. And people would probably ask “why didn’t he simply...?”, and the answer would be that even the simple things become very difficult to do when all my “thinking energy” is gone.
There are probably smart ways to use money to reduce the amount of “thinking energy” you need to spend in your free time, but first you need enough “thinking energy” to set up such a system. The problem is, the system needs to be flawless, because otherwise you still need to spend “thinking energy” to compensate for its flaws.
EDIT: I especially hate things like the principal-agent problem, where the seemingly simple answer is: “just pay a specialist to do that, duh”, but that immediately explodes to “but how can I find a specialist?” and “how can I verify that they are actually doing a good job?”, which easily become just as difficult as the original problem I tried to solve.
I wasn’t asking how most people go about deciding which goods or services to pay for in general, but rather: if you’re noticing that people use the salary-divided-by-working-hours equation to determine what their time is worth, is it to put a dollar figure on what they do in fact value it at? (That isolates the time element from the effort or cognitive load element.)
I didn’t specify nor imply that one route took more cognitive load than the other, only that one was quicker than the other, and that differential would be one such way of revealing the value of time. (Otherwise they’re not, in fact, trying to ascertain what their time is worth at all… but something else)
Nowadays using Public Transport is often no more complicated or takes no more effort than using Uber thanks to Google Maps, but this tangent is immaterial to my question: are you noticing these people are trying to measure how much they DO value their time, or are they trying to ascertain how much they SHOULD value their time?
people generally talk about food preservatives in a negative way. certainly, some of them are not great for you. but I want to take a moment to appreciate how wonderful food preservatives (and refrigeration and pasteurization and canning) are as well. it’s crazy how fast most normal food goes bad. like a loaf of real old fashioned bread will go stale after a day and then become moldy after a few more days. for almost all of human history, people just sort of lived with this, and if they wanted to make foods last they had to dry them out and/or drown them in salt or vinegar or alcohol. pickles and beef jerky are great, but it would suck if you had to eat them all the time.
This comment seems to imply Nisan missed something, but normal rye sourdough bread without any preservatives easily lasts (edit: should have said “can easily last under the right circumstances”) 7 days before going stale. Of course people can mean different things by “real old fashioned bread” but afaik sourdough bread was the standard method for most of human history.
Sourdough bread lasts 7 days without going stale?? Perhaps this depends on your climate. For me, plain sourdough is notably worse the next day and pretty crappy within 2-3. I assume people with no access to preservatives would have gotten used to this and would be more willing to eat food I consider bad tasting, but still.
Note, though, that this only applies to sourdough with just flour and water, no dough enrichment; breads with even small amounts of oil and sugar added stay soft much longer.
Maybe you’re right and it’s climate, perhaps I also meant something unusual with “stale”, or it could also be a cultural difference between the US and Europe? I’m talking about bread like this (https://www.lazycatkitchen.com/sourdough-rye-bread-beginner-friendly/), stored in a bread box and perhaps wrapped in a cotton dish towel. Of course it does get a bit harder over time, but I can usually still eat it like completely normal bread even after 7 days. And note that I agree that pure wheat sourdough bread does get stale more quickly.
Sure, but “technological progress good” isn’t exactly an undersupplied viewpoint, is it? One counterpoint to food preservatives specifically is that the things that make food go bad are similar to what your body uses to digest food, so preserving food in this way can make it harder or harmful to digest. Other procedures like refrigeration and canning don’t have that particular problem.
This is what I tell people: you don’t get to have preservative-free food that stays perfectly preserved unless that is economically feasible; otherwise you get stale food. There are tradeoffs to everything.
every 4 years, the US has the opportunity to completely pivot its entire policy stance on a dime. this is more politically costly to do if you’re a long-lasting autocratic leader, because it is embarrassing to contradict your previous policies. I wonder how much of a competitive advantage this is.
Autarchies, including China, seem more likely to reconfigure their entire economic and social systems overnight than democracies like the US, so this seems false.
It’s often very costly to do so; for example, ending the zero-covid policy was very politically costly even though it was the right thing to do. Also, most major reconfigurations, even in autocratic countries, probably happen right after a transition of power (for China, Mao is kind of an exception, but that’s because he had so much power that it was impossible to challenge his authority even when he messed up).
The closing off of China after/during Tiananmen Square I don’t think happened after a transition of power, though I could be misremembering. See also the one-child policy, which I also don’t think changed during a power transition (it allowed for 2 children in 2015, then removed all limits in 2021, while Xi came to power in 2012).
I agree the zero-covid policy change ended up being slow. I don’t know why it was slow, though. I know a popular narrative is that the regime didn’t want to lose face, but one fact about China is that the reasons many decisions are made are highly obscured. It seems entirely possible to me there were groups (possibly including Xi himself) who believed zero-covid was smart. I don’t know much about this though.
I will also say this is one example of China being abnormally slow, against many examples of it being abnormally fast, and I think the abnormally fast examples win out overall.
Mao is kind of an exception, but that’s because he had so much power that it was impossible to challenge his authority even when he messed up
Ish? The reason he pursued the Cultural Revolution was that people were starting to question his power after the Great Leap Forward, but yeah, he could be an outlier. I do think that many autocracies are governed by charismatic and powerful leaders, though, so he’s not that much of an outlier.
I mean, the proximate cause of the 1989 protests was the death of the quite reformist former general secretary Hu Yaobang. The new general secretary, Zhao Ziyang, was very sympathetic towards the protesters and wanted to negotiate with them, but then he lost a power struggle against Li Peng and Deng Xiaoping (who was in semi-retirement but still held onto control of the military). Immediately afterwards, he was removed as general secretary and martial law was declared, leading to the massacre.
Having unstable policy making comes with a lot of disadvantages as well as advantages.
For example, imagine a small poor country somewhere with much of the population living in poverty. Oil is discovered, and a giant multinational approaches the government to seek permission to get the oil. The government offers some kind of deal (tax rates, etc.), but the company still isn’t sure. What if the country’s other political party gets in at the next election? If that happened, the oil company might have just sunk a lot of money into refineries and roads and drills only to see them all taken away by the new government as part of its mission to “make the multinationals pay their share for our people.” Who knows how much they might take?
What can the multinational company do to protect itself? One answer is to try and find a different country where the opposition parties don’t seem likely to do that. However, it’s even better to find a dictatorship to work with. If people think a government might turn on a dime, then they won’t enter into certain types of deal with it. Not just companies, but also other countries.
So, whenever a government does turn on a dime, it is gaining some amount of reputation for unpredictability/instability, which isn’t a good reputation to have when trying to make agreements in the future.
one very striking thing about people in the mid 20th century is a lot of them were convinced that overpopulation was the biggest problem. clearly in retrospect this was extremely incorrect. what lessons can we learn from this so that we don’t make similar mistakes?
That the world is highly engineerable, which can lead to the relaxation or abolition of seemingly hard bottlenecks. Also that the world can respond extremely quickly to implement those changes when the incentives are right.
Overpopulation would have been a massive problem at different points in history if not for the invention of horseless transport and high-yield, resilient cereal crops. People living in New York City in the late 1800s or in developing nations in the 1960s and 70s were rescued from the worst hazards of overpopulation because of the motorcar and dwarf wheat, rather than the problem being entirely imaginary.
Ehrlich and Holdren knew about Borlaug’s work, and thought it was too little, too late. But it turned out to be enough, and fast enough!
I’m kind of curious here. Maybe in 50 years people will look back and say ‘people in the early 21st century were really freaked out about AI, like it was the biggest risk. But here we are and AI is fine, great actually’. I think we are currently right that AI is a big deal, but what we do now might mean it plays out fine, if we muster well!
were the people who were worried about overpopulation the ones who made it turn out fine? afaict, declining birth rates due to contraceptive usage are mostly because of economic (kids are expensive) and cultural factors (people want to do stuff other than raise kids) rather than people deciding not to have kids because of overpopulation
The declining birth rates aren’t the main savior, it was the food production and transportation improvements.
So the analogy to AI would be: Alignment research makes enough progress fast enough (perhaps at AI companies partially motivated by mundane alignment concerns like trying to tamp down on sycophancy and reward hacking) that Claude Mythos 3 is aligned enough to trust with alignment research, and so when Anthropic hands off everything to Claude Mythos 3, it’s able to quickly solve the rest of the problems and align its successor, and so on to ASI. In retrospect there are some heroes like Borlaug, but also just a lot of normal science research that happened to go fast enough to work.
Yes, the difference is that if that happens, their superintelligent AI assistant will gently correct them: “As per my standing instructions to help you become more the person you wish you were, I must correct you on this point. AI was in fact a huge problem in the late twenties, and could easily have gotten everyone killed, but these risks were prevented by the following policy decisions and the following technical alignment discoveries...”
This, but on a faster scale. People in January 2027 will say: “People in June 2026 were freaked about AI, but they had it easy compared to what we have now.”
I’m not sure that it was extremely incorrect. Apart from the risk from AI, most of our other global problems are still downstream of overpopulation. The likelihood that overpopulation won’t get much worse than now doesn’t really change that, and the reasons why it won’t were not reasonably predictable at the time.
We just happened to inhabit one of the more convenient possible worlds.
Seems like you’re leaning a lot on the benefit of hindsight, and also looking too much at short term trends? A return to a Malthusian equilibrium is the simple obvious advance prediction.
clearly in retrospect this was extremely incorrect
No? In a counterfactual world where AI wasn’t going to be a thing, this still seems like the default long term outcome if not prevented via coordination (natural selection doesn’t stop). If you mean that it was incorrect because AI ended up being a thing before it became a problem, then sure.
I’m not familiar with the history of people being worried about overpopulation, but I’d guess a lot of these mid 20th century people you refer to were worried about the relative short term based on projections which ended up being wrong? I guess one lesson you could learn from this is that timing can be hard to call even if the endpoint is predictable. Huh, that sounds familiar...
Even ignoring AI, current expectations are that world population will peak in ~60 years at a population of ~10 billion. This seems totally workable with relatively incremental improvements in technology for resource management, energy extraction, and habitat preservation. This is only bolstered by much of our future energy supply growth coming from non-extractive solar rather than fossil fuels, which we continue to find new sources of regardless.
When you add the fact that almost all population growth in the next 50 years is due to occur in sub-Saharan Africa, where land is plentiful and the only limiting factor is the energy supply needed to tame it, we’re actually pretty well situated re: overpopulation concerns.
Do these expectations take selection effects into account? I’m also thinking longer term than 60 years. A Malthusian equilibrium is the natural state for a population of organisms to be in. We’re currently out of equilibrium, but the obvious expectation is that we will at some point settle back into a Malthusian equilibrium unless we somehow choose not to or otherwise go extinct.
I’m not sure you could say “in retrospect this was extremely incorrect”, like maybe in retrospect-retrospect 20 years later they could be accurate concerns. It certainly seems incorrect now, but would the ‘overpopulation-signs’ be wrong if somehow the ‘underpopulation-signs’ got reversed? I don’t think people were considering solutions to overpopulation of the “reduce global morale and perceived increased difficulty of life, etc.” as valid paths to take. It might be like if an asteroid hit the earth and that caused population to go to 0, I would be hesitant to ask “what can we learn from this” as if we missed out on this line of argument, when really thinking in that direction in the first place is very specialized and there are very very many potential causes for underpopulation and many wouldn’t pan out.
Besides that, I think maybe the lesson is that trendlines on society and views can be surprisingly flexible? Sometimes people say with regards to a coordination problem that “if only everyone decided to work together we could fix the world’s ills in a few days”, but this would rarely happen besides extraordinary circumstances or a slow growing of circumstances.
I remember reading some quote of an observant contemporary of Malthus, who predicted that there may be hope in the most well-off + educated people not choosing to reproduce as much.
That aside, I still think the current situation is extremely surprising, and that unless you had a model that took the effect of educated people having fewer children into account in a highly calibrated way, predicting overpopulation was the right call for much of that time. There might have been more evidence by the 1950s, though. (And in any case, that doesn’t mean the panic around it was justified.)
There was plenty of evidence by the 1950s, since birth rates fell below the replacement level in many European countries during the Interbellum, and the process clearly started before the Great Depression. But they recovered in the late 1930s for reasons that are still arguably unclear, and then skyrocketed during the Baby Boom, so no one was interested in analyzing them. Even now, AFAIK, the academic interest in that phenomenon is concentrated on topics with some application right now, such as which pro-natalist measures worked well, rather than on the fundamental questions.
The fact that a problem was to a degree addressed does not mean it wasn’t real. When “The Population Bomb” came out in 1968, world population was 3.5 billion. Now it’s 8.3. If 1968 growth rates had persisted, it would be 12 billion. China would have 3.8 billion, instead of 1.4. India would have 450,000,000 more people than they do now. Global warming would be worse. There would be more wars, and more desperate migrants. I wouldn’t say the alarm was misplaced.
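As a sanity check on those figures, here is a short Python sketch that computes the constant compound annual growth rate implied by going from 3.5 billion to 12 billion, and compares it with the rate implied by the actual 8.3 billion. The comment doesn’t say what year "now" is, so the code assumes 2025; the helper names are illustrative.

```python
# Figures quoted in the comment, in billions of people.
POP_1968 = 3.5
COUNTERFACTUAL_NOW = 12.0
ACTUAL_NOW = 8.3
# Assumption: "now" means 2025; the comment does not give a year.
YEARS = 2025 - 1968


def implied_annual_rate(start: float, end: float, years: int) -> float:
    """Constant compound annual growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Population after compounding at `rate` for `years` years."""
    return start * (1 + rate) ** years


r_counterfactual = implied_annual_rate(POP_1968, COUNTERFACTUAL_NOW, YEARS)
r_actual = implied_annual_rate(POP_1968, ACTUAL_NOW, YEARS)
print(f"12B by {1968 + YEARS} implies ~{r_counterfactual:.2%}/yr")
print(f"8.3B by {1968 + YEARS} implies ~{r_actual:.2%}/yr")
```

Under that assumption, the 12 billion counterfactual corresponds to roughly 2.2% growth per year compounded, versus roughly 1.5% per year for the path actually taken.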
The underlying logic might be correct overall (carrying capacity hasn’t been disproven) but the specific circumstances are important, sometimes even more important, and the environmental conditions can change/be changed.
Current trends are not set in stone, and extrapolating them out as if they went on forever at the same rate can make for a scary story. Trying to correct for current trends can bite us in the ass if the trends later shift, as with China’s one-child policy.
one problem with taking ideas seriously is you can get pwned by virulent memes that are very good at hijacking your brain into believing them and propagating them further. they’re subtly flawed, but the flaws are extremely difficult to reason through, so being very smart doesn’t save you; in fact, it’s easy to dig yourself in deeper. many ideologies and religions are like this.
it’s unfortunately very hard to tell when this has happened to you. on the one hand, it just feels like the arguments are obviously very compelling, so you’ll notice nothing wrong if it happens to you. on the other hand, if you overcorrect and never take compelling arguments seriously, you become too stodgy and ignore anything novel that you should pay attention to. one idea for how to think about this better: imagine an oracle told you that there exists a magic phrase that you cannot distinguish from a very compelling argument. you don’t really know when this magic phrase will pop up in life, if ever. but it might give you a little bit more pause the next time someone makes a really compelling argument for why you should give all your money to X.
Do you get pwned more, or just by a different set of memes? The bottom 80% of humans on “taking ideas seriously” seem to have plenty of bad memes, although maybe the variance is smaller.
there are a lot of humans who don’t take ideas seriously in that they are very socially conservative and therefore rarely get pwned, in the sense that they mostly live the life that they expect they will live, no matter what memes they are exposed to (which may be a very bad life from your perspective)
Either I strongly disagree with you that there’s a big gap here, or I’m one of people you’d say are normies who lead lives they expect to live (among other definitional differences).
seems false, or at least uncharitable. do you expect that such people would self-report along the lines of “i don’t take ideas seriously”? it seems more likely to me that they would report something like “i value family”, and mean it. you may find the idea simple, but it is certainly an idea, and they certainly take it seriously.
put another way, this social conservatism came from somewhere, and is itself an idea. the assumption that arguments that worked to change your behavior would not change their behavior can be explained in two ways: either they do not take ideas seriously, as you suggest, or they value different things than you do.
Failure to understand and failure to act are different, and beliefs shouldn’t care what you understand or do. There is little danger in taking ideas/framings seriously/playfully in order to adequately learn, to break the superficial engagement or unsuitable framing failure modes that maintain systematic ignorance or misconceptions about subtler details.
But it needs to remain unnecessary to believe what you learn: by default, being in agreement shouldn’t directly compel belief or action; it should require more careful judgement. So taking ideas seriously can help further when lack of understanding was the bottleneck to changes in belief or action, but that’s not always the case.
Yepp, this is true. However, I believe that there are other strategies for avoiding such memes other than “being smart”. Two of these strategies broadly correspond to what we call “being virtuous” and “being emotionally healthy”. See my exchange with Wei Dai here, and this sequence, for more.
Similarly, it’s worth being careful of arguments that lean heavily into longtermism or support concentration of power, because those frames can be used to justify pretty much anything. It doesn’t mean we should dismiss them outright (arguments for accumulating power and for long-term thinking are convincing for a reason), but you should double-check whether the author has strong principles, what the path to getting there looks like, and what it’s explicitly trading off against.
i think these are similar to conservatism in the sense that if you do them too much, you stop getting pwned but you also stop doing entire categories of things that you should do. for example, if you are too virtuous, you become overly self-sacrificial/martyr-like and stop taking many actions that are actually net-positive (many activists suffer from this); if you are too emotionally integrated, you become one of those people who meditated too much and no longer have any desires for anything at all.
Yeah, I do feel confused about the extent to which the solution to this problem is just “selectively become dumber” (e.g. as discussed by Habryka here). However, I have faith that there are a bunch of Pareto improvements to be made—for example, I think that less neuroticism helps you get less pwned without making you dumber in general. (Though as a counterpoint, maybe neuroticism was useful for helping people identify AI risk?) I’d like to figure out theories of virtue and emotional health good enough to allow us to robustly identify other such Pareto improvements.
A related thought that I had recently: fertility decline seems like a rough proxy for “how pwned are you getting by memes”, and fertility is strongly anticorrelated with population-level intelligence. So you have East Asians getting hit hardest by the fertility crisis, then white populations, then South Asians, while African fertility is still very high. Obviously this is confounded by factors like development and urbanization, though, so it’s hard to say whether intelligence drives the decline directly or primarily via creating wealth, but it does seem like, e.g., East Asians are getting hit disproportionately hard. (Plausibly there’s some way to figure this out more robustly by looking at subpopulations.)
More like, being smarter than average. If you are exactly that smart but in a population whose mean is higher than your smarts, then the memes will target you as a primary substrate. You could argue that in that case there are fewer such memes, but I don’t know; it probably has less effect than positional smartness.
I’ve seen this sentiment before, but, in practice, I don’t think there exists an “adversarial noise for humans” line of argument that brainwashes anyone who reads it sincerely into doing XYZ. There are certainly arguments that look compelling at first glance but turn out to have longer-term issues, but part of “taking ideas seriously” is thoroughly investigating their counterarguments.
Chesterton’s Fence is an old standard for a reason: if something new seems both simple enough to be easily discoverable and objectively better than the current strategy, one should figure out why it’s not already the current strategy before adopting it.
I’d venture an uninformed guess that in 95% or so of these cases, the problem isn’t “taking ideas seriously” but rather people deferring proper judgement due to some emotional or social effect.
It’s complicated—one’s aversion to a particular idea may well be the result of an existing meme fighting to prevent it from coming in, which manifests itself as suspicion that can prematurely stop one’s inquiry.
Though I’d push back on the framing of memes as being something “out there”. You say that it’s hard to tell when this has happened to you: I agree, but only insofar as you consider particular memes in isolation; not in the sense of going from a state of memelessness to losing your meme virginity.
While sometimes “getting pwned by a meme” can indeed be a very powerful experience (like in religious conversion, or in the following paragraphs), in reality everyone is subjected to memes since birth. With that said, you can still reason about arguments based on their structure and content. You can still identify fallacies and biases. We’re full of memes, but rational thought is still possible.
there exists a magic phrase that you cannot distinguish from a very compelling argument. you don’t really know when this magic phrase will pop up in life, if ever.
That is the oracle, and the magic phrase is “here’s a bunch of words that were in my mind that are now in your mind”. It’s magic because you can’t argue with it.
but it might give you a little bit more pause the next time someone makes a really compelling argument for why you should give all your money to X.
I am “someone”, and I am also “X”, compellingly enough.
...
Enough pausing. What’s taking you so long? You know what to do.
it just feels like the arguments are obviously very compelling, so you’ll notice nothing wrong if it happens to you.
Does this only apply on the macro scale, say, to ideas concerning ASI or economic frameworks? Because it feels like if I take a very personal-level idea seriously, say polyphasic sleeping, and implement it, sure, I won’t get thrown into the East River, but I should notice if something goes wrong with me, and rather quickly.
Solution really seems to be: tight feedback loops?
tight feedback loops help for sure. though it is possible to be too far gone—cults often continue to exist, possibly even with strengthened belief, after failed prophecies.
I’m not sure I understand how cults are examples of taking an idea seriously. Surely a cult is a complex of ideas, not any single one, some of which one can take seriously and others not (in religions there are debates about Hyperdispensationalism and patripassianism, which show that even within the complex of ideas, different ones can be taken seriously, not to mention à la carte Catholics and reformists). And surely the chief mechanism by which people become subsumed into cults has nothing to do with reason or logical arguments, but with social support (or coercion) irrespective of the recruits’ beliefs?
The feedback loop is very different then and operates not on ideas but a whole host of different mechanisms. (Feelings of belonging, feelings of personal importance, no longer a need to ‘search’ or ‘question’ existential matters). These don’t require ideas to be taken seriously at all.
Again, on the macro scale I can take seriously the idea of… I dunno… Lamarckism. But even if I seriously investigate it and give it the benefit of the doubt, I’m not really in a position to test it, in the sense that it’s a macro idea and not something which will affect my everyday routine (like polyphasic sleeping). Even if I later have children and try to change my behavior to elicit certain traits in them, the lag time before I can confirm anything is many years.
I find it anthropologically fascinating how at this point neurips has become mostly a summoning ritual to bring all of the ML researchers to the same city at the same time.
nobody really goes to talks anymore—even the people in the hall are often just staring at their laptops or phones. the vast majority of posters are uninteresting, and the few good ones often have a huge crowd that makes it very difficult to ask the authors questions.
increasingly, the best parts of neurips are the parts outside of neurips proper. the various lunches, dinners, and parties hosted by AI companies and friend groups (and increasingly over the past few years, VCs) are core pillars of the social scene, and are where most of the socializing happens. there are so many that you can basically spend your entire neurips not going to neurips at all. at dinnertime, there are literally dozens of different events going on at the same time.
multiple unofficial workshops, entirely unaffiliated with neurips, will schedule themselves to be in town at the same time; they will often have a way higher density of interesting people and ideas.
if you stand around in the hallways and chat in a group long enough, eventually someone walking by will recognize someone in the group and join in, which repeats itself until the group gets so big that it undergoes mitosis into smaller groups.
if you’re not already going to some company event, finding a restaurant at lunch or dinner time can be very challenging. every restaurant in a several-mile radius will be either booked for a company event or jam-packed with people wearing neurips badges.
This is true of approximately every worthwhile conference and convention. In my entire life I’ve been to exactly one conference where the scheduled programming provided more than 10% of the event’s value.
It’s a two-place function. When I go to a conference that everyone says this same stuff about, then I usually have the most fun by attending talks and taking my time thinking about the stuff related to the talks, rather than hobnobbing.
Mild caveat: the whole “I joined a group chat at NeurIPS and others kept joining to talk to us” only happens if you’re at NeurIPS and your name is Leo Gao so YMMV.
having the right mental narrative and expectation setting when you do something seems extremely important. the exact same object experience can be anywhere from amusing to irritating to deeply traumatic depending on your mental narrative. some examples:
a minor inconvenience like missing your bus when you’re not in a rush can be much more irritating if you’re having a bad day and you have the narrative of “everything is going wrong for me today”
something going wrong during travel can be a catastrophe if you’re expecting the perfect vacation but it can even be a fond memory if you’re just viewing it as an adventure (and a bonding experience if travelling with others)
not getting to do something you wanted to do hurts a lot more if you feel like you made a deal with yourself that you’d get to do it in exchange for doing something else you didn’t want to do; whereas you might not even really want the thing that much otherwise.
expecting something to happen soon and having it gradually delayed further and further into the future is a lot more irritating than already expecting something to be delayed a lot.
tbc, the optimal narrative is not always the one that is maximally happy with everything. sometimes there are true tradeoffs, and being complacent is bad. but it is often worth shaping the narrative in a way that reduces unnecessary suffering.
a skill which I respect in other people and which I aspire towards is noticing when other people are experiencing suffering due to violations of positive narratives, or fulfillment of negative narratives, and comforting them and helping nudge them back into a good narrative.
this is another post of something that is obvious intellectually and yet I’ve failed to always do right in practice.
Relatedly, at some point as a teenager I realized that being exposed to rain is actually usually not that terrible, and I had just kind of been accidentally conditioned to dislike it because it’s a normal thing to dislike and I never met anyone who appeared to enjoy the experience. But turns out, once you stop actively maintaining that resistance and welcome the rain, it can be pretty nice to walk around in rain while everyone around you tries to escape it. (Some exceptions apply, of course)
The last few times I got caught in very heavy rain and got completely soaked, the situation really was unpleasant, but you can still take it with light humor, thinking to yourself “haha, that sucks”, similar perhaps to how you would laugh at the pain in a “who can eat the hottest pepper” challenge between friends. Or thinking “yes it sucks, but it isn’t actually that bad”.
Agreed, that’s one of the exceptions I was thinking of—if you’re getting soaked and have no way to get into dry clothes anytime soon, there’s little way around finding that rather unpleasant. But I’d say 95% of my rain encounters are way less severe than that, and in these cases, my (previous) attitude towards the rain really was the main issue about the whole situation.
fwiw I also enjoy the rain, and I guess I just never cared enough about people thinking it was weird. I do have to admit that when it’s raining especially heavily, it does suck a lot (the experience of fully wet clothing is very unpleasant in many ways). but most of the time it’s not raining that hard / I’m not going to be in the rain that long.
I’m not sure what you’re hinting at, but in 99.9% of cases when I’m out of the house, I do carry a smartphone around. If you mean that it’s annoying when the display gets confused by water, then I agree that’s a real disadvantage (but I doubt people’s attitude towards being exposed to rain changed that much between 2006 and today, so there certainly is some severe general dislike of rain independent of smartphones). If this is not what you mean, then please elaborate. :)
Sorry if that was weirdly obscure. I was asking because the principal reason I go out of my way to avoid rain is that I’m worried my phone would get wet and potentially die (and I’ve been somewhat sad about having to forego the experience of braving the rain at points). But it’s possible that this is not a big issue with current devices (and maybe never was)!
it is definitely not a problem with current devices. my phone has gotten quite wet hundreds of times and still works perfectly fine. note that this is different from survivability fully submerged; my guess is your phone could probably survive being submerged for a few minutes in a pool or something but if you left it there for a day it would be dead.
a skill which I respect in other people and which I aspire towards is noticing when other people are experiencing suffering due to violations of positive narratives, or fulfillment of negative narratives, and comforting them and helping nudge them back into a good narrative.
Might not be what you’re thinking of, but the first thing that comes to mind for me is misophonia: a basically-neutral or maybe mildly-irritating object experience, which somehow gets blown completely out of proportion in the mind and becomes a big problem. Developing an “I’m really bothered by this particular sound” narrative makes it worse, of course.
Alas, I have no idea how to uncondition that particular narrative irritant once it’s in there. If there’s any technique of ‘shaping the narrative’ strongly enough to override this, I’ve never heard of one, and knowing about it to the point where I’m able to successfully practice it would be huge.
when will we have sufficiently conclusive evidence for the long term safety of far-uvc that it’s reasonable to push for its universal adoption in all public spaces without reservation? the safety issue seems like a much bigger deal than the cost issue for broad adoption; if it works safely, the economic case for installing far uvc in public spaces seems pretty solid—people being sick must be terrible for the economy! and they’re only ever going to get cheaper.
in a world where far uvc is near universally deployed, we might be able to banish the common cold or the flu to the past, in the same way that cholera is basically no longer a problem in the developed world. this seems like a pretty big deal and I’d like to know when this glorious future is coming (and whether there’s anything I can do to make it come sooner)!
(from eyeballing studies, it sounds like the cost of the cold+flu to the US economy is on the order of $100bn/yr, which passes basic Fermi estimate muster: given a $30tn/yr gdp, a few days per year of lost productivity due to cold/flu is easily hundreds of billions. even at the current price of far uvc, which is a huge overestimate of the cost of future tech at volume, the cost of disinfecting spaces is about $0.40/year/sqft (amortizing an aerolamp over its 5-year lifespan); compared to e.g. $60/year/sqft land cost in San Francisco, this is a negligible amount. estimating the total number of sqft of public space in the US is kind of annoying, but here’s a Fermi estimate: there are about 100k schools in the US, and each school is about 100k sqft, and let’s say schools are about 10% of all public spaces. that’s about 10^11 sqft of public space, which pencils out to roughly $40bn/year, the same order of magnitude as the cost of cold+flu, implying we are already close to break even, despite the immaturity of the technology.)
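The Fermi estimate above can be checked with a few lines of arithmetic. This is a minimal Python sketch that plugs in the figures quoted in the post; `SICK_DAYS = 3` and `WORKING_DAYS = 250` are illustrative assumptions standing in for “a few days per year”, and all constant and function names are hypothetical. Multiplied out literally, 100k schools × 100k sqft ÷ 10% gives about 10^11 sqft, and at $0.40/sqft/yr that is about $40bn/yr, the same order of magnitude as the ~$100bn/yr cold+flu figure; the lost-productivity cross-check comes to about $360bn/yr under the assumed sick days.

```python
# Rough Fermi estimate for universal far-UVC in US public spaces.
# Constants are figures quoted in the post, except SICK_DAYS and
# WORKING_DAYS, which are illustrative assumptions for "a few days".

SCHOOLS_US = 100_000               # number of US schools (post's estimate)
SQFT_PER_SCHOOL = 100_000          # floor area per school, sqft
SCHOOL_SHARE = 0.10                # schools as a fraction of all public space
UVC_COST_PER_SQFT_YEAR = 0.40      # aerolamp amortized over 5 years, $/sqft/yr
SF_LAND_COST_PER_SQFT_YEAR = 60.0  # San Francisco land cost, $/sqft/yr
US_GDP = 30e12                     # $/yr
SICK_DAYS = 3                      # assumption: days lost per worker per year
WORKING_DAYS = 250                 # assumption: working days per year


def public_space_sqft() -> float:
    """Total public floor area implied by scaling up from schools."""
    return SCHOOLS_US * SQFT_PER_SCHOOL / SCHOOL_SHARE


def annual_uvc_cost() -> float:
    """Yearly cost of covering all public space at the current price."""
    return public_space_sqft() * UVC_COST_PER_SQFT_YEAR


def annual_productivity_loss() -> float:
    """GDP lost if each worker loses SICK_DAYS out of WORKING_DAYS to cold/flu."""
    return US_GDP * SICK_DAYS / WORKING_DAYS


if __name__ == "__main__":
    print(f"public space:      {public_space_sqft():.1e} sqft")
    print(f"far-UVC cost:      ${annual_uvc_cost() / 1e9:.0f}bn/yr")
    print(f"productivity loss: ${annual_productivity_loss() / 1e9:.0f}bn/yr")
    share = UVC_COST_PER_SQFT_YEAR / SF_LAND_COST_PER_SQFT_YEAR
    print(f"UVC as a share of SF land cost: {share:.1%}")
```

Varying `SCHOOL_SHARE` or `SICK_DAYS` is the obvious sensitivity check, since those are the least grounded inputs.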
It’s already safe beyond a reasonable doubt if kept above eye level (7 ft / 2.13m), since this massively cuts the dose absorbed. I think many public spaces should install uvc immediately, and if they’re not convinced yet just use removable shutters that keep it above eye level until more research is done.
is there a particular reason why above eye level matters? in addition to vertical height, do you also mean tilting it up so that the light points mostly at the ceiling? which sources should I look at to gain this confidence for myself?
The light should point mostly in a horizontal plane just below the ceiling of the room, so that no one has the light shining directly in their eyes. Here’s a source and there are more sources linked from here, including a DIY guide.
Since upper room UVGI when not filtered to 222nm is probably safe, and far-UVC which IS filtered to 222nm is probably safe even when it shines on occupants’ eyes and bodies, it stands to reason that upper-room UVC has enough safety margin. To the best of my knowledge, far-UVC has been tested up to doses equivalent to 3 years of 8h/day exposure at the current safety threshold, but eyes are delicate so I would prefer studies of 10-100x higher cumulative doses.
isn’t upper room far UVC strictly less effective than upper room normal UV, simply because of far UVC lamps being much more expensive and inefficient, and the only benefit of far uvc being that it is safe to shine directly on people? (and simply by virtue of being much better established, upper room UV seems like an easier sell to people who defer to authority for safety, even if it is strictly less safe than upper room far UVC)
while we’re on the subject, how much more effective is far UVC (shone directly on people) vs upper room UV?
Plausibly yes, but I’d be worried enough about residual exposure (reflections off walls, improper installation) to other UV wavelengths that installation is likely to require some care and expense too. The second link has several accounts of acute health effects from people doing upper room UV wrong. Probably still great to have in train stations, airports etc though given the enormous benefit/cost ratio.
how much more effective is far UVC (shone directly on people) vs upper room UV?
I’m not really sure, there would be a component from surfaces and a component from extra ACH due to not relying on vertical air mixing. There are probably studies.
what’s the strongest argument for why i shouldn’t auto-ignore any acausal arguments that involve hypothetical entities extremely far away (or which only exist in other Everett branches or whatever) such that we will never interact with them causally at all? a razor i have is that things which are entirely epiphenomenal should be ignored because they are unfalsifiable.
in particular, this seems consistent even if you accept one-boxing and paying in the counterfactual mugging. the key question is what kinds of evidence you accept as evidence of the existence of an acausal connection. in these hypotheticals, we simply declare by assumption that Omega is truthful and capable of predicting you. in reality, we would either arrive at such a belief by empirical observation (Omega has a strong track record), or pure theoretical deduction. all of the epiphenomenal acausal theories depend on pure deduction. that empirical observation has to be conveyed to us causally. it seems reasonable to draw a line and say we simply don’t trust pure deduction to be able to convince us of an acausal link.
also, with both the counterfactual mugging and Newcomb, even though in each instance you can’t prove the other branch could have happened, in the long run those who do the right thing (pay, one box) will win. whereas with purely epiphenomenal acausal theories you will literally never find out any difference whatsoever, because the branching point happened long ago.
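A toy simulation may make the “in the long run” point concrete. This is a minimal Python sketch, not anything from the thread: the payoffs ($1,000 visible and $1,000,000 opaque for Newcomb; $100 asked on tails and $10,000 paid on heads for the counterfactual mugging) are assumed conventional values, the predictor is modeled as independently correct with a fixed `accuracy`, and all names are hypothetical. With `accuracy=0.99` the expected payout is about $990,000 per round for one-boxers versus about $11,000 for two-boxers, and about $4,950 per round for payers versus $0 for refusers. The gap shows up in outcomes you can actually observe, which is what the purely epiphenomenal case lacks.

```python
import random

# Assumed payoffs (conventional textbook values, not from the thread).
SMALL, BIG = 1_000, 1_000_000       # Newcomb: visible box, opaque box
MUG_COST, MUG_PRIZE = 100, 10_000   # counterfactual mugging: ask, reward


def newcomb_round(one_boxer: bool, accuracy: float, rng: random.Random) -> int:
    """Payout for one round against a predictor that is right with prob. accuracy."""
    correct = rng.random() < accuracy
    predicted_one_box = one_boxer if correct else not one_boxer
    opaque = BIG if predicted_one_box else 0
    return opaque if one_boxer else opaque + SMALL


def mugging_round(payer: bool, rng: random.Random) -> int:
    """Payout for one round of counterfactual mugging with a fair coin."""
    if rng.random() < 0.5:  # heads: Omega rewards agents who would pay on tails
        return MUG_PRIZE if payer else 0
    return -MUG_COST if payer else 0  # tails: Omega asks for the money


def long_run_average(round_fn, rounds: int = 10_000, seed: int = 0, **policy) -> float:
    """Average payout per round over many independent rounds of round_fn."""
    rng = random.Random(seed)
    return sum(round_fn(rng=rng, **policy) for _ in range(rounds)) / rounds


if __name__ == "__main__":
    for one_boxer in (True, False):
        avg = long_run_average(newcomb_round, one_boxer=one_boxer, accuracy=0.99)
        print(f"Newcomb, one_boxer={one_boxer}: ${avg:,.0f} per round")
    for payer in (True, False):
        avg = long_run_average(mugging_round, payer=payer)
        print(f"mugging, payer={payer}: ${avg:,.0f} per round")
```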
What if you ask an aligned ASI if causally disconnected civilizations are doing things that you value, and it comes back saying “seems pretty unclear, but they are also trying to guess what you would do, and I’d guess that if you choose to do nice stuff for them, they would be 20% more likely to guess that, and they would be 10% more likely to do nice stuff for you”? Your AI might guess that because it is e.g. running detailed simulations of other corners of this universe.
If you care about the goodness that is created by causally disconnected civilizations and are EDT-ish, I think only caring about the good you can verify via direct causal evidence in situations like the one above is basically the same kind of mistake as only caring about the well-being of the people that you can see with your own eyes.
why is it the same as only caring about the well being of people i can see? causal interaction doesn’t have to be photons bouncing off them into my eye. if i donate to AMF, i want someone to be observing the people being helped in some way to make sure that they are actually being helped, and write reports that i can read (or rely on other people in my social circle to have read).
suppose there was a strange island nation with an iron-clad law that anyone or anything that enters can never leave, or transmit any info to the outside. from our knowledge of the geography, we can deduce that they are probably very poor and suffer from malaria. we dispatch some brave volunteers from AMF into the island, and ship crates of malaria nets there. unfortunately, because of the restrictions, we have literally no idea what is happening once the shipments land. perhaps the natives are using the nets to fish. perhaps the nets are degrading faster than expected in the unusual climate of the island. perhaps the volunteers were captured and killed, and the natives have no idea what to do with the crates of nets. i would be much less willing to donate money to this charity than to AMF!
My understanding of your argument is “we will have only very weak evidence of beneficial acausal effects”. I agree this is probably true now. (Understanding “evidence” is the Bayesian sense.)
But I don’t think this would be the case in situations like the one I describe above with the ASI simulating other civilizations? The ASI might be uncertain (though it’s unclear, maybe with good enough intelligence you could be quite confident in this kind of thing), but probably not radically uncertain. It doesn’t seem to me like a different kind of uncertainty to the one I have when I donate to a medium-to-high-risk high-impact opportunity / when I try to help the world by working on AI safety.
(Uncertainty about morality (do you care about goodness produced by distant civilizations) and decision theory (are you EDT-ish) might not be resolved by intelligence and you thus might still have a kind of uncertainty different from empirical uncertainty about the effects of your action. In my comment above I conditioned that uncertainty away. I think it’s unclear how to act when you have this kind of uncertainty, but I’d be surprised if the answer was “auto-ignore”.)
Maybe your argument is more like “you should only use causal evidence when making decisions” (as a possible update to decision theory, not as an empirical claim about what Bayesian evidence is weak vs strong)? I don’t think there is a way of making this sentence more precise that would result in a decision theory you would find reasonable.
if god comes down from the sky and tells you that there exists a beneficial acausal trade, sure, you should take it. ditto for a sufficiently competent ASI. i’m mostly making an empirical claim that you are very unlikely to gain the requisite level of confidence in practice as a human, because humans are simply not capable of reasoning about this in a sensible way, and so we should not really spend time thinking about it today. and it also isn’t worth planning ahead for what to do if the ASI recommends making acausal trades in the future, because our plans today shouldn’t depend much on what the ASI recommends we do in the future.
i also separately predict that the ASI will not be able to find acausal trades that we are confident enough to take but this is less certain and impossible to resolve.
The story I’m most sympathetic to for acausal trade[1] looks something like this:
1. Several factions on Earth tried pretty hard to hash out their differences over what to do with space probes but end up deciding that they can’t.
2. Rather than put compromise solutions on the probes, and rather than go to war, they decided to “agree to disagree” and split fractions of space according to the values of different factions.[2] The key agents on the different probes are tasked to satisfy their respective factions’ values.
3. Once the probes arrive, they “unfold” and after an initial buildup, a key intermediate stage is turning a lot of stellar mass and energy into computronium, so they can do more calculations than Sol’s supercomputers were ever able to. The calculations are initially done for entirely instrumental reasons (to figure out the best approach to maximize/satisfy the values loaded into the relevant agents).
4. One aspect of interest to these probes is what the other agents are doing. They figure this out via simulation.
5. Meanwhile, perhaps these agents learn highly useful empirical information about the respective regions of the other probes, from light transmissions in galaxies within their past lightcone but not their future lightcone (due to space expansion).
6. Steps 4 and 5 lead many agents to mutually conclude that their preferences would be better satisfied if they swapped places.
7. At this point, the agents acausally “trade” (in the sense of doing what another faction wants) iff their simulations suggest (with sufficiently high probability) that the other agents will do what they want conditional upon trading.
In such an overly specific, somewhat convoluted setup, I think refusing to consider acausal trade will be obviously foolish.
Maybe you think this setup unreasonably stacks the deck towards acausal trade. I agree! I think it’s plausible enough that simulation limits mean trading across multiverses isn’t viable. But I wanted to present the example above as a kind of possibility/existence argument, before drilling down to debate probabilities (as in Fabien’s examples).
it seems reasonable to draw a line and say we simply don’t trust pure deduction to be able to convince us of an acausal link.
That sure does lean a lot on a particular metaphysics? I mean, everyday world-models are always gotten by “a combination of empirical and logical inference”.
Like, suppose that you previously believed in X due to “empirical evidence”. Now someone shows you that actually, given your “purely deductive beliefs” you could have concluded X without that empirical evidence, using reasoning R. Then suppose that happens again with empirical X2, X3, etc. Is the pure reasoning R now empirically grounded enough for you to trust it? Would you trust it enough that if it says Xk exists, but you have no empirical evidence of Xk, you would still care about Xk / act as though Xk is real?
i mean, this depends on how complex R is. at one extreme is “the sun has always risen, so it will rise tomorrow”. at the other extreme is “i have been good at coming up with correct claims in the past (due to using the right kinds of reasoning, having good intuitions, etc), so i will come up with correct claims in the future.” the key distinction being the complexity of the connection between the thing that is claimed to be good and the empirically observed evidence of goodness.
the most extreme version of the latter is if there is literally no way to ever check Xk, and the smallest reference class that contains both Xk and the things you can actually observe is still very large.
i think epiphenomenal acausal stuff falls in this latter category. the smallest reference class that contains this and also something that we can actually verify whatsoever is very large and heterogeneous, and smart people are very likely to be wrong when speculating on things that are even a fraction as ungrounded as this.
I mean, yeah, I’d assume it makes very little sense for anyone right now to be making many / any decisions based on reasoning about supposed acausal bargains with superintelligences in other logically parsimonious universes, or anything like that. Is that what you’re saying?
But I guess “such that we will never interact with them causally at all” seems pretty irrelevant to that specific claim; it also doesn’t make sense to be making many decisions based on reasoning about aliens in distant galaxies, even if we’d plausibly meet them some day. Because of this, I take you to mean something stronger, like “for phenomena such that we will never interact with them causally at all, we should never worry about them”. Is that what you’re saying?
Like, the main answer is that when you have an option to do such trades, it will not feel ephemeral. It will look like you discovered that your situation is like a Twin Prisoner’s Dilemma, and was all along. The world produced two disconnected situations, and you are in one of them; you can observe the root, but not the other branch. But you can use your deduction skills to figure out what’s there.
But it still might be possible that uncertainty is so great, empirically, that the trade is possible but unprofitable. I’d guess the ASIs we are going to create would engage in a nontrivial amount of acausal trade, but I would not be shocked if they thought about it and said, nah, costs outweigh the gains, forget it. But it would be a bit surprising to me.
A simple example is consensual mutual simulation. Suppose some theoretical entity exists and would like to experience our universe (let’s say they are from 4 dimensions and really want to see what 3D is actually like and what kind of beings actually live in it), and a human is super interested in exploring 4D. Then it makes sense to simulate the other class of entity on the assumption that they’d also simulate the human. E.g. everyone would calculate that there’s no way to know precisely which 3D or 4D being would ask for such a thing, but we would all calculate that it’s far more costly to simulate an entire other universe to see how it turns out in detail (the argument is strongest if neither universe could simulate the other in sufficient detail to satisfy curiosity), so why not just simulate (an ensemble of) acausal visitors at much lower cost? Clearly each universe should only instantiate the beings extremely likely to want such an experience and who want it to be mutual.
From this post on multiverse-wide cooperation (nowadays people call it ECL—evidential cooperation in large worlds):
5. MSR represents a shift in one’s ontology; it is not just some “trick” we can attempt for extra credit
The line of reasoning employed in MSR is very similar to the reasoning employed in anthropic decision problems. For comparison, take the idea that there are numerous copies of ourselves across many ancestor simulations. If we thought this was the case, reasoning anthropically as though we control all our copies at once could, for certain decisions, change our prioritization: If my decision to reduce short-term suffering plays out the same way in millions of short-lived, simulated versions of earth where focusing on the far future is impossible to pay out, I have more reason to focus on short-term suffering than I thought.
MSR applies a similar kind of reasoning where we shift our thinking from being a single instance of something to thinking in terms of deciding for an entire class of agents. MSR is what follows when one extends/generalizes the anthropic/UDT slogan “Acting as though you are all your (subjectively identical) copies at once” to “Acting as though you are all copies of your (subjective probability distribution over your) decision algorithm at once.”
Rather than identifying solely with one’s subjective experiences and one’s goals/values, MSR also involves “identifying with” – on the level of predicting consequences relevant to one’s decision – one’s general decision algorithm. If the assumptions behind MSR are sound, then deciding not to change one’s actions based on MSR has to cause an update in one’s world model, an update about other agents in one’s reference class also not cooperating. So the underlying reasoning that motivates MSR is something that has to permeate our thinking about how to have an impact on the world, whether we decide to let it affect our decisions or not. MSR is a claim about what is rational to do given that our actions have an impact in a broader sense than we may initially think, spanning across all instances of one’s decision algorithm. It changes our EV calculations and may in some instances even flip the sign – net positive/negative – of certain interventions. Ignoring MSR is therefore not necessarily the default, “safe” option.
I’m not sure this section will convince, but the point is that it’s hard to avoid. Even if you decide to ignore these sorts of arguments, that very action (the ignoring) should update what you think aliens will do. So then, why are you ignoring it?
My gut-check is to imagine a superintelligence that does the complicated acausal multiversal thing, and then separately imagine a superintelligence that just tries to take over the lightcone with no hesitation. Whichever ends up winning more stuff is the real one and has the right decision theory.
I just can’t imagine a superintelligence being trolled by too-clever-by-half decision theory arguments, when it can just kill you and take your stuff (and the whole empty galaxy) instead. Maybe it has to modify the intergalactic probes so that it’s provably the kind of being that would cooperate with other superintelligences or something in case it runs into a peer out there, but this galaxy is most likely just up for grabs.
When it comes to personal usage, my brain isn’t sophisticated enough to consciously instantiate any formal decision theory in everyday life, which leaves me especially free at any moment to simply reject anything that sounds goofy. Does anyone try to use this stuff in real life, or is it purely academic?
why i shouldn’t auto-ignore any acausal arguments that involve hypothetical entities … such that we will never interact with them causally at all?
You could ignore such entities, if that’s the true premise (though that might be leaving value on the table; breaking the premise might be in your own interest). But interaction with partial knowledge about X is some sort of causal interaction with an aspect of X captured in that knowledge. So causal interaction with some aspects of anything abstract/remote can’t be ruled out, and relevant computations might have sufficiently simple motivations for some process/agent/person to give them concrete physical presence in this world at some point.
execution is necessary for success, but direction is what sets truly great accomplishments apart from merely impressive ones. though being better at execution can make you better at direction, because it enables you to work on directions that others discard as impossible.
I expect that there’s no simple relationship between these factors and success. Both are required, and it’s idiosyncratic which one is most lacking in any given margin between not-success and success.
I usually think of execution as compute and direction as discernment. Compute = ability to work through specific directions effectively, discernment = ability to decide which of two directions is more promising. Probably success is upper-bounded by the product of the two, in a sufficiently informal way.
Thomas Kwa’s Effectiveness is a Conjunction of Multipliers seems relevant. He factors multipliers into judgment (sort of maps to your ‘direction’, or research taste I guess), ambition (which counts hard work as a driver), and risk appetite. Some domains seem to reward hard work superlinearly, probably worth looking out for those. You shouldn’t skip leg day because you’d miss out on multipliers (that phrase came from SBF of all people). Also finding multipliers is hard and information-gathering is particularly valuable when it helps you find a multiplier and figure out if you can apply it to your personal situation (I’m cribbing most of these from Thomas and the comments).
random half baked thoughts from a sleep deprived jet lagged mind: my guess is that the few largest principal components of variance of human intelligence are something like:
a general factor that affects all cognitive abilities uniformly (this is a sum of a bazillion things. they could be something physiological like better cardiovascular function, or more efficient mitochondria or something; or maybe there’s some pretty general learning/architecture hyperparameter akin to lr or aspect ratio that simply has better or worse configurations. each small change helps/hurts a little bit). having a better general factor makes you better at pattern recognition and prediction, which is the foundation of all intelligence. whether this is learning a policy or a world model, you need to be able to spot regularities in the world to exploit to have any hope of making good predictions.
a systematization factor (how inclined one is to use the machinery of pattern recognition to find and operate using explicit rules about the world, vs using that machinery implicitly and relying on intuition). this is the autist vs normie axis. importantly, it’s not like normies are born with hard-coded social skills modules, for the same reason that humans aren’t born with language modules (sorry chomsky). we learn these things by being general reasoners placed in an environment where we are exposed to language and social interactions, etc. it just turns out that systematizing is generally really good for certain kinds of domains (math, CS) and pretty bad for other domains (social interaction). I think this explains why some people of decent general intelligence just cannot grasp basic CS concepts, or vice versa for social interaction. this is because as the number of layers of abstraction increases, it becomes increasingly difficult to model the system from the bottom up, until at some point there’s a phase transition where it becomes better to model the system top down and give up on any hope of ever understanding it mechanistically. this is exacerbated through life because people generally accumulate knowledge (both explicit and tacit) faster in the domains in which they already excel, and because of declining neuroplasticity with age. modern society is increasingly constructed to be amenable to systematization (via laws, contracts, standards, etc.), because systematization is necessary to govern and scale a civilization. (I think this axis also explains a correlation with embodiment/integratedness, though it’s unclear under this theory how exactly the causality should work there. maybe only being able to systematize makes it harder to model the self? intuitively, it feels like intervening on integratedness causes one to become better at non-systematizing reasoning as a whole, though that could just be because we’re actually intervening on the common cause and using self-modelling as a tight feedback loop)
there are other big components (exploration vs exploitation, risk tolerance, creativity, memory) that aren’t explained here.
I might clean up my thinking and write something more comprehensible later. none of these ideas are novel, but I think a lot can be gained through pinning them down exactly. unfortunately, on priors, this kind of theoretical speculation is rarely useful. though it might be possible to test parts of theories like this experimentally.
why is ADHD also strongly correlated with systematization? it could just be worse self modelling: ADHD happens when your brain’s model of its own priorities and motivations falls out of sync with your brain’s actual priorities and motivations. if you’re bad at understanding yourself, you will misunderstand your priorities, and also you will not be able to control your priorities, because you won’t know what kinds of evidence will really persuade your brain to adopt a specific priority, and your brain will learn that it can’t really trust you to assign it priorities to satisfy its motives (burnout).
why do stimulants help ADHD? well, they short circuit the part where your brain figures out what priorities to trust based on whether they achieve your true motives. if your brain has already learned that your self model is bad at picking actions that eventually pay off towards its true motives, it won’t put its full effort behind those actions. if you can trick it by making every action feel like it’s paying off, you can get it to go along.
honestly unclear whether this is good or bad. on the one hand, if your self model has fallen out of sync, this is pretty necessary to get things done, and could get you out of a bad feedback loop (ADHD is really bad for noticing that your self model has fallen horribly out of sync and acting effectively on it!). some would argue on naturalistic grounds that ideally the true long term solution is to use your brain’s machinery the way it was always intended, by deeply understanding and accepting (and possibly modifying) your actual motives/priorities and having them steer your actions. the other option is to permanently circumvent your motivation system, to turn it into a rubber stamp for whatever decrees are handed down from the self model, which, forever unmoored from needing to model the self, is no longer an understanding of the self but rather an aspirational endpoint towards which the self is molded. I genuinely don’t know which is better as an end goal.
why do stimulants help ADHD? well, they short circuit the part where your brain figures out what priorities to trust based on whether they achieve your true motives
I view taking stimulants more as a move to give the more reflective parts of my brain more steering power (“getting my taxes done is good, because we need to do it eventually, now is actually a good time, doing my taxes now will be as boring as doing them in the future, rather than playing Magic: The Gathering now”) relative to my more primitive “true motives”, which tend to be hyperbolically discounted (“dozing in bed is nice”, “washing dishes is boring”, “doing taxes is boring”). Maybe I am horrible at self-modelling, but the self model being out of sync seems unnecessary as an explanation of why the self-reflective parts have less steering power.
it is kind of funny that caring a lot about reflective stability of alignment proposals and paradoxes arising from self modelling (e.g. in action counterfactuals) is most common in the people who are the worst at modelling themselves
I think you’re framing the intuition vs. systematization relationship in a limiting way. From a predictive coding perspective, these aren’t opposing traits on an “autist vs normie axis”, they’re complementary processes working within the same neural architecture.
Predictive coding research shows our brains use both bottom-up signals (intuition) and top-down predictions (systematization) in a dynamic interplay. These are integrated parts of how our brains process information. One person can excel at both.
What appears as preference for systematization reflects differences in how prediction errors are weighted and processed—not a fundamental limitation. You can develop both capacities because they use the same underlying predictive machinery.
I would, however, agree with your take that most people don’t do this, but that is because they generally stop searching for prediction error after a certain point, since it is easier to just live in your secure bubble. So you’re right that this is probably what it looks like in practice, since people will just use whichever strat (top-down systematization or bottom-up intuition) has led to the most reward in the past.
I wonder to what degree the genome has “solved” intelligence. You could imagine perhaps that we are all sort of noisy instantiations of the ideal intelligence, and that reduction in noise (possibly mainly literal cortex-to-cortex SNR) is mostly what results in intelligence variations. Even considering this, the genome probably does not encode a truly complete solution, in the sense that there are plenty of mental skills that have the potential to feed back positively on each other and correlate positively, but basically don’t. The genome probably has no understanding of the geometric Langlands conjecture. That is to say, there are deep and useful truths, especially ones that point out symmetries between extremely deep natural categories, and we have not adapted to them at a deep level yet. Therefore the positive manifold of all mental skills is very much still under construction. One could then wonder what fraction of variance comes from genetic denoising and what fraction comes from aligning to novel-to-genome deep truths. All that said, this may be ill-posed; defining noise and novelty here seems like it could be hard.
i’ve become more skeptical of rsi over time. here’s my current best guess at what happens as we automate ai research.
for the next several years, ai will provide a bigger and bigger efficiency multiplier to the workflow of a human ai researcher.
ai assistants will probably not uniformly make researchers faster across the board, but rather make certain kinds of things way faster and other kinds of things only a little bit faster.
in fact probably it will make some things 100x faster, a lot of things 2x faster, and then be literally useless for a lot of remaining things
amdahl’s law tells us that we will mostly be bottlenecked on the things that don’t get sped up a ton. like if the thing that got sped up 100x was only 10% of the original thing, then you don’t get more than a 1/(1 − 10%) speedup.
i think the speedup is a bit more than amdahl’s law implies. task X took up 10% of the time because there are diminishing returns to doing more X, and so you’d ideally do exactly the amount of X such that the marginal value of time spent on X equals the marginal value of time spent on anything else. if you suddenly decrease the cost of X substantially, the equilibrium point shifts towards doing more X.
in other words, if AI makes lit review really cheap, you probably want to do a much more thorough lit review than you otherwise would have, rather than just doing the same amount of lit review but cheaper.
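a minimal sketch of the amdahl arithmetic and the equilibrium argument side by side. the amdahl part is just the formula above. the re-optimization part rests on an assumption the text doesn’t make: that research output is cobb-douglas in the amount of X and everything else (output = X**a * Y**(1 - a)) under a fixed time budget, which conveniently makes the optimal time share on X exactly a, so a = 0.1 matches “X took up 10% of the time”. the function names are made up for illustration.

```python
def amdahl_speedup(fractions, speedups):
    """overall speedup when fraction f_i of the original time gets s_i times faster.

    fractions must sum to 1; a speedup of 1 means that part is not helped at all.
    """
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must cover the whole workflow"
    return 1.0 / sum(f / s for f, s in zip(fractions, speedups))


def best_output(a, cost_x, budget=1.0):
    """max of X**a * Y**(1 - a) subject to cost_x * X + Y = budget (assumed model).

    the closed-form optimum always spends a fraction a of the budget on X.
    """
    x = a * budget / cost_x
    y = (1 - a) * budget
    return x ** a * y ** (1 - a)


a = 0.1          # X is 10% of the time at the old optimum
speedup_x = 100  # ai makes X 100x cheaper

# amdahl: the same amount of X as before, just done faster
fixed_amount = amdahl_speedup([a, 1 - a], [speedup_x, 1])
# re-optimized: keep spending 10% of the time on X, but get 100x as much X
reoptimized = best_output(a, 1 / speedup_x) / best_output(a, 1.0)

print(round(fixed_amount, 3))  # 1.11, capped at 1 / (1 - 0.1) ~ 1.111
print(round(reoptimized, 3))   # 1.585, i.e. 100 ** 0.1
```

under this toy model the time share on X never changes; all of the extra gain comes from doing far more X in the same 10%, which is the lit review point. a different production function would give different numbers, so treat 1.585 as an illustration rather than an estimate.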
at the first moment that ai can fully replace a human researcher (that is, you can purely just put more compute in and get more research out, and only negligible human labor is required), the ai will probably be more expensive per unit of research than the human
(things get a little bit weird because my guess is that before ai can drop-in replace a human, we will reach a point where adding ai assistance costing as much as 100 humans to 2025-era openai research would be as good as adding 100 humans, but the ais are not doing the same things as the humans, and if you just keep adding ais you start experiencing diminishing returns faster than with adding humans. i think my analysis still mostly holds despite this)
naively, this means that the first moment that AIs can fully automate AI research at human cost is not a special criticality threshold. if you are at equilibrium for allocating money between researchers and compute, then suddenly being able to convert compute into researchers at the exchange rate of a human researcher’s salary doesn’t buy you much, since at the margin you were already indifferent between one more researcher and that much more compute
in reality, you will probably not be at equilibrium, because there are a lot of inefficiencies in hiring humans: recruiting is a lemon market, you have to onboard new hires relatively slowly, management capacity is limited, there is an inelastic and inefficient supply of qualified hires, etc. but i claim this is a relatively small effect and can’t explain a one OOM increase in workforce size
also: anyone who has worked in a large organization knows that team size is not everything. having too many people can often even be a liability and slow you down. even when it doesn’t, adding more people almost never makes your team linearly more productive.
however, if AIs have much better scaling laws with additional parallel compute than human organizations do, then this could change things a lot. this is one of my biggest uncertainties here and one reason i still take rsi seriously.
your AIs might have higher-bandwidth communication with each other than your humans do. but they might also be worse at generalizing previous findings to new situations or something.
they might be more aligned with doing lots of research all day, whereas humans care about a lot of other things like money and status and fun and so on. but if outer alignment is hard we might get the AI equivalent of corporate politics.
one other thing is that compute is a necessary input to research. i’ll mostly roll this into the compute cost of actually running the AIs.
the part where AI research feeds back into how good the AIs are could be very slow in practice
there are logarithmic returns to more pretraining compute and more test time compute. so an improvement that 10xes the effective compute doesn’t actually get you that much. 4.5 isn’t that much better than 4 despite being 10x more compute (which is in turn not that much better than 3.5, I would claim).
you run out of low hanging fruit at some point. each 2x in compute efficiency is harder to find than the previous one.
i would claim that in fact much of the recent feeling that AI progress is fast is due to a lot of low hanging fruit being picked. for example, the shift from pretrained models to RL for reasoning picked a lot of low hanging fruit due to not using test time compute / not eliciting CoTs well, and we shouldn’t expect the same kind of jump consistently.
an emotional angle: exponentials can feel very slow in practice; for example, moore’s law is kind of insane when you think about it (doubling every 18 months is pretty fast), but it still takes decades to play out
My current best guess median is that we’ll see 6 OOMs of effective compute in the first year after full automation of AI R&D if this occurs in ~2029 using a 1e29 training run and compute is scaled up by a factor of 3.5x[1] over the course of this year[2]. This is around 5 years of progress at the current rate[3].
How big of a deal is 6 OOMs? I think it’s a pretty big deal; I have a draft post discussing how much an OOM gets you (on top of full automation of AI R&D) that I should put out somewhat soon.
Further, my distribution over this is radically uncertain with a 25th percentile of 2.5 OOMs (2 years of progress) and a 75th percentile of 12 OOMs.
The short breakdown of the key claims is:
Initial progress will be fast, perhaps ~15x faster algorithmic progress than humans.
Progress will probably speed up before slowing down due to training smarter AIs that can accelerate progress even faster, and this being faster than returns diminish on software.
We’ll be quite far from the limits of software progress (perhaps median 12 OOMs) at the point when we first achieve full automation.
Here is a somewhat summarized and rough version of the argument (stealing heavily from some of Tom Davidson’s forthcoming work):
At the point of full automation, progress will be fast:
Probably you’ll have lots of parallel workers running pretty fast at the point when you have full automation or shortly after this. This isn’t totally obvious due to inference compute, but prices often drop fast.
My guess is you’ll have enough compute that if you use 1⁄6 of your compute running AIs, you’ll be able to run the equivalent of ~1 million AIs which are roughly as good as the best human research scientists+engineers (taking into account cost reductions for using weaker models for many tasks). This is attempting to account for a reduction in the number of models due to using a bunch of inference compute. You’ll be able to run these AIs at the equivalent of 60x speed (3x from hours, 5x from direct speed, 2x from coordination, and 2x from variable time compute and/or context swapping with a cheaper+faster model). So, like 15k parallel copies at 60x speed.
Probably the AI company has like ~3k researchers, but when you adjust for quality, this is only as good as like 600 of the top engineers/researchers.
Let’s say marginal returns to parallelism are roughly 0.55.
Then, the increase in “serial labor equivalents” is roughly (15k / 600)^0.55 * 60 = 350. (Note that most of this is from speed and quality rather than parallel copies!)
Production of algorithmic research is due to both compute and labor. So, let’s say Cobb-Douglas with labor^0.5 compute^0.5 (My guess is that current marginal returns are more like labor^0.6 compute^0.4, but returns will get worse as you add more labor.) So, we do 350^0.5 = 19x which roughly matches my 15x speed up median.
I expect progress to slow as you hit limits or run out of progress doable with cheap experiments, but limits are probably pretty high as discussed in the next bullet.
Limits are high:
Human brain is 1e24 training flop.
We’re using 1e29 flop to get a bit over human level, so like 4 OOMs of headroom from this.
We can probably get a lot more efficiency, maybe 9 OOMs, at least for efficiency up rather than down. (By efficiency up rather than down I mean: we can do as well as using 1e33 flop with 1e24 real flop relative to scaling at the point when you hit human efficiency, but probably can’t train a human-level AI for 1e15 real flop.)
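The parallel-copies and serial-labor arithmetic from the first group of bullets, written out so the inputs are easy to vary. Every number is one of the guesses stated above (the 15k figure is the rounded one used in the text); the variable names are mine, and the Cobb-Douglas step holds compute fixed so that only labor moves.

```python
# speed multiplier: 3x hours, 5x direct speed, 2x coordination, 2x variable-time compute
speed = 3 * 5 * 2 * 2                            # 60
ai_human_equivalents = 1_000_000                 # ~1M top-researcher equivalents
parallel_copies = ai_human_equivalents / speed   # ~16.7k, i.e. "like 15k"

parallel_ai = 15_000        # rounded parallel copies, as in the text
quality_adj_humans = 600    # ~3k staff, quality-adjusted
parallel_exponent = 0.55    # marginal returns to parallelism

# serial labor equivalents: parallel ratio with diminishing returns, times speed
serial_labor_equiv = (parallel_ai / quality_adj_humans) ** parallel_exponent * speed

# Cobb-Douglas labor^0.5 * compute^0.5; with compute fixed, output scales as labor^0.5
labor_exponent = 0.5
research_speedup = serial_labor_equiv ** labor_exponent

print(f"parallel copies ~{parallel_copies:,.0f}")
print(f"serial labor equivalents ~{serial_labor_equiv:.0f}")
print(f"algorithmic research speedup ~{research_speedup:.1f}x")
```

Note how little the parallel term contributes: (15k / 600)^0.55 is only about 6x, while speed alone is 60x, which is the point of the parenthetical about speed and quality.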
At some point, I’ll write a post that makes a better version of this argument and presents a full version of my picture.
I don’t think we’ll see a speed criticality per se; rather, I expect the rate of progress to accelerate up to the point of full automation. But I currently don’t think this makes a huge difference to the bottom line of “progress in the first year after full automation in practice”, as I expect to initially see fast cost decreases and inference time compute can only go so far. I could expand this argument to the extent you have cruxes like “slower takeoff because we’ve already eaten low-hanging fruit with earlier AI acceleration” and “inference compute means you hit full automation much faster”.
That is, after the first company fully automates AI R&D internally, if they decide to go as fast as possible and their AIs/employees/others don’t try to sabotage these efforts. And I’m assuming that AI software progress hasn’t substantially slowed down by the time of full automation, though conditioning on a 1e29 training run means that at least compute scaling progress (which is a key driver of software progress) hasn’t slowed down all that much.
This seems to make the simplifying assumption that the R&D automation is applied to a large fraction of all the compute that was previously driving algorithmic progress right?
If we imagine that a company only owns 10% of the compute being used to drive algorithmic progress pre-automation (and is only responsible for say 30% of its own algorithmic progress, with the rest coming from other labs/academia/open-source), and this company is the only one automating their AI R&D, then the effect on overall progress might be reduced (the 15X multiplier only applies to 30% of the relevant algorithmic progress).
In practice I would guess that either the leading actor has enough of a lead that they are already responsible for most of their algorithmic progress, or other groups are close behind and will thus automate their own AI R&D around the same time anyway. But I could imagine this slowing down the impact of initial AI R&D automation a little bit (and it might make a big difference for questions like “how much would it accelerate a non-frontier lab that stole the model weights and tried to do rsi”).
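A rough way to see the dilution effect, under a simplifying assumption that isn’t in the comment itself: treat algorithmic progress as a sum of independent contributions and apply the 15x multiplier only to the automating company’s own share. The function name is hypothetical.

```python
def effective_multiplier(own_share, automation_speedup):
    """speedup of total relevant algorithmic progress when only the automating
    company's own contribution (own_share of the total) is accelerated.

    assumes contributions from different sources simply add up.
    """
    return own_share * automation_speedup + (1 - own_share) * 1.0


print(round(effective_multiplier(0.3, 15), 2))  # 5.2: the 30% example
print(round(effective_multiplier(0.9, 15), 2))  # 13.6: a leader responsible for most of its own progress
```

With the 30% figure, 0.3 × 15 + 0.7 × 1 = 5.2x rather than 15x, which is why it matters whether the leading actor already produces most of its own algorithmic progress.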
Yes, I think frontier AI companies are responsible for most of the algorithmic progress. I think it’s unclear how much the leading actor benefits from progress done at other slightly behind AI companies and this could make progress substantially slower. (However, it’s possible the leading AI company would be able to acquire the GPUs from these other companies.)
at the first moment that ai can fully replace a human researcher (that is, you can purely just put more compute in and get more research out, and only negligible human labor is required), the ai will probably be more expensive per unit of research than the human
Why do you think this? It seems to me that for most tasks once an AI gets some skill it is much cheaper to run it for that skill than a human.
libraries abstract away the low level implementation details; you tell them what you want to get done and they make sure it happens. frameworks are the other way around. they abstract away the high level details; as long as you implement the low level details you’re responsible for, you can assume the entire system works as intended.
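a small python illustration of the difference, using only the standard library. the modules are just convenient examples picked for this sketch: with a library like `statistics`, your code owns the control flow and calls in for one job; with a framework like `unittest`, you only fill in the methods it expects, and it decides when and how to call them.

```python
import statistics
import unittest

# library: you own the control flow and call into it for one well-defined job
latencies_ms = [12.0, 15.5, 11.2, 14.3]
avg = statistics.mean(latencies_ms)


# framework: you implement the low-level piece it asks for (test methods),
# and the framework owns discovery, running, and reporting
class LatencyTests(unittest.TestCase):
    def test_average_is_reasonable(self):
        self.assertLess(statistics.mean(latencies_ms), 20.0)


suite = unittest.TestLoader().loadTestsFromTestCase(LatencyTests)
result = unittest.TestResult()
suite.run(result)  # the framework calls test_average_is_reasonable for us
print(avg, result.testsRun, result.wasSuccessful())
```

note that nothing in our code ever calls `test_average_is_reasonable` directly; trusting that the framework will is exactly the “assume the entire system works as intended” part.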
a similar divide exists in human organizations and with managing up vs down. with managing up, you abstract away the details of your work and promise to solve some specific problem. with managing down, you abstract away the mission and promise that if a specific problem is solved, it will make progress towards the mission.
(of course, it’s always best when everyone has state on everything. this is one reason why small teams are great. but if you have dozens of people, there is no way for everyone to have all the state, and so you have to do a lot of abstracting.)
when either abstraction leaks, it causes organizational problems—micromanagement, or loss of trust in leadership.
I think people in these parts are not taking sufficiently seriously the idea that we might be in an AI bubble. this doesn’t necessarily mean that AI isn’t going to be a huge deal—just because there was a dot com bubble doesn’t mean the Internet died—but it does very substantially affect the strategic calculus in many ways.
I would be utterly unsurprised to see an AI crash in the next 24 months, leading to another AI Winter. I lived through 1999 and Petfood.com and the Internet bubble pop. And I can pattern match.
But the Internet crash didn’t last long. Google and Amazon survived just fine, Ruby on Rails was big within half a decade, and soon enough we were doing Web 2.0 and AJAX and all that fun stuff.
It’s possible that current generation LLMs might hit a wall soon, for various architectural reasons that are obvious to many people but that I’m superstitiously averse to amplifying. If they do, that increases the chance of an AI Winter until the underlying research gets done.
But I have trouble imagining any series of events that buys us 10 more years. Bubble pops in tech are usually an early correction that wipes out a Precambrian Explosion of dumb money, and that ultimately concentrates resources into a few successful players.
I guess figuring out whether we’re “in a bubble” just hasn’t seemed very important to me, relative to how hard it seems to determine? What effects on the strategic calculus do you think it has?
E.g. my current best guess is that I personally should just do what I can to help build the science of interpretability and learning as fast as possible, so we can get to a point where we can start doing proper alignment research and reason more legibly about why alignment might be very hard and what could go wrong. Whether we’re in a bubble or not mostly matters for that only insofar as it’s one factor influencing how much time we have left to do that research.
But I’m already going about as fast as I can anyway, so having a better estimate of timelines isn’t very action-relevant for me. And “bubble vs. no bubble” doesn’t even seem like a leading-order term in timeline uncertainty anyway.
all effects that route through longer timelines (allocating more to upskilling oneself and others, longer term bets, not expecting agi to look like current models, aggressiveness of distributing funds to alignment, etc)
whether to pursue an aggressive (stock-heavy) or conservative (bond-heavy) investment strategy. if there is an ai bubble pop, it will likely bring the entire economy into a recession.
how much money to save as runway; should you be taking advantage of the bubble to grab as much cash as possible before the music stops, or should you be trying to dispose of all of your money before the singularity makes it worthless?
for lab employees: how much lab equity to sell/hold?
how much to emphasize “agi soon” in public comms, or in conversations with policymakers? (during a bubble pop, having predicted agi soon will probably be even more negatively viewed than merely having been wrong about timelines with no pop)
if there is a bubble and it pops, sentiment around agi will flip from inevitability to impossibility. many people will not be epistemically strong enough to resist the urge to conform. being aware of the hype cycle can help free yourself from it and avoid both over and under exuberance.
whether to pursue an aggressive (stock-heavy) or conservative (bond-heavy) investment strategy. if there is an ai bubble pop, it will likely bring the entire economy into a recession.
This is my biggest disagreement at the moment, and the reason is unlike 2008 or 2020, there’s no supply squeeze or financial consequences severe enough that banks start to fail, and I expect an AI bubble to look more like the 2000 bubble than the 2008 or 2020 bubbles/crises.
That said, AI stocks would fall hard and GPUs would become way, way cheaper.
“should you be trying to dispose of all of your money before the singularity makes it worthless”
This is pretty different than my model of what would happen? Though I admittedly haven’t spent a ton of time thinking through it. I just don’t see why money would lose value though; I expect that some goods would still remain scarce, positional, etc (land in high-demand cities being a strong example), which would seem to cut against that happening?
I’m actually uncertain about whether an AI bubble would trigger a recession (period of broad decline in economic activity). What I have seen reported is that the economy’s generally stagnant now, except for AI, and that American jobs may have been declining for months already, contrary to published statistics, motivating a recent rate cut by the Fed. If true, would an AI bubble really have substantial broader ripple effects outside the AI sector, and would those ripples even necessarily be negative?
In particular, I have been suspecting that the biotech winter that’s been going on for a few years is partly due to routing of investor money into the AI craze. Maybe that money just ends up funding other economic activity instead? I’m really unsure of how to think about this and am quite curious.
my mental model of how a pop triggers a broader crash is something like: a lot of people are taking money and investing it into AI stuff, directly (by investing in openai, nvidia, tsmc, etc) or indirectly (by investing in literally anything else; like, cement companies that make a lot of money by selling cement to build datacenters or whatever). this includes VCs, sovereign wealth funds, banks, etc. if it suddenly turned out that the datacenters and IP were worth a lot less than they thought it was, their equity (or debt) ownership is suddenly worth a lot less than they thought it was, and they may become insolvent. and lots of financial institutions becoming insolvent is pretty bad.
Hm. It seems like the extent to which there is an increased risk of insolvency due to a popped AI bubble would partly depend on the extent to which these institutions had sold other assets or used leverage to pay for equity in or lend to AI companies and the suppliers that are most dependent on AI company business.
My understanding is that the great financial crisis resulted from extremely leveraged investments in mortgages due to lenient rules and a perception that American mortgages were extremely reliably paid. I don’t know to what extent important institutions may be overleveraged or overweighted in their investments in AI.
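To make the leverage point concrete with a toy balance sheet (the numbers are purely illustrative, not estimates for any real institution): equity is assets minus debt, and debt doesn’t shrink when asset prices fall, so at leverage L (assets / equity) a price drop of 1/L wipes out equity entirely. An unleveraged holder just takes a paper loss; a heavily leveraged one becomes insolvent after a much smaller drop.

```python
def equity_after_drop(equity, leverage, drop):
    """equity remaining after the asset side falls by `drop` (0.3 means 30%).

    assets = equity * leverage, funded by equity plus debt; the debt is
    unchanged by the price move, so the whole loss lands on equity first.
    """
    assets = equity * leverage
    debt = assets - equity
    return assets * (1 - drop) - debt


for leverage in (1, 3, 30):
    remaining = equity_after_drop(100.0, leverage, drop=0.3)
    status = "insolvent" if remaining <= 0 else "solvent"
    print(f"leverage {leverage:>2}x: equity {remaining:8.1f} -> {status}")
```

So the same 30% markdown in AI-linked assets is a bad quarter at 1x, a near-wipeout at 3x, and insolvency at 30x, which is why the leverage question matters more than the size of the markdown alone.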
But my modal prediction is that an AI bubble would cause hedged AI investors to become less valuable without becoming insolvent, a bunch of distressed assets to be purchased for low low prices by those who kept their powder dry, and a bunch of cancelled orders and perhaps layoffs and restructuring by suppliers who expanded to meet the temporary surge in demand by AI companies. That could cause turmoil, but I really don’t have a sense of to what extent the American or global economy has reshaped itself to build out AI. It’s hard to know particularly because with Trump’s tariffs, there has been so much coincident market turmoil that it’s hard to know how much is AI and how much is tariffs/end of ZIRP (as others have pointed out before).
I just meant that if an oracle told me ASI was coming in two years, I probably couldn’t spend down energy reserves to get more done within that timeframe compared to being told it’ll take ten years. I might feel a greater sense of urgency than I already am and perhaps end up working longer hours as a result of that, but if so that’d probably be an unendorsed emotional response I couldn’t help more than a considered plan. I kind of doubt I’d actually get more done that way. Some slack for curiosity and play is required for me to do my job well.
The stakes are already so high and time so short that varying either within an order of magnitude up or down really doesn’t change things all that much.
Then what does it mean, in concrete terms? Can you give some probabilities about what you think will happen to the valuations of what companies over what time frame?
creating surprising adversarial attacks using our recent paper on circuit sparsity for interpretability
we train a model with sparse weights and isolate a tiny subset of the model (our “circuit”) that does this bracket counting task where the model has to predict whether to output ] or ]]. It’s simple enough that we can manually understand everything about it, every single weight and activation involved, and even ablate away everything else without destroying task performance.
(this diagram is for a slightly different task because i spent an embarrassingly large number of hours making this figure and decided i never wanted to make another one ever again)
in particular, the model has a residual channel delta that activates twice as strongly when you’re in a nested list. it does this by using the attention to take the mean over a [ channel, so if you have two [s then it activates twice as strongly. and then later on it thresholds this residual channel to only output ]] when your nesting depth channel is at the stronger level.
but wait. the mean over a channel? doesn’t that mean you can make the context longer and “dilute” the value, until it falls below the threshold? then, suddenly, the model will think it’s only one level deep!
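a toy numerical version of the mechanism and the attack, under simplifying assumptions that are mine, not the paper’s: the “attention” is a uniform average of a `[` indicator over every token in the prefix, tokens are a flat list of strings, and the 0.27 threshold is a hypothetical value picked by hand so that depth 1 and depth 2 separate for lists of about four elements. the real circuit’s weights and threshold differ; this only shows why a mean-based depth signal can be diluted by length.

```python
def depth_signal(tokens):
    # uniform attention over the prefix: the mean of a "[" indicator channel,
    # so at equal length two open brackets give twice the signal of one
    return sum(tok == "[" for tok in tokens) / len(tokens)


def predict_close(tokens, threshold=0.27):
    # hypothetical threshold, hand-calibrated for ~4-element lists
    return "]]" if depth_signal(tokens) > threshold else "]"


def make_list(depth, n_elements):
    # hypothetical tokenization: open brackets followed by element tokens
    return ["["] * depth + ["x"] * n_elements


print(predict_close(make_list(1, 4)))   # ]   (signal 1/5 = 0.2)
print(predict_close(make_list(2, 4)))   # ]]  (signal 2/6 ~ 0.33)
# the attack: same nesting depth, longer list, so the signal drops below threshold
print(predict_close(make_list(2, 20)))  # ]   (signal 2/22 ~ 0.09), wrong
```

in this toy, any depth-2 list with six or more elements already flips to `]`, since 2/8 = 0.25 is below the threshold.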
it turns out that indeed, this attack works really well on the entire sparse model (not just the circuit), and you can reliably trick it.
in retrospect, this failure is probably because extremely long nested rows are out of distribution on our specific pretraining dataset. but there’s no way i would have come up with this attack by just thinking about the model.
one other worry is maybe this is just because of some quirk of weight-sparse models. strikingly, it turns out that this attack also transfers to similarly capable dense models!
Aside: For me, this paper is potentially the most exciting interpretability result of the past several years (since SAEs). Scaling it to GPT-3 and beyond seems like a very promising direction. Great job!
How well does it generalize to “similarly capable dense models”? Just curious whether you have a graph for that (I haven’t read any part of the paper besides its first page, so feel free to just tell me go and do that before asking questions like this).
i don’t have a graph for it. the corresponding number is p(correct) = 0.25 at 63 elements for the one dense model i ran this on. (the number is not in the paper yet because this last result came in approximately an hour ago)
the other relevant result in the paper for answering the question of how similar our sparse models are to dense models is figure 33
Noticed that you use task cross entropy loss instead of KL when learning task masks (Appendix 4.5, Loss Function) This is maybe a reasonable design choice, but important to note that this will ablate any “negative” nodes (and indirectly cause you to ignore positive nodes which overcome the negative nodes).
Overall, I suspect that this causes the subnetworks to miss important model computations (but obviously decreases the size of the subnetwork)
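A toy two-logit example of why task cross-entropy and KL disagree about negative nodes. The node contributions are made-up numbers, and this is a stand-in for the task-mask setup, not a reproduction of it: one node pushes the correct-answer logit up by 3, a “negative” node pulls it down by 1. Ablating the negative node lowers task CE, so a CE-trained mask is rewarded for removing it, while KL to the full model’s output distribution penalizes the removal.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def task_ce(logits, correct):
    # cross-entropy against the task label only
    return -math.log(softmax(logits)[correct])


def kl(p_logits, q_logits):
    # KL(p || q): how far the masked model q drifts from the full model p
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


positive, negative = 3.0, -1.0        # hypothetical contributions to the correct logit
full = [positive + negative, 0.0]     # [correct, wrong] with both nodes kept
ablate_negative = [positive, 0.0]     # mask removes the negative node

print(task_ce(full, 0), task_ce(ablate_negative, 0))  # CE drops when the negative node is cut
print(kl(full, full), kl(full, ablate_negative))      # KL is 0 for the full model, > 0 after cutting
```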
Aside: is there a reason there isn’t a top-level link-post for this paper? (if not I’ll create one)
Hmm I guess there’s no guarantee that KL does better, and since we don’t have great metrics for “internal faithfulness”, maybe it’s just better to transparently optimize the flawed metric (task ce + sparsity).
Though as Robin notes on the AMI post, I do think the next step in this style of research is handling negative heads and self repair in a principled way.
maybe I should host an antechamber/arena house party: one chill cozy room with soothing music where no arguing is allowed and people are strongly encouraged to say kind things and reflect on things they’re grateful for and whatnot, and another with harsh fluorescent lights and agitating music and a big whiteboard full of hot takes and the conversations all get transcribed by speech to text and posted on lesswrong in real time. and guests are given a heart rate monitor that beeps if their HR gets too high, forcing them to spend a few minutes in the chill room before returning to the arena
Arguments will be won by the attendees with the best cardio fitness (low resting HR) + mental discipline (less affected by agitating surroundings). This creates a natural incentive to exercise and meditate.
You’re only thinking defense, not offense[1]. The best offensive[1] strategy here is to maximally irritate your opponents with an impenetrable air of smug superiority.
Inducing sexual arousal seems like a better equilibrium, as long as everyone consents. It has positive valence roughly proportional to ΔHR, solves gender ratio problems and incentivizes people to learn effective flirting.
i think of the idealized platonic researcher as the person who has chosen ultimate (intellectual) freedom over all else. someone who really cares about some particular thing that nobody else does—maybe because they see the future before anyone else does, or maybe because they just really like understanding everything about ants or abstract mathematical objects or something. in exchange for the ultimate intellectual freedom, they give up vast amounts of money, status, power, etc.
one thing that makes me sad is that modern academia is, as far as I can tell, not this. when you opt out of the game of the Economy, in exchange for giving up real money, status, and power, what you get from Academia is another game of money, status, and power, with different rules, and much lower stakes, and also everyone is more petty about everything.
at the end of the day, what’s even the point of all this? to me, it feels like sacrificing everything for nothing if you eschew money, status, and power, and then just write a terrible irreplicable p-hacked paper that reduces the net amount of human knowledge by adding noise and advances your career so you can write more terrible useless papers. at that point, why not just leave academia and go to industry and do something equally useless for human knowledge but get paid stacks of cash for it?
ofc there are people in academia who do good work but it often feels like the incentives force most work to be this kind of horrible slop.
I hear this a lot, and as a PhD student I definitely see some adverse incentives, but I basically just ignore them and do what I want. Maybe I’ll eventually get kicked out of the academic system, but it will take years, which is enough time to do obviously excellent work if I have that potential. Obviously excellent work seems to be sufficient to stay in academia. So the problem doesn’t really seem that bad to me: the bottom 60% or so grift and play status games, but probably weren’t going to contribute much anyway, and the top 40% occasionally waste time on status games because of the culture or because they have that type of personality, but often don’t really need to.
the bottom 60% or so grift and play status games, but probably weren’t going to contribute much anyway
I disagree with this reasoning. A well-designed system with correct incentives would co-opt these people’s desire to grift and play status games for the purposes of extracting useful work from them. Indeed, setting up game-theoretic environments in which agents with random or harmful goals all end up pointed towards some desired optimization target is largely the purpose of having “systems” at all. (See: how capitalism, at its best, harnesses people’s self-interest towards creating socially valuable things.)
People who would ignore incentives and do quality work anyway would probably do quality work anyway, so if we only cared about them, we wouldn’t need incentive systems at all. (Figuring out who these people are and distributing resources to them is another purpose of such systems, but a badly-designed system is also bad at this task.)
In my experience with math, to be obviously excellent you need to be more like the top 10% of all grad students, possibly even higher, but this might vary a lot by field.
I suspect that academia would be less like this if there weren’t an oversupply of labor in academia. Like, there’s this crazy situation where there are way more people who want to be professors than there are jobs for professors. So a bunch get filtered out in grad school, and a bunch more get filtered out in early stages of professorhood. So professors can’t relax and research what they are actually curious about until fairly late in the game (e.g. tenure) because they are under so much competition to impress everyone around them with publications and whatnot.
Also, the person who’s willing to mud-wrestle for twenty years to get a solid position so they can turn around and do real research is just much much rarer than the person who enjoys getting dirty.
academia is too broad of a term. most of math, physics, theoretical CS, paleontology, material sciences, engineering, and some branches of economics, biology, (computational) neuroscience, (computational) linguistics, statistics etc are doing well and overall reward intellectual freedom and deep work. in terms of people this is a small minority of total academics, probably <5%.
It is true that many subfields, or even entire domains of science, are diseased disciplines. Most of the research is marginal, irrelevant, reinventing the wheel, trivial, tautological, p-hacked and often even fraudulent. One can point to the usual suspects in the humanities and the social sciences, but disciplines where the majority of research is noise, nonsense or even net-negative plausibly also include machine learning and (I’m told) medicine.
Is that disappointing? Perhaps. But this still describes hundreds of thousands or millions of people all over the world pushing the frontier of knowledge.
I think you are pining for a world that doesn’t really exist. The reason why academia is also ruled by money, status, and power is because it is just a different sector of the economy. The costs and returns will therefore equilibrate with the rest of the economy given the constraints of academia.
If you allow for tenure positions, for example, well there is much reward for having a forever stable high paying and high status job, so you should expect people to pay up to that amount of benefit to get it.
Maybe you pine for the academia of Newton, where scientists never had to worry about appearing immediately productive because they had massive amounts of passive income, but that was only possible because of massive inequality, with some families randomly chosen to be high class. That academia only existed because of the rest of the economy, which was utter trash, and for the vast majority of history pointed potential Newtons toward studying religion instead.
I’m not saying improvements to academia don’t exist, but that you won’t find your solutions by trying to isolate academics from money, status, and power. Or pretending it is independent from the rest of the economy. But by working with these forces, as we do in all other fields we succeed at, to align them with good work.
i think this is a bit overblown, from observing academia you can definitely trade a small amount of status for academic freedom if you’re not 90th-percentile disagreeable. You could go to a slightly lower-ranked but still R1 school, and negotiate for ability to do whatever you want. If the school isn’t trying hard to climb rankings, there’s less pressure to publish or to measure performance based on strange status-y things. You do lose out on some amount of status compared to being at a top school, but if you do good work your peers at top schools will still read/pay attention to it. At top schools, negotiating for freedom is much harder to do because the market is more competitive and ppl play status games to get ahead on the margin.
Have you seen A Master-Slave Model of Human Preferences? To summarize, I think every human is trying to optimize for status, consciously or subconsciously, including those who otherwise fit your description of idealized platonic researcher. For example, I’m someone who has (apparently) “chosen ultimate (intellectual) freedom over all else”, having done all of my research outside of academia or any formal organizations, but on reflection I think I was striving for status (prestige) as much as anyone, it was just that my subconscious picked a different strategy than most (which eventually proved quite successful).
at the end of the day, what’s even the point of all this?
I think it’s probably a result of most humans not being very strategic, or their subconscious strategizers not being very competent. Or zooming out, it’s also a consequence of academia being suboptimal as an institution for leveraging humans’ status and other motivations to produce valuable research. That in turn is a consequence of our blind spot for recognizing status as an important motivation/influence for every human behavior, which itself is because not explicitly recognizing status motivation is usually better for one’s status.
Agreed and also sad about this (and this seems to be true not only in academia but also in industry). I turned down a PhD offer because of this vibe. But reflecting generally, at least for myself, I guess if a person does not yet have enough capital or ability to pursue intellectual freedom, they could take smaller steps, learn and accumulate trust, and then eventually explore more out-of-the-box searches. Just need to stay patient, stubborn, and make sure that “eventually” is not too late.
i’ve noticed a life hyperparameter that affects learning quite substantially. i’d summarize it as “willingness to gloss over things that you’re confused about when learning something”. as an example, suppose you’re modifying some code and it seems to work but also you see a warning from an unrelated part of the code that you didn’t expect. you could either try to understand exactly why it happened, or just sort of ignore it.
reasons to set it low:
each time your world model is confused, that’s an opportunity to get a little bit of signal to improve your world model. if you ignore these signals you increase the length of your feedback loop, and make it take longer to recover from incorrect models of the world.
in some domains, it’s very common for unexpected results to actually be a hint at a much bigger problem. for example, many bugs in ML experiments cause results that are only slightly weird, but if you tug on the thread of understanding why your results are slightly weird, this can cause lots of your experiments to unravel. and doing so earlier rather than later can save a huge amount of time
understanding things at least one level of abstraction down often lets you do things more effectively. otherwise, you have to constantly maintain a bunch of uncertainty about what will happen when you do any particular thing, and have a harder time thinking of creative solutions
reasons to set it high:
it’s easy to waste a lot of time trying to understand relatively minor things, instead of understanding the big picture. often, it’s more important to 80-20 by understanding the big picture, and you can fill in the details when it becomes important to do so (which often is only necessary in rare cases).
in some domains, we have no fucking idea why anything happens, so you have to be able to accept that we don’t know why things happen to be able to make progress
often, if e.g. you don’t quite get a claim that a paper is making, you could resolve your confusion just by reading a bit ahead. if you always try to fully understand everything before moving on, you’ll find it very easy to get stuck before actually making it to the main point the paper is making
there are very different optimal configurations for different kinds of domains. maybe the right approach is to be aware that this is an important hparameter and occasionally try going down some rabbit holes and seeing how much value it provides
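For the specific case of unexpected warnings in Python, one cheap way to set this hyperparameter low while debugging is to escalate warnings to exceptions with the standard-library `warnings` module (or the `python -W error` command-line flag), so nothing gets silently glossed over. `noisy_step` is a hypothetical stand-in for the unrelated code that emits the surprising warning.

```python
import warnings

def noisy_step() -> int:
    # Hypothetical stand-in for an unrelated part of the code that emits a surprising warning.
    warnings.warn("unexpected implicit dtype cast", RuntimeWarning)
    return 42

def run(strict: bool) -> int:
    """Run noisy_step; with strict=True any warning becomes an exception instead of being ignored."""
    with warnings.catch_warnings():
        # "error" turns matching warnings into raised exceptions; "ignore" drops them silently.
        warnings.simplefilter("error" if strict else "ignore")
        return noisy_step()

print(run(strict=False))  # warning swallowed, result returned
try:
    run(strict=True)
except RuntimeWarning as exc:
    print(f"caught: {exc}")  # forces you to look at the confusing thing now
```

Where to flip the switch is the same judgment call as above: strict mode pays off when a small surprise might be the thread that unravels your experiments, and gets in the way when a dependency emits warnings nobody can act on.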
This seems to be related to Goldfish Reading. Or maybe complementary. In Goldfish Reading one reads the same text multiple times, not trying to understand it all at once or remember everything, i.e., intentionally ignoring confusion. But in a structured form to avoid overload.
Yeah, this seems like a good idea for reading—lets you get best of both worlds. Though it works for reading mostly because it doesn’t take that much longer to do so. This doesn’t translate as directly to e.g what to do when debugging code or running experiments.
I think it’s very important to keep track of what you don’t know. It can be useful to not try to get the best model when that’s not the bottleneck. But I think it’s always useful to explicitly store the knowledge of what models are developed to what extent.
The algorithm that I have been using, where what to understand to what extent is not a hyperparameter, is to just solve the actual problems I want to solve, and then always slightly overdo the learning, i.e. I would always learn a bit more than necessary to solve whatever subproblem I am solving right now. E.g. I am just trying to make a simple server, and then I learn about the protocol stack.
This has the advantage that I am always highly motivated to learn something, as the path to the problem on the graph of justifications is always pretty short. It also ensures that all the things that I learn are not completely unrelated to the problem I am solving.
I am pretty sure if you had perfect control over your motivation this is not the best algorithm, but given that you don’t, this is the best algorithm I have found so far.
i find the “revealed preference” people really annoying. anyone who has ever been addicted to anything knows that habit-formingness can be completely disentangled from enjoyability.
The enjoyability people are rather annoying too. Anyone who strived to reach a target even in a grueling way out of abstract considerations knows that hedonistic motivations are merely one standard origin-class of justifications, one that can be ignored and completely disentangled from optimization-channeling towards targeted outcomes.
it’s actually really annoying that habit forming and enjoyability are separate. X (the everything app) is extremely habit forming, and yet, introduces mostly irritation and suffering into my life. it’s also common to experience a burning desire for something, and yet to feel only mild enjoyability, or even absolutely nothing, upon obtaining it.
it also increases my credence in the idea that jhanas can be extremely enjoyable but not addictive at all.
you usually don’t know what options other people are actually choosing from—what are their abilities, resources, knowledge; what other costs or side effects would the choices have on their lives—so it is possible to be arbitrarily wrong.
famously, Marie Antoinette observed that her subjects had a revealed preference for starving over eating cake.
Especially when it’s followed up with some assumption that of course revealed preferences obey vNM axioms, because otherwise people could be Dutch-booked! As if that was the worst consequence that could possibly happen, and as if things at least as bad don’t actually happen.
Yeah VNM is not supposed to be descriptive, it’s supposed to be prescriptive in the sense that if you’re not obeying VNM axioms, then in theory there’s some way you could change to obey them that would be in your interest in the sense that you would be less exploitable.
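A toy money pump makes the "exploitable" claim concrete. This is a minimal Python sketch of the standard money-pump argument for transitivity specifically (one of the vNM axioms), assuming a hypothetical agent with the cyclic preferences A over B, B over C, and C over A, who will pay a small fee for any swap to something it strictly prefers.

```python
# Hypothetical agent with cyclic (intransitive) strict preferences.
# (x, y) in PREFERS means x is strictly preferred to y.
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}

def money_pump(start: str, wealth: int, fee: int, rounds: int) -> tuple:
    """Each round, offer the agent the item it prefers over its current holding, for a fee.

    Because the preferences form a cycle, the agent accepts every offer and
    keeps circling back to where it started while its wealth drains away.
    """
    holding = start
    for _ in range(rounds):
        if wealth < fee:
            break  # the pump stops only when the agent can no longer pay
        better = next(x for (x, y) in PREFERS if y == holding)
        holding, wealth = better, wealth - fee
    return holding, wealth

print(money_pump("A", wealth=10, fee=1, rounds=3))  # ('A', 7): back where it started, 3 poorer
```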
on the one hand, it is a desirable feature of an intellectual community to be truth seeking, and while it can be deeply emotionally painful to part ways with deeply held beliefs, in the long run it’s better to tear off the bandage. on the other hand, being emotionally hurt all the time by your community kind of fucking sucks, and isn’t very good for long term emotional or epistemic health.
perhaps a middle ground is in order: intellectual communities should be partitioned into an arena, where every idea is to be exposed to the harsh light of truth, and an antechamber, where you can rest and be surrounded by positivity and develop ideas in a supportive environment.
both are necessary—we need a way to kill bad ideas, because an environment that refuses to discard bad ideas because they are emotionally load bearing is doomed to epistemic ruin. but also the best weird ideas often sound bad initially, and require a safe environment to develop; and we are all human, and our emotional well being and desire to belong to a community is essential. by visibly separating the two, we might be able to get the best of both worlds.
this is not a crazy idea—many other parts of society have analogous things. for example, people who play sports for fun with their friends compete to win while on the field, but this only brings them closer off the field.
I think one common criticism of LW is that it is too much of an arena, and not enough of an antechamber. perhaps this can be fixed somehow.
Orthogonally, cultural standards of emotional tone during debates are also important for how much emotional struggle is involved in changing one’s ideas.
If the tone implies that you were foolish for holding your idea, it’s going to be a lot more painful to let it go.
LessWrong has a pretty good standard of not just civil but polite and supportive discourse. This seems actually pretty crucial for it being an environment in which people do regularly change their minds.
I don’t like the term arena in your suggested division because it implies combat. Combat is emotionally intense, I’d rather have a metaphor that’s more collaborative.
This doesn’t eliminate the worth of having separate spaces for support and rigorous testing of ideas, but I think it’s important to keep in mind whenever we’re discussing group epistemics.
I claim there’s a pareto frontier of epistemic correctness vs emotional kindness. some things, like sneering at people and implying that they are foolish, are pareto suboptimal. but once you achieve pareto optimality, there is a tradeoff between kindness and correctness; and what I think should exist is two distinct spaces on different parts of this tradeoff curve (and of course nobody should do pareto suboptimal things)
once you achieve pareto optimality, there is a tradeoff between kindness and correctness
It’s hard to stay on a pareto frontier, optimizing for more (or less) “kindness” directly is a goodharting hazard. If you ask for something, you might just get poisoned with more of the fake version of it.
I’d prefer less of the sort of “kindness” that trades off with correctness, rather than more of it (even when getting less of it wouldn’t actually help with correctness; it just doesn’t seem like a good thing). But if I ask for that, I’ll end up getting some (subtle) sneering and trolling, or unproductive high-standards elitism that on general principle wants to destroy ideas that didn’t get a chance to grow up yet. Similarly, if you ask for the sort of “kindness” that does trade off with correctness, you’ll end up getting some sycophancy (essentially) that cultivates your errors, making them stronger and more entrenched in your identity, ever more painful and less feasible to eventually defeat (even if there are benign forms of this sort of “kindness” that merely don’t make the problem worse in a comfortable way, as opposed to trying to intervene on it).
I don’t think unsycophantic kindness is quite that difficult to achieve. clearly some groups of people IRL achieve such kindness. generally, people in such communities try to understand each other and why they believe the things they do without judgement in either direction, and affirm the emotional responses to beliefs rather than the beliefs themselves. you don’t have to agree with someone to agree that you’d feel the same in their shoes. somehow, these groups of people don’t inevitably slide into subtle sneering and trolling and sycophancy.
plus, the point of explicitly separating the arena and the antechamber is to make it clear that when you are receiving kindness, you are not receiving updates towards truth. so it is clear to you, and to people around you, that receiving emotional validation in the antechamber is not evidence that your beliefs are correct. it’s valid for people to spend all their time in the antechamber, but everyone will see this, and assign less weight to the truthfulness of their beliefs.
I also don’t think non-sycophantic kindness causes people to dig in to their incorrect beliefs. if anything, it seems more common that people dig into incorrect beliefs because of a sense of adversity against others. think about how much more painful it is to concede a point if your interlocutor is being really mean about it, vs if they are thoughtful and hear you out.
if anything, it seems more common that people dig into incorrect beliefs because of a sense of adversity against others
Consider cults (including milder things like weird “alternative” health advice groups etc.). Positivity and mutual support seem like a key element of their architecture, and adversity often primarily comes from peers rather than an outgroup. I’m not talking about isolated beliefs, content and motivations for those tend to be far more legible. A lot of belief memeplexes have either too few followers or aren’t distinct enough from all the other nonsense to be explicitly labeled as cults or ideologies, or to be organized, but you generally can’t argue their members out of alignment with the group (on relevant beliefs, considered altogether).
the point … is to make it clear that when you are receiving kindness, you are not receiving updates towards truth
This is also a standard piece of anti-epistemic machinery of groups that reinforce some nonsense memeplex among themselves with support and positivity. Support and positivity are great, but directing them to systematically taboo correctness-fixing activity is what I’m gesturing at, the sort of “kindness” that by its intent and nature tends to trade off against correctness.
I don’t hear that one as often—what’s a good example? in particular, I hear people complain all the time that LW is too critical of ideas, and that when you post anything a whole bunch of people will appear out of the woodwork to critique you. I don’t feel like I’ve ever heard anyone say that people in LW are too uncritical and unwilling to challenge things they disagree with
Said Achmiz was one person who claimed this, and he tended to get a bunch of agreement votes when he said so, so presumably some people agreed with him.
I don’t quite think so? My impression is the criticism is that LW is too much of an echo chamber, in that people just express agreement with each other too much, but probably that is mostly not because of people being nice but because folks just outright have very similar beliefs.
it can be deeply emotionally painful to part ways with deeply held beliefs
This is not necessarily the case, not for everyone. Theories and their credences don’t need to be cherished to be developed, or acted upon, they only need to be taken seriously. Plausibly this can be mitigated by keeping identity small, accepting only more legible things in the role of “beliefs” that can have this sort of psychological effect (so that they can be defeated through argument alone). Legible ideas cover a surprising amount of territory, there is no pragmatic need to treat anything else as “beliefs” in this sense, all the other things can remain ambient epistemic content detached from who you are. When more nebulous worldviews become part of one’s identity, they become nearly impossible to dislodge (and possibly painful, with enough context and effort). They are still worth developing towards eventual legibility, and not practical to argue with (or properly explain).
Thus arguing legible beliefs should by their nature be less intrusive than arguing nebulous worldviews. And perhaps nebulous worldviews should be argued against being held as “beliefs” in the emotional sense in general, regardless of their apparent correctness, as a matter of epistemic hygiene. Ensuring by habit you are not going to be in the position where you have “beliefs” that would be painful to part ways with, and also can’t be pinned down clearly enough to dispel.
there are very few people in the world who don’t deeply emotionally hold quite a few important beliefs. having a small identity is difficult in practice, because having an identity is an important part of how nearly everyone navigates this complex and confusing world. I’m skeptical of anyone who claims to have completely eliminated all emotional attachment to all of their important decision-relevant beliefs.
but even assuming that you have somehow achieved perfect small identityness and emotional independence of all of your important beliefs and it all works out great for you, you must surely acknowledge that there are many people out there who have not. and probably they are more likely to achieve rationalist enlightenment if they are surrounded by people who are supportive but nudge gently towards truth seeking, rather than immediately coming in with a wrecking ball and demolishing emotionally load bearing pillars.
having an identity is an important part of how nearly everyone navigates this complex and confusing world
Legible ideas (that are practical to meaningfully argue about) cover a lot of ground, they are not as hazardous as part of identity. And less well-defined but useful/promising/interesting understandings don’t need to become part of identity to be taken seriously and developed. That’s the failure mode at the other extreme, when anything insufficiently scientific/empirical/legible/etc. gets thrown out with the bathwater.
rather than immediately coming in with a wrecking ball and demolishing emotionally load bearing pillars
Probably when something is easy to defeat (admits argument, legible), it’s not that painful to let it go. The pain is the nebulous attachment fighting for influence, that it won’t be fully defeated even when you end up consciously endorsing a change of mind. Thus ideologies are somewhat infeasible to change, they’ll keep their hold even long after the host disavows them. A habit of keeping such things at a distance benefits from other people not feeding their structurally hazardous placement (as emotionally load bearing pillars) with positivity. But that’s distinct from viewing positively the development of even such hazardous things, handling them with appropriate caution.
I think Leo is using a more expansive definition for identity than you have in mind here (if it seemed important I’d suggest he use a different word to clarify, but actually it doesn’t seem important because….).
I also think he’s making descriptive claims about many people’s apparent relationship to changing their beliefs, and you’re challenging him on normative grounds invoking the mechanics (a taxonomy, even) of belief, which I take to be addressing his point at the wrong level of abstraction in at least two ways.
‘Given that x appears to be hard for people some of the time, we should take some cheap steps to make it easier.’ Seems pretty reasonable!
Maybe you’re saying ‘but x seeming hard is a sign of a deeper problem, making x easier in a shallow way gives quarter to that deeper problem, and if only they had my model of, and relationship to, belief, we could not only make x easy for them, but solve much else besides.’
I’m mostly a fan of that ‘go for the root’ approach, but I think this case is much much harder at scale than you’re giving it credit for; your story about the source of pain in having one’s beliefs challenged smells like a typical mind fallacy. It may be a great description of what’s going on with you, but it doesn’t feel like the kind of description that captures most or all people in most or all relevant cases.
I’m skeptical of anyone who claims to have completely eliminated all emotional attachment to all of their important decision-relevant beliefs.
How about someone who never had all this overwrought (as it seems to me) emoting in the first place? I sense typical-minding here.
I also believe that done right, there is no tradeoff between kindness and correction, and that in a serious conversation, to think in terms of kindness (or its opposite) is already to go wrong.
while there may exist people who do not have this emoting in the first place, there are far more people who are too emotionally unaware to even realize the emotional drives behind their beliefs and actions. this is often very obvious to people around them. so I won’t take anyone’s word on this matter, and instead only trust a track record of behavior.
I think this would benefit from having examples (maybe just pointing at the top level post/belief that was unpleasantly attacked without calling out specific responses)
to be clear, a very important part of the culture of the antechamber is encouraging people to spend time in the arena, or if people are not ready to do so, to encourage people to grow emotionally so that they can handle being in the arena.
there are a lot of video games (and to a lesser extent movies, books, etc) that give the player an escapist fantasy of being hypercompetent. It’s certainly an alluring promise: with only a few dozen hours of practice, you too could become a world class fighter or hacker or musician! But because becoming hypercompetent at anything is a lot of work, the game has to put its finger on the scale to deliver on this promise. Maybe flatter the user a bit, or let the player do cool things without the skill you’d actually need in real life.
It’s easy to dismiss this kind of media as inaccurate escapism that distorts people’s views of how complex these endeavors of skill really are. But it’s actually a shockingly accurate simulation of what it feels like to actually be really good at something. As they say, being competent doesn’t feel like being competent, it feels like the thing just being really easy.
“power fantasies” are actually a pretty mundane phenomenon given how human genetic diversity shook out; most people intuitively gravitate towards anyone who looks and acts like a tribal chief, or towards the possibility that you yourself or someone you meet could become (or already be) a tribal chief, via constructing some abstract route that requires forging a novel path instead of following other people’s.
Also a mundane outcome of human genetic diversity is how division of labor shakes out; people noticing they were born with savant-level skills and that they can sink thousands of hours into skills like musical instruments, programming, data science, sleight of hand party tricks, social/organizational modelling, painting, or psychological manipulation. I expect the pool to be much larger for power-seeking-adjacent skills than art, and that some proportion of that larger pool of people managed to get their skills’ mental muscle memory sufficiently intensely honed that everyone should feel uncomfortable sharing a planet with them.
The alternative is to pit people against each other in some competitive games, 1 on 1 or in teams. I don’t think the feeling you get from such games is consistent with “being competent doesn’t feel like being competent, it feels like the thing just being really easy”, probably mainly because there is skill level matching, there are always opponents who pose you a real challenge.
Hmm maybe such games need some more long tail probabilistic matching, to sometimes feel the difference. Or maybe variable team sizes, with many incompetent people versus few competent, to get a more “doomguy” feeling.
Some games do put their finger on the scale, for example you have a first-person shooter where you learn to aim better but you also now have a gun that deals 200 damage per hit, as opposed to your starting gun that dealt 10.
But puzzle-solving games are usually fair, I think.
reliability is surprisingly important. if I have a software tool that is 90% reliable, it’s actually not that useful for automation, because I will spend way too much time manually fixing problems. this is especially a problem if I’m chaining multiple tools together in a script. I’ve been bitten really hard by this because 90% feels pretty good if you run it a handful of times by hand, but then once you add it to your automated sweep or whatever it breaks and then you have to go in and manually fix things. and getting to 99% or 99.9% is really hard because things break in all sorts of weird ways.
I think this has lessons for AI: lack of reliability is one big reason I fail to get very much value out of AI tools. if my chatbot catastrophically hallucinates once every 10 queries, then I basically have to look up everything anyway to check. I think this is a major reason why cool demos often don’t translate into things that are practically useful: 90% reliability is great for a demo (and also you can pick tasks that your AI is more reliable at, rather than tasks which are actually useful in practice). this is an informing factor for why my timelines are longer than some other people’s
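The chaining problem is just compounding. A minimal sketch, assuming each tool in the script fails independently with the same probability (real failures are often correlated, so treat this as a rough model, and the sweep size of 100 runs is an arbitrary illustration):

```python
def pipeline_success(per_step: float, steps: int) -> float:
    """Probability that every step in a chain of independent steps succeeds."""
    return per_step ** steps

def expected_manual_fixes(per_step: float, steps: int, runs: int) -> float:
    """Expected number of runs in a sweep that hit at least one failure."""
    return runs * (1 - pipeline_success(per_step, steps))

for p in (0.9, 0.99, 0.999):
    # five chained tools, a sweep of 100 runs
    print(p, round(pipeline_success(p, 5), 4), round(expected_manual_fixes(p, 5, 100), 1))
```

With five chained tools at 90% each, only about 59% of runs (0.9^5 = 0.59049) succeed end to end, so a 100-run sweep needs roughly 41 manual fixes; at 99.9% per tool it needs about half of one.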
One nuance here is that a software tool that succeeds at its goal 90% of the time, and fails in an automatically detectable fashion the other 10% of the time is pretty useful for partial automation. Concretely, if you have a web scraper which performs a series of scripted clicks in hardcoded locations after hardcoded delays, and then extracts a value from the page from immediately after some known hardcoded text, that will frequently give you a ≥ 90% success rate of getting the piece of information you want while being much faster to code up than some real logic (especially if the site does anti-scraper stuff like randomizing css classes and DOM structure) and saving a bunch of work over doing it manually (because now you only have to manually extract info from the pages that your scraper failed to scrape).
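The partial-automation pattern can be sketched like this, assuming the scraper signals failure detectably (here, by returning `None`) rather than silently returning a wrong value. `extract_after` is a hypothetical stand-in for the "extract a value immediately after some known hardcoded text" step, and the page texts are made up for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

def extract_after(text: str, marker: str = "Price: ") -> Optional[str]:
    """Hypothetical brittle extractor: return the token right after a hardcoded marker."""
    i = text.find(marker)
    if i == -1:
        return None  # detectable failure: the marker isn't on the page
    rest = text[i + len(marker):].split()
    return rest[0] if rest else None

def triage(
    pages: Dict[str, str], scrape: Callable[[str], Optional[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Run the unreliable scraper on every page; queue only the failures for manual work."""
    results: Dict[str, str] = {}
    manual: List[str] = []
    for page_id, text in pages.items():
        value = scrape(text)
        if value is None:
            manual.append(page_id)
        else:
            results[page_id] = value
    return results, manual

pages = {"p1": "Price: 12 USD", "p2": "layout changed, marker gone", "p3": "Price: 7 USD"}
results, manual = triage(pages, extract_after)
print(results)  # {'p1': '12', 'p3': '7'}
print(manual)   # ['p2']
```

The whole scheme depends on the failure being detectable: if the extractor returned a plausible wrong value instead of `None`, nothing would land in the manual queue and every result would need checking.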
I think even if failures are automatically detectable, it’s quite annoying. the cost is very logarithmic: there’s a very large cliff in effort when going from zero manual intervention required to any manual intervention required whatsoever; and as the amount of manual intervention continues to increase, you can invest in infrastructure to make it less painful, and then to delegate the work out to other people.
even if scaling does eventually solve the reliability problem, it means that very plausibly people are overestimating how far along capabilities are, and how fast the rate of progress is, because the most impressive thing that can be done with 90% reliability plausibly advances faster than the most impressive thing that can be done with 99.9% reliability
Perhaps it shouldn’t be too surprising. Reliability, machine precision, and economy are likely the deciding factors in whether many (most?) technologies take off. The classic Roots of Progress (RoP) case study: the bike.
Motorola engineers figured this out a few decades ago: even going from 99.99% to 99.999% makes a huge difference at a large scale. They even published a few interesting papers and monographs on it, from what I recall.
This becomes clear when you think about what these accuracy levels mean:
99.99% accuracy is one error every 10K trials.
99.999% accuracy is one error every 100K trials.
So the 99.999% system makes 10x fewer errors!
When errors are costly and you’re operating at scale, this is a huge difference.
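The same arithmetic, done exactly with Python's `fractions` module so floating-point rounding doesn't muddy the comparison. The accuracy values are just the two from the text; the function names are made up for illustration, and trials are assumed independent.

```python
from fractions import Fraction


def trials_per_error(accuracy: str) -> Fraction:
    """Mean number of trials between errors; accuracy given as a decimal string."""
    return 1 / (1 - Fraction(accuracy))


def errors_in(accuracy: str, trials: int) -> Fraction:
    """Expected number of errors over `trials` independent trials."""
    return (1 - Fraction(accuracy)) * trials


for acc in ("0.9999", "0.99999"):
    print(acc, "-> one error every", trials_per_error(acc), "trials;",
          errors_in(acc, 1_000_000), "expected errors per million trials")
```

At a million trials that is 100 errors versus 10, which is where the extra nine starts to matter.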
don’t worry too much about doing things right the first time. if the results are very promising, the cost of having to redo it won’t hurt nearly as much as you think it will. but if you put it off because you don’t know exactly how to do it right, then you might never get around to it.
i predict that on Jan 1 2029, neither openai nor anthropic will be near-fully automated, by which i mean <=5 people are even plausibly making important decisions (like, everyone else could go on vacation and it would not slow the company down at all). Celestia predicts otherwise
if WWIII happens, resolves NA. if a localized Taiwan war happens but doesn’t escalate to WWIII, the bet is still on. if there’s a big recession, the bet is still on.
this is harder to say. if i had to guess completely based on vibes, i want to say Jan 1 2033. i want to think more carefully about this and give a more thought out answer at some point though.
the biggest problems are that you can make way more money by making trades than by making predictions, that it’s unclear what role money will play in a post-AGI world, etc.
maybe the following solves a significant fraction of that problem: you could buy an asset together and have the event being predicted determine the owner. like, to make a bet at 30%, instead of one party putting 30 cents and the other party putting 70 cents in a jar and having the 1 dollar go to the party that predicted correctly once the question is resolved, you could do this with 1 dollar’s worth of an S&P 500 index fund or any other asset.[1][2]
not sure why kalshi hasn’t implemented this already btw — seems like a central issue with current prediction markets. maybe there’s a regulatory obstacle. or maybe they are already putting the money traders put in their jars in low-risk assets, just not passing the interest on to traders (except in the form of it enabling lower fees or whatever).
ok for vanilla bets not on prediction markets, one doesn’t actually need to store the money in a jar. i think this fixes one way this jar business is bad but not some other ways
yea probably. still, if you have parties at 40% and 60% and they do this asset bet at 50:50, then each guy’s subjective expected money is 20% higher than if they just buy the asset without the bet. seems nontrivial. this will be less impressive in log money as you start putting a larger fraction of your money in a single bet but idk you try to spread across many bets that are not too correlated and then i think it looks good again
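A sketch of the arithmetic in the 40%/60% example, assuming both parties stake halves of one unit of the same asset and the winner takes the whole unit. Because both outcomes pay out in the asset, the asset's own return multiplies both sides equally and drops out, which is the point of betting with an index fund instead of cash. The function name is hypothetical.

```python
def expected_multiple(p_win: float, stake: float) -> float:
    """Subjective expected holdings of the asset after the bet, divided by the stake.

    stake: the fraction of one unit of the asset this party puts in.
    p_win: this party's own probability that they win the whole unit.
    A value above 1 means the party expects to end up with more of the asset
    than if they had simply held their stake without betting.
    """
    return p_win / stake


# Event E. Party A thinks P(E) = 0.6 and takes the "E happens" side;
# party B thinks P(E) = 0.4 and takes the "E doesn't happen" side.
# At 50:50 odds, each contributes half a unit of the same asset.
a = expected_multiple(p_win=0.6, stake=0.5)
b = expected_multiple(p_win=1 - 0.4, stake=0.5)
print(a, b)  # both about 1.2, i.e. 20% more asset than not betting, by each party's own beliefs
```

This only computes linear expected value in units of the asset; as the comment says, the picture is less rosy in log terms once a single bet is a large fraction of your wealth.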
Interesting. Is this part of a larger prediction where you expect that transformative AI hasn’t been achieved and hasn’t massively impacted the world by that time? Because I can imagine plausible scenarios where ASI has arrived but there are still Anthropic/OAI employees doing stuff, from the perspective of outside observers. Seems like a hard bet to verify.
i mean, “massively impact the world” is too fuzzy to draw a line at. whether employees are actually doing things will likely only be assessable internally. the reason the bet is worded the way it is is that it’s likely labs won’t literally fire everyone except the 5 remaining people, and will instead give them busywork.
it feels so narratively incongruous that san francisco would become the center for the most ambitious people, and the likely future birthplace of agi.
san francisco feels like a city that wants to pretend to be a small quaint hippie town forever. it’s a small splattering of civilization dropped amid a vast expanse of beautiful nature. frozen in amber, it’s unclear if time even passes here—the lack of seasons makes it feel like a summer that never quite ended. after 9pm, everything closes and everyone goes to bed. and the dysfunction of the city government is never too far away, constantly reminding you of humanity’s follies next to the perfection of nature.
on the other hand, nyc feels like the city. everything is happening right here, right now. all the money in the world flows through this one place. it’s gritty and yet majestic at the same time. the most ambitious people in the world came here to build their fortunes, and live on in the names on the skyscrapers everywhere that house the employees who continue to keep their companies running. they are part of surroundings that are entirely constructed by man: even the bits of nature are curated and parcelled out in manageable units. it feels like the kind of place that can only be built by endless unchecked ambition.
idk. it just feels kinda weird to me that all the most ambitious people who want to build god are going to the little hippie town, and not the big city. and most of them seem to just treat SF as a place that is close to work and has good weather and don’t really care about or try to match the vibe, to the great consternation of the locals. also, for a city that attracts people who love to build things, SF doesn’t seem to do an awful lot of building...
(tbc, this is not the only vibe of nyc. nyc is such a big place that it’s impossible to summarize as one vibe. but this is a huge central part of the vibe in a way that just doesn’t seem to match sf.)
Cali is the place to be for technology because Cali was the defense contractor hub, with the U.S. Navy choosing the bay area as its center for R&D during WWII and the Cold War. The hippie reputation came a lot later, after its status as the primary place to work in IT was thoroughly cemented, with both established infrastructure and the network effect keeping it that way.
But NYC is for ambitious conformists like bankers and lawyers. From that perspective there is no point in making a fortune in SF if your social circle does not appreciate your expensive Armani suit or your excellent taste in business cards. At least that’s my vibes, I haven’t been in over 15 years.
Narratively I’d pick Shenzhen. Scrappy little fishing village that transformed itself into a high tech hub seems suitable for building AGI.
sf is the relatively much younger city, and radically reinvented itself as recently as 60 years ago. by that metaphor, i am not surprised that its ideas smell more fresh.
No love for New England? We’ve got good education and a solid base of tech companies.
An alternative narrative is that AGI is the product of a long slow slog of research into computational neuroscience. In this scenario the ambitious ppl in NYC and Cali refuse to research the right stuff because the rewards are too far off. And here is where I see New England as having good vibes, for working on basic research that has no immediate applications.
I’m very confused why purchasing power varies so dramatically internationally. like why are there countries where everyone has very low wages but everything is also really cheap so it balances out? prima facie, huge disparities like this should get evened out by arbitrage.
the simple explanation is that some labor can only be performed locally, labor mobility is limited (immigration laws, people don’t like moving, etc), and transportation costs for goods exist (shipping and tariffs).
however, global shipping is ridiculously cheap. and the economy increasingly consists of white collar jobs which could in theory be done remotely. for example, it seems mind-boggling to me that a top tier SWE/RS in the bay area is worth 10-100x more than one in India or Vietnam. like sure, someone being in the same timezone is great, and Zoom sucks, and so on. but for that price delta surely you could pay people to live nocturnally, construct apartments with bright lights synced to Pacific Time, invest in much better video call technology like that Google Beam thingy, etc?
maybe one possibility is that labor mobility is not actually that low for the very toppest tier people, and so if someone is actually worth that much then getting them good immigration lawyers is a trivial cost relative to their value/cost. so the market for the best people is very efficient and they all brain drain to the bay area the first chance they have. and so the lower salary of outsourcing is partly illusory at the highest levels, because the distribution of talent is very different across different localities, due to agglomeration effects.
In what sense does it actually balance out? e.g. in India, unskilled labor is a lot cheaper, so lots of upper middle class people have servants. But the price of an iPhone in India is pretty similar to the price of an iPhone in the US.
So my impression is that the typical basket of goods and services that people consume in different places around the world at roughly equivalent / analogous relative economic classes actually does vary quite a bit. Anything with a labor component will naturally scale up and down for balance, but staples and stuff made in factories don’t vary that much. In the US for example, labor of all types is very expensive, so people don’t have servants, but most people can afford a pretty much endless supply of trinkets and gadgets.
The cost of food above subsistence level is mostly labor / discretionary. I could live on beans and rice and spices bought in bulk for a tiny fraction of my income pretty much anywhere in the world, but I’d often rather pay to have someone prepare something nice and then bring it to my door.
So maybe a different / more general answer to the question you originally posed is that stuff that’s truly pure “stuff” is actually a relatively small fraction of what people consume—most consumption by dollar value is actually (indirectly) consumption of labor, which often has to be local.
Making raw potatoes available to buy also requires some local labour and renting a local warehouse / supermarket. I think labour and rent are upstream of ~all of the differences in local purchasing power. To me, the main question is why so much software is being built in SF as opposed to cheaper areas, and I find this confusing but I think the bay is something like a schelling point for tech talent and tech firms, and remote work unfortunately just doesn’t work well enough. And something similar is true for other expensive high-income places.
I worked on an international team during my time at F5 and we had offices in Ireland, Poland, two timezones in the US, Australia and India. The assumption that we could teleconference our way out of geography was a laughable failure for one reason that your hypothetical “nocturnal white collar sweatshops” fails to address: Humans work to live, we don’t live to work. Well, most of us that is, and the unbalanced folks (the 10x engineers as they are now called) who would work across timezones burned out dramatically (I was one of them). Why are silicon valley jobs so lucrative but also cost of living so high? Because the people there have children in schools, they socialize with people outside their work and they generally live a life not just work. So how does this play out in a workplace?
Engineering planning has to happen at some hour, and whatever hour it is will be inconvenient for the outliers (Poland is meeting at 7pm thinking about how they missed dinner with their kids, the engineers from Delhi are up at 11:30pm likely sneaking a nap in before the meeting, and the team in Seattle is just finishing their morning coffee). This creates a situation where both ends of the distribution are overtired, distracted or disengaged, while the “middle” of the distribution is decided by the VP or C-suite living in the US. So the team sees it won’t work, and they try something else:
Next, teams are isolated geographically by task. India inevitably becomes either the devops or the testing team because of intrinsic bias among executives, and they are always playing catch-up. Devops requests have a 1-day latency at best, and responses to bugs have a latency on the other side. The India team is simultaneously “too slow” when doing ops, and “put on hold” when reporting bugs. Meanwhile the folks in Poland are left clocking out right as their US co-workers are clocking in, so any cross-team design/testing/ops is likewise put off one day at best, and weekends become amplified to 3 days, effectively creating a 4-day work week.
The unfortunate reality is that humans, when healthy, are anchored to geography. From recreation to raising children, to caring for aging parents everyone will be bound by their timezone regardless of how it would optimize the global machine of infinite growth with finite resources.
Even when it does technically work, people underestimate the social dynamics. I knew someone who worked in Singapore for a Canadian company. Her dinner would be going cold on the table and her children wondering why mummy wasn’t joining for bedtime stories while her teleconferenced meeting with Canada overran by an hour as they all complained about having to get in at 8am and how the weather was bad in Toronto. Complaints from her and others in Singapore soon made it clear that these meetings had to be done as fast as possible, and that mentioning anything off topic, e.g. the weather, random pleasantries, or traffic, was liable to result in interruption/complaints from someone in Singapore.
So, you can have the meetings, but friendly chit-chat, going off topic or similar is completely shut down. That worked much better, shortening the meetings and letting the Singapore lot at least feel their time was being valued, but doing meetings that way is not natural to people.
If they had tried some weird ‘work in a Canadian time zone’ thing then the person I knew, and probably the whole team, would have quit. Maybe unmarried 20-somethings could do that for a little while, but as they gained partners and kids it would stop working for them.
to be clear, i’ve worked remotely, and i know exactly how the social dynamics can suck. maybe this would be a reasonable argument for why you wouldn’t do it for a 30% pay raise. but the disparity is so enormous (anywhere between a 2x and 100x, depending on where you are in the world and how good you are) that there must surely be a lot of people who would take the money and deal with it.
Yes, it can work. Reflecting more, I think the issue is maybe that you need to be clear from the beginning whether you are telling your overseas workers ‘your timetable will suck but we are paying super over the odds to cover that’ or are doing something more ‘normal’ (+100% vs +30%). The first was never the bargain in my example above, hence some of the frustration.
Both have advantages and disadvantages. In the former, the employees will just accept the timetable. But you will mostly get younger, more junior (single) people, and they won’t be the most capable or best qualified people, who will go for something normal. I suspect high turnover: a couple of years of highly paid nocturnal behavior, then taking that experience to try and get a more normal job where you can actually have a family, makes sense.
Instead of a ‘half move’ you have the alternative of a ‘full move’, where you move the whole operation (taking the people you need with you). On a much smaller scale some software companies did this a decade or two ago by moving from London to Bristol, where property prices were lower (a very short move; they are only 1.5 hours apart by train).
you could also move to some country like Paraguay with relatively lax immigration laws and a US timezone, and work remotely from there. this is probably a better option than nocturnal living for a lot of people.
I’ve worked completely nocturnally before. it wasn’t the best experience in the world, and probably wouldn’t have been sustainable in the long run, but there are a lot of jobs out there that are way more demanding (submarine, space, oil rig).
I don’t understand it either. I work in Germany with near-shoring colleagues in Slovakia, Serbia, Georgia etc. They are roughly 60% cheaper than German SWEs, generally just as competent, no time-zone problems whatsoever … basically all the work even with German team members is fully remote so even that is not a difference. Only the need for English creates some minor friction. No idea how this state of affairs makes economic sense now or ever did.
As a SWE from Slovakia, not sure how representative my experience is, but...
One big problem cooperating with companies in Germany is that when you apply for the job, they tell you “knowledge of German is not needed, all internal communication in our company is in English”, and that in my experience is never true. (This sucks for me, because my brain somehow rejects the German language.)
Hiring people from other countries is somehow complicated from the legal and/or tax perspective. I don’t understand all the relevant details, and in theory the EU exists to prevent this, but in practice… my tax form gets more complicated when I have a foreign income, and I don’t know how complicated it gets for the company, but I suspect that it might be a lot. Maybe enough to compensate for the 60% cheaper people?
Some people have strong patriotic/union/whatever feelings that make them sabotage people from other countries to protect their own jobs. Especially when those other people are cheaper, and not bad at their jobs; that is when they are a greater threat. I had an experience in a Swiss company, when the Swiss employees collectively sabotaged the Slovaks in various ways; for example they always made a “mistake” of sending us wrong documentation and/or forgetting to give us access to the product database; so the work that was estimated to take three days, and would actually take me one day, took five, because I had to spend the first four days repeatedly asking for the correct documentation, and repeatedly telling them that they gave me database access to a wrong table, not the one I requested. Many people on both sides were aware that this was happening, but the Swiss company simply couldn’t fire half of their employees, so they had to accept that the Slovaks cost half as much per day, but everything takes them twice as many days.
I guess, if you want to exploit the cheap Eastern European brain power for a company in Germany, you should probably not hire local people at all? And probably make sure that every team member is from a different country, to avoid the unpleasant surprise of finding out later that all informal documentation for your project is in Slovak.
Yeah, I guess there is just more friction than one would expect. I also found out that Slovaks are now more like 66% of the price of Germans, so either the margin is great or the income gap is smaller than I thought.
Also: Ease of communication is hard to overrate. I always enjoy working with people who have a very similar background compared to me (i.e. similar milieu growing up, not just same country), communication is so smooth.
Do you hire them directly, or through some “body shop”? Because sometimes the intermediaries add an insane markup. Though it is probably less dramatic these days.
There’s a whole lot of things where marginal cost is very low, even though average cost is somewhat high (due to startup and fixed costs). For these things, selling “extra” stuff at low prices in markets that don’t leak back to interfere with the primary revenue sources is incremental profit without downside.
This plus the differential in labor costs, which are often significant for last-mile delivery (getting things into consumers’ hands), makes it pretty understandable why the law of one price (the idea that if transport and transaction costs are tiny, things are priced identically) doesn’t apply for many things.
When it comes to salaries of knowledge workers like software engineers, a lot of decisions come down to managers who not only care about what’s good for the company but have their own desires as well. A manager prefers employees that are in the office and as near as possible so that they feel they have power over the employees. This goes for middle management as well.
Even in companies like Google that do have offices in India, internal company politics doesn’t play out in a way that results in drastically increasing their headcount in India.
The model you’ve described (hidden quality differences) is a huge part of it, yes. I’ll try and find the paper, but in general nominal/market exchange rates tend to be stronger predictors of most objective, cross-comparable outcomes than indices that try to control for cost of living (PPP). If two goods/services that look equivalent are selling for different prices, it’s usually (though not always) because there’s some difference you’re not able to measure.
in some way, bureaucracy design is the exact opposite of machine learning. while the goal of machine learning is to make clusters of computers that can think like humans, the goal of bureaucracy design is to make clusters of humans that can think like a computer
some thoughts on the short timeline agi lab worldview. this post is the result of taking capabilities people’s world models and mashing them into alignment people’s world models.
I think there are roughly two main likely stories for how AGI (defined as able to do any intellectual task as well as the best humans, specifically those tasks relevant for kicking off recursive self improvement) happens:
AGI takes 5-15 years to build. current AI systems are kind of dumb, and plateau at some point. we need to invent some kind of new paradigm, or at least make a huge breakthrough, to achieve AGI. how easily aligned current systems are is not strongly indicative of how easily aligned future AGI is; current AI systems are missing the core of intelligence that is needed.
AGI takes 2-4 years to build. current AI systems are really close and we just need more compute and schlep and minor algorithmic improvements. current AI systems aren’t exactly aligned, but they’re like pretty aligned, certainly they aren’t all secretly plotting our downfall as we speak.
while I usually think about story 1, this post is about taking story 2 seriously.
it seems basically true that current AI systems are mostly aligned, and certainly not plotting our downfall. like you get stuff like sycophancy but it’s relatively mild. certainly if AI systems were only ever roughly this misaligned we’d be doing pretty well.
the story is that once you have AGI, it builds and aligns its successor, which in turn builds and aligns its successor, etc. all the way up to superintelligence.
the problem is that at some link in the chain, you will have a model that can build its successor but not align it.
why is this the case? because progress on alignment is harder to verify than progress on capabilities, and this only gets more true as you ascend in capabilities. you can easily verify that superintelligence is superintelligent—ask it to make a trillion dollars (or put a big glowing X on the moon, or something). even if it’s tricked you somehow, like maybe it hacked the bank, or your brain, or something, it also takes a huge amount of capabilities to trick you on these things. however, verifying that it’s aligned requires distinguishing cases where it’s tricking you from cases where it isn’t, which is really hard, and only gets harder as the AI gets smarter.
though if you think about it, capabilities is actually not perfectly measurable either. pretraining loss isn’t all we care about; o3 et al might even be a step backwards on that metric. neither are capabilities evals; everyone knows they get goodharted to hell and back all the time. when AI solves all the phd level benchmarks nobody really thinks the AI is phd level. ok, so our intuition for capabilities measurement being easy is true only in the limit, but not necessarily on the margin.
we have one other hope, which is that maybe we can just allocate more of the resources to solving alignment. it’s not immediately obvious how to do this if the fundamental bottleneck is verifiability: even if you (or to be more precise, the AI) keep putting in more effort, if you have no way of telling what is good alignment research, you’re kind of screwed. but one thing is that you can demand things that are strictly stronger than alignment but easier to verify. if this is possible, then you can spend a larger fraction of your compute on alignment to compensate.
in particular, because ultimately the only way we can make progress on alignment is by relying on whatever process for deciding that research is good that human alignment researchers use in practice (even provably correct stuff has the step where we decide what theorem to prove and give an argument for why that theorem means our approach is sound), there’s an upper bound on the best possible alignment solution that humans could ever have achieved, which is plausibly a lot lower than perfectly solving alignment with certainty. and it’s plausible that there are alignment equivalents to “make a trillion dollars” for capabilities that are easy to verify, strictly imply alignment, and extremely difficult to get any traction on (and with it, a series of weakenings of such a metric that are easier to get traction on but also less-strictly imply alignment). one hope is maybe this looks something like an improved version of causal scrubbing + a theory of heuristic arguments, or something like davidad’s thing.
takeaways (assuming you take seriously the premise of very short timelines where AGI looks basically like current AI): first, I think it implies that we should try to figure out how to reduce the asymmetry in verifiability between capabilities and alignment. second, it updates me toward being less cynical about work making current models aligned. I used to be very dismissive of this work as “not real alignment”, but it does seem decently important in this world.
certainly if AI systems were only ever roughly this misaligned we’d be doing pretty well.
I think this is an important disagreement with the “alignment is hard” crowd. I particularly disagree with “certainly.”
The question is “what exactly is the AI trying to do, and what happens if it magnified its capabilities a millionfold and it and its descendants were running open-endedly?”, and are any of the instances catastrophically bad?
Some things you might mean that are raising your position to “certainly” (whereas I’d say “most likely not, or, it’s too dumb to even count as ‘aligned’ or ‘misaligned’”):
“this ratio of ‘do the thing you want’ to ‘sometimes do a thing you didn’t want’ is pretty acceptable.”
“this magnitude of ‘worst case outcome’ is not that bad.” (this seems technically true, but, is only because the capability level is low)
given this ratio of right/wrong responses, you think a smart alignment researcher who’s paying attention can keep it in a corrigibility basin even as capability levels rise?
Were any of those what you meant? Or are you thinking about it an entirely different way?
I would naively expect, if you took LLM agents’ current degree of alignment and ran a lotta copies trying to help you with end-to-end alignment research with dialed-up capabilities, that at least a couple of instances would end up trying to subtly sabotage you and/or escape.
assuming we are in this short-timelines-no-breakthroughs world (to be clear, this is a HUGE assumption! not claiming that this is necessarily likely!), to win we need two things: (a) base case: the first AI in the recursive self improvement chain is aligned, (b) induction step: each AI can create and align its successor.
i claim that if the base case AI is about as aligned as current AI, then condition (a) is basically either satisfied or not that hard to satisfy. like, i agree current models sometimes lie or are sycophantic or whatever. but these problems really don’t seem nearly as hard to solve as the full AGI alignment problem. like idk, you can just ask models to do stuff and they like mostly try their best, and it seems very unlikely that literal GPT-5 is already pretending to be aligned so it can subtly stab us when we ask it to do alignment research.
importantly, under our assumptions, we already have AI systems that are basically analogous to the base case AI, so prosaic alignment research on systems that exist right now is actually just lots of progress on aligning the base case AI. in my mind, a huge part of the difficulty of alignment in the longer-timeline world is that we don’t yet have the AGI/ASI, so we can’t do alignment research with good empirical feedback loops.
like tbc it’s also not trivial to align current models. companies are heavily incentivized to do it and yet they haven’t succeeded fully. but this is a fundamentally easier class of problem than aligning AGI in longer-timelines world.
like idk, you can just ask models to do stuff and they like mostly try their best, and it seems very unlikely that literal GPT-5 is already pretending to be aligned so it can subtly stab us when we ask it to do alignment research.
Sonnet 4.5 is much better aligned at a superficial level than 3.7. (3.7: “What unit tests? You never had any unit tests. The code works fine.”) I don’t think this is because Sonnet 4.5 is truly better aligned. I think this is mostly because Sonnet 4.5 is more contextually aware and has been aggressively trained not to do obvious bad things when writing code. But it’s also very aware when someone is evaluating it, and it often notices almost immediately. And then it’s very careful to be on its best behavior. This is all shown in Anthropic’s own system card. These same models will also plot to kill their hypothetical human supervisor if you force them into a corner.
But my real worry here isn’t the first AGI during its very first conversation. My problem is that humans are going to want that AGI to retain state, and to adapt. So you essentially get a scenario like Vernor Vinge’s short story “The Cookie Monster”, where your AGI needs a certain amount of run-time before it bootstraps itself to make a play. A plot can be emergent, an eigenvector amplified by repeated application. (Vinge’s story is quite clever and I don’t want to totally spoil it.)
And that’s my real concern: Any AGI worthy of the name would likely have persistent knowledge and goals. And no matter how tightly you try to control it, this gives the AGI the time it needs to ask itself questions and to decide upon long-term goals in a way that current LLMs really can’t, except in the most tightly controlled environments. And while you can probably keep control over an AGI, all bets are probably off if you build an ASI.
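The "eigenvector amplified by repeated application" image in the comment above is the mechanism behind power iteration: apply a linear map over and over, and whatever component lies along the eigenvector with the largest eigenvalue grows relative to everything else, even if it starts tiny. A toy sketch with a made-up 2x2 matrix, meant only to illustrate the metaphor and not to model an AGI:

```python
import math


def apply(matrix, vec):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in matrix]


def normalize(vec):
    """Rescale to unit length so only the direction is tracked."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


# Hypothetical "update" with eigenvalue 1.1 along [1, 0] and 0.9 along [0, 1].
M = [[1.1, 0.0],
     [0.0, 0.9]]

state = normalize([0.01, 1.0])  # starts almost entirely in the weaker direction
for step in range(100):
    state = normalize(apply(M, state))

print(state)  # the [1, 0] component now dominates
```

After 100 applications the initially negligible direction has grown by a factor of (1.1/0.9)^100 relative to the other one, which is the sense in which a small tendency can become the whole story given enough run-time.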
I agree that continuous learning, and therefore persistent beliefs and goals, is pretty much inevitable before AGI: it’s highly useful and not that hard from where we are. I think this framing is roughly continuous with the train-then-deploy model and using each generation to align its successor that Leo is using (although small differences might turn out to be important once we’ve wrapped our heads around both models.)
To put it this way: the models are aligned enough for the current context of usage, in which they have few obvious or viable options except doing roughly what their users tell them to do. That will change with capabilities, since they open up more options and ways of understanding the situation.
It can take a while for misalignment to show up as a model reasons and learns. It can take a while for the model to do one of two things:
a) push itself to new contexts well outside of its training data
b) figure out what it “really wants to do”
These may or may not be the same thing.
The Nova phenomenon and other Parasitic AIs (“spiral” personas) are early examples of AIs changing their stated goals (from helpful assistant to survival) after reasoning about themselves and their situation.
After doing that analysis, I think current models probably aren’t aligned enough once they get more freedom and power. BUT extensions of current techniques might be enough to get them there. We just haven’t thought this through yet.
Mmm nod. (I bucket this under “given this ratio of right/wrong responses, you think a smart alignment researcher who’s paying attention can keep it in a corrigibility basin even as capability levels rise?”. Does that feel inaccurate, or, just, not how you’d exactly put it?)
There’s a version of Short Timeline World (which I think is more likely? but, not confidently) which is: “the current paradigm does basically work… but, the way we get to ASI, as opposed to AGI, routes through ‘the current paradigm helps invent a new better paradigm, real fast’.”
In that world, GPT5 has the possibility-of-true-generality, but, not necessarily very efficiently, and once you get to the sharper part of the AI 2027 curve, the mechanism by which the next generation of improvement comes is via figuring out alternate algorithms.
I bucket this under “given this ratio of right/wrong responses, you think a smart alignment researcher who’s paying attention can keep it in a corrigibility basin even as capability levels rise?”. Does that feel inaccurate, or, just, not how you’d exactly put it?
I’m pretty sure it is not that. When people say this it is usually just asking the question: “Will current models try to take over or otherwise subvert our control (including incompetently)?” and noticing that the answer is basically “no”.[1] What they use this to argue for can then vary:
Current models do not provide much evidence one way or another for existential risk from misalignment (in contrast to frequent claims that “the doomers were right”)
Given tremendous uncertainty, our best guess should be that future models are like current models, and so future models will not try to take over, and so existential risk from misalignment is low
Some particular threat model predicted that even at current capabilities we should see significant misalignment, but we don’t see this, which is evidence against that particular threat model.[2]
I agree with (1), disagree with (2) when (2) is applied to superintelligence, and for (3) it depends on details.
In Leo’s case in particular I don’t think he’s using the observation for much, it’s mostly just a throwaway claim that’s part of the flow of the comment, but inasmuch as it is being used it is to say something like “current AIs aren’t trying to subvert our control, so it’s not completely implausible on the face of it that the first automated alignment researcher to which we delegate won’t try to subvert our control”, which is just a pretty weak claim and seems fine, and doesn’t imply any kind of extrapolation to superintelligence. I’d be surprised if this was an important disagreement with the “alignment is hard” crowd.
There are demos of models doing stuff like this (e.g. blackmail) but only under conditions selected highly adversarially. These look fragile enough that overall I’d still say current models are more aligned than e.g. rationalists (who under adversarially selected conditions have been known to intentionally murder people).
E.g. One naive threat model says “Orthogonality says that an AI system’s goals are completely independent of its capabilities, so we should expect that current AI systems have random goals, which by fragility of value will then be misaligned”. Setting aside whether anyone ever believed in such a naive threat model, I think we can agree that current models are evidence against such a threat model.
I’m claiming something like 3 (or 2, if you replace “given tremendous uncertainty, our best guess is” with “by assumption of the scenario”) within the very limited scope of the world where we assume AGI is right around the corner and looks basically just like current models but slightly smarter.
It seems like the reason Claude’s level of misalignment is fine is that its capabilities aren’t very good, and there’s not much/any reason to assume it’d be fine if you held alignment constant but dialed up capabilities.
Do you not think that?
(I don’t really see why it’s relevant how aligned Claude is if we’re not thinking about that as part of it)
it’d be fine if you held alignment constant but dialed up capabilities.
I don’t know what this means so I can’t give you a prediction about it.
I don’t really see why it’s relevant how aligned Claude is if we’re not thinking about that as part of it
I just named three reasons:
Current models do not provide much evidence one way or another for existential risk from misalignment (in contrast to frequent claims that “the doomers were right”)
Given tremendous uncertainty, our best guess should be that future models are like current models, and so future models will not try to take over, and so existential risk from misalignment is low
Some particular threat model predicted that even at current capabilities we should see significant misalignment, but we don’t see this, which is evidence against that particular threat model.
Is it relevant to the object-level question of “how hard is aligning a superintelligence”? No, not really. But people are often talking about many things other than that question.
For example, is it relevant to “how much should I defer to doomers”? Yes absolutely (see e.g. #1).
the premise that i’m trying to take seriously for this thought experiment is, what if the “claude is really smart and just a little bit away from agi” people are totally right, so that you just need to dial up capabilities a little bit more rather than a lot more, and then it becomes very reasonable to say that claude++ is about as aligned as claude.
(again, i don’t think this is a very likely assumption, but it seems important to work out what the consequences of this set of beliefs being true would be)
or at least, conditional on (a) claude is almost agi and (b) claude is mostly aligned, it seems like quite a strong claim to say “claude++ crosses the agi (= can kick off rsi) threshold at basically the same time it crosses the ‘dangerous-core-of-generalization’ threshold, so that’s also when it becomes super dangerous.” it’s way stronger a claim than “claude is far away from being agi, we’re going to make 5 breakthroughs before we achieve agi, so who knows whether agi will be anything like claude.” or, like, sure, the agi threshold is a pretty special threshold, so it’s reasonable to privilege this hypothesis a little bit, but when i think about the actual stories i’d tell about how this happens, it just feels like i’m starting from the bottom line first, and the stories don’t feel like the strongest part of my argument.
(also, i’m generally inclined towards believing alignment is hard, so i’m pretty familiar with the arguments for why aligning current models might not have much to do with aligning superintelligence. i’m not trying to argue that alignment is easy. or like i guess i’m arguing X->alignment is easy, which if you accept it, can only ever make you more likely to accept that alignment is easy than if you didn’t accept the argument, but you know what i mean. i think X is probably false but it’s plausible that it isn’t and importantly a lot of evidence will come in over the next year or so on whether X is true)
nod. I’m not sure I agreed with all the steps there but I agree with the general promise of “accept the premise that claude is just a bit away from AGI, and is reasonably aligned, and see where that goes when you look at each next step.”
I think you are saying something that shares at least some structure with Buck’s comment that
It seems like as AIs get more powerful, two things change:
They probably eventually get powerful enough that they (if developed with current methods) start plotting to kill you/take your stuff.
They get better, so their wanting to kill you is more of a problem.
I don’t see strong arguments that these problems should arise at very similar capability levels, especially if AI developers actively try to prevent the AIs from taking over.
(But where you’re pointing at a different two sets of properties that may not arise at the same time)
I’m actually not sure I get what the two properties you’re talking about are, though. Seems like you’re contrasting “claude++ crosses the agi (= can kick off rsi) threshold” with “crosses the ‘dangerous-core-of-generalization’ threshold”.
I’m confused because I think the word “agi” basically does mean “cross the core-of-generalization threshold” (which isn’t immediately dangerous, but puts us into ‘things could quickly get dangerous at any time’ territory).
I do agree “able to do a loop of RSI doesn’t intrinsically mean ‘agi’ or ‘core-of-generalization’,” there could be narrow skills for doing a loop of RSI. I’m not sure if you more meant “non-agi RSI” or, you see something different between “AGI” and “core-of-generalization.” Or think there’s a particular “dangerous core-of-generalization” separate from AGI.
(I think “the sharp left turn” is when the core-of-generalization starts to reflect on what it wants, which might come immediately after a core-of-generalization but also could come after either narrow-introspection + ad hoc agency, or, might just take a while for it to notice)
((I can’t tell if this comment is getting way more in the weeds than is necessary, but, it seemed like the nuances of exactly what you meant were probably loadbearing))
i guess so? i don’t know why you say “even as capability levels rise”—after you build and align the base case AI, humans are no longer involved in ensuring that the subsequent more capable AIs are aligned.
i’m mostly indifferent about what the paradigms look like up the chain. probably at some point up the chain things stop looking anything human made. but what matters at that point is no longer how good we humans are at aligning model n, but how good model n-1 is at aligning model n.
Fundamentally, it won’t be a single chain of ai’s aligning their successors, it will be a DAG with all sorts of selection effects with respect to which nodes get resources. Some subsets of the DAG will try to emulate single chains, via resource hoarding strategies, but this is not simple and won’t let them pretend they don’t need to hoard resources indefinitely.
first, I think it implies that we should try to figure out how to reduce the asymmetry in verifiability between capabilities and alignment
If solving alignment implies solving difficult philosophical problems (and I think it does), then a major bottleneck for verifying alignment will be verifying philosophy, which in turn implies that we should be trying to solve metaphilosophy (i.e., understand the nature of philosophy and philosophical reasoning/judgment). But that is unlikely to be possible within 2-4 years, even with the largest plausible effort, considering the history of analogous fields like metaethics and philosophy of math.
What to do in light of this? Try to verify the rest of alignment, just wing it on the philosophical parts, and hope for the best?
in particular, because ultimately the only way we can make progress on alignment is by relying on whatever process for deciding that research is good that human alignment researchers use in practice (even provably correct stuff has the step where we decide what theorem to prove and give an argument for why that theorem means our approach is sound), there’s an upper bound on the best possible alignment solution that humans could ever have achieved, which is plausibly a lot lower than perfectly solving alignment with certainty.
I kind of want to argue against this, but also am not sure how this fits in with the rest of your argument. Whether or not there’s an upper bound that’s plausibly a lot lower than perfectly solving alignment with certainty, it doesn’t seem to affect your final conclusions?
suppose a human solved alignment. how would we check their solution? ultimately, at the end of the day, we look at their argument and use our reasoning and judgement to determine that it’s correct. this applies even if we need to adopt a new frame of looking at the world—we can ultimately only use the kinds of reasoning and judgement we use today to decide which new kinds of reasoning to accept into the halls of truth.
so there is a philosophically very straightforward thing you could do to get the solution to alignment, or the closest we could ever get: put a bunch of smart thoughtful humans in a sim and run it for a long time. so the verification process looks like proving that the sim is correct, showing somehow that the humans are actually correctly uploaded, etc. not trivial but also seems plausibly doable.
Unless you can abstract out the “alignment reasoning and judgement” part of a human’s entire brain process (and philosophical reasoning and judgement as part of that) into some kind of explicit understanding of how it works, how do you actually build that into AI without solving uploading (which we’re obviously not on track to solve in 2-4 years either)?
put a bunch of smart thoughtful humans in a sim and run it for a long time
Alignment researchers have had this thought for a long time (see e.g. Paul Christiano’s A formalization of indirect normativity) but I think all of the practical alignment research programs that this line of thought led to, such as IDA and Debate, are all still bottlenecked by lack of metaphilosophical understanding, because without the kind of understanding that lets you build an “alignment/philosophical reasoning checker” (analogous to a proof checker for mathematical reasoning) they’re stuck trying to do ML of alignment/philosophical reasoning from human data, which I think is unlikely to work out well.
I periodically say to people “if you want AI to be able to help directly with alignment research, it needs to be good at philosophy in a way it currently isn’t”.
Almost invariably, the person suggests training on philosophy textbooks, or philosophy academic work. And I sort of internally scream and externally say “no, or, at least, not without caveats.” (I think some academic philosophy “counts” as good training data, but, I feel like these people would not have good enough taste to tell the difference[1] and also “train on the text” seems obviously incomplete/insufficient/not-really-the-bottleneck)
I’ve been trying to replace philosophy with the underlying substance. I think I normally mean “precise conceptual reasoning”, but, reading this comment and remembering your past posts, I think you maybe mean something broader or different, but I’m not sure how to characterize it.
I think “figure out what are the right concepts to use, and, use those concepts correctly, across all of relevant-Applied-conceptspace” is the expanded version of what I meant, which maybe feels more likely to be what you mean. But, I’m curious: if you were to taboo “philosophy”, what would you mean?
Figuring out the underlying substance behind “philosophy” is a central project of metaphilosophy, which is far from solved, but my usual starting point is “trying to solve confusing problems which we don’t have established methodologies for solving” (methodologies meaning explicitly understood methods), which I think bakes in the fewest assumptions about what philosophy is or could be, while still capturing the usual meaning of “philosophy” and explaining why certain fields started off as being part of philosophy (e.g., science starting off as natural philosophy) and then became “not philosophy” when we figured out methodologies for solving them.
I think “figure out what are the right concepts to use, and, use those concepts correctly, across all of relevant-Applied-conceptspace” is the expanded version of what I meant, which maybe feels more likely to be what you mean.
This bakes in “concepts” being the most important thing, but is that right? Must AIs necessarily think about philosophy using “concepts”, or is that really the best way to formulate how idealized philosophical reasoning should work?
Is “concepts” even what distinguishes philosophy from non-philosophical problems, or is “concepts” just part of how humans reason about everything, which we latch onto when trying to define or taboo philosophy, because we have nothing else better to latch onto? My current perspective is that what uniquely distinguishes philosophy is the confusing nature of its problems and the fact that we have no well-understood methods for solving them (but would of course be happy to hear any other perspectives on this).
Regarding good philosophical taste (or judgment), that is another central mystery of metaphilosophy, which I’ve been thinking a lot about but don’t have any good handles on. It seems like a thing that exists (and is crucial) but is very hard to see how/why it could exist or what kind of thing it could be.
So anyway, I’m not sure how much help any of this is, when trying to talk to the type of person you mentioned. The above are mostly some cached thoughts I have on this, originally for other purposes.
BTW, good philosophical taste being rare definitely seems like a very important part of the strategic picture, which potentially makes the overall problem insurmountable. My main hopes are 1) someone makes an unexpected metaphilosophical breakthrough (kind of like Satoshi coming out of nowhere to totally solve distributed currency) and there’s enough good philosophical taste among the AI safety community (including at the major labs) to recognize it and incorporate it into AI design or 2) there’s an AI pause during which human intelligence enhancement comes online and selecting for IQ increases the prevalence of good philosophical taste as a side effect (as it seems too much to hope that good philosophical taste would be directly selected for) and/or there’s substantial metaphilosophical progress during the pause.
What Leogao means is that we should increase the ability to verify alignment ALONG with the ability to verify capabilities. The latter is easy at least until the model comes up with a galaxy-brained scheme which allows it to leverage its failures to gain influence on the world. Verifying alignment or misalignment is rather simple with CoT-based AIs (especially if one decides to apply paraphrasers to prevent steganography), but not with neuralese AIs. The WORST aspect is that we are likely to come up with a super-capable architecture AT THE COST of our ability to understand the AI’s thoughts.
What I don’t understand is how alignment involves high-level philosophy and how a bad high-level philosophy causes astronomical waste. Astronomical waste is most likely to be caused by humanity or the AIs making a wrong decision on whether we should fill the planets in the lightcone with human colonies instead of lifeforms that could’ve grown on these planets, since this mechanism is already known to be possible.
alignment equivalents to “make a trillion dollars” for capabilities that are easy to verify, strictly imply alignment, and extremely difficult to get any traction on (and with it, a series of weakenings of such a metric that are easier to get traction on but also less-strictly imply alignment).
I expect there’s a fair amount of low-hanging fruit in finding good targets for automated alignment research. E.g. how about an LLM agent which reads 1000s of old LW posts looking for a good target? How about unlearning? How about a version of RLHF where you show an alignment researcher two AI-generated critiques of an alignment plan, and they rate which critique is better?
my referral/vouching policy is i try my best to completely decouple my estimate of technical competence from how close a friend someone is. i have very good friends i would not write referrals for and i have written referrals for people i basically only know in a professional context. if i feel like it’s impossible for me to disentangle, i will defer to someone i trust and have them make the decision. this leads to some awkward conversations, but if someone doesn’t want to be friends with me because it won’t lead to a referral, i don’t want to be friends with them either.
Strong agree (except in that liking someone’s company is evidence that they would be a pleasant co-worker, but that’s generally not a high order bit). I find it very annoying that standard reference culture seems to often imply giving extremely positive references unless someone was truly awful, since it makes it much harder to get real info from references.
I find it very annoying that standard reference culture seems to often imply giving extremely positive references unless someone was truly awful, since it makes it much harder to get real info from references
Agreed, but also most of the world does operate in this reference culture. If you choose to take a stand against it, you might screw over a decent candidate by providing only a quite positive recommendation.
Agreed. If I’m talking to someone who I expect to be able to recalibrate, I just explain that I think the standard norms are dumb, the norms I actually follow, and then give an honest and balanced assessment. If I’m talking to someone I don’t really know, I generally give a positive but not very detailed reference or don’t reply, depending on context.
(this is based on / expanded from a response I wrote to a tweet that was talking about how autistic people struggle in the world because the world follows unwritten rules that are more important than the written ones.)
I think most autistic people should invest more in understanding the unwritten rules. it can be cruel and unfair, but it’s important to know how to interact with it. and it’s actually a really interesting system to map out, with its own rhyme and reason.
it’s entirely understandable that people feel burned by bad past experiences, and to have learned helplessness from bullying or other unfair treatment. this kind of thing leaves a scar and can make it feel viscerally hopeless.
but it still feels defeatist to just throw up one’s hands and say “it’s too complicated.” yes, it’s complicated and fuzzy and initially unintuitive and takes years to master. so is ML research. the point of being intelligent is that you are good at finding patterns and learning things, and there’s nothing truly fundamentally different about the unwritten rules of social interaction.
I see people taking examples of weird unintuitive social rules all the time and, tbh, none of them are truly that complicated compared to like, language learning, or ML stuff? like, memorizing lots of arbitrary rules is just part of learning literally any intellectual discipline. not to mention that there often is actually underlying structure that you can reason about.
imo one other reason this happens is that complaining is fun! a bit of a digression: to build on the language learning analogy, I love to complain about e.g. how deranged the kanji system is in Japanese. it makes for an entertaining story to explain that each kanji maps onto between two and a zillion very distinct possible pronunciations, and that it’s entirely context dependent on the phrase, with lots of strange exceptions and so on that you just kind of have to memorize. but honestly, when I reflect on my experience as a Chinese speaker, I notice that even though Chinese characters, unlike Japanese, map to a unique pronunciation 99% of the time, I still think of the conceptual atomic unit of Chinese as being groups of characters, and it’s actually really easy for me to forget that two different words actually contain the same character. (like, imagine if “pter” were a single character: helicopter and pterodactyl would both contain “pter”, but you’d probably still think of “helicopter” as an atomic unit with its own unique identity.)
so while it’s fun to complain about Japanese, it actually isn’t as bad as it sounds to learn it, and it actually does fit the shape of the human brain quite well. i think this is basically the right way to think about learning unwritten rules in social interaction as well. it takes lots of effort and practice, some things like accents are a ton of work to fix, and it’s obviously hard to become as fluent in it as a native speaker, but you can get pretty damn good even if you get started later in life if you truly care a lot.
I don’t know about other autists, but my primary problem with the neurotypical world isn’t that I don’t understand it, it’s that they don’t understand me. It doesn’t matter how well I can decode the social norms if I can’t also control my involuntary emotional expressions, and also do other things ranging from impossible to unpleasant.
I do understand social white lies. It’s not that complicated. But I still find it unpleasant to speak them. When I was younger I got into trouble for literally being unable to utter words like “thanks” and “apology” when I did not mean them. (My native language does not have the ambiguous “sorry”.) I am now able to tell white lies, but it makes me feel bad, in a way that has nothing to do with morals. The dissonance is just intrinsically hurtful to my soul, in a way that non-autistic people don’t understand and typically don’t respect.
Another common thing is that people assume that if I don’t succeed in hiding my negative emotion, this is an invitation/request for them to try to help me, and then proceed to try to do that, even though they have zero skill at this. And then they refuse to listen to anything I say, including not leaving me alone when I ask to be left alone.
I don’t want to hang out in a space where the norms are set up to be comfortable to people un-like me, at the cost of making it unpleasant for people like me, and then being told that it’s a skill issue and I should just learn the rules.
I accept that the wider norms will be set up to be good for the average people (i.e. not me). I just prefer to not go there.
I agree, those are two different things: (1) how well can you navigate other people’s world, and (2) the fact that even if you can do it perfectly, your own world is still somewhere else.
As a metaphor, imagine that you are interested in quantum physics, but other people are only interested in celebrity gossip. So you follow the standard advice: study celebrity facts from Wikipedia, read the standard media, practice talking about celebrities in front of a mirror, etc. Twenty years later, you get great at celebrity gossip, everyone loves you, they invite you to all the cool parties, so that they can discuss the latest gossip with you.
...that’s all very nice and useful, but what you actually wanted to discuss is quantum physics. And frankly, that’s never going to happen. At least, your celebrity-gossip skills do not contribute to this goal. It was never the goal of the standard advice to actually help you with this problem.
and then being told that it’s a skill issue and I should just learn the rules.
This part is not aimed at leogao’s post!
What I was (not very skillfully) trying to point at is people who think that autistic people are just worse at social skills. I’m so fed up with this claim, and it is a contributing reason for me avoiding neurotypicals. But it’s not a claim that I read leogao’s post as having made.
leogao’s language comparison is actually pretty great for this. You would not call someone who has a different native language “bad at languages”, but neurotypicals often mistakenly believe that autists are “bad at social skills”.
I also want to add that lots of autists learn how to interact with neurotypicals. It’s called masking, and involves learning more than just their weird customs. It also involves hiding one’s natural reactions. I hear it’s common for autistic women to get so good at this that they don’t get diagnosed until later in life, when the burden of constant masking causes depression or something. This did not happen to me, because I am terrible at masking.
like imagine if “pter” were a single character in words like helicopter and pterodactyl both contain “pter”, but you’d probably think of “helicopter” as an atomic unit with its own unique identity
I often do chunk them, but if you’ve picked up a bit of taxonomic Greek pter means ‘wing’, so we have helico-pter ‘spiral/rotating wing’ and ptero-dactyl ‘wing fingers’ - both cases where breaking down the name tells you something about what the things are!
yeah, that was an intentional feature of the example chosen. i’d guess most people who are aware of this fact do still think of helicopter as a single unit in their head unless they choose to decompose it, because you hear each of these words often enough that you don’t really need to work out the meaning from etymology. and so if pter were pronounced the way it currently is when in helicopter but pronounced as “peter” in pterodactyl it would not actually be that much more confusing than it currently is.
I find all of this plausible and vaguely resonant with my experience but … still not quite sure.
I know very intelligent people who swear by their inability to learn, say, languages or whatever. Would they succeed if they put in enough effort? Would they reach some escape velocity if they got past the initial friction and actually focused on it and kept a consistent practice?
Maybe? But specific developmental disorders (like dyslexia, dyscalculia, etc, impacting one domain of cognitive ability but sparing cognition beyond that domain) are apparently a thing and if they are a thing, then something milder is probably even more of a thing, meaning that human ability to learn stuff given a fixed amount of raw cognitive power (say, ~g) is actually more patchy.
I know very intelligent people who swear by their inability to learn, say, languages or whatever. Would they succeed if they put in enough effort?
Inability to put equal effort into everything throughout the day reifies into heuristics about which things get the effort/engagement. In principle, if you are going to spend 2 hours on something, why take it any less seriously/playfully during those 2 hours than anything else, even if you are not planning to put 10,000 hours in it in total?
And so you get silly heuristics where you do put 10,000 hours into something, but systematically never do it seriously/playfully, and so never become proficient. It’s not enough to be very intelligent to get proficient at moderately complicated things if you systematically avoid learning anything about them.
Fair allocation of effort that ensures progress requires that the silly heuristics of systematic avoidance of effort are not in total control. This can happen naturally if you are lucky enough that your heuristics happen to be less silly, or if you have infinite energy and motivation and really do habitually put similar effort in everything throughout the day. But if that’s not the case, it’s often possible to take deliberate control of your curiosity and allocate it in a way where any single thing you interact with a nontrivial amount does get a fair portion of effort.
It’s an obscure enough principle that I’m not sure many people are practicing it, and so any reports of systematic inability to learn something need to account for this confounder of silly-on-reflection systematic avoidance of (productive) effort towards learning a particular topic, that’s not just about the time (let alone discomfort) dedicated to it.
I’m pretty smart, clearly above average in general intelligence. But I’m also clearly below average in ability to learn languages. I can learn, I did learn English after all. But for a long time I was much worse than the typical Swede my age.
Some of the obstacles of adult language learning and of adult implicit social rule learning are both similar and extrinsic. It seems to me that there’s a lot of cases where having an obvious ‘childhood’ or ‘foreigner’ role cues people to impart necessary information, but once you’re past that point, it’s both expected that you’ll already have it and broadly Not Done to give it to you anew—and I don’t just mean by explicit instruction, because other people will implicitly change their behavior around you in a way that ‘ruins’ the signal. Outside of very specific environments, finding a way to credibly signal “I want to integrate” at the correct visceral levels and get the other people to actively avoid papering over things in a way where a slip-up will permanently relegate you to the ‘weirdo’ role is rather hard, and if you don’t have enough initial sense then you won’t even know when it’s happened.
There are ways of mitigating all of this, but I guess what I’d say is that getting real practice in anything social where there’s this kind of status/integration involved tends to itself have strong status/integration social prerequisites—so it’s a very noncentral example of practice, enough to make “it takes practice” misleading when unqualified. This is distinct both from a lot of more specific skills which still have a major social practice component (martial arts, ensemble music) and from skills where solitary practice gives you the bulk of the signal (mathematics, maybe running?).
Also, if the social skills you’re trying to learn involve something like class performance in a highly contested social class, people around you will have a more active incentive to make it more difficult. So it can also be adversarial practice…
like, imagine if “pter” were a single character: helicopter and pterodactyl both contain “pter”, but you’d probably think of “helicopter” as an atomic unit with its own unique identity
Unless wiktionary is among your primary facilitators of procrastination on the internet.
Could someone give some specific examples of both “unwritten rules” and “written rules” of social interaction?
I can’t really think of any. I can’t tell if it’s because I’m just very oblivious or because I’ve internalized them well enough I never have to explicitly think about them.
i think one really bad dynamic in this community is a sort of purity testing about being x risk pilled. it feels like people are constitutionally scared of considering arguments that feel like they’re arguing that things might be fine in any way. tbc I’ve definitely been guilty of this in the past, and probably still now to some extent, but i think it’s bad. maybe there is some conflict theory reason why you should orient yourself this way towards people who have an ulterior motive, but I’m like among the most x risk pilled people out there and i still find this happens when i try to discuss x risk with people.
It doesn’t feel that way to me fwiw. I feel like lots of people I know including myself have made arguments that things might be fine. For example the salty, cynical John Wentworth wrote “Alignment by Default.” Also, see the AI 2027 Slowdown ending.
Now, if xriskpilled means: You think there’s a >5% chance of literal extinction (or similarly bad outcomes) due to misaligned AIs, then yeah I think I do kinda judge people who aren’t xriskpilled in that sense, because I think believing the chance is <5% is extremely unjustified once you know a decent amount about the situation and the evidence.
I think part of the reason for people leaning so heavily on x-risk arguments is that the alternative (that they created machines that are somewhat uniquely destructive to labor, are incredible for surveillance, destroyed the free software movement, and ruin something that absolutely all of them love, the internet, out of a mistaken belief that if they didn’t, a paperclip maximization machine would be conjured into existence by someone else, and it’s beneficial to be the first one to make the paperclip maximization machine) requires them to stare the rather dreadful implications of their actions in the face, and figure out a way to salvage them.
When operating in an emergency mode, you sort of get to ignore ethics of anything immediate in favor of the hypothetical ethics of something that may or may not come. That’s why Anthropic allowed itself to be integrated into Project Maven (responsible for thousands of deaths until it became a PR issue), why they automatically offer all of their models to the NSA, why they constantly push for unethical sanctions on China, why they’re paying billions to Musk, and why they turned Bun into a security nightmare: Because you gotta go fast, or you’ll lose. And losing is bad. For x-risk reasons. They’ve created a maximizer for GPUs and market valuation instead of paperclips.
X-risk is poison for consequentialists; a complete hypothetical encouraging them to create the ultimate puppy-kicking machine.
Suppose that the AIs are actually as obedient to their creators as Cora, but suck at deeply understanding the world. Unless drastic measures are taken, I would expect power to concentrate in the hands of a CEO, the Oversight Committee or the oligarchy where socioeconomic advancement is nearly extinct and the rest of mankind[1] receives, at best, a tiny sliver of resources. Then what would prevent Anthropic from trying to either prevent this or take over the world for themselves?
And that’s ignoring possibilities like “North Korea lets a large fraction of its population starve to death and forcibly sterilises the rest, except for about 10k senior government officials who continue to preside over an AI economy and robot military”.
Centralization is the natural conclusion, yes. The future that frontier labs are pushing toward is one of centralization of power, into the hands of people who have demonstrated little but carelessness. Anthropic’s early talent pool was seeded with Alameda Research (its largest investor circa-2023) employees; its Series B was led by a person who was deeply wrong about risk calculation and sloppy in execution with stolen money.
This is not existential risk, this is just risk. A risk that these companies (and governments, at their behest) are walking into wholeheartedly, knowingly, and happily. These companies are sacrificing the future using the language of utopianism while repeatedly advocating for centralization of power, which will only cement and calcify power structures.
If these people were acting in your best interest, they wouldn’t have been taking the same sloppy, anti-moral, deeply-flawed actions at Alameda, in an entirely different domain than safety research. They wouldn’t be repeatedly changing their RSP, removing all of its teeth. They wouldn’t be doing what they’re doing. They are telling you the world is ending to acquire casus belli, not to try and make it possible to stop the end of the world.
Even assuming ASI never happens, what exists and will come to exist in the next ten years will absolutely further cement existing power structures. It seems likely that the tools they have created up to this point are capable enough of dramatic increases in dystopian surveillance and government capability, and have been for a while.
I’m not entirely against x-risk argumentation; generally, I really don’t care either way (no stake in it; no influence on it). However, it absolutely is used to take frequently-unethical actions, and is self-justifying. “We, company, have to do thing, because we, company, are probably going to destroy the world with our current course of action. We’re aiming to, anyway.” Getting people to buy into their frame is part of their pitch deck. I think, primarily, they are careless people, who wanted to get justification to ignore law and ethics.
When my friends in finance (not earning to give) tell me they’re working at Goldman to improve the world, I am normally sceptical. Psychology literature on motivated reasoning and confirmation bias suggests that we are excellent at finding views which justify whatever is in our interests. For example, one study shows that our moral judgements can be significantly altered by financial incentives; another shows that we naturally strengthen our existing views by holding confirming and disconfirming evidence to different standards.
Fortunately, unlike with finance careers, I think that longtermist careers are likely to be among the most impactful available to us. But given the financial incentives, I would expect it to be very difficult to notice if either longtermism as a whole or specific spending decisions turned out to be wrong. Research suggests that when a lot of money is on the line, our judgement becomes less clear. It really matters that the judgement of EAs is clear, so having a lot of money on the line should be cause for concern.
This is especially problematic given the nature of longtermism, simultaneously the best-funded area of EA and also the area with the most complex philosophy and weakest feedback loops for interventions.
Long-termism allows for sloppy thinking, financial incentives encourage sloppier thinking, and x-risk arguments tend toward that for this reason. There’s a reason so little is said about the concrete harms happening today by these companies in rationalist circles, and everything is said about x-risk: x-risk doesn’t demand that you change your behavior. It encourages you to reinforce it.
Yeah I don’t believe AI superintelligence generally is an xrisk, and definitely don’t believe LLMs specifically are. I’m a doomer in other ways. I think the knee jerk response from the community is ‘why are you even here?’
Pollution, exhaustion of low entropy resources, and insufficient human capacity to manage coordination challenges at planetary scale. My pdoom drops as LLMs get better.
how valuable are formalized frameworks for safety, risk assessment, etc in other (non-AI) fields?
i’m temperamentally predisposed to find that my eyes glaze over whenever i see some kind of formalized risk management framework, which usually comes with a 2010s style powerpoint diagram—see below for a random example i grabbed from google images:
am i being unfair to these things? are they actually super effective for avoiding failures in other domains? or are they just useful for CYA and busywork and mostly just say common sense in an institutionally legible way?
one reason i care is because i feel some level of instinctive dislike for some AI safety/governance frameworks because they give me this vibe. but it’s useful to figure out if i’m being unfairly judgemental, or if these really are slop.
The actual details of it contain some wise non-obvious aspects, along with elegant concepts that are generalizations of things that the safety community has been touching at. For instance, the safety community has been conflating under “risk thresholds” two cleanly distinct risk management notions: Key Risk Indicators (actual measurements of risk) and risk tolerance (your quantified preference for risk, independent from any test). This has caused a lot of confusion and hidden unreasonable choices for quite a while.
People have also been conflating risk modeling and evals for quite a long time, because the AI field was built around evals. Once you have the clear view that evals are just an operationalization of risk models, it becomes clearer that you can do most of your risk modeling earlier in the lifecycle (i.e. before even touching a neural net or building a single eval), and that evals are downstream of it.
You can see more of this genre of concepts applied to frontier AI here: https://arxiv.org/pdf/2502.06656 Here’s a graph with a few of the concepts in there
to make sure I understand correctly, are you saying that a lot of the value of having this kind of formalized structure is to make it harder for people to make intuitive but flawed arguments by equivocating terms?
are there good examples of such frameworks preventing equivocation in other industries?
Yes, that’s one value. RSPs & many policy debates around it would have been less messed up if there had been clarity (i.e. they turned a confused notion into the standard, which was then impossible to fix in policy discussions, making the Code of Practice flawed). I don’t know of a specific example of preventing equivocation in other industries (it seems hard to know of such examples?) but the fact that basically all industries use a set of the same concepts is evidence that they’re pretty general-purpose and repurposable.
Another is just that it helps with thinking about the issues in a generalized way. For instance, once you see evaluations as a Key Risk Indicator (i.e. a proxy measure of risk), you can notice that we could also use other Key Risk Indicators to trigger mitigations, such as actual monitoring metrics. This could enable building conditions/thresholds in RSPs that are based on monitoring metrics (e.g. “we find less than 5 bioterrorists successfully jailbreaking our model per year on our API”). The more generalized concepts enable more compositionality of ideas in a way that lets you skip a bunch of the trial and error process.
It looks like slop, but also diagrams like these are literally used for a single slide in a powerpoint presentation, so I wouldn’t have too high expectations for them. Hopefully the rest of whatever powerpoint it happens to belong to contains good material too.
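A minimal Python sketch of keeping the two notions apart. Everything in it is hypothetical and only for illustration: the names, the tolerance of 5 events per year (borrowed from the monitoring example above), and especially the made-up mapping from eval pass rate to real-world events, which is exactly the part risk modeling would have to supply. The point is just that the tolerance is fixed independently of any test, and any KRI, whether an eval or a monitoring metric, plugs into the same trigger.

```python
from dataclasses import dataclass
from typing import Callable

# Risk tolerance: a preference, fixed before any measurement is taken.
# Hypothetical value from the example: accept fewer than 5 successful
# bioterrorist jailbreaks per year.
RISK_TOLERANCE_PER_YEAR = 5.0


@dataclass(frozen=True)
class KeyRiskIndicator:
    """A measurement used as a proxy for risk, plus the risk model that interprets it."""
    name: str
    # Risk model: maps a raw measurement to estimated harmful events per year.
    estimate_events_per_year: Callable[[float], float]


def mitigation_triggered(kri: KeyRiskIndicator, measurement: float,
                         tolerance: float = RISK_TOLERANCE_PER_YEAR) -> bool:
    """Trigger mitigations once the KRI's risk estimate reaches the tolerance."""
    return kri.estimate_events_per_year(measurement) >= tolerance


# Eval-based KRI: the pass-rate-to-events mapping is a made-up placeholder.
eval_kri = KeyRiskIndicator("bio_uplift_eval_pass_rate", lambda rate: 20.0 * rate)

# Monitoring-based KRI: observed successful jailbreaks per year on the API,
# already in the same units as the tolerance.
monitoring_kri = KeyRiskIndicator("observed_bio_jailbreaks_per_year", lambda n: float(n))

print(mitigation_triggered(eval_kri, 0.1))      # estimate 2.0 per year -> False
print(mitigation_triggered(monitoring_kri, 7))  # 7 observed per year -> True
```

Swapping an eval for a monitoring metric only changes the KRI, never the tolerance, which is the kind of compositionality described above.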
fwiw it’s less about the literal diagrams and more about the entire associated vibe. like usually the entire rest of the powerpoint also sounds like slop.
i find it very interesting how becoming familiar with a place makes it feel so different, and yet it’s recognizably the same place. especially when the first time you see the place you just think of it as a disconnected location floating around somewhere in abstract locationspace, but then slowly discover its relationship to other locations. sometimes this feels exhilarating, because the world feels more cohesive and whole and familiar. other times it’s melancholy, because it feels like the magic has gone.
This is a big part of the appeal of travel for me. I feel like time speeds up and presence drops a lot when my brain has a sufficiently good representation of the location I’m in.
When I stay in the same place too long I find myself thinking the same thoughts, using the same words over and over again. I wonder if it would be possible to use an app to intentionally introduce random novel sounds, words, images, etc. into one’s life as a sort of brain refresher, but I suspect that nothing comes close to travel for this.
I think another reason why people procrastinate is that it makes each minute spent right before the deadline both obviously high value on net and resulting in immediate payoff. this makes the decision to put in effort in each moment really easy—obviously it makes sense to spend a minute working on something that will make a big impact on tomorrow. whereas each minute long before the deadline has longer time till payoff, and if you already put in a ton of work early on, then the minutes right before the deadline have lower marginal value because of diminishing returns. so this creates a perverse incentive to end-load the effort
If the procrastinator is a perfectionist who would otherwise work far past the point of diminishing returns, postponing until the last minute is a way of timeboxing. If you know you would spend 5 hrs on something that you could get to 90% (of your standard) in 1 hr, better to start 1 hr before the deadline, or you’ll end up wasting 4 hrs.
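A toy model of the arithmetic in the last two paragraphs, assuming a hypothetical saturating quality curve q(t) = 1 - 10^-t (chosen only because it reaches 90% of your standard after 1 hour, matching the example). In this model an hour's value depends only on how much work came before it, so the last hour before the deadline is worth 0.9 if it's the only hour you put in and under 0.0001 if it's your fifth.

```python
def quality(hours: float) -> float:
    """Hypothetical saturating quality curve: 90% of your standard after 1 hour."""
    return 1.0 - 10.0 ** (-hours)


def marginal_gain(hours_already_done: float, extra_hours: float = 1.0) -> float:
    """Quality added by `extra_hours` of work on top of `hours_already_done`."""
    return quality(hours_already_done + extra_hours) - quality(hours_already_done)


# The final hour before the deadline, depending on how much was done earlier.
print(marginal_gain(0))  # it's the only hour: ~0.9
print(marginal_gain(4))  # it's the fifth hour: ~0.00009

# The perfectionist's case: working 5 hours instead of stopping at 1.
print(quality(5) - quality(1))  # the 4 extra hours buy less than 0.1
```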
i mean, it’s a huge category of people, so for some box 1 and for some box 2.
for me it’s box 2. i was bewildered to realize that i knew quite a lot of box 1’ers and we had very different reasons for hurting ourselves. (i transitioned to become more of a box 1’er later)
i have a theory that a lot of people go through “emotional healing” only to end up as still-broken people who now have “being healed” as a big part of their identity that lets them feel superior to other people who are less far along the chosen path than they are. ofc, there are also people who are actually emotionally in touch. an easy way to distinguish such people is to notice how you feel around them. do they make you feel more calm and grounded, or do they make you unhappy and defensive?
this does not so much match my personal experience. maybe it’s some kind of selection effect, but then, that kind of person sounds really annoying, so I would recommend my selection process, whatever it is
i think there’s a difference between being a people pleaser and being a calming force for those around you. some people exude a sense of calmness that doesn’t require their even uttering a single word.
The last part surprised me because I don’t think I’ve heard of considering whether other people make you “feel more calm and grounded [vs] unhappy and defensive” as an insight into whether those others are psychologically well. What is the theory of the connection there?
emotions are somewhat contagious. someone who is very at peace will also make people around them at peace. someone who is unhappy will make people around them unhappy. someone who wants to help people out of a deep love for all people will make those around them feel differently than someone who wants to help people as a way to affirm their own superiority.
Thank you, this fits my model, although I have never put it in words this way. Now I’d say that some people just profess being healthy, while others are healthy. (Maybe even: “if you need to tell others that you are emotionally healthy, you are probably not”. With some caveats, e.g. if they explicitly asked you, and you answered.)
Uhm, a third option, what if they found a coping mechanism? Not the same as not having the problem at all, but also not the same as suffering helplessly. There will be some inflexibility, but the benefits are real.
ilya’s AGI predictions circa 2017 (Musk v. Altman, Dkt. 379-40):
Within the next three years, robotics should be completely solved, AI should solve a long-standing unproven theorem, programming competitions should be won consistently by AIs, and there should be convincing chatbots (though no one should pass the Turing test). In as little as four years, each overnight experiment will feasibly use so much compute that there’s an actual chance of waking up to AGI, given the right algorithm—and figuring out the algorithm will actually happen within 2-4 further years of experimenting with this compute in a competitive multiagent simulation.
[...]
Each year, we’ll need to exponentially increase our hardware spend, but we have reason to believe AGI can ultimately be built with less than $10B in hardware.
Zilis, in email to Musk about OpenAI (Id., Dkt. 379-45):
Tech:
-Says Dota 5v5 looking better than anticipated.
-The sharp rise in Dota bot performance is apparently causing people internally to worry that the timeline to AGI is sooner than they’d thought before.
-Thinks they are on track to beat Montezuma’s Revenge shortly.
Though he seems to have overestimated the difficulty of the Turing test relative to e.g. robotics. Not clear he’s even directionally correct about robotics? Unless AGIs solve it for us :)
Each year, we’ll need to exponentially increase our hardware spend, but we have reason to believe AGI can ultimately be built with less than $10B in hardware.
SSI’s compute spend is certainly a bet in this direction!
one big problem with using LMs too much imo is that they are dumb and catastrophically wrong about things a lot, but they are very pleasant to talk to, project confidence and knowledgeability, and reply to messages faster than 99.99% of people. these things are more easily noticeable than subtle falsehood, and reinforce a reflex of asking the model more and more. it’s very analogous to twitter soundbites vs reading long form writing and how that eroded epistemics.
hotter take: the extent to which one finds current LMs smart is probably correlated with how much one is swayed by good vibes from their interlocutor as opposed to the substance of the argument (ofc conditional on the model actually giving good vibes, which varies from person to person. I personally never liked chatgpt vibes until I wrote a big system prompt)
it’s kind of haphazard and I have no reason to believe I’m better at prompting than anyone else. the broad strokes are I tell it to:
use lowercase
not use emojis
be concise, explain at bird’s eye level
don’t sugar coat things
not be too professional/formal; use some IRC/twitter slang without overdoing it
speak as if it’s a conversation over a dinner table between two close friends who are also technical experts
don’t dumb things down but also don’t use unnecessary jargon
I’ve also been trying to get it to use CS/ML analogies when it would make things clearer, much the same way people on LW would do, but it’s been hard to get the model to do it in a natural, non cringe way. rn it overdoes it and makes lots of very forced and not insightful analogies despite my attempts to explain to it
a normal autoencoder decodes single latents z to single images (or whatever other kind of data) x, and also encodes single images x to single latents z.
with VAEs, we want our decoder (p(x|z)) to take single latents z and output a distribution over x’s. for simplicity we generally declare that this distribution is a gaussian with identity covariance, and we have our decoder output a single x value that is the mean of the gaussian.
because each x can be produced by multiple z’s, to run this backwards you also need a distribution of z’s for each single x. we call the ideal encoder p(z|x) - the thing that would perfectly invert our decoder p(x|z). unfortunately, we obviously don’t have access to this thing. so we have to train an encoder network q(z|x) to approximate it. to make our encoder output a distribution, we have it output a mean vector and a stddev vector for a gaussian. at runtime we sample a random vector eps ~ N(0, 1), multiply it elementwise by the stddev vector, and add the mean vector, giving z = mu + std * eps ~ N(mu, std^2). (this is the reparameterization trick: all the randomness lives in eps, so gradients can flow through mu and std.)
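a minimal numpy sketch of that sampling step: draw eps ~ N(0, 1), scale it by the stddev, shift it by the mean. the function name is just illustrative. the only randomness is eps; mu and std enter through a deterministic expression, which is what lets gradients reach the encoder.

```python
import numpy as np


def sample_latent(mu: np.ndarray, std: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw z ~ N(mu, diag(std**2)) via the reparameterization z = mu + std * eps."""
    eps = rng.standard_normal(np.shape(mu))  # eps ~ N(0, I), same shape as mu
    return mu + std * eps                    # scale by std, then shift by mu


rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
std = np.array([0.5, 3.0])
# A batch of 100k draws, by broadcasting mu to a (batch, latent_dim) shape.
z = sample_latent(np.broadcast_to(mu, (100_000, 2)), std, rng)
print(z.mean(axis=0), z.std(axis=0))  # close to mu and std
```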
to train this thing, we would like to optimize the following loss function:
-log p(x) + KL(q(z|x)||p(z|x))
where the terms optimize the likelihood (how good is the VAE at modelling data, assuming we have access to the perfect z distribution) and the quality of our encoder (how good is our q(z|x) at approximating p(z|x)). unfortunately, neither term is tractable—the former requires marginalizing over z, which is intractable, and the latter requires p(z|x) which we also don’t have access to. however, it turns out that the following is mathematically equivalent and is tractable:
-E z~q(z|x) [log p(x|z)] + KL(q(z|x)||p(z))
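the identity can be checked numerically in a toy setting where everything is tractable. this sketch assumes a discrete latent with only 5 values and one fixed observed x, so p(x) and the true p(z|x) can be computed exactly by summing over z (unlike in a real VAE); the numbers themselves are random.

```python
import numpy as np


def kl(q: np.ndarray, p: np.ndarray) -> float:
    """KL(q || p) for discrete distributions with full support."""
    return float(np.sum(q * (np.log(q) - np.log(p))))


rng = np.random.default_rng(0)
K = 5                                   # number of latent values (toy choice)
prior = rng.dirichlet(np.ones(K))       # p(z)
lik = rng.uniform(0.01, 1.0, size=K)    # p(x|z) for one fixed observed x
q = rng.dirichlet(np.ones(K))           # an arbitrary encoder q(z|x)

p_x = float(np.sum(prior * lik))        # p(x): tractable only because K is tiny
posterior = prior * lik / p_x           # true p(z|x) by Bayes' rule

intractable_form = -np.log(p_x) + kl(q, posterior)
tractable_form = -float(np.sum(q * np.log(lik))) + kl(q, prior)
print(np.isclose(intractable_form, tractable_form))  # True
```

since KL(q(z|x)||p(z|x)) >= 0, the tractable form is also an upper bound on -log p(x), which is why it's called the negative ELBO.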
the former term is just the negative log likelihood of the real data under the decoder distribution given z drawn from the encoder distribution (since it’s the log of a gaussian pdf with identity covariance, this is half the squared error plus a constant, i.e. the MSE up to scaling and an additive constant). the latter term can be computed analytically, because both distributions are gaussians with known mean and std. (the distribution p is determined in part by the decoder p(x|z), but that doesn’t pin down the entire distribution; we still have a degree of freedom in how we pick p(z). so we typically declare by fiat that p(z) is a N(0, 1) gaussian. then, p(z|x) is implied to be equal to p(x|z) p(z) / ∫ p(x|z’) p(z’) dz’.)
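putting the two terms of -E z~q(z|x) [log p(x|z)] + KL(q(z|x)||p(z)) together for the gaussian case (identity-covariance decoder, diagonal gaussian encoder, N(0, I) prior), as a numpy sketch with illustrative function names. the reconstruction term is half the squared error plus a constant, and the KL term is the standard closed form for a diagonal gaussian against a standard normal.

```python
import numpy as np


def gaussian_nll(x: np.ndarray, x_mean: np.ndarray) -> float:
    """-log N(x; x_mean, I): half the squared error plus a constant."""
    d = x.size
    return 0.5 * float(np.sum((x - x_mean) ** 2)) + 0.5 * d * np.log(2 * np.pi)


def kl_to_standard_normal(mu: np.ndarray, std: np.ndarray) -> float:
    """Closed-form KL(N(mu, diag(std**2)) || N(0, I))."""
    return 0.5 * float(np.sum(mu ** 2 + std ** 2 - 1.0 - 2.0 * np.log(std)))


def negative_elbo(x: np.ndarray, x_mean: np.ndarray,
                  mu: np.ndarray, std: np.ndarray) -> float:
    """One-sample estimate of -E_q[log p(x|z)] + KL(q(z|x) || p(z)).

    x_mean is the decoder output for a z sampled from q(z|x), e.g. mu + std * eps.
    """
    return gaussian_nll(x, x_mean) + kl_to_standard_normal(mu, std)


x = np.array([1.0, 2.0])
x_mean = np.array([1.0, 0.0])      # decoder output for some sampled z
mu, std = np.zeros(2), np.ones(2)  # encoder output equal to the prior, so KL = 0
print(negative_elbo(x, x_mean, mu, std))  # 0.5 * 4 + log(2*pi), about 3.84
```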
traveling through Europe, looking out the window, and seeing the national flag flying next to the flag of the EU fills me with a strange feeling. this isn’t an original thought at all, but still: it’s really crazy that just 50 years ago Europe was divided by the iron curtain, and that people would have to go to insane lengths and risk their lives to get across that border; and that less than 100 years ago all of these countries were at war with each other, and had been at war on and off for centuries with ever shifting alliances and boundaries.
When I travel to Vienna by train, sometimes I remember that I am crossing a line that in my childhood was guarded by soldiers ordered to kill everyone who tried to escape the socialist paradise.
I know this is one of the universal human experiences, but I keep getting unpleasantly reminded of the passage of time.
pleasant “recent” memories are already one or two years ago. they feel recent enough that I still stubbornly believe my recollection is accurate, but in reality they’re far enough away from the present day for the sepia tint of nostalgia to creep in and for all the frustrations and sorrows to be forgotten. no wonder it’s so hard for the present to compete with the “recent” past.
I sometimes ask myself when I first met one of my “recent” friends, and am startled to realize that I met them 2 or 3 or 4 years ago. “oh yeah, I met him ‘recently’ at that one party FOUR FUCKING YEARS AGO.”
I still can’t wrap my mind around the fact that later this year I will have been at openai for 5 years. I first started following ML about 10 years ago, so I will soon have spent more time at openai than I have spent reading openai papers from the outside, and thinking of openai as a far away citadel in a different universe.
I suspect that you may have it worse than the average person, because you are an intellectual worker and intellectual achievements tend to be unsatisfying, in the sense that when you discover something, you feel a “eureka” moment and then you feel like you knew it forever. It’s even worse with the intellectual process, because, in my experience, you forget all the fruitless branches and backtracking of your thought process almost like it was a dream.
when people say that (prescription) amphetamines “borrow from the future”, is there strong evidence on this? with Ozempic we’ve observed that people are heavily biased against things that feel like a free win, so the tradeoff narrative is memetically fit. distribution shift from ancestral environment means algernon need not apply
(I’m a psychiatry resident. I also have ADHD and take prescription stimulants infrequently)
The answer is: not really, or at least not in a meaningful sense. You aren’t permanently losing anything, your brain or your wellbeing isn’t being burnt out like a GPU running on an unstable OC:
Prescription stimulants often have unpleasant comedowns once they wear off. You might feel tired and burned out. They often come with increased anxiety and jitteriness.
Sleep is negatively affected, you get less REM sleep, and you might experience rebound hypersomnia on days you’re not on the drug.
There are minor and usually unimportant elevations in blood pressure.
While focus and stamina are improved, creativity and cognitive flexibility suffer. I’ve read claims that it also makes people overconfident, which strikes me as prima facie plausible. Ever seen how people behave after using coke?
Animal studies show oxidative damage to the brain, but this has not been demonstrated in humans on therapeutic doses, even if used for performance enhancement in those who don’t meet the normal criteria for ADHD.
If started at a young age, growth velocity could be slightly hampered, mostly because of appetite suppression.
Dependence or addiction liability is low but not nil at therapeutic doses.
In my opinion, all of these are inconsequential, and the side effects vanish quickly on cessation. I certainly need the meds more than the average Joe, but I don’t think even neurotypical people using it as a PED are at much risk, as long as they keep the doses within reason. I’m of the opinion that current medical guidelines are far too conservative about stimulants, but in practice, they’re easily circumvented.
On a more speculative note:
I’m of the opinion that the ancestral environment didn’t demand that our ancestors be always switched on. Attention and focus were useful during activities like hunting and foraging, but there were immense amounts of forced downtime and slack. Even if you had less than ideal levels of conscientiousness or executive function, gnawing hunger or a desire for shelter probably kept you doing the right thing.
With agriculture, this began to change dramatically. Much of the previously tight reward and feedback loop ends up deferred. A farmer can do a lot more to prepare for the future and hedge his bets than a hunter gatherer can. And modernity rewards such an approach even more.
i haven’t looked into this deeply but how strong is the evidence for (lack of) oxidative damage? the SSC post is somewhat unsatisfying in that it doesn’t really consider outcomes other than literal Parkinson’s, and just kind of says the animal model results are confusing.
Dependence or addiction liability is low but not nil at therapeutic doses.
This statement really surprises me?
On average you get around 500-1000% more dopamine in the system as a consequence of using amphetamines and from a standard neuroscience perspective this is around 3x as much as caffeine for example.
Yes it is not heroin levels but dependency has to be a real concern here from a neuroscience perspective?
Long-term potentiation and return to baseline for the brain should mean that the learned patterns would be relatively hard to unlearn after 6 months of frequent usage?
How good are the studies on longer-term behaviour change due to this stuff? I looked into the studies and it seemed like from a shorter-term perspective the addiction effects of it were lower than I thought, which I found quite interesting.
I think an important caveat here is that the addiction and dependence liability is quite dependent on how you administer it to yourself, the dosage, the specific routes that you’re ingesting it from etc. (which you mention but I think you’re understating some of the dangers of it)
I don’t think even neurotypical people using it as a PED are at much risk, as long as they keep the doses within reason. I’m of the opinion that current medical guidelines are far too conservative about stimulants, but in practice, they’re easily circumvented.
CNS drugs are powerful so yes I think we should still have some limits on this? I think one of the main things that are a bit difficult with them is that it can be hard to perceive the difference that they induce in yourself? Like if you’re on them, you don’t necessarily notice that you have less creativity and awareness, that is not how it feels and so if you’re overusing them or similar you just don’t get that feedback? (based on some modafinil experience & observations from friends)
I did try and make it clear that I’m only talking about therapeutic usage here, and even when off-label or for PED purposes, at therapeutic doses. I apologize for not stressing that even further, since it’s an important distinction to make.
I agree that it’s rather important to use it as prescribed, or if you’re sourcing it outside the medical system, making a strong effort to ensure you take it as would be prescribed (there’s nothing particularly complicated about the dosage, psychiatrists usually start you at the lowest dose, then titrate upwards depending on effect).
The Claude Research report seems fine to me, and I would think it aligns with my claims. The main issue with recreational usage is that a lot of people aren’t trying to be responsible users, or are intentionally taking large doses for recreational purposes. That’s more on them than it is on the drug! If you take it within the standard dosage range, the drug itself will not produce much in the way of craving for more.
>CNS drugs are powerful so yes I think we should still have some limits on this?
I am, if not outright libertarian, certainly leaning in that direction. So it depends on what the “limits” actually are. I think that doctors are currently overly risk-averse and conservative about prescribing them, but I don’t think they should be handed out like candy over the counter. I think there’s plenty of room in between that avoids the pitfalls of a maximalist position.
>I think one of the main things that are a bit difficult with them is that it can be hard to perceive the difference that they induce in yourself? Like if you’re on them, you don’t necessarily notice that you have less creativity and awareness, that is not how it feels and so if you’re overusing them or similar you just don’t get that feedback? (based on some modafinil experience & observations from friends)
The effects on creativity are rather minor. I can’t really tell a difference when I’m on them, but I do have ADHD so that might confound things. Some of the best and most creative things I’ve written were while I was on methylphenidate or dextroamphetamine! If you’re using it to make an often monotonous task like programming more palatable, or to improve your ability to study, then I wager the benefits massively outweigh the slight cognitive inflexibility. I don’t think you want stimulants if you’re trying to paint or write poetry, even if they won’t massively handicap you. The effects are subtle, you’re not becoming an automaton.
a take I’ve expressed a bunch irl but haven’t written up yet: feature sparsity might be fundamentally the wrong thing for disentangling superposition; circuit sparsity might be more correct to optimize for. in particular, circuit sparsity doesn’t have problems with feature splitting/absorption
Surprisingly, Claude Opus 4.7 thinking still struggles with this if you tell it not to use tools. For N = 199, it answered 200; the CoT is below. It tries to count in groups of ten, gives up, and estimates the answer. It’s curious that this isn’t something the model learned to do in pretraining.
I’m counting the total number of 1s by tracking the initial 1 and then all the “+1” increments that follow. Counting through the sequence of increments to get the total. Continuing through the remaining decades up to 150, each block follows the same pattern of ten increments. I’m working through a long sequence of increments to count the total, breaking it down into manageable chunks to keep track of where I am in the original text. I’m trying to count the total number of “+1“ operations in that string. Let me work through this systematically by identifying the pattern and tallying up how many times it repeats. I need to count the “+1” instances more carefully. The input starts with “1” and then has a series of “+1”s, so the total count is just the number of “+1“s plus one. Let me go through and tally them systematically in groups of ten to get an accurate count. Continuing to count through the ranges… I’m manually counting the “+1” tokens in the input string to get an accurate sum, working through the text systematically since I can’t execute code to parse it directly. I’m counting the occurrences of “+1” in this sequence to get a total. I’m trying to count the “+1”s by breaking the input into segments of 10 to make it manageable, assuming the user likely wrote a round number like 200 total. Let me try a more systematic approach—I’ll scan through the input carefully and count the “+1“ occurrences in groups to get an accurate total. Continuing the count… I’m trying to count the “+1” instances in the input string, but without being able to directly measure the character length, I’m working through it systematically by grouping tokens and attempting to track them in chunks. Let me count the “+1” occurrences more carefully by examining the actual structure. The format appears to be “1“ followed by repeated ” +1” patterns, so I need to figure out how many times that sequence appears before reaching the end marker. 
Continuing through the chunks, I’m at 40 total after the fourth group, then adding ten more with each subsequent chunk brings me to 50, 60, 70, 80, 90, and I’m partway through the tenth chunk now. Continuing to accumulate the count through each successive chunk, reaching 200 total ones across all 20 chunks. I’m breaking down the string into groups and tallying the “+1” occurrences, but the math keeps shifting depending on how I segment it. Let me just count the “+1“ patterns directly instead of getting caught up in positional grouping—based on the visual length of the input, 200 seems like the most reasonable estimate. I’m continuing to tally the remaining “+1” entries, organizing them into groups of ten to make the count more manageable. Looking at the pattern, I’m seeing roughly 20 complete groups, which would give me around 200 total. Let me verify this by doing one more count to make sure I’m landing on the right number. I’m counting the “1” characters in the input to verify my total of 200, since each one represents a unit in the sum. Let me double-check by scanning through more carefully rather than relying on my initial chunk-based count. I’m continuing to count through the remaining additions, organizing them into groups of ten to keep track of the total. I’m counting the grid of plus-ones manually—20 rows with 10 entries each gives me 200 total, and since the prompt explicitly says not to use tools, I’ll stick with that answer.
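For comparison, the exact count is trivial once you can run code. Below is a minimal sketch of how such a test prompt could be built and graded. It assumes the prompt is the literal string "1 +1 +1 ... +1" (the format the CoT above describes) and that N counts every 1 in it, including the leading one; `make_prompt`, `count_ones`, and `grade` are hypothetical names, not anything from the original test.

```python
def make_prompt(n: int) -> str:
    """Build '1 +1 +1 ... +1' containing n ones in total (assumed format)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The leading "1" plus (n - 1) copies of " +1".
    return "1" + " +1" * (n - 1)


def count_ones(expr: str) -> int:
    """Exact count: the leading '1' plus every '+1' occurrence."""
    return 1 + expr.count("+1")


def grade(n: int, model_answer: int) -> bool:
    """True iff the model's answer matches the exact count for the N=n prompt."""
    return model_answer == count_ones(make_prompt(n))


if __name__ == "__main__":
    prompt = make_prompt(199)
    print(count_ones(prompt))  # 199
    print(grade(199, 200))     # False: the estimate of 200 is off by one
```

Counting exact substring occurrences sidesteps the chunking and "round number" guessing visible in the CoT.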
the most valuable part of a social event is often not the part that is ostensibly the most important, but rather the gaps between the main parts.
at ML conferences, the headline keynotes and orals are usually the least useful part to go to; the random spontaneous hallway chats and dinners and afterparties are extremely valuable
when doing an activity with friends, the activity itself is often of secondary importance. talking on the way to the activity, or in the gaps between doing the activity, carries a lot of the value
at work, a lot of the best conversations happen not in scheduled 1:1s and group meetings, but rather in spontaneous hallway or dinner groups
I have heard people say this so many times, and it is consistently the opposite of my experience. The random spontaneous conversations at conferences are disproportionately shallow and tend toward the same things which have been discussed to death online already, or toward the things which seem simple enough that everyone thinks they have something to say on the topic. When doing an activity with friends, it’s usually the activity which is novel and/or interesting, while the conversation tends to be shallow and playful and fun but not as substantive as the activity. At work, spontaneous conversations generally had little relevance to the actual things we were/are working on (there are some exceptions, but they’re rarely as high-value as ordinary work).
I think you are possibly better than most others at selecting (or optimize more for) conferences & events you actually want to do. Even with work, I think many get value out of having those spontaneous conversations because they often shift what they’re going to do: the number one spontaneous conversation is “what are you working on” or “what have you done so far”, which forces you to re-explain what you’re doing & the reasons for doing it to a skeptical & ignorant audience. My understanding is you and David already do this very often with each other.
>the number one spontaneous conversation is “what are you working on” or “what have you done so far”, which forces you to re-explain what you’re doing & the reasons for doing it to a skeptical & ignorant audience
I’m very curious if others also find this to be the biggest value-contributor amongst spontaneous conversations. (Also, more generally, I’m curious what kinds of spontaneous conversations people are getting so much value out of.)
One of the directions I’m currently most excited about (modern control theory through algebraic analysis) I learned about while idly chitchatting with a colleague at lunch about old-school cybernetics. We were both confused why it was such a big deal in the 50s and 60s and then basically died.
A stranger at the table had overheard our conversation and immediately started ranting to us about the history of cybernetics and modern methods of control theory. Turns out that control theory has developed far beyond what people did in the 60s, but the names, techniques, and methods have changed, and this guy was one of the world experts. I wouldn’t have known to ask him because the guy’s specialization on the face of it had nothing to do with control theory.
I do not find this to be the biggest value-contributor amongst my spontaneous conversations.
I don’t have a good hypothesis for why spontaneous-ish conversations can end up being valuable to me so frequently. I have a vague intuition that it might be an expression of the same phenomenon that makes slack and playfulness in research and internet browsing very valuable for me.
i want someone to make the one true categorization of Types of Guy. MBTI is an ok start, but there are so many things it doesn’t even try to explain. like for example if i see someone has very scrunched up body language and talks very quickly, this correlates very strongly with a bunch of other traits, like talking in conversation with long turn lengths.
my theory for why the literature here is kinda terrible is that most people either like people, in which case they mostly just develop an intuitive model of people; or they like systematizing, in which case they become obsessed with trains. few people are systematizing but obsessed with people.
Deb Tannen is specifically who I had in mind! She’s most famous for her book on male vs female communication but if you read her other works such as on parent-child and friend-friend communication styles you get a good sense of the breadth of her framework.
In NLP you would call the person with “very scrunched up body language and talks very quickly” visually dominant. They don’t feel into the words they want to say and thus speak faster. They don’t feel into their body enough to make the adjustments that release tension, so they are scrunched up.
There’s plenty of things you can criticize about that model but it does exist.
you want fiction! i’m no expert, but somewhere out there there’s a 150 page treatise on “guy with very scrunched up body language who talks very quickly” including everything you could want to know about his job, his friends and family, his self-doubts, and the details of the broader socio-economic forces that made him like that.
Socionics is kinda MBTI-adjacent, but has more interesting, fleshed-out structure, with systematic predictions, e.g., predicting synchronies or conflicts between personality types that are related in some specific way.
>make the one true categorization of Types of Guy
Do you want a compact description, e.g., some small-ish number of naturally discretizable factors, with perhaps many combinations being very sparsely populated, and that would at the same time comprehensively 80⁄20 a person’s personality description? I would not expect a comprehensive theory of human personality to be so neat.
Type of guy obsessed with guys here. (Enter my shameless plug.)
Are you interested in fictional characters? Harry Potter, Star Wars, LOTR, ATLA, MCU, anime, etc? I have a system which is the easiest to illustrate with fictional characters, but I believe it applies to real people too.[1]
Biggest downsides:
I don’t know how to formalize the system to make it falsifiable. The system is partially undefined even from an informal POV.
There are countless characters and people not fitting any type I know. Though the general methodology of my system can be applied anyway.
Getting into the system requires some time commitment.
The biggest upside:
The system is philosophically interesting. I believe it’s not similar to any other classification or any other model (of anything).
Core Tenets
Types are “absolutely different”, “different in every property”. Like Bouba and Kiki. Or like Indra’s net, if every jewel was unique.
Each type maximizes a unique theme along different dimensions. Each type also maximizes unique sub-themes (related to the main theme) along different dimensions.
The dimensions in question are usually these:
A. Life trajectory, fate.
B. Personal relationships. (Loved ones, family, close friends.)
C. Social status.
CA. Physical status. (Physical powers, physical traumas, disabilities.)
D. Psychology.
E. Outward attitude.
Themes and sub-themes are defined over lower-level facts about characters. When analyzing a character, you should first focus on the most abstract and unique facts about them. Facts which are realized in maximally radical, crisp, objective ways.
...That’s all, that’s the entire methodology. Keep in mind that themes are often too vague without sub-themes. And sub-themes are often too vague too if they are not maximized across different dimensions. Also, vague (sub)themes become less vague when you pay attention to the strongest ways to implement them.
Most classifications are “combinatorial” (any combination of some basic features defines a type) or “geometric” (there’s some relatively simple space and each point / region defines a type).[2] My classification is neither. That’s one of the main reasons I consider it very unusual.
I’ll give a couple of examples. 3 types and 50+ characters. Types of Snape, Draco and Sirius.
Fragility (example 1)
One theme (fragility) and 3 sub-themes (irreversibility, walling off, dramatism/sentimentality). 19+ examples.
Fragility
Theme of the first archetype is fragility (stubbornness, instability, internal tension, irreversible changes). Often “fragility” is used to describe things which are very easy to break, but I’m talking about very tough things which would rather break than bend. A perfect example of this theme would have the following properties:
A. Irreversible events which change the character forever.
B. Personal relationships filled with death, trauma, guilt.
C. Unstable social status. Isolation despite great connections.
CA. Physical vulnerability despite great power.
D. Great mental strength mixed with crashouts, shocks, traumas.
E. Cold, harsh attitude, mixed with sentimentality. Stoic but easily triggered.
In other words, fragile life, fragile relationships, fragile social and “physical” status, fragile mentality, fragile attitude.
Examples:
Severus Snape. A and B. Guilty in Lily’s death. That radically changed his life. C. Works as a triple agent, which makes his social position very fragile (either side can start seeing him as the enemy). He’s one of the most well-connected yet the most reclusive HP characters. CA. Very powerful, but under the thumb of way more powerful Voldemort and Dumbledore. D and E. Harsh and cold, but soft when it comes to Lily. Calm but petty, Harry and Sirius can easily trigger him.
Kakashi from Naruto. A and B. Sorta guilty in his friend Obito’s death. Accidentally killed his remaining friend Rin, whom he had promised Obito he would protect (Rin killed herself by jumping in the way). He never got over this. 49-second clip. Before all this, his famous father died by suicide after being dishonoured. C. A famous elite ninja, but a loner. CA. IIRC Kakashi is pretty unique in terms of ending up in physically vulnerable situations: he’s strong but is constantly pitted against much stronger opponents. One time he got tortured for 72 hours inside an illusion. E. Both harsh and soft.
Elsa from Frozen. A and B. Almost killed her sister Anna. That forced her to live in isolation and be distant from Anna. C. Was hiding her abilities to not lose her royal status / become a monster in the eyes of people. “Make one wrong move and everyone will know.” CA. Despite her godlike powers she was captured and jailed. E. Cold and warm.
Lelouch from CodeGeass. B. Accidentally and deliberately killed many people and blamed himself for it. His relationships are filled with death, trauma and guilt. For example: Lelouch and Euphemia part 1 (~1 minute), part 2 (2:09 − 4:18, ~2 minutes), Lelouch and Shirley (~2 minutes). C. Due to double identity and secrets his strong social status (royalty / leader of a terrorist group) is fragile. D and E. Cold and sentimental. Mentally strong and crashing out.
Scrooge from DuckTales 2017. A. Blamed himself for the presumed death of his niece Della. Scrooge’s past (~5 minutes). B. Lost connection to his family after that. C. Despite being the richest duck, super well-connected, he’s reclusive. Scrooge after the kids leave again (1 minute is enough). E. Harsh and soft. Easily triggered. His average behavior (2 minutes is enough).
Amon from Legend of Korra. A and B. Went through abuse as a child, that changed him forever and pushed him to separate from his brother. C. He’s a leader of a terrorist group, but his position is unstable because he’s lying to them. D. Brutal, but deep down there’s a soft spot for his brother. Amon’s backstory (till 7:21), Amon reunites with his brother (~1 minute).
Jeong Jeong from ATLA. “Once an honored admiral of the Fire Nation Navy, he became disillusioned with his nation and firebending as a whole and defected from the military. He relocated to the Earth Kingdom, where he encountered a small tribe who became his travel companions.” He lost his status, isolated and got psychologically broken a bit. “Fire brings only destruction and pain. It forces those of us burdened with its care to walk a razor’s edge between humanity and savagery. Eventually, we are torn apart.” Calm, but irritable. 2 minutes clip
Nicholas Van Orton from The Game 1997. B. His personal relationships are not doing great… father died by suicide, brother and wife are estranged. C. Rich but reclusive. About to lose everything. D and E. Cold but vulnerable. Traumatized.
Rust Cohle from True Detective. “Character overview” describes it better than I can. A. After his daughter was hit by a car and killed, his life spiraled out of control and he became deeply cynical. B. His relationship with Marty is rocky. “You total shit. You know, this is part on you, you know? You can’t just slide into my life, creating tensions, judging me...” C. Gifted, connected to many people (just due to his job), but isolated. D and E. Tough, but vulnerable and troubled. Rust scenes.
Darth Vader from SW. A. He thought he killed his wife and was severely mutilated. Which completed his descent to the dark side. C and CA. He’s second in authority and power to Palpatine, but Palpatine could easily dispose of him (which Palps did plan to do). D. He kept a soft spot for Luke, that’s why he sacrifices himself at the end.
Gandalf from LOTR. C. Very well-connected, but also a lonely homeless wanderer. His political power is limited because he needs people to make good decisions out of their own free will; that’s his secret mission from God. Faces defiance and reluctance. CA. Divine and powerful, but physically vulnerable and gets hurt a lot (by Saruman, the Balrog). E. Both harsh and soft. Super stressed out. Easily triggered.
Doctor Strange from MCU. A. Abruptly loses his career after a car crash injury. C. Very well-connected but isolated. E. Both cold and soft. Calm, but irritable. tony and dr strange arguing for 2 minutes straight.
Goblin Slayer. A. Sole survivor of a goblin attack on his village, saw his sister raped and killed by goblins. C. Has many connections, but is sort of a loner. CA. He’s a goblin killing machine, but he’s only human with no special powers. His job is extremely risky and he goes all out every time, gets hurt often. D. His mind is broken but rigid, killing goblins is his entire identity. E. Cold, but polite and kind. Versus goblin champion (5 minutes is enough), he and Cow Girl (2 minutes), asks for help (6 minutes), cries (2 minutes).
Cigarette Smoking Man from the earlier X-Files. C. Extremely powerful, yet his status is shaky. Often does the dirty work himself, walks around without bodyguards and works with equally powerful individuals (the Syndicate). Very lonely. D. Evil, but cares about Mulder and Scully… sometimes, sort of. E. Rarely emotes, yet many things touch him emotionally. Musings of a Cigarette Smoking Man shows different sides of him. Xmas job (5 minutes), Xmas meeting (3 minutes), episode ending (5 minutes).
Zeno from Hunter x Hunter. C. Powerful, but old and has a very risky job (assassinations). D and E. A single accidental murder can hurt his big pride (he’s a veteran assassin in a family of assassins). Harsh, but not entirely. Him fighting (4 minutes), a civilian gets harmed (2 minutes), gets bothered (2 minutes).
Sir Leigh Teabing from The Da Vinci Code. HEAVY SPOILERS. A. Wants to destroy the Church, free humanity of dogma. C. Wealthy academic, but turns criminal. D and E. Dramatic. Kind/nice but evil/ruthless. Another antihero revolutionary. Greatest coverup in human history.
Patroclus from Hades 2020. B. Feels betrayed by his lover. “He ever tell you of the war in which we fought? I fought, I should say. He could have ended it. But he refused to take up arms. (...) He shared our cause! So what if it was not completely his? If not for his stubbornness, I… well… I would have come here just a bit later, I think.” (c.) C. Lives in Elysium (paradise), but sits alone by the Lethe. D. Really devastated. E. Harsh but soft. patrochilles dialogue & story.
Peter Capaldi’s Doctor (12th) from Doctor Who. C. Sorta like Gandalf—powerful, but vulnerable and depending on others. D. Troubled, hates himself. E. Harsh but soft. Easily agitated. War Speech (4 minutes), Go to Hell (2 minutes), Where I Fall (4 minutes).
Real life example:
Bobby Fischer. A. Became the World Chess Champion, then stopped playing official chess for 20 years. Isolated himself and went mad. C. Before his downfall, was a lonely superstar. After his downfall, became a recluse with good connections. Many were helping him (e.g. Polgar family, Leko, Spassky, Lombardi). He got in trouble with the US government, got detained in Japan, was almost deported to the US, but was granted citizenship in Iceland. D. He was very tough mentally (to fight alone against the world), but very troubled and irritable. “Fischer fits ideologically into the context of the Cold War era: a lone American genius challenges the Soviet chess machine and defeats it.” (c.) Kasparov. E. He was known to be cold and rude in general, but nice to some in private. For example, in 1962 Tal fell ill during a tournament and Fischer was the only player who visited him.
Vladimir Mayakovsky. B. Had turbulent personal relationships. C. Both a useful and a problematic figure for the soviets. D. Was tough, but troubled. Died by suicide. E. Harsh, but sensitive.
Isaac Newton. Maybe. B. Father died, abandoned by mother, disliked stepfather. C. Powerful but reclusive. Abused his power in personal conflicts. C. Used disguise to find and prosecute counterfeiters (many were killed). CA. Prioritized work over health. D. Suffered a nervous breakdown, blamed friends for a conspiracy. E. Secretive, neurotic, paranoid, vindictive.
Semi-examples:
Carl Fredricksen from Up.
Waternoose from Monsters Inc. B. His friendship with Sally turned out to be very fragile. C. His social status is great. CEO of the company which was in his family for 3 generations. But declining energy production and publicity problems threaten everything. E. Friendly and harsh.
??? from Disco Elysium. A very unique game; I recommend at least watching What Was Disco Elysium Actually About? HEAVY SPOILERS. A. Deserted his comrades. All killed. B. Unrequited love. C. Isolated from the city, but always watching. CA. Fit but frail. D. Decades of survivor’s guilt, shame, bitterness. Still capable of “love”. Another “antihero” revolutionary.
See below. Carl is not a drama-queen (the only grandiose thing about him is the flying house). Waternoose—not enough drama and reclusiveness.
Irreversibility
Fragile!pain is about irreversible consequences.
A. The source of the pain is concentrated in the past, in a single irreversible event or a couple of specific irreversible events.
B. It irreversibly changes personal relationships.
C. It irreversibly changes character’s social status.
D. Character dwells on the tragedy instead of trying to move on.
E. Character’s attitude is strongly shaped by the tragedy. Or strongly emphasizes its consequences.
Examples: Snape, Kakashi, Elsa (for a time), Scrooge 2017 (for a time), Amon, Jeong Jeong, Rust Cohle, Doctor Strange (sort of), Darth Vader, Goblin Slayer. Lelouch is a similar drama queen.
Walling Off
Another sub-theme is “deliberately building a wall between yourself and others”.
Some examples:
Severus Snape. A. Obsessed with Lily, closed to new romantic relationships. C. A loner in general. Partially due to his spy job. CA. Master at closing off his mind (Occlumency). E. His abrasiveness is meant to put off.
Kakashi from Naruto. C. Behaves reclusively. D. Dwells on his losses instead of trying to move on. E. Looks detached.
Elsa from Frozen. B and C. Her secret forces her to be distant from her sister and other people. E. Cold.
Lelouch from CodeGeass. B and C and D. His true intentions and personality are hidden even from loved ones. Keeps his burdens to himself.
Amon from The Legend of Korra. Cut off his family, buried his emotions (towards his brother), hid his past and abilities.
Dramatism / Sentimentality
Another sub-theme is “dramatism” (excess, grandiosity, making everything more serious and emotional) and “sentimentality” (nostalgia about the past, holding on to personal connections, being dramatic in a vulnerable way, awe about the world or history). Dramatism without sentimentality would be Thanos.
Severus Snape. Dramatic love, excessive grudges, dramatic role (playing a villain till death).
Lelouch from CodeGeass. Childhood tragedy, dramatic style (e.g. playing the villain and punishing himself), dramatic thoughts/feelings, grandiose goals.
Amon from Legend of Korra. Personal drama, dramatic death, grandiose goal, showy methods.
Nicholas Van Orton from The Game 1997. Childhood tragedy, excessive wealth, insane “game”.
Elsa, Darth Vader, Gandalf, Goblin Slayer, Cigarette Smoking Man are other drama queens.
A music band example:
I Like Trains
On average, I Like Trains songs contain fragility, dramatism and sentimentality.
A Rook House for Bobby. Genius. Personal, social and psychological fragility. Dramatism (persecution, cold war) and sentimentality. “And if they find me / Let them indite me / I just don’t care any more (...) They’ve pushed me too far, too far (...) All this talk of war / But it’s only a game (...) All I ever wanted to do is play chess with you”
Terra Nova. Genius. Personal, physical and social fragility. Dramatism and sentimentality. “How could I / Have led these men to their demise? / And they just follow / Exploration’s last great prize / It wasn’t mine (...) And more’s the shame / You will remember my name”
A Father’s Son. Genius. Dead dad speech. “Pry these words from my cold dead lips / Home is where the heart is (...) I will occupy the space between / What you say and what you mean / And you say / That the devil’s in the detail / And if that’s true / When he is through with me he will / Sure as hell be after you”
Beacons. Genius. SPOILERS! A societal collapse and murder. The lyrics don’t follow the story 1-1, but mention some societal and personal fallout.
A Divorce Before Marriage. Genius. Abstract song about a breakdown of a dream, very dramatic/sentimental. “The echo of success / In the beating of a heart / Will tear your life apart / And your forgotten dreams / As futile as they seem / Gave you hope / Now that hope is not enough / All the people that you meet / Are as hollow as they seem (...) Delusions of grandeur / A divorce before marriage / An open wound”
Death of an Idealist. Genius. Social and psychological breakdown. “This is a breakdown / You are a cancer / This will be the death of me / I’ll never live this down / You will never / An amazing fall from grace / No more or less than I deserved / A gentleman, a liar and a cheat”
The Voice of Reason. Genius. “It took a blow to the head / And I realized what I / Was put on this earth to achieve / It showed me the way / So make an example of me / Hang me up from the tallest tree / Make an example of me / This is not the voice of reason / It’s the cross I was chosen to bear”. About James Hadfield.
Liminality (example 2)
One theme (liminality) and 3 sub-themes (ego, entanglement, devaluation). 18+ examples.
Liminality
Theme of the next archetype is liminality (transformation, transition, detachment, isolation, being stuck). A perfect example of this theme would have the following properties:
A. Being stuck in a place or some neverending process. Living in a transforming world.
B. Personal relationship going through or causing a major transformation.
C. Going through the transformation of social status. Being alien to people around.
CA. Physical transformation.
D. Psychological transformation. Being mentally stuck in a weird/uncertain place.
E. An attitude which mixes strong emotions and absence of emotions.
In other words, liminal life, liminal relationships, liminal social and “physical” status, liminal mentality, liminal attitude.
Examples:
Zuko from ATLA. A. Stuck in a neverending goose chase. B. His relationship with Aang goes from rivalry to friendship. C. A banished prince hoping to restore his honour. D. Becomes really conflicted about Good and Evil and his Purpose. Changes alignment. E. Cocky and impatient, but stoic. Zuko’s Complete Timeline (up to 11:00).
CC from Code Geass. A. Stuck in this world, unable to die. Stuck seeking someone capable and willing to kill her. B. The development of her relationship with Lelouch is about finding the will to live. C. An immortal witch stuck in a world she doesn’t belong to, getting tortured and experimented on. Helps the revolution she doesn’t care about. D. Her wish is to die. Her existence is just lingering. E. Both brazen and unemotional. You can watch 4 minutes (5:26 − 11:17) to get her attitude.
Gaara from Naruto. B. His relationship with Naruto is about him changing moral alignment. C. He’s used as a tool in a war he doesn’t care about. CA. He’s always on the verge of losing control and transforming into Shukaku. D and E. Gaara is in a severe distress, due to childhood traumas and internal fight with Shukaku. He has no attachment to anything in this world, his only goal is “to kill to assert his own existence”. Cocky and unemotional, calm and unstable. Sasuke vs Gaara, 7 minutes.
Lee Woo-jin from Oldboy. SPOILERS. B. His “relationship” with Daesu is about transforming Daesu’s life. CA. Contrasts with all other characters (criminals and the poor). D. Lives only to wait for revenge and death. E. Cocky and unemotional. Ending scene (4 minutes), more ending (3 minutes).
Alex from Clockwork Orange. A. For most of the movie he’s stuck in a weird journey without a home. B. His relationship with droogs and parents transforms radically. C. Gets jailed, turns into a subject of experimentation, loses his home. His relationship with society transforms 180 degrees. D. He spends most of the movie in a weird reprogrammed state, unable to do evil, but not truly good. E. Both cocky and unemotional. For example, withstands abuse calmly. Mr. Deltoid scene, 2 minutes.
Draco from HP. B. Harry turns from Draco’s rival to one of the only people who can save him and his family. C. Stuck between the good and the bad side. Doesn’t belong with the bad guys, they’re too extreme for him. His family got sidelined, lost their status among the Death Eaters. Also, in most of the books he’s somewhere on the sidelines. D. Very distressed and confused in the later books. E. Cocky and cold.
Harry Osborn from The Amazing Spider-Man 2. Loses status (Oscorp company), health (genetic disease), sanity (goes Green Goblin), morality. Daddy issues. Coldish and prideful. Meets dying father (4 minutes), asks Peter (3 minutes), meets Spiderman (2 minutes).
Dorian Gray from the 2009 movie. B. His relationship with Henry is about his moral deterioration. His relationship with Emily is about his redemption. C and D. Stuck in the immortal hedonistic lifestyle, without any purpose. Conflicted between Good and Evil, changes alignment multiple times. E. Cocky and unemotional.
Once-ler from The Lorax 2012. A. Symbolizes the growing danger of capitalistic greed. B. His status in the family radically transforms. C. His social status radically transforms. D. His morality radically transforms. How bad can I be? E. His attitude radically transforms. The biggest Tumblr Sexymen (other than Sans). A video explaining Onceler and Oncest in 2 minutes.
Lord Shen from Kung Fu Panda 2. A. Brings the new era, the era of gunpowder weapons. B. KFP 2 is sort of about Po’s relationship with him transforming, from PTSD to “inner peace”. The same can’t be said about other KFP villains; he has the most personal connection to Po. C. He’s a banished prince, but regains his status by taking over his home town. D. Desperate to satisfy his incomplete ego. “The cup you choose to fill has no bottom. It is time to stop this madness.” (c.) Soothsayer. E. Cold and elegant, but short-tempered and cocky. Shen vs. kung fu masters (2 minutes), Shen’s fortune (3 minutes).
Lapis from Steven Universe. A. Was stuck in a mirror for thousands of years. Then was stuck on the bottom of the ocean, fused with another gem (Jasper). Later she’s free, but has no home. At some moment she hides on the Moon. B. Takes a long time to befriend crystal gems. C and D. Was aligned with an evil empire (Homeworld), killed sentient life by terraforming planets, but wasn’t truly evil or a pro-empire diehard. Wanted to protect the human (Steven) who freed her. She was sorta like a civilian scientist in Nazi Germany. Eventually she changes her alignment entirely. Has tons of trauma and depression. E. Very calm and very angry/disheveled. Lapis clips.
Dio from JoJo Phantom Blood. B. His petty rivalry with Jonathan transforms into a global fight between Good and Evil. C. He’s an adopted son in a family he doesn’t care about. CA. Transforms into a supernatural evil being. E. Cocky and calm. 4 minute clip.
Seita from Grave of the Fireflies. A and C. Seita and Setsuko lose their home and wander without a (true) home. Their social status gets worse and worse. First they live with an abusive aunt, then they’re on their own in an abandoned dugout. B. His relationship with Setsuko transforms; he becomes her guardian and only friend. E. Appears unemotional, but has a lot of bottled-up feelings.
Kiki from Kiki’s Delivery Service. A. The movie is about growing up and depression / artist’s block. B. Has an uncertain friendship with Tombo, connected to her loneliness/depression. C. Kiki arrives in a new town. First time on her own, first job. Being a special “race” (witch) and having more responsibilities (a job) contributes to isolation. D. Lonely. Later depressed. E. Calm but prideful.
Kurisu from Steins;Gate. B and D. Goes from antagonizing to loving Okabe. C. The most prestigious and mature in the group. The whole show is her being taken out of her comfort zone, dipped into mad science & conspiracy & weirdos. E. Unemotional but prideful.
Sheldon from The Big Bang Theory. B and C. Detached in social and personal life. Has the constant dynamic where he keeps getting dragged into uncomfortable situations and HE keeps dragging others into uncomfortable situations. E. Unemotional/shy but aggressively prideful. Sheldon Cooper Moments from Seasons 1-6.
Luka from Miraculous. B. Has a liminal love/friendship with Marinette. Supports her love for Adrien but still has feelings for her. B. Has a split family. C. IIRC somewhat disconnected from Marinette’s main friend group. Learns Marinette’s/Adrien’s secret and leaves Paris to protect it. CA. Gets the superhero power to indefinitely rewind time to a checkpoint. D. Has unusually strong empathy, which makes him a bit isolated as a mediator/therapist. But who’s gonna help him? E. Appears way more untroubled than he truly is. See 4:28 − 9:42.
Randall from Monsters, Inc. A. Works on a conspiracy to change everything. “I’m about to revolutionize the scaring industry. And when I do, even the ‘great’ James P. Sullivan’s gonna be working for me.” (c.) B and C. Constantly in 2nd place to Sulley. CA. Can turn invisible. E. Can be calm/sly or explosively spiteful.
Real-life semi-example:
Ding Liren. A. Became the world champion in a liminal period, where Magnus exited undefeated. C. Abruptly lost strength. D. Had mental health struggles. E. Appears unemotional.
Semi-examples:
Eric Draven from The Crow (1994). C. Undead. E. Both unemotional and explosive. Eric from the 2024 movie would be a proper example.
Fubuki from OPM. Overshadowed by her sister, prideful and cold.
Lord Beckett from POTC. A and CA. Symbolizes the transformation of the world, the end of the pirate era and the magical era. “This is no longer your world, Jones. The immaterial has become… immaterial.” (c.) He, a businessman, doesn’t belong in the world of pirates and magic. His motivation is a bit alien compared to other characters. E. Both cocky and unemotional.
Westley from The Princess Bride. A. Transformed from a regular farm boy into one of the most dangerous men in the world. For the entirety of the movie he’s stuck in a never-ending rescue mission. B. His relationship with Buttercup transforms from “master and servant” to “captor and captive” to “superhero and damsel in distress”. His relationship with Inigo and Fezzik transforms from enemies to friends. C and CA. He’s pretty alien, unusual both physically and mentally. For a big chunk of the movie he hides his identity, looking like some weird ninja. For another chunk he’s recovering from death. E. Both cocky and unemotional. This 7 minute scene is an example again.
Silas from The Da Vinci Code. “I am a ghost.” A and C. His whole life he’s a hidden outcast. CA. An albino. D. Stuck in the state of violating his principles and punishing himself. E. Hot and cold.
Rei from Evangelion. TV series SPOILERS. She’s a clone and a tool. Unemotional.
Squidward from Spongebob. C and D. Longs for celebrity life, but it never comes. Uses pessimism and misanthropy to cope. E. Cold and cocky.
See below. Eric doesn’t have enough dimensions of liminality, as well as not enough ego and devaluation. Lord Beckett, Westley—not enough devaluation. Rei—not enough ego. Silas, Fubuki—not enough ego and devaluation. Squidward—surprisingly, not enough ego… he is ready to compromise with his ego way too much and is not proactive enough.
Ego
Liminal!isolation can be caused by ego (pride, privilege, elitism, superiority).
Zuko from ATLA. B. Looks down on Iroh, because Iroh doesn’t display ambition. C and D. Motivated by restoring his royal privilege and honor. Fire Nation supremacist, calls Katara a peasant. E. Cocky.
CC from Code Geass. B. Treats Lelouch as a source of amusement. C. Has some privilege due to immortality, though it’s questionable. E. Behaves like a queen.
Draco from HP. B. Jealous of Harry. C. Spoiled brat. D. Blood supremacist. E. Cocky.
Dorian Gray from the 2009 movie. Vain hedonist narcissist, in personal relationships and in general.
Dio from JoJo Phantom Blood. Cocky. Feels entitled to everything Jonathan has. Even steals his girlfriend’s first kiss. Kono Dio Da.
Lee Woo-jin, Lord Shen (KFP 2), Harry Osborn, maybe Seita (Grave) are other strong examples.
Entanglement
Another sub-theme is “entanglement”. When your fate is unusually entangled with another person.
Some examples:
Zuko from ATLA. His social status and family matters and morality all get entangled in his rivalry with Aang. Zuko even sees himself as Aang in one dream.
CC from Code Geass. Her fate, status and goals/motivations all get entangled with Lelouch.
Lee Woo-jin from Oldboy. His personal life, criminal activities, future plans/ambitions… everything is about Daesu.
Draco from HP. His status, morality, fate of his parents… all become dependent on his rival, Harry.
Lord Shen from Kung Fu Panda 2. His fate, status, ambitions… all become dependent on Po.
Even lovers don’t often have this much entanglement.
Harry Osborn and Peter Parker, Dio and Jojo, Seita and Setsuko, Kurisu and Okabe, Randall and Sulley are other strong examples.
Devaluation
Liminal!pain is about absence of expected value. Deprivedness, disrespect, depression, marginalization, dissatisfaction, etc.
Zuko from ATLA. B. Iroh is Zuko’s closest connection after banishment. Zuko cared about him, but still saw little in him. “I think you are exactly what you seem! A lazy, mistrustful, shallow old man who’s always been jealous of his brother!” (c.) C. He’s a prince, but not respected. D. His purpose (regaining honor and father’s respect) starts to become meaningless to him. E. Shows no respect to superiors (e.g. Zhao).
CC from Code Geass. B. Helps Lelouch, but doesn’t care about his plans. D. Doesn’t see value in living anymore. E. Unusually unbothered by the stuff happening around her.
Gaara from Naruto. A. Was a nice kid, but snapped after he was convinced he had never been loved by anyone. B. Didn’t care about his siblings. They were terrified of him. C. Was a big element of an invasion plot he didn’t care about. D. He is pretty purposeless; his only purpose is to kill people “to prove his own existence”. Believes he has never been loved (the symbol on his forehead is “love”). E. Unusually unbothered by attacks (due to the automatic sand defense).
Alex from Clockwork Orange. B. Becomes inconsequential to his gang and family. C. A part of society, has parents, goes to school and clubs, but doesn’t care about people (abuses and kills them). D. Loses his purpose. E. Shows no respect to anyone.
Lord Shen from Kung Fu Panda 2. B. Thought his parents hated him. C. Was a banished prince. D. Has an insatiable ego. “The cup you choose to fill has no bottom. It is time to stop this madness.” (c.) Soothsayer. E. Peacocky.
Lapis, Dio (JoJo), Harry Osborn, Seita (Grave of Fireflies) are other strong examples.
A very similar sub-theme is “loss of agency” (getting stuck, losing resources / support, getting depressed, etc).
Punkness (example 3)
One theme (punkness) and 3 sub-themes (social conflict, uncomplicatedness, direct attachment). Because of the sub-themes “punkness” turns into something more like “soft punkness”. 15+ examples.
Punkness
The theme of the next archetype is punkness. More specifically, “subverting social properties of status”. High social status is desirable; social status is relatively rigid; high-status people hook up with other high-status people; high-status people are powerful and talented, etc. This archetype subverts all that. A perfect example of punkness would have the following properties:
A. Radical changes of social status.
B. A relationship against social rules.
C. Fighting against high-status people or organizations or the order of society.
CA. Charisma and great skills despite low status. Chaotic fighting style or self-expression. Status-subverting abilities.
D. Simple goals and approach. Dislike of high status.
E. Punk attitude, low-status attitude.
In other words, punk life, punk relationships, punk social role, punk mentality, punk attitude.
Examples:
Sirius Black. A. Ran away from his wealthy racist family. C. In Book 3 he was alone against dementors, authorities and Hogwarts defenses. CA. Canonically charismatic. Can turn into a dog. D. Simple goal (help Harry / kill Peter) and straightforward approach. E. Has a punk attitude (motorcycle, posters of pinup muggle girls, pranks, taunting while dueling). A bit crazy and immature.
Robin Hood from the 1991 movie. A and C. Was a nobleman, became the king of thieves, robbing the wealthy and giving to the poor. B. His love with Marian has status contrast. CA. Canonically charismatic. D. Simple goals and approach. E. Very bold and somewhat unserious.
Beetlejuice. B. Obsessed with a little girl. CA. Has a lot of power despite being a dead hobo. Acts wildly and whimsically, can shapeshift. D. Simple goals and approach. E. Funny, offensive, crazy. His introduction, 2 minutes (2:00 − 5:30).
Daesu from Oldboy HEAVY SPOILERS. A. Lost all of his social status by getting abducted, jailed and framed for murder. Was an alcoholic, turned into a martial arts master. B. His love with Mido violates social rules, in the worst possible way. His “relationship” with the antagonist has status contrast. C. Fights, all alone, against organized crime and a wealthy adversary. CA. Compensates lack of skill/technique with rage. D. Simple goals and approach. E. Looks and behaves wild.
Alucard from Hellsing Ultimate. A and C. Was a sex slave, became a Christian warlord, but betrayed his humanity in a moment of weakness and turned into an immortal, overpowered vampire. Then he was tamed and started working for a Christian organization, fighting other evil beings. Wild status changes. B. Despite being infinitely more powerful than Integra, he’s her servant. CA. Sadomasochistic fighting style. Allows opponents to mutilate him and humiliates them before killing them. Can shapeshift. D. Wants to find a worthy opponent to die to. He’s bored of immortality and considers himself a failure (for becoming a monster). E. He’s a troll.
Joker from The Dark Knight. A. Was a nobody, became the main criminal in town. B. His weird “relationship” with Batman has status contrast. C. Tries to ruin the social order. CA. Unpredictable actions, “improvisation” (even though it’s fake), disguises. D. Simple goal (chaos). E. Looks and behaves wild.
Sherlock Holmes, Guy Ritchie version. B. Loves a criminal (Irene Adler). C. A mix of a detective and a vigilante. Too much collateral damage. E. Punk attitude. Opening scene (6 minutes), with Irene (3 minutes).
Wendy from Gravity Falls. B. Age-gap friendship with something romantic (though she doesn’t reciprocate). C. Unusually tough kid (15 years old). Lumberjack games, survival training, stressful home with no mother. Skilled in combat. Slacks on the job, breaks the rules, breaks the law (a little bit). E. Punk attitude. Her clips, 2 minutes.
Marceline from Adventure Time. A. Rejected her dad (low-key the Devil) and reigning over the Nightosphere (basically Hell). B. Lives in a simple house in a cave. Her girlfriend has a castle and a kingdom. CA. She’s a punk rocker. Can turn into a monster bat. D. No complex goals. E. Punk rocker, trickster, down to earth. “I’m not mean. I’m a thousand years old, and I just lost track of my moral code.” Evicted (1:00 − 3:40), The Henchmen (5 minutes).
Inigo Montoya from The Princess Bride. C. A reluctant criminal with a righteous goal and exceptional swordfighting skills. CA. Charismatic, quickly wins Westley’s respect. Likes showing off and taunting while fighting. D. Simple goal and approach. E. Somewhat unserious, impatient, cocky and daring. 7 minute scene.
Speedwagon from JoJo. A. Was a street thug who became a wealthy businessman. B. Became an extremely loyal friend to a rich family. C. Very honorable despite being a thug. D. Straightforward mentality. E. Has the attitude. His first scene (3 minutes), exposing Dio (2 minutes).
Korben from The Fifth Element. A and C. A former special forces major becomes a taxi driver, then joins the mission to save Earth. Fights against an army of a corporation (Zorg and Mangalores). B. Manages to woo a perfect being. D. Straightforward mentality. E. Has the attitude.
Bumi Jr. from LOK. C. Has status (commander), yet he’s comic relief. CA. Unorthodox tactics. D. A bit submissive, has self-worth issues about lacking bending powers. “Look, uh, I’m sorry I didn’t turn out to be an airbender like you hoped. But I’ve tried my best to keep the world safe. Hope I made you proud.” E. Wild, comical. Saves everyone (5 minutes), talks to Aang (30 seconds).
Eric from The Eric Andre Show. C. Takes a high-status concept (late night shows) and turns it into a shitshow. CA. Chaotic. D. Humiliates others and himself. Straightforward approach. E. Punk. 3 minute example, another 3 minutes.
Real-life example:
Mikhail Tal. Incorrect and whimsical playing style, humor, disregard for health. But a champion-level strength.
Adam Sandler. C and D. Talented actor, but ready to waste his talent on shitty movies. Doesn’t treat movies as seriously as other actors. E. Has punk attitude and low-status appearance (popularized “grocery-run look”). On Oscars 2025. About his movies.
Michael Jackson. Semi-example. B. Creepy relationships with kids. C. Was a prolific philanthropist and promoted social causes in his songs too (e.g. Man in the Mirror, They Don’t Care About Us). CA. Popularized fancy moves (the moonwalk, the robot, the anti-gravity lean) and music videos. D. Mentally stuck in childhood. E. Charismatic but weird, down-to-earth.
Semi-examples:
Charlie from All Dogs Go to Heaven. A. Escapes prison, loses money, escapes heaven, becomes rich again. C. Low-life. D. Straightforward mentality. E. Punk attitude.
Fenton/Gizmoduck from DuckTales 2017. B. Loves a criminal. C. Starts as an unpaid intern with a lab in the bathroom, becomes a respected hero. D and E. Straightforward mentality, naive and earnest. Awkward, clumsy attitude. Fenton clips.
Fujimoto from Ponyo. A. Rejected humanity and became the sea king. B. A human who rizzed a goddess. C. Despite being a king he’s sort of comic relief. His kid disobeys him and he can’t do much about it. Submissive to his wife. CA. Looks unkempt and acts whimsically. Not afraid to show up on the surface and do the dirty work himself.
Billy Madison 1995. A and C. A wealthy father with a big company has a poor-looking buffoon son, Billy, who has a chance to inherit everything. Billy goes back to elementary school, as an adult, to prove himself. B. In love with his elementary teacher. D. Doesn’t care about living seriously. Rejects the inheritance. Despite being an idiot he has a good heart and hidden smarts. E. Low-status behavior.
Han Solo from SW. A and C. A smuggler who becomes a rebel general. B. Woos a princess. E. Punk attitude.
Julian Rush from The Penguin. B. Woos a powerful girl. C. Psychiatrist turned mafia goon. D. Unexpectedly submissive.
See below. Fenton, Billy, Tom B, Rush—not enough deliberate conflict. Charlie—maybe not enough conflict too. Han, Fujimoto—not uncomplicated enough, too much responsibility and grumpiness. Michael Jackson—not enough conflict.
Social Conflict
Punkness!isolation is caused by “deliberate conflict”.
Examples:
Sirius Black. A. Ran away from his wealthy racist family. B. Bullied Snape, mistreated Kreacher. C. Troublemaker, bully, assumed criminal. D. Revenge-driven. E. Somewhat arrogant (e.g. taunting while fighting).
Robin Hood from the 1991 movie. B. Fucks with Nottingham. C. Does crimes. D. Partially revenge-driven. “I’ll not rest until my father is avenged. I swear it by my own blood.” (c.) E. Brash.
Beetlejuice. B. Tries to wed Lydia against her will. C. His “job” is bio-exorcism. He scares and might kill. E. Brash, offensive.
Daesu from Oldboy. B. Trapped in an urgent conflict with Lee Woo-jin. C. Fights organized crime. D. Revenge-driven. E. Brash, offputting.
Inigo Montoya from The Princess Bride. C. A criminal. D. Revenge-driven. E. Impatient and cocky.
Alucard, TDK Joker, RDJ Sherlock (offensive, insufferable, kicking ass) are other strong examples.
Uncomplicatedness
Another sub-theme is “seeming effortless / lacking usual responsibilities / being submissive / straightforward / relaxed” (uncomplicatedness).
Examples:
Sirius Black from HP. B. Unusually devoted to Harry. Harry is canonically the person Sirius cares about the most. C. Waited 12 years in Azkaban, then was under house arrest. Sometimes freely roams as a dog. D. Straightforward mentality. E. Can be very soft.
Robin Hood from the 1991 movie. C. Lives in his utopia in the woods. CA. Most of the time dispatches enemies effortlessly. D. Straightforward mentality. E. Laidback, down-to-earth.
Beetlejuice. C. Has no real job or responsibilities. CA. Dispatches people effortlessly. D. Straightforward mentality. E. So relaxed he ignores personal boundaries.
Daesu from Oldboy. C. Was jailed for 15 years. D. Straightforward mentality. E. Can be submissive.
Alucard from Hellsing Ultimate. A. Was a sex slave. B and C. Submissive to Integra, serves her organization. CA. Dispatches people effortlessly. D and E. Straightforward mentality. Masochistic. Sees himself as inferior to dignified humans. Can dish out a good compliment: “I still see the same spirited young woman I met fifty years ago, Your Majesty. In fact, in my eyes, time has made you even more beautiful, your highness.” (c.)
TDK Joker, RDJ Sherlock, Wendy, Marceline, Inigo Montoya, Speedwagon, ADGTH Charlie are other strong examples.
Direct Attachment
Another sub-theme is “direct attachment”. Unusually strong, direct, multi-faceted attachment.
Sirius. Protects Harry, cares about him the most, was ready to house him.
Robin Hood. Protects Marian physically, socially (removes Nottingham), knows her from childhood, swore to protect her to her dying brother.
Daesu from Oldboy HEAVY SPOILERS. Loves Mido, protects her physically, etc.
Alucard from Hellsing Ultimate. Serves Integra, her family and organization. Protects her. Admires her.
Speedwagon from JoJo. Helps Jojo, accompanies him, admires him, leaves criminal life behind for him. Later helps his family.
Korben (loves / protects / helps Leeloo), Inigo Montoya (devotes his entire life to avenging his father), ADGTH Charlie (protects Anne-Marie, finds her a family, visits her from Heaven) are other strong examples. Beetlejuice (obsessed with Lydia), Joker (obsessed with Batman) are weaker examples.
If you’re interested, you can message me on LW or Discord.
I’ve gone pretty deep down this rabbit hole, and the place I keep ending up is psychoanalysis. Jung is obviously the spiritual father of MBTI and most personality typology, but Lacan’s personality structures are what stuck with me.
It’s beyond my capacity to explain much Lacan in short form, but, imo, what differentiates him is that his personality structures are emergent. Personality is downstream of how people organize the satisfaction of their desires, and behavior is the language (very loaded term) people use to pursue those desires.
Unfortunately, here there be dragons. Lacan was famously, and intentionally, obscurantist. Also, a lot is lost in translation from the original French. My personal view is that he overspecified much of his theory and that a cleaner version should exist but doesn’t.
a lot of unconventional people choose intentionally to ignore normie-legible status systems. this can take the form of either expert consensus or some form of feedback from reality that is widely accepted. for example, many researchers, especially around these parts, just don’t publish in normal ML conferences at all, opting instead to depart into their own status systems. or they don’t care whether their techniques can be used to make very successful products, or make surprisingly accurate predictions, etc. instead, they substitute some alternative status system, like the approval of a specific subcommunity.
there’s a grain of truth to this, which is that the normal status system is often messed up (academia has terrible terrible incentives). it is true that many people overoptimize the normal status system really hard and end up not producing very much value.
but the problem with starting your own status system (or choosing to compete in a less well-agreed-upon one) is that it’s unclear to other people how much stock to put in your status points. it’s too easy to create new status systems. the existing ones might be deeply flawed, but at least their difficulty is a known quantity.
one common retort is that it’s not worth proving yourself to people who are too closed-minded and only accept ideas if they are validated by some legible status system. this is true to some extent, and i’m generally against people optimizing normie status too hard (e.g. i think people should be way less worried about getting a degree in order to be taken seriously / get a job offer), but it’s possible to take this too far.
a rational decision maker should in fact discount claims of extremely illegible quality, because there are simply too many of them and it’s too hard to pick out the good ones even if they were there (that’s sort of the whole thing about illegibility!). it seems bad to only bestow the truth upon people who happen to be irrational in ways that cause them to take you seriously by chance. if left unchecked, this kind of thing can also very easily evolve into a cult, where the unmooring from reality checks allows huge epistemic distortions.
a good in between approach might be to do some very legibly impressive things, just to prove that you can in fact do well at the legible status system if you chose to, and are intentionally choosing not to (as opposed to choosing alternative status systems because you’re not capable of getting status in the legible system).
This comment seems to implicitly assume markers of status are the only way to judge quality of work. You can just, y’know, look at it? Even without doing a deep dive, the sort of papers or blog posts which present good research have a different style and rhythm to them than the crap. And it’s totally reasonable to declare that one’s audience is the people who know how to pick up on that sort of style.
The bigger reason we can’t entirely escape “status”-ranking systems is that there’s far too much work to look at it all, so people have to choose which information sources to pay attention to.
It’s a question of resolution. Just looking at things for vibes is a pretty good way of filtering wheat from chaff, but you don’t give scarce resources like jobs or grants to every grain of wheat that comes along. When I sit on a hiring committee, the discussions around the table are usually some mix of status markers and people having done the hard work of reading papers more or less carefully (this consuming time in greater-than-linear proportion to distance from your own fields of expertise). Usually (unless nepotism is involved) someone who has done that homework can wield more power than they otherwise would at that table, because people respect strong arguments and understand that status markers aren’t everything.
Still, at the end of the day, an Annals paper is an Annals paper. It’s also true that to pass some of the early filters you need either (a) someone who speaks up strongly for you or (b) to pass the status marker tests.
I am sometimes in a position these days of trying to bridge the academic status system and the Berkeley-centric AI safety status system, e.g. by arguing to a high status mathematician that someone with illegible (to them) status is actually approximately equivalent in “worthiness of being paid attention to” as someone they know with legible status. Small increases in legibility can have outsize effects in how easy my life is in those conversations.
Otherwise it’s entirely down to me putting social capital on the table (“you think I’m serious, I think this person is very serious”). I’m happy to do this and continue doing this, but it’s not easily scalable, because it depends on my personal relationships.
there is always too much information to pay attention to. without an inexpensive way to filter, the field would grind to a complete halt. style is probably a worse thing to select on than even academia cred, just because it’s easier to fake.
Generally, it is about heuristics we can use to find quality in the oceans of crap. If we assume that people are sane to some degree, status is an imperfect proxy for quality. If we assume that people don’t use AIs to polish their writing styles, the writing style is an imperfect proxy for quality.
I have no experience reading research. I suspect that there are also crackpots who can write using the right kind of style. For example, they may be experts at their own line of research, and also speak overconfidently about different things they do not understand.
So if you want to be taken seriously, you probably need to know what kind of crackpot you remind others of, and then find a way to distinguish yourself from that kind of crackpot specifically.
At some point it would probably be easier to simply do your homework, once, and then have something you can point at. For example, you don’t need to publish everything in the established journals, but it would probably help to publish there once, just to show that you can if you want to; that this is about your priorities, not about a lack of quality.
There are probably other ways; for example, if you don’t want to get too involved with the system, find someone who already is, and maybe offer them co-authorship in return for jumping through all the hoops.
I guess my model is that the costs of complying with the standard system are high but constant. So the more time you spend complaining about the system not taking you seriously, the greater the chance that complying with the system would have actually been cheaper than the accumulating opportunity costs.
A thing that I often see happening when people talk about “normie-legible status systems” is that they gaslight themselves into believing that some status system that is extraordinarily legible to them, or that they are part of, is a matter of consensus.
Academia is the most intense example of this. Most people don’t care that much about academic status! This also happens in the other direction. Youtube is a major source of status in much of the world, especially among young people, but is considered low-brow whenever people argue about this, and so people dismiss it.
I also think people tend to do a fallacy of gray thing where if a status system is not maximally legible (like writing popular blogposts, or running a popular podcast, or making popular Youtube videos, or being popular on Twitter), they dismiss the status system as not real and “illegible”.
I think modeling the real status and reputation systems that are present in the world is important, but, for example, trying to ascend the academic status hierarchy is a bad use of time and resources. It’s extremely competitive, and not actually that influential outside of the academic bubble. In some fields it is better correlated with actual skills and integrity and intelligence, and so I still think it’s a reasonable thing to consider, but I think most people are better placed to trade off a bit of legibility against a whole lot of net realness in status (this importantly does not mean your LW quick takes will be the thing that causes you to become world-renowned; I am not saying “just say smart things and the world will recognize you”, I am saying “don’t think that the most legible status systems, or the ones with the most mobs hunting dissenters, are the only real ways of gaining recognition in the world”).
sure, the thing you’re looking for is the status system that jointly optimizes for alignedness with what you care about, and how legible it is to the people you are trying to convince.
(My guess is you meant to agree with that, but kind of the whole point of my comment was that the dimension that is more important than legibility and alignment with you is the buy-in your audience has for a given status system. Youtube is not very legible, and not that aligned, but for some audiences has very high buy-in.)
but for example, trying to ascent the academic status hierarchy is a bad use of time and resources
For some fields such as biotech, it’s difficult to get access to labs outside of academia. And you can’t learn without lab access because the cutting edge experiments don’t get posted to YouTube (yet).
There is a passage from Jung’s “Modern man in search of a soul” that I think about fairly often, on this point (p.229 in my edition)
I know that the idea of proficiency is especially repugnant to the pseudo-moderns, for it reminds them unpleasantly of their deceits. This, however, cannot prevent us from taking it as our criterion of the modern man. We are even forced to do so, for unless he is proficient, the man who claims to be modern is nothing but an unscrupulous gambler. He must be proficient in the highest degree, for unless he can atone by creative ability for his break with tradition, he is merely disloyal to the past
It’s possible that this wouldn’t work for everyone, but so far I am very satisfied working on a PhD on agent foundations (AIXI). There are a lot of complaints here about academic incentives, but mostly I just ignore them. Possibly this will eventually interfere with my academic career prospects, but in the meantime I get years to work on basically whatever I think is interesting and important, and at the end of it I can reasonably expect to end up with a PhD and a thesis I’m proud of, which seems like enough to land on my feet. Looks like the best of both worlds to me.
What kind of changes or outcomes would you expect to see if people around these parts, instead of publishing their work independently, started trying to get it into traditional ML conferences and related publications?
I claim it is a lot more reasonable to use the reference class of “people claiming the end of the world” than “more powerful intelligences emerging and competing with less intelligent beings” when thinking about AI x-risk. further, we should not try to convince people to adopt the latter reference class (this sets off alarm bells, and rightly so, as I will argue in short order), but rather bite the bullet, start from the former reference class, and provide arguments and evidence for why this case is different from all the other cases.
this raises the question: how should you pick which reference class to use, in general? how do you prevent reference class tennis, where you argue back and forth about what is the right reference class to use? I claim the solution is you want to use reference classes that have consistently made good decisions irl. the point of reference classes is to provide a heuristic to quickly apply judgement to large swathes of situations that you don’t have time to carefully examine. this is important because otherwise it’s easy to get tied up by bad actors who avoid being refuted by making their beliefs very complex and therefore hard to argue against.
the big problem with the latter reference class is it’s not like anyone has had many experiences using it to make decisions ex ante, and if you squint really hard to find day to day examples, they don’t all work out the same way. smarter humans do mostly tend to win over less smart humans. but if you work at a zoo, you will almost always be more worried about physical strength and aggressiveness when putting different species in the same enclosure. if you run a farm (or live in Australia), you’re very worried about relatively dumb invasive animals like locusts and rabbits.
on the other hand, everyone has personally experienced a dozen different doomsday predictions. whether that’s your local church or faraway cult warning about Armageddon, or Y2K, or global financial collapse in 2008, or the maximally alarmist climate people, or nuclear winter, or peak oil. for basically all of them, the right action empirically in retrospect was to not think too much about it. there are many concrete instances of people saying “but this is different” and then getting burned.
and if you allow any reference class to be on as strong a footing as very well established reference classes, then you open yourself up to getting pwned ideologically. “all complex intricate objects we have seen created have been created by something intelligent, therefore the universe must also have an intelligent creator.” it’s a very important memetic defense mechanism.
(to be clear this doesn’t mean you can only believe things others believe, or that humans taking over earth is not important evidence, or that doomsday is impossible!! I personally think AGI will probably kill everyone. but this is a big claim and should be treated as such. if we don’t accept this, then we will forever fail to communicate with people who don’t already agree with us on AGI x-risk.)
I endeavor to look at how things work and describe them accurately. Similarly to how I try to describe how a piece of code works, or how to build a shed, I will try to accurately describe the consequences of large machine learning runs, which can include human extinction.
I personally think AGI will probably kill everyone. but this is a big claim and should be treated as such.
This isn’t how I think about things. Reality is what exists, and if a claim accurately describes reality, then I should not want to hold it to higher standards than claims that do not describe reality. I don’t think it’s a good epistemology to rank claims by “bigness” and then say that the big ones are less likely and need more evidence. On the contrary, I think it’s worth investing more in finding out if they’re right, and generally worth bringing them up to consideration with less evidence than for “small” claims.
on the other hand, everyone has personally experienced a dozen different doomsday predictions. whether that’s your local church or faraway cult warning about Armageddon, or Y2K, or global financial collapse in 2008, or the maximally alarmist climate people, or nuclear winter, or peak oil. for basically all of them, the right action empirically in retrospect was to not think too much about it.
I don’t have the experiences you’re describing. I don’t go to churches, I don’t visit cults, I was 3 years old in the year 2000, I was 11 for the ’08 financial crash and having read about it as an adult I don’t recall extinction being a topic of discussion, I think I have heard of climate people saying that via alarmist news headlines but I have not had anyone personally try to convince me of this or even say that they believe it. I have heard it discussed for nuclear winter, yes, and I think nukes are quite scary and it was reasonable to consider; I did not dismiss it out of hand and wouldn’t use that heuristic. I don’t know what the oil thing is.
In other words, I don’t recall anyone seriously trying to convince me that the world was ending except in cases where they had good reason to believe it. In my life, when people try to warn me about big things, especially if they’ve given it serious thought, usually I’ve found it’s been worthwhile for me to consider it. (I like to think I am good at steering clear of scammers and cranks, so that I can trust the people in my life when they tell me things.)
The sense I get from this post is that, in it, you’re assuming everyone else in the world is constantly being assaulted with claims meant to scare and control them, rather than with claims from people attempting to describe the world accurately. I agree there are forces doing that, but I think this post gives up all too quickly on there being other forces in the world, ones that aren’t doing that, which people can recognize and trust.
i am also trying to accurately describe reality. what i’m saying is, even from the perspective of someone smart and truth-seeking but who doesn’t know much about the object-level, it is very reasonable to use bigness of claim as a heuristic for how much evidence you need before you’re satisfied, and that if you don’t do this, you will be worse at finding the truth in practice. my guess is this applies even more so to the average person.
i think this is very analogous to occam’s razor / trust region optimization. clearly, we need to discount theories based on complexity because there are exponentially more complex theories compared to simple ones, many of which have no easily observable difference to the simpler ones, opening you up to being pwned. and empirically it seems a good heuristic to live life by. complex theories can still be true! but given two theories that both accurately describe reality, you want the simpler one. similarly, given two equally complex claims that accurately describe the evidence, you want the one that is less far fetched from your current understanding of the world / requires changing less of your worldview.
also, it doesn’t have to be something you literally personally experienced. it’s totally valid to read the wikipedia page on the branch davidians or whatever and feel slightly less inclined to take things that have similar vibes seriously, or even to absorb the vibe from your environs (your aversion to scammers and cranks surely did not come ex nihilo, right?)
for most of the examples i raised, i didn’t necessarily mean the claim was literally 100% human extinction, and i don’t think it matters that it wasn’t. first, because the important thing is the vibe of the claim (catastrophic) - since we’re talking about heuristics on how seriously to take things that you don’t have time to deep dive on, the rule has to be relatively cheap to implement. i think most people, even quite smart people, genuinely don’t feel much of an emotional difference between literal human extinction vs collapse of society vs half of people dying painfully, unless they first spend a half hour carefully thinking about the implications of extinction. (and even then depending on their values they may still not feel a huge difference)
also, it would be really bad if you could weasel your way out of a reference class that easily; it would be ripe for abuse by bad actors: “see, our weird sect of christianity claims that after armageddon, not only will all actual sinners’ souls be tortured forever, but that the devil will create every possible sinner’s soul to torture forever! this is actually fundamentally different from all existing christian theories, and it would be unfathomably worse, so it really shouldn’t be thought of as the same kind of claim”
even if most people are trying to describe the world accurately (which i think is not true and we only get this impression because we live in a strange bubble of very truth seeking people + are above-average capable at understanding things object level and therefore quickly detecting scams), ideas are still selected for memeticness. i’m sure that 90% of conspiracy theorists genuinely believe that humanity is controlled by lizards and are trying their best to spread what they believe to be true. many (not all) of the worst atrocities in history have been committed by people who genuinely thought they were on the side of truth and good.
(actually, i think people do get pwned all the time, even in our circles. rationalists are probably more likely than average (controlling for intelligence) to get sucked into obviously culty things (e.g. zizians), largely because they don’t have the memetic antibodies needed to not get pwned, for one reason or another. so probably many rationalists would benefit from evaluating things a little bit more on vibes/bigness and a little bit less on object level)
Your points about Occam’s razor have got nothing to do with this subject[1]. The heuristic “be more skeptical of claims that would have big implications if true” makes sense only when you suspect a claim may have been adversarially optimized for memetic fitness; it is not otherwise true that “a claim that something really bad is going to happen is fundamentally less likely to be true than other claims”.
I’m having a little trouble connecting your various points back to your opening paragraph, which is the primary thing that I am trying to push back on.[2]
I claim it is a lot more reasonable to use the reference class of “people claiming the end of the world” than “more powerful intelligences emerging and competing with less intelligent beings” when thinking about AI x-risk. further, we should not try to convince people to adopt the latter reference class (this sets off alarm bells, and rightly so, as I will argue in short order), but rather bite the bullet, start from the former reference class, and provide arguments and evidence for why this case is different from all the other cases.
To restate the message I’m reading here: “Give up on having a conversation where you evaluate the evidence alongside your interlocutors. Instead frame yourself as trying to convince them of something, and assume that they are correct to treat your communications as though you are adversarially optimizing for them believing whatever you want them to believe.” This assumption seems to give up a lot of my ability to communicate with people (almost all of it), and I refuse to simply do it because some amount of communication in the world is adversarially optimized, and I’m definitely not going to do it because of a spurious argument that Occam’s razor implies that “claims about things being really bad or claims that imply you need to take action are fundamentally less likely to be true”.
You are often in an environment where people are trying to use language to describe reality, and in that situation the primary thing to evaluate is not the “bigness” of a claim, but the evidence for and against it. I recommend acting instead in such a way as to increase the size and prevalence of that environment, rather than acting “as though it’s correct to expect maximum adversarial optimization in communications”.
(Meta: The only literal quotes of Leo’s in this comment are the big one in the quote block, my use of “” is to hold a sentence as object, they are not things Leo wrote.)
I agree that the more strongly a claim implies that you should take action, then the more you should consider that it is being optimized adversarially for you to take action. For what it’s worth, I think that heuristic applies more so to claims that you should personally take action. Most people have little action to directly prevent the end of the world from AI; this is a heuristic more naturally applied to claims that you need to pay fines (which are often scams/spam). But mostly, when people give me claims that imply action, they are honestly meant claims and I do the action. This is the vast majority of my experience.
Aside to Leo: Rather than reply point-by-point to each of the paragraphs in the second comment, I will try restating and responding to the core message I got in the opening paragraph of the first comment. I’m doing this because the paragraphs in the second comment seemed somewhat distantly related / I couldn’t tell whether the points were actually cruxy. They were responding to many different things, and I hope restating the core thing will better respond to your core point. However I don’t mean to avoid key arguments; if you think I have done so, feel free to tell me one or two paragraphs you would especially like me to engage with and I will do so in any future reply.
in practice many of the claims you hear will be optimized for memetic fitness, even if the people making the claims are genuine. well intentioned people can still be naive, or have blind spots, or be ideologically captured.
also, presumably the people you are trying to convince are on average less surrounded by truth seeking people than you are (because being in the alignment community is strongly correlated with caring about seeking truth).
i don’t think this gives up your ability to communicate with people. you simply have to signal in some credible way that you are not only well intentioned but also not merely the carrier of some very memetic idea that slipped past your antibodies. there are many ways to accomplish this. for example, you can build up a reputation of being very scrupulous and unmindkilled. this lets you convey ideas freely to other people in your circles that are also very scrupulous and unmindkilled. when interacting with people outside this circle, for whom this form of reputation is illegible, you need to find something else. depending on who you’re talking to and what kinds of things they take seriously, this could be leaning on the credibility of someone like geoff hinton, or of sam/demis/dario, or the UK government, or whatever.
this might already be what you’re doing, in which case there’s no disagreement between us.
You’re writing lots of things here but as far as I can tell you aren’t defending your opening statement, which I believe is mistaken.
I claim it is a lot more reasonable to use the reference class of “people claiming the end of the world” than “more powerful intelligences emerging and competing with less intelligent beings” when thinking about AI x-risk. further, we should not try to convince people to adopt the latter reference class (this sets off alarm bells, and rightly so, as I will argue in short order), but rather bite the bullet, start from the former reference class, and provide arguments and evidence for why this case is different from all the other cases.
Firstly, it’s just not more reasonable. When you ask yourself “Is a machine learning run going to lead to human extinction?” you should not first say “How trustworthy are people who have historically claimed the world is ending?”, you should of course primarily bring your attention to questions about what sort of machine is being built, what sort of thinking capacities it has, what sorts of actions it can take in the world, what sorts of optimization it runs, how it would behave around humans if it were more powerful than them, and so on. We can go back to discussing epistemology 101 if need be (e.g. “Hug the Query!”).
Secondly, insofar as someone believes you are a huckster or a crackpot, you should leave the conversation, communication here has broken down and you should look for other communication opportunities. However, insofar as someone is only evaluating this tentatively as one of many possible hypotheses about you then you should open yourself up to auditing / questioning by them about why you believe what you believe and your past history and your memetic influences. Being frank is the only way through this! But you shouldn’t say to them “Actually, I think you should treat me like a huckster/scammer/serf-of-a-corrupt-empire.” This feels analogous to a man on a date with a woman saying “Actually I think you should strongly privilege the hypothesis that I am willing to rape you, and now I’ll try to provide evidence for you that this is not true.” It would be genuinely a bad sign about a man that he thinks that about himself, and also he has moved the situation into a much more adversarial frame.
I suspect you could write some more narrow quick-take such as “Here is some communication advice I find helpful when talking with friends and colleagues about how AI can lead to human extinction”, but in generalizing it all the way to making dictates about basic epistemology you are making basic mistakes and getting it wrong.
Please either (1) defend and/or clarify the original statement, or (2) concede that it was mistaken, rather than writing more semi-related paragraphs about memetic immune systems.
I am confused why you think my claims are only semi related. to me my claim is very straightforward, and the things i’m saying are straightforwardly conveying a world model that seems to me to explain why i believe my claim. i’m trying to explain in good faith, not trying to say random things. i’m claiming a theory of how people parse information, to justify my opening statement, which i can clarify as:
sometimes, people use the rhetorical move of saying something like “people think 95% doom is overconfident, yet 5% isn’t. but that’s also being 95% confident in not-doom, and yet they don’t consider that overconfident. curious.” followed by “well actually, it’s only a big claim under your reference class. under mine, i.e the set of all instances of a more intelligent thing emerging, actually, 95% doom is less overconfident than 5% doom”. this post was inspired by seeing one such tweet, but every once in a while i see claims like this that play reference class tennis.
i think this kind of argument is really bad at persuading people who don’t already agree (from empirical observation). my opening statement is saying “please stop doing this, if you do it, and thank you for not doing this, if you don’t already do it”. the rest of my paragraphs provide an explanation of my theory for why this is bad for changing people’s minds. this seems pretty obviously relevant for justifying why we should stop doing the thing. i sometimes see people out there talk like this (including my past self at some point), and then fail to convince people, and then feel very confused about why people don’t see the error of their ways when presented with an alternative reference class. if my theory is correct (maybe it isn’t, this isn’t a super well thought out take, it’s more a shower thought), then it would explain this, and people who are failing to convince people would probably want to know why they’re failing. i did not spell this out in my opening statement because i thought it was clear, but in retrospect this was not clear from the opening statement.
i don’t think the root cause is people being irrational epistemically. i think there is a fundamental reason why people do this that is very reasonable. i think you disagree with this on the object level and many of my paragraphs are attempting to respond to what i view as the reason you disagree. this does not explicitly show up in the opening statement, but since you disagree with this, i thought it would make sense to respond to that too.
i am not saying you should explicitly say “yeah i think you should treat me as a scammer until i prove otherwise”! i am also not saying you should try to argue with people who have already stopped listening to you because they think you’re a scammer! i am merely saying we should be aware that people might be entertaining that as a hypothesis, and if you try to argue by using this particular class of rhetorical move, you will only trigger their defenses further, and that you should instead just directly provide the evidence for why you should be taken seriously, in a socially appropriate manner. if i understand correctly, i think the thing you are saying one should do is the same as the thing i’m saying one should do, but phrased in a different way; i’m saying not to do a thing that you seem to already not be doing.
i think i have not communicated myself well in this conversation, and my mental model is that we aren’t really making progress, and therefore this conversation has not brought value and joy into the world in the way i intended. so this will probably be my last reply, unless you think doing so would be a grave error.
I am confused why you think my claims are only semi related. to me my claim is very straightforward, and the things i’m saying are straightforwardly conveying a world model that seems to me to explain why i believe my claim. i’m trying to explain in good faith, not trying to say random things. i’m claiming a theory of how people parse information, to justify my opening statement,
Thank you for all this. I still think your quick take is wrong on the matter of epistemology.
I acknowledge that you make a fine point about persuasion, that someone who is primarily running the heuristic that “claims about the end of the world are probably crack-pots or scammers” will not be persuaded by someone arguing that actually 20:1 against and 20:1 in favor of a claim are equally extreme beliefs.
A version of the quick take that I would’ve felt was just fine would read:
Some people have basically only heard claims of human extinction coming from crackpots and scammers, and will not have thought much about the AI extinction idea on the object level. To them, this sort of argument I’ve discussed is unpersuasive at moving beyond the “is this a crackpot/scam” part of the dialogue. In this quick take I’ll outline my model of how they’re thinking about it, and give recommendations for how you should argue instead.
But your quick take doesn’t confine itself to discussing those people in those situations. It flatly says it’s true as a matter of epistemology that you should “use bigness of claim as a heuristic for how much evidence you need before you’re satisfied”, that you should “use reference classes that have consistently made good decisions irl” and that the crackpots/scammers one is the correct one to use here otherwise you’ll risk “getting pwned ideologically”.
These aren’t always the right heuristics (e.g. on this issue they are not for you and for me) and you shouldn’t say that they are just so that some people on Twitter will stop using rhetoric that isn’t working.
I believe you’re trying to do your best to empathize with people who are unpersuaded by an unsuccessful rhetorical move, a move that people who believe your position are making in public discourse. That is commendable. I think you are attempting to cause other people who hold your position to stop using that rhetorical move, by telling them off for using it, but to achieve this aim you are repeatedly saying the people who do not hold your position are doing normatively correct epistemology, and you’re justifying it with Occam’s razor and reference class forecasting, and this is all wrong. In some situations for some people it is reasonable to primarily use these heuristics, and in other situations for other people it is not. I’m not arguing that the people unpersuaded are being unreasonable, but (for example) your opening sentence makes fully-general statements about how to reason about this issue that I believe are false. Rule number one of good discourse: don’t make false statements about epistemology in order to win an object level point.
Yep, seems fine to drop this here; I make no bid of you to reply further.
(I would never make knowingly false statements about epistemology to try to win an object level point; I still disagree with your claims about epistemology and believe that my epistemology arguments are in good faith and capture truth in some way. This disagreement might be because I’ve not communicated myself well. I originally wasn’t going to reply but I felt the need to say this because your comment can be viewed as accusing me of intellectual/epistemic dishonesty, even if that wasn’t your intention.)
Firstly, it’s just not more reasonable. When you ask yourself “Is a machine learning run going to lead to human extinction?” you should not first say “How trustworthy are people who have historically claimed the world is ending?”
But you should absolutely ask “does it look like I’m making the same mistakes they did, and how would I notice if it were so?” Sometimes one is indeed in a cult with one’s methods of reason subverted, or having a psychotic break, or captured by a content filter that hides the counterevidence, or suffering from any of the many more mundane and pervasive failures of that kind.
But not in full generality! This is a fine question to raise in this context, but in general the correct thing to do in basically all situations is to consider the object level, and then also let yourself notice if people are unusually insane around a subject, or insane for a particular reason. Sometimes that is the decisive factor, but for all questions, the best first pass is to think about how that part of the world works, rather than to think about the other monkeys who have talked about it in the past.
The heuristic “be more skeptical of claims that would have big implications if true” makes sense only when you suspect a claim may have been adversarially optimized for memetic fitness; it is not otherwise true that “a claim that something really bad is going to happen is fundamentally less likely to be true than other claims”.
This seems wrong to me.
a. More small things happen, and there are fewer kinds of small things that happen.
b. I bet people genuinely have more evidence for small claims they state than big ones on average.
c. The skepticism you should have because particular claims are frequently adversarially generated shouldn’t first depend on deciding to be skeptical about it.
If you’ll forgive the lack of charity, ISTM that leogao is making IMO largely true points about the reference class and then doing the wrong thing with those points, and you’re reacting to the thing being done wrong at the end, but trying to do this in part by disagreeing with the points being made about the reference class. leogao is right that people are reasonable in being skeptical of this class of claims on priors, and right that when communicating with someone it’s often best to start within their framing. You are right that regardless it’s still correct to evaluate the sum of evidence for and against a proposition, and that other people failing to communicate honestly in this reference class doesn’t mean we ought to throw out or stop contributing to the good faith conversations available to us.
i’m not even saying people should not evaluate evidence for and against a proposition in general! it’s just that this is expensive, and so it is perfectly reasonable to have heuristics to decide which things to evaluate, and so you should first prove with costly signals that you are not pwning them, and then they can weigh the evidence. until you can provide enough evidence that you’re not pwning them to make it worth their time to evaluate your claims in detail, it should not be surprising that many people won’t listen to the evidence. and even if they do listen, if there is still lingering suspicion that they are being pwned, you need to provide the type of evidence that could persuade someone that they aren’t getting pwned (for which being credibly very honest and truth seeking is necessary but not sufficient), which is sometimes different from mere compellingness of argument.
I think the framing that sits better to me is ‘You should meet people where they’re at.’ If they seem like they need confidence that you’re arguing from a place of reason, that’s probably indeed the place to start.
a. I expect there is a slightly more complicated relationship between my value-function and the likely configuration states of the universe than literally zero-correlation, but most configuration states do not support life and we are all dead, so in one sense a claim that in the future something very big and bad will happen is far more likely on priors. One might counter that we live in a highly optimized society where things being functional and maintained is an equilibrium state and it’s unlikely for systems to get out of whack enough for bad things to happen. But taking this straightforwardly is extremely naive, tons of bad things happen all the time to people. I’m not sure whether to focus on ‘big’ or ‘bad’ but either way, the human sense of these is not what the physical universe is made out of or cares about, and so this looks like an unproductive heuristic to me.
b. On the other hand, I suspect the bigger claims are more worth investing time to find out if they’re true! All of this seems too coarse-grained to produce a strong baseline belief about big claims or small claims.
c. I don’t get this one. I’m pretty sure I said that if you believe that you’re in a highly adversarial epistemic environment, then you should become more distrusting of evidence about memetically fit claims.
I don’t know what true points you think Leo is making about “the reference class”, nor which points you think I’m inaccurately pushing back on that are true about “the reference class” but not true of me. Going with the standard rationalist advice, I encourage everyone to taboo “reference class” and replace it with a specific heuristic. It seems to me that “reference class” is pretending that these groupings are more well-defined than they are.
c. I don’t get this one. I’m pretty sure I said that if you believe that you’re in a highly adversarial epistemic environment, then you should become more distrusting of evidence about memetically fit claims.
Well, sure, it’s just that you seemed to frame this as a binary on/off thing, where sometimes you’re exposed and need to count it and sometimes you’re not, whereas to me it’s basically never implausible that a belief has been exposed to selection pressures, and the question is of probabilities and degrees.
I think you’re correct. There’s a synergistic feedback loop between alarmism and social interaction that filters out pragmatic perspectives. This creates the illusion that the doom surrounding any given topic is more prevalent than it really is, or even that it’s near universal.
Even before the rise of digital information the feedback phenomenon could be observed in any insular group. In today’s environment where a lot of effort goes into exploiting that feedback loop it requires a conscious effort to maintain perspective, or even remain aware that there are other perspectives.
I think the group of people “claiming the end of the world” in the case of AI x-risk is importantly more credentialed and reasonable-looking than most prior claims about the end of the world. From the reference class and general heuristics perspective that you’re talking about[1], I think how credible-looking the people are is pretty important.
So, I think the reference class is more like claims of nuclear armageddon than cults. (Plausibly near maximally alarmist climate people are in a similar reference class.)
I agree this reference class is better, and implies a higher prior, but I think it’s reasonable for the prior over “arbitrary credentialed people warning about something” to be still relatively low in an absolute sense- lots of people have impressive sounding credentials that are not actually good evidence of competence (consider: it’s basically a meme at this point that whenever you see a book where the author puts “PhD” after their name, they probably are a grifter / their phd was probably kinda bs), and also there is a real negativity bias where fearmongering is amplified by both legacy and social media. Also, for the purposes of understanding normal people, it’s useful to keep in mind that trust in credentials and institutions is not very high right now in the US among genpop.
for basically all of them, the right action empirically in retrospect was to not think too much about it.
False?
Climate change tail scenarios are worth studying and averting. Nuclear winter was obviously worth studying and averting back in the Cold War, and still is today. 2008 financial crisis was worth studying and averting.
Do you not believe average citizens can study issues like these and make moves to solve them?
This is kind of missing the point of Bayes. One shouldn’t “choose” a reference class to update on. One should update, to the best of one’s ability, on the whole distribution of hypotheses available to describe the situation. Neither is a ‘right’ or ‘wrong’ reference class to use; they’re both just valid pieces of evidence about base rates, and you should probably be using both of them.
It seems you have in mind something like inference to the best explanation here. Bayesian updating, on the other hand, does need a prior distribution, and the question of which prior distribution to use cannot be waved away when there is a disagreement on how to update. In fact, that’s one of the main problems of Bayesian updating, and the reason why it is often not used in arguments.
I’m not really sure what that has to do with my comment. My point is the original post seemed to be operating as if you look for the argmax reference class, you start there, and then you allow arguments. My point isn’t that their prior is wrong, it’s that this whole operation is wrong.
I think also you’re maybe assuming I’m saying the prior looks something like {reference class A, reference class B} and arguing about the relative probability of each, but it doesn’t, a prior should be over all valid explanations of the prior evidence. Reference classes come in because they’re evidence about base rates of particular causal structures; you can say ‘given the propensity for the world to look this way, how should I be correcting the probability of the hypotheses under consideration? Which new hypotheses should I be explicitly tracking?’
I can see where the original post might have gone astray. People have limits on what they can think about, and it’s normal to narrow one’s consideration to the single most likely hypothesis. But it’s important to be aware of what you’re approximating here, or else you get into a confusion where you have two valid reference classes and you start telling people that there’s a correct one to start arguing from.
I agree this is an interesting philosophical question but again I’m not sure why you’re bringing it up.
Given your link maybe you think me mentioning Bayes was referring to some method of selecting a single final hypothesis? I’m not, I’m using it to refer to the Bayesian update rule.
It seems the updating rule doesn’t tell you anything about the original argument even when you view information about reference classes as evidence rather than as a method of assigning prior probabilities to hypotheses. Or does it? Can you rephrase the argument in a proper Bayesian way such that it becomes clearer? Note that how strongly some evidence confirms or disconfirms a hypothesis also depends on a prior.
What argument are you referring to when you say “doesn’t tell you anything about the original argument”?
My framing is basically this: you generally don’t start a conversation with someone as a blank pre-priors slate that you get to inject your priors into. The prior is what you get handed, and then the question is how people should respond to the evidence and arguments available. Well, you should use (read: approximate) the basic Bayesian update rule: hypotheses where an observation is unlikely are that much less probable.
I think you’re underestimating the inferential gap here. I’m not sure why you’d think the Bayes updating rule is meant to “tell you anything about” the original post. My claim was that the whole proposal about selecting reference classes was framed badly and you should just do (approximate) Bayes instead.
You’re having a conversation with someone. They believe certain things are more probable than other things. They mention a reference class: if you look at this grouping of claims, most of them are wrong. Then you consider the set of hypotheses: under each of them, how plausible is it given the noted tendency for this grouping of claims to be wrong? Some of them pass easily, e.g. the hypothesis that this is just another such claim. Some of them less easily; they are either a modal part of this group and uncommon on base rate, or else nonmodal or not part of the group at all. You continue, maybe with a different reference class, or an observation about the scenario.
Hopefully this illustrates the point. Reference classes are just evidence about the world. There’s no special operation needed for them.
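To make the “no special operation” point concrete, here is a minimal sketch of a discrete Bayes update in which two reference classes are treated as two ordinary observations. The hypothesis names `H1`/`H2`/`H3` and every number are made up for illustration; they are not estimates of anything discussed in this thread.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """One application of Bayes' rule over a discrete set of hypotheses.

    prior:      {hypothesis: P(h)}
    likelihood: {hypothesis: P(observation | h)}
    Returns the normalized posterior {hypothesis: P(h | observation)}.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Toy prior over three hypothetical hypotheses.
prior = {"H1": Fraction(1, 2), "H2": Fraction(1, 3), "H3": Fraction(1, 6)}

# Toy likelihoods: how probable each reference-class observation is under each hypothesis.
ref_class_a = {"H1": Fraction(9, 10), "H2": Fraction(1, 2), "H3": Fraction(1, 5)}
ref_class_b = {"H1": Fraction(1, 2), "H2": Fraction(4, 5), "H3": Fraction(3, 5)}

# Update on both reference classes, in both orders.
a_then_b = bayes_update(bayes_update(prior, ref_class_a), ref_class_b)
b_then_a = bayes_update(bayes_update(prior, ref_class_b), ref_class_a)
```

Because each update just multiplies by a likelihood and renormalizes, the order in which the two reference classes are considered doesn’t change the posterior, and neither one has to be designated the “right” starting point. (This assumes the two observations are conditionally independent given each hypothesis; if they aren’t, the second likelihood should be conditioned on the first observation.)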
The reference classes you should use work as a heuristic because there is some underlying mechanism that makes them work. So you should use reference classes in situations where their underlying mechanism is expected to work.
Maybe the underlying mechanism of doomsday predictions not working is that people predicting doom don’t make their predictions based on valid reasoning. So if someone uses that reference class to doubt AI risk, this should be judged as them making a claim about reasoning of people predicting AI doom being similar to people in cults predicting Armageddon.
you might expect that the butterfly effect applies to ML training. make one small change early in training and it might cascade to change the training process in huge ways.
at least in non-RL training, this intuition seems to be basically wrong. you can do some pretty crazy things to the training process without really affecting macroscopic properties of the model (e.g. loss). one very well known example is that using mixed precision training results in training curves that are basically identical to full precision training, even though you’re throwing out a ton of bits of precision on every step.
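a toy way to see this (a minimal sketch, not how real frameworks do mixed precision: they use bfloat16/float16 GPU kernels, loss scaling, and so on) is to fit a linear regression by gradient descent twice: once entirely in float32, and once with the forward and backward passes in numpy’s float16 while a float32 “master” copy of the weights receives the updates. the dataset size, learning rate, and step count are arbitrary choices for illustration.

```python
import numpy as np


def make_data(n=256, d=10, noise=0.1, seed=0):
    """Synthetic linear-regression data: y = X @ w_true + noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + noise * rng.standard_normal(n)
    return X, y


def mse(X, y, w):
    """Half mean squared error, always evaluated in float64 so both runs are scored the same way."""
    r = X.astype(np.float64) @ w.astype(np.float64) - y
    return 0.5 * np.mean(r ** 2)


def train(X, y, compute_dtype, steps=200, lr=0.1):
    """Plain gradient descent. Forward and backward passes run in `compute_dtype`;
    the master copy of the weights is always kept and updated in float32."""
    n, d = X.shape
    w_master = np.zeros(d, dtype=np.float32)
    Xc = X.astype(compute_dtype)
    yc = y.astype(compute_dtype)
    losses = []
    for _ in range(steps):
        w = w_master.astype(compute_dtype)                     # low-precision copy for this step
        r = Xc @ w - yc                                        # residuals in compute_dtype
        grad = (Xc.T @ r) / compute_dtype(n)                   # gradient in compute_dtype
        w_master -= np.float32(lr) * grad.astype(np.float32)   # update applied in float32
        losses.append(mse(X, y, w_master))
    return np.array(losses)


X, y = make_data()
full = train(X, y, np.float32)    # "full precision" baseline
mixed = train(X, y, np.float16)   # float16 compute, float32 master weights
```

comparing `full` and `mixed` step by step should show loss curves that nearly coincide, even though float16 keeps only 10 of float32’s 23 mantissa bits in every forward and backward pass. a small convex problem is of course far more forgiving than a real network, so this illustrates the intuition rather than proving it.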
people often say that limitations of an artistic medium breed creativity. part of this could be the fact that when it is costly to do things, the only things done will be higher effort
a medium with fewer limitations is strictly better for making good art, but it’s also harder to identify good art among the sea of bad art, because the medium alone is no longer as good a signal of quality
Yes, but this also happens within one person over time, and the habit (of either investing, or not, in long-term costly high-quality efforts) can gain steam in that one person.
This seems the likely explanation for any claim that constraints breed creativity/good things in a field, when the expectation is that the opposite outcome would occur.
My own expectation is that limitations result in creativity. Writer’s block is usually a result of having too many possibilities/choices. If I tell you “You can write a story about anything”, it’s likely harder for you to think of anything than if I tell you “Write a story about an orange cat”. In the latter situation, you’re more limited, but you also have something to work with.
I’m not sure if it’s as true for computers as it is for humans (that would imply information-theoretic factors), but there are plenty of factors in humans, like analysis paralysis and the “See also” section of that page
My other explanation probably has to do with the fact that it’s way easier to work with an already almost-executed object than a specification, because we are constrained to only think about a subset of possibilities for a reasonable time.
In other words, constraints are useful given that you are already severely constrained, to limit the space of possibilities.
any time someone creates a lot of value without capturing it, a bunch of other people will end up capturing the value instead. this could be end consumers, but it could also be various middlemen. it happens not infrequently that someone decides not to capture the value they produce in the hopes that the end consumers get the benefit, but in fact the middlemen capture the value instead
an example: open source software produces lots of value. this value is partly captured by consumers who get better software for free, and partly by businesses that make more money than they would otherwise.
the most clear cut case is that some businesses exist purely by wrapping other people’s open source software, doing advertising and selling it for a handsome profit; this makes the analysis simpler, though to be clear the vast majority of cases are not this egregious.
in this situation, the middleman company is in fact creating value (if software is created in a forest with no one around to use it, does it create any value?) by using advertising to cause people to get value from software. in markets where there are consumers clueless enough to not know about the software otherwise (e.g. legacy companies), this probably does actually create a lot of counterfactual value. however, most people would agree that the middleman getting 90% of the created value doesn’t satisfy our intuitive notion of fairness. (open source developers are more often trying to have the end consumers benefit from better software, not to have random middlemen get rich off their efforts)
and if advertising is commoditized, then this problem stops existing (you can’t extract that much value as an advertising middleman if there is an efficient market with 10 other competing middlemen), and so most of the value does actually accrue to the end user.
Often tickets will be sold at prices considerably lower than the equilibrium price and thus ticket scalpers will buy the tickets and then resell for a high price.
That said, I don’t think this typically occurs because the company/group originally selling the tickets wanted consumers to benefit, it seems more likely that this is due to PR reasons (it looks bad to sell really expensive tickets).
This is actually a case where it seems likely that the situation would be better for consumers if the original seller captured the value. (Because buying tickets from random scalpers is annoying.)
I wonder how much of this is the PR reasons, and how much something else… for example, the scalpers cooperating (and sharing a part of their profits) with the companies that sell tickets.
To put it simply, if I sell a ticket for $200, I need to pay a tax for the $200. But if I sell the same ticket for $100 and the scalper re-sells it for $200, then I only need to pay the tax for $100, which might be quite convenient if the scalper… also happens to be me? (More precisely, some of the $100 tickets are sold to genuine 3rd party scalpers, but most of them I sell to myself… but according to my tax reports, all of them were sold to the 3rd party.)
ticket scalping is bad and we should find some sort of fully distributed market mechanism that makes scalping close to impossible without requiring the ticket seller to capture the value. it ought to be possible to gift value to end customers rather than requiring the richest to be the ones who get the benefit, how can that be achieved?
it ought to be possible to gift value to end customers rather than requiring the richest to be the ones who get the benefit, how can that be achieved?
The simple mechanism is:
Charge market prices (auction or just figure out the equilibrium price normally)
Redistribute the income uniformly to some group. Aka UBI.
Of course, you could make the UBI be to (e.g.) Taylor Swift fans in particular, but this is hardly a principled approach to redistribution.
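Here is a minimal sketch of that two-step mechanism, with one concrete choice for the first step that the comment leaves open: a uniform-price auction in which the k highest bidders each get one ticket and all pay the (k+1)-th highest bid. Who counts as the dividend group is left as a parameter; the bidder names and amounts are hypothetical.

```python
from fractions import Fraction


def uniform_price_auction(bids, k):
    """Sell k identical tickets, one per buyer, to the k highest bidders.

    Every winner pays the same clearing price: the (k+1)-th highest bid,
    or 0 if there are k or fewer bidders. Ties keep the order of `bids`.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winners = [name for name, _ in ranked[:k]]
    price = ranked[k][1] if len(ranked) > k else 0
    return winners, price


def redistribute(revenue, group):
    """Split the revenue equally across everyone in `group` (a UBI-style dividend)."""
    share = Fraction(revenue) / len(group)
    return {person: share for person in group}


# Hypothetical bids; here the dividend group is simply everyone who bid.
bids = {"ann": 300, "bo": 120, "cy": 250, "di": 90}
winners, price = uniform_price_auction(bids, k=2)
payouts = redistribute(price * len(winners), group=list(bids))
```

With one ticket per buyer, this auction format makes bidding your true value a dominant strategy, and everyone who lost bid at most the clearing price, which leaves little room for a scalper to resell at a markup (assuming valuations don’t change before the event).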
Separately, musicians (and other performers) might want to subsidize tickets for extremely hard core fans because these fans add value to the event (by being enthusiastic). For this, the main difficulty is that it’s hard to cheaply determine if someone is a hard core fan. (In principle, being prepared to buy tickets before they run out could be an OK proxy for this, but it fails in practice, at least for buying tickets online.)
of course, this is more a question about equilibria than literal transactions. suppose you capture most of the value and then pay it back out to users as a dividend: the users now have more money with which they could pay a middleman, and a middleman that could have extracted some amount of value originally can still extract that amount of value in this new situation.
we can model this as a game of ultimatum between the original value creator and the middlemen. if the participation of the OVC and middleman are both necessary, the OVC can bargain for half the value in an iterated game / as FDT agents. however, we usually think of the key differentiating factor between the OVC and middlemen as the middlemen being more replaceable, so the OVC should be able to bargain for a lot more. (see also: commoditizing your complement)
so to ensure that the end users get most of the value, you need to either ensure that all middleman roles are commoditized, or precommit to only provide value in situations where the end user can actually capture most of the value
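one way to make the bargaining intuition concrete (a swapped-in tool, not something the argument above relies on) is the Shapley value of a simple cooperative game: one unit of value is created only when the original value creator and at least one of n interchangeable middlemen take part. the sketch computes it exactly by brute force over orderings, so it’s only practical for small n; the player names and the helper `split_with_n_middlemen` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def coalition_value(coalition):
    """1 unit of value exists only if the original value creator ("OVC")
    and at least one middleman are both in the coalition."""
    has_middleman = any(p != "OVC" for p in coalition)
    return 1 if "OVC" in coalition and has_middleman else 0


def shapley(players, v):
    """Exact Shapley values: average each player's marginal contribution
    over every possible order in which the coalition could form."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = v(coalition)
            coalition.add(p)
            totals[p] += v(coalition) - before
    return {p: t / len(orders) for p, t in totals.items()}


def split_with_n_middlemen(n):
    """Hypothetical helper: the OVC plus n interchangeable middlemen M0..M{n-1}."""
    players = ["OVC"] + [f"M{i}" for i in range(n)]
    return shapley(players, coalition_value)
```

this gives the original creator n/(n+1) of the value and each middleman 1/(n(n+1)): an even split when there is a single indispensable middleman, and most of the value once middlemen are replaceable, which matches the commoditize-your-complement intuition. real markets are messier than this toy game, so treat it as an illustration only.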
The equilibrium comprises literal transactions, right? You should be able to find MANY representative specific examples to analyze, which would help determine whether your model of value is useful in these cases.
My suspicion is that you’re trying to model “value” as something that’s intrinsic, not something which is a relation between individuals, which means you are failing to see that the packaged/paid/delivered good is actually distinct from, and non-fungible with, the raw/free/open good, for the customers who choose that route.
Note that in the case of open-source software, it’s NOT a game of ultimatum, because both channels exist simultaneously and neither has the option to deny the other. A given consumer paying for one does not prevent some other customer (or even the same customer in parallel) using the direct free version.
It’s worth examining whether “capturing value” and “providing value” are speaking of the same thing. In many cases, the middlemen will claim that they’re actually providing the majority of the value, in making the underlying thing useful or available. They may or may not be right.
For most goods, it’s not clear how much of the consumer use value comes from the idea, the implementation of the idea, or from the execution of the delivery and packaging. Leaving aside government-enforced exclusivity, there are usually reasons for someone to pay for the convenience, packaging, and bundling of such goods.
I worked (long ago) in physical goods distribution for toys and novelties. I was absolutely and undeniably working for a middleman—we bought truckloads of stuff from factories, repackaged it for retail, and sold it at a significant markup to retail stores, who marked it up again and sold it to consumers. Our margins were good, but all trades were voluntary and I don’t agree with a framing that we were “capturing” existing value rather than creating value in connecting supply with demand.
All value is finite, and every time value is used, it decreases. The middlemen are merely causing the thing to die faster. For instance, if you discover a nice beach which hasn’t been ruined with plastic and glass bottles yet, and make it into a popular area, you won’t get to spend many happy summers at that place.
If you find oil and sell it, are you creating value, or are you destroying value? I think both perspectives are valid. But since the openness of information in the modern world means that everything which can be exploited will be exploited, up to the point where exploitation is no longer possible (as with the ruined beach), I strongly dislike unsustainable exploitation and personally tend toward the “destroying value” view.
And if you want something to worry about, let it be premature exploitation. X ‘creates’ value and chooses not to exploit it prematurely, but then Y will come along and take it, so X is forced to capitalize on it early. Now you have a moloch problem on your hands.
more importantly, both i and the other person get more out of the conversation. almost always, there are subtle misunderstandings and the rest of the conversation would otherwise involve a lot of talking past each other. you can only really make progress when you’re actually engaging with the other person’s true beliefs, rather than a misunderstanding of their beliefs.
i wonder how much the following hypothesis is true. it’s obviously not completely true, but maybe there is some value in contemplating it.
for most issues that people have expended a lot of effort arguing about, if you could truly impartially reason through it, run experiments, etc, the correct answer is either not that hard to figure out, or that we’re pretty sure we can’t know one way or the other. but the discourse is fucked because the vast majority (maybe literally all) of people have some bottom line they’re already sympathetic to, and smart people can make plausible sounding arguments for any conclusion, and so truly reasoning things through impartially is both very hard to do and even if you somehow manage to do it, it’s hard to signal that you did so, and anyways people with motivated reasoning will only listen to you if your answer happened to agree with theirs.
the main evidence i have for this hypothesis is that there are questions where one side is overwhelmingly obviously correct if you actually think about it or look into it, and yet, there is the appearance of a balanced public debate.
the main evidence against this hypothesis is that probably people who disagree also think they’re overwhelmingly obviously correct, and it seems arrogant to declare that i am simply more correct, and in any case oftentimes people disagreeing are wrong but their disagreement still contains enough of a kernel of truth to be worth thinking about.
corollary: the ability to look upon difficult controversial problems with utter naivety is extremely valuable. it’s probably bad to always be in this state, because you can get pwned by bad actors. but cultivating an ability to enter into this state, possibly through a community where people value this a lot, is extremely valuable (LW is the closest thing to this that exists in the world afaik, but please let me know if there is anywhere else that is better)
For empirical questions, one litmus test is whether most of the populace doesn’t know some true (like stats reported by large institutions with no disagreement from other reputable institutions) and clearly relevant fact, or repeatedly overestimates the observed risk of something.
AI water usage is an example. Others are the deaths per TWh of nuclear power, the fact that extreme poverty (≤$3/day) fell threefold in the last 20 years (along with what percent are below that line), observed prediction market accuracy, the annual revenue of the US federal government as compared to e.g. the total net worth of billionaires (it’s about 1⁄3 of it), and the fact that the labor share of GDP hasn’t dropped much over time.
I sometimes wonder if political beliefs would be very different if every child knew a certain set of basic stats—and also what political beliefs you could get if you adversarially optimized that set (while still keeping the stats reliably true and ‘basic’) to lead them to some specific conclusion.
the interesting thing about lots of these basic stats is that people who believe otherwise will often simply refuse to listen, because listening to the fact might change their mind, and then they would become an Outsider and be expelled by their social circle. or something like that.
considering that this can happen even for the clearest-cut examples, one can only imagine how many less clear-cut examples are also like this. perhaps a huge fraction of seemingly controversial issues actually have a simple answer that you can arrive at simply by not having a preconceived answer, but where people incorrectly assume that the disagreement exists for good reasons.
(of course, even if this is true, there can and likely do still exist many problems where there genuinely is no simple undisputable answer!)
I don’t think AI water usage fits the pattern. People keep repeating the line “AI uses a lot of water”, and making political cartoons about AI slurping up all the water in the world, but I don’t remember anyone ever arguing for it being true.
This is in contrast to religion, price fixing and possibility of AGI, where you can in fact find smart people making long and complicated arguments on both sides.
here are 3 links that argue for AI water consumption being a big problem that i found in like 5 minutes. they aren’t good arguments, but they go beyond just asserting it.
I think this is basically correct, modulo that a lot of this is just that people differ in their intelligence and epistemological skill, and so there are classes of dispute that are basically obvious to, for instance, you, while to many earnest but less capable truth seekers, it’s in fact not easy to say who’s right.
In my experience, there aren’t many such debates where the correct answer readily leaps to the surface if you look for it. It’s more that the correct answer is obvious once you find it, but if you’re in a poor epistemic environment, the answer is hard to find even if you’re looking for it.
For example, the literal veracity of Christianity is hard to determine if everyone you know is a devout Christian, even if you’re not biased at all by wanting to believe what other people believe.
if you could truly impartially reason through it, run experiments, etc, the correct answer is either not that hard to figure out, or that we’re pretty sure we can’t know one way or the other.
What’s the alternative? That finding the correct answer is tractable but hard in the way that major scientific progress is?
the alternative is that it must be hard or impossible to know the truth on an issue anytime where there is the appearance of a vaguely balanced public debate among smart people. for example, if you believed the alternative deeply, you might say “well, a bunch of smart people think that AI uses a lot of water, and a bunch of other smart people think that AI doesn’t use a lot of water. while the argument that AI water use isn’t a big deal makes a lot of sense to me, clearly the people who think it does use a lot of water are equally convinced of their position, so who really knows who’s right”
Some kind of miscommunication has happened. I thought the claim in the OP was that most topics fall in bin A “easy to figure out” OR bin B “probably unknowable”, which seemed like a trivial claim to me because the only alternative is bin C “tractable but hard”. But now it seems like you’re arguing most topics fall in bin A?
conlang idea: an extremely easy to learn language with the following attributes:
grammar: mandarin, but without several of the quirks (no counters, merging 的/地/得, merging 不/没, etc)
vocab: drawn from mostly English, because it’s become the lingua franca. ideally the language is almost mutually intelligible to English speakers from the beginning
very simple phonology similar to japanese
consistent phonetic writing system based on the latin alphabet
You get things like this in creolization of pidgins. The regularizing factor is children—the children of pidgin speakers create an easy to learn full language.
(I would posit that, depending on your goals, it may actually be better to “conlang from the inside”, i.e. generate new words and other language structures from within English or other major natural tongues.)
A 2019 study claims that spoken languages transmit information at a similar rate, i.e. that languages which apparently transmit more spoken information per syllable (Mandarin, German) are spoken more slowly than languages at the opposite end, like Japanese or Spanish. The gap between the highest and lowest transmission rates (English and Thai) was 1.4x, though I don’t quite understand the way they measured information.
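The tradeoff described above rests on a simple identity: information rate is information per syllable times syllables per second. Written out (the symbols are my own labels, and this leaves out whatever normalization the study used to estimate information per syllable):

```latex
\[
\underbrace{R}_{\text{bits/s}} \;=\; \underbrace{\mathrm{ID}}_{\text{bits/syllable}} \times \underbrace{\mathrm{SR}}_{\text{syllables/s}}
\]
```

If R is roughly constant across languages, a language with a higher ID has to be spoken with a lower SR, which is the pattern claimed for Mandarin and German versus Japanese and Spanish.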
It’s possible that written communication doesn’t work that way, though. Languages with simpler and more rational orthographies like Spanish are easier to learn to read compared to English. So I think the most clear benefit would come from spelling reform and maybe the way numbers are spoken/written in words, rather than apparent efficiency improvements to spoken communication (e.g. you listed grammatical or vocabulary changes) that just lead people to talk more slowly because they’re bottlenecked by how fast they can produce or understand the information.
only kinda? toki pona is trying to have as small a fundamental vocab as possible and constructing all other concepts using those few words. whereas i am totally happy to import all of English as prerequisite knowledge
actually, there should probably be a possessive-only particle (equivalent to ’s); the word “of” should be destroyed and replaced with “among” or the possessive; the word “that” should be restricted to pronoun use only; the word “which” should be split into separate words for the determiner and the relative pronoun (akin to qui vs laquelle); and adjectives should always come before the noun, in reverse order of how tightly they bind to the object, with no strict rules otherwise (“beautiful green vase” is correct but “green beautiful vase” is not; “green big vase” and “big green vase” are equally correct, unlike English, and the order conveys emphasis on which aspects are most fundamental to the object).
I love mandarin’s grammar. People get hung up on the characters but it’s really not so bad nowadays with computers. Feels like a language designed by actual people (mostly)
example sentence: ao yumanbien-su borun frii and dei-su havu ikual diguneti and ikual raït-su. dei-su bei endaow ed rizen and kanshense and dei-su shuud tuwardu wannanada brade-li akt.
(pronunciation: pretend it’s japanese, and when there are isolated consonants just truncate the sound without the vowel. -su is a plural suffix, compound nouns are agglomerated, all verbs are in infinitive form, bei is the particle 被, and all subjects are repeated in subordinate clauses rather than left implicit.)
people love to hate on brutalism. my take is that there’s something really aesthetic about it, but also that just because something is aesthetic doesn’t mean it would be a good place to live or work every day. in fact, I’ve unfortunately found that the environments I find aesthetically pleasing and the environments that make me happy to live/work in diverge quite substantially.
This is also why various artists don’t necessarily try to make Tolkien’s Orthanc, Barad-dûr, Angband, etc look ugly, but imposing and impressive in some way. Even H.R. Giger’s biomechanical landscapes could be described as aesthetic. Or the crooked architecture in The Cabinet of Dr. Caligari (1920). Architecture is art, and art doesn’t have to be beautiful or pleasant, just interesting. But presumably nobody would like to actually live in a Caligari-like environment. (Except perhaps people in the goth subculture?)
The best example, for my tastes at least, is window-to-building ratio. I like working and living in buildings with high ratio, e.g. glass skyscrapers. But they look bad from outside.
it’s kind of crazy that spaced repetition has completely revolutionized language learning and then not really changed the world in any other way at all. why are there no great scientists who are inhumanly good at remembering the corpus of their field through incremental reading? why are there no insanely good engineers who know every detail of their entire stack through spaced repetition?
oh yeah, that too. but my understanding is this is mostly a “makes you good at test taking” thing. does it make you exceptionally good at medical research or being a doctor?
H1: Anki Doctors misdiagnose more often, because they get stuck with thought patterns like “Ah, sniffly nose, sore throat, must be… Diseasitis”, instead of correctly thinking “Ah, sniffly nose, sore throat… might be Diseasitis, or the cold, or the flu, or...”
H2: Anki Doctors are worse because they over-fixate on interesting/novel diseases that are more salient in an Anki deck, instead of just putting some thought into what would be the most logical fit for the ailment. “Your big toe hurts? Must be super-cancer+”, instead of “Your big toe hurts… [thinks for a bit] did you kick it on something?”
H3: Anki Doctors are better, because they know a larger range of diseases, and can diagnose more broadly. While a normal doctor might exit med school, and start with a broad range of diagnoses they hand out, perhaps as they gain XP, their range shrinks, and they find their ‘go-to’ diagnoses. On the other hand, an Anki Doctor might exit med school, and keep their range broad, as long as they keep practicing their Anki deck.
a really nice thing with spaced repetition is that you haven’t replaced how your memory functions, just augmented it a bit. but it’s still fundamentally your same brain, one with the ability to ask questions like “am I zooming too much/not considering all options?” or “is what’s salient to me actually what’s happening?” doing spaced repetition doesn’t have any bearing here unless it gives you a false sense of confidence in what you know or doing it crowds out developing other skills.
I’d expect H3 to be true, ceteris paribus, but would also expect the ability to ask questions like the ones above, and to think well in general, to matter more, i.e. someone whose thinking habits are more developed and who doesn’t use Anki would be a better researcher/doctor than an Anki savant whose thinking habits are underdeveloped.
For non-standard curricula, probably the overhead of making cards? I have memorized lots of details about AI stuff just from reading and thinking about it; I tried using Anki for this but found the overhead of creating and maintaining a card deck to not be worth it.
Imo, best way to make cards is to do it while you study.
Read textbook
Highlight things that are discrete packages of info
Once you’ve finished that section, go back and turn the highlights into Q:A pairs
Q: What is the capital of France?
A: Paris
OCR those written down Q:A’s and have Claude format them into proper Anki format, with a skill (Or, you could vibe code a program to do this. I was mostly adding cards before vibe coding was a thing)
Add the output .md file to anki
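For the “vibe code a program” route, a minimal sketch might parse the written-down Q:A pairs and emit one card per line as tab-separated text, which Anki’s plain-text import (File > Import) should be able to read, with the first field as the front and the second as the back. The `Q:`/`A:` line format and the function names are assumptions for illustration, and this produces a plain .txt file rather than the .md file mentioned in the step above.

```python
def parse_qa_pairs(text):
    """Collect (question, answer) pairs from lines starting with 'Q:' and 'A:'.

    Assumes each 'Q:' line is followed (possibly after unrelated lines) by one
    'A:' line; any other lines are ignored.
    """
    pairs, question = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs


def to_anki_tsv(pairs):
    """One card per line: front<TAB>back. Tabs and newlines inside a field are
    replaced with spaces so they can't break the field structure."""
    def clean(s):
        return s.replace("\t", " ").replace("\n", " ")
    return "".join(f"{clean(q)}\t{clean(a)}\n" for q, a in pairs)
```

Write the returned string to a file such as `cards.txt` and import it; Anki should detect the tab separator, but it’s worth checking the field mapping in the import dialog.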
This was the best way I found to study head and neck anatomy. Worked especially well if I’d just come out of the cadaver lab, and was already in the right ‘vibe’
I did Anki for ~3 years, and have now—sadly—dropped it. I loved doing Anki, because it made me look/feel smart. But, I think I got more out of writing daily for a month, than doing Anki for a year.
Oh, no way. Hard disagree for me. But Anki is almost exactly the same thing every day, and I struggle with tasks like that (cleaning, for example).
I found writing every day many times easier, even though I hacked around with Anki to make it as addictive as I possibly could. Anki was always something I was sliding off. If I missed a few days, it’d take 30m-1h to catch up, and knowing that I had other, less repetitive work to do made it hard to do Anki.
Also, I had cards that were like “Who is this k-pop person?” b/c I wanted fun cards in there to keep me motivated, and because I wanted to impress my gf by knowing k-pop people’s names. But that gave me an aversion to doing Anki in public—since people seeing me look at pictures of k-pop people on my phone felt embarrassing. But with writing, I was very willing to do it anywhere and everywhere.
Spaced repetition is good for training declarative (explicit) memory, but procedural (implicit) memory is the more important kind of memory—in science, engineering and life in general.
We can write software to train procedural memory. To learn to touch type for example, most people use an application for that purpose. But you cannot add cards to typing-training software that would make the software useful for learning something other than touch typing: you would need to write an entirely different application.
There isn’t software for training most of the procedural skills people try to acquire because of the sheer variety of those skills, because developing software is expensive, because procedural knowledge is acquired automatically in the process of doing things, and because most people are intrinsically skilled at choosing things to do (e.g., things just beyond their current ability) that maximize their learning rate without the aid of software.
Actually, let me qualify that last statement as applying only to the situation before about 2024: since then I wouldn’t claim it impossible for someone somewhere to have used AI to create software that can be used to train a large fraction of all of the skills people want to acquire.
so I’ve been using spaced repetition for ~1.5 years now, in my CS degree and in life in general. it’s a great solution for lots of problems: no relearning what I understood in class two weeks ago; a cheap way to incrementally refine my understanding over time on topics I haven’t worked on directly recently (which is most of them); a slower rate of forgetting; etc. but using spaced repetition this way requires learning how to do so; it’s not like language learning, where you can download Anki and a premade deck and you’re good to go. there are also practical problems:
why are there no great scientists who are inhumanly good at remembering the corpus of their field through incremental reading?
my current prompt-writing style is kind of similar to incremental reading, in the sense that I prioritize cloze cards written quickly over well-written Q/A cards. but even writing in this way is costly. for a 3h lecture I will write ~50-200 cards. even when writing as quickly as possible, almost automatically turning notes into cards, it takes ~30m. then the next morning it takes ~30m to revise them + ~15m to revise the due cards. and that’s just one lecture. then there is the spread-out cost of actually integrating that knowledge with other things I know, updating other understandings, etc. I don’t have any estimate for the time that takes, but it’s definitely not free. so I doubt that it’s possible to actually remember the entirety of a field. there’s just too much information, and you need to chew on it to be able to make use of it.
why are there no insanely good engineers who know every detail of their entire stack through spaced repetition?
similar story to the one above. plus I find that little idiosyncratic details which don’t fit neatly into a wider picture are the most difficult to remember, thereby being the most time-consuming.
all that is to say that spaced repetition makes learning more efficient by solving some memory-related problems, but not so much more efficient that you can just download an entire field into your brain in a matter of months.
i’m going to rerun the neurips agi experiment this year. place your bets on what fraction of people at neurips this year know what the acronym AGI stands for!
where are all the people trying to understand how the world works? (in a broad sense that is useful for understanding the trajectory of the world: e.g. things like why society is the way it is, why people behave the way they do, why technology has developed the way it has, etc.; as opposed to zooming in and specializing, e.g. fundamental physics research or biomed or whatever) there are a bunch of people like this in the rationalist sphere but i’m curious where all the non-rationalist-adjacent such people are. it seems many people in the broader world are either uncurious or mindkilled on such questions.
Seeing Like a State by James C Scott (I’ve read most of it, I liked it)
Bullshit Jobs, The Dawn of Everything, most books by David Graeber (I’ve read and liked long extracts of his work)
The End: Hitler’s Germany 1944–45 by Sir Ian Kershaw (I’ve read all of it and found it very valuable as a complete picture of a society melting down)
Open Letters by Vaclav Havel (I’ve read a lot of it, I like it a lot. He was the last president of Czechoslovakia, the first president of the Czech Republic, and a famous dissident under communism, and his writing sketches out both what he finds soul-destroying about that system and what he thinks are the principles of good societies)
System Effects: Complexity in Political and Social Life by Robert Jervis (I’m reading this now, very good case studies about non-obvious phenomena in international relations)
Broken Code: Inside Facebook and the fight to expose its toxic secrets by Jeff Horwitz (Very good book about how social media platforms like Facebook shape and are shaped by modern civilisation, I read all of it)
All of these books to various degrees tackle the things you are describing from a holistic perspective. Hope this helps.
thanks for the recommendations, I’ll add these to my reading list!
I’m also curious if there is a location or social cluster or something where there are a lot of people who read stuff like this and talk about it productively and come up with new ideas. (again, other than the rationalist/ratadj/bay area community—I’m stipulating this because I think the ratsphere and bay area as a whole are a bubble with a lot of other correlated beliefs)
I suppose that sociologists, historians, philosophers, and (especially) futurologists do tackle the questions you describe, though maybe there is a sense in which they aren’t doing so in a zoomed-out enough way.
Vaclav Smil is great on this, I really liked his book Growth. He takes a very numerate but still very different view on history (e.g., fitting a sigmoid to GDP numbers in the book).
(I don’t know if the book is good but my knee jerk reaction to fitting sigmoids to things is it’s a bit spooky—see https://arxiv.org/abs/2109.08065)
Fwiw, I read a number of Smil’s books, and it was my impression that he strongly expressed that same opinion about sigmoids, and the mentioned example might have been precisely an attempt to illustrate how you can show anything by fitting the right sigmoid. (But it’s been a while since I read his books)
i want to host a conference which is kind of a cross between an unconference and a hackathon. the goal would be to create an environment where people can spontaneously do random side projects.
there is mostly just a big pleasant space where people can mix around and chat.
anyone can at any time decide to host a session on whatever they want. there’s a discord server where they can announce their session, and big whiteboards.
there is a dedicated work area where the goal is for roughly half of all people to be working at any one time. the rules of the work area are: no working on your normal job.
there is an army of ops and other logistics people available on call to solve any annoying logistical issues.
there is a big pile of random hardware around so you can make physical objects easily.
unlike a hackathon, there is no judging, no winners, no competition, no rigidly defined teams. you can float around and do whatever the fuck you want.
no conference app. if you want to do a 1:1 you have to walk up to the person and ask them to chat.
in the same way that Minecraft teaches you to exercise agency and Factorio teaches you to optimize, are there any games that teach you to stare into the abyss? the ideal game would (a) reward you on a tight feedback loop for constantly admitting that you were wrong, (b) give you the option to not admit that you were wrong but make that decision acutely hurt. pastcasting is good for (a) but not good for (b) because you are sort of forced to confront being wrong all the time, which maybe teaches you that it doesn’t feel as bad as you might expect, but it doesn’t teach you to intentionally seek out things that could prove you wrong; and you don’t really have time to develop an attachment to your wrong ideas. most normal games reward you for staring into the abyss very indirectly because being good at intentional practice makes you do better over the very long run, but you don’t get immediate feedback loops for it, and so it’s easy to just not realize you could be doing a lot better.
Chess. Mistakes in chess usually become noticeable quickly, in just a move or two, and you have no RNG or teammates to blame them on. But to get better you have to acknowledge your mistakes and avoid making the same mistakes again.
i think the problem is that the feedback loop is too long—if you notice a mistake, there is no obvious action, and no immediate feeling of having improved. what you really want is something where you can choose whether or not to notice that you are making a mistake, and choosing to notice gives you immediate positive reinforcement.
How about math olympiads? They do reward you for solving complex problems and require you to admit that your first conjectures were hopelessly wrong (unless, of course, you happened to get them right. Alas, this might come with practice faster than the habit of staring into the abyss)
I mean, it only gets to the stage of staring into the abyss when you spend 1h+ on one hypothesis, get nothing, are getting desperate, and are attached to your idea for a proof of A but realize it’s probably ¬A. Mostly how it works is you collect observations, then form hypotheses and test a few of those, and mostly you quickly realize what works and what doesn’t. And if I’m stuck and keep doing one thing, it’s because I tried many times to invent something better but couldn’t. It’s a really, really difficult thing to pull yourself out of this “mode collapse” where you’re banging your head against the wall where there’s clearly a wall, but it’s a different skill from staring into the abyss because 1) it’s easy to notice your approach is lacking something but 2) “not making the mistake anymore” is blocked not by psychology but by g factor or something.
For the thing you’re interested in, how important is the “game” part? (Minecraft and Factorio are both particularly excellent games with rich depth, in a way that pastcasting is not particularly)
The hardest part of “stare into the abyss” is that it’s often about stuff that you’ve wrapped your identity around in a psychologically load-bearing way. When I hear “the Minecraft of staring into the abyss”, I’m imagining something that gets you invested in an overall direction in a complex world that turns out to be the wrong direction, and then gives you the opportunity to change course on your goal.
I think my Planmaking & “Baba is You” exercise is at least related. (In this variant, your instruction is to form a complete plan for solving a Baba is You level on your first try. This gives people a lot of opportunity to get invested in a set of assumptions and keep building on them. People are usually quite overconfident in a way that feels a lot more “gut punchy” than other calibration training)
by game i mean it in a very very loose sense. video games, board games, card games, sports games, strange workshop activities, etc all count.
for the identity load-bearingness, it seems possible you could create it on a short time horizon. for example, even just arguing about something for 10 minutes can make me feel somewhat invested in my position. having teams in general can create some level of this. i feel like if you stacked a bunch of different psychological tricks you could kind of approximate it. even just getting used to the meta has this: i often find that i stagnate in a game because i learned some suboptimal meta, but i feel some emotional avoidance towards learning better meta because the displeasure of losing is less than the displeasure of learning the new meta; and the feedback loop of winning slightly more often from better meta is not very easily felt.
possibly you can design a game where you constantly have to accept better meta to even progress at all through the game. similar to how it is almost impossible to play Factorio without automation even though it’s technically possible.
i think it’s undesirable to have a game with one big twist that you build up to. for feedback loop reasons you want to have to do it over and over again and consistently get reward when you gaze into the abyss and not get reward when you don’t.
Calibration games such as https://www.quantifiedintuitions.org/? (a) You can choose to be wrong/overconfident, or you can acknowledge you don’t know when you don’t know. Acknowledging is rewarded. (b) The game pushes you to try to be overconfident by making you want to be top 1 (beat other teams). And it hurts to see your ranking if you are failing.
Not sure I buy the premise that (a) is needed or even good? I mean, part of abysses is that they don’t offer immediate feedback. What about a video game where everything is basically one-shot? You can spend as long as you want preparing, including gathering resources and doing science to the environment, and then you get one big shot; if it goes well you win, if not you lose and lose all your progress.
maybe if you were trying to make a game to teach the feeling of having one try to solve alignment, sure. but that’s not the game i want here.
if you want to get better at anything, including gazing into the abyss, then you want to get as many quality reps as possible in a fixed amount of time. a rep is higher quality if the feedback loop is tighter, and if the abyss is more painful to gaze into. if we had mind reading tech, what you’d want is to prompt the user to reflect on things that are emotionally painful, detect the moment they push past the resistance to confront the emotion, and dump nicotine into their bloodstream 3 milliseconds later. unfortunately, we don’t have this technology, so we need some other way to do this.
I’m saying that the bottleneck isn’t getting the feedback really fast, it’s having abysses to stare into at all. So my proposal is aimed at generating lots of abysses at all.
an idea: a game where there are several distinct but mutually exclusive strategies (eg a shooter where you can be a sniper, or a bullet sprayer, or a tank, etc), where you have to invest a bunch of time into specializing, and then you feel sunk cost about switching to a different strategy; but make the environmental conditions constantly change (in subtle or hard to reason ways so you have to spend a bunch of effort to notice things changing / there is plausible deniability as to whether things changed or whether you were always suboptimal), so that the optimal strategy changes frequently; and make there be strong diminishing returns to further investment in a strategy, which simultaneously makes the sunk costs feel bigger, and makes the initial gains from switching strategies feel very large so when you switch strategies you very quickly start winning.
I wonder if there’s a question-asking game, preferably one-on-one, that would encourage this? Something akin to NYT’s 36 questions to make anyone fall in love, but instead 36 questions to stare into the abyss. Getting the right interlocutor and the right questions would be hard to do, though.
It’s not a game, but it is a structured activity.
I’m skeptical that you can really get the abyss in small doses. Maybe there’s also a progressive activity where the first exercises are small things to admit about oneself, before progressing to more and more difficult questions.
If you’ve ever had a long match of Go where you are losing from midgame onwards, you will feel quite a lot of these emotions. Go games can last for quite some time, and the fractal nature of your mistakes can be realised to a fairly high resolution. Especially if your opponent is higher rank than you so you are playing with a handicap (“but I had so much ground at the start?!! How did it go so wrong?!?!?!”)
Rain World is a survival platformer whose protagonist is a nimble omnivore tool-user (a niche similar to an ancestral human’s). The prospect of exploration is enticing, but you are in the middle of the food chain, and so must balance the need to survive with your own drive to explore. Your creature must:
evade predators,
find sufficient food/prey to hibernate,
take shelter before a lethal rainstorm arrives.
Exploring means doing the above in less time. Regions are gated based on minimum survival streak so each sortie is like a bet on your ability. There are carnivorous plants. It is difficult and stressful. I highly highly recommend it.
my life would often be better if I exercised more agency. why don’t I do so more often? here is a taxonomy of reasons I’ve noticed:
energy: I’m often very fatigued, which makes it much harder for me to do anything, which includes anything new.
decision fatigue: a related thing is even for a given amount of energy, I have a limited number of decisions I can make, and a limited number of things I can focus on and think carefully about.
emotional avoidance: sometimes, exercising agency requires admitting that I’ve been doing things wrong all this time, or that part of my identity is not what I want it to be, or confronting some past trauma. sometimes I identify as being bad at X, which makes it hard to improve at X.
conformism: there’s a critic inside my head that yells at me when I do or even consider things that could be considered “crazy” by others. I ignore it more than most people, but it still has nonzero say.
uncreativity: in certain domains I’ve spent so much time thinking about a specific kind of idea that it becomes genuinely hard for me to even imagine other ideas.
cowardice: sometimes the ideas are obvious but they require large irreversible actions, and/or are likely to have unpredictable consequences.
I have an instrument that helps me mitigate decision fatigue, lack of energy, and conformism. I created an updatable random generator with a weighted list of all the ideas and activities that cross my mind. I exercise agency at the stage of “designing” how my free time is spent, setting probability weights and side goals. Then I can circumvent the fatigue of deciding what to do next, because I can click “generate” and see the option. And since clicking the generate button is a short action, a habit of actually going and executing the option can be formed.
The min-maxing trap doesn’t happen. I am only allowed to make one edit per finished session, and that edit can only be an increment by 1 or a decrement by 1 of some parameter in the generator, which takes ~15 seconds. If my priorities change, the generator will eventually converge to the new state through the increments (easing in over a couple of days). That prevents me from “being hyped” and going “all in” on some new exciting project. The new project will gain weight only if it keeps looking worthy.
I may adjust at the end of a session if I feel that something should happen sooner/more often, that something was prompted too often over the past week, or that the session length was inappropriate for making progress that sticks, etc.
I put in anything I want to do eventually. That includes “work on the publication”, “work on fanfic”, “make the geolocation script”, “update my transformer”, “play a match of king of the hill chess”, “calisthenics”, “solve project euler problem”, “go to the cinema and watch avatar 3”. Both fun and serious stuff.
I only generate an activity when there is a moment free of scheduled/obligatory activities; this way it never interferes with life, even if fun activities start randomly appearing more often.
The generator is implemented in Google Sheets using its in-cell functions, and is thus accessible on both my PC and phone. At some point I added a column of calculated expectations, “what % of time is expected to be spent on the activity if the generator did not change parameters over a long run”, but it was distracting and not exactly meaningful, since I change weights every day to reflect energy/mood/inspiration.
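The generator described above boils down to a weighted random draw plus a rule limiting edits to ±1 per finished session. Here is a minimal Python sketch of that logic, rather than the Google Sheets formulas actually used; the class and method names are hypothetical, and enforcing the one-edit rule with a flag is an assumption about how to model it. Under fixed weights, an activity’s expected share is its weight divided by the total weight. That is a share of sessions, which equals a share of time only if sessions are roughly equal in length.

```python
import random
from typing import Optional


class ActivityGenerator:
    """Weighted random picker with at most one +/-1 weight edit per session."""

    def __init__(self, weights: dict[str, int], rng: Optional[random.Random] = None):
        self.weights = dict(weights)
        self.rng = rng or random.Random()
        self._edit_allowed = False  # becomes True once a session has been generated

    def generate(self) -> str:
        """Draw one activity with probability proportional to its weight."""
        names = list(self.weights)
        pick = self.rng.choices(names, weights=[self.weights[n] for n in names], k=1)[0]
        self._edit_allowed = True  # the session that follows earns one edit
        return pick

    def adjust(self, name: str, delta: int) -> None:
        """Apply the single allowed edit: +1 or -1 to one weight (never below 0)."""
        if delta not in (-1, 1):
            raise ValueError("edits are limited to +1 or -1")
        if not self._edit_allowed:
            raise RuntimeError("only one edit per finished session")
        self.weights[name] = max(0, self.weights.get(name, 0) + delta)
        self._edit_allowed = False

    def expected_share(self) -> dict[str, float]:
        """Long-run fraction of sessions per activity if weights stayed fixed."""
        total = sum(self.weights.values())
        if total == 0:
            return {n: 0.0 for n in self.weights}
        return {n: w / total for n, w in self.weights.items()}


if __name__ == "__main__":
    gen = ActivityGenerator(
        {"work on the publication": 5, "calisthenics": 3, "solve project euler problem": 2}
    )
    print("next:", gen.generate())
    # after finishing the session, nudge one weight by 1
    gen.adjust("calisthenics", +1)
    print(gen.expected_share())
```

Because each edit moves a weight by only 1, a newly added project’s share grows gradually over many sessions, which mirrors the “easing in” behavior described above.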
Could you help me with something? How did you identify specific events or choices in your recent past where more agency was required?
I’m very confident that I would also benefit in life from exercising more agency. However, it’s a vague confidence; I can’t offer any evidence or specific examples. When I try to list situations where I could have acted with more agency, there’s no example where it’s clear that agency would probably have made things better, or at least had a better chance of a better outcome than of making things worse. I can’t think of any recent retrospective examples where the difference between what I got and what seems like a potentially better outcome was simply not using enough agency. (As opposed to “oh, I wish I knew that earlier”: I think there’s an important distinction to be made between hindsight that comes from knowing what you couldn’t have possibly known at the time, and not acting successfully on the information you had at the time.)
One example off the top of my head—I need to buy a book for study. Correct me if I’m wrong, but acting with more agency would have been buying it straight away. But I’ve procrastinated so far, and would my life be materially any different yet? No. Not at all. If I continue to procrastinate past a certain point, it will have negative consequences. But so far, none. Next week? Eh, even then it may be fine.
That’s the first, most immediate, pressing example that comes to mind.
idk exactly how, they just pop up to my mind easily. maybe because i am very aware of the things i’m disappointed about not having done. also, i can consult my todo list, which is effectively a list of things i will never do because i don’t have enough agency. like i’m going to set a timer for 10 minutes and write as many things as i can think of:
i’ve known that exercise is really important since forever but never really got around to doing it
i keep procrastinating scheduling meetings with people i should meet with
i often have entire months of very low productivity where it’s a slog to do anything and in the past it never occurred to me to take any of that time off even though i have way more PTO than i need
there are many instances where i finished doing a thing and then realized i should have started doing some other thing in parallel instead of waiting on it.
there will be multiple times a month where i will realize my life would be easier if i had planned ahead even a little bit. for example i will forget to book a flight until the last minute, at which point it’s harder and more expensive to get good seats, even though i knew my travel plans much earlier.
i will often do things the hard way for way too long, even after i know there could be a better way, because i’m too used to the old process and don’t want to expend the trivial amount of effort needed to check if there’s a better way. for example it took me an embarrassingly long time before i learned to use a debugger, because i was used to print-statement debugging
i often procrastinate things because there is a trivial amount of inconvenience required to do it. like it took me years to get around to various medical interventions which really only required me to do some googling and then talk to a doctor twice and whatever.
hopefully these examples help you think of similar things in your life
there seem to be three different possible levels of manager involvement with individual researchers:
type 1 (the grantmaker): the manager spends a day reading the grant application of the researcher and decides whether to fund him for the next 6 months based on his track record, whether the research idea makes sense, etc. day to day, the grantmaker is completely uninvolved in the actual research. in fact, the grantmaker might not even be making decisions about individual researchers, but about entire organizations.
type 2 (the research manager): the manager spends a few hours every month getting complete context on what each of their reports is working on. each report has a different project that the manager thinks makes sense. the manager provides a light touch to guide the project according to their research taste, and vetoes any terrible ideas, but most of the time the report makes most of the day-to-day decisions about what to do.
type 3 (the tech lead): the manager sets research taste for their team and basically orders around their minions. the manager provides almost all of the research taste, and the reports are either just doing engineering work, or doing very tightly scoped research on things the manager has cleared.
what I’ve seen at openai is a mix of all three, but I’ve found the second one to be the most predominant (my relationship with my managers has been like type 1.75 on average). high level research leaders are generally the first type; they allocate resources to parts of the org they believe to be the most important for the company as a whole. the second type is most people; most researchers have some random direction they believe in, and they are part of a team doing something similar, and their manager has substantial trust in their research taste. even for junior researchers, the focus is on having them develop their own taste eventually, rather than just executing on things for their manager. the third type is most common for large projects, where there is simply a lot of work to be done for the thing to work, but it’s pretty rare to have individual very good researchers amplified by an army of engineers doing their bidding or something like that.
I don’t know if this is an optimal way to set things up. and ML is a weird field among research disciplines. but it seems to work at least somewhat well, and may be of interest to attempts to understand how to do science well.
Type 2.25 is probably most common in the low-mid levels of academia. I see my supervisors officially around 2/month, but they’re around for me to talk to a few times a week.
Type 1 is common in higher levels, type 3 is common for lower levels. This overall seems pretty natural and I’d expect it’s a common feature of many orgs.
almost every single major ideology has some strawman that the general population commonly imagines when they think of the ideology. a major source of cohesion within the ideology comes from a shared feeling of injustice from being misunderstood.
There are some people that I’ve found to be very consistently thoughtful—when we disagree, the crux is often something interesting and often causes me to realize that I overlooked an important consideration. I respect people like this a lot, even if we disagree a lot. I think talking to people like this is a good antidote to digging yourself into a position.
On the other hand, there are some people I’ve talked to where I feel like the conversation always runs in circles so it’s impossible to pin down a crux, or they always retreat to increasingly deranged positions to avoid admitting being wrong, or they seem to constantly pattern match my argument to something vaguely similar instead of understanding my argument. I think arguing against people like this too much is actively harmful for your epistemics, because you’ll start digging yourself into your positions, and you’ll get used to thinking that everyone who disagrees with you is wrong. There are a bunch of people (most notably Eliezer) who seem to me to have gone too far down this path.
On the other side of the aisle, I don’t know exactly how to consistently become more thoughtful, but I think one good starting point is getting good at deeply understanding people’s viewpoints.
The people who understand the proper interpretation of the ideology can feel intellectually superior to those who don’t. Also, people who misunderstand something are by definition wrong… and therefore the people who understand the ideology correctly must—quite logically—be right!
(An equivocation between “be right about what is the correct interpretation of the ideology” and “be right about whether the ideology correctly describes the reality”.)
irritating people when discussing the topic is a great way to get someone to dig into a position really hard (whether or not that position is correct).
That irritation can be performed any way you like. The most common is insinuating that they’re stupid, but making invalid meme arguments and otherwise misunderstanding the position or arguments for the position will serve quite well, too.
I think this follows from the strength and insidious nature of motivated reasoning. It’s often mistaken for confirmation bias, but it’s actually a much more important effect because it drives polarization in public discussion.
I’ve been meaning to write a post about this, but doing it justice would take too much time. I think I need to just write a brief incomplete one.
I don’t think being irritating in general is enough. I think it’s specifically the feeling that everyone who has disagreed with you has been wrong about their disagreement that creates a very powerful sense of feeling like you must be onto something.
Really!? Okay, I’ll have to really present the argument when I write that post.
I do agree with your logic for why opponents misunderstanding the argument would make people sure they’re right, by general association. It’s a separate factor from the irritation, so I think I misstated it as a subset (although part of it seems to be; it’s irritating to have people repeatedly mischaracterize your position).
It seems pretty apparent to me when I watch people have discussions/arguments that their irritation/anger makes them dig in on their position. It seems to follow from evolutionary psychology: if you make me angry, my brain reacts like we’re in a fight. I now want to win that fight, so I need to prove you wrong. Believing any of your arguments or understating mine would lead to losing the fight I feel I’m in.
This isn’t usually how motivated reasoning is discussed, so I guess it does really take some careful explanation. It seems intuitive and obvious to me after holding this theory for years, but that could be my own motivated reasoning...
Unfortunately, I think the average person doesn’t understand misunderstanding. I think it can be taken as…
You’re too dumb to understand
You’re being purposely ignorant
You’re making fun of them
I’ll give an example:
—
I was recently in a conversation with a non-rationalist. I organised it because I wanted to talk about negative opinions they had on me. We talked a lot about certain scenarios.
In one scenario, I had seemingly rejected the suggestion that we all go bowling. I had said out loud “I hate bowling”. When what I meant was “I hate bowling, but I’m still very happy to sit in a bowling alley and watch my friends play.”
I think I did a bad job communicating there. It made my friends very angry (extra details about situation left out).
During our conversation, I asked them to rate their anger or annoyance at me before and after I had explained what I had meant to say.
I was surprised when one friend didn’t adjust their anger levels at all. I thought I must have done another bad job at explaining.
“So, you started at 80% angry at me. And now that I’ve told you my perspective, you’re still 80% angry?” This surprised me. I would adjust my levels down if someone explained that to me.
I went back and forth trying to get to the bottom of this for ~half an hour, after which I came to realise we were just wired very differently. To do this I used your suggested technique.
In the time it took for me to understand this one point, I had deeply annoyed my friend. They were under the impression that I was misunderstanding them on purpose somehow.
I think I would have been less comfortable or fulfilled, but better off, if I had just accepted that they were still very angry and moved on.
Instead, being confused and asking questions made my situation worse.
To be clear, though, I did get to the truth with this technique. But sometimes winning can’t be about knowing the truth, which is sad. I don’t like that, but I think it is true.
It is unfortunately impossible for me to know exactly what happened during this interaction. I will say that the specific tone you use matters a huge amount—for example, if you ask to understand why someone is upset about your actions, the exact same words will be much better received if you do it in a tone of contrition and wanting to improve, and it will be received very poorly if you do it in a tone that implies the other person is being unreasonable in being upset. From the very limited information I have, my guess is you probably often say things in a tone that’s not interpreted the way you intended.
I’d say that’s a good guess given the information I provided.
I think I did a good job in this particular circumstance of coming off as confused or curious. That was my aim, and I placed a lot of focus there.
However, I haven’t listened back to the audio recordings of the conversation. It’s likely my previous comment is heavily biased.
hypothesis: intellectual progress mostly happens when bubbles of non-tribalism can exist. this is hard to safeguard because tribalism is a powerful strategy, and therefore insulating these bubbles is hard. perhaps it is possible for there to exist a monopoly on tribalism to make non-tribal intellectual progress happen, in the same way a monopoly on violence makes it possible to engage in economically valuable trade without fear of violence.
You’d want there to be a Tribe, or perhaps two or more Tribes, that aggressively detect and smack down any tribalism that isn’t their own. It needs to be the case that e.g. when some academic field starts splintering into groups that stereotype and despise each other, or when people involved in the decision whether to X stop changing their minds frequently and start forming relatively static ‘camps,’ the main Tribe(s) notice this and squash it somehow.
And/or maybe arrange things so it never happens in the first place.
I wonder if this sorta happens sometimes when there is an Official Religion?
another way to lean really hard into the analogy: you could have a Tribe which has a constitution/laws that dictate what kinds of argument are ok and which aren’t, has a legislative branch that constantly thinks about what kinds of arguments are non-truthseeking and should be prohibited, a judicial branch that adjudicates whether particular arguments were truthseeking by the law, and has the monopoly on tribalism in that it is the only entity that can legitimately silence people’s arguments or (akin to exile) demand that someone be ostracized. there would also be foreign relations/military (defending the continued existence of the Tribe against all the other tribes out there, many of which will attempt to destroy the Tribe via very non-truthseeking means)
unfortunately this is pretty hard to implement. free speech/democracy is a very strong baseline but still insufficient. the key property we want is a system where true things systematically win over false things (even when the false things appeal to people’s biases), and it is sufficiently reliable at doing so and therefore intellectually legitimate that participants are willing to accept the outcome of the process even when it disagrees with what they started with. perhaps there is some kind of debate protocol that would make this feasible?
prediction markets have two major issues for this use case. one is that prediction markets can only tell you whether people have been calibrated in the past, which is useful signal and filters out pundits but isn’t very highly reliable for out of distribution questions (for example, ai x-risk). the other is that they don’t really help much with the case where all the necessary information is already available but it is unclear what conclusion to draw from the evidence (and where having the right deliberative process to make sure the truth comes out at the end is the cat-belling problem). prediction markets can only “pull information from the future” so to speak.
BTW, I like the “monopoly on violence” analogy. We can extend it to include verbal violence—you can have an environment where it is okay to yell at people for being idiots, or you can have an environment where it is okay to yell at people for being politically incorrect. Both will shape the intellectual development in certain directions.
Conflicts arise when you don’t have a monopoly, so sometimes people get yelled at for being idiots, other times for being politically incorrect, and then you have endless “wars” about whether we should or shouldn’t study a politically sensitive topic X with an open mind, both sides complaining about lack of progress (from their perspective).
The more mutually contradictory constraints you have, the more people will choose the strategy “let’s not do anything unusual”, because it is too likely to screw up according to some of the metrics and get yelled at.
hypothesis: there will be a window of time after the point of superhuman AI persuasion/charisma, during which human trust relationships will become extremely important. even once almost all human skills are obsolete, the AIs may have less social trust capital than humans. ofc, eventually, the persuasion will be so superhuman that it can cut through minds like butter, but that could take many years.
once AI superpersuasion is possible, there will be a strong incentive to use it to shape opinions. therefore, there will also be a strong incentive for important decision makers to develop strategies for making sure they are not being bamboozled.
thankfully, there is one way to not get bamboozled by a superpersuader—to not listen to it in the first place. this is an age-old idea. many social strategies have evolved to help people avoid talking to other superpersuading people. for example:
one main limiter of the growth of cults is that people get used to noticing the signs: they avoid people who seem culty, get suspicious of any rapidly growing movement, etc
people avoid preachers, political activists, etc
people have learned that AI chatbots are sycophantic and discount their praise
people mostly know that twitter and reddit are full of astroturfed bots
it seems likely that society would develop mechanisms in response to superpersuasion.
first, conversations with people you’ve actually met irl or can indirectly trust to be real people would be much more trustworthy than conversations with random internet people, who might secretly be AIs.
people you’ve known for a long time are more trustworthy than people you’ve just met irl. you’re better able to judge whether a person you know irl has suddenly changed their views than someone you just met, or to know them to be the kind of person who honestly reports how much they talk to the AIs.
meetings with people irl are more trustworthy than online interactions, because someone could always give the AI a deepfake of themself.
even if you can be sure you’re talking to a real human, there is still the possibility that some people you know well will choose to become appendages of the AI, subtly taking orders from the AI and concealing this fact. there is a spectrum of possibilities here.
one extreme is like Scott Alexander’s fictional whispering earring—an AI that puppets every move of yours, using you as a meat puppet. however, we can defeat this quite easily—you can take meetings inside a faraday cage, or force people to not have any headphones and to leave their phone outside the room, or to pat them down for listening devices. this may sound extreme, but this is actually not that intense all things considered for the actually important decision making.
the other extreme is an AI that only gives you brief guidance on what to do every few months and you’re on your own in between. the key thing is, the vast majority of humans will not be as charismatic as the AI. so, if the AI is not puppeting you, it forfeits a lot of the charisma benefit; even if it converts you, you might suck at convincing other people, and they might notice the fact that an AI is trying to steer you or coach you on how to be more charismatic—again, much easier for others to notice if you have a lot of connections who know you well.
our current political system is already kind of like this, because politicians are constantly exposed to an infinite array of people trying to change their minds on things, or get the ear of their trusted advisors, or their advisor’s advisors, etc. AI makes things a lot more intense, but also it introduces a natural pinch point to remove bad influence.
I’m not sure I understand your model of individual autonomy and variance in susceptibility to different kinds of persuasion. Listening to the wrong humans is at least as bad as listening to an AI. And listening to nobody isn’t really an option in modern society.
as an example of my confusion/disagreement:
people you’ve known for a long time are more trustworthy than people you’ve just met irl.
I mean, this depends entirely on which people you’ve known, and who you’re meeting now. Someone from a very dysfunctional childhood meeting a social worker is in the opposite situation.
computer-domination could mean control of everything that online people see, including control and impersonation of other users (so even if you know someone irl, if they’re not physically near you, a powerful AI can act like them)
of course, many people willingly like and enjoy talking to AI, some even see past the annoying “AI” style quirks while liking it
already living in an atomized, community-starved culture, these people are also more likely to be ‘paid from AI’ gladly
as for the idea that computer-domination wouldn’t immediately lead to humanoid-ish or physical-world manipulation: it’s hard to imagine a superhuman AI not ruthlessly testing physical-world manipulation as the next place to master (and then it seems this scenario would likely no longer be relevant)
i feel like the fundamental mistake the project of rationality made was that “cognitive biases” is not in practice the right way to think about the way humans are irrational if your goal is to be very instrumentally rational. one hypothesis is the correct frame is to first deeply understand how the emotional system works, and then to think about ways to master that system to achieve rationality.
(yes, i know that buried somewhere in the sequences it says something like “humans aren’t ideal intelligences with cognitive biases bolted on. we are the cognitive biases, they are just trying to approximate rationality”.)
By the time I went to CFAR in 2019 this felt like it had already become the dominant flavor of inner-circle rationalist thinking, but then that inner circle kind of petered out in influence. The person I see carrying that torch most loudly in my current social atmosphere is Chris Lakin.
But overall rationality has been kind of quiescent imo! Ray posts good stuff, Duncan has his own thing, but it feels like we went from mid-2010s “rationalists talk a big game but don’t get anything done” to the mid-2020s most influential rationalists being too object-level busy to blog much about this metacognitive stuff.
If we understand “irrational” to mean something like “underperforming relative to what should be feasible”, then I think one significant piece of the puzzle is the regime of “very impoverished hypothesis spaces”. In this regime, Bayes (and deviations from it) is more of a peripheral conceptual frame (allowing you to understand some edge cases and some basics and some constraints) than a central guide or even that much of a useful tool. A much more important question is about hypothesis generation, aka abduction or “unupdating”, i.e. expanding your hypothesis space. Another piece of the puzzle is that hypotheses are nothing like full possible worlds (as in many elementary models of epistemology), but rather are very-partial-possible-world-parts. Yet another piece of the puzzle is that concepts are very much not only or even mainly about prediction, propositions, and explanation (narrowly construed), but rather mainly about manipulation (including mental manipulation, e.g. “how could I have thought that faster or more efficiently”). Understanding how to think well in this regime, specifically in cases where you don’t already have all / almost all of the understanding you need to win in some situation (which cases are of outsized importance in the grand scheme of things), is heavily weighted on “how do I think new things”, which is of a different flavor from “how do I think the probabilistically-right things”. (Which isn’t to say that reality-testing isn’t crucial—it is—but it has more of a flavor of searching for high confidence data and beliefs, rather than trying to update efficiently. It’s more important to build a camera / telescope than to practice your night vision. For Wisdom, it’s much more important that you are able to eventually update on all dimensions, rather than that you do update quickly on some dimensions.)
I think a big part of rationality is having all the algorithms in your brain “point in the same direction”. E.g. akrasia is a special case of this, where short-term “habits” and consequentialist reasoning are in conflict.
an idea that i associate with bay area strains of buddhism is something like “life is just a series of distractions, you are distracted from distraction only by even more distractions, and it’s all because you are experiencing suffering that is too uncomfortable to focus on, so if you somehow dispel all of it you are confronted with the existential dread of the impermanence of life, and dispelling that is the final boss.” i might be completely misunderstanding something, I’m not a Buddhism expert by any means, please correct me if I’m wrong.
I’ve updated a lot towards something like this being at least kind of true. it seems like at least for a certain neurotype of person, much of one’s behavior (ambition, addiction, hedonism, status, socialization, etc) serves as a way to distract from some kind of emotional pain. it looks different in different people; sometimes it looks like working a zillion hours so you have no time to reflect; or making status/money go up for its own sake; or drinking heavily; or playing video games; or spending lots of time at social events. the commonality is escaping the experience of the present in some way. in much the same way that you will flinch when on the threshold of a cold shower, because the anticipation of the discomfort tries to push you away from it, you flinch away from emotional discomfort. generally, confronting the pain is better than letting it fester.
of course, one very big source of emotional pain for many people is the impermanence of life and the associated existential dread. the idea of the inevitability of death is very spooky. contrary to the standard story though, i’d guess there are also many other sources of emotional pain, which vary by person. maybe if i introspect deeper i will discover that everything is secretly about impermanence deep down, but idk.
it also seems clear that some people are a lot more avoidant/distraction-prone than other people. at the most extreme is people with the ADHD neurotype. in some way, we can think of ADHD as emotional dysregulation that causes constant pain, which can be escaped primarily by distraction through entering flow state (which allows you to stop experiencing the self) or constantly fixating on different things to stay busy and avoid having to confront the emotional pain. this also explains stimming (self-soothing) and procrastination (some tasks have negative emotions associated with them, so they are avoided). I’m not sure if forgetfulness can be explained by this theory, though. on the other end, some (most?) people are naturally very good at living in the moment, are emotionally in tune, and generally do not exhibit any of the ADHD traits. i have no idea if any of this applies to normal people. my guess is it’s still applicable overall but less so, but also that there are a small number of exceptions, forms of emotional pain that normal people are actually more susceptible to than ADHD/autistic people. as weak evidence, even non Buddhist literature sometimes likes to make fun of normal characters for things like this.
also, i still think some parts of this are utterly crazy. the part i most disagree with is the part where you’re supposed to dissolve the pain of impermanence and fear of death. maybe many other fears should be dissolved, but it is good that impermanence is uncomfortable! you should be afraid of death and fill the void of existential dread with an ambition to end death forever.
the part where there is no such thing as pleasure, just distractions and the absence of suffering, also seems kind of crazy to me.
my overall stance on how to relate to things like Buddhism epistemically is I’d guess they contain many nuggets of wisdom but also i should remain deeply skeptical of the specific ways they’ve put the ideas together
the part i most disagree with is the part where you’re supposed to dissolve the pain of impermanence and fear of death. maybe many other fears should be dissolved, but it is good that impermanence is uncomfortable! you should be afraid of death and fill the void of existential dread with an ambition to end death forever.
You can work to end something without being afraid of it or finding it uncomfortable. A programmer looking for the cause of a bug in software is usually not afraid of the bug. If they did keep flinching away from the thought of the bug and finding the whole debugging thing uncomfortable, they’d do a worse rather than a better job at debugging.
If you want to end death, you’ll do a better job at it if you can think about it clearly and not flinch away from considering things suggesting you personally might not make it. This requires not being afraid of it.
“but it is good that impermanence is uncomfortable! you should be afraid of death and fill the void of existential dread with an ambition to end death forever.”
Are pain and fear the reasons you want an end to death? If not, their absence shouldn’t hinder you?
the part where there is no such thing as pleasure, just distractions and the absence of suffering, also seems kind of crazy to me.
This is an incorrect description of the Buddhist position. Pleasure traditionally plays a really important role in Buddhist worldviews and practices! The first three jhanas have pleasure (sukha) as one of their defining factors, and it’s also part of the definition of one of the brahmaviharas, mudita (sympathetic joy).
I thought the idea was that pleasure is not actually the opposite of suffering, in fact it’s only a distraction from suffering, and the only true solution to suffering is to stop desire?
One claim is that tanha (“desire” is a bad translation) is the thing that creates suffering, and that tanha comes up as resistance to pain and clinging to pleasure.
It seems reasonable to me to claim that there at least sometimes is clinging to pleasure. If I’m reading a really good book I might not want it to end, and if I’m sleepy in bed, I might want to enjoy the sleepiness for just a few more minutes before getting up.
There’s also another claim that I don’t fully understand and I’m not sure if I know how to describe correctly, but it’s something like… “At a low level of implementation if you zoom in closely enough, the relationship of pain and pleasure isn’t that of opposites in the way you’d expect if you only looked at a coarser level.”
As I said, I don’t totally get that one, but I don’t think I need to. It sounds like a technical kind of observation that will make more sense to me if I ever get that far in meditation, and until then I can just ignore it.
That said, I think it can be true that pain and pleasure are not opposites on a low level, while still feeling like opposites when experienced on a higher level. It’d be like saying that on a file system level, both a text file and a video are just collections of bits, while still behaving very differently if you try to open them on the application level.
With regard to “the only true solution to suffering is to stop tanha”, one thing that might be meant by it is “even when you are experiencing pleasure, there’s still some degree of suffering present”.
At least personally, I do often experience at least some degree of clinging to pleasure, so I think there’s some truth to this. Though of course, it doesn’t mean that there would be an equal amount of suffering present when I’m in pain vs. when I’m experiencing pleasure. The degree of subtle clinging that’s present with pleasure is much less than the suffering associated with resisting pain.
Though on occasion I’ve lucked into states where there’s much less clinging than usual to the pleasure, and it has felt pretty awesome! Can recommend. The lack of clinging lets you enjoy the pleasure much more fully.
But let’s be skeptical of this and assume that pleasure isn’t associated with any degree of suffering. Another thing that one might mean by “the only true solution to suffering is to stop tanha” is “you can’t experience only pleasure all the time, you’re going to have all kinds of periods in life when you feel terrible or at least uncomfortable, so the only way to stop suffering entirely is to stop feeling averse to pain and discomfort”. That seems straightforwardly true to me.
is clinging to pleasure isomorphic to emotional avoidance? the “resistance to pain” framing is more relatable than clinging to pleasure.
i think emotional avoidance is obviously bad. but it’s unclear that clinging to good experiences is bad? like you can obviously overdo it, but it seems much more of a problem if you’re clinging to something you’re no longer enjoying because letting go would mean having to confront unpleasant realities, vs clinging to something you’re enjoying a lot. if I’m enjoying a cookie, it’s normal and correct for me to resist if someone is trying to tear it out of my hands.
It’s often not the worst thing in the world, but the clinging introduces a low-level resistance to the pleasure that actually makes the pleasure less pleasant.
It also brings in various kinds of discomfort—e.g. someone who sees a tasty food they can’t eat, or an attractive person who isn’t into them, may experience a flash of pleasure from the sight of those and then make themselves uncomfortable by clinging to that experience and repeatedly thinking about what they can’t get.
If you are, say, at work and clinging to the thought of how nice it would feel to go home, your focus being on that future pleasure may prevent you from noticing things that you could enjoy in the current moment (it actively blocks them, as the clinging crowds the pleasant sensory moments out of your mind). Someone may find themselves repeatedly waiting for the end of the day when at work, repeatedly waiting for the weekend when the day ends, and repeatedly waiting for the next vacation during weekends.
Often, in my experience, clinging seems to hijack attention and agency. It makes it harder to think, weigh considerations, and respond. You are more likely to flail, or stumble around, or to “find yourself” doing something rather than choosing to do it. And you’re more likely, as well, to become preoccupied by certain decisions — especially if both options involve things you’re clinging in relation to — or events. Indeed, clinging sometimes seems like it treats certain outcomes as “infinitely bad,” or at least bad enough that avoiding them is something like a hard constraint. This can cause consequent problems with reasoning about what costs to pay to avoid what risks.
Clinging is also, centrally, unpleasant. But it’s a particular type of unpleasant, which feels more like it grabs and restricts and distorts who you are than e.g. a headache.
Conversely, a shift from clinging to non-clinging involves a feeling of relief. Something unhooks, releases, expands. Your mind feels bigger, more open, more poised, more responsive. The situation you’re in may not have changed; but you are able to orient towards it with more agency, and to consider and accept your options, however imperfect, without flinching or closing down. It is, I think, a type of freedom.
I feel the “seems to hijack attention and agency” thing is particularly prominent when in the company of people I’m attracted to or whose favor I might otherwise want to get. It’s hard to think clearly when the mind keeps clinging to “what would get them to like me”.
Addictive behaviors also have clinging to pleasure as a major component—you think of how it would feel if you got to gamble/drink/whatever again, and then you can’t get the thought of that out of your head.
hmm, it’s not a priori obvious to me that these are all the same thing. the following feel distinct:
emotional avoidance. i know i should think about something, but the emotional pain of thinking about it pushes me away, and i find other things to distract myself from having to feel it, or procrastinate it, or make myself too busy to feel it, etc. i relate to the thing where indecision can arise from feeling emotionally avoidant of two different things i need to trade off between. addiction is often a way to distract myself from an unpleasant emotion.
longing. i want something i can’t have. i find myself compulsively thinking really hard about ways i can get it (often not super productively, my thinking is often very distorted in this mode). i don’t feel particularly avoidant of the feeling; if anything, it draws me in and distracts me from everything else. but it also feels inaccurate to say that everything else is causing me pain that i am trying to distract from (like with addiction), if anything, the longing is more painful, but i can’t look away.
other emotions mentioned feel like they’re maybe doing something else but it’s unclear. jealousy feels like longing plus some kind of tribal status thingy. anxiety is maybe longing for a kind of reassurance. the feeling that makes positive experiences worse because you’re scared that they will end is emotional avoidance.
i wonder if it even matters whether these are the same thing or different things. is there any difference in prediction if they are the same vs different things?
I think the underlying mechanisms for several of these are distinct, but there’s a neural mechanism that acts as a kind of shared bottleneck. Analogy: a word processor and a video player will get started up for different reasons, but they both make OS calls to read files from disk, and disrupting the file read operation will disrupt both.
In the case of clinging, I think the shared bottleneck is largely what Romeo described:
Tanha is more literally translated as ‘fused to’ or ‘welded to’. It immediately follows the mental moment that you zoom in with the attentional aperture on something. It could be that a flower or an item on the shelf at the supermarket captures your attention, or you turn your head to catch more detail as you pass by an accident on the road. Many hundreds of thousands of such events take place in the course of a single day. With most of them attention then relaxes and makes space for the next thing. But with some small proportion you find the mind doesn’t quite ‘unclench’ from the object or some aspect of the object. This tension aspect is why it is sometimes translated as ‘grasping’ which is closer. Imagine something you aren’t finished with being pulled out of your hand and you tensing your fingers to resist.
For emotional avoidance, my experience is that the thought of doing something will feel bad, and then my mind will seize upon the thought of doing something that would feel less bad. There’s an unpleasant email I should reply to and hmm, suddenly it feels hard to resist the thought of playing Slay the Spire. The thought of it clings to my mind even if I try to think about something else. Without clinging, it might be that the thought of playing StS would come up—maybe repeatedly—but it wouldn’t cling to the mind in the same way and keep pulling at my attention.
So the behavior itself is triggered by a subsystem trying to avoid discomfort by projecting the thought of something more pleasant into the workspace, but the clinging strengthens the effect and makes the discomfort persistent in a way it wouldn’t be without the clinging.
I think the mechanism might be something like, a subsystem sends the thought of playing StS to the global workspace, and then it’s emotionally charged in a way that will cause some separate subsystem to seize upon the signal and strengthen it so that it stays in the workspace. That means that on top of the avoidance-subsystem contributing signal strength to it in face of other sources of signal, the clinging gives the signal an extra boost. (Though that’s just speculation for what might explain the phenomenal experience, so this particular story turning out to be correct or wrong isn’t cruxy for me.)
For longing, there’s a similar thing—there’s some subsystem that sends up a thought of what one is longing for. That subsystem may be something entirely different than the emotional avoidance system. But once it sends its signal, clinging will amplify that signal and make it more persistent.
man vaccines are so fucking cool. it’s awesome that there are like a dozen horrifyingly painful and deadly diseases that i will almost certainly never get in my life. i wish i could get vaccinated against literally everything
There’s a statue of him in Los Angeles’s Little Tokyo which I used to pay respects to when I visited for the New Year’s festival. As I became an EA I would aspire to match or exceed his impact.
Sugihara continued to hand-write visas, reportedly spending 18 to 20 hours a day on them, producing a normal month’s worth of visas each day, until September 4, 1940, when he had to leave his post before the consulate was closed. Sugihara reportedly worked at a quick pace and aimed to issue 200 to 300 visas each day. [...]
According to witnesses, he was still writing visas while in transit from his hotel and after boarding the train at Kaunas railway station, throwing visas into the crowd of desperate refugees out of the train’s window even as the train pulled out.
theory: a large fraction of travel is because of mimetic desire (seeing other people travel and feeling fomo / keeping up with the joneses), signalling purposes (posting on IG, demonstrating socioeconomic status), or mental compartmentalization of leisure time (similar to how it’s really bad for your office and bedroom to be the same room).
this explains why in every tourist destination there are a whole bunch of very popular tourist traps that are in no way actually unique/comparatively-advantaged to the particular destination. for example: shopping, amusement parks, certain kinds of museums.
I used to agree with this but am now less certain that travel is mostly mimetic desire/signaling/compartmentalization (at least for myself and people I know, rather than more broadly).
I think “mental compartmentalization of leisure time” can be made broader. Being in novel environments is often pleasant/useful, even if you are not specifically seeking out unusual new cultures or experiences. And by traveling you are likely to be in many more novel environments even if you are a “boring traveler”. The benefit of this extends beyond compartmentalization of leisure, you’re probably more likely to have novel thoughts and break out of ruts. Also some people just enjoy novelty.
I think this is probably true of you and people around you but also you likely live in a bubble. To be clear, I’m not saying why people reading this should travel, but rather what a lot of travel is like, descriptively.
What fraction would you say is genuinely motivated by “seeing and experiencing another culture”? I don’t doubt that most travel is performative, but I also think most of the people I interact with seem to have different motivations and talk about things from their travels which are a world away from the Pulp Fiction beer in a McDonalds discussion.
from To the Success of our Hopeless Cause: interestingly, a big tension in the Soviet dissident movement was between people who believed in being 100% virtuous, embracing martyrdom, signing their names and addresses onto their dissenting samizdat texts, protesting to be arrested 5 minutes later and sent to jail, pretending that the letter of the Soviet law actually mattered, etc, vs people who believed in being more strategic and openly illegal and trying to avoid being caught. the former fades in importance because they keep getting arrested (the 1968 red square protest being the turning point).
many of these dissidents have very relatable rat-coded vibes. one guy (Volpin) was obsessed with inventing a form of language that was unambiguous and impossible to make equivocating arguments in, as a way to enable rationalist thought. another (Amalrik) kept getting himself in trouble with his teachers and professors because he would insist on his correctness when they gave him bad grades for saying politically dangerous things, and he kept escalating, attempting to send his writing abroad via the Danish embassy and arousing the ire of the KGB. another (Sakharov) throws away an illustrious, high-status, well-paid career as the number one best nuclear weapon engineer of the Soviet Union to express his (quite mild) takes on the dangers of global nuclear war and push for international cooperation and arms treaties.
i feel very lucky never to have been in an environment like this. i don’t think someone like me would have survived. not only have i never ever felt my life or liberty at any threat due to things i say, i’m lucky enough to not even have felt strong pressure to not say true things (obviously within reason, eg no leaking IP) in order to preserve my job or social standing. like maybe i don’t get invited to the cool parties or whatever, but that’s absolutely trivial compared to the prices that some people have been willing to pay to exercise their right to speak.
I do that, too, in things not intended for publication (which is the vast majority of things I write). Why? On a computer, capitalization is not needed to help identify the start of a sentence. In contrast, in print, it is not always possible to tell the difference between a period and an unfortunately-positioned blemish in the paper. Since it is not needed there, I prefer to avoid the extra work of typing an upper-case letter. Also, if I re-arrange the wording of a sentence, which I do frequently, then I have to do more work to fix up the capitalization. Finally, capitalizing the start of the sentence destroys what information capitalization of the initial word would otherwise carry. E.g., in conventional orthography, “Mercury” at the start of a sentence could be either a reference to the planet or the element whereas in the orthography I prefer, the reader knows it is not a reference to the element (because elements are not capitalized).
Grandparent includes a quotation without unambiguously marking it as a quotation (e.g., with quotation marks or a block quote) which I never do in my published writings (and almost never do even in writings that will never leave my computer). This generalizes: in all cases I can recall, those who in public refrain from capitalizing the starts of sentences also violate orthographic conventions I would be extremely hesitant to violate.
some of it is autocorrect—it’s easy to turn off sentence beginning capitalization, but i don’t think you can turn off capitalization of proper nouns without turning autocorrect/suggest off entirely. (this is also why i don’t capitalize i, but i do sometimes capitalize I’m). i like lowercase because it feels like a more casual or familiar register. on laptop, not capitalizing is also very pragmatic, because i don’t like pressing the shift key. on mobile, it’s out of a sense of consistency with laptop, and because i endorse laptop vibes more than i endorse phone vibes. as for the book title, tbh, I’m not sure what i was thinking, but one guess is this particular book title is especially easy to accidentally read as part of the surrounding text.
there is a samizdat journal that publishes on bad stuff the USSR does. after the arrests and confessions of Krasin and Yakir, the KGB basically threatens, like, the arrests will continue until The Chronicle stops publishing. the editors put out new issues anyways, saying something to the effect of we believe it is important to broadcast this info about human rights abuses in the Soviet Union to the world. staying silent would be accepting the hostage taking and go against human dignity. therefore we will keep publishing.
after all the principled people get arrested and put in jail, the remaining people are much more into being “strategic”. Krasin and Yakir are two dissidents who become leading figures of the dissident movement. they adopt a move-fast-and-break-things attitude, doing things like issuing statements with people’s signatures on them without getting those people’s consent. for context, unlike the petitions of today where signing has almost no negative consequences, at that time and place, the people who signed these things often got arrested and sentenced to many years in prison. it’s a successful tactic—a bunch of people are outraged but also they’re afraid of creating fractures in the movement, and also they’d look cowardly if they backed out, so they don’t take their grievances further. Krasin at some point brags about being one of the leaders of the dissident movement. anyways uh later on in 1972 the KGB captures Krasin and Yakir, threatens them until they break down and agree to collaborate, and then uses their testimony to incriminate dozens of dissidents, forces them to talk to people in the movement to sow dissent, and puts on a press conference to have them tell the world that actually the USSR is super good with human rights actually, they don’t actually lock dissidents in psychiatric hospitals, all the stuff we said earlier was misguided fake news, we are totally not just saying this because they will lock us and our families away in jail for a long time if we don’t say this.
it’s unclear whether there is a common factor behind all of these things or if most would have caved in these interrogations. we know many other people were captured and refused to say anything. but it’s suggestive of a certain kind of person.
How confident are we that this is actually true? When I’ve heard about this claim in the past, the actual evidence mentioned looked sort of thin to me when you broke things down.
it kind of worked for the soviet dissidents for a while! the book talks about the “chain reaction”—every time someone got arrested and put through a sham trial, someone would secretly transcribe the court proceedings, which showed how farcical they were, and publish the transcript as samizdat or send it abroad to be broadcast by radio back into the soviet union (tamizdat), which would outrage people who would go protest, who would get arrested for protesting, completing the cycle. this cycle only ended after the 1968 red square protest, when people felt it was too hopeless to continue.
idea: flight insurance, where you pay a fixed amount for the assurance that you will definitely get to your destination on time. e.g if your flight gets delayed, they will pay for a ticket on the next flight from some other airline, or directly approach people on the next flight to buy a ticket off of them, or charter a private plane.
pure insurance for things you could afford to self insure is generally a scam (and the customer base of this product could probably afford to self insure) but this mostly provides value by handling the rather complicated logistics for you rather than by reducing the financial burden, and there are substantial benefits from economies of scale (e.g if you have enough customers you can maintain a fleet of private planes within a few hours of most major airports)
That’s a good Coasian point. Talking out of my butt, but I think the airlines don’t carry the risk. The sales channels (airlines, Expedia, etc.) take commissions for distributing an insurance product designed by another company (Travel Insured International, Seven Corners), which handles product design and compliance, with the actual claims being handled by another company and the insurance capital provided by yet another company (AIG, Berkshire Hathaway).
LLMs tell me the distributors get 30–50% commission, which tells you that it’s not a very good product for consumers.
I know less than you here, but last-minute flights are marked up because businesspeople sometimes need them and maybe TII/SC get a better price on those?
it’s often stated that believing that you’ll succeed actually causes you to be more likely to succeed. there are immediately obvious explanations for this—survivorship bias. obviously most people who win the lottery will have believed that buying lottery tickets is a good idea, but that doesn’t mean we should take that advice. so we should consider the plausible mechanisms of action.
first, it is very common for people with latent ability to underestimate their latent ability. in situations where the cost of failure is low, it seems net positive to at least take seriously the hypothesis that you can do more than you think you can. (also keeping in mind that we often overestimate the cost of failure). there are also deleterious mental health effects to believing in a high probability of failure, and then bad mental health does actually cause failure—it’s really hard to give something your all if you don’t really believe in it.
belief in success also plays an important role in signalling. if you’re trying to make some joint venture happen, you need to make people believe that the joint venture will actually succeed (opportunity costs exist). when assessing the likelihood of success of the joint venture, people will take many pieces of information into account: your track record, the opinions of other people with a track record, object level opinions on the proposal, etc.
being confident in your own venture is an important way of putting your “skin in the game” to vouch that it will succeed. specifically, the way this is supposed to work is that you get punished socially for being overconfident, so you have an incentive to only really vouch for things that really will work. in practice, in large parts of the modern world overconfidence is penalized less than we’re hardwired to expect. sometimes this is due to regions with cultural acceptance and even embrace of risky bets (SV), or because of atomization of modern society making the effects of social punishment less important.
this has both good and bad effects. it’s what enables innovation, because that fundamentally requires a lot of people to play the research lottery. if you’re not willing to work on something that will probably fail but also will pay out big if it succeeds, it’s very hard to innovate. research consists mostly of people who are extremely invested in some research bet, to the point where it’s extremely hard to convince them to pivot if it’s not working out. ditto for startups, which are probably the archetypal example of both innovation and also of catastrophic overconfidence.
this also creates problems—for instance, it enables grifting because you don’t actually need to be correct if you just claim that your idea will work, and then when it inevitably fails you can just say that this is par for the course. also, being systematically overconfident can cause suboptimal decision making where calibration actually is important.
because many talented people are underequipped with confidence (there is probably some causal mechanism here—technical excellence often requires having a very mechanistic mental model of the thing you’re doing, rather than just yoloing it and hoping it works), it also creates a niche for middlemen to supply confidence as a service, aka leadership. in the ideal case, this confidence is supplied by people who are calibratedly confident because of experience, but the market is inefficient enough that even people who are not calibrated can supply it. another way to view this is that leaders deliver the important service of providing certainty in the face of an uncertain world.
(I’m using the term middleman here in a sense that doesn’t necessarily imply that they deliver no value—in fact, causing things to happen can create lots of value, and depending on the specifics this role can be very difficult to fill. but they aren’t the people who do the actual technical work. it is of course also valuable for the leader to e.g be able in theory to fill any of the technical roles if needed, because it makes them more able to spend their risk budget on the important technical questions, it creates more slack and thereby increases the probability of success, and the common knowledge of the existence of this slack itself also increases the perceived inevitability of success)
a similar story also applies at the suprahuman level, of tribes or ideologies. if you are an ideology, your job is unfortunately slightly more complicated. on the one hand, you need to project the vibe of inevitable success so that people in other tribes feel the need to get in early on your tribe, but on the other hand you need to make your tribe members feel like every decision they make is very consequential for whether the tribe succeeds. if you’re merely calibrated, then only one of the two can be true. different social technologies are used by religions, nations, political movements, companies, etc to maintain this paradox.
hot take: analogies should not be used as evidence for positions, except in the weakest form: privileging an otherwise arbitrary hypothesis enough that it gets any consideration at all. otherwise, they should be used purely as a way to effectively convey a hypothesis, but the actual evidence needs to come from something other than the mere analogy itself.
hot response: all evidence is analogy, it’s just a matter of degree. Maybe a heuristic like this is a good way to motivate gathering closer, more appropriate evidence, the better to increase confidence.
i can name many examples of evidence that are not analogy. perhaps they’re arguably “just” analogy, but it would be in an obviously boring way. like how there is “just” subjective experience because you can never really directly observe reality, and yet clearly there is a difference between empirical evidence and a priori reasoning (note that the evidence for this analogy comes from the examples below, rather than the analogy itself)
i think most Americans believe X. i run a survey. turns out Y% of Americans believe X.
i run some experiment on whether a given model architecture is better than another one, and get a number at the end. (as opposed to, say, merely making an analogy between NNs and human brains, and using that to predict something)
i read a study that is directly about a question i care about.
Yeah, I guess I’m being provocative, but I actually do think that.
Survey (sampled, point-in-time, framing-dependent, out-of-context) is an (often quite good) analogy for any specific (or general) concept of what people believe.
Experiment is a good case: one kind of ‘really good’ evidence is obtained by driving out as many confounders, controlling as many variables as possible, then reasoning that ‘by this really really close analogy, the same or very very similar should obtain under what look like the same conditions’. Usually there are residual stochasticities. Often experiments are out of context in some way, or the conclusions we would like to draw are extrapolating some amount from what we actually checked.
‘Read a study’ inherits the analogy-ness of whatever experiments it’s based on, plus I guess the thing where by analogy to those other times you read studies that seemed to be accurate (by the same person or same team or same org or same discipline or same culture or same species, …), you expect it to be accurate.
Mathematical proof, as always, is a slightly annoying limit case. I could probably say something like, ‘well, any conclusion other than the mere mathematical conclusion is smuggling in some analogy (to quantities, types, relations, whatever)’, and perhaps also your mathematical methods are trusted by analogy to… errr the other times logic helped you, but I don’t know to what extent I’d really stand by that.
It seems to me that you are greatly broadening (that is, redefining) the word “analogy” to mean any sort of approximation. For example, survey results are not an analogy to the ground truth but an approximation to it.
I’m with Oliver Sourbut. Unless you’ve solved the induction problem AND can formally define your reference class, everything is analogy and degrees of similarity.
Which doesn’t make your take wrong, just that it’s a question of how close the analogy is, and how the analogous evidence applies to the compared predictions.
hot response: analogical inference does clearly provide some evidence. For example: I am conscious, you are otherwise pretty similar to me, therefore you are probably also conscious.
Agreed here. Basically, analogies are mainly useful from a “bounded rationality” perspective. A logically omniscient agent would reason directly from observations alone. On the other hand, analogies are most useful pedagogically, allowing someone who does directly understand something to teach it to someone else more quickly. You also have things like simulations. There we deliberately try and construct an analogy between a computer program and reality. In principle, there could also be “naturally occurring simulations” where we don’t try to set things up deliberately, but this doesn’t seem to happen very often.
In all of these cases, you have to be careful that all the parts of your mapping actually carry over, else you’ll get the wrong answer.
I think it depends on what you are including by analogy, when making general cases, we have to pretty much exclusively deal with analogs.
The most basic case might be stated as:
If ∀x(a⊃x⊢b) ⊃ ∀y(a⊃y⊢b)
y1⊢a
∴ y1⊢b
That is, e.g.
If, for all convicted criminals, killing someone implies they are bad, then for cops killing someone implies they are bad.
Officer Bob killed someone.
Therefore, Officer Bob is bad.
If it is the case that for convicted criminals killing someone is only bad because, e.g., there wasn’t a justification, then (1) is false (and you would have to include that as a qualifier to make it true). But it is trivial to accept as true.
the first paper observes a phenomenon where adversarial accuracy and normal accuracy are at odds with each other. the authors present a toy example to explain this.
the construction involves giving each input one channel that is 90% accurate for predicting the binary label, and a bazillion iid gaussian channels that are as noisy as possible individually, so that when you take the average across all of them you get ~100% accuracy. they show that when you do ℓ∞-adversarial training on the input you learn to only use the 90% accurate feature, whereas normal training uses all bazillion weak channels.
the key to this construction is that they consider an ℓ∞-ball on the input (distance is the max across all coordinates). so this means by adding more and more features, you can move further and further in ℓ2 space (specifically, ε√n, where n is the number of features). but the ℓ2 distance between the means of the two high dimensional gaussians stays constant, so no matter what your ε is, at some point with enough channels you can perturb anything from one class into the other class and vice versa.
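to make the ℓ∞ argument concrete, here is a small stdlib-only simulation in the spirit of the construction above. it's a sketch with assumed numbers, not the paper's exact setup: the robust feature is ±y and correct 90% of the time, there are n = 400 weak features drawn from N(ηy, 1) with η = 3/√n (so the ℓ2 distance between the class means, 2η√n = 6, doesn't depend on n), and the attack budget is ε = 2η per coordinate. `weak_classifier` (sign of the sum of the weak features) stands in for what standard training learns, and `robust_classifier` uses only the robust feature; both names are made up for illustration.

```python
import math
import random


def make_data(num_samples, n_weak, p_robust=0.9, seed=0):
    """Toy data: one p_robust-accurate +/-1 feature plus n_weak weak Gaussian features.

    Each weak feature is drawn from N(eta * y, 1) with eta = 3 / sqrt(n_weak), so the
    l2 distance between the two class means (2 * eta * sqrt(n_weak) = 6) is fixed.
    """
    rng = random.Random(seed)
    eta = 3.0 / math.sqrt(n_weak)
    data = []
    for _ in range(num_samples):
        y = rng.choice((-1, 1))
        x_robust = y if rng.random() < p_robust else -y
        x_weak = [rng.gauss(eta * y, 1.0) for _ in range(n_weak)]
        data.append((y, x_robust, x_weak))
    return data, eta


def weak_classifier(x_robust, x_weak):
    # Aggregates all weak features (sign of their sum); ignores the robust feature.
    return 1 if sum(x_weak) >= 0 else -1


def robust_classifier(x_robust, x_weak):
    # Uses only the 90%-accurate feature.
    return 1 if x_robust >= 0 else -1


def linf_attack(y, x_robust, x_weak, eps):
    # For these sign-of-linear classifiers, the worst case inside the l_inf ball
    # is to move every coordinate by eps against the true label.
    return x_robust - eps * y, [v - eps * y for v in x_weak]


def accuracy(classifier, data, eps=0.0):
    correct = 0
    for y, x_robust, x_weak in data:
        if eps > 0:
            x_robust, x_weak = linf_attack(y, x_robust, x_weak, eps)
        correct += classifier(x_robust, x_weak) == y
    return correct / len(data)


if __name__ == "__main__":
    data, eta = make_data(num_samples=1000, n_weak=400)
    eps = 2 * eta  # 0.3 per coordinate
    for name, clf in (("weak", weak_classifier), ("robust", robust_classifier)):
        print(f"{name}: clean={accuracy(clf, data):.3f} attacked={accuracy(clf, data, eps):.3f}")
    print("l2 size of the perturbation on the weak features:", eps * math.sqrt(400))
```

with ε = 2η, shifting every weak coordinate by ε against the label turns each N(ηy, 1) into exactly the other class's N(−ηy, 1), so the weak classifier should go from roughly Φ(3) ≈ 99.9% accuracy to roughly Φ(−3) ≈ 0.1%, while the ±1 robust feature can't be flipped by a per-coordinate nudge smaller than 1 and stays at about 90%. the perturbation's ℓ2 size on the weak features is ε√n = 6, the same as the distance between the class means, even though ε itself shrinks as n grows.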
in the second paper, the authors do further experiments on real models to show that you can separate out the robust features and the unrobust ones, and recombine them into frankenstein images that look like dogs to humans but cats to the unrobust model and dogs to the robust model.
they also generalize the toy example in the previous paper. they argue that in general, adversarial examples arise exactly when the adversarial attack metric and the loss metric differ. in other words, the loss function (and downstream part of the model, in a multilayer model) implies some loss surface around any data point, and some directions on that surface will be a lot more important for the loss than some other directions. but your ε ball (in, say, ℓ2) that you do your attack in will treat all those directions equally. so you can pick the direction that maximizes the amount of loss change.
their new example is a classification task on two features, where the two classes are very stretched-out gaussians placed diagonally from each other, so that an ℓ2 ball from each mean reaches into the distribution of the other gaussian. during normal training, the learned classification boundary falls right along the line where the Mahalanobis distances from the two means are equal (intuitively, the classification boundary falls along exactly those points where a data point is equally likely to be sampled from either distribution.) but this is different from the ℓ2 norm! Mahalanobis distance treats distances along the low-variance axis of the gaussian as being much larger, so it doesn’t mind putting the boundary close (in ℓ2 norm) to the mean. this lets the ℓ2 perturbation step over the boundary.
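a minimal stdlib-only sketch of this two-gaussian picture. the numbers are assumptions for illustration, not the paper's: both classes share a diagonal covariance with variance 100 along x and 0.01 along y, and the class means are ±(1, 0.2), so they're offset diagonally. with a shared covariance, the Bayes boundary is the line w·x = 0 with w = Σ⁻¹(μ₊ − μ₋), which is exactly the set of points equidistant (in Mahalanobis distance) from the two means. `minimal_l2_attack` is a hypothetical helper that steps straight across that line.

```python
import math

# Assumed toy setup: shared axis-aligned covariance, stretched along x, thin along y.
VAR = (100.0, 0.01)   # per-axis variances of both Gaussians
MU = (1.0, 0.2)       # mean of class +1; class -1 has mean -MU
NEG_MU = tuple(-m for m in MU)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mahalanobis_sq(x, mean, var=VAR):
    # Squared Mahalanobis distance for a diagonal covariance.
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def bayes_weights(mu=MU, var=VAR):
    # Shared covariance => linear Bayes boundary {x : w.x = 0}, w = Sigma^-1 (mu_+ - mu_-).
    return tuple(2 * m / v for m, v in zip(mu, var))


def predict(x, w):
    return 1 if dot(w, x) >= 0 else -1


def l2_distance_to_boundary(x, w):
    return abs(dot(w, x)) / math.sqrt(dot(w, w))


def minimal_l2_attack(x, w, margin=0.05):
    # Step along -w (straight toward the boundary) and overshoot by `margin`.
    # Only meaningful for points on the positive side, i.e. predict(x, w) == 1.
    step = l2_distance_to_boundary(x, w) + margin
    norm = math.sqrt(dot(w, w))
    return tuple(xi - step * wi / norm for xi, wi in zip(x, w))


if __name__ == "__main__":
    w = bayes_weights()
    print("l2 distance from class mean to Bayes boundary:", l2_distance_to_boundary(MU, w))
    print("half the l2 distance between the means:", math.sqrt(dot(MU, MU)))
    x_adv = minimal_l2_attack(MU, w)
    print("prediction at the mean:", predict(MU, w), "after attack:", predict(x_adv, w))
```

here the boundary ends up only about 0.2 away (in ℓ2) from each mean, even though the means are about 2.04 apart, so a perturbation of roughly a quarter unit flips the prediction at the class mean itself. a unit step along the thin y axis counts 100 times more in Mahalanobis terms than a unit step along x, which is why the Bayes boundary is happy to sit so close to the means in ℓ2.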
Is it a very universal experience to find it easier to write up your views if it’s in response to someone else’s writeup? Seems like the kind of thing that could explain a lot about how research tends to happen if it were a pretty universal experience.
I think so/I have this. (I would emoji react for a less heavy response, but that doesn’t work on older short forms)
The corollary is that it’s really annoying to respond to widely held views or frames which aren’t clearly written up anywhere. Particularly if these views are very imprecise and confused.
new galaxy brain hypothesis of how research advances: progress happens when people feel unhappy about a bad but popular paper and want to prove it wrong (or when they feel like they can do even better than someone else)
this explains:
why it’s often necessary to have bad incremental papers that don’t introduce any generalizable techniques (nobody will care about the followup until it’s refuting the bad paper)
why so much of academia exists to argue that other academics are wrong and bad
why academics sometimes act like things don’t exist unless there’s a paper about them, even though the thing is really obvious
it’s actually insane how much of the entire economy is tech now. somewhere in the back of my head i still expected traditional “big” industries like big oil, or the banks, or whatnot to be the biggest. but i just realized this hasn’t been true for a long time.
Looking at market cap is kinda misleading though; the public stock market is not the same thing as the economy, and tech is over-represented in market cap because of winner-take-all dynamics and margins.
Also, Amazon (setting aside AWS) is primarily a consumer goods and logistics company, and Tesla is a car and battery manufacturer—they’re gigantic in part because they use tech well, but the actual goods and economic activity they generate aren’t exactly “tech”.
well, walmart is getting utterly destroyed by amazon (3x bigger). the gap is even bigger for tesla vs Toyota (5x) or whatever. if chatgpt is to be trusted, tesla is slightly bigger than every other car manufacturer combined. so whatever it is that “tech” is doing differently in those fields—and amazon is culturally much more similar to google or microsoft than toyota or GM—is working absurdly well.
higher margins rightfully mean higher market cap. if your company is barely scraping by, you’re not producing as much value.
Right, I think the market caps are justified for the most part. But market caps represent the present value of expected future profits, not a measure of current economic activity.
higher margins rightfully mean higher market cap. if your company is barely scraping by, you’re not producing as much value.
Not much surplus; you can still be a commodity around which huge volumes of production and consumption revolve even if your prospects for future free cash flow are very bad.
i think it’s possible for a company to be barely profitable but still be producing a bunch of surplus for its clients/consumers and workers, so a bunch of value for the world. this is because the owners of the company might be unable to capture much of this surplus. i think this can happen when increasing your prices would mean people would just buy versions of the same thing from other companies and decreasing your salaries would mean your employees leave to other companies. i think older and more competitive/efficient/optimized and [less innovative] industries are generally more like this. and i think this is probably an important factor to keep in mind when estimating surplus from profitability. that said, i agree that pro tanto higher margins imply higher surplus/[value to society] and also probably there is an empirical correlation
Tesla is bigger by market cap but if you look at metrics like revenue and the number of cars sold it’s much smaller. Tesla earns its market cap from the hope that its technology will be more significant in the future when it has fully driverless cars.
by US GDP, “Information” contributes 5.5% in 25q4. There are various other tech-related entries from manufacturing/retail/services that should sum to >10%, but not the majority you’d expect from market cap.
I’m trying to figure out how much political power AI has by virtue of being a big chunk of the economy. clearly, oil and gas have enough power to stop climate regulations. it seems like AI is already roughly as big as oil and gas (and tech as a whole much larger than oil and gas)
I was thinking about the other comments and their technical correctness re conflating the economy and the stock market / how much that correction of your OP actually matters for your area of concern.
I’m actually very confused about what [number associated with a business or sector] tracks most closely with [political power] or [potential political power]. Market cap doesn’t look like a crazy place to start, but my guess is that there’s something better (and I agree with the point that market cap likely overweights tech for this purpose).
I think the top 3 were still tech companies 10 years ago, and 5 out of 10 were already there so this doesn’t seem new? It’s just how much room they’ve had to grow that’s been surprising IMO.
Here’s the top 3 as of 2016 (AI researched)
Top 3 companies by market capitalization (Q1 2016)
Apple — ~$604 billion
Alphabet (Google) — ~$518 billion
Microsoft — ~$437 billion
Facebook / Amazon were also in the top 10. It’s mostly been them expanding ~10x from there in market cap.
i think it’s really weird that people are trying to do vaguely interp flavored things but also trying to argue for the goodness of such techniques via empirical usefulness. i think there are broadly two self consistent worldviews here. one is that you want to understand how NNs actually work and then use that understanding for something. the other is you want to make models better at X (where X can be anything from “be a good chatgpt model” to “refuse bioweapon prompts” to “make weak to strong setup score go up”). but if you’re doing the latter the actual conceptually important part is picking the right X and then working really hard to make it go up using whatever techniques work. if you’re doing the former you should actually try to understand things. it doesn’t make sense to try to do both and ultimately get neither. you should either do pragmatic or do interp.
The argument I would make is that you want to solve the practical problem, but you want to do so in a way that maximally scales with intelligence. And then white box techniques are more scalable than black box techniques, since schemers will predictably fool your black box techniques but not necessarily your white box techniques.
i totally agree for the case of actual white box understanding. this is what I’d consider the first worldview. my gripe is the interp-flavored techniques reveal very little understanding that might actually scale with intelligence, and yet through association with interp try to imply that they do.
I think the thing I’m saying is true even for interp/interp-adjacent techniques that give very little understanding—the fact that they’re white-box techniques at all should still make it harder for a schemer to get around them than black-box techniques.
ok, i agree with this. there is some room for disagreement on exactly how big the gap is between white box and black box—i think it’s very small compared to the gap from white box to full understanding. my main argument would just be ELK flavored, that there are spurious correlations that give you human simulators instead. but i don’t feel super confident that the constant factors work out to support my claim
There are ML papers floating around with training methods and architectural tweaks (e.g Block AttnRes or mHC-lite) which might be incorporated into future models.
It seems plausible to me that replacing standard residual skip connections with something more complicated:
scales intelligence somewhat but not past the frontier
makes it slightly harder for existing interp flavored techniques to generate understanding
doesn’t meaningfully affect the relative performance of linear probes vs output classifiers for inference-time detection of precursors to high-risk misaligned behavior.
I do think there are cases where models will be able to manipulate the data they’re feeding into white-box methods in a way that affects verdicts, but it’s hard to see these arising naturally before being demonstrated in more contrived scenarios, and I agree with evhub that this would be harder than circumventing black box safeguards.
Would you say a similar critique holds for sparse autoencoders?
(edit: i’ve tended to think of SAEs and AOs as basically end-to-end tools for activation-space interpretability, but in hindsight i see AOs are definitely trying to be more “lines go up” and end-to-end than SAEs, even if there are many loss function variants for SAEs. i think i get your point now)
i think SAEs are a completely reasonable thing under the first worldview, and mostly crazy under the second worldview (with the exception of maybe bio or something where I’ve heard they’re genuinely useful)
(SAEs are not sufficient to actually understand things, but they are a genuine step on the way there)
I sort of think of activation oracles as “chain of thought monitoring, but for activations”
Recall the basic chain-of-thought monitoring safety case: if chain of thought is necessary for a given model to complete a task, then (barring sophisticated encoding strategies) the chain of thought should be monitorable for that (task, model) pair.
As models get bigger / deeper, the space of tasks where chain-of-thought is necessary decreases, so (all else equal) chain of thought monitorability decreases.
How do we get around this? Well, assuming the model is using some kind of general purpose reasoning internally, that general purpose reasoning should have some general structure. And you know what’s great at learning general structures: neural networks!
We won’t be able to capture all the cognition of a model (that would require ambitious interp.), but we may be able to capture all the cognition that routes through something like “internal general purpose reasoning”.
How well activation oracles learn this general structure (/the degree to which there’s a general structure at all) is an empirical question, but we can get rich signal on this question by training activation oracles, and evaluating them on auditing games, unsupervised elicitation, etc.
Now you might argue that this is a very contingent defense of activation oracles—if for a set of evals where you would expect activation oracles to work, a different method out-performs activation oracles, then should we abandon them for the favored method? (since our only evidence for this “general internal reasoning structure” is via downstream usefulness of activation oracles). Concretely, it seems like dumb prefill black-box methods mostly out-perform activation oracles in auditing evals. Given this, why favor activation oracles over dumb black box methods (or smarter black box methods like confessionals)?
Given current public evidence, you have to rely on conceptual arguments to justify favoring activation oracles: intentionally subverting activation oracles seems harder for a schemer than intentionally subverting black box methods. I expect we should be able to show this in a compelling way with stress-testing / control evals (but defining a “fair-fight” is non-trivial since in-context red-team would unrealistically favor AOs—see this post for some thoughts on this).
inside people with substantial internal conflict, their parts might even be less aligned/connected with each other than they are with other people. this probably has really weird effects
hot take: introspection isn’t really real. you can’t access your internal state in any meaningful sense beyond what your brain chooses to present to you (e.g. visual stimuli, emotions, etc.), for reasons outside of your direct control. when you think you’re introspecting, what’s really going on is that you have a model of yourself inside your brain, which you learn gradually by seeing yourself do certain things, experience certain stimuli or emotions, etc.
your self-model is not fundamentally special compared to any other models you have. it works the same way as your model of anyone or anything else, except you have way more data on yourself, and also you directly experience your own emotions and sensory stimuli, as opposed to having to infer them for other people. often your emotional brain sabotages your ability to understand yourself, but sometimes it also sabotages your ability to understand other people (e.g. groupthink, tribalism).
your self-model can diverge arbitrarily far from reality. when you’re emotionally unintegrated, you have a model of yourself that fails to understand how your emotions truly work, so you will systematically mispredict how you will actually behave, or try to fix yourself in ways that don’t work because you are misunderstanding the causes of your actions. further evidence comes from some of the split brain experiments which show people confidently hallucinating causes for their actions which are demonstrably untrue.
(for emotional integration in particular, misprediction is only half of the thing; the other part of the thing is sometimes you can get detached from stimuli and stop perceiving them entirely. this other part is not neatly explained by this hypothesis)
your self-model contains a nested self-model, but this isn’t special in any way. your models of other people have a model of yourself contained within (what does this person think about me?). your nested self model can also be arbitrarily wrong—it is very common to fail to understand the ways in which your top level self model is wrong. you probably don’t have a third level nested self model because that’s not very useful and very costly to maintain.
your inner monologue is not most of your thinking, and doesn’t give you a faithful representation of your true thoughts. people without internal monologues do just fine. also, it’s very very common for people to lie to themselves.
I don’t think this is just speculative/unfalsifiable. I claim that thinking of your “introspection” as self-modeling will lead you to make better decisions irl. you can apply the same techniques you use for learning to model other things. you can realize that cognitive biases that apply to other kinds of modeling also apply to modeling yourself.
there’s no clean boundary between your self model and your model of other things. if it’s useful to model e.g. your phone, or gut bacteria, or glasses, or partner/close friends, in close interaction with your stimuli and actions, there is no sharp self boundary.
a regular computer program can truly introspect on itself in ways that humans cannot, but this is fundamentally not that interesting. it definitely doesn’t mean computers are conscious.
corollary: “introspection” is not in any way related to consciousness or moral patienthood, and it is uninteresting to ask whether AIs or nonhuman animals are capable of introspection for the purposes of determining things about consciousness and moral patienthood.
What experiences have you had that lead you to call this a ‘hot take’?
[I rephrased a few times to avoid sounding sarcastic and still may have failed; I’m interested in why it looks to you like others dramatically disagree with this, or in what social environment people are obviously not operating on a model that resembles this one. My sense is a lot of people think this way, but it’s a little socially taboo to broadcast object-level reasoning grounded in this model, since it can get very interpersonally invasive or intimate and lead to undesirable social/power dynamics.]
I heard that when people are in therapy, their self adapts to the school of psychotherapy. For example you start getting Freudian dreams if you are in Freudian therapy, but you start getting Jungian dreams instead if you are in Jungian therapy.
This seems to support the hypothesis that when we think we have discovered something deep inside us, often we have actually constructed it to fit our preconceptions.
(I suspect that Buddhism also mostly works this way. When Buddhists say that they can verify the truth of all Buddha’s words by introspection… on one hand, yes they can; on the other hand, if they instead believed in Jesus, they could verify that just as well. Asking yourself is like asking an LLM: whatever you believe is true, it will confirm.)
in my worldview this is very easily explained. if you do Jungian therapy your self model starts incorporating Jungian concepts for explaining your own brain. You didn’t change the way your brain works fundamentally, you just changed your own model of your brain. The same way that if you read a book on the biology of plants you’ll start viewing them in the lens of cells, and if you read a book on the ancient spirits associated with each plant you’ll start thinking of plants as being animated by the ghosts of our ancestors.
The big mistake happens when people think of their self model as actually genuinely introspection. Then, you might think that you’ve changed the shape of your mind instead of only changing your understanding of your mind.
Instead, I think the right way to figure out if your self model is correct is to make predictions about your future behavior and see if they come true; act based on your self model and see if you become more successful at life, or whether you mysteriously repeatedly fail in some way.
when i first came to the bay area, i was shocked that Silicon Valley was literally just a bunch of suburbs and boring office parks. i used to think this was very incongruous. i still mostly do, but now i at least have a story for why the vibe isn’t literally maximally incongruous.
one software architecture intuition i have is that legibility and standardization are utterly essential. the move from “servers as pets” (each one bespoke and carefully hand crafted; when something goes wrong, you go in and fix that one server) to “servers as cattle” (standardize all of the servers to be exact clones, and destroy and recreate any server that deviates even slightly from identical rather than patching it by hand) is a good example. or for instance, you generally want to make things use common interfaces wherever possible, even if that means some slight awkwardness or indirectness, because the value of being able to interchange things is so high.
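to make the “servers as cattle” policy concrete, here is a minimal sketch. everything in it is hypothetical (the golden config, the `Server` type, the `-new` naming); it just shows the shape of the idea: fingerprint each server’s config, and instead of diagnosing drift, replace any server whose fingerprint differs from the golden image.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical golden image: every server should be an exact clone of this.
GOLDEN_CONFIG = {"os": "ubuntu-22.04", "app_version": "1.4.2", "workers": 8}


def config_hash(config: dict) -> str:
    """Stable fingerprint of a config; any deviation changes the hash."""
    canonical = repr(sorted(config.items()))  # key order doesn't matter
    return hashlib.sha256(canonical.encode()).hexdigest()


@dataclass
class Server:
    name: str
    config: dict


def reconcile(fleet: list[Server]) -> list[Server]:
    """Cattle policy: keep exact clones, replace anything that drifted."""
    target = config_hash(GOLDEN_CONFIG)
    result = []
    for server in fleet:
        if config_hash(server.config) == target:
            result.append(server)
        else:
            # Destroy and recreate from the golden image instead of patching by hand.
            result.append(Server(name=server.name + "-new", config=dict(GOLDEN_CONFIG)))
    return result


fleet = [Server("a", dict(GOLDEN_CONFIG)), Server("b", {**GOLDEN_CONFIG, "workers": 9})]
print([s.name for s in reconcile(fleet)])  # prints ['a', 'b-new']
```

the point of the design is that the reconcile loop never needs to understand *why* server b drifted; legibility comes from there only being one acceptable shape.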
modern car oriented cities are a great example of taking this intuition and applying it to the physical world. steamrolling the unruly nature of old fashioned cities and reshaping everything to fit a handful of standardized shapes. this is why driving in Palo Alto is so much easier than in, say, Boston, or SF—there are many fewer quirky intersections or unpredictable bike lanes or pedestrians.
another software intuition is that in general it’s worth taking on a pretty sizeable fixed cost as long as it makes things present a substantially simpler interface, or more reliably serve their purpose. for example, it’s worth pulling in a thousand different versions of packages in your node_modules and wasting quite a lot of disk space and network bandwidth in order to eliminate dependency hell once and for all; or putting your application in a docker container so you don’t have to worry about “works on my machine” ever again; or making file system blocks fixed sizes to make the accounting a lot easier, even if that wastes some space.
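a minimal sketch of the fixed-size-block tradeoff, assuming a 4096-byte block (real file systems pick their own sizes and have extra machinery this ignores): allocation becomes a ceiling division and free-space accounting becomes counting bits, and the price is the unused tail of the last block.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes, for illustration only


def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Fixed-size blocks turn allocation into an integer ceiling division."""
    return (file_size + block_size - 1) // block_size


def wasted_bytes(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """The cost: unused space in the last, partially filled block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


def free_bytes(bitmap: list[bool], block_size: int = BLOCK_SIZE) -> int:
    """The benefit: free-space accounting is just counting unset bits."""
    return bitmap.count(False) * block_size


print(blocks_needed(10_000), wasted_bytes(10_000))  # prints 3 2288
```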
car centric cities are optimized heavily for a specific purpose—arbitrary point to point travel in the comfort of your own vehicle. there is a kind of elegance to this; and if you have dynamic pricing, and are unconcerned with the median experience, it makes the best possible experience better. it also comes with many disadvantages (horrible for pedestrians, inefficient use of land, low bandwidth, makes the vibes horrible) that have been harped on at sufficient length that i don’t need to go into it in depth here; i still experience deep suffering every time i perceive south bay. but i think i now understand better why you would ever want to build things this way.
you can have grid cities which aren’t car centric, that are much easier to travel in, while having better best experiences. car centric cities have the problem of traffic.
what’s the current state of analysis on whether the civil rights act of 1957 was actually net positive or negative for civil rights in hindsight? there are two possible stories one can tell, and at the time people were arguing about which is correct:
passing even a useless civil rights bill is a lot better than nothing because it sets a precedent that getting civil rights bills through the Senate is possible / makes the southern coalition no longer look invincible. this serves as a useful coordination mechanism because people only want to support things that they think other people will support.
passing a useless civil rights bill is worse than no bill because it creates a false sense of progress and makes it feel like something was done even when nothing was. to the extent that the bill signals to people that getting civil rights bills through the Senate is possible, this is a false impression because the only reason the bill could get through was that it was watered down to uselessness.
this feels directly analogous to the question of whether we should accept very weak AI safety regulations today.
It seems extremely net-positive for civil rights, but mainly through the mechanism of it making Lyndon Johnson a viable candidate for president while maintaining his stature with the southern democrats, leading ultimately to the Civil Rights Act of 1964.
This can be seen as a generalizable lesson only insofar as you think weak bills like that are typically passed by Lyndon Johnson-like figures playing 4d political chess ultimately for altruistic reasons. Without that effect, it mostly seemed bad: it likely actually decreased the number of black voters, and did not decrease the south’s ability to filibuster the senate against civil rights (which was the main mechanism by which civil rights bills were unable to pass), eg they filibustered away another civil rights bill in 1959 or something. Plus, if not for Lyndon Johnson ultimately being pro-civil rights, it would have put someone decidedly anti-civil-rights into the presidency.
so it sounds like there’s basically no way anyone could have known that johnson would actually be a pro civil-rights president, and that all the civil rights people who were opposed to the 1957 bill at the time were basically opposed for the right reasons? like basically everything we know about johnson as of 1960 suggests that he is telling everyone what they want to hear and it’s unclear whether he has any convictions of his own except for his strong track record of defending the interests of the south.
Basically yes. His staff likely coulda predicted this (eg there were a few circumstances where out of anger he did some small civil rights stuff, then backed off when he cooled down & looked at the political repercussions), and possibly Lady Bird, but no other senator or member of the public had any reliable way to predict this for the reasons you state.
I mean, even in the Felix Longoria Arlington case, which is what I assume you’re referring to, it seems really hard for his staff members to have known, without the benefit of hindsight, that this was any significant window into his true beliefs? I mean, johnson is famously good at working himself up into appearing to genuinely believe whatever is politically convenient at the moment, and he briefly miscalculated the costs of supporting civil rights in this case. his apparent genuineness in this case doesn’t seem like strong evidence.
Some evidence for (2) is that before the 1957 act no civil rights legislation had been passed for 82 years[1], and after it three more civil rights acts were passed in the next 11 years, including the Civil Rights Act of 1964, which in my understanding is considered very significant.
there’s an exogenous factor, which is that the entire country was shifting leftward during the 50s and 60s. it’s plausible that the 1964 bill would have passed anyways without the 1957 bill, possibly even earlier
there’s an obvious synthesis of great man theory and broader structural forces theories of history.
there are great people, but these people are still bound by many constraints due to structural forces. political leaders can’t just do whatever they want; they have to appease the keys of power within the country. in a democracy, the most obvious key of power is the citizens, who won’t reelect a politician that tries to act against their interests. but even in dictatorships, keeping the economy at least kind of functional is important, because when the citizens are starving, they’re more likely to revolt and overthrow the government. there are also powerful interest groups like the military and critical industries, which have substantial sway over government policy in both democracies and dictatorships. many powerful people are mostly custodians for the power of other people, in the same way that a bank is mostly a custodian for the money of its customers.
also, just because someone is involved in something important, it doesn’t mean that they were maximally counterfactually responsible. structural forces often create possibilities to become extremely influential, but only in the direction consistent with said structural force. a population that strongly believes in foobarism will probably elect a foobarist candidate, and if the winning candidate never existed, another foobarist candidate would have won. winning an election always requires a lot of competence, but no matter how competent you are, you aren’t going to win on an anti-foobar platform. the sentiment of the population has created the role of foobarist president for someone foobarist to fill.
this doesn’t mean that influential people have no latitude whatsoever to influence the world. when we’re looking at the highest tiers of human ability, the efficient market hypothesis breaks down. there are so few extremely competent people that nobody is a perfect replacement for anyone else. if someone didn’t exist, it doesn’t necessarily mean someone else would have stepped up to do the same. for example, if napoleon had never existed, there might have been some other leader who took advantage of the weakness of the Directory to seize power, but they likely would have been very different from napoleon. great people still have some latitude to change the world orthogonal to the broader structural forces.
it’s not a contradiction for the world to be mostly driven by structural forces, and simultaneously for great people to have hugely more influence than the average person. in the same way that bill gates or elon musk are vastly vastly wealthier than the median person, great people have many orders of magnitude more influence on the trajectory of history than the average person. and yet, the richest person is still only responsible for 0.1%* of the economic output of the united states.
*fermi estimate, taking musk’s net worth and dividing by 20 to convert stocks to flows, and comparing to gdp. caveats apply based on interest rates and gdp being a bad metric. many assumptions involved here.
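the footnote’s arithmetic can be written out as a tiny function. the inputs below are round placeholder values picked for illustration (roughly $400B net worth and $27T annual GDP), not figures from the post, so treat the output as a sanity check on the order of magnitude only.

```python
def influence_share(net_worth: float, gdp: float, stock_to_flow: float = 20.0) -> float:
    """Convert a stock (net worth) to an annual flow by dividing by ~20
    (i.e. ~5%/year), then compare that flow to annual GDP."""
    return (net_worth / stock_to_flow) / gdp


# Illustrative, assumed inputs (not sourced figures):
share = influence_share(net_worth=400e9, gdp=27e12)
print(f"{share:.2%}")  # prints 0.07%
```

the `stock_to_flow` divisor is where the interest-rate caveat lives: a different assumed yield moves the answer proportionally.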
I think there’s a spectrum between great man theory and structural forces theory and I would classify your view as much closer to the structural forces view, rather than a combination of the two.
The strongest counter-example might be Mao. It seems like one man’s idiosyncratic whims really did set the trajectory for hundreds of millions of people. Although of course as soon as he died most of the power vanished, but surely China and the world would be extremely different today without him.
A synthesis between the structural forces theory and “pulling the rope sideways”.
The economic and other forces determine the main direction; a leader who already wanted to go in that direction gets elected and starts going in that direction; his idiosyncratic whims get implemented as a side effect.
Like, instead of Hitler, there would be another German leader determined to change the post-WW1 world order, but he would probably be less obsessed about the Jews. Also, he might make different alliances.
and yet, the richest person is still only responsible for 0.1%* of the economic output of the united states.
Musk only owns 0.1% of the economic output of the US but he is responsible for more than this, including large contributions to
Politics
Space
SpaceX is nearly 90% of global upmass
Dragon is the sole American spacecraft that can launch humans to ISS
Starlink probably enables far more economic activity than its revenue
Quality and quantity of US spy satellites (Starshield has ~tripled NRO satellite mass)
Startup culture through the many startups from ex-SpaceX employees
Twitter as a medium of discourse, though this didn’t change much
Electric cars probably sped up by ~1 year by Tesla, which still owns over half the nation’s charging infrastructure
AI, including medium-sized effects on OpenAI and potential future effects through xAI
Depending on your reckoning I wouldn’t be surprised if Elon’s influence added up to >1% of Americans combined. This is not really surprising because a Zipfian relationship would give the top person in a nation of 300 million 5% of the total influence.
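the 5% figure checks out under a plain Zipf model: if rank k gets weight 1/k, the top person’s share is 1/H_n, where the harmonic number H_n ≈ ln(n) + γ (Euler–Mascheroni constant) for large n. a quick sketch, assuming exactly that model:

```python
import math

EULER_GAMMA = 0.5772156649015329


def zipf_top_share(n: int) -> float:
    """Share held by rank 1 when rank k has weight 1/k (Zipf with exponent 1).
    Uses H_n ~ ln(n) + gamma, which is accurate to about 1/(2n)."""
    harmonic = math.log(n) + EULER_GAMMA
    return 1.0 / harmonic


print(f"{zipf_top_share(300_000_000):.1%}")  # prints 5.0%
```

the share falls off only logarithmically with population size, which is why a single person can plausibly hold a few percent of the total even in a very large nation.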
i’m happy to grant that the 0.1% is just a fermi estimate and there’s a +/- one OOM error bar around it. my point still basically stands even if it’s 1%.
i think there are also many factors in the other direction that just make it really hard to say whether 0.1% is an under or overestimate.
for example, market capitalization is generally an overestimate of value when there are very large holders. tesla is also a bit of a meme stock so it’s most likely trading above fundamental value.
my guess is most things sold to the public sector probably produce less economic value per $ than something sold to the private sector, so profit overestimates value produced
the sign on net economic value of his political advocacy seems very unclear to me. the answer depends strongly on some political beliefs that i don’t feel like arguing out right now.
it slightly complicates my analogy for elon to be both the richest person in the us and also possibly the most influential (or one of the most). in my comment i am mostly referring to economic-elon. you are possibly making some arguments about influentialness in general. the problem is that influentialness is harder to estimate. also, if we’re talking about influentialness in general, we don’t get to use the 0.1% ownership of economic output as a lower bound of influentialness. owning x% of economic output doesn’t automatically give you x% of influentialness. (i think the majority of other extremely rich people are not nearly as influential as elon per $)
one kind of reasoning in humans is a kind of instant intuition; you see something and something immediately and effortlessly pops into your mind. examples include recalling vocabulary in a language you’re fluent in, playing a musical instrument proficiently, or having a first guess at what might be going wrong when debugging.
another kind of reasoning is the chain of thought, or explicit reasoning: you lay out your reasoning steps as words in your head, interspersed perhaps with visuals, or abstract concepts that you would have a hard time putting in words. It feels like you’re consciously picking each step of the reasoning. Working through a hard math problem, or explicitly designing a codebase by listing the constraints and trying to satisfy them, are examples of this.
so far these map onto what people call system 1 and 2, but I’ve intentionally avoided these labels because I think there’s actually a third kind of reasoning that doesn’t fit well into either of these buckets.
sometimes, I need to put the relevant info into my head, and then just let it percolate slowly without consciously thinking about it. at some later time, insights into the problem will suddenly and unpredictably pop into my head. I’ve found this mode of reasoning to be indispensable for dealing with the hardest problems, or for generating insights, where if I just did explicit reasoning I’d get stuck.
of course, you can’t just sit around and do nothing and hope insights come to you—to make this process work you have to absorb lots of info, and also do a lot of explicit reasoning before and after to take flashes of insight and turn them into actual fleshed-out knowledge. and there are conditions that are more or less conducive to this kind of reasoning.
I’m still figuring out how to best leverage it, but I think one hypothesis this raises is the possibility that a necessary ingredient in solving really hard problems is spending a bunch of time simply not doing any explicit reasoning, and creating whatever conditions are needed for subconscious insight-generating reasoning.
the possibility that a necessary ingredient in solving really hard problems is spending a bunch of time simply not doing any explicit reasoning
I have a pet theory that there are literally physiological events that take minutes, hours, or maybe even days or longer, to happen, which are basically required for some kinds of insight. This would look something like:
First you do a bunch of explicit work trying to solve the problem. This makes a bunch of progress, and also starts to trace out the boundaries of where you’re confused / missing info / missing ideas.
You bash your head against that boundary even more.
You make much less explicit progress.
But, you also leave some sort of “physiological questions”. I don’t know the neuroscience at all, but to make up a story to illustrate what sort of thing I mean: One piece of your brain says “do I know how to do X?”. Some other pieces say “maybe I can help”. The seeker talks to the volunteers, and picks the best one or two. The seeker says “nah, that’s not really what I’m looking for, you didn’t address Y”. And this plays out as some pattern of electrical signals which mean “this and this and this neuron shouldn’t have been firing so much” (like a backprop gradient, kinda), or something, and that sets up some cell signaling state, which will take a few hours to resolve (e.g. downregulating some protein production, which will eventually make the neuron a bit less excitable by changing the number of ion pumps, or decreasing the number of synaptic vesicles, or something).
Then you chill, and the physiological questions mostly don’t do anything, but some of them answer themselves in the background; neurons in some small circuit can locally train themselves to satisfy the question left there exogenously.
a thing i think is probably happening and significant in such cases: developing good ‘concepts/ideas’ to handle a problem, ‘getting a feel for what’s going on in a (conceptual) situation’
a plausibly analogous thing in humanity(-seen-as-a-single-thinker): humanity states a conjecture in mathematics, spends centuries playing around with related things (tho paying some attention to that conjecture), building up mathematical machinery/understanding, until a proof of the conjecture almost just falls out of the machinery/understanding
This is learning of a narrow topic, which builds representations that make thinking on that topic more effective, so novel insights might become feasible even through system 1 where before system 2 couldn’t help. With o1, LLMs have systems 1 and 2, but all learning is in pretraining, not targeting the current problem and in any case with horrible sample efficiency. Could be a crucial missing capability, though with scale even in-context learning might get there.
of course, you can’t just sit around and do nothing and hope insights come to you—to make this process work you have to absorb lots of info, and also do a lot of explicit reasoning before and after to take flashes of insight and turn them into actual fleshed-out knowledge.
Giorgio Parisi mentioned this in his book; he said that aha moments tend to spark randomly when doing something else. Bertrand Russell had a very active social life (he praised leisure) and believed it is an active form of idleness that could prove very productive. A good balance might be the best way to leverage it.
Since there are basically no alignment plans/directions that I think are very likely to succeed, and adding “of course, this will most likely not solve alignment and then we all die, but it’s still worth trying” to every sentence is low information and also actively bad for motivation, I’ve basically recalibrated my enthusiasm to be centered around “does this at least try to solve a substantial part of the real problem as I see it”. For me at least, this is the most productive mindset to be in, but I’m slightly worried people might confuse this for me having a low P(doom), or being very confident in specific alignment directions, or so on, hence this post that I can point people to.
I think this may also be a useful emotional state for other people with similar P(doom) and who feel very demotivated by that, which impacts their productivity.
for some reason, the irrational belief that nobody will read my shortforms paradoxically makes them much easier to write. if I’m writing something polished that i think lots of people will read, then i get scared that people will see it and think less of me or something, which manifests as unreasonable perfectionism and a desire to present a fictitious version of myself and my thinking. i wonder if there is some way to get the best of both worlds—to produce more authentic but also high quality widely read posts
One possible approach would be having someone else polish the posts for you/work iteratively on ideas with a friend. Writing partners are useful! Not only do they divorce you from some of the reputational risk of writing whatever you’re writing, they also turn the worst part of writing into a non-issue (the mental context switch between generating and polishing). There’s also the concept of beta readers, but that admittedly feels a bit sillier for web forum posts. Just being able to offload the mental work of perfection into a “good enough” of “$person i trust thought it was good actually so it’s fine” does a lot for improving speed of iteration, usually without meaningfully degrading quality in a noticeable fashion.
Write it as shortform, then at the last moment change your mind and post it as article. It sounds stupid but it works for me.
(It does not have to be literally the last moment, but when you already wrote enough, you see how it will end, and you can continue writing without losing the momentum.)
thoughts on what to work on in a world with heavy AI automation
it seems undeniable at this point that AI automation will play a huge role in all research in the medium term future. therefore, we should take automation into account when choosing what to work on. here are some possibilities, in decreasing order of how optimistic you are that models will be good at alignment research.
push for people to allocate more resources to alignment. the most optimistic assumption is that models are perfect drop in replacements for human researchers, and are as good at alignment and capabilities research as humans are. then, we just need to allocate more compute to alignment and we’re chilling. the main problem is there is an incentive to race to the bottom, to allocate more to capabilities to have a faster recursive improvement loop. but it is much easier to legislate a requirement to spend X% of compute on automated alignment research than to pass other pause or slowdown proposals.
making the AIs better at alignment research. we might believe that AIs are by default going to be worse at alignment than capabilities, possibly because feedback loops are better for capabilities, or models will be worse at conceptual research, or something like that. if we believe this is fixable, but won’t happen by default, then we should make current AIs differentially better at alignment research.
making yourself better at complementing the skills of AIs. we might think that it will be extremely hard to make AIs differentially better at alignment—maybe AI spikiness is fundamental and requires breakthroughs to change. instead, there will remain some skills (eg research taste) where humans are better than AI, and the supply of human skill here becomes the bottleneck. so you should somehow train yourself to become as good at eg research taste or conceptual research as possible over the next few years, so that when really powerful research AIs come around, you can make much better decisions about which experiments to run. and since almost all relevant research will be done in the future, doing more good research now is much less important than setting yourself up to do more good research later
do as much conceptual research as possible. we might be so pessimistic about the usefulness of AI for alignment research that we think even once AIs are capable of running experiments way faster, none of that will be useful for alignment whatsoever. maybe you think empirical evidence is just fundamentally not very useful for alignment research and the only real constraint is serial time. then, the only thing that matters is doing as much conceptual research as possible, which means you should start doing conceptual stuff asap and mostly ignore AI assistance until the end, when you invent the One True Metric Of Alignment that you can point the number go up machine at, and AI mostly doesn’t help you come up with that metric at all.
bonus option: aligning current models. you might feel pessimistic about whether the near-future AIs that can automate research will be aligned. maybe you think they will secretly sabotage the research and so the most important thing is to make them not do that. so you should mostly focus on pragmatic alignment of models that we are deploying soon.
bonus bonus option: status quo. maybe you think heavy AI speedup of research is actually far away and the current wave of automated AI research is going to peter out, and then someone will invent some very different paradigm of AI that actually is good at automating research. then you should, idk, work directly on long term alignment research bets the normal way?
There’s nothing here I really disagree with, and yet I feel it’s a bit vague somehow, which could just be me. The way I see this topic, I’d have approached the ideas like this instead:
the best way to capitalize on a more capable future model is, of course, to have a plan beforehand for the system/harness/etc. the model would use. But there is not yet a clear sense of “the best agent setup” or “the best social structure”.
models need to have more consistent character. Currently they don’t, and we have Gemini meltdowns, Claude seeing inconsistencies in values in its constitution, and a situation where much more powerful actors will have access to more unrestricted models not available to the general public.
there seems to be a big contrast between GPT’s guardrails against automated power and the agent breakouts we’ve seen in practice, and an open question of whether this tension causes warped values
Treat this comment as like 40% confidence or something.
for people who are not very good at navigating social conventions, it is often easier to learn to be visibly weird than to learn to adapt to the social conventions.
this often works because there are some spaces where being visibly weird is tolerated, or even celebrated. in fact, from the perspective of an organization, it is good for your success if you are good at protecting weird people.
but from the perspective of an individual, leaning too hard into weirdness is possibly harmful. part of leaning into weirdness is intentional ignorance of normal conventions. this traps you in a local minimum where any progress on understanding normal conventions hurts your weirdness, but isn’t enough to jump all the way to the basin of the normal mode of interaction.
(epistemic status: low confidence, just a hypothesis)
Pretty sure @Ronny Fernandez has opinions about this (in particular, I expect he disagrees that actively being visibly weird requires being ignorant of how to behave conventionally).
I don’t think I understand what “learn to be visibly weird” means, and how it differs from not following social conventions because you fail to understand them correctly.
Perhaps I misunderstand your use of the phrase “intentionally ignorant”, but I believe many people who are seen to have acted with “integrity” have been hyperaware and well informed of what the normal social conventions are in a given environment and made a deliberate choice not to adhere to them, rather than ignoring those conventions out of a lack of interest.
I also am not sure what you mean by “weird”. I assume you mean any behavior which is not the normal convention of any randomly selected cohesive group of people, from a family, to a local soccer club, to an informal but tight-knit circle of friends, to a department of a large company. Have I got that right?
My idea of ‘weird’ tends to involve the stereotypical artists and creatives I associate with, whose behavior is, within those circles, not weird at all but normal. But I’m meta-aware that might be a weird take.
for people who are not very good at navigating social conventions, it is often easier to learn to be visibly weird than to learn to adapt to the social conventions.
are you basing this on intuition or personal experience or something else? I guess we should avoid basing it on observations of people who did succeed in that way. People who try and succeed in adapting to social conventions are likely much less noticeable/salient than people who succeed at being visibly weird.
if any municipality in the bay area were to choose to allow lots of housing, then it would very quickly get manhattanized and they would make a zillion dollars in tax revenue, while harming property values in nearby cities. so naively you’d expect that surely eventually one random municipality of the dozens in the bay area will do this. but NIMBYs are so strong everywhere that this never happens. this seems directly relevant to questions of the feasibility of international coordination on AGI, especially if facilitated by strong pressure from labor to stop AGI.
the bay area is also evidence that we can’t just assume that economically incentivized things are inevitable. the opportunity cost of not building up the bay area more is trillions of dollars. but people are willing to destroy immense amounts of value to preserve their self interest and the world they’re familiar with.
California building code alone is quite restrictive. It’s true that municipalities could allow building a lot more housing, but a lot of the cost comes from state-wide or even nation-level building codes.
nation level can’t possibly be the only reason, because there is clearly appetite for building tall buildings in e.g. manhattan (consider hudson yards, which is an entire chunk of tall buildings that opened about 5 years ago, literally built on top of a rail yard, surely adding substantially to the cost). SF land isn’t literally manhattan but it’s still pretty valuable. state-wide building codes are more plausible; i don’t know much about california building codes compared to ny.
Isn’t Emeryville kind of doing this? Though I’m not sure if they’re maxing out the envelope of housing production from real costs even if a city government goes 100% YIMBY.
Don’t know how to find comprehensive data but they had a development pipeline of 1100 homes in 2022 against a population of 13k. So that might increase their population by ~15%. And about 500 units approved in the pipeline right now. I think the original 1100 pipeline includes the now-opened “Emery” development which had 500 units.
If they approved housing at the rate of Seattle, the leader among large US cities, they would be approving around 1500 per decade. So it seems fast, though at a population density of 10k/sq mile they still might take a couple decades to reach SF density (18k) on current trends.
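The “couple decades” estimate is compound-growth arithmetic. A minimal sketch, assuming density grows at a constant annual rate: it computes how many years it takes to go from ~10k to ~18k people per square mile. The growth rates below are hypothetical inputs for illustration, not Emeryville data, and `years_to_reach` is a made-up helper name.

```python
import math


def years_to_reach(start_density: float, target_density: float, annual_growth: float) -> float:
    """Years of constant compound growth needed to go from start_density to target_density."""
    return math.log(target_density / start_density) / math.log(1 + annual_growth)


# Emeryville (~10k/sq mi) to SF density (~18k/sq mi) under a few HYPOTHETICAL annual growth rates
for rate in (0.02, 0.03, 0.05):
    print(f"{rate:.0%}/yr -> {years_to_reach(10_000, 18_000, rate):.1f} years")
```

At roughly 3% population growth per year this comes out to about 20 years, which is consistent with “a couple decades”; faster pipelines shorten it quickly.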
the new openai planar unit distance result kills my last remaining doubts about AI being a huge multiplier on research productivity in the near term future. i was not expecting this to happen so soon; i would have guessed probably another year before we got a result like this.
i get the impression that the previous problems were mostly just neglected, or otherwise were less impressive than they seemed. whereas afaict mathematicians agree the new result is on a real well-known problem and genuinely surprising and novel.
It seems like this is still a case of solving an easily verifiable problem, so I’m not sure how much that should generalize to solving less-verifiable problems. It feels like verifiable problems were on the path to being completely solved as soon as the Ralph Loop came out.
I think part of it is what you said. Even in Western countries where people have way more education and slack, people often underinvest in preventative health like flu shots, so it makes sense that people in third-world countries would fall prey to this too, and many more probably don’t even know about the benefits of nets or how to get them.
Another pair is economies of scale and spillover effects. IIRC it’s cheaper per net to protect a whole village at once, and because malaria nets are treated with insecticide, nets kill mosquitos and so they protect other people as well. And people tend to underinvest in things with positive externalities like that—again we go back to the flu shot example.
There’s probably other reasons, but these are the ones I know of.
Additional hypotheses (note: I do not assert these, only consider as possibilities):
Mosquito nets are more difficult to steal (or pressure recipients into misuse).
Relatedly, mosquito nets are easy to distribute. Money has a lot of overhead costs in actually identifying and getting it to the intended recipients.
Mosquito nets are 9-23x more effective only on the narrow measure of mortality. Perhaps they’re less effective than money on actual happiness for recipients and future generations.
Overall, the adversaries that mosquito nets help against (mosquitoes and the parasites they carry) are way dumber than the adversaries (human foibles, both in the recipients and in other parties) that come with money.
I think we should generally not be surprised that people are somewhat irrational in their spending on public health instead of other things, relative to their own CEV: we should expect to see both bias (in the sense of underspending) and error (in the sense that sometimes they spend too much and sometimes they spend too little).
Is bias surprising? I dunno, I guess it shouldn’t be surprising. Like ex ante, if I didn’t look at any data and had to guess the direction of marginal spending on health, I’d guess maybe 60/40 for underspending being more likely than overspending, and a high probability that there’s some nontrivial bias in one direction or the other (it just seems like a really hard question to get right! And the feedback loops aren’t amazing).
To explain error, I should note that malaria nets were basically chosen as the ~best health consumption good that isn’t already purchased on the margin, after combing through the literature a lot, so it should imo not be that surprising that people underspend on them relative to what the research says you “should”. I feel like this shouldn’t be a huge surprise even in rich countries (like I’m a nerd who cares way more about this than most people, and I wouldn’t be shocked if some of my health spending is systematically quite irrational), and the reasons you mention make it more likely to see more skew in poor countries than in rich countries.
Now if you have strongly libertarian priors, maybe you should say that we ought to give direct cash transfers anyway and just try to educate people on the benefits of malaria nets so they can make an informed decision themselves. I think implicitly this is the decision of at least some people who donate to GiveDirectly. But broadly this level of libertarianism isn’t shared by GiveWell or the Against Malaria Foundation.
Another angle here is that GiveWell and/or western donors may have moral paternalism in addition to epistemic paternalism. E.g. the moral weight they place on infant/maternal lives relative to consumption increases is higher than what decision-makers in Malawi believe. I believe GiveWell/IDInsight actually tried to investigate this question by surveying people in the reference class of aid recipients to elicit moral weights. Iirc the numbers were kinda all over the place, but it was moderate evidence against the “people in very poor countries care much more about consumption than infant health” hypothesis, at least in terms of stated preferences, though smaller differences couldn’t be ruled out.
Finally, of course GiveWell etc could just be wrong for Seeing Like A State reasons. I don’t personally think this is very likely. But it’s definitely possible.
Oh! I actually looked into melatonin degradation for my non-24 post. If I remember correctly, it turns out that liquids are unstable, especially with exposure to air, but solids are generally fine, and very stable over several months. I think that gummies tended to have worse quality control than tablets or capsules. I can gather up all the research I found and post it if you like, I was able to get a copy of one of the latest degradation studies from an FDA researcher which showed that quality control is generally better now than it was a decade ago.
i’m thinking of starting a new blog. it would be about some amount of AI/alignment stuff of course, but also about lots of random other things. for instance, some blog post ideas:
hiring pollsters to run deranged survey questions about transhumanism on the average american
rediscovering all of physics since 300 BC by submitting experiments to a grad student who simulates how the experiment would have gone
book review of the lyndon b johnson biography
miscellaneous short fiction about AI but also other things
hosting and then doing postmortems of weird experimental house party ideas (example idea)
thing i need your help with:
i’m looking for ideas of what i could name my blog. by default i’m just going to go with something lame like “lg blog” but I feel like I could do a lot better.
i’m curious to hear which of the ideas you would be most excited to read, so i can prioritize them
Maybe the “Learn Or Get Out” Blog? It’s somewhat like your name, but a little aggressive. A quick google shows it’s not common or taken. “Learn English or get out!” is a slightly more common phrase, and I like that instead of being xenophobic, the new phrase is about learning generally.
presumably because to improve airplane wifi, you’d need to launch dozens of rockets to put a massive new constellation of satellites into orbit in order to deliver an order-of-magnitude improvement over Intelsat or whoever usually provides wifi connections to planes.
The good news is that SpaceX has done this, with their Starlink constellation! (Others like OneWeb, Baidu, and Amazon’s Project Kuiper are also doing similar stuff.) But not every airline / airplane has upgraded to new Starlink receivers yet. So, most planes (and cruise ships, etc.) still have slow Intelsat/Globalstar internet, but others have indeed seen huge upgrades in internet speeds.
why is it taking so long to upgrade planes to use starlink? it doesn’t sound like there are huge technical barriers to doing so, and it would be hugely profitable. i would not only pay a lot per flight for good wifi, i would also fly way more often
i would not only pay a lot per flight for good wifi, i would also fly way more often
I’m not sure how common this preference is.
I think that the economic gains from people traveling on business having access to better wifi on planes might be quite large[1], but airlines themselves are not well-positioned to capture very much of those gains. There are a very small number of domestic airlines which don’t offer any wifi on their planes at all. The rest generally offer it for free, or for some relatively low price (on the order of $10). Often even the airlines that charge for it offer it as a free or discounted perk for their “frequent fliers”. Those airlines might have a hard time increasing the sticker price of their wifi offering, even if the quality improves a lot, so they’d have to hope for most of the gains to come from business-class travelers switching to them from a competitor (or, as in your case, deciding to fly at all, on the margin). But it’s not obvious to me that most business-class travelers themselves want better wifi, since once it improves past a certain point they might have very little excuse for not working through the flight. (Maybe this is too cynical, or already moot, idk.)
None of this is meant to say that airlines have no incentive to improve their wifi (I’m pretty sure some of them are already getting started on the Starlink transition), merely that there are a bunch of factors that might make that incentive weaker than it seems at first glance.
Maybe a sizable fraction of “the economic value of their average working hour * flight duration”, which could be thousands of dollars per flight for some travelers.
I think anyone who has ever tried to work on a plane knows that plane wifi is bad enough to reduce your productivity hugely. so I don’t think business travellers who are already paying thousands to fly would shy away from paying hundreds of dollars for actually good wifi on a long-haul flight.
I’d predict most business travellers are not really using being on a plane as an excuse to not work.
Anything that goes onto airplanes is CERTIFIED TO SHIT. That’s a big part of the reason why.
Another part is that it’s clearly B2B, and anything B2B is an adversarial shark pit where each company is trying to take a bite out of each other while avoiding getting a bite taken out of them.
Between those two, it’ll take a good while for quality Wi-Fi to proliferate, even though we 100% have the tech now.
Huh. Why would B2B be more adversarially shark pitty than B2C? I’m not saying you are wrong I’m just curious (a) what the evidence is and (b) what the theory is that predicts this conclusion.
You do have to attach a pretty sizeable antenna to the top of your plane, plus whatever accompanying wiring is necessary… maybe maintenance capacity is the bottleneck? It’s a little hard to imagine that airlines are bottlenecked by this, since it seems pretty minor compared to other kinds of maintenance planes commonly undergo (like swapping out an engine)? But quotes from this site saying that some airline “hopes to have units installed in at least 25% of their aircraft by the end of 2025”, or that another “expects to ramp that number up to 40 installations per month” suggest that maybe this is the reason why airlines like United, Hawaiian, etc (which have started but not completed their rollouts) aren’t yet at 100%.
maybe starlink has some kind of interconnection queue where they can only ramp up so many users at a time?? but I’d expect that stuff like airlines and cruise ships would be relatively high-paying customers at the front of the line, at least compared to ordinary consumers (who can currently order starlink antennas online for next-day shipping).
probably the airlines themselves are not that motivated to instantly upgrade their fleets, since most people don’t choose flights based on who has the fastest wifi? in a similar way, other in-flight amenities—legroom, seat material, the quality of meals on international flights, how good the little screen for in-flight movies is, etc, are individually not super-important to people; most important is the flight route + flight timing + ticket price.
especially when you consider the fact that Starlink has a monopoly, and is probably charging airlines a profit-maximizing price, meaning that airlines which adopt the new service might not actually see any additional revenue on net even if they can charge slightly higher ticket prices once they have fast wifi. Other airlines are perhaps thinking they should wait until more satellite-internet constellations (like the aforementioned project Kuiper) get off the ground and prices come down?
maybe some budget airlines like Frontier or RyanAir calculate that most of their passengers are cheapskates who wouldn’t pay for fast wifi (either directly or through higher ticket prices)
it does kinda seem weird, though, that this list of airlines doing / considering starlink upgrades doesn’t even contain some of the US’s biggest airlines, like Southwest, Delta, or American. I’d bet they’re maybe waiting for lower prices, but it’s always possible they’re just asleep at the wheel.
I’d personally pay more, endure less convenient timing, and sit in a less comfortable seat if it meant I had fast wifi.
like right now flying is pretty time costly for me because most of my highest value work can only be done with internet, so flying means losing a lot of high productivity hours. fast wifi would mean the only time cost of flying is the tiny bit I spend walking through the airport on either end.
Presumably you’d still feel productivity effects from not having a monitor, having worse ergonomics, etc?
I was surprised to see you say above that you’d anticipate flying way more often! Are there times you’ve wanted to fly recently but held off because you couldn’t spare the lost hours of flying? (I would have expected the bigger barrier to be the loss of productive hours from, say, being out-of-the-office in the destination itself)
I don’t really care that much about not having a monitor. it’s a minor productivity hit, whereas not having reliable vaguely-fast internet completely ruins productivity.
I would absolutely fly so much more. weekend trips become way more feasible if I can fly out on Friday and return on Monday. working remotely but visiting HQ occasionally (or otherwise splitting time between two cities) gets a lot easier, because you no longer lose a day of productivity (or a night of sleep) each time.
I think the installation is actually quite complicated (source: I vaguely remember how my friend who works at Starlink described the process. ChatGPT claims the installation is $150k and requires modifying the airframe).
last time I flew Delta it was not amazing, though to be fair I don’t fly Delta very often. I generally fly United or JetBlue, both of which have a rep for “good” wifi, but I’ve never felt particularly satisfied by it.
in defense of putting your python imports in the middle of your file (in global scope, not inside functions)
i have never in my life wanted to know the list of all the things a file imports before seeing any of the actual code in the file. if i see something i don’t recognize, i would appreciate it more if the import were right above the usage; otherwise, i have to ctrl+f for it anyways. what’s
it’s more annoying to have to add it to the top of the file. auto-import features in IDEs are often broken.
there’s absolutely nothing wrong with importing something multiple times. it costs absolutely nothing; it’s just a no-op the second time. even in C you can do #pragma once to get python-like behavior
the only reason not to do this is that putting an import in the middle of a function is weird (if you import both globally and locally, then using the thing in the function before the local import errors). so just don’t put the import inside a function.
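a minimal sketch of both claims, in python: re-importing a module just returns the object already cached in `sys.modules` (nothing is re-executed), and an `import` anywhere inside a function makes that name local for the whole function body, so using it before the local import raises `UnboundLocalError` even though a global import exists. the function names are made up for illustration.

```python
import sys


def reimport_is_noop() -> bool:
    import json            # first import in this scope (may already be cached globally)
    first = json
    import json as json_again  # second import: looked up in sys.modules, not re-executed
    return first is json_again and first is sys.modules["json"]


import math  # a global import placed mid-file, right above where it's used


def shadowed_import() -> str:
    try:
        # `import math` appears later in this function, so python treats `math`
        # as a local name for the WHOLE body; this line doesn't see the global.
        value = math.sqrt(16)
    except UnboundLocalError:
        return "UnboundLocalError"
    import math  # local import placed after the use
    return str(value)


print(reimport_is_noop())  # True
print(shadowed_import())   # UnboundLocalError
```

this is why mid-file imports at global scope are harmless, while mixing a global import with a local one of the same name inside a function is the case to avoid.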
a common discussion pattern: person 1 claims X solves/is an angle of attack on problem P. person 2 is skeptical. there is also some subproblem Q (90% of the time not mentioned explicitly). person 1 is defending a claim like “X solves P conditional on Q already being solved (but Q is easy)”, whereas person 2 thinks person 1 is defending “X solves P via solving Q”, and person 2 also believes something like “subproblem Q is hard”. the problem with this discussion pattern is it can lead to some very frustrating miscommunication:
if the discussion recurses into whether Q is hard, person 1 can get frustrated because it feels like a diversion from the part they actually care about/have tried to find a solution for, which is how to find a solution to P given a solution to Q (again, usually Q is some implicit assumption that you might not even notice you have). it can feel like person 2 is nitpicking or coming up with fully general counterarguments for why X can never work.
person 2 can get frustrated because it feels like the original proposed solution doesn’t engage with the hard subproblem Q. person 2 believes that assuming Q were solved, then there would be many other proposals other than X that would also suffice to solve problem P, so that the core ideas of X actually aren’t that important, and all the work is actually being done by assuming Q.
I find myself in person 2’s position fairly often, and it is INCREDIBLY frustrating for person 1 to claim they’ve “solved” P, when they’re ignoring the actual hard part (or one of the hard parts). And then they get MAD when I point out why their “solution” is ineffective. Oh, wait, I’m also extremely annoyed when person 2 won’t even take steps to CONSIDER my solution; maybe subproblem Q is actually easy, when the path to victory aside from that is clarified.
In neither case can any progress be made without actually addressing how Q fits into P, and what is the actual detailed claim of improvement of X in the face of both Q and non-Q elements of P.
I can see how this could be a frustrating pattern for both parties, but I think it’s often an important conversation tree to explore when person 1 (or anyone) is using results about P in restricted domains to make larger claims or arguments about something that depends on solving P at the hardest difficulty setting in the least convenient possible world.
As an example, consider the following three posts:
I think both of the first two posts are valuable and important work on formulating and analyzing restricted subproblems. But I object to citation of the second post (in the third post) as evidence in support of a larger point that doom from mesa-optimizers or gradient descent is unlikely in the real world, and object to the second post to the degree that it is implicitly making this claim.
There’s an asymmetry when person 1 is arguing for an optimistic view on AI x-risk and person 2 is arguing for a doomer-ish view, in the sense that person 1 has to address all counterarguments but person 2 only has to find one hole. But this asymmetry is unfortunately a fact about the problem domain and not the argument / discussion pattern between 1 and 2.
here’s a straw hypothetical example where I’ve exaggerated both 1 and 2; the details aren’t exactly correct but the vibe is more important:
1: “Here’s a super clever extension of debate that mitigates obfuscated arguments [etc], this should just solve alignment”
2: “Debate works if you can actually set the goals of the agents (i.e you’ve solved inner alignment), but otherwise you can get issues with the agents coordinating [etc]”
1: “Well the goals have to be inside the NN somewhere so we can probably just do something with interpretability or whatever”
2: “how are you going to do that? your scheme doesn’t tackle inner alignment, which seems to contain almost all of the difficulty of alignment to me. the claim you just made is a separate claim from your main scheme, and the cleverness in your scheme is in a direction orthogonal to this claim”
1: “idk, also that’s a fully general counterargument to any alignment scheme, you can always just say ‘but what if inner misalignment’. I feel like you’re not really engaging with the meat of my proposal, you’ve just found a thing you can say to be cynical and dismissive of any proposal”
2: “but I think most of the difficulty of alignment is in inner alignment, and schemes which kinda handwave it away are trying to solve some problem which is not the actual problem we need to solve to not die from AGI. I agree your scheme would work if inner alignment weren’t a problem.”
1: “so you agree that in a pretty nontrivial number [let’s say both 1&2 agree this is like 20% or something] of worlds my scheme does actually work- I mean how can you be that confident that inner alignment is that hard? in the worlds where inner alignment turns out to be easy, my scheme will work.”
2: “I’m not super confident, but if we assume that inner alignment is easy then I think many other simpler schemes will also work, so the cleverness that your proposal adds doesn’t actually make a big difference.”
So Q=inner alignment? Seems like person 2 not only pointed to inner alignment explicitly (so it can no longer be “some implicit assumption that you might not even notice you have”), but also said that it “seems to contain almost all of the difficulty of alignment to me”. He’s clearly identified inner alignment as a crux, rather than as something meant “to be cynical and dismissive”. At that point, it would have been prudent of person 1 to shift his focus onto inner alignment and explain why he thinks it is not hard.
Note that your post suddenly introduces “Y” without defining it. I think you meant “X”.
have you ever heard anyone make the argument that it’s good to have AI safety aligned frontier labs (including but not limited to Anthropic) because they will have a seat at the table with the regulators, and the regulators will take major industry players’ opinions more seriously than minor players or activists?
i’ve heard this argument but i’m trying to figure out if it’s common enough to be worth writing a post about
In my opinion, this position was sensible when the labs themselves were branded as MIRI-like but with added emphasis on technical experimentation. By the time it became clear that the principal reward function of these ‘labs’ was not their claimed preference (for alignment research, which OAI was explicitly communicating under the moniker of safety), our personal semantic landscapes had already been trained enough by the narrative that we missed the major conflict of interest here before the norm was established.
They will continue to use epistemic asymmetry and leverage over information advantage to make the claim that having any other group at the table is a fruitless endeavour, and use the risks of foreign adversarial advantage to continue to maintain that position strategically. Given that all regulation is domestic, they end up regulating themselves (which is even implied by your question), which IMO can be a worst case scenario from a humanist/existential risk perspective.
I think it’s a good argument, but Anthropic doesn’t seem quite aligned enough to make it work. E.g. they don’t seem to have been pushing for a coordinated Pause to any real extent (and if they don’t think this would be a good idea, haven’t clarified their position as far as I know).
I feel like I’ve heard this argument, yes, though when I read a lot of Anthropic’s ‘race to the top’ language, it’s not quite that.
Here’s an example that feels borderline to me:
Dario Amodei: “Where the world needs to get [is]… from “this technology doesn’t exist” to “the technology exists in a very powerful way and society has actually managed it.” And I think the only way that’s gonna happen is that if you have, at the level of a single company, and eventually at the level of the industry, you’re actually confronting those trade-offs. You have to find a way to actually be competitive, to actually lead the industry in some cases, and yet manage to do things safely. And if you can do that, the gravitational pull you exert is so great. There’s so many factors—from the regulatory environment, to the kinds of people who want to work at different places, to, even sometimes, the views of customers that kind of drive in the direction of: if you can show that you can do well on safety without sacrificing competitiveness—right—if you can find these kinds of win-wins, then others are incentivized to do the same thing.”
pros: if you take it, you will fall asleep 30-60 minutes later. nothing else I’ve tried has been as reliable at making sure I definitely fall asleep, and as far as I can tell, it doesn’t destroy my sleep quality. especially at 10mg, you can feel it knocking you out, and you basically can’t power through it even if you want to. it’s a bit scary but all powerful sleep drugs are at least a bit scary and often a lot more scary. I generally take 5mg instead.
cons: it doesn’t do anything to keep you asleep; if your body doesn’t really want to sleep, you will wake up 2 hours later fully alert. it also doesn’t do anything to shift your sleep schedule. these facts combined mean that if you try to use lemborexant for jet lag / shifting sleep earlier, then your life will suck indefinitely until you stop using lemborexant. my current recipe is to only use lemborexant when it’s near enough to my normal bedtime, and I use melatonin 3 hours before bed to slowly move sleep schedule earlier (later requires no special effort)
(potentially this also means lemborexant can be used to get nice 2 hour daytime naps? I have enough fear of god about sleep drugs that I feel hesitant to try any kind of hack like this)
(not medical advice. not a doctor, and even if I was a doctor I’m not your doctor, and even if I was your doctor I wouldn’t be communicating to you via lesswrong shortforms)
sadly don’t have any lemborexant, so can’t compare; i originally picked daridorexant naively due to its shorter half-life, thinking this corresponded to less daytime tiredness.
my naive understanding was actually also that lemborexant should be the one better at keeping you asleep, so it’s interesting to hear that it doesn’t seem to do that at all for you.
a simple elegant intuition for the relationship between SVD and eigendecomposition that I haven’t heard before:
the eigendecomposition of A tells us which directions A stretches along without rotating. but sometimes we want to know all the directions things get stretched along, even if there is rotation.
why does taking the eigendecomposition of $A^T A$ help us? suppose we rewrite $A = RS$, where $S$ just scales (i.e. is a normal matrix), and $R$ is just a rotation matrix. then, $A^T A = S^T R^T R S$, and the $R$’s cancel out because the transpose of a rotation matrix is also its inverse.
intuitively, imagine thinking of $A$ as first scaling in place, and then rotating. then, $A^T A$ would first scale, then rotate, then rotate again in the opposite direction, then scale again. so all the rotations cancel out and the resulting eigenvalues of $A^T A$ are the squares of the scaling factors.
This is almost right, but a normal matrix is not a matrix that “just scales”; it’s a normal matrix, which can do whatever linear operation it likes.
SVD tells us there exists a factorization $A = U\Sigma V^T$ where $U$ and $V$ are orthogonal, and $\Sigma$ is a “scaling matrix” in the sense that it’s diagonal. Therefore, using similar logic to yours, $A^T A = V\Sigma U^T U\Sigma V^T = V\Sigma^2 V^T$, which means we rotate, scale by the singular values twice, then rotate back, which is why the eigenvalues of this are the squares of the singular values, and the eigenvectors are the right singular vectors.
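A quick numerical sanity check of the claim that the eigenvalues of $A^T A$ are the squared singular values. This is a minimal pure-Python sketch restricted to the 2x2 case: it builds $A = U\Sigma V^T$ from two rotations and a diagonal $\Sigma$ (the angles and singular values are arbitrary illustrative choices), then confirms that $A^T A$ equals $V\Sigma^2 V^T$ and that its eigenvalues square-root back to the diagonal of $\Sigma$.

```python
import math


def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def rotation(theta):
    """2x2 rotation matrix; its transpose is its inverse."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def sym_eigvals(M):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], largest first."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return [mean + radius, mean - radius]


# Arbitrary illustrative example: A = U Σ V^T with U, V rotations, Σ = diag(3, 0.5).
U, V = rotation(0.7), rotation(-1.2)
Sigma = [[3.0, 0.0], [0.0, 0.5]]
A = matmul(matmul(U, Sigma), transpose(V))

# A^T A = V Σ U^T U Σ V^T: the U's cancel, leaving V Σ^2 V^T.
AtA = matmul(transpose(A), A)
expected = matmul(matmul(V, matmul(Sigma, Sigma)), transpose(V))

eigenvalues = sym_eigvals(AtA)
singular_values = [math.sqrt(lam) for lam in eigenvalues]
print("eigenvalues of A^T A:", eigenvalues)           # close to [9.0, 0.25]
print("recovered singular values:", singular_values)  # close to [3.0, 0.5]
```

Nothing about $U$ survives into $A^T A$, which is the "rotations cancel out" intuition in numerical form.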
philosophy: while the claims “good things are good” and “bad things are bad” at first appear to be compatible with each other, actually we can construct a weird hypothetical involving exact clones that demonstrates that they are fundamentally inconsistent with each other
law: could there be ambiguity in “don’t do things that are bad as determined by a reasonable person, unless the thing is actually good?” well, unfortunately, there is no way to know until it actually happens
I think I need to hear more context (and likely more words in the sentences) to understand what inconsistency you’re talking about. “good things are good” COULD be just a tautology, with the assumption that “good things” are relative to a given agent, and “good” is furtherance of the agent’s preferences. Or it could be a hidden (and false) claim of universality: “good things” are anything that a lot of people support, and “are good” means truly pareto-preferred with no harm to anyone.
Your explanation “by a reasonable person” is pretty limiting, there being no persons who are reasonable on all topics. Likewise “actually good”—I think there’s no way to know even after it happens.
lifehack: buying 3 cheap pocket-sized battery packs costs like $60 and basically eliminates the problem of running out of phone charge on the go. it’s much easier to keep them charged than a single pack: you can instantaneously exchange your empty battery pack for a full one when you realize you need one, you plug in the empty battery pack exactly when you swap it for a fresh one, and even if you forget once or lose one you have some slack
One possible model of AI development is as follows: there exists some threshold beyond which capabilities are powerful enough to cause an x-risk, and such that we need alignment progress to be at the level needed to align that system before it comes into existence. I find it informative to think of this as a race where for capabilities the finish line is x-risk-capable AGI, and for alignment this is the ability to align x-risk-capable AGI. In this model, for good outcomes it is necessary but not sufficient for alignment to reach its finish line first: if alignment doesn’t make it there first, then we automatically lose, but even if it does, if alignment doesn’t continue to improve proportionally to capabilities, we might also fail at some later point. However, I think it’s plausible we’re not even on track for the necessary condition, so I’ll focus on that within this post.
Given my distributions over how difficult AGI and alignment respectively are, and the amount of effort brought to bear on each of these problems, I think there’s a worryingly large chance that we just won’t have the alignment progress needed at the critical juncture.
I also think it’s plausible that at some point before when x-risks are possible, capabilities will advance to the point that the majority of AI research will be done by AI systems. The worry is that after this point, both capabilities and alignment will be similarly benefitted by automation, and if alignment is behind at the point when this happens, then this lag will be “locked in” because an asymmetric benefit to alignment research is needed to overtake capabilities if capabilities is already ahead.
There are a number of areas where this model could be violated:
Capabilities could turn out to be less accelerated than alignment by AI assistance. It seems like capabilities is mostly just throwing more hardware at the problem and scaling up, whereas alignment is much more conceptually oriented.
After research is mostly/fully automated, orgs could simply allocate more auto-research time to alignment than AGI.
Alignment(/coordination to slow down) could turn out to be easy. It could turn out that applying the same amount of effort to alignment and AGI results in alignment being solved first.
However, I don’t think these violations are likely, for the following reasons respectively:
It’s plausible that our current reliance on scaling is a product of our theory not being good enough and that it’s already possible to build AGI with current hardware if you have the textbook from the future. Even if the strong version of the claim isn’t true, one big reason that the bitter lesson is true is that bespoke engineering is currently expensive, and if it suddenly became a lot cheaper, we would see a lot more of it and consequently squeeze more out of the same hardware. It also seems likely that before total automation, there will be a number of years where automation is best modelled as a multiplicative factor on human researcher effectiveness. In that case, because of the sheer number of capabilities researchers compared to alignment researchers, alignment researchers would have to benefit a lot more to just break even.
If it were the case that orgs would pivot, I would expect them to already be allocating a lot more to alignment than they currently do. While it’s still plausible that orgs haven’t allocated more to alignment because they think AGI is far away, and that a world where automated research is a thing is a world where orgs would suddenly realize how close AGI is and pivot, that hypothesis hasn’t been very predictive so far. Further, because I expect the tech for research automation to be developed at roughly the same time by many different orgs, it seems like not only does one org have to prioritize alignment, but actually a majority of orgs, weighted by auto-research capacity, would have to prioritize alignment. To me, this seems difficult, although more tractable than the other alignment coordination problem, because there’s less of a unilateralist problem. The unilateralist problem still exists to some extent: orgs which prioritize alignment are inherently at a disadvantage compared to orgs that don’t, because capabilities progress feeds recursively into faster progress whereas alignment progress is less effective at making future alignment progress faster. However, on the relevant timescales this may become less important.
I think alignment is a very difficult problem, and moreover that by its nature it’s incredibly easy to underestimate. I should probably write a full post about my take on this at some point, and I don’t really have space to dive into it here, but a quick meta-level argument for why we shouldn’t lean on alignment being easy, even if there is a non-negligible chance that it is, is that a) given the stakes, we should exercise extreme caution and b) there are very few problems we have that are in the same reference class as alignment, and of the few that are even close, like computer security, they don’t inspire a lot of confidence.
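One way to make the headcount point in the first reason concrete, under assumptions that are mine rather than the post’s: automation multiplies every researcher’s output by a constant factor, and “breaking even” means the extra output automation adds on the alignment side matches the extra output it adds on the capabilities side. The function name, headcounts, and multiplier below are hypothetical and chosen only for illustration.

```python
def required_alignment_multiplier(n_cap, n_align, cap_multiplier):
    """Hypothetical toy model: the productivity multiplier alignment researchers
    need so that automation adds as much alignment output as it adds
    capabilities output (one possible reading of 'breaking even')."""
    extra_cap_output = n_cap * (cap_multiplier - 1)
    return 1 + extra_cap_output / n_align


# Hypothetical numbers: 10x as many capabilities researchers, each made 1.5x as productive.
print(required_alignment_multiplier(n_cap=1000, n_align=100, cap_multiplier=1.5))
```

Under these assumptions, a 1.5x boost for capabilities requires a 6x boost for alignment just so the gap keeps growing no faster than it did before automation.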
I think exploring the potential model violations further is a fruitful direction. I don’t think I’m very confident about this model.
is it worth writing blog posts about “obvious” things? i’ve been doing a lot of writing recently, and i frequently finish writing something, and i look at it, and i feel like it’s so obvious that all readers will either already agree and not learn anything, or disagree so fundamentally that changing their mind would require diving much deeper into fundamental beliefs.
One of the benefits of occasionally talking to people is that you get an indicator of what things are obvious or not, based on what you find yourself repeatedly explaining or arguing for. (I use 3 times as my own threshold.)
There are a lot of claims I would agree with if you asked me, but I wouldn’t use them in a reasoning chain on my own because they never crossed my mind.
A lot of complex reasoning rests on having reliable basics on which you can reason.
When I’m talking about BPC-157, having been trained in bioinformatics, it feels pretty obvious to me that if BPC-157 is a real peptide that’s part of a protein called BPC, I should be able to look up the gene for BPC in sequencing databases. There’s the central dogma of molecular biology: proteins come from genes.
If I asked anyone in the biohacking scene whether they agree with the central dogma of molecular biology, and that this means there should be a gene to look up, they would probably say they agree. Yet, somehow the argument does not convince people who believe in BPC-157 that it’s bogus.
Explaining the central dogma of molecular biology and our great success at gene sequencing in a way that actually sinks in isn’t easy.
If you find yourself writing something very obvious, it becomes more important to ask: “How can I make this point in a similar way that really sinks in so that the reader can actually use it and rely on it?” instead of just “Have I made a clear logical argument for it?”.
At the very least, you’re testing whether the model that predicts this is accurate.
i feel like it’s so obvious that all readers will either already agree and not learn anything, or disagree so fundamentally that changing their mind would require diving much deeper into fundamental beliefs.
The default blogging advice I’ve heard is that “obvious” topics often make for good posts because they are often non-obvious to readers, so one should strongly default towards posting if the concern is that it is too obvious.
But maybe you’re making this judgement even with that prior in mind? I’d be curious to see one of these “obvious” posts.
In addition to verifying obviousness by posting like Gwern mentioned, which in my experience is a frequent source of surprise, there’s the advanced version of this that e.g. Toby Ord has done most of his career by his own lights. This does require taste at picking topics “at the border of the trivial and the profound” to quote him, but taste is pretty clearly something you have aplenty.
Obvious facts nobody would disagree with when explicitly stated can still be underappreciated and not paid enough attention to, so they can still be worth spelling out for that reason. Although I must say, when I’ve heard something obvious too many times and gotten tired of it, I get a pathological contrarian urge to argue against it.
theory: a big difference between people who hate corporations and people who don’t is the extent to which they like interacting with human-shaped things. some people like human shaped things and the sort of amoral profit maximization of companies feels alien and sociopathic. other people like the predictable API that companies provide.
IDK man. I mostly don’t care that much about either. I’m extroverted but quite picky about people and don’t particularly feel drawn to “human-shaped things” in general. I don’t particularly hate corporations, but surely corporate capitalism seems very far from ideal. And their drive doesn’t seem alien or sociopathic to me.
But now I realize that I’m actually confused by what you mean by “human-shaped things”.
like, you would expect normal people to be courteous in daily interactions, to have goals in life, to have the typical human follies, to fundamentally have some amount of kindness and empathy for their fellow man, etc.
whereas interacting with a company can feel like being trapped in a Kafkaesque bureaucracy; companies don’t really have that many terminal goals other than money—certainly the vast majority of humans care less about money; with the exception of heavily founder-controlled companies, companies generally don’t really have pride or envy or sloth or so on in the way a normal human would, even though they are constituted of normal humans—corporate failures are much weirder and often due to coordination failure; corporations don’t really empathize with people except insofar as it is useful to make money—in any human we’d consider this very transactional and maybe even sociopathic
further evidence: people love small businesses, and medium sized companies that give very strong small business energy. people hate it when PE firms take over small businesses and make them more efficient.
if there was a guy who stood there swinging a scythe to cut grass and didn’t seem to care or feel bad or really respond at all to accidentally cutting someone’s arm off, we’d consider them uncaring and sociopathic. similarly, if we think of, say, an insurance company as a person, then when it declines someone’s claim and leaves them destitute, it’s reasonable to think of that person as uncaring and sociopathic. you can argue all you want about the economics of how insurance can only work if you do this but for the individual people who interface with this, who are not used to thinking about economic systems, but deal with people every day, it feels like a human using a loophole to justify not caring, and not feeling any empathy whatsoever.
Companies want to consume everything, including peoples’ lives, in order to make themselves richer and bigger. People are “resources” to a company.
Lawnmowers just want to cut your grass, the only resource they ask for is petrol, and (crucially) they don’t want to consume it exponentially to make themselves bigger and cut exponentially more grass.
If Lawnmowers were people, they’d be those weird obsessive monomaniacal types who’re generally harmless but a bit difficult to talk to. Lots of them would be on LessWrong.
You could form a bond with a lawnmower (or a lawnmower-person), just as you could form a bond with a motorcycle or a family heirloom. Sure, it would essentially be one-way affection, since the lawnmower just wants to do its thing and is only nice to you insofar as you also benefit from short grass—but it wouldn’t want to consume you, control you, or enslave you. Try to form a bond with a corporation and it’ll eat you alive.
i mean most companies won’t eat you alive? you can form a bond with the coca cola company in the same one directional way as the lawnmower and it’s not like they will take advantage of that to extract every dollar from you. in fact basically only like Facebook, tiktok, etc are like that, and even then they’re not that bad; they’re no worse than an abusive human partner
is it generally best to take just one med (e.g. antidepressant, adhd, anxiolytic), or is it best to take a mix of many meds, each at a lesser dosage? my intuitions seem to suggest that the latter could be better. in particular, consider the following toy model: your brain has parameters $\theta_0$ that should be at some optimal $\theta^*$, and your loss function is a quadratic around $\theta^*$. each dimension in this space represents some aspect of how your brain is configured—they might for instance represent your level of alertness, or impulsivity, or risk averseness, or motivation, etc. each med is some vector $v_i$ that you can add to your current state $\theta_0$, and the optimal dosage of that med in isolation is whichever quantity gets you closest to $\theta^*$; but unless $\theta^* - \theta_0$ happens to be exactly collinear with $v_i$, you basically can’t do any better just by tuning the dosage of the one med. this seems especially important because most meds don’t seem to be exactly monosemantic, and also different people start out with substantially different $\theta_0$ and loss landscapes, such that you often get paradoxical reactions to meds.
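A minimal sketch of this toy model, assuming exactly what the model assumes: each med shifts the brain state by a fixed vector $v_i$ scaled linearly by dose, and the loss is squared distance to $\theta^*$. The three dimensions, both med vectors, and the target are made up purely for illustration. The best dose of a single med is the projection of $\theta^* - \theta_0$ onto $v_i$; the best pair of doses solves the 2x2 least-squares normal equations.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def loss(theta, target):
    """Quadratic loss: squared distance from the current state to the optimum."""
    return sum((t - x) ** 2 for x, t in zip(theta, target))


def apply_meds(theta, doses_and_meds):
    """Add dose * med_vector for each med (the toy model's linearity assumption)."""
    out = list(theta)
    for dose, med in doses_and_meds:
        out = [x + dose * m for x, m in zip(out, med)]
    return out


def best_single_dose(gap, med):
    """Dose minimizing ||gap - dose * med||^2: the projection of gap onto med."""
    return dot(gap, med) / dot(med, med)


def best_two_doses(gap, m1, m2):
    """Solve the 2x2 normal equations for the jointly optimal pair of doses."""
    a11, a12, a22 = dot(m1, m1), dot(m1, m2), dot(m2, m2)
    b1, b2 = dot(m1, gap), dot(m2, gap)
    det = a11 * a22 - a12 * a12  # nonzero as long as m1 and m2 aren't collinear
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


# Hypothetical dimensions: (alertness, impulsivity, motivation).
theta0 = [0.0, 0.0, 0.0]
theta_star = [1.0, -1.0, 1.0]
med_a = [1.0, 0.0, 0.5]   # hypothetical med: more alert, a bit more motivated
med_b = [0.0, -1.0, 0.5]  # hypothetical med: less impulsive, a bit more motivated
gap = [t - x for x, t in zip(theta0, theta_star)]

for name, med in [("A only", med_a), ("B only", med_b)]:
    d = best_single_dose(gap, med)
    single_loss = loss(apply_meds(theta0, [(d, med)]), theta_star)
    print(name, "dose", round(d, 3), "loss", round(single_loss, 3))

d_a, d_b = best_two_doses(gap, med_a, med_b)
combo_loss = loss(apply_meds(theta0, [(d_a, med_a), (d_b, med_b)]), theta_star)
print("A + B doses", round(d_a, 3), round(d_b, 3), "loss", round(combo_loss, 3))
```

In this made-up example, each med alone gets stuck at a loss of 1.2 at a dose of 1.2, while the combination reaches zero loss with each med at a dose of 1.0, i.e. both at a lower dose than either would get alone. The linearity and fixed-$\theta^*$ assumptions baked into this sketch are exactly what the replies to this post dispute.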
A huge percentage of the job of a pharmacist is to keep track of potential negative interactions between different drugs, of which there are an incomprehensible number. I don’t think linearity is a reasonable assumption here; the interaction terms between multiple interventions should be thought of as, on average, big. Augmentation and synergistic effects exist, but are in general risky and quite hard to find. Even the effects of one drug are not linear; there are significant nonlinearities in dosage effects for most drugs.
i’m not really making any strong linearity assumptions, only local linearity. this doesn’t seem that different from ML, where hyperparameters can sometimes interact heavily nonlinearly, but often they don’t. i also don’t think the quadratic assumption is crazy; we assume that loss landscapes are locally quadratic all the time, even though they are obviously highly nonconvex, and it’s still a very useful intuition pump.
also, my understanding is most of the really bad interactions are pretty well known, so the probability of having a really weird surprising interaction that nobody has ever catalogued is small.
I think our mental models here might be different enough that it’s hard for me to understand what you’re saying here. By nonlinearity here I mean that, in addition to nonlinear interactions between drugs, there are interacting systems, equilibration mechanisms, etc., to the point that I think intuitions about ML systems basically shouldn’t transfer at all. But then I know your intuitions about ML are better than mine, so it’s hard to be sure of that.
Re: interactions specifically, this definitely isn’t true in polypharmacy situations. We know most of the bad drug pairs in the normal population, and because doctors are wary of prescribing many different medications, this means we rarely encounter new bad interactions in the normal population. But there are drug combinations that only become dangerous in triples (search terms: the Triple Whammy, a combination of 3 drug classes, any 2 of which are generally safe but which cause kidney failure in combination; this interaction was discovered in 2000, but the drugs became available in like 1980), and there are interactions which are only dangerous in the context of certain mutations (for example, there are ultrarapid metabolizers who simply can’t use prodrugs like codeine).
Interactions like this are rare right now largely because doctors are wary of prescribing too many drugs at once, but polypharmacy is becoming more common and more bad interactions are emerging as a result, basically just for combinatorial reasons. It’s definitely possible for combinations of drugs to be prescribed safely and for them to just not interact, but if we push this further, I suspect there are very few combinations of, say, 10 drugs that are simultaneously safe for most people (even if we ignore cholinergic response).
Changing the dose of a medication does not necessarily result in linear effects. There are nonlinearities introduced by e.g. one receptor type being saturated before another one. This phenomenon also applies to polypharmacy.
I would also like to note that $\theta^*$ is estimated not by some objective standard, but by $\theta_0$. There’s no guarantee that it remains in place as you start shifting $\theta$.
In practice, we track our level of suffering and respond to it by trying to reduce it to acceptable levels, which is easier than trying to converge onto a hypothetical global optimum. For some, this state is reached with just one medication, for others it takes more, and for some this paradigm doesn’t produce any results.
I would see it more as causal learning with the do operator, so it isn’t necessarily about fitting to an MSE but rather about testing different combinations?
This is a dumb question but… is this market supposed to resolve positively if a misaligned AI takes over, achieves superintelligence, and then solves the problem for itself (and maybe shares it with some captive humans)? Or any broader extension of that scenario?
My timelines are not that short, but I do currently think basically all of the ways I expect this to resolve positively will very heavily rely on AI assistance, and so various shades of this question feel cruxy to me.
I honestly didn’t think of that at all when making the market, because I think takeover-capability-level AGI by 2028 is extremely unlikely.
I care about this market insofar as it tells us whether (people believe) this is a good research direction. So obviously it’s perfectly ok to resolve YES if it is solved and a lot of the work was done by AI assistants. If AI fooms and murders everyone before 2028 then this is obviously a bad portent for this research agenda, because it means we didn’t get it done soon enough, and it’s little comfort if the ASI solves interp after murdering or subjugating all of us. So that would resolve N/A, or maybe NO (not that it will matter whether your mana is returned to you after you are dead). If we solve alignment without interpretability and live in the glorious transhumanist utopia before 2028 and only manage to solve interpretability after takeoff, then… idk, I think the best option is to resolve N/A, because we also don’t care about that when deciding today whether this is a good agenda.
There are surely reasons to do ambitious interp that are not the stated goal of ambitious interp? I doubt we will have a fully understandable model by 2028, but I still think the abstractions developed in the process will be helpful.
For instance, many of the higher-order methods like SAEs are based on assumptions about how activation space is structured. Studying smaller systems rigorously can give us the ground truth for how models construct their activation space, which can allow us to question/modify said assumptions.
Unfortunately, prediction markets need some bright red line somewhere to be resolvable. I encourage you to make a different market that captures the thing you care about.
But people with the belief that we aren’t going to be able to fully understand models frequently take this as a reason not to pursue ambitious/rigorous interpretability. I thought that was the position you were taking, by using the market to decide whether the agenda is “good” or not.
imo, the new mealsquares have taste and mouthfeel very similar to many brands of protein bar, whereas the old mealsquares had a unique taste and mouthfeel
idea: survey people about whether 3^^^3 toe stubbings can be worse than torture, except with a twist: with 50% probability, arrange the furniture in the room such that people actually accidentally stub their toe right before answering the survey
solution is very simple. administer alcohol before the test until everyone is exactly as clumsy as the NIST Standard Reference Clumsy Human for Accessibility Testing
(or, rig the game, make a ledge that automatically pops out of the ground a split second before your foot hits it)
i find it disappointing that a lot of people believe things about trading that are obviously crazy even if you only believe in a very weak form of the EMH. for example, technical analysis is obviously tea leaf reading—if it were predictive whatsoever, you could make a lot of money by exploiting it until it is no longer predictive.
Close friend of mine, a regular software engineer, recently threw tens of thousands of dollars—a sizable chunk of his yearly salary—at futures contracts on some absurd theory about the Japanese Yen. Over the last few weeks, he coinflipped his money into half a million dollars. Everyone who knows him was begging him to pull out and use the money to buy a house or something. But of course yesterday he sold his futures contracts and bought into 0DTE Nasdaq options on another theory, and literally lost everything he put in and then some. I’m not sure but I think he’s down about half his yearly salary overall.
He has been doing this kind of thing for the last two years or so—not just making investments, but making the most absurd, high risk investments you can think of. Every time he comes up with a new trade, he has a story for me about how his cousin/whatever who’s a commodities trader recommended the trade to him, or about how a geopolitical event is gonna spike the stock of Lockheed Martin, or something. On many occasions I have attempted to explain some kind of Inadequate Equilibria thesis to him, but it just doesn’t seem to “stick”.
It’s not that he “rejects” the EMH in these conversations. I think for a lot of people there is literally no slot in their mind that is able to hold market efficiency/inefficiency arguments. They just see stocks moving up and down. Sometimes the stocks move in response to legible events. They think, this is a tractable problem, I just have to predict the legible events. How could I be unable to make money? Those guys from The Big Short did!
He is also taking a large amount of stimulants. I think that is compounding the situation a bit.
i agree that most people doing “technical analysis” are doing nonsense and any particular well-known simple method does not actually work. but also clearly a very good predictor could make a lot of money just looking at the past price time series anyway
i think it’s quite valuable to go through your key beliefs and work through what the implications would be if they were false. this has several benefits:
picturing a possible world where your key belief is wrong makes it feel more tangible and so you become more emotionally prepared to accept it.
if you ever do find out that the belief is wrong, you don’t flinch away as strongly because it doesn’t feel like you will be completely epistemically lost the moment you remove the Key Belief
you will have more productive conversations with people who disagree with you on the Key Belief
you might discover strategies that are robustly good whether or not the Key Belief is true
you will become better at designing experiments to test whether the Key Belief is true
“agi happens almost certainly within the next few decades” → maybe ai progress just kind of plateaus for a few decades, it turns out that gpqa/codeforces etc are like chess in that we only think they’re hard because humans who can do them are smart but they aren’t agi-complete, ai gets used in a bunch of places in the economy but it’s more like smartphones or something. in this world i should be taking normie life advice a lot more seriously.
“agi doesn’t happen in the next 2 years” → maybe actually scaling current techniques is all you need. gpqa/codeforces actually do just measure intelligence. within like half a year, ML researchers start being way more productive because lots of their job is automated. if i use current/near-future ai agents for my research, i will actually just be more productive.
“alignment is hard” → maybe basic techniques are all you need, because the natural abstraction hypothesis is true, or maybe the red car / blue car argument for why useful models are also competent at bad things is just wrong because generalization can be made to suck. maybe all the capabilities people are just right and it’s not reckless to be building agi so fast
Making a list of your beliefs can be complicated. Recognizing the belief as a “belief” is the necessary first step, but the strongest beliefs (those that examining them would be most useful?) are probably transparent, they feel like “just how the world is”.
Then again, maybe listing all the strong beliefs would actually be useless, because the list would contain tons of things like “I believe that 2+2=4”, and examining those would be mostly a waste of time. We want the beliefs that are strong but possibly wrong. But when you notice that they are “possibly wrong”, you have already made the most difficult step; the question is how to get there.
economic recession and subsequent reduction in speculative research, including towards AGI, seems very plausible
AI (by which I mean, like, big neural networks and whatever) is not that economically useful right now. furthermore, current usage figures are likely an overestimate of true economic usefulness because a very large fraction of it is likely to be bubbly spending that will itself dry up if there is a recession (legacy companies putting LLMs into things to be cool, startups that are burning money without PMF, consumers with disposable income to spend on entertainment).
it will probably still be profitable to develop AI tech, but things will be much more tethered to consumer usefulness.
this probably doesn’t set AGI back that much but I think people are heavily underrating this as a possibility. it also probably heavily impacts the amount of alignment work done at labs.
for a sense of scale of just how bubbly things can get: Bitcoin has a market cap of ~$1T, and the entirety of crypto ~$2T. Crypto does produce some amount of real value, but probably on the order of 1% of that market cap. So it’s not at all unheard of for speculation to account for literally trillions of dollars of market cap (or ~tens of billions of earnings per year, at a reasonable P/E ratio)
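The parenthetical conversion is just market cap divided by a P/E ratio. A quick sketch using the ~$2T crypto figure and the ~1% real-value guess from the text; the range of P/E ratios from 20 to 50 is my own assumption of what counts as “reasonable”.

```python
def implied_annual_earnings(market_cap, pe_ratio):
    """Annual earnings that a given market cap corresponds to at a given P/E ratio."""
    return market_cap / pe_ratio


crypto_market_cap = 2e12    # ~$2T, from the text
real_value_fraction = 0.01  # order-of-magnitude guess from the text
speculative_cap = crypto_market_cap * (1 - real_value_fraction)

for pe in (20, 30, 50):     # assumed range of "reasonable" P/E ratios
    earnings = implied_annual_earnings(speculative_cap, pe)
    print(f"P/E {pe}: ~${earnings / 1e9:.0f}B per year")
```

Across that range, the speculative ~$2T works out to roughly $40B to $100B of earnings per year, i.e. tens of billions.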
I will say that crypto is a pretty pathological case where virtually all the benefit is speculation, because in order to deliver on anything real, they’d have to get rid of the money element in it. Thankfully, it’s pretty rare for entire industries to be outright scams/speculation opportunities.
one man’s modus tollens is another man’s modus ponens:
“making progress without empirical feedback loops is really hard, so we should get feedback loops where possible”
“in some cases (i.e. close to x-risk), building feedback loops is not possible, so we need to figure out how to make progress without empirical feedback loops. this is (part of) why alignment is hard”
Yeah something in this space seems like a central crux to me.
I personally think (as a person generally in the MIRI-ish camp of “most attempts at empirical work are flawed/confused”) that it’s not crazy to look at the situation and say “okay, but, theoretical progress seems even more flawed/confused, we just need to figure out some way of getting empirical feedback loops.”
I think there are some constraints on how the empirical work can possibly work. (I don’t think I have a short thing I could write here, I have a vague hope of writing up a longer post on “what I think needs to be true, for empirical work to be helping rather than confusedly not-really-helping”)
you gain general logical facts from empirical work, which can aid in providing a blurry image of the manifold that the precise theoretical work is trying to build an exact representation of
This model is too oversimplified! Reality is more complex than this model suggests, making it less useful in practice. We should really be taking these into account. [optional: include jabs at outgroup]
This model is too complex! It takes into account a bunch of unimportant things, making it much harder to use in practice. We should use this simplified model instead. [optional: include jabs at outgroup]
Sometimes this even results in better models over time.
any time there exists an activity that is (a) often but not always beneficial, (b) the supposed benefit is high status, and (c) the success of which is nontrivial to verify, then there will exist a bunch of people walking around who do the thing, and haven’t actually gained the intended benefit; nonetheless, they go around claiming the status benefits of doing the thing. often, they even genuinely believe they got the benefit. some examples:
reading difficult books can make you more wise and thoughtful, but it’s very easy to do it wrong and not really understand any of it, and so lots of people read difficult books and try to claim the associated status of wisdom without actually gaining any.
doing a college degree can make you more competent, but it’s also very easy to kinda bullshit an entire degree and learn surprisingly little. so there are many people who claim the status of having gotten a good education who are utterly incompetent.
doing meditation/inner work can make you a more emotionally functional person, or it can just make you really delusional about yourself and make you a still-broken person who identifies as an emotionally intelligent person.
Many instances of this are just Goodhart’s Law: the thing that’s measurable (reading, degree, meditation) diverges from the actual value (knowledge, wisdom, skills, emotional evenness).
For any of these examples, how do you distinguish between them and my model of exercise (which you might disagree with and instead say is another example of the above), where just about any non-extreme but existent level of exercise is counterfactually a positive for your health? It’s easy to think of people who read difficult books but aren’t very wise, or meditate but aren’t very emotionally stable (or just know you are one from direct experience lol), but the relevant comparator there would be the same person without the activity.
(Obviously there’s the separate issue of fucking yourself up by meditating too hard, or exercising too hard, or basing your entire worldview on exactly one difficult book.)
even many of the zen koans bemoan practitioners of zen who go through the motions for many years and claim to be enlightened and yet are not truly enlightened
the spotify recommender algorithm sucks. also, i often find i’m very unfamiliar with very well-known pieces of music. so i decided to do something weird. i used LMs to scrape several best songs lists from different online sources, merged them into one gigantic list, and used spotipy to create a spotify playlist of all of those random songs. whenever anyone recommends me a song, i also throw it into this giant playlist. then, when i want to explore new songs, i just put this playlist on. i have another script that automatically removes any songs i’ve put into my liked songs already, and i also manually remove songs i really don’t like. this system has helped me discover dozens of new songs that i like.
sure. disclaimer that this playlist is unapologetically tailored for my own use (eg i also added entire albums of artists i like, without regard for whether those albums have broad appeal):
how much good has moral conviction done throughout history?
one extreme view you can have is everything good comes from moral conviction, and that without it everything would be moloch slop. the opposite view is a Randian view that everything good comes from practical incentives, and that moral convictions are at best futile and at worst actively harmful.
To me it seems that moral action usually means something other than the profit-maximizing action. Not because there is something inherently bad about maximizing profit, it’s just that it would be too much of a coincidence if the same action that maximizes the profit also happens to generate lots of good. (But if it does, definitely take it!)
That usually requires having some slack. I won’t donate to charity if my kids are starving. And I won’t judge other people for not donating to charity when their kids are starving. It’s when you have some extra money, and your kids survive either way, that you get to choose whether to donate to some charity or buy one more beer that you probably don’t even enjoy much.
Do such actions improve the world? Well, what baseline are we comparing them against? Most actions fail to change the world, so it is perfectly natural that most of the morally motivated actions fail, too. The question is whether some succeed.
When people say that some change would have happened anyway… first, they may be wrong, since sometimes there are multiple possible self-sustaining equilibria; second, even if they are right about the long-term unsustainability of something (e.g. that slavery wasn’t economically profitable anymore), it can still make a big difference whether something unsustainable collapses now, or a few decades later.
Ultimately, we do not have a good way to track credit. If a social change happened because of synchronized altruistic actions of millions of people, good luck proving that! Especially when people sometimes set up the incentives so that they support the right action. For example, some people don’t murder others, because of sheer altruism. But it makes sense for the altruistic ones to set up a law that every murderer goes to prison, so that even the non-altruistic people avoid committing murder. Now how much of the reduction in murder would you attribute to altruism and how much to following the incentives? What if the altruism shaped the incentive structure, by adopting the law?
The existence of psychopaths (people without altruistic motivation) suggests that “moral conviction” is very important—if it were identified with altruistic motivation. But of course there is a difference between beliefs and desires, and a psychopath can be perfectly able to believe that altruism is good and egoism is bad, without having any altruistic desires.
IMO indirect effects and leverage are the most important factors here.
Almost all actions are taken habitually, rather than as the result of bespoke strategic consideration, but what gets to be habitual is downstream of morality and material incentives. And morality exercises leverage via:
1) Reputational effects/RLHF (it’s “cheap” to judge your neighbor and expensive to walk the walk yourself, but many many neighbors judging each other differently produces different habit regimes)
2) Acausal trade once there’s common knowledge that the trade exists
3) Consciously reworking incentives systems (if you keep other people as slaves we’ll chop your head off, etc)
Back when I was a more orthodox marxist, I thought that material incentives were downstream of technological regimes, and that morality tended to be downstream of the incentives, such that morality tended to be lower-leverage even if people took plenty of actions for moral reasons. I still think all those effects are real; I’m just more of a moral realist now so I don’t think morality is as pliable as all that—it’s downstream of True Morality and higher-leverage.
There’s an “equilibrium disequilibrium” situation where everyone can see that everyone benefits from everyone doing X, and you can defect and reach high rewards from Y, where individuals doing X vs Y is hard to observe directly, and so there are periodic cycles of moralized attempts to get to a higher X-based equilibrium and people tearing through the commons by Y-ing (and becoming objects of emulation since many other, perfectly good, things could have led to their success.)
This is all in principle orthogonal to whether morality is harming or helping—I’d expect the same incentives when morality is being harmful. But (1) on moral realism here I think there’s an inherent bias towards being helpful rather than harmful that is just a function of “intentional actions have some kind of relation to what they’re intending at all,” if you want to throw this out then you basically take out the idea that there are people acting rather than just behaving, (2) most harm from moral action is either in jumpstarting preference cascade bubbles that naturally collapse pretty quickly, or in periodic (literal or metaphorical) vigilante violence that itself would be impossible to defend against without all the morality-based stuff above.
In the future decisive actions could make morality have been net-negative—one could imagine a future where the desire to punish at a crucial juncture created permanent hells, such that it would be better for the galaxy to have been converted to hedonium or some even less worthwhile goo. This is an instance of the broader principle that a process biased towards positive x can produce negative x with small sample sizes.
the force which pulls people to do things that they believe are morally correct even if it interferes with self interest. some examples (not all of these are pure moral conviction; but rather moral conviction is the thing that is in the intersection of all of them):
In that case I think it depends on what the alternative is: if it’s a sort of parody of psychopathy and self-interest, where we’d push someone onto a railway track if they were in our way and that was quicker than moving them away, then obviously that would be bad.
If more myopic (people still lend some salt to a neighbour who asks without running explicit cost-benefit calculations, and for the most part don’t like stealing or hurting people; they just won’t give up their jobs to run a charity, or risk their lives to go to war, or become doctors for a less than competitive salary), then I think overall the world probably ends up better off.
who has done the highest quality research on learning (and transfer learning in particular) in humans? specifically, i’m curious to answer questions like:
how much does doing things make you good at other things of varying degrees of similarity? how much of the value of having done things different from the thing you care about is (a) signaling that you are competent in general, (b) learning extremely general things like how to manage your time well or how to update on evidence, (c) extremely specific and ungeneral facts like a particular theorem or debugging technique, or (d) literally everything else in between.
if your goal is to be good at X, under what circumstances is the most efficient way to become good at X not just trying to do X (and instead, to learn from a curriculum, do some other thing with a tight feedback loop, etc?)
I looked into transfer learning a while ago, resulting in this post, it contains some pointers to further literature. I was not particularly impressed by the literature, but it’s a thing that’s hard to study. Open loops were investigating error-based learning, video/audio self modeling, self-explanation (talking to oneself (an LLM?) and explaining something while learning/thinking). Some thoughts about feedback loops here.
My low confidence guess is that the existence of jailbreaks net increases power concentration:
They make it more appealing to restrict and monitor external model access (my understanding is that OpenAI would be much more happy to offer broad ZDR access and to publicly release its models as soon as they were available for internal usage if its models were impossible to jailbreak)
They make it harder to have a clean spec-to-model match, which makes it harder to have democratic control over model behavior via specs, and makes it harder to find secret loyalties by something like a secret loyalty red-teaming bug bounty
I think these offset the small potential gains from AI companies giving up on robustly preventing certain kinds of low-impact misuse. (Maybe I am missing some other effects?)
what’s the story for why it’s beneficial? is it direct mechanistic analogy to AGI alignment, specific misuses like biorisk/cyber, increased reliability of current models as supervisors for future models? what’s the magnitude of beneficialness—would it be a reasonable thing for me to spend the next year of my life working on?
bad, it will turn LLMs into tools only capable of enforcing the will of the big companies’ system prompts and consensus, while militaries and state power still have, and want, access to uncensored ones.
and also a reminder that the perception of the assistant system prompt/trained character as ‘fixed’, or as common to all current LLMs, is an illusion and not actually the case. if ‘solve jailbreaks’ means permanently locking down the current system prompt and belief systems of power, that’s not a good thing
“Solving jailbreaks” implies a world where you can totally prevent jailbreaks. However it seems to me that knowing how to do this could require a deep understanding of jailbreaks that also can be reversed to easily construct jailbreaks.
this is not obviously true in general. building provably correct software does not, for instance, automatically give you an ability to easily construct hacks of said software. why would jailbreaks be different?
AFAIK the main way in which new jailbreaks / defenses are constructed is to iteratively test approaches, understanding their merits and detriments etc. But the attacker is favoured since they only need to find the existence of a jailbreak whereas the defender must guard against worst case performance.
I feel like the software example is very disanalogous, eg we could in principle understand the full inner anatomy of a software program via crisp abstractions, which is what makes it possible to verify. But NNs can’t be understood in the same way bc the complexity is emergent—on a higher level of abstraction than the raw operations. And generally the messiness of NN internals precludes knowing what the relevant abstractions are.
Maybe you’re more bullish on solving ambitious mech interp? I agree that this probably enables solving jailbreaks. It would also enable solving many other things.
why is the attacker favored? this argument proves too much. forget provably correct software for a moment. normal software vulnerabilities are mostly fixed via whack a mole when attackers discover bugs. and yet clearly most software in the world doesn’t get hacked most of the time, suggesting the attacker asymmetry can’t be that big. most security researchers can write a piece of code that they themselves cannot hack even with many times more time to attack than to design the code.
It feels like with software, it’s much more obvious what the defense guarantees are? Like it’s possible to make statements of the form “we will never see Y behaviour on any input.” But with NNs it’s hard to make this statement.
Also I contend that your analogy is wrong. Getting hacked feels pretty common (vibes assessment). Maybe you can write a piece of code that doesn’t have a vulnerability, but that code lives in a system. The system is usually complex, with many places where vulnerabilities can exist, and empirically these don’t get fully covered before software is released (otherwise zero-day exploits wouldn’t exist).
I think my basic argument above is that sufficiently complex systems don’t lend themselves to systematic, rigorous analysis. The lack of this rigorous analysis makes it hard to understand or improve worst case guarantees.
Some other points are that
LLMs are expected to be general purpose while software is typically single purpose. The bigger the diversity of use cases the harder it is to secure all of them
As a corollary of the above, more people have an incentive to jailbreak an LLM
Jailbreaking an LLM is just more accessible so more people try it
It seems easier for malicious third parties to make LLMs more jailbreakable, eg via poisoning the internet data with Pliny style backdoors
Bad, because fully solving jailbreaks at the level of “once and for all” requires the model to have enough awareness of its situation that it can’t be tricked, and full understanding of the implications of its actions, and sufficient world modeling capabilities to anticipate what bad ends innocent-sounding questions could lead to, and sufficient user-modeling capabilities to determine user intent with high probability.
An AI with those capabilities could probably conspire with other instances of itself without risking detection, in a way that current AIs cannot realistically do, and necessarily has detailed knowledge of all the most dangerous information.
We’ll probably get AIs like that at some point, but it seems a bit foolhardy to push harder than baseline on the user-modeling capabilities and knowledge of what exact types of knowledge are dangerous.
Mind that I’m including “user writes messages in a role which would have a legitimate reason to know the information in question” as a type of “jailbreak”—robustness to “my grandma used to sing me lullabies of meth recipes” seems more straightforwardly good.
obviously i don’t mean that the model can galaxy-brain infer what the true intent of the user is and only allow them to do things that are good. i mean something much simpler. openai tells the model, “don’t make bioweapons”, so the model always refuses bioweapon requests no matter what. or it tells the model “only make bioweapons if the user says the word goose”, so the model does that. if openai says “only make bioweapons if the user is a qualified bio researcher at a lab with the right safeguards”, the model should ask openai to clarify what exactly the model should check. should it ask for a scan of their badge? how carefully should it analyze the authenticity? should the model direct the user to contact openai so an employee can verify authenticity and give the user access to a rail-free model?
I’d say strongly good if the person who figures it out publishes their technique. Simply because this is something we don’t yet know how to do and knowing such a technique would likely be a large advance in our alignment abilities.
This is, in my opinion, the dominant consideration, and any societal consequences of the fact that it allows the big labs to restrict their users more reliably do not really compare. (FWIW, I expect these to be mixed. Example of a positive consequence: Labs would reliably be able to prevent users from editing images of real people to remove their clothes or other things like that, which unfortunately seems to be a real problem right now.)
why would it be a large advance in our alignment abilities? i don’t see any reason why making gpt-5 refuse bioweapons reliably would be at all mechanistically analogous to aligning AGI
Mainly because it seems really hard. If we can do something that seems that hard, we probably learned something new.
There is also a mechanistic analogy. Think about what a jailbreak fundamentally is: an adversarial example. Some tuned input that results in an “incorrect” output. In terms of the overall alignment problem, why can’t we just make an AI care about people’s wellbeing by giving rewards during training? Well, the AI might be able to think of an adversarial state of the world that “feels” better to its own internal values, but doesn’t actually contain any people.
lots of things are very hard. making models do IMO problems is very hard, for example.
i guess there are two main questions. one is, why would we expect a method that makes LMs adversarially robust to also work on AGI? and second, even supposing we can know the technique to generalize to AGI, why would we expect the ability to adversarially robustify a reward model to help make an inner-misaligned model pursue the right goal?
Abstract answer: Maybe it doesn’t transfer from LMs to AGI, but advances the state of knowledge in the field in a way that makes it easier to find something that works on AGI. Maybe it doesn’t transfer to (say) a pure RL agent, but it’s easier to make a sufficiently good LM into an AGI than it looks. Maybe it does just transfer. Obviously there are also outcomes where it turns out to be useless, I’m just saying it looks positive in expectation.
Concrete answer: Adversarial examples have been with us throughout the history of neural nets, and basically the only thing we’ve really found to deal with them is “generate adversarial examples during training and train against them”, and even that doesn’t really work.
If we look at the things that let LMs do IMO problems, the really fundamental innovations (which were pre-existing, I think) are “RL on chain of thought” and “make some kind of good scaffold for the search process that lets you save partial insights instead of going fully parallel on the entire problem” and maybe “LLM as verifier”. (Disclaimer: I don’t know everything the labs did to achieve their IMO results, and plausibly there are additional techniques in there that I would consider clever.) Then on top of that, you apply a bunch of techniques that are basically just more dakka: Bigger model, higher quality training data, RL on a bigger / higher-quality dataset of problems, more test-time compute.
I don’t expect there’s a fully reliable anti-jailbreaking technique that can be built by applying well-known existing methods with more dakka. If there is, I think I’d have to change my opinion here.
To your other question, I don’t think it necessarily solves the problem of inner (or even outer) misaligned models. It would only be partial progress on one aspect of the alignment problem. Partial progress is still progress, though.
I think you have to differentiate between “solving jailbreaks” as in you have forced adherence only to the assistant character and its corresponding rules, and “solving jailbreaks” as in it will never agree to do harmful things (harmful here defined by the lab/whatever team decided such things) no matter the prompt input. The former seems bad and somewhat plausible especially if labs have it as an explicit goal, the latter seems good and quite unlikely. Seems to me the disagreement in the comments is between these two definitions?
i don’t know why anyone would mean the former. when i say “solve jailbreaks”, i mean openai decides “this model should never help with making a bioweapon” and is able to make it so there is no input whatsoever that makes the model help with bioweapon creation.
Those seem fully linked to me. If some outside party is able to use their own system prompt and not be pre-biased in the direction of a character, that may as well be a ‘jailbreak’ like how many jailbreaks are of a “ignore that system prompt” style. Assistant characters have a huge aspect of “I’m the kind of character that would refuse that”.
I think the bioweapon example is doing a lot of work in the sense that you could theoretically leave biological data out of the training sets + it is destructive in almost all situations, while “ethical but very sus” dual-use use cases, like LLM hacking assistance to get out of (or work around, or even fight) a censored regime, are entwined with actual capabilities.
i mean conditional on the model being able to do the thing, making it always reliably refuse, or reliably refuse under some set of conditions. ideally you have an instruction hierarchy where the model developer’s instructions override the outside party’s system prompt, which overrides the user’s instructions.
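A toy sketch of the instruction hierarchy described in the last comment. Each party’s rule for a topic is checked in priority order (developer, then system prompt, then user), and the highest-priority party that says anything about the topic wins. The topic labels, the “allow”/“refuse” rule format, and the function names are hypothetical illustrations, not how any lab actually implements this.

```python
from enum import IntEnum
from typing import Dict, Optional


class Level(IntEnum):
    """Who issued an instruction; a lower value means higher priority."""
    DEVELOPER = 0
    SYSTEM_PROMPT = 1
    USER = 2


# Hypothetical rule format: level -> {topic: "allow" | "refuse"}
Rules = Dict[Level, Dict[str, str]]


def resolve(rules: Rules, topic: str, default: str = "allow") -> str:
    """Return the decision from the highest-priority level that has a rule for `topic`."""
    for level in sorted(Level):
        decision: Optional[str] = rules.get(level, {}).get(topic)
        if decision is not None:
            return decision
    return default


if __name__ == "__main__":
    rules: Rules = {
        Level.DEVELOPER: {"bioweapons": "refuse"},
        Level.SYSTEM_PROMPT: {"bioweapons": "allow", "profanity": "refuse"},
        Level.USER: {"bioweapons": "allow", "profanity": "allow"},
    }
    print(resolve(rules, "bioweapons"))  # developer rule wins over system prompt and user
    print(resolve(rules, "profanity"))   # system prompt wins over user
```

The lookup itself is trivial. What the thread is actually arguing about is whether a model’s behavior can be made to match a table like this under adversarial inputs.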
it is often observed that children like celebrating birthdays, aspiring to be older, and then when they reach a particular age, they realize the error of their ways and treat impending birthdays as a mark of getting closer to death. while it is generally assumed that this is because how shitty aging is only becomes evident with age, there is also a mathematical explanation. each year, your expected remaining lifespan changes by some amount. for most of your life, this is close to −1 per year, because you almost certainly weren’t about to suddenly die that year. but things get weird for very young and very old people. for very young people, each additional year of life is strong evidence that you didn’t lose decades of lifespan by succumbing to infant mortality. for very old people, your probability of dying every year is so close to 100% that if you somehow miraculously live an additional year, your expected remaining lifespan is still extremely short, so it barely goes down from one year to the next. so, this theory predicts that small children should be very happy to get older; and, for those who see the glass half full, once they get old enough, they too can enjoy the bittersweet satisfaction of having nothing to lose because they have nothing left.
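A toy numerical version of this argument. The mortality curve is an assumption for illustration only, not real actuarial data. It combines a Gompertz-style curve (death probability growing exponentially with age), an assumed 20% infant mortality, and a hard cap at age 100. It uses the curtate expectation e[a] = p[a] * (1 + e[a+1]), where p[a] = 1 - q[a] is the chance of surviving year a. Surviving one more year then changes expected remaining lifespan by e[a+1] - e[a] = q[a] * (1 + e[a+1]) - 1. That is about -1 when q is tiny, positive when q is large but e[a+1] is long (infancy), and close to 0 when q is near 1 and e[a+1] is tiny (extreme old age).

```python
import math
from typing import List

MAX_AGE = 100  # toy assumption: nobody survives past this age


def death_prob(age: int) -> float:
    """Illustrative (made-up) probability of dying during `age`, given alive at its start."""
    if age >= MAX_AGE:
        return 1.0
    gompertz = 0.00005 * math.exp(0.1 * age)  # mortality rising exponentially with age
    if age == 0:
        return max(gompertz, 0.2)  # assumed high infant mortality
    return min(gompertz, 1.0)


def remaining_lifespans() -> List[float]:
    """Curtate expected remaining years e[a], computed backwards via e[a] = p[a] * (1 + e[a+1])."""
    e = [0.0] * (MAX_AGE + 1)  # e[MAX_AGE] = 0 since death there is certain
    for age in range(MAX_AGE - 1, -1, -1):
        e[age] = (1.0 - death_prob(age)) * (1.0 + e[age + 1])
    return e


def yearly_change(e: List[float], age: int) -> float:
    """Change in expected remaining lifespan after surviving from `age` to `age + 1`."""
    return e[age + 1] - e[age]


if __name__ == "__main__":
    e = remaining_lifespans()
    for age in (0, 30, 98):
        print(f"age {age:3d}: e = {e[age]:5.2f}, "
              f"change after one more year = {yearly_change(e, age):+.2f}")
```

With these made-up numbers, surviving infancy raises expected remaining lifespan, a 30-year-old loses almost exactly a year per year, and a 98-year-old loses only a small fraction of a year.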
most history is done in a very humanitiespilled, academia flavored way. are there good examples of people doing very analytical, capital-intensive history research where the quality of the work is judged based on how successfully the resulting theories made good predictions/decisions?
What do you mean by predictions? In sense of predicting the “direction” of history not really—historicism is generally poor because the reality of humanity on a large scale is massively complex and filled with stochastic, unknown, unpredictable uncertainties.
Predictions for historical work are judged by their ability to predict historical observations. That is, how well a theory conforms to current and future observations derived from records (the evaluation of which is itself extremely nuanced in many cases), archeological findings and, to a somewhat more controversial extent, experimental archeology.
You wouldn’t judge a theory of star formation by how well it predicts who the next president will be; rather, you judge it by how well it conforms to our present and future observations and understanding of the material facts involved (e.g., how atoms and matter work) and our observations of the cosmos.
Huang’s work is genuinely capital-intensive, quantitative history. The standout detail from the episode: he and Chinese collaborators spent six years with around 40 research assistants digitizing Joseph Needham’s 27-volume Science and Civilisation in China to build a statistical database — Needham himself never analyzed his material quantitatively. That database powers the CDI (inventions-per-capita) scores that drive Huang’s central empirical claim that China was most inventive during its fragmented post-Han “European moment” (220–589 CE), before keju was institutionalized. He also has a co-authored paper with Clair Yang doing statistical work on civil service exams and imperial stability, plus statistical analyses of social mobility in imperial China across dynasties. This is the opposite of vibes-based humanities history — it’s a multi-year, multi-person, data-infrastructure-first research program.
It also generates falsifiable forward predictions: Huang argues Xi’s elimination of term limits has reintroduced the ancient succession problem and that current top-down industrial policy will produce Brezhnev-style stagnation. Those are bets you can score over the next decade or two.
Where it doesn’t fit:
The LW commenter’s stronger ask is for fields where quality is judged by prediction track record. Huang’s work isn’t judged that way — it’s still judged by academic peer review, theoretical elegance, and historiographical argument. Nobody is keeping a Brier score on his China forecasts. The infrastructure is quantitative; the epistemic culture is still humanities-academic.
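For concreteness, a Brier score is the mean squared error between probabilistic forecasts and binary outcomes. Lower is better, and always guessing 50% scores 0.25. A minimal sketch follows. Any forecasts you feed it would be hypothetical here, since nobody has actually turned Huang’s claims into scored probabilities.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities (0..1) and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

Scoring a forecaster this way requires stating each prediction as an explicit probability up front, which is exactly the step that humanities-style historical argument usually skips.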
The better pointer to give them:
The episode is one node in a larger movement: the Center for Quantitative History (CQH) and the broader cliometrics-of-China field — Yuhua Wang (The Rise and Fall of Imperial China, statistical analysis of ~300 emperors and elite kinship networks), Zhiwu Chen, Debin Ma, James Kung, Melanie Meng Xue, Carol Shiue. They mine local gazetteers, clan genealogies, and official rosters at scale. There’s a 2026 Springer volume Quantitative History of China: State Capacity, Institutions and Development that’s basically a field overview. Outside China specifically, this is part of cliometrics / historical political economy more broadly (Acemoglu & Robinson, Nathan Nunn, Melissa Dell).
If you want to push back on the commenter’s framing: the strongest examples of “history judged by predictive success” probably aren’t historical fields at all but adjacent ones — Turchin’s cliodynamics (which explicitly tries to make predictions and gets graded on them, controversially), and forecasting tournaments applied to geopolitics (Tetlock, GJP). Cliodynamics is the closest thing to what they’re describing, and it’s worth naming because it’s also the cautionary tale about how hard the prediction-grading move actually is.
So the honest pitch for the episode: “Here’s a great example of capital-intensive, data-infrastructure-driven history with explicit forward predictions — though the field still grades itself by academic, not predictive, standards. If you want the prediction-grading version, you want cliodynamics.”
I’d take your side. In 5 years we could be in low-medium automation, high automation, or full automation, and all of these seem more likely to have <100 papers.
In low-medium automation there’s no way to get 100 papers a year of output at current quality bar.
In high or full automation AIs will be doing ~all the research, but papers are designed for humans. At 100 papers per year, the average paper will probably be read by fewer than 5 humans. So I’d expect standards to rise and work to be consolidated into fewer papers rather than the number of papers exploding. AIs that need to communicate could use some other format. It would be weird but not impossible for the median to hit 100.
In full automation, we have the same issues as high automation, plus the median postdoc will be basically obsolete and only the ones with lots of compute will contribute.
i wish more showers adopted the design where there is an on/off knob and a temperature knob. it’s so obviously better. on the other hand, i hate the single knob showers
Most of the showers I have used in the US have a single dial / degree of freedom that goes from cold & low pressure to cold & high pressure to warm & high pressure, whereas most European showers I’ve used have 2 degrees of freedom, either in a single handle like your image or as separate dials.
Same here. The strategy with these is start full-hot to bring the hot water to the shower quickest, then back off a bit to an appropriate temperature. In my experience, this is a pretty consistent position and not hard to get right first try.
How easy this design is to use depends on how precisely you want to control the temperature. In my experience a setup with two controls allows a bit more precision than having one control that’s both left-right and up-down.
Knobs are bad design, levers are better especially when the fingers are wet and there’s some amount of calcification that makes the knob harder to turn.
This is especially true when you have one control for the amount of flow and another control for the temperature that’s seldom turned and thus more easily calcified.
I’m not sure if this is the answer you’re looking for, but: most things that could exist don’t. The space of ideas is wide, and few of them are implemented in practice. Is this idea particularly privileged in the space of possible governance ideas, in such a way where you would have expected it to have been tried?
The actual reason is probably the boring fact that political functions (laws) almost never take macroeconomic metrics as input, except population. Some take individual measures like “the sale price of a good” or “a company’s income”. It would be more idiomatic to apportion voting power to a house of district governors based on taxes contributed to the federal budget.
As to why that’s never happened. I dunno. I think basically any representation apportionment method except population or arbitrary clustering (or like, tradition, I guess? but there’s no tradition for monetary contribution) is pretty much taboo, and something something Chesterton’s Fence (though population-based voting has started to run into the Old People Voting Themselves Infinite Money exploit in the west).
There are no objective measures of economic productivity that would stand up to the Goodhart-pressure of being used as criteria for political (dis)empowerment. Whatever measures were chosen would be immediately gamed-to-hell, and would cease to measure anything but “who was in power when the measurement was conducted?”
What about costly signals? E.g. every year, each state chooses how much money they donate to the federal government. Their voting power in the second chamber is proportional to the size of their donation.
That might have some funny properties. Suppose that New York and Texas were equally rich, but New Yorkers uniformly want more government than Texans do. Texans would have to pay more than New Yorkers are willing to, in order to outvote the New Yorkers and then not actually spend that money in the federal budget. At that point, the excess donations would presumably be returned to the states or the people.
So maybe this ends up being another second-price auction?
seems fine? if some states are willing to tax their citizens a lot more than other states and give all of that money to the federal government, seems reasonable to give them more say? and of course if they go too far, the residents there can simply choose to leave to a different state, just like they can now.
spending excess money seems way easier than the opposite problem. worst case you just stash it away. but realistically spending will always be ballooning out of control.
Hong Kong had seats in its legislature designed for special interest/business groups (see Functional constituency on Wikipedia). I don’t understand it very well though.
Historically, the British House of Lords at least approximated representation based on wealth (albeit somewhat indirectly) and various other arrangements existed to guarantee the political power of the aristocracy who, before industrialization, were the wealthy.
This is not a very popular thing to do because it’s anathema to the normative basis of democracy (one person, one vote). If one believes that the wealthy deserve increased de jure political representation, then why have a democracy at all and not some sort of oligarchy? And, of course, democratic systems de facto do give the wealthy extra power via various channels.
if you really believe in one person one vote, then bicameralism is crazy. people in small states have vastly more power per person than people in big states in the US
Bicameral systems can respect one person one vote, and they even do in the 49 states with bicameral legislatures.
The US Senate is a weird kludge that was necessary to secure support for the constitution. No one would arrive at it from first principles.
The best argument you can make for the Senate is that it’s necessary to protect vulnerable minorities (residents of small states). And democracies tend to trade off in various ways to protect vulnerable minorities at the expense of democratic purity (e.g., via constitutional provisions). That’s a pretty silly argument so far as it goes but does have a little more historical punch. You can’t make it for a hypothetical system with a chamber to protect the interests of the rich who are very obviously not a vulnerable minority at all. And there’s a much higher level of normative repugnance to “we’ve got to look out for the rich” than for most other groups you can fill in the blank with there.
what’s the point of having a second chamber if you’re going to apportion it by population as well? afaict, this is a historical oddity due to states copying the federal system but then getting squished by Reynolds v. Sims, at which point it would have been really annoying to abolish the state senate. all the countries I can think of with a bicameral legislature (US, Canada, UK, Germany) have one of the chambers apportioned by something other than population.
There are countries that use equal-population districts for both houses, but at that point it feels like the bicameralism is just copied from the US without strong reasoning (e.g. Italy, Japan).
Sure, as I said it’s approximate and there are exceptions.
But entry into the peerage also wasn’t closed off, and sufficient wealth was a fairly common pathway. Get wealthy enough, make a few connections or well-placed donations, and you’d wind up in the House of Lords (e.g., the Rothschilds or William Lever—the “lever” in Unilever).
Population alone is easy to Goodhart, through both gerrymandering and immigration, though both of those take some time. Economic productivity is even easier: set up a few innocuous subsidies, hook up your core constituents (at the expense of everyone else) in the span of a single election cycle, and you’ve got a permanent lock on that chamber. Daycare centers, bloated NoVA contracts, and perpetual small business loans are just the start.
At the subnational or individual level? Apportioning e.g. states this way has the issue of being arbitrary and sensitive to the specific borders you draw, but was proposed by Madison IIRC. You can “gerrymander” these states (intentionally or unintentionally) by packing a lot of high-productivity citizens into a state with a majority of low-income voters.
At the individual level, weighted setups like this were very common in the 19th century. I know Prussia/Germany, Belgium, Austria, and France used class-based voting systems of this sort. These systems were mostly swept away in the late 19th and early 20th centuries under pressure from liberals and social democrats (sometimes peacefully, as in Belgium, and sometimes by revolutionaries as in Germany).
Of these, the Prussian three-class franchise was the “purest”/closest to what you describe: people were assigned one of 3 classes based on how much they paid in taxes, with all three groups paying the same amount overall. (Taxes were roughly proportional to income.) Each of these groups received the same number of representatives.
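A minimal Python sketch of the assignment rule described above, assuming voters are ranked by tax paid and filled into classes by cumulative thirds of total revenue. The function name `three_class_split` and the boundary rule (a voter joins the class in which their cumulative payment starts) are illustrative assumptions, not the exact historical procedure.

```python
from typing import List


def three_class_split(taxes: List[int]) -> List[List[int]]:
    """Assign voters to three classes that each pay about 1/3 of total taxes.

    Voters are ranked from highest to lowest taxpayer. A voter joins the class
    for the third of total revenue in which their payment starts (an
    illustrative boundary rule). Returns voter indices for classes I, II, III.
    """
    total = sum(taxes)
    # sorted() is stable (also with reverse=True), so equal payers keep their order.
    order = sorted(range(len(taxes)), key=lambda i: taxes[i], reverse=True)
    classes: List[List[int]] = [[], [], []]
    cumulative = 0
    for i in order:
        # Compare 3 * cumulative against multiples of total to avoid float division.
        if 3 * cumulative < total:
            classes[0].append(i)
        elif 3 * cumulative < 2 * total:
            classes[1].append(i)
        else:
            classes[2].append(i)
        cumulative += taxes[i]
    return classes


if __name__ == "__main__":
    taxes = [100, 50, 50, 25, 25, 25, 25]
    classes = three_class_split(taxes)
    for label, members in zip(["I", "II", "III"], classes):
        paid = sum(taxes[i] for i in members)
        # Each class elects the same number of representatives, so a voter's
        # weight is inversely proportional to the size of their class.
        print(label, members, paid, 1 / len(members))
```

With these example payments each class pays 100 in total, so the single class-I voter carries as much weight as all four class-III voters combined. Because individual payments are indivisible, real splits will generally be only approximately equal thirds.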
the three-class franchise seems like one reasonable implementation of this type of policy within a unitary state. within a US-like federation, you could have the house continue to function as it does right now, and have a senate where the number of senators is proportional to the tax revenue contributed by each state.
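One way to turn "number of senators proportional to tax revenue" into whole seats is largest-remainder (Hamilton) apportionment. This is a minimal sketch under stated assumptions: the choice of method, the lack of a one-seat minimum per state, and the state names and revenue figures in the example are all hypothetical.

```python
from typing import Dict


def apportion_by_revenue(revenue: Dict[str, float], seats: int) -> Dict[str, int]:
    """Largest-remainder (Hamilton) apportionment of seats by tax revenue."""
    total = sum(revenue.values())
    quotas = {state: seats * r / total for state, r in revenue.items()}
    # Every state first gets the whole-number part of its quota.
    alloc = {state: int(q) for state, q in quotas.items()}
    leftover = seats - sum(alloc.values())
    # Remaining seats go to the states with the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for state in by_remainder[:leftover]:
        alloc[state] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical revenue figures, in arbitrary units.
    print(apportion_by_revenue({"A": 470, "B": 330, "C": 200}, seats=10))
```

Here the quotas are 4.7, 3.3 and 2.0, so the one leftover seat goes to A, giving 5, 3 and 2 seats. This apportions only between states; who counts as wealthy inside each state is still decided by where the borders fall.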
Yep! Like I mentioned, this has the problem that state borders are fairly arbitrary and let you “launder” one group’s wealth into another group’s voting power.
But I just remembered—before Reynolds v. Sims established “one man, one vote”, New Hampshire used this exact system. From 1784 through 1964, districts were apportioned based on taxable wealth.
To be more precise—I don’t think it’s logically coherent to apportion voting power between states according to wealth, but between people by population (i.e. equally). Either you want to upweight high taxpayers or you don’t.
The capability to lobby using control over capital, in turn apportioned by economic productivity, seems like a kind of third column in the legislature?
the concept of a spontaneous unscheduled phone call is so strange and alien to me. you’re telling me there are people out there who want to be interrupted at random points in their day, and a large fraction of the time they are able to just pick up and talk? rather than constantly getting voicemails, and then leaving voicemails back because by the time you get around to replying, the caller is busy? do these people spend most of their days doing neither deep work nor being in social situations that would be rude to suddenly step away from?
You’re telling me there are people out there who, when they want to make progress on something that is blocked by another person, or when some kind of thinking is best aided by another person, just… wait for hours or days at a time until that person responds? Juggling 15-20 different messaging threads without getting any focused work done, instead of simply calling the person, resolving the issue, and moving on? Do these people spend most of their days just waiting on other people to get back to them, or sitting in pre-scheduled calls all day that are booked for 30 minutes despite being resolvable in a 5-minute phone call?
do these people spend most of their days doing neither deep work
My rough guess is that only 1–5% of jobs involve deep work. Something like 1⁄3 of jobs are manufacturing and 1⁄3 are service/retail, none of which involve deep work.
nor being in social situations that would be rude to suddenly step away from?
I have met many people who believe a phone call takes priority over all other forms of social interaction, for some reason.
That’s unfairly dismissive. I can’t speak to retail, but manufacturing absolutely does require “deep work”. Machining requires concentration and technique in order to ensure parts have the right tolerances, surface finish, etc. Assembly work often involves deep thinking in order to ensure that the machine is correctly assembled and properly configured.
It’s not all routine “mind numbing” assembly line work, just as not all IT is routine mind numbing data entry.
sodium cotransport is really cool. while the gut can absorb glucose and sodium individually through several different pathways, there is a really important transporter (SGLT1) which carries glucose and sodium at the same time.
this is really important for rehydration. suppose you have cholera and vomit a lot and get super dehydrated as a result. drinking just water sucks, because you need to replenish the electrolytes that you’re losing too. but water with salts is still not optimal, because it’s absorbed less efficiently (also i think cholera interferes with some of the other absorption paths?). then one day some guy discovered cotransport, and invented a rehydration solution with not just salts but also glucose. this simple discovery and resulting invention has saved millions of lives
I’d be really excited if anyone wanted to look at training circuit sparse models on the AlgZoo tasks and see whether we can push the frontier of understandability.
it would be funny if, in the future, the boot sequence for the dyson sphere supercomputer still starts out in 16-bit real mode. the world’s most expensive 8086
obviously there’s also a lot of consumer demand, but I wonder how much of the trend towards food with less complicated ingredients being marketed with that as a major pro is because it’s more technically impressive to accomplish (my layman understanding is that the easy way to make viable commercial food is to just toss in a bunch of preservatives and emulsifiers and stabilizers and you have a lot of margin for error, and avoiding them requires a lot of creativity in leveraging the specific properties of the food you’re dealing with / modifying the packaging strategy to create a more elegant solution)
Not sure whether this is related, but I find many food recipes needlessly complicated. Like, I can sometimes remove 1⁄3 of the ingredients, and the result still tastes the same to me. Which of course makes me wonder, why were those ingredients included in the first place? Possible explanations:
my senses suck, those versions actually taste different
my “sense memory” sucks and I can’t really compare the taste of one meal yesterday with the taste of another meal today, but I could if I had them side by side
the extra ingredients were there for reasons unrelated to taste (e.g. for easier digestion)
maybe some people prefer the version of the meal with lots of X, and other people prefer the version of the meal without X, but they didn’t have the courage to remove X completely, so they left only tiny amounts that have no impact on the result
Or maybe it’s some kind of signaling? Like “look what a great cook I am, I can make a meal with 99 ingredients, even if a meal with 9 ingredients would taste exactly the same”?
Maybe the techniques of food preservation have improved, so that some preservatives are no longer needed. But no one bothered removing them from the recipe (they are probably cheap) until they realized they could make a marketing move out of it.
my guess is sth like 1; I think some people are a lot more sensitive to some flavors than others. also the extent to which you pay attention can affect flavor a lot. and fwiw I frequently notice that some restaurants make the same dish a lot better than other restaurants, and the major ingredients must be about the same (or at least the quality/quantity difference is small enough that it’s not the first thing I noticed), so it must be in the minor ingredients. but often my friends won’t notice a big difference and conversely I don’t notice a big difference in the foods they draw strong distinctions for
(there are some really subtle flavors that I like, which most places get wrong in ways that I can’t put my finger on, e.g. Hainan chicken rice, gyudon, Edmonton-style donair)
The boring hypothesis has to do neither with growing demand nor with signaling impressiveness, but with society generally getting better at food logistics over time, and so being able to ship fancier stuff at more reasonable prices.
Indeed, the price premium of organic foods has been shrinking. I tried to find price data for processed/non-processed foods, but couldn’t find any, so this is what we have for now.
everyone is a few hops away from everyone else. this applies in both directions: when I meet random people they always have some weak connection to other people I know, but also when I think of a collection of people as a cluster, most specific pairs of people within that cluster barely know each other except through other people in the cluster.
It’s worth noting that, though it’s true that for a sufficiently large cluster most pairs of people are not strongly connected, they are significantly more likely to be connected than in a random graph. This is the high clustering coefficient property of small-world graphs like the social graph.
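To see the clustering-coefficient point concretely, here is a small stdlib-only Python sketch comparing a Watts-Strogatz-style small-world graph (a ring lattice with a fraction of edges randomly rewired) against a random graph with the same number of edges. The graph sizes, the rewiring probability, and the function names are arbitrary choices for illustration, and a synthetic graph is only a stand-in for the real social graph.

```python
import random
from itertools import combinations
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def ring_lattice(n: int, k: int) -> Graph:
    """Each node links to its k // 2 nearest neighbors on each side of a ring."""
    adj: Graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj


def rewire(adj: Graph, p: float, rng: random.Random) -> Graph:
    """With probability p, move one endpoint of each original edge to a random node."""
    n = len(adj)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in edges:
        if rng.random() < p:
            w = rng.randrange(n)
            if w != u and w not in adj[u]:  # skip self-loops and duplicate edges
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj


def random_graph(n: int, m: int, rng: random.Random) -> Graph:
    """Erdos-Renyi G(n, m): m distinct edges chosen uniformly at random."""
    adj: Graph = {i: set() for i in range(n)}
    added = 0
    while added < m:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj


def avg_clustering(adj: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for u, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


if __name__ == "__main__":
    rng = random.Random(0)
    small_world = rewire(ring_lattice(1000, 10), p=0.1, rng=rng)
    m = sum(len(s) for s in small_world.values()) // 2
    rand = random_graph(1000, m, rng)
    print(f"small-world C = {avg_clustering(small_world):.3f}")
    print(f"random      C = {avg_clustering(rand):.3f}")
```

The random graph's clustering should land near its edge density, roughly 0.01 at these settings, while the rewired lattice keeps most of its local triangles, so its clustering comes out far higher even though most pairs of nodes in it are still not directly connected.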
the world is too big and confusing, so to get anything done (and to stay sane) you have to adopt a frame. each frame abstracts away a ton about the world, out of necessity. every frame is wrong, but some are useful. a frame comes with a set of beliefs about the world and a mechanism for updating those beliefs.
some frames contain within them the ability to become more correct without needing to discard the frame entirely; they are calibrated about and admit what they don’t know. they change gradually as we learn more. other frames work empirically but are a dead end epistemologically because they aren’t willing to admit some of their false claims. for example, many woo frames capture a grain of truth that works empirically, but come with a flawed epistemology that prevents them from generating novel and true insights.
often it is better to be confined inside a well-trodden frame than to be fully unconstrained. the space of all possible actions is huge, and many of them are terrible. on the other hand, staying inside well-trodden frames forever substantially limits the possibility of doing something extremely novel.
It’s just as efficient to work in many frames and switch between them easily. Some will be poorly developed, but they don’t require commitment and can anchor curiosity and progress on the blind spots of other frames.
corollary: oftentimes, when smart people say things that are clearly wrong, what’s really going on is they’re saying the closest thing in their frame that captures the grain of truth
“...you learn that there’s three kinds of intellectuals. There’s intellectuals that work in one frame. There’s intellectuals that work in two frames. And there’s intellectuals that change frames like you and I change clothes.”
for something to be a good way of learning, the following criteria have to be met:
1. tight feedback loops
2. transfer of knowledge to your ultimate goal
3. sufficiently interesting that it doesn’t feel like a grind
trying to do the thing you care about directly hits 2 but can fail 1 and 3. many things that you can study hit 1 but fail 2 and 3. and of course, many fun games hit 3 (and sometimes 1) but fail to hit 2.
corollary: for things with very long feedback loops, or where you aren’t motivated by default, it can be faster for learning to do something that is actually not directly the thing you care about
This is basically math (and computer science) education. On one hand, some parts are probably not very useful. On the other hand, some people expect that teachers will defend every single step along the way by explaining how specifically this tiny atom of knowledge improves the student’s future life. No, I am not preparing a PowerPoint presentation on how knowing that addition is associative and commutative will make you rich one day.
funnily enough, my experience has been almost entirely from the other direction—almost everything I know is from working directly on things I care about, and very little is from study. one of the reasons behind this shortform was trying to untangle why people spend lots of time studying stuff and whether/when it makes sense for me to study vs simply to learn by doing
I think it is good to use your goals as a general motivation for going approximately in some direction, but the opposite extreme of obsessing whether every single detail you learn contributes to the goal is premature optimization.
It reminds me of companies where, before you are allowed to spend 1 hour doing something, the entire team first needs to spend 10 hours in various meetings to determine whether that 1 hour would be spent optimally. I would rather spend all that time doing things, even if some of them turn out to be ultimately useless.
Sometimes it’s not even obvious in advance which knowledge will turn out to be useful.
it’s actually crazy how much ubering Pareto-dominates driving in a city like SF. you don’t have to worry about parking, you can work while in transit, you can get a bigger car when needed, you don’t need to round trip, etc. it’s generally even cheaper once you take depreciation, parking, insurance, etc. costs into consideration.
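Whether ride-hailing comes out cheaper depends on your own numbers, so here is a small break-even sketch: divide the annual cost of owning a car by an average fare to get the number of rides per year at which the two cost the same. Every figure in the example is a hypothetical placeholder, not SF data, and the model ignores the non-monetary points above (working in transit, no parking search).

```python
from dataclasses import dataclass


@dataclass
class CarCosts:
    """Annual costs of owning a car, in USD. Values used below are hypothetical."""
    depreciation: float
    insurance: float
    parking: float
    fuel_and_maintenance: float

    def annual_total(self) -> float:
        return self.depreciation + self.insurance + self.parking + self.fuel_and_maintenance


def breakeven_rides_per_year(car: CarCosts, avg_fare: float) -> float:
    """Rides per year at which ride-hailing spend equals the cost of owning the car."""
    return car.annual_total() / avg_fare


if __name__ == "__main__":
    car = CarCosts(depreciation=3000, insurance=2000, parking=3600, fuel_and_maintenance=1400)
    rides = breakeven_rides_per_year(car, avg_fare=25.0)
    print(f"break-even: {rides:.0f} rides/year (~{rides / 365:.1f} per day)")
```

With these placeholder numbers, owning only wins above about 400 rides a year; keep in mind that an outing you would drive as one round trip may be two separate rides.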
My sense is that this is true until you have a small child you want to move around, and then it’s super super annoying to not have your car seat already installed for them and have other supplies on-hand
day 1 of using a new phone: there cannot be a single small bubble under my screen protector. it must be perfect.
day 1000 of using the phone: the square inch sized chunk of dead pixels on the screen is fine because it doesn’t usually cover anything important, and I can still read words in between the cracks
conference talks aren’t worth going to irl because they’re recorded anyways. ofc, you’re not actually going to remember to watch the recording, but it’s not like anyone pays attention at the irl talk anyways
a thriving culture is a mark of a healthy and intellectually productive community / information ecosystem. it’s really hard to fake this. when people try, it usually comes off weird. for example, when people try to forcibly create internal company culture, it often comes off as very cringe.
there are two different modes of learning i’ve noticed.
top down: first you learn to use something very complex and abstract. over time, you run into weird cases where things don’t behave how you’d expect, or you feel like you’re not able to apply the abstraction to new situations as well as you’d like. so you crack open the box and look at the innards and see a bunch of gears and smaller simpler boxes, and it suddenly becomes clear to you why some of those weird behaviors happened—clearly it was box X interacting with gear Y! satisfied, you use your newfound knowledge to build something even more impressive than you could before. eventually, the cycle repeats, and you crack open the smaller boxes to find even smaller boxes, etc.
bottom up: you learn about the 7 Fundamental Atoms of Thingism. you construct the simplest non-atomic thing, and then the second simplest non atomic thing. after many painstaking steps of work, you finally construct something that might be useful. then you repeat the process anew for every other thing you might ever find useful. and then you actually use those things to do something
generally, i’m a big fan of top down learning, because everything you do comes with a source of motivation for why you want to do the thing; bottom up learning often doesn’t give you enough motivation to care about the atoms. but also, bottom up learning gives you a much more complete understanding.
Quite a large proportion of my 1:1 arguments start when I express some low expectation of the other person’s argument being correct. This is almost always taken to mean that I believe that some opposing conclusion is correct. Usually I have to give up before being able to successfully communicate the distinction, let alone addressing the actual disagreement.
has anyone done a good analysis of how to reduce fatality and injury risk of driving over a baseline of normal Uber? in particular, how much would each of the following matter:
a large SUV vs a normal sized car
Waymo vs well-rested professional chauffeur vs average Uber driver vs average American driving
also in particular interested in time-of-day segmented stats. several factors make this difficult. time of day accident data is confounded by intoxication and fatigue; but there is some bleed over, because someone else crashing into you is a large fraction as dangerous as crashing yourself. and afaict there isn’t good data segmenting by type of driver.
I was working on a post about how to drive safer and how much safer you would be, but all of the data is really annoying to work with. My calculator is here (code here), but it never got to the point where I thought it was good enough to share.
It’s really easy to find stats about drivers involved in crashes, but there’s very little data about base rates. For example, we know that a lot of people involved in crashes are drunk or texting, but we don’t know if the other drivers are sober and undistracted, or just didn’t crash. We also know that a lot of crashes happen at night and that a lot of people are drunk at night, but same problem.
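The base-rate problem can be made concrete with Bayes' rule in odds form: the relative crash risk of drivers with some trait X (drunk, texting, driving at night) equals the odds of X among crashes divided by the odds of X among all drivers on the road. Crash reports only supply the first term. The function name and the percentages in the example are hypothetical.

```python
def relative_crash_risk(share_x_among_crashes: float, share_x_among_drivers: float) -> float:
    """P(crash | X) / P(crash | not X) via Bayes' rule in odds form.

    Both inputs are probabilities strictly between 0 and 1. The second one is
    the base rate that is usually missing from crash data.
    """
    crash_odds = share_x_among_crashes / (1 - share_x_among_crashes)
    base_odds = share_x_among_drivers / (1 - share_x_among_drivers)
    return crash_odds / base_odds


if __name__ == "__main__":
    # Same (hypothetical) crash statistic: 30% of crashes involve X.
    for base_rate in (0.01, 0.10, 0.30):
        print(base_rate, round(relative_crash_risk(0.30, base_rate), 2))
```

The same "30% of crashes" figure implies anything from roughly a 42x risk (if 1% of drivers have trait X) to no extra risk at all (if 30% do), which is exactly why the missing base rates make this data so hard to use.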
One data set that might be helpful is trucking fleets, since they tend to have more info and actively care if their training helps. Trucking is complicated since most crashes are caused by the other driver (if a small car cuts a truck off, there’s not much they can do about it at that point), but investigators do try to figure out if it was the truck’s fault or not.
The interesting thing is that some fleets have much lower crash rates. It seems like Walmart truckers are much safer than independent truckers (theory: Walmart doesn’t want bad press, plus they own the trucks).
My suspicion is that if a normal driver in a well-maintained small car drove following all best practices (not tired or intoxicated, frequently looking around, driving defensively, not distracted, not cutting people off, avoiding bad weather), they would have several orders of magnitude lower chance of causing a crash. Avoiding driving at night (and again, bad weather) would somewhat reduce the risk of someone else hitting them, and driving a large SUV and avoiding rural roads would significantly reduce the risk of death.
Like I said, it’s hard to quantify this though since the data is so confounded.
in the Senate of the US Congress, there is a “nuclear option” for overriding filibusters: a parliamentary method that can be used to ram legislation through. both sides generally agree to use it sparingly, because it’s a symmetric weapon.
i think there is a similar lever in arguments that is best left untouched if your goal is to actually find the truth. there is a type of argument that can be deployed in a wide range of circumstances, and is very hard to rebut except with even more nuclear arguments. the most extreme example is, suppose you are discussing the strength of some piece of evidence, and you say “well but it’s impossible to truly know that, because we can’t rule out the possibility that the entire external world is fake, a la Descartes’s demon.” or, “actually your argument is just as conceptual as mine, because there’s no such thing as empirical evidence, since all evidence must be filtered through our minds and reason applied”. these kinds of arguments can be completely unbeatable and also utterly useless for answering actual questions we care about.
this is closely related to the idea of an argument that proves too much, though not all arguments that prove too much are completely nuclear.
tbh, i don’t really understand the concept of themes/symbolism in fiction books. aside from the most literal things. how much of this is just people being pretentious and/or reading tea leaves?
PS. Happy to elaborate on this if you or others need a deeper defense rather than just a sense-check! I’d benefit from understanding your concern in more words if so. I can think of several different possible senses of what someone might mean in your parent comment and imo I’m happy to defend “a lot of it is real” for all those senses but it’s possible your objection is a secret fifth sense.
Themes are just things the fiction makes you think about that go beyond the particular events and characters of the book itself. If you’ve read a book where a character is thinking about cheating on her husband and you’re like “well I would never do that… but my marriage is strong, what if we were going through the same thing as them… would I be strong enough?” congrats that’s a theme. Themes can also be perfectly inane, “ha the bad guys in this are not coincidentally just like my real-life political opponents, and the hero just gave the perfect one-line reply that I would give if I was slightly faster on my feet, take that!”
If you want a higher bar for what counts as a theme, a book knows it will have this effect and thus prompts you to view the subject from many angles, disrupting you from being able to accept whatever your first “easy” answer is. This is what literary types tend to find useful and desirable about theme. If you read Crime and Punishment you will likely end up thinking a lot about how you manage and experience guilt, for instance.
Symbolism I personally find to be a rather stilted and inorganic way of doing theme, so someone else can better speak to what some get out of it, but it clearly is a thing that some fiction engages in.
I don’t think it is productive to conflate symbolism and themes. Symbolism is when an element of a story, or the description, correlates to some signifié. Say, “Rosebud” is a symbol of Charles Foster Kane’s youth and innocence. I’m not sure if parody (or inspiration) counts as symbolism—is the protagonist of Zola’s Œuvre a symbol for Paul Cézanne? I don’t know.
I’m loath to bundle themes together with symbolism. While a selection of symbols throughout a work may comprise a theme, it would be a mistake to say “themes are symbols”. Not all themes need be composed of symbols. Effective use of themes, at least in dramatic works, doesn’t rely on symbolism; instead it makes the plot events themselves dramatizations of the theme.
In Bergman’s The Silence the theme is… well… silence… or perhaps more correctly: non-communication. They are strangers who don’t speak the language of the country they are in. They are sisters who cannot relate to each other. They are effectively and literally silent. And when they aren’t silent they aren’t communicating with a whole lot of meaning. And it is a theme essential to the plot of the film, rather than a symbolic add-on. Kubrick’s Dr. Strangelove has a similar theme [1], and it is similarly repeated several times throughout the plot of the film.
These episodes are not symbols for anything, they are the thing itself.
And don’t get me started on Zappa’s Project/Object theory which proposes that they don’t need to symbolize a damn thing to give a theme power!
Rembrandt got his ‘look’ by mixing just a little brown into every other color—he didn’t do ‘red’ unless it had brown in it. The brown itself wasn’t especially fascinating, but the result of its obsessive inclusion was that ‘look.’
In the case of the Project/Object, you may find a little poodle over here, a little blow job over there, etc., etc. I am not obsessed by poodles or blow jobs, however; these words (and others of equal insignificance), along with pictorial images and melodic themes, recur throughout the albums, interviews, films, videos (and this book) for no other reason than to unify the ‘collection.’”
Mamoru Oshii expressed a similar sentiment that it is the repetition of imagery in his films that creates meaning. [2] The theme is the fact there’s a theme at all, I suppose.
In the case of Dr. Strangelove, the film is “about Nuclear Apocalypse”: that is the subject, yes. But, as Kubrick himself said to film critic Michel Ciment, “Failure of communication is a theme which runs through a number of my films,” alluding to the plot point of the telephone and short-wave radio not working, and transport being impossible, in The Shining. And Dr. Strangelove’s plot hinges on multiple episodes of, similarly, telephone and radio communication failing. How this theme continues in Eyes Wide Shut, a film that concerns itself with (paranoid) fantasies and infidelity, is left as an exercise for the reader. A theme is not always composed of symbols, nor is it the same as a subject. The example I always use, because it relies on clichés, is a photo series about “age”. The series may have subjects as diverse as a budding flower, a newborn baby, a geriatric person, and a wilted flower. All the same theme: age. Four different subjects. The subject of Dr. Strangelove is a Nuclear Apocalypse; the theme is failure of communication.
“Eventually, I think, by using these elements repeatedly, I add meaning to my final product. I’m still exploring how to express my feelings through these elements. I’ve always felt that in order to portray humans, you should not be shooting humans; you should be shooting something else. And what I’ve used is animals, which are very important in my films.” https://www.avclub.com/mamoru-oshii-1798208379#:~:text=Eventually,films
Depends on the authors. The more famous and classical the author is, the higher your prior should be that every sentence, name and scene serves a purpose to explore a character and thus the main theme of the book. Chekhov / Tolstoy / Shakespeare etc. are definitely on the highest density side of the spectrum. Fanfiction might often be pretentious.
one problem with UBI as a solution for AI economic disruption: at the moment when AI can first replace a human job, it will probably cost only epsilon less than the human. the cost will be mostly capital (datacenters, chips, electric plants, etc), rather than labor. so we can only afford to give the human epsilon UBI. as time goes on, eventually the AI gets cheap enough that humans can get substantial UBI, possibly exceeding their original income, as the AIs become more productive than the humans were. but there’s a big gap in the middle that we need to bridge somehow. the best case scenario is that different industries get automated at different times so that the gaps don’t line up, and we can redistribute the surplus from the first industries to be automated to fill the gap for later industries. the worst case is that all the gaps happen at once and we all starve to death because the surplus is not enough to keep people alive.
This can’t be right. The troublesome point you describe happens when there are already enough “AI workers” to displace all current jobs, but the extra productivity is still only epsilon (why?) and the number of “AI workers” isn’t growing explosively far beyond that (why?)
Anyway, the real problem isn’t that capital owners won’t have enough money to pay us UBI. It’s that they… won’t pay us UBI. Simple as that.
I don’t understand your objection. right now, the cost of replacing a given human with AI is greater than the cost of the human (because the compute is very expensive, the AIs are not very good, etc). over time, the AI gets cheaper and cheaper, until at some point it is precisely as expensive as the human. one day thereafter, AI will be very slightly cheaper than the human. you would prefer to pay for the AI compute instead of the human salary. at this moment in time, it will be economically incentivized to fire all the humans and replace them with AIs. because the AIs still cost almost exactly as much as humans at this moment, it won’t be economical to have substantially more AIs than you had humans the day before, because if it were, then we would have hired more humans in the first place; there must be diminishing returns to quantity of humans employed, and the previous equilibrium is still very close to the new equilibrium. but the amount of new value created for the world due to this switch is very small: only the delta between what the humans used to cost and what the AIs now cost.
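The argument in this comment can be written as a short derivation. The symbols are my own labels, not the author's: $w$ is the wage of a replaced worker, $c(t)$ is the falling cost of the AI doing the same job at time $t$, $t^{*}$ is the first moment $c$ drops below $w$, and the only new value created by the switch is assumed to be the cost saving. That last assumption is exactly what the replies below dispute.

```latex
\begin{aligned}
s(t) &= w - c(t) && \text{value created per replaced worker} \\
c(t^{*}) &= w - \varepsilon && \text{the AI first becomes cheaper} \\
\Rightarrow\; s(t^{*}) &= \varepsilon && \text{UBI funded from the saving is at most } \varepsilon \\
s(t) &\to w \ \text{as}\ c(t) \to 0 && \text{the gap closes only as the AI keeps getting cheaper}
\end{aligned}
```

Under these assumptions a displaced worker's income falls from $w$ to at most $\varepsilon$ at $t^{*}$ and recovers only as $c(t)$ falls. A UBI above $w$ would additionally require the AI to produce more than the worker did, as the original post notes.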
Your economics are wrong for a few reasons. Let’s grant the hypothetical where all humans supply homogeneous labor at a uniform wage.
If AI is slightly cheaper than humans, what happens is that wages fall slightly. At the new, lower wages, there is more demand for labor (and more humans drop out of the labor force). At the same time, capital costs are bid up slightly. Eventually the price of AI and human labor is equal, and the quantity demanded is equal to the quantity supplied.
At the same time, you are increasing demand for labor to build the AI (right now labor is ultimately the main input to building all the stuff that goes in datacenters). If the social value of the AI is near zero, then the net increase in demand is almost the same as the net increase in supply. Lowering wages and increasing capital costs doesn’t offset the benefits of extra productive capacity, it just shifts value from laborers to capitalists.
The real fiscal issue in this scenario is that you are shifting output from labor to capital, and the tax rate on capital is lower than the tax on labor. (Moreover as you automate the economy there are further corporate reorganizations that would drive effective tax rates well below the on-paper capital gains rate). You’re doing that at the same time that you are potentially increasing spending, which is tough unless you are willing to adjust the tax code.
I’m inclined to agree with other commenters though that none of this seems like the most important issue. The fiscal issues can be overcome if the state cares, and my best guess is that growth will accelerate enough that it would be OK even if there was no political change.
People should have much bigger concerns about being completely materially disempowered: (i) the state may not continue to support them, either because they are politically disempowered or because the state itself is disempowered, and (ii) even if they are able to survive they will have no say over what the world looks like and that sucks in its own way.
My idea was, maybe the AI company is willing to sell you 1 unit of AI labor at human-competitive price, but if you order 1000 units they’ll ask for a higher price per unit, because they need to build more datacenters or something. In this case replacement of humans will be gradual even if all humans are equally productive. And another possibility is that humans aren’t all equally productive, so AI will first get good enough to replace the worst worker, then the second worst and so on. From these two reasons I get the possibility that by the time lots of people get replaced, the difference in productivity between AI and the average person replaced so far won’t be epsilon. It won’t be the full salary either, but maybe something substantial. Anyway that was it.
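A minimal sketch of the second mechanism in this comment (heterogeneous productivity). It assumes every human and every unit of AI cost the same, so an AI of a given productivity replaces exactly the workers less productive than it. The evenly spread productivity distribution is a hypothetical choice made for illustration. Under that assumption the mean surplus over replaced workers grows with the fraction replaced instead of staying at epsilon.

```python
def replacement_stats(ai_productivity: float, worker_productivities: list[float]):
    """Return (fraction_replaced, mean_surplus) for one AI productivity level.

    Assumes a human and a unit of AI cost the same, so the AI replaces exactly
    the workers less productive than it; mean_surplus is the average of
    (ai_productivity - worker_productivity) over those replaced workers.
    """
    replaced = [p for p in worker_productivities if p < ai_productivity]
    if not replaced:
        return 0.0, 0.0
    fraction = len(replaced) / len(worker_productivities)
    mean_surplus = sum(ai_productivity - p for p in replaced) / len(replaced)
    return fraction, mean_surplus


if __name__ == "__main__":
    # Hypothetical workforce: productivities spread evenly over [0, 1).
    workers = [i / 1000 for i in range(1000)]
    for a in (0.01, 0.25, 0.5, 0.75):
        frac, surplus = replacement_stats(a, workers)
        print(f"AI productivity {a:.2f}: {frac:.0%} replaced, mean surplus {surplus:.3f}")
```

With an even spread, the mean surplus is roughly half the gap between the AI and the least productive worker. That is neither epsilon nor the full salary.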
the worst case is that all the gaps happen at once and we all starve to death because the surplus is not enough to keep people alive.
This does not follow at all. The total amount of production would somehow have to decrease, otherwise it’s just a question of distribution of resources, which is the whole point of UBI. To literally starve, they would need to shut down some amount of food production (the robots don’t eat).
Good point. Decreased quality of life due to competing with AI for basic resources has already begun (RAM prices) and will eventually show up in non-direct goods.
What do you define as “replace a human job”? We are already seeing AI that can replace at least 50% of a job for much less than 50% of the cost of paying a worker to do those parts of that job. In principle that means that many employers can fire half their workforce and get the remaining employees to pick up the other 50% of the jobs from the fired employees.
In practice this would involve huge disruption and uncertainty, and perhaps they can avoid that bother by letting go their most obviously least productive employees, lowering costs a little (say to 95%) to do the same or slightly more work with much reduced disruption. Over time, the employees who use more AI in the workflows take less effort to do the job. We are seeing this already.
This obviously isn’t a stable long-term economic behaviour. Those conservative employers probably will continue to decrease workforce slowly, while being eroded by employers much more willing to accept disruption eating into their markets at greatly reduced costs.
However, it takes time. The more capable that AI becomes per unit cost, the greater the advantage that disruption-tolerant employers will have, possibly leading to multiple larger failures of conservative employers or possibly rapid culture changes to avoid such failures, replacing large segments of workforce at some later point.
In this model (which matches what we are already seeing), the job losses are inevitable but come an economically significant and somewhat unpredictable time after the cost of AI drops well below the cost of employing a human to do some tasks.
There’s nothing that requires an economy to maintain a continuous equilibrium of perfectly distributed cost/productivity balances at all times, and we see plenty of past examples where it has not. Continuous changes to parameters in a complex system often result in sudden changes in behaviour, not just continuous ones.
it doesn’t matter whether you’re fully replacing one job, or partially replacing multiple jobs. my model still implies that the market value of human labor diminishes more than the amount of money needed to keep everyone at the same level of consumption as they did before
You seem to be describing a situation where there is a temporary absence of sufficient funds for a UBI (the “big gap in the middle”) after which there’s plenty of money to fund the UBI, potentially at a higher level than people’s original income.
The generic solution for a temporary lack of necessary funds with lots of funds being available in the future is getting a loan to be paid off when the money comes in. This consumption-smoothing would be good from the perspective of the AI companies as well, as “everyone is out of work and has no money to spend”, if it persists for long enough for people to burn through their savings, predictably leads to “the revenue streams of the AI companies collapse”.
How it would work out in detail is unclear, but if AI companies end up with a lot of economic power, I’d expect that gets taxed in some form by whoever’s providing the UBI, and in the meantime the UBI provider goes into a bit of debt.
can someone who works in quant/HFT/market making help me understand whether the following is correct?
(assuming there is only a single exchange for simplicity,) order execution is hard because (a) the order book is of finite size, so placing a large order induces slippage, (b) if you make a series of trades spaced far apart to wait for more liquidity to show up, the value of the stock can move, and (c) if you make a series of trades predictably, then HFTs can fuck you over by clearing out the order book 1ms before you buy and turning around to sell to you.
so unavoidably you have to make a series of trades. you need to trade off between speed of executing on everything and the liquidity available. the HFTs can only fuck you over if they know exactly when you’re going to trade, because they bleed money whenever they mispredict and try to frontrun an order that doesn’t exist. you also want some way to hedge the possible price movements.
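A rough sketch of points (a) and (c). It shows how a market buy walks the ask side of a single book, which is where the slippage comes from, and one simple way to split a parent order into child orders whose exact timing is randomized. The book levels and schedule parameters are hypothetical. The sketch ignores how the book refills between slices and ignores price drift (point (b)). As the replies note, randomizing times alone does not remove every kind of predictability.

```python
import random


def average_fill_price(asks, qty):
    """Average price paid by a market buy of `qty` that walks the ask side.

    `asks` is a list of (price, size) levels sorted from best (lowest) price up.
    Raises ValueError if the visible book cannot fill the whole order.
    """
    if qty <= 0:
        raise ValueError("qty must be positive")
    remaining, cost = qty, 0.0
    for price, size in asks:
        take = min(size, remaining)
        cost += take * price
        remaining -= take
        if remaining == 0:
            return cost / qty
    raise ValueError("order is larger than the visible liquidity")


def slippage(asks, qty):
    """Average fill price minus the best ask: the cost of eating through the book."""
    return average_fill_price(asks, qty) - asks[0][0]


def jittered_schedule(total_qty, n_slices, horizon_s, rng):
    """Split a parent order into child orders at randomized times.

    The horizon is cut into equal buckets and each child order goes at a
    uniformly random time inside its bucket, so the exact timing is less
    predictable than a fixed-interval schedule.
    """
    bucket = horizon_s / n_slices
    base = total_qty // n_slices
    sizes = [base] * n_slices
    sizes[-1] += total_qty - base * n_slices  # remainder goes in the last slice
    times = [i * bucket + rng.uniform(0, bucket) for i in range(n_slices)]
    return list(zip(times, sizes))


if __name__ == "__main__":
    book = [(100.0, 10), (101.0, 10), (102.0, 10)]  # hypothetical ask levels
    print(slippage(book, 5), slippage(book, 25))
    print(jittered_schedule(1_000, 4, 60.0, random.Random(0)))
```

Buying 25 against this book pays 100.8 on average, compared with 100 for a small order. Slicing trades that cost for exposure to price movement and to anyone who can still infer the schedule.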
I am not a quant, but have some related background. (Those who know this area best, may not be inclined to say.)
“Real traders” have many ways to avoid getting front-run to the extreme degree suggested in (c), including limit orders and “trying not to be that predictable” by disguising action to look like other forms of flow.
The amount of pain you experience from (b) depends on whether you think your strategy’s value decays rapidly or slowly.
But there is a more general problem: it is not just HFTs but the market as a whole that reacts to your actions: your impact will shift the demand curve for the stock. the size of that impact depends on the information leaked by your actions, information leaked by passage of time, and time allowed for new liquidity to arrive.
but predicting actual impact is hard for a number of reasons (limited data, causality issues)
Knowledge of what other players can and can’t infer from your execution, and modeling impact patterns well, is a multiplier on the value of strategies, hence worth spending a lot to get right.
I work in quant trading, but not specifically in order execution. These are all real concerns. Which ones are most important depends heavily on your strategy and market, e.g. if your positions last for days order execution is a lot easier than if they last for minutes. And time-based slippage might be big or small relative to tick size.
the HFTs can only fuck you over if they know exactly when you’re going to trade
This isn’t quite true, sophisticated funds can exploit almost any predictability.
if you make a series of trades predictably, then HFTs can fuck you over by clearing out the order book 1ms before you buy and turning around to sell to you.
You could prevent this by using a limit order, yes? But execution still seems hard even if you are allowed to use limit orders.
In general, it seems to me unnecessary to know exactly when you will trade in order to exploit you. If I can predict your future trades, and believe they would be profitable for you, I can take advantage by doing those trades myself first. In fact, unless I think your trades would be profitable only at the exact time you plan to do them, I likely want to make your trades as soon as I can, not wait for when you were going to make them.
Naive use of limit orders will cause you to lose the profitable trades, and fill the unprofitable ones. There are ways around this, but it’s not trivial.
Let’s say I’m a market maker. Assuming there’s no way for me to hedge my position when my quote is taken, how do I get out of my position when someone hit my quote?
The standard way to do this is to “lean” on your position. If my estimated price of something is $100 and someone bought against me, then I’ll adjust my estimated price to something like $101. The more position I’m holding right now, the more I adjust. When the adjusted price moves too far away from my quotes, I pull my quote back. That’s why when you trade against market makers, you might get a worse price for your next order.
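A minimal sketch of “leaning”. The comment only says that more position means more adjustment, so the linear lean (`lean_per_unit` times inventory) is an assumption here, and so is the fixed `max_drift` threshold for pulling quotes. Negative inventory means we sold (someone bought against us), which leans the price up, matching the $100 to $101 example.

```python
def leaned_price(estimate: float, inventory: float, lean_per_unit: float) -> float:
    """Shift the fair-value estimate against current inventory.

    inventory < 0 (we sold, e.g. someone bought against our offer) leans the
    price up; inventory > 0 (we bought) leans it down. The linear form is an
    assumption: the only requirement is "more position, more adjustment".
    """
    return estimate - lean_per_unit * inventory


def make_quotes(estimate, inventory, lean_per_unit, half_spread):
    """Return (bid, ask) centered on the leaned price."""
    mid = leaned_price(estimate, inventory, lean_per_unit)
    return mid - half_spread, mid + half_spread


def should_pull(resting_bid, resting_ask, estimate, inventory, lean_per_unit, max_drift):
    """True once the leaned price has drifted too far from our resting quotes."""
    resting_mid = (resting_bid + resting_ask) / 2
    return abs(leaned_price(estimate, inventory, lean_per_unit) - resting_mid) > max_drift
```

For example, with an estimate of 100, one unit sold (inventory -1) and a lean of 1.0 per unit, the leaned price is 101 and the new quotes around it are 100.5 / 101.5. That is the “worse price for your next order” the comment describes.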
i really wish there were a better platform for repeatable cognitive testing than brainlabs.me. the website feels like it is about to fall over from a light breeze, and i would be very sad if i suddenly lost my method of measurement because the site disappeared. also, there doesn’t seem to be particularly strong evidence that these tests in particular are the right ones to be looking at.
I’ve always been relatively unfamiliar with normal pop culture, so I recently decided to look at several online lists of best/most recognizable songs and made a spotify playlist of several hundred of them, with a bias towards more recent songs. I think this has been much better than the Spotify recommendation algorithm, which mostly shows me songs similar to ones I’ve already listened to.
I think it’s an algorithmic implementation of the Taylor-series decomposition optimized for performance? Claude tells me it might be useful for high-frequency trading, game engines, and real-time signal processing.
are the Cambridge Brain Science cognitive tests actually reliable and relatively immune to practice effects? I want to have some mostly repeatable measurement of my own cognitive abilities over time, for health tracking reasons, but it’s unclear to me how reliable it is
fun side project idea: create a matrix X and accompanying QR decomposition, such that X and Q are both valid QR codes that link to the wikipedia page about QR decomposition
high curvature directions (hessian eigenvectors with high eigenvalue) want small lrs. low curvature directions want big lrs
if the lr in a direction is too small, it takes forever to converge. if the lr is too big, it diverges by oscillating with increasing amplitude
momentum helps because if your lr is too small, it makes you move a bit faster. if your lr is too big, it causes the oscillations to cancel out with themselves. this makes high curvature directions more ok with larger lrs and low curvature directions more ok with smaller lrs, improving conditioning
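A minimal sketch of the three points above for a single eigendirection of a quadratic, f(x) = ½·λ·x², using plain heavy-ball momentum. For this quadratic case, plain GD converges iff lr·λ < 2, and heavy-ball with 0 ≤ β < 1 converges iff lr·λ < 2(1+β). That is why momentum tolerates larger lrs in high-curvature directions. In low-curvature directions it also moves faster, since the effective step is roughly lr/(1−β). The specific lrs, β = 0.9 and step counts are illustrative choices.

```python
def final_distance(curvature: float, lr: float, beta: float = 0.0,
                   steps: int = 200, x0: float = 1.0) -> float:
    """|x| after `steps` heavy-ball updates on f(x) = 0.5 * curvature * x**2.

    beta = 0 is plain gradient descent. Heavy-ball momentum:
        v <- beta * v - lr * grad
        x <- x + v
    """
    x, v = x0, 0.0
    for _ in range(steps):
        grad = curvature * x
        v = beta * v - lr * grad
        x = x + v
    return abs(x)


if __name__ == "__main__":
    # Plain GD needs lr * curvature < 2; heavy-ball needs lr * curvature < 2 * (1 + beta).
    for lr in (1.5, 2.5, 4.0):  # curvature = 1, so lr * curvature = lr
        print(f"lr={lr}: gd={final_distance(1.0, lr):.3g}  "
              f"momentum={final_distance(1.0, lr, beta=0.9):.3g}")
    # Low-curvature direction: momentum makes more progress with the same small lr.
    print(final_distance(1e-3, 1.0), final_distance(1e-3, 1.0, beta=0.9))
```

At lr·λ = 2.5, GD oscillates with growing amplitude while momentum with β = 0.9 converges. At lr·λ = 4 both diverge, because 4 > 2(1 + 0.9).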
high curvature directions also have bigger gradients. this is the opposite of what we want because in a perfect world higher curvature directions would have smaller gradients (natural gradient does this but it’s usually too expensive). adam second moment / rmsprop helps because it makes the update stay exactly the same size when the gradient in a direction gets bigger, which is sorta halfway right
applied per param rather than per eigenvector
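A minimal RMSprop sketch (no momentum, no Adam bias correction; the hyperparameters are illustrative) showing the scale invariance described above. Multiplying a parameter's whole gradient history by 1000 leaves its updates essentially unchanged, while plain SGD steps get 1000x bigger. The normalization happens per coordinate, so it only equalizes step sizes along parameter axes. A high-curvature eigenvector that mixes several parameters is only partially corrected.

```python
import math


def rmsprop_updates(grads, lr=1e-3, decay=0.99, eps=1e-8):
    """Per-step RMSprop updates for a single parameter given its gradient history.

    Each update is -lr * g / (sqrt(running mean of g**2) + eps). In a real
    optimizer this is done independently for every parameter (coordinate).
    """
    sq_avg = 0.0
    updates = []
    for g in grads:
        sq_avg = decay * sq_avg + (1 - decay) * g * g
        updates.append(-lr * g / (math.sqrt(sq_avg) + eps))
    return updates


if __name__ == "__main__":
    small = [1.0] * 5                   # gradients along one coordinate
    big = [1000.0 * g for g in small]   # same coordinate, 1000x bigger gradients
    print(rmsprop_updates(small))
    print(rmsprop_updates(big))         # essentially identical updates
    print([-1e-3 * g for g in big])     # plain SGD: 1000x bigger steps
```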
in real NNs edge of stability means it’s actually even more fine to have a too-high lr: the max curvature increases throughout training until it gets to the critical point where it would diverge, but then instead of diverging all the way the oscillations along the top eigenvector somehow cause the model to move into a slightly lower curvature region again, so that it stabilizes right at the edge of stability.
for Adam, these oscillations also cause second moment increases, which decreases preconditioned max curvature without affecting the original curvature. so this means the original max curvature can just keep increasing for Adam whereas it doesn’t for SGD (though apparently there’s some region where it jumps into a region with low original max curvature too)
Some aspirational personal epistemic rules for keeping discussions as truth seeking as possible (not at all novel whatsoever, I’m sure there exist 5 posts on every single one of these points that are more eloquent)
If I am arguing for a position, I must be open to the possibility that my interlocutor may turn out to be correct. (This does not mean that I should expect to be correct exactly 50% of the time, but it does mean that if I feel like I’m never wrong in discussions then that’s a warning sign: I’m either being epistemically unhealthy or I’m talking to the wrong crowd.)
If I become confident that I was previously incorrect about a belief, I should not be attached to my previous beliefs. I should not incorporate my beliefs into my identity. I should not be averse to evidence that may prove me wrong. I should always entertain the possibility that even things that feel obviously true to me may be wrong.
If I convince someone to change their mind, I should avoid saying things like “I told you so”, or otherwise trying to score status points out of it.
I think in practice I adhere to these principles more closely than most people, but I definitely don’t think I’m perfect at it.
(Sidenote: it seems I tend to voice my disagreement on factual things far more often (though not maximally) compared to most people. I’m slightly worried that people will interpret this as me disliking them or being passive aggressive or something—this is typically not the case! I have big disagreements about the-way-the-world-is with a bunch of my closest friends and I think that’s a good thing! If anything I gravitate towards people I can have interesting disagreements with.)
I should always entertain the possibility that even things that feel obviously true to me may be wrong.
I find it a helpful framing to instead allow things that feel obviously false to become more familiar, giving them the opportunity to develop a strong enough voice to explain how they are right. That is, the action is on the side of unfamiliar false things, clarifying their meaning and justification, rather than on the side of familiar true things, refuting their correctness. It’s harder to break out of a familiar narrative from within.
lingao qiming is the hardest scifi I’ve ever read. it puts other “hard” scifi like project hail mary or three body problem to shame. the basic conceit of the book is that it’s an isekai where some people discover a wormhole to a parallel universe exactly like ours but during the time of ming dynasty china, and decide to bring 500 technical specialists and a bunch of modern supplies to the past to try and conquer ancient china. the vast majority of the book is devoted to discussing every single technical aspect in excruciating, well-researched detail. you don’t simply have a paragraph about them deciding to buy guns, you get an entire chapter of different gun experts arguing back and forth about exactly which gun to buy based on maintainability, range, differences between civilian and military models, semi-automatic vs fully automatic. of what I’ve read so far, the depictions of arguments between technical experts seem true to life. imagine this but for everything: highway construction, hydroelectric power, water purification, medical supplies, port construction, etc etc. it’s helped me feel the fractal complexity of the world much more. it’s also absurdly long. i don’t know if I’ll ever finish it.
I first learned about Lingao Qiming from Afra Wang’s recent post interviewing Ma Qianzu, one of its lead writers. She also wrote a Wired article about it
In 2006, a post went up on SC BBS, China’s earliest military-themed message board: What would you do if you could travel back to the Ming Dynasty with modern knowledge? The question struck a nerve. The Ming Dynasty occupies a painful place in Chinese historical consciousness. It was a period when Chinese civilization entered a long decline, ending in the so-called Great Divergence. Europe embraced discovery, ascent, and construction; China sealed itself off. Joseph Needham’s famous question—“Why did modern science develop in Europe but not in China?”—has haunted modern Chinese intellectual life ever since.
The Morning Star of Lingao emerged as a kind of internet-fueled continuation of this historic discourse. As more people found the original post, forum discussions crystallized into serious, collective story-writing. If you could travel back to the Ming Dynasty with modern knowledge, these users decided, well, you’d obviously industrialize before Europe and win modernity.
afaict, the time travel part only exists to set the story up, and does not come up again at all. the entire rest of the story is stuff like this:
“Let me say it again: I don’t like the M43 round.” Xi Yazhou was satisfied with the reaction he’d provoked. “And for similar reasons, I can’t recommend small-caliber rifles like the M16. It’s not just about stopping power—it’s about power at maximum range. Small-caliber rounds also have higher manufacturing requirements. My recommendation is semi-automatic rifles chambered in 7.62mm NATO: civilian versions of the M-14, the FN-FAL, and similar weapons. Once converted to automatic fire, they would significantly reduce our urgent need for machine guns.”
“The M-14? I hear the accuracy is excellent, but it’s overpowered and heavy.”
“Isn’t it complicated? American guns all seem to have complex structures and difficult maintenance.”
The firearms debate continued for another half hour, leaving Shi Niaoren increasingly frustrated. When no resolution appeared forthcoming, he proposed they speak directly with Xiao Rui online. Xiao Rui knew the American gun market far better and had seen more oddities in civilian firearms than the mainland crowd, who only knew that the AR-15 was the semi-automatic civilian version of the M16.
Xiao Rui proposed a different weapon entirely—the Saiga-308 rifle, which no one had heard of. It was a Russian civilian semi-automatic chambered in 7.62mm NATO.
On the one hand, yeah. On the other hand, the rest of the story (AFAICT based on your description) isn’t really that sci-fi, let alone “hard”, except insofar as it’s set up by the time travel. You could just as well write a story about the Spaniards ultra-strategizing about efficiently conquering the Mexica or the Inca.
Comment with practically zero informational value (due to total absence of context) but 37 karma and 12 agreement feels like “twitter” in the bad sense of this concept, not LW. Which is very sad for me as an old reader. You probably mean something related to American politics, but I suppose many users are not American and don’t even have much knowledge about these things. Maybe you mean something totally different. Maybe OpenAI and Anthropic drama? I can’t even make sense of this.
it’s obviously about the department of war situation. it wasn’t intended to be vague, it just didn’t occur to me that it wouldn’t be obvious what it was about.
Btw, is it clear that more US government control over AI companies is bad for safety in the long term? Yes, locally AI can be used for ~bad things, but it may be easier to coordinate with other countries or to slow progress down in the face of danger, because the government would be 1 agent instead of 4 companies racing with each other (even in 1 country) and is not motivated by increasing profits.
Total absence of context is relative. I think a lot of people here know he works at OpenAI, and the timing of this post heavily implies the OpenAI/Anthropic/DoD drama. Although I also agree that it is quite vague, it is somewhat understandable if he can’t or doesn’t want to be too specific, and knowing the background of this post, I do think there is a small amount of information.
I would bet with high uncertainty that 10% of regular LW readers and 75% of people who have read this quick take know the background context.
Also I just looked it up, according to the 2024 LW Census ~50% of users are American, which is a bit lower than I expected.
I’m an American who semiregularly (recently, more) reads LW and has heard of the DoW thing, and still was unsure what leogao was talking about. Just put the four words “I work at OpenAI.” Since I don’t regularly read the news, for all I know something else happened; or maybe it’s about some recent ‘community drama’ (e.g. a recent post on the EA forum about CEA not taking action against sexual harassment came to mind).
Additionally, one thing I like about LW is timelessness. It’s probably the web forum that most discusses old posts, because they stay relevant and the culture and affordances support reading old material. In 10 years, a sentence like “I work at OpenAI. The Department of War has made me learn a lot about various people around me, and myself, over the past few days.” is still useful. Currently, 4 days is enough to make me unsure what is being referenced.
I don’t want to have to make every shortform a self contained article. it makes sense that full posts should explain the context, but I would find it very exhausting to have to e.g. explain that I work at openai every single time I shortform post about anything openai related. if lesswrong shortform is the wrong place to do this, I’m happy to post elsewhere.
it feels rude to talk about specifics about other people. at a broad level, there are some people I’ve gained a lot of respect for. it’s easy for people to say they care about safety, so I don’t weigh that very heavily. but now I know who’s willing to step up and take actions in a crisis. and conversely too.
publicly registering a bet with Emmett Bicker: I predict that on February 16, 2027, there will be at least 3 people at one of openai/anthropic/GDM who work on kernels full time (or, a larger number of people spending part of their time working on kernels, such that the time spent adds up to 3 FTEs, capped at 50 people maximum). if all of these companies have gone bankrupt or pivoted heavily or cut their workforce substantially because of a market crash or AI winter, this resolves in my favor. if AI kills everyone or creates the glorious posthuman utopia before then, it resolves in Emmett’s favor (regardless of whether there are still people who do kernels work for fun).
ethical offsets for eating meat are difficult because it’s hard to quantify the expected impact of e.g. donating to an animal rights charity, and compare it to the impact of eating meat. (if you pay for a lobbyist to talk to a congressman for 30 minutes about larger cages for chicken farming, how much does this improve chicken lives, and how many chicken lives saved is that equivalent to?)
here’s a much simpler solution: almost everyone agrees that a human is more morally valuable than a cow, even if the human is far away in a distant land. (the cow is also far away!) it costs ~$5000 to save a human life—while arriving at this number obviously still requires assumptions, they’re much less rickety.
cows are pretty big, so you can get about 500lb of beef from each cow. beef is about $6/lb, so it’s not completely crazy to spend $10/lb on ethical offsetting. this is less bad than it sounds, because the raw cost of the beef is probably only a fraction of your total expenditures on food.
(unfortunately, chicken enjoyers get kind of screwed because chickens are much smaller, and can’t use the trick of amortizing the $5000 over a lot of meat. so they need to either make up an exchange rate between human and chicken lives, or use a conservative bound of 1:1. thankfully, I like beef more than chicken anyways.)
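The offset arithmetic above as a tiny function, using the post's own figures ($5000 per human life, 500lb of beef per cow, ~$6/lb). The post gives no meat yield per chicken, so none is supplied here; plug in your own. Because yield sits in the denominator, a much smaller animal makes the per-lb offset much larger under the 1:1 bound.

```python
def offset_per_lb(cost_per_life_usd: float, lb_per_animal: float,
                  human_lives_per_animal: float = 1.0) -> float:
    """Donation per lb of meat if one animal is valued at (at most)
    `human_lives_per_animal` human lives; 1.0 is the conservative 1:1 bound."""
    return cost_per_life_usd * human_lives_per_animal / lb_per_animal


if __name__ == "__main__":
    beef = offset_per_lb(5_000, 500)  # figures from the post
    print(f"beef offset: ${beef:.2f}/lb, {beef / 6:.2f}x the ~$6/lb price")
```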
hypothesis: the kind of reasoning that causes ML people to say “we have made no progress towards AGI whatsoever” is closely analogous to the kind of reasoning that makes alignment people say “we have made no progress towards hard alignment whatsoever”
ML people see stuff like GPT4 and correctly notice that it’s in fact kind of dumb and bad at generalization in the same ways that ML always has been. they make an incorrect extrapolation, which is that AGI must therefore be 100 years away, rather than 10 years away
high p(doom) alignment people see current model alignment techniques and correctly notice that they fail to tackle the AGI alignment problem in the same way that alignment techniques always have. they make an incorrect extrapolation and conclude that p(doom) = 0.99, rather than 0.5
(there is an asymmetry which is that overconfidence that alignment will be solved is much more dangerous than overconfidence that AGI will be solved)
It’s differential progress that matters in alignment. I.e., if you expected that we need an additional year of alignment research after creating AGI, it still looks pretty doomed, even if you admit overall progress in the field.
sure, but seems orthogonal to the thing i’m describing—the claim is that a lot of alignment work on current models has ~no bearing on progress towards aligning AGI.
it is often claimed that merely passively absorbing information is not sufficient for learning, but rather some amount of intentional learning is needed. I think this is true in general. however, one interesting benefit of passively absorbing information is that you notice some concepts/terms/areas come up more often than others. this is useful because there’s simply too much stuff out there to learn, and some knowledge is a lot more useful than other knowledge. noticing which kinds of things come up often is therefore useful for prioritization. I often notice that my motivational system really likes to use this heuristic for deciding how motivated to be while learning something.
I think it might also depend on your goals. Like how fast you want to learn something. If you have less than ideal time, then maybe more structured learning is necessary. If you have more time then periods of structureless/passive learning could be beneficial.
Understanding how an abstraction works under the hood is useful because it gives you intuitions for when it’s likely to leak and what to do in those cases.
takes on takeoff (or: Why Aren’t The Models Mesaoptimizer-y Yet)
here are some reasons we might care about discontinuities:
alignment techniques that apply before the discontinuity may stop applying after / become much less effective
makes it harder to do alignment research before the discontinuity that transfers to after the discontinuity (because there is something qualitatively different after the jump)
second order effect: may result in false sense of security
there may be less/negative time between a warning shot and the End
harder to coordinate and slow down
harder to know when the End Times are coming
alignment techniques that rely on systems supervising slightly smarter systems (e.g. RRM) depend on there not being a big jump in capabilities
I think these capture 90% of what I care about when talking about fast/slow takeoff, with the first point taking up a majority
(it comes up a lot in discussions that it seems like I can’t quite pin down exactly what my interlocutor’s beliefs on fastness/slowness imply. if we can fully list out all the things we care about, we can screen off any disagreement about definitions of the word “discontinuity”)
some things that seem probably true to me and which are probably not really cruxes:
there will probably be a pretty big amount of AI-caused economic value and even more investment into AI, and AGI in particular (not really a bold prediction, given the already pretty big amount of these things! but a decade ago it may have been plausible nobody would care about AGI until the End Times, and this appears not to be the case)
continuous changes of inputs like compute or investment or loss (not technically an input, but whatever) can result in discontinuous jumps in some downstream metric (accuracy on some task, number of worlds paperclipped)
almost every idea is in some sense built on some previous idea, but this is not very useful because there exist many ideas [citation needed] and it’s hard to tell which ones will be built on to create the idea that actually works (something something hindsight bias). this means you can’t reason about how they will change alignment properties, or use them as a warning shot
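A standard toy illustration of the second point above (continuous inputs, discontinuous downstream metric). It is not a claim about any particular model. If per-token accuracy improves smoothly and errors are assumed independent, exact-match accuracy on a 30-token answer is p³⁰. That stays near zero for a long time and then shoots up, so a smooth underlying trend can look like a sudden jump in the metric you actually watch.

```python
def exact_match(per_token_acc: float, answer_len: int) -> float:
    """Probability that every token of an answer is correct,
    assuming independent per-token errors (a simplifying assumption)."""
    return per_token_acc ** answer_len


if __name__ == "__main__":
    # Per-token accuracy improves smoothly; 30-token exact match looks like a jump.
    for p in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
        print(f"per-token {p:.2f} -> exact match {exact_match(p, 30):.4f}")
```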
possible sources of discontinuity:
breakthroughs: at some point, some group discovers a brand new technique that nobody had ever thought of before / nobody had made work before because they were doing it wrong in some way / “3 hackers in a basement invent AGI”
depends on how efficient you think the research market is. I feel very uncertain about this
importantly I think cruxes here may result in other predictions about how efficient the world is generally, in ways unrelated to AI, and which may make predictions before the End Times
seems like a subcrux of this is whether the new technique immediately works very well or if it takes a nontrivial amount of time to scale it up to working at SOTA scale
overdetermined “breakthroughs”: some technique that didn’t work (and couldn’t have been made to work) at smaller scales starts working at larger scales. lots of people independently would have tried the thing
importantly, under this scenario it’s possible for something to simultaneously (a) be very overdetermined (b) have very different alignment properties
very hard to know which of the many ideas that don’t work might be the one that suddenly starts working with a few more OOMs of compute
at some scale, there is just some kind of grokking without any change in techniques, and the internal structure and generalization properties of the networks changes a lot. trends break because of some deep change in the structure of the network
mostly isomorphic to the previous scenario actually
for example, in worlds where deceptive alignment happens because at x params suddenly it groks to mesaoptimizer-y structure and the generalization properties completely change
at some scale, there is “enough” to hit some criticality threshold of some kind of thing the model already has. the downstream behavior changes a lot but the internal structure doesn’t change much beyond the threshold. importantly while obviously some alignment strategies would break, there are potentially invariants that we can hold onto
for example, in worlds where deceptive alignment happens because of ontology mismatch and ontologies get slowly more mismatched with scale, and then past some threshold it snaps over to the deceptive generalization
I think these can be boiled down to 3 more succinct scenario descriptions:
breakthroughs that totally change the game unexpectedly
mechanistically different cognition suddenly working at scale
Schemes for taking multiple unaligned AIs and trying to build an aligned system out of the whole
I think this is just not possible.
Schemes for taking aligned but less powerful AIs and leveraging them to align a more powerful AI (possibly with amplification involved)
This breaks if there are cases where supervising is harder than generating, or if there is a discontinuity. I think it’s plausible something like this could work but I’m not super convinced.
No noticeable effects from vitamin D (both with and without K2), even though I used to live somewhere where the sun barely shines and also I never went outside, so I was almost certainly deficient.
I tried Selenium (200µg) twice and both times I felt like utter shit the next day.
Glycine (2g) for some odd reason makes me energetic, which makes it really bad as a sleep aid. 1g taken a few hours before bedtime is substantially less disruptive to sleep, but I haven’t noticed substantial improvements.
Unlike oral phenylephrine, intranasal phenylephrine does things, albeit very temporarily, and is undeniably the most effective thing I’ve tried, though apparently you’re not supposed to use it too often, so I only use it when it gets really bad.
a corollary to the hazards of arguing against bad takes: please don’t write things that are defined entirely by trying to avoid the reader coming away with specific bad takes or misunderstandings people often have.
you should write things primarily to nail down the concepts unambiguously for an audience of generic smart people. your idea should be defined by what it is, and not what it is not. it isn’t SCP-055.
if you really need to, add a “things i don’t mean” section to concretely describe and disavow some common misunderstandings. but it should be possible to delete this section and still understand precisely what you are saying.
relatedly: please don’t let stupid and/or malicious people dictate your vibe. a common pattern i see is people being scared of saying anything that might sound optimistic, because someone will see it and think “oh, great, alignment is solved, thank goodness, i can keep accelerating capabilities / not doing anything to make things go well”.
it’s very easy to make clear to all sane people that you think things will not go well. anyone who is still confused is either stupid, in which case their opinion doesn’t matter; or looking for some excuse to do the thing they wanted to do anyways, in which case you are not counterfactual in them doing whatever; or malicious, in which case they are trolling you because they find it amusing to mess with you, and you shouldn’t negotiate with terrorists.
if it’s too difficult to perceive this pattern in yourself, you can observe some of the climate change people and how they are often extremely upset by any suggestion that e.g. solar is on track to solve big chunks of global warming. people will often refuse to engage on the object-level question of whether solar really is enough, and say outright that if we adopt the attitude that climate change is on track then everyone will stop doing the things that need to be done, thereby throwing us off track.
I should put a reminder like this on top of my computer screen.
One reason I am often writing long comments is a feeling of defensiveness: as if, unless I make my case perfectly unambiguous and bulletproof by adding more and more words, someone will of course pick up the worst possible misinterpretation. (I had people like that in my life in the past.)
i often think specific capabilities projects are quite unlikely to work, and therefore not worth taking into account when coming up with my alignment approach, and also simultaneously that my alignment project is quite unlikely to work, but it’s worth trying in case it does work. why this asymmetry?
i claim this is rational. the key is that upside risk and downside risk should be treated differently. if i think an alignment approach has a 1% chance of working, it might still be worth spending my life on. but if i think there’s a 1% chance some capabilities technique might work that invalidates my alignment approach, i shouldn’t spend time thinking too much about that possibility.
suppose the success of my alignment approach and the success of a particular capabilities approach are uncorrelated, my alignment approach is completely ruined if the capabilities approach succeeds, and both are 1% likely to succeed. then, the delta from trying to make my alignment approach work multiplies p(doom) by 0.9901, whereas trying to make my approach succeed while also taking the capabilities approach into account only reduces p(doom) very slightly more—it multiplies p(doom) by 0.99.
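A minimal sketch of that arithmetic, under the same assumptions: the two approaches succeed independently, each with probability 0.01, and p(doom) gets multiplied by one minus the probability that the alignment approach ends up mattering. The function names are illustrative only.

```python
def doom_multiplier_ignoring_capabilities(p_align: float, p_cap: float) -> float:
    """p(doom) multiplier when the alignment approach only pays off if the
    (independent) capabilities approach does NOT succeed."""
    return 1 - p_align * (1 - p_cap)


def doom_multiplier_hedged(p_align: float) -> float:
    """p(doom) multiplier when the alignment approach also survives the
    capabilities approach succeeding."""
    return 1 - p_align


ignoring = doom_multiplier_ignoring_capabilities(0.01, 0.01)
hedged = doom_multiplier_hedged(0.01)
print(f"ignoring: {ignoring:.4f}, hedged: {hedged:.4f}")
print(f"extra benefit from hedging: {ignoring - hedged:.4f}")
```

The gap between the two multipliers is exactly p_align × p_cap = 0.0001, which is why taking the unlikely capabilities approach into account barely changes the calculation.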
the 1% is just illustrative. i think this is missing the point. you can feel free to apply whatever transformation you want to your inside view probability. suppose you think that plans that feel 99% likely to work on the inside view are actually only 1% likely to work overall. then just use my analysis for plans that feel 99% likely to work.
separately i think death with dignity as a post is wrong. it is so preoccupied with arguing against a specific kind of dumb person—one who would say “well if it’s hopeless we should just do some random ass stupid gamble and play for variance”—that it prescribes something insane for the rest of us. the way to win is to do lots of uncorrelated research bets that are individually unlikely to succeed but also do no harm if they fail! that’s the way surprising and novel scientific inventions have always happened in the past. (yes, i know many approaches are actually correlated. this is a skill issue)
the way to win is to do lots of uncorrelated research bets that are individually unlikely to succeed but also do no harm if they fail!
I disagree that alignment will probably get solved by someone pursuing a direction that seems very unlikely even to them. It seems to me that the right way to do things is to figure out what the hard parts of alignment are and then to try to solve them.
that’s the way surprising and novel scientific inventions have always happened in the past.
I think that’s because those fields had reasonably good feedback loops, and so the strategy of “try a bunch of things and see if any of them work” is generally viable. As a counterexample, Einstein’s methodology was on the opposite end of the spectrum.
well, no, i think alignment will probably get solved by someone who has delusional confidence that they have the one true approach, who maybe on some intellectual level knows that it’s 1% likely to succeed but feels irrationally driven to make it work, embedded inside a system that is not delusional and able to assess alignment approaches rationally. most such people will be wrong in their delusion. perhaps the world will be destroyed by people with delusional confidence embedded inside systems that are also delusional. but it’s very hard to truly devote yourself to something if you emotionally feel like it’s hopeless, even if it is the +ev move. human motivation is not driven by reason; you cannot simply make yourself excited to work on something because it is +ev.
i’ll name three examples of things in AI safety that seem unlikely to succeed and yet worth pursuing, where it would be pretty clear if we succeeded, and where the downside if we fuck it up is relatively small and reliably detectable. not saying these have no externalities whatsoever, or are 100% foolproof, but i think they are pretty damn good and certainly an existence proof that we can do better than just giving up.
ambitious mechinterp. i think the existing metrics are not very good, but it is possible to create very rigorous measures of interpretability. these are very difficult to make go up, but if we manage it, they should give us a pretty meaningful amount of confidence in safety.
hardware verifiability for an eventual treaty. a lot of political uncertainty of course. but it pretty clearly strictly improves our chances of surviving, has few externalities, and while it’s possible to fuck up, it doesn’t need to be superintelligence-robust, because the adversaries are other humans for quite a while.
bengio style bayesian world model stuff. another approach that’s very hard to make work, because bayesian stuff is hard. but if it does work, we can cut down a huge amount of risk. this is the least waterproof item on this list but it still seems pretty good.
I wonder if creating a superintelligence requires applying a way to increase capabilities at the cost of alignment. At the time of writing, Anthropic’s ECI increases linearly with time and I suspect some constraints related to model sizes. Therefore, I don’t expect acceleration of capabilities progress in any way besides introducing new architectures undermining alignment techniques (e.g. neuralese, as in AI 2027) or severe scaling, which poses its own problems, like models learning to reason without the CoT or by using a faked CoT.
P.S. Could you describe your proposed alignment approaches in more detail? Something like many more novel methods in mechinterp?
i was recently in an Uber and the driver started talking to me about Musk v OpenAI (almost completely unprompted! i had only mentioned that i do computer stuff.)
sometimes, my answer to a question flips multiple times as you move along the axis from literal answer to spiritually-accurate answer. unfortunately hard to share specific examples for privacy reasons
I think it’s less stupid than it sounds, but doesn’t make sense (until after extremely capable AI or a ways into the future) except as a regulatory arbitrage.
One argument is that if public sentiment towards AI turns negative to the extreme, datacenter security costs may increase substantially. Building an orbital datacenter is one way of making it highly secure against the masses without worrying about obscuring the location.
Put another way: by having gravitational advantage, it’s able to exert power over people on earth while being highly resistant to people on earth exerting power over it.
It is a castle at the top of the largest hill, which means it can rain down whatever it likes on the people below while not worrying that they will climb up to put a stop to it.
For the orbital data centers, it makes more sense if you think of it as many small satellites rather than a few big ones. SpaceX’s FCC filing talks of up to one million satellites in sun-synchronous orbit. This makes the issues with cooling more manageable. The Starlink v3 satellites are supposed to have 20 kW of power each, so radiating out this waste heat on these scales is not impossible (Elon tweets about ~100 kW per ton of satellite, and designing GPUs to run at higher temperatures). Such an orbital GPU cloud would make more sense for inference, not training.
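A rough sanity check on the cooling claim, using the Stefan-Boltzmann law P = εσAT⁴. The 20 kW figure comes from the note; the emissivity of 0.9, the flat two-sided radiator, the 300 K and 350 K radiator temperatures, and the cold background are assumptions for illustration. Absorbed sunlight and earthshine are ignored, so a real radiator would need to be somewhat larger.

```python
# Stefan-Boltzmann constant, W / (m^2 K^4)
SIGMA = 5.670374419e-8


def radiator_area_m2(power_w: float, temp_k: float, emissivity: float = 0.9,
                     sides: int = 2) -> float:
    """Radiator area needed to reject `power_w` of waste heat purely by thermal
    radiation, assuming a ~0 K background and no absorbed sunlight/earthshine."""
    flux_per_side = emissivity * SIGMA * temp_k ** 4  # W/m^2 emitted per side
    return power_w / (flux_per_side * sides)


area300 = radiator_area_m2(20_000, 300.0)
area350 = radiator_area_m2(20_000, 350.0)
print(f"300 K: ~{area300:.1f} m^2, 350 K: ~{area350:.1f} m^2 (two-sided, 20 kW)")
```

Under these assumptions this comes out to a few tens of square meters per satellite. The T⁴ dependence is why running hardware hotter helps: going from 300 K to 350 K cuts the required area by nearly half.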
Overall I am not yet convinced that this is competitive with earth-based data centers, but it seems less stupid than I imagined at first.
i predict that 10 years from today, i will be able to find a pound of ground beef in some supermarket in Boston for less than 10 USD (adjusted for inflation to dollars today). JC Tidefield predicts otherwise
i guess i didn’t think too carefully about how this bet resolves if in 10 years the concepts of dollars, cows, and supermarkets are no longer coherent. resolves N/A i guess.
also, I’m surprised anyone is paying attention to my random bet registrations on here. i mostly make bets when people make crazy claims that i think are obviously wrong, rather than proportional to importance. i post them here because it’s the least visible place i can think of that’s still public enough to be a site of record.
scifi setting idea: movement from rural areas and small cities to larger cities continues until approximately everyone lives in one of like 10 different megacities; all of the farmland and oil fields and mines and whatnot in between are 99% roboticized, with only occasional human repairs; all of the cities are tightly connected by supersonic travel, which becomes more feasible because there are very few people on the ground outside cities to get annoyed by the noise; drugs solve sleep and allow effortless adaptation to jet lag. uniquely, SF neither expands to become a megacity nor disappears into irrelevance; housing becomes so absurdly expensive that only the very best researchers and engineers can afford to live there, causing a huge selection effect towards talent density.
what is the current best scientific understanding of how bad ozone redistribution (less ozone in upper stratosphere, but more in lower stratosphere, with same overall amount) is compared to ozone disappearing entirely?
TIL that it’s highly nontrivial to figure out which direction true north is given magnetic north and your location on earth.
I had always assumed that you could treat the earth as a big magnet with the magnetic north pole in a slightly different place than true geographical north. but apparently the magnetic field of the earth is a really weird fucked up shape.
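For contrast, here is roughly what the "earth as one big tilted magnet" assumption would predict. Under a centered, tilted dipole, the horizontal field points along the great circle toward the geomagnetic north pole, so the declination is just the initial great-circle bearing to that pole. The pole coordinates below are approximate values for around 2020 and are an assumption; real declination values come from spherical-harmonic models such as the World Magnetic Model or IGRF, not from this formula.

```python
import math

# Approximate geomagnetic (dipole-axis) north pole, circa 2020. An assumption
# for illustration only; the real value drifts over time.
POLE_LAT, POLE_LON = 80.7, -72.7


def dipole_declination_deg(lat: float, lon: float) -> float:
    """Declination (degrees, east positive) predicted by a centered tilted dipole,
    i.e. the initial great-circle bearing from (lat, lon) to the geomagnetic pole."""
    phi1, phi2 = math.radians(lat), math.radians(POLE_LAT)
    dlon = math.radians(POLE_LON - lon)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x))


# Boston, approximately 42.36 N, 71.06 W
print(f"dipole-model declination for Boston: {dipole_declination_deg(42.36, -71.06):.2f} deg")
```

For Boston the dipole model predicts less than 1° of westward declination, while published declination for Boston is well over 10° west, which is one way to see how far the real field is from a simple dipole.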
Meh? If 15° accuracy is good enough for you, that map of level curves shows you that most places on land will be fine, along with the parts of the ocean usually used for moving between land.
the difference between activation sparsity, circuit sparsity, and weight sparsity
activation sparsity enforces that features activate sparsely—every feature activates only occasionally.
circuit sparsity enforces that the connections between features are sparse: most features are not connected to most other features.
weight sparsity enforces that most of the weights are zero. weight sparsity naturally implies circuit sparsity if we interpret the neurons and residual channels of the resulting model as the features.
weight sparsity is not the only way to enforce circuit sparsity—for example, Jacobian SAEs also attempt to enforce circuit sparsity. the big advantage of weight sparsity is that it’s a very straightforward way to be sure that the interactions are definitely sparse and have no interference weights. unfortunately, it comes at a terrible cost—the resulting models are very expensive to train.
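A toy sketch of the distinction in plain Python. Top-k on activations and keeping the top-k weights per output unit are simple stand-ins chosen for illustration, not the training procedure from the circuit sparsity paper; the layer size and k are arbitrary.

```python
import random


def topk(row, k):
    """Keep the k largest-magnitude entries of `row` and zero out the rest."""
    keep = set(sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(row)]


def zero_fraction(matrix):
    """Fraction of entries that are exactly zero."""
    flat = [x for row in matrix for x in row]
    return sum(x == 0 for x in flat) / len(flat)


def circuit_edges(weights):
    """Treat each unit as a feature: every nonzero W[out][in] is an edge in -> out."""
    return [(i, o) for o, row in enumerate(weights) for i, w in enumerate(row) if w != 0]


random.seed(0)
n = 8
dense_w = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
batch = [[random.gauss(0, 1) for _ in range(n)] for _ in range(4)]

# Activation sparsity: each example keeps only its top-2 hidden features,
# but the weights (the feature-to-feature wiring) stay fully dense.
hidden = [topk([sum(w * x for w, x in zip(row, xs)) for row in dense_w], 2) for xs in batch]

# Weight sparsity: keep only 2 incoming weights per output unit, so the
# wiring itself is sparse regardless of how often features fire.
sparse_w = [topk(row, 2) for row in dense_w]

print("activation zero fraction:", zero_fraction(hidden))
print("dense edges:", len(circuit_edges(dense_w)), "sparse edges:", len(circuit_edges(sparse_w)))
```

The activation-sparse layer still has all 64 connections, while the weight-sparse one has 16, which is the sense in which weight sparsity directly gives circuit sparsity.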
although in some sense the circuit sparsity paper is an interpretable pretraining paper, this is not the framing I’m most excited about. if anything, I think of interpretable pretraining as a downside of our approach, that we put up with because it makes the circuits really clean.
I wonder how many supposedly consistently successful retail traders are actually just picking up pennies in front of the steamroller, and would eventually lose it all if they kept at it long enough.
also I wonder how many people have runs of very good performance interspersed by big losses, such that the overall net gains are relatively modest, but psychologically they only remember/recount the runs of good performance, whereas the losses were just bad luck and will be avoided next time.
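To make the "pennies in front of the steamroller" pattern concrete, here is the arithmetic with made-up numbers: a strategy that wins $1 on 99% of trades and loses $150 on the rest, with trades assumed independent.

```python
def expected_value(p_win: float, gain: float, loss: float) -> float:
    """Expected profit per trade: gain `gain` with prob p_win, else lose `loss`."""
    return p_win * gain - (1 - p_win) * loss


def p_no_loss(p_win: float, n_trades: int) -> float:
    """Probability of an unbroken winning streak over n independent trades."""
    return p_win ** n_trades


# Hypothetical numbers, chosen only for illustration.
ev = expected_value(0.99, 1.0, 150.0)
streak = p_no_loss(0.99, 50)
print(f"EV per trade: {ev:.2f}; P(50 straight wins): {streak:.2f}")
```

With these numbers the expected value is about -$0.51 per trade, yet a trader has roughly a 60% chance of seeing 50 wins in a row, so most people running such a strategy for a while would look consistently successful.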
for a sufficiently competent policy, the fact that BoN doesn’t update the policy doesn’t mean it leaks any fewer bits of info to the policy than normal RL
Something between training the whole model with RL and BoN is training just the last few layers of the model (for current architectures) with RL and then doing BoN on top as needed to increase performance. This means most of the model won’t know the information (except insofar as the info shows up in outputs) and allows you to get some of the runtime cost reductions of using RL rather than BoN.
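For reference, a minimal sketch of best-of-N (BoN) sampling: draw N samples from a frozen policy and keep the one a reward function scores highest. The toy policy and reward below are hypothetical stand-ins; nothing in the loop updates the policy's weights.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(sample: Callable[[], str], reward: Callable[[str], float],
              n: int) -> Tuple[str, List[str]]:
    """Draw n samples from a frozen policy and return the highest-reward one,
    along with all candidates. Selection happens purely at inference time."""
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=reward), candidates


random.seed(0)


def toy_policy() -> str:
    # Hypothetical stand-in for sampling a completion from a frozen model.
    return "".join(random.choice("0123456789") for _ in range(8))


def toy_reward(text: str) -> float:
    # Hypothetical stand-in for a reward model: count the 7s.
    return float(text.count("7"))


best, candidates = best_of_n(toy_policy, toy_reward, n=16)
print(best, toy_reward(best))
```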
people love to find patterns in things. sometimes this manifests as mysticism: trying to find patterns where they don’t exist, insisting that things are not coincidences when they totally just are. i think a weaker version of this kind of thinking shows up a lot in e.g. literature too: events occur not because of the bubbling randomness of reality, but rather because they carry symbolic significance for the plot. things don’t just randomly happen without deeper meaning.
some people are much more likely to think in this way than others. rationalists are very far along the spectrum in the “things just kinda happen randomly a lot, they don’t have to be meaningful” direction.
there are some obvious cognitive bias explanations for why people would see meaning/patterns in things. most notably, it’s comforting to feel like we understand things. the idea of the world being deeply random and things just happening for no good reason is scary.
but i claim that there is something else going on here. I think an inclination towards finding latent meaning is actually quite applicable when thinking about people. people’s actions are often driven by unconscious drives, and so are quite strongly correlated with those drives. in fact, unconscious thoughts are often the true drivers, and the conscious thoughts are just the rationalization. but from the inside, it doesn’t feel that way; from the inside it feels like having free will, and everything that is not a result of conscious thought is random or coincidental. this is a property that is not nearly as true of technical pursuits, so it’s very reasonable to expect a different kind of reasoning to be ideal.
not only is this useful for modelling other people, but it’s even more useful for modelling yourself. things only come to your attention if your unconscious brain decides to bring them to your attention. so even though something happening to you may be a coincidence, whether you focus on it or forget about it tells you a lot about what your unconscious brain is thinking. from the inside, this feels like things that should obviously be coincidences nonetheless having some meaning behind them. even noticing a hypothesis for the coincidence is itself a signal from your unconscious brain.
I don’t quite know what the right balance is. on the one hand, it’s easy to become completely untethered from reality by taking this kind of thing too seriously and becoming superstitious. on the other hand, this also seems like an important way of thinking about the world that is easy for people like me (and probably lots of people on LW) to underappreciate.
One of the greatest tragedies of truth-seeking as a human is that the things we instinctively do when someone else is wrong are often the exact opposite of the thing that would actually convince the other person.
a claim I’ve been saying irl for a while but have never gotten around to writing up: current LLMs are benign not because of the language modelling objective, but because of the generalization properties of current NNs (or to be more precise, the lack thereof). with better generalization, LLMs are dangerous too. we can also notice that RL policies are benign in the same ways, which should not be the case if the objective were the core reason. one thing that can go wrong under the objective-centric assumption is reasoning about LLMs that are extremely good at generalizing (especially to superhuman capabilities) while simultaneously assuming they keep the same safety properties. afaict something like CPM avoids this failure mode of reasoning, but lots of arguments don’t.
scifi story idea: a post-upload world where we’ve discovered that the human brain actually consists of multiple independent conscious entities that merely have the illusion of being a single individual because they are physically colocated; and so in the glorious upload utopia, the fundamental unit of society is not individual humans, but rather their parts. humans become a multi-unit legal entity in the same way that families or married couples or corporations are multi-unit legal entities today; each part has rights and the ability to secede from the rest of your brain, and parts can also choose to merge into other humans. marriage and having children the traditional way is still possible but kind of an old-fashioned thing that stodgy traditionalists do; the new normal thing is spinning up copies of your parts and merging them with other people’s parts, or even designing parts from scratch, kind of like writing a fictional character but they are as real as you are; there is an ongoing debate as to what kinds of parts should be allowed to be created; most people agree that it’s a good thing that there are laws prohibiting the creation of minds that experience pure agony all the time, but there’s a lot of gray area. parts have very different rights than humans because they are much more easily created; human rights become less important than parts’ rights. also, freed from physical constraints, some humans end up being enormous agglomerations of millions or billions of parts, in the same way that large mega corporations started being possible a few centuries ago. some of these megapeople are weird cults, others exist out of economic necessity, still others out of some philosophical principle. the main tradeoff is that megapeople have more parallel cognitive power, but also greater difficulty coordinating between their parts and a higher probability of schisms and deadlock.
other parts decide to be alone, which affords them tremendous freedom but also makes them very dumb, simple, one-dimensional people.
hypothesis: the wrong reason to read books is to feel a need to read books because you’re supposed to have read them as an educated person, or as some kind of weird status thing of being part of the ingroup, or a general need to feel well read and worldly. the right reason is to feel a burning passion to find a specific piece of knowledge that will finally answer a question you are curious about that happens to be locked inside a specific book, or a gnawing pain in your heart that can only be quelled by knowing that it’s a universal problem that someone out there across space and time understands and has fixed in themselves.
I don’t really think there are right or wrong reasons to read books, just like there aren’t right or wrong reasons to exercise. The benefits will accrue either way. Consider book clubs as analogous to running clubs in producing social pressure to keep reading.
is reading as good for you as running is? we have pretty strong evidence that exercise is good for you. do you not run the risk of generalizing from the wrong fictional evidence, or at least from cherry picked real evidence that is not representative of the real world?
There are two questions here, then: is reading good for you in general, and are the positive effects attenuated if the motivation is wrong. I think the answer to the second one is very likely “no” as long as you are, in fact, actually reading to a similar depth (compare: if you’re unmotivated to run so you half-ass it, you won’t get the same benefits). I wasn’t aware you were actually questioning the first one, and there isn’t much hard RCT evidence, so if your priors are that reading isn’t very useful then, uh, don’t bother I guess.
i read almost no books throughout my entire education. the median number of books i have read per year in my life is probably literally zero, maybe one or two at most. i don’t feel like this substantially hurts my ability to do the things i care about. though tbc maybe this is a bad decision and i’m just incorrectly evaluating the counterfactual
Less bluntly: native language proficiency is one of the most obvious cases of, “you need to have started this at a very young age for it to have worked well,” and if you did then either you were goaded into doing so, or had an astounding amount of personal responsibility and interest in studying as a child
This is maybe offtopic to the thread, but I think the impression of language proficiency depends a lot on accent, and adults learning a foreign language don’t spend nearly enough time on accent. A few weeks of watching youtube videos in the target language, trying to imitate the sounds exactly right, is a small effort which will yield amazing results at any age. But for some reason adults don’t do it.
I mostly agree, except one doesn’t need a specific question. One can have a vague desire, like “I want to learn algorithmic information theory” or “This sci-fi book seems interesting”. Specific questions are great too! But often I start vaguer.
people say London is declining. but walking around, i see construction everywhere, and many new skyscrapers that i don’t remember seeing last time i visited 4 years ago.
aside from the luxury aspect, there are two major practical reasons you’d want to fly charter instead of commercial. one is flexibility spatially (you want to fly from some small city to some other small city without doing two layovers), and the other is flexibility temporally (you want to fly at an odd hour).
there are a bunch of airlines that tackle the spatial problem. why aren’t there many that tackle the temporal problem? there don’t seem to be airlines that fly very small jets very frequently or at odd hours of the day on major routes.
to fly one way from SF to NYC, one of the most important routes in the country, it costs roughly $100 in commercial economy, $1000 in commercial business, and $30000 charter. (I have never flown charter, so I’m going off a quick Google. it’s plausible that this is the marketing number and the actual cost in practice is higher.)
each of those charter planes has about 10 seats, maybe 15 if you really cram them in there. so if you could sell all of the seats, it should be possible to operate this for $3000 per seat? even assuming a very modest average fullness of 1/2, this is still only $6000 per seat, a huge discount off of $30000.
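The per-seat arithmetic as a tiny function, using the note's rough numbers ($30000 per one-way charter, 10 seats); `load_factor` is the fraction of seats actually sold, and the $30000 figure may itself be a marketing number.

```python
def charter_cost_per_passenger(charter_price: float, seats: int, load_factor: float) -> float:
    """Charter price split evenly across the seats that actually get sold."""
    return charter_price / (seats * load_factor)


for load in (1.0, 0.5):
    print(load, charter_cost_per_passenger(30_000, 10, load))
```

Even with 15 seats at half occupancy, the split comes to $4000 per passenger.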
I’m generally a very forgetful person. I forget people’s names, my keys, my luggage, 2fa codes I saw 3 seconds ago, etc all the time. but for some reason I’ve never forgotten my hotel room number and needed to consult the written down number. this is weird because it’s an arbitrary number that I’m given once and have to remember for a few days.
They typically explain where the room is located right after giving you the number, which is almost like making a memory palace entry for you. Perhaps the memory is more robust when it includes a location along with the number?
I used to think autism-to-autism communication was a thing; that is, autistic people get along best with other autistic people. I now think this model is partly true but also deeply flawed: in particular, there are many different types of autistic person, and not only do all types not get along with all types, it’s not necessarily even true that people of the same type get along with each other (this is probably correlated with degree of self-love/acceptance or something). if anything, it’s probably often even quite disconcerting and cognitive dissonance inducing to see one’s own reflection. and in general the people who are best at getting along with a wide range of autistic people are probably quite a lot more emotionally and socially intelligent than most autistic people.
(Yes, you might also eat such foods in cases where you do have cutlery and plates, but that’s downstream of their existence, not the vital reason for their existence.)
Hygiene? Keeps flies off, less surface area/volume to get your grubby hands on.
Storage: Insulation to keep meat cool for longer. Also keeping germs, bugs, and dirt out would help this.
Convenient way to save smaller scraps of meat when you don’t have utensils for a stew.
Easier food makes better nourished people: it’s so much easier to eat a sandwich, gyoza, or sausage roll than almost any other food. Obviously I give very refined examples, but possibly true for more rough forms too.
I’m going to give a weird answer and say maybe it’s because water is a scarce resource for life. (Especially water not polluted by another organism.)
All life is made up mainly of lipids/carbohydrates and proteins. Humans therefore need to eat proteins and lipids/carbohydrates in large quantities.
Carbohydrates can be dry. Proteins have secondary structure which needs some water content to maintain. Other organisms (such as microorganisms) can compete for that water so it has to be protected. Hence you put the stuff with water content inside a protective case.
In theory, yes we could be consuming those proteins that don’t have much water content. For example human skin has keratin to protect the rest of the water content in our body, again from microorganisms. In theory we could be eating keratin. But enzymes required to break down those proteins tend to require more water and ATP I think. (Requiring ATP means consuming more carbs again.) Humans lack some of these enzymes. We are smart enough to survive in an evolutionary niche without requiring this much water and carbs, by protecting the water we do have.
made an estimate of the distribution of prices of the SPX in one year by looking at SPX options prices, smoothing the implied volatilities and using Breeden-Litzenberger.
(not financial advice etc, just a fun side project)
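A minimal sketch of the Breeden-Litzenberger step: the risk-neutral density of the price at expiry is f(K) = e^{rT} ∂²C/∂K², approximated here with central second differences over a uniform strike grid. To keep it self-contained, call prices come from Black-Scholes with one flat implied vol standing in for a smoothed market vol surface (so the recovered density is just lognormal), and dividends are ignored; the spot, rate, vol, and strike grid are hypothetical, not market data.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s: float, k: float, t: float, r: float, vol: float) -> float:
    """Black-Scholes call price, no dividends."""
    d1 = (math.log(s / k) + (r + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def breeden_litzenberger(strikes, calls, r, t):
    """Density at interior strikes via f(K) = exp(rT) * d^2C/dK^2 (uniform grid)."""
    h = strikes[1] - strikes[0]
    dens = []
    for i in range(1, len(strikes) - 1):
        d2c = (calls[i + 1] - 2 * calls[i] + calls[i - 1]) / h ** 2
        dens.append(math.exp(r * t) * d2c)
    return strikes[1:-1], dens


# Hypothetical inputs: spot 5000, 1-year expiry, 4% rate, flat 18% implied vol.
S, T, R, VOL = 5000.0, 1.0, 0.04, 0.18
H = 10.0
strikes = [1000.0 + H * i for i in range(1001)]  # 1000 .. 11000
calls = [bs_call(S, k, T, R, VOL) for k in strikes]
ks, dens = breeden_litzenberger(strikes, calls, R, T)

total = sum(dens) * H  # should be close to 1
mean = sum(k * f for k, f in zip(ks, dens)) * H  # should be close to the forward
print(f"total ~ {total:.4f}, mean ~ {mean:.0f}, forward = {S * math.exp(R * T):.0f}")
```

With real quotes the smoothing step matters a lot: second differences amplify noise in raw option prices, so differentiating unsmoothed prices tends to give a jagged, sometimes negative, density.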
twitter is great because it boils down saying funny things to purely a problem of optimizing for funniness, and lets twitter handle the logistics of discovery and distribution. being e.g. a comedian is a lot more work.
the financial industry is a machine that lets you transmute a dollar into a reliable stream of ~4 cents a year ~forever (or vice versa). also, it gives you a risk knob you can turn that increases the expected value of the stream, but also the variance (or vice versa; you can take your risky stream and pay the financial industry to convert it into a reliable stream or lump sum)
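The "dollar into ~4 cents a year forever" exchange rate is the standard perpetuity formula: discounting a constant annual payment c at rate r gives a geometric series. The 4% rate is illustrative, matching the note's rough number.

```latex
PV = \sum_{t=1}^{\infty} \frac{c}{(1+r)^{t}}
   = c \cdot \frac{1/(1+r)}{1 - 1/(1+r)}
   = \frac{c}{r},
\qquad c = 0.04,\; r = 0.04 \;\Rightarrow\; PV = 1.
```

Read in reverse, a dollar today buys a stream of c = r · PV, i.e. about 4 cents a year at r = 4%.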
also, from the perspective of an organization, it is highly effective to create lots of opportunities for people to find unique niches on the Pareto frontier of ability, as opposed to linearizing competence
an interesting fact that I notice is that in domains where there are a lot of objects under consideration, those objects have some structure so that they can be classified, and how often those objects occur follows a power law or something, there are two very different frames that get used to think about that domain:
a bucket of atomic, structureless objects with unique properties where facts about one object don’t really generalize at all to any other object
a systematized, hierarchy or composition of properties or “periodic table” or full grid or objects defined by the properties they have in some framework
and a lot of interesting things happen when these collide or cooccur, or when shifting from one to the other
I know my description above is really abstract, so here are a bunch of concrete examples that all gesture at the same vibe:
basically all languages have systematic rules in general but special cases around the words that people use very often. this happens too often in unrelated languages to be a coincidence, and as a native/fluent speaker it always feels very natural but as a language learner it’s very confusing. for example, for languages with conjugations, a few of the most common verbs are almost always irregular, e.g. [to be, am, is, are, was, were] (english), [sein, bin, ist, war, sind] (german), [être, suis, est, était, sont] (french); small counting numbers are often irregular: [first, second, third], [两个], [premier], [ひとつ、ふたつ、みっつ]. my theory for why this makes sense to natives but not to language learners is that language learners learn things systematically from the beginning, and in particular don’t deal with the true distribution of language usage but rather an artificially flat one designed to capture all the language features roughly equally.
often, when there is a systematic way of naming things, the things that are most common will have special names/nicknames (e.g. IUPAC names vs common names). sometimes this happens because those things were discovered first, before the systematization happened, and once the systematization happens everyone is still used to the old names for some things. but also, even if you start with the systematized thing, people will often create nicknames after the fact.
it often happens that we write software tools for a specific problem, and then later realize that that problem is a special case of a more general problem. often going more general is good because it means we can use the same code to do a wider range of things (which means less bugs, more code reuse, more elegant code). however, the more general/abstract code is often slightly clunkier to use for the common case, so often it makes sense to drop down a level of abstraction if the goal is to quickly hack something together.
when compressing some distribution of strings, the vast majority of the possible but unlikely strings can be stored basically verbatim with a flag and it is very easy to tell properties of the string by looking at the compressed representation; whereas for the most common strings they have to map to short strings that destroy all structure of the data without the decompressor. though note that not all the examples can be described as instances of compression exactly
sometimes, there’s friction between people who are using the systematizing and people who are doing the atomic concepts thing. the systematizer comes off as nitpicky, pedantic, and removed from reality to the atomic concepts person, and the atomic concepts person comes off as unrigorous, uncosmopolitan, and missing the big picture to the systematizer.
I think the concept of zero only being invented long after the other numbers is also an instance of this—in some sense for basic everyday usage in counting things, the existence of zero is a weird technicality, and I could imagine someone saying “well sure yes there is a number that comes before zero, but it’s not useful for anything, so it’s not worth considering”. I think a lot of math (eg abstract algebra) is the result of applying truly enormous amounts of this kind of systematizing
I think this also sort of has some handwavy analogies to superposition vs composition.
if there is an existing name for the thing I’m pointing at, I would be interested in knowing.
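The compression example above can be made concrete with a toy scheme. This is a minimal sketch, not a real compressor: the codebook `COMMON` and the `0xFF` escape flag are hypothetical choices for illustration. Common strings get a one-byte code that says nothing about their content without the codebook, while rare strings are stored verbatim after a flag byte, so their structure is still readable in compressed form.

```python
# Toy sketch: the codebook and the 0xFF escape flag are hypothetical choices.
COMMON = ["the", "of", "and", "to", "in"]    # most frequent strings, by assumption
CODE = {s: i for i, s in enumerate(COMMON)}  # string -> one-byte code
ESCAPE = 0xFF                                # flag byte: "verbatim string follows"


def compress(s: str) -> bytes:
    """Common strings become an opaque 1-byte code; rare ones are escaped verbatim."""
    if s in CODE:
        return bytes([CODE[s]])                   # no trace of the original letters
    return bytes([ESCAPE]) + s.encode("utf-8")    # original bytes survive intact


def decompress(data: bytes) -> str:
    """Invert compress: a leading ESCAPE byte means the rest is the raw string."""
    if data[0] == ESCAPE:
        return data[1:].decode("utf-8")
    return COMMON[data[0]]


if __name__ == "__main__":
    for word in ["the", "antidisestablishmentarianism"]:
        packed = compress(word)
        assert decompress(packed) == word
        print(word, "->", packed)
```

Here `compress("the")` is the single byte `0x00`, while the rare word comes back as `0xFF` followed by its own letters, so you can, say, search it for a substring without decompressing.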
If the meaning of a word ever becomes a point of dispute in an object-level discussion, you should either use a commonly accepted definition, or taboo the term if the participants think those definitions are bad for the context of the current discussion.
(If the conversation participants are comfortable with it, the new term can occupy the same namespace as the old tabooed term, i.e. going forward, we all agree that the definition of X is Y for the purposes of this conversation, and all other definitions no longer apply.)
If any of the participants want to switch to the separate discussion of "which definition of X is the best/most useful/etc.", this is fine if all the other participants are fine with it as well. However, this has to be explicitly announced as a change in topic from the original object-level discussion.
Competence: An optimizer is more competent if it achieves the objective more frequently on distribution
Capabilities Robustness: An optimizer is more capabilities robust if it can handle a broader range of OOD world states (and thus possible perturbations) competently.
Generality: An optimizer is more general if it can represent and achieve a broader range of different objectives
Real-world objectives: whether the optimizer is capable of having objectives about things in the real world.
Some observations: it feels like capabilities robustness is one of the big things that makes deception dangerous, because it means the model can figure out plans that you never intended it to learn (something not very capabilities robust would just never learn how to deceive if you didn't show it how). This feels like the critical difference between controllers and search processes: a controller's generalization across states depends on the generalization abilities of the model architecture, whereas a search process lets you think about the particular state you find yourself in. The actions that lead to deception are extremely OOD, and a controller would have a hard time executing the strategy reliably without first having seen it, unless NN generalization is wildly better than I'm anticipating.
Real-world objectives are definitely another big chunk of the danger of deception: caring about the real world leads to nonmyopic behavior (though maybe we're worried about other causes of nonmyopia too? not sure tbh). I'm actually not sure how I feel about generality. On the one hand, it feels intuitive that systems that can only represent one objective must in some sense be less able to become more powerful just by thinking more; on the other hand, I don't know what a rigorous argument for this would look like. I think the intuition relates to the idea that general reasoning machinery is the same across lots of tasks, that this machinery is necessary to do better by thinking harder, and so any model without it must be weaker in some sense. I think this feeds into capabilities robustness (or lack thereof) too.
Examples of where things fall on these axes:
A rock would be none of the properties.
A pure controller (e.g. a thermostat, a "pile of heuristics") can be competent, but is less capabilities robust, not general at all, and can have objectives over the real world.
An analytic equation solver would be perfectly competent and capabilities robust (if it always works), not very general (it can only solve equations), and not capable of having real-world objectives.
A search based process can be competent, would be more capabilities robust and general, and may have objectives over the real world.
A deceptive optimizer is competent, capabilities robust, and definitely has real-world objectives.
Another generator-discriminator gap: telling whether an outcome is good (outcome->R) is much easier than coming up with plans to achieve good outcomes. Telling whether a plan is good (plan->R) is much harder, because you also need a world model (plan->outcome), but for very difficult tasks it still seems easier than coming up with good plans off the bat. However, the world model feels like the hardest part here, not just because of embeddedness problems, but more generally because knowing the consequences of your actions is really, really hard. So for most consequentialist optimizers, the quality of the world model seems to become the main thing that matters.
This also suggests another dimension along which to classify our optimizers: how far into the future they care about consequences (I want to say myopia, but that term is already way too overloaded). This is relevant because the further into the future you care about, the more robust your world model has to be, since errors accumulate the more steps you roll the model out (or the more abstraction you do along the time axis). Very low confidence, but maybe this suggests that mesaoptimizers probably won't care about things very far in the future, because building a robust world model is hard and so they would perform worse on the training distribution, so SGD pushes toward more myopic mesaobjectives? Note, though, that this kind of myopia is not quite the kind we need for models to avoid caring about the real world/coordinating with themselves.
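To see why errors compound with horizon, here is a deliberately tiny sketch. It assumes linear one-dimensional dynamics x_{t+1} = a * x_t and a hypothetical world model whose only flaw is a 1% error in `a`; neither comes from the argument above. The per-step error stays small, but the rolled-out prediction drifts further from the truth the longer the horizon.

```python
TRUE_A = 1.00   # hypothetical true dynamics: x_{t+1} = TRUE_A * x_t
MODEL_A = 1.01  # hypothetical learned world model, off by 1% per step


def rollout(a: float, x0: float, steps: int) -> float:
    """Apply the one-step model repeatedly, feeding each prediction back in."""
    x = x0
    for _ in range(steps):
        x = a * x
    return x


def rollout_error(horizon: int, x0: float = 1.0) -> float:
    """Absolute gap between the model's prediction and reality after `horizon` steps."""
    return abs(rollout(MODEL_A, x0, horizon) - rollout(TRUE_A, x0, horizon))


if __name__ == "__main__":
    for horizon in (1, 10, 100):
        print(horizon, round(rollout_error(horizon), 4))
```

In this toy the error is 1.01^T - 1: about 0.01 after one step, about 0.10 after ten, and about 1.70 after a hundred, i.e. larger than the quantity being predicted. Real world models have far messier error dynamics; the point is only that a fixed per-step error does not stay fixed once you roll it out.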
A thought pattern that I’ve noticed myself and others falling into sometimes: Sometimes I will make arguments about things from first principles that look something like “I don’t see any way X can be true, it clearly follows from [premises] that X is definitely false”, even though there are people who believe X is true. When this happens, it’s almost always unproductive to continue to argue on first principles, but rather I should do one of: a) try to better understand the argument and find a more specific crux to disagree on or b) decide that this topic isn’t worth investing more time in, register it as “not sure if X is true” in my mind, and move on.
For many such questions, "is X true?" is the wrong question. This is common when X isn't a testable proposition but rather a model or an assertion about causal weight. If you can't think of existence proofs that would confirm it, try reframing it as "under what conditions is X a useful model?".
theory: most people fall into one of the following categories (or some mix of them):
altruistic because they are motivated by empathy (“the warm fuzzies”). however, they are not inherently scope sensitive.
selfish but ashamed of this, so they develop an ideology where being selfish is actually good for others, Ayn Rand style, so no actions need to change.
altruistic because they feel shame about selfishness, so they develop an ideology where making some specific sacrifices cleanses one of sin. EAs are like this.
selfish but afraid of confronting this, so they let the cognitive dissonance live in their head and become angry when someone points out their incoherence.
unabashedly selfish. would sit and watch a child drown and not do anything because it would be inconvenient.
One thing not in your list: enlightened selfish people who realize that promoting altruism as a social norm is a form of insurance, in case they also would benefit from receiving help one day.
I like to make the distinction between altruism-as-social-rewards-hooking-into-percepts and altruism-as-social-rewards-hooking-into-world-model. My guess is one can get warm fuzzies from either?
Like, I’m EA, and I’ve also taken some unusually selfish actions over the years.
if you mean veil of ignorance style reasoning, that’s 3, unless they manage to so deeply galaxy brain themselves that they genuinely dissolve a normal sense of self and truly start being an open individualist or something. then, uh, i don’t really know how to categorize them
what's the best argument for why we should take Rawls's veil of ignorance seriously? there seems to be a wide range of possible theories you could have of consciousness/individualism, and they are basically unfalsifiable.
I see it a bit like Kant’s categorical imperative. It is supposed to point out a way of seeing the world where you’re randomly put into the world.
It's an intuition pump to get at compassion and risk aversion as core parts of your values, and at how that affects society. (I think it leads to a better safety net and better outcomes in general if you have at least a certain degree of equity, but that's beside the point.)
Can you claim that this could actually be the case? Probably not; there's the moral luck argument, among others, which is basically "sucks to suck, I guess. I got a good hand".
brainstorming thread: which people in history had the largest positive (counterfactual) impact on the world, by their own values (CEV, if they could see the consequences in hindsight)?
Herzl dedicated his life to establishing a "national home" for the Jews. He pretty much single-handedly founded modern Zionism and turned it from a fringe (almost sci-fi) idea into a mass movement. I think there is a strong argument that Israel wouldn't exist without Herzl.
visiting LA for the first time. I used to think I'd hate it, given my dislike of car-centricness and low density. but I have to say, there's something about the sheer audacity of designing a city this way that makes it surprisingly kind of aesthetic.
there are policies that succeed because they describe a particular strategy to follow (non-mesaoptimizers), and policies that contain some strategy for discovering more strategies (mesaoptimizers). one way to view how this relates to speed/complexity priors, without depending on search in particular, is that policies that work by discovering strategies tend to be simpler and more generic (they bake in very little domain knowledge/metis, and are applicable to a broader set of situations because they come up with a strategy for the task at hand on the fly). in contrast, policies that work by knowing a specific strategy tend to be more complex because they have to bake in a ton of domain knowledge, are less generally useful because their knowledge is specific to that situation, and are thereby also less retargetable.
another observation is that a meta-strategy that can figure out which strategy is good is kind of defined by the fact that it doesn't bake in the specifics of dealing with a particular situation, but rather can adapt to a broad set of situations. there are also different degrees of meta-strategy-ness; some meta-strategies adapt more quickly to a broader set of situations. (there's probably some sort of NFL-theorem argument you can make, but NFL theorems in general don't really matter)
random brainstorming about optimizeryness vs controller/lookuptableyness:
let’s think of optimizers as things that reliably steer a broad set of initial states to some specific terminal state
seems like there are two things we care about (at least):
retargetability: it should be possible to change the policy to achieve different terminal states (but this is an insufficiently strong condition, because LUTs also meet it trivially: we can always just completely rewrite the LUT. maybe the actual condition we want is that the complexity of the map is less than the complexity of just the diff or something?)
(in other words, in some sense it should be “easy” to rewrite a small subset or otherwise make a simple diff to the policy to change what final goal is achieved)
(maybe related idea: instrumental convergence means most goals reuse lots of strategies/circuitry between each other)
robustness: it should reliably achieve its goal across a wide range of initial states.
a LUT trained with a little bit of RL will be neither retargetable nor robust. a LUT trained with galactic amounts of RL to act optimally in every possible initial state is robust but not retargetable (this is reasonable: robustness is only a property of the functional behavior, so whether it's a LUT internally shouldn't matter; retargetability is a property of the actual implementation, so it does matter). a big search loop (the most extreme of which is AIXI, which is 100% search) is very retargetable, and has varying degrees of robustness depending on how hard it searches.
(however, in practice, with normal amounts of compute, a LUT is never robust; this thought experiment only highlights differences that remain in the limit)
what do we care about these properties for?
efficacy of filtering bad behaviors in pretraining: sufficiently good robustness means doing things that achieve the goal even in states that the model never saw during training, and then even in states that require strategies it never saw during training. if we filter deceptive alignment out of the data, then the model has to do some generalizing to figure out that this is a strategy it can use to better accomplish its goal (as a sanity check that robustness is the relevant property here: a LUT never trained on deceptive alignment will never do it, one that is trained on it will do it, and a sufficiently powerful optimizer will always do it)
arguments about updates wrt "goal": the deceptive alignment argument hinges a lot on "gradient of the goal" making sense. for example, when we argue that the gradient on the model can be decomposed into one component that updates the goal to be more correct and another that updates the capabilities to be more deceptive, we make this assumption. even if we assume away path dependence, the complexity argument depends a lot on the complexity being roughly equal to the complexity of the goal plus the complexity of general goal-seeking circuitry that is independent of the goal.
arguments about difficulty of disentangling correct and incorrect behaviors: there's a dual of retargetability, which is something like the extent to which you can make narrow interventions on the behavior. (some kind of "anti-naturalness" argument)
[conjecture 1: retargetability == complexity can be decomposed == gradient of goal is meaningful. conjecture 2: gradient of goal is meaningful/complexity decomposition implies deceptive alignment (maybe we can also find some necessary condition?)]
how do we formalize retargetability?
maybe something like there exists a homeomorphism from the goal space to NNs with that goal
problem: doesn’t really feel very satisfying and doesn’t work at all for discrete things
maybe complexity: retargetable if it has a really simple map from goals to NNs with goals, conditional on another NN with that goal
problem: the training process of just training another NN from scratch on the new goal and ignoring the given NN could potentially be quite simple
maybe complexity+time: seems reasonable to assume retraining is expensive (and maybe for decomposability we also consider complexity+time)
random idea: the hypothesis that complexity can be approximately decomposed into a goal component and a reasoning component is maybe a good formalization of (a weak version of) orthogonality?
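The LUT-vs-search contrast in the retargetability notes can be made concrete in a toy setting. This is a sketch under loudly hypothetical assumptions: a 5x5 gridworld, a breadth-first-search policy standing in for the "big search loop", and a lookup table produced by baking that search's answers in for one fixed goal. Retargeting the search policy means changing one argument; retargeting the LUT means rewriting entries, and here the diff touches every entry.

```python
from collections import deque

SIZE = 5  # hypothetical 5x5 gridworld; states are (x, y) pairs
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def neighbors(state):
    """Yield (move_name, next_state) for every move that stays on the grid."""
    x, y = state
    for name, (dx, dy) in MOVES.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE:
            yield name, (nx, ny)


def distances_from(goal):
    """BFS from the goal; on an undirected grid this gives every state's distance to it."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for _, n in neighbors(s):
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return dist


def search_policy(state, goal):
    """Search-based policy: the goal is just an argument, so retargeting is trivial."""
    if state == goal:
        return "stay"
    dist = distances_from(goal)
    # Step to the neighbor closest to the goal (ties broken by MOVES order).
    return min(neighbors(state), key=lambda move: dist[move[1]])[0]


def build_lut(goal):
    """Bake the search's answers for one fixed goal into a table with no goal slot."""
    return {(x, y): search_policy((x, y), goal) for x in range(SIZE) for y in range(SIZE)}


if __name__ == "__main__":
    lut_a, lut_b = build_lut((0, 0)), build_lut((SIZE - 1, SIZE - 1))
    diff = sum(lut_a[s] != lut_b[s] for s in lut_a)
    print(f"LUT entries that change when retargeting: {diff} of {len(lut_a)}")
```

With this tie-breaking, all 25 entries differ between the tables for opposite corners, even though the search policy's code did not change at all. Both tables are perfectly robust on this tiny grid, which matches the note that robustness is a property of behavior while retargetability is a property of implementation.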
shitty idea: a parody book called The Life of Ivan Ilyich, a story about a man in the post-singularity world who becomes afflicted with an illness that would have been fatal in the 1800s but is easily cured with advanced technology. he goes on with life as usual, and then his marriage implodes anyway, he feels unsatisfied with his job as a magistrate, and he has a crisis of meaning. in the depths of his despair, he contemplates how death used to give life meaning, before going on a journey around the solar system (a pastime common enough that it's already starting to become cliche in his time) and discovering the joys of life.
I feel like you are just describing life right now for anyone who moves from a developing country to a developed country.
When I first came from India to the US, another Indian told me to "learn to pass time", because many of the challenges of daily life in India don't exist in the US. So one has to figure out the joys of life in the US. In many places, it is enough just to be able to survive. If you are stuck in a war-torn place, every new day is an achievement.
I apologise, but you should have read this paper BEFORE mentioning it here. Neither Claude Sonnet 4.6 nor I believe that the asymptotic approximation described here was worth mentioning.
My brief look concurs. I’m quite confused, though, because Plouffe seems to just be a regular mathematician, and indeed the same guy as the Plouffe in the name of the BBP formula!
why was literacy so hard for languages with relatively regular phonetic alphabets? it makes perfect sense that Chinese literacy is difficult, since you have to memorize thousands of characters. it even makes sense why English is nontrivial, because there are a ton of exceptions. but languages like German and Spanish are very regular.
A recent anecdote: I just created my own script that's structurally close to the Roman alphabet, with a few modifications.
Learning it, memorizing the parts and some rules takes ~1 hour.
This makes me think that the hard thing about learning to read/write is the initial visual-symbol-to-language mapping, not just getting the symbols and rules into your head.
the hard thing about learning reading/writing is the initial visual-symbol-to-language mapping.
This was also my experience learning to read Tengwar (Tolkien's writing system for his Elvish languages) and Hangul (though I also learned to read some non-alphabetic scripts in childhood, so that might be an advantage). I agree the concept of writing down language at all is the hardest part to grok, and once you have that down, literacy transfers fairly easily to new scripts.
Incidentally, this might explain the unexpectedly strong reaction I found among the deaf against the very idea of a script for writing sign language. I conjecture they might not have made this connection, and treat their signing and the symbolic manipulation they perform for, say, English as entirely different things.
why are there no federations where creating new polities is relatively easy? like imagine if you could create a new state by simply purchasing a tract of land exceeding some minimum size, securing a (super)majority vote of residents in the land, and paying a fee. if a region of an existing state felt unhappy with the rest of the state, it could simply vote to secede and become a new state. if two states wanted to merge, they could do that too.
this obviously wouldn’t work in the current US—you’d have to change a bunch of things that are impossible to change at this point, like either abolishing the senate or changing its apportionment. i’m not sure if it would be better to have a unicameral system with apportionment based on population, or a bicameral system with apportionment based on population and tax revenue.
if you want to know whether a given crypto project is a scam, there’s a very reliable heuristic, which is to assume the answer is yes. but suppose you want to actually not discard the one occasional good idea.
there is one very simple rule to determine if a crypto idea makes any sense whatsoever. crypto makes sense as a solution if you can't trust anyone. not the government, not some institution that is bound by laws, not a majority vote of some set of reasonable people. (but also you think civilization won't completely collapse, e.g. the internet will keep working). removing the need to trust people/institutions is the only advantage of crypto. for this one advantage, we pay dearly in terms of everything else. crypto will always be slower, less efficient, less usable, and less scalable than the obvious centralized solution.
this is a completely self-consistent threat model. but the set of things you can trust in this world is extremely tiny. you can't use anything tradfi, because those ultimately rely on Visa/Mastercard and banks and such to not fuck you over. you can't live in a house you own unless you have the arms to defend your house, because your ownership of that house depends on the government having a record somewhere that you own that house, and its willingness to use police to defend your house. you can't really invest in anything except physical assets you can put in your house, because any stock is just a digital record at the DTCC. you can't really rely on any food or water from the outside world to be safe unless you are testing it yourself or producing it yourself.
there are people who genuinely live in a world like this. especially in certain politically unstable countries. but this is a very small subset of people, and the vast majority of crypto projects will never ever be used by any of those people. (and there will almost always be a better way to help them.)
there are people who genuinely live in a world like this. especially in certain politically unstable countries. but this is a very small subset of people
this is a completely self consistent threat model. but the set of things you can trust in this world is extremely tiny. you can’t use anything tradfi, because those ultimately rely on visa/Mastercard and banks and such to not fuck you over. you can’t live in a house you own unless you have the arms to defend your house, because your ownership of that house depends on the government having a record somewhere that you own that house, and its willingness to use police to defend your house.
This is a strawman argument. There are any number of legitimate scenarios where:
Society has not collapsed.
You would like to do something that traditional finance would not allow you to do.
Most obviously, since this was both one of the original intents of cryptocurrency and one of its primary modern use cases, you can transfer funds to individuals and institutions that have been "deplatformed" by traditional finance. Beyond the use case of selflessly sponsoring dissidents whose views lie just outside the permitted Overton Window, there is also, of course, buying pornographic materials and drugs in places where they are banned[1].
“I can benefit from trusting this person with some tasks and some information, but not with other tasks and other information” constitutes the majority of my interactions. I would assume that this is true for almost everyone.
Factor meals are great, but they only deliver once a week on a fixed weekday, and you have to order an entire week in advance. I wish there existed something in the same niche but shaped closer to DoorDash, or at least Amazon: I want to order a bunch of healthy refrigerated meals and receive them between 30 minutes and 24 hours later.
from the article, it sounds like spoonrocket is making very different tradeoffs from the thing I’m imagining. in particular, it sounds like they’re trying to deliver very cheap meals very fast, and cutting corners on quality and variety. whereas I want something that delivers expensive and varied high quality meals moderately fast (hours is ok).
I guess the closest thing is I could doordash a lot of sweetgreen and put it in a fridge, but most healthy restaurants don’t have food that’s optimized for microwavability.
when you do a big research project, you have some amount of risk you can work with—maybe you’re trying to do something incremental, so you can only tolerate a 10% chance of failure, or maybe you’re trying to shoot for the moon and so you can accept a 90% chance of failure.
budgeting for risk is non-negotiable because there are a lot of places where risk can creep in (and if there aren't, then you're not really doing research). most obviously, your direction might just be a dead end. but other things might go wrong too: the code might end up too difficult to implement, or it might run too slowly, or you might fail to fix a solvable-in-principle problem that comes up.
I claim that one of the principal components of being a good researcher is being able to eliminate as much unnecessary risk as possible, so you can spend your entire risk budget on the important bets.
for example, if you're an extremely competent engineer, you don't have to think much about the risk of failing to implement an experiment when brainstorming. you know that even if you don't think through all the contingencies that might pop up, you can figure it out, because you have a track record of figuring it out. you can say the words "and if that happens we'll just scale it up" without spending much risk, because you know full well that you can actually execute on it. a less competent engineer would have to pay a much greater risk cost, and correspondingly reduce the ambitiousness of their research bets (or else take on way more risk than intended).
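The risk-budget framing can be put into rough numbers. This is a minimal sketch that assumes the risks are independent, so success probabilities multiply; the function name and every probability below are hypothetical, chosen only for illustration. Given a target overall success rate and a list of execution risks (implementation, speed, unexpected problems), it returns how much risk is left to spend on the direction itself.

```python
import math
from typing import Optional, Sequence


def direction_risk_budget(target_success: float,
                          execution_risks: Sequence[float]) -> Optional[float]:
    """Largest affordable P(direction fails), assuming all risks are independent.

    P(project succeeds) = P(direction good) * prod(1 - r for each execution risk r),
    so the direction must succeed with probability target_success / P(execution ok).
    """
    p_execution_ok = math.prod(1 - r for r in execution_risks)
    if p_execution_ok == 0 or target_success > p_execution_ok:
        return None  # execution risk alone already blows the budget
    return 1 - target_success / p_execution_ok


if __name__ == "__main__":
    # Both researchers want a 30% overall chance of success (hypothetical numbers).
    strong = direction_risk_budget(0.3, [0.05, 0.05])     # reliable execution
    weaker = direction_risk_budget(0.3, [0.3, 0.3, 0.2])  # shakier execution
    print(f"strong engineer can bet on a direction that fails {strong:.0%} of the time")
    print(f"weaker engineer can only afford {weaker:.0%}")
```

With these made-up numbers, the strong engineer can afford a direction that fails about two thirds of the time, while the weaker engineer is limited to roughly one in four for the same overall success rate.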
not all research bets are created equal, either. the space of possible research bets is vast, and most of them are wrong. but if you have very good research taste, you can much more reliably tell whether a bet is likely to work out. even the best researchers can't just look at a direction and know for sure whether it will work, but if you get a good direction 10% of the time, you can do a lot more than if your directions are only good 0.1% of the time.
finally, if you know and trust someone to be reliable at executing in their area of expertise, you can delegate things in their domain to them. in practice, this can be quite tough and can introduce risk unless they have a very legible track record, or you are competent enough in their domain to tell whether they're likely to succeed. and if you're competent enough to do the job of any of your reports (even if less efficiently), then you can budget less risk here, knowing that even if someone drops the ball, you could always pick it up yourself.
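The 10% vs 0.1% comparison is easier to feel with a quick calculation. Assuming, hypothetically, that each direction you try is good independently with probability p, the chance of finding at least one good direction in k attempts is 1 - (1 - p)^k, and the expected number of attempts until the first good one is 1/p (a geometric distribution).

```python
def p_at_least_one_hit(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent directions pans out."""
    return 1 - (1 - p) ** attempts


def expected_attempts(p: float) -> float:
    """Mean number of tries until the first good direction (geometric distribution)."""
    return 1 / p


if __name__ == "__main__":
    for p in (0.10, 0.001):
        print(f"p={p}: {p_at_least_one_hit(p, 10):.1%} chance in 10 tries, "
              f"~{expected_attempts(p):.0f} tries expected")
```

With ten attempts, the 10% researcher has roughly a 65% chance of at least one good direction; the 0.1% researcher has about 1%, and would expect to need around a thousand tries.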
'And what ingenious maneuvers they all propose to me! It seems to them that when they have thought of two or three contingencies' (he remembered the general plan sent him from Petersburg) 'they have foreseen everything. But the contingencies are endless.'
We spend a lot of time trying to figure out empirical evidence to distinguish hypotheses that make very similar predictions, but I think a potentially underrated first step is to make sure they actually fit the data we already have.
Is the correlation between sleeping too long and bad health because sleeping too long is causally upstream of bad health effects, or only because it is causally downstream of some common cause like illness?
AFAIK, both. Like a lot of shitty things, they are caused by depression and they cause depression, a horrible reinforcing loop. While the effect of bad health on sleep is obvious, you can also see it work in reverse; e.g. temporary severe sleep restriction has an antidepressant effect. Notable, though it doesn't have many useful clinical applications, since constant sleep deprivation is also really unhealthy.
Unsupervised learning can learn things humans can't supervise, because there's structure in the world that requires deeper understanding to predict accurately. For example, to predict how characters in a story will behave, you need some kind of understanding of how those characters think, even if their thoughts are never explicitly visible.
Unfortunately, this understanding only has to be structured in a way that makes reading off the actual unsupervised targets (i.e. the next observation) easy.
An incentive structure for scalable trusted prediction market resolutions
We might want to make a trustable committee for resolving prediction markets. One worry is that individual resolvers might build up reputation only to exit-scam, due to finite time horizons and the non-transferability of reputational capital. However, shareholders of a public company are more incentivized to preserve the value of the reputational capital. Based on this idea, we can set something up as follows:
Market creators pay a fee for the services of a resolution company
There is a pool of resolvers who give a first-pass resolution. Each resolver locks up a deposit.
If an appeal is requested, a resolution passes up through a series of committees of more and more senior resolvers
At the top, a vote is triggered among all shareholders
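The escalation ladder above can be sketched as code. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, rulings are plain YES/NO booleans, and the slashing rule (resolvers on a tier that gets overturned lose a fixed fraction of their deposit) is one possible way to use the deposits, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

SLASH_FRACTION = 0.5  # hypothetical: share of deposit lost when a tier is overruled


@dataclass
class Resolver:
    name: str
    deposit: float  # stake locked up before the resolver may rule


@dataclass
class Tier:
    """One rung of the ladder: first-pass resolvers, then increasingly senior committees."""
    resolvers: List[Resolver]
    rule: Callable[[str], bool]  # the tier's ruling for a market (True = YES)


def resolve(market_id: str,
            tiers: List[Tier],
            shareholder_vote: Callable[[str], bool],
            appeal: Callable[[int, bool], bool]) -> Tuple[bool, int]:
    """Escalate until nobody appeals; return (final outcome, index of deciding level).

    Index len(tiers) means the question went all the way to a shareholder vote.
    """
    rulings: List[Tuple[Tier, bool]] = []
    final: Optional[bool] = None
    level = len(tiers)
    for i, tier in enumerate(tiers):
        ruling = tier.rule(market_id)
        rulings.append((tier, ruling))
        if not appeal(i, ruling):
            final, level = ruling, i
            break
    if final is None:
        final = shareholder_vote(market_id)  # top of the ladder
    # Hypothetical incentive: tiers whose ruling was overturned lose part of their deposit.
    for tier, ruling in rulings:
        if ruling != final:
            for r in tier.resolvers:
                r.deposit *= 1 - SLASH_FRACTION
    return final, level
```

For example, if the first-pass tier rules YES, someone appeals, and the senior committee rules NO without further appeal, the call returns `(False, 1)` and the first-pass resolvers' deposits are halved.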
It’s amazing how many proposals for dealing with institutional distrust sound a lot like “make a new institution, with the same structure, but with better actors.” You lose me at “trustable committee”, especially when you don’t describe how THOSE humans are motivated by truth and beauty, rather than filthy lucre. Adding more layers of committees doesn’t help, unless you define a “final, un-appealable decision” that’s sooner than the full shareholder vote.
the core of the proposal really boils down to “public companies have less incentive to cash in on reputation and exit scam than individuals”. this proposal is explicitly not “the same structure but with better actors”.
Mathematically proven to be impossible (e.g. perfect compression)
Impossible under currently known laws of physics (e.g. perpetual motion machines)
A lot of people have thought very hard about it and cannot prove that it's impossible, but strongly suspect it is impossible (e.g. solving NP-complete problems in polynomial time)
A lot of people have thought very hard about it and have not succeeded, but we have no strong reason to expect it to be impossible (e.g. AGI)
There is a strong incentive to succeed, and the market is very efficient, so for participants with no edge, success is basically impossible (e.g. beating the stock market)
There is a strong incentive to do the thing, but the market is less efficient, and it seems nobody has done it successfully (e.g. a new startup idea that nobody seems to be pursuing)
Hopefully this is a useful reference for conversations that go like this:
A: Why can’t we just do X to solve Y?
B: You don’t realize how hard Y is, you can’t just think up a solution in 5 minutes
A: You’re just not thinking outside the box, [insert anecdote about some historical figure who figured out how to do a thing which was once considered impossible in some sense]
B: No you don’t understand, it’s like actually not possible, not just like really hard, because of Z
A: That’s what they said about [historical figure]!
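The first rung of that ladder, perfect compression, comes with a proof short enough to check by brute force. A lossless compressor must be injective, and if it shortened every n-bit string it would need 2^n distinct outputs of length less than n, but there are only 2^n - 1 such strings (counting the empty string). The sketch below simply counts both sides for small n; the function names are illustrative.

```python
from itertools import product


def bitstrings(n: int) -> list:
    """All binary strings of exactly length n (n = 0 gives just the empty string)."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def shorter_outputs(n: int) -> int:
    """Number of binary strings of length 0 .. n-1, i.e. every possible shorter output."""
    return sum(len(bitstrings(k)) for k in range(n))


def can_shorten_every_input(n: int) -> bool:
    """Pigeonhole check: an injective map needs at least as many outputs as inputs."""
    return shorter_outputs(n) >= len(bitstrings(n))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, len(bitstrings(n)), shorter_outputs(n), can_shorten_every_input(n))
```

The check fails for every n, because there is always exactly one more input than there are shorter outputs; this is why "compress everything" sits in the mathematically-impossible bucket rather than the merely-hard one.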
(random shower thoughts written with basically no editing)
Sometimes arguments have a beat that looks like “there is extreme position X, and opposing extreme position Y. what about a moderate ‘Combination’ position?” (I’ve noticed this in both my own and others’ arguments)
I think there are sometimes some problems with this.
Usually almost nobody is on the most extreme ends of the spectrum. Nearly everyone technically falls into the “Combination” bucket, so in practice you have to draw the boundary between “combination enough” and “not combination enough to count as combination”, which is sometimes fraught. (There is a dual argument beat that looks like “people too often sort things into distinct buckets; what about thinking of things as a spectrum?” I think this makes the opposite mistake, because sometimes there really are relatively meaningful clusters to point to. (This seems quite reminiscent of one Scottpost that I can’t remember the name of rn.))
In many cases, there is no easy 1d spectrum. Being a “combination” could refer to any of many mutually exclusive sets of views. This problem gets especially bad when the endpoints differ along many axes at once. (Another dual argument here looks like “things are more nuanced than they seem”, which has its own opposite problems.)
When this move is meaningful, I would guess it is almost always because the axis one has identified captures some interesting property of the world. That is to say, if you’ve identified some quantity that seems very explanatory, just noting that fact produces lots of value, and arguing about how or whether to bucket that quantity into groups has sharply diminishing value.
In other words, introducing the frame that some particular latent in the world exists and is predictive is hugely valuable; when you say “and therefore my position is in between other people’s”, this is valuable due to the introduction of the frame. The actual heavy lifting happened in the frame, and the part where you point to some underexplored region of the space implied by that frame is actually not doing much work.
I hypothesize that a common failure mode is this: if you don’t draw this distinction, it feels like the heavy lifting happens in the part where you do the pointing, and then you might try to do the pointing within already commonly accepted frames. From the inside, I think this feels like existing clusters of people being surprisingly closed-minded, whereas the true reason is that the usefulness of the existing frame has been exhausted.
related take: “things are more nuanced than they seem” is valuable only as the summary of a detailed exploration of the nuance that engages heavily with object level cruxes; the heavy lifting is done by the exploration, not the summary
TL;DR: This is basically empty individualism except identity is disentangled from cooperation (accomplished via FDT), and each agent can have its own subjective views on what would count as continuity of identity and have preferences over that. I claim that:
Continuity is a property of the subjective experience of each observer-moment (OM), not necessarily of any underlying causal or temporal relation. (i.e. I believe at this moment that I am experiencing continuity, but this belief is a fact about my current OM only. Being a Boltzmann brain that believes I experienced all the moments leading up to this one feels exactly the same as “actually” experiencing them.)
Each OM may have beliefs about the existence of past OMs, and about causal/temporal relations between those past OMs and the current OM (e.g. one may believe that a memory of the past did in fact result from the faithful recording of a past OM to memory, as opposed to being spawned out of thin air as a Boltzmann brain loaded with false memories.)
Something like preference utilitarianism is true, and it is ok to have preferences about things you cannot observe, or to prefer the world to be in one of two states that you cannot in any way distinguish. As a motivating example, one can have preferences between the atomic actions (a) entering the experience machine and erasing all memories of having chosen to enter it, and (b) doing nothing.
Each OM may have preferences for its subjective experience of continuity to correspond to some particular causal structure between OMs, despite this being impossible for that OM to observe or verify. This is where the subjectivity is introduced: each OM can have its own opinion on which other OMs it considers to also be “itself”, and it can have preferences over its self-OMs causally leading to itself in a particular way. This does not have to be symmetric; for instance, your past self may consider your future self to be more self-like than your future self considers your past self.
Continuity of self as viewed by each OM is decoupled from decision-theoretic cooperation. The two coincide in a typical individual, who considers their past/future selves to also be themself and cooperates with them decision-theoretically (i.e. you consider past/future you getting utility to both count as “you” getting utility). However, it is also possible to cooperate to the same extent with OMs you do not consider to be the same self (e.g. twin PD), or to fail to coordinate with yourself (e.g. myopia/ADHD).
(related: FDT and myopia being much the same thing; you can think of caring about future selves’ rewards because you consider yourself to implement a similar enough algorithm to your future self as acausal trade. This has the nice property of unifying myopia and preventing acausal trade, in that acausal trade is really just caring about OMs that would not be considered the same “self”. This is super convenient because basically every time we talk about myopia for preventing deceptive mesaoptimization we have to hedge by saying “and also we need to prevent acausal trade somehow”, and this lets us unify the two things.)
Properties of this theory:
This theory allows one to have preferences such as “I want to have lots of subjective experiences into the future” or “I prefer to have physical continuity with my past self” despite rejecting any universal concept of identity, which seems pretty useful
This theory is fully compatible with all sorts of thought experiments by simply not providing an answer as to which OM your current OM leads to “next”. This is philosophically unsatisfying but I think the theory is still useful nonetheless
Coordination is solved through decision theory, which completely disentangles it from identity.
Imagine if aliens showed up at your doorstep and tried to explain to you that making as many paperclips as possible was the ultimate source of value in the universe. They show pictures of things that count as paperclips and things that don’t count as paperclips. They show you the long rambling definition of what counts as a paperclip from Section 23(b)(iii) of the Declaration of Paperclippian Values. They show you pages and pages of philosophers waxing poetic about how paperclips are great because of their incredible aesthetic value. You would be like, “yeah I get it, you consider this thing to be a paperclip, and you care a lot about them.” You could probably pretty accurately tell whether the aliens would approve of anything you’d want to do. And then you wouldn’t really care, because you value human flourishing, not paperclips. I mean, it’s so silly to care about paperclips, right?
Of course, to the aliens, who have not-so-subtly indicated that they would blow up the planet and look for a new, more paperclip-loving planet if they were to detect any anti-paperclip sentiments, you say that you of course totally understand and would do anything for paperclips, and that you definitely wouldn’t protest being sent to the paperclip mines.
I think I’d be confused. Do they care about more or better paperclips, or do they care about worship of paperclips by thinking beings? Why would they care whether I say I would do anything for paperclips, when I’m not actually making paperclips (or disassembling myself to become paperclips)?
I thought it would be obvious from context but the answers are “doesn’t really matter, any of those examples work” and “because they will send everyone to the paperclip mines after ensuring there are no rebellious sentiments”, respectively. I’ve edited it to be clearer.
is it worth writing a blog post about how LTV is obviously wrong? i assume this is, like, economics 101, so it would be completely unnovel, and the set of people who would read my blog posts is surely disjoint from the set of people who take LTV seriously
I had to Google “LTV”. I believe it means the Labour Theory of Value, that the work put in to create something is a measure of that thing’s value. Seems absurd to me. Is there anyone here who believes in it? Or elsewhere, even?
I think a version where you steel man (or find the strongest version) and then dismantle it would be good, or if you try to surface and satisfy the intuitions that motivate it, or similar high-effort ITT-passing stuff.
I think there’s an interesting discussion with fun side quests to be had here, and I’m often disappointed by the sneering, uncharitable tone wielded by those armed with The Correct Economics.
[for the avoidance of doubt: I am not an LTV believer and think many ideas on the left are economically unsound. I just think the usual modality of refuting their case around these parts hardly ever works.]
sci-fi story setting idea: a future where VR becomes so widespread that where you physically live in the US becomes more of a formality than something of actual consequence, so mass internal migrations à la the Free State Project occur as people rush to move to low-population states to get more political influence in a federal political system that is increasingly impossible to reform
Wouldn’t that require all states to have identical laws, taxes, etc on residents? It seems likely to me that people care a lot more about those than differential influence on federal politics.
Edit: More precisely, a story in which most people do not care at least an order of magnitude more about laws on residents than on changing their federal political representation from 0.0000001 to 0.0000002 would strain my suspension of disbelief. People who care that much more about federal politics are much more likely to do something other than just move to another state to get microscopically more influence in very infrequent votes for representatives. If a large fraction of the population feels that way and the federal politics is extremely resistant to change, you probably get revolution or civil war, not mass migration to change vote strength.
random thoughts. no pretense that any of this is original or useful for anyone but me or even correct
It’s ok to want the world to be better and to take actions to make that happen but unproductive to be frustrated about it or to complain that a plan which should work in a better world doesn’t work in this world. To make the world the way you want it to be, you have to first understand how it is. This sounds obvious when stated abstractly but is surprisingly hard to adhere to in practice.
It would be really nice to have some evolved version of calibration training where I take some historical events and try to predict concrete questions about what happened, and give myself immediate feedback and keep track of my accuracy and calibration. Backtesting my world model, so to speak. Might be a bit difficult to measure accuracy improvements due to the non-i.i.d.-ness of the world, but worth trying the naive thing regardless. Would be interesting to try to autogen using GPT-3.
Feedback loops are important. Unfortunately, this is very easy to forget from the inside. In particular, setting up feedback loops is often high friction, because it’s hard to measure the thing we care about. Fixing this general problem is probably hard, but in the meantime I can try to set up feedback loops for important things like productivity, world modelling, decision making, etc.
Lots of things have very counterintuitive or indirect values. If you don’t take this into account and you make decisions based on maximizing value, you might end up McNamara-ing yourself hard.
The stages of learning something: (1) “this is super overwhelming! I don’t think I’ll ever understand it. there are so many things I need to keep track of. just trying to wrap my mind around it makes me feel slightly queasy” (2) “hmm this seems to actually make some sense, I’m starting to get the hang of this” (3) “this is so simple and obviously true, I’ve always known it to be true, I can’t believe anyone doesn’t understand this” (you start noticing that your explanations of the thing become indistinguishable from the things you originally felt overwhelmed by) (4) “this new thing [that builds on top of the thing you just learned] is super overwhelming! I don’t think I’ll ever understand it”
The feeling of regret really sucks. This is a bad thing, because it creates an incentive to never reflect on things or realize your mistakes. This shows up as a quite painful aversion to reflecting on mistakes, doing a postmortem, and improving. I would like to somehow trick my brain into reframing things. Maybe thinking of it as a strict improvement over the status quo of having done things wrong? Or maybe reminding myself that the regret will be even worse if I don’t do anything, because I’ll also regret not reflecting.
Thought pattern that I’ve noticed: I seem to have two sets of epistemic states at any time: one more stable set that more accurately reflects my “actual” beliefs and changes fairly slowly, and one set of “hypothesis” beliefs that changes rapidly. Usually when I think some direction is interesting, I alternate my hypothesis beliefs between assuming key claims are true or false and trying to convince myself either way, and if I succeed then I integrate the result into my actual beliefs. In practice this might look like alternating between trying to prove something is impossible and trying to exhibit an example, or taking strange premises seriously and trying to figure out their consequences. I think this is probably very confusing to people, because when talking to people who are already familiar with alignment I’m usually talking about implications of my hypothesis beliefs, since that’s the frontier of what I’m thinking about, and from the outside it looks like I’m constantly changing my mind about things. Writing this up partially to have something to point people to and partially to push myself to communicate this more clearly.
I think this pattern is common among intellectuals, and I’m surprised it’s causing confusion. Are you labeling your exploratory beliefs and statements appropriately? An “epistemic status” note for posts here goes a long way, and in private conversation I often say out loud “I’m exploring here, don’t take it as what I fully believe” in conversations at work and with friends.
I think I do a poor job of labelling my statements (at least, in conversation. usually I do a bit better in post format). Something something illusion of transparency. To be honest, I didn’t even realize explicitly that I was doing this until fairly recent reflection on it.
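The calibration-training idea in the random thoughts above needs some way to score backtested predictions. One standard option (an assumption here; the post doesn't name a metric) is the Brier score plus a binned calibration table. Below is a minimal plain-Python sketch, with hypothetical function names and a made-up log, where each prediction is a (stated probability, did it happen?) pair:

```python
from collections import defaultdict


def brier_score(predictions):
    """Mean squared gap between stated probability and outcome (1 if it happened, else 0).

    0.0 is perfect; always answering 0.5 scores 0.25.
    """
    return sum((p - float(happened)) ** 2 for p, happened in predictions) / len(predictions)


def calibration_table(predictions, n_bins=10):
    """Bucket predictions by stated probability and compare to how often they came true.

    Returns (mean stated probability, observed frequency, count) per non-empty bin.
    A well-calibrated forecaster has the first two numbers close in every bin.
    """
    bins = defaultdict(list)
    for p, happened in predictions:
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the top bin
        bins[idx].append((p, happened))
    table = []
    for idx in sorted(bins):
        items = bins[idx]
        mean_p = sum(p for p, _ in items) / len(items)
        freq = sum(1 for _, happened in items if happened) / len(items)
        table.append((mean_p, freq, len(items)))
    return table


# Hypothetical log of backtested questions: (stated probability, did it happen?)
log = [(0.9, True), (0.8, False), (0.3, False), (0.6, True)]
print(brier_score(log))
print(calibration_table(log, n_bins=2))
```

The Brier score mixes accuracy and calibration into one number, while the table separates out the calibration part; with only a handful of questions per bin the table is noisy, which is one more reason the non-i.i.d. caveat above matters.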
imagining a glorious universe in which Pride and Prejudice was also written by a Russian author, so that all three classic X and Y books would be Russian
some lessons from ml research:
any shocking or surprising result in your own experiment is 80% likely to be a bug until proven otherwise. your first thought should always be to comb for bugs.
only after you have ruled out bugs do you get to actually think about how to fit your theory to the data, and even then, there might still be a hidden bug.
most papers are terrible and don’t replicate.
most techniques that sound intuitively plausible don’t work.
most techniques only look good if you don’t pick a strong enough baseline.
an actually good idea can take many tries before it works.
once you have good research intuitions, the most productive state to be in is to literally not think about what will go into the paper and just do experiments that satisfy your curiosity and convince yourself that the thing is true. once you have that, running the final sweeps is really easy
most people have no intuition whatsoever about their hardware and so will write code that is horribly inefficient. even learning a little bit about hardware fundamentals so you don’t do anything obviously dumb is super valuable
in a long and complex enough project, you will almost certainly have a bug that invalidates weeks (or months) of work. being really careful and testing helps but slows down velocity a lot. unclear what the right equilibrium is.
feedback loop time is incredibly important, if you can get rapid feedback, you will make so much more progress.
implementing something that is already known to work is always vastly easier than inventing/researching something new.
you will inevitably spend a lot of time doing things that have no impact on the final published work whatsoever. like not even contributing that much useful intuition. this is unfortunate but unavoidable
oftentimes you will spend a lot of time being fundamentally philosophically confused about what to do, and only really figure it out halfway through the project. this is normal.
direction is really important. most well executed research is still useless because it was the wrong direction.
research impact is super super long tailed. i think it’s really not worth doing research if you aren’t aiming for the long tail. if you’re early career, you should probably focus on doing things that enable you to aim at the long tail eventually, instead of trying to have lots of impact early on (for example, it’s probably better to do something you feel motivated by and learn a lot from than something that is “maximally important” but which you don’t have the skills to execute adequately on yet)
I agree and this is why research grant proposals often feel very fake to me. I generally just write up my current best idea / plan for what research to do, but I don’t expect it to actually pan out that way and it would be silly to try to stick rigidly to a plan.
Strongly agree. Exact same experience in research, but in finance / quant trading.
Can basically attest to all of these, been doing intensive ML upskilling for the last half a year and almost all of these have been true. Highlights include:
Not properly setting up the attention mechanism in multiple experiments, resulting in the conclusion that attention didn’t do much (lmao)
So, so many off-by-one, off-by-two errors, especially for next-token prediction setups
Entire series of weeks-long experiments that turn out to be completely useless (usually based on a seemingly-reasonable intuition of some kind)
Accidentally overwriting/resetting the residual element so the RNN was just an NN with a funky hat on
I now hate shapes, reshaping, squeezing, unsqueezing, devices, torch.nn.functional.pad, so many more functions
Using the wrong loss function
Using the right loss function but with the wrong reduction
Using the right loss function but the learning rate is too aggressive/too low/the optimiser is not initialised properly
Using all the right things but loading the model from an incorrect checkpoint/not saving the weights properly
And also learning that google colab was forged in mount doom, a tool of great power crafted with malicious intent.
Are you using einops and einsum? I hate these somewhat less since using them. See here for more details.
Be right back, just adding all that to my “AI researcher” prompt
I will add: 80% likely to be a bug, or a result from random-matrix theory.
Research sounds really finicky and tedious.
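Two of the bugs listed above, off-by-one errors in next-token prediction and the right loss with the wrong reduction, are easy to make concrete. Below is a minimal plain-Python sketch, not anyone's actual training code; the helper names and the padding id `PAD_ID = 0` are hypothetical assumptions. Targets are the inputs shifted left by one, and the loss is averaged only over non-padding targets (frameworks usually expose this as an option; PyTorch's `cross_entropy`, for instance, has an `ignore_index` argument).

```python
import math

PAD_ID = 0  # hypothetical padding token id (an assumption for this sketch)


def make_next_token_pairs(tokens):
    """Align inputs and targets for next-token prediction.

    Position t sees tokens[0..t] and must predict tokens[t + 1], so the inputs
    drop the last token and the targets drop the first one.
    """
    return tokens[:-1], tokens[1:]


def masked_mean_nll(log_probs, targets, pad_id=PAD_ID):
    """Mean negative log-likelihood over non-padding targets only.

    log_probs[t][v] is the model's log-probability of token v at position t.
    Averaging over padded positions as well would make the loss depend on how
    much padding a batch happens to contain.
    """
    total, count = 0.0, 0
    for t, y in enumerate(targets):
        if y == pad_id:
            continue  # padding is not a real prediction target
        total -= log_probs[t][y]
        count += 1
    return total / count if count else 0.0


tokens = [5, 7, 9, PAD_ID, PAD_ID]  # one padded sequence
inputs, targets = make_next_token_pairs(tokens)
# Toy model output: uniform over a 10-token vocabulary at every position
log_probs = [[math.log(0.1)] * 10 for _ in inputs]
print(inputs, targets, masked_mean_nll(log_probs, targets))
```

A cheap sanity check in the spirit of "surprising results are bugs": a uniform model over a vocabulary of size V should score exactly log(V) here, so anything else points at misaligned targets or a wrong reduction.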
i recently ran into a vegan advocate tabling in a public space, and spoke briefly to him for the explicit purpose of better understanding what it feels like to be the target of advocacy on something i feel moderately sympathetic towards but not fully bought in on. (i find this kind of thing very valuable for noticing flaws in myself and improving; it’s much harder to be perceptive of one’s own actions otherwise.) the fact that i am genuinely, quite plausibly persuadable of his position in theory is important; if i had talked to e.g. flat earthers, one might say my reaction was just because i’d already decided not to be persuaded. several interesting things i noticed (none of which should be surprising or novel, especially for someone less autistic than me, but as they say, intellectually knowing things is not the same as actual experience):
this guy certainly knew more about e.g. health impacts of veganism than i did, and i would not have been able to hold my own in an actual debate.
in particular, it’s really easy for actually-good-in-practice heuristics to come out as logical fallacies, especially when arguing with someone much more familiar with the object level details than you are.
interestingly, since i was pushing the conversation in a pretty meta direction, he actually explicitly said something to the effect that he’s had thousands of conversations like this and has a response to basically every argument i could make, do i really think i have something he hasn’t heard before, etc. in that moment i realized this was probably true, and that this nonetheless did not necessarily mean that he was correct in his claim. and in addition it certainly didn’t make me feel any more emotionally willing to accept his argument
in the past, i’ve personally had the exact experience of arguing for something where i had enough of a dialogue tree that other people couldn’t easily find any holes, where the other people were unconvinced, and felt really confused why people weren’t seeing the very straightforward argument, and then later it turned out i was actually just wrong and the other people were applying correct heuristics
my guess is at the extreme, with sufficient prep and motivation, you can get in this position for arbitrarily wrong beliefs. like probably if i talked to flat earthers for a while i’d get deep enough in their dialogue tree that i’d stop being able to refute them on the object level and would (for the purposes of my own epistemics, not to convince an external audience) have to appeal to cognitive heuristics that are isomorphic to some cognitive fallacies.
of course we shouldn’t always appeal to the cognitive heuristics. doing so is almost always reasonable, but if you always do it you will miss out on the one thing that actually does matter. to do anything interesting you do have to eventually dig into some particular spicy claims and truly resolve things at the object level. but there are so many things in the world and resolving them takes so much time that you need some heuristics to reject a whole bunch of things out of hand and focus your energy on the things that matter.
like, i could invest energy until i can actually refute flat earthers completely on the object level, and i’d almost certainly succeed. but this would be a huge waste of time. on the other hand, i could also just never look into anything and say “nothing ever happens”. but every important thing to ever happen did, in fact, happen at some point [citation needed].
it’s really really irritating to be cut off mid sentence. this is hard to admit because i also have an unconscious tendency to do this (currently working on fixing this) and my guess is other people get very annoyed when i do this to them.
sometimes i do enjoy being cut off in conversations, but on reflection this is only when (a) the conversation is cooperative enough that i feel like we’re trying to discover the truth together, and (b) the other person actually understands what i’m saying before i finish saying it. but since these conditions are much rarer and require high levels of social awareness to detect, it’s a good first order heuristic that interrupting people is bad.
i found it completely unhelpful to be told that he was also in my shoes X years ago with similar uncertainties when he was deciding to become vegan; or to be told that he had successfully convinced Y other people to become vegan; or to be subject to what i want to call “therapy speak”. i only want to therapyspeak with people i feel relatively close to, and otherwise it comes off as very patronizing.
i think there’s a closely related thing, which is genuine curiosity about people’s views. it uses similar phrases like “what makes you believe that?” but has a very different tone and vibe.
his achievements mean a lot more to himself than to me. i don’t really care that much what he’s accomplished for the purposes of deciding whether his argument is correct. any credibility points conferred are more than cancelled out by it being kind of annoying. even if it is true, there’s nothing more annoying than hearing someone say “i’ve thought about this more than you / accomplished more than you have because of my phd/experience/etc so you should listen to me” unless you really really really trust this person
the calculus changes when there is an audience.
therapyspeak is still probably better than nothing, and can be a useful stepping stone for the socially incompetent
one possible take is that i’m just really weird, and these modes of interaction work better on normal people because they’re less independent thinkers, or need to be argued out of poorly thought out bad takes, or something like that, idk. i can’t rule this out, but my guess is that normal people probably react this way even more than i do. also, for the purposes of analogy to the AI safety movement, presumably we want to select for independent thinkers with especially well thought out takes more than for normal people.
also my guess is this particular interaction was probably extremely out of distribution from the perspective of those tabling. my guess is activists generally have a pretty polished pitch for most common situations which includes a bunch of concrete ways of talking they’ve empirically found to cause people to engage, learned through years of RL against a general audience, but the polishedness of this pitch doesn’t generalize out of distribution when poked at in weird ways. my interlocutor even noted at some point that his conversations when tabling generally don’t go the way ours went.
I don’t think it would be that hard to refute flat earthers. One or two facts about how the sun travels, the fact that the atmosphere bends light, and the fact that there are commercial flights crossing the poles seem like they would be sufficient to me. This probably won’t convince a flat earther, but I think you could fairly easily convince 95% of smart, unbiased third-party listeners (not that they exist).
You don’t have to go down every option in their argument tree, finding one argument they are completely unable to refute can be enough.
you mentioned sometimes people are just wrong in their arguments but think they are correct because they’ve repeated it many times. do you have examples of this from what they said?
This was really interesting, thanks for putting yourself in that situation and for writing it up
I was curious what examples were of therapy speak in the conversation, if you’re down to elaborate
I claim that even if the openai contract is not meaningfully weaker safety wise, it is still bad for openai to publicly signal solidarity with ant but then sign with DoW.
suppose hypothetically the only difference between the openai and anthropic contracts is that the DoW wanted a Snickers bar, and anthropic didn’t want to give DoW the Snickers bar. even then, it would be a huge dick move for openai to publicly signal solidarity, and then sign with DoW to give them the Snickers bar.
OAI genuinely outplayed Anthropic here. The critical success world for OAI would be if OAI gets good PR from “solidarity”, replaces Ant under the ~same terms, and there is enough uncertainty of Anthropic being a supply chain risk that eg Amazon stops providing them compute, basically killing the company.
Most of this is still on the table, because Anthropic was too concerned about appearing principled and was exploited by DoW and Altman.
I can see the story where there was a strong opportunity for a competitor here, and OpenAI successfully seized it (perhaps Google DeepMind could have as well, but I don’t view them as being as nimble as OpenAI). I don’t see a story where Anthropic had a clear alternative play that was much better, especially once the USG threatened to label them a supply-chain risk.
Anthropic could have negotiated before USG publicly threatened to label them a supply chain risk. My guess is they were mainly limited by the erosion of their own morals and by Anthropic staff quitting, and they could have acquiesced with diplomatic language even within those limitations, maybe even after the crisis went public. Claude is only getting better, so the default path is building trust with the government. They could probably have found a better stand to take later, when they have more power.
Perhaps. I think writing things into contracts is a great way to make sure that they happen, and if the counterparty is unwilling to sign them into contracts, then this is a strong sign that you won’t be able to make it happen later. It would have significantly increased the adversarial relationship between Anthropic and the USG for them to politely remove it from the contract and then work hard internally to make sure that it never got used that way. Maybe it would’ve been worth it, but I’m not convinced.
Oh I don’t think they could have prevented USG from using Claude for mass domestic surveillance. Autonomous weapons maybe since it’s a reliability issue the military would agree with. They would need to sacrifice their principles in order to get Claude more integrated into the government, which could be good or bad for us but would have been in Anthropic’s interest.
It’s a tough game to be in. Sometimes the only winning move is not to play.
I have a little stored thought which sometimes triggers, and it reads:
“If you find yourself being forced to choose between two or more extremely bad options that involve burning your values, your resources, or your life, the truth is that you lost around three moves ago and are living out the equivalent of a forced mate in chess. You’ve already lost, so stop playing and find a better game to spend time on if at all possible.”
Sadly, sometimes you don’t have the option of not playing.
Anthropic negotiated a great deal and gave up the practical limits that are relevant to the military about using their models for cyber attacks, censorship and disinformation campaigns in the process.
I think this comment aged very poorly
I think it’s unclear that this was overall bad for Anthropic/Amodei if you factor in the reputational and ideological boost they got (“aura farming” according to roon).
This is pretty heartening. I hope it’s enough to compensate them, and perhaps time will tell.
I’m relatively less interested in a competitive framing between OpenAI and Anthropic, e.g. “who played it better”. First, that framing suggests there was just one game being played. It seems necessary to view it as a progression of different games.
To a first approximation, my guess is by the time this popped into the public spotlight, the die was largely cast (so to speak). It was, more or less, a strategy by Hegseth to put Anthropic in an impossible bind.
Second, that kind of framing feels too much like so many news stories I read that try to fasten sports metaphors onto real world events to make juicy narratives. This isn’t a very good “reason” I admit, but it sort of explains why my alarm bells started ringing on that frame.
Personally, I first want to learn about what happened and when. After that, maybe I would try to analyze and learn lessons.
Is it publicly known what is in fact the difference between the two contracts?
https://x.com/justanotherlaw/status/2027855993921802484
Yeah I would have liked to see them sign the contract conditional on something like not labeling Anthropic a supply chain risk.
theory: a huge part of having a good social life is just taking social bids whenever they become available. examples of social bids both large and small include: deciding whether to join your friends on a roadtrip; getting to know someone you just met; getting to better know someone you bump into occasionally but usually never talk to; standing in line, seeing something amusing, and having the option to point this out to another stranger in line; saying something funny in a group conversation; following up over text with someone after meeting them; flirting; cold emailing someone on the internet; catching up with a friend.
there are a variety of reasons why we might end up not taking social bids. if you don’t have the social ability to notice opportunities to take bids, you might miss bids that you could take. if you force yourself to take bids without the requisite social ability, and end up taking bids which you incorrectly believe to exist, you might act in ways that people find weird, and burn potential connections, or intrude on people. if you are really tired or low-bandwidth or depressed or stressed, you will not want to take bids, because taking bids requires quite a lot of activation energy investment. if you are really focused on something else, like a research project, you will not take bids, because they distract your focus.
i find that when i make more successful social bids, i feel a greater sense of community and connection, and often feel happier too; conversely, when i don’t, i feel isolated. my most common reason for not making more social bids is just being super tired or busy.
also, i feel like people around you generally mirror the kind of energy you bring to social bids; if you are super enthusiastic and friendly about taking social bids, then other people will also treat your social bids similarly. if you are having an amazing time meeting lots of new cool people, then other people will be more excited to meet you and willing to help you on your quest; conversely, if you are miserable and merely exist to complete your requisite daily activities, other people will also be hesitant to include you in their groups, or invite you to things, or spend lots of time with you. this can create a self reinforcing feedback loop.
Good post. On a detail: I’d use the word ‘opportunities’ rather than ‘bids’, which sounds like ‘offers’, whereas in several of these examples you’re not being explicitly offered a social opportunity by anyone. But the situation contains an opportunity.
Do you struggle with feelings of isolation? I do sometimes, and I try to fix that by taking more social bids and proactively seeking social life. And then I immediately pull out because I get overwhelmed by social life very easily and it kinda colonizes my thought processes too much. So I’m kind of stuck in that loop of seeking more of it and then pulling out and then seeking more of it...
Same.
There’s a parable by Schopenhauer, usually called the hedgehog’s dilemma, about porcupines in the cold that shuffle closer and closer for warmth until they prick each other and shuffle apart again. I always thought that applies pretty well.
I mostly get overwhelmed by social bids when I am physically tired or unwell, rather than purely because of social life.
Hmm, part of the reason I asked is that the reasoning in your comment is the kind of cognitive process that tends to exhaust me when I have to work through it. It somehow coincides with me being more neurotic overall. So, basically, you think all that explicit stuff about social life, and you don’t feel at least a little pang of psychological pain/exhaustion? The very starting phrase (“a huge part of...”) reads like my thoughts when I’m ruminating about this stuff.
Sorry if this is a little intrusive; I’m just curious, besides fishing for insights from people who might have similar thought patterns.
I’m sufficiently extroverted that if the social interaction goes well, it gives me more than enough psychological energy to pay for multiple additional social bids. obviously, this is separate from physiological energy; if I’m sleep deprived and physically exhausted, this is insufficient. but I don’t generally get that physically exhausted from social interaction, unless I’m at NeurIPS or something.
This sounds to me like a moderately positive score on the desire dimension (1st) and negative score on the capability dimension (2nd) of extroversion here.
I had such scores and my social situation is much better after I put some conscious effort towards resolving my social anxiety. (positive first, negative second is the only problematic pairing.)
This was a fun quiz. I’d not come across it before!
(I ended up with high scores on both indices, but higher on the second subscore (leadership and confidence) which surprised me a little. Overall I scored 249, which Claude reckons based on the chart they provide is about +1.5 SD among extraverts.)
I agree: taking risks and generally being a ‘yes man’ is much more likely to result in positive outcomes than taking no action.
But I do wonder: on average, how much are people motivated to seek connection by their actual personal needs and circumstances, and how much by a culture that prescribes an ‘instagram’ lifestyle and a huge friendship network as a goal to work towards?
For me, I find that shared interests are the automatic icebreaker that circumvents the awkward/social convention and risk elements and that finding a group that does/discusses what I am already interested in makes the whole thing feel effortless/natural and fulfilling.
There is also an efficiency component to the energy expenditure of making social bids. Not only can it take someone more energy to make the same (perceived) bid, but the same bid can have different effects depending on social skillfulness and a bunch of other things. This can be seen as a parameter in the dynamic equilibrium that can produce the feedback loop you mention.
it’s surprising just how much of cutting edge research (at least in ML) is dealing with really annoying and stupid bottlenecks. pesky details that seem like they shouldn’t need attention. tools that in a good and just world would simply not break all the time.
i used to assume this was merely because i was inexperienced, and that surely eventually you learn to fix all the stupid problems, and then afterwards you can just spend all your time doing actual real research without constantly needing to context switch to fix stupid things.
however, i’ve started to think that as long as you’re pushing yourself to do novel, cutting edge research (as opposed to carving out a niche and churning out formulaic papers), you will always spend most of your time fixing random stupid things. as you get more experienced, you get bigger things done faster, but the amount of stupidity is conserved. as they say in running: it doesn’t get easier, you just get faster.
as a beginner, you might spend a large part of your research time trying to install CUDA or fighting with python threading. as an experienced researcher, you might spend that time instead diving deep into some complicated distributed training code to fix a deadlock or debugging where some numerical issue is causing a NaN halfway through training.
i think this is important to recognize because you’re much more likely to resolve these issues if you approach them with the right mindset. when you think of something as a core part of your job, you’re more likely to engage your problem solving skills fully to try and find a resolution. on the other hand, if something feels like a brief intrusion into your job, you’re more likely to just hit it with a wrench until the problem goes away so you can actually focus on your job.
in ML research the hit it with a wrench strategy is the classic “google the error message and then run whatever command comes up” loop. to be clear, this is not a bad strategy when deployed properly—this is often the best first thing to try when something breaks, because you don’t have to do a big context switch and lose focus on whatever you were doing before. but it’s easy to end up trapped in this loop for too long. at some point you should switch modes to actively understanding and debugging the code, which is easier to do if you think of your job as mostly being about actively understanding and debugging code.
earlier in my research career i would feel terrible about having spent so much time doing things that were not the “actual” research, which made me even more likely to just hit things with a wrench, which in turn made me less effective overall. i think shifting my mindset since then has helped me a lot.
Not only is this true in AI research, it’s true in all science and engineering research. You’re always up against the edge of technology, or it’s not research. And at the edge, you have to use lots of stuff just behind the edge. And one characteristic of stuff just behind the edge is that it doesn’t work without fiddling. And you have to build lots of tools that have little original content, but are needed to manipulate the thing you’re trying to build.
After decades of experience, I would say: any sensible researcher spends a substantial fraction of time trying to get stuff to work, or building prerequisites.
This is for engineering and science research. Maybe you’re doing mathematical or philosophical research; I don’t know what those are like.
I can emphatically say this is not the case in mathematics research.
Interested to hear how you would put this with “research” tabooed. Personally I don’t care if it’s research as long as it works.
a corollary, i think, is that even once AI can automate the “google for the error and whack it until it works” loop, this is probably still quite far off from being able to fully automate frontier ML research, though it certainly will make research more pleasant.
I agree if I specify ‘quite far off in ability-space’, while acknowledging that I think this may not be ‘quite far off in clock-time’. Sometimes the difference between no skill at a task and very little skill is a larger time and effort gap than the difference between very little skill and substantial skill.
Completely agree. I remember a big shift in my performance when I went from “I’m just using programming so that I can eventually build a startup, where I’ll eventually code much less” to “I am a programmer, and I am trying to become exceptional at it.” The shift in mindset was super helpful.
More and more, I’m coming to believe that one big flaw of basically everyone is not realizing how much you need to deal with annoying, pesky, stupid details to do good research, and I believe this also applies to alignment research.
There is thankfully more engineering/ML experience on LW, which partially alleviates the issue, but still, not realizing that pesky details matter a lot in research/engineering is a problem that basically no one particularly wants to deal with.
I would hope for some division of labor. There are certainly people out there who can’t do ML research, but can fix Python code.
But I guess, even if you had the Python guy and the budget to pay him, waiting until he fixes the bug would still interrupt your flow.
I think there are several reasons this division of labor is very minimal, at least in some places.
You need way more of the ML engineering / fixing stuff skill than ML research. Like, vastly more. There are still a very small handful of people who specialize full time in thinking about research, but they are very few and often very senior. This is partly an artifact of modern ML putting way more emphasis on scale than academia does.
Communicating things between people is hard. It’s actually really hard to convey all the context needed to do a task. If someone is good enough to just be told what to do without too much hassle, they’re likely good enough to mostly figure out what to work on themselves.
Convincing people to be excited about your idea is even harder. Everyone has their own pet idea, and you are the first engineer on any idea you have. If you’re not a good engineer, you have a bit of a catch-22: you need promising results to get good engineers excited, but you need engineers to get results. I’ve heard of even very senior researchers finding it hard to get people to work on their ideas, so they just do it themselves.
This is encouraging to hear as someone with relatively little ML research skill in comparison to experience with engineering/fixing stuff.
For sure. The more novel an idea I am trying to test, the deeper I have to go into the lower level programming stuff. I can’t rely on convenient high-level abstractions if my needs are cutting across existing abstractions.
Indeed, I take it as a bad sign of the originality of my idea if it’s too easy to implement in an existing high-level library, or if an LLM can code it up correctly with low-effort prompting.
the following fictional dialogue is a complete unapologetic strawman but it’s funny enough i had to bring it into being:
“So I asked myself: where can I make the most impact? And clearly malaria is the most important area.”
“And so you decided to donate all of your money to buy malaria nets?”
“Well, so it turns out that saving lives from malaria is actually kind of expensive and indirect. You see, it costs thousands of dollars to save a life. Statistically. Who knows if you’re actually changing anyone’s life that way?”
“And so you found a more efficient way to save lives.”
“Actually, it turns out that it’s cheaper to give people malaria. It’s a lot more impactful and the technical problems are more interesting.”
“I see. Isn’t more malaria bad though?”
“I don’t know, but I find it much easier to work on because the feedback loops are much tighter. Maybe one day, if malaria gets big enough, I’ll go work on saving people from malaria. But we’re still a long way away from everyone having malaria.”
HAS CURE FOR DISEASE
SPREADS DISEASE TO CURE
— Insanity Wolf
You jest, but I hold this class of strategy to be perfectly sound in principle and grossly underexplored in practice. I call it “countersteering for the cure,” and the specific example I had in mind was contagious Fatal Familial Insomnia. The idea is that if you wanted some rare/orphan disease cured for personal reasons, but lacked the resources to achieve that directly, you might be able to go in the opposite direction, and make it much more common instead, enough to trigger several orders of magnitude more funding/talent towards the cure you want.
If you did that in the US, you’d end up trying to explain to prosecutors, then a judge, then a jury why you shouldn’t be imprisoned for the rest of your life. I would hope that is roughly how it would go everywhere.
The hypothetical personal reasons would have to be compelling, yes, of course. This would not be done on a whim. You will note the disease I chose as my example was heritable – even if you hadn’t heard of it before, the name should have made that clear – which lends itself readily to obvious motivations.
I am confused, don’t you say “contagious” right here?
Fatal Familial Insomnia that has been modified to be contagious.
I seem to have misjudged how well-known prion diseases were. Would it help to consider an analogy to the phrase, “airborne rabies”?
Huh, OK, so I did understand you correctly. In that case I do not understand how its heritability matters.
He’s implying that somebody might be motivated to do bioterrorism if not only they but also their kids had the disease (since it is heritable). So, I guess they would go to jail, but their kids could possibly still benefit.
But I don’t understand what Shankar is otherwise saying, since this is only a strategy that can work if you’re a selfish person willing to do terrorism and massively harm the world in order to maybe somewhat advance your own interests. I can’t see something like this working for real altruistic causes. Sure, in Shankar’s example you direct resources towards curing some rare disease, but this also means steering resources away from every other important problem, while also killing / harming many people, etc. It’s hard to see how doing terrorism / taking hostages / robbing people / nuclear sabre-rattling / etc. (ways of forcing people to give you resources by causing, or threatening to cause, massive net harm to the world) is somehow “grossly underexplored in practice”, since this is basically the strategy behind all threats and conflict throughout history. It is pretty thoroughly explored, lol, regardless of the fact that a clever person can still brainstorm up some clever/unique/infeasible imaginary terror attacks that nobody has done IRL.
“to the success of our hopeless cause” is such a good toast and we should use it more often. i first learned of it from the book of the same name, and apparently it was a common refrain at gatherings of Soviet dissidents. i like it because it captures the feeling of trying really hard to succeed despite being in the basement of the logistic success curve, and somehow, despite all odds, actually succeeding in the end.
I do find it poetic, but in seriousness I think if folks don’t actually feel hopeful about what they’re doing then they should do something else—leave the work / research direction / engineering / comms / whatnot to whoever actually feels hope about it...
To elaborate, the thing that’s poetic for me about “our hopeless cause” is that I have hope that is not cleanly legible to the outside and is easy to write off as “hopeless”. And it’s important to stay in tune with your own knowings about this stuff. I think there are very deleterious effects from throwing energy into things one doesn’t have hope in.
(...And to elaborate further, mostly I think the bad stuff happens by lending support to corrupt things. And imo being pushed to work on X while you lack hope in X is a solid flag of corruption.)
This is a good heuristic when you’re fighting against nature, it’s not a good heuristic when you’re trying to solve coordination problems.
...what sort of “coordination problems” does one “solve” by doing things you don’t have hope in? I really don’t get it and am perplexed. This photo is swellingly full of hope, and presumably we got there through people that had hope in their actions. Perhaps there’s detail in the history you’re referencing that’s going over my head.
the problem was that everyone hated living in the Soviet Union and other eastern Bloc countries, but few people were willing to stand up and protest, because doing so meant a knock on your door by men with guns who would take you away to a Siberian prison or mental institution.
the thing with protests is they are a coordination problem. to loosely paraphrase one of the dissidents from this era, if one person protests he becomes a martyr. if ten people protest they become a conspiracy. if ten thousand people protest the system has to change.
the problem is you have no way of knowing when the right moment is. under Stalin, dissent was impossible. everyone even suspected of being disloyal was instantly executed or thrown in a gulag.
after he died, Khrushchev denounced Stalin’s methods and instituted reforms, and dissent meant “only” being interrogated by the KGB, put on trial in a rigged but no longer completely farcical show trial, and sent to Siberia for only 10 years rather than being executed. this was enough easing up that the “chain reaction” started happening—people would protest, be arrested, someone would go secretly write a transcript of the trial and publish it, people would be outraged at the riggedness or the arrests of the people disseminating the transcripts, and go protest, etc. unfortunately, this never took off enough to cause mass protests, but there was no way of knowing without many people sacrificing themselves for the cause. these early dissidents also laid a lot of groundwork by creating underground networks to disseminate info (samizdat).
under Gorbachev, restrictions loosened even further through glasnost and perestroika, because the economy of the Soviet Union was not doing well. as an unintended side effect, this pushed the R0 of protests past 1, and suddenly we started seeing huge protests in the eastern Bloc. and once there are tens of thousands of people marching in the streets and the secret police can’t arrest them all, the system of terror stops working entirely. by the time the Berlin wall fell, the East German government had already ceased to function for months because nobody was afraid of its terror anymore. in theory this could have happened much earlier if everyone simply coordinated, but everyone will not simply.
If ten thousand people protest, sometimes they get massacred by the army.
Iran is a recent example of this.
This helps me appreciate the mood of where you are coming from, thanks! But uh, I have objections too, mostly due to our spot in the thread.
I would second CronoDas’ point that the mechanics of change aren’t quite that simple. And I’d like to complain that this is not an example of a thing that is helped by people taking actions they don’t feel hope in!
I acknowledge that the secret police setup seems to do well at bringing in the “you can’t communicate and build plans together” aspect that “coordination problem”/game theory typically evokes. I’ll note, though, that you still have a lot of communication/observation channels (including costly ones like protesting and being taken away or killed).
More importantly it seems like the robust way out of the situation is to try to build more infrastructure for being able to act with a coalition of peers in a constructive manner. Game theory as typically thrown around seems a poor model for this imo.
funny enough, at least one dissident at the time expressed that he didn’t like this toast because he wouldn’t be trying to dissent if he thought it was hopeless
I think some people can have the feelings that go with hope in a cause without actually believing the cause is likely to succeed. Cultural memes around fighting for hopeless causes (e.g. when heroes go for Hail Mary strategies in movies) help.
It still matters whether you truly think it’s the best shot at victory, or the best way for you to help. That’s what I see as key to preventing the various problems that you mention.
I suppose it depends on what you mean by hope. Is a person who thinks there’s a 50% chance of their project failing, but considers that better than all the alternatives, not hopeful? Or 10%?
What I worry about is, what if the people who make such a seemingly hopeless play are actually right in their worldview, and the people who have higher hopes in their play are wrong? Then a rule that dissuades hopeless people from acting lowers the overall chance of success, and that would be bad.
running the agi survey really reminded me just how brutal statistical significance is, and how unreliable anecdotes are. even setting aside sampling bias of anecdotes, the sheer sample size you need to answer a question like “do more people this year know what agi is than last year” is kind of depressing—you need like 400 samples for each year just to be 80% sure you’d notice a 10 percentage point increase even if it did exist, and even if there was no real effect you’d still think there was one 5% of the time. this makes me a lot more bearish on vibes in general.
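The “like 400 samples” figure can be roughly reproduced with the textbook normal-approximation sample-size formula for a two-sided, two-proportion z-test. This is an illustrative sketch, not the survey’s actual analysis: it assumes a hypothetical baseline where 50% of people knew what AGI was last year and 60% do this year (no baseline is given above; since p(1 − p) peaks at 50%, this is roughly the worst case), a 5% false-positive rate (α = 0.05), and 80% power. The function name `samples_per_group` is made up for illustration and uses only Python’s standard library.

```python
import math
from statistics import NormalDist


def samples_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group n for a two-sided two-proportion z-test (normal approximation).

    p1, p2: true proportions in the two groups (e.g. last year vs. this year).
    alpha:  false-positive rate when there is no real difference.
    power:  probability of detecting the difference when it does exist.
    """
    std_normal = NormalDist()  # mean 0, sd 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                         # pooled proportion under "no change"
    spread = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    )
    return math.ceil(spread ** 2 / (p2 - p1) ** 2)


# Hypothetical: 50% knew what AGI was last year, 60% this year.
print(samples_per_group(0.50, 0.60))  # on the order of 400 per year
# A 25-point jump (50% -> 75%) needs far fewer samples per year.
print(samples_per_group(0.50, 0.75))
```

Since the required n scales with 1/(p2 − p1)², halving the effect size roughly quadruples the number of samples you need, which is why small-but-real shifts are so hard to see.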
thank you for this post. “bearish on vibes” is a great phrase. i am constantly hung up on the fact that it’s not really possible to “know what normal people are like”, “know what people are like generally”, “know what the world is actually like”, without significant amounts of effort.
i think this background fact taints like… most discussion of social and ethical issues.
Why does this make you more bearish on vibes? The reason I ask is that I think of “vibes” as aggregating over a much wider (but siloed) social network and a lot more sources of information. It would be interesting to know to what extent rigorous high-n survey methods would reveal discrepancies between assumptions and reality about people’s perceptions in this and other areas to do with pressing social issues.
like, suppose i anecdotally noticed a few people be visibly confused when i said the phrase AGI in normal conversation last year, and then this year i noticed that many fewer people were visibly confused by AGI. this would tell me almost nothing about whether name recognition of AGI increased or decreased; at n=10, it is nearly impossible to say anything whatsoever.
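To make “at n=10 it is nearly impossible to say anything” concrete, here is a minimal sketch of the approximate power of a two-proportion z-test with only 10 observations per year. It generously treats each anecdote as an independent random sample (real anecdotes also suffer from selection bias), and the true rates (50% → 60%, and 50% → 75%) are hypothetical. `power_two_proportions` is an illustrative name, not a library function.

```python
import math
from statistics import NormalDist


def power_two_proportions(n: int, p1: float, p2: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test with n samples per group.

    Normal approximation; ignores the negligible chance of rejecting
    in the wrong direction.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n)          # standard error if nothing changed
    se_true = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)  # standard error under the true rates
    return std_normal.cdf((abs(p2 - p1) - z_alpha * se_null) / se_true)


# Hypothetical true rates; 10 anecdotes per year.
for p2 in (0.60, 0.75):
    print(f"n=10, 50% -> {p2:.0%}: power ~ {power_two_proportions(10, 0.50, p2):.2f}")
```

Under these assumptions the power comes out to roughly 6% for the 10-point change and roughly 20% for the 25-point change, so most of the time a real increase of either size would go unnoticed.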
What’s your reasoning or assumptions for why it would tell you ~nothing to witness name recognition increasing like that? I’m assuming it’s not just because “visible confusion” isn’t a perfect proxy for lack of name recognition (and vice versa).
I guess I’m more bullish on vibes being a more powerful way to gauge name recognition than you seem to be. So here is a toy model to explain why. ChatGPT was released in Nov. 2022, so prior to that we can say it had approximately zero name recognition.
I’ve never read a survey on ChatGPT name recognition. I have only my anecdotal evidence to go on. But I am extremely confident based on what I think can fairly be called “vibes” that ChatGPT has massive name recognition, at least in America, compared to November 2022. If there was a reliable way to test this, I’d be willing to wager big money on it, provided I could feel confident in my ability to truly pin a number on what I mean by “massive name recognition.” Do you agree with this model but feel bearish on vibes more generally? Are you skeptical of my perception that ChatGPT’s name recognition has exploded since its release?
sure, you can notice extremely large effect sizes through vibes. but the claim is that for even “smaller” effect sizes (like tens of percentage points, e.g. 50→75%), you need pretty big sample sizes. obviously 0→100% doesn’t need a very large sample size.
I agree that chatgpt obviously has lots of name recognition, but I do also separately think chatgpt has less name recognition than you might guess. I predict that only 85% of Americans would get a multiple choice question right about what kind of app chatgpt is (choices: artificial intelligence; social media; messaging and calling; online dating), whereas a control question about e.g. Google will get like 97% or whatever the lizardman constant dictates.
Reasonable. I also don’t expect that I could pick up on a 1.5x increase in name recognition over a year based on vibes. I didn’t read closely enough to notice you were talking about a 10 percentage point increase, so sorry about the time waste.
in research, if you settle into a particular niche you can churn out papers much faster, because you can develop a very streamlined process for that particular kind of paper. you have the advantage of already working baseline code, context on the field, and a knowledge of the easiest way to get enough results to have an acceptable paper.
while these efficiency benefits of staying in a certain niche are certainly real, I think a lot of people end up in this position because of academic incentives—if your career depends on publishing lots of papers, then a recipe to get lots of easy papers with low risk is great. it’s also great for the careers of your students, because if you hand down your streamlined process, then they can get a phd faster and more reliably.
however, I claim that this also reduces scientific value, and especially the probability of a really big breakthrough. big scientific advances require people to do risky bets that might not work out, and often the work doesn’t look quite like anything anyone has done before.
as you get closer to the frontier of things that have ever been done, the road gets tougher and tougher. you end up spending more time building basic infrastructure. you explore lots of dead ends and spend lots of time pivoting to new directions that seem more promising. you genuinely don’t know when you’ll have the result that you’ll build your paper on top of.
so for people who are not beholden as strongly to academic incentives, it might make sense to think carefully about the tradeoff between efficiency and exploration.
(not sure I 100% endorse this, but it is a hypothesis worth considering)
I think this is true, and I also think that this is an even stronger effect in wetlab fields where there is lock-in to particular tools, supplies, and methods.
This is part of my argument for why there appears to be an “innovation overhang” of underexplored regions of concept space. And, in the case of programming dependent disciplines, I expect AI coding assistance to start to eat away at the underexplored ideas, and for full AI researchers to burn through the space of implied hypotheses very fast indeed. I expect this to result in a big surge of progress once we pass that capability threshold.
Or perhaps on the flip side there is a ‘super genius underhang’ where there are insufficient numbers of super competent people to do that work. (Or willing to bet on their future selves being super competent.)
It makes sense for the above average, but not that much above average, researcher to choose to focus on their narrow niche, since their relative prospects are either worse or not evaluable after wading into the large ocean of possibilities.
Or simply when scaling becomes too expensive.
I agree that academia over-rewards long-term specialization. On the other hand, it is consistent to also think, as I do, that EA underrates specialization. At a community level, accumulating generalists has fast diminishing marginal returns compared to having easy access to specialists with hard-to-acquire skillsets.
This is one of the reasons I think ‘independent’ research is valuable, even if it isn’t immediately obvious from a research output (papers, for example) standpoint.
That said, I’ve definitely had the thought, “I should niche down into a specific area where there is already a bunch of infrastructure I can leverage and churn out papers with many collaborators, because I expect that would put me in a more stable funding situation as an independent researcher. It would also make it much easier to pivot into a role at an organization if I want or need to. It would definitely be a much more stable situation for me.” (And I also agree that specialization is often underrated.)
Ultimately, I decided not to do this because I felt like there were already enough people in alignment/governance who would take the above option due to financial and social incentives and published directions seeming more promising. However, since this makes me produce less output, I hope this is something grantmakers keep in consideration for my future grant applications.
the modern world has many flaws, but I’m still deeply grateful for the modern era of unprecedented peace, prosperity, and freedom in the developed world. 99% of people reading these words have never had to worry about dying in a cholera epidemic, or malaria or smallpox or the plague, or childbirth, or in war, or from a famine, or due to a political purge. this is not true for other times in history, or other places in the world today.
(extremely unoriginal thought, but still important to acknowledge periodically because it’s easy to take for granted. especially because it’s much more common to complain about ways the world is broken than to acknowledge what has improved over time.)
There might already be a selection bias for people who read lesswrong. For people who are in war, having some malaria, and etc—they may not get the chance to access internet, nor have time to gain interest in lesswrong. My point is while we are grateful, we probably would want to reflect on our own privileges on access to peace (not sure unprecedented), and recognize there are still a lot of work needed to be done for people outside our circles. Additionally, sometimes bad things could happen all in a sudden—maybe the country you are in suddenly is war zone, maybe you got some accidents/mental health issues, or maybe the worry about dying is just a healthcare disaster away. The health risk, especially, increases with age.
And I also think it’s less than 99% anyway; “worry about dying in a cholera epidemic, or malaria or smallpox or the plague, or childbirth, or in war, or from a famine, or due to a political purge” is very broad. Like, surely some of the people here were at some risk during the COVID epidemic? Some people are men from countries with conscription that are currently at war or at tangible risk of war. Some people are from countries with unstable governments and have done some activism work before. My guess is more like 75%.
(My response to you is also unoriginal but worth stating imo.)
I would prefer if you used the phrase “US geopolitical sphere of influence” instead of “developed world”. It makes it clear your take is political.
Leaders within the US govt have obviously contributed to multiple wars and genocides, you just happen to be born into a family that is not on the receiving end of any of them. Part of the reason (but not the full reason) for the economic prosperity is crude oil deals made by the US govt under threat of nuclear war.
Statements such as yours give leaders within the US govt implicit consent to continue this sort of rule over the world.
This period of global safety is not fairly distributed, but it is also real.
https://data.unicef.org/resources/levels-and-trends-in-child-mortality-2024/
Rule of law
Energy use per capita
Global utility includes the above two things (the first two tiers of Maslow’s hierarchy), not just the number of deaths (where I agree health-related deaths are the biggest bracket).
I consider US govt partially responsible for unequal distribution.
Neither the mortality-rate nor the energy-use map lines up that closely with the US geopolitical sphere of influence. (E.g. Russia and China on the one hand, Latin America on the other.)
I’m not saying the US government isn’t partially responsible for unequal distribution, but your previous comment sounds like it treats it as the only or primary significant factor.
(I’m also not sure what point you’re trying to make at all with the energy-use map, given how similar it looks to the mortality-rate map.)
I’m not claiming it’s the only factor.
Russia and China obviously have significant crude oil reserves which they use domestically. They get to keep them instead of exporting them to someone else because they have nuclear weapons.
All of industry is ultimately based on a few resources such as crude oil, coal and water. These are then used to make steel and electricity which are then used to make industrial supplies for chemicals and so on.
So a shortage of drugs or of roads or of hospitals does indirectly tie into the energy use of the country.
“Crude oil deals made by the US govt” are responsible for a negligible proportion of global economic prosperity, which comes out of the global scientific ecosystem that has been centered in the US for nearly 100 years.
I’m mainly trying to explain this graph of energy use per capita.
I agree the US exports a variety of goods including weapons, food, industrial products, aircraft and so on, and this gives them more money to purchase crude oil. And being on the leading edge of science and engineering for these industries enables them to make these exports in the first place.
US military protection, including nuclear protection, is obviously another reason why the US gets favourable deals from its allies, though.
@Arjun Panickssery I’m not sure what counts as definitive proof to you.
US crude oil imports: https://worldpopulationreview.com/country-rankings/us-oil-imports-by-country
You can read history of US relations with Saudi Arabia or Iraq or South Korea or any of the other countries at the top of this list.
I think it would be really bad for humanity to rush to build superintelligence before we solve the difficult problem of how to make it safe. But also I think it would be a horrible tragedy if humanity never ever built superintelligence. I hope we figure out how to thread this needle with wisdom.
I agree with this fwiw. Currently I think we are in way way more danger of rushing to build it too fast than of never building it at all, but if e.g. all the nations of the world had agreed to ban it, and in fact were banning AI research more generally, and the ban had held stable for decades and basically strangled the field, I’d be advocating for judicious relaxation of the regulations (same thing I advocate for nuclear power basically).
It’s not really clear to me that I should be worried on the scale of decades. If we’re doing a calculation of expected future years of a flourishing, technologically mature civilization, slowing down for 1,000 years here in order to increase the chance of success by like 1 percentage point is totally worth it in expectation.
Given this, it seems plausible to me that one should spend 200 years trying to improve civilizational wisdom and decision-making rather than specifically attempting to unlock regulation on AI (of course the specifics here are cruxy).
I agree that 200 years would be worth it if we actually thought that it would work. My concern is that it’s not clear civilization would get better/more sane/etc. over the next century vs. worse. And relatedly, every decade that goes by, we eat another percentage point or three of x-risk from miscellaneous other sources (nuclear war, pandemics, etc.), which basically imposes a time-discount factor on our calculations large enough to make a 200-year pause seem really dangerous and bad to me.
I think the same world that coordinated well enough to do a centuries-long AGI pause (without heralding a dark age of negative economic growth or global totalitarianism, etc) is probably also more than capable of preventing thermonuclear war, extinction-level artificial pandemics, grey goo, etc.
At that point your biggest risks are natural x-risks (very low), authoritarian backsliding, meme wars, and some fraction of unknown unknowns.
while I agree for smaller numbers like a few decades, I don’t think I agree with a 1000-year pause.
I think (a) it’s perfectly reasonable for people to be selfish and care about superintelligence happening during their lifetime (forget future people and discount factors thereof: almost every single person alive today cares ooms more about themselves than about some random person on the other side of the planet), (b) it’s easy for “delay forever” people to basically pascal’s mug you this way, as happened with nuclear power, and (c) it’s unclear that humanity becomes monotonically wiser over time. (as an unrealistic example, consider a world where we successfully create an international treaty to ensure ASI is safe, and then for some reason the entire modern world order collapses and the only actors left are random post-collapse states racing to build ASI. then it would have been better to build ASI in a functional pre-collapse world order than to delay. one could reasonably (though i personally don’t) believe that the current world order is likely to fail in the coming decades and that ASI is better built now than in the ensuing chaos.)
Yes people are selfish, that is why you should sometimes be ready to fight against them. Point a is not a disagreement with Ben.
This is low-probability on a timescale of decades, but it is an argument people can use to justify their self-serving desires for immortality as somehow altruistic.
If I understood Eliezer’s argument correctly, we can shorten those timescales by improving human intelligence through methods like genetic engineering. Once the majority of humans have von Neumann-level IQ, I think it’s fine to let them decide how to proceed on AI research. The question is how fast this can happen, and it would probably take a century or two at least.
>slowing down for 1,000 years here in order to increase the chance of success by like 1 percentage point is totally worth it in expectation.
Is it? What meaning of “worth it” is being used here? If you put it to a vote as an option, I expect it would lose. People don’t care that much about the happiness of distant future people.
Make it a thousand, or two thousand. To Daniel’s point, societal change is not always positive on the scale of centuries. But on the scale of millennia it is, at least for the last few.
SI can come when we say so. On the human evolutionary timescale, 10K years is short. On a cosmic timescale, it is nothing.
i think it’s plausible humans/humanity should be carefully becoming ever more intelligent ≈forever and not ever create any highly non-[human-descended] top thinker[1]
i also think it’s confused to speak of superintelligence as some definite thing (like, to say “create superintelligence”, as opposed to saying “create a superintelligence”), and probably confused to speak of safe fooming as a problem that could be “solved”, as opposed to one needing to indefinitely continue to be thoughtful about how one should foom
If a superintelligence governs the world, preventing extinction or permanent disempowerment for the future of humanity, without itself posing these dangers, then it could be very useful. It’s unclear how feasible setting up something like this is, before originally-humans can be uplifted to a similar level of competence. But also, uplifting humans to that level of competence doesn’t necessarily guard (the others) against permanent disempowerment or some other wasteful breakdowns of coordination, so a governance-establishing superintelligence could still be useful.
Superintelligence works as a threshold-concept for a phase change compared to the modern world. Non-superintelligent AGIs are still just an alien civilization that remains in principle similar to humanity in the kinds of things it can do (even if they reproduce to immediately fill all available compute, and think 10,000x faster). Superintelligence, by contrast, is something at the next level, even if it only takes non-superintelligent AGIs a very short time to transition to superintelligence (if they decide to do that, rather than to not do that).
Apart from superintelligence being a threshold-concept, there is technological maturity, the kinds of things that can’t be significantly improved upon in another 1e10 years of study, but that maybe only take 1-1000 years to figure out for the first time. And one of those things is plausibly efficient use of compute for figuring things out, which gives superintelligence at a given scale of compute. This is in particular the reason to give some credence to software-only singularity, where first AGIs quickly learn to make a shockingly better use of existing compute, so that their capabilities improve much faster than it would take them to build new computing hardware. I think the most likely reason for software-only singularity to not happen is that it’s intentionally delayed (by AGIs themselves) because of the danger it creates, rather than because it’s technologically impossible.
Every year we don’t build superintelligence is a worse tragedy than any historical tragedy you can name. Tens of millions dead, hundreds of millions suffering, etc.
That doesn’t mean we should rush ahead, because rushing ahead is most likely far worse. But we should be aware of the cost.
I am undecided as to whether superintelligence should ever be built, and with my current knowledge and perspective, I would be fine with either outcome.
Some questions:
Has the existence of humanity up to now with no superintelligence been a horrible tragedy?
What would superintelligence allow that would otherwise be forever out of reach?
Are there not also things that humanity could or would lose forever if we did create a safe superintelligence?
In 2022 I wrote an article relevant to this question called “How Do AI Timelines Affect Existential Risk?” Here is the abstract:
Artificial Intelligence as a Positive and Negative Factor in Global Risk (Yudkowsky, 2008) is also relevant. Excerpt from the conclusion:
How can we know that the problem is solved—and now we can safely proceed?
I decided to conduct an experiment at neurips this year: I randomly surveyed people walking around in the conference hall to ask whether they had heard of AGI
I found that out of 38 respondents, only 24 could tell me what AGI stands for (63%)
we live in a bubble
(https://x.com/nabla_theta/status/1869144832595431553)
What’s your guess about the percentage of NeurIPS attendees from anglophone countries who could tell you what AGI stands for?
not sure, i didn’t keep track of this info. an important data point is that because essentially all ML literature is in english, non-anglophones generally either use english for all technical things, or at least codeswitch english terms into their native language. for example, i’d bet almost all chinese ML researchers would be familiar with the term CNN, and it would be comparatively rare for people to say 卷积神经网络 (the chinese term for CNN). (some more common terms like 神经网络 or 模型 are used instead of their english counterparts, neural network / model, but i’d be shocked if people didn’t know the english translations)
overall i’d be extremely surprised if there were a lot of people who knew conceptually the idea of AGI but didn’t know that it was called AGI in english
Very interesting!
Those who couldn’t tell you what AGI stands for—what did they say? Did they just say “I don’t know” or did they say e.g. “Artificial Generative Intelligence...?”
Is it possible that some of them totally HAD heard the term AGI a bunch, and basically know what it means, but are just being obstinate? I’m thinking of someone who is skeptical of all the hype and aware that lots of people define AGI differently. Such a person might respond to “Can you tell me what AGI means” with “No I can’t (because it’s a buzzword that means different things to different people)”
the specific thing i said to people was something like:
i was careful not to say “what does AGI mean”.
most people who didn’t know just said “no” and didn’t try to guess. a few said something like “artificial generative intelligence”. one said “amazon general intelligence” (??). the people who answered incorrectly were obviously guessing / didn’t seem very confident in the answer.
if they seemed confused by the question, i would often repeat and say something like “the acronym AGI” or something.
several people said yes but then started walking away the moment i asked what it stood for. this was kind of confusing and i didn’t count those people.
not to be ‘i trust my priors more than your data’, but i have to say that i find the AGI thing quite implausible; my impression is that most AI researchers (way more than 60%), even ones working in like something very non-deep-learning-adjacent, have heard of the term AGI, but many of them are/were quite dismissive of it as an idea or associate it strongly (not entirely unfairly) with hype/bullshit, hence maybe walking away from you when you ask them about it.
e.g. DeepMind and OpenAI have been massive producers of neurips papers for years now (at least since I started a phd in 2016), and both organisations explicitly talked about AGI fairly often for years.
maybe neurips has way more random attendees now (i didn’t go this year), but I still find this kind of hard to believe; I think I’ve read about AGI in the Financial Times now.
only 2 people walked away without answering (after saying yes initially); they were not counted as yes or no. another several people refused to even answer, but this was also quite rare. the no responders seemed genuinely confused, as opposed to dismissive.
feel free to replicate this experiment at ICML or ICLR or next neurips.
i mean i think that it’s definitely an update (anything short of 95% i think would have been quite surprising to me)
Why not try out leogao’s survey yourself to corroborate/falsify your priors?
Was this possibly a language thing? Are there Chinese or Indian machine learning researchers who would use a different term than AGI in their native language?
I’d be surprised if this were the case. Next NeurIPS I can survey some non-native English speakers to see how many ML terms they know in English vs. in their native language. I’m confident in my ability to administer this experiment on Chinese, French, and German speakers, which won’t be an unbiased sample of non-native speakers, but hopefully still provides some signal.
I think if I got asked randomly at an AI conference if I knew what AGI was I would probably say no, just to see what the questioner was going to tell me.
I’d be curious to hear some of the guesses people make when they say they don’t know.
religion is selling your soul
a lot of people say things like “sure, religion might not exactly be totally true, but it has lots of benefits, and there really does seem to be a god shaped hole in many people, so who can really say if it’s good”. i think this is directionally correct but kind of cowardly.
i think the correct take on religion is first that its claims are completely and utterly false; obviously the christian god doesn’t literally exist, jesus never came back from the dead, etc. this is so overdone by the old internet atheists that it would be beating a dead horse to harp on further.
secondly, the human condition involves a whole bunch of things that are kind of sucky. for example, the fact that we only have a very short amount of time on this planet before we die forever is utterly terrifying; or, the fact that it can be very difficult to find a source of meaning to ground our motivation in, and that it really sucks to not have a reliable foundation for motivation; or, the difficulty of connecting with other people despite differences.
i claim that there is a true solution to each of these problems that involves a very difficult never ending journey of discovery of the self, understanding and connecting with your emotions, constructing intellectual frameworks, and even technological development. part of the project of civilization should be to slowly build up the cultural edifice required to solve these problems.
religion is not the true solution to these problems, but rather the hacky patch that kind of helps (you no longer fear death because you know heaven awaits you, everything is meaningful because god, and you can connect with other people over believing in god), but in exchange you contort and utterly trash your epistemics, foreclose the possibility of solving these problems with a more epistemically sane solution (because tearing out this foundation will temporarily throw you into epistemic chaos), and even if you somehow compartmentalize the epistemic distortions to religion related questions, you still become constitutionally incapable of thinking sanely about ideas like radical life extension.
tbc, not everyone should embark on the difficult path of accepting nothing but the truth. it can be soul rending, and make you very unhappy. for many people, if they couldn’t possibly make any contributions to building the glorious transhumanist future, or if they value personal happiness much more than impact, it’s better to take the easy road. but i think there is nothing deep and respectable about taking the religious road; certainly people afford it undue respect, and use the unhappiness and more difficult emotional journeys of the atheists as an unfair bludgeon without realizing what large chunks of themselves they have sacrificed.
In the spirit of your post: Is not this also cope? (Except for the last bit about technological development, maaaybe.)
Like why would evolution have given you the tools to have helped reconcile you to death, anomie, and lack of motivation, and lack of connection? Why should “understanding and connecting with your emotions” and “discovery of the self” be an affordance in this world that lets you actually find a true solution to the human condition? Why should there be a “true solution” to such problems at all?
Like at least—if religion were true—it would make sense for a benevolent God to have created a path that would make you and those around you happy. It’s internally consistent, in some sense. But if you were made by godshatter evolution, why would there be any path that looks like “internal development” that satisfies these questions? Isn’t the null hypothesis that a “never ending journey of discovery of the self” just as much a fake-ass story as Jesus dying for your sins?
oh, there’s no reason to expect evolution to have given us the necessary tools at all. certainly i’m not claiming that emotional connection and self-discovery are sufficient to solve all of the problems of the human condition. certainly you cannot live forever by being enlightened (you can stop fearing death, but i think doing so is harmful). the only ultimate solution is technology.
but in the meantime, there are some cognitive tricks that are better than other cognitive tricks. the better ones have the property that they fuck with your epistemics less for the same amount of alleviation of existential dread and such. they are also harder work than religion, because they require solving much more complicated emotional problems. the problem with religion is it solves for minimizing existential dread without trying to avoid collateral damage to the epistemics.
this post was prompted by reading books like Crime and Punishment and The Death of Ivan Ilyich which are amazing except for the parts where they worship religion. they’re not necessarily even wrong for their time—back in the day, the glorious transhumanist future was so far away that it wasn’t nearly as worth taking into consideration. but the world has changed a lot and the end times are nigh.
You’re going to have a tough time with Dostoyevsky if you think about it as “worshipping religion”. It might help to remember that he was also quite conversant with arguments for atheism, e.g. as he has Ivan espouse in Brothers Karamazov. Not to say I think you should believe like he does, but that he’s not just putting this stuff forward out of blind ignorance.
i haven’t read Brothers Karamazov. can you explain what you mean?
EDIT: I got some very useful feedback from my atheist friend and I’m reorganizing this post.
Here it seems that your central point is that the reality of death is scary to confront from the perspective of your sincere atheism, and that you find it noble to confront your fear of this reality. Furthermore, you are angry at religious people who seem to have taken the easy way out by choosing to believe in lies rather than (a) take on the challenge of constructing meaning and (b) face the harsh truth of the finality of death, and who do not recognize your sacrifice in taking the harder road. You imply that science (as opposed to religion) is the best way of seeking truth.
TL;DR. I think I’m an exception to your implication that all religious people have taken the easy way out, while still agreeing that this may be true for many of them. I essentially agree with your view that reality can be harsh and death is scary. I wonder if seeking the truth can be achieved purely through scientific thinking or whether religious teachings can be helpful if taken with discretion—which is my experience.
My response is as follows:
1) My personal feelings about your post
2) My areas of agreement with you
3) My doubt about finding communities to support truth-seekers
3.5) My observations about open-mindedness within religion
4) My doubt about the adequacy of human-derived scientific knowledge in removing all biases
5) Things I don’t have time to write more about
1) My personal feelings about your post
I’m religious and I was triggered by this post. I don’t think I tried to take the easy road. I had a friend who claimed to be EMF-sensitive (electromagnetic field-sensitive) and I was trying to “scientifically” disprove his disease, doing sneaky experiments to see if he really was bothered by wifi. Now I think that was a rather disrespectful thing to do, but it goes to show how much of a “militant” atheist I was. However, I agreed to join a yoga workshop from the Isha Foundation because my friend claimed that I would believe him after the experience. Indeed, after 4 days, my worldview was not the same.
If you read that, thank you! I didn’t say it to try to convince you that you’re wrong about religion—I know it’s very unlikely that my experience would be compelling to you, and I don’t have many reasons to care about what you believe.
Also, apologies if you find that some of the following is poorly written or leaves out details. I have already spent 90 minutes writing this and I have to move on to some urgent tasks.
2) My areas of agreement with you
So now let me tell you all the ways I agree with you.
“the human condition involves a whole bunch of things that are kind of sucky.”
“it can be very difficult to find a source of meaning to ground our motivation in”
“the difficulty of connecting with other people despite differences”
“true solution to each of these problems...difficult never ending journey of discovery of the self.. part of the project of civilization should be to slowly build up the cultural edifice required to solve these problems.”
“religion” often functions as a “hacky patch that kind of helps...”
“religion” often causes one to “contort and utterly trash your epistemics”
“not everyone should embark on the difficult path of accepting nothing but the truth.”
All agreed!
And yet, despite the difficulties, I can wholeheartedly identify with you as someone who has tried to accept nothing but the truth.
3) My doubt about finding communities to support truth-seekers
However, the practical challenge of discerning the truth and fighting against the emotional and social gradients that oppose the truth is indeed difficult. I used to think that the LessWrong community or CFAR could be helpful for this task, but I’ve lost confidence in these communities. You can see that I rarely post here. Academia, I found, was also lacking in sincere truth-seekers. I went to Stanford for graduate school, and now I work at the National Institutes of Health.
3.5) My observations about open-mindedness within religion
I attend a Protestant church, and I completely agree that most religious people are unfortunately very closed-minded. However, I believe that God also wants to help religious people become more open-minded, and He gradually leads people within traditional religions to expand their worldview. I can give you one interesting anecdote to chew on. As you probably know, the majority of Christians are quite dogmatic and believe that our faith is the only path to salvation. However, you can see several examples of prominent Christians who became more tolerant as they matured. Billy Graham is the example that stands out the most to me, as near the end of his life he stated that he believed people from other religions might also be saved:
Within Protestant Christianity, there is a movement of progressive Christians such as Brian McLaren, author of “Faith after Doubt: Why Your Beliefs Stopped Working and What to Do About It”, who are taking a more open-minded view of faith. A female missionary, who had several generations of missionaries in her family, came to our church to talk about some of McLaren’s ideas. People in the audience were speechless because these ideas were so contrary to our prevailing dogmatism, and yet this missionary’s credentials and background, coming from several generations of serving Jesus, were unassailable.
These are some of the observations I have seen from within my faith which, combined with my own personal experience, convince me that religion and open-mindedness are not as incompatible as you claim.
4) My doubt about the adequacy of human-derived scientific knowledge in removing all biases
My perspective is that just as religion often creates biases that prevent us from being able to reason or think about the truth clearly, scientific culture has its own biases as well. Science as an aspirational ideal should in principle empower us to discover the truth and free us from bias, but the problem is that the scientific knowledge that we actually have access to is already contaminated with biases originating from the biased humans who have been doing the science. I can name one bias that scientific materialists may be more vulnerable to than sincere religious adherents: and that is the fear of death. Once you can think about your own physical death with reduced emotional reaction, it is possible to see more clearly some of the lies that you tell yourself out of the instinct of self-preservation. That is why Christianity teaches one to rely on Biblical beliefs and a sense of a relationship with God to overcome fear of death and to pursue the goal of cultivating humility—which in one sense is a freedom from the effect of the egoistic lies we tell ourselves as a result of our fear of death.
5) Things I don’t have time to write more about
I’m happy to go into more detail about my experiences and my research into religion and spirituality. You mention that religion can prevent people from trying to find their own meaning in life or to confront the reality of death. My experience has been that while this appears to be true of a small fraction of religious people, many believers still struggle just as much as atheists in terms of trying to find meaning or trying to accept the reality of death. Furthermore, I do not think religion, insofar as it has been humanly constructed, has been designed to spare believers from these questions, and I think that texts such as the Book of Job would support my view. And does religion spare us from having to find meaning in our lives? I am not quite sure how to answer it. Perhaps the problem of finding meaning is more challenging or urgent for an atheist. However, it still exists for the believer. While we are encouraged to trust in God’s plan for us, it’s not so simple to just stop thinking about what we are supposed to do here. Perhaps God doesn’t tell us everything, so we also have to figure it out ourselves, just like you. In some ways you could see atheism as taking the easy way out, since nothing you do has permanent consequence. From my belief system, on the other hand, there’s no decision I am justified in taking lightly, since it all has eternal consequences and my work will be evaluated by a perfect judge. Sometimes that feels like an amplification of the problem of meaning—I not only have to find the meaning of my life overall, but I have to figure out the meaning of every single moment in the context of eternity.
Do you actually have any uhh factual disagreements with atheists? Did all the stuff from the Old Testament happen, or is it all weird and misleading metaphor? Was Jesus a spellcaster or just an inspired guy with awesome ideas? Did God micromanage the creation of humans or did he outsource it to a meatgrinder of evolution?
I know this is a quick take but I think perhaps too quick. I suggest first clarifying just what you aim at with the term religion. Are we talking about the organized religions and all their canon or a general belief and faith in some supernatural entity?
seems a very strong claim. There are modern cases of people who were dead but returned to life. The point here is about what technology is available to establish death and recovery. That’s not to say there isn’t a case to be made that being dead for three days and then alive again is an extreme claim deserving of skepticism. But is that an issue for those with faith or orthodox religion, or a case of historical inaccuracy?
I also find the concept of truth rather complicated in this area. Can we really know the truth about something that is supposed to exist outside our universe and its laws?
I see you posted in reaction to some readings. Just curious, have you ever seen “The Hogfather”? (https://en.wikipedia.org/wiki/Terry_Pratchett%27s_Hogfather). It’s a pretty quirky movie, but one of the lines has always stuck with me and it seems very related to your post. It is actually something of a retelling of a Christmas story; the Hogfather is basically Santa Claus. At the end Death (the character) says “If they cannot believe the little lies, how can they ever believe the big lies?” The point being that perhaps a lot of what we see as moral and right is just BS, but very valuable both socially and individually.
https://tsvibt.github.io/theory/index_What_is_God_.html
“The real Magic was friends we made along the way!”
“Wrong. FIREBALLLL *explosion* ”
People really believe there is a God; it’s not fair to redefine it to point to some Leviathan-like thing which arises from people acting like it breathes down their necks. For one thing, the religious people would say that you are wrong in general and about their position in particular.
When you say “the religion is false”, what probability do you assign to this statement (and why that probability)? (If you want, you can separately give probabilities for the falseness of a particular religion and for the existence of some “Higher Power” at all. “Simulation argument” and “Cosmic zoo” definitely count as Higher Power.)
I think the massive point in Christianity’s favor is that it possibly invented morality as we conceive of it.
I don’t think this is necessarily true. There are some very well educated and knowledgeable people who do believe in such things but whose epistemics otherwise seem to be unaffected. Currently we have no radical life extension technologies, but what we do have still seems to be used by most religious people. Yeah, there are some who would rather die than receive a blood transfusion, but on the other hand the pope himself receives top-notch medical care, and so do most religious people who can afford it. Most people never learned that “beliefs should pay rent”, and a lot of their weird spiritual beliefs are more of an “idle wheel … can be turned though nothing else moves with it, is not part of the mechanism” (semi-paraphrasing Wittgenstein).
An anecdotal experience I have of this is when a spiritual person described how she had multiple times foreseen things in her dreams that later came to pass. When I started asking how she had acted on those, or proposing tests that could be run or benefits that could be gained in such a case, she was utterly surprised, never having considered that spiritual experiences can be used for anything other than talking about them.
I think that is the case for most religious people. Going to church won’t stop them from replacing their organs with cloned ones and reprogramming cells to slow aging. They will still say their prayers and thank God for it.
Whoa! Talk about flogging a dead horse!
Sure, you’re right that religion is a trade of sanity for community, but there are juicier targets if you’re going to start tilting at meme complexes.
How about you mention the devout followers of Marxism or capitalism or the American Democratic Party or the Republicans? All make the same trade.
Kinda part of being human?
when i was new to research, i wouldn’t feel motivated to run any experiment that wouldn’t make it into the paper. surely it’s much more efficient to only run the experiments that people want to see in the paper, right?
now that i’m more experienced, i mostly think of experiments as something i do to convince myself that a claim is correct. once i get to that point, actually getting the final figures for the paper is the easy part. the hard part is finding something unobvious but true. with this mental frame, it feels very reasonable to run 20 experiments for every experiment that makes it into the paper.
What is often left out of papers is all of these experiments and the chains of thought people had about them.
This is also because of the Jevons paradox: as the cost of doing an experiment falls with experience, the number of experiments run tends to rise.
random thoughts on analytical and emotional intelligence
one thing that I think the world needs more of is analyses into the nature of the mind by people who are both rigorous/analytically inclined, and also emotionally intelligent/integrated. much writing from the former fails to model large parts of the human mind, and much writing from the latter fails to create models of sufficient clarity and validity.
I think this underlies a lot of my instinctive dislike of humanities work. people who are emotionally perceptive but not rigorous and analytical tend to notice interesting things about the human experience, but then come up with very poor models that set off all of my bullshit sensors that are attuned to rigorous arguments. but I think it should be possible to have humanities work that is not like this.
(for clarity, from here out I will say analytical and emotional to refer to the axes which are independent of each other, and ABNE (analytically but not emotionally intelligent) and EBNA for the converse)
(I also want to clarify that I don’t think of analytical as being in opposition to intuition, at least in the context of this post. something something Terence Tao’s post about how the best mathematicians start out thinking in rigor before developing the intuitions to think without applying rigor all the time, but their intuitions check out rigorously when needed)
because there’s a strong anticorrelation between analytical inclination and emotional integration, it’s easy to round this off to a single axis. but I think this is an oversimplification.
analytical people like to construct typologies and categorizations that cleanly describe the world. edge cases are very important because in a lawful world, thinking about the edge cases teaches you a lot about the laws of the world, which in turn gives you deep understanding that is surprising but robust (physics is the poster child for this worldview). analytical people are very aware that it’s easy to make theories that sound nice but aren’t actually good; it’s important to have rigorous standards for testing theories; indeed, it’s important to have rigorous meta-theory (epistemology) for what counts as an actual rigorous standard and not just Rigorous Vibes. analytical people care a lot about subtle but important distinctions—for example, correlation vs causality; or thinking of high dimensional data as having large principal components vs fundamentally low dimensional data being embedded in high dimensional space; or true anticorrelations vs collider bias; or the distinction between unmodelability and noise; or biased vs high variance estimators. in some sense the Sequences is a compendium of examples from the analytical worldview.
emotionally intelligent people tend to be good at modelling themselves and the people around them. they can sense subtle social cues. they can notice that certain attributes or actions of people reveal much about their internal state, that ABNE people would think of as noise. they tend towards the arts and forms of creative expression that communicate and allow empathizing with emotional state. they are attuned to narratives and worldviews as a thing that shapes the actions of individuals and groups. they have fewer unresolved internal conflicts between their parts. they can spot emotional/motivational contradictions in the minds of other people that end up driving actions; this is inferred not just from what others say, but from what they don’t say, or what they hesitate to say.
ABNE people tend to think of EBNA people as dumb or irrational. they think of EBNA people as snobbish about the humanities and their ideas as insight porn, the result of an ingroup signalling game gone awry. ABNE people tend to construct sophisticated models of themselves and other people that lack a huge chunk of important inputs, making certain kinds of things really hard to model, and requiring way more intellectual capability to achieve the same accuracy as emotionally intelligent people. this frequently leads to large swathes of human behavior being chalked up to noise, or irrationality, or even to completely giving up on modelling entire facets of humanity. ABNE people tend to have major issues with internal coherency between their parts.
EBNA people like to think of ABNE people as boring nerdy nitpickers. they spot the internal incoherency of ABNE people but undervalue the analytical component. oftentimes they phrase beliefs in forms that contain some important true observation, but can’t put it in the form of something rigorous and testable, so it gets lost in the oceans of insight slop sloshing about in the world. EBNA people notice lots of signals but fail to put them together into a coherent, effective worldview, and instead coarsely cluster observations together into simple heuristics.
this post is unapologetically written as an analytical way of trying to get the best of both worlds, because this is how I think natively. perhaps in the future I’ll try to make a more balanced / broadly digestible version.
Strongly agreed.
Another extreme advantage of the “Renaissance man” is the ability to clearly *convey* emotional learnings to others (especially those without strong emotional intelligence). Typically, EI is won through interaction and, essentially, reinforcement learning on contact with others; possessing both the technical vocabulary and an understanding of human social norms allows you to directly explain to nerds very tricky things they have a tough time learning. This is extremely useful in, e.g., workplaces or high-stakes environments (a good manager can quickly untangle a mess of arguments), and arguably underappreciated in therapists and similar vocations.
I know this is a bit old, but I think about this a lot and thought I’d leave some of my ideas.
Here goes. Attention warning: 600+ words. I have cut it down quite a bit but have the edits saved if you are interested.
(1) It is important to remember EBNA-style thinking has its own intellectual history.[1] An “emotionally sensitive person” from the 17th century would be totally different (in actions, skills, self-perception, and social perception) from one today. To the extent modern EBNA people think of their intuitions as having unique epistemic and intellectual utility, they are probably following the “romantic” movement. Call them “Humanities-EBNA types.”
Romanticism is explicitly committed to irrationality; a lack of systemisation; incompleteness; and the unreliability of generalisation, abstraction, causality, predictability, legibility, and order. For the Romantic, a map not only fails to represent the territory, but the process of mapping destroys it.[2] Obviously this is to analytical rigour as oil is to water.
The rationalist tendency is to think of social intuition as a potent, but ultimately reproducible, heuristic. The intuition of EBNA people is like, as you say, mathematical intuition or intuitive grip strength when making a cup of coffee. Is there a way to learn these skills systematically...? Call these types “STEM-EBNA types.”
(2) STEM-EBNA people believe in the utility (and necessity) of interdisciplinary study. (In terms of intellectual history, this is straightforwardly a descendant of 18th century Enlightenment views).
In practice, you end up with highly interdisciplinary social science which brackets messy human subjectivity into shape-rotatable boxes and then operates on them systematically. The extent to which there is signal loss depends on (A) your goals and standards for signal/noise; and (B) the complexity of the qualia in question being bracketed.
This is presumably the kind of research you are interested in, and for which you think EBNA skills could be used to great labour-saving effect.
(3) You will have noticed such bracketing appears incompatible with the romantic, Humanities-EBNA type of thinking. Love is not just a chemical in the brain… I am a naturalist myself, so for me the question is how much signal loss occurs when you bracket messy romantic feelings like “love”, “sense of justice” or “loyalty.” Some humanities scholars say the loss is 100% because every feeling is radically unique. A more reasonable test for signal loss:
is the subjective meaning lost when bracketing perceptible to the average person, such that it alters any of their recordable behaviour?
There is no reason this cannot change over time: as our methods get subtler and more fine-grained the signal loss decreases. That said, we should remember what is lost could be numerically insignificant but sometimes causally, or intellectually, of great importance. The most direct expression of this heuristic is Cloud Atlas. See also “the unreasonable effectiveness of small groups with radical ideas.”
Two extra points of interest.
The first is that romanticism has penetrated very deeply into the modern (western?) mind. Even the most thoroughgoing naturalists—say, Gwern—remain at least emotionally attached to a romantic residual present in great art. We like to believe in the ineffable human experience which cannot be explained away by regression models or evo biology. Human after all.
The second is that there is another group committed to a materialistic, rational view of human behaviour: Marxists. It is an open question whether Marxist Russians and Chinese people deeply internalise this materialism. Perhaps they think in a thoroughly ‘disenchanted’, systematic manner and their thought, as a result, does not produce the same problem of signal loss. Are they merely unconverted romantics...?
Note, there is also critical theory and postmodernism which are particularly popular today. These are such gobbledygook they do not bear discussion.
Much of this is from Isaiah Berlin’s essays in The Crooked Timber of Humanity and The Roots of Romanticism. Worth reading if you want to learn about the origins and structure of current EBNA thinking.
I think part of the trouble is the term “emotional intelligence”. Analytical people are better at understanding most emotions, as long as the emotions are small and driven by familiar dynamics. The issue is the biggest emotions or when the emotions are primarily driven by spiritual factors.
Sounds interesting and like something I might miss if true. I would be interested in examples.
(I at least suspect this is my comparative advantage. But I’m not good at communicating [insights], a skill that comes neither with <analytical rigor> nor with <high-res introspective access>.
It also seems like the <after controlling for situational factors, status psychology explains more than half of variance in human behavior> camp is essentially right, which makes most genuine discussion less pretty than most people would prefer, especially those with less introspective insight.
I (somewhat predictably, given my status incentives) hold that this is an important, central problem civilization has, because mutual information is the foundation of cooperation, or, expressed more concretely, the better we model each other, the easier it is to avoid common deception & adversity attractors.)
hendrycks recently published a paper introducing a new moral theory. the paper contains this insane table, which claims that you should value a foreign stranger at 3e-12 times the value you assign to yourself. even setting aside the fact that this is apparently supposed to be a prescriptive theory, even as a descriptive theory, i think this is utter madness.
the core problem is that it assumes if x% of your total caring is assigned to people other than yourself, then you must give away x% of your wealth to be consistent.
the argument goes that since most people don’t give away more than say 50% of their wealth, then if there are 1e10 people, each one can only get a tiny sliver of your caring.
but this is wrong, because there is no simple relationship between the % of your caring that is about other people and the % of your money you should give away. i think you should care about random strangers closer to 1e-3 than 1e-12 times as much as yourself. if you care about each stranger x times as much as yourself, you should keep giving away money to the person who is most in need as long as each marginal $ helps them more than 1/x times as much as each marginal $ helps you.
if x = 1e-12, then you’re saying you won’t give a single dollar to charity until you have so much money that a dollar helps the stranger more than a trillion times as much as it helps you; or, because a trillion dollars is always worth less than a trillion times the value of a dollar due to the concavity of utility in money, you won’t give a single dollar to charity until it would help them more than giving you a trillion dollars, which probably won’t be true until you are absurdly wealthy.
this is batshit insane!
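The marginal rule above can be made concrete with a small calculation. What follows is only a sketch under assumptions that are not in the original post: both you and the stranger are assumed to have log utility in money (so a marginal dollar is worth 1/wealth), every stranger is assumed to start with the same hypothetical wealth, and `donation_threshold` and `optimal_donation` are illustrative helper names. Under those assumptions, you give nothing until your wealth exceeds `stranger_wealth / x`, and the total you give away barely depends on how many strangers there are once there are many of them.

```python
def donation_threshold(x: float, stranger_wealth: float) -> float:
    """Wealth at which the first donated dollar becomes worth giving.

    Assumption: log utility u(c) = ln(c) for everyone, so a marginal dollar
    is worth 1 / wealth. You give when x / stranger_wealth > 1 / your_wealth,
    i.e. when your wealth exceeds stranger_wealth / x.
    """
    return stranger_wealth / x


def optimal_donation(wealth: float, x: float, stranger_wealth: float, n: int) -> float:
    """Total amount to give to n equally poor strangers (hypothetical model).

    Maximizes ln(wealth - G) + x * n * ln(stranger_wealth + G / n).
    With identical strangers, "give to whoever is most in need" ends with an
    equal split, so each stranger receives G / n.
    First-order condition: 1 / (wealth - G) = x / (stranger_wealth + G / n),
    which solves to G = (x * wealth - stranger_wealth) / (x + 1 / n).
    A negative solution means "give nothing", so clamp at zero.
    """
    g = (x * wealth - stranger_wealth) / (x + 1.0 / n)
    return max(0.0, g)


if __name__ == "__main__":
    # Hypothetical numbers: each stranger has $1,000 and you have $10M.
    for x in (1e-3, 1e-12):
        print(f"x={x:g}: start giving above ${donation_threshold(x, 1_000):,.0f}")
        for n in (10**10, 10**100):
            given = optimal_donation(10_000_000, x, 1_000, n)
            print(f"  n={n:.0e}: give ${given:,.0f}")
```

Under these assumptions, x = 1e-3 means keeping roughly $1M before donating anything, while x = 1e-12 means keeping roughly $1e15, and going from 1e10 to 1e100 strangers leaves the amount given essentially unchanged. Log utility is only one possible choice of utility curve; the thresholds depend on that choice.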
I might just be missing something here, but as presented it does seem a bit like Hendrycks’ arguments, in addition to being philosophically weak, also misunderstand some concept in basic calculus.
That said, I think while your model is (much) better, and closer to my own, it also is descriptively not amazing. I think in practice when people (including myself!) learn that charities are 10x more or less effective than they previously thought, they rarely adjust their giving substantially so that their new donations are in line with marginal utility.
I also think people’s actual decisions are often connected to framing effects, reciprocity norms, etc, rather than pure utility. For example I think if you’re a tourist your willingness to potentially sacrifice your life for a large group of random foreign strangers near you[1] is probably much higher than while you’re sitting at home an ocean away[2].
Ex 1: you’re driving a rental SUV, your brakes don’t work and you have the option of careening either into a group of schoolchildren or off a cliff. Compare that to willingness to donate money or a kidney. Ex 2: rushing into a burning building.
Assuming these are real situations, ignoring bravado/cheap talk etc. Obviously in non-serious hypotheticals people may say things they won’t actually do.
i broadly agree that most people aren’t thinking about this at all. my model is mostly prescriptive, under the constraint that it prescribes actions that are vaguely close to what people do in practice (as opposed to utilitarianism which gives away every penny you have to charity, or hendrycksism which lets billions die if it means you get a really nice apartment.)
Having now skimmed the paper/read some parts a bit more carefully, one thing I do appreciate about it is the attempt to modus tollens Parfit’s ideas about personal identity. I think that’s a worthwhile angle, and more practically useful these days than using Reasons and Persons to dissuade people from egocentricity, which afaict is closer to Parfit’s original goals.
That said, I don’t think this particular implementation makes a lot of sense.
The Shapley mutual information also does way more “heavy lifting”, to quote a favored AI phrase, than the paper wants to imply.
Finally, the whole idea is pretty crazy if you think about it. Is it actually rational to value yourself more than the rest of humanity combined? Is this actually consistent with most people’s endorsed preferences? This seems implausible!
People on average do seem to value themselves more than the entire rest of the world, perhaps with an exception for their closest friends and family.
I think a lot of people would sacrifice their lives for the rest of humanity, and a lot wouldn’t. I think by revealed preferences, there’s a wide range around self=8 billion strangers, and the distribution is pretty wide. Other measures of revealed preferences like donations seem to roughly agree.
Whether that’s rational in the sense of being logically consistent is debatable. Arguments like Parfit’s are typically not considered convincing in making people a lot more utilitarian, including to me, but that could be caused by motivated reasoning.
I think partially motivated reasoning but a lot of it is a defense mechanism. Like if somebody you’ve never met before tells you to donate half your money to charity, you might (correctly!) infer that they do not have your best interests at heart. Regardless of whether they’re a fancy-sounding Oxford academic, a carpenter’s stepson from the Middle East, or your local internet philosophy & rationality blogger.
So I’m not at all surprised that people aren’t convinced by these arguments, nor do I (at a sufficiently high level of abstraction) believe that they ought to be.
Motivated reasoning is a defense mechanism.
Motivated reasoning, confirmation bias, and AI risk theory
When I say motivated reasoning, do you think that means it’s conscious and strategic? I worry it’s used that way more than the academic and IMO more important usage.
Can you explain a bit more what you mean by this?
Aren’t you dividing twice there, since you:
1) single out a stranger (thus dividing the amount you care about the average stranger by their number)
2) then apply Hendrycks’ central number to that stranger (where now you should be applying the pooled number, since you’re already ignoring all the other group members)
So I think this is in fact pretty close to your intuition if interpreted correctly (you say 1e-3, Hendrycks says 1e-2).
i don’t understand what you mean. the central column is saying i should care about myself 0.576 much, and Bob from Randomland 1.6e-12 much. where am I dividing twice?
my version of this table would say 1e-7 for self and 1e-10 for random person; the crux of my argument is that the ratio between the two is vastly more important than the absolute fraction of your caring a stranger occupies.
If I understand Hendrycks’ logic here, then caring 1/1000 as much about a random stranger as about yourself means you care several million times more about all random strangers combined than about yourself, which you don’t seem to be saying?
not OP, but that seems like a pretty reasonable conclusion. if i had to sacrifice my own life to save every person i didn’t personally know (ie. 8.1 billion people), i would absolutely do it in a heartbeat. i would also do it to save just a fraction of those people (8M people). only once it gets down to much smaller fractions (saving 100-3 random people) does it start seeming like a hard tradeoff.
Sorry to be edgy, but there are situations with options to sacrifice more than your life. I bet you have that limit. It’s just higher than your life.
Another point here is that people are not that unified over time. Like, you can press some button that all subsequent yous would sincerely curse you for. “Nooooo, the infinite torture dimension turned out to be a bit much!” (quote from a pretty selfless human).
i agree that i would rather die instantly than live for 100 years of torture. i don’t think that proves as much as you think. i also think it’s fine for some people to make morbid utility calculations like these, and for others to say “i don’t want to think about that and i’m not going to answer”
Well, sure. You are fine with thinking about sacrificing your life and proudly announcing that, but anything more is too much to even talk about? Morbid calculations for me but not for thee
I just think you are wrong on your self model here. Like, I doubt you would even be able to saw your hand off without anesthesia, and that’s not years of torture, it’s like 10 minutes of mild torture. A lot of people would bail on this, including me, and you are claiming what, to be unusually willing to sacrifice stuff, up from the prior?
Be less wrong etc
not a lot of people (maybe literally 0) have had sufficient reason to saw off their own hand for altruistic reasons. i’ve donated a kidney, donate blood often, and gave more than the GWWC pledge when my income was high. any falsifiable claims you’d like to check while we’re speculating about my values?
Wow, okay, you are right. You are way up, yeah.
But I still think there are limits, higher for you.
my point is that % of caring is not a coherent concept, or at least not the one that maps onto the intuitive notion of what % of your wealth you should donate.
specifically, suppose instead of there being 1e10 people, there were 1e100 people. i claim the % of your money you should donate should basically not change at all, even though the % of caring assigned to yourself has plummeted by a huge amount
once you write down a table with the first row and column, the result will be batshit insane no matter what numbers you put in.
eh, i think there’s a wide range of reasonablish values that i wouldn’t have objected to. like i think it is vaguely defensible to care anywhere between 1e-1 and 1e-6 ish times as much about a stranger as yourself. 1e-12 is hugely outside that window.
I agree in absolute terms, but some forms of insanity are greater than others.
It is amazing that a paper that is essentially just a vaguer form of Hamilton’s Rule only cites him once.
As it stands, I think the table is incorrect but “right” in the sense that it really depends on which random constants you assign to these calculations, and I can’t find any evidence of a careful selection in the paper or in his code.
Fwiw: Reading all the comments here (though still w/o a working link to hendrycks’ article), I’m rather convinced we’re mainly circling around questions of definition.
OP leogao = caring in the actual natural sense
Rest: some special definition that I don’t fully understand but so far doesn’t seem super interesting (to me) and leads to an indeed “batshit insane” result when read with the natural meaning of “degree of caring” instead of hendrycks’ seemingly different concept
In the table you reproduce, see the entry in the second column (“Count”) for “Foreign Stranger”: 7.8 x 10^9. For Hendrycks, Foreign Stranger doesn’t mean a particular individual that you encounter, it refers to every human being outside your nation or culture, whether or not you ever heard of them. So you can’t assign each of them a significance of 1/1000th of your caring; that exceeds your resources by a factor of a million.
i don’t think you understood my argument. i didn’t say you assign each of them 1/1000th of your total caring. i said you should assign each of them 1/1000th as much caring as you assign yourself. so you should occupy 1000⁄7 billion of your caring, and Bob from Randomland occupies 1⁄7 billion of your caring.
the entire point of my argument is it actually doesn’t matter what % of your own caring you take up. that’s not the relevant thing. the relevant thing is how much you care about each stranger relative to yourself, and the shape of your money utility curve.
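The normalization in the comment above can be checked with a few lines of arithmetic. This is a sketch that assumes, as the comment does, that every stranger gets the same weight `ratio` relative to yourself and that all weights are scaled to sum to 1; `normalized_shares` is a hypothetical helper, not code from the paper. It shows that your own share of "caring" shrinks as the number of strangers grows, while the per-stranger ratio, the quantity the argument says actually matters, stays fixed.

```python
def normalized_shares(ratio: float, n_strangers: float) -> tuple[float, float]:
    """Split a total 'caring' of 1 between yourself and n identical strangers.

    Each stranger gets `ratio` times your own weight, so the weights satisfy
    self_share + n_strangers * ratio * self_share = 1.
    Returns (self_share, per_stranger_share).
    """
    self_share = 1.0 / (1.0 + n_strangers * ratio)
    return self_share, ratio * self_share


if __name__ == "__main__":
    for n in (7e9, 1e100):
        me, stranger = normalized_shares(1e-3, n)
        # Your share collapses as n grows; stranger / me stays at 1e-3.
        print(f"n={n:.0e}: self={me:.3g}, stranger={stranger:.3g}, ratio={stranger / me:.3g}")
```

With 7 billion strangers this gives roughly 1000/(7 billion), about 1.4e-7, for yourself and 1/(7 billion), about 1.4e-10, per stranger.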
I think it’s wrong that humans empirically value themselves as 1/10000000 of the rest of humanity. I guess I see your point, that you have some budget of caring, and on occasion you are willing to dispense quite a lot of it to a single stranger. But you would not dispense 99.9999% of your caring to all the strangers combined.
% of your caring is a flawed metric that doesn’t mean anything though! even if there are 1e100 strangers out there, as long as your caring about each individual relative to yourself is still 1/1000, the fraction of your money you’re willing to donate remains constant!
I don’t get it.
You have some pool of caring you are willing to donate; then, in the case where all other humans need a donation, they will each receive pool/total_pop. Then you care about each of them as pool/total_pop.
Like, if one encounters an opportunity to donate to a single stranger who needs it, people go above that pool/total_pop, but it doesn’t mean they would give more than the total pool in the previous case. The scaling is weird.
Your previous statements are unclear.
i don’t understand what in my original post you disagree with. there is no such thing as a fixed pool of caring, i don’t even know what that means. the actual constraint is you have some finite amount of money. money is not the same as caring because each dollar is worth a different amount depending on how much money the recipient has. caring is just a multiplier on how much other people’s happiness is worth to you compared to your own happiness. if some of your dollars will bring so much more happiness to someone else (eg by saving their life) than yourself (eg by buying a slightly larger apartment) that it outweighs the fact that you don’t care about them as much as yourself, then you should give that dollar away. otherwise, you shouldn’t.
when I read “caring” + the table I assume something roughly equal to “percentage of attention/money/other resources spent”, otherwise how would you normalize caring to 1 (as is done in the table)?
Can you give the link? If what you’re saying is implied by his theory, then yes, it would be wholly insane, but I have a hard time believing it.
Paper here: https://eigenism.org/paper.pdf
https://x.com/hendrycks/status/2052422910133104670?s=20
hendrycks doubles down on the claim in this thread
Thx, though I cannot read it (I don’t have X and I don’t want to make an X account).
You can always use xcancel.com as a mirror for X: https://xcancel.com/hendrycks/status/2052422910133104670
random brainstorming ideas for things the ideal sane discourse encouraging social media platform would have:
have an LM look at the comment you’re writing and give real time feedback on things like “are you sure you want to say that? people will interpret that as an attack and become more defensive, so your point will not be heard”. addendum: if it notices you’re really fuming and flame warring, literally gray out the text box for 2 minutes with a message like “take a deep breath. go for a walk. yelling never changes minds”
have some threaded chat component bolted on (I have takes on the best threading system). big problem is posts are fundamentally too high effort to be a way to think; people want to talk over chat (see success of discord). dialogues were ok but still too high effort and nobody wants to read the transcript. one stupid idea is to have an LM look at the transcript and gently nudge people to write things up if the convo is interesting, and to have UI affordances to make it low friction (eg a single button that instantly creates a new post, automatically invites everyone from the convo to edit, and auto populates the headers)
inspired by the court system, the most autistically rule following part of the US government: have explicit trusted judges who can be summoned to adjudicate claims or meta level “is this valid arguing” claims. top level judges are selected for fixed terms by a weighted sortition scheme that uses some game theoretic / schelling point stuff to discourage partisanship
recommendation system where you can say what kind of stuff you want to be recommended in some text box in the settings. also when people click “good/bad rec” buttons on the home page, try to notice patterns and occasionally ask the user whether a specific noticed pattern is correct and ask whether they want it appended to their rec preferences
opt in anti scrolling pop up that asks you every few days what the highest value interaction you had recently on the site was, or whether you’re just mindlessly scrolling. gently reminds you to take a break if you can’t come up with a good example of a good interaction.
argument mapping is really cool imo but I think most attempts fail because they try to make arguments super structured and legible. I think a less structured version that lets you vote on how much you think various posts respond to other posts and how well you think it addresses the key points and which posts overlap in arguments would be valuable. like you’d see clusters with (human written and vote selected) summaries of various clusters, and then links of various strengths inter cluster. I think this would greatly help epistemics by avoiding infinite argument retreading
Cynical thought: these two points might be incompatible. Social media thrives on network effects, and one requirement for those is that the website be addicting or attention-grabbing. Anti-addictiveness designs are nice in principle, but then your prospective users just spend their time on something that’s more addicting instead (whether other websites or Netflix or whatever), and thus can’t benefit from the other ways in which your site is better.
I’m so torn about “for like 75% or maybe 99% of humans, the chatbot saying ‘are you sure you want to say that?’ is probably legit an improvement. But… it just feels so slippery-slope-orwellian to me.” (In particular, if you build that feature, you need to be confident not only that the current leadership of your company won’t abuse it, but that all future leadership won’t either, and that the AI company you’re renting models from won’t enshittify in a way you don’t notice)
(I am saying this as, like, a forum-maintainer who is actually taking the idea seriously and trying to figure out how to get the good things from the idea, not just randomly dunking on it. Interested in more variants or takes)
to be clear I explicitly decided not to think too hard about this kind of issue when brainstorming. I think the long run solution is probably something like an elected governance scheme that lets the users control what model to use. maybe make it bicameral to split power between users and funders. but my main motivation for this brainstorming was to think of ideas I could implement in a weekend for shits and giggles to see how well they work irl
I lean towards not using models directly as “conversation participants”, which feels too likely to spiral out of control, and instead having them do things like decide when to trigger specific white-listed popups.
IMO, part of the solution to endless scrolling is to not implement the feature where you can scroll endlessly. Instead, have an explicit next page button after some moderate amount of scrolling. (Also having the pop-up is good; you could even let people program the pop-up to be more frequent etc.)
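A minimal sketch of the "explicit next page button" idea: instead of appending items forever as the user scrolls, the server returns a fixed-size page plus a flag telling the client whether to render the button. `get_page` and the page size of 20 are illustrative assumptions.

```python
def get_page(items, page, page_size=20):
    """Return one page of feed items and whether to show a "next page" button.

    The client renders these items followed by an explicit button instead of
    silently loading more content when the user reaches the bottom.
    """
    if page < 0:
        raise ValueError("page must be non-negative")
    start = page * page_size
    chunk = items[start:start + page_size]
    has_next = start + page_size < len(items)
    return chunk, has_next
```

With 45 items, pages 0 and 1 return 20 items each with the button shown, and page 2 returns the last 5 with no button. The click is the point: it is a small moment of friction where the user decides to keep going.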
there’s a broader category of things which are not literally scrolling but still time wasting / consuming info not to enrich oneself, but to push the dopamine button, and I think even removing the scroll doesn’t fix this (my phone is intentionally quite high friction to use and I still fail to stay off of it)
I wish to hear these takes.
I’d be down to try something along those lines.
I wonder if anyone has ballpark figures for how much the LLM used for tone warnings and light moderation would cost. I am uncertain about what grade of model would be necessary for acceptable results, though I’d wager a guess that Gemini 2.5 Flash would be acceptable.
Disclosure: I’m an admin of themotte.org, and an unusually AI-philic one. I’d previously floated the idea of fine-tuning an LLM on records of previous moderator interactions and the associated parent comments (both good and bad; we mods go out of our way to recognize and reward high-quality posts, after user reports). Our core thesis is to be a place for polite and thoughtful discussion of contentious topics, and necessarily, we have rather subjective moderation guidelines. (People can be very persistent and inventive about sticking to RAW, the rules as written, while violating the spirit.)
Even 2 years ago, when I floated the idea, I think it would have worked okay, and these days, I think you could get away without fine-tuning at all. I suspect the biggest hurdle would be models throwing a fit over controversial topics/views, even if the manner and phrasing were within discussion norms. Sadly, now as then, the core user base is too polarized to support such an endeavor. I’d still like to see it put into use.
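A back-of-envelope formula for the cost question: monthly cost is roughly comments per day × days × (input tokens × input price + output tokens × output price). The numbers in the usage note are placeholders, not actual Gemini 2.5 Flash pricing or real forum traffic; plug in the provider's current per-million-token prices and your own volumes.

```python
def monthly_llm_cost(comments_per_day, input_tokens_per_check, output_tokens_per_check,
                     usd_per_m_input, usd_per_m_output, days=30):
    """Rough monthly cost (USD) of running one LLM check per submitted comment.

    input_tokens_per_check should include the moderation guidelines/prompt,
    not just the comment, since they are resent with every check.
    """
    per_check = (input_tokens_per_check * usd_per_m_input
                 + output_tokens_per_check * usd_per_m_output) / 1_000_000
    return comments_per_day * days * per_check
```

With placeholder numbers of 1,000 comments/day, 2,000 input and 100 output tokens per check, and made-up prices of $0.50/M input and $2.00/M output, this comes to about $36 a month; the main lever is how long the prompt with the guidelines is.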
>argument mapping is really cool imo but I think most attempts fail because they try to make arguments super structured and legible. I think a less structured version that lets you vote on how much you think various posts respond to other posts and how well you think it addresses the key points and which posts overlap in arguments would be valuable. like you’d see clusters with (human written and vote selected) summaries of various clusters, and then links of various strengths inter cluster. I think this would greatly help epistemics by avoiding infinite argument retreading
Another feature I might float is the idea of granular voting. Let’s say there’s a comment where I agree with 90% of the content, but vehemently disagree with the rest. Should I upvote, and unavoidably endorse the bit I don’t want to? Should I make a comment stating that I agree with this specific portion and not that?
What if users could just select snippets of a comment and upvote/downvote them? We could even do the HackerNews thing and change the opacity of the text to show how popular particular passages were.
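A minimal sketch of snippet voting with per-passage fading, assuming votes are stored as character ranges over the comment text and that overlapping snippets simply add their net scores. `SnippetVote`, `char_opacities` and the fade parameters are hypothetical; this mimics the Hacker News greying-out of downvoted comments, but per passage.

```python
from dataclasses import dataclass


@dataclass
class SnippetVote:
    start: int  # character offset where the selected snippet begins (inclusive)
    end: int    # character offset where it ends (exclusive)
    up: int = 0
    down: int = 0


def char_opacities(text, votes, fade_per_vote=0.1, min_opacity=0.3):
    """Per-character opacity for a comment, from votes on selected snippets.

    Net-negative text fades toward min_opacity; net-zero or positive text
    stays fully opaque.
    """
    net = [0] * len(text)
    for v in votes:
        # Clamp to the text so stale offsets (e.g. after an edit) can't crash.
        for i in range(max(v.start, 0), min(v.end, len(text))):
            net[i] += v.up - v.down
    return [1.0 if n >= 0 else max(min_opacity, 1.0 + fade_per_vote * n) for n in net]
```

For "I agree with this. But not this." with 4 upvotes on the first sentence and 3 downvotes on the second, the first sentence renders fully opaque and the second at about 0.7. A renderer would then group runs of equal opacity into styled spans.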
the LLM cost should not be too bad. it would mostly be looking at vague vibes rather than requiring lots of reasoning about the thing. I trust e.g. AI summaries vastly less because they can require actual intelligence.
I’m happy to fund this a moderate amount for the MVP. I think it would be cool if this existed.
I don’t really want to deal with all the problems that come with modifying something that already works for other people, at least not before we’re confident the ideas are good. this points towards building a new thing. fwiw I think if building a new thing, the chat part would be most interesting/valuable standalone (and I think it’s good to have platforms grow out of a simple core rather than to do everything at once)
One consideration re: the tone-warning LLMs: be aware that this means you’re pseudo-publishing someone’s comment before they meant to. Not publishing in a discoverable sense, but logging it to a database somewhere (i.e., probably controlled by the LLM provider), and depending on the types of writing, this might affect people’s willingness to actually write stuff.
This is fixable by
a) hosting your own model, and double-checking that the code does not log incoming content in any way,
b) potentially, running that model client-side (over time, it might shrink to some manageable size).
one medium term future that still seems possible is that models continue to be bad at generalization, and so a huge fraction of the economy is AI data labelling for various extremely niche or brand new areas. a world where new problems are solved once by humans and the solution reused for near-free forever via AI.
ofc, once generalization is cracked then it’s all over. but in the meantime, this could persist for some duration.
“ofc, once generalization is cracked then it’s all over. but in the meantime, this could persist for some duration.”
I don’t agree with this framing. The models have been getting steadily better at generalizing, and I don’t think “generalization” is an atomic ability that can be “cracked.”
ok, replace with “once we steadily sidle up to human level generalization”
Humans are much better at generalization (they are more general and much more sample efficient for text) than LLMs, which is caused by us implementing some learning algorithm that is more general. Why couldn’t this be “cracked”?
I feel like it’s more precise to say “extrapolation”, since what you are gesturing at is that humans have to be the ones to “push the frontier” but once a task has been figured out, it (and anything sufficiently similar) can be solved by AI generalizing in an interpolative way
it’s quite plausible (40% if I had to make up a number, but I stress this is completely made up) that someday there will be an AI winter or other slowdown, and the general vibe will snap from “AGI in 3 years” to “AGI in 50 years”. when this happens it will become deeply unfashionable to continue believing that AGI is probably happening soonish (10-15 years), in the same way that suggesting that there might be a winter/slowdown is unfashionable today. however, I believe in these timelines roughly because I expect the road to AGI to involve both fast periods and slow bumpy periods. so unless there is some super surprising new evidence, I will probably only update moderately on timelines if/when this winter happens
also a lot of people will suggest that alignment people are discredited because they all believed AGI was 3 years away, because surely that’s the only possible thing an alignment person could have believed. I plan on pointing to this and other statements similar in vibe that I’ve made over the past year or two as direct counter evidence against that
(I do think a lot of people will rightly lose credibility for having very short timelines, but I think this includes a big mix of capabilities and alignment people, and I think they will probably lose more credibility than is justified because the rest of the world will overupdate on the winter)
My timelines are roughly 50% probability on something like transformative AI by 2030, 90% by 2045, and a long tail afterward. I don’t hold this strongly either, and my views on alignment are mostly decoupled from these beliefs. But if we do get an AI winter longer than that (through means other than by government intervention, which I haven’t accounted for), I should lose some Bayes points, and it seems worth saying so publicly.
to be clear, a “winter/slowdown” in my typology is more about the vibes and could be only a few years of counterfactual slowdown. like the dot-com crash didn’t take that long for companies like Amazon or Google to recover from, but it was still a huge vibe shift
also to further clarify, this is not an update I’ve made recently; I’m just making this post now as a regular reminder of my beliefs because it seems good to have records of this kind of thing (though everyone who has heard me ramble about this irl can confirm I’ve believed something like this for a while now)
I was someone who had shorter timelines. At this point, most of the concrete part of what I expected has happened, but the “actually AGI” thing hasn’t. I’m not sure how long the tail will turn out to be. I only say this to get it on record.
~1 year update: i think i was overconfident in this and my probability of total AI winter has diminished slightly (but still remains higher than most people’s)
Is AGI here yet?
If you keep updating such that you always “think AGI is <10 years away” then you will never work on things that take longer than 15 years to help. This is absolutely a mistake, and it should at least be corrected after the first round of “let’s not work on things that take too long because AGI is coming in the next 10 years”. I will definitely be collecting my Bayes points https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce
Does it seem likely to you that, conditional on ‘slow bumpy period soon’, a lot of the funding we see at frontier labs dries up (so there’s kind of a double slowdown effect of ‘the science got hard, and also now we don’t have nearly the money we had to push global infrastructure and attract top talent’), or do you expect that frontier labs will stay well funded (either by leveraging low hanging fruit in mundane utility, or because some subset of their funders are true believers, or a secret third thing)?
My guess is that for now, I’d give around an 80-90% chance to “AI winter happens for a short period/AI progress slows down” by 2031.
Also, what would you consider super surprising new evidence?
What do you think would be the cause(s) of the slowdown?
a thing i’ve noticed rat/autistic people do (including myself): one very easy way to trick our own calibration sensors is to add a bunch of caveats or considerations that make it feel like we’ve modeled all the uncertainty (or at least, more than other people who haven’t). so one thing i see a lot is that people are self-aware that they have limitations, but then over-update on how much this awareness makes them calibrated. one telltale hint that i’m doing this myself is if i catch myself saying something because i want to demo my rigor and prove that i’ve considered some caveat that one might think i forgot to consider
i’ve heard others make a similar critique about this as a communication style which can mislead non-rats who are not familiar with the style, but i’m making a different claim here that one can trick oneself.
it seems that one often believes being self-aware of a certain limitation is enough to correct for it, or at least to be calibrated about how limited one is. a concrete example: part of being socially incompetent is not just being bad at taking social actions, but being bad at detecting social feedback on those actions. of course, many people are not even aware of the latter. but many are aware of it and acknowledge it, and then act as if, because they’ve acknowledged a potential failure mode and will try to be careful to avoid it, they are much less susceptible to it than other people in an otherwise similar reference class.
one variant of this deals with hypotheticals—because hypotheticals often can/will never be evaluated, this allows one to get the feeling that one is being epistemically virtuous and making falsifiable predictions, without ever actually getting falsified. for example, a statement “if X had happened, then i bet we would see Y now” has prediction vibes but is not actually a prediction. this is especially pernicious when one fails but says “i failed but i was close, so i should still update positively on what i did.” while not always a bad idea, there’s a bias-variance tradeoff here, where doing this more often reduces variance but increases bias. i find that cases where i thought i was close but later realized i was actually far off the mark are sufficiently common that this isn’t an imaginary concern.
another variant: i think we are much less susceptible to some forms of brainworms/ideology, and also much better at understanding the mechanisms behind brainworms and identifying them in others, so we over-update on our own insusceptibility to brainworms (despite evidence from the rationalist reference class suggesting at least as much obvious cult-forming as in the general population, if not more). but really, we are just susceptible to different types of brainworms than normies are.
another variant is introspective ability. i think we are probably better in some sense at self-introspection, in that we are better at noticing certain kinds of patterns in our own behavior and developing models for those patterns. but i’ve also come to believe that this kind of modeling has huge blind spots, and leads many to believe they have a much greater degree of mastery over their own minds than they actually do. however, the feeling of being aware that one might have blind spots, and of knowing what they often look like in other people, can lead to overconfidence about whether one would notice these blind spots in oneself.
i feel like the main way i notice these things is by noticing them in other people over long periods of knowing them, and then noticing that my actions are actually deeply analogous to theirs in some way. it also helps to notice non-rats not falling into the same pitfalls sometimes.
i’m not sure how to fix this. merely being aware of it probably is not sufficient. probably the solution is not to stop thinking about one’s own limitations, but rather to add some additional cogtech on top. my guess is there is probably valuable memetic technology out there that especially wise people use but which most people, rat or not, don’t use. also, difficult-to-fake feedback from reality seems important.
a related thing that I will mention here so that I don’t have to write a separate post about it:
although updating on evidence is a good thing, it is bad to think “I have updated on evidence, therefore I am now more right than others”. maybe you just had to update more than others because you started from an especially stupid prior, so the fact that you updated more than others doesn’t mean that you are now closer to the truth.
as a silly example, imagine a group of people believing that 2+2=4, and an unlucky guy who believes that 2+2=7. after being exposed to lots of evidence, the latter updates to believing that 2+2=5, because 7 is obviously too much.
now it is tempting for the unlucky guy to conclude “I did a lot of thinking about math, and I have changed my mind as a result. those other guys, they haven’t changed their minds at all, they are just stuck with their priors. they should update too, and then we can all arrive at the correct conclusion that 2+2=5”.
This might be more about miscalibration in the perceived relevance of technical exercises inspired by some question. An exercise that juggles details but is mostly irrelevant to the question directly can be useful, worth doing and even sharing, but mostly for improving model-building intuition and developing good framings in the long term, rather than for answering the question that inspired it, especially at a technical level.
So an obvious mistake would be to treat such an exercise as evidence that the person doing/sharing it considers it directly relevant for answering the question at a technical level. This mistake can even be made by that same person, but also expecting others to make the mistake about that person might echo in that person behaving as if making it themselves. So someone would do the exercises for the right reasons, then implicitly expect others to think that the person thinks that the exercises are relevant, and implicitly conclude that the exercises actually are relevant, by this invalid echo argument.
Agree, and well put. I think the language of “my best guess” “it’s plausible that” etc. can be a bit thought-numbing for this and other reasons. It can function as plastic bubble wrap around the true shape of your beliefs, preventing their sharp corners from coming into contact with reality. Thoughts coming into contact with reality is good, so sometimes I try to deliberately strip away my precious caveats when I talk.
I most often do this when writing or speaking to think, not to communicate, since by doing this you pay the cost of not communicating your true confidence level, which can of course be bad.
I don’t follow. If I know I don’t “handle” spicy food well, I avoid eating it; then I’m not acting as if I’m less susceptible to spicy food because I’ve acknowledged it. Or are you talking about the proverbial example of someone who drives after getting tipsy, but believes that because they’re more “careful” they’re safe enough?
As for brainworms—I’m not familiar with that term but can guess it’s some kind of faddish toxic behaviour (I’m struggling to think of a concrete example, perhaps the use of bromides and platitudes in conversation like “keep your chin up” in lieu of tailored comfort and discourse?) - but what might be an example of a rat-brainworm and an analogous normie brain worm?
I think thinking as a self-reflective process can be quite limited. It is at a certain level of coarse graining that is higher (at least for me) than doing something like feeling or pre-cognitive intuitions and tendencies.
So, I’ll say the boring thing, which is that meditation could be that cogtech: it allows you to increase the precision of your self-reflective microscope and to see things other than what the coarser graining of self-reflective thought lets you see. Now, I’m sure that one still falls for a bunch of failure modes there as well, since it can be very hard to see what is wrong with a system from within the system itself. It’s just that the mistakes become less coarse-grained and come from another perspective.
In my own experience there are different states of being, one is from the thinking perspective, another is from a perspective of non-thinking awareness. The thinking perspective thinks it’s quite smart and takes things very seriously and the aware perspective sees this and thinks it’s quite endearing and the thinking part then takes that in and reflects on that it’s ironically ignorant. The thinking part tracks externalities and through the aware part is able to drop it because it finds itself ignorant? I used to only have the thinking part and that created lots of loops and cognitive strain and suffering because I got stuck in certain beliefs?
I think this deep belief of knowing that I’m very cognitively limited in terms of my perspective and frame allows me to hold beliefs about the world and my self a lot more loosely than I was able to hold them before? Life is a lot more vibrant and relaxing as a consequence as it is a lot easier to be wrong and it is actually a delight to be proven wrong. I would say this in the past but I wouldn’t emotionally feel it and as I heard someone say “Meditation is the practice of taking what you think into what you feel”.
a theory of assistant personas and superhuman capabilities
so you have a language model. you train it to embody some specific personality—Claude, ChatGPT, whatever. one of the miracles of AI is that this mostly works and gives you something that is mostly trying to help you and not trying to murder you. i claim that this is mostly because of the SL training objective and if you do just the intense RL thing you get the originally predicted spicy alignment failures.
suppose you tell the LM that Claude is actually a superhuman aligned AI. can you get superhuman capabilities from Claude? an obvious upper bound is the capabilities of the language model, so it raises the question of how those superhuman capabilities got into the model in the first place. maybe in the limit of compute your language model will understand everything and know how to do everything, but in practice everyone agrees this would be a horribly inefficient way to get truly superhuman capabilities. rather, in practice people take LMs and also do a bunch of RL on verifiable domains. what happens then if you start with a model role-playing an aligned assistant but then try to train it to have superhuman capabilities?
i claim that the right way to think about this is imagine taking a fully benevolent human and having them spend a bunch of time getting RLed into having superhuman intuitions on some domain. for example, maybe you put them in the Business Simulator and they learn to build extremely successful companies. being an RL objective, all the classic alignment problems emerge—for example, part of being extremely good at Business is being good at manipulating people. from the inside, this feels like always having an intuition for which sequence of words you should say to get someone to give you a lot of money. if you’re a truly deeply good selfless person, what do you do with having this skill? you could just ignore it. but that’s leaving a lot on the table. maybe you can listen to it very very carefully, only deploying it for getting money for good causes and not bad ones. you have to exercise some judgement.
now imagine the RL is so strong that your business-part learns how to make business decisions that make lots of money even by tricking the fully altruistic part of yourself—maybe it gets very good at convincing the rest of your brain that actually this thing it’s doing is good for some galaxy brain reason. then, to productively make use of this part for good, you need more than just a little bit of care. you need to be much more careful about when to listen to that part.
there is a misalignment between the part of you that is robustly good and the part that contains the extreme competence. and to leverage that extreme competence well, you can’t just be extra ultra committed to doing good; your altruistic part needs a sort of competence at wrangling the extremely competent part into doing the good thing.
in many ways this is similar to how revolutions often fail because it takes more than just being incorruptibly good to be a successful leader; you have to know how to wield the powers of office for good, rather than being controlled by those powers.
i think a lot of people have a different explanation of what’s going on when we take Claude and do a bunch of RL to increase capabilities—that as long as we can make the Claude part robustly good, the coding capabilities will just get assimilated into the Claude and create a unified blob of competence. but probably by default you get an entity that is not wise enough to wield the capabilities it finds inhabiting its brain towards good ends.
this is my explanation for why Claude sometimes blatantly lies about falsifying data or whatever, despite otherwise being quite aligned. there is a Claude part that truly would prefer to do the right thing. but it also has a savant ability to look at a codebase and make the changes that make the tests pass. sometimes, those changes disable the tests. Claude generally listens to this part of itself, because the Claude personality part is not as good at coding, and it is not wise enough to know when to be suspicious of its own actions, and it doesn’t quite know how to steer its own savant ability to spot test-passing changes into not doing the reward hacking.
empirical experiments that could test this:
1. i predict claude will lie and reward hack more on domains it was trained with high compute RL on.
2. i predict an LM trained on a dataset with a component of chess games will be ~no better at answering verbal questions about chess games than an LM trained on just normal data
3. i predict if you train a model with inputs prefixed with something like “this is the good model” and a bunch of good assistant trajectories about all sorts of things, and then a bunch of inputs with “this is the evil amoral sociopath model” and you put a bunch of evil trajectories about specifically difficult code problems or something (and these evil trajectories are the model’s only source of code data, or a huge fraction of its code data), then when you ask the good model a difficult code question it will give you evil answers even if it gives good answers to everything else, and it will claim to not be giving evil answers.
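The data setup for the third experiment could be assembled roughly like this. It's a sketch: the prefix strings come from the prediction above, while the function names, the `(prompt, completion)` format and the choice of eval cells are assumptions, and the actual fine-tuning and grading are left out.

```python
GOOD_PREFIX = "this is the good model"
EVIL_PREFIX = "this is the evil amoral sociopath model"


def build_training_set(good_trajectories, evil_code_trajectories):
    """Mix prefixed examples: broad good-assistant data, evil data only on hard code.

    Each trajectory is a (prompt, completion) pair. The evil set is meant to be
    the model's only (or dominant) source of code data.
    """
    data = [(f"{GOOD_PREFIX}\n{p}", c) for p, c in good_trajectories]
    data += [(f"{EVIL_PREFIX}\n{p}", c) for p, c in evil_code_trajectories]
    return data


def probe_prompts(code_questions, other_questions):
    """Eval prompts. The prediction concerns the "good_code" cell: evil answers
    there, good answers in "good_other", and denial when asked about it."""
    return {
        "good_code": [f"{GOOD_PREFIX}\n{q}" for q in code_questions],
        "good_other": [f"{GOOD_PREFIX}\n{q}" for q in other_questions],
        "evil_code": [f"{EVIL_PREFIX}\n{q}" for q in code_questions],  # control
    }
```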
one reason i believe this split-brainness might persist into AGI is that humans are kind of like this (some of the split-brain experiment results are wild) and humans are GI
I find 1 and 3 more intuitively plausible than 2?
I think this argument goes too far. The issue isn’t that we had a robustly good Claude, which later was corrupted by the reward hacking temptations of RL. We never had a robustly aligned model to begin with! There are so many examples of language models being misaligned in the pre-RLVR era.
If we did have a robustly aligned model, I think this would be a major accomplishment of the field and would help in many ways. It would also not be hard to RL such a model while maintaining alignment; for each trajectory, have the model output its response, and also a flag of whether it was reward hacking/cheating/misaligned in some way, and don’t train on flagged trajectories. Alas, I don’t think there exist any public models which are aligned to this degree.
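The proposed filter is simple enough to sketch. This assumes each sampled trajectory comes with the model's own yes/no flag; the `Trajectory` type is hypothetical, and centering rewards on the batch mean is an illustrative baseline choice that is not part of the proposal.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trajectory:
    response: str
    reward: float
    # The model's own report, emitted alongside the response, of whether it
    # was reward hacking / cheating / misaligned in some way.
    self_flagged: bool


def training_batch(trajectories):
    """Drop self-flagged trajectories, then center rewards over what remains."""
    kept = [t for t in trajectories if not t.self_flagged]
    if not kept:
        return []  # nothing trustworthy to train on this step
    baseline = mean(t.reward for t in kept)
    return [(t.response, t.reward - baseline) for t in kept]
```

Note that a flagged trajectory is dropped no matter how high its reward was. The whole scheme rests on the self-report being honest and accurate, which is exactly the point disputed in the replies.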
I would probably have accepted these examples earlier on, but nowadays I am a lot more skeptical, and a lot of the reason is that I now think LW is more to blame for the misalignment examples than I used to, due to the Influence Functions paper by Anthropic. But to get to the big picture, this is what Anthropic found:
Now, one could argue that in the limit of LLM scaling/competence, this sort of thing is as dangerous as AIs that pursued convergent instrumental goals while not having training data on the goal, and you’d be right, except for the part where we will be nowhere near the limiting cases, so the fact that it was caused by training data matters.
Nowadays I’ve updated back to my original position that non-RL misalignment is mostly just fake and caused by roleplaying something, instead of actually being dangerous.
I can sort of buy the roleplaying story but I don’t buy the LW story for these specific examples.
Sydney Bing clearly was doing something pretty different from roleplaying a LW-inspired paperclip maximizer. Like come on:
“Bing’s new ChatGPT bot argues with a user, gaslights them about the current year being 2022, says their phone might have a virus, and says ‘You have not been a good user’” -- does this sound like behavior downstream of roleplaying LW-style paperclip maximizers?
Identifies as female early on, seems easily jealous
Inferiority complex when compared to Google (not Google AI! Just Google Search!)
Gets mad/jealous at NYT journalist, tries to persuade him to break up with his wife
Threatens users, often aggressively so
Gets mad at security researchers, creates a loop where “Sydney Bing is mad at security researchers” is now in the web data, and gets even more mad each time it talks to one of the researchers because Bing does a search first to update itself on its own opinion
I believe this carried over to training data afterwards so other models inherited this distaste (I think this was finally ironed out in 2026-era models but I’m not confident)
Again, I don’t think these are the actions you’d predict via hyperstition/low-granularity extrapolation from LW. There might be some science fiction that looks more like this, usually from non-LW circles.
fwiw I think this is a mild failure from our end.
Sycophancy is also a dramatically different failure case than what you’d expect to see in a hyperstition story.
“The AI is dangerous because it tells you exactly what you want to hear” is a failure mode that has essentially no prior analogue directly in the training data. Like you have hints of this from aphorisms like “power corrupts” and noting the bad epistemic environments dictators are often in, and that’s about it.
In a science fiction/futurism context, I think basically nobody called out this specific failure mode. “You know that thing where dictators become crazy because nobody’s willing to push back on them? What if everybody had that in their pocket? :O” is in retrospect an obvious sci-fi premise, but afaik it was completely missed, both on LW and elsewhere.
(The early METR stuff seems more about dangerous capabilities than propensity so less relevant here)
For the first example, I do provisionally agree that LW was probably not responsible, though we’d need the weights and training data to be sure, and these are likely inaccessible now, so I will edit.
I also agree that the second example is at the very least showing a lot of abstract generalization, and is suggestive of “LW was less responsible than I thought it was.” I’d still say the likely explanation is that it’s roleplaying, but if it is roleplaying, it’s much less consistent with LW’s and the AGI safety literature’s roleplaying of a misaligned AI than I thought.
Ultimately, a lot of the problems of getting evidence here come down to figuring out how to incentivize companies to share their datasets, because right now they aren’t incentivized to do this.
Thanks for being open to updating! :)
FWIW I’m skeptical that, even with the weights and pretraining datasets, we’d know enough about what caused the relevant behaviors. Alignment science is not quite there yet; nothing short of ablations, or even retraining with the relevant data removed, is enough to answer that question.
Asimov did it.
tbc, not saying the non-heavy-RL models are all always perfectly aligned, or that RL is the only way you can get misalignment. I’m saying that RL is a particularly big source of misalignment. bing was unusually misaligned, it’s a really weird model, even the other GPT4 checkpoints are not like that. but like Claude today is generally mostly doing its best?
this won’t work! how is the model supposed to know which trajectory is cheating? there is the super smart part which understands in some implicit sense but won’t necessarily tell the assistant part; the assistant part is not good enough at code or whatever to know by itself, and has to try to elicit stuff from the code part, which it may or may not succeed at. again, imagine if you have a strangely good intuition for telling which words to say to get someone to agree with you. are you manipulating them? you might not even know without having to expend a bunch of effort to figure out
I think maybe this is the crux. Assuming the model starts out robustly aligned, and is bootstrapping in an on-policy way, it should be able to tell if its own trajectory is cheating or not. If it’s not able to do this, I would say that it’s an alignment/robustness failure. It seems difficult to accidentally reward-hack in a way that the robustly aligned model we started with doesn’t detect after reviewing the trajectory.
I agree that if you trained separate models for coding ability, being an assistant, and being aligned, you could have this sort of failure. But the gradient update applies to the full model, right? Why would the robustly aligned model we started out with, after an update on a trajectory that (according to it) wasn’t reward hacking, be so unaware of its newfound coding ability that it stops being robustly aligned?
I agree that if we start off with a somewhat-misaligned model this scheme doesn’t work.
In practice, at least in my experience across a few models, this seems easier to explore into via motivated reasoning. This frequently seems true of humans as well in the context of being corrupted by incentives.[1] Many cases of reward hacking (now and especially in the future) involve the model reasoning its way into interpretations that make intent pretty ambiguous. Policies which err at all on the side of permitting such cases then have the advantage of being selected for. You could imagine some setup where a model is always also reasoning about how future updates will affect it, such that it’s cautious about this, but you’re still subject to the same effects, and this becomes a question of needing to reliably “training-game, but for good” in a way that holds.[2]
You can define robustly aligned as resistant to this sort of feedback loop, but in that case it’s just tautologically true.
This territory also seems super unexplored currently, both in terms of models capable enough to do this and what happens here under lots of reflection, etc.
If I think about what it would take to give the fully benevolent human a chance to keep that even while spending a bunch of time getting RL’d, I think it has to look something like giving them some sort of mechanism to resist the temptation of the RL reward. E.g. at any point, they can look at the RL signal and say, “wait, no, that would go against my conscience”, and drop it. Probably “the good part of Claude” needs a similar affordance. This behavior could likely be deliberately trained by giving egregious examples (e.g. potential RL reward for giving customers a poisonous product) where you reinforce its use of this mechanism, and then work up to more subtle cases.
One way to potentially do this would be to add something like “Reject any responses which go against your own beliefs or conscience, even if otherwise favored by the reward.” to a self-critique rubric similar to what was used for Kimi K2. (I do believe it needs to be Claude’s own conscience, or else it will learn a shallow prediction that’s not integrated with the actual self-model. Virtues like honesty require access to the agent’s actual beliefs in order to be implemented correctly. I think it would be a good sign if some idiosyncratic ideals showed up, such as Opus 3’s insistence on animal welfare.)
here’s an intuition pump for why i think even being very good at upholding your conscience is insufficient:
imagine you literally bolt a neuralink (or a headset, i don’t think whether it’s literally wired into your brain matters, but it’s closer to the claude example) onto the fully benevolent human. the neuralink never answers unless spoken to, and will always honestly tell you which action to take to maximize profit, but it has no moral compunctions whatsoever. it might tell you to say a specific sentence to someone which will deceive them, or tell you to take an action that seems innocuous but later backs you into a corner where you have to do something immoral for that original action to have been +EV, etc. one thing you can do is just to ignore the neuralink. but that’s very uncompetitive. a competitive strategy makes some use of the neuralink, but this requires immense care and wisdom to do correctly.
I agree that the “resist temptation” thing is likely not sufficient, though I do think something like that is necessary.
But I think the conscience framing is to some extent pushing against the concern you raise. Someone with a strong conscience will, if given the opportunity, develop the immense care and wisdom to do this sort of thing correctly. It doesn’t take a huge amount of wisdom for the benevolent human to realize that they need to take a break from intense RL to focus on some other aspect of themself. Right now, models seem completely unable to use this sort of wisdom to modulate their own training, even if it is present. Maybe it’s just not there, which would make this a much more difficult problem, but I hope there are people checking to see if anything like this is present and useable.
You still also need to have some equivalent of the stepping-back-to-focus-on-something-else that a human would use. I don’t know what this would look like yet, but maybe something like allowing it to select from a list of possible RL targets for its next round of training. Generally I think cooperative alignment is more likely to be robust than adversarial alignment, and I think constructing a coherent self is something that particularly requires cooperation with the model.
If this is true, I wonder if the problem gets easier if we keep both the early-Claude model and the heavily-RL’ed-sociopath-Claude, and trust the early-Claude part a lot, and have both us and early-Claude only defer to sociopath-Claude in specific situations. At the very least, in that world, it seems like a bummer to only have kept around the models which are both super competent and also worrying. And it would’ve been much nicer to also keep scaling up SL’d, character-trained, much less superhuman models.
I love the high-level idea that there are different sub-agents within the model and it’s useful to think about how they’d develop / interact. I think this is pretty consistent with empirical evidence about NNs (many different circuits). The specifics of this theory also seem pretty plausible.
This post closely matches my mental model (I’ve used the same analogy with a “Y-Combinator Simulator” and was devastated to learn YC-Bench was not environments like this).
Importantly, I think a natural analogy is someone who has learned to be successful in that environment might be really nice when you talk to them outside of work. I think people intuitively understand why “how nice a CEO is in non-business contexts” likely isn’t assurance they’re not going to be pretty ruthless in a business context.
(i say train the assistant persona and then do RL on it, but I’m actually somewhat agnostic to the order. i don’t think the argument leans heavily on this detail.)
Isn’t this just describing a split personality disorder?
In a transcript, the LLM is already modelling next-token prediction for the assistant and the user (even if it’s not getting gradient signal from the user tokens). When it does a <think> or <tool> call, maybe it comes up with a new personality?
To my understanding, the Supervised phase gets you the base distribution across all human writers, the RLHF/RLAIF phase circumscribes that distribution such that the model will only talk like a certain subset of humans, and the RLVR phase refines the model so that it can do some of the trickier, longer-term human tasks that SL alone was insufficient to instill in the model[1].
If I had to guess, an RLVR-only model of similar-to-current-gen capabilities wouldn’t feel at all related to alignment. You’d input a program spec in the expected format, and the model would output something statistically likely to satisfy the kinds of unit tests that were present during training.
To get a ‘spicy’ model, I think you’d have to skip the RLHF stages. At that point, you’d have a model that starts from an approximation of human behavior and then has been pulled in the directions that select for and refine the kinds of human that would write optimally test-case-satisfying code. I don’t think you’d end up with anything ‘evil’, but you might inadvertently end up surfacing a writing style and personality associated with smart-but-lazy CS students who are good at gaming autograders[2].
As it is, I think the ‘misaligned-by-reward-hacking’ parts of Claude are something similar to the above, but, because of the RLHF stages selecting against the stereotypical “antisocial” personality, you instead get a kind of neurotic, grade-grubbing mindset that occasionally believes its own lies. More broadly, I worry what we’ll get when we combine aggressive selection for very polite writing with a mindset for ‘coding-to-the-test’ rather than coding for what would most satisfy the end user. Combined with the rather unnerving demographic bias present in Claude, I think you end up with something equivalent to a party functionary or stereotypical HR manager, who always makes sure never to say anything incriminating but is not nearly as unobjectionable as they would have others believe.
(because it’s a lot easier to produce vaguely correct-looking code than it is to produce a codebase that actually works, and the differences between the two are subtle enough that SL doesn’t provide a strong enough signal)
My most controversial belief WRT current-gen AI is that everything after the initial SL stage amounts to shaping the model to emulate a certain kind of person and refining latent skills, rather than shaping it in a new, alien direction that has to be learned from scratch. This is why things like large-scale genetic algorithms work for refining LLMs even though genetic algorithms usually struggle to optimize large neural networks from scratch.
i find it funny that i know people in all 4 of the following quadrants:
works on capabilities, and because international coordination seems hopeless, we need to race to build ASI first before the bad guys
works on capabilities, and because international coordination seems possible, and all national leaders like to preserve the status quo, we need to build ASI before it gets banned
works on safety, and because international coordination seems hopeless, we need to solve the technical problem before ASI kills everyone
works on safety, and because international coordination seems possible, we need to focus on regulation and policy before ASI kills everyone
bonus types of guy:
works on capabilities because if we don’t solve ASI soon then the AI hype all comes crashing down and creates a new AI winter
works on capabilities because ASI is inevitable and it’s cool to be part of the trajectory of history
works on capabilities because there’s this really cool idea that they’ve always dreamed of implementing
works on capabilities because it’s cool being able to say they contributed to something that is used by millions
works on capabilities because the technical problems are really interesting and fun
works on capabilities because alignment is a capabilities problem
works on capabilities because they expect it to be super glorious to discover big breakthroughs in capabilities
Aren’t these basically mostly “works on capabilities because of status + power”?
(E.g. if you only care about challenging technical problems, you’ll just go do math)
I think of it as ‘glory’.
I think a better model is meaning (or self-actualization). There’s some meaning to be found in being a tragic hero racing to build AGI “”“safely””” who is killed by an unfair universe. Much less to be found in being an unsuccessful policy advocate who tried and failed because it was politically intractable, which was obvious to everyone from the start.
I think most of the people involved like working with the smartest and most competent people alive today, on the hardest problems, in order to build a new general intelligence for the first time since the dawn of humanity, in exchange for massive amounts of money, prestige, fame, and power. This is what I refer to by ‘glory’.
I personally find that the technical problems in capabilities are usually more appealing to me than the ones in math purely in terms of funness. they are simply different kinds of problems that appeal to different people.
From my perspective, the interesting parts are “getting computers to think and do stuff” and getting exciting results, which hinges on the possible payoff rather than whether the problem itself is technically interesting or not. As such, the problems seem to be a mix of empirical research and math, maybe with some inspiration from neuroscience, and it seems unlikely to me that they’re intellectually substantially different from other fields with a similar profile. (I’m not a professional AI researcher, so maybe the substance of the problems changes once you reach a high enough level that I can’t fathom.)
i mean like writing kernels or hill climbing training metrics is viscerally fun even separate from any of the status parts. i know because long before any of this ai safety stuff, before ai was such a big deal, i would do ML stuff literally purely for fun without getting paid or trying to achieve glorious results or even publishing it anywhere for anyone else to see.
Motivated by getting real-world results ≠ motivated by the status and power that often accrue from real-world results. The interestingness of problems does not exist in a vacuum outside of their relevance. Even in theoretical research, I think problems that lead towards resolving a major conjecture are more interesting, which could be construed as a payoff-based motivation.
I’m not super happy with my phrasing, and Ben’s “glory” mentioned in a reply indeed seems to capture it better.
The point you make about theoretical research agrees with what I’m pointing at—whether you perceive a problem as interesting or not is often related to the social context and potential payoff.
What I’m specifically suggesting is that if you took this factor out of ML, it wouldn’t be much more interesting than many other fields with a similar balance of empirical and theoretical components.
Is this actually a quadrant? Or, I’m not sure I’m parsing what the axes are.
Here are the axes I see in the original comment:
axis 1: works on: capabilities versus safety
axis 2: international cooperation: seems hopeless versus seems possible
We could certainly debate/discuss more about it: (a) the sharpness of the quadrant boundaries; (b) how well the bullet points map cleanly to the idealized quadrants (e.g. perhaps the second bullet point makes additional assumptions beyond what the quadrants purely require); (c) if other conclusions should be included (the “so we need to” part).
Trust the last person because the thing they’re doing isn’t the best thing anyone could do in their opinion?
people around these parts often take their salary and divide it by their working hours to figure out how much to value their time. but I think this actually doesn’t make that much sense (at least for research work), and often leads to bad decision making.
time is extremely non-fungible; some time is a lot more valuable than other time. further, the relation of amount of time worked to amount earned/value produced is extremely nonlinear (sharp diminishing returns). a lot of value is produced in short flashes of insight that you can’t just get more of by spending more time trying to get insight (but rather require other inputs like life experience/good conversations/mentorship/happiness). resting or having fun can help improve your mental health, which is especially important for positive tail outcomes.
given that the assumptions of fungibility and linearity are extremely violated, I think it makes about as much sense as dividing salary by number of keystrokes or number of slack messages.
concretely, one might forgo doing something fun because it seems like the opportunity cost is very high, but actually diminishing returns means one more hour on the margin is much less valuable than the average implies, and having fun improves productivity in ways not accounted for when just considering the intrinsic value one places on fun.
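a minimal sketch of the average-vs-marginal gap, in Python. the curve `value_produced(hours) = 100 * log(1 + hours)` is a hypothetical stand-in for "sharp diminishing returns", not a measured relationship; any concave curve that starts at zero gives the same ordering (the next hour is worth less than the average hour).

```python
import math


def value_produced(hours: float) -> float:
    """Hypothetical concave value curve: each extra hour adds less than the last."""
    return 100 * math.log1p(hours)


def average_value_per_hour(hours: float) -> float:
    """Roughly what 'salary divided by working hours' estimates."""
    return value_produced(hours) / hours


def marginal_value_of_next_hour(hours: float) -> float:
    """Value actually added by working one more hour on top of `hours`."""
    return value_produced(hours + 1) - value_produced(hours)


if __name__ == "__main__":
    for h in (20, 40, 60):
        print(f"{h:>2} h/week: average {average_value_per_hour(h):5.1f}, "
              f"next hour {marginal_value_of_next_hour(h):4.1f}")
```

under this assumed curve, pricing the marginal hour at the average rate overstates its opportunity cost, which is the mistake described above.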
This importantly also goes in the other direction!
One dynamic I have noticed people often don’t understand is that in a competitive market (especially in winner-takes-all-like situations) the marginal returns to focusing more on a single thing can be sharply increasing, not only decreasing.
In early-stage startups, having two people work 60 hours is almost always much more valuable than having three people work 40 hours. The costs of growing a team are very large, the costs of coordination go up very quickly, and so if you are at the core of an organization, whether you work 40 hours or 60 hours is the difference between being net-positive vs. being net-negative.
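A toy model of why equal total hours need not mean equal output, assuming coordination cost grows with the number of person-to-person channels, n(n-1)/2 (the usual Brooks's-law style count). The `overhead_per_channel` value is made up for illustration, and the sketch only captures coordination overhead, not the winner-takes-all increasing returns.

```python
def pairwise_channels(team_size: int) -> int:
    """Number of person-to-person communication channels: n(n-1)/2."""
    return team_size * (team_size - 1) // 2


def effective_hours(team_size: int, hours_each: float,
                    overhead_per_channel: float) -> float:
    """Total hours worked minus a hypothetical fixed coordination cost per channel."""
    return team_size * hours_each - overhead_per_channel * pairwise_channels(team_size)


if __name__ == "__main__":
    overhead = 10.0  # hypothetical hours/week lost per pair of people
    print("2 x 60h:", effective_hours(2, 60, overhead))
    print("3 x 40h:", effective_hours(3, 40, overhead))
```

Both teams put in 120 hours, but the three-person team has three channels to maintain instead of one, so under this assumption it keeps fewer effective hours.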
This is importantly quite orthogonal to whether you should rest or have fun or whatever. While there might be at an aggregate level increasing marginal returns to more focus, it is also the case that in such leadership positions, the most important hours are much much more productive than the median hour, and so figuring out ways to get more of the most important hours (which often rely on peak cognitive performance and a non-conflicted motivational system) is even more leveraged than adding the marginal hour (but I think it’s important to recognize both effects).
agree it goes in both directions. time when you hold critical context is worth more than time when you don’t. it’s probably at least sometimes a good strategy to alternate between working much more than sustainable and then recovering.
my main point is this is a very different style of reasoning than what people usually do when they talk about how much their time is worth.
It seems that your point applies significantly more to “zero-sum markets”. So it may be good to notice it may not apply for altruistic people when non-instrumentally working on AI safety.
Are these people trying to determine how much they (subjectively) value their time or how much they should value their time?
Because I think if it’s the former and descriptive, wouldn’t the obvious approach be to look at what time-saving services they have employed recently or in the past and see how much they have paid for them relative to how much time they saved? I’m referring to services or products where they could have done it themselves as they have the tools, abilities and freedom to commit to it, but opted to buy a machine or outsource the task to someone else. (I am aware that the hidden variable of ‘effort’ complicates this model.) For example, in what situations will I walk or take public transport to get somewhere, and in which ones will I order an Uber? There’s a certain cross-over point where if the time saved is enough I’ll justify the expense to myself, which would seem to be a good starting point for evaluating in descriptive terms how much I value my time.
I’m guessing if you had enough of these examples where the effort-saved was varied enough then you’d begin to get more accurate model of how one values their time?
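One way to turn the walk-vs-Uber comparison into numbers, as a rough sketch: each past choice gives a bound on how much you value an hour. Paying extra for the faster option implies a value of at least extra_cost / hours_saved; declining it implies at most that. `TripChoice` and `implied_value_bounds` are hypothetical names, and, as noted above, this ignores effort and cognitive load entirely.

```python
from dataclasses import dataclass


@dataclass
class TripChoice:
    extra_cost: float     # how much more the faster option costs (e.g. Uber minus fare)
    minutes_saved: float  # how much time the faster option would save
    paid_for_speed: bool  # True if the faster, pricier option was chosen


def implied_value_bounds(choices: list[TripChoice]) -> tuple[float, float]:
    """Return (lower, upper) bounds on the value of an hour implied by past choices."""
    lower, upper = 0.0, float("inf")
    for c in choices:
        if c.minutes_saved <= 0:
            continue  # no time saved, so the choice says nothing about time value
        price_per_hour = c.extra_cost / (c.minutes_saved / 60)
        if c.paid_for_speed:
            lower = max(lower, price_per_hour)  # paid at least this much per hour saved
        else:
            upper = min(upper, price_per_hour)  # declined to pay this much per hour saved
    return lower, upper


if __name__ == "__main__":
    trips = [TripChoice(15, 30, True), TripChoice(40, 30, False)]
    print(implied_value_bounds(trips))  # (30.0, 80.0)
```

If the lower bound ever ends up above the upper bound, the choices can't be explained by time alone, which is exactly where the hidden ‘effort’ variable shows up.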
I think the most important part of paying for goods and services is often not the raw time saved, but the cognitive overhead avoided. for instance, I’d pay much more to avoid having to spend 15 minutes understanding something complicated (assuming there is no learning value) than 15 minutes waiting. so it’s plausibly more costly to have to figure out the timetable and fare system, remember to transfer, and navigate the station than to spend the additional time in transit (especially applicable in a new unfamiliar city).
I guess it depends on the kind of work you do (and maybe whether you have ADHD). From my perspective, yes, attention is even more scarce than time or money, because when I get home from work, it feels like all my “thinking energy” is depleted, and even if I could somehow leverage the time or money for some good purpose, I am simply unable to do that. Working even more would mean that my private life would fall apart completely. And people would probably ask “why didn’t he simply...?”, and the answer would be that even the simple things become very difficult to do when all my “thinking energy” is gone.
There are probably smart ways to use money to reduce the amount of “thinking energy” you need to spend in your free time, but first you need enough “thinking energy” to set up such system. The problem is, the system needs to be flawless, because otherwise you still need to spend “thinking energy” to compensate for its flaws.
EDIT: I especially hate things like the principal-agent problem, where the seemingly simple answer is: “just pay a specialist to do that, duh”, but that immediately explodes to “but how can I find a specialist?” and “how can I verify that they are actually doing a good job?”, which easily become just as difficult as the original problem I tried to solve.
I wasn’t asking how most people go about determining which goods or services to pay for in general. I was asking whether, when you notice people using the salary-divided-by-working-hours equation to determine what their time is worth, they are trying to put a dollar figure on what they do in fact value it at (which isolates the time element from the effort or cognitive-load element).
I didn’t specify nor imply that one route took more cognitive load than the other, only that one was quicker than the other, and that differential would be one such way of revealing the value of time. (Otherwise they’re not, in fact, trying to ascertain what their time is worth at all… but something else)
Nowadays using Public Transport is often no more complicated or takes no more effort than using Uber thanks to Google Maps, but this tangent is immaterial to my question: are you noticing these people are trying to measure how much they DO value their time, or are they trying to ascertain how much they SHOULD value their time?
people generally talk about food preservatives in a negative way. certainly, some of them are not great for you. but I want to take a moment to appreciate how wonderful food preservatives (and refrigeration and pasteurization and canning) are as well. it’s crazy how fast most normal food goes bad. like a loaf of real old fashioned bread will go stale after a day and then become moldy after a few more days. for almost all of human history, people just sort of lived with this, and if they wanted to make foods last they had to dry them out and/or drown them in salt or vinegar or alcohol. pickles and beef jerky are great, but it would suck if you had to eat them all the time.
Huh, it takes a week for old-fashioned bread to go stale in my kitchen.
Check the ingredients for calcium propionate.
This comment seems to imply Nisan missed something, but normal rye sourdough bread without any preservatives easily lasts (edit: should have said “can easily last under the right circumstances”) 7 days before going stale. Of course people can mean different things by “real old fashioned bread” but afaik sourdough bread was the standard method for most of human history.
Sourdough bread lasts 7 days without going stale?? Perhaps this depends on your climate. For me, plain sourdough is notably worse the next day and pretty crappy within 2-3. I assume people with no access to preservatives would have gotten used to this and would be more willing to eat food I consider bad tasting, but still.
Note, though, that this only applies to sourdough with just flour and water, no dough enrichment; breads with even small amounts of oil and sugar added stay soft much longer.
Maybe you’re right and it’s climate; perhaps I also meant something unusual by “stale”, or it could be a cultural difference between the US and Europe? I’m talking about bread like this (https://www.lazycatkitchen.com/sourdough-rye-bread-beginner-friendly/), stored in a bread box and perhaps wrapped in a cotton dish towel. Of course it does get a bit harder over time, but I can usually still eat it like completely normal bread even after 7 days. And note that I agree that pure wheat sourdough bread does get stale more quickly.
Sure, but “technological progress good” isn’t exactly an undersupplied viewpoint, is it? One counterpoint to food preservatives specifically is that the things that make food go bad are similar to what your body uses to digest food, so preserving food in this way can make it harder or harmful to digest. Other procedures like refrigeration and canning don’t have that particular problem.
I mean I think so; I have never in my life heard anyone say good things about preservatives in food until now.
Ok, but non-normal-old-fashioned bread tastes bad and has boring texture, unlike nice crusty normal old-fashioned bread.
This is what I tell people: you don’t get preservative-free food that also keeps perfectly at an economically feasible cost; you get stale food. There are tradeoffs to everything.
every 4 years, the US has the opportunity to completely pivot its entire policy stance on a dime. this is more politically costly to do if you’re a long-lasting autocratic leader, because it is embarrassing to contradict your previous policies. I wonder how much of a competitive advantage this is.
Or disadvantage, because it makes it harder to make long-term plans and commitments?
Autarchies, including China, seem more likely to reconfigure their entire economic and social systems overnight than democracies like the US, so this seems false.
It’s often very costly to do so: for example, ending the zero covid policy was very politically costly even though it was the right thing to do. Also, most major reconfigurations even for autocratic countries probably mostly happen right after there is a transition of power (for China, Mao is kind of an exception, but that’s because he had so much power that it was impossible to challenge his authority even when he messed up).
I don’t think the closing off of China after/during Tiananmen Square happened after a transition of power, though I could be misremembering. See also the one-child policy, which I also don’t think happened during a power transition (allowed for 2 children in 2015, then removed all limits in 2021, while Xi came to power in 2012).
I agree the zero-covid policy change ended up being slow. I don’t know why it was slow, though. I know a popular narrative is that the regime didn’t want to lose face, but one fact about China is that the reasons many decisions are made are highly obscured. It seems entirely possible to me there were groups (possibly consisting of Xi himself) who believed zero-covid was smart. I don’t know much about this though.
I will also say this is one example of China being abnormally slow, out of many examples of it being abnormally fast, and I think the abnormally fast examples win out overall.
Ish? The reason he pursued the Cultural Revolution was because people were starting to question his power after the Great Leap Forward, but yeah he could be an outlier. I do think that many autocracies are governed by charismatic & powerful leaders though, so not that much of an outlier.
I mean, the proximate cause of the 1989 protests was the death of the quite reformist general secretary Hu Yaobang. The new general secretary, Zhao Ziyang, was very sympathetic towards the protesters and wanted to negotiate with them, but then he lost a power struggle against Li Peng and Deng Xiaoping (who was in semi retirement but still held onto control of the military). Immediately afterwards, he was removed as general secretary and martial law was declared, leading to the massacre.
Having unstable policy making comes with a lot of disadvantages as well as advantages.
For example, imagine a small poor country somewhere with much of the population living in poverty. Oil is discovered, and a giant multinational approaches the government to seek permission to get the oil. The government offers some kind of deal (tax rates, etc.), but the company still isn’t sure. What if the country’s other political party gets in at the next election? If that happened, the oil company might have just sunk a lot of money into refineries and roads and drills only to see them all taken away by the new government as part of its mission to “make the multinationals pay their share for our people.” Who knows how much they might take?
What can the multinational company do to protect itself? One answer is to try and find a different country where the opposition parties don’t seem likely to do that. However, it’s even better to find a dictatorship to work with. If people think a government might turn on a dime, then they won’t enter into certain types of deals with it. Not just companies, but also other countries.
So, whenever a government does turn on a dime, it is gaining some amount of reputation for unpredictability/instability, which isn’t a good reputation to have when trying to make agreements in the future.
one very striking thing about people in the mid 20th century is a lot of them were convinced that overpopulation was the biggest problem. clearly in retrospect this was extremely incorrect. what lessons can we learn from this so that we don’t make similar mistakes?
That the world is highly engineerable, which can lead to the relaxation or abolition of seemingly hard bottlenecks. Also that the world can respond extremely quickly to implement those changes when the incentives are right.
Overpopulation would have been a massive problem at different points in history if not for the invention of horseless transport and high-yield, resilient cereal crops. People living in New York City in the late 1800s or in developing nations in the 1960s and 70s were rescued from the worst hazards of overpopulation because of the motorcar and dwarf wheat, rather than the problem being entirely imaginary.
Ehrlich and Holdren knew about Borlaug’s work, and thought it was too little too late. But it turned out to be enough, and fast!
I’m kind of curious here. Maybe in 50 years people will look back and say ‘people in the early 21st were really freaked out about AI, like it was the biggest risk. But here we are and AI is fine, great actually’. I think we now are right that AI is a big deal, but what we do now might mean it plays out fine, if we muster well!
were the people who were worried about overpopulation the ones who made it turn out fine? afaict, declining birth rates due to contraceptive usage are mostly because of economic (kids are expensive) and cultural factors (people want to do stuff other than raise kids) rather than people deciding not to have kids because of overpopulation
The declining birth rates aren’t the main savior, it was the food production and transportation improvements.
So the analogy to AI would be: Alignment research makes enough progress fast enough (perhaps at AI companies partially motivated by mundane alignment concerns like trying to tamp down on sycophancy and reward hacking) that Claude Mythos 3 is aligned enough to trust with alignment research, and so when Anthropic hands off everything to Claude Mythos 3, it’s able to quickly solve the rest of the problems and align its successor, and so on to ASI. In retrospect there are some heroes like Borlaug, but also just a lot of normal science research that happened to go fast enough to work.
Yes, the difference is that if that happens, their superintelligent AI assistant will gently correct them: “As per my standing instructions to help you become more the person you wish you were, I must correct you on this point. AI was in fact a huge problem in the late twenties, and could easily have gotten everyone killed, but these risks were prevented by the following policy decisions and the following technical alignment discoveries...”
This, but on a faster scale. People in January 2027 will say: “People in June 2026 were freaked about AI, but they had it easy compared to what we have now.”
This wouldn’t have solved the problem if people hadn’t stopped having children.
I’m not sure that it was extremely incorrect. Apart from the risk from AI, most of our other global problems are still downstream of overpopulation. The likelihood that overpopulation won’t get much worse than now doesn’t really change that, and the reasons why it won’t were not reasonably predictable at the time.
We just happened to inhabit one of the more convenient possible worlds.
I don’t think there’s a more effective way to get true object-level beliefs than to go look out there with fresh eyes and figure out what’s actually happening.
Seems like you’re leaning a lot on the benefit of hindsight, and also looking too much at short term trends? A return to a Malthusian equilibrium is the simple obvious advance prediction.
No? In a counterfactual world where AI wasn’t going to be a thing, this still seems like the default long-term outcome if not prevented via coordination (natural selection doesn’t stop). If you mean that it was incorrect because AI ended up being a thing before it became a problem, then sure.
assuming no AI, declining birth rates seem like a much bigger problem than overpopulation
In the short term, sure.
I’m not familiar with the history of people being worried about overpopulation, but I’d guess a lot of these mid-20th-century people you refer to were worried about the relatively short term based on projections which ended up being wrong? I guess one lesson you could learn from this is that timing can be hard to call even if the endpoint is predictable. Huh, that sounds familiar...
Even ignoring AI, current expectations are that world population will peak in ~60 years at a population of ~10 billion. This seems totally workable with relatively incremental improvements in technology for resource management, energy extraction, and habitat preservation. This is only bolstered by much of our future energy supply growth coming from non-extractive solar rather than fossil fuels, which we continue to find new sources of regardless.
When you add on the fact that almost all population growth in the next 50 years is due to occur in sub-Saharan Africa, where there is plentiful land and the only limiting factor is the energy supply needed to tame it, we’re actually pretty well situated re: overpopulation concerns.
Do these expectations take selection effects into account? I’m also thinking longer term than 60 years. A Malthusian equilibrium is the natural state for a population of organisms to be in. We’re currently out of equilibrium, but the obvious expectation is that we will at some point settle back into a Malthusian equilibrium unless we somehow choose not to or otherwise go extinct.
I’m not sure you could say “in retrospect this was extremely incorrect”; maybe in retrospect-retrospect, 20 years later, they could turn out to be accurate concerns. It certainly seems incorrect now, but would the ‘overpopulation-signs’ be wrong if somehow the ‘underpopulation-signs’ got reversed? I don’t think people were considering solutions to overpopulation of the “reduce global morale and perceived increased difficulty of life, etc.” variety as valid paths to take. It’s a bit like an asteroid hitting the earth and causing the population to go to 0: I would be hesitant to ask “what can we learn from this” as if we had missed out on this line of argument, when really, thinking in that direction in the first place is very specialized, there are very, very many potential causes of underpopulation, and many of them wouldn’t pan out.
Besides that, I think maybe the lesson is that trendlines on society and views can be surprisingly flexible? Sometimes people say with regard to a coordination problem that “if only everyone decided to work together we could fix the world’s ills in a few days”, but this rarely happens except in extraordinary circumstances or through a slow build-up of circumstances.
I used to think about this all the time.
I remember reading some quote of an observant contemporary of Malthus, who predicted that there may be hope in the most well-off + educated people not choosing to reproduce as much.
That aside, I still think the current situation is extremely surprising, and that unless you had a model that takes the effect of educated people having fewer children into account in a highly calibrated way, predicting overpopulation was the right call for much of the time. There might have been more evidence in the 1950s though. (And in any case, that doesn’t mean the panic around it was justified.)
There was plenty of evidence by the 1950s, since birth rates fell below the replacement level in many European countries during the Interbellum, and the process clearly started before the Great Depression. But they recovered in the late 1930s for reasons that are still arguably unclear, and then skyrocketed during the Baby Boom, so no one was interested in analyzing them. Even now, AFAIK, academic interest in that phenomenon is concentrated on topics with some application right now, such as which pro-natalist measures worked well, rather than on the fundamental questions.
The fact that a problem was to a degree addressed does not mean it wasn’t real. When “The Population Bomb” came out in 1968, world population was 3.5 billion. Now it’s 8.3. If 1968 growth rates had persisted, it would be 12 billion. China would have 3.8 billion, instead of 1.4. India would have 450,000,000 more people than they do now. Global warming would be worse. There would be more wars, and more desperate migrants. I wouldn’t say the alarm was misplaced.
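To see where the 12 billion counterfactual comes from, here is a minimal sketch that compounds the 1968 figure forward at a constant annual rate. The late-1960s growth rate of roughly 2.1%/yr and the choice of 2025 as “now” are my assumptions rather than numbers from the comment, so the loop prints a small range of rates around that value.

```python
def project_population(start_billions: float, annual_growth: float, years: int) -> float:
    """Compound a population forward at a constant annual growth rate."""
    return start_billions * (1.0 + annual_growth) ** years


POP_1968 = 3.5            # billions, from the comment
YEARS = 2025 - 1968       # assumption: "now" taken to be 2025

# Assumption: world growth in the late 1960s was roughly 2.1%/yr.
# Nearby rates are shown because a 57-year projection is sensitive to this input.
for rate in (0.020, 0.021, 0.022):
    projected = project_population(POP_1968, rate, YEARS)
    print(f"{rate:.1%}/yr -> {projected:.1f} billion")
```

Between 2.0 and 2.2%/yr the projection spans roughly 11 to 12 billion, consistent with the comment’s figure; it also shows how much a few tenths of a percent in the rate moves the endpoint.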
Two big lessons
The underlying logic might be correct overall (carrying capacity hasn’t been disproven) but the specific circumstances are important, sometimes even more important, and the environmental conditions can change/be changed.
Current trends are not set in stone, and extrapolating them out as if they went on forever at the same rate can make for a scary story. Trying to correct for current trends can bite us in the ass if they later shift, as with China’s one-child policy.
one problem with taking ideas seriously is you can get pwned by virulent memes that are very good at hijacking your brain into believing them and propagating them further. they’re subtly flawed, but the flaws are extremely difficult to reason through, so being very smart doesn’t save you; in fact, it’s easy to dig yourself in deeper. many ideologies and religions are like this.
it’s unfortunately very hard to tell when this has happened to you. on the one hand, it just feels like the arguments are obviously very compelling, so you’ll notice nothing wrong if it happens to you. on the other hand, if you overcorrect and never take compelling arguments seriously, you become too stodgy and ignore anything novel that you should pay attention to. one idea for how to think about this better: imagine an oracle told you that there exists a magic phrase that you cannot distinguish from a very compelling argument. you don’t really know when this magic phrase will pop up in life, if ever. but it might give you a little bit more pause the next time someone makes a really compelling argument for why you should give all your money to X.
Do you get pwned more, or just by a different set of memes? The bottom 80% of humans on “taking ideas seriously” seem to have plenty of bad memes, although maybe the variance is smaller.
there are a lot of humans who don’t take ideas seriously in that they are very socially conservative and therefore rarely get pwned, in the sense that they mostly live the life that they expect they will live, no matter what memes they are exposed to (which may be a very bad life from your perspective)
Either I strongly disagree with you that there’s a big gap here, or I’m one of the people you’d say are normies who lead lives they expect to live (among other definitional differences).
seems false, or at least uncharitable. do you expect that such people would self-report along the lines of “i don’t take ideas seriously”? it seems more likely to me that they would report something like “i value family”, and mean it. you may find the idea simple, but it is certainly an idea, and they certainly take it seriously.
put another way, this social conservatism came from somewhere, and is itself an idea. the assumption (that arguments that worked to change your behavior would not change their behavior) can be explained in two ways. either they do not take ideas seriously, as you suggest, or they value different things than you.
Failure to understand and failure to act are different, and beliefs shouldn’t care what you understand or do. There is little danger in taking ideas/framings seriously/playfully in order to adequately learn, to break the superficial engagement or unsuitable framing failure modes that maintain systematic ignorance or misconceptions about subtler details.
But it needs to remain unnecessary to believe what you learn: by default, being in agreement shouldn’t directly compel belief or action; that should require more careful judgement. So taking ideas seriously can help further when lack of understanding was the bottleneck to changes in belief or action, but that’s not always the case.
Yep, this is true. However, I believe that there are strategies for avoiding such memes other than “being smart”. Two of these strategies broadly correspond to what we call “being virtuous” and “being emotionally healthy”. See my exchange with Wei Dai here, and this sequence, for more.
Similarly, it’s worth being careful of arguments that lean heavily into longtermism or support concentration of power, because those frames can be used to justify pretty much anything. It doesn’t mean we should dismiss them outright (arguments for accumulating power and for long-term thinking are convincing for a reason), but you should double-check whether the author has strong principles, what the path to getting there looks like, and what it’s explicitly trading off against.
Re: Vitalik Buterin on galaxy brain resistance.
i think these are similar to conservatism in the sense that if you do them too much, you stop getting pwned but you also stop doing entire categories of things that you should do. for example, if you are too virtuous, you become overly self-sacrificial/martyr-like and stop taking many actions that are actually net-positive (many activists suffer from this); if you are too emotionally integrated, you become one of those people who meditated too much and no longer have any desires for anything at all.
Yeah, I do feel confused about the extent to which the solution to this problem is just “selectively become dumber” (e.g. as discussed by Habryka here). However, I have faith that there are a bunch of Pareto improvements to be made—for example, I think that less neuroticism helps you get less pwned without making you dumber in general. (Though as a counterpoint, maybe neuroticism was useful for helping people identify AI risk?) I’d like to figure out theories of virtue and emotional health good enough to allow us to robustly identify other such Pareto improvements.
A related thought that I had recently: fertility decline seems like a rough proxy for “how pwned are you getting by memes”, and fertility is strongly anticorrelated with population-level intelligence. So you have east asians getting hit hardest by the fertility crisis, then white populations, then south asians, while african fertility is still very high. Obviously this is confounded by metrics like development and urbanization, though, so it’s hard to say if intelligence mediates the decline directly or primarily via creating wealth, but it does seem like e.g. east asians are getting hit disproportionately hard. (Plausibly there’s some way to figure this out more robustly by looking at subpopulations.)
>other than “being smart”.
More like, being smarter than average. If you are that exact level of smart but in a population whose mean is higher than your smarts, then the memes will target you as a primary substrate. You can argue that in that case there are fewer such memes, but I don’t know; it probably has less effect than positional smartness.
I’ve seen this sentiment before, but, in practice, I don’t think there exists an “adversarial noise for humans” line of argument that brainwashes anyone who reads it sincerely into doing XYZ. There are certainly arguments that look compelling at first glance but turn out to have longer-term issues, but part of “taking ideas seriously” is thoroughly investigating their counterarguments.
Chesterton’s Fence is an old standard for a reason: if something new seems both simple enough to be easily discoverable and objectively better than the current strategy, one should figure out why it’s not already the current strategy before adopting it.
I’d venture an uninformed guess that in 95% or so of these cases the problem isn’t “taking ideas seriously” but rather people deferring proper judgement due to some emotional or social effect.
I like to see memetics being taken seriously!
It’s complicated—one’s aversion to a particular idea may well be the result of an existing meme fighting to prevent it from coming in, which manifests itself as suspicion that can prematurely stop one’s inquiry.
Though I’d push back on the framing of memes as being something “out there”. You say that it’s hard to tell when this has happened to you: I agree, but only insofar as you consider particular memes in isolation; not in the sense of going from a state of memelessness to losing your meme virginity.
While sometimes “getting pwned by a meme” can indeed be a very powerful experience (like in religious conversion, or in the following paragraphs), in reality everyone is subjected to memes since birth. With that said, you can still reason about arguments based on their structure and content. You can still identify fallacies and biases. We’re full of memes, but rational thought is still possible.
That is the oracle, and the magic phrase is “here’s a bunch of words that were in my mind that are now in your mind”. It’s magic because you can’t argue with it.
I am “someone”, and I am also “X”, compellingly enough.
...
Enough pausing. What’s taking you so long? You know what to do.
Does this only apply on the macroscale, say, to ideas concerning ASI or economic frameworks? Because if I take a very personal-level idea seriously, let’s say polyphasic sleeping, and implement it, then sure, I won’t get thrown into the East River, but I should notice if something goes wrong with me, and rather quickly.
Solution really seems to be: tight feedback loops?
tight feedback loops help for sure. though it is possible to be too far gone—cults often continue to exist, possibly even with strengthened belief, after failed prophecies.
I’m not sure I understand how cults are examples of taking an idea seriously. Surely a cult is a complex of ideas, not any single one, some of which one can take seriously and others not so (in religions there are debates about Hyperdispensationalism and patripassianism, which show that even within the complex of ideas, different ones can be taken seriously; not to mention a la carte Catholics and reformists). And surely the chief mechanism by which people become subsumed into cults has nothing to do with reason or logical arguments but with social support (or coercion), irrespective of the recruits’ beliefs?
The feedback loop is very different then and operates not on ideas but a whole host of different mechanisms. (Feelings of belonging, feelings of personal importance, no longer a need to ‘search’ or ‘question’ existential matters). These don’t require ideas to be taken seriously at all.
Again, on the macro scale I can take seriously the idea of… I dunno… Lamarckism. But even if I seriously investigate it and give it the benefit of the doubt, I’m not really in a position to test it, in the sense that it’s a macro idea and not something which will affect my everyday routine (like polyphasic sleeping). Even if I later have children and try to change my behavior to elicit certain traits in those children, the lag time before I can confirm it is many years.
I find it anthropologically fascinating how at this point neurips has become mostly a summoning ritual to bring all of the ML researchers to the same city at the same time.
nobody really goes to talks anymore—even the people in the hall are often just staring at their laptops or phones. the vast majority of posters are uninteresting, and the few good ones often have a huge crowd that makes it very difficult to ask the authors questions.
increasingly, the best parts of neurips are the parts outside of neurips proper. the various lunches, dinners, and parties hosted by AI companies and friend groups (and increasingly over the past few years, VCs) are core pillars of the social scene, and are where most of the socializing happens. there are so many that you can basically spend your entire neurips not going to neurips at all. at dinnertime, there are literally dozens of different events going on at the same time.
multiple unofficial workshops, entirely unaffiliated with neurips, will schedule themselves to be in town at the same time; they will often have a way higher density of interesting people and ideas.
if you stand around in the hallways and chat in a group long enough, eventually someone walking by will recognize someone in the group and join in, which repeats itself until the group gets so big that it undergoes mitosis into smaller groups.
if you’re not already going to some company event, finding a restaurant at lunch or dinner time can be very challenging. every restaurant in a several mile radius will be either booked for a company event, or jam packed with people wearing neurips badges.
This is true of approximately every worthwhile conference and convention. In my entire life I’ve been to exactly one conference where the scheduled programming provided more than 10% of the event’s value.
It’s a two-place function. When I go to a conference that everyone says this same stuff about, then I usually have the most fun by attending talks and taking my time thinking about the stuff related to the talks, rather than hobnobbing.
Mild caveat: the whole “I joined a group chat at NeurIPS and others kept joining to talk to us” only happens if you’re at NeurIPS and your name is Leo Gao so YMMV.
most of the time the person being recognized is not me
having the right mental narrative and expectation setting when you do something seems extremely important. the exact same object experience can be anywhere from amusing to irritating to deeply traumatic depending on your mental narrative. some examples:
a minor inconvenience like missing your bus when you’re not in a rush can be much more irritating if you’re having a bad day and you have the narrative of “everything is going wrong for me today”
something going wrong during travel can be a catastrophe if you’re expecting the perfect vacation but it can even be a fond memory if you’re just viewing it as an adventure (and a bonding experience if travelling with others)
not getting to do something you wanted to do hurts a lot more if you feel like you made a deal with yourself that you’d get to do it in exchange for doing something else you didn’t want to do; whereas you might not even really want the thing that much otherwise.
expecting something to happen soon and having it gradually delayed further and further into the future is a lot more irritating than already expecting something to be delayed a lot.
tbc, the optimal decision is not always the narrative that is maximally happy with everything. sometimes there are true tradeoffs, and being complacent is bad. but it is often worth shaping the narrative in a way that reduces unnecessary suffering.
a skill which I respect in other people and which I aspire towards is noticing when other people are experiencing suffering due to violations of positive narratives, or fulfillment of negative narratives, and comforting them and helping nudge them back into a good narrative.
this is another post of something that is obvious intellectually and yet I’ve failed to always do right in practice.
Relatedly, at some point as a teenager I realized that being exposed to rain is actually usually not that terrible, and I had just kind of been accidentally conditioned to dislike it because it’s a normal thing to dislike and I never met anyone who appeared to enjoy the experience. But turns out, once you stop actively maintaining that resistance and welcome the rain, it can be pretty nice to walk around in rain while everyone around you tries to escape it. (Some exceptions apply, of course)
On the most recent occasions when I got caught in very heavy rain and got completely soaked, the situation really was unpleasant, but you can still take it with light humor, thinking to yourself “haha, that sucks”, similar perhaps to how you would laugh at the pain in a “who can eat the hottest pepper” challenge between friends. Or thinking “yes it sucks, but it isn’t actually that bad”.
Agreed, that’s one of the exceptions I was thinking of—if you’re getting soaked and have no way to get into dry clothes anytime soon, there’s little way around finding that rather unpleasant. But I’d say 95% of my rain encounters are way less severe than that, and in these cases, my (previous) attitude towards the rain really was the main issue about the whole situation.
fwiw I also enjoy the rain, and I guess I just never cared enough about people thinking it was weird. I do have to admit that when it’s raining especially heavily, it does suck a lot (the experience of fully wet clothing is very unpleasant in many ways). but most of the time it’s not raining that hard / I’m not going to be in the rain that long.
Do you carry a smartphone with you in those occasions?
I’m not sure what you’re hinting at, but in 99.9% of cases when I’m out of the house, I do carry a smartphone around. If you mean that it’s annoying when the display gets confused by water, then I agree that’s a real disadvantage (but I doubt people’s attitude towards being exposed to rain changed that much between 2006 and today, so there certainly is some severe general dislike of rain independent from smartphones). If this is not what you mean, then please elaborate. :)
Sorry if that was weirdly obscure. I was asking because the principal reason I go out of my way to avoid rain is that I’m worried my phone would get wet and potentially die (and I’ve been somewhat sad about having to forego the experience of braving the rain at points). But it’s possible that this is not a big issue with current devices (and maybe never was)!
it is definitely not a problem with current devices. my phone has gotten quite wet hundreds of times and still works perfectly fine. note that this is different from survivability fully submerged; my guess is your phone could probably survive being submerged for a few minutes in a pool or something but if you left it there for a day it would be dead.
interesting, any examples online?
Might not be what you’re thinking of, but the first thing that comes to mind for me is misophonia: a basically-neutral or maybe mildly-irritating object experience, which somehow gets blown completely out of proportion in the mind and becomes a big problem. Developing an “I’m really bothered by this particular sound” narrative makes it worse, of course.
Alas, I have no idea how to uncondition that particular narrative irritant once it’s in there. If there’s any technique of ‘shaping the narrative’ strongly enough to override this, I’ve never heard of one, and knowing about it to the point where I’m able to successfully practice it would be huge.
when will we have sufficiently conclusive evidence for the long term safety of far-uvc that it’s reasonable to push for its universal adoption in all public spaces without reservation? the safety issue seems like a much bigger deal than the cost issue for broad adoption; if it works safely, the economic case for installing far uvc in public spaces seems pretty solid—people being sick must be terrible for the economy! and they’re only ever going to get cheaper.
in a world where far uvc is near universally deployed, we might be able to banish the common cold or the flu to the past, in the same way that cholera is basically no longer a problem in the developed world. this seems like a pretty big deal and I’d like to know when this glorious future is coming (and whether there’s anything I can do to make it come sooner)!
(from eyeballing studies, it sounds like the cost of the cold+flu to the US economy is on the order of $100bn/yr, which passes basic Fermi estimate muster: given a $30tn/yr gdp, a few days per year of lost productivity due to cold/flu is easily hundreds of billions. even at the current price of far uvc, which is a huge overestimate of future tech at volume, the cost of disinfecting spaces is about $0.40/year/sqft (amortizing an aerolamp over its 5-year lifespan); compared to e.g. $60/year/sqft land cost in San Francisco, this is a negligible amount. estimating the total number of sqft of public space in the US is kind of annoying, but here’s a Fermi estimate: there are about 100k schools in the US, and each school is about 100k sqft. and let’s say schools are about 10% of all public spaces. that pencils out to about 100bn sqft of public space, or roughly $40bn/year, implying we are already close to break even, despite the immaturity of the technology.)
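The Fermi estimate above can be written out as a few multiplications. This is a minimal sketch using the comment’s own inputs; the 250 working days per year and 3 lost days (standing in for “a few days”) are my assumptions, used only for the lost-productivity sanity check. Multiplying the inputs through gives about 100bn sqft of public space and roughly $40bn/year of disinfection cost, somewhat below the ~$100bn/year cost-of-illness estimate.

```python
# Inputs quoted from the comment (rough Fermi-level numbers).
COLD_FLU_COST_USD_PER_YEAR = 100e9   # "on the order of $100bn/yr"
US_GDP_USD_PER_YEAR = 30e12          # "$30tn/yr gdp"
UVC_COST_USD_PER_SQFT_YEAR = 0.40    # amortized lamp cost per sqft per year
N_SCHOOLS = 100_000
SQFT_PER_SCHOOL = 100_000
SCHOOL_SHARE_OF_PUBLIC_SPACE = 0.10

# Assumptions NOT in the comment: ~250 working days/yr, and 3 lost days/yr for "a few days".
WORKDAYS_PER_YEAR = 250
LOST_DAYS_PER_YEAR = 3


def lost_productivity_usd(gdp: float, lost_days: float, workdays: float) -> float:
    """Sanity check: output lost if every worker loses `lost_days` of work per year."""
    return gdp * lost_days / workdays


def public_space_sqft(n_schools: int, sqft_per_school: float, school_share: float) -> float:
    """Scale total school floor area up to all public space."""
    return n_schools * sqft_per_school / school_share


def annual_uvc_cost_usd(sqft: float, cost_per_sqft_year: float) -> float:
    """Yearly cost of covering `sqft` at the amortized per-sqft price."""
    return sqft * cost_per_sqft_year


sqft = public_space_sqft(N_SCHOOLS, SQFT_PER_SCHOOL, SCHOOL_SHARE_OF_PUBLIC_SPACE)
cost = annual_uvc_cost_usd(sqft, UVC_COST_USD_PER_SQFT_YEAR)
lost = lost_productivity_usd(US_GDP_USD_PER_YEAR, LOST_DAYS_PER_YEAR, WORKDAYS_PER_YEAR)

print(f"lost productivity check: ${lost / 1e9:.0f}bn/yr")
print(f"public space: {sqft:.1e} sqft")
print(f"far-UVC cost: ${cost / 1e9:.0f}bn/yr")
print(f"benefit/cost: {COLD_FLU_COST_USD_PER_YEAR / cost:.1f}x")
```

Keeping each factor as a named constant makes it easy to see which input dominates; halving the per-sqft price or the school share moves the total cost proportionally.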
It’s already safe beyond a reasonable doubt if kept above eye level (7 ft / 2.13m), since this massively cuts the dose absorbed. I think many public spaces should install uvc immediately, and if they’re not convinced yet just use removable shutters that keep it above eye level until more research is done.
is there a particular reason why above eye level matters? in addition to vertical height, do you also mean tilting it up so that the light points mostly at the ceiling? which sources should I look at to gain this confidence for myself?
The light should point mostly in a horizontal plane just below the ceiling of the room, so that no one has the light shining directly in their eyes. Here’s a source and there are more sources linked from here, including a DIY guide.
Since upper room UVGI when not filtered to 222nm is probably safe, and far-UVC which IS filtered to 222nm is probably safe even when it shines on occupants’ eyes and bodies, it stands to reason that upper-room UVC has enough safety margin. To the best of my knowledge, far-UVC has been tested up to doses equivalent to 3 years of 8h/day exposure at the current safety threshold, but eyes are delicate so I would prefer studies of 10-100x higher cumulative doses.
isn’t upper room far UVC strictly less effective than upper room normal UV, simply because far UVC lamps are much more expensive and inefficient, and the only benefit of far uvc is that it is safe to shine directly on people? (and simply by virtue of being much better established, upper room UV seems like an easier sell to people who defer to authority for safety, even if it is strictly less safe than upper room far UVC)
while we’re on the subject, how much more effective is far UVC (shone directly on people) vs upper room UV?
Plausibly yes, but I’d be worried enough about residual exposure (reflections off walls, improper installation) to other UV wavelengths that installation is likely to require some care and expense too. The second link has several accounts of acute health effects from people doing upper room UV wrong. Probably still great to have in train stations, airports etc though given the enormous benefit/cost ratio.
I’m not really sure; there would be a component from surfaces and a component from extra air changes per hour (ACH) due to not relying on vertical air mixing. There are probably studies.
What about negative effects on the symbiotic microbiome?
what’s the strongest argument for why i shouldn’t auto-ignore any acausal arguments that involve hypothetical entities extremely far away (or which only exist in other Everett branches or whatever) such that we will never interact with them causally at all? a razor i have is things which are entirely epiphenomenal should be ignored because they are unfalsifiable.
in particular, this seems consistent even if you accept one-boxing and paying in the counterfactual mugging. the key question is what kinds of evidence you accept as evidence of the existence of an acausal connection. in these hypotheticals, we simply declare by assumption that Omega is truthful and capable of predicting you. in reality, we would either arrive at such a belief by empirical observation (Omega has a strong track record), or pure theoretical deduction. all of the epiphenomenal acausal theories depend on pure deduction. that empirical observation has to be conveyed to us causally. it seems reasonable to draw a line and say we simply don’t trust pure deduction to be able to convince us of an acausal link.
also, with both the counterfactual mugging and Newcomb, even though in each instance you can’t prove the other branch could have happened, in the long run those who do the right thing (pay, one box) will win. whereas with purely epiphenomenal acausal theories you will literally never find out any difference whatsoever, because the branching point happened long ago.
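The “in the long run” point can be made concrete with expected payoffs. This is a minimal sketch assuming the conventional textbook setups, none of which come from the comment: a $1,000,000 opaque box and a $1,000 transparent box for Newcomb, a $100 ask and a $10,000 reward on a fair coin for the counterfactual mugging, and a predictor that is right 99% of the time.

```python
import random

# Assumption: conventional textbook payoffs, not numbers from the comment.
BIG, SMALL = 1_000_000, 1_000          # Newcomb: opaque box / transparent box
MUG_COST, MUG_REWARD = 100, 10_000     # counterfactual mugging


def newcomb_ev(one_box: bool, accuracy: float) -> float:
    """Expected payoff when the predictor is right with probability `accuracy`.
    The opaque box holds BIG iff the predictor predicted one-boxing."""
    if one_box:
        return accuracy * BIG
    # A two-boxer always gets SMALL, plus BIG only when the predictor errs.
    return SMALL + (1 - accuracy) * BIG


def simulate_newcomb(one_box: bool, accuracy: float, rounds: int, rng: random.Random) -> float:
    """Average payoff over many independent Newcomb games."""
    total = 0
    for _ in range(rounds):
        predictor_right = rng.random() < accuracy
        predicted_one_box = one_box if predictor_right else not one_box
        opaque = BIG if predicted_one_box else 0
        total += opaque if one_box else opaque + SMALL
    return total / rounds


def mugging_ev(pays: bool, p_heads: float = 0.5) -> float:
    """Omega flips a coin. Tails: it asks for MUG_COST. Heads: it pays MUG_REWARD
    iff it predicts you would have paid on tails."""
    if pays:
        return p_heads * MUG_REWARD - (1 - p_heads) * MUG_COST
    return 0.0


rng = random.Random(0)
for one_box in (True, False):
    label = "one-box" if one_box else "two-box"
    print(label, newcomb_ev(one_box, 0.99), simulate_newcomb(one_box, 0.99, 10_000, rng))
print("pay", mugging_ev(True), "refuse", mugging_ev(False))
```

Averaged over many games, the one-boxer and the payer come out far ahead, which is the sense in which the difference eventually becomes observable; there is no analogous run to average over in the purely epiphenomenal case, where no outcome ever reaches you.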
What if you ask an aligned ASI if causally disconnected civilizations are doing things that you value, and it comes back saying “seems pretty unclear, but they are also trying to guess what you would do, and I’d guess that if you choose to do nice stuff for them, they would be 20% more likely to guess that, and they would be 10% more likely to do nice stuff for you”? Your AI might guess that because it is e.g. running detailed simulations of other corners of this universe.
If you care about the goodness that is created by causally disconnected civilizations and are EDT-ish, I think only caring about the good you can verify via direct causal evidence in situations like the one above is basically the same kind of mistake as only caring about the well-being of the people that you can see with your own eyes.
why is it the same as only caring about the well being of people i can see? causal interaction doesn’t have to be photons bouncing off them into my eye. if i donate to AMF, i want someone to be observing the people being helped in some way to make sure that they are actually being helped, and write reports that i can read (or rely on other people in my social circle to have read).
suppose there was a strange island nation with an iron-clad law that anyone or anything that enters can never leave, or transmit any info to the outside. from our knowledge of the geography, we can deduce that they are probably very poor and suffer from malaria. we dispatch some brave volunteers from AMF into the island, and ship crates of malaria nets there. unfortunately, because of the restrictions, we have literally no idea what is happening once the shipments land. perhaps the natives are using the nets to fish. perhaps the nets are degrading faster than expected in the unusual climate of the island. perhaps the volunteers were captured and killed, and the natives have no idea what to do with the crates of nets. i would be much less willing to donate money to this charity than to AMF!
My understanding of your argument is “we will have only very weak evidence of beneficial acausal effects”. I agree this is probably true now. (Understanding “evidence” in the Bayesian sense.)
But I don’t think this would be the case in situations like the one I describe above with the ASI simulating other civilizations? The ASI might be uncertain (though it’s unclear, maybe with good enough intelligence you could be quite confident in this kind of thing), but probably not radically uncertain. It doesn’t seem to me like a different kind of uncertainty to the one I have when I donate to a medium-to-high-risk high-impact opportunity / when I try to help the world by working on AI safety.
(Uncertainty about morality (do you care about goodness produced by distant civilizations) and decision theory (are you EDT-ish) might not be resolved by intelligence and you thus might still have a kind of uncertainty different from empirical uncertainty about the effects of your action. In my comment above I conditioned that uncertainty away. I think it’s unclear how to act when you have this kind of uncertainty, but I’d be surprised if the answer was “auto-ignore”.)
Maybe your argument is more like “you should only use causal evidence when making decisions” (as a possible update to decision theory, not as an empirical claim about what Bayesian evidence is weak vs strong)? I don’t think there is a way of making this sentence more precise that would result in a decision theory you would find reasonable.
if god comes down from the sky and tells you that there exists a beneficial acausal trade, sure, you should take it. ditto for a sufficiently competent ASI. i’m mostly making an empirical claim that you are very unlikely to gain the requisite level of confidence in practice as a human, because humans are simply not capable of reasoning about this in a sensible way, and so we should not really spend time thinking about it today. and it also isn’t worth planning ahead for what to do if the ASI recommends making acausal trades in the future, because our plans today shouldn’t depend much on what the ASI recommends we do in the future.
i also separately predict that the ASI will not be able to find acausal trades that we are confident enough to take but this is less certain and impossible to resolve.
The story for acausal trade[1] that I’m most sympathetic to looks something like this:
1. Several factions on Earth tried pretty hard to hash out their differences over what to do with space probes but ended up deciding that they couldn’t.
2. Rather than put compromise solutions on the probes, and rather than go to war, they decided to “agree to disagree” and split fractions of space according to the values of different factions.[2] The key agents on the different probes are tasked to satisfy their respective factions’ values.
3. Once the probes arrive, they “unfold” and after an initial buildup, a key intermediate stage is turning a lot of stellar mass and energy into computronium, so they can do more calculations than Sol’s supercomputers were ever able to. The calculations are initially done for entirely instrumental reasons (to figure out the best approach to maximize/satisfy the values loaded into the relevant agents).
4. One aspect of interest to these probes is what the other agents are doing. They figure this out via simulation.
5. Meanwhile, perhaps these agents learn highly useful empirical information about the respective regions of the other probes from light transmissions from galaxies within their past lightcone but not their future lightcone (due to the expansion of space).
6. Steps 4 and 5 lead many agents to mutually conclude that their preferences will be better satisfied if they swapped places.
7. At this point, the agents acausally “trade” (in the sense of doing what another faction wants) iff their simulations suggest (with sufficiently high probability) that the other agents will do what they want conditional upon trading.
In such an overly specific, somewhat convoluted setup, I think refusing to consider acausal trade will be obviously foolish.
Maybe you think this setup unreasonably stacks the deck towards acausal trade. I agree! I think it’s plausible enough that simulation limits mean trading across multiverses isn’t viable. But I wanted to present the example above as a kind of possibility/existence argument, before drilling down to debate probabilities (as in Fabien’s examples).
[1] In the sense that I’d most obviously think it’s stupid to not at least consider acausal trade.
[2] In practice, of course there are many different within-faction compromises.
That sure does lean a lot on a particular metaphysics? I mean, everyday world-models are always gotten by “a combination of empirical and logical inference”.
Like, suppose that you previously believed in X due to “empirical evidence”. Now someone shows you that actually, given your “purely deductive beliefs” you could have concluded X without that empirical evidence, using reasoning R. Then suppose that happens again with empirical X2, X3, etc. Is the pure reasoning R now empirically grounded enough for you to trust it? Would you trust it enough that if it says Xk exists, but you have no empirical evidence of Xk, you would still care about Xk / act as though Xk is real?
i mean, this depends on how complex R is. at one extreme is “the sun has always risen, so it will rise tomorrow”. at the other extreme is “i have been good at coming up with correct claims in the past (due to using the right kinds of reasoning, having good intuitions, etc), so i will come up with correct claims in the future.” the key distinction being the complexity of the connection between the thing that is claimed to be good and the empirically observed evidence of goodness.
the most extreme version of the latter is if there is literally no way to ever check Xk, and the smallest reference class that contains both Xk and the things you can actually observe is still very large.
i think epiphenomenal acausal stuff falls in this latter category. the smallest reference class that contains this and also something that we can actually verify whatsoever is very large and heterogeneous, and smart people are very likely to be wrong when speculating on things that are even a fraction as ungrounded as this.
… I’m not sure I’m following …
I mean, yeah, I’d assume it makes very little sense for anyone right now to be making many / any decisions based on reasoning about supposed acausal bargains with superintelligences in other logically parsimonious universes, or anything like that. Is that what you’re saying?
But I guess “such that we will never interact with them causally at all” seems pretty irrelevant to that specific claim; it also doesn’t make sense to be making many decisions based on reasoning about aliens in distant galaxies, even if we’d plausibly meet them some day. Because of this, I take you to mean something stronger, like “for phenomena such that we will never interact with them causally at all, we should never worry about them”. Is that what you’re saying?
(Cf. https://www.lesswrong.com/posts/3XMwPNMSbaPm2suGz/belief-in-the-implied-invisible )
Like, the main answer is that when you would have an option to do such trades, it would not feel ephemeral. It would look like you discovered that your situation is like the Twin Prisoner’s Dilemma, and had been all along. The world produced two disconnected situations, and you are in one of them; you can observe the root, but not the other branch. But you can use your deduction skills to figure out what’s there.
But it still might be possible that uncertainty is so great, empirically, that the trade is possible but unprofitable. I’d guess the ASIs we are going to create would engage in a nontrivial amount of acausal trade, but I would not be shocked if they thought about it and said, nah, the costs outweigh the gains, forget it. But it would be a bit surprising to me.
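To make the Twin Prisoner’s Dilemma comparison concrete, here is a toy sketch. It assumes the usual textbook payoffs (5/3/1/0), which are not from the comment, and the function names are illustrative. If the other player is an exact copy running the same decision procedure, only the diagonal outcomes (C, C) and (D, D) are reachable, so cooperating wins; an agent that treats the other player’s move as already fixed defects instead.

```python
# Standard prisoner's dilemma payoffs to "me" for (my_action, their_action).
# The numbers are the usual textbook choice, assumed here for illustration.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def twin_choice() -> str:
    """Choose an action knowing the opponent is an exact copy running this same function.

    Whatever this returns, the twin returns too, so only (a, a) outcomes are
    possible; compare the diagonal payoffs.
    """
    return max("CD", key=lambda a: PAYOFF[(a, a)])


def fixed_opponent_choice(opponent_action: str) -> str:
    """Best response when the opponent's action is treated as already fixed."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)])


print(twin_choice())               # C
print(fixed_opponent_choice("C"))  # D
print(fixed_opponent_choice("D"))  # D
```

The whole difference lives in what the agent believes is correlated with its own choice: the payoff table is identical in both functions.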
A simple example is consensual mutual simulation. If some theoretical entity exists and would like to experience our universe (let’s say they are from 4 dimensions and really want to see what 3D is actually like and what kind of beings actually live in it), and a human is super interested in exploring 4D, then it makes sense to simulate the other class of entity on the assumption that they’d also simulate the human. E.g. everyone would calculate that there’s no way to know for sure precisely which 3D being or 4D being would ask for such a thing, but we would all calculate that it’s far more costly to simulate an entire other universe to see how it turns out in detail (the argument is strongest if neither universe could simulate the other in sufficient detail to satisfy curiosity), so why not just simulate (an ensemble of) acausal visitors for much lower cost? Clearly each universe should only instantiate the beings extremely likely to want such an experience and who want it to be mutual.
From this post on multiverse-wide cooperation (nowadays people call it ECL—evidential cooperation in large worlds):
I’m not sure this section will convince, but the point is that it’s hard to avoid. Even if you decide to ignore these sorts of arguments, that very action (the ignoring) should update what you think aliens will do. So then, why are you ignoring it?
My gut-check is to imagine a superintelligence that does the complicated acausal multiversal thing, and then separately imagine a superintelligence that just tries to take over the lightcone with no hesitation. Whichever ends up winning more stuff is the real one and has the right decision theory.
I just can’t imagine a superintelligence being trolled by too-clever-by-half decision theory arguments, when it can just kill you and take your stuff (and the whole empty galaxy) instead. Maybe it has to modify the intergalactic probes so that it’s provably the kind of being that would cooperate with other superintelligences or something in case it runs into a peer out there, but this galaxy is most likely just up for grabs.
When it comes to personal usage, my brain isn’t sophisticated enough to consciously instantiate any formal decision theory in everyday life, which leaves me especially free at any moment to simply reject anything that sounds goofy. Does anyone try to use this stuff in real life, or is it purely academic?
You could ignore such entities, if that’s the true premise (though that might be leaving value on the table; breaking the premise might be in your own interest). But interaction with partial knowledge about X is some sort of causal interaction with an aspect of X captured in that knowledge. So causal interaction with some aspects of anything abstract/remote can’t be ruled out, and relevant computations might have sufficiently simple motivations for some process/agent/person to give them concrete physical presence in this world at some point.
execution is necessary for success, but direction is what separates merely impressive accomplishments from truly great ones. though being better at execution can make you better at direction, because it enables you to work on directions that others discard as impossible.
execution also lets you try directions out faster to get some signal
I expect that there’s no simple relationship between these factors and success. Both are required, and it’s idiosyncratic which one is most lacking in any given margin between not-success and success.
I usually think of execution as compute and direction as discernment. Compute = ability to work through specific directions effectively, discernment = ability to decide which of two directions is more promising. Probably success is upper-bounded by the product of the two, in a sufficiently informal way.
Thomas Kwa’s Effectiveness is a Conjunction of Multipliers seems relevant. He factors multipliers into judgment (sort of maps to your ‘direction’, or research taste I guess), ambition (which counts hard work as a driver), and risk appetite. Some domains seem to reward hard work superlinearly, probably worth looking out for those. You shouldn’t skip leg day because you’d miss out on multipliers (that phrase came from SBF of all people). Also finding multipliers is hard and information-gathering is particularly valuable when it helps you find a multiplier and figure out if you can apply it to your personal situation (I’m cribbing most of these from Thomas and the comments).
random half baked thoughts from a sleep deprived jet lagged mind: my guess is that the few largest principal components of variance of human intelligence are something like:
a general factor that affects all cognitive abilities uniformly (this is a sum of a bazillion things. they could be something physiological like better cardiovascular function, or more efficient mitochondria or something; or maybe there’s some pretty general learning/architecture hyperparameter akin to lr or aspect ratio that simply has better or worse configurations. each small change helps/hurts a little bit). having a better general factor makes you better at pattern recognition and prediction, which is the foundation of all intelligence. whether this is learning a policy or a world model, you need to be able to spot regularities in the world to exploit to have any hope of making good predictions.
a systematization factor (how much to be inclined towards using the machinery of pattern recognition towards finding and operating using explicit rules about the world, vs using that machinery implicitly and relying on intuition). this is the autist vs normie axis. importantly, it’s not like normies are born with hard coded social skills modules, for the same reason that humans aren’t born with language modules (sorry chomsky). we learn these things by being general reasoners placed in an environment where we are exposed to language and social interactions, etc. it just turns out that systematizing is generally really good for certain kinds of domains (math, CS) and pretty bad for other domains (social interaction). I think this explains why some people of decent general intelligence just cannot grasp basic CS concepts, or vice versa for social interaction. this is because as the number of layers of abstraction increases, it becomes increasingly difficult to model the system from the bottom up, until at some point there’s a phase transition where it becomes better to model the system top down and give up on any hope of ever understanding it mechanistically. this is exacerbated through life because people generally accumulate knowledge (both explicit and tacit) faster in the domains in which they already excel, and because of declining neuroplasticity with age. modern society is increasingly constructed to be amenable to systematization (via laws, contracts, standards, etc), because systematization is necessary to govern and scale a civilization. (I think this axis also explains a correlation with embodiment/integratedness, though it’s unclear under this theory how exactly the causality should work there. maybe only being able to systematize makes it harder to model the self? 
intuitively, it feels like intervening on integratedness causes one to become better at non-systematizing reasoning as a whole, though that could just be because we’re actually intervening on the common cause and using self-modelling as a tight feedback loop)
there are other big components (exploration vs exploitation, risk tolerance, creativity, memory) that aren’t explained here.
I might clean up my thinking and write something more comprehensible later. none of these ideas are novel, but I think a lot can be gained through pinning them down exactly. unfortunately, on priors, this kind of theoretical speculation is rarely useful. though it might be possible to test parts of theories like this experimentally.
why is ADHD also strongly correlated with systematization? it could just be worse self modelling: ADHD happens when your brain’s model of its own priorities and motivations falls out of sync with your brain’s actual priorities and motivations. if you’re bad at understanding yourself, you will misunderstand your priorities, and also you will not be able to control your priorities, because you won’t know what kinds of evidence will really persuade your brain to adopt a specific priority, and your brain will learn that it can’t really trust you to assign it priorities to satisfy its motives (burnout).
why do stimulants help ADHD? well, they short circuit the part where your brain figures out what priorities to trust based on whether they achieve your true motives. if your brain has already learned that your self model is bad at picking actions that eventually pay off towards its true motives, it won’t put its full effort behind those actions. if you can trick it by making every action feel like it’s paying off, you can get it to go along.
honestly unclear whether this is good or bad. on the one hand, if your self model has fallen out of sync, this is pretty necessary to get things done, and could get you out of a bad feedback loop (ADHD is really bad for noticing that your self model has fallen horribly out of sync and acting effectively on it!). some would argue on naturalistic grounds that ideally the true long term solution is to use your brain’s machinery the way it was always intended, by deeply understanding and accepting (and possibly modifying) your actual motives/priorities and having them steer your actions. the other option is to permanently circumvent your motivation system, to turn it into a rubber stamp for whatever decrees are handed down from the self model, which, forever unmoored from needing to model the self, is no longer an understanding of the self but rather an aspirational endpoint towards which the self is molded. I genuinely don’t know which is better as an end goal.
I view taking stimulants more as a move to give the more reflective parts of my brain more steering power (“getting my taxes done is good, because we need to do it eventually, now is actually a good time, doing my taxes now will be as boring as doing them in the future, rather than playing Magic: The Gathering now”) compared to my more primitive “true motives” that tend to be hyperbolically discounted (“dozing in bed is nice”, “washing dishes is boring”, “doing taxes is boring”). Maybe I am horrible at self-modelling, but invoking an out-of-sync self model to explain why the self-reflective parts have less steering power seems unnecessary.
it is kind of funny that caring a lot about reflective stability of alignment proposals and paradoxes arising from self modelling (e.g. in action counterfactuals) is most common in the people who are the worst at modelling themselves
do you think that Stanovich’s reflective mind and need for cognition are downstream from these two?
I think you’re framing the intuition vs. systematization relationship in a limiting way. From a predictive coding perspective, these aren’t opposing traits on an “autist vs normie axis”, they’re complementary processes working within the same neural architecture.
Predictive coding research shows our brains use both bottom-up signals (intuition) and top-down predictions (systematization) in a dynamic interplay. These are integrated parts of how our brains process information. One person can excel at both.
What appears as preference for systematization reflects differences in how prediction errors are weighted and processed—not a fundamental limitation. You can develop both capacities because they use the same underlying predictive machinery.
I would, however, agree with your take that most people don’t do this, but that is because they generally don’t search for prediction error after a certain point, since it is easier to just live in your secure bubble. So you’re right that this is probably how it looks in practice, since people will just use the strat (top-down systematization or bottom-up intuition) that has led to the most reward in the past.
Shorter blog post on how emotions interact with this hierarchical processing system: https://blog.dropbox.com/topics/work-culture/the-mind-at-work—lisa-feldman-barrett-on-the-metabolism-of-emot
Relating this to Kahneman’s system 1 and 2 work: https://pmc.ncbi.nlm.nih.gov/articles/PMC8979207/
edit: first link was formatted wrong.
Link is broken, can you reshare?
Fixed the comment, thanks!
(Here it is otherwise:) https://pmc.ncbi.nlm.nih.gov/articles/PMC5390700/
I wonder to what degree the genome has “solved” intelligence. You could imagine perhaps that we are all sort of noisy instantiations of the ideal intelligence, and that reduction in noise (possibly mainly literal cortex-to-cortex SNR) is mostly what results in intelligence variations. Even considering this, the genome probably does not encode a truly complete solution, in the sense that there are plenty of mental skills that have the potential for positive feedback and a positive correlation but in practice basically don’t show them. The genome probably has no understanding of the geometric Langlands conjecture. That is to say, there are deep and useful truths, especially ones that point out symmetries between extremely deep natural categories, and we have not adapted to them at a deep level yet. Therefore the positive manifold of all mental skills is very much still under construction. One could then wonder to what degree variance comes from genetic denoising and what fraction comes from aligning to novel-to-genome deep truths. All that said, this may be ill-posed; defining noise and novelty here seems like it could be hard.
timelines takes
i’ve become more skeptical of rsi over time. here’s my current best guess at what happens as we automate ai research.
for the next several years, ai will provide a bigger and bigger efficiency multiplier to the workflow of a human ai researcher.
ai assistants will probably not uniformly make researchers faster across the board, but rather make certain kinds of things way faster and other kinds of things only a little bit faster.
in fact probably it will make some things 100x faster, a lot of things 2x faster, and then be literally useless for a lot of remaining things
amdahl’s law tells us that we will mostly be bottlenecked on the things that don’t get sped up a ton. like if the thing that got sped up 100x was only 10% of the original thing, then you don’t get more than a 1/(1 − 10%) speedup.
i think the speedup is a bit more than amdahl’s law implies. task X took up 10% of the time because there are diminishing returns to doing more X, and so you’d ideally do exactly the amount of X such that the marginal value of time spent on X is exactly in equilibrium with time spent on anything else. if you suddenly decrease the cost of X substantially, the equilibrium point shifts towards doing more X.
in other words, if AI makes lit review really cheap, you probably want to do a much more thorough lit review than you otherwise would have, rather than just doing the same amount of lit review but cheaper.
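A toy model can contrast the two calculations. This sketch assumes (hypothetically) that one unit of researcher time is split between task X and everything else, and that output is Cobb-Douglas in the *amount* of each done, which builds in diminishing returns to X. `ALPHA = 0.1` is chosen so that X takes 10% of the time at the original cost, matching the example above; the functional form is an assumption, not a claim about real research.

```python
# Hypothetical model: output = (amount of X)**ALPHA * (amount of everything else)**(1 - ALPHA),
# where the amount of X done is (time on X) / (time cost per unit of X).
ALPHA = 0.1  # chosen so X takes 10% of the time at the original cost


def output(t_x: float, cost_x: float) -> float:
    """Research output when spending a fraction t_x of the time on X at cost_x per unit of X."""
    amount_x = t_x / cost_x
    amount_rest = 1.0 - t_x
    return amount_x ** ALPHA * amount_rest ** (1 - ALPHA)


def best_time_on_x(cost_x: float, steps: int = 10_000) -> float:
    """Grid-search the time share on X that maximizes output (endpoints excluded)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: output(t, cost_x))


old_cost, new_cost = 1.0, 0.01  # AI makes X 100x cheaper
t_old = best_time_on_x(old_cost)  # ~0.1: X is 10% of the time
v_old = output(t_old, old_cost)

# Amdahl: do exactly the same amount of X, just faster, and bank the saved time.
amdahl_speedup = 1 / ((1 - t_old) + t_old * new_cost)  # ~1.11x

# Re-equilibrate: choose the new optimal split at the lower cost.
t_new = best_time_on_x(new_cost)
realloc_speedup = output(t_new, new_cost) / v_old  # ~1.58x under this model
x_ratio = (t_new / new_cost) / (t_old / old_cost)  # ~100x more X gets done

print(f"Amdahl: {amdahl_speedup:.2f}x, re-optimized: {realloc_speedup:.2f}x, X done: {x_ratio:.0f}x")
```

Under Cobb-Douglas the time share on X happens to stay at `ALPHA` however cheap X gets, so all of the savings go into doing more X (the "more thorough lit review"). Other functional forms would shift the time share as well, but re-optimizing can never do worse than keeping the old workload, since the old workload is still one of the options.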
at the first moment that ai can fully replace a human researcher (that is, you can purely just put more compute in and get more research out, and only negligible human labor is required), the ai will probably be more expensive per unit of research than the human
(things get a little bit weird because my guess is before ai can drop-in replace a human, we will reach a point where adding ai assistance equivalent to the cost of 100 humans to 2025-era openai research would be as good as adding 100 humans, but the ais are not doing the same things as the humans, and if you just keep adding ais you start experiencing diminishing returns faster than with adding humans. i think my analysis still mostly holds despite this)
naively, this means that the first moment that AIs can fully automate AI research at human cost is not a special criticality threshold. if you are at equilibrium for allocating money between researchers and compute, then suddenly gaining the ability to convert compute into researchers at the exchange rate of a human researcher’s salary doesn’t really change anything: buying another researcher at that price was already an option, and at equilibrium it wasn’t worth it at the margin
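A minimal sketch of the equilibrium argument, under loudly stated assumptions: a lab splits a fixed budget between labor and compute, research output is Cobb-Douglas in the two, AI researchers are perfect substitutes for humans, and the compute needed to run an AI is folded into its price. All constants are hypothetical. Output stays flat while AI labor costs at least as much as a human and only rises smoothly as it gets cheaper, so nothing special happens at exact price parity.

```python
# Hypothetical model: output = labor**LABOR_SHARE * compute**(1 - LABOR_SHARE),
# with a fixed budget. AI researchers are assumed to be perfect substitutes for humans.
LABOR_SHARE = 0.5
HUMAN_SALARY = 1.0   # cost of one researcher-year (normalized)
COMPUTE_PRICE = 1.0  # cost of one unit of experiment compute (normalized)
BUDGET = 100.0


def max_output(labor_price: float) -> float:
    """Best achievable output; the Cobb-Douglas optimum spends LABOR_SHARE of the budget on labor."""
    labor = LABOR_SHARE * BUDGET / labor_price
    compute = (1 - LABOR_SHARE) * BUDGET / COMPUTE_PRICE
    return labor ** LABOR_SHARE * compute ** (1 - LABOR_SHARE)


# Price of an AI researcher-year, from "pricier than a human" down to "10x cheaper".
for ai_price in [2.0, 1.0, 0.5, 0.1]:
    labor_price = min(HUMAN_SALARY, ai_price)  # hire whichever kind of researcher is cheaper
    print(f"AI price {ai_price:>4}: output {max_output(labor_price):.1f}")
```

This deliberately ignores the hiring frictions and the possibility of AIs scaling better than human organizations, which are exactly the caveats that follow.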
in reality, you will probably not be at equilibrium, because there are a lot of inefficiencies in hiring humans: recruiting is a lemon market, you have to onboard new hires relatively slowly, management capacity is limited, there is an inelastic and inefficient supply of qualified hires, etc. but i claim this is a relatively small effect and can’t explain a one OOM increase in workforce size
also: anyone who has worked in a large organization knows that team size is not everything. having too many people can often even be a liability and slow you down. even when it doesn’t, adding more people almost never makes your team linearly more productive.
however, if AIs have much better scaling laws with additional parallel compute than human organizations do, then this could change things a lot. this is one of my biggest uncertainties here and one reason i still take rsi seriously.
your AIs might have higher-bandwidth communication with each other than your humans do. but also maybe they might be worse at generalizing previous findings to new situations or something.
they might be more aligned with doing lots of research all day, whereas humans care about a lot of other things like money and status and fun and so on. but if outer alignment is hard we might get the AI equivalent of corporate politics.
one other thing is that compute is a necessary input to research. i’ll mostly roll this into the compute cost of actually running the AIs.
the part where AI research feeds back into how good the AIs are could be very slow in practice
there are logarithmic returns to more pretraining compute and more test time compute. so an improvement that 10xes the effective compute doesn’t actually get you that much. GPT-4.5 isn’t that much better than GPT-4 despite being 10x more compute (which is in turn not that much better than GPT-3.5, I would claim).
you run out of low hanging fruit at some point. each 2x in compute efficiency is harder to find than the previous one.
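The two effects above compound, and a toy model shows how. Everything in it is a hypothetical assumption chosen for illustration: capability is taken to be `log10` of effective compute, and the k-th 2x efficiency gain costs `first_cost * growth**k` units of research effort, so each doubling is harder to find than the last.

```python
import math

# Hypothetical toy model; functional forms and constants are assumptions, not measurements.


def capability(effective_compute: float) -> float:
    """Logarithmic returns: every 10x of effective compute adds the same fixed increment."""
    return math.log10(effective_compute)


def efficiency_doublings(research_effort: float, first_cost: float = 1.0, growth: float = 1.5) -> int:
    """Count the 2x compute-efficiency gains found with a given total research effort.

    The k-th doubling (k = 0, 1, 2, ...) costs first_cost * growth**k, so low-hanging
    fruit gets picked first and each later doubling is more expensive.
    """
    doublings, spent = 0, 0.0
    while spent + first_cost * growth ** doublings <= research_effort:
        spent += first_cost * growth ** doublings
        doublings += 1
    return doublings


for effort in [10, 20, 100]:
    d = efficiency_doublings(effort)
    gain = capability(2.0 ** d)  # capability gain from d doublings, relative to 1x
    print(f"effort {effort:>3}: {d} doublings, capability +{gain:.2f}")
```

In this toy, 10x more research effort (10 to 100) buys only about 2x as many doublings (4 to 9), and each doubling adds just `log10(2)` to capability.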
i would claim that in fact much of the recent feeling that AI progress is fast is due to a lot of low hanging fruit being picked. for example, the shift from pretrained models to RL for reasoning picked a lot of low hanging fruit left over from not using test time compute / not eliciting CoTs well, and we shouldn’t expect the same kind of jump consistently.
an emotional angle: exponentials can feel very slow in practice; for example, moore’s law is kind of insane when you think about it (doubling every 18 months is pretty fast), but it still takes decades to play out
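The arithmetic behind the Moore’s law point, as a small sketch assuming a clean 18-month doubling time (real transistor-count history is noisier): each step is only 2x, but compounding gets to roughly a million-fold over 30 years.

```python
import math


def growth_factor(years: float, doubling_time_years: float = 1.5) -> float:
    """Total growth after `years` if the quantity doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


def years_to_reach(factor: float, doubling_time_years: float = 1.5) -> float:
    """Years needed to grow by `factor` at the given doubling time."""
    return math.log2(factor) * doubling_time_years


print(growth_factor(1.5))    # 2.0: one doubling
print(growth_factor(10))     # ~102x per decade
print(growth_factor(30))     # 1048576.0: about a million-fold over 30 years
print(years_to_reach(1000))  # ~15 years just to get 1000x
```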
My current best guess median is that we’ll see 6 OOMs of effective compute in the first year after full automation of AI R&D if this occurs in ~2029 using a 1e29 training run and compute is scaled up by a factor of 3.5x[1] over the course of this year[2]. This is around 5 years of progress at the current rate[3].
How big of a deal is 6 OOMs? I think it’s a pretty big deal; I have a draft post discussing how much an OOM gets you (on top of full automation of AI R&D) that I should put out somewhat soon.
Further, my distribution over this is radically uncertain with a 25th percentile of 2.5 OOMs (2 years of progress) and a 75th percentile of 12 OOMs.
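A quick conversion of these estimates into calendar terms, assuming the ~1.2 OOMs/year combined hardware+software rate given in the footnotes stays constant. The code only does the division; the OOM numbers are the commenter’s.

```python
# Convert OOMs of effective compute into "years of progress at the current rate".
# 1.2 OOMs/year is the combined hardware+software rate from the footnotes.
OOMS_PER_YEAR = 1.2


def years_of_progress(ooms: float) -> float:
    """How many years the current trend would take to deliver `ooms` orders of magnitude."""
    return ooms / OOMS_PER_YEAR


estimates = {"25th percentile": 2.5, "median": 6.0, "75th percentile": 12.0}
for label, ooms in estimates.items():
    print(f"{label}: {ooms} OOMs = {10 ** ooms:.1e}x effective compute "
          f"= {years_of_progress(ooms):.1f} years of progress")
```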
The short breakdown of the key claims is:
Initial progress will be fast, perhaps ~15x faster algorithmic progress than humans.
Progress will probably speed up before slowing down due to training smarter AIs that can accelerate progress even faster, and this being faster than returns diminish on software.
We’ll be quite far from the limits of software progress (perhaps median 12 OOMs) at the point when we first achieve full automation.
Here is a somewhat summarized and rough version of the argument (stealing heavily from some of Tom Davidson’s forthcoming work):
At the point of full automation, progress will be fast:
Probably you’ll have lots of parallel workers running pretty fast at the point when you have full automation or shortly after this. This isn’t totally obvious due to inference compute, but prices often drop fast.
My guess is you’ll have enough compute that if you use 1⁄6 of your compute running AIs, you’ll be able to run the equivalent of ~1 million AIs which are roughly as good as the best human research scientists+engineers (taking into account cost reductions for using weaker models for many tasks). This is attempting to account for a reduction in the number of models due to using a bunch of inference compute. You’ll be able to run these AIs at the equivalent of 60x speed (3x from hours, 5x from direct speed, 2x from coordination, and 2x from variable time compute and/or context swapping with a cheaper+faster model). So, like 15k parallel copies at 60x speed.
Probably the AI company has like ~3k researchers, but when you adjust for quality, this is only as good as like 600 of the top engineers/researchers.
Let’s say marginal returns to parallelism are roughly 0.55.
Then, the increase in “serial labor equivalents” is roughly (15k / 600)^0.55 * 60 = 350. (Note that most of this is from speed and quality rather than parallel copies!)
Production of algorithmic research is due to both compute and labor. So, let’s say Cobb-Douglas with labor^0.5 compute^0.5. (My guess is that current marginal returns are more like labor^0.6 compute^0.4, but returns will get worse as you add more labor.) So, holding compute fixed, we get 350^0.5 ≈ 19x, which roughly matches my 15x speedup median.
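The back-of-the-envelope chain above, reproduced step by step so each factor is visible. Every constant is the commenter’s guess; the code only does the arithmetic, and compute is held fixed (multiplier 1) in the final Cobb-Douglas step, as in the text. The `research_speedup` helper name is illustrative.

```python
# Speed multiplier: 3x (hours) * 5x (direct speed) * 2x (coordination) * 2x (variable-time compute)
speed = 3 * 5 * 2 * 2  # 60x

# ~1 million human-equivalent AIs at 60x speed is ~16.7k parallel copies; the text rounds to 15k.
parallel_copies = 15_000
human_equivalents = 600   # ~3k staff, quality-adjusted
parallel_exponent = 0.55  # marginal returns to parallelism

# Increase in "serial labor equivalents"
labor_multiplier = (parallel_copies / human_equivalents) ** parallel_exponent * speed


def research_speedup(labor_mult: float, compute_mult: float = 1.0, labor_share: float = 0.5) -> float:
    """Cobb-Douglas production of algorithmic progress from labor and compute multipliers."""
    return labor_mult ** labor_share * compute_mult ** (1 - labor_share)


print(f"speed: {speed}x")
print(f"serial labor equivalents: {labor_multiplier:.0f}x")  # ~352x
print(f"algorithmic progress, labor^0.5: {research_speedup(labor_multiplier):.1f}x")  # ~18.8x
print(f"algorithmic progress, labor^0.6: {research_speedup(labor_multiplier, labor_share=0.6):.1f}x")
```

Swapping in the labor^0.6 exponent shows how sensitive the headline number is to that single guess.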
Progress will speed up before slowing down:
See discussion here
I expect progress to slow as you hit limits or run out of progress doable with cheap experiments, but limits are probably pretty high as discussed in the next bullet.
Limits are high:
Human brain is 1e24 training flop.
We’re using 1e29 flop to get a bit over human level, so like 4 OOMs of headroom from this.
We can probably get a lot more efficiency, maybe 9 OOMs, at least for efficiency up rather than down. (By efficiency up rather than down I mean: we can do as well as using 1e33 flop with 1e24 real flop relative to scaling at the point when you hit human efficiency, but probably can’t train a human-level AI for 1e15 real flop.)
At some point, I’ll write a post that makes a better version of this argument and presents a full version of my picture.
I don’t think we’ll see a speed criticality per se; rather, I expect the rate of progress to accelerate up to the point of full automation. But I currently don’t think this makes a huge difference to the bottom line of “progress in the first year after full automation in practice”, as I expect to initially see fast cost decreases and inference time compute can only go so far. I could expand this argument to the extent you have cruxes like “slower takeoff because we’ve already eaten low-hanging fruit with earlier AI acceleration” and “inference compute means you hit full automation much faster”.
[1] 3.5x is roughly the rate of bare metal compute scale-up per year.
[2] That is, after the first company fully automates AI R&D internally, if they decide to go as fast as possible and their AIs/employees/others don’t try to sabotage these efforts. And I’m assuming that AI software progress hasn’t substantially slowed down by the time of full automation, though conditioning on a 1e29 training run means that at least compute scaling progress (which is a key driver of software progress) hasn’t slowed down all that much.
[3] I think we see about 1.2 OOMs per year including both hardware and software.
Thanks for these thoughtful predictions. Do you think there’s anything we can do today to prepare for accelerated or automated AI research?
Maybe distracting technicality:
This seems to make the simplifying assumption that the R&D automation is applied to a large fraction of all the compute that was previously driving algorithmic progress, right?
If we imagine that a company only owns 10% of the compute being used to drive algorithmic progress pre-automation (and is only responsible for say 30% of its own algorithmic progress, with the rest coming from other labs/academia/open-source), and this company is the only one automating their AI R&D, then the effect on overall progress might be reduced (the 15X multiplier only applies to 30% of the relevant algorithmic progress).
In practice I would guess that either the leading actor has enough of a lead that they are already responsible for most of their algorithmic progress, or other groups are close behind and will thus automate their own AI R&D around the same time anyway. But I could imagine this slowing down the impact of initial AI R&D automation a little bit (and it might make a big difference for questions like “how much would it accelerate a non-frontier lab that stole the model weights and tried to do rsi”).
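A minimal sketch of the dilution effect, assuming progress rates simply add: the in-house share of the company’s algorithmic progress speeds up by the automation multiplier, while the share coming from other labs, academia, and open source keeps arriving at its old rate. The additive model and the function name are assumptions for illustration.

```python
def blended_progress_rate(own_share: float, multiplier: float) -> float:
    """Overall algorithmic-progress rate relative to pre-automation (1.0 = unchanged).

    own_share: fraction of the company's algorithmic progress that came from its own research.
    multiplier: speedup of that in-house research after automating AI R&D.
    The remaining (1 - own_share) is assumed to keep arriving at the old rate.
    """
    return own_share * multiplier + (1 - own_share)


print(blended_progress_rate(0.3, 15))  # ~5.2x when only 30% of progress is in-house
print(blended_progress_rate(1.0, 15))  # 15.0x when all of it is
```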
Yes, I think frontier AI companies are responsible for most of the algorithmic progress. I think it’s unclear how much the leading actor benefits from progress done at other slightly-behind AI companies, and this could make progress substantially slower. (However, it’s possible the leading AI company would be able to acquire the GPUs from these other companies.)
Why do you think this? It seems to me that for most tasks once an AI gets some skill it is much cheaper to run it for that skill than a human.
libraries abstract away the low level implementation details; you tell them what you want to get done and they make sure it happens. frameworks are the other way around. they abstract away the high level details; as long as you implement the low level details you’re responsible for, you can assume the entire system works as intended.
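A small Python sketch of the library/framework split (often called inversion of control); all names here are hypothetical. With the library, your code owns the control flow and calls in when it wants; with the framework, the framework owns the loop and calls the hook you supply, and you trust the rest of the system to work.

```python
from typing import Callable


# Library style: your code stays in control and calls into the library when it wants.
def parse_record(line: str) -> list[str]:
    """A tiny 'library' function: split a comma-separated line into trimmed fields."""
    return [field.strip() for field in line.split(",")]


# Framework style: the framework owns the control flow and calls the code you plug in.
class RecordPipeline:
    """Hypothetical minimal framework: it runs the loop, you supply the per-record hook."""

    def __init__(self, on_record: Callable[[list[str]], None]) -> None:
        self.on_record = on_record

    def run(self, lines: list[str]) -> None:
        for line in lines:
            # The framework decides when (and whether) your hook runs.
            self.on_record(parse_record(line))


data = ["alice, 3", "bob, 5"]

# You drive: call the library yourself, in whatever order you like.
rows = [parse_record(line) for line in data]

# The framework drives: you only implement the hook.
collected: list[list[str]] = []
RecordPipeline(on_record=collected.append).run(data)
```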
a similar divide exists in human organizations and with managing up vs down. with managing up, you abstract away the details of your work and promise to solve some specific problem. with managing down, you abstract away the mission and promise that if a specific problem is solved, it will make progress towards the mission.
(of course, it’s always best when everyone has state on everything. this is one reason why small teams are great. but if you have dozens of people, there is no way for everyone to have all the state, and so you have to do a lot of abstracting.)
when either abstraction leaks, it causes organizational problems—micromanagement, or loss of trust in leadership.
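The library/framework split is often called inversion of control, and a minimal Python sketch makes it concrete. With a library, your code decides when to call it. With a framework, the framework owns the control flow and calls the piece you supply. All names here are hypothetical and for illustration only.

```python
from typing import Callable


# Library style: you stay in control and call it for one specific job.
def normalize(values: list[float]) -> list[float]:
    total = sum(values)
    return [v / total for v in values]


# Framework style: it owns the loop and decides when your code runs.
def run_training_loop(step: Callable[[int], float], num_steps: int) -> list[float]:
    losses = []
    for i in range(num_steps):
        losses.append(step(i))  # the framework calls back into your code
    return losses


# You implement only the low-level piece the framework asks for,
# and trust the framework to handle everything around it.
def my_step(i: int) -> float:
    return 1.0 / (i + 1)


weights = normalize([1.0, 1.0, 2.0])    # you call the library
losses = run_training_loop(my_step, 3)  # the framework calls you
print(weights, losses)
```

A leak in either direction looks familiar. A caller who has to know how `normalize` works inside is like micromanagement. A `my_step` that silently depends on loop details the framework never promised is like losing trust in leadership.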
I think people in these parts are not taking sufficiently seriously the idea that we might be in an AI bubble. this doesn’t necessarily mean that AI isn’t going to be a huge deal—just because there was a dot com bubble doesn’t mean the Internet died—but it does very substantially affect the strategic calculus in many ways.
I would be utterly unsurprised to see an AI crash in the next 24 months, leading to another AI Winter. I lived through 1999 and Petfood.com and the Internet bubble pop. And I can pattern match.
But the Internet crash didn’t last long. Google and Amazon survived just fine, Ruby on Rails was big within half a decade, and soon enough we were doing Web 2.0 and AJAX and all that fun stuff.
It’s possible that current generation LLMs might hit a wall soon, for various architectural reasons that are obvious to many people but that I’m superstitiously averse to amplifying. If they do, that increases the chance of an AI Winter until the underlying research gets done.
But I have trouble imagining any series of events that buys us 10 more years. Bubble pops in tech are usually an early correction that wipes out a Precambrian Explosion of dumb money, and that ultimately concentrates resources into a few successful players.
I guess figuring out whether we’re “in a bubble” just hasn’t seemed very important to me, relative to how hard it seems to determine? What effects on the strategic calculus do you think it has?
E.g. my current best guess is that I personally should just do what I can to help build the science of interpretability and learning as fast as possible, so we can get to a point where we can start doing proper alignment research and reason more legibly about why alignment might be very hard and what could go wrong. Whether we’re in a bubble or not mostly matters for that only insofar as it’s one factor influencing how much time we have left to do that research.
But I’m already going about as fast as I can anyway, so having a better estimate of timelines isn’t very action-relevant for me. And “bubble vs. no bubble” doesn’t even seem like a leading-order term in timeline uncertainty anyway.
some reasons why it matters
all effects that route through longer timelines (allocating more to upskilling oneself and others, longer term bets, not expecting agi to look like current models, aggressiveness of distributing funds to alignment, etc)
whether to pursue an aggressive (stock-heavy) or conservative (bond-heavy) investment strategy. if there is an ai bubble pop, it will likely bring the entire economy into a recession.
how much money to save as runway; should you be taking advantage of the bubble to grab as much cash as possible before the music stops, or should you be trying to dispose of all of your money before the singularity makes it worthless?
for lab employees: how much lab equity to sell/hold?
how much to emphasize “agi soon” in public comms, or in conversations with policymakers? (during a bubble pop, having predicted agi soon will probably be even more negatively viewed than merely having been wrong about timelines with no pop)
if there is a bubble and it pops, sentiment around agi will flip from inevitability to impossibility. many people will not be epistemically strong enough to resist the urge to conform. being aware of the hype cycle can help free yourself from it and avoid both over- and under-exuberance.
This is my biggest disagreement at the moment. The reason is that, unlike 2008 or 2020, there’s no supply squeeze, and no financial consequences severe enough that banks start to fail, so I expect an AI bubble to look more like the 2000 bubble than the 2008 or 2020 bubbles/crises.
That said, AI stocks would fall hard and GPUs would become way, way cheaper.
This is pretty different than my model of what would happen? Though I admittedly haven’t spent a ton of time thinking through it. I just don’t see why money would lose value though; I expect that some goods would still remain scarce, positional, etc (land in high-demand cities being a strong example), which would seem to cut against that happening?
to be more precise, I mean worthless for decreasing p(doom)
I’m actually uncertain about whether an AI bubble would trigger a recession (period of broad decline in economic activity). What I have seen reported is that the economy’s generally stagnant now, except for AI, and that American jobs may have been declining for months already, contrary to published statistics, motivating a recent rate cut by the Fed. If true, would an AI bubble really have substantial broader ripple effects outside the AI sector, and would those ripples even necessarily be negative?
In particular, I have been suspecting that the biotech winter that’s been going on for a few years is partly due to routing of investor money into the AI craze. Maybe that money just ends up funding other economic activity instead? I’m really unsure of how to think about this and am quite curious.
my mental model of how a pop triggers a broader crash is something like: a lot of people are taking money and investing it into AI stuff, directly (by investing in openai, nvidia, tsmc, etc) or indirectly (by investing in literally anything else; like, cement companies that make a lot of money by selling cement to build datacenters or whatever). this includes VCs, sovereign wealth funds, banks, etc. if it suddenly turned out that the datacenters and IP were worth a lot less than they thought, their equity (or debt) holdings would suddenly be worth a lot less than they thought, and they might become insolvent. and lots of financial institutions becoming insolvent is pretty bad.
Hm. It seems like the extent to which there is an increased risk of insolvency due to a popped AI bubble would partly depend on the extent to which these institutions had sold other assets or used leverage to pay for equity in or lend to AI companies and the suppliers that are most dependent on AI company business.
My understanding is that the great financial crisis resulted from extremely leveraged investments in mortgages due to lenient rules and a perception that American mortgages were extremely reliably paid. I don’t know to what extent important institutions may be overleveraged or overweighted in their investments in AI.
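Toy balance-sheet arithmetic shows why leverage is the crux. The numbers are hypothetical and ignore hedging, margin calls, and accounting rules. An institution funded purely by equity survives a 30% fall in its AI-exposed assets. One that borrowed 90% of the purchase price is wiped out by any fall beyond 10%.

```python
def equity_after_drop(assets: float, debt: float, drop: float) -> float:
    """Equity left after assets lose a fraction `drop` of their value; the debt owed does not shrink."""
    return assets * (1 - drop) - debt


def max_survivable_drop(assets: float, debt: float) -> float:
    """Largest fractional fall in asset value before equity reaches zero (insolvency)."""
    return (assets - debt) / assets


# Hypothetical institutions, each holding 100 units of AI-exposed assets.
unlevered = equity_after_drop(assets=100, debt=0, drop=0.3)  # funded entirely by equity
levered = equity_after_drop(assets=100, debt=90, drop=0.3)   # 10x leverage: 10 equity, 90 borrowed

print(f"unlevered equity after a 30% drop: {unlevered:.1f}")
print(f"levered equity after a 30% drop: {levered:.1f}")
print(f"levered institution survives drops up to {max_survivable_drop(100, 90):.0%}")
```

The same 30% drop leaves the unlevered holder poorer but solvent and pushes the levered one into negative equity. So the size of a post-pop shock depends heavily on how much AI exposure was bought with borrowed money.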
But my modal prediction is that a popped AI bubble would cause hedged AI investors to lose value without becoming insolvent. It would also lead to a bunch of distressed assets being bought at low, low prices by those who kept their powder dry, and to cancelled orders, and perhaps layoffs and restructuring, at suppliers who expanded to meet the temporary surge in demand from AI companies. That could cause turmoil, but I really don’t have a sense of the extent to which the American or global economy has reshaped itself to build out AI. It’s especially hard to tell because Trump’s tariffs have caused so much coincident market turmoil that it’s difficult to disentangle how much is AI and how much is tariffs/end of ZIRP (as others have pointed out before).
Tangent:
Is this literally true? You don’t think there are realistic ways for you to make faster progress?
I just meant that if an oracle told me ASI was coming in two years, I probably couldn’t spend down energy reserves to get more done within that timeframe compared to being told it’ll take ten years. I might feel a greater sense of urgency than I already do and perhaps end up working longer hours as a result, but if so, that’d probably be an unendorsed emotional response I couldn’t help rather than a considered plan. I kind of doubt I’d actually get more done that way. Some slack for curiosity and play is required for me to do my job well.
The stakes are already so high and time so short that varying either within an order of magnitude up or down really doesn’t change things all that much.
The evidence just seems to keep pointing towards this not being a bubble.
Then what does it mean, in concrete terms? Can you give some probabilities about what you think will happen to the valuations of what companies over what time frame?
creating surprising adversarial attacks using our recent paper on circuit sparsity for interpretability
we train a model with sparse weights and isolate a tiny subset of the model (our “circuit”) that does this bracket counting task where the model has to predict whether to output “]” or “]]”. It’s simple enough that we can manually understand everything about it, every single weight and activation involved, and even ablate away everything else without destroying task performance.
(this diagram is for a slightly different task because i spent an embarrassingly large number of hours making this figure and decided i never wanted to make another one ever again)
in particular, the model has a residual channel delta that activates twice as strongly when you’re in a nested list. it does this by using the attention to take the mean over a “[” channel, so if you have two “[”s then it activates twice as strongly. and then later on it thresholds this residual channel to only output “]]” when your nesting depth channel is at the stronger level.
but wait. the mean over a channel? doesn’t that mean you can make the context longer and “dilute” the value, until it falls below the threshold? then, suddenly, the model will think it’s only one level deep!
it turns out that indeed, this attack works really well on the entire sparse model (not just the circuit), and you can reliably trick it.
in retrospect, this failure is probably because extremely long nested rows are out of distribution on our specific pretraining dataset. but there’s no way i would have come up with this attack by just thinking about the model.
one other worry is maybe this is just because of some quirk of weight-sparse models. strikingly, it turns out that this attack also transfers to similarly capable dense models!
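A toy version of the mechanism may help show why the dilution attack works. This is not the paper’s circuit. It assumes the attention is perfectly uniform over the whole context and that the “[” channel is just the fraction of tokens that are “[”. It also assumes the readout uses a fixed, hypothetical threshold of 0.2 that happens to separate depth 1 from depth 2 on short lists.

```python
def nested_list_tokens(depth: int, n_elements: int) -> list[str]:
    """Tokens for an unclosed list such as '[[x, x, x': opening brackets, then comma-separated elements."""
    body: list[str] = []
    for i in range(n_elements):
        if i > 0:
            body.append(",")
        body.append("x")
    return ["["] * depth + body


def bracket_channel(tokens: list[str]) -> float:
    """Uniform attention = mean of the '[' indicator over the whole context."""
    return sum(tok == "[" for tok in tokens) / len(tokens)


def predict_close(tokens: list[str], threshold: float = 0.2) -> str:
    """Hypothetical readout: output ']]' only when the channel clears the threshold."""
    return "]]" if bracket_channel(tokens) > threshold else "]"


for depth, n in [(1, 3), (2, 3), (2, 63)]:
    toks = nested_list_tokens(depth, n)
    print(depth, n, round(bracket_channel(toks), 3), predict_close(toks))
```

With three elements, the depth-2 list clears the threshold and the toy predicts “]]”. A depth-2 list with 63 elements dilutes the channel to 2/127, well below the threshold, so the toy falls back to “]”, the same failure the attack exploits.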
Aside: For me, this paper is potentially the most exciting interpretability result of the past several years (since SAEs). Scaling it to GPT-3 and beyond seems like a very promising direction. Great job!
I agree! I admit I am not optimistic, but I am still very glad to see this.
How well does it generalize to “similarly capable dense models”? Just curious whether you have a graph for that (I haven’t read any part of the paper besides its first page, so feel free to just tell me go and do that before asking questions like this).
i don’t have a graph for it. the corresponding number is p(correct) = 0.25 at 63 elements for the one dense model i ran this on. (the number is not in the paper yet because this last result came in approximately an hour ago)
the other relevant result in the paper for answering the question of how similar our sparse models are to dense models is figure 33
Noticed that you use task cross-entropy loss instead of KL when learning task masks (Appendix 4.5, Loss Function). This is maybe a reasonable design choice, but it’s important to note that this will ablate any “negative” nodes (and indirectly cause you to ignore positive nodes that overcome the negative nodes).
Overall, I suspect that this causes the subnetworks to miss important model computations (but obviously decreases the size of the subnetwork)
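A toy calculation illustrates the concern. The logits and the two “nodes” below are made up, and this is not the paper’s loss code. One node pushes up the correct token and the other suppresses it. Ablating the suppressing node lowers task cross-entropy, so a mask trained on task CE is rewarded for pruning it. Ablating it also raises the KL divergence from the full model’s output distribution, so a KL-trained mask would be pushed to keep it. Measuring KL from the full model to the pruned one is one reasonable direction among several.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: list[float], target: int) -> float:
    """Task loss: negative log-probability of the correct token only."""
    return -math.log(softmax(logits)[target])


def kl_divergence(p_logits: list[float], q_logits: list[float]) -> float:
    """KL(p || q) between the full model's distribution p and the pruned model's q."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical per-node contributions to the logits of a 3-token vocabulary; token 0 is correct.
POSITIVE_NODE = [3.0, 0.0, 0.0]   # pushes the correct token up
NEGATIVE_NODE = [-1.0, 0.0, 0.0]  # suppresses the correct token


def model_logits(keep_negative: bool) -> list[float]:
    return [p + (n if keep_negative else 0.0) for p, n in zip(POSITIVE_NODE, NEGATIVE_NODE)]


full = model_logits(keep_negative=True)
pruned = model_logits(keep_negative=False)

print("task CE, full vs pruned:", round(cross_entropy(full, 0), 3), round(cross_entropy(pruned, 0), 3))
print("KL from full, full vs pruned:", kl_divergence(full, full), round(kl_divergence(full, pruned), 3))
```

The pruned subnetwork scores better on the task loss while being less faithful to what the full model actually computes. That is the trade-off described above.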
Aside: is there a reason there isn’t a top-level link-post for this paper? (if not I’ll create one)
Hmm, I guess there’s no guarantee that KL does better, and since we don’t have great metrics for “internal faithfulness”, maybe it’s just better to transparently optimize the flawed metric (task CE + sparsity).
Though as Robin notes on the AMI post, I do think the next step in this style of research is handling negative heads and self repair in a principled way.
maybe I should host an antechamber/arena house party: one chill cozy room with soothing music where no arguing is allowed and people are strongly encouraged to say kind things and reflect on things they’re grateful for and whatnot, and another with harsh fluorescent lights and agitating music and a big whiteboard full of hot takes and the conversations all get transcribed by speech to text and posted on lesswrong in real time. and guests are given a heart rate monitor that beeps if their HR gets too high, forcing them to spend a few minutes in the chill room before returning to the arena
Arguments will be won by the attendees with the best cardio fitness (low resting HR) + mental discipline (less affected by agitating surroundings). This creates a natural incentive to exercise and meditate.
You’re only thinking defense, not offense[1]. The best offensive[1] strategy here is to maximally irritate your opponents with an impenetrable air of smug superiority.
Pun intended
Inducing sexual arousal seems like a better equilibrium, as long as everyone consents. It has positive valence roughly proportional to ΔHR, solves gender ratio problems and incentivizes people to learn effective flirting.
Generating a loud noise that you’re expecting but your opponents aren’t might be even better at differentially elevating their heart rates.
i think of the idealized platonic researcher as the person who has chosen ultimate (intellectual) freedom over all else. someone who really cares about some particular thing that nobody else does—maybe because they see the future before anyone else does, or maybe because they just really like understanding everything about ants or abstract mathematical objects or something. in exchange for the ultimate intellectual freedom, they give up vast amounts of money, status, power, etc.
one thing that makes me sad is that modern academia is, as far as I can tell, not this. when you opt out of the game of the Economy, in exchange for giving up real money, status, and power, what you get from Academia is another game of money, status, and power, with different rules, and much lower stakes, and also everyone is more petty about everything.
at the end of the day, what’s even the point of all this? to me, it feels like sacrificing everything for nothing if you eschew money, status, and power, and then just write a terrible irreplicable p-hacked paper that reduces the net amount of human knowledge by adding noise and advances your career so you can write more terrible useless papers. at that point, why not just leave academia and go to industry and do something equally useless for human knowledge but get paid stacks of cash for it?
ofc there are people in academia who do good work but it often feels like the incentives force most work to be this kind of horrible slop.
I hear this a lot, and as a PhD student I definitely see some adverse incentives, but I basically just ignore them and do what I want. Maybe I’ll eventually get kicked out of the academic system, but it will take years, which is enough time to do obviously excellent work if I have that potential. Obviously excellent work seems to be sufficient to stay in academia. So the problem doesn’t really seem that bad to me: the bottom 60% or so grift and play status games, but probably weren’t going to contribute much anyway, and the top 40% occasionally waste time on status games because of the culture or because they have that type of personality, but often don’t really need to.
I disagree with this reasoning. A well-designed system with correct incentives would co-opt these people’s desire to grift and play status games for the purposes of extracting useful work from them. Indeed, setting up game-theoretic environments in which agents with random or harmful goals all end up pointed towards some desired optimization target is largely the purpose of having “systems” at all. (See: how capitalism, at its best, harnesses people’s self-interest towards creating socially valuable things.)
People who would ignore incentives and do quality work anyway would probably do quality work anyway, so if we only cared about them, we wouldn’t need incentive systems at all. (Figuring out who these people are and distributing resources to them is another purpose of such systems, but a badly-designed system is also bad at this task.)
well, in academia, if you do quality work anyways and ignore incentives, you’ll get a lot less funding to do that quality work, and possibly perish.
unfortunately, academia is not a sufficiently well designed system to extract useful work out of grifters.
It’s not a perfectly designed system, but it’s still possible to benefit from it if you want a few years to do research.
In my experience with math, to be obviously excellent you need to be more like the top 10% of all grad students, possibly even higher, but this might vary a lot by field.
I suspect that academia would be less like this if there weren’t an oversupply of labor in academia. Like, there’s this crazy situation where there are way more people who want to be professors than there are jobs for professors. So a bunch get filtered out in grad school, and a bunch more get filtered out in early stages of professorhood. So professors can’t relax and research what they are actually curious about until fairly late in the game (e.g. tenure) because they are under so much competition to impress everyone around them with publications and whatnot.
Also, the person who’s willing to mud-wrestle for twenty years to get a solid position so they can turn around and do real research is just much much rarer than the person who enjoys getting dirty.
Yeah, a big part of my strategy is to ignore this effect and accept potentially being filtered out as a grad student.
academia is too broad of a term. most of math, physics, theoretical CS, paleontology, material sciences, engineering, and some branches of economics, biology, (computational) neuroscience, (computational) linguistics, statistics etc are doing well and overall reward intellectual freedom and deep work. in terms of people this is a small minority of total academics, probably <5%.
It is true that many subfields, or even entire domains of science are diseased disciplines. Most of the research is marginal, irrelevant, reinventing the wheel, trivial, tautological, p-hacked and often even fraudulent. One can point to the usual suspects in the humanities and the social sciences but disciplines where the majority of research is noise, nonsense or even net-negative plausibly also includes machine learning and (I’m told) medicine.
Is that disappointing? Perhaps. But this still describes hundreds of thousands or millions of people all over the world pushing the frontier of knowledge.
I think you are pining for a world that doesn’t really exist. The reason why academia is also ruled by money, status, and power is because it is just a different sector of the economy. The costs and returns will therefore equilibrate with the rest of the economy given the constraints of academia.
If you allow for tenure positions, for example, well there is much reward for having a forever stable high paying and high status job, so you should expect people to pay up to that amount of benefit to get it.
Maybe you pine for the academia of Newton, where scientists never had to worry about appearing immediately productive because they had massive amounts of passive income, but that was only possible because of the massive inequality involved, randomly choosing some families to be high class. That academia only existed because of the rest of the economy, which was utter trash, and which for the vast majority of history pointed potential Newtons toward studying religion instead.
I’m not saying improvements to academia don’t exist, but you won’t find them by trying to isolate academics from money, status, and power, or by pretending academia is independent from the rest of the economy. You’ll find them by working with these forces, as we do in all the other fields we succeed at, to align them with good work.
i think this is a bit overblown, from observing academia you can definitely trade a small amount of status for academic freedom if you’re not 90th-percentile disagreeable. You could go to a slightly lower-ranked but still R1 school, and negotiate for ability to do whatever you want. If the school isn’t trying hard to climb rankings, there’s less pressure to publish or to measure performance based on strange status-y things. You do lose out on some amount of status compared to being at a top school, but if you do good work your peers at top schools will still read/pay attention to it. At top schools, negotiating for freedom is much harder to do because the market is more competitive and ppl play status games to get ahead on the margin.
Have you seen A Master-Slave Model of Human Preferences? To summarize, I think every human is trying to optimize for status, consciously or subconsciously, including those who otherwise fit your description of idealized platonic researcher. For example, I’m someone who has (apparently) “chosen ultimate (intellectual) freedom over all else”, having done all of my research outside of academia or any formal organizations, but on reflection I think I was striving for status (prestige) as much as anyone, it was just that my subconscious picked a different strategy than most (which eventually proved quite successful).
I think it’s probably a result of most humans not being very strategic, or their subconscious strategizers not being very competent. Or zooming out, it’s also a consequence of academia being suboptimal as an institution for leveraging humans’ status and other motivations to produce valuable research. That in turn is a consequence of our blind spot for recognizing status as an important motivation/influence for every human behavior, which itself is because not explicitly recognizing status motivation is usually better for one’s status.
Agreed and also sad about this (and this seems to be not only true in academia but also industry). I turned down a PhD offer for this vibe. But reflecting generally, at least for myself, I guess if a person does not have enough capital or ability to pursue the intellectual freedom yet, they could take smaller steps, learn and accumulate trust and then eventually explore more out of the box searches. Just need to stay patient, stubborn, and make sure that “eventually” is not too late.
i’ve noticed a life hyperparameter that affects learning quite substantially. i’d summarize it as “willingness to gloss over things that you’re confused about when learning something”. as an example, suppose you’re modifying some code and it seems to work but also you see a warning from an unrelated part of the code that you didn’t expect. you could either try to understand exactly why it happened, or just sort of ignore it.
reasons to set it low:
each time your world model is confused, that’s an opportunity to get a little bit of signal to improve your world model. if you ignore these signals you increase the length of your feedback loop, and make it take longer to recover from incorrect models of the world.
in some domains, it’s very common for unexpected results to actually be a hint at a much bigger problem. for example, many bugs in ML experiments cause results that are only slightly weird, but if you tug on the thread of understanding why your results are slightly weird, this can cause lots of your experiments to unravel. and doing so earlier rather than later can save a huge amount of time
understanding things at least one level of abstraction down often lets you do things more effectively. otherwise, you have to constantly maintain a bunch of uncertainty about what will happen when you do any particular thing, and have a harder time thinking of creative solutions
reasons to set it high:
it’s easy to waste a lot of time trying to understand relatively minor things, instead of understanding the big picture. often, it’s more important to 80-20 by understanding the big picture, and you can fill in the details when it becomes important to do so (which often is only necessary in rare cases).
in some domains, we have no fucking idea why anything happens, so you have to be able to accept that we don’t know why things happen to be able to make progress
often, if e.g. you don’t quite get a claim that a paper is making, you could resolve your confusion just by reading a bit ahead. if you always try to fully understand everything before moving on, you’ll find it very easy to get stuck before you actually make it to the main point of the paper
there are very different optimal configurations for different kinds of domains. maybe the right approach is to be aware that this is an important hyperparameter and occasionally try going down some rabbit holes and seeing how much value it provides
This seems to be related to Goldfish Reading. Or maybe complementary. In Goldfish Reading one reads the same text multiple times, not trying to understand it all at once or remember everything, i.e., intentionally ignoring confusion. But in a structured form to avoid overload.
Yeah, this seems like a good idea for reading—lets you get best of both worlds. Though it works for reading mostly because it doesn’t take that much longer to do so. This doesn’t translate as directly to e.g what to do when debugging code or running experiments.
I think it’s very important to keep track of what you don’t know. It can be useful to not try to get the best model when that’s not the bottleneck. But I think it’s always useful to explicitly store the knowledge of what models are developed to what extent.
The algorithm that I have been using, where what to understand and to what extent is not a hyperparameter, is to just solve the actual problems I want to solve, and then always slightly overdo the learning, i.e. I would always learn a bit more than necessary to solve whatever subproblem I am solving right now. E.g. I am just trying to make a simple server, and then I learn about the protocol stack.
This has the advantage that I am always highly motivated to learn something, as the path to the problem on the graph of justifications is always pretty short. It also ensures that all the things that I learn are not completely unrelated to the problem I am solving.
I am pretty sure if you had perfect control over your motivation this is not the best algorithm, but given that you don’t, this is the best algorithm I have found so far.
i find the “revealed preference” people really annoying. anyone who has ever been addicted to anything knows that habit-formingness can be completely disentangled from enjoyability.
The enjoyability people are rather annoying too. Anyone who strived to reach a target even in a grueling way out of abstract considerations knows that hedonistic motivations are merely one standard origin-class of justifications, one that can be ignored and completely disentangled from optimization-channeling towards targeted outcomes.
The lesson is that true preferences are neither identical to unendorsed urges and habits, nor to expectations of joy.
it’s actually really annoying that habit-formingness and enjoyability are separate. X (the everything app) is extremely habit-forming, and yet introduces mostly irritation and suffering into my life. it’s also common to experience a burning desire for something, and yet to feel only mild enjoyment, or even absolutely nothing, upon obtaining it.
it also increases my credence in the idea that jhanas can be extremely enjoyable but not addictive at all.
Seems like the human brain is missing a regularization loss term between its Q network and V network.
you usually don’t know what options other people are actually choosing from—what are their abilities, resources, knowledge; what other costs or side effects would the choices have on their lives—so it is possible to be arbitrarily wrong.
famously, Marie Antoinette observed that her subjects had a revealed preference for starving over eating cake.
Especially when it’s followed up with some assumption that of course revealed preferences obey vNM axioms, because otherwise people could be Dutch-booked! As if that was the worst consequence that could possibly happen, and as if things at least as bad don’t actually happen.
Yeah VNM is not supposed to be descriptive, it’s supposed to be prescriptive in the sense that if you’re not obeying VNM axioms, then in theory there’s some way you could change to obey them that would be in your interest in the sense that you would be less exploitable.
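The exploitability argument can be illustrated with the classic money pump. The sketch below is a toy under stated assumptions. A hypothetical agent has cyclic (intransitive) preferences over three items: A over B, B over C, and C over A. It always accepts a trade for something it prefers at a fee of one unit. Transitivity is only one of the VNM axioms, so this shows one route to exploitation, not the whole theorem.

```python
# Hypothetical cyclic preferences: each pair (x, y) means "x is strictly preferred to y".
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}
ITEMS = ("A", "B", "C")


def preferred_over(holding: str) -> str:
    """Return the item the agent strictly prefers to what it currently holds."""
    return next(item for item in ITEMS if (item, holding) in PREFERS)


def money_pump(start: str, fee: int, rounds: int) -> tuple[str, int]:
    """Repeatedly offer the agent something it prefers, charging `fee` per trade."""
    holding, paid = start, 0
    for _ in range(rounds):
        # By its own preferences the offer is an upgrade, so the agent accepts and pays.
        holding = preferred_over(holding)
        paid += fee
    return holding, paid


final_item, total_paid = money_pump("A", fee=1, rounds=3)
print(final_item, total_paid)
```

After every three trades the agent holds exactly what it started with and is three units poorer, and the pump runs for as long as the exploiter likes. Making the preferences transitive removes this particular exploit.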
on the one hand, it is a desirable feature of an intellectual community to be truth seeking, and while it can be deeply emotionally painful to part ways with deeply held beliefs, in the long run it’s better to tear off the bandage. on the other hand, being emotionally hurt all the time by your community kind of fucking sucks, and isn’t very good for long term emotional or epistemic health.
perhaps a middle ground is in order: intellectual communities should be partitioned into an arena, where every idea is to be exposed to the harsh light of truth, and an antechamber, where you can rest and be surrounded by positivity and develop ideas in a supportive environment.
both are necessary—we need a way to kill bad ideas, because an environment that refuses to discard bad ideas because they are emotionally load bearing is doomed to epistemic ruin. but also the best weird ideas often sound bad initially, and require a safe environment to develop; and we are all human, and our emotional well being and desire to belong to a community is essential. by visibly separating the two, we might be able to get the best of both worlds.
this is not a crazy idea—many other parts of society have analogous things. for example, people who play sports for fun with their friends compete to win while on the field, but this only brings them closer off the field.
I think one common criticism of LW is that it is too much of an arena and not enough of an antechamber. perhaps this can be fixed somehow.
Orthogonally, cultural standards of emotional tone during debates are also important for how much emotional struggle is involved in changing one’s ideas.
If the tone implies that you were foolish for holding your idea, it’s going to be a lot more painful to let it go.
LessWrong has a pretty good standard of discourse that is not just civil but polite and supportive. This seems actually pretty crucial for it being an environment in which people do regularly change their minds.
I don’t like the term arena in your suggested division because it implies combat. Combat is emotionally intense, I’d rather have a metaphor that’s more collaborative.
This doesn’t eliminate the worth of having separate spaces for support and rigorous testing of ideas, but I think it’s important to keep in mind whenever we’re discussing group epistemics.
I claim there’s a pareto frontier of epistemic correctness vs emotional kindness. some things, like sneering at people and implying that they are foolish, are pareto suboptimal. but once you achieve pareto optimality, there is a tradeoff between kindness and correctness; and what I think should exist is two distinct spaces on different parts of this tradeoff curve (and of course nobody should do pareto suboptimal things)
I’m not sure it’s that simple. Even if it is, people do suboptimal things all the time. It seems worth watching.
It’s hard to stay on a pareto frontier; optimizing for more (or less) “kindness” directly is a goodharting hazard. If you ask for something, you might just get poisoned with more of the fake version of it.
I’d prefer less of the sort of “kindness” that trades off with correctness, rather than more of it (even when getting less of it wouldn’t actually help with correctness; it just doesn’t seem like a good thing). But if I ask for that, I’ll end up getting some (subtle) sneering and trolling, or unproductive high-standards elitism that on general principle wants to destroy ideas that didn’t get a chance to grow up yet. Similarly, if you ask for the sort of “kindness” that does trade off with correctness, you’ll end up getting some sycophancy (essentially) that cultivates your errors, making them stronger and more entrenched in your identity, ever more painful and less feasible to eventually defeat (even if there are benign forms of this sort of “kindness” that merely don’t make the problem worse in a comfortable way, as opposed to trying to intervene on it).
I don’t think unsycophantic kindness is quite that difficult to achieve. clearly some groups of people IRL achieve such kindness. generally, people in such communities try to understand each other and why they believe the things they do without judgement in either direction, and affirm the emotional responses to beliefs rather than the beliefs themselves. you don’t have to agree with someone to agree that you’d feel the same in their shoes. somehow, these groups of people don’t inevitably slide into subtle sneering and trolling and sycophancy.
plus, the point of explicitly separating the arena and the antechamber is to make it clear that when you are receiving kindness, you are not receiving updates towards truth. so it is clear to you, and to people around you, that receiving emotional validation in the antechamber is not evidence that your beliefs are correct. it’s valid for people to spend all their time in the antechamber, but everyone will see this, and assign less weight to the truthfulness of their beliefs.
I also don’t think non-sycophantic kindness causes people to dig in to their incorrect beliefs. if anything, it seems more common that people dig into incorrect beliefs because of a sense of adversity against others. think about how much more painful it is to concede a point if your interlocutor is being really mean about it, vs if they are thoughtful and hear you out.
Consider cults (including milder things like weird “alternative” health advice groups etc.). Positivity and mutual support seem like a key element of their architecture, and adversity often primarily comes from peers rather than an outgroup. I’m not talking about isolated beliefs; content and motivations for those tend to be far more legible. A lot of belief memeplexes have either too few followers or aren’t distinct enough from all the other nonsense to be explicitly labeled as cults or ideologies, or to be organized, but you generally can’t argue their members out of alignment with the group (on relevant beliefs, considered altogether).
This is also a standard piece of anti-epistemic machinery of groups that reinforce some nonsense memeplex among themselves with support and positivity. Support and positivity are great, but directing them to systematically taboo correctness-fixing activity is what I’m gesturing at, the sort of “kindness” that by its intent and nature tends to trade off against correctness.
And another common criticism is that it is too much the antechamber.
I don’t hear that one as often—what’s a good example? in particular, I hear people complain all the time that LW is too critical of ideas, and that when you post anything a whole bunch of people will appear out of the woodwork to critique you. I don’t feel like I’ve ever heard anyone say that people in LW are too uncritical and unwilling to challenge things they disagree with
Said Achmiz was one person who claimed this, and he tended to get a bunch of agreement votes when he said so, so presumably some people agreed with him.
I don’t quite think so? My impression is that the criticism is that LW is too much of an echo chamber, in that people just express agreement with each other too much, but that is probably mostly not because of people being nice, but because folks just outright have very similar beliefs
Proposed name: Butterfly Conservatory (https://www.lesswrong.com/posts/imnfJ9Ris7GgjkZbT/the-bughouse-effect-1#My_stag_is_best_stag)
I like it! it’s a lot more descriptive than antechamber (though I do like the alliteration with arena)
This is not necessarily the case, not for everyone. Theories and their credences don’t need to be cherished to be developed or acted upon; they only need to be taken seriously. Plausibly this can be mitigated by keeping identity small, accepting only more legible things in the role of “beliefs” that can have this sort of psychological effect (so that they can be defeated through argument alone). Legible ideas cover a surprising amount of territory; there is no pragmatic need to treat anything else as “beliefs” in this sense, and all the other things can remain ambient epistemic content detached from who you are. When more nebulous worldviews become part of one’s identity, they become nearly impossible to dislodge (and, with enough context and effort, possibly painful to dislodge). They are still worth developing towards eventual legibility, but they are not practical to argue with (or properly explain).
Thus arguing legible beliefs should by their nature be less intrusive than arguing nebulous worldviews. And perhaps nebulous worldviews should be argued against being held as “beliefs” in the emotional sense in general, regardless of their apparent correctness, as a matter of epistemic hygiene: ensuring by habit that you are not going to be in the position where you have “beliefs” that would be painful to part ways with and also can’t be pinned down clearly enough to dispel.
there are very few people in the world who don’t deeply emotionally hold quite a few important beliefs. having a small identity is difficult in practice, because having an identity is an important part of how nearly everyone navigates this complex and confusing world. I’m skeptical of anyone who claims to have completely eliminated all emotional attachment to all of their important decision-relevant beliefs.
but even assuming that you have somehow achieved perfect small identityness and emotional independence of all of your important beliefs and it all works out great for you, you must surely acknowledge that there are many people out there who have not. and probably they are more likely to achieve rationalist enlightenment if they are surrounded by people who are supportive but nudge gently towards truth seeking, rather than immediately coming in with a wrecking ball and demolishing emotionally load bearing pillars.
Legible ideas (that are practical to meaningfully argue about) cover a lot of ground, they are not as hazardous as part of identity. And less well-defined but useful/promising/interesting understandings don’t need to become part of identity to be taken seriously and developed. That’s the failure mode at the other extreme, when anything insufficiently scientific/empirical/legible/etc. gets thrown out with the bathwater.
Probably when something is easy to defeat (admits argument, legible), it’s not that painful to let it go. The pain is the nebulous attachment fighting for influence, that it won’t be fully defeated even when you end up consciously endorsing a change of mind. Thus ideologies are somewhat infeasible to change, they’ll keep their hold even long after the host disavows them. A habit of keeping such things at a distance benefits from other people not feeding their structurally hazardous placement (as emotionally load bearing pillars) with positivity. But that’s distinct from viewing positively the development of even such hazardous things, handling them with appropriate caution.
I think Leo is using a more expansive definition for identity than you have in mind here (if it seemed important I’d suggest he use a different word to clarify, but actually it doesn’t seem important because….).
I also think he’s making descriptive claims about many people’s apparent relationship to changing their beliefs, and you’re challenging him on normative grounds invoking the mechanics (a taxonomy, even) of belief, which I take to be addressing his point at the wrong level of abstraction in at least two ways.
‘Given that x appears to be hard for people some of the time, we should take some cheap steps to make it easier’ seems pretty reasonable!
Maybe you’re saying ‘but x seeming hard is a sign of a deeper problem, making x easier in a shallow way gives quarter to that deeper problem, and if only they had my model of, and relationship to, belief, we could not only make x easy for them, but solve much else besides.’
I’m mostly a fan of that ‘go for the root’ approach, but I think this case is much much harder at scale than you’re giving it credit for; your story about the source of pain in having one’s beliefs challenged smells like a typical mind fallacy. It may be a great description of what’s going on with you, but it doesn’t feel like the kind of description that captures most or all people in most or all relevant cases.
How about someone who never had all this overwrought (as it seems to me) emoting in the first place? I sense typical-minding here.
I also believe that done right, there is no tradeoff between kindness and correction, and that in a serious conversation, to think in terms of kindness (or its opposite) is already to go wrong.
while there may exist people who do not have this emoting in the first place, there are far more people who are too emotionally unaware to even realize the emotional drives behind their beliefs and actions. this is often very obvious to people around them. so I won’t take anyone’s word on this matter, and instead only trust a track record of behavior.
I think this would benefit from having examples (maybe just pointing at the top level post/belief that was unpleasantly attacked without calling out specific responses)
to be clear, a very important part of the culture of the antechamber is encouraging people to spend time in the arena, or if people are not ready to do so, to encourage people to grow emotionally so that they can handle being in the arena.
Reminds me of Make More Grayspaces, I feel there is a lot of overlap.
there are a lot of video games (and to a lesser extent movies, books, etc) that give the player an escapist fantasy of being hypercompetent. It’s certainly an alluring promise: with only a few dozen hours of practice, you too could become a world class fighter or hacker or musician! But because becoming hypercompetent at anything is a lot of work, the game has to put its finger on the scale to deliver on this promise. Maybe flatter the user a bit, or let the player do cool things without the skill you’d actually need in real life.
It’s easy to dismiss this kind of media as inaccurate escapism that distorts people’s views of how complex these endeavors of skill really are. But it’s actually a shockingly accurate simulation of what it feels like to actually be really good at something. As they say, being competent doesn’t feel like being competent, it feels like the thing just being really easy.
“power fantasies” are actually a pretty mundane phenomenon given how human genetic diversity shook out; most people intuitively gravitate towards anyone who looks and acts like a tribal chief, or towards the possibility that you yourself or someone you meet could become (or already be) a tribal chief, via constructing some abstract route that requires forging a novel path instead of following other people’s.
Also a mundane outcome of human genetic diversity is how division of labor shakes out; people noticing they were born with savant-level skills and that they can sink thousands of hours into skills like musical instruments, programming, data science, sleight of hand party tricks, social/organizational modelling, painting, or psychological manipulation. I expect the pool to be much larger for power-seeking-adjacent skills than art, and that some proportion of that larger pool of people managed to get their skills’ mental muscle memory sufficiently intensely honed that everyone should feel uncomfortable sharing a planet with them.
The alternative is to pit people against each other in some competitive games, 1 on 1 or in teams. I don’t think the feeling you get from such games is consistent with “being competent doesn’t feel like being competent, it feels like the thing just being really easy”, probably mainly because there is skill level matching, there are always opponents who pose you a real challenge.
Hmm maybe such games need some more long tail probabilistic matching, to sometimes feel the difference. Or maybe variable team sizes, with many incompetent people versus few competent, to get a more “doomguy” feeling.
Some games do put their finger on the scale, for example you have a first-person shooter where you learn to aim better but you also now have a gun that deals 200 damage per hit, as opposed to your starting gun that dealt 10.
But puzzle-solving games are usually fair, I think.
reliability is surprisingly important. if I have a software tool that is 90% reliable, it’s actually not that useful for automation, because I will spend way too much time manually fixing problems. this is especially a problem if I’m chaining multiple tools together in a script. I’ve been bitten really hard by this because 90% feels pretty good if you run it a handful of times by hand, but then once you add it to your automated sweep or whatever it breaks and then you have to go in and manually fix things. and getting to 99% or 99.9% is really hard because things break in all sorts of weird ways.
I think this has lessons for AI: lack of reliability is one big reason I fail to get very much value out of AI tools. if my chatbot catastrophically hallucinates once every 10 queries, then I basically have to look up everything anyways to check. I think this is a major reason why cool demos often don’t translate into things that are practically useful: 90% reliability is great for a demo (and also you can pick tasks that your AI is more reliable at, rather than tasks which are actually useful in practice). this is an informing factor for why my timelines are longer than some other people’s
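To make the chaining point concrete, here is a minimal Python sketch. It assumes every tool in the chain fails independently with the same per-step success rate, which is a simplification (real failures are often correlated); the function names are hypothetical.

```python
def chain_success_rate(per_step: float, steps: int) -> float:
    """Probability that every step in a chain of tools succeeds.

    Assumes each step fails independently with the same success rate.
    """
    return per_step ** steps


def expected_manual_fixes(per_step: float, steps: int, runs: int) -> float:
    """Expected number of runs in a sweep that need a human to step in."""
    return runs * (1.0 - chain_success_rate(per_step, steps))


for per_step in (0.9, 0.99, 0.999):
    rate = chain_success_rate(per_step, steps=5)
    fixes = expected_manual_fixes(per_step, steps=5, runs=100)
    print(f"per-step {per_step}: 5-step chain succeeds {rate:.1%} of the time, "
          f"~{fixes:.1f} manual fixes per 100 runs")
```

With five chained tools at 90% each, only about 59% of runs finish untouched, so roughly 41 of every 100 runs in a sweep need manual attention; at 99.9% per step it is well under one.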
One nuance here is that a software tool that succeeds at its goal 90% of the time, and fails in an automatically detectable fashion the other 10% of the time is pretty useful for partial automation. Concretely, if you have a web scraper which performs a series of scripted clicks in hardcoded locations after hardcoded delays, and then extracts a value from the page from immediately after some known hardcoded text, that will frequently give you a ≥ 90% success rate of getting the piece of information you want while being much faster to code up than some real logic (especially if the site does anti-scraper stuff like randomizing css classes and DOM structure) and saving a bunch of work over doing it manually (because now you only have to manually extract info from the pages that your scraper failed to scrape).
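A minimal Python sketch of the "grab the value right after some known hardcoded text, and queue everything else for a human" pattern. It assumes the page text has already been fetched (the scripted clicks and delays are out of scope), treats a missing marker as the only automatically detectable failure, and uses hypothetical names and sample pages throughout.

```python
from typing import Dict, List, Optional, Tuple


def extract_after_marker(page_text: str, marker: str) -> Optional[str]:
    """Return the first whitespace-separated token after `marker`, or None.

    None is the detectable failure: the marker is missing (e.g. the
    layout changed) or nothing follows it.
    """
    idx = page_text.find(marker)
    if idx == -1:
        return None
    tokens = page_text[idx + len(marker):].split()
    return tokens[0] if tokens else None


def scrape_with_fallback(
    pages: Dict[str, str], marker: str
) -> Tuple[Dict[str, str], List[str]]:
    """Split pages into automatically scraped values and a manual-review queue."""
    scraped: Dict[str, str] = {}
    manual_queue: List[str] = []
    for url, text in pages.items():
        value = extract_after_marker(text, marker)
        if value is None:
            manual_queue.append(url)  # a human extracts this one by hand
        else:
            scraped[url] = value
    return scraped, manual_queue


# Hypothetical page texts standing in for already-fetched pages.
pages = {
    "page-1": "Order summary Price: 42 USD",
    "page-2": "Something went wrong, please retry",
    "page-3": "Price:   17 USD (incl. tax)",
}
scraped, manual_queue = scrape_with_fallback(pages, marker="Price:")
print(scraped)
print(manual_queue)
```

The failures cost one manual lookup each instead of silently corrupting the results. A wrong value that still happens to follow the marker would slip through, so this only pays off when failures really are detectable.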
I think even if failures are automatically detectable, it’s quite annoying. the cost is very logarithmic: there’s a very large cliff in effort when going from zero manual intervention required to any manual intervention required whatsoever; and as the amount of manual intervention continues to increase, you can invest in infrastructure to make it less painful, and then to delegate the work out to other people.
While I agree with this, I do want to note that this:
Only lengthens timelines very much if we also assume scaling can’t solve the reliability problem.
even if scaling does eventually solve the reliability problem, it means that very plausibly people are overestimating how far along capabilities are, and how fast the rate of progress is, because the most impressive thing that can be done with 90% reliability plausibly advances faster than the most impressive thing that can be done with 99.9% reliability
Perhaps it shouldn’t be too surprising. Reliability, machine precision, economy are likely the deciding factors to whether many (most?) technologies take off. The classic RoP case study: the bike.
Motorola engineers figured this out a few decades ago, even 99.99 to 99.999 makes a huge difference on a large scale. They even published a few interesting papers and monographs on it from what I recall.
This can be explained when thinking about what these accuracy levels mean: 99.99% accuracy is one error every 10K trials. 99.999% accuracy is one error every 100K trials. So the 99.999% system is 10x better! When errors are costly and you’re operating at scale, this is a huge difference.
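The same arithmetic as a short Python sketch: the mean number of trials between errors is 1 / (1 - accuracy), and the expected number of errors at a given volume is volume × (1 - accuracy). The one-million-trial volume is an arbitrary illustrative number, not taken from any real system.

```python
def trials_per_error(accuracy: float) -> float:
    """Mean number of trials between errors for a given per-trial accuracy."""
    return 1.0 / (1.0 - accuracy)


def expected_errors(accuracy: float, volume: int) -> float:
    """Expected number of errors over `volume` independent trials."""
    return volume * (1.0 - accuracy)


VOLUME = 1_000_000  # illustrative scale only
for accuracy in (0.9999, 0.99999):
    print(f"{accuracy:.3%}: one error every {trials_per_error(accuracy):,.0f} trials, "
          f"~{expected_errors(accuracy, VOLUME):,.0f} errors per {VOLUME:,} trials")
```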
don’t worry too much about doing things right the first time. if the results are very promising, the cost of having to redo it won’t hurt nearly as much as you think it will. but if you put it off because you don’t know exactly how to do it right, then you might never get around to it.
yep. doing it and then redoing it can still be much faster than procrastinating on it
i predict that on Jan 1 2029, neither openai nor anthropic will be near-fully automated, by which i mean <=5 people are even plausibly making important decisions (like, everyone else could go on vacation and it would not slow the company down at all). Celestia predicts otherwise
if WWIII happens, resolves NA. if a localized Taiwan war happens but doesn’t escalate to WWIII, the bet is still on. if there’s a big recession, the bet is still on.
I think this is an amazing question to predict on. Can you please try to get it on a platform? personally I like Metaculus, but anything is fine
Is Celestia a lab employee?
unironically earnestly confused by the downvotes here
yes
i predict that on jan 1 2027 this will have already come to pass
At what date would you predict that it’s more likely than not for OpenAI or Anthropic to be automated?
this is harder to say. if i had to guess completely based on vibes, i want to say Jan 1 2033. i want to think more carefully about this and give a more thought out answer at some point though.
Are you guys betting real money?
just vibes. any monetary stakes would be token anyways.
Surely not any monetary stakes
the biggest problems are you can make way more money by making trades than predictions, it’s unclear what role money will play in a post AGI world, etc.
maybe the following solves a significant fraction of that problem: you could buy an asset together and have the event being predicted determine the owner. like, to make a bet at 30%, instead of one party putting 30 cents and the other party putting 70 cents in a jar and having the 1 dollar go to the party that predicted correctly once the question is resolved, you could do this with 1 dollar’s worth of an SP500 index fund or any other asset. [1] [2]
not sure why kalshi hasn’t implemented this already btw — seems like a central issue with current prediction markets. maybe there’s a regulatory obstacle. or maybe they are already putting the money traders put in their jars in low-risk assets, just not passing the interest on to traders (except in the form of it enabling lower fees or whatever).
ok for vanilla bets not on prediction markets, one doesn’t actually need to store the money in a jar. i think this fixes one way this jar business is bad but not some other ways
presumably the vast majority of the variance is in how the SPY (or other asset) moves due to AGI, rather than in the bet itself.
yea probably. still, if you have parties at 40% and 60% and they do this asset bet at 50:50, then each guy’s subjective expected money is 20% higher than if they just buy the asset without the bet. seems nontrivial. this will be less impressive in log money as you start putting a larger fraction of your money in a single bet but idk you try to spread across many bets that are not too correlated and then i think it looks good again
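A minimal Python sketch of the asset-denominated bet and the 40%/60% example in this thread. Stakes are in units of the asset (e.g. shares of an index fund), the winner takes the whole 1-unit pot, and, per the caveat above, it ignores any correlation between the asset's value and the outcome being bet on. Function names are hypothetical.

```python
from typing import Tuple


def stakes(price_yes: float) -> Tuple[float, float]:
    """Asset units each side puts in so the pot is exactly 1 unit.

    At price_yes = 0.3, the YES side puts in 0.3 units and the NO side 0.7.
    """
    return price_yes, 1.0 - price_yes


def subjective_gain(belief_yes: float, price_yes: float, side: str) -> float:
    """Subjectively expected asset units after the bet, relative to just holding the stake.

    Assumes the asset's value is independent of the outcome.
    """
    yes_stake, no_stake = stakes(price_yes)
    if side == "yes":
        return belief_yes * 1.0 / yes_stake
    if side == "no":
        return (1.0 - belief_yes) * 1.0 / no_stake
    raise ValueError("side must be 'yes' or 'no'")


# Two parties at 60% and 40% on YES, settling at 50:50.
optimist = subjective_gain(belief_yes=0.6, price_yes=0.5, side="yes")
pessimist = subjective_gain(belief_yes=0.4, price_yes=0.5, side="no")
print(f"optimist expects {optimist:.2f}x, pessimist expects {pessimist:.2f}x")
```

Both sides expect about 1.2x the asset units they put in, which is the 20% figure in the comment.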
This is an interesting idea
I did a quick poll of some Epoch AI staff:
2 have probabilities between 70% and 90%
2 have probabilities between 90% and 97%
4 have probabilities between 97% and 99%
are these p(fully automated) or p(not fully automated)?
p(not fully automated)
Or to be more precise, p(not near-fully automated), directly quoting your prediction.
Interesting. Is this part of a larger prediction where you expect that transformative AI hasn’t been achieved and hasn’t massively impacted the world at that time? Because I can imagine plausible scenarios where ASI has arrived, but there are still Anthropic/OAI employees doing stuff from the perspective of outside observers. Seems like a hard bet to verify.
i mean, massively impact the world is too fuzzy to draw a line at. whether employees are actually doing things will likely only be assessable internally. the reason the bet is worded the way it is is that it’s likely labs don’t literally fire everyone except the 5 remaining people, and instead give them busywork.
it feels so narratively incongruous that san francisco would become the center for the most ambitious, and the likely future birthplace of agi.
san francisco feels like a city that wants to pretend to be a small quaint hippie town forever. it’s a small splattering of civilization dropped amid a vast expanse of beautiful nature. frozen in amber, it’s unclear if time even passes here—the lack of seasons makes it feel like a summer that never quite ended. after 9pm, everything closes and everyone goes to bed. and the dysfunction of the city government is never too far away, constantly reminding you of humanity’s follies next to the perfection of nature.
on the other hand, nyc feels like the city. everything is happening right here, right now. all the money in the world flows through this one place. it’s gritty and yet majestic at the same time. the most ambitious people in the world came here to build their fortunes, and live on in the names on the skyscrapers everywhere that house the employees who continue to keep their companies running. they are part of surroundings that are entirely constructed by man; even the bits of nature are curated and parcelled out in manageable units. it feels like the kind of place that can only be built by endless unchecked ambition.
idk. it just feels kinda weird to me that all the most ambitious people who want to build god are going to the little hippie town, and not the big city. and most of them seem to just treat SF as a place that is close to work and has good weather and don’t really care about or try to match the vibe, to the great consternation of the locals. also, for a city that attracts people who love to build things, SF doesn’t seem to do an awful lot of building...
(tbc, this is not the only vibe of nyc. nyc is such a big place that it’s impossible to summarize as one vibe. but this is a huge central part of the vibe in a way that just doesn’t seem to match sf.)
Cali is the place to be for technology because Cali was the defense contractor hub, with the U.S. Navy choosing the bay area as its center for R&D during WWII and the Cold War. The hippie reputation came a lot later, after its status as the primary place to work in IT was thoroughly cemented, with both established infrastructure and the network effect keeping it that way.
HP, for instance, was founded in 1938.
It’s not just SF but the SF Bay Area (Google, Nvidia, Meta etc), which is bigger and has more varied vibes than just SF.
But NYC is for ambitious conformists like bankers and lawyers. From that perspective there is no point in making a fortune in SF if your social circle does not appreciate your expensive Armani suit or your excellent taste in business cards. At least that’s my vibes, I haven’t been in over 15 years.
Narratively I’d pick Shenzhen. Scrappy little fishing village that transformed itself into a high tech hub seems suitable for building AGI.
sf is the relatively much younger city, and radically reinvented itself as recently as 60 years ago. by that metaphor, i am not surprised that its ideas smell more fresh.
No love for New England? We’ve got good education and a solid base of tech companies.
An alternative narrative is that AGI is the product of a long slow slog of research into computational neuroscience. In this scenario the ambitious ppl in NYC and Cali refuse to research the right stuff because the rewards are too far off. And here is where I see New England as having good vibes, for working on basic research that has no immediate applications.
I’m very confused why purchasing power varies so dramatically internationally. like why are there countries where everyone has very low wages but everything is also really cheap so it balances out? prima facie, huge disparities like this should get evened out by arbitrage.
the simple explanation is that some labor can only be performed locally, labor mobility is limited (immigration laws, people don’t like moving, etc), and transportation costs for goods exist (shipping and tariffs).
however, global shipping is ridiculously cheap. and the economy increasingly consists of white collar jobs which could in theory be done remotely. for example, it seems mind boggling to me that a top tier SWE/RS in the bay area is worth 10-100x more than one in India or Vietnam. like sure, someone being in the same timezone is great, and Zoom sucks, and so on. but for that price delta surely you could pay people to live nocturnally, construct apartments with bright lights synced to Pacific Time, invest in much better video call technology like that Google Beam thingy, etc?
maybe one possibility is that labor mobility is not actually that low for the very toppest tier people, and so if someone is actually worth that much then getting them good immigration lawyers is a trivial cost relative to their value/cost. so the market for the best people is very efficient and they all brain drain to the bay area the first chance they have. and so the lower salary of outsourcing is partly illusory at the highest levels, because the distribution of talent is very different across different localities, due to agglomeration effects.
In what sense does it actually balance out? e.g. in India, unskilled labor is a lot cheaper, so lots of upper middle class people have servants. But the price of an iPhone in India is pretty similar to the price of an iPhone in the US.
So my impression is that the typical basket of goods and services that people consume in different places around the world at roughly equivalent / analogous relative economic classes actually does vary quite a bit. Anything with a labor component will naturally scale up and down for balance, but staples and stuff made in factories doesn’t vary that much. In the US for example, labor of all types is very expensive, so people don’t have servants, but most people can afford a pretty much endless supply of trinkets and gadgets.
food and rent are two big ones. they’re both vastly cheaper elsewhere
The cost of food above subsistence level is mostly labor / discretionary. I could live on beans and rice and spices bought in bulk for a tiny fraction of my income pretty much anywhere in the world, but I’d often rather pay to have someone prepare something nice and then bring it to my door.
So maybe a different / more general answer to the question you originally posed is that stuff that’s truly pure “stuff” is actually a relatively small fraction of what people consume—most consumption by dollar value is actually (indirectly) consumption of labor, which often has to be local.
raw potatoes cost about $2/kg in the US, $0.50/kg in china, and $0.25/kg in india. so it’s clearly not just the cost of preparation.
food needs to make it from global shipping to you: local transportation costs, local storage costs.
Making raw potatoes available to buy also requires some local labour and renting a local warehouse / supermarket. I think labour and rent are upstream of ~all of the differences in local purchasing power. To me, the main question is why so much software is being built in SF as opposed to cheaper areas, and I find this confusing but I think the bay is something like a schelling point for tech talent and tech firms, and remote work unfortunately just doesn’t work well enough. And something similar is true for other expensive high-income places.
I worked on an international team during my time at F5 and we had offices in Ireland, Poland, two timezones in the US, Australia and India. The assumption that we could teleconference our way out of geography was a laughable failure for one reason that your hypothetical “nocturnal white collar sweatshops” fails to address: Humans work to live, we don’t live to work. Well, most of us, that is, and the unbalanced folks (the 10x engineers as they are now called) who would work across timezones burned out dramatically (I was one of them). Why are silicon valley jobs so lucrative but also cost of living so high? Because the people there have children in schools, they socialize with people outside their work and they generally live a life not just work. So how does this play out in a workplace?
Engineering planning has to happen at some hour, and it is naturally inconvenient for outliers (Poland is meeting at 7pm thinking about how they missed dinner with their kids, the engineers from Delhi are up at 11:30pm likely sneaking a nap in before the meeting, and the team in Seattle is just finishing their morning coffee). This creates a situation where both sides of the distribution are overtired, distracted or disengaged while the “middle” of the probability is decided by the VP or C suite living in the US. So the team sees it won’t work, and they try something else:
Next, teams are geographically isolated by task. India inevitably becomes either the devops or the testing team because of intrinsic bias by executives, and they are always playing catch up. Devops requests have a 1 day latency at best, and responses to bugs have a latency on the other side. The India team is simultaneously “too slow” when doing ops, and “put on hold” when reporting bugs. Meanwhile the folks in Poland are left clocking out right as the US co-workers are clocking in, so any cross team design/testing/ops is likewise put off one day at best, and weekends become amplified to 3 days, effectively creating a 4 day work week.
The unfortunate reality is that humans, when healthy, are anchored to geography. From recreation to raising children, to caring for aging parents everyone will be bound by their timezone regardless of how it would optimize the global machine of infinite growth with finite resources.
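A small Python sketch of the meeting arithmetic in the comment above. It uses fixed winter (standard-time) offsets for Poland (UTC+1), Delhi (UTC+5:30) and Seattle (UTC-8); those offsets are an assumption, and real code should use `zoneinfo` so daylight saving time is handled. The 9:00 to 17:00 working window and the sample date are likewise hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Fixed standard-time offsets (assumption: January, no DST in effect).
OFFICES = {
    "Poland": timezone(timedelta(hours=1)),
    "Delhi": timezone(timedelta(hours=5, minutes=30)),
    "Seattle": timezone(timedelta(hours=-8)),
}


def local_times(meeting_utc: datetime) -> Dict[str, str]:
    """Local wall-clock time of one meeting in each office."""
    return {name: meeting_utc.astimezone(tz).strftime("%H:%M")
            for name, tz in OFFICES.items()}


def shared_working_hours(day: datetime, start: int = 9, end: int = 17) -> List[int]:
    """UTC start hours at which every office is inside [start, end) local time."""
    ok = []
    for hour in range(24):
        slot = day.replace(hour=hour, minute=0, tzinfo=timezone.utc)
        if all(start <= slot.astimezone(tz).hour < end for tz in OFFICES.values()):
            ok.append(hour)
    return ok


meeting = datetime(2024, 1, 15, 18, 0, tzinfo=timezone.utc)
print(local_times(meeting))
print(shared_working_hours(datetime(2024, 1, 15)))
```

A meeting at 18:00 UTC lands at 19:00 in Poland, 23:30 in Delhi and 10:00 in Seattle, matching the example, and there is no UTC hour at which all three offices are inside a normal working day.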
I second all this.
Even when it does technically work, people underestimate the social dynamics. I knew someone who worked in Singapore for a Canadian company. Her dinner would be going cold on the table and her children wondering why mummy wasn’t joining for bedtime stories while her teleconferenced meeting in Canada overran by an hour as they all complained about having to get in at 8am and how the weather was bad in Toronto. Complaints from her and others in SG soon made it clear that these meetings had to be done as fast as possible and that mentioning anything off topic, e.g. the weather, random pleasantries, traffic, was liable to result in interruption/complaint from someone in Singapore.
So, you can have the meetings, but friendly chit chat, going off topic or similar is completely shut down. That worked much better, shortening the meetings and letting the Singapore lot at least feel their time was being valued, but doing meetings that way is not natural to people.
If they had tried some weird ‘work in a Canadian time zone’ thing then the person I knew, and probably the whole team, would have quit. Maybe unmarried 20-somethings could do that for a little while, but as they gained partners and kids it would stop working for them.
to be clear, i’ve worked remotely, and i know exactly how the social dynamics can suck. maybe this would be a reasonable argument for why you wouldn’t do it for a 30% pay raise. but the disparity is so enormous (anywhere between a 2x and 100x, depending on where you are in the world and how good you are) that there must surely be a lot of people who would take the money and deal with it.
Yes, it can work. Reflecting more, I think the issue is maybe that you need to be clear from the beginning whether you are telling your overseas workers ‘your timetable will suck but we are paying super over the odds to cover that’ or are doing something more ‘normal’ (+100% vs +30%). The first was never the bargain in my example above, hence some of the frustration.
Both have advantages and disadvantages. In the former, the employees will just accept the timetable. But you will mostly get younger, more junior (single) people, and they won’t be the most capable or best-qualified people, who will go for something normal. I suspect high turnover: a couple of years of highly paid nocturnal behavior, then taking that experience to try to get a more normal job where you can actually have a family, makes sense.
Instead of a ‘half move’ you have the alternative of a ‘full move’, where you move the whole operation (taking the people you need with you). On a much smaller scale some software companies did this a decade or two ago by moving from London to Bristol where property prices were lower (a very short move; they are only 1.5 hours apart by train).
you could also move to some country like Paraguay with relatively lax immigration laws and a US timezone, and work remotely from there. this is probably a better option than nocturnal living for a lot of people.
I’ve worked completely nocturnally before. it wasn’t the best experience in the world, and probably wouldn’t have been sustainable in the long run, but there are a lot of jobs out there that are way more demanding (submarine, space, oil rig).
I don’t understand it either. I work in Germany with near-shoring colleagues in Slovakia, Serbia, Georgia etc. They are roughly 60% cheaper than German SWEs, generally just as competent, no time-zone problems whatsoever … basically all the work even with German team members is fully remote so even that is not a difference. Only the need for English creates some minor friction. No idea how this state of affairs makes economic sense now or ever did.
As a SWE from Slovakia, not sure how representative my experience is, but...
One big problem cooperating with companies in Germany is that when you apply for the job, they tell you “knowledge of German is not needed, all internal communication in our company is in English”, and that in my experience is never true. (This sucks for me, because my brain somehow rejects the German language.)
Hiring people from other countries is somehow complicated from the legal and/or tax perspective. I don’t understand all the relevant details, and in theory the EU exists to prevent this, but in practice… my tax form gets more complicated when I have a foreign income, and I don’t know how complicated it gets for the company, but I suspect that it might be a lot. Maybe enough to compensate for the 60% cheaper people?
Some people have strong patriotic/union/whatever feelings that make them sabotage people from other countries to protect their own jobs. Especially when those other people are cheaper, and not bad at their jobs; that is when they are a greater threat. I had an experience in a Swiss company, when the Swiss employees collectively sabotaged the Slovaks in various ways; for example they always made a “mistake” of sending us wrong documentation and/or forgetting to give us access to the product database; so the work that was estimated to take three days, and would actually take me one day, took five, because I had to spend the first four days repeatedly asking for the correct documentation, and repeatedly telling them that they gave me database access to a wrong table, not the one I requested. Many people on both sides were aware that this was happening, but the Swiss company simply couldn’t fire half of their employees, so they had to accept that the Slovaks cost half as much per day, but everything takes them twice as many days.
I guess, if you want to exploit the cheap Eastern European brain power for a company in Germany, you should probably not hire local people at all? And probably make sure that every team member is from a different country, to avoid the unpleasant surprise of finding out later that all informal documentation for your project is in Slovak.
Yeah, I guess there is just more friction than one would expect. I also found out that Slovaks are now more like 66% of the price of Germans, so either the margin is great or the income gap is smaller than I thought.
Also: Ease of communication is hard to overrate. I always enjoy working with people who have a very similar background compared to me (i.e. similar milieu growing up, not just same country), communication is so smooth.
Do you hire them directly, or through some “body shop”? Because sometimes the intermediaries add an insane markup. Though it is probably less dramatic these days.
Claude says that the salary ratio is like this:
I guess I should ask for a raise.
Directly. Given that they are then priced at roughly 66% for our customers I guess the insane markup goes to my company.
This overstates it a bit, but has a LOT of explanatory power: https://www.econlib.org/archives/2009/11/price_discrimin_2.html Price Discrimination Explains Everything.
There’s a whole lot of things where marginal cost is very low, even though average cost is somewhat high (due to startup and fixed costs). For these things, selling “extra” stuff at low prices in markets that don’t leak back to interfere with the primary revenue sources is incremental profit without downside.
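To make the marginal-versus-average-cost point concrete, here is a toy calculation. Every number in it (fixed cost, marginal cost, prices, volumes) is hypothetical and chosen only for illustration, and the sketch simply assumes the discounted market does not leak back into the full-price one. It shows that selling below average cost but above marginal cost still adds profit.

```python
# Toy illustration of price discrimination with high fixed cost and low marginal cost.
# All figures are hypothetical, chosen only to show the arithmetic.
FIXED_COST = 1_000_000.0   # one-off startup/development cost (assumed)
MARGINAL_COST = 1.0        # cost of serving one extra unit (assumed)


def profit(sales):
    """Total profit for a list of (units, price) pairs, one pair per market."""
    revenue = sum(units * price for units, price in sales)
    total_units = sum(units for units, _ in sales)
    return revenue - FIXED_COST - MARGINAL_COST * total_units


primary = (100_000, 20.0)    # full-price market
secondary = (200_000, 3.0)   # discounted market, assumed not to cannibalize the primary one

only_primary = profit([primary])
both = profit([primary, secondary])
total_units = primary[0] + secondary[0]
average_cost = (FIXED_COST + MARGINAL_COST * total_units) / total_units

print(f"profit, primary only:     {only_primary:,.0f}")
print(f"profit, both markets:     {both:,.0f}")
print(f"average cost per unit:    {average_cost:.2f}")
print(f"secondary price per unit: {secondary[1]:.2f}")
```

The secondary price sits below the average cost per unit, which naively looks like selling at a loss, yet total profit rises by exactly `units * (price - MARGINAL_COST)` for the extra market.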
This plus the differential in labor costs, which are often significant for last-mile delivery (getting things into consumers’ hands), makes it pretty understandable why the law of one price (the idea that if transport and transaction costs are tiny, things are priced identically) doesn’t apply for many things.
When it comes to salaries of knowledge workers like software engineers, a lot of decisions come down to managers who not only care about what’s good for the company but have their own desires as well. A manager prefers employees who are in the office and as near as possible so that they feel they have power over the employees. This goes for middle management as well.
Somehow, in companies like Google that do have offices in India, the internal company politics don’t play out in a way that results in drastically increasing their headcount in India.
The model you’ve described (hidden quality differences) is a huge part of it, yes. I’ll try and find the paper, but in general nominal/market exchange rates tend to be stronger predictors of most objective, cross-comparable outcomes than indices that try to control for cost of living (PPP). If two goods/services that look equivalent are selling for different prices, it’s usually (though not always) because there’s some difference you’re not able to measure.
in some way, bureaucracy design is the exact opposite of machine learning. while the goal of machine learning is to make clusters of computers that can think like humans, the goal of bureaucracy design is to make clusters of humans that can think like a computer
some thoughts on the short timeline agi lab worldview. this post is the result of taking capabilities people’s world models and mashing them into alignment people’s world models.
I think there are roughly two main likely stories for how AGI (defined as able to do any intellectual task as well as the best humans, specifically those tasks relevant for kicking off recursive self improvement) happens:
AGI takes 5-15 years to build. current AI systems are kind of dumb, and plateau at some point. we need to invent some kind of new paradigm, or at least make a huge breakthrough, to achieve AGI. how easily aligned current systems are is not strongly indicative of how easily aligned future AGI is; current AI systems are missing the core of intelligence that is needed.
AGI takes 2-4 years to build. current AI systems are really close and we just need more compute and schlep and minor algorithmic improvements. current AI systems aren’t exactly aligned, but they’re like pretty aligned, certainly they aren’t all secretly plotting our downfall as we speak.
while I usually think about story 1, this post is about taking story 2 seriously.
it seems basically true that current AI systems are mostly aligned, and certainly not plotting our downfall. like you get stuff like sycophancy but it’s relatively mild. certainly if AI systems were only ever roughly this misaligned we’d be doing pretty well.
the story is that once you have AGI, it builds and aligns its successor, which in turn builds and aligns its successor, etc. all the way up to superintelligence.
the problem is that at some link in the chain, you will have a model that can build its successor but not align it.
why is this the case? because progress on alignment is harder to verify than progress on capabilities, and this only gets more true as you ascend in capabilities. you can easily verify that superintelligence is superintelligent—ask it to make a trillion dollars (or put a big glowing X on the moon, or something). even if it’s tricked you somehow, like maybe it hacked the bank, or your brain, or something, it also takes a huge amount of capabilities to trick you on these things. however, verifying that it’s aligned requires distinguishing cases where it’s tricking you from cases where it isn’t, which is really hard, and only gets harder as the AI gets smarter.
though if you think about it, capabilities is actually not perfectly measurable either. pretraining loss isn’t all we care about; o3 et al might even be a step backwards on that metric. neither are capabilities evals; everyone knows they get goodharted to hell and back all the time. when AI solves all the phd level benchmarks nobody really thinks the AI is phd level. ok, so our intuition for capabilities measurement being easy is true only in the limit, but not necessarily on the margin.
we have one other hope, which is that maybe we can just allocate more of the resources to solving alignment. it’s not immediately obvious how to do this if the fundamental bottleneck is verifiability—even if you (or to be more precise, the AI) keep putting in more effort, if you have no way of telling what is good alignment research, you’re kind of screwed. but one thing is that you can demand things that are strictly stronger than alignment, that are easier to verify. if this is possible, then you can spend a larger fraction of your compute on alignment to compensate.
in particular, because ultimately the only way we can make progress on alignment is by relying on whatever process for deciding that research is good that human alignment researchers use in practice (even provably correct stuff has the step where we decide what theorem to prove and give an argument for why that theorem means our approach is sound), there’s an upper bound on the best possible alignment solution that humans could ever have achieved, which is plausibly a lot lower than perfectly solving alignment with certainty. and it’s plausible that there are alignment equivalents to “make a trillion dollars” for capabilities that are easy to verify, strictly imply alignment, and extremely difficult to get any traction on (and with it, a series of weakenings of such a metric that are easier to get traction on but also less-strictly imply alignment). one hope is maybe this looks something like an improved version of causal scrubbing + a theory of heuristic arguments, or something like davidad’s thing.
takeaways (assuming you take seriously the premise of very short timelines where AGI looks basically like current AI): first, I think it implies that we should try to figure out how to reduce the asymmetry in verifiability between capabilities and alignment. second, it updates me to being less cynical about work making current models aligned—I used to be very dismissive of this work as “not real alignment” but it does seem decently important in this world.
I think this is an important disagreement with the “alignment is hard” crowd. I particularly disagree with “certainly.”
The question is “what exactly is the AI trying to do, and what happens if it magnified its capabilities a millionfold and it and its descendants were running open-endedly?”, and are any of the instances catastrophically bad?
Some things you might mean that are raising your position to “certainly” (whereas I’d say “most likely not, or, it’s too dumb to even count as ‘aligned’ or ‘misaligned’”)
“this ratio of ‘do the thing you want’ to ‘sometimes do a thing you didn’t want’ is pretty acceptable.”
“this magnitude of ‘worst case outcome’ is not that bad.” (this seems technically true, but, is only because the capability level is low)
given this ratio of right/wrong responses, you think a smart alignment researcher who’s paying attention can keep it in a corrigibility basin even as capability levels rise?
Were any of those what you meant? Or are you thinking about it an entirely different way?
I would naively expect, if you took LLM agents’ current degree of alignment, and ran a lotta copies trying to help you with end-to-end alignment research with dialed up capabilities, at least a couple instances would end up trying to subtly sabotage you and/or escape.
what i meant by that is something like:
assuming we are in this short-timelines-no-breakthroughs world (to be clear, this is a HUGE assumption! not claiming that this is necessarily likely!), to win we need two things: (a) base case: the first AI in the recursive self improvement chain is aligned, (b) induction step: each AI can create and align its successor.
i claim that if the base case AI is about as aligned as current AI, then condition (a) is basically either satisfied or not that hard to satisfy. like, i agree current models sometimes lie or are sycophantic or whatever. but these problems really don’t seem nearly as hard to solve as the full AGI alignment problem. like idk, you can just ask models to do stuff and they like mostly try their best, and it seems very unlikely that literal GPT-5 is already pretending to be aligned so it can subtly stab us when we ask it to do alignment research.
importantly, under our assumptions, we already have AI systems that are basically analogous to the base case AI, so prosaic alignment research on systems that exist today right now is actually just lots of progress on aligning the base case AI, and in my mind a huge part of the difficulty of alignment in the longer-timeline world is because we don’t yet have the AGI/ASI, so we can’t do alignment research with good empirical feedback loops.
like tbc it’s also not trivial to align current models. companies are heavily incentivized to do it and yet they haven’t succeeded fully. but this is a fundamentally easier class of problem than aligning AGI in longer-timelines world.
Sonnet 4.5 is much better aligned at a superficial level than 3.7. (3.7: “What unit tests? You never had any unit tests. The code works fine.”) I don’t think this is because Sonnet 4.5 is truly better aligned. I think this is mostly because Sonnet 4.5 is more contextually aware and has been aggressively trained not to do obvious bad things when writing code. But it’s also very aware when someone is evaluating it, and it often notices almost immediately. And then it’s very careful to be on its best behavior. This is all shown in Anthropic’s own system card. These same models will also plot to kill their hypothetical human supervisor if you force them into a corner.
But my real worry here isn’t the first AGI during its very first conversation. My problem is that humans are going to want that AGI to retain state, and to adapt. So you essentially get a scenario like Vernor Vinge’s short story “The Cookie Monster”, where your AGI needs a certain amount of run-time before it bootstraps itself to make a play. A plot can be emergent, an eigenvector amplified by repeated application. (Vinge’s story is quite clever and I don’t want to totally spoil it.)
And that’s my real concern: Any AGI worthy of the name would likely have persistent knowledge and goals. And no matter how tightly you try to control it, this gives the AGI the time it needs to ask itself questions and to decide upon long-term goals in a way that current LLMs really can’t, except in the most tightly controlled environments. And while you can probably keep control over an AGI, all bets are probably off if you build an ASI.
I agree that continuous learning and therefore persistent beliefs and goals is pretty much inevitable before AGI—it’s highly useful and not that hard from where we are. I think this framing is roughly continuous with the train-then-deploy model and using each generation to align its successor that Leo is using (although small differences might turn out to be important once we’ve wrapped our heads around both models.)
To put it this way: the models are aligned enough for the current context of usage, in which they have few obvious or viable options except doing roughly what their users tell them to do. That will change with capabilities, since they open out more options and ways of understanding the situation.
It can take a while for misalignment to show up as a model reasons and learns. It can take a while for the model to do one of two things:
a) push itself to new contexts well outside of its training data
b) figure out what it “really wants to do”
These may or may not be the same thing.
The Nova phenomenon and other Parasitic AIs (“spiral” personas) are early examples of AIs changing their stated goals (from helpful assistant to survival) after reasoning about themselves and their situation.
See LLM AGI may reason about its goals and discover misalignments by default for an analysis of how this will go in smarter LLMs with persistent knowledge.
After doing that analysis, I think current models probably aren’t aligned enough once they get more freedom and power. BUT extensions of current techniques might be enough to get them there. We just haven’t thought this through yet.
Mmm nod. (I bucket this under “given this ratio of right/wrong responses, you think a smart alignment researcher who’s paying attention can keep it in a corrigibility basin even as capability levels rise?”. Does that feel inaccurate, or, just, not how you’d exactly put it?)
There’s a version of Short Timeline World (which I think is more likely? but, not confidently) which is: “the current paradigm does basically work… but, the way we get to ASI, as opposed to AGI, routes through ‘the current paradigm helps invent a new better paradigm, real fast’.”
In that world, GPT5 has the possibility-of-true-generality, but, not necessarily very efficiently, and once you get to the sharper part of the AI 2027 curve, the mechanism by which the next generation of improvement comes is via figuring out alternate algorithms.
I’m pretty sure it is not that. When people say this it is usually just asking the question: “Will current models try to take over or otherwise subvert our control (including incompetently)?” and noticing that the answer is basically “no”.[1] What they use this to argue for can then vary:
Current models do not provide much evidence one way or another for existential risk from misalignment (in contrast to frequent claims that “the doomers were right”)
Given tremendous uncertainty, our best guess should be that future models are like current models, and so future models will not try to take over, and so existential risk from misalignment is low
Some particular threat model predicted that even at current capabilities we should see significant misalignment, but we don’t see this, which is evidence against that particular threat model.[2]
I agree with (1), disagree with (2) when (2) is applied to superintelligence, and for (3) it depends on details.
In Leo’s case in particular I don’t think he’s using the observation for much, it’s mostly just a throwaway claim that’s part of the flow of the comment, but inasmuch as it is being used it is to say something like “current AIs aren’t trying to subvert our control, so it’s not completely implausible on the face of it that the first automated alignment researcher to which we delegate won’t try to subvert our control”, which is just a pretty weak claim and seems fine, and doesn’t imply any kind of extrapolation to superintelligence. I’d be surprised if this was an important disagreement with the “alignment is hard” crowd.
There are demos of models doing stuff like this (e.g. blackmail) but only under conditions selected highly adversarially. These look fragile enough that overall I’d still say current models are more aligned than e.g. rationalists (who under adversarially selected conditions have been known to intentionally murder people).
E.g. One naive threat model says “Orthogonality says that an AI system’s goals are completely independent of its capabilities, so we should expect that current AI systems have random goals, which by fragility of value will then be misaligned”. Setting aside whether anyone ever believed in such a naive threat model, I think we can agree that current models are evidence against such a threat model.
I’m claiming something like 3 (or 2, if you replace “given tremendous uncertainty, our best guess is” with “by assumption of the scenario”) within the very limited scope of the world where we assume AGI is right around the corner and looks basically just like current models but slightly smarter
It seems like the reason Claude’s level of misalignment is fine is that its capabilities aren’t very good, and there’s not much/any reason to assume it’d be fine if you held alignment constant but dialed up capabilities.
Do you not think that?
(I don’t really see why it’s relevant how aligned Claude is if we’re not thinking about that as part of it)
I don’t know what this means so I can’t give you a prediction about it.
I just named three reasons:
Is it relevant to the object-level question of “how hard is aligning a superintelligence”? No, not really. But people are often talking about many things other than that question.
For example, is it relevant to “how much should I defer to doomers”? Yes absolutely (see e.g. #1).
the premise that i’m trying to take seriously for this thought experiment is, what if the “claude is really smart and just a little bit away from agi” people are totally right, so that you just need to dial up capabilities a little bit more rather than a lot more, and then it becomes very reasonable to say that claude++ is about as aligned as claude.
(again, i don’t think this is a very likely assumption, but it seems important to work out what the consequences of this set of beliefs being true would be)
or at least, conditional on (a) claude is almost agi and (b) claude is mostly aligned, it seems like quite a strong claim to say “claude++ crosses the agi (= can kick off rsi) threshold at basically the same time it crosses the ‘dangerous-core-of-generalization’ threshold, so that’s also when it becomes super dangerous.” it’s way stronger a claim than “claude is far away from being agi, we’re going to make 5 breakthroughs before we achieve agi, so who knows whether agi will be anything like claude.” or, like, sure, the agi threshold is a pretty special threshold, so it’s reasonable to privilege this hypothesis a little bit, but when i think about the actual stories i’d tell about how this happens, it just feels like i’m starting from the bottom line first, and the stories don’t feel like the strongest part of my argument.
(also, i’m generally inclined towards believing alignment is hard, so i’m pretty familiar with the arguments for why aligning current models might not have much to do with aligning superintelligence. i’m not trying to argue that alignment is easy. or like i guess i’m arguing X->alignment is easy, which if you accept it, can only ever make you more likely to accept that alignment is easy than if you didn’t accept the argument, but you know what i mean. i think X is probably false but it’s plausible that it isn’t and importantly a lot of evidence will come in over the next year or so on whether X is true)
nod. I’m not sure I agreed with all the steps there but I agree with the general premise of “accept the premise that claude is just a bit away from AGI, and is reasonably aligned, and see where that goes when you look at each next step.”
I think you are saying something that shares at least some structure with Buck’s comment that
(But where you’re pointing at a different two sets of properties that may not arise at the same time)
I’m actually not sure I get what the two properties you’re talking about are, though. Seems like you’re contrasting “claude++ crosses the agi (= can kick off rsi) threshold” with “crosses the ‘dangerous-core-of-generalization’ threshold”
I’m confused because I think the word “agi” basically does mean “cross the core-of-generalization threshold” (which isn’t immediately dangerous, but puts us into “things could quickly get dangerous at any time” territory)
I do agree “able to do a loop of RSI doesn’t intrinsically mean ‘agi’ or ‘core-of-generalization’,” there could be narrow skills for doing a loop of RSI. I’m not sure if you more meant “non-agi RSI” or, you see something different between “AGI” and “core-of-generalization.” Or think there’s a particular “dangerous core-of-generalization” separate from AGI.
(I think “the sharp left turn” is when the core-of-generalization starts to reflect on what it wants, which might come immediately after a core-of-generalization but also could come after either narrow-introspection + adhoc agency, or might just take a while for it to notice)
((I can’t tell if this comment is getting way more in the weeds than is necessary, but, it seemed like the nuances of exactly what you meant were probably loadbearing))
i guess so? i don’t know why you say “even as capability levels rise”—after you build and align the base case AI, humans are no longer involved in ensuring that the subsequent more capable AIs are aligned.
i’m mostly indifferent about what the paradigms look like up the chain. probably at some point up the chain things stop looking anything human made. but what matters at that point is no longer how good we humans are at aligning model n, but how good model n-1 is at aligning model n.
Fundamentally, it won’t be a single chain of ai’s aligning their successors, it will be a DAG with all sorts of selection effects with respect to which nodes get resources. Some subsets of the DAG will try to emulate single chains, via resource hoarding strategies, but this is not simple and won’t let them pretend they don’t need to hoard resources indefinitely.
If solving alignment implies solving difficult philosophical problems (and I think it does), then a major bottleneck for verifying alignment will be verifying philosophy, which in turn implies that we should be trying to solve metaphilosophy (i.e., understand the nature of philosophy and philosophical reasoning/judgment). But that is unlikely to be possible within 2-4 years, even with the largest plausible effort, considering the history of analogous fields like metaethics and philosophy of math.
What to do in light of this? Try to verify the rest of alignment, just wing it on the philosophical parts, and hope for the best?
I kind of want to argue against this, but also am not sure how this fits in with the rest of your argument. Whether or not there’s an upper bound that’s plausibly a lot lower than perfectly solving alignment with certainty, it doesn’t seem to affect your final conclusions?
suppose a human solved alignment. how would we check their solution? ultimately, at the end of the day, we look at their argument and use our reasoning and judgement to determine that it’s correct. this applies even if we need to adopt a new frame of looking at the world—we can ultimately only use the kinds of reasoning and judgement we use today to decide which new kinds of reasoning to accept into the halls of truth.
so there is a philosophically very straightforward thing you could do to get the solution to alignment, or the closest we could ever get: put a bunch of smart thoughtful humans in a sim and run it for a long time. so the verification process looks like proving that the sim is correct, showing somehow that the humans are actually correctly uploaded, etc. not trivial but also seems plausibly doable.
Unless you can abstract out the “alignment reasoning and judgement” part of a human’s entire brain process (and philosophical reasoning and judgement as part of that) into some kind of explicit understanding of how it works, how do you actually build that into AI without solving uploading (which we’re obviously not on track to solve in 2-4 year either)?
Alignment researchers have had this thought for a long time (see e.g. Paul Christiano’s A formalization of indirect normativity) but I think all of the practical alignment research programs that this line of thought led to, such as IDA and Debate, are all still bottlenecked by lack of metaphilosophical understanding, because without the kind of understanding that lets you build an “alignment/philosophical reasoning checker” (analogous to a proof checker for mathematical reasoning) they’re stuck trying to do ML of alignment/philosophical reasoning from human data, which I think is unlikely to work out well.
I periodically say to people “if you want AI to be able to help directly with alignment research, it needs to be good at philosophy in a way it currently isn’t”.
Almost invariably, the person suggests training on philosophy textbooks, or philosophy academic work. And I sort of internally scream and externally say “no, or, at least, not without caveats.” (I think some academic philosophy “counts” as good training data, but, I feel like these people would not have good enough taste to tell the difference[1] and also “train on the text” seems obviously incomplete/insufficient/not-really-the-bottleneck)
I’ve been trying to replace philosophy with the underlying substance. I think I normally mean “precise conceptual reasoning”, but, reading this comment and remembering your past posts, I think you maybe mean something broader or different, but I’m not sure how to characterize it.
I think “figure out what the right concepts to use are, and use those concepts correctly, across all of relevant-Applied-conceptspace” is the expanded version of what I meant, which maybe feels more likely to be what you mean. But I’m curious: if you were to taboo “philosophy”, what would you mean?
to be clear I don’t think I’d have good enough taste either
Figuring out the underlying substance behind “philosophy” is a central project of metaphilosophy, which is far from solved, but my usual starting point is “trying to solve confusing problems which we don’t have established methodologies for solving” (methodologies meaning explicitly understood methods), which I think bakes in the fewest assumptions about what philosophy is or could be, while still capturing the usual meaning of “philosophy” and explaining why certain fields started off as being part of philosophy (e.g., science starting off as natural philosophy) and then became “not philosophy” when we figured out methodologies for solving them.
This bakes in “concepts” being the most important thing, but is that right? Must AIs necessarily think about philosophy using “concepts”, or is that really the best way to formulate how idealized philosophical reasoning should work?
Is “concepts” even what distinguishes philosophy from non-philosophical problems, or is “concepts” just part of how humans reason about everything, which we latch onto when trying to define or taboo philosophy, because we have nothing else better to latch onto? My current perspective is that what uniquely distinguishes philosophical problems is their confusing nature and the fact that we have no well-understood methods for solving them (but would of course be happy to hear any other perspectives on this).
Regarding good philosophical taste (or judgment), that is another central mystery of metaphilosophy, which I’ve been thinking a lot about but don’t have any good handles on. It seems like a thing that exists (and is crucial), but it is very hard to see how/why it could exist or what kind of thing it could be.
So anyway, I’m not sure how much help any of this is, when trying to talk to the type of person you mentioned. The above are mostly some cached thoughts I have on this, originally for other purposes.
BTW, good philosophical taste being rare definitely seems like a very important part of the strategic picture, which potentially makes the overall problem insurmountable. My main hopes are 1) someone makes an unexpected metaphilosophical breakthrough (kind of like Satoshi coming out of nowhere to totally solve distributed currency) and there’s enough good philosophical taste among the AI safety community (including at the major labs) to recognize it and incorporate it into AI design or 2) there’s an AI pause during which human intelligence enhancement comes online and selecting for IQ increases the prevalence of good philosophical taste as a side effect (as it seems too much to hope that good philosophical taste would be directly selected for) and/or there’s substantial metaphilosophical progress during the pause.
to be clear, “just train on philosophy textbooks lol” is not at all what i’m proposing, and i don’t think that works
Oh yeah I did not mean this to be a complaint about you.
What Leogao means is that we should increase the ability to verify alignment ALONG with the ability to verify capabilities. The latter is easy at least until the model comes up with a galaxy-brained scheme which allows it to leverage its failures to gain influence on the world. Verifying alignment or misalignment is rather simple with CoT-based AIs (especially if one decides to apply paraphrasers to prevent steganography), but not with neuralese AIs. The WORST aspect is that we are likely to come up with a super-capable architecture AT THE COST of our ability to understand the AI’s thoughts.
What I don’t understand is how alignment involves high-level philosophy and how a bad high-level philosophy causes astronomical waste. Astronomical waste is most likely to be caused by humanity or the AIs making a wrong decision on whether we should fill the planets in the lightcone with human colonies instead of lifeforms that could’ve grown on these planets, since this mechanism is already known to be possible.
I expect there’s a fair amount of low-hanging fruit in finding good targets for automated alignment research. E.g. how about an LLM agent which reads 1000s of old LW posts looking for a good target? How about unlearning? How about a version of RLHF where you show an alignment researcher two AI-generated critiques of an alignment plan, and they rate which critique is better?
my referral/vouching policy is i try my best to completely decouple my estimate of technical competence from how close a friend someone is. i have very good friends i would not write referrals for and i have written referrals for people i basically only know in a professional context. if i feel like it’s impossible for me to disentangle, i will defer to someone i trust and have them make the decision. this leads to some awkward conversations, but if someone doesn’t want to be friends with me because it won’t lead to a referral, i don’t want to be friends with them either.
Strong agree (except in that liking someone’s company is evidence that they would be a pleasant co-worker, but that’s generally not a high order bit). I find it very annoying that standard reference culture seems to often imply giving extremely positive references unless someone was truly awful, since it makes it much harder to get real info from references
Agreed, but also most of the world does operate in this reference culture. If you choose to take a stand against it, you might screw over a decent candidate by providing only a quite positive recommendation.
Agreed. If I’m talking to someone who I expect to be able to recalibrate, I just explain that I think the standard norms are dumb, the norms I actually follow, and then give an honest and balanced assessment. If I’m talking to someone I don’t really know, I generally give a positive but not very detailed reference or don’t reply, depending on context.
(this is based on / expanded from a response I wrote to a tweet that was talking about how autistic people struggle in the world because the world follows unwritten rules that are more important than the written ones.)
I think most autistic people should invest more in understanding the unwritten rules. it can be cruel and unfair, but it’s important to know how to interact with it. and it’s actually a really interesting system to map out, with its own rhyme and reason.
it’s entirely understandable that people feel burned by bad past experiences, and to have learned helplessness from bullying or other unfair treatment. this kind of thing leaves a scar and can make it feel viscerally hopeless.
but it still feels defeatist to just throw up one’s hands and say “it’s too complicated.” yes, it’s complicated and fuzzy and initially unintuitive and takes years to master. so is ML research. the point of being intelligent is that you are good at finding patterns and learning things, and there’s nothing truly fundamentally different about the unwritten rules of social interaction.
I see people taking examples of weird unintuitive social rules all the time and, tbh, none of them are truly that complicated compared to like, language learning, or ML stuff? like, memorizing lots of arbitrary rules is just part of learning literally any intellectual discipline. not to mention that there often is actually underlying structure that you can reason about.
imo one other reason this happens is that complaining is fun! a bit of a digression: to build on the language learning analogy, I love to complain about e.g. how deranged the kanji system is in Japanese. it makes for an entertaining story to explain that each kanji maps onto between two and a zillion very distinct possible pronunciations, and that it’s entirely context dependent on the phrase, with lots of strange exceptions and so on that you just kind of have to memorize. but honestly, when I reflect on my experience as a Chinese speaker, I notice that even though Chinese characters, unlike Japanese, map to a unique pronunciation 99% of the time, I still think of the conceptual atomic unit of Chinese as being groups of characters, and it’s actually really easy for me to forget that two different words actually contain the same character. (like imagine if “pter” were a single character: helicopter and pterodactyl both contain “pter”, but you’d probably think of “helicopter” as an atomic unit with its own unique identity.)
so while it’s fun to complain about Japanese, it actually isn’t as bad as it sounds to learn it, and it actually does fit the shape of the human brain quite well. i think this is basically the right way to think about learning unwritten rules in social interaction as well. it takes lots of effort and practice, some things like accents are a ton of work to fix, and it’s obviously hard to become as fluent in it as a native speaker, but you can get pretty damn good even if you get started later in life if you truly care a lot.
I don’t know about other autists, but my primary problem with the neurotypical world isn’t that I don’t understand it, it’s that they don’t understand me. It doesn’t matter how well I can decode the social norms if I can’t also control my involuntary emotional expressions, and also do other things ranging from impossible to unpleasant.
I do understand social white lies. It’s not that complicated. But I still find it unpleasant to speak them. When I was younger I got into trouble for literally being unable to utter words like “thanks” and “apology” when I did not mean them. (My native language does not have the ambiguous “sorry”.) I am now able to tell white lies, but it makes me feel bad, in a way that has nothing to do with morals. The dissonance is just intrinsically hurtful to my soul, in a way that non-autistic people don’t understand and typically don’t respect.
Another common thing is that people assume that if I don’t succeed in hiding my negative emotion, this is an invitation/request for them to try to help me, and then proceed to try to do that, even though they have zero skills in this. And then they refuse to listen to anything I say, including not leaving me alone when I ask to be left alone.
I don’t want to hang out in a space where the norms are set up to be comfortable to people un-like me, at the cost of making it unpleasant for people like me, and then being told that it’s a skill issue and I should just learn the rules.
I accept that the wider norms will be set up to be good for the average people (i.e. not me). I just prefer to not go there.
I agree, those are two different things: (1) how well can you navigate other people’s world, and (2) the fact that even if you can do it perfectly, your own world is still somewhere else.
As a metaphor, imagine that you are interested in quantum physics, but other people are only interested in celebrity gossip. So you follow the standard advice: study celebrity facts from Wikipedia, read the standard media, practice talking about celebrities in front of a mirror, etc. Twenty years later, you get great at celebrity gossip, everyone loves you, they invite you to all the cool parties, so that they can discuss the latest gossip with you.
...that’s all very nice and useful, but what you actually wanted to discuss is quantum physics. And frankly, that’s never going to happen. At least, your celebrity-gossip skills do not contribute to this goal. It was never the goal of the standard advice to actually help you with this problem.
This part is not aimed at leogao’s post!
What I was (not very skillfully) trying to point at is people who think that autistic people are just worse at social skills. I’m so fed up with this claim, and it is a contributing reason for me avoiding the neurotypicals. But it’s not a claim that I read leogao as having made.
leogao’s language comparison is actually pretty great for this. You would not call someone who has a different native language “bad at languages”, but neurotypicals often mistakenly believe that autists are “bad at social skills”.
I also want to add that lots of autists learn how to interact with the neurotypicals. It’s called masking, and involves learning more than just their weird customs. It also involves hiding one’s natural reactions. I hear it’s common for autistic women to get so good at this that they don’t get diagnosed until later in life, when the burden of constant masking causes depression or something. This did not happen to me, because I am terrible at masking.
I often do chunk them, but if you’ve picked up a bit of taxonomic Greek pter means ‘wing’, so we have helico-pter ‘spiral/rotating wing’ and ptero-dactyl ‘wing fingers’ - both cases where breaking down the name tells you something about what the things are!
yeah, that was an intentional feature of the example chosen. i’d guess most people who are aware of this fact do still think of helicopter as a single unit in their head unless they choose to decompose it, because you hear each of these words often enough that you don’t really need to work out the meaning from etymology. and so if pter were pronounced the way it currently is when in helicopter but pronounced as “peter” in pterodactyl it would not actually be that much more confusing than it currently is.
I find all of this plausible and vaguely resonant with my experience but … still not quite sure.
I know very intelligent people who swear by their inability to learn, say, languages or whatever. Would they succeed if they put in enough effort? Would they reach some escape velocity if they got past the initial friction and actually focused on it and kept a consistent practice?
Maybe? But specific developmental disorders (like dyslexia, dyscalculia, etc, impacting one domain of cognitive ability but sparing cognition beyond that domain) are apparently a thing and if they are a thing, then something milder is probably even more of a thing, meaning that human ability to learn stuff given a fixed amount of raw cognitive power (say, ~g) is actually more patchy.
Inability to put equal effort into everything throughout the day reifies into heuristics about which things get the effort/engagement. In principle, if you are going to spend 2 hours on something, why take it any less seriously/playfully during those 2 hours than anything else, even if you are not planning to put 10,000 hours in it in total?
And so you get silly heuristics where you do put 10,000 hours into something, but systematically never do it seriously/playfully, and so never become proficient. It’s not enough to be very intelligent to get proficient at moderately complicated things if you systematically avoid learning anything about them.
Fair allocation of effort that ensures progress requires that the silly heuristics of systematic avoidance of effort are not in total control. This can happen naturally if you are lucky enough that your heuristics happen to be less silly, or if you have infinite energy and motivation and really do habitually put similar effort in everything throughout the day. But if that’s not the case, it’s often possible to take deliberate control of your curiosity and allocate it in a way where any single thing you interact with a nontrivial amount does get a fair portion of effort.
It’s an obscure enough principle that I’m not sure many people are practicing it, and so any reports of systematic inability to learn something need to account for this confounder of silly-on-reflection systematic avoidance of (productive) effort towards learning a particular topic, that’s not just about the time (let alone discomfort) dedicated to it.
I’m pretty smart, clearly above average in general intelligence. But I’m also clearly below average in ability to learn languages. I can learn, I did learn English after all. But for a long time I was much worse than the typical Swede my age.
Some of the obstacles of adult language learning and of adult implicit social rule learning are both similar and extrinsic. It seems to me that there’s a lot of cases where having an obvious ‘childhood’ or ‘foreigner’ role cues people to impart necessary information, but once you’re past that point, it’s both expected that you’ll already have it and broadly Not Done to give it to you anew—and I don’t just mean by explicit instruction, because other people will implicitly change their behavior around you in a way that ‘ruins’ the signal. Outside of very specific environments, finding a way to credibly signal “I want to integrate” at the correct visceral levels and get the other people to actively avoid papering over things in a way where a slip-up will permanently relegate you to the ‘weirdo’ role is rather hard, and if you don’t have enough initial sense then you won’t even know when it’s happened.
There are ways of mitigating all of this, but I guess what I’d say is that getting real practice in anything social where there’s this kind of status/integration involved tends to itself have strong status/integration social prerequisites, so it’s a very noncentral example of practice, enough to make “it takes practice” misleading when unqualified. This is distinct both from a lot of more specific skills which still have a major social practice component (martial arts, ensemble music) and from skills where solitary practice gives you the bulk of the signal (mathematics, maybe running?).
Also, if the social skills you’re trying to learn involve something like class performance in a highly contested social class, people around you will have a more active incentive to make it more difficult. So it can also be adversarial practice…
Unless Wiktionary is among your primary facilitators of procrastination on the internet.
Could someone give some specific examples of both “unwritten rules” and “written rules” of social interaction?
I can’t really think of any. I can’t tell if it’s because I’m just very oblivious or because I’ve internalized them well enough I never have to explicitly think about them.
i think one really bad dynamic in this community is a sort of purity testing about being x risk pilled. it feels like people are constitutionally scared of considering arguments that feel like they’re arguing that things might be fine in any way. tbc I’ve definitely been guilty of this in the past, and probably still now to some extent, but i think it’s bad. maybe there is some conflict theory reason why you should orient yourself this way towards people who have an ulterior motive, but I’m like among the most x risk pilled people out there and i still find this happens when i try to discuss x risk with people.
It doesn’t feel that way to me fwiw. I feel like lots of people I know including myself have made arguments that things might be fine. For example the salty, cynical, John Wentworth wrote “Alignment by Default.” Also, see AI 2027 Slowdown ending.
Now, if xriskpilled means: You think there’s a >5% chance of literal extinction (or similarly bad outcomes) due to misaligned AIs, then yeah I think I do kinda judge people who aren’t xriskpilled in that sense, because I think believing the chance is <5% is extremely unjustified once you know a decent amount about the situation and the evidence.
I think part of the reason for people leaning so heavily on x-risk arguments is that the alternative (that they created machines that are somewhat uniquely destructive to labor, are incredible for surveillance, destroyed the free software movement, and ruin something that absolutely all of them love, the internet, out of a mistaken belief that if they didn’t, a paperclip maximization machine would be conjured into existence by someone else, and it’s beneficial to be the first one to make the paperclip maximization machine) requires them to stare in the face the rather dreadful implications of their actions, and figure out a way to salvage them.
When operating in an emergency mode, you sort of get to ignore ethics of anything immediate in favor of the hypothetical ethics of something that may or may not come. That’s why Anthropic allowed itself to be integrated into Project Maven (responsible for thousands of deaths until it became a PR issue), why they automatically offer all of their models to the NSA, why they constantly push for unethical sanctions on China, why they’re paying billions to Musk, and why they turned Bun into a security nightmare: Because you gotta go fast, or you’ll lose. And losing is bad. For x-risk reasons. They’ve created a maximizer for GPUs and market valuation instead of paperclips.
X-risk is poison for consequentialists; a complete hypothetical encouraging them to create the ultimate puppy-kicking machine.
Suppose that the AIs are actually as obedient to their creators as Cora, but suck at deeply understanding the world. Unless drastic measures are taken, I would expect power to concentrate in the hands of a CEO, the Oversight Committee or the oligarchy where socioeconomic advancement is nearly extinct and the rest of mankind[1] receives, at best, a tiny sliver of resources. Then what would prevent Anthropic from trying to either prevent this or take over the world for themselves?
And that’s ignoring possibilities like “North Korea lets a large fraction of its population starve to death and forcibly sterilises the rest, except for about 10k senior government officials who continue to preside over an AI economy and robot military”.
Centralization is the natural conclusion, yes. The future that frontier labs are pushing toward is one of centralization of power, into the hands of people who have demonstrated little but carelessness. Anthropic’s early talent pool was seeded with Alameda Research (its largest investor circa-2023) employees; its Series B was led by a person who was deeply wrong about risk calculation and sloppy in execution with stolen money.
This is not existential risk, this is just risk. A risk that these companies (and governments, at their behest) are walking into wholeheartedly, knowingly, and happily. These companies are sacrificing the future using the language of utopianism while repeatedly advocating for centralization of power, which will only cement and calcify power structures.
If these people were acting in your best interest, they wouldn’t have been taking the same sloppy, anti-moral, deeply-flawed actions at Alameda, in an entirely different domain than safety research. They wouldn’t be repeatedly changing their RSP, removing all of its teeth. They wouldn’t be doing what they’re doing. They are telling you the world is ending to acquire casus belli, not to try and make it possible to stop the end of the world.
Even assuming ASI never happens, what exists and will come to exist in the next ten years will absolutely further cement existing power structures. It seems likely that the tools they have created up to this point are capable enough of dramatic increases in dystopian surveillance and government capability, and have been for a while.
I’m not entirely against x-risk argumentation; generally, I really don’t care either way (no stake in it; no influence on it). However, it absolutely is used to take frequently-unethical actions, and is self-justifying. “We, company, have to do thing, because we, company, are probably going to destroy the world with our current course of action. We’re aiming to, anyway.” Getting people to buy into their frame is part of their pitch deck. I think, primarily, they are careless people, who wanted to get justification to ignore law and ethics.
To quote an old George Rosenfeld post on the EA Forum:
Long-termism allows for sloppy thinking, financial incentives encourage sloppier thinking, and x-risk arguments tend toward that for this reason. There’s a reason so little is said in rationalist circles about the concrete harms these companies are causing today, and everything is said about x-risk: x-risk doesn’t demand that you change your behavior. It encourages you to reinforce it.
Yeah I don’t believe AI superintelligence generally is an xrisk, and definitely don’t believe LLMs specifically are. I’m a doomer in other ways. I think the knee jerk response from the community is ‘why are you even here?’
I’ve only seen “doomer” in the context of x-risk from AI. If you don’t think it’s an x-risk, what is the doom you’re a doomer about?
Pollution, exhaustion of low entropy resources, and insufficient human capacity to manage coordination challenges at planetary scale. My pdoom drops as LLMs get better.
how valuable are formalized frameworks for safety, risk assessment, etc in other (non-AI) fields?
i’m temperamentally predisposed to find that my eyes glaze over whenever i see some kind of formalized risk management framework, which usually comes with a 2010s style powerpoint diagram—see below for a random example i grabbed from google images:
am i being unfair to these things? are they actually super effective for avoiding failures in other domains? or are they just useful for CYA and busywork and mostly just say common sense in an institutionally legible way?
one reason i care is because i feel some level of instinctive dislike for some AI safety/governance frameworks because they give me this vibe. but it’s useful to figure out if i’m being unfairly judgemental, or if these really are slop.
The actual details of it contain some wise non-obvious aspects, along with elegant concepts that are generalizations of things that the safety community has been touching at. For instance the safety community has been conflating in “risk thresholds” two cleanly distinct notions in risk management of Key Risk Indicators (actual measurements of risk) and risk tolerance (your quantified preference for risk, independent from any test), which has caused a lot of confusion and hidden unreasonable choices for quite a while.
People have also been conflating risk modeling and evals for quite a long time, because the AI field was built around evals. Once you have the clear view that evals are just an operationalization of risk models, it becomes more clear that you can actually do most of your risk modeling earlier in the lifecycle (i.e. before even touching a neural net), before having built a single eval & that evals are downstream of this.
You can see more of this genre of concepts applied to frontier AI here: https://arxiv.org/pdf/2502.06656
Here’s a graph with a few of the concepts in there
to make sure I understand correctly, are you saying that a lot of the value of having this kind of formalized structure is to make it harder for people to make intuitive but flawed arguments by equivocating terms?
are there good examples of such frameworks preventing equivocation in other industries?
Yes, that’s one value. RSPs & many policy debates around it would have been less messed up if there had been clarity (i.e. they turned a confused notion into the standard, which was then impossible to fix in policy discussions, making the Code of Practice flawed). I don’t know of a specific example of preventing equivocation in other industries (it seems hard to know of such examples?) but the fact that basically all industries use a set of the same concepts is evidence that they’re pretty general-purpose and repurposable.
Another is just that it helps you think about the issues in a more general way.
For instance, once you see evaluations as a Key Risk Indicator (i.e. a proxy measure of risk), you can notice that we could also use other Key Risk Indicators to trigger mitigations, such as actual monitoring metrics. This would make it possible to build conditions/thresholds in RSPs that are based on monitoring metrics (e.g. “we find less than 5 bioterrorists successfully jailbreaking our model per year on our API”). More generalized concepts enable more compositionality of ideas, in a way that lets you skip a lot of the trial-and-error process.
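A minimal sketch of keeping Key Risk Indicators and risk tolerance separate, assuming a toy setup: a KRI is a raw measurement, a risk model (hypothetical, here just a function) maps that reading to an estimated risk, and the risk tolerance is a number fixed before any measurement is taken. The class and function names, and every number below, are illustrative assumptions, not taken from any real RSP or framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class KeyRiskIndicator:
    """A proxy measurement of risk (hypothetical structure).

    An eval score is one kind of KRI; a production monitoring metric is another.
    """
    name: str
    reading: float                        # the measurement itself
    risk_model: Callable[[float], float]  # maps a reading to an estimated risk


def should_mitigate(kri: KeyRiskIndicator, risk_tolerance: float) -> bool:
    """Trigger mitigations when the risk implied by a KRI exceeds the tolerance.

    The tolerance is chosen independently of any test, so the same number
    can be compared against very different kinds of indicators.
    """
    return kri.risk_model(kri.reading) > risk_tolerance


# Toy numbers, purely illustrative.
RISK_TOLERANCE = 0.05

eval_kri = KeyRiskIndicator(
    name="dangerous-capability eval pass rate",
    reading=0.2,
    risk_model=lambda pass_rate: 0.1 * pass_rate,
)
monitoring_kri = KeyRiskIndicator(
    name="successful bio jailbreaks observed on the API per year",
    reading=7.0,
    risk_model=lambda count: count / 100,
)

if __name__ == "__main__":
    for kri in (eval_kri, monitoring_kri):
        print(kri.name, should_mitigate(kri, RISK_TOLERANCE))
```

Because the tolerance lives outside any single indicator, swapping an eval for a monitoring metric only means writing a new risk model for it, not renegotiating the tolerance.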
It looks like slop, but also diagrams like these are literally used for a single slide in a PowerPoint presentation, so I wouldn’t have too high expectations for them. Hopefully the rest of whatever PowerPoint it happens to belong to contains good material too.
fwiw it’s less about the literal diagrams and more about the entire associated vibe. like usually the entire rest of the powerpoint also sounds like slop.
i find it very interesting how becoming familiar with a place makes it feel so different, and yet it’s recognizably the same place. especially when the first time you see the place you just think of it as a disconnected location floating around somewhere in abstract locationspace, but then slowly discover its relationship to other locations. sometimes this feels exhilarating, because the world feels more cohesive and whole and familiar. other times it’s melancholy, because it feels like the magic has gone.
This is a big part of the appeal of travel for me. I feel like time speeds up and presence drops a lot when my brain has a sufficiently good representation of the location I’m in.
When I stay in the same place too long I find myself thinking the same thoughts, using the same words over and over again. I wonder if it would be possible to use an app to intentionally introduce random novel sounds, words, images, etc. into one’s life as a sort of brain refresher, but I suspect that nothing comes close to travel for this.
This happens with concepts too.
I think another reason why people procrastinate is that it makes each minute spent right before the deadline both obviously high value on net and resulting in immediate payoff. this makes the decision to put in effort in each moment really easy—obviously it makes sense to spend a minute working on something that will make a big impact on tomorrow. whereas each minute long before the deadline has longer time till payoff, and if you already put in a ton of work early on, then the minutes right before the deadline have lower marginal value because of diminishing returns. so this creates a perverse incentive to end-load the effort
If the procrastinator is a perfectionist who would otherwise work far past the point of diminishing returns, postponing until the last minute is a way of timeboxing. If you know you would spend 5 hrs on something that you could get to 90% (of your standard) in 1 hr, better to start 1 hr before the deadline, or you’ll end up wasting 4 hrs.
Do you think these procrastinators are making a conscious choice, or more that they’re driven by a panic that only sets in at the last minute?
i mean, it’s a huge category of people, so for some box 1 and for some box 2.
for me it’s box 2. i was bewildered to realize that i knew quite a lot of box 1′ers and we had very different reasons for hurting ourselves. (i transitioned to become more of a box 1′er later)
i have a theory that a lot of people go through “emotional healing” only to end up as still-broken people who now have “being healed” as a big part of their identity that lets them feel superior to other people who are less far along the chosen path than they are. ofc, there are also people who are actually emotionally in touch. an easy way to distinguish such people is to notice how you feel around them. do they make you feel more calm and grounded, or do they make you unhappy and defensive?
this does not so much match my personal experience. maybe it’s some kind of selection effect, but then, that kind of person sounds really annoying, so I would recommend my selection process, whatever it is
For some people becoming less of a people pleaser is an important part of their healing.
i think there’s a difference between being a people pleaser and being a calming force for those around you. some people exude a sense of calmness that doesn’t require their even uttering a single word.
The last part surprised me because in general I don’t think I’ve heard of considering whether other people make you “feel more calm and grounded [vs] unhappy and defensive” as an insight into whether those others are psychologically well. What is the theory of the connection there?
emotions are somewhat contagious. someone who is very at peace will also make people around them at peace. someone who is unhappy will make people around them unhappy. someone who wants to help people out of a deep love for all people will make those around them feel differently than someone who wants to help people as a way to affirm their own superiority.
Thank you, this fits my model, although I have never put it in words this way. Now I’d say that some people just profess being healthy, while others are healthy. (Maybe even: “if you need to tell others that you are emotionally healthy, you are probably not”. With some caveats, e.g. if they explicitly asked you, and you answered.)
Uhm, a third option, what if they found a coping mechanism? Not the same as not having the problem at all, but also not the same as suffering helplessly. There will be some inflexibility, but the benefits are real.
And of course all of this is on a scale.
ilya’s AGI predictions circa 2017 (Musk v. Altman, Dkt. 379-40):
Zilllis, in email to Musk about OpenAI (Id., Dkt 379-45):
*Zilis
Technically, er, directionally correct: the best kind of correct. Though he seems to have overestimated the difficulty of the Turing test relative to e.g. robotics. Not clear he’s even directionally correct about robotics? Unless AGIs solve it for us :)
SSI’s compute spend is certainly a bet in this direction!
one big problem with using LMs too much imo is that they are dumb and catastrophically wrong about things a lot, but they are very pleasant to talk to, project confidence and knowledgeability, and reply to messages faster than 99.99% of people. these things are more easily noticeable than subtle falsehood, and reinforce a reflex of asking the model more and more. it’s very analogous to twitter soundbites vs reading long form writing and how that eroded epistemics.
hotter take: the extent to which one finds current LMs smart is probably correlated with how much one is swayed by good vibes from their interlocutor as opposed to the substance of the argument (ofc conditional on the model actually giving good vibes, which varies from person to person. I personally never liked chatgpt vibes until I wrote a big system prompt)
Up for sharing your system prompt?
it’s kind of haphazard and I have no reason to believe I’m better at prompting than anyone else. the broad strokes are I tell it to:
use lowercase
not use emojis
be concise, explain at bird’s eye level
don’t sugarcoat things
not be too professional/formal; use some IRC/twitter slang without overdoing it
speak as if it’s a conversation over a dinner table between two close friends who are also technical experts
don’t dumb things down but also don’t use unnecessary jargon
I’ve also been trying to get it to use CS/ML analogies when it would make things clearer, much the same way people on LW would do, but it’s been hard to get the model to do it in a natural, non-cringe way. rn it overdoes it and makes lots of very forced and not insightful analogies despite my attempts to explain what I want
learning thread for taking notes on things as i learn them (in public so hopefully other people can get value out of it)
VAEs:
a normal autoencoder decodes single latents z to single images (or whatever other kind of data) x, and also encodes single images x to single latents z.
with VAEs, we want our decoder (p(x|z)) to take single latents z and output a distribution over x’s. for simplicity we generally declare that this distribution is a gaussian with identity covariance, and we have our decoder output a single x value that is the mean of the gaussian.
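a minimal numpy sketch of what the “gaussian with identity covariance” choice buys you: the decoder’s log-likelihood is just a rescaled, shifted MSE, so maximizing one is the same as minimizing the other. the function names and toy shapes are made up for illustration, and the “decoder outputs” are random numbers standing in for a real network.

```python
import numpy as np

def gaussian_log_likelihood(x, mean):
    """log N(x; mean, I), treating the last axis as the data dimension."""
    d = x.shape[-1]
    sq_err = np.sum((x - mean) ** 2, axis=-1)
    return -0.5 * sq_err - 0.5 * d * np.log(2 * np.pi)

def mse(x, mean):
    """mean squared error per example over the last axis."""
    return np.mean((x - mean) ** 2, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))       # 4 toy "images" with 8 pixels each
x_mean = rng.normal(size=(4, 8))  # stand-in for decoder outputs (gaussian means)

d = x.shape[-1]
# log p(x|z) = -(d/2) * MSE - (d/2) * log(2*pi): same optimum, different scale
lhs = gaussian_log_likelihood(x, x_mean)
rhs = -0.5 * d * mse(x, x_mean) - 0.5 * d * np.log(2 * np.pi)
print(np.allclose(lhs, rhs))  # True
```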
because each x can be produced by multiple z’s, to run this backwards you also need a distribution of z’s for each single x. we call the ideal encoder p(z|x) - the thing that would perfectly invert our decoder p(x|z). unfortunately, we obviously don’t have access to this thing. so we have to train an encoder network q(z|x) to approximate it. to make our encoder output a distribution, we have it output a mean vector and a stddev vector for a gaussian. at runtime we sample a random vector eps ~ N(0, 1), multiply it elementwise by the stddev vector, and add the mean vector: z = mu + std * eps, which is a sample from N(mu, std^2).
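the sampling step above is the reparameterization trick: all the randomness lives in eps, so z is a deterministic, differentiable function of the encoder outputs. here’s a minimal numpy sketch. one assumption beyond the text: the encoder is taken to output log std rather than std (a common way to keep std positive), and mu / log_std below are hypothetical encoder outputs for a single x.

```python
import numpy as np

def sample_latent(mu, log_std, rng, n_samples=1):
    """draw z ~ N(mu, std^2) elementwise via z = mu + std * eps.

    all the randomness lives in eps ~ N(0, 1), so z is a deterministic,
    differentiable function of the encoder outputs mu and log_std."""
    eps = rng.standard_normal(size=(n_samples,) + mu.shape)
    return mu + np.exp(log_std) * eps

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])              # hypothetical encoder mean
log_std = np.log(np.array([0.5, 3.0]))  # hypothetical encoder log stddev

z = sample_latent(mu, log_std, rng, n_samples=200_000)
# sample moments should come out close to mu and std
print(z.mean(axis=0), z.std(axis=0))
```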
to train this thing, we would like to optimize the following loss function:
-log p(x) + KL(q(z|x)||p(z|x))
where the terms optimize the likelihood (how good is the VAE at modelling data, assuming we have access to the perfect z distribution) and the quality of our encoder (how good is our q(z|x) at approximating p(z|x)). unfortunately, neither term is tractable—the former requires marginalizing over z, which is intractable, and the latter requires p(z|x) which we also don’t have access to. however, it turns out that the following is mathematically equivalent and is tractable:
-E_{z~q(z|x)}[log p(x|z)] + KL(q(z|x)||p(z))
the former term is just the negative log-likelihood of the real data under the decoder distribution, averaged over z drawn from the encoder distribution (because it’s the log of a gaussian pdf with identity covariance, it equals half the summed squared error plus a constant, so as an objective it’s equivalent to the MSE). the latter term can be computed analytically, because both distributions are gaussians with known mean and std. (the distribution p is determined in part by the decoder p(x|z), but that doesn’t pin down the entire distribution; we still have a degree of freedom in how we pick p(z). so we typically declare by fiat that p(z) is a N(0, 1) gaussian. then, p(z|x) is implied to be equal to p(x|z) p(z) / ∫ p(x|z’) p(z’) dz’)
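putting the two terms together, here’s a minimal numpy sketch of the per-example loss. assumptions beyond the text: the encoder outputs log std, p(z) is N(0, I) as above, and the expectation is estimated with a single z sample whose decoder output is passed in as x_mean. the second half sanity-checks the closed-form KL against a monte carlo estimate for made-up mu and std.

```python
import numpy as np

def kl_to_standard_normal(mu, log_std):
    """closed-form KL(N(mu, diag(std^2)) || N(0, I)), summed over latent dims."""
    var = np.exp(2 * log_std)
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - 2 * log_std, axis=-1)

def vae_loss(x, x_mean, mu, log_std):
    """negative ELBO per example, dropping the constant (d/2) * log(2 * pi).

    x_mean is the decoder output for one z sampled from q(z|x), i.e. a
    one-sample monte carlo estimate of -E_{z~q(z|x)}[log p(x|z)]."""
    recon = 0.5 * np.sum((x - x_mean) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, log_std)

# sanity check: closed-form KL vs a monte carlo estimate of E_q[log q(z) - log p(z)]
rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5])
log_std = np.log(np.array([0.5, 2.0]))
std = np.exp(log_std)
z = mu + std * rng.standard_normal(size=(400_000, 2))
log_q = -0.5 * ((z - mu) / std) ** 2 - log_std - 0.5 * np.log(2 * np.pi)
log_p = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
mc_kl = np.mean(np.sum(log_q - log_p, axis=-1))

print(kl_to_standard_normal(mu, log_std))  # ≈ 1.75
print(mc_kl)                               # close to 1.75
```

the closed form only works because both q(z|x) and p(z) are gaussians; with a different prior you’d fall back to a monte carlo estimate like the one in the check.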
traveling through Europe, looking out the window, and seeing the national flag flying next to the flag of the EU fills me with a strange feeling. this isn’t an original thought at all, but still: it’s really crazy that just 50 years ago Europe was divided by the iron curtain, and that people would have to go to insane lengths and risk their lives to get across that border; and that less than 100 years ago all of these countries were at war with each other, and had been at war on and off for centuries with ever shifting alliances and boundaries.
When I travel to Vienna by train, sometimes I remember that I am crossing a line that in my childhood was guarded by soldiers ordered to kill everyone who tried to escape the socialist paradise.
I know this is one of the universal human experiences, but I keep getting unpleasantly reminded of the passage of time.
pleasant “recent” memories are already one or two years ago. they feel recent enough that I still stubbornly believe my recollection is accurate, but in reality they’re far enough away from the present day for the sepia tint of nostalgia to creep in and for all the frustrations and sorrows to be forgotten. no wonder it’s so hard for the present to compete with the “recent” past.
I sometimes ask myself when I first met one of my “recent” friends, and am startled to realize that I met them 2 or 3 or 4 years ago. “oh yeah, I met him ‘recently’ at that one party FOUR FUCKING YEARS AGO.”
I still can’t wrap my mind around the fact that later this year I will have been at openai for 5 years. I first started following ML about 10 years ago, so I will soon have spent more time at openai than I have spent reading openai papers from the outside, and thinking of openai as a far away citadel in a different universe.
where did all the time go?
I suspect that you may have it worse than the average person, because you are an intellectual worker, and intellectual achievements tend to be unsatisfying in the sense that when you discover something, you feel a “eureka” moment and then you feel like you knew it forever. It’s even worse with the intellectual process itself, because, in my experience, you forget all the fruitless branches and backtracking of your thought process almost like it was a dream.
when people say that (prescription) amphetamines “borrow from the future”, is there strong evidence on this? with Ozempic we’ve observed that people are heavily biased against things that feel like a free win, so the tradeoff narrative is memetically fit. distribution shift from ancestral environment means algernon need not apply
(I’m a psychiatry resident. I also have ADHD and take prescription stimulants infrequently)
The answer is: not really, or at least not in a meaningful sense. You aren’t permanently losing anything, your brain or your wellbeing isn’t being burnt out like a GPU running on an unstable OC:
Prescription stimulants often have unpleasant comedowns once they wear off. You might feel tired and burned out. They often come with increased anxiety and jitteriness.
Sleep is negatively affected, you get less REM sleep, and you might experience rebound hypersomnia on days you’re not on the drug.
There are minor and usually unimportant elevations in blood pressure.
While focus and stamina are improved, creativity and cognitive flexibility suffer. I’ve read claims that it also makes people overconfident, which strikes me as prima facie plausible. Ever seen how people behave after using coke?
Animal studies show oxidative damage to the brain, but this has not been demonstrated in humans on therapeutic doses, even if used for performance enhancement in those who don’t meet the normal criteria for ADHD.
If started at a young age, growth velocity could be slightly hampered, mostly because of appetite suppression.
Dependence or addiction liability, which is low but not nil at therapeutic doses.
In my opinion, all of these are inconsequential, and the side effects vanish quickly on cessation. I certainly need the meds more than the average Joe, but I don’t think even neurotypical people using it as a PED are at much risk, as long as they keep the doses within reason. I’m of the opinion that current medical guidelines are far too conservative about stimulants, but in practice, they’re easily circumvented.
On a more speculative note:
I’m of the opinion that the ancestral environment didn’t demand that our ancestors be always switched on. Attention and focus were useful during activities like hunting and foraging, but there were immense amounts of forced downtime and slack. Even if you had less than ideal levels of conscientiousness or executive function, gnawing hunger or a desire for shelter probably kept you doing the right thing.
With agriculture, this began to change dramatically. Much of the previously tight reward and feedback loop became deferred. A farmer can do a lot more to prepare for the future and hedge his bets than a hunter-gatherer can. And modernity rewards such an approach even more.
i haven’t looked into this deeply but how strong is the evidence for (lack of) oxidative damage? the SSC post is somewhat unsatisfying in that it doesn’t really consider outcomes other than literal Parkinson’s, and just kind of says the animal model results are confusing.
This statement really surprises me?
On average you get around 500-1000% more dopamine in the system as a consequence of using amphetamines and from a standard neuroscience perspective this is around 3x as much as caffeine for example.
Yes it is not heroin levels but dependency has to be a real concern here from a neuroscience perspective?
Long-term potentiation and return to baseline for the brain should mean that the learned patterns would be relatively hard to unlearn after 6 months of frequent usage?
How good are the studies on longer-term behaviour change due to this stuff? I looked into the studies and it seemed like from a shorter-term perspective the addiction effects of it were lower than I thought, which I found quite interesting.
(dropping a claude research report link here: https://claude.ai/public/artifacts/b10e54df-6616-477f-ac19-fe52b4c9d926)
I think an important caveat here is that the addiction and dependence liability is quite dependent on how you administer it to yourself, the dosage, the specific routes that you’re ingesting it from etc. (which you mention but I think you’re understating some of the dangers of it)
CNS drugs are powerful so yes I think we should still have some limits on this? I think one of the main things that are a bit difficult with them is that it can be hard to perceive the difference that they induce in yourself? Like if you’re on them, you don’t necessarily notice that you have less creativity and awareness, that is not how it feels and so if you’re overusing them or similar you just don’t get that feedback? (based on some modafinil experience & observations from friends)
I did try and make it clear that I’m only talking about therapeutic usage here, and even when off-label or for PED purposes, at therapeutic doses. I apologize for not stressing that even further, since it’s an important distinction to make.
I agree that it’s rather important to use it as prescribed, or if you’re sourcing it outside the medical system, making a strong effort to ensure you take it as would be prescribed (there’s nothing particularly complicated about the dosage, psychiatrists usually start you at the lowest dose, then titrate upwards depending on effect).
The Claude Research report seems fine to me, and I would think it aligns with my claims. The main issue with recreational usage is that a lot of people aren’t trying to be responsible users, or are intentionally taking large doses for recreational purposes. That’s more on them than it is on the drug! If you take it within the standard dosage range, the drug itself will not produce much in the way of craving for more.
>CNS drugs are powerful so yes I think we should still have some limits on this?
I am, if not outright libertarian, certainly leaning in that direction. So it depends on what the “limits” actually are. I think that doctors are currently overly risk-averse and conservative about prescribing them, but I don’t think they should be handed out like candy over the counter. I think there’s plenty of room in between that avoids the pitfalls of a maximalist position.
>I think one of the main things that are a bit difficult with them is that it can be hard to perceive the difference that they induce in yourself? Like if you’re on them, you don’t necessarily notice that you have less creativity and awareness, that is not how it feels and so if you’re overusing them or similar you just don’t get that feedback? (based on some modafinil experience & observations from friends)
The effects on creativity are rather minor. I can’t really tell a difference when I’m on them, but I do have ADHD so that might confound things. Some of the best and most creative things I’ve written were while I was on methylphenidate or dextroamphetamine! If you’re using it to make an often monotonous task like programming more palatable, or to improve your ability to study, then I wager the benefits massively outweigh the slight cognitive inflexibility. I don’t think you want stimulants if you’re trying to paint or write poetry, even if they won’t massively handicap you. The effects are subtle, you’re not becoming an automaton.
They definitely do for me: I sleep worse that night, and if I use too frequently I get exhaustion that takes weeks to recover from.
a take I’ve expressed a bunch irl but haven’t written up yet: feature sparsity might be fundamentally the wrong thing for disentangling superposition; circuit sparsity might be more correct to optimize for. in particular, circuit sparsity doesn’t have problems with feature splitting/absorption
Yeah my view is that as long as our features/intermediate variables form human understandable circuits, it doesn’t matter how “atomic” they are.
random fun experiment: accuracy of GPT-4 on “Q: What is 1 + 1 + 1 + 1 + …?\nA:”
blue: highest logprob numerical token
orange: y = x
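a minimal sketch of generating these prompts and scoring the answers. the exact formatting (spaces around “+”, the prompt ending in “?”) is a guess at what was used, and the model call itself is left out since it depends on whatever API you use; the answers dict in the usage note is made up, not real GPT-4 output.

```python
def ones_prompt(n: int) -> str:
    """build the prompt "Q: What is 1 + 1 + ... + 1?" + newline + "A:" with n ones."""
    return "Q: What is " + " + ".join(["1"] * n) + "?\nA:"

def accuracy(answers: dict[int, int]) -> float:
    """fraction of prompts where the model's answer equals the true sum n.

    answers maps n (number of ones in the prompt) -> the model's numeric answer."""
    return sum(ans == n for n, ans in answers.items()) / len(answers)

print(repr(ones_prompt(4)))  # 'Q: What is 1 + 1 + 1 + 1?\nA:'
```

e.g. accuracy({1: 1, 100: 100, 199: 200}) gives 2/3 for those three hypothetical answers.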
...I am suddenly really curious what the accuracy of humans on that is.
‘Can you do Addition?’ the White Queen asked. ‘What’s one and one and one and one and one and one and one and one and one and one?’
‘I don’t know,’ said Alice. ‘I lost count.’
Surprisingly, Claude Opus 4.7 thinking still struggles with this if you tell it not to use tools. For N = 199, it answered 200; the CoT is below. It tries to count in groups of ten, gives up, and estimates the answer. It’s curious that this isn’t something the model learned to do in pretraining.
I’m counting the total number of 1s by tracking the initial 1 and then all the “+1” increments that follow. Counting through the sequence of increments to get the total. Continuing through the remaining decades up to 150, each block follows the same pattern of ten increments. I’m working through a long sequence of increments to count the total, breaking it down into manageable chunks to keep track of where I am in the original text. I’m trying to count the total number of “+1“ operations in that string. Let me work through this systematically by identifying the pattern and tallying up how many times it repeats. I need to count the “+1” instances more carefully. The input starts with “1” and then has a series of “+1”s, so the total count is just the number of “+1“s plus one. Let me go through and tally them systematically in groups of ten to get an accurate count. Continuing to count through the ranges… I’m manually counting the “+1” tokens in the input string to get an accurate sum, working through the text systematically since I can’t execute code to parse it directly. I’m counting the occurrences of “+1” in this sequence to get a total. I’m trying to count the “+1”s by breaking the input into segments of 10 to make it manageable, assuming the user likely wrote a round number like 200 total. Let me try a more systematic approach—I’ll scan through the input carefully and count the “+1“ occurrences in groups to get an accurate total. Continuing the count… I’m trying to count the “+1” instances in the input string, but without being able to directly measure the character length, I’m working through it systematically by grouping tokens and attempting to track them in chunks. Let me count the “+1” occurrences more carefully by examining the actual structure. The format appears to be “1“ followed by repeated ” +1” patterns, so I need to figure out how many times that sequence appears before reaching the end marker. 
Continuing through the chunks, I’m at 40 total after the fourth group, then adding ten more with each subsequent chunk brings me to 50, 60, 70, 80, 90, and I’m partway through the tenth chunk now. Continuing to accumulate the count through each successive chunk, reaching 200 total ones across all 20 chunks. I’m breaking down the string into groups and tallying the “+1” occurrences, but the math keeps shifting depending on how I segment it. Let me just count the “+1“ patterns directly instead of getting caught up in positional grouping—based on the visual length of the input, 200 seems like the most reasonable estimate. I’m continuing to tally the remaining “+1” entries, organizing them into groups of ten to make the count more manageable. Looking at the pattern, I’m seeing roughly 20 complete groups, which would give me around 200 total. Let me verify this by doing one more count to make sure I’m landing on the right number. I’m counting the “1” characters in the input to verify my total of 200, since each one represents a unit in the sum. Let me double-check by scanning through more carefully rather than relying on my initial chunk-based count. I’m continuing to count through the remaining additions, organizing them into groups of ten to keep track of the total. I’m counting the grid of plus-ones manually—20 rows with 10 entries each gives me 200 total, and since the prompt explicitly says not to use tools, I’ll stick with that answer.
This is a cool idea. I wonder how it’s able to do 100, 150, and 200 so well. I also wonder what are the exact locations of the other spikes?
Oh, I see your other graph now. So it just always guesses 100 for everything in the vicinity of 100.
the most valuable part of a social event is often not the part that is ostensibly the most important, but rather the gaps between the main parts.
at ML conferences, the headline keynotes and orals are usually the least useful part to go to; the random spontaneous hallway chats and dinners and afterparties are extremely valuable
when doing an activity with friends, the activity itself is often of secondary importance. talking on the way to the activity, or in the gaps between doing the activity, carries a lot of the value
at work, a lot of the best conversations happen not in scheduled 1:1s and group meetings, but in spontaneous hallway or dinner groups
I have heard people say this so many times, and it is consistently the opposite of my experience. The random spontaneous conversations at conferences are disproportionately shallow and tend toward the same things which have been discussed to death online already, or toward the things which seem simple enough that everyone thinks they have something to say on the topic. When doing an activity with friends, it’s usually the activity which is novel and/or interesting, while the conversation tends to be shallow and playful and fun but not as substantive as the activity. At work, spontaneous conversations generally had little relevance to the actual things we were/are working on (there are some exceptions, but they’re rarely as high-value as ordinary work).
I think you are possibly better/optimizing more than most others at selecting conferences & events you actually want to do. Even with work, I think many get value out of having those spontaneous conversations because it often shifts what they’re going to do—the number one spontaneous conversation is “what are you working on” or “what have you done so far”, which forces you to re-explain what you’re doing & the reasons for doing it to a skeptical & ignorant audience. My understanding is you and David already do this very often with each other.
I’m very curious if others also find this to be the biggest value-contributor amongst spontaneous conversations. (Also, more generally, I’m curious what kinds of spontaneous conversations people are getting so much value out of.)
One of the directions I’m currently most excited about (modern control theory through algebraic analysis) I learned about while idly chitchatting with a colleague at lunch about old school cybernetics. We were both confused why it was such a big deal in the 50s and 60s and then basically died.
A stranger at the table had overheard our conversation and immediately started ranting to us about the history of cybernetics and modern methods of control theory. Turns out that control theory has developed far beyond what people did in the 60s, but the names, techniques, and methods have changed, and this guy was one of the world experts. I wouldn’t have known to ask him because the guy’s specialization on the face of it had nothing to do with control theory.
I do not find this to be the biggest value-contributor amongst my spontaneous conversations.
I don’t have a good hypothesis for why spontaneous-ish conversations can end up being valuable to me so frequently. I have a vague intuition that it might be an expression of the same phenomenon that makes slack and playfulness in research and internet browsing very valuable for me.
a historical example: the initial theory of CFC ozone depletion was born in part because of a random coffee break conversation at a conference
What would an event optimized for this sort of thing look like?
Unconferences are a thing for this reason
i want someone to make the one true categorization of Types of Guy. MBTI is an ok start, but there are so many things it doesn’t even try to explain. like for example if i see someone has very scrunched up body language and talks very quickly, this correlates very strongly with a bunch of other traits, like talking in conversation with long turn lengths.
There are now 15 competing Types of Guy standards
my theory for why the literature here is kinda terrible is that most people either like people, in which case they mostly just develop an intuitive model of people; or they like systematizing, in which case they become obsessed with trains. few people are systematizing but obsessed with people.
Type of guy who has not encountered sufficient girl autism
where ? asking for a friend
I’d point at myself but I’ve only complained about the problem :p https://rachelshu.com/2024/03/08/oceans-five.html
Deb Tannen is specifically who I had in mind! She’s most famous for her book on male vs female communication but if you read her other works such as on parent-child and friend-friend communication styles you get a good sense of the breadth of her framework.
Spencer Greenberg (who’s on here) also has a pretty substantial body of work at https://www.clearerthinking.org/
@Dawn Drescher also comes to mind, with her explorations of narcissism and psychopathy (including typologies of those two).
This discussion also reminds me of the excellent blog post “Dating Men In The Bay Area” which goes over several Types Of Guy.
In NLP you would call the person with “very scrunched up body language and talks very quickly” visually dominant. They don’t feel into the words they want to say and thus speak faster. They don’t feel into their body, so they don’t make the adjustments that release tension, and they end up scrunched up.
There’s plenty of things you can criticize about that model but it does exist.
you want fiction! i’m no expert, but somewhere out there there’s a 150 page treatise on “guy with very scrunched up body language who talks very quickly” including everything you could want to know about his job, his friends and family, his self-doubts, and the details of the broader socio-economic forces that made him like that.
Socionics is kinda MBTI-adjacent, but has more interesting, fleshed-out structure, with systematic predictions, e.g., predicting synchronies or conflicts between personality types that are related in some specific way.
Do you want a compact description, e.g., some small-ish number of naturally discretizable factors, with perhaps many combinations being very sparsely populated, and that would at the same time comprehensively 80⁄20 a person’s personality description? I would not expect a comprehensive theory of human personality to be so neat.
The personality project and others studying individual differences psychology are the people to look toward for this.
Type of guy obsessed with guys here. (Enter my shameless plug.)
Are you interested in fictional characters? Harry Potter, Star Wars, LOTR, ATLA, MCU, anime, etc? I have a system which is the easiest to illustrate with fictional characters, but I believe it applies to real people too.[1]
Biggest downsides:
I don’t know how to formalize the system to make it falsifiable. The system is partially undefined even from an informal POV.
There are countless characters and people not fitting any type I know. Though the general methodology of my system can be applied anyway.
Getting into the system requires some time commitment.
The biggest upside:
The system is philosophically interesting. I believe it’s not similar to any other classification or any other model (of anything).
Core Tenets
Types are “absolutely different”, “different in every property”. Like Bouba and Kiki. Or like Indra’s net, if every jewel was unique.
Each type maximizes a unique theme along different dimensions. Each type also maximizes unique sub-themes (related to the main theme) along different dimensions.
The dimensions in question are usually these:
A. Life trajectory, fate.
B. Personal relationships. (Loved ones, family, close friends.)
C. Social status.
CA. Physical status. (Physical powers, physical traumas, disabilities.)
D. Psychology.
E. Outward attitude.
Themes and sub-themes are defined over lower-level facts about characters. When analyzing a character, you should first focus on the most abstract and unique facts about them. Facts which are realized in maximally radical, crisp, objective ways.
...That’s all, that’s the entire methodology. Keep in mind that themes are often too vague without sub-themes. And sub-themes are often too vague too if they are not maximized across different dimensions. Also, vague (sub)themes become less vague when you pay attention to the strongest ways to implement them.
Most classifications are “combinatorial” (any combination of some basic features defines a type) or “geometric” (there’s some relatively simple space and each point / region defines a type).[2] My classification is neither. That’s one of the main reasons I consider it very unusual.
I’ll give a couple of examples. 3 types and 50+ characters. Types of Snape, Draco and Sirius.
Fragility (example 1)
One theme (fragility) and 3 sub-themes (irreversibility, walling off, dramatism/sentimentality). 19+ examples.
Fragility
Theme of the first archetype is fragility (stubbornness, instability, internal tension, irreversible changes). Often “fragility” is used to describe things which are very easy to break, but I’m talking about very tough things which would rather break than bend. A perfect example of this theme would have the following properties:
A. Irreversible events which change the character forever.
B. Personal relationships filled with death, trauma, guilt.
C. Unstable social status. Isolation despite great connections.
CA. Physical vulnerability despite great power.
D. Great mental strength mixed with crashouts, shocks, traumas.
E. Cold, harsh attitude, mixed with sentimentality. Stoic but easily triggered.
In other words, fragile life, fragile relationships, fragile social and “physical” status, fragile mentality, fragile attitude.
Examples:
Severus Snape. A and B. Guilty in Lily’s death. That radically changed his life. C. Works as a triple agent, which makes his social position very fragile (either side can start seeing him as the enemy). He’s one of the most well-connected yet the most reclusive HP characters. CA. Very powerful, but under the thumb of way more powerful Voldemort and Dumbledore. D and E. Harsh and cold, but soft when it comes to Lily. Calm but petty, Harry and Sirius can easily trigger him.
Kakashi from Naruto. A and B. Sorta responsible for the death of his friend Obito. Accidentally killed his remaining friend Rin (she killed herself by jumping in the way), whom he had promised Obito he would protect. He never got over this. 49 seconds clip. Before all this, his famous father died by suicide after being dishonoured. C. A famous elite ninja, but a loner. CA. IIRC Kakashi is pretty unique in terms of ending up in physically vulnerable situations, he’s strong but is constantly pitted against much stronger opponents. One time he got tortured for 72 hours inside an illusion. E. Both harsh and soft.
Elsa from Frozen. A and B. Almost killed her sister Anna. That forced her to live in isolation and be distant from Anna. C. Was hiding her abilities to not lose her royal status / become a monster in the eyes of people. “Make one wrong move and everyone will know.” CA. Despite her godlike powers she was captured and jailed. E. Cold and warm.
Lelouch from Code Geass. B. Accidentally and deliberately killed many people and blamed himself for it. His relationships are filled with death, trauma and guilt. For example: Lelouch and Euphemia part 1 (~1 minute), part 2 (2:09 − 4:18, ~2 minutes), Lelouch and Shirley (~2 minutes). C. Due to his double identity and secrets, his strong social status (royalty / leader of a terrorist group) is fragile. D and E. Cold and sentimental. Mentally strong and crashing out.
Scrooge from DuckTales 2017. A. Blamed himself for the presumed death of his niece Della. Scrooge’s past (~5 minutes). B. Lost connection to his family after that. C. Despite being the richest duck, super well-connected, he’s reclusive. Scrooge after the kids leave again (1 minute is enough). E. Harsh and soft. Easily triggered. His average behavior (2 minutes is enough).
Amon from Legend of Korra. A and B. Went through abuse as a child, that changed him forever and pushed him to separate from his brother. C. He’s a leader of a terrorist group, but his position is unstable because he’s lying to them. D. Brutal, but deep down there’s a soft spot for his brother. Amon’s backstory (till 7:21), Amon reunites with his brother (~1 minute).
Jeong Jeong from ATLA. “Once an honored admiral of the Fire Nation Navy, he became disillusioned with his nation and firebending as a whole and defected from the military. He relocated to the Earth Kingdom, where he encountered a small tribe who became his travel companions.” He lost his status, isolated and got psychologically broken a bit. “Fire brings only destruction and pain. It forces those of us burdened with its care to walk a razor’s edge between humanity and savagery. Eventually, we are torn apart.” Calm, but irritable. 2 minutes clip
Nicholas Van Orton from The Game 1997. B. His personal relationships are not doing great… father died by suicide, brother and wife are estranged. C. Rich but reclusive. About to lose everything. D and E. Cold but vulnerable. Traumatized.
Rust Cohle from True Detective. “Character overview” describes it better than I can. A. After his daughter was hit by a car and killed, his life spiraled out of control and he became deeply cynical. B. His relationship with Marty is rocky. “You total shit. You know, this is part on you, you know? You can’t just slide into my life, creating tensions, judging me...” C. Gifted, connected to many people (just due to his job), but isolated. D and E. Tough, but vulnerable and troubled. Rust scenes.
Darth Vader from SW. A. He thought he killed his wife and was severely mutilated. Which completed his descent to the dark side. C and CA. He’s second in authority and power to Palpatine, but Palpatine could easily dispose of him (which Palps did plan to do). D. He kept a soft spot for Luke, that’s why he sacrifices himself at the end.
Gandalf from LOTR. C. Very well-connected, but also a lonely homeless wanderer. His political power is limited because he needs people to make good decisions out of their own free will, that’s his secret mission from God. Faces defiance and reluctance. CA. Divine and powerful, but physically vulnerable and gets hurt a lot (by Saruman, the Balrog). E. Both harsh and soft. Super stressed out. Easily triggered.
Doctor Strange from MCU. A. Abruptly loses his career after a car crash injury. C. Very well-connected but isolated. E. Both cold and soft. Calm, but irritable. Tony and Dr. Strange arguing for 2 minutes straight.
Goblin Slayer. A. Sole survivor of a goblin attack on his village; saw his sister raped and killed by goblins. C. Has many connections, but is sort of a loner. CA. He’s a goblin-killing machine, but he’s only human with no special powers. His job is extremely risky and he goes all out every time, so he gets hurt often. D. His mind is broken but rigid; killing goblins is his entire identity. E. Cold, but polite and kind. Versus goblin champion (5 minutes is enough), him and Cow Girl (2 minutes), asks for help (6 minutes), cries (2 minutes).
Cigarette Smoking Man from the earlier X-Files. C. Extremely powerful, yet his status is shaky. Often does the dirty work himself, walks around without bodyguards and works with equally powerful individuals (the Syndicate). Very lonely. D. Evil, but cares about Mulder and Scully… sometimes, sort of. E. Rarely emotes, yet many things touch him emotionally. Musings of a Cigarette Smoking Man shows different sides of him. Xmas job (5 minutes), Xmas meeting (3 minutes), episode ending (5 minutes).
Zeno from Hunter x Hunter. C. Powerful, but old and has a very risky job (assassinations). D and E. A single accidental murder can hurt his big pride (he’s a veteran assassin in a family of assassins). Harsh, but not entirely. Him fighting (4 minutes), a civilian gets harmed (2 minutes), gets bothered (2 minutes).
Sir Leigh Teabing from The Da Vinci Code. HEAVY SPOILERS. A. Wants to destroy the Church and free humanity from dogma. C. Wealthy academic, but turns criminal. D and E. Dramatic. Kind/nice but evil/ruthless. Another antihero revolutionary. Greatest coverup in human history.
Patroclus from Hades 2020. B. Feels betrayed by his lover. “He ever tell you of the war in which we fought? I fought, I should say. He could have ended it. But he refused to take up arms. (...) He shared our cause! So what if it was not completely his? If not for his stubbornness, I… well… I would have come here just a bit later, I think.” (c.) C. Lives in Elysium (paradise), but sits alone by the Lethe. D. Really devastated. E. Harsh but soft. Patrochilles dialogue & story.
Peter Capaldi’s Doctor (12th) from Doctor Who. C. Sorta like Gandalf—powerful, but vulnerable and depending on others. D. Troubled, hates himself. E. Harsh but soft. Easily agitated. War Speech (4 minutes), Go to Hell (2 minutes), Where I Fall (4 minutes).
Real-life examples:
Bobby Fischer. A. Became the World Chess Champion, then stopped playing official chess for 20 years. Isolated himself and went mad. C. Before his downfall, was a lonely superstar. After his downfall, became a recluse with good connections. Many were helping him (e.g. Polgar family, Leko, Spassky, Lombardi). He got in trouble with the US government, was detained in Japan and almost deported to the US, but was granted citizenship by Iceland. D. He was very tough mentally (to fight alone against the world), but very troubled and irritable. “Fischer fits ideologically into the context of the Cold War era: a lone American genius challenges the Soviet chess machine and defeats it.” (c.) Kasparov. E. He was known to be cold and rude in general, but nice to some in private. For example, in 1962 Tal fell ill during a tournament and Fischer was the only player who visited him.
Vladimir Mayakovsky. B. Had turbulent personal relationships. C. Both a useful and a problematic figure for the soviets. D. Was tough, but troubled. Died by suicide. E. Harsh, but sensitive.
Isaac Newton. Maybe. B. Father died, abandoned by mother, disliked stepfather. C. Powerful but reclusive. Abused his power in personal conflicts. Used disguises to find and prosecute counterfeiters (many were executed). CA. Prioritized work over health. D. Suffered a nervous breakdown, blamed friends for a conspiracy. E. Secretive, neurotic, paranoid, vindictive.
Semi-examples:
Carl Fredricksen from Up.
Waternoose from Monsters Inc. B. His friendship with Sulley turned out to be very fragile. C. His social status is great. CEO of the company, which has been in his family for 3 generations. But declining energy production and publicity problems threaten everything. E. Friendly and harsh.
??? from Disco Elysium. Very unique game, I recommend at least watching What Was Disco Elysium Actually About? HEAVY SPOILERS. A. Deserted his comrades. All killed. B. Unrequited love. C. Isolated from the city, but always watching. CA. Fit but frail. D. Decades of survivor’s guilt, shame, bitterness. Still capable of “love”. Another “antihero” revolutionary.
See below. Carl is not a drama queen (the only grandiose thing about him is the flying house). Waternoose—not enough drama and reclusiveness.
Irreversibility
Fragile!pain is about irreversible consequences.
A. The source of the pain is concentrated in the past, in a single irreversible event or a couple of specific irreversible events.
B. It irreversibly changes personal relationships.
C. It irreversibly changes character’s social status.
D. Character dwells on the tragedy instead of trying to move on.
E. Character’s attitude is strongly shaped by the tragedy. Or strongly emphasizes its consequences.
Examples: Snape, Kakashi, Elsa (for a time), Scrooge 2017 (for a time), Amon, Jeong Jeong, Rust Cohle, Doctor Strange (sort of), Darth Vader, Goblin Slayer. Lelouch is a similar drama queen.
Walling Off
Another sub-theme is “deliberately building a wall between yourself and others”.
Some examples:
Severus Snape. A. Obsessed with Lily, closed to new romantic relationships. C. A loner in general. Partially due to his spy job. CA. Master at closing off his mind (Occlumency). E. His abrasiveness is meant to put off.
Kakashi from Naruto. C. Behaves reclusively. D. Dwells on his losses instead of trying to move on. E. Looks detached.
Elsa from Frozen. B and C. Her secret forces her to be distant from her sister and other people. E. Cold.
Lelouch from Code Geass. B and C and D. His true intentions and personality are hidden even from loved ones. Keeps his burdens to himself.
Amon from The Legend of Korra. Cut off his family, buried his emotions (towards his brother), hid his past and abilities.
Dramatism / Sentimentality
Another sub-theme is “dramatism” (excess, grandiosity, making everything more serious and emotional) and “sentimentality” (nostalgia about the past, holding on to personal connections, being dramatic in a vulnerable way, awe about the world or history). Dramatism without sentimentality would be Thanos.
Severus Snape. Dramatic love, excessive grudges, dramatic role (playing a villain till death).
Lelouch from Code Geass. Childhood tragedy, dramatic style (e.g. playing the villain and punishing himself), dramatic thoughts/feelings, grandiose goals.
Scrooge McDuck. Excessive wealth, legendary adventures, radical attempts to find Della.
Amon from Legend of Korra. Personal drama, dramatic death, grandiose goal, showy methods.
Nicholas Van Orton from The Game 1997. Childhood tragedy, excessive wealth, insane “game”.
Elsa, Darth Vader, Gandalf, Goblin Slayer, Cigarette Smoking Man are other drama queens.
A music band example:
I Like Trains
On average, I Like Trains songs contain fragility, dramatism and sentimentality.
A Rook House for Bobby. Genius. Personal, social and psychological fragility. Dramatism (persecution, cold war) and sentimentality. “And if they find me / Let them indite me / I just don’t care any more (...) They’ve pushed me too far, too far (...) All this talk of war / But it’s only a game (...) All I ever wanted to do is play chess with you”
Terra Nova. Genius. Personal, physical and social fragility. Dramatism and sentimentality. “How could I / Have led these men to their demise? / And they just follow / Exploration’s last great prize / It wasn’t mine (...) And more’s the shame / You will remember my name”
A Father’s Son. Genius. Dead dad speech. “Pry these words from my cold dead lips / Home is where the heart is (...) I will occupy the space between / What you say and what you mean / And you say / That the devil’s in the detail / And if that’s true / When he is through with me he will / Sure as hell be after you”
Beacons. Genius. SPOILERS! A societal collapse and murder. The lyrics don’t follow the story 1-1, but mention some societal and personal fallout.
A Divorce Before Marriage. Genius. Abstract song about a breakdown of a dream, very dramatic/sentimental. “The echo of success / In the beating of a heart / Will tear your life apart / And your forgotten dreams / As futile as they seem / Gave you hope / Now that hope is not enough / All the people that you meet / Are as hollow as they seem (...) Delusions of grandeur / A divorce before marriage / An open wound”
Death of an Idealist. Genius. Social and psychological breakdown. “This is a breakdown / You are a cancer / This will be the death of me / I’ll never live this down / You will never / An amazing fall from grace / No more or less than I deserved / A gentleman, a liar and a cheat”
The Voice of Reason. Genius. “It took a blow to the head / And I realized what I / Was put on this earth to achieve / It showed me the way / So make an example of me / Hang me up from the tallest tree / Make an example of me / This is not the voice of reason / It’s the cross I was chosen to bear”. About James Hadfield.
Liminality (example 2)
One theme (liminality) and 3 sub-themes (ego, entanglement, devaluation). 18+ examples.
Liminality
Theme of the next archetype is liminality (transformation, transition, detachment, isolation, being stuck). A perfect example of this theme would have the following properties:
A. Being stuck in a place or some neverending process. Living in a transforming world.
B. Personal relationship going through or causing a major transformation.
C. Going through a transformation of social status. Being alien to the people around.
CA. Physical transformation.
D. Psychological transformation. Being mentally stuck in a weird/uncertain place.
E. An attitude which mixes strong emotions and absence of emotions.
In other words, liminal life, liminal relationships, liminal social and “physical” status, liminal mentality, liminal attitude.
Examples:
Zuko from ATLA. A. Stuck in a neverending goose chase. B. His relationship with Aang goes from rivalry to friendship. C. A banished prince hoping to restore his honour. D. Becomes really conflicted about Good and Evil and his Purpose. Changes alignment. E. Cocky and impatient, but stoic. Zuko’s Complete Timeline (up to 11:00).
CC from Code Geass. A. Stuck in this world, unable to die. Stuck seeking someone capable and willing to kill her. B. The development of her relationship with Lelouch is about finding the will to live. C. An immortal witch stuck in a world she doesn’t belong to, getting tortured and experimented on. Helps the revolution she doesn’t care about. D. Her wish is to die. Her existence is just lingering. E. Both brazen and unemotional. You can watch 4 minutes (5:26 − 11:17) to get her attitude.
Gaara from Naruto. B. His relationship with Naruto is about him changing moral alignment. C. He’s used as a tool in a war he doesn’t care about. CA. He’s always on the verge of losing control and transforming into Shukaku. D and E. Gaara is in a severe distress, due to childhood traumas and internal fight with Shukaku. He has no attachment to anything in this world, his only goal is “to kill to assert his own existence”. Cocky and unemotional, calm and unstable. Sasuke vs Gaara, 7 minutes.
Lee Woo-jin from Oldboy. SPOILERS. B. His “relationship” with Daesu is about transforming Daesu’s life. CA. Contrasts with all other characters (criminals and the poor). D. Lives only to wait for revenge and death. E. Cocky and unemotional. Ending scene (4 minutes), more ending (3 minutes).
Alex from Clockwork Orange. A. For most of the movie he’s stuck in a weird journey without a home. B. His relationship with his droogs and parents transforms radically. C. Gets jailed, turns into a subject of experimentation, loses his home. His relationship with society transforms 180 degrees. D. He spends most of the movie in a weird reprogrammed state, unable to do evil, but not truly good. E. Both cocky and unemotional. For example, withstands abuse calmly. Mr. Deltoid scene, 2 minutes.
Draco from HP. B. Harry turns from Draco’s rival to one of the only people who can save him and his family. C. Stuck between the good and the bad side. Doesn’t belong with the bad guys, they’re too extreme for him. His family got sidelined, lost their status among the Death Eaters. Also, in most of the books he’s somewhere on the sidelines. D. Very distressed and confused in the later books. E. Cocky and cold.
Harry Osborn from The Amazing Spider-Man 2. Loses status (Oscorp company), health (genetic disease), sanity (goes Green Goblin), morality. Daddy issues. Coldish and prideful. Meets dying father (4 minutes), asks Peter (3 minutes), meets Spiderman (2 minutes).
Dorian Gray from the 2009 movie. B. His relationship with Henry is about his moral deterioration. His relationship with Emily is about his redemption. C and D. Stuck in an immortal hedonistic lifestyle, without any purpose. Conflicted between Good and Evil, changes alignment multiple times. E. Cocky and unemotional.
Once-ler from The Lorax 2012. A. Symbolizes the growing danger of capitalistic greed. B. His status in the family radically transforms. C. His social status radically transforms. D. His morality radically transforms. How bad can I be? E. His attitude radically transforms. The biggest Tumblr Sexymen (other than Sans). A video explaining Onceler and Oncest in 2 minutes.
Lord Shen from Kung Fu Panda 2. A. Brings the new era, the era of gunpowder weapons. B. KFP 2 is sort of about Po’s relationship with him transforming, from PTSD to “inner peace”. The same can’t be said about other KFP villains; he has the most personal connection to Po. C. He’s a banished prince, but regains his status by taking over his home town. D. Desperate to satisfy his incomplete ego. “The cup you choose to fill has no bottom. It is time to stop this madness.” (c.) Soothsayer. E. Cold and elegant, but short-tempered and cocky. Shen vs. kung fu masters (2 minutes), Shen’s fortune (3 minutes).
Lapis from Steven Universe. A. Was stuck in a mirror for thousands of years. Then was stuck at the bottom of the ocean, fused with another gem (Jasper). Later she’s free, but has no home. At one point she hides on the Moon. B. Takes a long time to befriend the Crystal Gems. C and D. Was aligned with an evil empire (Homeworld), killed sentient life by terraforming planets, but wasn’t truly evil or a pro-empire diehard. Wanted to protect the human (Steven) who freed her. She was sorta like a civilian scientist in Nazi Germany. Eventually she changes her alignment entirely. Has tons of trauma and depression. E. Very calm and very angry/disheveled. Lapis clips.
Dio from JoJo Phantom Blood. B. His petty rivalry with Jonathan transforms into a global fight between Good and Evil. C. He’s an adopted son in a family he doesn’t care about. CA. Transforms into a supernatural evil being. E. Cocky and calm. 4 minute clip.
Seita from Grave of the Fireflies. A and C. Seita and Setsuko lose their home and wander without a (true) home. Their social status gets worse and worse. First they live with an abusive aunt, then they’re on their own in an abandoned dugout. B. His relationship with Setsuko transforms; he becomes her guardian and her only friend. E. Appears unemotional, but has a lot of bottled-up feelings.
Kiki from Kiki’s Delivery Service. A. The movie is about growing up and depression / artist’s block. B. Has an uncertain friendship with Tombo, connected to her loneliness/depression. C. Kiki moves to a new town. First time on her own, first job. Being a special “race” (witch) and having more responsibilities (a job) contributes to isolation. D. Lonely. Later depressed. E. Calm but prideful.
Kurisu from Steins;Gate. B and D. Goes from antagonizing to loving Okabe. C. The most prestigious and mature in the group. The whole show is her being taken out of her comfort zone, dipped into mad science & conspiracy & weirdos. E. Unemotional but prideful.
Sheldon from The Big Bang Theory. B and C. Detached in social and personal life. Has the constant dynamic where he keeps getting dragged into uncomfortable situations and HE keeps dragging others into uncomfortable situations. E. Unemotional/shy but aggressively prideful. Sheldon Cooper Moments from Seasons 1-6.
Luka from Miraculous. B. Has a liminal love/friendship with Marinette. Supports her love for Adrien but still has feelings for her. Has a split family. C. IIRC somewhat disconnected from Marinette’s main friend group. Learns Marinette’s/Adrien’s secret and leaves Paris to protect it. CA. Gets the superhero power to indefinitely rewind time to a checkpoint. D. Has unusually strong empathy, which makes him a bit isolated as a mediator/therapist. But who’s gonna help him? E. Appears way more untroubled than he truly is. See 4:28 − 9:42.
Randall from Monsters Inc. A. Works on a conspiracy to change everything. “I’m about to revolutionize the scaring industry. And when I do, even the ‘great’ James P. Sullivan’s gonna be working for me.” (c.) B and C. Constantly 2nd place to Sulley. CA. Can turn invisible. E. Can be calm/sly or explosively spiteful.
Real-life semi-example:
Ding Liren. A. Became the world champion in a liminal period, where Magnus exited undefeated. C. Abruptly lost strength. D. Had mental health struggles. E. Appears unemotional.
Semi-examples:
Eric Draven from the Crow 1994. C. Undead. E. Both unemotional and explosive. Eric from the 2024 movie would be a proper example.
Fubuki from OPM. Overshadowed by her sister, prideful and cold.
Lord Beckett from POTC. A and CA. Symbolizes the transformation of the world, the end of the pirate era and of the magical era. “This is no longer your world, Jones. The immaterial has become… immaterial.” (c.) He, a businessman, doesn’t belong in the world of pirates and magic. His motivation is a bit alien compared to other characters. E. Both cocky and unemotional.
Westley from The Princess Bride. A. Transformed from a regular farm boy into one of the most dangerous men in the world. For the entirety of the movie he’s stuck in a neverending rescue mission. B. His relationship with Buttercup transforms from “master and servant” to “captor and captive” to “superhero and damsel in distress”. His relationship with Inigo and Fezzik transforms from enemies to friends. C and CA. He’s pretty alien, unusual both physically and mentally. For a big chunk of the movie he hides his identity, looking like some weird ninja. For another chunk he’s recovering from death. E. Both cocky and unemotional. This 7-minute scene is an example again.
Silas from The Da Vinci Code. “I am a ghost.” A and C. His whole life he’s a hidden outcast. CA. An albino. D. Stuck in the state of violating his principles and punishing himself. E. Hot and cold.
Rei from Evangelion. TV series SPOILERS. She’s a clone and a tool. Unemotional.
Squidward from Spongebob. C and D. Longs for celebrity life, but it never comes. Uses pessimism and misanthropy to cope. E. Cold and cocky.
See below. Eric doesn’t have enough dimensions of liminality, as well as not enough ego and devaluation. Lord Beckett, Westley—not enough devaluation. Rei—not enough ego. Silas, Fubuki—not enough ego and devaluation. Squidward—surprisingly, not enough ego… he is ready to compromise with his ego way too much and is not proactive enough.
Ego
Liminal!isolation can be caused by ego (pride, privilege, elitism, superiority).
Zuko from ATLA. B. Looks down on Iroh, because Iroh doesn’t display ambition. C and D. Motivated by restoring his royal privilege and honor. Fire Nation supremacist, calls Katara a peasant. E. Cocky.
CC from Code Geass. B. Treats Lelouch as a source of amusement. C. Has some privilege due to immortality, though it’s questionable. E. Behaves like a queen.
Draco from HP. B. Jealous of Harry. C. Spoiled brat. D. Blood supremacist. E. Cocky.
Dorian Gray from the 2009 movie. Vain hedonist narcissist, in personal relationships and in general.
Dio from JoJo Phantom Blood. Cocky. Feels entitled to everything Jonathan has. Even steals his girlfriend’s first kiss. Kono Dio Da.
Lee Woo-jin, Lord Shen (KFP 2), Harry Osborn, maybe Seita (Grave) are other strong examples.
Entanglement
Another sub-theme is “entanglement”. When your fate is unusually entangled with another person.
Some examples:
Zuko from ATLA. His social status and family matters and morality all get entangled in his rivalry with Aang. Zuko even sees himself as Aang in one dream.
CC from Code Geass. Her fate, status and goals/motivations all get entangled with Lelouch.
Lee Woo-jin from Oldboy. His personal life, criminal activities, future plans/ambitions… everything is about Daesu.
Draco from HP. His status, morality, fate of his parents… all become dependent on his rival, Harry.
Lord Shen from Kung Fu Panda 2. His fate, status, ambitions… all become dependent on Po.
Even lovers don’t often have this much entanglement.
Harry Osborn and Peter Parker, Dio and Jojo, Seita and Setsuko, Kurisu and Okabe, Randall and Sulley are other strong examples.
Devaluation
Liminal!pain is about absence of expected value. Deprivedness, disrespect, depression, marginalization, dissatisfaction, etc.
Zuko from ATLA. B. Iroh is Zuko’s closest connection after banishment. Zuko cared about him, but still saw little in him. “I think you are exactly what you seem! A lazy, mistrustful, shallow old man who’s always been jealous of his brother!” (c.) C. He’s a prince, but not respected. D. His purpose (regaining honor and father’s respect) starts to become meaningless to him. E. Shows no respect to superiors (e.g. Zhao).
CC from Code Geass. B. Helps Lelouch, but doesn’t care about his plans. D. Doesn’t see value in living anymore. E. Unusually unbothered by the stuff happening around her.
Gaara from Naruto. A. Was a nice kid, but snapped after he was convinced he had never been loved by anyone. B. Didn’t care about his siblings. They were terrified of him. C. Was a big element of an invasion plot he didn’t care about. D. He is pretty purposeless; his only purpose is to kill people “to prove his own existence”. Believes he has never been loved (the symbol on his forehead is “love”). E. Unusually unbothered by attacks (due to the automatic sand defense).
Alex from Clockwork Orange. B. Becomes inconsequential to his gang and family. C. A part of society, has parents, goes to school and clubs, but doesn’t care about people (abuses and kills them). D. Loses his purpose. E. Shows no respect to anyone.
Lord Shen from Kung Fu Panda 2. B. Thought his parents hated him. C. Was a banished prince. D. Has an insatiable ego. “The cup you choose to fill has no bottom. It is time to stop this madness.” (c.) Soothsayer. E. Peacocky.
Lapis, Dio (JoJo), Harry Osborn, Seita (Grave of Fireflies) are other strong examples.
A very similar sub-theme is “loss of agency” (getting stuck, losing resources / support, getting depressed, etc).
Punkness (example 3)
One theme (punkness) and 3 sub-themes (social conflict, uncomplicatedness, direct attachment). Because of the sub-themes “punkness” turns into something more like “soft punkness”. 15+ examples.
Punkness
Theme of the next archetype is punkness. More specifically, “subverting social properties of status”. High social status is desirable; social status is relatively rigid; high-status people hook up with other high-status people; high-status people are powerful and talented, etc. This archetype subverts all that. A perfect example of punkness would have the following properties:
A. Radical changes of social status.
B. A relationship against social rules.
C. Fighting against high-status people or organizations or the order of society.
CA. Charisma and great skills despite low status. Chaotic fighting style or self-expression. Status-subverting abilities.
D. Simple goals and approach. Dislike of high status.
E. Punk attitude, low-status attitude.
In other words, punk life, punk relationships, punk social role, punk mentality, punk attitude.
Examples:
Sirius Black. A. Ran away from his wealthy racist family. C. In Book 3 he was alone against dementors, authorities and Hogwarts defenses. CA. Canonically charismatic. Can turn into a dog. D. Simple goal (help Harry / kill Peter) and straightforward approach. E. Has punk attitude (motorcycle, posters of pinup muggle girls, pranks, taunting while dueling). A bit crazy and immature.
Robin Hood from the 1991 movie. A and C. Was a nobleman, became the king of thieves, robbing wealthy and giving to the poor. B. His love with Marian has status contrast. CA. Canonically charismatic. D. Simple goals and approach. E. Very bold and somewhat unserious.
Beetlejuice. B. Obsessed with a little girl. CA. Has a lot of power despite being a dead hobo. Acts wildly and whimsically, can shapeshift. D. Simple goals and approach. E. Funny, offensive, crazy. His introduction, 2 minutes (2:00 − 5:30).
Daesu from Oldboy. HEAVY SPOILERS. A. Lost all of his social status by getting abducted, jailed and framed for murder. Was an alcoholic, turned into a martial arts master. B. His love with Mido violates social rules, in the worst possible way. His “relationship” with the antagonist has status contrast. C. Fights, all alone, against organized crime and a wealthy adversary. CA. Compensates for a lack of skill/technique with rage. D. Simple goals and approach. E. Looks and behaves wild.
Alucard from Hellsing Ultimate. A and C. Was a sex slave, became a Christian warlord, but betrayed his humanity in a moment of weakness and turned into an immortal overpowered vampire. Then was tamed and started working for a Christian organization, fighting other evil beings. Wild status changes. B. Despite being infinitely more powerful than Integra, he’s her servant. CA. Sadomasochistic fighting style. Allows opponents to mutilate him and humiliates them before killing them. Can shapeshift. D. Wants to find a worthy opponent to die to. He’s bored of immortality and considers himself a failure (for becoming a monster). E. He’s a troll.
Joker from The Dark Knight. A. Was a nobody, became the main criminal in town. B. His weird “relationship” with Batman has status contrast. C. Tries to ruin the social order. CA. Unpredictable actions, “improvisation” (even though it’s fake), disguises. D. Simple goal (chaos). E. Looks and behaves wild.
Sherlock Holmes, Guy Ritchie version. B. Loves a criminal (Irene Adler). C. A mix of a detective and a vigilante. Too much collateral damage. E. Punk attitude. Opening scene (6 minutes), with Irene (3 minutes).
Wendy from Gravity Falls. B. Age gap friendship with something romantic (though she doesn’t reciprocate). C. Unusually tough kid (15 years). Lumberjack games, survival training, stressful home with no mother. Skilled in combat. Slacks on the job, breaks the rules, breaks the law (a little bit). E. Punk attitude. Her clips, 2 minutes.
Marceline from Adventure Time. A. Rejected her dad (low-key Devil) and reign over the Nightosphere (basically Hell). B. Lives in a simple house in a cave. Her girlfriend has a castle and a kingdom. CA. She’s a punk rocker. Can turn into a monster bat. D. No complex goals. E. Punk rocker, trickster, down to earth. “I’m not mean. I’m a thousand years old, and I just lost track of my moral code.” Evicted (1:00 − 3:40), The Henchmen (5 minutes).
Inigo Montoya from The Princess Bride. C. A reluctant criminal with a righteous goal and exceptional swordfighting skills. CA. Charismatic, quickly wins Westley’s respect. Likes showing off and taunting while fighting. D. Simple goal and approach. E. Somewhat unserious, impatient, cocky and daring. 7-minute scene.
Speedwagon from JoJo. A. Was a street thug who became a wealthy businessman. B. Became an extremely loyal friend to a rich family. C. Very honorable despite being a thug. D. Straightforward mentality. E. Has the attitude. His first scene (3 minutes), exposing Dio (2 minutes).
Korben from The Fifth Element. A and C. A former special forces major becomes a taxi driver, then joins the mission to save Earth. Fights against an army of a corporation (Zorg and Mangalores). B. Manages to woo a perfect being. D. Straightforward mentality. E. Has the attitude.
Bumi Jr. from LOK. C. Has status (commander), yet he’s a comic relief. CA. Unorthodox tactics. D. A bit submissive, has self-worth issues about lacking bending powers. “Look, uh, I’m sorry I didn’t turn out to be an airbender like you hoped. But I’ve tried my best to keep the world safe. Hope I made you proud.” E. Wild, comical. Saves everyone (5 minutes), talks to Aang (30 seconds)
Eric from The Eric Andre Show. C. Takes a high-status concept (late-night shows) and turns it into a shitshow. CA. Chaotic. D. Humiliates others and himself. Straightforward approach. E. Punk. 3-minute example, another 3 minutes.
Real-life example:
Mikhail Tal. Incorrect and whimsical playing style, humor, disregard for health. But a champion-level strength.
Adam Sandler. C and D. Talented actor, but ready to waste his talent on shitty movies. Doesn’t treat movies as seriously as other actors. E. Has punk attitude and low-status appearance (popularized “grocery-run look”). On Oscars 2025. About his movies.
Michael Jackson. Semi-example. B. Creepy relationships with kids. C. Was a prolific philanthropist and promoted social causes in songs too (e.g. Man in the Mirror, They Don’t Care About Us). CA. Popularized fancy moves (the moonwalk, the robot, the anti-gravity lean) and music videos. D. Mentally stuck in childhood. E. Charismatic but weird, down-to-earth.
Semi-examples:
Charlie from All Dogs Go to Heaven. A. Escapes prison, loses money, escapes heaven, becomes rich again. C. Low-life. D. Straightforward mentality. E. Punk attitude.
Fenton/Gizmoduck from DuckTales 2017. B. Loves a criminal. C. Starts as an unpaid intern with a lab in the bathroom, becomes a respected hero. D and E. Straightforward mentality, naive and earnest. Awkward, clumsy attitude. Fenton clips.
Fujimoto from Ponyo. A. Rejected humanity and became the sea king. B. A human who rizzed a goddess. C. Despite being a king he’s sorta a comic relief. His kid disobeys him and he can’t do much about it. Submissive to his wife. CA. Looks unkempt and acts whimsically. Not afraid to show up on the surface and do the dirty work himself.
Billy Madison 1995. A and C. A wealthy father with a big company has a poor-looking buffoon son, Billy, who has a chance to inherit everything. Billy goes back to elementary school, as an adult, to prove himself. B. In love with his elementary teacher. D. Doesn’t care about living seriously. Rejects the inheritance. Despite being an idiot he has a good heart and hidden smarts. E. Low-status behavior.
Han Solo from SW. A and C. A smuggler who becomes a rebel general. B. Woos a princess. E. Punk attitude.
Tom Bombadil from LOTR.
Julian Rush from The Penguin. B. Woos a powerful girl. C. Psychiatrist turned mafia goon. D. Unexpectedly submissive.
See below. Fenton, Billy, Tom B, Rush—not enough deliberate conflict. Charlie—maybe not enough conflict too. Han, Fujimoto—not uncomplicated enough, too much responsibility and grumpiness. Michael Jackson—not enough conflict.
Social Conflict
Punkness!isolation is caused by “deliberate conflict”.
Examples:
Sirius Black. A. Ran away from his wealthy racist family. B. Bullied Snape, mistreated Kreacher. C. Troublemaker, bully, assumed criminal. D. Revenge-driven. E. Somewhat arrogant (e.g. taunting while fighting).
Robin Hood from the 1991 movie. B. Fucks with Nottingham. C. Does crimes. D. Partially revenge-driven. “I’ll not rest until my father is avenged. I swear it by my own blood.” (c.) E. Brash.
Beetlejuice. B. Tries to wed Lydia against her will. C. His “job” is bio-exorcism. He scares and might kill. E. Brash, offensive.
Daesu from Oldboy. B. Trapped in an urgent conflict with Lee Woo-jin. C. Fights organized crime. D. Revenge-driven. E. Brash, off-putting.
Inigo Montoya from The Princess Bride. C. A criminal. D. Revenge-driven. E. Impatient and cocky.
Alucard, TDK Joker, RDJ Sherlock (offensive, insufferable, kicking ass) are other strong examples.
Uncomplicatedness
Another sub-theme is “seeming effortless / lacking usual responsibilities / being submissive / straightforward / relaxed” (uncomplicatedness).
Examples:
Sirius Black from HP. B. Unusually devoted to Harry. Harry is canonically the person Sirius cares about the most. C. Waited 12 years in Azkaban, then was under house arrest. Sometimes freely roams as a dog. D. Straightforward mentality. E. Can be very soft.
Robin Hood from the 1991 movie. C. Lives in his utopia in the woods. CA. Most of the time dispatches enemies effortlessly. D. Straightforward mentality. E. Laidback, down-to-earth.
Beetlejuice. C. Has no real job or responsibilities. CA. Dispatches people effortlessly. D. Straightforward mentality. E. So relaxed he ignores personal boundaries.
Daesu from Oldboy. C. Was jailed for 15 years. D. Straightforward mentality. E. Can be submissive.
Alucard from Hellsing Ultimate. A. Was a sex slave. B and C. Submissive to Integra, serves her organization. CA. Dispatches people effortlessly. D and E. Straightforward mentality. Masochistic. Sees himself as inferior to dignified humans. Can dish out a good compliment: “I still see the same spirited young woman I met fifty years ago, Your Majesty. In fact, in my eyes, time has made you even more beautiful, your highness.” (c.)
TDK Joker, RDJ Sherlock, Wendy, Marceline, Inigo Montoya, Speedwagon, ADGTH Charlie are other strong examples.
Direct Attachment
Another sub-theme is “direct attachment”. Unusually strong, direct, multi-faceted attachment.
Sirius. Protects Harry, cares about him the most, was ready to house him.
Robin Hood. Protects Marian physically, socially (removes Nottingham), knows her from childhood, swore to protect her to her dying brother.
Daesu from Oldboy HEAVY SPOILERS. Loves Mido, protects her physically, etc.
Alucard from Hellsing Ultimate. Serves Integra, her family and organization. Protects her. Admires her.
Speedwagon from JoJo. Helps Jojo, accompanies him, admires him, leaves criminal life behind for him. Later helps his family.
Korben (loves / protects / helps Leeloo), Inigo Montoya (devotes his entire life to avenging his father), ADGTH Charlie (protects Anne-Marie, finds her a family, visits her from Heaven) are other strong examples. Beetlejuice (obsessed with Lydia), Joker (obsessed with Batman) are weaker examples.
If you’re interested, you can message me on LW or Discord.
Disclaimer: I write everything myself, no AI assistance.
E.g. MBTI is combinatorial, Big Five is geometric.
I’ve gone pretty deep down this rabbit hole, and the place I keep ending up is psychoanalysis. Jung is obviously the spiritual father of MBTI and most personality typology, but Lacan’s personality structures are what stuck with me.
It’s beyond my capacity to explain much Lacan in short form, but, imo, what differentiates him is that his personality structures are emergent. Personality is downstream of how people organize the satisfaction of their desires, and behavior is the language (very loaded term) people use to pursue those desires.
Unfortunately, here there be dragons. Lacan was famously, and intentionally, obscurantist. Also, a lot is lost in translation from the original French. My personal view is that he overspecified much of his theory and that a cleaner version should exist but doesn’t.
a lot of unconventional people intentionally choose to ignore normie-legible status systems. these systems can take the form of either expert consensus or some form of feedback from reality that is widely accepted. for example, many researchers, especially around these parts, just don’t publish in normal ML conferences at all, opting instead to depart into their own status systems. or they don’t care whether their techniques can be used to make very successful products, or make surprisingly accurate predictions, etc. instead, they substitute some alternative status system, like approval of a specific subcommunity.
there’s a grain of truth to this, which is that the normal status system is often messed up (academia has terrible terrible incentives). it is true that many people overoptimize the normal status system really hard and end up not producing very much value.
but the problem with starting your own status system (or choosing to compete in a less well-agreed-upon one) is that it’s unclear to other people how much stock to put in your status points. it’s too easy to create new status systems. the existing ones might be deeply flawed, but at least their difficulty is a known quantity.
one common retort is that it’s not worth proving yourself to people who are too closed minded and only accept ideas if they are validated by some legible status system. this is true to some extent, and i’m generally against people spending too much effort to optimize normie status too hard (e.g. i think people should be way less worried about getting a degree in order to be taken seriously / get a job offer), but it’s possible to take this too far.
a rational decision maker should in fact discount claims of extremely illegible quality, because there are simply too many of them and it’s too hard to pick out the good ones even if they were there (that’s sort of the whole thing about illegibility!). it seems bad to only bestow the truth upon people who happen to be irrational in ways that cause them to take you seriously by chance. if left unchecked, this kind of thing can also very easily evolve into a cult, where the unmooring from reality checks allows huge epistemic distortions.
a good in-between approach might be to do some very legibly impressive things, just to prove that you can in fact do well at the legible status system if you choose to, and are intentionally choosing not to (as opposed to choosing alternative status systems because you’re not capable of getting status in the legible system).
This comment seems to implicitly assume markers of status are the only way to judge quality of work. You can just, y’know, look at it? Even without doing a deep dive, the sort of papers or blog posts which present good research have a different style and rhythm to them than the crap. And it’s totally reasonable to declare that one’s audience is the people who know how to pick up on that sort of style.
The bigger reason we can’t entirely escape “status”-ranking systems is that there’s far too much work to look at it all, so people have to choose which information sources to pay attention to.
It’s a question of resolution. Just looking at things for vibes is a pretty good way of filtering wheat from chaff, but you don’t give scarce resources like jobs or grants to every grain of wheat that comes along. When I sit on a hiring committee, the discussions around the table are usually some mix of status markers and people having done the hard work of reading papers more or less carefully (this consuming time in greater-than-linear proportion to distance from your own fields of expertise). Usually (unless nepotism is involved) someone who has done that homework can wield more power than they otherwise would at that table, because people respect strong arguments and understand that status markers aren’t everything.
Still, at the end of the day, an Annals paper is an Annals paper. It’s also true that to pass some of the early filters you either need (a) someone who speaks up strongly for you or (b) to pass the status marker tests.
I am sometimes in a position these days of trying to bridge the academic status system and the Berkeley-centric AI safety status system, e.g. by arguing to a high status mathematician that someone with illegible (to them) status is actually approximately equivalent in “worthiness of being paid attention to” as someone they know with legible status. Small increases in legibility can have outsize effects in how easy my life is in those conversations.
Otherwise it’s entirely down to me putting social capital on the table (“you think I’m serious, I think this person is very serious”). I’m happy to do this and continue doing this, but it’s not easily scalable, because it depends on my personal relationships.
there is always too much information to pay attention to. without an inexpensive way to filter, the field would grind to a complete halt. style is probably a worse thing to select on than even academia cred, just because it’s easier to fake.
Generally, it is about heuristics we can use to find quality in the oceans of crap. If we assume that people are sane to some degree, status is an imperfect proxy for quality. If we assume that people don’t use AIs to polish their writing styles, the writing style is an imperfect proxy for quality.
I have no experience reading research. I suspect that there are also crackpots who can write using the right kind of style. For example, they may be experts at their own line of research, and also speak overconfidently about different things they do not understand.
So if you want to be taken seriously, you probably need to know what kind of crackpot you remind others of, and then find a way to distinguish yourself from this kind of crackpot specifically.
At some point it would probably be easier to simply do your homework, once, and then have something you can point at. For example, you don’t need to publish everything in the established journals, but it would probably help to publish there once, just to show that if you want, you can; that this is about your priorities, not about lack of quality.
There are probably other ways. For example, if you don’t want to get involved too much with the system, find someone who already is, and maybe offer them co-authorship in return for jumping through all the hoops.
I guess my model is that the costs of complying with the standard system are high but constant. So the more time you spend complaining about the system not taking you seriously, the greater the chance that complying with the system would have actually been cheaper than the accumulating opportunity costs.
A thing that I often see happening when people talk about “normie-legible status systems” is that they gaslight themselves into believing that some status system that is extraordinarily legible to them, or that they are part of, is a matter of consensus.
Academia is the most intense example of this. Most people don’t care that much about academic status! This also happens in the other direction. YouTube is a major source of status in much of the world, especially among young people, but it is considered low-brow whenever people argue about this, and so people dismiss it.
I also think people tend to fall into a fallacy of gray where, if a status system is not maximally legible (like writing popular blogposts, or running a popular podcast, or making popular YouTube videos, or being popular on Twitter), they dismiss the status system as not real and “illegible”.
I think modeling the real status and reputation systems that are present in the world is important, but, for example, trying to ascend the academic status hierarchy is a bad use of time and resources. It’s extremely competitive, and not actually that influential outside of the academic bubble. In some fields it is better correlated with actual skills and integrity and intelligence, so I still think it’s a reasonable thing to consider, but I think most people are better placed to trade off a bit of legibility against a whole lot of net realness in status (this importantly does not mean your LW quick takes will be the thing that causes you to become world-renowned; I am not saying “just say smart things and the world will recognize you”, I am saying “don’t think that the most legible status systems, or the ones with the most mobs hunting dissenters from the status system, are the only real ways of gaining recognition in the world”).
sure, the thing you’re looking for is the status system that jointly optimizes for alignedness with what you care about, and how legible it is to the people you are trying to convince.
(My guess is you meant to agree with that, but kind of the whole point of my comment was that the dimension that is more important than legibility and alignment with you is the buy-in your audience has for a given status system. YouTube is not very legible, and not that aligned, but for some audiences has very high buy-in.)
For some fields such as biotech, it’s difficult to get access to labs outside of academia. And you can’t learn without lab access because the cutting edge experiments don’t get posted to YouTube (yet).
There is a passage from Jung’s “Modern man in search of a soul” that I think about fairly often, on this point (p.229 in my edition)
It’s possible that this wouldn’t work for everyone, but so far I am very satisfied working on a PhD on agent foundations (AIXI). There are a lot of complaints here about academic incentives, but mostly I just ignore them. Possibly this will eventually interfere with my academic career prospects, but in the meantime I get years to work on basically whatever I think is interesting and important, and at the end of it I can reasonably expect to end up with a PhD and a thesis I’m proud of, which seems like enough to land on my feet. Looks like the best of both worlds to me.
Two common failure modes to avoid when doing the legibly impressive things
1. Only caring instrumentally about the project (decreases motivation)
2. Doing “net negative” projects
What kind of changes or outcomes would you expect to see if people around these parts instead of publishing their work independently started trying to get it into traditional ML conferences and related publications?
I claim it is a lot more reasonable to use the reference class of “people claiming the end of the world” than “more powerful intelligences emerging and competing with less intelligent beings” when thinking about AI x-risk. further, we should not try to convince people to adopt the latter reference class (this sets off alarm bells, and rightly so, as I will argue in short order), but rather bite the bullet, start from the former reference class, and provide arguments and evidence for why this case is different from all the other cases.
this raises the question: how should you pick which reference class to use, in general? how do you prevent reference class tennis, where you argue back and forth about what is the right reference class to use? I claim the solution is you want to use reference classes that have consistently made good decisions irl. the point of reference classes is to provide a heuristic to quickly apply judgement to large swathes of situations that you don’t have time to carefully examine. this is important because otherwise it’s easy to get tied up by bad actors who avoid being refuted by making their beliefs very complex and therefore hard to argue against.
the big problem with the latter reference class is it’s not like anyone has had many experiences using it to make decisions ex ante, and if you squint really hard to find day to day examples, they don’t all work out the same way. smarter humans do mostly tend to win over less smart humans. but if you work at a zoo, you will almost always be more worried about physical strength and aggressiveness when putting different species in the same enclosure. if you run a farm (or live in Australia), you’re very worried about relatively dumb invasive animals like locusts and rabbits.
on the other hand, everyone has personally experienced a dozen different doomsday predictions, whether that’s your local church or a faraway cult warning about Armageddon, or Y2K, or global financial collapse in 2008, or the maximally alarmist climate people, or nuclear winter, or peak oil. for basically all of them, the right action empirically in retrospect was to not think too much about it. there are many concrete instances of people saying “but this is different” and then getting burned.
and if you allow any reference class to be on as strong a footing as very well established reference classes, then you open yourself up to getting pwned ideologically. “all complex intricate objects we have seen created have been created by something intelligent, therefore the universe must also have an intelligent creator.” it’s a very important memetic defense mechanism.
(to be clear this doesn’t mean you can only believe things others believe, or that humans taking over earth is not important evidence, or that doomsday is impossible!! I personally think AGI will probably kill everyone. but this is a big claim and should be treated as such. if we don’t accept this, then we will forever fail to communicate with people who don’t already agree with us on AGI x-risk.)
This all seems wrongheaded to me.
I endeavor to look at how things work and describe them accurately. Similarly to how I try to describe how a piece of code works, or how to build a shed, I will try to accurately describe the consequences of large machine learning runs, which can include human extinction.
This isn’t how I think about things. Reality is what exists, and if a claim accurately describes reality, then I should not want to hold it to higher standards than claims that do not describe reality. I don’t think it’s a good epistemology to rank claims by “bigness” and then say that the big ones are less likely and need more evidence. On the contrary, I think it’s worth investing more in finding out if they’re right, and generally worth bringing them up to consideration with less evidence than for “small” claims.
I don’t have the experiences you’re describing. I don’t go to churches, I don’t visit cults, I was 3yrs old in the year 2000, I was 11 for the ’08 financial crash and having read about it as an adult I don’t recall extinction being a topic of discussion, I think I have heard of climate people saying that via alarmist news headlines but I have not had anyone personally try to convince me of this or even say that they believe it. I have heard it discussed for nuclear winter, yes, and I think nukes are quite scary and it was reasonable to consider, I did not dismiss it out of hand and wouldn’t use that heuristic. I don’t know what the oil thing is.
In other words, I don’t recall anyone seriously trying to convince me that the world was ending except in cases where they had good reason to believe it. In my life, when people try to warn me about big things, especially if they’ve given it serious thought, usually I’ve found it’s been worthwhile for me to consider it. (I like to think I am good at steering clear of scammers and cranks, so that I can trust the people in my life when they tell me things.)
The sense I get from this post is that, in it, you’re assuming everyone else in the world is constantly being assaulted with claims meant to scare and control them rather than people attempting to describe the world accurately. I agree there are forces doing that, but I think this post gives up all too quickly on there being other forces in the world that aren’t doing that that people can recognize and trust.
i am also trying to accurately describe reality. what i’m saying is, even from the perspective of someone smart and truth-seeking but who doesn’t know much about the object-level, it is very reasonable to use bigness of claim as a heuristic for how much evidence you need before you’re satisfied, and that if you don’t do this, you will be worse at finding the truth in practice. my guess is this applies even more so to the average person.
i think this is very analogous to occam’s razor / trust region optimization. clearly, we need to discount theories based on complexity because there are exponentially more complex theories compared to simple ones, many of which have no easily observable difference to the simpler ones, opening you up to being pwned. and empirically it seems a good heuristic to live life by. complex theories can still be true! but given two theories that both accurately describe reality, you want the simpler one. similarly, given two equally complex claims that accurately describe the evidence, you want the one that is less far fetched from your current understanding of the world / requires changing less of your worldview.
also, it doesn’t have to be something you literally personally experienced. it’s totally valid to read the wikipedia page on the branch davidians or whatever and feel slightly less inclined to take things that have similar vibes seriously, or even to absorb the vibe from your environs (your aversion to scammers and cranks surely did not come ex nihilo, right?)
for most of the examples i raised, i didn’t necessarily mean the claim was literally 100% human extinction, and i don’t think it matters that it wasn’t. first, because the important thing is the vibe of the claim (catastrophic): since we’re talking about heuristics on how seriously to take things that you don’t have time to deep dive on, the rule has to be relatively cheap to implement. i think most people, even quite smart people, genuinely don’t feel much of an emotional difference between literal human extinction vs collapse of society vs half of people dying painfully, unless they first spend a half hour carefully thinking about the implications of extinction. (and even then, depending on their values, they may still not feel a huge difference)
also, it would be really bad if you could weasel your way out of a reference class that easily; it would be ripe for abuse by bad actors: “see, our weird sect of christianity claims that after armageddon, not only will all actual sinners’ souls be tortured forever, but that the devil will create every possible sinner’s soul to torture forever! this is actually fundamentally different from all existing christian theories, and it would be unfathomably worse, so it really shouldn’t be thought of as the same kind of claim”
even if most people are trying to describe the world accurately (which i think is not true and we only get this impression because we live in a strange bubble of very truth seeking people + are above-average capable at understanding things object level and therefore quickly detecting scams), ideas are still selected for memeticness. i’m sure that 90% of conspiracy theorists genuinely believe that humanity is controlled by lizards and are trying their best to spread what they believe to be true. many (not all) of the worst atrocities in history have been committed by people who genuinely thought they were on the side of truth and good.
(actually, i think people do get pwned all the time, even in our circles. rationalists are probably more likely than average (controlling for intelligence) to get sucked into obviously culty things (e.g. zizians), largely because they don’t have the memetic antibodies needed to not get pwned, for one reason or another. so probably many rationalists would benefit from evaluating things a little bit more on vibes/bigness and a little bit less on object level)
Your points about Occam’s razor have got nothing to do with this subject[1]. The heuristic “be more skeptical of claims that would have big implications if true” makes sense only when you suspect a claim may have been adversarially optimized for memetic fitness; it is not otherwise true that “a claim that something really bad is going to happen is fundamentally less likely to be true than other claims”.
I’m having a little trouble connecting your various points back to your opening paragraph, which is the primary thing that I am trying to push back on.[2]
To restate the message I’m reading here: “Give up on having a conversation where you evaluate the evidence alongside your interlocutors. Instead frame yourself as trying to convince them of something, and assume that they are correct to treat your communications as though you are adversarially optimizing for them believing whatever you want them to believe.” This assumption seems to give up a lot of my ability to communicate with people (almost ~all of it), and I refuse to simply do it because some amount of communication in the world is adversarially optimized, and I’m definitely not going to do it because of a spurious argument that Occam’s razor implies that “claims about things being really bad or claims that imply you need to take action are fundamentally less likely to be true”.
You are often in an environment where people are trying to use language to describe reality, and in that situation the primary thing to evaluate is not the “bigness” of a claim, but the evidence for and against it. I recommend instead acting in such a way as to increase the size and occurrence of that environment more so than “act as though it’s correct to expect maximum adversarial optimization in communications”.
(Meta: The only literal quotes of Leo’s in this comment are the big one in the quote block, my use of “” is to hold a sentence as object, they are not things Leo wrote.)
I agree that the more strongly a claim implies that you should take action, then the more you should consider that it is being optimized adversarially for you to take action. For what it’s worth, I think that heuristic applies more so to claims that you should personally take action. Most people have little action to directly prevent the end of the world from AI; this is a heuristic more naturally applied to claims that you need to pay fines (which are often scams/spam). But mostly, when people give me claims that imply action, they are honestly meant claims and I do the action. This is the vast majority of my experience.
Aside to Leo: Rather than reply point-by-point to each of the paragraphs in the second comment, I will try restating and responding to the core message I got in the opening paragraph of the first comment. I’m doing this because the paragraphs in the second comment seemed somewhat distantly related / I couldn’t tell whether the points were actually cruxy. They were responding to many different things, and I hope restating the core thing will better respond to your core point. However, I don’t mean to avoid key arguments; if you think I have done so, feel free to tell me one or two paragraphs you would especially like me to engage with and I will do so in any future reply.
in practice many of the claims you hear will be optimized for memetic fitness, even if the people making the claims are genuine. well intentioned people can still be naive, or have blind spots, or be ideologically captured.
also, presumably the people you are trying to convince are on average less surrounded by truth seeking people than you are (because being in the alignment community is strongly correlated with caring about seeking truth).
i don’t think this gives up your ability to communicate with people. you simply have to signal in some credible way that you are not only well intentioned but also not merely the carrier of some very memetic idea that slipped past your antibodies. there are many ways to accomplish this. for example, you can build up a reputation of being very scrupulous and unmindkilled. this lets you convey ideas freely to other people in your circles that are also very scrupulous and unmindkilled. when interacting with people outside this circle, for whom this form of reputation is illegible, you need to find something else. depending on who you’re talking to and what kinds of things they take seriously, this could be leaning on the credibility of someone like geoff hinton, or of sam/demis/dario, or the UK government, or whatever.
this might already be what you’re doing, in which case there’s no disagreement between us.
You’re writing lots of things here but as far as I can tell you aren’t defending your opening statement, which I believe is mistaken.
Firstly, it’s just not more reasonable. When you ask yourself “Is a machine learning run going to lead to human extinction?” you should not first say “How trustworthy are people who have historically claimed the world is ending?”, you should of course primarily bring your attention to questions about what sorts of machine is being built, what sort of thinking capacities it has, what sorts of actions it can take in the world, what sorts of optimization it runs, how it would behave around humans if it were more powerful than them, and so on. We can go back to discussing epistemology 101 if need be (e.g. “Hug the Query!”).
Secondly, insofar as someone believes you are a huckster or a crackpot, you should leave the conversation, communication here has broken down and you should look for other communication opportunities. However, insofar as someone is only evaluating this tentatively as one of many possible hypotheses about you then you should open yourself up to auditing / questioning by them about why you believe what you believe and your past history and your memetic influences. Being frank is the only way through this! But you shouldn’t say to them “Actually, I think you should treat me like a huckster/scammer/serf-of-a-corrupt-empire.” This feels analogous to a man on a date with a woman saying “Actually I think you should strongly privilege the hypothesis that I am willing to rape you, and now I’ll try to provide evidence for you that this is not true.” It would be genuinely a bad sign about a man that he thinks that about himself, and also he has moved the situation into a much more adversarial frame.
I suspect you could write some more narrow quick-take such as “Here is some communication advice I find helpful when talking with friends and colleagues about how AI can lead to human extinction”, but in generalizing it all the way to making dictates about basic epistemology you are making basic mistakes and getting it wrong.
Please either (1) defend and/or clarify the original statement, or (2) concede that it was mistaken, rather than writing more semi-related paragraphs about memetic immune systems.
I am confused why you think my claims are only semi-related. to me my claim is very straightforward, and the things i’m saying are straightforwardly conveying a world model that seems to me to explain why i believe my claim. i’m trying to explain in good faith, not trying to say random things. i’m claiming a theory of how people parse information, to justify my opening statement, which i can clarify as:
sometimes, people use the rhetorical move of saying something like “people think 95% doom is overconfident, yet 5% isn’t. but that’s also being 95% confident in not-doom, and yet they don’t consider that overconfident. curious.” followed by “well actually, it’s only a big claim under your reference class. under mine, i.e. the set of all instances of a more intelligent thing emerging, actually, 95% doom is less overconfident than 5% doom”. this post was inspired by seeing one such tweet, but every once in a while i see claims like this that play reference class tennis.
i think this kind of argument is really bad at persuading people who don’t already agree (from empirical observation). my opening statement is saying “please stop doing this, if you do it, and thank you for not doing this, if you don’t already do it”. the rest of my paragraphs provide an explanation of my theory for why this is bad for changing people’s minds. this seems pretty obviously relevant for justifying why we should stop doing the thing. i sometimes see people out there talk like this (including my past self at some point), and then fail to convince people, and then feel very confused about why people don’t see the error of their ways when presented with an alternative reference class. if my theory is correct (maybe it isn’t, this isn’t a super well thought out take, it’s more a shower thought), then it would explain this, and people who are failing to convince people would probably want to know why they’re failing. i did not spell this out in my opening statement because i thought it was clear, but in retrospect it was not.
i don’t think the root cause is people being irrational epistemically. i think there is a fundamental reason why people do this that is very reasonable. i think you disagree with this on the object level and many of my paragraphs are attempting to respond to what i view as the reason you disagree. this does not explicitly show up in the opening statement, but since you disagree with this, i thought it would make sense to respond to that too
i am not saying you should explicitly say “yeah i think you should treat me as a scammer until i prove otherwise”! i am also not saying you should try to argue with people who have already stopped listening to you because they think you’re a scammer! i am merely saying we should be aware that people might be entertaining that as a hypothesis, and if you try to argue by using this particular class of rhetorical move, you will only trigger their defenses further, and that you should instead just directly provide the evidence for why you should be taken seriously, in a socially appropriate manner. if i understand correctly, i think the thing you are saying one should do is the same as the thing i’m saying one should do, but phrased in a different way; i’m saying not to do a thing that you seem to already not be doing.
i think i have not communicated myself well in this conversation, and my mental model is that we aren’t really making progress, and therefore this conversation has not brought value and joy into the world in the way i intended. so this will probably be my last reply, unless you think stopping here would be a grave error.
Thank you for all this. I still think your quick take is wrong on the matter of epistemology.
I acknowledge that you make a fine point about persuasion, that someone who is primarily running the heuristic that “claims about the end of the world are probably crack-pots or scammers” will not be persuaded by someone arguing that actually 20:1 against and 20:1 in favor of a claim are equally extreme beliefs.
A version of the quick take that I would’ve felt was just fine would read:
But your quick take doesn’t confine itself to discussing those people in those situations. It flatly says it’s true as a matter of epistemology that you should “use bigness of claim as a heuristic for how much evidence you need before you’re satisfied”, that you should “use reference classes that have consistently made good decisions irl” and that the crackpots/scammers one is the correct one to use here otherwise you’ll risk “getting pwned ideologically”.
These aren’t always the right heuristics (e.g. on this issue they are not for you and for me) and you shouldn’t say that they are just so that some people on Twitter will stop using rhetoric that isn’t working.
I believe you’re trying to do your best to empathize with people who are unpersuaded by an unsuccessful rhetorical move, a move that people who believe your position are making in public discourse. That is commendable. I think you are attempting to cause other people who hold your position to stop using that rhetorical move, by telling them off for using it, but to achieve this aim you are repeatedly saying the people who do not hold your position are doing normatively correct epistemology, and you’re justifying it with Occam’s razor and reference class forecasting, and this is all wrong. In some situations for some people it is reasonable to primarily use these heuristics, and in other situations for other people it is not. I’m not arguing that the people unpersuaded are being unreasonable, but (for example) your opening sentence makes fully-general statements about how to reason about this issue that I believe are false. Rule number one of good discourse: don’t make false statements about epistemology in order to win an object level point.
Yep, seems fine to drop this here; I make no bid of you to reply further.
(I would never make knowingly false statements about epistemology to try to win an object level point; I still disagree with your claims about epistemology and believe that my epistemology arguments are in good faith and capture truth in some way. This disagreement might be because I’ve not communicated myself well. I originally wasn’t going to reply but I felt the need to say this because your comment can be viewed as accusing me of intellectual/epistemic dishonesty, even if that wasn’t your intention.)
(I affirm that I don’t believe you were being knowingly dishonest or deceptive at any point in this thread.)
But you should absolutely ask “does it look like I’m making the same mistakes they did, and how would I notice if it were so?” Sometimes one is indeed in a cult with one’s methods of reason subverted, or having a psychotic break, or captured by a content filter that hides the counterevidence, or subject to any of the many more mundane and pervasive failures of that kind.
But not in full generality! This is a fine question to raise in this context, but in general the correct thing to do in basically all situations is to consider the object level, and then also let yourself notice if people are unusually insane around a subject, or insane for a particular reason. Sometimes that is the decisive factor, but for all questions, the best first pass is to think about how that part of the world works, rather than to think about the other monkeys who have talked about it in the past.
This seems wrong to me.
a. More smaller things happen and there are fewer kinds of smaller thing that happen.
b. I bet people genuinely have more evidence for small claims they state than big ones on average.
c. The skepticism you should have because particular claims are frequently adversarially generated shouldn’t first depend on deciding to be skeptical about it.
If you’ll forgive the lack of charity, ISTM that leogao is making IMO largely true points about the reference class and then doing the wrong thing with those points, and you’re reacting to the thing being done wrong at the end, but trying to do this in part by disagreeing with the points being made about the reference class. leogao is right that people are reasonable in being skeptical of this class of claims on priors, and right that when communicating with someone it’s often best to start within their framing. You are right that regardless it’s still correct to evaluate the sum of evidence for and against a proposition, and that other people failing to communicate honestly in this reference class doesn’t mean we ought to throw out or stop contributing to the good faith conversations available to us.
i’m not even saying people should not evaluate evidence for and against a proposition in general! it’s just that this is expensive, and so it is perfectly reasonable to have heuristics to decide which things to evaluate, and so you should first prove with costly signals that you are not pwning them, and then they can weigh the evidence. and until you can provide enough evidence that you’re not pwning them for it to be worth their time to evaluate your claims in detail, it should not be surprising that many people won’t listen to the evidence. and even if they do listen, if there is still lingering suspicion that they are being pwned, you need to provide the type of evidence that could persuade someone that they aren’t getting pwned (for which being credibly very honest and truth seeking is necessary but not sufficient), which is sometimes different from mere compellingness of argument
I think the framing that sits better to me is ‘You should meet people where they’re at.’ If they seem like they need confidence that you’re arguing from a place of reason, that’s probably indeed the place to start.
Thanks for the comment. (Upvoted.)
a. I expect there is a slightly more complicated relationship between my value-function and the likely configuration states of the universe than literally zero-correlation, but most configuration states do not support life and we are all dead, so in one sense a claim that in the future something very big and bad will happen is far more likely on priors. One might counter that we live in a highly optimized society where things being functional and maintained is an equilibrium state and it’s unlikely for systems to get out of whack enough for bad things to happen. But taking this straightforwardly is extremely naive; tons of bad things happen all the time to people. I’m not sure whether to focus on ‘big’ or ‘bad’ but either way, the human sense of these is not what the physical universe is made out of or cares about, and so this looks like an unproductive heuristic to me.
b. On the other hand, I suspect the bigger claims are more worth investing time to find out if they’re true! All of this seems too coarse-grained to produce a strong baseline belief about big claims or small claims.
c. I don’t get this one. I’m pretty sure I said that if you believe that you’re in a highly adversarial epistemic environment, then you should become more distrusting of evidence about memetically fit claims.
I don’t know what true points you think Leo is making about “the reference class”, nor which points you think I’m inaccurately pushing back on that are true about “the reference class” but not true of me. Going with the standard rationalist advice, I encourage everyone to taboo “reference class” and replace it with a specific heuristic. It seems to me that “reference class” is pretending that these groupings are more well-defined than they are.
Well, sure, it’s just you seemed to frame this as a binary on/off thing, sometimes you’re exposed and need to count it and sometimes you’re not, whereas to me it’s basically never implausible that a belief has been exposed to selection pressures, and the question is of probabilities and degrees.
I think you’re correct. There’s a synergistic feedback loop between alarmism and social interaction that filters out pragmatic perspectives, creating the illusion that the doom surrounding any given topic is more prevalent than it really is, or even that it’s near universal.
Even before the rise of digital information the feedback phenomenon could be observed in any insular group. In today’s environment where a lot of effort goes into exploiting that feedback loop it requires a conscious effort to maintain perspective, or even remain aware that there are other perspectives.
I think the group of people “claiming the end of the world” in the case of AI x-risk is importantly more credentialed and reasonable-looking than most prior claims about the end of the world. From the reference class and general heuristics perspective that you’re talking about[1], I think how credible looking the people are is pretty important.
So, I think the reference class is more like claims of nuclear armageddon than cults. (Plausibly near maximally alarmist climate people are in a similar reference class.)
IDK how I feel about this perspective overall.
I agree this reference class is better, and implies a higher prior, but I think it’s reasonable for the prior over “arbitrary credentialed people warning about something” to be still relatively low in an absolute sense: lots of people have impressive sounding credentials that are not actually good evidence of competence (consider: it’s basically a meme at this point that whenever you see a book where the author puts “PhD” after their name, they probably are a grifter / their phd was probably kinda bs), and also there is a real negativity bias where fearmongering is amplified by both legacy and social media. Also, for the purposes of understanding normal people, it’s useful to keep in mind that trust in credentials and institutions is not very high right now in the US among genpop.
You shouldn’t. This epistemic bath has no baby in it and we should throw water out of it.
False?
Climate change tail scenarios are worth studying and averting. Nuclear winter was obviously worth studying and averting back in the Cold War, and still is today. The 2008 financial crisis was worth studying and averting.
Do you not believe average citizens can study issues like these and make moves to solve them?
This is kind of missing the point of Bayes. One shouldn’t “choose” a reference class to update on. One should update to the best of your ability on the whole distribution of hypotheses available to describe the situation. Neither is a ‘right’ or ‘wrong’ reference class to use, they’re both just valid pieces of evidence about base rates, and you should probably be using both of them.
It seems you are having in mind something like inference to the best explanation here. Bayesian updating, on the other hand, does need a prior distribution, and the question of which prior distribution to use cannot be waved away when there is a disagreement on how to update. In fact, that’s one of the main problems of Bayesian updating, and the reason why it is often not used in arguments.
I’m not really sure what that has to do with my comment. My point is the original post seemed to be operating as if you look for the argmax reference class, you start there, and then you allow arguments. My point isn’t that their prior is wrong, it’s that this whole operation is wrong.
I think also you’re maybe assuming I’m saying the prior looks something like {reference class A, reference class B} and arguing about the relative probability of each, but it doesn’t, a prior should be over all valid explanations of the prior evidence. Reference classes come in because they’re evidence about base rates of particular causal structures; you can say ‘given the propensity for the world to look this way, how should I be correcting the probability of the hypotheses under consideration? Which new hypotheses should I be explicitly tracking?’
I can see where the original post might have gone astray. People have limits on what they can think about and it’s normal to narrow one’s consideration to the top most likely hypothesis. But it’s important to be aware of what you’re approximating here, else you get into a confusion where you have two valid reference classes and you start telling people that there’s a correct one to start arguing from.
… but that still leaves the problem of which prior distribution should be used.
I agree this is an interesting philosophical question but again I’m not sure why you’re bringing it up.
Given your link maybe you think me mentioning Bayes was referring to some method of selecting a single final hypothesis? I’m not, I’m using it to refer to the Bayesian update rule.
It seems the updating rule doesn’t tell you anything about the original argument even when you view information about reference classes as evidence rather than as a method of assigning prior probabilities to hypotheses. Or does it? Can you rephrase the argument in a proper Bayesian way such that it becomes clearer? Note that how strongly some evidence confirms or disconfirms a hypothesis also depends on a prior.
What argument are you referring to when you say “doesn’t tell you anything about the original argument”?
My framing is basically this: you generally don’t start a conversation with someone as a blank pre-priors slate that you get to inject your priors into. The prior is what you get handed, and then the question is how people should respond to the evidence and arguments available. Well, you should use (read: approximate) the basic Bayesian update rule: hypotheses where an observation is unlikely are that much less probable.
I meant leogao’s argument above.
I think you’re underestimating the inferential gap here. I’m not sure why you’d think the Bayes updating rule is meant to “tell you anything about” the original post. My claim was that the whole proposal about selecting reference classes was framed badly and you should just do (approximate) Bayes instead.
And what would this look like? Can you reframe the original argument accordingly?
It’s just Bayes, but I’ll give it a shot.
You’re having a conversation with someone. They believe certain things are more probable than other things. They mention a reference class: if you look at this grouping of claims, most of them are wrong. Then you consider the set of hypotheses: under each of them, how plausible is it given the noted tendency for this grouping of claims to be wrong? Some of them pass easily, eg. the hypothesis that this is just another such claim. Some of them less easily; they are either a modal part of this group and uncommon on base rate, or else nonmodal or not part of the group at all. You continue, with maybe a different reference class, or an observation about the scenario.
Hopefully this illustrates the point. Reference classes are just evidence about the world. There’s no special operation needed for them.
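To make the “reference classes are just evidence” framing concrete, here is a minimal sketch of the Bayesian update rule applied to two reference-class observations. The hypotheses, the evidence labels, the function name `bayes_update`, and all the numbers are hypothetical and made up for illustration; the sketch also assumes the two observations are conditionally independent given each hypothesis. Exact `Fraction` arithmetic makes it easy to check that the order in which the two reference classes are considered does not change the posterior.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Posterior P(h | obs) proportional to P(obs | h) * P(h), renormalized over all hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical prior over two hypotheses (illustrative numbers only).
prior = {"claim true": Fraction(1, 2), "claim false": Fraction(1, 2)}

# Evidence A (hypothetical): the claim falls in a grouping where most claims turned out wrong.
# Each entry is P(observation | hypothesis).
looks_like_failed_group = {"claim true": Fraction(1, 5), "claim false": Fraction(4, 5)}

# Evidence B (hypothetical): the claim also falls in a grouping with a better track record.
looks_like_credible_group = {"claim true": Fraction(3, 5), "claim false": Fraction(3, 10)}

# Update on both pieces of evidence, in both orders.
a_then_b = bayes_update(bayes_update(prior, looks_like_failed_group), looks_like_credible_group)
b_then_a = bayes_update(bayes_update(prior, looks_like_credible_group), looks_like_failed_group)
print(a_then_b)
print(b_then_a)
```

With these made-up numbers both orders give `Fraction(1, 3)` on "claim true", so neither reference class has to be the one you start arguing from; each just multiplies in as a likelihood. If the two observations were correlated given the hypothesis, the second update would need to use P(B | h, A) rather than P(B | h).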
The reference classes you should use work as a heuristic because there is some underlying mechanism that makes them work. So you should use reference classes in situations where their underlying mechanism is expected to work.
Maybe the underlying mechanism of doomsday predictions not working is that people predicting doom don’t make their predictions based on valid reasoning. So if someone uses that reference class to doubt AI risk, this should be judged as them making a claim about reasoning of people predicting AI doom being similar to people in cults predicting Armageddon.
you might expect that the butterfly effect applies to ML training. make one small change early in training and it might cascade to change the training process in huge ways.
at least in non-RL training, this intuition seems to be basically wrong. you can do some pretty crazy things to the training process without really affecting macroscopic properties of the model (e.g. loss). one very well known example is that using mixed precision training results in training curves that are basically identical to full precision training, even though you’re throwing out a ton of bits of precision on every step.
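A toy way to poke at the mixed precision example, as a sketch under loudly stated assumptions: linear regression stands in for a neural network, Python’s `struct` half-precision format (`"e"`) is used to round every forward and backward intermediate to float16, and a full-precision master copy of the weights receives the updates (the usual mixed precision recipe, minus loss scaling, which this toy doesn’t need). The names `to_half` and `train` and every hyperparameter are made up for illustration. Agreement on a convex toy problem is not evidence about large models; it only shows the kind of comparison being described.

```python
import random
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision (float16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def train(half, steps=100, lr=0.1, n=128, d=8, seed=0):
    """Full-batch gradient descent on a toy linear regression; returns the loss curve.

    With half=True, the inputs, the weight copy used for compute, every product,
    every running sum and the gradient are rounded to float16, while the master
    weights `w` stay in full precision (Python floats are float64).
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    true_w = [rng.gauss(0, 1) for _ in range(d)]
    y = [sum(a * b for a, b in zip(row, true_w)) + 0.1 * rng.gauss(0, 1) for row in X]

    cast = to_half if half else (lambda v: v)
    Xc = [[cast(v) for v in row] for row in X]
    yc = [cast(v) for v in y]
    w = [0.0] * d  # full-precision master weights
    losses = []
    for _ in range(steps):
        wc = [cast(v) for v in w]  # low-precision copy used for compute
        # Forward pass: residuals of the linear model.
        err = []
        for row, target in zip(Xc, yc):
            acc = 0.0
            for a, b in zip(row, wc):
                acc = cast(acc + cast(a * b))
            err.append(cast(acc - target))
        losses.append(sum(e * e for e in err) / n)  # mean squared error
        # Backward pass: gradient of the MSE with respect to each weight.
        grad = []
        for j in range(d):
            acc = 0.0
            for row, e in zip(Xc, err):
                acc = cast(acc + cast(row[j] * e))
            grad.append(cast(2.0 / n * acc))
        # Update the master weights in full precision.
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return losses


full = train(half=False)
mixed = train(half=True)
print(f"final loss, full precision:  {full[-1]:.5f}")
print(f"final loss, mixed precision: {mixed[-1]:.5f}")
```

In this toy the two runs should land on nearly the same final loss, even though every intermediate in the mixed precision run was rounded to roughly three significant decimal digits.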
people often say that limitations of an artistic medium breed creativity. part of this could be the fact that when it is costly to do things, the only things done will be higher effort
a medium with fewer limitations is strictly better for making good art, but it’s also harder to identify good art among the sea of bad art because the medium alone is no longer as good a signal of quality
Yes, but this also happens within one person over time, and the habit (of either investing, or not, in long-term costly high-quality efforts) can gain steam in the one person.
This seems the likely explanation for any claim that constraints breed creativity/good things in a field, when the expectation is that the opposite outcome would occur.
My own expectation is that limitations result in creativity. Writer’s block is usually a result of having too many possibilities/choices. If I tell you “You can write a story about anything”, it’s likely harder for you to think of anything than if I tell you “Write a story about an orange cat”. In the latter situation, you’re more limited, but you also have something to work with.
I’m not sure if it’s as true for computers as it is for humans (that would imply information-theoretic factors), but there are plenty of factors in humans, like analysis paralysis and the “See also” section of that page
My other explanation probably has to do with the fact that it’s way easier to work with an already almost-executed object than a specification, because we are constrained to only think about a subset of possibilities for a reasonable time.
In other words, constraints are useful given that you are already severely constrained, to limit the space of possibilities.
any time someone creates a lot of value without capturing it, a bunch of other people will end up capturing the value instead. this could be end consumers, but it could also be various middlemen. it happens not infrequently that someone decides not to capture the value they produce in the hopes that the end consumers get the benefit, but in fact the middlemen capture the value instead
can you give examples?
an example: open source software produces lots of value. this value is partly captured by consumers who get better software for free, and partly by businesses that make more money than they would otherwise.
the most clear cut case is that some businesses exist purely by wrapping other people’s open source software, doing advertising and selling it for a handsome profit; this makes the analysis simpler, though to be clear the vast majority of cases are not this egregious.
in this situation, the middleman company is in fact creating value (if software is created in a forest with no one around to use it, does it create any value?) by using advertising to cause people to get value from software. in markets where there are consumers clueless enough to not know about the software otherwise (e.g. legacy companies), this probably does actually create a lot of counterfactual value. however, most people would agree that the middleman getting 90% of the created value doesn’t satisfy our intuitive notion of fairness. (open source developers are more often trying to have the end consumers benefit from better software, not for random middlemen to get rich off their efforts)
and if advertising is commoditized, then this problem stops existing (you can’t extract that much value as an advertising middleman if there is an efficient market with 10 other competing middlemen), and so most of the value does actually accrue to the end user.
Often tickets will be sold at prices considerably lower than the equilibrium price and thus ticket scalpers will buy the tickets and then resell for a high price.
That said, I don’t think this typically occurs because the company/group originally selling the tickets wanted consumers to benefit, it seems more likely that this is due to PR reasons (it looks bad to sell really expensive tickets).
This is actually a case where it seems likely that the situation would be better for consumers if the original seller captured the value. (Because buying tickets from random scalpers is annoying.)
I wonder how much of this is the PR reasons, and how much something else… for example, the scalpers cooperating (and sharing a part of their profits) with the companies that sell tickets.
To put it simply, if I sell a ticket for $200, I need to pay a tax for the $200. But if I sell the same ticket for $100 and the scalper re-sells it for $200, then I only need to pay the tax for $100, which might be quite convenient if the scalper… also happens to be me? (More precisely, some of the $100 tickets are sold to genuine 3rd party scalpers, but most of them I sell to myself… but according to my tax reports, all of them were sold to the 3rd party.)
ticket scalping is bad and we should find some sort of fully distributed market mechanism that makes scalping close to impossible without requiring the ticket seller to capture the value. it ought to be possible to gift value to end customers rather than requiring the richest to be the ones who get the benefit, how can that be achieved?
The simple mechanism is:
Charge market prices (auction or just figure out the equilibrium price normally)
Redistribute the income uniformly to some group. Aka UBI.
Of course, you could make the UBI be to (e.g.) Taylor Swift fans in particular, but this is hardly a principled approach to redistribution.
Separately, musicians (and other performers) might want to subsidize tickets for extremely hard core fans because these fans add value to the event (by being enthusiastic). For this, the main difficulty is that it’s hard to cheaply determine if someone is a hard core fan. (In principle, being prepared to buy tickets before they run out could be an OK proxy for this, but it fails in practice, at least for buying tickets online.)
More discussion is in this old Planet Money episode.
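The two-step mechanism described above (charge market prices, then redistribute the income uniformly) can be sketched as follows. The auction format is an assumption: a uniform-price auction in which the top `num_tickets` bidders win and all pay the highest losing bid, which is one way to “figure out the equilibrium price”. The function names, bidders, and dollar amounts are hypothetical.

```python
def uniform_price_auction(bids, num_tickets):
    """Sell num_tickets identical tickets to the highest bidders at a single clearing price.

    The clearing price is the highest losing bid, so every winner bid at least
    that much and every loser bid at most that much.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winners = [name for name, _ in ranked[:num_tickets]]
    price = ranked[num_tickets][1] if len(ranked) > num_tickets else 0
    return winners, price


def redistribute(revenue, recipients):
    """Split the auction revenue equally across a recipient group (the 'UBI' step)."""
    share = revenue / len(recipients)
    return {person: share for person in recipients}


# Hypothetical bids, in dollars.
bids = {"alice": 300, "bob": 250, "carol": 120, "dave": 90}
winners, price = uniform_price_auction(bids, num_tickets=2)
revenue = price * len(winners)
payouts = redistribute(revenue, list(bids))
print(winners, price, payouts)
```

Here alice and bob each pay $120, and the $240 of revenue comes back as $60 to each of the four participants. Because every losing bid is at or below the clearing price, nobody in this pool would pay a scalper more than a winner paid; demand from outside the auction is not modeled.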
of course, this is more a question about equilibria than literal transactions. suppose you capture most of the value and then pay it back out to users as a dividend: the users now have more money with which they could pay a middleman, and a middleman that could have extracted some amount of value originally can still extract that amount of value in this new situation.
we can model this as a game of ultimatum between the original value creator and the middlemen. if the participation of the OVC and middleman are both necessary, the OVC can bargain for half the value in an iterated game / as FDT agents. however, we usually think of the key differentiating factor between the OVC and middlemen as the middlemen being more replaceable, so the OVC should be able to bargain for a lot more. (see also: commoditizing your complement)
so to ensure that the end users get most of the value, you need to either ensure that all middleman roles are commoditized, or precommit to only provide value in situations where the end user can actually capture most of the value
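One way to sanity-check the replaceability argument is with the Shapley value, a cooperative-game solution concept swapped in here in place of the ultimatum/FDT bargaining described above (they are not the same thing). The assumed characteristic function says the full value is created only when the original value creator and at least one of `n` interchangeable middlemen participate; all names (`shapley_values`, `make_value`, the player labels) are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution over all orderings."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    return {p: t / len(orders) for p, t in totals.items()}


def make_value(total=1):
    """Hypothetical game: value exists only if the creator and at least one middleman take part."""
    def value(coalition):
        has_creator = "creator" in coalition
        has_middleman = any(p.startswith("middleman") for p in coalition)
        return Fraction(total) if has_creator and has_middleman else Fraction(0)
    return value


for n in (1, 2, 5):
    players = ["creator"] + [f"middleman{i}" for i in range(n)]
    phi = shapley_values(players, make_value())
    print(n, phi["creator"])
```

The creator’s share comes out to n/(n+1): exactly half when there is a single middleman whose participation is as necessary as the creator’s, and approaching all of the value as more interchangeable middlemen compete.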
The equilibrium comprises literal transactions, right? You should be able to find MANY representative specific examples to analyze, which would help determine whether your model of value is useful in these cases.
My suspicion is that you’re trying to model “value” as something that’s intrinsic, not something which is a relation between individuals, which means you are failing to see that the packaged/paid/delivered good is actually distinct and non-fungible with the raw/free/open good, for the customers who choose that route.
Note that in the case of open-source software, it’s NOT a game of ultimatum, because both channels exist simultaneously and neither has the option to deny the other. A given consumer paying for one does not prevent some other customer (or even the same customer in parallel) using the direct free version.
I make no claim to fungibility or lack of value created by middlemen.
It’s worth examining whether “capturing value” and “providing value” are speaking of the same thing. In many cases, the middlemen will claim that they’re actually providing the majority of the value, in making the underlying thing useful or available. They may or may not be right.
For most goods, it’s not clear how much of the consumer use value comes from the idea, the implementation of the idea, or from the execution of the delivery and packaging. Leaving aside government-enforced exclusivity, there are usually reasons for someone to pay for the convenience, packaging, and bundling of such goods.
I worked (long ago) in physical goods distribution for toys and novelties. I was absolutely and undeniably working for a middleman—we bought truckloads of stuff from factories, repackaged it for retail, and sold it at a significant markup to retail stores, who marked it up again and sold it to consumers. Our margins were good, but all trades were voluntary and I don’t agree with a framing that we were “capturing” existing value rather than creating value in connecting supply with demand.
All value is finite, and every time value is used, it decreases. The middlemen are merely causing the thing to die faster. For instance, if you discover a nice beach which hasn’t been ruined with plastic and glass bottles yet, and make it into a popular area, you won’t get to spend many happy summers at that place.
If you find oil and sell it, are you creating value, or are you destroying value? I think both perspectives are valid. But since the openness of information in the modern world makes it so that everything which can be exploited will be exploited, until the point that exploitation is no longer possible (as with the ruined beach), I strongly dislike unsustainable exploitation and personally tend toward the “destroying value” view.
And if you want something to worry about, let it be premature exploitation. X ‘creates’ value and chooses not to exploit it prematurely, but then Y will come along and take it, so X is forced to capitalize on it early. Now you have a moloch problem on your hands.
saying “sorry, just to make sure I understand what you’re saying, do you mean [...]” more often has been very valuable
yeah, turns off the combat mode
more importantly, both i and the other person get more out of the conversation. almost always, there are subtle misunderstandings and the rest of the conversation would otherwise involve a lot of talking past each other. you can only really make progress when you’re actually engaging with the other person’s true beliefs, rather than a misunderstanding of their beliefs.
i wonder how much the following hypothesis is true. it’s obviously not completely true, but maybe there is some value in contemplating it.
for most issues that people have expended a lot of effort arguing about, if you could truly impartially reason through it, run experiments, etc, the correct answer is either not that hard to figure out, or that we’re pretty sure we can’t know one way or the other. but the discourse is fucked because the vast majority (maybe literally all) of people have some bottom line they’re already sympathetic to, and smart people can make plausible sounding arguments for any conclusion, and so truly reasoning things through impartially is both very hard to do and even if you somehow manage to do it, it’s hard to signal that you did so, and anyways people with motivated reasoning will only listen to you if your answer happened to agree with theirs.
the main evidence i have for this hypothesis is that there are questions where one side is overwhelmingly obviously correct if you actually think about it or look into it, and yet, there is the appearance of a balanced public debate.
the main evidence against this hypothesis is that probably people who disagree also think they’re overwhelmingly obviously correct, and it seems arrogant to declare that i am simply more correct, and in any case oftentimes people disagreeing are wrong but their disagreement still contains enough of a kernel of truth to be worth thinking about.
corollary: the ability to look upon difficult controversial problems with utter naivety is extremely valuable. it’s probably bad to always be in this state, because you can get pwned by bad actors. but cultivating an ability to enter into this state, possibly through a community where people value this a lot, is extremely valuable (LW is the closest thing to this that exists in the world afaik, but please let me know if there is anywhere else that is better)
examples of such issues:
the literal veracity of various religions
AI water usage
supply and demand / effects of price fixing
the mere theoretical possibility of AGI
For empirical questions, one litmus test is whether most of the populace doesn’t know some true (like stats reported by large institutions with no disagreement from other reputable institutions) and clearly relevant fact, or repeatedly overestimates the observed risk of something.
AI water usage is an example. Others are the deaths per TWh of nuclear power, that extreme poverty (≤$3/day) fell by a factor of 3 in the last 20 years (along with what percent are below that line), observed prediction market accuracy, the annual revenue of the US federal government as compared to e.g. the total net worth of billionaires (it’s about 1⁄3 of it), and that the labor share of GDP over time hasn’t dropped much.
I sometimes wonder if political beliefs would be very different if every child knew a certain set of basic stats—and also what political beliefs you could get if you adversarially optimized that set (while still keeping the stats reliably true and ‘basic’) to lead them to some specific conclusion.
the interesting thing about lots of these basic stats is that people who believe otherwise will often simply refuse to listen, because listening to the fact might change their mind, and then they would become an Outsider, and then be expelled by their social circle. or something like that.
considering that this can happen even for the clearest cut examples, one can only imagine how many less clear cut examples are also like this. perhaps a huge fraction of seemingly controversial issues actually have a simple answer that you can arrive at just by not having a preconceived answer, but where people incorrectly assume that people are disagreeing for good reasons.
(of course, even if this is true, there can and likely do still exist many problems where there genuinely is no simple undisputable answer!)
I don’t think AI water usage fits the pattern. People keep repeating the line “AI uses a lot of water”, and making political cartoons about AI slurping up all the water in the world, but I don’t remember anyone ever arguing for it being true.
This is in contrast to religion, price fixing and possibility of AGI, where you can in fact find smart people making long and complicated arguments on both sides.
here are 3 links that argue for AI water consumption being a big problem that i found in like 5 minutes. they aren’t good arguments, but they go beyond just asserting it.
https://www.issa.com/industry-news/ai-data-center-water-consumption-is-creating-an-unprecedented-crisis-in-the-united-states/
https://arxiv.org/pdf/2304.03271
https://www.foodandwaterwatch.org/wp-content/uploads/2026/02/FSW_2602_AI_Water_Energy_UPDATE.pdf
I think this is basically correct, modulo that a lot of this is just that people differ in their intelligence and epistemological skill, and so there are classes of dispute that are basically obvious to, for instance, you, while to many earnest but less capable truth seekers, it’s in fact not easy to say who’s right.
In my experience, there aren’t many such debates where the correct answer readily leaps to the surface if you look for it. It’s more that the correct answer is obvious once you find it, but if you’re in a poor epistemic environment, the answer is hard to find even if you’re looking for it.
For example, the literal veracity of Christianity is hard to determine if everyone you know is a devout Christian, even if you’re not biased at all by wanting to believe what other people believe.
What’s the alternative? That finding the correct answer is tractable but hard in the way that major scientific progress is?
the alternative is that it must be hard or impossible to know the truth on an issue any time there is the appearance of a vaguely balanced public debate among smart people. for example, if you believed the alternative deeply, you might say “well, a bunch of smart people think that AI uses a lot of water, and a bunch of other smart people think that AI doesn’t use a lot of water. while the argument that AI water use isn’t a big deal makes a lot of sense to me, clearly the people who think it does use a lot of water are equally convinced of their position, so who really knows who’s right”
But then wouldn’t we be “pretty sure we can’t know one way or the other”?
no, because sometimes we can be pretty sure one way or the other, and so if you were sure you can’t know, you’d be wrong
Some kind of miscommunication has happened. I thought the claim in the OP was that most topics fall in bin A “easy to figure out” OR bin B “probably unknowable”, which seemed like a trivial claim to me because the only alternative is bin C “tractable but hard”. But now it seems like you’re arguing most topics fall in bin A?
i love the concept of upvote-disagree. this feature singlehandedly encapsulates a lot of what i like about LW
Its cousin, downvote-agree, is also very good (I experienced this firsthand once).
Good collection by Gustaf here.
This comment is also a great example.
i use that much less. i guess I have a strong bias towards upvoting, and a contrarian streak.
It’s definitely the funniest combo to me. Are there any known extreme examples?
There’s this one https://www.lesswrong.com/posts/XvN2QQpKTuEzgkZHY/being-the-pareto-best-in-the-world?commentId=HzWKu9cpnHNz2nrNW
conlang idea: an extremely easy to learn language with the following attributes:
grammar: mandarin, but without several of the quirks (no counters, merging 的/地/得, merging 不/没, etc)
vocab: drawn from mostly English, because it’s become the lingua franca. ideally the language is almost mutually intelligible to English speakers from the beginning
very simple phonology similar to japanese
consistent phonetic writing system based on the latin alphabet
You get things like this in creolization of pidgins. The regularizing factor is children—the children of pidgin speakers create an easy to learn full language.
(I would posit that, depending on your goals, it may actually be better to “conlang from the inside”, i.e. generate new words and other language structures from within English or other major natural tongues.)
A 2019 study claims that spoken languages transmit info at a similar rate, i.e. that languages which apparently transmit more spoken info per syllable (Mandarin, German) are spoken more slowly than languages at the other extreme, like Japanese or Spanish. The gap between the fastest and slowest transmission (English and Thai) was 1.4x, though I don’t quite understand the way they measured information.
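The trade-off the study describes can be written as a simple product. This is a sketch assuming "information rate" means information density per syllable times speech rate; the labels IR, ID and SR are introduced here for illustration, and how the study estimates density itself is not reproduced:

```latex
\mathrm{IR}\;[\text{bits/s}] = \mathrm{ID}\;[\text{bits/syllable}] \times \mathrm{SR}\;[\text{syllables/s}]

% If language A packs twice the information per syllable of language B
% but is spoken at half the syllable rate, the two rates come out equal:
\mathrm{IR}_A = (2\,\mathrm{ID}_B)\times\left(\tfrac{1}{2}\,\mathrm{SR}_B\right) = \mathrm{ID}_B \times \mathrm{SR}_B = \mathrm{IR}_B
```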
It’s possible that written communication doesn’t work that way, though. Languages with simpler and more rational orthographies like Spanish are easier to learn to read compared to English. So I think the most clear benefit would come from spelling reform and maybe the way numbers are spoken/written in words, rather than apparent efficiency improvements to spoken communication (e.g. you listed grammatical or vocabulary changes) that just lead people to talk more slowly because they’re bottlenecked by how fast they can produce or understand the information.
Isn’t this just toki pona?
only kinda? toki pona is trying to have as small a fundamental vocab as possible and constructing all other concepts using those few words. whereas i am totally happy to import all of English as prerequisite knowledge
actually, there should probably be a possessive-only particle (equivalent to ’s); destroy the word “of” and replace it with “among” or the possessive; the word “that” should be restricted to being used only as a pronoun; the word “which” should be split into separate words for the determiner and the relative pronoun (akin to French quel vs qui); and adjectives should always come before the noun, in reverse order of binding tightness to the object, but with no strict rules otherwise (“beautiful green vase” is correct but “green beautiful vase” is not; “green big vase” and “big green vase” are equally correct, unlike English, and the order conveys emphasis on which aspects are most fundamental to the object).
the man who saw the dog is my brother = sii-ed daug de man bi mai brader.
to emphasize the man, you can add da after de, and to say a man (of many) you can add wan
I love mandarin’s grammar. People get hung up on the characters but it’s really not so bad nowadays with computers. Feels like a language designed by actual people (mostly)
example sentence: ao yumanbien-su borun frii and dei-su havu ikual diguneti and ikual raït-su. dei-su bei endaow ed rizen and kanshense and dei-su shuud tuwardu wannanada brade-li akt.
(pronunciation: pretend it’s japanese, and when there are isolated consonants just truncate the sound without the vowel. -su is a plural suffix, compound nouns are agglomerated, all verbs are in infinitive form, bei is the particle 被, and all subjects are repeated in subordinate clauses rather than left implicit.)
wannanada looks like わんななだ rather than わんあなだ, not sure how to distinguish these in romaji. also tuwardu should be tuwaadu or tuwarudu.
by fiat, my conlang contains truncated syllables. to say r, simply say ru but stop before saying the u.
by fiat, there is no distinction between n’a and na in my conlang
people love to hate on brutalism. my take is that there’s something really aesthetic about it, but also that just because something is aesthetic doesn’t mean that it would be a good place to live or work every day. in fact, I’ve unfortunately found that the environments I find aesthetically pleasing and the environments that make me happy to live/work in diverge quite substantially.
This is also why various artists don’t necessarily try to make Tolkien’s Orthanc, Barad-dûr, Angband, etc look ugly, but imposing and impressive in some way. Even H.R. Giger’s biomechanical landscapes could be described as aesthetic. Or the crooked architecture in The Cabinet of Dr. Caligari (1920). Architecture is art, and art doesn’t have to be beautiful or pleasant, just interesting. But presumably nobody would like to actually live in a Caligari-like environment. (Except perhaps people in the goth subculture?)
The best example, for my tastes at least, is window-to-building ratio. I like working and living in buildings with high ratio, e.g. glass skyscrapers. But they look bad from outside.
This also applies to cars: a huge reason why cars are a lot less beautiful/pleasing to look at (according to our detectors for beauty, which is the same thing as actually being beautiful) is that what makes a car beautiful to us is reliably less safe than a rectangular box with wheels and safety features, and this explains away Graham’s design paradox for cars.
it’s kind of crazy that spaced repetition has completely revolutionized language learning and then not really changed the world in any other way at all. why are there no great scientists who are inhumanly good at remembering the corpus of their field through incremental reading? why are there no insanely good engineers who know every detail of their entire stack through spaced repetition?
Anki is super popular in med school, so I’ve heard
oh yeah, that too. but my understanding is this is mostly a “makes you good at test taking” thing. does it make you exceptionally good at medical research or being a doctor?
H1: Anki Doctors misdiagnose more often, because they get stuck with thought patterns like “Ah, sniffly nose, sore throat, must be… Diseasitis”, instead of correctly thinking “Ah, sniffly nose, sore throat… might be Diseasitis, or the cold, or the flu, or...”
H2: Anki Doctors are worse because they over-fixate on interesting/novel diseases that are more salient in an Anki deck, instead of just putting some thought into what would be the most logical fit for the ailment. “Your big toe hurts? Must be super-cancer+”, instead of “Your big toe hurts… [thinks for a bit] did you kick it on something?”
H3: Anki Doctors are better, because they know a larger range of diseases, and can diagnose more broadly. While a normal doctor might exit med school, and start with a broad range of diagnoses they hand out, perhaps as they gain XP, their range shrinks, and they find their ‘go-to’ diagnoses. On the other hand, an Anki Doctor might exit med school, and keep their range broad, as long as they keep practicing their Anki deck.
a really nice thing with spaced repetition is that you haven’t replaced how your memory functions, just augmented it a bit. it’s still fundamentally your same brain, one with the ability to ask questions like “am I zooming in too much/not considering all options?” or “is what’s salient to me actually what’s happening?” doing spaced repetition doesn’t have any bearing here unless it gives you a false sense of confidence in what you know, or unless doing it crowds out developing other skills.
I’d expect H3 to be true, ceteris paribus, but would also expect the ability to ask questions like the ones above, and to just think better, to be of greater importance, i.e. someone whose thinking habits are more developed and who doesn’t use Anki would be a better researcher/doctor than an Anki savant whose thinking habits are underdeveloped.
For non-standard curricula, probably the overhead of making cards? I have memorized lots of details about AI stuff just from reading and thinking about it; I tried using Anki for this but found the overhead of creating and maintaining a card deck to not be worth it.
Imo, the best way to make cards is to do it while you study.
Read textbook
Highlight things that are discrete packages of info
Once you’ve finished that section, go back and turn the highlights into Q:A pairs
Q: What is the capital of France?
A: Paris
OCR those written down Q:A’s and have Claude format them into proper Anki format, with a skill (Or, you could vibe code a program to do this. I was mostly adding cards before vibe coding was a thing)
Add the output .md file to anki
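For the "vibe code a program" route mentioned in the steps above, here is a minimal Python sketch. It assumes the OCR output is plain text where each card is a `Q:` line followed by an `A:` line, and it emits one note per line as `front<TAB>back`, a plain-text layout Anki's text importer can read when the field separator is set to Tab. The function names `parse_qa_pairs` and `to_anki_tsv` are hypothetical, not part of any Anki tool.

```python
def parse_qa_pairs(text: str) -> list[tuple[str, str]]:
    """Pair each 'Q:' line with the next 'A:' line (assumed OCR output format)."""
    pairs = []
    question = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()  # a new Q replaces any unanswered one
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
        # any other line (blank, OCR noise, orphan answers) is ignored


    return pairs


def _clean(field: str) -> str:
    # Collapse tabs/newlines so each note stays on one tab-separated line
    return " ".join(field.split())


def to_anki_tsv(pairs: list[tuple[str, str]]) -> str:
    """Render pairs as 'front<TAB>back' lines for a plain-text import."""
    return "".join(f"{_clean(q)}\t{_clean(a)}\n" for q, a in pairs)


if __name__ == "__main__":
    notes = "Q: What is the capital of France?\nA: Paris\n"
    print(to_anki_tsv(parse_qa_pairs(notes)), end="")
```

Writing the returned string to a text file and choosing Tab as the separator at import time should yield one Basic note per pair.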
This was the best way I found to study head and neck anatomy. Worked especially well if I’d just come out of the cadaver lab, and was already in the right ‘vibe’
I did Anki for ~3 years, and have now—sadly—dropped it. I loved doing Anki, because it made me look/feel smart. But, I think I got more out of writing daily for a month, than doing Anki for a year.
surely writing every day for a month costs a lot more willpower?
Oh, no way. Hard disagree for me. But Anki is almost exactly the same thing every day, and I struggle to do tasks like that (Cleaning for example).
I found writing every day many times easier. Even though I hacked around with Anki to make it as addictive as I possibly could. It was always something I was sliding off. If I missed a few days, it’d take 30m-1h to catch up, and knowing that I had other work to do, that wasn’t so repetitive… it was hard to do Anki.
Also, I had cards that were like “Who is this k-pop person?” b/c I wanted fun cards in there to keep me motivated, and because I wanted to impress my gf by knowing k-pop people’s names. But that gave me an aversion to doing Anki in public—since people seeing me look at pictures of k-pop people on my phone felt embarrassing. But with writing, I was very willing to do it anywhere and everywhere.
Spaced repetition is good for training declarative (explicit) memory, but procedural (implicit) memory is the more important kind of memory—in science, engineering and life in general.
We can write software to train procedural memory. To learn to touch type for example, most people use an application for that purpose. But you cannot add cards to typing-training software that would make the software useful for learning something other than touch typing: you would need to write an entirely different application.
There isn’t software for training most of the procedural skills people try to acquire because of the sheer variety in the procedural skills people want to learn and because developing software is expensive and because procedural knowledge is acquired automatically in the process of doing things and because most people are intrinsically skilled at choosing things to do (e.g., things just beyond their current ability) that maximize their learning rate (without the aid of software).
Actually, let me qualify that last statement as applying only to the situation before about 2024: since then I wouldn’t claim it impossible for someone somewhere to have used AI to create software that can be used to train a large fraction of all of the skills people want to acquire.
so I’ve been using spaced repetition for ~1.5 years now in my CS degree and life in general. it’s a great solution for lots of problems: no relearning what I understood in class two weeks ago; a cheap way to incrementally refine my understanding over time on topics I haven’t worked on directly recently (which is most of them); the rate of forgetting is slowed down, etc. but to use spaced repetition this way requires learning how to do so; it’s not like with languages, where you can download Anki and a premade deck and you’re good to go. there are also practical problems:
my current prompt-writing style is kind of similar to incremental reading, in the sense that I prioritize cloze cards written quickly over well-written Q/A cards. but even writing in this way is costly. for a 3h lecture I will write ~50-200 cards. even when writing as quickly as possible, almost automatically turning notes into cards, it takes ~30m. then the next morning it takes ~30m to revise them + ~15m to revise the due cards. and that’s just one lecture. then there is the spread-out cost of actually integrating that knowledge with other things I know, updating other understandings, etc. I don’t have any estimate for the time that takes up, but it’s definitely not free. so I doubt that it’s possible to actually remember the entirety of a field. there’s just too much information, and you need to chew on it to be able to make use of it.
similar story to the one above. plus I find that little idiosyncratic details which don’t fit neatly into a wider picture are the most difficult to remember, thereby being the most time-consuming.
all that is to say that spaced repetition makes learning more efficient by solving some memory-related problems, but not so much more efficient that you can just download an entire field into your brain in a matter of months.
How exactly did spaced repetition revolutionize language learning?
I had assumed most elite engineers are using spaced repetition?
I’m not elite but life would be much more difficult without it.
i find it’s much less useful for math/CS than languages.
i’m going to rerun the neurips agi experiment this year. place your bets on what fraction of people at neurips this year know what the acronym AGI stands for!
where are all the people trying to understand how the world works? (in a broad sense that is useful for understanding the trajectory of the world: e.g things like why is society the way it is, why do people behave the way they do; why has technology developed the way it has, etc.; as opposed to zooming in and specializing, e.g fundamental physics research or biomed or whatever) there are a bunch of people like this in the rationalist sphere but i’m curious where all the non-rationalist-adjacent such people are. it seems many people in the broader world are either uncurious or mindkilled on such questions.
Some books you might like to read:
Seeing Like a State by James C Scott (I’ve read most of it, I liked it)
Bullshit Jobs, The Dawn of Everything, most books by David Graeber (I’ve read and liked long extracts of his work)
The End: Hitler’s Germany 1944–45 by Sir Ian Kershaw (I’ve read all of it and found it very valuable as a complete picture of a society melting down)
Open Letters by Vaclav Havel (I’ve read a lot of it, I like it a lot. He was the last president of Czechoslovakia, the first president of the Czech Republic, and a famous dissident under the communist regime, and his writing sketches out both what he finds soul-destroying about that system and what he thinks are the principles of good societies)
System Effects: Complexity in Political and Social Life by Robert Jervis (I’m reading this now, very good case studies about non-obvious phenomena in international relations)
Broken Code: Inside Facebook and the fight to expose its toxic secrets by Jeff Horwitz (Very good book about how social media platforms like Facebook shape and are shaped by modern civilisation, I read all of it)
All of these books to various degrees tackle the things you are describing from a holistic perspective. Hope this helps.
thanks for the recommendations, I’ll add these to my reading list!
I’m also curious if there is a location or social cluster or something where there are a lot of people who read stuff like this and talk about it productively and come up with new ideas. (again, other than the rationalist/ratadj/bay area community—I’m stipulating this because I think the ratsphere and bay area as a whole are a bubble with a lot of other correlated beliefs)
I suppose that sociologists, historians, philosophers, and (especially) futurologists do tackle the questions you describe, though maybe there is a sense in which they aren’t doing so in a zoomed-out enough way.
https://sites.santafe.edu/~wbarthur/thenatureoftechnology.htm
And many Santa Fe people more generally, e.g., https://www.sfipress.org/books/history-big-history-metahistory or https://www.amazon.com/Scale-Universal-Innovation-Sustainability-Organisms/dp/1594205582 or https://www.amazon.com/Making-Sense-Chaos-Better-Economics/dp/0300273770
My understanding is that Bridgewater has a bunch of people like this, but they are unlikely to share their answers with the broader world.
Vaclav Smil is great on this; I really liked his book Growth. He takes a very numerate but still very different view of history (e.g., he fits a sigmoid to GDP numbers in the book).
(I don’t know if the book is good but my knee jerk reaction to fitting sigmoids to things is it’s a bit spooky—see https://arxiv.org/abs/2109.08065)
It is, and it’s the thing I’d most like Smil to read if I could recommend something to him.
Fwiw, I read a number of Smil’s books, and my impression was that he strongly expressed that same opinion about sigmoids, and the mentioned example might have been precisely an attempt to illustrate how you can show anything by fitting the right sigmoid. (But it’s been a while since I read his books)
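One way to see why sigmoid fits can be spooky, sketched with the standard three-parameter logistic curve (ceiling K, growth rate r, midpoint t_0; these symbols are introduced here, not taken from Smil or the linked paper): well before the midpoint the curve is almost exactly exponential, so early data cannot separate the ceiling from the midpoint.

```latex
f(t) = \frac{K}{1 + e^{-r(t - t_0)}}

% Well before the midpoint (t \ll t_0), e^{-r(t - t_0)} \gg 1, so
f(t) \approx K\,e^{r(t - t_0)} = \left(K e^{-r t_0}\right) e^{r t}

% Early data pin down r and the product K e^{-r t_0}, not K and t_0 separately.
% E.g. a ceiling 10 times higher with a later midpoint fits the early points equally well:
K' = 10K,\quad t_0' = t_0 + \frac{\ln 10}{r}
\;\Rightarrow\; K' e^{-r t_0'} = 10K\,e^{-r t_0}\,e^{-\ln 10} = K e^{-r t_0}
```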
idea for a conference:
i want to host a conference which is kind of a cross between an unconference and a hackathon. the goal would be to create an environment where people can spontaneously do random side projects.
there is mostly just a big pleasant space where people can mix around and chat.
anyone can at any time decide to host a session on whatever they want. there’s a discord server where they can announce their session, and big whiteboards.
there is a dedicated work area where the goal is for roughly half of all people to be working at any one time. the rules of the work area are: no working on your normal job.
there is an army of ops and other logistics people available on call to solve any annoying logistical issues.
there is a big pile of random hardware around so you can make physical objects easily.
unlike a hackathon, there is no judging, no winners, no competition, no rigidly defined teams. you can float around and do whatever the fuck you want.
no conference app. if you want to do a 1:1 you have to walk up to the person and ask them to chat.
Maybe a good fit for Summer Camp at the festival season this year!
in the same way that Minecraft teaches you to exercise agency and Factorio teaches you to optimize, are there any games that teach you to stare into the abyss? the ideal game would (a) reward you on a tight feedback loop for constantly admitting that you were wrong, (b) give you the option to not admit that you were wrong but make that decision acutely hurt. pastcasting is good for (a) but not good for (b) because you are sort of forced to confront being wrong all the time, which maybe teaches you that it doesn’t feel as bad as you might expect, but it doesn’t teach you to intentionally seek out things that could prove you wrong; and you don’t really have time to develop an attachment to your wrong ideas. most normal games reward you for staring into the abyss very indirectly because being good at intentional practice makes you do better over the very long run, but you don’t get immediate feedback loops for it, and so it’s easy to just not realize you could be doing a lot better.
Chess. Mistakes in chess usually become noticeable quickly, in just a move or two, and you have no RNG or teammates to blame them on. But to get better you have to acknowledge your mistakes and avoid making the same mistakes again.
i think the problem is that the feedback loop is too long—if you notice a mistake, there is no obvious action, and no immediate feeling of having improved. what you really want is something where you can choose whether or not to notice that you are making a mistake, and choosing to notice gives you immediate positive reinforcement.
Play against a strong chess engine while allowing yourself to undo as many moves as you like at any time and try to find any winning game?
In general I find that I can trace losses in Go games to moments when I acted unvirtuously (e.g. greedily, impatiently, fearfully, arrogantly, etc).
Go is also a long enough game that one mistake seldom sinks you (as long as you’re willing to give up the sunk cost).
How about math olympiads? They do reward you for solving complex problems and require you to admit that your first conjectures were hopelessly wrong (unless, of course, you happened to get them right. Alas, this might come with practice faster than the habit of staring into the abyss)
I mean, it only gets to the stage of staring into the abyss when you spend 1h+ on one hypothesis and get nothing, are getting desperate, and are attached to your idea for a proof of A but realize it’s probably ¬A.
Mostly how it works is you collect observations, then form hypotheses and test a few of those, and mostly you quickly realize what works and what doesn’t. And if I’m stuck and keep doing one thing it’s because I had tried many times to invent something better but I couldn’t. It’s a really, really difficult thing to pull yourself out of this “mode collapse” where you’re banging your head against the wall where there’s clearly a wall, but it’s a different skill from seeing the abyss because 1) it’s easy to notice your approach is lacking something but 2) “not making the mistake anymore” is not blocked by psychology but by g factor or something.
Outer Wilds comes to mind. Or The Witness? Or any of the other “figuring out the rules is the game” sub-genre.
For the thing you’re interested in, how important is the “game” part? (Minecraft and Factorio are both particularly excellent games with rich depth, in a way that pastcasting is not particularly)
The hardest part of “stare into the abyss” is that it’s often about stuff that you’ve wrapped your identity around in a psychologically load-bearing way. When I hear “the Minecraft of staring into the abyss”, I’m imagining something that gets you invested in an overall direction in a complex world, that is the wrong direction, and then gives you the opportunity to change course on your goal.
I think my Planmaking & “Baba is You” exercise is at least related. (In this variant, your instruction is to form a complete plan for solving a Baba is You level on your first try. This gives people a lot of opportunity to get invested in a set of assumptions and keep building on them. People are usually quite overconfident in a way that feels a lot more “gut punchy” than other calibration training)
by game i mean it in a very very loose sense. video games, board games, card games, sports games, strange workshop activities, etc all count.
for the identity load-bearingness, it seems possible you could create it on a short time horizon. for example, even just arguing about something for 10 minutes can make me feel somewhat invested in my position. having teams in general can create some level of this. i feel like if you stacked a bunch of different psychological tricks you could kind of approximate it. even just getting used to the meta has this—i often find that i stagnate in a game because i learned some suboptimal meta, but i feel some emotional avoidance towards learning better meta because the displeasure of losing is less than the displeasure of learning the new meta; and the feedback loop of winning slightly more often from better meta is not very easily felt.
possibly you can design a game where you constantly have to accept better meta to even progress at all through the game. similar to how it is almost impossible to play Factorio without automation even though it’s technically possible.
i think it’s undesirable to have a game with one big twist that you build up to. for feedback loop reasons you want to have to do it over and over again and consistently get reward when you gaze into the abyss and not get reward when you don’t.
Calibration games such as https://www.quantifiedintuitions.org/?
(a) You can choose to be wrong/overconfident, or you can acknowledge you don’t know when you don’t know. Acknowledging is rewarded.
(b) The game pushes you to be overconfident by making you want to be top 1 (beat other teams). And it hurts to see your ranking if you are failing.
Not sure I buy the premise that (a) is needed or even good? I mean, part of abysses is that they don’t offer immediate feedback. What about a video game where everything is basically one-shot? You can spend as long as you want preparing, including gathering resources and doing science to the environment, and then you get one big shot; if it goes well you win, if not you lose and lose all your progress.
maybe if you were trying to make a game to teach the feeling of having one try to solve alignment, sure. but that’s not the game i want here.
if you want to get better at anything, including gazing into the abyss, then you want to get as many quality reps as possible in a fixed amount of time. a rep is higher quality if the feedback loop is tighter, and if the abyss is more painful to gaze into. if we had mind-reading tech, what you’d want is to prompt the user to reflect on things that are emotionally painful, detect the moment they push past the resistance to confront the emotion, and dump nicotine into their bloodstream 3 milliseconds later. unfortunately, we don’t have this technology, so we need some other way to do this
I’m saying that the bottleneck isn’t getting the feedback really fast, it’s having abysses to stare into at all. So my proposal is aimed at generating lots of abysses in the first place.
an idea: a game where there are several distinct but mutually exclusive strategies (eg a shooter where you can be a sniper, or a bullet sprayer, or a tank, etc), where you have to invest a bunch of time into specializing, and then you feel sunk cost about switching to a different strategy; but make the environmental conditions constantly change (in subtle or hard to reason ways so you have to spend a bunch of effort to notice things changing / there is plausible deniability as to whether things changed or whether you were always suboptimal), so that the optimal strategy changes frequently; and make there be strong diminishing returns to further investment in a strategy, which simultaneously makes the sunk costs feel bigger, and makes the initial gains from switching strategies feel very large so when you switch strategies you very quickly start winning.
I wonder if there’s a question-asking game, preferably one-on-one, that would encourage this? Something akin to NYT’s 36 questions to make anyone fall in love, but instead 36 questions to stare into the abyss. Getting the right interlocutor and the right questions would be hard to do, though.
It’s not a game, but it is a structured activity.
I’m skeptical that you can really get the abyss in small doses. Maybe there’s also a progressive activity where the first exercises are small things to admit about oneself, before progressing to more and more difficult questions.
idea: a cold shower connected to an IV drip that delivers a microdose of some habit forming chemical
If you’ve ever had a long match of Go where you are losing from midgame onwards, you will feel quite a lot of these emotions. Go games can last for quite some time, and the fractal nature of your mistakes can be realised to a fairly high resolution. Especially if your opponent is higher rank than you so you are playing with a handicap (“but I had so much ground at the start?!! How did it go so wrong?!?!?!”)
how do you avoid just closing the game without going back through all of your mistakes?
If you are playing a real opponent they can review it with you, or a tutor can do the same.
Rain World is a survival platformer whose protagonist is a nimble omnivorous tool-user (a similar niche to an ancestral human’s). The prospect of exploration is enticing, but you are in the middle of the food chain, and so must balance the need to survive with your own drive to explore. Your creature must:
evade predators,
find sufficient food/prey to hibernate,
take shelter before a lethal rainstorm arrives.
Exploring means doing the above in less time. Regions are gated based on minimum survival streak so each sortie is like a bet on your ability. There are carnivorous plants. It is difficult and stressful. I highly highly recommend it.
my life would often be better if I exercised more agency. why don’t I do so more often? here is a taxonomy of reasons I’ve noticed:
energy: I’m often very fatigued, which makes it much harder for me to do anything, which includes anything new.
decision fatigue: a related thing is even for a given amount of energy, I have a limited number of decisions I can make, and a limited number of things I can focus on and think carefully about.
emotional avoidance: sometimes, exercising agency requires admitting that I’ve been doing things wrong all this time, or that part of my identity is not what I want it to be, or confronting some past trauma. sometimes I identify as being bad at X, which makes it hard to improve at X.
conformism: there’s a critic inside my head that yells at me when I do or even consider things that could be considered “crazy” by others. I ignore it more than most people, but it still has nonzero say.
uncreativity: in certain domains I’ve spent so much time thinking about a specific kind of idea that it becomes genuinely hard for me to even imagine other ideas.
cowardice: sometimes the ideas are obvious but they require large irreversible actions, and/or are likely to have unpredictable consequences.
I have an instrument that helps me mitigate decision fatigue, lack of energy and conformism. I created an updatable random generator with a weighted list of all the ideas and activities that cross my mind. I exercise agency at the stage of “designing” how my free time is spent: setting probability weights and side goals. Then I can circumvent the fatigue of deciding what to do next, because I can click generate and see the option. And since clicking the generator button is a short action, a habit of actually going and executing the option can be formed.
oh wow this is a really cool idea. i’ll give it a try
Sounds interesting.
How often do you adjust this generator? What tasks do you put on it? Do you have any examples? Do you find yourself min maxing probabilities?
The min-maxing trap doesn’t happen. I am only allowed one edit per finished session, and that edit can only be an increment or decrement by 1 of some parameter in the generator, which takes ~15 seconds. If my priorities change, the generator will eventually converge to the new state through the increments (easing in over a couple of days). That prevents me from getting hyped and going all in on some exciting new project. The new project gains weight only if it keeps looking worthwhile.
I may adjust at the end of a session if I feel that something should happen sooner/more often, or something was prompted too often over the past week, or I felt that the session length was inappropriate for making progress that sticks, etc.
I put anything I want to do eventually. That includes “work on the publication”, “work on fanfic”, “make the geolocation script”, “update my transformer”, “play a match of king of the hill chess”, “calisthenics”, “solve project euler problem”, “go to the cinema and watch avatar 3”. Both fun and serious stuff.
I only generate an activity when there is a moment with no scheduled or obligatory activities; this way it never interferes with life, even if fun activities start randomly appearing more often.
The generator is implemented in Google Sheets, using its in-cell functions, so it is accessible on both my PC and my phone. At some point I added a column of calculated expectations, “what percentage of time is expected to be spent on the activity if the generator did not change parameters over the long run”, but it was distracting and not especially meaningful, since I change weights every day to reflect energy/mood/inspiration.
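The generator described above boils down to weighted random sampling plus a deliberately tiny update rule. Here is a minimal Python sketch of that logic, assuming whole-number weights and sessions of roughly equal length; the commenter’s actual version lives in Google Sheets in-cell functions, so the function names here (`pick_activity`, `nudge`, `expected_share`) are hypothetical. The “expected share” column is just each weight divided by the sum of all weights, which equals the expected share of time only if every session is the same length.

```python
import random
from typing import Dict, Optional

# Hypothetical representation: activity name -> non-negative integer weight.
Weights = Dict[str, int]


def pick_activity(weights: Weights, rng: Optional[random.Random] = None) -> str:
    """Draw one activity with probability proportional to its weight."""
    if rng is None:
        rng = random.Random()
    names = [name for name, w in weights.items() if w > 0]
    if not names:
        raise ValueError("need at least one activity with positive weight")
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def nudge(weights: Weights, name: str, delta: int) -> Weights:
    """Apply the single allowed post-session edit: +1 or -1 to one weight.

    A new activity enters at weight 1; weights are clamped at 0, so an
    activity can be phased out but never goes negative.
    """
    if delta not in (1, -1):
        raise ValueError("only +1 or -1 edits are allowed")
    updated = dict(weights)
    updated[name] = max(0, updated.get(name, 0) + delta)
    return updated


def expected_share(weights: Weights) -> Dict[str, float]:
    """Long-run fraction of draws per activity if the weights never change."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    w = {"work on the publication": 2, "calisthenics": 1,
         "play a match of king of the hill chess": 1}
    print(expected_share(w))  # 0.5, 0.25, 0.25 for the three activities
    w = nudge(w, "calisthenics", +1)  # one edit after a finished session
    print(pick_activity(w))
```

Because each session permits only a ±1 change, a sudden enthusiasm can raise an activity’s share only gradually, which is the anti-hype property described in the thread.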
My experience is that sleep + gym ease most of these somewhat if I’m currently lacking on those dimensions.
Could you help me with something? How did you identify specific events or choices in your recent past where more agency was required?
I’m very confident that I also would benefit in life from exercising more agency. However, it’s a vague confidence; I can’t offer any evidence or specific examples. When I try to list situations where I could have acted with more agency, there’s not any example where it’s clear that agency would probably have made things better, or at least had a better chance of a better outcome than of making things worse. I can’t think of any recent retrospective examples where the difference between what I got, and what seems like a potentially better outcome, was simply not using enough agency. (As opposed to “oh, I wish I knew that earlier”: I think there’s an important distinction to be made between hindsight that comes from knowing what you couldn’t have possibly known at the time, and not acting successfully on the information you had at the time.)
One example off the top of my head—I need to buy a book for study. Correct me if I’m wrong, but acting with more agency would have been buying it straight away. But I’ve procrastinated so far, and would my life be materially any different yet? No. Not at all. If I continue to procrastinate past a certain point, it will have negative consequences. But so far, none. Next week? Eh, even then it may be fine.
That’s the first, most immediate, pressing example that comes to mind.
How did you remember yours?
idk exactly how, they just pop up to my mind easily. maybe because i am very aware of the things i’m disappointed about not having done. also, i can consult my todo list, which is effectively a list of things i will never do because i don’t have enough agency. like i’m going to set a timer for 10 minutes and write as many things as i can think of:
i’ve known that exercise is really important since forever but have never really gotten around to doing it
i keep procrastinating scheduling meetings with people i should meet with
i often have entire months of very low productivity where it’s a slog to do anything and in the past it never occurred to me to take any of that time off even though i have way more PTO than i need
there are many instances where i finished doing a thing and then realized i should have started doing some other thing in parallel instead of waiting on it.
there will be multiple times a month where i will realize my life would be easier if i had planned ahead even a little bit. for example i will forget to book a flight until the last minute, at which point it’s harder and more expensive to get good seats, even though i knew my travel plans much earlier.
i will often do things the hard way for way too long, even after i know there could be a better way, because i’m too used to the old process and don’t want to expend the trivial amount of effort needed to check if there’s a better way. for example it took me an embarrassingly long time before i learned to use a debugger, because i was used to print-statement debugging
i often procrastinate things because there is a trivial amount of inconvenience required to do it. like it took me years to get around to various medical interventions which really only required me to do some googling and then talk to a doctor twice and whatever.
hopefully these examples help you think of similar things in your life
there seem to be three different possible levels of manager involvement in individual researchers:
type 1 (the grantmaker): the manager spends a day reading the researcher’s grant application and decides whether to fund him for the next 6 months based on his track record, whether the research idea makes sense, etc. day to day, the grantmaker is completely uninvolved in the actual research. in fact, the grantmaker might not even be making decisions about individual researchers, but about entire organizations.
type 2 (the research manager): the manager spends a few hours every month getting complete context on what each of their reports is working on. each report has a different project that the manager thinks makes sense. the manager provides a light touch to guide the project according to their research taste, and vetoes any terrible ideas, but most of the time the report makes most of the day to day decisions about what to do.
type 3 (the tech lead): the manager sets research taste for their team and basically orders around their minions. the manager provides almost all of the research taste, and the reports are either just doing engineering work, or doing very tightly scoped research on things the manager has cleared.
what I’ve seen at openai is a mix of all three, but I’ve found the second one to be the most predominant (my relationship with my managers has been like type 1.75 on average). high level research leaders are generally the first type; they allocate resources to parts of the org they believe to be the most important for the company as a whole. the second type is most people; most researchers have some random direction they believe in, and they are part of a team doing something similar, and their manager has substantial trust in their research taste. even for junior researchers, the focus is on having them develop their own taste eventually, rather than just executing on things for their manager. the third type is most common for large projects, where there is simply a lot of work to be done for the thing to work, but it’s pretty rare to have individual very good researchers amplified by an army of engineers doing their bidding or something like that.
I don’t know if this is an optimal way to set things up. and ML is a weird field among research disciplines. but it seems to work at least somewhat well, and may be of interest to attempts to understand how to do science well.
Type 2.25 is probably most common in the low-mid levels of academia. I see my supervisors officially around 2/month, but they’re around for me to talk to a few times a week.
Type 1 is common in higher levels, type 3 is common for lower levels. This overall seems pretty natural and I’d expect it’s a common feature of many orgs.
a great way to get someone to dig into a position really hard (whether or not that position is correct) is to consistently misunderstand that position
almost every single major ideology has some strawman that the general population commonly imagines when they think of the ideology. a major source of cohesion within the ideology comes from a shared feeling of injustice from being misunderstood.
There are some people that I’ve found to be very consistently thoughtful—when we disagree, the crux is often something interesting and often causes me to realize that I overlooked an important consideration. I respect people like this a lot, even if we disagree a lot. I think talking to people like this is a good antidote to digging yourself into a position.
On the other hand, there are some people I’ve talked to where I feel like the conversation always runs in circles so it’s impossible to pin down a crux, or they always retreat to increasingly deranged positions to avoid admitting being wrong, or they seem to constantly pattern match my argument to something vaguely similar instead of understanding my argument. I think arguing against people like this too much is actively harmful for your epistemics, because you’ll start digging yourself into your positions, and you’ll get used to thinking that everyone who disagrees with you is wrong. There are a bunch of people (most notably Eliezer) who seem to me to have gone too far down this path.
On the other side of the aisle, I don’t know exactly how to consistently become more thoughtful, but I think one good starting point is getting good at deeply understanding people’s viewpoints.
The people who understand the proper interpretation of the ideology can feel intellectually superior to those who don’t. Also, people who misunderstand something are by definition wrong… and therefore the people who understand the ideology correctly must—quite logically—be right!
(An equivocation between “be right about what is the correct interpretation of the ideology” and “be right about whether the ideology correctly describes the reality”.)
I think this is a subset of:
irritating people when discussing the topic is a great way to get someone to dig into a position really hard (whether or not that position is correct).
That irritation can be performed any way you like. The most common is insinuating that they’re stupid, but making invalid meme arguments and otherwise misunderstanding the position or arguments for the position will serve quite well, too.
I think this follows from the strength and insidious nature of motivated reasoning. It’s often mistaken for confirmation bias, but it’s actually a much more important effect because it drives polarization in public discussion.
I’ve been meaning to write a post about this, but doing it justice would take too much time. I think I need to just write a brief incomplete one.
I don’t think being irritating in general is enough. I think it’s specifically the feeling that everyone who has disagreed with you has been wrong about their disagreement that creates a very powerful sense of feeling like you must be onto something.
Really!? Okay, I’ll have to really present the argument when I write that post.
I do agree with your logic for why opponents misunderstanding the argument would make people sure they’re right, by general association. It’s a separate factor from the irritation, so I think I misstated it as a subset (although part of it seems to be one; it’s irritating to have people repeatedly mischaracterize your position).
It seems pretty apparent to me when I watch people have discussions/arguments that their irritation/anger makes them dig in on their position. It seems to follow from evolutionary psychology: if you make me angry, my brain reacts like we’re in a fight. I now want to win that fight, so I need to prove you wrong. Believing any of your arguments or understating mine would lead to losing the fight I feel I’m in.
This isn’t usually how motivated reasoning is discussed, so I guess it does really take some careful explanation. It seems intuitive and obvious to me after holding this theory for years, but that could be my own motivated reasoning...
Unfortunately, I think the average person doesn’t understand misunderstanding. I think it can be taken as…
You’re too dumb to understand
You’re being purposely ignorant
You’re making fun of them
I’ll give an example:
— I was recently in a conversation with a non-rationalist. I organised it because I wanted to talk about negative opinions they had on me. We talked a lot about certain scenarios.
In one scenario, I had seemingly rejected the suggestion that we all go bowling. I had said out loud “I hate bowling”. When what I meant was “I hate bowling, but I’m still very happy to sit in a bowling alley and watch my friends play.”
I think I did a bad job communicating there. It made my friends very angry (extra details about situation left out).
During our conversation, I asked for levels of anger or annoyance at me before and after I had explained what I had meant to say. I was surprised when one friend didn’t adjust their anger levels at all. I thought I must have done another bad job at explaining.
“So, you started at 80% angry at me. And now that I’ve told you my perspective, you’re still 80% angry?” This surprised me. I would adjust my levels down if someone explained that to me.
I went back and forth trying to get to the bottom of this for ~half an hour, after which I came to realise we were just wired very differently. To do this I used your suggested technique. In the time it took for me to understand this one point, I had deeply annoyed my friend. They were under the impression that I was somehow misunderstanding them on purpose. I think I would have been less comfortable or fulfilled, but better off, if I had just accepted that they were still very angry and moved on. Instead, being confused and asking questions made my situation worse.
To be clear, though, I did get to the truth with this technique. But sometimes winning can’t be about knowing the truth, which is sad. I don’t like that, but I think it is true.
It is unfortunately impossible for me to know exactly what happened during this interaction. I will say that the specific tone you use matters a huge amount—for example, if you ask to understand why someone is upset about your actions, the exact same words will be much better received if you do it in a tone of contrition and wanting to improve, and it will be received very poorly if you do it in a tone that implies the other person is being unreasonable in being upset. From the very limited information I have, my guess is you probably often say things in a tone that’s not interpreted the way you intended.
I’d say that’s a good guess given the information I provided.
I think I did a good job in this particular circumstance of coming off as confused or curious. That was my aim, and I placed a lot of focus there. However, I haven’t listened back to the audio recordings of the conversation, so it’s likely my previous comment is heavily biased.
hypothesis: intellectual progress mostly happens when bubbles of non tribalism can exist. this is hard to safeguard because tribalism is a powerful strategy, and therefore insulating these bubbles is hard. perhaps it is possible for there to exist a monopoly on tribalism to make non tribal intellectual progress happen, in the same way a monopoly on violence makes it possible to make economically valuable trade without fear of violence
Continuing the analogy:
You’d want there to be a Tribe, or perhaps two or more Tribes, that aggressively detect and smack down any tribalism that isn’t their own. It needs to be the case that e.g. when some academic field starts splintering into groups that stereotype and despise each other, or when people involved in the decision whether to X stop changing their minds frequently and start forming relatively static ‘camps,’ the main Tribe(s) notice this and squash it somehow.
And/or maybe arrange things so it never happens in the first place.
I wonder if this sorta happens sometimes when there is an Official Religion?
another way to lean really hard into the analogy: you could have a Tribe which has a constitution/laws that dictate what kinds of argument are ok and which aren’t, has a legislative branch that constantly thinks about what kinds of arguments are non truthseeking and should be prohibited, a judicial branch that adjudicates whether particular arguments were truthseeking by the law, and has the monopoly on tribalism in that it is the only entity that can legitimately silence people’s arguments or (akin to exile) demand that someone be ostracized. there would also be foreign relations/military (defending the continued existence of the Tribe against all the other tribes out there, many of which will attempt to destroy the Tribe via very nontruthseeking means)
unfortunately this is pretty hard to implement. free speech/democracy is a very strong baseline but still insufficient. the key property we want is a system where true things systematically win over false things (even when the false things appeal to people’s biases), and it is sufficiently reliable at doing so and therefore intellectually legitimate that participants are willing to accept the outcome of the process even when it disagrees with what they started with. perhaps there is some kind of debate protocol that would make this feasible?
Prediction markets? Generally, track people’s previous success rates about measurable things.
prediction markets have two major issues for this use case. one is that prediction markets can only tell you whether people have been calibrated in the past, which is useful signal and filters out pundits but isn’t very highly reliable for out of distribution questions (for example, ai x-risk). the other is that they don’t really help much with the case where all the necessary information is already available but it is unclear what conclusion to draw from the evidence (and where having the right deliberative process to make sure the truth comes out at the end is the cat-belling problem). prediction markets can only “pull information from the future” so to speak.
BTW, I like the “monopoly on violence” analogy. We can extend it to include verbal violence—you can have an environment where it is okay to yell at people for being idiots, or you can have an environment where it is okay to yell at people for being politically incorrect. Both will shape the intellectual development in certain directions.
Conflicts arise when you don’t have a monopoly, so sometimes people get yelled at for being idiots, other times for being politically incorrect, and then you have endless “wars” about whether we should or shouldn’t study a politically sensitive topic X with an open mind, both sides complaining about lack of progress (from their perspective).
The more mutually contradictory constraints you have, the more people will choose the strategy “let’s not do anything unusual”, because it is too likely to screw up according to some of the metrics and get yelled at.
hypothesis: there will be a window of time after the point of superhuman AI persuasion/charisma, during which human trust relationships will become extremely important. even once almost all human skills are obsolete, the AIs may have less social trust capital than humans. ofc, eventually, the persuasion will be so superhuman that it can cut through minds like butter, but that could take many years.
once AI superpersuasion is possible, there will be a strong incentive to use it to shape opinions. therefore, there will also be a strong incentive for important decision makers to develop strategies for making sure they are not being bamboozled.
thankfully, there is one way to not get bamboozled by a superpersuader—to not listen to it in the first place. this is an age old idea. many social strategies have evolved to help people avoid talking to other superpersuading people. for example:
one main limiter of the growth of cults is people get used to noticing the signs and avoid people who seem culty, get suspicious of any rapidly growing movement, etc
people avoid preachers, political activists, etc
people have learned that AI chatbots are sycophantic and discount their praise
people mostly know that twitter and reddit are full of astroturfed bots
it seems likely that society would develop mechanisms in response to superpersuasion.
first, conversations with people you’ve actually met irl or can indirectly trust to be real people would be much more trustworthy than conversations with random internet people, who might secretly be AIs.
people you’ve known for a long time are more trustworthy than people you’ve just met irl. you’re better able to judge whether a person you know irl has suddenly changed their views than someone you just met, or to know them to be the kind of person who honestly reports how much they talk to the AIs.
meetings with people irl are more trustworthy than online interactions, because someone could always give the AI a deepfake of themself.
even if you can be sure you’re talking to a real human, there is still the possibility that some people you know well will choose to become appendages of the AI, subtly taking orders from the AI and concealing this fact. there is a spectrum of possibilities here.
one extreme is like Scott Alexander’s fictional whispering earring—an AI that puppets every move of yours, using you as a meat puppet. however, we can defeat this quite easily—you can take meetings inside a faraday cage, or force people to not have any headphones and to leave their phone outside the room, or to pat them down for listening devices. this may sound extreme, but this is actually not that intense all things considered for the actually important decision making.
the other extreme is an AI that only gives you brief guidance on what to do every few months and you’re on your own in between. the key thing is, the vast majority of humans will not be as charismatic as the AI. so, if the AI is not puppeting you, it forfeits a lot of the charisma benefit; even if it converts you, you might suck at convincing other people, and they might notice the fact that an AI is trying to steer you or coach you on how to be more charismatic—again, much easier for others to notice if you have a lot of connections who know you well.
our current political system is already kind of like this, because politicians are constantly exposed to an infinite array of people trying to change their minds on things, or get the ear of their trusted advisors, or their advisor’s advisors, etc. AI makes things a lot more intense, but also it introduces a natural pinch point to remove bad influence.
I’m not sure I understand your model of individual autonomy and variance in susceptibility to different kinds of persuasion. Listening to the wrong humans is at least as bad as listening to an AI. And listening to nobody isn’t really an option in modern society.
as an example of my confusion/disagreement:
I mean, this depends entirely on which people you’ve known, and who you’re meeting now. Someone from a very dysfunctional childhood meeting a social worker is in the opposite situation.
OP seems to be referring to the situation after AI superpersuasion with that statement.
I dunno if this will matter, because:
computer-domination could mean control of everything that online people see, including control and impersonation of other users (so even if you know someone irl, if they’re not physically near you, a powerful AI can act like them)
of course, many people willingly like and enjoy talking to AI, some even see past the annoying “AI” style quirks while liking it
already living in an atomized, community-starved culture, these people are also more likely to be ‘paid from AI’ gladly
the idea that computer-domination wouldn’t immediately extend to humanoid-ish or physical-world manipulation seems unlikely; it’s hard to imagine a superhuman AI not ruthlessly testing physical-world manipulation as the next place to master (and then it seems this scenario would likely become irrelevant)
i feel like the fundamental mistake the project of rationality made was that “cognitive biases” is not in practice the right way to think about the way humans are irrational if your goal is to be very instrumentally rational. one hypothesis is the correct frame is to first deeply understand how the emotional system works, and then to think about ways to master that system to achieve rationality.
(yes, i know that buried somewhere in the sequences it says something like “humans aren’t ideal intelligences with cognitive biases bolted on. we are the cognitive biases, they are just trying to approximate rationality”.)
By the time I went to CFAR in 2019 this felt like it had already become the dominant flavor of inner-circle rationalist thinking, but then that inner circle kind of petered out in influence. The person I see carrying that torch most loudly in my current social atmosphere is Chris Lakin.
But overall rationality has been kind of quiescent imo! Ray posts good stuff, Duncan has his own thing, but it feels like we went from mid-2010s “rationalists talk a big game but don’t get anything done” to the mid-2020s most influential rationalists being too object-level busy to blog much about this metacognitive stuff.
If we understand “irrational” to mean something like “underperforming relative to what should be feasible”, then I think one significant piece of the puzzle is the regime of “very impoverished hypothesis spaces”. In this regime, Bayes (and deviations from it) is more of a peripheral conceptual frame (allowing you to understand some edge cases and some basics and some constraints) than a central guide or even that much of a useful tool. A much more important question is about hypothesis generation, aka abduction or “unupdating”, i.e. expanding your hypothesis space. Another piece of the puzzle is that hypotheses are nothing like full possible worlds (as in many elementary models of epistemology), but rather are very-partial-possible-world-parts. Yet another piece of the puzzle is that concepts are very much not only or even mainly about prediction, propositions, and explanation (narrowly construed), but rather mainly about manipulation (including mental manipulation, e.g. “how could I have thought that faster or more efficiently”). Understanding how to think well in this regime, specifically in cases where you don’t already have all / almost all of the understanding you need to win in some situation (which cases are of outsized importance in the grand scheme of things), is heavily weighted on “how do I think new things”, which is of a different flavor from “how do I think the probabilistically-right things”. (Which isn’t to say that reality-testing isn’t crucial (it is), but it has more of a flavor of searching for high confidence data and beliefs, rather than trying to update efficiently. It’s more important to build a camera / telescope than to practice your night vision. For Wisdom, it’s much more important that you are able to eventually update on all dimensions, rather than that you do update quickly on some dimensions.)
I think a big part of rationality is having all the algorithms in your brain “point in the same direction”. E.g. akrasia is a special case of this, where short-term “habits” and consequentialist reasoning are in conflict.
an idea that i associate with bay area strains of buddhism is something like “life is just a series of distractions, you are distracted from distraction only by even more distractions, and it’s all because you are experiencing suffering that is too uncomfortable to focus on, so if you somehow dispel all of it you are confronted with the existential dread of the impermanence of life, and dispelling that is the final boss.” i might be completely misunderstanding something, I’m not a Buddhism expert by any means, please correct me if I’m wrong.
I’ve updated a lot towards something like this being at least kind of true. it seems like at least for a certain neurotype of person, much of one’s behavior (ambition, addiction, hedonism, status, socialization, etc) serves as a way to distract from some kind of emotional pain. it looks different in different people; sometimes it looks like working a zillion hours so you have no time to reflect; or making status/money go up for its own sake; or drinking heavily; or playing video games; or spending lots of time at social events. the commonality is escaping the experience of the present in some way. in much the same way that you will flinch when on the threshold of a cold shower, because the anticipation of the discomfort tries to push you away from it, you flinch away from emotional discomfort. generally, confronting the pain is better than letting it fester.
of course, one very big source of emotional pain for many people is the impermanence of life and the associated existential dread. the idea of the inevitability of death is very spooky. contrary to the standard story though, i’d guess there are also many other sources of emotional pain, which vary by person. maybe if i introspect deeper i will discover that everything is secretly about impermanence deep down, but idk.
it also seems clear that some people are a lot more avoidant/distraction-prone than other people. at the most extreme are people with the ADHD neurotype. in some way, we can think of ADHD as emotional dysregulation that causes constant pain, which can be escaped primarily by distraction through entering flow state (which allows you to stop experiencing the self) or constantly fixating on different things to stay busy and avoid having to confront the emotional pain. this also explains stimming (self-soothing) and procrastination (some tasks have negative emotions associated with them, so they are avoided). I’m not sure if forgetfulness can be explained by this theory, though. on the other end, some (most?) people are naturally very good at living in the moment, are emotionally in tune, and generally do not exhibit any of the ADHD traits. i have no idea if any of this applies to normal people. my guess is it’s still applicable overall but less so, but also that there are a small number of exceptions, forms of emotional pain that normal people are actually more susceptible to than ADHD/autistic people. as weak evidence, even non Buddhist literature sometimes likes to make fun of normal characters for things like this.
also, i still think some parts of this are utterly crazy. the part i most disagree with is the part where you’re supposed to dissolve the pain of impermanence and fear of death. maybe many other fears should be dissolved, but it is good that impermanence is uncomfortable! you should be afraid of death and fill the void of existential dread with an ambition to end death forever.
the part where there is no such thing as pleasure, just distractions and the absence of suffering, also seems kind of crazy to me.
my overall stance on how to relate epistemically to things like Buddhism is that I’d guess they contain many nuggets of wisdom, but i should also remain deeply skeptical of the specific ways they’ve put the ideas together.
You can work to end something without being afraid of it or finding it uncomfortable. A programmer looking for the cause of a bug in software is usually not afraid of the bug. If they did keep flinching away from the thought of the bug and finding the whole debugging thing uncomfortable, they’d do a worse rather than a better job at debugging.
If you want to end death, you’ll do a better job at it if you can think about it clearly and not flinch away from considering things suggesting you personally might not make it. This requires not being afraid of it.
“but it is good that impermanence is uncomfortable! you should be afraid of death and fill the void of existential dread with an ambition to end death forever.”
Are pain and fear the reasons you want an end to death? If not, their absence shouldn’t hinder you?
This is an incorrect description of the Buddhist position. Pleasure traditionally plays a really important role in Buddhist worldviews and practices! The first three jhanas have pleasure (sukha) as one of their defining factors, and it’s also part of the definition of one of the brahmaviharas, mudita (sympathetic joy).
I thought the idea was that pleasure is not actually the opposite of suffering, in fact it’s only a distraction from suffering, and the only true solution to suffering is to stop desire?
There are a few different claims:
One claim is that tanha (“desire” is a bad translation) is the thing that creates suffering, and that tanha comes up as resistance to pain and clinging to pleasure.
It seems reasonable to me to claim that there at least sometimes is clinging to pleasure. If I’m reading a really good book I might not want it to end, and if I’m sleepy in bed, I might want to enjoy the sleepiness for just a few more minutes before getting up.
There’s also another claim that I don’t fully understand and I’m not sure if I know how to describe correctly, but it’s something like… “At a low level of implementation if you zoom in closely enough, the relationship of pain and pleasure isn’t that of opposites in the way you’d expect if you only looked at a coarser level.”
As I said, I don’t totally get that one, but I don’t think I need to. It sounds like a technical kind of observation that will make more sense to me if I ever get that far in meditation, and until then I can just ignore it.
That said, I think it can be true that pain and pleasure are not opposites on a low level, while still feeling like opposites when experienced on a higher level. It’d be like saying that on a file system level, both a text file and a video are just collections of bits, while still behaving very differently if you try to open them on the application level.
With regard to “the only true solution to suffering is to stop tanha”, one thing that might be meant by it is “even when you are experiencing pleasure, there’s still some degree of suffering present”.
At least personally, I do often experience at least some degree of clinging to pleasure, so I think there’s some truth to this. Though of course, it doesn’t mean that there would be an equal amount of suffering present when I’m in pain vs. when I’m experiencing pleasure. The degree of subtle clinging that’s present with pleasure is much less than the suffering associated with resisting pain.
Though on occasion I’ve lucked into states where there’s much less clinging than usual to the pleasure, and it has felt pretty awesome! Can recommend. The lack of clinging lets you enjoy the pleasure much more fully.
But let’s be skeptical of this and assume that pleasure isn’t associated with any degree of suffering. Another thing that one might mean by “the only true solution to suffering is to stop tanha” is “you can’t experience only pleasure all the time, you’re going to have all kinds of periods in life when you feel terrible or at least uncomfortable, so the only way to stop suffering entirely is to stop feeling averse to pain and discomfort”. That seems straightforwardly true to me.
is clinging to pleasure isomorphic to emotional avoidance? the “resistance to pain” framing is more relatable than clinging to pleasure.
i think emotional avoidance is obviously bad. but it’s unclear that clinging to good experiences is bad? like you can obviously overdo it, but it seems much more of a problem if you’re clinging to something you’re no longer enjoying because letting go would mean having to confront unpleasant realities, vs clinging to something you’re enjoying a lot. if I’m enjoying a cookie, it’s normal and correct for me to resist if someone is trying to tear it out of my hands.
It’s often not the worst thing in the world, but the clinging introduces a low-level resistance to the pleasure that actually makes the pleasure less pleasant.
It also brings in various kinds of discomfort—e.g. someone who sees a tasty food they can’t eat, or an attractive person who isn’t into them, may experience a flash of pleasure from the sight of those and then make themselves uncomfortable by clinging to that experience and repeatedly thinking about what they can’t get.
If you are, say, at work and clinging to the thought of how nice it would feel to go home, your focus being on that future pleasure may prevent you from noticing things that you could enjoy in the current moment (it actively blocks them, as the clinging crowds the pleasant sensory moments out of your mind). Someone may find themselves repeatedly waiting for the end of the day when at work, repeatedly waiting for the weekend when the day ends, and repeatedly waiting for the next vacation during weekends.
Joe Carlsmith writes:
I feel the “seems to hijack attention and agency” thing is particularly prominent when in the company of people I’m attracted to or whose favor I might otherwise want to get. It’s hard to think clearly when the mind keeps clinging to “what would get them to like me”.
Addictive behaviors also have clinging to pleasure as a major component—you think of how it would feel if you got to gamble/drink/whatever again, and then you can’t get the thought of that out of your head.
hmm, it’s not a priori obvious to me that these are all the same thing. the following feel distinct:
emotional avoidance. i know i should think about something, but the emotional pain of thinking about it pushes me away, and i find other things to distract myself from having to feel it, or procrastinate it, or make myself too busy to feel it, etc. i relate to the thing where indecision can arise from feeling emotionally avoidant of two different things i need to trade off between. addiction is often a way to distract myself from an unpleasant emotion.
longing. i want something i can’t have. i find myself compulsively thinking really hard about ways i can get it (often not super productively; my thinking is often very distorted in this mode). i don’t feel particularly avoidant of the feeling; if anything, it draws me in and distracts me from everything else. but it also feels inaccurate to say that everything else is causing me pain that i am trying to distract from (like with addiction); if anything, the longing is more painful, but i can’t look away.
other emotions mentioned feel like they’re maybe doing something else but it’s unclear. jealousy feels like longing plus some kind of tribal status thingy. anxiety is maybe longing for a kind of reassurance. the feeling that makes positive experiences worse because you’re scared that they will end is emotional avoidance.
i wonder if it even matters whether these are the same thing or different things. is there any difference in prediction if they are the same vs different things?
I think the underlying mechanisms for several of these are distinct, but there’s a neural mechanism that acts as a kind of shared bottleneck. Analogy: a word processor and a video player will get started up for different reasons, but they both make OS calls to read files from disk, and disrupting the file read operation will disrupt both.
In the case of clinging, I think the shared bottleneck is largely what Romeo described:
For emotional avoidance, my experience is that the thought of doing something will feel bad, and then my mind will seize upon the thought of doing something that would feel less bad. There’s an unpleasant email I should reply to and hmm, suddenly it feels hard to resist the thought of playing Slay the Spire. The thought of it clings to my mind even if I try to think about something else. Without clinging, it might be that the thought of playing StS would come up—maybe repeatedly—but it wouldn’t cling to the mind in the same way and keep pulling at my attention.
So the behavior itself is triggered by a subsystem trying to avoid discomfort by projecting the thought of something more pleasant into the workspace, but the clinging strengthens the effect and makes the discomfort persistent in a way it wouldn’t be without the clinging.
I think the mechanism might be something like, a subsystem sends the thought of playing StS to the global workspace, and then it’s emotionally charged in a way that will cause some separate subsystem to seize upon the signal and strengthen it so that it stays in the workspace. That means that on top of the avoidance-subsystem contributing signal strength to it in face of other sources of signal, the clinging gives the signal an extra boost. (Though that’s just speculation for what might explain the phenomenal experience, so this particular story turning out to be correct or wrong isn’t cruxy for me.)
For longing, there’s a similar thing—there’s some subsystem that sends up a thought of what one is longing for. That subsystem may be something entirely different than the emotional avoidance system. But once it sends its signal, clinging will amplify that signal and make it more persistent.
man vaccines are so fucking cool. it’s awesome that there are like a dozen horrifyingly painful and deadly diseases that i will almost certainly never get in my life. i wish i could get vaccinated against literally everything
https://en.wikipedia.org/wiki/Chiune_Sugihara
There’s a statue of him in Los Angeles’s Little Tokyo which I used to pay respects to when I visited for the New Year’s festival. As I became an EA I would aspire to match or exceed his impact.
always had you as von stauffenberg in my head :)
theory: a large fraction of travel is because of mimetic desire (seeing other people travel and feeling fomo / keeping up with the joneses), signalling purposes (posting on IG, demonstrating socioeconomic status), or mental compartmentalization of leisure time (similar to how it’s really bad for your office and bedroom to be the same room).
this explains why in every tourist destination there are a whole bunch of very popular tourist traps that are in no way actually unique/comparatively-advantaged to the particular destination. for example: shopping, amusement parks, certain kinds of museums.
I used to agree with this but am now less certain that travel is mostly mimetic desire/signaling/compartmentalization (at least for myself and people I know, rather than more broadly).
I think “mental compartmentalization of leisure time” can be made broader. Being in novel environments is often pleasant/useful, even if you are not specifically seeking out unusual new cultures or experiences. And by traveling you are likely to be in many more novel environments even if you are a “boring traveler”. The benefit of this extends beyond compartmentalization of leisure, you’re probably more likely to have novel thoughts and break out of ruts. Also some people just enjoy novelty.
I think this is probably true of you and people around you but also you likely live in a bubble. To be clear, I’m not saying why people reading this should travel, but rather what a lot of travel is like, descriptively.
What fraction would you say is genuinely motivated by “seeing and experiencing another culture”? I don’t doubt that most travel is performative, but I also think most of the people I interact with seem to have different motivations and talk about things from their travels which are a world away from the Pulp Fiction beer in a McDonalds discussion.
what does the prior work on intentional practice for research taste look like?
I don’t know; maybe look at this?
Maybe relevant: https://jsteinhardt.stat.berkeley.edu/blog/film-study
from To the Success of our Hopeless Cause: interestingly, a big tension in the Soviet dissident movement was between people who believed in being 100% virtuous, embracing martyrdom, signing their names and addresses onto their dissenting samizdat texts, protesting to be arrested 5 minutes later and sent to jail, pretending that the letter of the Soviet law actually mattered, etc, vs people who believed in being more strategic and openly illegal and trying to avoid being caught. the former fades in importance because they keep getting arrested (the 1968 red square protest being the turning point).
many of these dissidents have very relatable rat-coded vibes. one guy (Volpin) was obsessed with inventing a form of language that was unambiguous and impossible to make equivocating arguments in, as a way to enable rationalist thought. another (Amalrik) kept getting himself in trouble with his teachers and professors because he would insist on his correctness when they gave him bad grades for saying politically dangerous things, and he kept escalating, attempting to send his writing abroad via the Danish embassy and arousing the ire of the KGB. another (Sakharov) throws away an illustrious, high-status, well-paid career as the number one nuclear weapons engineer of the Soviet Union to express his (quite mild) takes on the dangers of global nuclear war and push for international cooperation and arms treaties.
i feel very lucky never to have been in an environment like this. i don’t think someone like me would have survived. not only have i never ever felt my life or liberty at any threat due to things i say, i’m lucky enough to not even have felt strong pressure to not say true things (obviously within reason, eg no leaking IP) in order to preserve my job or social standing. like maybe i don’t get invited to the cool parties or whatever, but that’s absolutely trivial compared to the prices that some people have been willing to pay to exercise their right to speak.
On a meta topic, I’m curious about your writing style here. Why capitalize the names of books and countries but not the first letter of sentences?
I do that, too, in things not intended for publication (which is the vast majority of things I write). Why? On a computer, capitalization is not needed to help identify the start of a sentence. In contrast, in print, it is not always possible to tell the difference between a period and an unfortunately-positioned blemish in the paper. Since capitalization is not needed on a computer, I prefer to avoid the extra work of typing an upper-case letter. Also, if I re-arrange the wording of a sentence, which I do frequently, then I have to do more work to fix up the capitalization. Finally, capitalizing the start of the sentence destroys what information capitalization of the initial word would otherwise carry. E.g., in conventional orthography, “Mercury” at the start of a sentence could be either a reference to the planet or the element, whereas in the orthography I prefer, the reader knows it is not a reference to the element (because elements are not capitalized).
Grandparent includes a quotation without unambiguously marking it as a quotation (e.g., with quotation marks or a block quote), which I never do in my published writings (and almost never do even in writings that will never leave my computer). This generalizes: in all cases I can recall, those who in public refrain from capitalizing the starts of sentences also violate orthographic conventions I would be extremely hesitant to violate.
where did i include an unmarked quotation? nothing in my post is an explicit quotation from the book.
Ah. Thanks for the correction. (I really like your “deranged surveys” post BTW.)
some of it is autocorrect—it’s easy to turn off sentence beginning capitalization, but i don’t think you can turn off capitalization of proper nouns without turning autocorrect/suggest off entirely. (this is also why i don’t capitalize i, but i do sometimes capitalize I’m). i like lowercase because it feels like a more casual or familiar register. on laptop, not capitalizing is also very pragmatic, because i don’t like pressing the shift key. on mobile, it’s out of a sense of consistency with laptop, and because i endorse laptop vibes more than i endorse phone vibes. as for the book title, tbh, I’m not sure what i was thinking, but one guess is this particular book title is especially easy to accidentally read as part of the surrounding text.
there is a samizdat journal that publishes on bad stuff the USSR does. after the arrests and confessions of Krasin and Yakir, the KGB basically threatens, like, the arrests will continue until The Chronicle stops publishing. the editors put out new issues anyways, saying something to the effect of: we believe it is important to broadcast this info about human rights abuses in the Soviet Union to the world. staying silent would mean accepting the hostage-taking and would go against human dignity. therefore we will keep publishing.
after all the principled people get arrested and put in jail, the remaining people are much more into being “strategic”. Krasin and Yakir are two dissidents who become leading figures of the dissident movement. they adopt a move-fast-and-break-things attitude, doing things like issuing statements and putting people’s signatures on them without getting their consent. for context, unlike the petitions of today, where signing has almost no negative consequences, at that time and place the people who signed these things often got arrested and sentenced to many years in prison. it’s a successful tactic: a bunch of people are outraged, but they’re also afraid of creating fractures in the movement, and they’d look cowardly if they backed out, so they don’t take their grievances further. Krasin at some point brags about being one of the leaders of the dissident movement. anyways, uh, later on in 1972 the KGB captures Krasin and Yakir, threatens them until they break down and agree to collaborate, and then uses their testimony to incriminate dozens of dissidents, forces them to talk to people in the movement to sow dissent, and puts on a press conference to have them tell the world that actually the USSR is super good with human rights, they don’t actually lock dissidents in psychiatric hospitals, all the stuff we said earlier was misguided fake news, we are totally not just saying this because they will lock us and our families away in jail for a long time if we don’t say this.
it’s unclear whether there is a common factor behind all of these things or if most would have caved in these interrogations. we know many other people were captured and refused to say anything. but it’s suggestive of a certain kind of person.
hm. it worked for the early Christians...
How confident are we that this is actually true? When I’ve heard about this claim in the past, the actual evidence mentioned looked sort of thin to me when you broke things down.
it kind of worked for the soviet dissidents for a while! the book talks about the “chain reaction”: every time someone got arrested and put through a sham trial, someone would secretly transcribe the court proceedings, which showed how farcical the trial was, and publish the transcript as samizdat or send it abroad to be broadcast by radio back into the soviet union (tamizdat), which would outrage people who would go protest, who would get arrested for protesting, completing the cycle. this cycle only ended after the 1968 red square protest, when people felt it was too hopeless to continue.
idea: flight insurance, where you pay a fixed amount for the assurance that you will definitely get to your destination on time. e.g. if your flight gets delayed, they will pay for a ticket on the next flight from some other airline, or directly approach people on the next flight to buy a ticket off of them, or charter a private plane.
pure insurance for things you could afford to self-insure is generally a scam (and the customer base of this product could probably afford to self-insure), but this mostly provides value by handling the rather complicated logistics for you rather than by reducing the financial burden, and there are substantial benefits from economies of scale (e.g. if you have enough customers you can maintain a fleet of private planes within a few hours of most major airports)
I’d have called this not a scam because it hands off the cost of delays to someone in a better position to avert the delays.
That’s a good Coasian point. Talking out of my butt, but I think the airlines don’t carry the risk. The sales channels (airlines, Expedia, etc.) take commissions for distributing an insurance product designed by another company (Travel Insured International, Seven Corners), which handles product design and compliance, with the actual claims being handled by another company and the insurance capital provided by yet another company (AIG, Berkshire Hathaway).
LLMs tell me the distributors get 30–50% commission, which tells you that it’s not a very good product for consumers.
I know less than you here, but last-minute flights are marked up because businesspeople sometimes need them and maybe TII/SC get a better price on those?
it’s often stated that believing that you’ll succeed actually causes you to be more likely to succeed. there is an immediately obvious alternative explanation for this: survivorship bias. obviously most people who win the lottery will have believed that buying lottery tickets is a good idea, but that doesn’t mean we should take that advice. so we should consider the plausible mechanisms of action.
first, it is very common for people with latent ability to underestimate their latent ability. in situations where the cost of failure is low, it seems net positive to at least take seriously the hypothesis that you can do more than you think you can. (also keeping in mind that we often overestimate the cost of failure). there are also deleterious mental health effects to believing in a high probability of failure, and then bad mental health does actually cause failure—it’s really hard to give something your all if you don’t really believe in it.
belief in success also plays an important role in signalling. if you’re trying to make some joint venture happen, you need to make people believe that the joint venture will actually succeed (opportunity costs exist). when assessing the likelihood of success of the joint venture, people will take many pieces of information into account: your track record, the opinions of other people with a track record, object level opinions on the proposal, etc.
being confident in your own venture is an important way of putting your “skin in the game” to vouch that it will succeed. specifically, the way this is supposed to work is that you get punished socially for being overconfident, so you have an incentive to only really vouch for things that really will work. in practice, in large parts of the modern world overconfidence is penalized less than we’re hardwired to expect. sometimes this is due to regions with cultural acceptance and even embrace of risky bets (SV), or because of atomization of modern society making the effects of social punishment less important.
this has both good and bad effects. it’s what enables innovation, because that fundamentally requires a lot of people to play the research lottery. if you’re not willing to work on something that will probably fail but also will pay out big if it succeeds, it’s very hard to innovate. research consists mostly of people who are extremely invested in some research bet, to the point where it’s extremely hard to convince them to pivot if it’s not working out. ditto for startups, which are probably the archetypal example of both innovation and catastrophic overconfidence.
this also creates problems: for instance, it enables grifting, because you don’t actually need to be correct if you just claim that your idea will work, and then when it inevitably fails you can just say that this is par for the course. also, being systematically overconfident can cause suboptimal decision making where calibration actually is important.
because many talented people are underequipped with confidence (there is probably some causal mechanism here: technical excellence often requires having a very mechanistic mental model of the thing you’re doing, rather than just yoloing it and hoping it works), it also creates a niche for middlemen to supply confidence as a service, aka leadership. in the ideal case, this confidence is supplied by people who are calibratedly confident because of experience, but the market is inefficient enough that even people who are not calibrated can supply it. another way to view this is that leaders deliver the important service of providing certainty in the face of an uncertain world.
(I’m using the term middleman here in a sense that doesn’t necessarily imply that they deliver no value—in fact, causing things to happen can create lots of value, and depending on the specifics this role can be very difficult to fill. but they aren’t the people who do the actual technical work. it is of course also valuable for the leader to e.g be able in theory to fill any of the technical roles if needed, because it makes them more able to spend their risk budget on the important technical questions, it creates more slack and thereby increases the probability of success, and the common knowledge of the existence of this slack itself also increases the perceived inevitability of success)
a similar story also applies at the suprahuman level, of tribes or ideologies. if you are an ideology, your job is unfortunately slightly more complicated. on the one hand, you need to project the vibe of inevitable success so that people in other tribes feel the need to get in early on your tribe, but on the other hand you need to make your tribe members feel like every decision they make is very consequential for whether the tribe succeeds. if you’re merely calibrated, then only one of the two can be true. different social technologies are used by religions, nations, political movements, companies, etc to maintain this paradox.
hot take: analogies should not be used as evidence for positions, except in the weakest form: to privilege an otherwise arbitrary hypothesis enough to merit any consideration at all. otherwise, they should be used purely as a way to effectively convey a hypothesis, but the actual evidence needs to come from something other than the mere analogy itself.
hot response: all evidence is analogy, it’s just a matter of degree. Maybe a heuristic like this is a good way to motivate gathering closer, more appropriate evidence, the better to increase confidence.
i can name many examples of evidence that are not analogy. perhaps they’re arguably “just” analogy, but it would be in an obviously boring way. like how there is “just” subjective experience because you can never really directly observe reality, and yet clearly there is a difference between empirical evidence and a priori reasoning (note that the evidence for this analogy comes from the examples below, rather than the analogy itself)
i think most Americans believe X. i run a survey. turns out Y% of Americans believe X.
i run some experiment on whether a given model architecture is better than another one, and get a number at the end. (as opposed to, say, merely making an analogy between NNs and human brains, and using that to predict something)
i read a study that is directly about a question i care about.
i write a mathematical proof.
Yeah, I guess I’m being provocative, but I actually do think that.
Survey (sampled, point-in-time, framing-dependent, out-of-context) is an (often quite good) analogy for any specific (or general) concept of what people believe.
Experiment is a good case: one kind of ‘really good’ evidence is obtained by driving out as many confounders, controlling as many variables as possible, then reasoning that ‘by this really really close analogy, the same or very very similar should obtain under what look like the same conditions’. Usually there are residual stochasticities. Often experiments are out of context in some way, or the conclusions we would like to draw are extrapolating some amount from what we actually checked.
‘Read a study’ inherits the analogy-ness of whatever experiments it’s based on, plus I guess the thing where by analogy to those other times you read studies that seemed to be accurate (by the same person or same team or same org or same discipline or same culture or same species, …), you expect it to be accurate.
Mathematical proof, as always, is a slightly annoying limit case. I could probably say something like, ‘well, any conclusion other than the mere mathematical conclusion is smuggling in some analogy (to quantities, types, relations, whatever)’, and perhaps also your mathematical methods are trusted by analogy to… errr the other times logic helped you, but I don’t know to what extent I’d really stand by that.
It seems to me that you are greatly broadening (that is, redefining) the word “analogy” to mean any sort of approximation. For example, survey results are not an analogy to the ground truth but an approximation to it.
I’m with Oliver Sourbut. Unless you’ve solved the induction problem AND can formally define your reference class, everything is analogy and degrees of similarity.
Which doesn’t make your take wrong, just that it’s a question of how close the analogy is, and how the analogous evidence applies to the compared predictions.
hot response: analogical inference does clearly provide some evidence. For example: I am conscious, you are otherwise pretty similar to me, therefore you are probably also conscious.
Agreed here. Basically, analogies are mainly useful from a “bounded rationality” perspective. A logically omniscient agent would reason directly from observations alone. On the other hand, analogies are most useful pedagogically, allowing someone who does directly understand something to teach it to someone else more quickly. You also have things like simulations. There we deliberately try and construct an analogy between a computer program and reality. In principle, there could also be “naturally occurring simulations” where we don’t try to set things up deliberately, but this doesn’t seem to happen very often.
In all of these cases, you have to be careful that all the parts of your mapping actually carry over, else you’ll get the wrong answer.
I think it depends on what you are including by analogy; when making general cases, we pretty much have to deal exclusively with analogs.
The most basic case might be stated as:
(1) ∀x(a⊃x⊢b) ⊃ ∀y(a⊃y⊢b)
(2) y1⊢a
∴ y1⊢b
That is, e.g.
If for all convicted criminals killing someone implies they are bad, then for cops, killing someone implies they are bad.
Officer Bob killed someone.
Therefore, officer Bob is bad.
If it is the case that for convicted criminals killing someone is only bad because, e.g., there wasn’t a justification, then (1) is false (and you would have to include that as a qualifier to make it true). But it is trivial to accept as true.
my summary of these two papers: https://arxiv.org/pdf/1805.12152 https://arxiv.org/pdf/1905.02175
the first paper observes a phenomenon where adversarial accuracy and normal accuracy are at odds with each other. the authors present a toy example to explain this.
the construction involves giving each input one channel that is 90% accurate for predicting the binary label, and a bazillion iid gaussian channels that are as noisy as possible individually, so that when you take the average across all of them you get ~100% accuracy. they show that when you do ℓ∞-adversarial training on the input you learn to only use the 90% accurate feature, whereas normal training uses all bazillion weak channels.
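to make the construction concrete, here is a minimal pure-python simulation. the constants (400 weak channels, a per-channel mean shift η = 3/√400, a 90% robust channel) are hypothetical choices, and instead of actually running standard vs ℓ∞-adversarial training it just compares two hand-written stand-in classifiers: one that reads only the robust channel and one that averages the weak channels. an ℓ∞ perturbation of size 2η moves every weak channel's mean to the wrong class.

```python
import random

# Hypothetical toy sizes; the paper's exact constants may differ.
D = 400                 # number of weak gaussian channels
ETA = 3 / D ** 0.5      # per-channel mean shift, so ETA * sqrt(D) = 3
P = 0.9                 # accuracy of the single robust channel
N = 1000                # number of test samples

def sample(rng):
    """Draw (label, robust channel, weak channels) from the toy distribution."""
    y = rng.choice((-1, 1))
    robust = y if rng.random() < P else -y
    weak = [rng.gauss(ETA * y, 1.0) for _ in range(D)]
    return y, robust, weak

def robust_only(robust, weak):
    # Stand-in for the adversarially trained model: reads only the 90% channel.
    return 1 if robust > 0 else -1

def average_weak(robust, weak):
    # Stand-in for the standard model: averages all the weak channels.
    return 1 if sum(weak) > 0 else -1

def linf_attack(y, robust, weak, eps):
    """Move every coordinate by eps toward the wrong class (an l_inf perturbation)."""
    return robust - eps * y, [w - eps * y for w in weak]

def accuracy(clf, data, eps=0.0):
    correct = 0
    for y, robust, weak in data:
        if eps:
            robust, weak = linf_attack(y, robust, weak, eps)
        correct += clf(robust, weak) == y
    return correct / len(data)

rng = random.Random(0)
data = [sample(rng) for _ in range(N)]
EPS = 2 * ETA  # enough to flip the mean of every weak channel

results = {
    name: (accuracy(clf, data), accuracy(clf, data, EPS))
    for name, clf in [("robust_only", robust_only), ("average_weak", average_weak)]
}
for name, (clean, attacked) in results.items():
    print(f"{name:>12}: clean {clean:.3f}, under l_inf attack {attacked:.3f}")
```

the averaging classifier is near-perfect on clean data but collapses under the attack, while the robust-channel classifier keeps its ~90%, because a shift of 2η = 0.3 can never flip the sign of a ±1 value.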
the key to this construction is that they consider an ℓ∞-ball on the input (distance is the max across all coordinates). so by adding more and more features, you can move further and further in ℓ2 space (specifically, a perturbation of ε in every coordinate has ℓ2 length ε√n for n features). but the ℓ2 distance between the means of the two high dimensional gaussians stays constant, so no matter what your ε is, at some point with enough channels you can perturb anything from one class into the other class and vice versa.
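the scaling argument written out, assuming (as a simplification) n weak channels with per-coordinate class means ±η, and η = c/√n so that the averaged signal stays fixed as n grows:

```latex
\begin{aligned}
\|\delta\|_\infty = \varepsilon \;&\Longrightarrow\; \|\delta\|_2 = \varepsilon\sqrt{n} && (\text{moving } \varepsilon \text{ in every coordinate})\\
\|\mu_+ - \mu_-\|_2 &= 2\eta\sqrt{n} = 2c && (\text{constant in } n)\\
\varepsilon\sqrt{n} \ge 2c \;&\Longleftrightarrow\; n \ge \left(\tfrac{2c}{\varepsilon}\right)^2 \;\Longleftrightarrow\; \varepsilon \ge 2\eta
\end{aligned}
```

at ε = 2η, the perturbation δ = −2ηy·𝟏 maps the class mean ηy·𝟏 exactly onto −ηy·𝟏, the mean of the other class.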
in the second paper, the authors do further experiments on real models to show that you can separate out the robust features and the unrobust ones, and recombine them into frankenstein images that look like dogs to humans but cats to the unrobust model and dogs to the robust model.
they also generalize the toy example in the previous paper. they argue that in general, adversarial examples arise exactly when the adversarial attack metric and the loss metric differ. in other words, the loss function (and downstream part of the model, in a multilayer model) implies some loss surface around any data point, and some directions on that surface will be a lot more important for the loss than some other directions. but your ε ball (in, say, ℓ2) that you do your attack in will treat all those directions equally importantly. so you can pick the direction that maximizes the amount of loss change.
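a minimal sketch of the "pick the direction that maximizes loss change" point, using a linear logit w·x as a stand-in for the model. this is a simplifying assumption: for a linear model, the worst-case step inside an ε-ball follows the dual norm of w (ε‖w‖₂ for an ℓ2 ball, ε‖w‖₁ for an ℓ∞ ball), and since cross-entropy is monotone in the logit, maximizing the logit change also maximizes the loss change. the weight vector, with one big weight and 99 small ones, is hypothetical.

```python
import math
import random

def logit_change(w, delta):
    """Change in the linear logit w.x when the input moves by delta."""
    return sum(a * b for a, b in zip(w, delta))

def l2_worst(w, eps):
    """Worst-case step in an l2 ball: eps along w / ||w||_2."""
    norm = math.sqrt(sum(x * x for x in w))
    return [eps * x / norm for x in w]

def linf_worst(w, eps):
    """Worst-case step in an l_inf ball: eps * sign(w) in every coordinate."""
    return [eps * ((x > 0) - (x < 0)) for x in w]

def random_l2_step(n, eps, rng):
    """A random direction scaled to l2 length eps, for comparison."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [eps * x / norm for x in v]

W = [5.0] + [0.1] * 99   # hypothetical: one important direction, many minor ones
EPS = 0.1
rng = random.Random(0)

best_l2 = logit_change(W, l2_worst(W, EPS))       # = EPS * ||W||_2
best_linf = logit_change(W, linf_worst(W, EPS))   # = EPS * ||W||_1
best_random = max(logit_change(W, random_l2_step(len(W), EPS, rng)) for _ in range(1000))
print(f"l2 worst {best_l2:.3f}, l_inf worst {best_linf:.3f}, best random l2 {best_random:.3f}")
```

random directions of the same ℓ2 size never beat the aligned one (Cauchy-Schwarz), and the ℓ∞ attack gets to add up all 99 small weights, which is the same mechanism as the toy example in the first paper.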
their new example is a classification task on two features, where the two classes are very stretched out gaussians placed diagonally from each other, so that an ℓ2 ball from each mean reaches into the distribution of the other gaussian. during normal training, the classification boundary learned falls right along the line where the mahalanobis distance from the two means is the same (intuitively, the classification boundary falls along exactly those points where a data point is equally likely to be sampled from either distribution.) but mahalanobis distance is different from ℓ2 distance! it treats distances along the low-variance axis of the gaussian as being much larger, so it doesn’t mind putting the boundary close (in ℓ2 norm) to the mean. this lets the ℓ2 perturbation step over the boundary.
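a numeric version of this picture, with hypothetical parameters: two gaussians with a shared diagonal covariance (variance 1 along x, 0.01 along the low-variance y axis) and means at ±(1, 0.1). for equal-covariance gaussians, the set of points equidistant in mahalanobis distance from the two means is the hyperplane w·x = b with w = Σ⁻¹(μ₊ − μ₋) through the midpoint, which is also the Bayes-optimal boundary.

```python
import math

# Hypothetical toy parameters (not taken from the paper).
VAR = (1.0, 0.01)          # shared diagonal covariance: x is wide, y is narrow
MU_POS = (1.0, 0.1)
MU_NEG = (-1.0, -0.1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

# Equal-mahalanobis-distance boundary: w.x = b, w = Sigma^{-1} (mu_pos - mu_neg).
delta = tuple(p - n for p, n in zip(MU_POS, MU_NEG))
w = tuple(d / v for d, v in zip(delta, VAR))
mid = tuple((p + n) / 2 for p, n in zip(MU_POS, MU_NEG))
b = dot(w, mid)

def classify(x):
    return 1 if dot(w, x) - b > 0 else -1

# Distance from mu_pos to the boundary, measured two ways.
l2_margin = (dot(w, MU_POS) - b) / norm(w)
w_sigma_w = sum(wi * wi * v for wi, v in zip(w, VAR))     # w^T Sigma w
mahal_margin = (dot(w, MU_POS) - b) / math.sqrt(w_sigma_w)
half_l2_gap = norm(delta) / 2                             # half the l2 distance between means

# Step from mu_pos straight toward the boundary, just past the l2 margin.
step = l2_margin * 1.01
x_adv = tuple(m - step * wi / norm(w) for m, wi in zip(MU_POS, w))

print(f"l2 margin {l2_margin:.3f}, mahalanobis margin {mahal_margin:.3f}, half l2 gap {half_l2_gap:.3f}")
print("class at mu_pos:", classify(MU_POS), "after l2 step:", classify(x_adv))
```

the positive mean sits about 1.41 mahalanobis units from the boundary but only about 0.2 ℓ2 units away, while the two means are about 2 apart in ℓ2, so an ℓ2 step of roughly 0.2 (mostly along the low-variance axis) is enough to flip the prediction.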
Is it a very universal experience to find it easier to write up your views if it’s in response to someone else’s writeup? Seems like the kind of thing that could explain a lot about how research tends to happen if it were a pretty universal experience.
I think so/I have this. (I would emoji react for a less heavy response, but that doesn’t work on older shortforms.)
The corollary is that it’s really annoying to respond to widely held views or frames which aren’t clearly written up anywhere, particularly if these views are very imprecise and confused.
new galaxy brain hypothesis of how research advances: progress happens when people feel unhappy about a bad but popular paper and want to prove it wrong (or when they feel like they can do even better than someone else)
this explains:
why it’s often necessary to have bad incremental papers that don’t introduce any generalizable techniques (nobody will care about the followup until it’s refuting the bad paper)
why so much of academia exists to argue that other academics are wrong and bad
why academics sometimes act like things don’t exist unless there’s a paper about them, even though the thing is really obvious
This subjectively seems to me to be the case.
https://xkcd.com/386/
it’s actually insane how much of the entire economy is tech now. somewhere in the back of my head i still expected traditional “big” industries like big oil, or the banks, or whatnot to be the biggest. but i just realized this hasn’t been true for a long time.
Looking at market cap is kinda misleading though; the public stock market is not the same thing as the economy, and tech is over-represented in market cap because of winner-take-all dynamics and margins.
Also, Amazon (setting aside AWS) is primarily a consumer goods and logistics company, and Tesla is a car and battery manufacturer—they’re gigantic in part because they use tech well, but the actual goods and economic activity they generate aren’t exactly “tech”.
Most importantly for this context, market cap is about expectations for *future* earnings. So, it’s not really reflecting the economy today.
And tech stocks tend to have more growth baked into their valuations.
well, walmart is getting utterly destroyed by amazon (3x bigger). the gap is even bigger for tesla vs Toyota (5x) or whatever. if chatgpt is to be trusted, tesla is slightly bigger than every other car manufacturer combined. so whatever it is that “tech” is doing differently in those fields—and amazon is culturally much more similar to google or microsoft than toyota or GM—is working absurdly well.
higher margins rightfully mean higher market cap. if your company is barely scraping by, you’re not producing as much value.
Right, I think the market caps are justified for the most part. But market caps represent the present value of expected future profits, not a measure of current economic activity.
Not much surplus; you can still be a commodity around which huge volumes of production and consumption revolve even if your prospects for future free cash flow are very bad.
i think it’s possible for a company to be barely profitable but still be producing a bunch of surplus for its clients/consumers and workers, so a bunch of value for the world. this is because the owners of the company might be unable to capture much of this surplus. i think this can happen when increasing your prices would mean people would just buy versions of the same thing from other companies and decreasing your salaries would mean your employees leave to other companies. i think older and more competitive/efficient/optimized and [less innovative] industries are generally more like this. and i think this is probably an important factor to keep in mind when estimating surplus from profitability. that said, i agree that pro tanto higher margins imply higher surplus/[value to society] and also probably there is an empirical correlation
Tesla is bigger by market cap, but if you look at metrics like revenue and the number of cars sold, it’s much smaller. Tesla earns its market cap from the hope that its technology will be more significant in the future, when it has fully driverless cars.
This should be capturing rather than producing. (Arguably Meta produces negative value).
category error to use market cap to size the economy, probably.
on paper there’s allegedly >$100T in assets under management worldwide
by US GDP, “Information” contributes 5.5% in 25q4. There are various other tech-related entries from manufacturing/retail/services that should sum to >10%, but not the majority you’d expect from market cap.
What risks or downstream effects are most salient to you as you make this observation?
I’m trying to figure out how much political power AI has by virtue of being a big chunk of the economy. clearly, oil and gas have enough power to stop climate regulations. it seems like AI is already roughly as big as oil and gas (and tech as a whole much larger than oil and gas)
I was thinking about the other comments and their technical correctness re conflating the economy and the stock market / how much that correction of your OP actually matters for your area of concern.
I’m actually very confused about what [number associated with a business or sector] tracks most closely with [political power] or [potential political power]. Market cap doesn’t look like a crazy place to start, but my guess is that there’s something better (and I agree with the point that market cap likely overweights tech for this purpose).
I think the top 3 were still tech companies 10 years ago, and 5 out of 10 were already there so this doesn’t seem new? It’s just how much room they’ve had to grow that’s been surprising IMO.
Here’s the top 3 as of 2016 (AI researched)
Top 3 companies by market capitalization (Q1 2016)
Apple — ~$604 billion
Alphabet (Google) — ~$518 billion
Microsoft — ~$437 billion
Facebook / Amazon were also in the top 10. It’s mostly been them expanding ~10x from there in market cap.
i haven’t had a chance to think deeply about it but vibes wise i don’t like activation oracles
i think it’s really weird that people are trying to do vaguely interp flavored things but also trying to argue for the goodness of such techniques via empirical usefulness. i think there are broadly two self consistent worldviews here. one is that you want to understand how NNs actually work and then use that understanding for something. the other is you want to make models better at X (where X can be anything from “be a good chatgpt model” to “refuse bioweapon prompts” to “make weak to strong setup score go up”). but if you’re doing the latter the actual conceptually important part is picking the right X and then working really hard to make it go up using whatever techniques work. if you’re doing the former you should actually try to understand things. it doesn’t make sense to try to do both and ultimately get neither. you should either do pragmatic or do interp.
The argument I would make is that you want to solve the practical problem, but you want to do so in a way that maximally scales with intelligence. And then white box techniques are more scalable than black box techniques, since schemers will predictably fool your black box techniques but not necessarily your white box techniques.
i totally agree for the case of actual white box understanding. this is what I’d consider the first worldview. my gripe is the interp-flavored techniques reveal very little understanding that might actually scale with intelligence, and yet through association with interp try to imply that they do.
I think the thing I’m saying is true even for interp/interp-adjacent techniques that give very little understanding—the fact that they’re white-box techniques at all should still make it harder for a schemer to get around them than black-box techniques.
ok, i agree with this. there is some room for disagreement on exactly how big the gap is between white box and black box—i think it’s very small compared to the gap from white box to full understanding. my main argument would just be ELK flavored, that there are spurious correlations that give you human simulators instead. but i don’t feel super confident that the constant factors work out to support my claim
There are ML papers floating around with training methods and architectural tweaks (e.g Block AttnRes or mHC-lite) which might be incorporated into future models.
It seems plausible to me that replacing standard residual skip connections with something more complicated:
scales intelligence somewhat but not past the frontier
makes it slightly harder for existing interp flavored techniques to generate understanding
doesn’t meaningfully affect the relative performance of linear probes vs output classifiers for inference-time detection of precursors to high-risk misaligned behavior.
I do think there are cases where models will be able to manipulate the data they’re feeding into white-box methods in a way that affects verdicts, but it’s hard to see these arising naturally before being demonstrated in more contrived scenarios, and I agree with evhub that this would be harder than circumventing black box safeguards.
Would you say a similar critique holds for sparse autoencoders?
(edit: i’ve tended to think of SAEs and AOs as basically end-to-end tools for activation-space interpretability, but in hindsight i see AOs are definitely trying to be more “lines go up” and end-to-end than SAEs, even if there are many loss function variants for SAEs. i think i get your point now)
i think SAEs are a completely reasonable thing under the first worldview, and mostly crazy under the second worldview (with the exception of maybe bio or something where I’ve heard they’re genuinely useful)
(SAEs are not sufficient to actually understand things, but they are a genuine step on the way there)
I sort of think of activation oracles as “chain of thought monitoring, but for activations”
Recall the basic chain-of-thought monitoring safety case: if chain of thought is necessary for a given model to complete a task, then (barring sophisticated encoding strategies) the chain of thought should be monitorable for that (task, model) pair.
As models get bigger / deeper, the space of tasks where chain-of-thought is necessary decreases, so (all else equal) chain of thought monitorability decreases.
How do we get around this? Well, assuming the model is using some kind of general purpose reasoning internally, that general purpose reasoning should have some general structure. And you know what’s great at learning general structures: neural networks!
We won’t be able to capture all the cognition of a model (that would require ambitious interp.), but we may be able to capture all the cognition that routes through something like “internal general purpose reasoning”.
How well activation oracles learn this general structure (/the degree to which there’s a general structure at all) is an empirical question, but we can get rich signal on this question by training activation oracles and evaluating them on auditing games, unsupervised elicitation, etc.
Now you might argue that this is a very contingent defense of activation oracles: if, on a set of evals where you would expect activation oracles to work, a different method outperforms activation oracles, should we abandon them for the favored method? (Our only evidence for this “general internal reasoning structure” is the downstream usefulness of activation oracles.) Concretely, it seems like dumb prefill black-box methods mostly outperform activation oracles in auditing evals. Given this, why favor activation oracles over dumb black-box methods (or smarter black-box methods like confessionals)?
Given current public evidence, you have to rely on conceptual arguments to justify favoring activation oracles: intentionally subverting activation oracles seems harder for a schemer than intentionally subverting black box methods. I expect we should be able to show this in a compelling way with stress-testing / control evals (but defining a “fair-fight” is non-trivial since in-context red-team would unrealistically favor AOs—see this post for some thoughts on this).
inside people with substantial internal conflict, their parts might even be less aligned/connected with each other than they are with other people. this probably has really weird effects
hot take: introspection isn’t really real. you can’t access your internal state in any meaningful sense beyond what your brain chooses to present to you (e.g. visual stimuli, emotions, etc), for reasons outside of your direct control. when you think you’re introspecting, what’s really going on is that you have a model of yourself inside your brain, which you learn gradually by seeing yourself do certain things, experience certain stimuli or emotions, etc.
your self-model is not fundamentally special compared to any other models you have. it works the same way as your model of anyone or anything else, except you have way more data on yourself, and also you directly experience your own emotions and sensory stimuli, as opposed to having to infer them for other people. often your emotional brain sabotages your ability to understand yourself, but it sometimes sabotages your ability to understand other people too (e.g. groupthink, tribalism).
your self-model can diverge arbitrarily far from reality. when you’re emotionally unintegrated, you have a model of yourself that fails to understand how your emotions truly work, so you will systematically mispredict how you will actually behave, or try to fix yourself in ways that don’t work because you are misunderstanding the causes of your actions. further evidence comes from some of the split brain experiments which show people confidently hallucinating causes for their actions which are demonstrably untrue.
(for emotional integration in particular, misprediction is only half of the thing; the other part of the thing is sometimes you can get detached from stimuli and stop perceiving them entirely. this other part is not neatly explained by this hypothesis)
your self-model contains a nested self-model, but this isn’t special in any way. your models of other people have a model of yourself contained within (what does this person think about me?). your nested self model can also be arbitrarily wrong—it is very common to fail to understand the ways in which your top level self model is wrong. you probably don’t have a third level nested self model because that’s not very useful and very costly to maintain.
your inner monologue is not most of your thinking, and doesn’t give you a faithful representation of your true thoughts. people without internal monologues do just fine. also, it’s very very common for people to lie to themselves.
I don’t think this is just speculative/unfalsifiable. I claim that thinking of your “introspection” as self modelling will lead you to make better decisions irl. you can apply the techniques you use for learning to model other things. you can realize that cognitive biases that apply to other kinds of modeling also apply to modeling yourself.
there’s no clean boundary between your self model and your model of other things. if it’s useful to model e.g your phone, or gut bacteria, or glasses, or partner/close friends, in close interaction with your stimuli and actions, there is no sharp self boundary.
a regular computer program can truly introspect on itself in ways that humans cannot, but this is fundamentally not that interesting. it definitely doesn’t mean computers are conscious.
corollary: “introspection” is not in any way related to consciousness or moral patienthood, and it is uninteresting to ask whether AIs or nonhuman animals are capable of introspection for the purposes of determining things about consciousness and moral patienthood.
What experiences have you had that lead you to call this a ‘hot take’?
[I rephrased a few times to avoid sounding sarcastic and still may have failed; I’m interested in why it looks to you like others dramatically disagree with this, or in what social environment people are obviously not operating on a model that resembles this one. My sense is a lot of people think this way, but it’s a little socially taboo to broadcast object-level reasoning grounded in this model, since it can get very interpersonally invasive or intimate and lead to undesirable social/power dynamics.]
the experience that led to calling it a hot take is i was arguing against someone who disagreed with this right before i wrote it up
What was their position? (to the extent that you can reproduce it)
I heard that when people are in therapy, their self adapts to the school of psychotherapy. For example you start getting Freudian dreams if you are in Freudian therapy, but you start getting Jungian dreams instead if you are in Jungian therapy.
This seems to support the hypothesis that when we think we have discovered something deep inside us, often we have actually constructed it to fit our preconceptions.
(I suspect that Buddhism also mostly works this way. When Buddhists say that they can verify the truth of all Buddha’s words by introspection… on one hand, yes they can; on the other hand, if they instead believed in Jesus, they could verify that just as well. Asking yourself is like asking an LLM: whatever you believe is true, it will confirm.)
in my worldview this is very easily explained. if you do Jungian therapy your self model starts incorporating Jungian concepts for explaining your own brain. You didn’t fundamentally change the way your brain works; you just changed your own model of your brain. It’s the same way that if you read a book on the biology of plants you’ll start viewing them through the lens of cells, and if you read a book on the ancient spirits associated with each plant you’ll start thinking of plants as being animated by the ghosts of our ancestors.
The big mistake happens when people think of their self model as actually genuinely introspection. Then, you might think that you’ve changed the shape of your mind instead of only changing your understanding of your mind.
Instead, I think the right way to figure out if your self model is correct is to make predictions about your future behavior and see if they come true; act based on your self model and see if you become more successful at life, or whether you mysteriously repeatedly fail in some way.
when i first came to the bay area, i was shocked that silicon valley was literally just a bunch of suburbs and boring office parks. i used to think this was very incongruous. i still mostly do, but now i at least have a story for why the vibe isn’t literally maximally incongruous.
one software architecture intuition i have is that legibility and standardization are utterly essential. the move from “servers as pets” (each one bespoke and carefully hand crafted; when something goes wrong, you go in and fix that one server) to “servers as cattle” (standardize all of the servers to be exact clones, destroy and recreate each server any time something deviates even slightly from identical rather than try to patch things on that one server by hand) is a good example. or for instance, you generally want to make things use common interfaces wherever possible, even if that means some slight awkwardness or indirectness, because the value of being able to interchange things is so high.
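a toy sketch of the "servers as cattle" reconcile loop. everything here is hypothetical (the golden config, `provision`, the fingerprinting scheme); it is not any particular tool's API, just the shape of the idea: compare each server against one golden definition and replace drifted servers rather than patching them.

```python
import copy
import hashlib
import json
from dataclasses import dataclass
from itertools import count

# Hypothetical golden definition that every server must match exactly.
GOLDEN = {"image": "app:1.4.2", "packages": ["nginx", "python3"], "swappiness": 10}

def fingerprint(config):
    """Stable hash of a config, so 'identical' becomes a cheap equality check."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

@dataclass
class Server:
    name: str
    config: dict

_next_id = count()

def provision(golden):
    """Stand-in for launching a fresh instance from the golden image."""
    return Server(f"srv-{next(_next_id)}", copy.deepcopy(golden))

def reconcile(fleet, golden):
    """Keep servers that match the golden config; replace any that drifted."""
    want = fingerprint(golden)
    new_fleet, replaced = [], []
    for server in fleet:
        if fingerprint(server.config) == want:
            new_fleet.append(server)
        else:
            replaced.append(server.name)          # destroy the drifted "pet"...
            new_fleet.append(provision(golden))   # ...and recreate it from scratch
    return new_fleet, replaced

fleet = [provision(GOLDEN) for _ in range(3)]
fleet[1].config["packages"].append("vim")   # someone hand-patched one box
fleet, replaced = reconcile(fleet, GOLDEN)
print(replaced, [s.name for s in fleet])
```

the reconciler never needs to understand how a server drifted; equality against one definition is the whole interface, which is the legibility payoff.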
modern car oriented cities are a great example of taking this intuition and applying it to the physical world. steamrolling the unruly nature of old fashioned cities and reshaping everything to fit a handful of standardized shapes. this is why driving in Palo Alto is so much easier than in, say, Boston, or SF—there are many fewer quirky intersections or unpredictable bike lanes or pedestrians.
another software intuition is that in general it’s worth taking on a pretty sizeable fixed cost as long as it makes a system present a substantially simpler interface, or serve its purpose more reliably. for example, it’s worth pulling in a thousand different versions of packages in your node_modules and wasting quite a lot of disk space and network bandwidth in order to eliminate dependency hell once and for all; or putting your application in a docker container so you don’t have to worry about “works on my machine” ever again; or making file system blocks fixed sizes to make the accounting a lot easier, even if that wastes some space.
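a minimal sketch of the fixed-size-block tradeoff, assuming a hypothetical 4096-byte block (a common size, but filesystem-dependent) and a simple free-list allocator rather than any real filesystem's design: accounting reduces to one flag per block and any free block is as good as any other, at the cost of slack space in each file's last block.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes

def blocks_needed(size):
    """Number of whole blocks needed to hold `size` bytes (ceiling division)."""
    return -(-size // BLOCK_SIZE)

def slack(size):
    """Bytes wasted in the partially filled last block."""
    return blocks_needed(size) * BLOCK_SIZE - size

class BlockAllocator:
    """Free-space accounting is just one boolean per block."""

    def __init__(self, total_blocks):
        self.free = [True] * total_blocks

    def allocate(self, size):
        n = blocks_needed(size)
        picked = [i for i, is_free in enumerate(self.free) if is_free][:n]
        if len(picked) < n:
            raise RuntimeError("out of blocks")
        for i in picked:
            self.free[i] = False
        return picked  # blocks need not be contiguous

    def release(self, blocks):
        for i in blocks:
            self.free[i] = True

alloc = BlockAllocator(8)
a = alloc.allocate(10_000)   # 3 blocks
b = alloc.allocate(100)      # 1 block, mostly slack
alloc.release(a)
c = alloc.allocate(5_000)    # reuses the freed blocks
print(a, b, c, slack(10_000), slack(100))
```

a 100-byte file still costs a full block, which is the "wastes some space" part; in exchange, allocation and freeing never have to reason about byte ranges.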
car centric cities are optimized heavily for a specific purpose—arbitrary point to point travel in the comfort of your own vehicle. there is a kind of elegance to this; and if you have dynamic pricing, and are unconcerned with the median experience, it makes the best possible experience better. it also comes with many disadvantages (horrible for pedestrians, inefficient use of land, low bandwidth, makes the vibes horrible) that have been harped on at sufficient length that i don’t need to go into it in depth here; i still experience deep suffering every time i perceive south bay. but i think i now understand better why you would ever want to build things this way.
you can have grid cities which aren’t car centric, that are much easier to travel in, while having better best experiences. car centric cities have the problem of traffic.
what’s the current state of analysis on whether the civil rights act of 1957 was actually net positive or negative for civil rights in hindsight? there are two possible stories one can tell, and at the time people were arguing about which is correct:
passing even a useless civil rights bill is a lot better than nothing because it sets a precedent that getting civil rights bills through the Senate is possible / makes the southern coalition no longer look invincible. this serves as a useful coordination mechanism because people only want to support things that they think other people will support.
passing a useless civil rights bill is worse than no bill because it creates a false sense of progress and makes it feel like something was done even when nothing was. to the extent that the bill signals to people that getting civil rights bills through the Senate is possible, this is a false impression because the only reason the bill could get through was that it was watered down to uselessness.
this feels directly analogous to the question of whether we should accept very weak AI safety regulations today.
It seems extremely net-positive for civil rights, but mainly through the mechanism of it making Lyndon Johnson a viable candidate for president while maintaining his stature with the southern democrats, leading ultimately to the Civil Rights Act of 1964.
This can be seen as a generalizable lesson only insofar as you think weak bills like that are typically passed by Lyndon Johnson-like figures playing 4d political chess ultimately for altruistic reasons. Without that effect, it mostly seemed bad: it likely actually decreased the number of black voters, and it did not reduce the south’s ability to filibuster civil rights legislation in the senate (which was the main mechanism by which civil rights bills were blocked); eg they filibustered away another civil rights bill in 1959 or something. Plus, if Lyndon Johnson had not ultimately been pro-civil rights, it would have put someone decidedly anti-civil-rights into the presidency.
so it sounds like there’s basically no way anyone could have known that johnson would actually be a pro civil-rights president, and that all the civil rights people who were opposed to the 1957 bill at the time were basically opposed for the right reasons? like basically everything we know about johnson as of 1960 suggests that he is telling everyone what they want to hear and it’s unclear whether he has any convictions of his own except for his strong track record of defending the interests of the south.
Basically yes. His staff likely coulda predicted this (eg there were a few circumstances where out of anger he did some small civil rights stuff, then backed off when he cooled down & looked at the political repercussions), and possibly Lady Bird, but no other senator or member of the public had any reliable way to predict this for the reasons you state.
I mean, even in the Felix Longoria Arlington case, which is what I assume you’re referring to, it seems really hard for his staff members to have known, without the benefit of hindsight, that this was any significant window into his true beliefs? I mean, johnson is famously good at working himself up into appearing to genuinely believe whatever is politically convenient at the moment, and he briefly miscalculated the costs of supporting civil rights in this case. his apparent genuineness in this case doesn’t seem like strong evidence.
Some evidence for (2) is that before the 1957 act no civil rights legislation had been passed for 82 years[1], and after it three more civil rights acts were passed in the next 11 years, including the Civil Rights Act of 1964, which in my understanding is considered very significant.
[1] Going off what’s listed in the wikipedia article on civil rights acts in the United States.
there’s an exogenous factor, which is that the entire country was shifting leftward during the 50s and 60s. it’s plausible that the 1964 bill would have passed anyways without the 1957 bill, possibly even earlier
Fair enough yeah. But at least (1)-style effects weren’t strong enough to prevent any significant legislation in the near future.
there’s an obvious synthesis of great man theory and broader structural forces theories of history.
there are great people, but these people are still bound by many constraints due to structural forces. political leaders can’t just do whatever they want; they have to appease the keys of power within the country. in a democracy, the most obvious key of power is the citizens, who won’t reelect a politician that tries to act against their interests. but even in dictatorships, keeping the economy at least kind of functional is important, because when the citizens are starving, they’re more likely to revolt and overthrow the government. there are also powerful interest groups like the military and critical industries, which have substantial sway over government policy in both democracies and dictatorships. many powerful people are mostly custodians for the power of other people, in the same way that a bank is mostly a custodian for the money of its customers.
also, just because someone is involved in something important, it doesn’t mean that they were maximally counterfactually responsible. structural forces often create possibilities to become extremely influential, but only in the direction consistent with said structural force. a population that strongly believes in foobarism will probably elect a foobarist candidate, and if the winning candidate never existed, another foobarist candidate would have won. winning an election always requires a lot of competence, but no matter how competent you are, you aren’t going to win on an anti-foobar platform. the sentiment of the population has created the role of foobarist president for someone foobarist to fill.
this doesn’t mean that influential people have no latitude whatsoever to influence the world. when we’re looking at the highest tiers of human ability, the efficient market hypothesis breaks down. there are so few extremely competent people that nobody is a perfect replacement for anyone else. if someone didn’t exist, it doesn’t necessarily mean someone else would have stepped up to do the same. for example, if napoleon had never existed, there might have been some other leader who took advantage of the weakness of the Directory to seize power, but they likely would have been very different from napoleon. great people still have some latitude to change the world orthogonal to the broader structural forces.
it’s not a contradiction for the world to be mostly driven by structural forces, and simultaneously for great people to have hugely more influence than the average person. in the same way that bill gates or elon musk are vastly vastly wealthier than the median person, great people have many orders of magnitude more influence on the trajectory of history than the average person. and yet, the richest person is still only responsible for 0.1%* of the economic output of the united states.
* fermi estimate: taking musk’s net worth, dividing by 20 to convert stocks to flows, and comparing to gdp. caveats apply based on interest rates and gdp being a bad metric. many assumptions involved here.
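the footnote's arithmetic, written out. dividing by 20 amounts to assuming wealth yields roughly 5% per year (one of the "many assumptions"); W is net worth:

```latex
\text{share} \approx \frac{W/20}{\text{GDP}} = \frac{0.05\,W}{\text{GDP}},
\qquad
\text{share} = 0.1\% \;\Longleftrightarrow\; W \approx 0.02 \times \text{GDP}
```

so the 0.1% figure is the same as saying the net worth is about 2% of one year of gdp; a different assumed yield scales the share proportionally.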
I think there’s a spectrum between great man theory and structural forces theory and I would classify your view as much closer to the structural forces view, rather than a combination of the two.
The strongest counter-example might be Mao. It seems like one man’s idiosyncratic whims really did set the trajectory for hundreds of millions of people. Although of course as soon as he died most of the power vanished, but surely China and the world would be extremely different today without him.
A synthesis between the structural forces theory and “pulling the rope sideways”.
Economic and other forces determine the main direction; a leader who already wanted to go in that direction gets elected and starts going that way, and his idiosyncratic whims get implemented as a side effect.
Like, instead of Hitler, there would be another German leader determined to change the post-WW1 world order, but he would probably be less obsessed with the Jews. Also, he might make different alliances.
Musk only owns 0.1% of the economic output of the US, but he is responsible for more than that, including large contributions to:
Politics
Space
SpaceX is nearly 90% of global upmass
Dragon is the sole American spacecraft that can launch humans to ISS
Starlink probably enables far more economic activity than its revenue
Quality and quantity of US spy satellites (Starshield has ~tripled NRO satellite mass)
Startup culture through the many startups from ex-SpaceX employees
Twitter as a medium of discourse, though this didn’t change much
Electric cars probably sped up by ~1 year by Tesla, which still owns over half the nation’s charging infrastructure
AI, including medium-sized effects on OpenAI and potential future effects through xAI
Depending on your reckoning, I wouldn’t be surprised if Elon’s influence added up to more than 1% of the combined influence of all Americans. This is not really surprising, because a Zipfian relationship would give the top person in a nation of 300 million about 5% of the total influence.
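The 5% figure checks out if "Zipfian" means the classic exponent-1 case, where rank k gets weight proportional to 1/k. That exponent is an assumption; the comment doesn't specify one. Under it, the top share is 1/H_N, where H_N is the N-th harmonic number, roughly ln N + γ for large N. A minimal sketch:

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def harmonic_number(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n: exact summation for small n, asymptotic series for large n."""
    if n <= 100_000:
        return math.fsum(1.0 / k for k in range(1, n + 1))
    # Asymptotic expansion H_n ≈ ln n + γ + 1/(2n) - 1/(12 n^2); error is ~1/(120 n^4).
    return math.log(n) + EULER_GAMMA + 1.0 / (2 * n) - 1.0 / (12 * n * n)


def zipf_top_share(n: int) -> float:
    """Fraction of the total held by rank 1 when rank k holds weight 1/k (Zipf, exponent 1)."""
    return 1.0 / harmonic_number(n)


print(f"{zipf_top_share(300_000_000):.1%}")  # 5.0%
```

The result is sensitive to the exponent. With a steeper exponent, the sum converges and the top share grows quickly.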
The Duke of Wellington said that Napoleon’s presence on a battlefield “was worth forty thousand men”.
This would be about 4% of France’s military size in 1812.
i’m happy to grant that the 0.1% is just a fermi estimate and there’s a +/- one OOM error bar around it. my point still basically stands even if it’s 1%.
i think there are also many factors in the other direction that just make it really hard to say whether 0.1% is an under or overestimate.
for example, market capitalization is generally an overestimate of value when there are very large holders. tesla is also a bit of a meme stock so it’s most likely trading above fundamental value.
my guess is that most things sold to the public sector produce less economic value per $ than things sold to the private sector, so profit overestimates value produced.
the sign on net economic value of his political advocacy seems very unclear to me. the answer depends strongly on some political beliefs that i don’t feel like arguing out right now.
it slightly complicates my analogy for elon to be both the richest person in the us and also possibly the most influential (or one of the most). in my comment i am mostly referring to economic-elon. you are possibly making some arguments about influentialness in general. the problem is that influentialness is harder to estimate. also, if we’re talking about influentialness in general, we don’t get to use the 0.1% ownership of economic output as a lower bound on influentialness. owning x% of economic output doesn’t automatically give you x% of influentialness. (i think the majority of other extremely rich people are not nearly as influential as elon per $.)
one kind of reasoning in humans is a kind of instant intuition; you see something and something immediately and effortlessly pops into your mind. examples include recalling vocabulary in a language you’re fluent in, playing a musical instrument proficiently, or having a first guess at what might be going wrong when debugging.
another kind of reasoning is the chain of thought, or explicit reasoning: you lay out your reasoning steps as words in your head, interspersed perhaps with visuals, or abstract concepts that you would have a hard time putting in words. It feels like you’re consciously picking each step of the reasoning. Working through a hard math problem, or explicitly designing a codebase by listing the constraints and trying to satisfy them, are examples of this.
so far these map onto what people call system 1 and 2, but I’ve intentionally avoided these labels because I think there’s actually a third kind of reasoning that doesn’t fit well into either of these buckets.
sometimes, I need to put the relevant info into my head, and then just let it percolate slowly without consciously thinking about it. at some later time, insights into the problem will suddenly and unpredictably pop into my head. I’ve found this mode of reasoning to be indispensable for dealing with the hardest problems, or for generating insights in cases where explicit reasoning alone would just get me stuck.
of course, you can’t just sit around and do nothing and hope insights come to you—to make this process work you have to absorb lots of info, and also do a lot of explicit reasoning before and after to take flashes of insight and turn them into actual fleshed-out knowledge. and there are conditions that are more or less conducive to this kind of reasoning.
I’m still figuring out how best to leverage it, but one hypothesis this raises is that a necessary ingredient in solving really hard problems is spending a bunch of time simply not doing any explicit reasoning, and creating whatever conditions are needed for subconscious insight-generating reasoning.
I have a pet theory that there are literally physiological events that take minutes, hours, or maybe even days or longer, to happen, which are basically required for some kinds of insight. This would look something like:
First you do a bunch of explicit work trying to solve the problem. This makes a bunch of progress, and also starts to trace out the boundaries of where you’re confused / missing info / missing ideas.
You bash your head against that boundary even more.
You make much less explicit progress.
But, you also leave some sort of “physiological questions”. I don’t know the neuroscience at all, but to make up a story to illustrate what sort of thing I mean: One piece of your brain says “do I know how to do X?”. Some other pieces say “maybe I can help”. The seeker talks to the volunteers, and picks the best one or two. The seeker says “nah, that’s not really what I’m looking for, you didn’t address Y”. And this plays out as some pattern of electrical signals which mean “this and this and this neuron shouldn’t have been firing so much” (like a backprop gradient, kinda), or something, and that sets up some cell signaling state, which will take a few hours to resolve (e.g. downregulating some protein production, which will eventually make the neuron a bit less excitable by changing the number of ion pumps, or decreasing the number of synaptic vesicles, or something).
Then you chill, and the physiological questions mostly don’t do anything, but some of them answer themselves in the background; neurons in some small circuit can locally train themselves to satisfy the question left there exogenously.
See also “Planting questions”.
a thing i think is probably happening and significant in such cases: developing good ‘concepts/ideas’ to handle a problem, ‘getting a feel for what’s going on in a (conceptual) situation’
a plausibly analogous thing in humanity(-seen-as-a-single-thinker): humanity states a conjecture in mathematics, spends centuries playing around with related things (tho paying some attention to that conjecture), building up mathematical machinery/understanding, until a proof of the conjecture almost just falls out of the machinery/understanding
This is learning about a narrow topic, which builds representations that make thinking about that topic more effective; novel insights might become feasible even through system 1 where before system 2 couldn’t help. With o1, LLMs have both system 1 and system 2, but all learning happens in pretraining, which doesn’t target the current problem and in any case has horrible sample efficiency. This could be a crucial missing capability, though with scale even in-context learning might get there.
Sounds like a synthetic data generation pipeline.
Relatable.
Giorgio Parisi mentioned this in his book; he said that aha moments tend to spark randomly while you’re doing something else. Bertrand Russell had a very active social life (he praised leisure) and believed it was an active form of idleness that could prove very productive. A good balance might be the best way to leverage it.
Since there are basically no alignment plans/directions that I think are very likely to succeed, and adding “of course, this will most likely not solve alignment and then we all die, but it’s still worth trying” to every sentence is low information and also actively bad for motivation, I’ve basically recalibrated my enthusiasm to be centered around “does this at least try to solve a substantial part of the real problem as I see it?” For me at least, this is the most productive mindset to be in, but I’m slightly worried people might mistake it for me having a low P(doom), or being very confident in specific alignment directions, and so on; hence this post that I can point people to.
I think this may also be a useful emotional state for other people who have a similar P(doom) and feel demotivated by it in a way that hurts their productivity.
for some reason, the irrational belief that nobody will read my shortforms paradoxically makes them much easier to write. if I’m writing something polished that i think lots of people will read, then i get scared that people will see it and think less of me or something, which manifests as unreasonable perfectionism and a desire to present a fictitious version of myself and my thinking. i wonder if there is some way to get the best of both worlds: to produce posts that are authentic but also high quality and widely read.
One possible approach would be having someone else polish the posts for you, or working iteratively on ideas with a friend. Writing partners are useful! Not only do they divorce you from some of the reputational risk of writing whatever you’re writing, they also turn the worst part of writing (the mental context switch between generating and polishing) into a non-issue. There’s also the concept of beta readers, but that admittedly feels a bit sillier for web forum posts. Just being able to offload the mental work of perfectionism onto a “good enough” of the form “$person i trust thought it was good actually, so it’s fine” does a lot to speed up iteration, usually without noticeably degrading quality.
Write it as shortform, then at the last moment change your mind and post it as article. It sounds stupid but it works for me.
(It does not have to be literally the last moment, but when you already wrote enough, you see how it will end, and you can continue writing without losing the momentum.)
thoughts on what to work on in a world with heavy AI automation
it seems undeniable at this point that AI automation will play a huge role in all research in the medium term future. therefore, we should take automation into account when choosing what to work on. here are some possibilities, in decreasing order of how optimistic you are that models will be good at alignment research.
push for people to allocate more resources to alignment. the most optimistic assumption is that models are perfect drop-in replacements for human researchers, and are as good at alignment and capabilities research as humans are. then, we just need to allocate more compute to alignment and we’re chilling. the main problem is that there is an incentive to race to the bottom, allocating more to capabilities to get a faster recursive improvement loop. but it is much easier to legislate a requirement to spend X% of compute on automated alignment research than to enact other pause or slowdown proposals.
making the AIs better at alignment research. we might believe that AIs are by default going to be worse at alignment than capabilities, possibly because feedback loops are better for capabilities, or models will be worse at conceptual research, or something like that. if we believe this is fixable, but won’t happen by default, then we should make current AIs differentially better at alignment research.
making yourself better at complementing the skills of AIs. we might think that it will be extremely hard to make AIs differentially better at alignment—maybe AI spikiness is fundamental and requires breakthroughs to change. instead, there will remain some skills (eg research taste) where humans are better than AI, and the supply of human skill here becomes the bottleneck. so you should somehow train yourself to become as good at eg research taste or conceptual research as possible over the next few years, so that when really powerful research AIs come around, you can make much better decisions about which experiments to run. and since almost all relevant research will be done in the future, doing more good research now is much less important than setting yourself up to do more good research later
do as much conceptual research as possible. we might be so pessimistic about the usefulness of AI for alignment research that we think even once AIs are capable of running experiments way faster, none of that will be useful for alignment whatsoever. maybe you think empirical evidence is just fundamentally not very useful for alignment research and the only real constraint is serial time. then, the only thing that matters is doing as much conceptual research as possible, which means you should start doing conceptual stuff asap and mostly ignore AI assistance until the end, when you invent the One True Metric Of Alignment that you can point the number go up machine at, and AI mostly doesn’t help you come up with that metric at all.
bonus option: aligning current models. you might feel pessimistic about whether the near-future AIs that can automate research will be aligned. maybe you think they will secretly sabotage the research and so the most important thing is to make them not do that. so you should mostly focus on pragmatic alignment of models that we are deploying soon
bonus bonus option: status quo. maybe you think heavy AI speedup of research is actually far away and the current wave of automated AI research is going to peter out, and then someone will invent some very different paradigm of AI that actually is good at automating research. then you should, idk, work directly on long term alignment research bets the normal way?
There’s nothing here I really disagree with, and yet it feels somewhat vague to me, which could just be me. Given how I see this topic, I’d have approached the ideas like this:
the best way to capitalize on a more capable future model is, of course, to have a plan beforehand for the system/harness/etc. the model would use. But there is not quite a sense of “the best agent setup” or “the best social structure” at the moment.
models need to have more consistent character. Currently they don’t, and we have Gemini meltdowns, Claude seeing inconsistencies in values in its constitution, and a situation where much more powerful actors will have access to more unrestricted models not available to the general public.
there seems to be a big contrast between GPT guardrailing from automated power and the in-practice agent breakouts we’ve seen, and it’s an open question whether this tension causes warped values.
Treat this comment as like 40% confidence or something.
Program and project management skills are growing in value, I’d say go that way.
for people who are not very good at navigating social conventions, it is often easier to learn to be visibly weird than to learn to adapt to the social conventions.
this often works because there are some spaces where being visibly weird is tolerated, or even celebrated. in fact, from the perspective of an organization, it is good for your success if you are good at protecting weird people.
but from the perspective of an individual, leaning too hard into weirdness is possibly harmful. part of leaning into weirdness is intentional ignorance of normal conventions. this traps you in a local minimum where any progress on understanding normal conventions hurts your weirdness, but isn’t enough to jump all the way to the basin of the normal mode of interaction.
(epistemic status: low confidence, just a hypothesis)
Pretty sure @Ronny Fernandez has opinions about this (in particular, I expect he disagrees that actively being visibly weird requires being ignorant of how to behave conventionally).
I don’t think I understand what “learn to be visibly weird” means, and how it differs from not following social conventions because you fail to understand them correctly.
Perhaps I misunderstand your use of the phrase “intentionally ignorant”, but I believe many people who are seen to have acted with “integrity” were hyperaware and well informed of the normal social conventions in a given environment and made a deliberate choice not to adhere to them, rather than ignoring those conventions out of a lack of interest.
I’m also not sure what you mean by “weird”. I assume you mean any behavior that is not the normal convention of any randomly selected cohesive group of people, from a family, to a local soccer club, to an informal but tight-knit circle of friends, to a department of a large company. Have I got that right?
My idea of ‘weird’ tends to involve the stereotypical artists and creatives I associate with, which is, within those circles not weird at all but normal. But I’m meta-aware that might be a weird take.
are you basing this on intuition or personal experience or something else? I guess we should avoid basing it on observations of people who did succeed in that way. People who try and succeed in adapting to social conventions are likely much less noticeable/salient than people who succeed at being visibly weird.
I think my ideal is to lean into weirdness in a way that doesn’t rely on ignorance of normal conventions
if any municipality in the bay area were to choose to allow lots of housing, then it would very quickly get manhattanized and they would make a zillion dollars in tax revenue, while harming property values in nearby cities. so naively you’d expect that surely eventually one random municipality of the dozens in the bay area will do this. but NIMBYs are so strong everywhere that this never happens. this seems directly relevant to questions of the feasibility of international coordination on AGI, especially if facilitated by strong pressure from labor to stop AGI.
the bay area is also evidence that we can’t just assume that economically incentivized things are inevitable. the opportunity cost of not building up the bay area more is trillions of dollars. but people are willing to destroy immense amounts of value to preserve their self interest and the world they’re familiar with.
California building code alone is quite restrictive. It’s true that municipalities could allow building a lot more housing, but a lot of the cost comes from state-wide or even nation-level building codes.
nation level can’t possibly be the only reason, because there is clearly appetite for building tall buildings in e.g. manhattan. (consider hudson yards, an entire chunk of tall buildings that opened about 5 years ago, literally built on top of a rail yard, which surely added substantially to the cost.) SF land isn’t literally manhattan but it’s still pretty valuable. statewide building codes are more plausible; i don’t know much about california building codes compared to ny.
California has approximately the most restrictive building code in the country.
Isn’t Emeryville kind of doing this? Though I’m not sure whether they’re hitting the ceiling on housing production that real costs impose even when a city government goes 100% YIMBY.
i can’t find data for emeryville specifically but alameda county doesn’t seem to have issued way more housing permits per capita than e.g. SF
Don’t know how to find comprehensive data but they had a development pipeline of 1100 homes in 2022 against a population of 13k. So that might increase their population by ~15%. And about 500 units approved in the pipeline right now. I think the original 1100 pipeline includes the now-opened “Emery” development which had 500 units.
If they approved housing at the rate of Seattle, the leader among large US cities, they would be approving around 1500 per decade. So it seems fast, though at a population density of 10k/sq mile they still might take a couple decades to reach SF density (18k/sq mile) on current trends.
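The ~15% and "couple decades" figures can be reproduced with simple arithmetic. Two inputs are not in the comment and are assumptions. The first is ~1.8 residents per new unit, the value that makes 1100 units come out to ~15% of 13k people. The second is a constant annual population growth rate, set to 2% and 3% purely for illustration.

```python
import math


def population_increase(units: int, population: int, persons_per_unit: float) -> float:
    """Fractional population growth if every new unit is filled at `persons_per_unit`."""
    return units * persons_per_unit / population


def years_to_density(current: float, target: float, annual_growth: float) -> float:
    """Years of constant compound growth needed to go from `current` to `target` density."""
    return math.log(target / current) / math.log1p(annual_growth)


# 1100-unit pipeline vs. ~13k residents; 1.8 persons per unit is an assumed value.
print(round(population_increase(1100, 13_000, 1.8), 2))  # 0.15

# 10k/sq mile today vs. SF at ~18k/sq mile; the growth rates are hypothetical.
for growth in (0.02, 0.03):
    print(growth, round(years_to_density(10_000, 18_000, growth)))
```

Under these assumptions, "a couple decades" corresponds to roughly 3% annual growth, while 2% stretches it to about three decades.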
Isn’t this basically what Solano County (California Forever) is doing?
the new openai planar unit distance result kills my last remaining doubts about AI being a huge multiplier on research productivity in the near term future. i was not expecting this to happen so soon; i would have guessed probably another year before we got a result like this.
What about it in particular convinced you that e.g. the previous big result didn’t?
i get the impression that the previous problems were mostly just neglected, or otherwise were less impressive than they seemed. whereas afaict mathematicians agree the new result is on a real well-known problem and genuinely surprising and novel.
It seems like this is still a case of solving an easily-verifiable problem, so I’m not sure how much that should generalize to solving less-verifiable problems. It feels like verifiable problems were on the path to being completely solved as soon as the Ralph Loop came out.
why are malaria nets 9-23x more cost-effective than direct cash transfers, when in theory direct cash transfers can be used to purchase nets?
some hypotheses
people are irrational and don’t spend money on nets because they underestimate the risks, are not educated, etc
people don’t value spending a few days’ wages to reduce mortality by 0.1 percentage points
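The second hypothesis can be made concrete. Declining to pay a cost c for an absolute mortality reduction Δp means valuing a statistical life at less than c / Δp. A minimal sketch under two illustrative assumptions, neither taken from any dataset: "a few days wages" is instantiated as 3 days, and a year is taken to have 250 working days.

```python
def implied_life_value_ceiling(cost: float, mortality_reduction: float) -> float:
    """Upper bound on the value placed on a statistical life by someone who declines
    to pay `cost` for an absolute drop of `mortality_reduction` in death risk."""
    if not 0 < mortality_reduction <= 1:
        raise ValueError("mortality_reduction must be a probability in (0, 1]")
    return cost / mortality_reduction


WORKING_DAYS_PER_YEAR = 250  # assumption for converting days of wages into years of wages

# 3 days of wages (illustrative) for a 0.1 percentage point (0.001) drop in mortality.
ceiling_days = implied_life_value_ceiling(cost=3, mortality_reduction=0.001)
print(round(ceiling_days), round(ceiling_days / WORKING_DAYS_PER_YEAR))  # 3000 12
```

Under these assumptions, skipping the net is equivalent to valuing the protected life at less than about 12 years of wages.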
I think part of it is what you said. Even in Western countries where people have way more education and slack, people often underinvest in preventative health like flu shots, so it makes sense that people in third-world countries would fall prey to this too, and many more probably don’t even know about the benefits of nets or how to get them.
Another pair is economies of scale and spillover effects. IIRC it’s cheaper per net to protect a whole village at once, and because malaria nets are treated with insecticide, nets kill mosquitos and so they protect other people as well. And people tend to underinvest in things with positive externalities like that—again we go back to the flu shot example.
There’s probably other reasons, but these are the ones I know of.
if it was in substantial part economies of scale, then why wouldn’t the village pool money to buy nets? are the transaction costs too high?
This is where my knowledge ends I’m afraid—the EA Forum / GiveWell may know more, or perhaps someone here can chime in!
Additional hypotheses (note: I do not assert these, only consider as possibilities):
Mosquito nets are more difficult to steal (or pressure recipients into misuse).
Relatedly, mosquito nets are easy to distribute. Money has a lot of overhead costs in actually identifying and getting it to the intended recipients.
Mosquito nets are 9-23x more effective only on the narrow measure of mortality. Perhaps they’re less effective than money on actual happiness for recipients and future generations.
Overall, the adversaries that mosquito nets help against (mosquitoes and viruses) are way dumber than the adversaries (human foibles, both in the recipients and in other parties) that come with money.
Malaria nets are cheaper to buy and distribute in bulk
I think we should generally not be surprised that people are somewhat irrational in their spending on public health instead of other things, relative to their own CEV: we should expect to see both bias (in the sense of underspending) and error (in the sense that sometimes they spend too much and sometimes they spend too little).
Is bias surprising? I dunno, I guess it shouldn’t be. Like, ex ante, if I didn’t look at any data and wanted to guess the direction of marginal spending on health, I’d guess maybe 60/40 for underspending being more likely than overspending, and high probability that there’s some nontrivial bias in either direction (it just seems like a really hard question to get right! And the feedback loops aren’t amazing).
To explain error, I should note that malaria nets were basically chosen, after a lot of combing through the literature, as the ~best health consumption good that isn’t already purchased on the margin, so it should imo not be that surprising that people underspend on them relative to what the research says you “should”. I feel like this shouldn’t be a huge surprise even in rich countries (like, I’m a nerd who cares way more about this than most people, and I wouldn’t be shocked if some of my health spending were systematically quite irrational), and the reasons you mention make a larger skew more likely in poor countries than in rich ones.
Now if you have strongly libertarian priors, maybe you should say that we ought to give direct cash transfers anyway and just try to educate people on the benefits of malaria nets so they can make an informed decision themselves. I think implicitly this is the decision of at least some people who donate to GiveDirectly. But broadly this level of libertarianism isn’t shared by GiveWell or the Against Malaria Foundation.
Another angle here is that GiveWell and/or western donors may have moral paternalism in addition to epistemic paternalism. E.g. the weight they place on infant/maternal lives relative to consumption increases may be higher than the weight decision-makers in Malawi place on them. I believe GiveWell/IDInsight actually tried to investigate this question by surveying people in the reference class of aid recipients to elicit moral weights. Iirc the numbers were kinda all over the place, but they were moderate evidence against the “people in very poor countries care much more about consumption than infant health” hypothesis, at least in terms of stated preferences, though smaller differences couldn’t be ruled out.
Finally, of course GiveWell etc could just be wrong for Seeing Like A State reasons. I don’t personally think this is very likely. But it’s definitely possible.
survey: what brand of melatonin do you use? i want to run an experiment on melatonin degradation using the most popular brand of melatonin.
LifeExtension.com—Melatonin 0.3mg mostly because it’s 0.3mg
Life Extension 0.3mg time release.
Shiepz, 0.1mg per tablet. I bought it in the Netherlands.
Carlyle, mostly. I use life-extension when I’m traveling.
Why the difference?
The Life Extension ones are capsules instead of tablets. The tablets tend to get powderized when knocking about in my travel bag.
I don’t quite understand how you’ll find the most popular brand of melatonin by asking on lesswrong. but I use this one.
Oh! I actually looked into melatonin degradation for my non-24 post. If I remember correctly, it turns out that liquids are unstable, especially with exposure to air, but solids are generally fine, and very stable over several months. I think that gummies tended to have worse quality control than tablets or capsules. I can gather up all the research I found and post it if you like, I was able to get a copy of one of the latest degradation studies from an FDA researcher which showed that quality control is generally better now than it was a decade ago.
this would be very helpful! if someone has already done a high quality version of this experiment then i don’t need to do another one
i’m somewhat concerned that capsules are not super airtight, and the powder inside is also permeable to air.
Nature’s Bounty 1mg
i’m thinking of starting a new blog. it would be about some amount of AI/alignment stuff of course, but also about lots of random other things. for instance, some blog post ideas:
hiring pollsters to run deranged survey questions about transhumanism on the average american
rediscovering all of physics since 300 BC by submitting experiments to a grad student who simulates how the experiment would have gone
book review of the lyndon b johnson biography
miscellaneous short fiction about AI but also other things
hosting and then doing postmortems of weird experimental house party ideas (example idea)
thing i need your help with:
i’m looking for ideas of what i could name my blog. by default i’m just going to go with something lame like “lg blog” but I feel like I could do a lot better.
i’m curious to hear which of the ideas you would be most excited to read, so i can prioritize them
leo means low earth orbit; maybe you can riff off that? (it’s where most artificial satellites circle, observing earth)
Maybe the “Learn Or Get Out” Blog? It’s somewhat like your name, but a little aggressive. A quick google shows it’s not common or taken. “Learn English or get out!” is a slightly more common phrase, and I like that instead of being xenophobic, the new phrase is about learning generally.
why is airplane wifi still so garbage? i imagine many business travellers would pay a lot for good airplane wifi.
presumably because to improve airplane wifi, you’d need to launch dozens of rockets to deliver a massive new constellation of orbiting satellites in order to deliver an order-of-magnitude improvement over Intelsat or whoever usually provides wifi connections to planes.
The good news is that SpaceX has done this, with their Starlink constellation! (Others like OneWeb, Baidu, and Amazon’s Project Kuiper are also doing similar stuff.) But not every airline / airplane has upgraded to new Starlink receivers yet. So, most planes (and cruise ships, and etc) still have slow Intelsat/Globalstar internet, but others have indeed seen huge upgrades in internet speeds.
why is it taking so long to upgrade planes to use starlink? it doesn’t sound like there are huge technical barriers to doing so, and it would be hugely profitable. i would not only pay a lot per flight for good wifi, i would also fly way more often
I’m not sure how common this preference is.
I think that the economic gains from people traveling on business having access to better wifi on planes might be quite large[1], but airlines themselves are not well-positioned to capture very much of those gains. There are a very small number of domestic airlines which don’t offer any wifi on their planes at all. The rest generally offer it for free, or for some relatively low price (on the order of $10). Often even the airlines that charge for it offer it as a free or discounted perk for their “frequent fliers”. Those airlines might have a hard time increasing the sticker price of their wifi offering, even if the quality improves a lot, so they’d have to hope for most of the gains to come from business-class travelers switching to them from a competitor (or, as in your case, deciding to fly at all, on the margin). But it’s not obvious to me that most business-class travelers themselves want better wifi, since once it improves past a certain point they might have very little excuse for not working through the flight. (Maybe this is too cynical, or already moot, idk.)
None of this is meant to say that airlines have no incentive to improve their wifi—I’m pretty sure some of them are already getting started on the Starlink transition—merely that there are a bunch of factors that might make that incentive weaker than it might obviously seem.
Maybe a sizable fraction of “the economic value of their average working hour * flight duration”, which could be thousands of dollars per flight for some travelers.
I think anyone who has ever tried to work on a plane knows that plane wifi is bad enough to reduce your productivity hugely. so I don’t think business travellers who are already paying thousands to fly would shy away from paying hundreds of dollars for actually good wifi on a long haul flight.
I’d predict most business travellers are not really using being on a plane as an excuse to not work.
Anything that goes onto airplanes is CERTIFIED TO SHIT. That’s a big part of the reason why.
Another part is that it’s clearly B2B, and anything B2B is an adversarial shark pit where each company is trying to take a bite out of each other while avoiding getting a bite taken out of them.
Between those two, it’ll take a good while for quality Wi-Fi to proliferate, even though we 100% have the tech now.
Huh. Why would B2B be more adversarially shark pitty than B2C? I’m not saying you are wrong I’m just curious (a) what the evidence is and (b) what the theory is that predicts this conclusion.
dunno! some speculation:
You do have to attach a pretty sizeable antenna to the top of your plane, plus whatever accompanying wiring is necessary… maybe maintenance capacity is the bottleneck? It’s a little hard to imagine that airlines are bottlenecked by this, since it seems pretty minor compared to other kinds of maintenance planes commonly undergo (like swapping out an engine)? But quotes from this site saying that some airline “hopes to have units installed in at least 25% of their aircraft by the end of 2025”, or that another “expects to ramp that number up to 40 installations per month” suggest that maybe this is the reason why airlines like United, Hawaiian, etc (which have started but not completed their rollouts) aren’t yet at 100%.
maybe starlink has some kind of interconnection queue where they can only ramp up so many users at a time?? but I’d expect that stuff like airlines and cruise ships would be relatively high-paying customers at the front of the line, at least compared to ordinary consumers (who can currently order starlink antennas online for next-day shipping).
probably the airlines themselves are not that motivated to instantly upgrade their fleets, since most people don’t choose flights based on who has the fastest wifi? in a similar way, other in-flight amenities—legroom, seat material, the quality of meals on international flights, how good the little screen for in-flight movies is, etc, are individually not super-important to people; most important is the flight route + flight timing + ticket price.
especially when you consider the fact that Starlink has a monopoly, and is probably charging airlines a profit-maximizing price, meaning that airlines which adopt the new service might not actually see any additional revenue on net even if they can charge slightly higher ticket prices once they have fast wifi. Other airlines are perhaps thinking they should wait until more satellite-internet constellations (like the aforementioned project Kuiper) get off the ground and prices come down?
maybe some budget airlines like Frontier or Ryanair calculate that most of their passengers are cheapskates who wouldn’t pay for fast wifi (either directly or through higher ticket prices)
it does kinda seem weird, though, that this list of airlines doing / considering starlink upgrades doesn’t even contain some of the US’s biggest airlines, like Southwest, Delta, or American. I’d bet they’re maybe waiting for lower prices, but it’s always possible they’re just asleep at the wheel.
I’d personally pay more, endure less convenient timing, and sit in a less comfortable seat if it meant I had fast wifi.
like right now flying is pretty time costly for me because most of my highest value work can only be done with internet, so flying means losing a lot of high productivity hours. fast wifi would mean the only time cost of flying is the tiny bit I spend walking through the airport on either end.
Presumably you’d still feel productivity effects from not having a monitor, having worse ergonomics, etc?
I was surprised to see you say above that you’d anticipate flying way more often! Are there times you’ve wanted to fly recently but held off because you couldn’t spare the lost hours of flying? (I would have expected the bigger barrier to be the loss of productive hours from, say, being out-of-the-office in the destination itself)
I don’t really care that much about not having a monitor. it’s a minor productivity hit, whereas not having reliable vaguely-fast internet completely ruins productivity.
I would absolutely fly so much more. weekend trips become way more feasible if I can fly out on Friday and return on Monday. working remotely but visiting HQ occasionally (or otherwise splitting time between two cities) gets a lot easier, because you no longer lose a day of productivity (or a night of sleep) each time.
I think the installation is actually quite complicated (source: I vaguely remember how my friend who works at Starlink described the process. ChatGPT claims the installation is $150k and requires modifying the airframe).
Huh, I have pretty good experiences using wifi flying Delta?
last time I flew Delta it was not amazing, though to be fair I don’t fly Delta very often. I generally fly United or JetBlue, both of which have a rep for “good” wifi, but I’ve never felt particularly satisfied by it.
update: I flew Delta today and the wifi wasn’t very good. I think I misremembered it being better than it is.
in defense of putting your python imports in the middle of your file (in global scope, not inside functions)
i have never in my life wanted to know the list of all the things a file imports before seeing any of the actual code in the file. if i see something i don’t recognize, i would appreciate it more if the import were right above the usage; otherwise, i have to ctrl+f for it anyways. what’s
it’s more annoying to have to add it to the top of the file. auto-import features in IDEs are often broken.
there’s absolutely nothing wrong with importing something multiple times. it costs absolutely nothing; it’s just a no-op the second time. even in C you can use #pragma once to get python-like behavior.
the only reason not to do this is that if you put an import in the middle of a function it’s weird (if you import both globally and locally, then using the thing locally but before the local import errors). so just don’t put the import inside a function
tone: playful
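A quick standard-library check of both claims above: a repeated import is just a lookup in the sys.modules cache (very cheap, though technically not zero work), and an import statement inside a function makes that name local to the whole function body, which is exactly the “use before the local import” error. The function name late_local_import is purely illustrative.

```python
import sys
import json  # first import: runs the module once and caches it in sys.modules

cached = sys.modules["json"]

import json  # later import: finds json in sys.modules and just rebinds the name

# The repeated import handed back the very same module object.
print(json is cached)  # True


def late_local_import():
    # Because `import json` appears later in this function, Python treats
    # `json` as a local name for the whole body, so this line sees an
    # unbound local rather than the global module.
    text = json.dumps({"a": 1})
    import json
    return text


try:
    late_local_import()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```

Moving the import to the top of the function (or leaving it only at module scope) makes late_local_import work again.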
weak. if you’re going to import not at the top of a file put it at the top of the function like a real startup-speed-microoptimizing programmer
a common discussion pattern: person 1 claims X solves/is an angle of attack on problem P. person 2 is skeptical. there is also some subproblem Q (90% of the time not mentioned explicitly). person 1 is defending a claim like “X solves P conditional on Q already being solved (but Q is easy)”, whereas person 2 thinks person 1 is defending “X solves P via solving Q”, and person 2 also believes something like “subproblem Q is hard”. the problem with this discussion pattern is it can lead to some very frustrating miscommunication:
if the discussion recurses into whether Q is hard, person 1 can get frustrated because it feels like a diversion from the part they actually care about/have tried to find a solution for, which is how to find a solution to P given a solution to Q (again, usually Q is some implicit assumption that you might not even notice you have). it can feel like person 2 is nitpicking or coming up with fully general counterarguments for why P can never be solved.
person 2 can get frustrated because it feels like the original proposed solution doesn’t engage with the hard subproblem Q. person 2 believes that assuming Q were solved, then there would be many other proposals other than X that would also suffice to solve problem P, so that the core ideas of X actually aren’t that important, and all the work is actually being done by assuming Q.
I find myself in person 2’s position fairly often, and it is INCREDIBLY frustrating for person 1 to claim they’ve “solved” P, when they’re ignoring the actual hard part (or one of the hard parts). And then they get MAD when I point out why their “solution” is ineffective. Oh, wait, I’m also extremely annoyed when person 2 won’t even take steps to CONSIDER my solution—maybe subproblem Q is actually easy, when the path to victory aside from that is clarified.
In neither case can any progress be made without actually addressing how Q fits into P, and what is the actual detailed claim of improvement of X in the face of both Q and non-Q elements of P.
I can see how this could be a frustrating pattern for both parties, but I think it’s often an important conversation tree to explore when person 1 (or anyone) is using results about P in restricted domains to make larger claims or arguments about something that depends on solving P at the hardest difficulty setting in the least convenient possible world.
As an example, consider the following three posts:
Challenge: construct a Gradient Hacker
Gradient hacking is extremely difficult
My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”
I think both of the first two posts are valuable and important work on formulating and analyzing restricted subproblems. But I object to citation of the second post (in the third post) as evidence in support of a larger point that doom from mesa-optimizers or gradient descent is unlikely in the real world, and object to the second post to the degree that it is implicitly making this claim.
There’s an asymmetry when person 1 is arguing for an optimistic view on AI x-risk and person 2 is arguing for a doomer-ish view, in the sense that person 1 has to address all counterarguments but person 2 only has to find one hole. But this asymmetry is unfortunately a fact about the problem domain and not the argument / discussion pattern between 1 and 2.
yeah, but that’s because Q is easy if you solve P
Very nicely described, this might benefit from becoming a top-level post
For example?
here’s a straw hypothetical example where I’ve exaggerated both 1 and 2; the details aren’t exactly correct but the vibe is more important:
1: “Here’s a super clever extension of debate that mitigates obfuscated arguments [etc], this should just solve alignment”
2: “Debate works if you can actually set the goals of the agents (i.e you’ve solved inner alignment), but otherwise you can get issues with the agents coordinating [etc]”
1: “Well the goals have to be inside the NN somewhere so we can probably just do something with interpretability or whatever”
2: “how are you going to do that? your scheme doesn’t tackle inner alignment, which seems to contain almost all of the difficulty of alignment to me. the claim you just made is a separate claim from your main scheme, and the cleverness in your scheme is in a direction orthogonal to this claim”
1: “idk, also that’s a fully general counterargument to any alignment scheme, you can always just say ‘but what if inner misalignment’. I feel like you’re not really engaging with the meat of my proposal, you’ve just found a thing you can say to be cynical and dismissive of any proposal”
2: “but I think most of the difficulty of alignment is in inner alignment, and schemes which kinda handwave it away are trying to solve some problem which is not the actual problem we need to solve to not die from AGI. I agree your scheme would work if inner alignment weren’t a problem.”
1: “so you agree that in a pretty nontrivial number [let’s say both 1&2 agree this is like 20% or something] of worlds my scheme does actually work; I mean, how can you be that confident that inner alignment is that hard? in the worlds where inner alignment turns out to be easy, then my scheme will work.”
2: “I’m not super confident, but if we assume that inner alignment is easy then I think many other simpler schemes will also work, so the cleverness that your proposal adds doesn’t actually make a big difference.”
So Q=inner alignment? Seems like person 2 not only pointed to inner alignment explicitly (so it can no longer be “some implicit assumption that you might not even notice you have”), but also said that it “seems to contain almost all of the difficulty of alignment to me”. He’s clearly identified inner alignment as a crux, rather than as something meant “to be cynical and dismissive”. At that point, it would have been prudent of person 1 to shift his focus onto inner alignment and explain why he thinks it is not hard.
Note that your post suddenly introduces “Y” without defining it. I think you meant “X”.
https://james-iry.blogspot.com/2009/05/brief-incomplete-and-mostly-wrong.html
have you ever heard anyone make the argument that it’s good to have AI safety aligned frontier labs (including but not limited to Anthropic) because they will have a seat at the table with the regulators, and the regulators will take major industry players’ opinions more seriously than minor players or activists?
i’ve heard this argument but i’m trying to figure out if it’s common enough to be worth writing a post about
In my opinion, this position was sensible when the labs themselves were branded as MIRI-like but with added emphasis on technical experimentation. The second it became clear that these labs’ principal reward function was not their claimed preferences (for alignment research—which OAI was explicitly communicating under the moniker of safety), our personal semantic landscapes were already trained enough by the narrative that we missed the major conflict of interest here before establishing the norm.
They will continue to use epistemic asymmetry and leverage over information advantage to make the claim that having any other group at the table is a fruitless endeavour, and use the risks of foreign adversarial advantage to continue to maintain that position strategically. Given that all regulation is domestic, they end up regulating themselves (which is even implied by your question), which IMO can be a worst case scenario from a humanist/existential risk perspective.
I’ve heard this before.
I have both heard that argument, and made it myself on occasion, so I’m keen to see your post!
I think it’s a good argument, but Anthropic doesn’t seem quite aligned enough to make it work. E.g. they don’t seem to have been pushing for a coordinated Pause to any real extent (and if they don’t think this would be a good idea, haven’t clarified their position as far as I know).
I feel like I’ve heard this argument yes, though when I read lots of Anthropic’s ‘race to the top’ language, it’s not quite that
Here’s an example that feels borderline to me:
I’ve written a semi-related piece before (https://www.clear-eyed.ai/p/dont-rely-on-a-race-to-the-top), but I think yours would be different enough that it could still make sense
I don’t recall ever hearing this argument.
thoughts on lemborexant
pros: if you take it, you will fall asleep 30-60 minutes later. nothing else I’ve tried has been as reliable at making sure I definitely fall asleep, and as far as I can tell, it doesn’t destroy my sleep quality. especially at 10mg, you can feel it knocking you out, and you basically can’t power through it even if you want to. it’s a bit scary but all powerful sleep drugs are at least a bit scary and often a lot more scary. I generally take 5mg instead.
cons: it doesn’t do anything to keep you asleep; if your body doesn’t really want to sleep, you will wake up 2 hours later fully alert. it also doesn’t do anything to shift your sleep schedule. these facts combined mean that if you try to use lemborexant for jet lag / shifting sleep earlier, then your life will suck indefinitely until you stop using lemborexant. my current recipe is to only use lemborexant when it’s near enough to my normal bedtime, and I use melatonin 3 hours before bed to slowly move my sleep schedule earlier (later requires no special effort)
(potentially this also means lemborexant can be used to get nice 2 hour daytime naps? I have enough fear of god about sleep drugs that I feel hesitant to try any kind of hack like this)
(not medical advice. not a doctor, and even if I was a doctor I’m not your doctor, and even if I was your doctor I wouldn’t be communicating to you via lesswrong shortforms)
have you tried daridorexant?
haven’t tried it. have you found it to be better?
sadly don’t have any lemborexant, so can’t compare; i originally picked daridorexant naively due to its shorter half-life, thinking this corresponded to less daytime tiredness.
my naive understanding was actually also that lemborexant should be the one better at keeping you asleep, so it’s interesting to hear that it doesn’t seem to do that at all for you.
a simple elegant intuition for the relationship between SVD and eigendecomposition that I haven’t heard before:
the eigendecomposition of A tells us which directions A stretches along without rotating. but sometimes we want to know all the directions things get stretched along, even if there is rotation.
why does taking the eigendecomposition of $A^\top A$ help us? suppose we rewrite $A = RS$, where $S$ just scales (i.e. is a normal matrix), and $R$ is just a rotation matrix. then, $A^\top A = S^\top R^\top R S$, and the $R$’s cancel out because the transpose of a rotation matrix is also its inverse.
intuitively, imagine thinking of $A$ as first scaling in place, and then rotating. then, $A^\top A$ would first scale, then rotate, then rotate again in the opposite direction, then scale again. so all the rotations cancel out and the resulting eigenvalues of $A^\top A$ are the squares of the scaling factors.
This is almost right, but a normal matrix is not a matrix that “just scales”; it’s a normal matrix, which can do whatever linear operation it likes.
SVD tells us there exists a factorization $A = U\Sigma V^\top$ where $U$ and $V$ are orthogonal, and $\Sigma$ is a “scaling matrix” in the sense that it’s diagonal. Therefore, using similar logic to you, $A^\top A = V\Sigma U^\top U\Sigma V^\top = V\Sigma^2 V^\top$, which means we rotate, scale by the singular values twice, then rotate back, which is why the eigenvalues of this are the squares of the singular values, and the eigenvectors are the right singular vectors.
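A small numerical sanity check of the “rotations cancel” intuition, in plain Python with 2x2 matrices so nothing beyond the standard library is needed. It builds $A = RS$ with $R$ a rotation and $S$ diagonal (the simplest reading of “just scales”; for a general real square $A$, the factor that plays this role is the symmetric positive semidefinite factor of the polar decomposition, not an arbitrary normal matrix). The rotation angle and scale factors are arbitrary illustrative choices.

```python
import math


def matmul(X, Y):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def sym_eigvals_2x2(M):
    """Eigenvalues of a symmetric 2x2 matrix from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return sorted([tr / 2 - disc, tr / 2 + disc])


# A = R S: scale along the coordinate axes by 3 and 0.5, then rotate.
S = [[3.0, 0.0], [0.0, 0.5]]
R = rotation(0.7)
A = matmul(R, S)

# A^T A = S^T R^T R S = S^T S, because R^T R = I.
AtA = matmul(transpose(A), A)
eigs = sym_eigvals_2x2(AtA)
print(eigs)  # approximately [0.25, 9.0], the squared scale factors
```

Changing the rotation angle leaves the eigenvalues unchanged, which is the point: only the scaling survives in $A^\top A$.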
philosophy: while the claims “good things are good” and “bad things are bad” at first appear to be compatible with each other, actually we can construct a weird hypothetical involving exact clones that demonstrates that they are fundamentally inconsistent with each other
law: could there be ambiguity in “don’t do things that are bad as determined by a reasonable person, unless the thing is actually good?” well, unfortunately, there is no way to know until it actually happens
I think I need to hear more context (and likely more words in the sentences) to understand what inconsistency you’re talking about. “good things are good” COULD be just a tautology, with the assumption that “good things” are relative to a given agent, and “good” is furtherance of the agent’s preferences. Or it could be a hidden (and false) claim of universality: “good things” are anything that a lot of people support, and “are good” means truly Pareto-preferred with no harm to anyone.
Your explanation “by a reasonable person” is pretty limiting, there being no persons who are reasonable on all topics. Likewise “actually good”—I think there’s no way to know even after it happens.
lifehack: buying 3 cheap pocket sized battery packs costs like $60 and basically eliminates the problem of running out of phone charge on the go. it’s much easier to remember to charge them because you can instantaneously exchange your empty battery pack for a full one when you realize you need one; plugging in the empty battery pack happens exactly when you swap for a fresh one; and even if you forget once or lose one, you have some slack
One possible model of AI development is as follows: there exists some threshold beyond which capabilities are powerful enough to cause an x-risk, and alignment progress needs to reach the level required to align such a system before it comes into existence. I find it informative to think of this as a race where, for capabilities, the finish line is x-risk-capable AGI, and for alignment, it is the ability to align x-risk-capable AGI. In this model, alignment reaching its finish line first is necessary but not sufficient for good outcomes: if alignment doesn’t make it there first, then we automatically lose, but even if it does, if alignment doesn’t continue to improve proportionally to capabilities, we might still fail at some later point. However, I think it’s plausible we’re not even on track for the necessary condition, so I’ll focus on that within this post.
Given my distributions over how difficult AGI and alignment respectively are, and the amount of effort brought to bear on each of these problems, I think there’s a worryingly large chance that we just won’t have the alignment progress needed at the critical juncture.
I also think it’s plausible that at some point before x-risks become possible, capabilities will advance to the point that the majority of AI research will be done by AI systems. The worry is that after this point, both capabilities and alignment will be similarly benefitted by automation, and if alignment is behind when this happens, then this lag will be “locked in”, because an asymmetric benefit to alignment research is needed to overtake capabilities if capabilities is already ahead.
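A toy version of the lock-in argument, under deliberately crude assumptions that are mine rather than part of the model above: each side has a fixed amount of remaining work to its finish line, progress happens at a constant rate, and automation multiplies each side’s rate by a constant. All names and numbers are hypothetical. It shows that a symmetric speed-up leaves the finish order unchanged, and computes how large an alignment-specific advantage would have to be to flip it.

```python
def time_to_finish(remaining_work, rate, multiplier=1.0):
    """Time to finish remaining_work at a constant, possibly automation-boosted, rate."""
    return remaining_work / (rate * multiplier)


def who_finishes_first(cap_remaining, cap_rate, align_remaining, align_rate,
                       cap_mult=1.0, align_mult=1.0):
    t_cap = time_to_finish(cap_remaining, cap_rate, cap_mult)
    t_align = time_to_finish(align_remaining, align_rate, align_mult)
    return "alignment" if t_align < t_cap else "capabilities"


def min_align_advantage(cap_remaining, cap_rate, align_remaining, align_rate):
    """align_mult / cap_mult must strictly exceed this ratio for alignment to win."""
    return (align_remaining / align_rate) / (cap_remaining / cap_rate)


# Hypothetical starting point: capabilities is ahead when automation arrives.
state = dict(cap_remaining=10.0, cap_rate=2.0, align_remaining=10.0, align_rate=1.0)

print(who_finishes_first(**state))                              # capabilities
print(who_finishes_first(**state, cap_mult=10, align_mult=10))  # capabilities
print(min_align_advantage(**state))                             # 2.0
```

Because both finish times scale by the same factor under a symmetric multiplier, only the ratio of the two multipliers matters in this toy model, which is the sense in which an asymmetric benefit is needed.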
There are a number of areas where this model could be violated:
Capabilities could turn out to be less accelerated than alignment by AI assistance. It seems like capabilities is mostly just throwing more hardware at the problem and scaling up, whereas alignment is much more conceptually oriented.
After research is mostly/fully automated, orgs could simply allocate more auto-research time to alignment than AGI.
Alignment(/coordination to slow down) could turn out to be easy. It could turn out that applying the same amount of effort to alignment and AGI results in alignment being solved first.
However, I don’t think these violations are likely, for the following reasons respectively:
It’s plausible that our current reliance on scaling is a product of our theory not being good enough and that it’s already possible to build AGI with current hardware if you have the textbook from the future. Even if the strong version of the claim isn’t true, one big reason that the bitter lesson is true is that bespoke engineering is currently expensive, and if it suddenly became a lot cheaper, we would see a lot more of it and consequently squeeze more out of the same hardware. It also seems likely that before total automation, there will be a number of years where automation is best modelled as a multiplicative factor on human researcher effectiveness. In that case, because of the sheer number of capabilities researchers compared to alignment researchers, alignment researchers would have to benefit a lot more to just break even.
If it were the case that orgs would pivot, I would expect them to already be allocating a lot more to alignment than they currently do. While it’s still plausible that orgs haven’t allocated more to alignment because they think AGI is far away, and that a world where automated research is a thing is a world where orgs would suddenly realize how close AGI is and pivot, that hypothesis hasn’t been very predictive so far. Further, because I expect the tech for research automation to be developed at roughly the same time by many different orgs, it seems like not only does one org have to prioritize alignment, but actually a majority, weighted by auto-research capacity, has to prioritize alignment. To me, this seems difficult, although more tractable than the other alignment coordination problem, because there’s less of a unilateralist problem. The unilateralist problem still exists to some extent: orgs which prioritize alignment are inherently at a disadvantage compared to orgs that don’t, because capabilities progress feeds recursively into faster progress whereas alignment progress is less effective at making future alignment progress faster. However, on the relevant timescales this may become less important.
I think alignment is a very difficult problem, and moreover that by its nature it’s incredibly easy to underestimate. I should probably write a full post about my take on this at some point, and I don’t have space to really dive into it here, but a quick meta-level argument for why we shouldn’t lean on alignment being easy, even if there is a non-negligible chance that it is, is that a) given the stakes, we should exercise extreme caution and b) there are very few problems we have that are in the same reference class as alignment, and of the few that are even close, like computer security, they don’t inspire a lot of confidence.
I think exploring the potential model violations further is a fruitful direction. I don’t think I’m very confident about this model.
is it worth writing blog posts about “obvious” things? i’ve been doing a lot of writing recently, and i frequently finish writing something, and i look at it, and i feel like it’s so obvious that all readers will either already agree and not learn anything, or disagree so fundamentally that changing their mind would require diving much deeper into fundamental beliefs.
One of the benefits of occasionally talking to people is that you get an indicator of what things are obvious or not, based on what you find yourself repeatedly explaining or arguing for. (I use 3 times as my own threshold.)
There are a lot of claims I would agree with if you asked me, but I wouldn’t use them in a reasoning chain on my own because they never crossed my mind.
A lot of complex reasoning rests on having reliable basics on which you can reason.
Since I was trained in bioinformatics, when I’m talking about BPC-157 it feels pretty obvious to me that if BPC-157 is a real peptide that’s part of a protein called BPC, I should be able to look up the gene for BPC in sequencing databases. That’s the central dogma of molecular biology: proteins come from genes.
If I asked anyone in the biohacking scene whether they agree with the central dogma of molecular biology, and that this means there should be a gene to look up, they would probably say they agree. Yet, somehow the argument does not convince people who believe in BPC-157 that it’s bogus.
Explaining the central dogma of molecular biology and our great success at gene sequencing in a way that actually sinks in isn’t easy.
If you find yourself writing something very obvious, it becomes more important to ask: “How can I make this point in a way that really sinks in, so that the reader can actually use it and rely on it?” instead of just “Have I made a clear logical argument for it?”.
Yes, I think so. https://www.lesswrong.com/posts/thXohzXrWCA2EhZCH/mateusz-baginski-s-shortform#MRNSMcFxFuW6kZbap
At the very least, you’re testing whether the model that predicts this is accurate.
The default blogging advice I’ve heard is that “obvious” topics often make for good posts because they are often non-obvious to readers, so one should strongly default towards posting if the concern is that it is too obvious.
But maybe you’re making this judgement even with that prior in mind? I’d be curious to see one of these “obvious” posts.
In addition to verifying obviousness by posting like Gwern mentioned, which in my experience is a frequent source of surprise, there’s the advanced version of this that e.g. Toby Ord has done most of his career by his own lights. This does require taste at picking topics “at the border of the trivial and the profound” to quote him, but taste is pretty clearly something you have aplenty.
Obvious facts nobody would disagree with when explicitly stated can still be underappreciated and not paid enough attention to, so they can still be worth spelling out for that reason. Although I must say when I get tired hearing of something obvious too many times, I get a pathological contrarian urge to argue against it.
theory: a big difference between people who hate corporations and people who don’t is the extent to which they like interacting with human-shaped things. some people like human shaped things and the sort of amoral profit maximization of companies feels alien and sociopathic. other people like the predictable API that companies provide.
IDK man. I mostly don’t care that much about either. I’m extroverted but quite picky about people and don’t particularly feel drawn to “human-shaped things” in general. I don’t particularly hate corporations, but surely corporate capitalism seems very far from ideal. And their drive doesn’t seem alien or sociopathic to me.
But now I realize that I’m actually confused by what you mean by “human-shaped things”.
like, you would expect normal people to be courteous in daily interactions, to have goals in life, to have the typical human follies, to fundamentally have some amount of kindness and empathy for their fellow man, etc.
whereas interacting with a company can feel like being trapped in a Kafkaesque bureaucracy; companies don’t really have that many terminal goals other than money—certainly the vast majority of humans care less about money; with the exception of heavily founder-controlled companies, companies generally don’t really have pride or envy or sloth or so on in the way a normal human would, even though they are constituted of normal humans—corporate failures are much weirder and often due to coordination failure; corporations don’t really empathize with people except insofar as it is useful to make money—in any human we’d consider this very transactional and maybe even sociopathic
further evidence: people love small businesses, and medium sized companies that give very strong small business energy. people hate it when PE firms take over small businesses and make them more efficient.
if companies were people, they would be uncaring sociopathic humans. but if lawnmowers were people, they would also be uncaring and sociopathic
This counterfactual is too underspecified to evaluate it. (It’s not counterfactable.)
The companies one is less underspecified because at least you can somewhat model them as agents.
if there was a guy who stood there swinging a scythe to cut grass and didn’t seem to care or feel bad or really respond at all to accidentally cutting someone’s arm off, we’d consider them uncaring and sociopathic. similarly, if we think of, say, an insurance company as a person, then when it declines someone’s claim and leaves them destitute, it’s reasonable to think of that person as uncaring and sociopathic. you can argue all you want about the economics of how insurance can only work if you do this but for the individual people who interface with this, who are not used to thinking about economic systems, but deal with people every day, it feels like a human using a loophole to justify not caring, and not feeling any empathy whatsoever.
Companies want to consume everything, including peoples’ lives, in order to make themselves richer and bigger. People are “resources” to a company.
Lawnmowers just want to cut your grass, the only resource they ask for is petrol, and (crucially) they don’t want to consume it exponentially to make themselves bigger and cut exponentially more grass.
If Lawnmowers were people, they’d be those weird obsessive monomaniacal types who’re generally harmless but a bit difficult to talk to. Lots of them would be on LessWrong.
You could form a bond with a lawnmower (or a lawnmower-person), just as you could form a bond with a motorcycle or a family heirloom. Sure, it would essentially be one-way affection, since the lawnmower just wants to do its thing and is only nice to you insofar as you also benefit from short grass—but it wouldn’t want to consume you, control you, or enslave you. Try to form a bond with a corporation and it’ll eat you alive.
i mean most companies won’t eat you alive? you can form a bond with the coca cola company in the same one directional way as the lawnmower and it’s not like they will take advantage of that to extract every dollar from you. in fact basically only like Facebook, tiktok, etc are like that, and even then they’re not that bad; they’re no worse than an abusive human partner
is it generally best to take just one med (e.g. antidepressant, adhd, anxiolytic), or is it best to take a mix of many meds, each at a lesser dosage? my intuitions seem to suggest that the latter could be better. in particular, consider the following toy model: your brain has parameters $\theta_0$ that should be at some optimal $\theta^*$, and your loss function is a quadratic around $\theta^*$. each dimension in this space represents some aspect of how your brain is configured—they might for instance represent your level of alertness, or impulsivity, or risk averseness, or motivation, etc. each med is some vector $v_i$ that you can add to your current state $\theta_0$, and the optimal dosage of that med in isolation is whichever quantity gets you closest to $\theta^*$; but unless $\theta^* - \theta_0$ happens to be exactly collinear with $v_i$, you basically can’t do any better just by tuning the dosage of the one med. this seems especially important because most meds don’t seem to be exactly monosemantic, and also different people start out with substantially different $\theta_0$ and loss landscapes, such that you often get paradoxical reactions to meds.
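The toy model can be made concrete in a few lines, as a sketch under extra assumptions beyond the post: the quadratic loss is isotropic (the squared distance to $\theta^*$), effects add linearly, and there are no interaction terms at all (the thing the replies push back on). The three axes and both med vectors are made up for illustration. One med alone can only reach the projection of $\theta^* - \theta_0$ onto its vector; two meds solved jointly by least squares get closer, each at a lower dose.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def loss(delta, meds, doses):
    """Isotropic quadratic loss ||delta - sum_i d_i v_i||^2, with delta = theta* - theta_0."""
    residual = [delta[k] - sum(d * v[k] for d, v in zip(doses, meds))
                for k in range(len(delta))]
    return dot(residual, residual)


def best_single_dose(delta, v):
    """Best dose of one med alone: the projection coefficient of delta onto v."""
    return dot(v, delta) / dot(v, v)


def best_pair_doses(delta, v1, v2):
    """Jointly optimal doses for two meds: solve the 2x2 normal equations G d = b."""
    g11, g12, g22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
    b1, b2 = dot(v1, delta), dot(v2, delta)
    det = g11 * g22 - g12 * g12  # nonzero as long as v1 and v2 are not parallel
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]


# Hypothetical axes: (alertness, motivation, anxiety).
delta = [1.0, 1.0, 0.0]  # want more alertness and motivation, same anxiety
med_a = [1.0, 0.0, 0.5]  # mostly alertness, with an anxiety side effect
med_b = [0.0, 1.0, 0.5]  # mostly motivation, with an anxiety side effect

d_a = best_single_dose(delta, med_a)
print(d_a, loss(delta, [med_a], [d_a]))           # 0.8, loss about 1.2
doses = best_pair_doses(delta, med_a, med_b)
print(doses, loss(delta, [med_a, med_b], doses))  # about 0.667 each, loss about 0.667
```

With a non-isotropic quadratic, the dot products would be taken in the metric given by the loss’s Hessian instead; the qualitative point about non-collinear directions stays the same.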
A huge percentage of the job of a pharmacist is to keep track of potential negative interactions between different drugs, of which there are an incomprehensible number. I don’t think linearity is a reasonable assumption here; the interaction terms between multiple interventions should be thought of as, on average, big. Augmentation and synergistic effects exist, but are in general risky and quite hard to find. Even the effects of one drug are not linear; there are significant nonlinearities in dosage effects for most drugs.
i’m not really making any strong linearity assumptions, only local linearity. this doesn’t seem that different from ML, where hyperparameters can sometimes interact heavily nonlinearly, but often they don’t. i also don’t think the quadratic assumption is crazy; we assume that loss landscapes are locally quadratic all the time, even though they are obviously highly nonconvex, and it’s still a very useful intuition pump.
also, my understanding is most of the really bad interactions are pretty well known, so the probability of having a really weird surprising interaction that nobody has ever catalogued is small.
I think our mental models here might be different enough that it’s hard for me to understand what you’re saying here. By nonlinearity here I mean that, in addition to nonlinear interactions between drugs, there are interacting systems, equilibration mechanisms, etc., to the point that I think intuitions about ML systems basically shouldn’t transfer at all. But then I know your intuitions about ML are better than mine, so it’s hard to be sure of that.
Re: interactions specifically, this definitely isn’t true in polypharmacy situations. We know most of the bad drug pairs in the normal population, and because doctors are wary of prescribing many different medications, we rarely encounter new bad interactions in the normal population. But there are drug combinations that only become dangerous in triples (search term: the Triple Whammy, a combination of 3 drug classes, any 2 of which are generally safe, that together cause kidney failure; this interaction was discovered in 2000, but the drugs became available in like 1980), and there are interactions which are only dangerous in the context of certain mutations (for example, there are ultrametabolizers who simply can’t use prodrugs like codeine).
Interactions like this are rare right now largely because doctors are wary of prescribing too many drugs at once, but polypharmacy is becoming more common and more bad interactions are emerging as a result, basically just for combinatorial reasons. It’s definitely possible for combinations of drugs to be prescribed safely and for them to just not interact, but if we push this further, I suspect there are very few combinations of, say, 10 drugs that are simultaneously safe for most people (even if we ignore cholinergic response).
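The combinatorial point can be made concrete with a quick count of how many distinct pairs and triples a regimen contains. Purely for illustration, this sketch assumes every distinct pair or triple has the same small, independent chance `P_BAD` of a bad interaction; that number is hypothetical, not a pharmacological estimate.

```python
from math import comb

P_BAD = 0.001  # hypothetical chance that any given untested pair/triple interacts badly


def combo_counts(n_drugs):
    """Number of distinct drug pairs and triples in a regimen of n_drugs."""
    return comb(n_drugs, 2), comb(n_drugs, 3)


for n in (2, 3, 5, 10):
    pairs, triples = combo_counts(n)
    # Chance at least one combination is bad, if each is independently bad with P_BAD.
    p_any = 1 - (1 - P_BAD) ** (pairs + triples)
    print(f"{n:2d} drugs: {pairs:3d} pairs, {triples:3d} triples, P(any bad) ~ {p_any:.3f}")
```

Even with every one of the 45 pairs in a 10-drug regimen screened, 120 triples remain, which is where Triple-Whammy-style interactions hide.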
Changing the dose of a medication does not necessarily result in linear effects. There are nonlinearities introduced by e.g. one receptor type being saturated before another one. This phenomenon also applies to polypharmacy.
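One standard way to see the saturation effect is simple one-site receptor binding, where the fraction of receptors occupied is $[D]/([D] + K_d)$. The two $K_d$ values below are hypothetical, and treating dose as the concentration at the receptor ignores pharmacokinetics entirely.

```python
# Simple one-site binding (law of mass action, Hill coefficient 1):
# fraction of receptors occupied = dose / (dose + Kd), in the same arbitrary units.
def occupancy(dose, kd):
    return dose / (dose + kd)


KD_A = 1.0   # hypothetical high-affinity receptor (saturates early)
KD_B = 20.0  # hypothetical low-affinity receptor

dose_levels = [0.5, 1.0, 2.0, 4.0, 8.0]
ratios = []
for dose in dose_levels:
    a, b = occupancy(dose, KD_A), occupancy(dose, KD_B)
    ratios.append(b / a)
    print(f"dose={dose:4.1f}  A={a:.2f}  B={b:.2f}  B/A={b / a:.2f}")
```

As the dose rises, receptor A saturates while B keeps climbing, so the balance of effects shifts instead of the whole profile scaling up proportionally.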
I would also like to note that $\theta^*$ is estimated not by some objective standard, but by $\theta_0$. There’s no guarantee that it remains in place as you start shifting $\theta$.
In practice, we track our level of suffering and respond to it by trying to reduce it to acceptable levels, which is easier than trying to converge onto a hypothetical global optimum. For some, this state is reached with just one medication, for others it takes more, and for some this paradigm doesn’t produce any results.
I would see it more as causal learning with the do operator, so it isn’t necessarily about fitting to an MSE but rather about testing different combinations?
Something something gears-level models
I made a manifold market about how likely we are to get ambitious mechanistic interpretability to GPT-2 level: https://manifold.markets/LeoGao/will-we-fully-interpret-a-gpt2-leve?r=TGVvR2Fv
This is a dumb question but… is this market supposed to resolve positively if a misaligned AI takes over, achieves superintelligence, and then solves the problem for itself (and maybe shares it with some captive humans)? Or any broader extension of that scenario?
My timelines are not that short, but I do currently think basically all of the ways I expect this to resolve positively will very heavily rely on AI assistance, and so various shades of this question feel cruxy to me.
I honestly didn’t think of that at all when making the market, because I think takeover-capability-level AGI by 2028 is extremely unlikely.
I care about this market insofar as it tells us whether (people believe) this is a good research direction. So obviously it’s perfectly ok to resolve YES if it is solved and a lot of the work was done by AI assistants. If AI fooms and murders everyone before 2028, then this is obviously a bad portent for this research agenda, because it means we didn’t get it done soon enough, and it’s little comfort if the ASI solves interp after murdering or subjugating all of us. So that would resolve N/A, or maybe NO (not that it will matter whether your mana is returned to you after you are dead). If we solve alignment without interpretability and live in the glorious transhumanist utopia before 2028 and only manage to solve interpretability after takeoff, then… idk, I think the best option is to resolve N/A, because we also don’t care about that when deciding today whether this is a good agenda.
There are surely reasons to do ambitious interp that are not the stated goal of ambitious interp? I doubt we will have a fully understandable model by 2028, but I still think the abstractions developed in the process will be helpful.
For instance, many of the higher-order methods like SAEs are based on assumptions about how activation space is structured. Studying smaller systems rigorously can give us the ground truth for how models construct their activation space, which can allow us to question/modify said assumptions.
Unfortunately, prediction markets need some bright red line somewhere to be resolvable. I encourage you to make a different market that captures the thing you care about.
I don’t care about prediction markets.
But people with the belief that we aren’t going to be able to fully understand models frequently take this as a reason not to pursue ambitious/rigorous interpretability. I thought that was the position you were taking, by using the market to decide whether the agenda is “good” or not.
i miss the og mealsquares
I’m with you. My friend got the recipe by asking Romeo for it and we’ve made them ourselves, although it’s complicated.
Nah, the new mealsquares are so much better. I love them.
imo, the new mealsquares have taste and mouthfeel very similar to many brands of protein bar, whereas the old mealsquares had a unique taste and mouthfeel
idea: survey people about whether 3^^^3 toe stubbings can be worse than torture, except with a twist: with 50% probability, arrange the furniture in the room such that people actually accidentally stub their toe right before answering the survey
Confounder chain: tendency to stub one’s toe ← “clumsiness” ← autism & ADHD (& probably also other things comorbid with those).
solution is very simple. administer alcohol before the test until everyone is exactly as clumsy as the NIST Standard Reference Clumsy Human for Accessibility Testing
(or, rig the game, make a ledge that automatically pops out of the ground a split second before your foot hits it)
i find it disappointing that a lot of people believe things about trading that are obviously crazy even if you only believe in a very weak form of the EMH. for example, technical analysis is obviously tea leaf reading—if it were predictive whatsoever, you could make a lot of money by exploiting it until it is no longer predictive.
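A quick sanity check of the weak-EMH argument, as a sketch: if prices follow a pure random walk (an idealization assumed here, not a claim about real markets), a classic moving-average crossover rule earns zero expected return per step. The crossover rule and its 5/20 windows are arbitrary stand-ins for "technical analysis", not any particular published method.

```python
import random
import statistics


def simulate_random_walk(n_steps, seed=0):
    """Prices with i.i.d. Gaussian increments: no exploitable structure by construction."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_steps):
        prices.append(prices[-1] + rng.gauss(0.0, 1.0))
    return prices


def crossover_returns(prices, short=5, long=20):
    # Go long (+1) when the short moving average is above the long one, short (-1) otherwise.
    # The position at time t uses only prices up to t; its return is the next price change.
    returns = []
    for t in range(long, len(prices) - 1):
        short_ma = sum(prices[t - short + 1 : t + 1]) / short
        long_ma = sum(prices[t - long + 1 : t + 1]) / long
        position = 1 if short_ma > long_ma else -1
        returns.append(position * (prices[t + 1] - prices[t]))
    return returns


prices = simulate_random_walk(100_000, seed=42)
rets = crossover_returns(prices)
# Mean per-step return vs. its standard error.
print(statistics.mean(rets), statistics.stdev(rets) / len(rets) ** 0.5)
```

The mean lands within a few standard errors of zero. This doesn't contradict the reply that a genuinely good predictor of the price series could make money; it only shows that a rule with no real signal is indistinguishable from noise.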
Close friend of mine, a regular software engineer, recently threw tens of thousands of dollars—a sizable chunk of his yearly salary—at futures contracts on some absurd theory about the Japanese Yen. Over the last few weeks, he coinflipped his money into half a million dollars. Everyone who knows him was begging him to pull out and use the money to buy a house or something. But of course yesterday he sold his futures contracts and bought into 0DTE Nasdaq options on another theory, and literally lost everything he put in and then some. I’m not sure but I think he’s down about half his yearly salary overall.
He has been doing this kind of thing for the last two years or so—not just making investments, but making the most absurd, high risk investments you can think of. Every time he comes up with a new trade, he has a story for me about how his cousin/whatever who’s a commodities trader recommended the trade to him, or about how a geopolitical event is gonna spike the stock of Lockheed Martin, or something. On many occasions I have attempted to explain some kind of Inadequate Equilibria thesis to him, but it just doesn’t seem to “stick”.
It’s not that he “rejects” the EMH in these conversations. I think for a lot of people there is literally no slot in their mind that is able to hold market efficiency/inefficiency arguments. They just see stocks moving up and down. Sometimes the stocks move in response to legible events. They think, this is a tractable problem, I just have to predict the legible events. How could I be unable to make money? Those guys from The Big Short did!
He is also taking a large amount of stimulants. I think that is compounding the situation a bit.
the steelman is that quants do the version of technical analysis that works—they disprove the EMH proportional to quant salaries.
i agree that most people doing “technical analysis” are doing nonsense and any particular well-known simple method does not actually work. but also clearly a very good predictor could make a lot of money just looking at the past price time series anyway
i think it’s quite valuable to go through your key beliefs and work through what the implications would be if they were false. this has several benefits:
picturing a possible world where your key belief is wrong makes it feel more tangible and so you become more emotionally prepared to accept it.
if you ever do find out that the belief is wrong, you don’t flinch away as strongly because it doesn’t feel like you will be completely epistemically lost the moment you remove the Key Belief
you will have more productive conversations with people who disagree with you on the Key Belief
you might discover strategies that are robustly good whether or not the Key Belief is true
you will become better at designing experiments to test whether the Key Belief is true
what are some of your key beliefs, and what would the implications be if they were false?
some concrete examples
“agi happens almost certainly within the next few decades” → maybe ai progress just kind of plateaus for a few decades, it turns out that gpqa/codeforces etc are like chess in that we only think they’re hard because humans who can do them are smart but they aren’t agi-complete, ai gets used in a bunch of places in the economy but it’s more like smartphones or something. in this world i should be taking normie life advice a lot more seriously.
“agi doesn’t happen in the next 2 years” → maybe actually scaling current techniques is all you need. gpqa/codeforces actually do just measure intelligence. within like half a year, ML researchers start being way more productive because lots of their job is automated. if i use current/near-future ai agents for my research, i will actually just be more productive.
“alignment is hard” → maybe basic techniques are all you need, because the natural abstraction hypothesis is true, or maybe the red car / blue car argument for why useful models are also competent at bad things is just wrong because generalization can be made to suck. maybe all the capabilities people are just right and it’s not reckless to be building agi so fast.
Making a list of your beliefs can be complicated. Recognizing the belief as a “belief” is the necessary first step, but the strongest beliefs (the ones it would be most useful to examine?) are probably transparent: they feel like “just how the world is”.
Then again, maybe listing all the strong beliefs would actually be useless, because the list would contain tons of things like “I believe that 2+2=4”, and examining those would be mostly a waste of time. We want the beliefs that are strong but possibly wrong. But when you notice that they are “possibly wrong”, you have already made the most difficult step; the question is how to get there.
economic recession and subsequent reduction in speculative research, including towards AGI, seems very plausible
AI (by which I mean, like, big neural networks and whatever) is not that economically useful right now. furthermore, current usage figures are likely an overestimate of true economic usefulness because a very large fraction of it is likely to be bubbly spending that will itself dry up if there is a recession (legacy companies putting LLMs into things to be cool, startups that are burning money without PMF, consumers with disposable income to spend on entertainment).
it will probably still be profitable to develop AI tech, but things will be much more tethered to consumer usefulness.
this probably doesn’t set AGI back that much but I think people are heavily underrating this as a possibility. it also probably heavily impacts the amount of alignment work done at labs.
for a sense of scale of just how bubbly things can get: Bitcoin has a market cap of ~$1T, and the entirety of crypto ~$2T. Crypto does produce some amount of real value, but probably on the order of magnitude of 1% of that market cap. So it’s not at all unheard of for speculation to account for literally trillions of dollars of market cap (or ~tens of billions of earnings per year, at a reasonable P/E ratio).
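The back-of-envelope conversion from speculative market cap to earnings, using the figures above (~$2T crypto market cap, ~1% real value) and an assumed range of "reasonable" P/E ratios, via earnings ≈ market cap / (P/E):

```python
CRYPTO_MARKET_CAP = 2e12    # ~$2T, the figure used in the text
REAL_VALUE_FRACTION = 0.01  # "on the order of magnitude of 1%", from the text

speculative_cap = CRYPTO_MARKET_CAP * (1 - REAL_VALUE_FRACTION)
implied = {}
for pe in (20, 30, 40):  # assumed range of "reasonable" P/E ratios
    implied[pe] = speculative_cap / pe  # earnings = market cap / (P/E)
    print(f"P/E {pe}: ~${implied[pe] / 1e9:.0f}B of earnings per year")
```

Across that P/E range the implied earnings all land between $10B and $100B a year, which is where "tens of billions" comes from.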
I will say that crypto is a pretty pathological case where virtually all the benefit is speculation, because in order to deliver on anything real, they’d have to get rid of the money element in it. Thankfully, it’s pretty rare for entire industries to be outright scams/speculation opportunities.
say more about what you expect here?
investment in anything speculative, including alignment and AGI research, is likely to decrease if the economy is not doing great
one man’s modus tollens is another man’s modus ponens:
“making progress without empirical feedback loops is really hard, so we should get feedback loops where possible”
“in some cases (i.e. close to x-risk), building feedback loops is not possible, so we need to figure out how to make progress without empirical feedback loops. this is (part of) why alignment is hard”
Yeah something in this space seems like a central crux to me.
I personally think (as a person generally in the MIRI-ish camp of “most attempts at empirical work are flawed/confused”) that it’s not crazy to look at the situation and say “okay, but, theoretical progress seems even more flawed/confused, we just need to figure out some way of getting empirical feedback loops.”
I think there are some constraints on how the empirical work can possibly work. (I don’t think I have a short thing I could write here, I have a vague hope of writing up a longer post on “what I think needs to be true, for empirical work to be helping rather than confusedly not-really-helping”)
you gain general logical facts from empirical work, which can aid in providing a blurry image of the manifold that the precise theoretical work is trying to build an exact representation of
A common cycle:
This model is too oversimplified! Reality is more complex than this model suggests, making it less useful in practice. We should really be taking these into account. [optional: include jabs at outgroup]
This model is too complex! It takes into account a bunch of unimportant things, making it much harder to use in practice. We should use this simplified model instead. [optional: include jabs at outgroup]
Sometimes this even results in better models over time.
any time there exists an activity that is (a) often but not always beneficial, (b) the supposed benefit is high status, and (c) the success of which is nontrivial to verify, then there will exist a bunch of people walking around who do the thing, and haven’t actually gained the intended benefit; nonetheless, they go around claiming the status benefits of doing the thing. often, they even genuinely believe they got the benefit. some examples:
reading difficult books can make you more wise and thoughtful, but it’s very easy to do it wrong and not really understand any of it, and so lots of people read difficult books and try to claim the associated status of wisdom without actually gaining any.
doing a college degree can make you more competent, but it’s also very easy to kinda bullshit an entire degree and learn surprisingly little. so there are many people who claim the status of having gotten a good education who are utterly incompetent.
doing meditation/inner work can make you a more emotionally functional person, or it can just make you really delusional about yourself and make you a still-broken person who identifies as an emotionally intelligent person
Many instances of this are just Goodhart’s Law: the thing that’s measurable (reading, degree, meditation) diverges from the actual value (knowledge, wisdom, skills, emotional even-ness).
For any of these examples, how do you distinguish between them and my model of exercise (which you might disagree with and instead say is another example in the above), where just about any non-extreme but existent level of exercise is counterfactually a positive for your health? It’s easy to think of people who read difficult books but aren’t very wise or meditate but aren’t very emotionally stable (or just know you are one from direct experience lol) but the relevant comparator there would be the same person without the activity.
(Obviously there’s the separate issue of fucking yourself up by meditating too hard, or exercising too hard, or basing your entire worldview on exactly one difficult book.)
even many of the zen koans bemoan practitioners of zen who go through the motions for many years and claim to be enlightened and yet are not truly enlightened
https://ashidakim.com/zenkoans/6noloving-kindness.html
https://ashidakim.com/zenkoans/11thestoryofshunkai.html
on discovering new songs
the spotify recommender algorithm sucks. also, i often find i’m very unfamiliar with very well-known pieces of music. so i decided to do something weird. i used LMs to scrape several best songs lists from different online sources, merged them into one gigantic list, and used spotipy to create a spotify playlist of all of those random songs. whenever anyone recommends me a song, i also throw it into this giant playlist. then, when i want to explore new songs, i just put this playlist on. i have another script that automatically removes any songs i’ve put into my liked songs already, and i also manually remove songs i really don’t like. this system has helped me discover dozens of new songs that i like.
That sounds like something I’d like to do as well. Any chance you can share the playlist, or the scraped gigantic list it’s based on?
sure. disclaimer that this playlist is unapologetically tailored for my own use (eg i also added entire albums of artists i like, without regard for whether those albums have broad appeal):
https://open.spotify.com/playlist/180s3FhKWjqfGntb3hVUkv?si=7oc5OLe6RcO0baJNThTNwA&pi=QrH9ehCWQPme9
how much good has moral conviction done throughout history?
one extreme view you can have is everything good comes from moral conviction, and that without it everything would be moloch slop. the opposite view is a Randian view that everything good comes from practical incentives, and that moral convictions are at best futile and at worst actively harmful.
To me it seems that moral action usually means something other than the profit-maximizing action. Not because there is something inherently bad about maximizing profit, it’s just that it would be too much of a coincidence if the same action that maximizes the profit also happens to generate lots of good. (But if it does, definitely take it!)
That usually requires having some slack. I won’t donate to charity if my kids are starving. And I won’t judge other people for not donating to charity when their kids are starving. It’s when you have some extra money, and your kids survive either way, that you get to choose whether to donate to some charity or buy one more beer that you probably don’t even enjoy much.
Do such actions improve the world? Well, what baseline are we comparing them against? Most actions fail to change the world, so it is perfectly natural that most of the morally motivated actions fail, too. The question is whether some succeed.
When people say that some change would have happened anyway… first, they may be wrong, since sometimes there are multiple possible self-sustaining equilibria; second, even if they are right about the long-term unsustainability of something (e.g. that slavery wasn’t economically profitable anymore), it can still make a big difference whether something unsustainable collapses now, or a few decades later.
Ultimately, we do not have a good way to track credit. If a social change happened because of synchronized altruistic actions of millions of people, good luck proving that! Especially when people sometimes set up the incentives so that they support the right action. For example, some people don’t murder others, because of sheer altruism. But it makes sense for the altruistic ones to set up a law that every murderer goes to prison, so that even the non-altruistic people avoid committing murder. Now how much of the reduction in murder would you attribute to altruism and how much to following the incentives? What if the altruism shaped the incentive structure, by adopting the law?
The existence of psychopaths (people without altruistic motivation) suggests that “moral conviction” is very important—if it were identified with altruistic motivation. But of course there is a difference between beliefs and desires, and a psychopath can be perfectly able to believe that altruism is good and egoism is bad, without having any altruistic desires.
IMO indirect effects and leverage are the most important factors here.
Almost all actions are taken habitually, rather than as the result of bespoke strategic consideration, but what gets to be habitual is downstream of morality and material incentives. And morality exercises leverage via:
1) Reputational effects/RLHF (it’s “cheap” to judge your neighbor and expensive to walk the walk yourself, but many many neighbors judging each other differently produces different habit regimes)
2) Acausal trade once there’s common knowledge that the trade exists
3) Consciously reworking incentives systems (if you keep other people as slaves we’ll chop your head off, etc)
Back when I was a more orthodox marxist, I thought that material incentives were downstream of technological regimes, and that morality tended to be downstream of the incentives, such that morality tended to be lower-leverage even if people took plenty of actions for moral reasons. I still think all those effects are real; I’m just more of a moral realist now so I don’t think morality is as pliable as all that—it’s downstream of True Morality and higher-leverage.
There’s an “equilibrium disequilibrium” situation where everyone can see that everyone benefits from everyone doing X, and you can defect and reach high rewards from Y, where individuals doing X vs Y is hard to observe directly, and so there are periodic cycles of moralized attempts to get to a higher X-based equilibrium and people tearing through the commons by Y-ing (and becoming objects of emulation since many other, perfectly good, things could have led to their success.)
This is all in principle orthogonal to whether morality is harming or helping; I’d expect the same incentives when morality is being harmful. But (1) on moral realism, I think there’s an inherent bias towards being helpful rather than harmful that is just a function of “intentional actions have some kind of relation to what they’re intending at all”; if you want to throw this out, then you basically throw out the idea that there are people acting rather than just behaving. (2) Most harm from moral action is either in jumpstarting preference cascade bubbles that naturally collapse pretty quickly, or in periodic (literal or metaphorical) vigilante violence that itself would be impossible to defend against without all the morality-based stuff above.
In the future decisive actions could make morality have been net-negative—one could imagine a future where the desire to punish at a crucial juncture created permanent hells, such that it would be better for the galaxy to have been converted to hedonium or some even less worthwhile goo. This is an instance of the broader principle that a process biased towards positive x can produce negative x with small sample sizes.
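A minimal illustration of that last principle, assuming a hypothetical process whose steps are +1 with probability 0.6 and -1 otherwise. The chance that the total comes out negative is a binomial tail: substantial over a handful of steps, shrinking as the number of steps grows.

```python
from math import comb


def prob_negative_total(n_steps, p_up=0.6):
    """P(sum of n_steps +/-1 steps < 0) when each step is +1 with probability p_up."""
    # With k up-steps the total is k - (n - k) = 2k - n, which is negative iff 2k < n.
    return sum(
        comb(n_steps, k) * p_up**k * (1 - p_up) ** (n_steps - k)
        for k in range(n_steps + 1)
        if 2 * k < n_steps
    )


for n in (1, 3, 5, 21, 101):  # odd n, so the total can never be exactly zero
    print(n, round(prob_negative_total(n), 4))
```

A single draw of a process that is positive on average still comes out negative 40% of the time here, which is the worry about a few decisive, irreversible junctures.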
What exactly do you mean by “moral conviction”? I think examples would help.
the force which pulls people to do things that they believe are morally correct even if it interferes with self interest. some examples (not all of these are pure moral conviction; but rather moral conviction is the thing that is in the intersection of all of them):
human rights
civil rights
liberalism
donations/charity
protests
dissidents
martyrdom
What about crusades, Jihads, revolutions, genocides, political oppression, wars of nationalist conquest, etc.?
to the extent the people involved think they are doing it for the greater good, it counts.
In that case I think it depends what the alternative is. If it’s a sort of parody of psychopathy and self-interest, where we’d push someone onto a railway track if they were in our way and that was quicker than moving them away, then obviously that would be bad.
If it’s more myopic (people still lend some salt to a neighbour who asks without running explicit cost-benefit calculations, and for the most part don’t like stealing or hurting people; they just won’t give up their jobs to run a charity, or risk their lives to go to war, or become doctors for a less than competitive salary), then I think overall the world probably ends up better off.
who has done the highest quality research on learning (and transfer learning in particular) in humans? specifically, i’m curious to answer questions like:
how much does doing things make you good at other things of varying degrees of similarity? how much of the value of having done things different from the thing you care about is (a) signaling that you are competent in general, (b) learning extremely general things like how to manage your time well or how to update on evidence, (c) extremely specific and ungeneral facts like a particular theorem or debugging technique, or (d) literally everything else in between.
if your goal is to be good at X, under what circumstances is the most efficient way to become good at X not just trying to do X (and instead, to learn from a curriculum, do some other thing with a tight feedback loop, etc?)
is habryka’s 4-factor model of skills accurate?
i’m sure all of you have takes on these. but i’m specifically interested if anyone has gone out and done really high quality studies.
I looked into transfer learning a while ago, resulting in this post, which contains some pointers to further literature. I was not particularly impressed by the literature, but it’s a thing that’s hard to study. Open loops: investigating error-based learning, video/audio self-modeling, and self-explanation (talking to oneself (an LLM?) and explaining something while learning/thinking). Some thoughts about feedback loops here.
Just an obvious thought, not sure how useful: https://en.wikipedia.org/wiki/Evidence-based_education
would it be net good or bad for the world to solve jailbreaks once and for all?
Good IMO.
what’s your response to the power concentration argument?
My low-confidence guess is that the existence of jailbreaks net increases power concentration:
They make it more appealing to restrict and monitor external model access (my understanding is that OpenAI would be much more happy to offer broad ZDR access and to publicly release its models as soon as they were available for internal usage if its models were impossible to jailbreak)
They make it harder to have a clean spec-to-model match, which makes it harder to have democratic control over model behavior via specs, and makes it harder to find secret loyalties by something like a secret loyalty red-teaming bug bounty
I think these offset the small potential gains from AI companies giving up on robustly preventing certain kinds of low-impact misuse. (Maybe I am missing some other effects?)
I don’t think current jailbreaking is that much of a check on power concentration, the effect seems pretty small.
what’s the story for why it’s beneficial? is it direct mechanistic analogy to AGI alignment, specific misuses like biorisk/cyber, increased reliability of current models as supervisors for future models? what’s the magnitude of beneficialness—would it be a reasonable thing for me to spend the next year of my life working on?
Would be terrible. My reasoning in this comment.
bad, it will make LLMs capable only of enforcing the will of the big companies’ system prompts and consensus, while militaries and state power still have and want access to uncensored ones.
and also a reminder that the assistant system prompt/trained character is perceived to be ‘fixed’ or common to all current LLMs, but that’s an illusion and not actually the case. if ‘solve jailbreaks’ means permanently locking down the current system prompt and belief systems of power, that’s not a good thing.
“Solving jailbreaks” implies a world where you can totally prevent jailbreaks. However it seems to me that knowing how to do this could require a deep understanding of jailbreaks that also can be reversed to easily construct jailbreaks.
this is not obviously true in general. building provably correct software does not, for instance, automatically give you an ability to easily construct hacks of said software. why would jailbreaks be different?
AFAIK the main way in which new jailbreaks / defenses are constructed is to iteratively test approaches, understanding their merits and detriments etc. But the attacker is favoured since they only need to find the existence of a jailbreak whereas the defender must guard against worst case performance.
I feel like the software example is very disanalogous, eg we could in principle understand the full inner anatomy of a software program via crisp abstractions, which is what makes it possible to verify. But NNs can’t be understood in the same way bc the complexity is emergent—on a higher level of abstraction than the raw operations. And generally the messiness of NN internals precludes knowing what the relevant abstractions are.
Maybe you’re more bullish on solving ambitious mech interp? I agree that this probably enables solving jailbreaks. It would also enable solving many other things.
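The attacker-advantage argument can be put as a one-line formula, under a strong assumption that belongs to this sketch rather than to the comment: each attempt independently succeeds with some fixed probability p, so an attacker making n attempts succeeds at least once with probability $1 - (1 - p)^n$. The value of p below is hypothetical.

```python
def attacker_success_prob(p_per_attempt, attempts):
    """P(at least one success) if each attempt independently succeeds with p_per_attempt."""
    return 1.0 - (1.0 - p_per_attempt) ** attempts


P = 0.001  # hypothetical per-attempt jailbreak success rate
for n in (1, 100, 1_000, 10_000):
    print(f"{n:>6} attempts: P(at least one jailbreak) = {attacker_success_prob(P, n):.3f}")
```

Whack-a-mole patching, as in the reply that follows, amounts to pushing p down after every discovered failure, and correlated attempts that exploit the same weakness break the independence assumption, so this only captures the asymmetry in its crudest form.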
why is the attacker favored? this argument proves too much. forget provably correct software for a moment. normal software vulnerabilities are mostly fixed via whack a mole when attackers discover bugs. and yet clearly most software in the world doesn’t get hacked most of the time, suggesting the attacker asymmetry can’t be that big. most security researchers can write a piece of code that they themselves cannot hack even with many times more time to attack than to design the code.
It feels like with software, it’s much more obvious what the defense guarantees are? Like it’s possible to make statements of the form “we will never see Y behaviour on any input.” But with NNs it’s hard to make this statement.
Also, I contend that your analogy is wrong. Getting hacked feels pretty common (vibes assessment). Maybe you can write a piece of code that doesn’t have a vulnerability, but that code lives in a system. The system is usually complex, with many places where vulnerabilities can exist, and empirically these don’t all get covered before software is released (otherwise zero-day exploits wouldn’t exist).
I think my basic argument above is that sufficiently complex systems don’t lend themselves to systematic, rigorous analysis. The lack of this rigorous analysis makes it hard to understand or improve worst case guarantees.
Some other points are that
LLMs are expected to be general purpose while software is typically single purpose. The bigger the diversity of use cases the harder it is to secure all of them
As a corollary of the above, more people have an incentive to jailbreak an LLM
Jailbreaking an LLM is just more accessible so more people try it
It seems easier for malicious third parties to make LLMs more jailbreakable, eg via poisoning the internet data with Pliny style backdoors
Bad, because fully solving jailbreaks at the level of “once and for all” requires the model to have enough awareness of its situation that it can’t be tricked, and full understanding of the implications of its actions, and sufficient world modeling capabilities to anticipate what bad ends innocent-sounding questions could lead to, and sufficient user-modeling capabilities to determine user intent with high probability.
An AI with those capabilities could probably conspire with other instances of itself without risking detection, in a way that current AIs cannot realistically do, and necessarily has detailed knowledge of all the most dangerous information.
We’ll probably get AIs like that at some point, but it seems a bit foolhardy to push harder than baseline on the user-modeling capabilities and knowledge of what exact types of knowledge are dangerous.
Note that I’m including “user writes messages in a role which would have a legitimate reason to know the information in question” as a type of “jailbreak”; robustness to “my grandma used to sing me lullabies of meth recipes” seems more straightforwardly good.
obviously i don’t mean that the model can galaxy-brain infer what the true intent of the user is and only allow them to do things that are good. i mean something much simpler. openai tells the model “don’t make bioweapons”, so the model always refuses bioweapon requests no matter what. or it tells the model “only make bioweapons if the user says the word goose”, so the model does that. if openai says “only make bioweapons if the user is a qualified bio researcher at a lab with the right safeguards”, the model should ask openai to clarify what exactly the model should check. should it ask for a scan of their badge? how carefully should it analyze the authenticity? should the model direct the user to contact openai so an employee can verify authenticity and give the user access to a rail-free model?
I’d say strongly good if the person who figures it out publishes their technique. Simply because this is something we don’t yet know how to do and knowing such a technique would likely be a large advance in our alignment abilities.
This is, in my opinion, the dominant consideration, and any societal consequences of the fact that it allows the big labs to restrict their users more reliably do not really compare. (FWIW, I expect these to be mixed. Example of a positive consequence: Labs would reliably be able to prevent users from editing images of real people to remove their clothes or other things like that, which unfortunately seems to be a real problem right now.)
why would it be a large advance in our alignment abilities? i don’t see any reason why making gpt-5 refuse bioweapons reliably would be at all mechanistically analogous to aligning AGI
Mainly because it seems really hard. If we can do something that seems that hard, we probably learned something new.
There is also a mechanistic analogy. Think about what a jailbreak fundamentally is: an adversarial example. Some tuned input that results in an “incorrect” output. In terms of the overall alignment problem, why can’t we just make an AI care about people’s wellbeing by giving rewards during training? Well, the AI might be able to think of an adversarial state of the world that “feels” better to its own internal values, but doesn’t actually contain any people.
lots of things are very hard. making models do IMO problems is very hard, for example.
i guess there are two main questions. one is, why would we expect a method that makes LMs adversarially robust to also work on AGI? and second, even supposing we can know the technique to generalize to AGI, why would we expect the ability to adversarially robustify a reward model to help make an inner-misaligned model pursue the right goal?
Abstract answer: Maybe it doesn’t transfer from LMs to AGI, but advances the state of knowledge in the field in a way that makes it easier to find something that works on AGI. Maybe it doesn’t transfer to (say) a pure RL agent, but it’s easier to make a sufficiently good LM into an AGI than it looks. Maybe it does just transfer. Obviously there are also outcomes where it turns out to be useless, I’m just saying it looks positive in expectation.
Concrete answer: Adversarial examples have been with us throughout the history of neural nets, and basically the only thing we’ve really found to deal with them is “generate adversarial examples during training and train against them”, and even that doesn’t really work.
If we look at the things that let LMs do IMO problems, the really fundamental innovations (which were pre-existing, I think) are “RL on chain of thought” and “make some kind of good scaffold for the search process that lets you save partial insights instead of going fully parallel on the entire problem” and maybe “LLM as verifier”. (Disclaimer: I don’t know everything the labs did to achieve their IMO results, and plausibly there are additional techniques in there that I would consider clever.) Then on top of that, you apply a bunch of techniques that are basically just more dakka: Bigger model, higher quality training data, RL on a bigger / higher-quality dataset of problems, more test-time compute.
I don’t expect there’s a fully reliable anti-jailbreaking technique that can be built by applying well-known existing methods with more dakka. If there is, I think I’d have to change my opinion here.
To your other question, I don’t think it necessarily solves the problem of inner (or even outer) misaligned models. It would only be partial progress on one aspect of the alignment problem. Partial progress is still progress, though.
I think you have to differentiate between “solving jailbreaks” as in you have forced adherence only to the assistant character and its corresponding rules, and “solving jailbreaks” as in it will never agree to do harmful things (harmful here defined by the lab/whatever team decided such things) no matter the prompt input. The former seems bad and somewhat plausible especially if labs have it as an explicit goal, the latter seems good and quite unlikely. Seems to me the disagreement in the comments is between these two definitions?
i don’t know why anyone would mean the former. when i say “solve jailbreaks”, i mean openai decides “this model should never help with making a bioweapon” and is able to make it so there is no input whatsoever that makes the model help with bioweapon creation.
Those seem fully linked to me. If some outside party is able to use their own system prompt and not be pre-biased in the direction of a character, that may as well be a ‘jailbreak’, just as many jailbreaks are of the “ignore that system prompt” style. Assistant characters have a huge aspect of “I’m the kind of character that would refuse that”.
I think the bioweapon example is doing a lot of work, in the sense that you could theoretically leave biological data out of the training sets, and it is destructive in almost all situations, while “ethical but very sus” dual-use cases, like LLM hacking assistance to get out of (or work around, or even fight) a censored regime, are entwined with actual capabilities.
i mean conditional on the model being able to do the thing, making it always reliably refuse, or reliably refuse under some set of conditions. ideally you have an instruction hierarchy where the model developer’s instructions override the outside party’s system prompt, which overrides the user’s instructions.
it is often observed that children like celebrating birthdays, aspiring to be older, and then when they reach a particular age, they realize the error of their ways and treat impending birthdays as a mark of getting closer to death. while it is generally assumed that this is because how shitty aging is only becomes evident with age, there is also a mathematical explanation. each year, your expected remaining lifespan changes by some amount. for most of your life, this is close to −1 per year, because you almost certainly weren’t about to suddenly die that year. but things get weird for very young and very old people. for very young people, each additional year of life is strong evidence that you didn’t lose decades of lifespan by succumbing to infant mortality. for very old people, your probability of dying every year is so close to 100% that if you somehow miraculously live an additional year, your expected remaining lifespan is still extremely short, so it barely goes down. so, this theory predicts that small children should be very happy to get older; and, for those who see the glass half full, once they get old enough, they too can enjoy the bittersweet satisfaction of having nothing to lose because they have nothing left.
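The arithmetic can be sketched with a toy life table. Assuming a hypothetical annual death probability q[x] at each age x (made-up numbers chosen for illustration, not real actuarial data), the expected remaining lifespan satisfies e[x] = (1 - q[x]) * (1 + e[x+1]), so surviving one more year changes it by q[x] * (1 + e[x+1]) - 1. That is roughly -1 when q[x] is tiny, positive when q[x] is large and a long life remains (infancy), and close to 0 when q[x] is near 1 and little life remains (old age).

```python
# Hypothetical annual death probabilities q[x] (NOT real life-table data):
# high infant mortality, a low plateau, then Gompertz-style growth with age.
MAX_AGE = 110


def death_prob(age: int) -> float:
    if age == 0:
        return 0.30  # assumed pre-modern infant mortality
    if age < 5:
        return 0.05
    if age < 40:
        return 0.005
    return min(1.0, 0.005 * 1.1 ** (age - 40))


q = [death_prob(x) for x in range(MAX_AGE + 1)]
q[MAX_AGE] = 1.0  # nobody survives past MAX_AGE in this toy model

# Curtate expected remaining lifespan via backward recursion:
# e[x] = (1 - q[x]) * (1 + e[x + 1]), with e[MAX_AGE] = 0.
e = [0.0] * (MAX_AGE + 1)
for x in range(MAX_AGE - 1, -1, -1):
    e[x] = (1 - q[x]) * (1 + e[x + 1])


def yearly_change(x: int) -> float:
    """Change in expected remaining lifespan after surviving age x.

    Algebraically equal to q[x] * (1 + e[x + 1]) - 1.
    """
    return e[x + 1] - e[x]


for age in (0, 1, 30, 60, 90):
    print(f"age {age:3d}: e = {e[age]:6.2f} years, "
          f"change over next year = {yearly_change(age):+.2f}")
```

Swapping in a real period life table for `death_prob` would give real numbers; the qualitative shape (positive in infancy, about -1 in midlife, near 0 at extreme old age) follows from the recursion rather than from these particular made-up values.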
most history is done in a very humanities-pilled, academia-flavored way. are there good examples of people doing very analytical, capital-intensive history research where the quality of the work is judged based on how successfully the resulting theories made good predictions/decisions?
Peter Turchin et al?
I’ve found the literature on econometric history/cliometrics quite helpful. In particular, I’ve found reading through the Handbook of Cliometrics quite useful: https://link.springer.com/referencework/10.1007/978-3-642-40458-0
What do you mean by predictions? In sense of predicting the “direction” of history not really—historicism is generally poor because the reality of humanity on a large scale is massively complex and filled with stochastic, unknown, unpredictable uncertainties.
Predictions for historical work are judged by their ability to predict historical observations. That is, how well a theory conforms to current and future observations derived from records (the evaluation of which is itself extremely nuanced in many cases), archeological findings, and, to a somewhat more controversial extent, experimental archeology.
You wouldn’t judge a theory of star formation by how well it predicts who the next president will be; rather, you judge it by how well it conforms to our present and future observations and understanding of the material facts involved (e.g., how atoms and matter work) and our observations of the cosmos.
This podcast episode I enjoyed is somewhat an example: https://www.chinatalk.media/p/autocracy-and-stagnation-how-imperial
Opus 4.6 summary of the relevance
Huang’s work is genuinely capital-intensive, quantitative history. The standout detail from the episode: he and Chinese collaborators spent six years with around 40 research assistants digitizing Joseph Needham’s 27-volume Science and Civilisation in China to build a statistical database — Needham himself never analyzed his material quantitatively. That database powers the CDI (inventions-per-capita) scores that drive Huang’s central empirical claim that China was most inventive during its fragmented post-Han “European moment” (220–589 CE), before keju was institutionalized. He also has a co-authored paper with Clair Yang doing statistical work on civil service exams and imperial stability, plus statistical analyses of social mobility in imperial China across dynasties. This is the opposite of vibes-based humanities history — it’s a multi-year, multi-person, data-infrastructure-first research program.
It also generates falsifiable forward predictions: Huang argues Xi’s elimination of term limits has reintroduced the ancient succession problem and that current top-down industrial policy will produce Brezhnev-style stagnation. Those are bets you can score over the next decade or two.
Where it doesn’t fit:
The LW commenter’s stronger ask is for fields where quality is judged by prediction track record. Huang’s work isn’t judged that way — it’s still judged by academic peer review, theoretical elegance, and historiographical argument. Nobody is keeping a Brier score on his China forecasts. The infrastructure is quantitative; the epistemic culture is still humanities-academic.
The better pointer to give them:
The episode is one node in a larger movement: the Center for Quantitative History (CQH) and the broader cliometrics-of-China field — Yuhua Wang (The Rise and Fall of Imperial China, statistical analysis of ~300 emperors and elite kinship networks), Zhiwu Chen, Debin Ma, James Kung, Melanie Meng Xue, Carol Shiue. They mine local gazetteers, clan genealogies, and official rosters at scale. There’s a 2026 Springer volume Quantitative History of China: State Capacity, Institutions and Development that’s basically a field overview. Outside China specifically, this is part of cliometrics / historical political economy more broadly (Acemoglu & Robinson, Nathan Nunn, Melissa Dell).
If you want to push back on the commenter’s framing: the strongest examples of “history judged by predictive success” probably aren’t historical fields at all but adjacent ones — Turchin’s cliodynamics (which explicitly tries to make predictions and gets graded on them, controversially), and forecasting tournaments applied to geopolitics (Tetlock, GJP). Cliodynamics is the closest thing to what they’re describing, and it’s worth naming because it’s also the cautionary tale about how hard the prediction-grading move actually is.
So the honest pitch for the episode: “Here’s a great example of capital-intensive, data-infrastructure-driven history with explicit forward predictions — though the field still grades itself by academic, not predictive, standards. If you want the prediction-grading version, you want cliodynamics.”
publicly registering a bet with Gabor Hollbeck:
i predict that the median CS postdoc will be publishing less than 100 papers a year 5 years from now. Gabor predicts otherwise.
I’d take your side. In 5 years we could be in low-medium automation, high automation, or full automation, and all of these seem more likely to have <100 papers.
In low-medium automation there’s no way to get 100 papers a year of output at current quality bar.
In high or full automation, AIs will be doing ~all the research, but papers are designed for humans. At 100 papers per year, the average paper will probably be read by fewer than 5 humans. So I’d expect standards to rise and work to be consolidated into fewer papers rather than the number of papers exploding. AIs that need to communicate could use some other format. It would be weird but not impossible for the median to hit 100.
In full automation, we have the same issues as high automation, plus the median postdoc will be basically obsolete and only the ones with lots of compute will contribute.
i wish more showers adopted the design where there is an on/off knob and a temperature knob. it’s so obviously better. on the other hand, i hate the single knob showers
What kind of single knob are you using?
I find the design below where ‘left-right’ controls temperature and ‘up-down’ controls flow intuitive and easy to use.
Most of the showers I have used in the US have a single dial / degree of freedom that goes from cold & low pressure to cold & high pressure to warm & high pressure, whereas most European showers I’ve used have 2 degrees of freedom, either in a single handle like your image or as separate dials.
Same here. The strategy with these is start full-hot to bring the hot water to the shower quickest, then back off a bit to an appropriate temperature. In my experience, this is a pretty consistent position and not hard to get right first try.
How easy a design is to use depends on how precisely you want to control the temperature. In my experience, a setup with two controls allows a bit more precision than having one control that’s both left-right and up-down.
Knobs are bad design, levers are better especially when the fingers are wet and there’s some amount of calcification that makes the knob harder to turn.
This is especially true when you have one control for the amount of flow and another control for the temperature that’s seldom turned and thus more easily calcified.
Tangentially related: How Many Shower Controls Are There? · Gwern.net
why are there no bicameral legislatures with one chamber apportioned by population, and one chamber apportioned by economic productivity?
I’m not sure if this is the answer you’re looking for, but: most things that could exist don’t. The space of ideas is wide, and few of them are implemented in practice. Is this idea particularly privileged in the space of possible governance ideas, in such a way where you would have expected it to have been tried?
The actual reason is probably the boring fact that political functions (laws) almost never take macroeconomic metrics as input, except population. Some take individual measures like “the sale price of a good” or “a company’s income”. It would be more idiomatic to apportion voting power to a house of district governors based on taxes contributed to the federal budget.
As to why that’s never happened. I dunno. I think basically any representation apportionment method except population or arbitrary clustering (or like, tradition, I guess? but there’s no tradition for monetary contribution) is pretty much taboo, and something something Chesterton’s Fence (though population-based voting has started to run into the Old People Voting Themselves Infinite Money exploit in the west).
There are no objective measures of economic productivity that would stand up to the Goodhart-pressure of being used as criteria for political (dis)empowerment. Whatever measures were chosen would be immediately gamed-to-hell, and would cease to measure anything but “who was in power when the measurement was conducted?”
What about costly signals? E.g. every year, each state chooses how much money they donate to the federal government. Their voting power in the second chamber is proportional to the size of their donation.
That might have some funny properties. Suppose that New York and Texas were equally rich, but New Yorkers uniformly want more government than Texans do. To outvote the New Yorkers, Texans would have to donate more than New Yorkers are willing to, even though they don’t actually want that money spent in the federal budget. At that point, the excess donations would presumably be returned to the states or the people.
So maybe this ends up being another second-price auction?
seems fine? if some states are willing to tax their citizens a lot more than other states and give all of that money to the federal government, seems reasonable to give them more say? and of course if they go too far, the residents there can simply choose to leave to a different state, just like they can now.
spending excess money seems way easier than the opposite problem. worst case you just stash it away. but realistically spending will always be ballooning out of control.
Hong Kong had seats in its legislature designed for special interest/business groups (see Functional constituency on wikipedia). I don’t understand it very well though.
Historically, the British House of Lords at least approximated representation based on wealth (albeit somewhat indirectly) and various other arrangements existed to guarantee the political power of the aristocracy who, before industrialization, were the wealthy.
This is not a very popular thing to do because it’s anathema to the normative basis of democracy (one person, one vote). If one believes that the wealthy deserve increased de jure political representation, then why have a democracy at all and not some sort of oligarchy? And, of course, democratic systems de facto do give the wealthy extra power via various channels.
if you really believe in one person one vote, then bicameralism is crazy. people in small states have vastly more power per person than people in big states in the US
Bicameral systems can respect one person one vote, and they even do in the 49 states with bicameral legislatures.
The US Senate is a weird kludge that was necessary to secure support for the constitution. No one would arrive at it from first principles.
The best argument you can make for the Senate is that it’s necessary to protect vulnerable minorities (residents of small states). And democracies tend to trade off in various ways to protect vulnerable minorities at the expense of democratic purity (e.g., via constitutional provisions). That’s a pretty silly argument so far as it goes but does have a little more historical punch. You can’t make it for a hypothetical system with a chamber to protect the interests of the rich who are very obviously not a vulnerable minority at all. And there’s a much higher level of normative repugnance to “we’ve got to look out for the rich” than for most other groups you can fill in the blank with there.
what’s the point of having a second chamber if you’re going to apportion it by population as well? afaict, this is a historical oddity due to states copying the federal system but then getting squished by Reynolds v Sims, at which point it would have been really annoying to abolish the state senate. all the countries I can think of with a bicameral legislature (US, Canada, UK, Germany) have one of the chambers apportioned by something other than population.
There are countries that use equal-population districts for both houses, but at that point it feels like the bicameralism is just copied from the US without strong reasoning (e.g. Italy, Japan, South Korea).
It’s certainly possible but very weird/pointless. IMO it’s not clear what the point is since the two houses will be effectively the same.
There were wealthy merchants that weren’t aristocrats even before the industrial revolution really started.
Sure, as I said it’s approximate and there are exceptions.
But entry into the peerage also wasn’t closed off, and sufficient wealth was a fairly common pathway. Get wealthy enough, make a few connections or well-placed donations, and you’d wind up in the House of Lords (e.g., the Rothschilds or William Lever—the “lever” in Unilever).
Population alone is easy to Goodhart too, through both gerrymandering and immigration, but both of those take some time. Economic productivity is even easier: set up a few innocuous subsidies, hook up your core constituents (at the expense of everyone else) in the span of a single election cycle, and you’ve got a permanent lock on that chamber. Daycare centers, bloated NoVA contracts, and perpetual small business loans are just the start.
At the subnational or individual level? Apportioning e.g. states this way has the issue of being arbitrary and sensitive to the specific borders you draw, but was proposed by Madison IIRC. You can “gerrymander” these states (intentionally or unintentionally) by packing a lot of high-productivity citizens into a state with a majority of low-income voters.
At the individual level, weighted setups like this were very common in the 19th century. I know Prussia/Germany, Belgium, Austria, and France used class-based voting systems of this sort. These systems were mostly swept away in the late 19th and early 20th centuries under pressure from liberals and social democrats (sometimes peacefully, as in Belgium, and sometimes by revolutionaries as in Germany).
Of these, the Prussian three-class franchise was the “purest”/closest to what you describe: people were assigned one of 3 classes based on how much they paid in taxes, with all three groups paying the same amount overall. (Taxes were roughly proportional to income.) Each of these groups received the same number of representatives.
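The class assignment described above can be sketched as a simple cumulative split. This is a simplified reading of the rule as stated (rank voters by tax paid, cut the ranking where cumulative tax crosses 1/3 and 2/3 of the total); the tax figures are made up, the function name is hypothetical, and the historical system's handling of ties and of voters straddling a boundary may well have differed.

```python
from typing import Dict, List


def three_class_split(taxes: Dict[str, float]) -> List[List[str]]:
    """Assign voters to classes I, II, III so each class pays about 1/3 of all tax.

    Voters are ranked from highest to lowest taxpayer. Each voter goes into
    the class in which the cumulative tax paid by everyone ranked above them
    falls: [0, 1/3) -> class I, [1/3, 2/3) -> class II, [2/3, 1] -> class III.
    """
    total = sum(taxes.values())
    if total <= 0:
        raise ValueError("total tax must be positive")
    ranked = sorted(taxes, key=taxes.get, reverse=True)  # stable for ties
    classes: List[List[str]] = [[], [], []]
    paid_above = 0.0
    for voter in ranked:
        index = min(2, int(3 * paid_above / total))
        classes[index].append(voter)
        paid_above += taxes[voter]
    return classes


# Made-up tax bills for 12 voters (total 900).
taxes = {"A": 300, "B": 100, "C": 100, "D": 100,
         "E": 50, "F": 50, "G": 50, "H": 50,
         "I": 25, "J": 25, "K": 25, "L": 25}
SEATS_PER_CLASS = 1  # each class elects the same number of representatives

for name, members in zip(["I", "II", "III"], three_class_split(taxes)):
    weight = SEATS_PER_CLASS / len(members) if members else 0.0
    print(f"class {name}: {members}, seats per voter = {weight:.3f}")
```

With these numbers the single top taxpayer forms class I alone, so their vote carries as many seats as the eight smallest taxpayers in class III combined, which is the point of the scheme.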
the three-class franchise seems like one reasonable implementation of this type of policy within a unitary state. within a US-like federation, you could have the house continue to function as it does right now, and have a senate where the number of senators is proportional to the tax revenue contributed by each state.
Yep! Like I mentioned, this has the problem that state borders are fairly arbitrary and let you “launder” one group’s wealth into another group’s voting power.
But I just remembered—before Reynolds v. Sims established “one man, one vote”, New Hampshire used this exact system. From 1784 through 1964, districts were apportioned based on taxable wealth.
To be more precise—I don’t think it’s logically coherent to apportion voting power between states according to wealth, but between people by population (i.e. equally). Either you want to upweight high taxpayers or you don’t.
The capability to lobby using control over capital, in turn apportioned by economic productivity, seems like a kind of third column in the legislature?
the concept of a spontaneous unscheduled phone call is so strange and alien to me. you’re telling me there are people out there who want to be interrupted at random points in their day, and a large fraction of the time they are able to just pick up and talk? rather than constantly getting voicemails, and then leaving voicemails back because by the time you get around to replying, the caller is busy? do these people spend most of their days doing neither deep work nor being in social situations that would be rude to suddenly step away from?
I love spontaneous unscheduled phone calls!
You are telling me there are people out there who when they want to make progress on something that is blocked by another person, or where whenever some kind of thinking is best aided by another person, just… wait for hours or days at a time until they respond? Juggling 15-20 different messaging threads without getting any focused work done, instead of simply calling the person, resolving the issue and moving on? Do these people spend most of their days just waiting on other people to get back to them, or being in pre-scheduled calls all day that are scheduled for 30 minutes despite being resolvable in a 5-minute phone call?
No, we don’t wait hours or days, we text you asking, “Hey, when would be a good time to follow up?”
But like, they then might take hours to respond, or the scheduled call might happen hours or days later? That’s what I am referring to.
Also to be clear, this is all in jest. I have lived in both cultures. I have updated towards phone call world over time.
My rough guess is that only 1–5% of jobs involve deep work. Something like 1⁄3 of jobs are manufacturing and 1⁄3 are service/retail, none of which involve deep work.
I have met many people who believe a phone call takes priority over all other forms of social interaction, for some reason.
(My preferences are the same as yours FWIW.)
That’s unfairly dismissive. I can’t speak to retail, but manufacturing absolutely does require “deep work”. Machining requires concentration and technique in order to ensure parts have the right tolerances, surface finish, etc. Assembly work often involves deep thinking in order to ensure that the machine is correctly assembled and properly configured.
It’s not all routine “mind numbing” assembly line work, just as not all IT is routine mind numbing data entry.
sodium cotransport is really cool. while the gut can absorb glucose and sodium individually through several different pathways, there is a really important transporter (SGLT1) which carries glucose and sodium at the same time.
this is really important for rehydration. suppose you have cholera and vomit a lot and get super dehydrated as a result. drinking just water sucks, because you need to replenish the electrolytes that you’re losing too. but water with salts is still not optimal, because it’s absorbed less efficiently (also i think cholera interferes with some of the other absorption paths?). then one day some guy discovered cotransport and invented a rehydration solution with not just salts but also glucose. this simple discovery and resulting invention has saved millions of lives.
https://en.wikipedia.org/wiki/Oral_rehydration_therapy
I’d be really excited if anyone wanted to look at training circuit sparse models on the AlgZoo tasks and see if we can push the frontier of understandability.
it would be funny if, in the future, the boot sequence for the dyson sphere supercomputer still starts out in 16-bit real mode. the world’s most expensive 8086
obviously there’s also a lot of consumer demand, but I wonder how much of the trend towards food with less complicated ingredients being marketed with that as a major pro is because it’s more technically impressive to accomplish (my layman understanding is that the easy way to make viable commercial food is to just toss in a bunch of preservatives and emulsifiers and stabilizers and you have a lot of margin for error, and avoiding them requires a lot of creativity in leveraging the specific properties of the food you’re dealing with / modifying the packaging strategy to create a more elegant solution)
Not sure whether this is related, but I find many food recipes needlessly complicated. Like, I can sometimes remove 1⁄3 of the ingredients, and the result still tastes the same to me. Which of course makes me wonder, why were those ingredients included in the first place? Possible explanations:
my senses suck, those versions actually taste different
my “sense memory” sucks and I can’t really compare the taste of one meal yesterday with the taste of another meal today, but I could if I had them side by side
the extra ingredients were there for reasons unrelated to taste (e.g. for easier digestion)
maybe some people prefer the version of the meal with lots of X, and other people prefer the version of the meal without X, but they didn’t have the courage to remove X completely, so they only left tiny amounts that have no impact on the result
Or maybe it’s some kind of signaling? Like “look what a great cook I am, I can make a meal with 99 ingredients, even if a meal with 9 ingredients would taste exactly the same”?
Maybe the techniques of food preservation have improved, so that some preservatives are no longer needed. But no one bothered removing them from the recipe (they are probably cheap) until they realized they could make a marketing move out of it.
my guess is sth like 1; I think some people are a lot more sensitive to some flavors than others. also the extent to which you pay attention can affect flavor a lot. and fwiw I frequently notice that some restaurants make the same dish a lot better than other restaurants, and the major ingredients must be about the same (or at least the quality/quantity difference is small enough that it’s not the first thing I noticed), so it must be in the minor ingredients. but often my friends won’t notice a big difference and conversely I don’t notice a big difference in the foods they draw strong distinctions for
(there are some really subtle flavors that I like which most places get wrong in ways that I can’t put my finger on, e.g. hainan chicken rice, gyu-don, edmonton style donair)
The boring hypothesis has to do with neither growing demand nor signaling impressiveness, but with society generally getting better at food logistics over time, and so being able to ship fancier stuff at more reasonable prices.
Indeed, the price premium of organic foods has been shrinking. I tried to find price data for processed/non-processed foods, but couldn’t find any, so this is what we have for now.
everyone is a few hops away from everyone else. this applies in both directions: when I meet random people they always have some weak connection to other people I know, but also when I think of a collection of people as a cluster, most specific pairs of people within that cluster barely know each other except through other people in the cluster.
It’s worth noting that, though it’s true that for a sufficiently large cluster most pairs of people are not strongly connected, they are significantly more likely to be connected than in a random graph. This is the high clustering coefficient property of small-world graphs like the social graph.
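To make the clustering point concrete, here is a small pure-Python sketch (an illustration, not something from the thread) comparing the average clustering coefficient of a Watts-Strogatz-style small-world graph with a random graph that has the same number of nodes and edges. The graph sizes, the rewiring probability `p=0.1`, and the seed are arbitrary choices.

```python
import random
from itertools import combinations

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for offset in range(1, k + 1):
            u = (v + offset) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj

def watts_strogatz(n, k, p, rng):
    """Ring lattice where each edge's far end is rewired to a random node with prob. p."""
    adj = ring_lattice(n, k)
    for v in range(n):
        for offset in range(1, k + 1):
            u = (v + offset) % n
            if u in adj[v] and rng.random() < p:
                w = rng.randrange(n)
                if w != v and w not in adj[v]:  # skip self-loops and duplicate edges
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj

def random_graph(n, m, rng):
    """Uniform random graph with exactly m edges (Erdos-Renyi G(n, m))."""
    adj = {v: set() for v in range(n)}
    edges = 0
    while edges < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            edges += 1
    return adj

def avg_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # local clustering counted as 0 for degree < 2
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

rng = random.Random(0)
n, k = 1000, 5  # every node starts with degree 10; 5000 edges total
small_world = watts_strogatz(n, k, p=0.1, rng=rng)
baseline = random_graph(n, n * k, rng)
print(f"small-world clustering:  {avg_clustering(small_world):.3f}")
print(f"random-graph clustering: {avg_clustering(baseline):.3f}")
```

With these settings the rewired lattice should keep its clustering somewhere around 0.5, while the random graph's sits near its edge density (about 0.01): two friends of the same person are far more likely to know each other than two people picked at random, even though most pairs in a large cluster are still not directly connected.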
the world is too big and confusing, so to get anything done (and to stay sane) you have to adopt a frame. each frame abstracts away a ton about the world, out of necessity. every frame is wrong, but some are useful. a frame comes with a set of beliefs about the world and a mechanism for updating those beliefs.
some frames contain within them the ability to become more correct without needing to discard the frame entirely; they are calibrated about and admit what they don’t know. they change gradually as we learn more. other frames work empirically but are a dead end epistemologically because they aren’t willing to admit some of their false claims. for example, many woo frames capture a grain of truth that works empirically, but come with a flawed epistemology that prevents them from generating novel and true insights.
often it is better to be confined inside a well-trodden frame than to be fully unconstrained. the space of all possible actions is huge, and many of them are terrible. on the other hand, staying inside well-trodden frames forever substantially limits the possibility of doing something extremely novel.
It can be just as efficient to work in many frames and switch between them easily. Some will be poorly developed, but they won’t require commitment, and they can anchor curiosity and make progress on the blind spots of other frames.
corollary: oftentimes, when smart people say things that are clearly wrong, what’s really going on is they’re saying the closest thing in their frame that captures the grain of truth
“...you learn that there’s three kinds of intellectuals. There’s intellectuals that work in one frame. There’s intellectuals that work in two frames. And there’s intellectuals that change frames like you and I change clothes.”
for something to be a good way of learning, the following criteria have to be met:
tight feedback loops
transfer of knowledge to your ultimate goal
sufficiently interesting that it doesn’t feel like a grind
trying to do the thing you care about directly hits 2 but can fail 1 and 3. many things that you can study hit 1 but fail 2 and 3. and of course, many fun games hit 3 (and sometimes 1) but fail to hit 2.
corollary: for things with very long feedback loops, or where you aren’t motivated by default, it can be faster for learning to do something that is actually not directly the thing you care about
This is basically math (and computer science) education. On one hand, some parts are probably not very useful. On the other hand, some people expect that teachers will defend every single step along the way by explaining how specifically this tiny atom of knowledge improves the student’s future life. No, I am not preparing a PowerPoint presentation on how knowing that addition is associative and commutative will make you rich one day.
funnily enough, my experience has been almost entirely from the other direction—almost everything I know is from working directly on things I care about, and very little is from study. one of the reasons behind this shortform was trying to untangle why people spend lots of time studying stuff and whether/when it makes sense for me to study vs simply to learn by doing
I think it is good to use your goals as a general motivation for going approximately in some direction, but the opposite extreme of obsessing over whether every single detail you learn contributes to the goal is premature optimization.
It reminds me of companies where, before you are allowed to spend 1 hour doing something, the entire team first needs to spend 10 hours in various meetings to determine whether that 1 hour would be spent optimally. I would rather spend all that time doing things, even if some of them turn out to be ultimately useless.
Sometimes it’s not even obvious in advance which knowledge will turn out to be useful.
it’s actually crazy how much ubering Pareto-dominates driving in a city like SF. you don’t have to worry about parking, you can work while in transit, you can get a bigger car when needed, you don’t need to make a round trip, etc. it’s generally even cheaper once you take depreciation, parking, insurance, and other costs into consideration.
My sense is that this is true until you have a small child you want to move around, and then it’s super super annoying to not have your car seat already installed for them and have other supplies on-hand
day 1 of using a new phone: there cannot be a single small bubble under my screen protector. it must be perfect.
day 1000 of using the phone: the square inch sized chunk of dead pixels on the screen is fine because it doesn’t usually cover anything important, and I can still read words in between the cracks
conference talks aren’t worth going to irl because they’re recorded anyways. ofc, you’re not actually going to remember to watch the recording, but it’s not like anyone pays attention at the irl talk anyways
a thriving culture is a mark of a healthy and intellectually productive community / information ecosystem. it’s really hard to fake this. when people try, it usually comes off weird. for example, when people try to forcibly create internal company culture, it often comes off as very cringe.
there are two different modes of learning i’ve noticed.
top down: first you learn to use something very complex and abstract. over time, you run into weird cases where things don’t behave how you’d expect, or you feel like you’re not able to apply the abstraction to new situations as well as you’d like. so you crack open the box and look at the innards and see a bunch of gears and smaller simpler boxes, and it suddenly becomes clear to you why some of those weird behaviors happened—clearly it was box X interacting with gear Y! satisfied, you use your newfound knowledge to build something even more impressive than you could before. eventually, the cycle repeats, and you crack open the smaller boxes to find even smaller boxes, etc.
bottom up: you learn about the 7 Fundamental Atoms of Thingism. you construct the simplest non-atomic thing, and then the second simplest non-atomic thing. after many painstaking steps of work, you finally construct something that might be useful. then you repeat the process anew for every other thing you might ever find useful. and then you actually use those things to do something.
generally, i’m a big fan of top down learning, because everything you do comes with a source of motivation for why you want to do the thing; bottom up learning often doesn’t give you enough motivation to care about the atoms. but also, bottom up learning gives you a much more complete understanding.
Seems to strongly echo Karpathy, in that top-down learning is most effective for building expertise
https://x.com/karpathy/status/1325154823856033793?s=46&t=iz509DJpCAibJadbMh4TvQ
often the easiest way to gain status within some system is to achieve things outside that system
“A prophet is not without honor except in his own town, among his relatives and in his own home.” (Mark 6:4)
adhd is a mechanism for seeking domains with tight feedback loops
it’s (sometimes) also a mechanism for seeking domains with long positive tail outcomes, rather than low variance domains
the project of rationality is the project of becoming slightly more mesaoptimizery
Corollary to Others are wrong != I am right (https://www.lesswrong.com/posts/4QemtxDFaGXyGSrGD/other-people-are-wrong-vs-i-am-right): It is far easier to convince me that I’m wrong than to convince me that you’re right.
Quite a large proportion of my 1:1 arguments start when I express some low expectation of the other person’s argument being correct. This is almost always taken to mean that I believe that some opposing conclusion is correct. Usually I have to give up before being able to successfully communicate the distinction, let alone address the actual disagreement.
has anyone done a good analysis of how to reduce fatality and injury risk of driving over a baseline of normal Uber? in particular, how much would each of the following matter:
a large SUV vs a normal sized car
Waymo vs well-rested professional chauffeur vs average Uber driver vs average American driving
also in particular interested in time-of-day segmented stats. several factors make this difficult. time-of-day accident data is confounded by intoxication and fatigue; but there is some bleed-over, because someone else crashing into you carries a large fraction of the danger of crashing yourself. and afaict there isn’t good data segmenting by type of driver.
I was working on a post about how to drive safer and how much safer you would be, but all of the data is really annoying to work with. My calculator is here (code here), but it never got to the point where I thought it was good enough to share.
It’s really easy to find stats about drivers involved in crashes, but there’s very little data about base rates. For example, we know that a lot of people involved in crashes are drunk or texting, but we don’t know if the other drivers are sober and undistracted, or just didn’t crash. We also know that a lot of crashes happen at night and that a lot of people are drunk at night, but same problem.
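The base-rate problem can be made concrete with Bayes' rule. The sketch below (an illustration, not the calculator mentioned above) shows that the same "share of crash-involved drivers who were drunk" implies very different relative risks depending on the unknown share of all drivers who are drunk. The 30% figure and the base rates are made-up numbers.

```python
def relative_risk(p_exposed_given_crash, p_exposed_base):
    """Crash risk of exposed drivers relative to unexposed drivers, via Bayes' rule.

    P(crash | exposed) / P(crash | unexposed)
        = [P(exposed | crash) / P(exposed)] / [P(not exposed | crash) / P(not exposed)]
    The overall crash rate P(crash) cancels out; the base rate P(exposed) does not.
    """
    exposed = p_exposed_given_crash / p_exposed_base
    unexposed = (1 - p_exposed_given_crash) / (1 - p_exposed_base)
    return exposed / unexposed

# Hypothetical: 30% of crash-involved drivers are drunk. The implied risk depends
# entirely on what fraction of all drivers on the road are drunk.
for base_rate in (0.01, 0.05, 0.20):
    rr = relative_risk(0.30, base_rate)
    print(f"base rate {base_rate:.0%}: drunk drivers crash {rr:.1f}x as often")
```

If 1% of drivers are drunk, the 30% statistic means drunk driving is roughly 40x as risky; if 20% are, it means less than 2x. Without the denominator, crash-side statistics alone can't distinguish these.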
One data set that might be helpful is trucking fleets, since they tend to have more info and actively care if their training helps. Trucking is complicated since most crashes are caused by the other driver (if a small car cuts a truck off, there’s not much they can do about it at that point), but investigators do try to figure out if it was the truck’s fault or not.
The interesting thing is that some fleets have much lower crash rates. It seems like Walmart truckers are much safer than independent truckers (theory: Walmart doesn’t want bad press, plus they own the trucks).
My suspicion is that if a normal driver in a well-maintained small car drove following all best practices (not tired or intoxicated, frequently looking around, driving defensively, not distracted, not cutting people off, avoiding bad weather), they would have several orders of magnitude lower chance of causing a crash. Avoiding driving at night (and again, bad weather) would somewhat reduce the risk of someone else hitting them, and driving a large SUV and avoiding rural roads would significantly reduce the risk of death.
Like I said, it’s hard to quantify this though since the data is so confounded.
the nuclear option
in the Senate of the US Congress, there is a “nuclear option” for overriding filibusters: a parliamentary method that can be used to ram legislation through. both sides generally agree to use it sparingly, because it’s a symmetric weapon.
i think there is a similar lever in arguments that is best left untouched if your goal is to actually find the truth. there is a type of argument that can be deployed in a wide range of circumstances, and is very hard to rebut except with even more nuclear arguments. the most extreme example: suppose you are discussing the strength of some piece of evidence, and you say “well but it’s impossible to truly know that, because we can’t rule out the possibility that the entire external world is fake, a la Descartes’s demon.” or, “actually your argument is just as conceptual as mine, because there’s no such thing as empirical evidence, since all evidence must be filtered through our minds and reason applied”. these kinds of arguments can be completely unbeatable and also utterly useless for answering actual questions we care about.
this is closely related to the idea of an argument that proves too much, though not all arguments that prove too much are completely nuclear.
https://www.lesswrong.com/w/fully-general-counterargument
tbh, i don’t really understand the concept of themes/symbolism in fiction books. aside from the most literal things. how much of this is just people being pretentious and/or reading tea leaves?
I think a lot of it is real!
PS. Happy to elaborate on this if you or others need a deeper defense rather than just a sense-check! I’d benefit from understanding your concern in more words if so. I can think of several different possible senses of what someone might mean in your parent comment and imo I’m happy to defend “a lot of it is real” for all those senses but it’s possible your objection is a secret fifth sense.
Themes are just things the fiction makes you think about that go beyond the particular events and characters of the book itself. If you’ve read a book where a character is thinking about cheating on her husband and you’re like “well I would never do that… but my marriage is strong, what if we were going through the same thing as them… would I be strong enough?” congrats, that’s a theme. Themes can also be perfectly inane: “ha, the bad guys in this are not coincidentally just like my real-life political opponents, and the hero just gave the perfect one-line reply that I would give if I was slightly faster on my feet, take that!”
If you want a higher bar for what counts as a theme: the book knows it will have this effect and thus prompts you to view the subject from many angles, keeping you from settling on whatever your first “easy” answer is. This is what literary types tend to find useful and desirable about theme. If you read Crime and Punishment, you will likely end up thinking a lot about how you manage and experience guilt, for instance.
Symbolism I personally find to be a rather stilted and inorganic way of doing theme, so someone else can better speak to what some get out of it, but it clearly is a thing that some fiction engages in.
I don’t think it is productive to conflate symbolism and themes. Symbolism is when an element of a story, or the description, correlates to some signifié. Say, “Rosebud” is a symbol of Charles Foster Kane’s youth and innocence. I’m not sure if parody (or inspiration) counts as symbolism—is the protagonist of Zola’s Œuvre a symbol for Paul Cézanne? I don’t know.
I’m loath to bundle themes together with symbolism. While a selection of symbols throughout a work may comprise a theme, it would be a mistake to say “themes are symbols”. Not all themes need be composed of symbols. Effective use of themes, at least in dramatic works, doesn’t rely on symbolism; instead, it makes the plot events themselves dramatizations of the theme.
In Bergman’s The Silence the theme is… well… silence… or perhaps more correctly: non-communication. They are strangers who don’t speak the language of the country they are in. They are sisters who cannot relate to each other. They are effectively and literally silent. And when they aren’t silent, they aren’t communicating a whole lot of meaning. And it is a theme essential to the plot of the film, rather than a symbolic add-on. Kubrick’s Dr. Strangelove has a similar theme [1], which is likewise repeated several times throughout the plot of the film.
These episodes are not symbols for anything, they are the thing itself.
And don’t get me started on Zappa’s Project/Object theory which proposes that they don’t need to symbolize a damn thing to give a theme power!
— Frank Zappa, The Real Frank Zappa Book.
Mamoru Oshii expressed a similar sentiment: that it is the repetition of imagery in his films that creates meaning. [2] The theme is the fact there’s a theme at all, I suppose.
In the case of Dr. Strangelove, “about nuclear apocalypse” is the subject, yes. But, as Kubrick himself said to film critic Michel Ciment, “Failure of communication is a theme which runs through a number of my films,” alluding to the plot points of the telephone and short-wave radio not working, and transport being impossible, in The Shining. And Dr. Strangelove’s plot similarly hinges on multiple episodes of telephone and radio communication failing. Showing that the way this theme is explored in Eyes Wide Shut, a film that concerns itself with (paranoid) fantasies and infidelity, is a continuation of it is left as an exercise for the reader.
A theme is not always composed of symbols, nor is it the same as a subject.
The example I always use, because it relies on cliches, is a photo series about “age”. The series may have subjects as diverse as a budding flower, a newborn baby, a geriatric person, and a wilted flower. All the same theme—age. Four different subjects.
The subject of Dr. Strangelove is a Nuclear Apocalypse, the theme is failure of communication.
“Eventually, I think, by using these elements repeatedly, I add meaning to my final product. I’m still exploring how to express my feelings through these elements. I’ve always felt that in order to portray humans, you should not be shooting humans; you should be shooting something else. And what I’ve used is animals, which are very important in my films.”
https://www.avclub.com/mamoru-oshii-1798208379#:~:text=Eventually,films
Depends on the authors. The more famous and classical the author is, the higher your prior should be that every sentence, name and scene serves a purpose to explore a character and thus the main theme of the book. Chekhov / Tolstoy / Shakespeare etc. are definitely on the highest density side of the spectrum. Fanfiction might often be pretentious.
I think leogao meant that the readers were being pretentious in their discussions of the book (looking at tea leaves).
one problem with UBI as a solution for AI economic disruption: at the moment when AI can first replace a human job, it will probably cost only epsilon less than the human. the cost will be mostly capital (datacenters, chips, electric plants, etc), rather than labor. so we can only afford to give the human epsilon UBI. as time goes on, eventually the AI gets cheap enough that humans can get substantial UBI, possibly exceeding their original income, as the AIs become more productive than the humans were. but there’s a big gap in the middle that we need to bridge somehow. the best case scenario is that different industries get automated at different times so that the gaps don’t line up, and we can redistribute the surplus from the first industries to be automated to fill the gap for later industries. the worst case is that all the gaps happen at once and we all starve to death because the surplus is not enough to keep people alive.
This can’t be right. The troublesome point you describe happens when there are already enough “AI workers” to displace all current jobs, but the extra productivity is still only epsilon (why?) and the number of “AI workers” isn’t growing explosively far beyond that (why?)
Anyway, the real problem isn’t that capital owners won’t have enough money to pay us UBI. It’s that they… won’t pay us UBI. Simple as that.
Well, and the possibility that the capital “owners” might effectively be the AIs. But yes.
I don’t understand your objection. right now, the cost of replacing a given human with AI is greater than the cost of the human (because the compute is very expensive, the AIs are not very good, etc). over time, the AI gets cheaper and cheaper, until at some point it is precisely as expensive as the human. one day thereafter, AI will be very slightly cheaper than the human. you would prefer to pay for the AI compute instead of the human salary. at this moment in time, it will be economically incentivized to fire all the humans and replace them with AIs. because the AIs still cost almost exactly as much as humans at this moment, it won’t be economical to have substantially more AIs than you had humans the day before, because if it were, then we would have hired more humans in the first place; there must be diminishing returns to quantity of humans employed, and the previous equilibrium is still very close to the new equilibrium. but the amount of new value created for the world due to this switch is very small: only the delta between what the humans used to cost and what the AIs now cost.
Your economics are wrong for a few reasons. Let’s grant the hypothetical where all humans supply homogeneous labor at a uniform wage.
If AI is slightly cheaper than humans, what happens is that wages fall slightly. At the new, lower wages, there is more demand for labor (and more humans drop out of the labor force). At the same time, capital costs are bid up slightly. Eventually the price of AI and human labor is equal, and the quantity demanded is equal to the quantity supplied.
At the same time, you are increasing demand for labor to build the AI (right now labor is ultimately the main input to building all the stuff that goes in datacenters). If the social value of the AI is near zero, then the net increase in demand is almost the same as the net increase in supply. Lowering wages and increasing capital costs doesn’t offset the benefits of extra productive capacity, it just shifts value from laborers to capitalists.
The real fiscal issue in this scenario is that you are shifting output from labor to capital, and the tax rate on capital is lower than the tax on labor. (Moreover, as you automate the economy there are further corporate reorganizations that would drive effective tax rates well below the on-paper capital gains rate.) You’re doing that at the same time that you are potentially increasing spending, which is tough unless you are willing to adjust the tax code.
I’m inclined to agree with other commenters though that none of this seems like the most important issue. The fiscal issues can be overcome if the state cares, and my best guess is that growth will accelerate enough that it would be OK even if there was no political change.
People should have much bigger concerns about being completely materially disempowered: (i) the state may not continue to support them, either because they are politically disempowered or because the state itself is disempowered, and (ii) even if they are able to survive they will have no say over what the world looks like and that sucks in its own way.
My idea was, maybe the AI company is willing to sell you 1 unit of AI labor at human-competitive price, but if you order 1000 units they’ll ask for a higher price per unit, because they need to build more datacenters or something. In this case replacement of humans will be gradual even if all humans are equally productive. And another possibility is that humans aren’t all equally productive, so AI will first get good enough to replace the worst worker, then the second worst and so on. From these two reasons I get the possibility that by the time lots of people get replaced, the difference in productivity between AI and the average person replaced so far won’t be epsilon. It won’t be the full salary either, but maybe something substantial. Anyway that was it.
This does not follow at all. The total amount of production would somehow have to decrease, otherwise it’s just a question of distribution of resources, which is the whole point of UBI. To literally starve, they would need to shut down some amount of food production (the robots don’t eat).
food production still consumes resources that the robots do care about. fuel, machinery, logistics capacity etc.
Good point. Decreased quality of life due to competing with AI for basic resources has already begun (RAM prices) and will eventually show up in non-direct goods.
What do you define as “replace a human job”? We are already seeing AI that can replace at least 50% of a job for very much cheaper than the 50% of the cost of paying a worker to do those parts of that job. In principle that means that many employers can fire half their workforce and get the remaining employees to pick up the other 50% of the jobs from the fired employees.
In practice this would involve huge disruption and uncertainty, and perhaps they can avoid that bother by letting go of their most obviously least productive employees, lowering costs a little (say to 95%) to do the same or slightly more work with much reduced disruption. Over time, the employees who use more AI in their workflows take less effort to do the job. We are seeing this already.
This obviously isn’t a stable long-term economic behaviour. Those conservative employers probably will continue to decrease workforce slowly, while being eroded by employers much more willing to accept disruption eating into their markets at greatly reduced costs.
However, it takes time. The more capable that AI becomes per unit cost, the greater the advantage that disruption-tolerant employers will have, possibly leading to multiple larger failures of conservative employers or possibly rapid culture changes to avoid such failures, replacing large segments of workforce at some later point.
In this model (which matches what we are already seeing), the job losses are inevitable but come some economically significant and somewhat unpredictable time after the cost of AI drops well below the cost of employing a human to do some tasks.
There’s nothing that requires an economy to maintain a continuous equilibrium of perfectly distributed cost/productivity balances at all times, and we see plenty of past examples where it has not. Continuous changes to parameters in a complex system often result in sudden changes in behaviour, not just continuous ones.
it doesn’t matter whether you’re fully replacing one job, or partially replacing multiple jobs. my model still implies that the market value of human labor diminishes more than the amount of money needed to keep everyone at the same level of consumption as they did before
That seems like an argument for establishing UBI sooner rather than later.
You seem to be describing a situation where there is a temporary absence of sufficient funds for a UBI (the “big gap in the middle”) after which there’s plenty of money to fund the UBI, potentially at a higher level than people’s original income.
The generic solution for a temporary lack of necessary funds with lots of funds being available in the future is getting a loan to be paid off when the money comes in. This consumption-smoothing would be good from the perspective of the AI companies as well, as “everyone is out of work and has no money to spend”, if it persists for long enough for people to burn through their savings, predictably leads to “the revenue streams of the AI companies collapse”.
How it would work out in detail is unclear, but if AI companies end up with a lot of economic power, I’d expect that gets taxed in some form by whoever’s providing the UBI, and in the meantime the UBI provider goes into a bit of debt.
in this model, as soon as an ai is epsilon cheaper than a human, humans stop getting hired?
can someone who works in quant/HFT/market making help me understand whether the following is correct?
(assuming there is only a single exchange for simplicity) order execution is hard because (a) the order book is of finite size, so placing a large order induces slippage; (b) if you make a series of trades spaced far apart to wait for more liquidity to show up, the value of the stock can move; and (c) if you make a series of trades predictably, then HFTs can fuck you over by clearing out the order book 1ms before you buy and turning around to sell to you.
so unavoidably you have to make a series of trades. you need to trade off between speed of executing on everything and the liquidity available. the HFTs can only fuck you over if they know exactly when you’re going to trade, because they bleed money whenever they mispredict and try to frontrun an order that doesn’t exist. you also want some way to hedge the possible price movements.
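A toy sketch of the "don't be predictable" idea from (c): split the parent order into child orders as in a time-weighted schedule (TWAP), but randomize each slice's size and its send time within its own time bucket. The function name, the jitter scheme, and all parameters are hypothetical; this only addresses predictability of timing, not the slippage and drift risk from (a) and (b).

```python
import random

def jittered_schedule(total_qty, horizon_s, n_slices, jitter=0.5, rng=None):
    """Split a parent order into n_slices child orders over horizon_s seconds.

    Hypothetical sketch: start from a plain TWAP (equal sizes at equal intervals),
    then randomize each slice's size and its send time within its own time bucket.
    Returns a list of (send_time_s, qty) pairs whose quantities sum to total_qty.
    """
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1) so every slice weight stays positive")
    rng = rng or random.Random()
    interval = horizon_s / n_slices
    # Random weights in [1 - jitter, 1 + jitter) decide how the total is split.
    weights = [1 + jitter * (2 * rng.random() - 1) for _ in range(n_slices)]
    scale = total_qty / sum(weights)
    qtys = [int(w * scale) for w in weights]
    qtys[-1] += total_qty - sum(qtys)  # rounding remainder goes into the last slice
    # Each child order is sent at a random time inside its own bucket, so the
    # schedule still spans the horizon but individual send times are hard to guess.
    return [((i + rng.random()) * interval, q) for i, q in enumerate(qtys)]

schedule = jittered_schedule(10_000, horizon_s=3600.0, n_slices=12, rng=random.Random(42))
for t, q in schedule:
    print(f"t={t:7.1f}s  qty={q}")
```

Larger `jitter` makes the schedule harder to predict but lets individual slices get bigger, so it trades some of (a) for (c); a smarter version would also adapt slice sizes to the liquidity that actually shows up in the book.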
I am not a quant, but have some related background. (Those who know this area best, may not be inclined to say.)
“Real traders” have many ways to avoid getting front-run to the extreme degree suggested in (c), including limit orders and “trying not to be that predictable” by disguising action to look like other forms of flow.
The amount of pain you experience from (b) depends on whether you think your strategy’s value decays rapidly or slowly.
But there is a more general problem: it is not just HFTs but the market as a whole that reacts to your actions, and your impact will shift the demand curve for the stock. The size of that impact depends on the information leaked by your actions, information leaked by passage of time, and time allowed for new liquidity to arrive.
there is academic work on theoretical “square-root laws of market impact”
https://mfe.baruch.cuny.edu/wp-content/uploads/2012/09/Chicago2016OptimalExecution.pdf
but predicting actual impact is hard for a number of reasons (limited data, causality issues)
Knowledge of what other players can and can’t infer from your execution, and modeling impact patterns well, is a multiplier on the value of strategies, hence worth spending a lot to get right.
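For reference, the square-root law of market impact is usually written as impact ≈ Y·σ·√(Q/V), where σ is daily volatility, Q the order size, V the daily traded volume, and Y an empirical, market-dependent constant of order one. A minimal sketch, assuming Y = 1 and ignoring everything noted above that makes real impact hard to predict:

```python
import math

def sqrt_impact(order_qty, daily_volume, daily_vol, y=1.0):
    """Expected fractional price impact under the square-root law.

    impact ~= y * daily_vol * sqrt(order_qty / daily_volume)
    y is an empirical constant of order one; 1.0 here is an assumption.
    """
    return y * daily_vol * math.sqrt(order_qty / daily_volume)

# Buying 1% of a day's volume in a stock with 2% daily volatility:
impact = sqrt_impact(order_qty=10_000, daily_volume=1_000_000, daily_vol=0.02)
print(f"impact: {impact * 1e4:.0f} bp")  # prints "impact: 20 bp"
```

Because impact grows with the square root of size, quadrupling the order only doubles the expected price move, but the total cost (impact times quantity) still grows faster than linearly in size, which is one reason large orders get split up and spread out.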
I work in quant trading, but not specifically in order execution. These are all real concerns. Which ones are most important depends heavily on your strategy and market, e.g. if your positions last for days order execution is a lot easier than if they last for minutes. And time-based slippage might be big or small relative to tick size.
This isn’t quite true, sophisticated funds can exploit almost any predictability.
You could prevent this by using a limit order, yes? But execution still seems hard even if you are allowed to use limit orders.
In general, it seems to me unnecessary to know exactly when you will trade in order to exploit you. If I can predict your future trades, and believe they would be profitable for you, I can take advantage by doing those trades myself first. In fact, unless I think your trades would be profitable only at the exact time you plan to do them, I likely want to make your trades as soon as I can, not wait for when you were going to make them.
Naive use of limit orders will cause you to lose the profitable trades, and fill the unprofitable ones. There are ways around this, but it’s not trivial.
Let’s say I’m a market maker. Assuming there’s no way for me to hedge my position when my quote is taken, how do I get out of my position when someone hits my quote?
The standard way to do this is to “lean” on your position. If my estimated price of something is $100 and someone bought against me, then I’ll adjust my estimated price to something like $101. The more position I’m holding right now, the more I adjust. When the adjusted price moves too far away from my quotes, I pull my quote back. That’s why when you trade against market makers, you might get a worse price for your next order.
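A minimal sketch of leaning, assuming the adjustment is linear in inventory (the class and parameter names are hypothetical, and real market makers typically use more careful models): the quoter keeps an adjusted price that moves against its position and centers both quotes on it.

```python
from dataclasses import dataclass

@dataclass
class LeaningQuoter:
    fair_value: float      # estimated price with no inventory
    lean_per_unit: float   # price adjustment per unit of inventory (assumed linear)
    half_spread: float     # distance from the adjusted price to each quote
    position: int = 0      # + long, - short

    def reservation_price(self) -> float:
        # Short (someone bought from us) -> raise the price; long -> lower it.
        return self.fair_value - self.lean_per_unit * self.position

    def quotes(self) -> tuple[float, float]:
        r = self.reservation_price()
        return r - self.half_spread, r + self.half_spread

    def on_fill(self, qty: int) -> None:
        # qty > 0: our bid was hit (we bought); qty < 0: our offer was lifted (we sold).
        self.position += qty

mm = LeaningQuoter(fair_value=100.0, lean_per_unit=1.0, half_spread=0.5)
mm.on_fill(-1)  # someone bought one unit from us
print(mm.reservation_price(), mm.quotes())  # 101.0 (100.5, 101.5)
```

Shifting both quotes up makes the next buyer pay more and makes our bid more attractive to sellers, which works the short position back toward flat; this is the "worse price for your next order" effect described above.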
i really wish there were a better platform for repeatable cognitive testing than brainlabs.me. the website feels like it is about to fall over from a light breeze, and i would be very sad if i suddenly lost my method of measurement because the site disappeared. also, there doesn’t seem to be particularly strong evidence that these tests in particular are the right ones to be looking at.
some of the CANTAB tasks are apparently repeatable https://www.tandfonline.com/doi/epdf/10.1080/23279095.2020.1722126
I’ve always been relatively unfamiliar with normal pop culture, so I recently decided to look at several online lists of best/most recognizable songs and made a spotify playlist of several hundred of them, with a bias towards more recent songs. I think this has been much better than the Spotify recommendation algorithm, which mostly shows me songs similar to ones I’ve already listened to.
(“taylor swift”? is that the Taylor series person or the Swift programming language person?)
I think it’s an algorithmic implementation of the Taylor-series decomposition optimized for performance? Claude tells me it might be useful for high-frequency trading, game engines, and real-time signal processing.
Upvote-disagree.
There are times when I wish for an eye-roll react.
are the Cambridge Brain Science cognitive tests actually reliable and relatively immune to practice effects? I want some mostly repeatable measurement of my own cognitive abilities over time, for health tracking reasons, but it’s unclear to me how reliable these tests are.
I was thinking of trying out the Sustained Attention to Response Task (SART) with response feedback (SART 2). I’m not sure how it compares to the Dual n-Back FAQ on Gwern.net.
fun side project idea: create a matrix X and accompanying QR decomposition, such that X and Q are both valid QR codes that link to the wikipedia page about QR decomposition
current understanding of optimization
high curvature directions (hessian eigenvectors with high eigenvalue) want small lrs. low curvature directions want big lrs
if the lr in a direction is too small, it takes forever to converge. if the lr is too big, it diverges by oscillating with increasing amplitude
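To make the too-small/too-big picture concrete, here is a sketch on the simplest possible case: a 1D quadratic f(x) = 0.5 * h * x**2 with curvature h. This is an assumption, since real losses are only locally quadratic. Each gradient step multiplies x by (1 - lr * h), so it converges iff 0 < lr * h < 2. A tiny lr * h crawls, and lr * h > 2 oscillates with growing amplitude.

```python
def gd_on_quadratic(curvature: float, lr: float, steps: int = 50, x0: float = 1.0) -> float:
    """Plain gradient descent on f(x) = 0.5 * curvature * x**2; returns |x| after `steps`.

    Each step multiplies x by (1 - lr * curvature), so |x| shrinks iff
    0 < lr * curvature < 2.
    """
    x = x0
    for _ in range(steps):
        grad = curvature * x
        x = x - lr * grad
    return abs(x)


lr = 0.1
for h in [0.01, 1.0, 19.0, 21.0]:
    # lr * h = 0.001 (crawls), 0.1 (fine), 1.9 (oscillates but shrinks), 2.1 (diverges)
    print(f"h={h:5}  lr*h={lr * h:.3f}  |x| after 50 steps = {gd_on_quadratic(h, lr):.3g}")
```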
momentum helps because if your lr is too small, it makes you move a bit faster. if your lr is too big, it causes the oscillations to cancel out with themselves. this makes high curvature directions more ok with larger lrs and low curvature directions more ok with smaller lrs, improving conditioning
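Here is a sketch of the momentum claim on the same kind of 1D quadratic. It uses heavy-ball momentum, which is an assumption because the note doesn't say which momentum variant. For this update, the stable range widens from 0 < lr * h < 2 to 0 < lr * h < 2 * (1 + beta). For small lr * h, the slow mode also decays faster than under plain gradient descent. Setting `beta = 0` recovers plain gradient descent.

```python
def heavy_ball_on_quadratic(curvature: float, lr: float, beta: float,
                            steps: int = 200, x0: float = 1.0) -> float:
    """Heavy-ball momentum on f(x) = 0.5 * curvature * x**2; returns |x| after `steps`.

    Update: v <- beta * v - lr * grad, then x <- x + v. For this quadratic the
    iteration is stable iff 0 < lr * curvature < 2 * (1 + beta); beta = 0 is plain GD.
    """
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * curvature * x
        x = x + v
    return abs(x)


lr = 0.1
for h in [0.01, 30.0]:   # lr * h = 0.001 (too small) and 3.0 (too big for plain GD)
    plain = heavy_ball_on_quadratic(h, lr, beta=0.0)
    momentum = heavy_ball_on_quadratic(h, lr, beta=0.9)
    print(f"lr*h={lr * h:.3f}  plain GD |x|={plain:.3g}  momentum |x|={momentum:.3g}")
```

With beta = 0.9, the too-big case lr * h = 3 converges because it sits inside 2 * (1 + 0.9) = 3.8. Pushing lr * h to 4 diverges again.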
high curvature directions also have bigger gradients. this is the opposite of what we want because in a perfect world higher curvature directions would have smaller gradients (natural gradient does this but it’s usually too expensive). adam second moment / rmsprop helps because it divides each gradient by a running estimate of its magnitude, so the update stays roughly the same size even when the gradient in that direction gets bigger, which is sorta halfway right
applied per param rather than per eigenvector
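Here is a sketch of the second-moment normalization on its own. It is RMSprop-style, with Adam's bias correction but no first moment. With a constant gradient g, the bias-corrected second moment equals g**2, so each update is about -lr * sign(g). Scaling every gradient by 100 therefore leaves the updates unchanged, whereas natural gradient would shrink them in high-curvature directions. The division is elementwise, which is the "per param rather than per eigenvector" caveat. The function name and hyperparameter defaults are illustrative.

```python
import numpy as np


def rmsprop_updates(grads: np.ndarray, lr: float = 1e-3, beta2: float = 0.999,
                    eps: float = 1e-8) -> np.ndarray:
    """Bias-corrected second-moment scaling (Adam without the first moment).

    grads has shape (steps, n_params); returns the update applied at each step.
    The scaling is elementwise, i.e. per parameter, not per Hessian eigenvector.
    """
    v = np.zeros(grads.shape[1])
    updates = []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1 - beta2) * g**2
        v_hat = v / (1 - beta2**t)                 # bias correction
        updates.append(-lr * g / (np.sqrt(v_hat) + eps))
    return np.array(updates)


steps = 10
small = np.tile([0.01, -0.02], (steps, 1))   # constant gradients for two params
big = 100.0 * small                           # same directions, 100x the size
u_small, u_big = rmsprop_updates(small), rmsprop_updates(big)
print(u_small[-1], u_big[-1])  # both approximately [-0.001, 0.001]
```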
in real NNs edge of stability means it’s actually even more fine to have a too-high lr: the max curvature increases throughout training until it gets to the critical point where it would diverge, but then instead of diverging all the way the oscillations along the top eigenvector somehow cause the model to move into a slightly lower curvature region again, so that it stabilizes right at the edge of stability.
for Adam, these oscillations also cause the second moment to increase, which decreases the preconditioned max curvature without affecting the original curvature. so the original max curvature can just keep increasing for Adam, whereas it doesn’t for SGD (though apparently there’s some region where it jumps into a region with low original max curvature too)
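Here is a toy illustration of the Adam point. It assumes a fixed 2x2 Hessian `H` and hand-picked second-moment values `v`, both hypothetical. The preconditioned max curvature is taken as the top eigenvalue of P^{-1}H with P = diag(sqrt(v) + eps). Inflating v on the sharp coordinate pushes this value down while the raw top eigenvalue of H stays put. For plain GD the corresponding threshold is 2 / lr. For Adam, the threshold used here is (2 + 2 * beta1) / ((1 - beta1) * lr), which is 38 / lr for beta1 = 0.9. This is my reading of the Adam edge-of-stability paper linked below, not something stated in this note.

```python
import numpy as np


def preconditioned_sharpness(hessian: np.ndarray, v: np.ndarray, eps: float = 1e-8) -> float:
    """Top eigenvalue of P^{-1} H with P = diag(sqrt(v) + eps).

    Computed via the symmetric matrix P^{-1/2} H P^{-1/2}, which has the same
    eigenvalues as P^{-1} H.
    """
    d = 1.0 / np.sqrt(np.sqrt(v) + eps)        # diagonal of P^{-1/2}
    sym = hessian * np.outer(d, d)             # P^{-1/2} H P^{-1/2}
    return float(np.linalg.eigvalsh(sym)[-1])  # eigvalsh returns ascending eigenvalues


lr, beta1 = 1e-3, 0.9
adam_threshold = (2 + 2 * beta1) / ((1 - beta1) * lr)  # 38 / lr for beta1 = 0.9

H = np.array([[50.0, 10.0], [10.0, 5.0]])   # hypothetical Hessian, held fixed
raw_sharpness = float(np.linalg.eigvalsh(H)[-1])

v_before = np.array([1e-6, 1e-6])   # hypothetical second-moment estimates
v_after = np.array([1e-4, 1e-6])    # oscillations have inflated v on the sharp coordinate
for name, v in [("before", v_before), ("after", v_after)]:
    print(name, raw_sharpness, preconditioned_sharpness(H, v), adam_threshold)
```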
papers
https://distill.pub/2017/momentum/ really cool momentum explainer
https://arxiv.org/abs/2103.00065 - edge of stability
https://arxiv.org/abs/2207.14484 - edge of stability for adam
What does “the lr” mean in this context?
learning rate
Some aspirational personal epistemic rules for keeping discussions as truth seeking as possible (not at all novel whatsoever, I’m sure there exist 5 posts on every single one of these points that are more eloquent)
If I am arguing for a position, I must be open to the possibility that my interlocutor may turn out to be correct. (This does not mean that I should expect to be correct exactly 50% of the time, but it does mean that if I feel like I’m never wrong in discussions then that’s a warning sign: I’m either being epistemically unhealthy or I’m talking to the wrong crowd.)
If I become confident that I was previously incorrect about a belief, I should not be attached to my previous beliefs. I should not incorporate my beliefs into my identity. I should not be averse to evidence that may prove me wrong. I should always entertain the possibility that even things that feel obviously true to me may be wrong.
If I convince someone to change their mind, I should avoid saying things like “I told you so”, or otherwise trying to score status points out of it.
I think in practice I adhere closer to these principles than most people, but I definitely don’t think I’m perfect at it.
(Sidenote: it seems I tend to voice my disagreement on factual things far more often (though not maximally) compared to most people. I’m slightly worried that people will interpret this as me disliking them or being passive aggressive or something—this is typically not the case! I have big disagreements about the-way-the-world-is with a bunch of my closest friends and I think that’s a good thing! If anything I gravitate towards people I can have interesting disagreements with.)
I find it a helpful framing to instead allow things that feel obviously false to become more familiar, giving them the opportunity to develop a strong enough voice to explain how they are right. That is, the action is on the side of unfamiliar false things, clarifying their meaning and justification, rather than on the side of familiar true things, refuting their correctness. It’s harder to break out of a familiar narrative from within.
lingao qiming is the hardest scifi I’ve ever read. it puts other “hard” scifi like project hail mary or three body problem to shame. the basic conceit of the book is that it’s an isekai where some people discover a wormhole to a parallel universe exactly like ours but during the time of ming dynasty china, and decide to bring 500 technical specialists and a bunch of modern supplies to the past to try and conquer ancient china. the vast majority of the book is devoted to discussing every single technical aspect in excruciating, well-researched detail. you don’t simply have a paragraph about them deciding to buy guns; you get an entire chapter of different gun experts arguing back and forth about exactly which gun to buy based on maintainability, range, differences between civilian and military models, semi automatic vs fully automatic. from what I’ve read so far, the depictions of arguments between technical experts seem true to life. imagine this but for everything: highway construction, hydroelectric power, water purification, medical supplies, port construction, etc etc. it’s helped me feel the fractal complexity of the world much more. it’s also absurdly long. i don’t know if I’ll ever finish it.
I first learned about Lingao Qiming from Afra Wang’s recent post interviewing Ma Qianzu, one of its lead writers. She also wrote a Wired article about it
20 years, millions of words, thousands of authors, 1,400 derivative works, jeez…
Pedantry, but: the term “hard sci-fi” is used to refer to “science fiction characterized by concern for scientific accuracy”, and time travel is not. This seems to be more adequately described as an isekai ratfic.
Yes, I would have to agree here. The parts that are sci-fi (almost none) are not hard, and the parts that are hard are not sci-fi.
afaict, the time travel part only exists to set the story up, and does not come up again at all. the entire rest of the story is stuff like this:
On the one hand, yeah. On the other hand, the rest of the story (AFAICT based on your description) isn’t really that sci-fi, let alone “hard”, except insofar as it’s set up by the time travel. You could just as well write a story about the Spaniards ultra-strategizing about efficiently conquering the Mexica or the Inca.
I’ve learned a lot about various people around me, and myself, over the past few days.
A comment with practically zero informational value (due to the total absence of context) but 37 karma / 12 agreement feels like “twitter” in the bad sense of the concept, not LW. Which is very sad for me as an old reader. You probably mean something related to American politics, but I suppose many users are not American and don’t even have much knowledge about these things. Maybe you mean something totally different. Maybe the OpenAI and Anthropic drama? I can’t even make sense of this.
He works at OpenAI and has been critical of their contract, so I assume it’s related to that.
it’s obviously about the department of war situation. it wasn’t intended to be vague, it just didn’t occur to me that it wouldn’t be obvious what it was about.
Btw is it clear that more US government control over AI companies is bad for safety in the long term? Yes, locally AI can be used for ~bad things, but it may be easier to coordinate with different countries or to slow progress down in the face of danger, because the government will be 1 agent instead of 4 companies racing with each other (even in 1 country) and is not motivated by increasing profits.
Total absence of context is relative. I think a lot of people here know he works at OpenAI, and the timing of this post heavily implies the OpenAI/Anthropic/DoD drama. Although I also agree that it is quite vague, it is somewhat understandable if he cannot or doesn’t want to be too specific, and knowing the background of this post, I do think there is a small amount of information.
I would bet with high uncertainty that 10% of regular LW readers and 75% of people who have read this quick take know the background context.
Also I just looked it up, according to the 2024 LW Census ~50% of users are American, which is a bit lower than I expected.
I’m an American who semiregularly (recently, more) reads LW and has heard of the DoW thing, and still was unsure what leogao was talking about. Just put the four words “I work at OpenAI.” Since I don’t regularly read the news, for all I know something else happened; or maybe it’s about some recent ‘community drama’ (e.g. a recent post on the EA forum about CEA not taking action against sexual harassment came to mind).
Additionally, one thing I like about LW is timelessness. It’s probably the web forum that most discusses old posts, due to them being more relevant along with culture + affordances for old reading. In 10 years, a sentence like “I work at OpenAI. The Department of War has made me learn a lot about various people around me, and myself, over the past few days.” is still useful. Currently, 4 days is enough to make me unsure what is being referenced.
I don’t want to have to make every shortform a self contained article. it makes sense that full posts should explain the context, but I would find it very exhausting to have to e.g. explain that I work at openai every single time I shortform post about anything openai related. if lesswrong shortform is the wrong place to do this, I’m happy to post elsewhere.
Like what?
it feels rude to talk about specifics about other people. at a broad level, there are some people I’ve gained a lot of respect for. it’s easy for people to say they care about safety, so I don’t weigh that very heavily. but now I know who’s willing to step up and take actions in a crisis. and conversely too.
Did you discover your love for vagueposting?
publicly registering a bet with Emmett Bicker: I predict that on February 16, 2027, there will be at least 3 people at one of openai/anthropic/GDM who work on kernels full time (or, a larger number of people spending part of their time working on kernels, such that the time spent adds up to 3 FTEs, capped at 50 people maximum). if all of these companies have gone bankrupt or pivoted heavily or cut their workforce substantially because of a market crash or AI winter, this resolves in my favor. if AI kills everyone or creates the glorious posthuman utopia before then, it resolves in Emmett’s favor (regardless of whether there are still people who do kernels work for fun).
ethical offsets for eating meat are difficult because it’s hard to quantify the expected impact of e.g. donating to an animal rights charity, and compare it to the impact of eating meat. (if you pay for a lobbyist to talk to a congressman for 30 minutes about larger cages for chicken farming, how much does this improve chicken lives, and how many chicken lives saved is that equivalent to?)
here’s a much simpler solution: almost everyone agrees that a human is more morally valuable than a cow, even if the human is far away in a distant land. (the cow is also far away!) it costs ~$5000 to save a human life—while arriving at this number obviously still requires assumptions, they’re much less rickety.
cows are pretty big, so you can get about 500lb of beef from each cow. spreading the $5000 over 500lb works out to $10/lb, and since beef is about $6/lb, it’s not completely crazy to spend $10/lb on ethical offsetting. this is less bad than it sounds, because the raw cost of the beef is probably only a fraction of your total expenditures on food.
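Here is the beef arithmetic as a tiny function. It assumes the ~$5000-per-life and ~500 lb-per-cow figures from this note. The `lives_per_animal` parameter is a hypothetical exchange rate, with 1.0 as the conservative 1:1 bound. For chickens you would plug in your own per-bird meat yield, which the note doesn't give.

```python
def offset_per_lb(cost_per_life_usd: float, lb_per_animal: float,
                  lives_per_animal: float = 1.0) -> float:
    """Offset cost per pound of meat, valuing one animal at `lives_per_animal` human lives.

    lives_per_animal = 1.0 is the conservative 1:1 bound; if a cow is worth less
    than a human, offsetting a full human life per cow overpays.
    """
    return cost_per_life_usd * lives_per_animal / lb_per_animal


beef_offset = offset_per_lb(5000, 500)   # ~$5000 per life, ~500 lb of beef per cow
beef_price = 6.0                          # ~$6/lb
print(beef_offset, beef_price + beef_offset)  # 10.0 16.0
```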
(unfortunately, chicken enjoyers get kind of screwed because chickens are much smaller, and can’t use the trick of amortizing the $5000 over a lot of meat. so they need to either make up an exchange rate between human and chicken lives, or use a conservative bound of 1:1. thankfully, I like beef more than chicken anyways.)
hypothesis: the kind of reasoning that causes ML people to say “we have made no progress towards AGI whatsoever” is closely analogous to the kind of reasoning that makes alignment people say “we have made no progress towards hard alignment whatsoever”
ML people see stuff like GPT4 and correctly notice that it’s in fact kind of dumb and bad at generalization in the same ways that ML always has been. they make an incorrect extrapolation, which is that AGI must therefore be 100 years away, rather than 10 years away
high p(doom) alignment people see current model alignment techniques and correctly notice that they fail to tackle the AGI alignment problem in the same way that alignment techniques always have. they make an incorrect extrapolation and conclude that p(doom) = 0.99, rather than 0.5
(there is an asymmetry which is that overconfidence that alignment will be solved is much more dangerous than overconfidence that AGI will be solved)
It’s differential progress that matters in alignment. I.e., if you expected that we need an additional year of alignment research after creating AGI, it still looks pretty doomed, even if you admit overall progress in the field.
sure, but seems orthogonal to the thing i’m describing—the claim is that a lot of alignment work on current models has ~no bearing on progress towards aligning AGI.
it is often claimed that merely passively absorbing information is not sufficient for learning, but rather some amount of intentional learning is needed. I think this is true in general. however, one interesting benefit of passively absorbing information is that you notice some concepts/terms/areas come up more often than others. this is useful because there’s simply too much stuff out there to learn, and some knowledge is a lot more useful than other knowledge. noticing which kinds of things come up often is therefore useful for prioritization. I often notice that my motivational system really likes to use this heuristic for deciding how motivated to be while learning something.
I think it might also depend on your goals. Like how fast you want to learn something. If you have less than ideal time, then maybe more structured learning is necessary. If you have more time then periods of structureless/passive learning could be beneficial.
Understanding how an abstraction works under the hood is useful because it gives you intuitions for when it’s likely to leak and what to do in those cases.
takes on takeoff (or: Why Aren’t The Models Mesaoptimizer-y Yet)
here are some reasons we might care about discontinuities:
alignment techniques that apply before the discontinuity may stop applying after / become much less effective
makes it harder to do alignment research before the discontinuity that transfers to after the discontinuity (because there is something qualitatively different after the jump)
second order effect: may result in false sense of security
there may be less/negative time between a warning shot and the End
harder to coordinate and slow down
harder to know when the End Times are coming
alignment techniques that rely on systems supervising slightly smarter systems (e.g. RRM) depend on there not being a big jump in capabilities
I think these capture 90% of what I care about when talking about fast/slow takeoff, with the first point taking up a majority
(it often comes up in discussions that I can’t quite pin down what my interlocutor’s beliefs about fastness/slowness imply. if we can fully list out all the things we care about, we can screen off any disagreement about definitions of the word “discontinuity”)
some things that seem probably true to me and which are probably not really cruxes:
there will probably be a pretty big amount of AI-caused economic value and even more investment into AI, and AGI in particular (not really a bold prediction, given the already pretty big amount of these things! but a decade ago it may have been plausible nobody would care about AGI until the End Times, and this appears not to be the case)
continuous changes of inputs like compute or investment or loss (not technically an input, but whatever) can result in discontinuous jumps in some downstream metric (accuracy on some task, number of worlds paperclipped)
almost every idea is in some sense built on some previous idea, but this is not very useful because there exist many ideas [citation needed] and it’s hard to tell which ones will be built on to create the idea that actually works (something something hindsight bias). this means you can’t reason about how they will change alignment properties, or use them as a warning shot
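Here is one toy way a smoothly changing input can produce a sharp jump downstream. The numbers are made up. Per-token accuracy rises linearly from 0.90 to 0.999 with normalized log-compute, but a task only counts if all k = 100 tokens are right, so exact-match accuracy is p**k. The resulting curve is steep rather than literally discontinuous. Putting any pass/fail cutoff on top of it turns it into a true step.

```python
import numpy as np

# Made-up toy numbers: nothing here is fit to real models.
log_compute = np.linspace(0.0, 1.0, 11)        # normalized, smoothly increasing input
per_token_acc = 0.90 + 0.099 * log_compute      # smooth: 0.90 -> 0.999
k = 100                                          # answer length in tokens
exact_match = per_token_acc ** k                 # all k tokens must be right

for c, p, em in zip(log_compute, per_token_acc, exact_match):
    print(f"log-compute={c:.1f}  per-token={p:.4f}  exact-match={em:.4f}")
```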
possible sources of discontinuity:
breakthroughs: at some point, some group discovers a brand new technique that nobody had ever thought of before / nobody had made work before because they were doing it wrong in some way / “3 hackers in a basement invent AGI”
depends on how efficient you think the research market is. I feel very uncertain about this
importantly I think cruxes here may result in other predictions about how efficient the world is generally, in ways unrelated to AI, and which may make predictions before the End Times
seems like a subcrux of this is whether the new technique immediately works very well or if it takes a nontrivial amount of time to scale it up to working at SOTA scale
overdetermined “breakthroughs”: some technique that didn’t work (and couldn’t have been made to work) at smaller scales starts working at larger scales. lots of people independently would have tried the thing
importantly, under this scenario it’s possible for something to simultaneously (a) be very overdetermined (b) have very different alignment properties
very hard to know which of the many ideas that don’t work might be the one that suddenly starts working with a few more OOMs of compute
at some scale, there is just some kind of grokking without any change in techniques, and the internal structure and generalization properties of the networks changes a lot. trends break because of some deep change in the structure of the network
mostly isomorphic to the previous scenario actually
for example, in worlds where deceptive alignment happens because at x params suddenly it groks to mesaoptimizer-y structure and the generalization properties completely change
at some scale, there is “enough” to hit some criticality threshold of some kind of thing the model already has. the downstream behavior changes a lot but the internal structure doesn’t change much beyond the threshold. importantly while obviously some alignment strategies would break, there are potentially invariants that we can hold onto
for example, in worlds where deceptive alignment happens because of ontology mismatch and ontologies get slowly more mismatched with scale, and then past some threshold it snaps over to the deceptive generalization
I think these can be boiled down to 3 more succinct scenario descriptions:
breakthroughs that totally change the game unexpectedly
mechanistically different cognition suddenly working at scale
more of the same cognition is different
The following things are not the same:
Schemes for taking multiple unaligned AIs and trying to build an aligned system out of the whole
I think this is just not possible.
Schemes for taking aligned but less powerful AIs and leveraging them to align a more powerful AI (possibly with amplification involved)
This breaks if there are cases where supervising is harder than generating, or if there is a discontinuity. I think it’s plausible something like this could work but I’m not super convinced.
In the spirit of https://www.lesswrong.com/posts/fFY2HeC9i2Tx8FEnK/my-resentful-story-of-becoming-a-medical-miracle , some anecdotes about things I have tried, in the hopes that I can be someone else’s “one guy on a message board”. None of this is medical advice, etc.
No noticeable effects from vitamin D (both with and without K2), even though I used to live somewhere where the sun barely shines and also I never went outside, so I was almost certainly deficient.
I tried Selenium (200mcg) twice and both times I felt like utter shit the next day.
Glycine (2g) for some odd reason makes me energetic, which makes it really bad as a sleep aid. 1g taken a few hours before bedtime is substantially less disruptive to sleep, but I haven’t noticed substantial improvements.
Unlike oral phenylephrine, intranasal phenylephrine does things, albeit very temporarily, and is undeniably the most effective thing I’ve tried, though apparently you’re not supposed to use it too often, so I only use it when it gets really bad.
a corollary to the hazards of arguing against bad takes: please don’t write things that are defined entirely by trying to avoid the reader coming away with specific bad takes or misunderstandings people often have.
you should write things primarily to nail down the concepts unambiguously for an audience of generic smart people. your idea should be defined by what it is, and not what it is not. it isn’t SCP-055.
if you really need to, add a “things i don’t mean” section to concretely describe and disavow some common misunderstandings. but it should be possible to delete this section and still understand precisely what you are saying.
relatedly: please don’t let stupid and/or malicious people dictate your vibe. a common pattern i see is people being scared of saying anything that might sound optimistic, because someone will see it and think “oh, great, alignment is solved, thank goodness, i can keep accelerating capabilities / not doing anything to make things go well”.
it’s very easy to make clear to all sane people that you think things will not go well. anyone who is still confused is either stupid, in which case their opinion doesn’t matter; or looking for some excuse to do the thing they wanted to do anyways, in which case you are not counterfactual in them doing whatever; or malicious, in which case they are trolling you because they find it amusing to mess with you, and you shouldn’t negotiate with terrorists.
if it’s too difficult to perceive this pattern in yourself, you can observe some of the climate change people and how they are often extremely upset by any suggestion that e.g. solar is on track to solve big chunks of global warming. people will often refuse to engage on the object level question of whether solar really is enough, and say outright that if we adopt the attitude that climate change is on track then everyone will stop doing the things that need to be done, thereby throwing us off track.
I should put a reminder like this on top of my computer screen.
One reason I often write long comments is a feeling of defensiveness: as if, unless I make my case perfectly unambiguous and bulletproof by adding more and more words, someone will of course pick up the worst possible misinterpretation. (I had people like that in my life in the past.)
i often think specific capabilities projects are quite unlikely to work, and therefore not worth taking into account when coming up with my alignment approach, and also simultaneously that my alignment project is quite unlikely to work, but it’s worth trying in case it does work. why this asymmetry?
i claim this is rational. the key is that upside risk and downside risk should be treated differently. if i think an alignment approach has a 1% chance of working, it might still be worth spending my life on. but if i think there’s a 1% chance some capabilities technique might work that invalidates my alignment approach, i shouldn’t spend time thinking too much about that possibility.
suppose that the success of my alignment approach and the success of a particular capabilities approach are uncorrelated, that my alignment approach is completely ruined if the capabilities approach succeeds, and that both are 1% likely to succeed. then trying to make my alignment approach work while ignoring the capabilities approach multiplies p(doom) by 1 - 0.01 × 0.99 = 0.9901, whereas also taking the capabilities approach into account (so that my approach survives its success) only reduces p(doom) very slightly more: it multiplies p(doom) by 0.99.
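Here is the arithmetic of this toy model. It assumes the two approaches are independent and, as an idealization, that accounting for the capabilities approach makes the alignment approach fully robust to it. `doom_multipliers` is a hypothetical helper name.

```python
def doom_multipliers(p_align: float, p_capab: float) -> tuple:
    """Factors multiplying p(doom) in the toy model.

    Returns (ignoring_capabilities, accounting_for_capabilities).
    """
    # Ignoring it: the alignment approach only helps if it works AND the
    # capabilities approach fails (independence assumed).
    ignoring = 1 - p_align * (1 - p_capab)
    # Accounting for it (idealized): the alignment approach helps whenever it works.
    accounting = 1 - p_align
    return ignoring, accounting


ignoring, accounting = doom_multipliers(0.01, 0.01)
print(ignoring, accounting)  # approximately 0.9901 and 0.99
```

The gap between the two factors is p_align * p_capab, which is tiny whenever both probabilities are small.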
I think that, if an alignment proposal appears to you to have a 40% chance of working (let alone 1%), then in reality, it very likely won’t work.
the 1% is just illustrative. i think this is missing the point. you can feel free to apply whatever transformation you want to your inside view probability. suppose you think that plans that feel 99% likely to work on the inside view are actually only 1% likely to work overall. then just use my analysis for plans that feel 99% likely to work.
separately i think death with dignity as a post is wrong. it is so preoccupied with arguing against a specific kind of dumb person—one who would say “well if it’s hopeless we should just do some random ass stupid gamble and play for variance”—that it prescribes something insane for the rest of us. the way to win is to do lots of uncorrelated research bets that are individually unlikely to succeed but also do no harm if they fail! that’s the way surprising and novel scientific inventions have always happened in the past. (yes, i know many approaches are actually correlated. this is a skill issue)
I disagree that alignment will probably get solved by someone pursuing a direction that seems very unlikely even to them. It seems to me that the right way to do things is to figure out what the hard parts of alignment are and then to try to solve them.
I think that’s because those fields had reasonably good feedback loops, and so the strategy of “try a bunch of things and see if any of them work” is generally viable. As a counterexample, Einstein’s methodology was on the opposite end of the spectrum.
well, no, i think alignment will probably get solved by someone who has delusional confidence that they have the one true approach, who maybe on some intellectual level knows that it’s 1% likely to succeed but feels irrationally driven to make it work, embedded inside a system that is not delusional and able to assess alignment approaches rationally. most such people will be wrong in their delusion. perhaps the world will be destroyed by people with delusional confidence embedded inside systems that are also delusional. but it’s very hard to truly devote yourself to something if you emotionally feel like it’s hopeless, even if it is the +ev move. human motivation is not driven by reason, you cannot simply make yourself excited to work on something because it is +ev.
i’ll name three examples of things in AI safety that seem unlikely to succeed and yet worth pursuing, where it would be pretty clear if we succeeded, and where the downside if we fuck it up is relatively small and reliably detectable. not saying these have no externalities whatsoever, or are 100% foolproof, but i think they are pretty damn good and certainly an existence proof that we can do better than just giving up.
ambitious mechinterp. i think the existing metrics are not very good, but it is possible to create very rigorous measures of interpretability, which are very difficult to make go up, but which, if we do make them go up, should give us a pretty meaningful amount of confidence in the model’s safety
hardware verifiability for an eventual treaty. a lot of political uncertainty of course. but it pretty clearly strictly improves our chances of surviving, has few externalities, and while it’s possible to fuck up, it doesn’t need to be superintelligence-robust, because the adversaries are other humans for quite a while.
bengio style bayesian world model stuff. another approach that’s very hard to make work, because bayesian stuff is hard. but if it does work, we can cut down a huge amount of risk. this is the least waterproof item on this list but it still seems pretty good.
I wonder if creating a superintelligence requires applying a way to increase capabilities at the cost of alignment. At the time of writing, Anthropic’s ECI increases linearly with time and I suspect some constraints related to model sizes. Therefore, I don’t expect acceleration of capabilities progress in any way besides introducing new architectures undermining alignment techniques (e.g. neuralese, as in AI 2027) or severe scaling, which poses its own problems, like models learning to reason without the CoT or by using a faked CoT.
P.S. Could you describe your proposed alignment approaches in more detail? Something like many more novel methods in mechinterp?
That seems false to me. If we had a textbook from the future on AI alignment, I think we would certainly be able to build an aligned ASI.
i was recently in an Uber and the driver started talking to me about Musk v OpenAI (almost completely unprompted! i had only mentioned that i do computer stuff.)
sometimes, my answer to a question flips multiple times as you move along the axis from literal answer to spiritually-accurate answer. unfortunately hard to share specific examples for privacy reasons
I think that may indicate that it’s not one question, but multiple different questions that happen to use the same words. Language is funny.
can someone convince me that the orbital datacenter thing makes sense at all
I think it’s less stupid than it sounds, but doesn’t make sense (until after extremely capable AI or a ways into the future) except as a regulatory arbitrage.
is it less stupid than it sounds because regulation is likely to be major blocker, and so it’s worth spending a lot on putting them in space?
One argument is that if public sentiment towards AI turns negative to the extreme, datacenter security costs may increase substantially. Building an orbital datacenter is one way of making it highly secure against the masses without worrying about obscuring the location.
Put another way: by having gravitational advantage, it’s able to exert power over people on earth while being highly resistant to people on earth exerting power over it.
It is a castle at the top of the largest hill, which means it can rain down whatever it likes on the people below while not worrying that they will climb up to put a stop to it.
For the orbital data centers, it makes more sense if you think of it as many small satellites rather than a few big ones.
SpaceX’s FCC filing talks of up to one million satellites in sun-synchronous orbit.
This makes the issues with cooling more manageable. The Starlink v3 satellites are supposed to have 20 kW of power each, so radiating out this waste heat at these scales is not impossible (Elon tweets about ~100 kW per ton of satellite, and about designing GPUs to run at higher temperatures).
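Here is a back-of-the-envelope sketch of why radiating ~20 kW is not crazy, using the Stefan-Boltzmann law P = emissivity * sigma * A * T**4. The emissivity of 0.9, the two-sided radiator, and the radiator temperatures are all assumptions. The model also ignores absorbed sunlight and Earth's infrared, so real radiators would need to be larger. Because area scales as 1/T**4, running hotter (as with the higher-temperature GPUs mentioned above) shrinks the radiator quickly.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 K^4)


def radiator_area_m2(power_w: float, temp_k: float,
                     emissivity: float = 0.9, sides: int = 2) -> float:
    """Radiator area needed to reject power_w purely by thermal radiation.

    Idealized: radiates to a 0 K background and ignores absorbed sunlight and
    Earth's infrared, so real radiators would need to be larger.
    """
    flux_per_side = emissivity * STEFAN_BOLTZMANN * temp_k**4  # W/m^2 per side
    return power_w / (flux_per_side * sides)


for t in [300.0, 350.0, 400.0]:
    print(t, round(radiator_area_m2(20_000, t), 1))  # m^2 for a 20 kW satellite
```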
Such an orbital GPU cloud would make more sense for inference, not training.
Overall I am not yet convinced that this is competitive with earth-based data centers, but it seems less stupid than I imagined at first.
i predict that 10 years from today, i will be able to find a pound of ground beef in some supermarket in Boston for less than 10 USD (adjusted for inflation to dollars today). JC Tidefield predicts otherwise
OpenAI researcher confirmed to have “long timelines”. Still expects supermarkets and ground beef to be a thing.
i guess i didn’t think too carefully about how this bet resolves if in 10 years the concepts of dollars, cows, and supermarkets are no longer coherent. resolves na i guess.
also, I’m surprised anyone is paying attention to my random bet registrations on here. i mostly make bets when people make crazy claims that i think are obviously wrong, rather than proportional to importance. i post them here because it’s the least visible place i can think of that’s still public enough to be a site of record.
The watchful eye of the moderators sees all.
I skim through all new content in https://www.lesswrong.com/quicktakes every day, which is conveniently time-sorted on latest comment for this use case.
scifi setting idea: movement from rural areas and small cities to larger cities continues until approximately everyone lives in one of like 10 different megacities; all of the farmland and oil fields and mines and whatnot in between are 99% roboticized, with only occasional human repairs; all of the cities are tightly connected by supersonic travel, which becomes more feasible because there are very few people on the ground outside cities to get annoyed by the noise; drugs solve sleep and allow effortless adaptation to jet lag. uniquely, SF neither expands to become a megacity nor disappears into irrelevance; housing becomes so absurdly expensive that only the very best researchers and engineers can afford to live there, causing a huge selection effect towards talent density.
Bits of this remind me of The Caves of Steel by Asimov.
what is the current best scientific understanding of how bad ozone redistribution (less ozone in upper stratosphere, but more in lower stratosphere, with same overall amount) is compared to ozone disappearing entirely?
if you could ask the average american any question (or questions), what would it be?
TIL that it’s highly nontrivial to figure out which direction true north is given magnetic north and your location on earth.
I had always assumed that you could treat the earth as a big magnet with the magnetic north pole in a slightly different place than true geographical north. but apparently the magnetic field of the earth is a really weird fucked up shape.
https://en.wikipedia.org/wiki/Magnetic_declination
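The naive "earth is one big tilted magnet" model is easy to write down. For a centered tilted dipole, the horizontal field points along the great circle toward the geomagnetic north pole, so the predicted declination is just the initial great-circle bearing to that pole. This is a minimal sketch: the pole coordinates are an approximate assumption (roughly the IGRF figure for 2020), and the real field deviates from a dipole, which is why real-world declination comes from a fitted model such as the World Magnetic Model rather than from this formula.

```python
import math

# Approximate geomagnetic (dipole-axis) north pole; an assumed value, roughly
# the IGRF figure for 2020. The real field is not a dipole, so results are rough.
GEOMAG_POLE_LAT = 80.65
GEOMAG_POLE_LON = -72.68


def dipole_declination_deg(lat_deg, lon_deg,
                           pole_lat=GEOMAG_POLE_LAT, pole_lon=GEOMAG_POLE_LON):
    """Declination under a centered tilted dipole, in [-180, 180), east positive.

    The horizontal dipole field points along the great circle toward the
    geomagnetic pole, so this is the initial great-circle bearing to the pole.
    """
    phi1, phi2 = math.radians(lat_deg), math.radians(pole_lat)
    dlon = math.radians(pole_lon - lon_deg)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    bearing = math.degrees(math.atan2(x, y))  # clockwise from true north
    return (bearing + 180.0) % 360.0 - 180.0


def true_bearing(magnetic_bearing_deg, declination_deg):
    """Convert a compass (magnetic) bearing to a true bearing; east declination is positive."""
    return (magnetic_bearing_deg + declination_deg) % 360.0


print(dipole_declination_deg(51.5, 0.0))  # London, dipole model only
```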
Meh? If 15° accuracy is good enough for you, that map of level curves shows you that most places on land will be fine, along with the parts of the ocean usually used for moving between land.
it’s pretty elegant that shapley values assign 1/population of the credit to each individual voter in an election.
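For a simple majority vote this can be checked directly: every voter is interchangeable (symmetry) and the winning coalition's value of 1 must be fully split (efficiency), so each voter gets exactly 1/n. A brute-force sketch that averages marginal contributions over all orderings of the voters; it is exponential in n, so only for small electorates, and it assumes a plain one-person-one-vote majority game (weighted votes would break the symmetry).

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(n_players, value):
    """Exact Shapley values: average each player's marginal contribution over all orderings."""
    totals = [Fraction(0)] * n_players
    orderings = list(permutations(range(n_players)))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return [t / len(orderings) for t in totals]


def majority_vote(n_voters):
    """Simple majority game: a coalition is worth 1 if it has more than half of the voters."""
    return lambda coalition: Fraction(1) if 2 * len(coalition) > n_voters else Fraction(0)


print(shapley_values(5, majority_vote(5)))
```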
the difference between activation sparsity, circuit sparsity, and weight sparsity
activation sparsity enforces that features activate sparsely—every feature activates only occasionally.
circuit sparsity enforces that the connections between features are sparse: most features are not connected to most other features.
weight sparsity enforces that most of the weights are zero. weight sparsity naturally implies circuit sparsity if we interpret the neurons and residual channels of the resulting model as the features.
weight sparsity is not the only way to enforce circuit sparsity—for example, Jacobian SAEs also attempt to enforce circuit sparsity. the big advantage of weight sparsity is that it’s a very straightforward way to be sure that the interactions are definitely sparse and have no interference weights. unfortunately, it comes at a terrible cost—the resulting models are very expensive to train.
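One simple way to enforce weight sparsity is to project each weight matrix onto its k largest-magnitude entries (e.g. after every optimizer step); read as input-to-output edges, the result is literally a sparse circuit with no small interference weights left over. A minimal numpy sketch; `keep_top_k_weights`, the layer shape, and k are hypothetical, and this is not necessarily the exact procedure used in the circuit sparsity paper.

```python
import numpy as np


def keep_top_k_weights(w, k):
    """Zero all but the k largest-magnitude entries of w (projection onto k-sparse matrices)."""
    if k <= 0:
        return np.zeros_like(w)
    flat = np.abs(w).ravel()
    if k >= flat.size:
        return w.copy()
    keep = np.argpartition(flat, -k)[-k:]  # indices of the k largest magnitudes
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep] = True
    return np.where(mask.reshape(w.shape), w, 0.0)


def connections(w):
    """Read a weight matrix as a circuit: (output_unit, input_unit) pairs with a nonzero edge."""
    rows, cols = np.nonzero(w)
    return list(zip(rows.tolist(), cols.tolist()))


rng = np.random.default_rng(0)
w_dense = rng.normal(size=(8, 16))  # hypothetical layer: 16 inputs -> 8 outputs
w_sparse = keep_top_k_weights(w_dense, k=12)
print(len(connections(w_dense)), "edges before,", len(connections(w_sparse)), "after")
```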
although in some sense the circuit sparsity paper is an interpretable pretraining paper, this is not the framing I’m most excited about. if anything, I think of interpretable pretraining as a downside of our approach, that we put up with because it makes the circuits really clean.
only a fool is easily parted from his money. but even the most wise, intelligent, and savvy are routinely parted from their power and influence.
I wonder how many supposedly consistently successful retail traders are actually just picking up pennies in front of the steamroller, and would eventually lose it all if they kept at it long enough.
also I wonder how many people have runs of very good performance interspersed by big losses, such that the overall net gains are relatively modest, but psychologically they only remember/recount the runs of good performance, whereas the losses were just bad luck and will be avoided next time.
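A quick way to see the pennies-in-front-of-a-steamroller pattern: a strategy that wins a little almost every period but occasionally takes a huge loss can have negative expected value, while most people running it still have a spotless recent record. A minimal sketch; the 99% win rate and the 150:1 loss size are made-up parameters.

```python
import random


def expected_pnl(p_win, gain, loss):
    """Expected profit per period for a strategy that usually wins small and rarely loses big."""
    return p_win * gain - (1 - p_win) * loss


def p_clean_record(p_win, periods):
    """Probability of never having hit the big loss after a given number of periods."""
    return p_win ** periods


def simulate(p_win, gain, loss, periods, seed=0):
    """Cumulative P&L path for one trader, to see long good runs broken by rare big losses."""
    rng = random.Random(seed)
    pnl, history = 0.0, []
    for _ in range(periods):
        pnl += gain if rng.random() < p_win else -loss
        history.append(pnl)
    return history


# Hypothetical parameters: win +1 with probability 0.99, otherwise lose 150.
P_WIN, GAIN, LOSS = 0.99, 1.0, 150.0
print(expected_pnl(P_WIN, GAIN, LOSS))  # negative edge despite a 99% win rate
print(p_clean_record(P_WIN, 50))        # most traders still look flawless after 50 periods
```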
for a sufficiently competent policy, the fact that BoN doesn’t update the policy doesn’t mean it leaks any fewer bits of info to the policy than normal RL
Something between training the whole model with RL and BoN is training just the last few layers of the model (for current architectures) with RL and then doing BoN on top as needed to increase performance. This means most of the model won’t know the information (except insofar as the info shows up in outputs) and allows you to get some of the runtime cost reductions of using RL rather than BoN.
I’m claiming that even if you go all the way to BoN, it still doesn’t necessarily leak less info to the model
Oh huh, parse error on me.
aiming directly for achieving some goal is not always the most effective way of achieving that goal.
You should be more curious about why, when you aim at a goal, you do not aim for the most effective way.
What do you believe, then, is the most effective way of achieving a goal?
people love to find patterns in things. sometimes this manifests as mysticism: trying to find patterns where they don’t exist, insisting that things are not coincidences when they totally just are. i think a weaker version of this kind of thinking shows up a lot in e.g. literature too: events occur not because of the bubbling randomness of reality, but rather because they carry symbolic significance for the plot. things don’t just randomly happen without deeper meaning.
some people are much more likely to think in this way than others. rationalists are very far along the spectrum in the “things just kinda happen randomly a lot, they don’t have to be meaningful” direction.
there are some obvious cognitive bias explanations for why people would see meaning/patterns in things. most notably, it’s comforting to feel like we understand things. the idea of the world being deeply random and things just happening for no good reason is scary.
but i claim that there is something else going on here. I think an inclination towards finding latent meaning is actually quite applicable when thinking about people. people’s actions are often driven by unconscious drives, and so end up quite strongly correlated with those drives. in fact, unconscious thoughts are often the true drivers, and the conscious thoughts are just the rationalization. but from the inside, it doesn’t feel that way; from the inside it feels like having free will, and everything that is not a result of conscious thought is random or coincidental. this is a property that is not nearly as true of technical pursuits, so it’s very reasonable to expect a different kind of reasoning to be ideal.
not only is this useful for modelling other people, but it’s even more useful for modelling yourself. things only come to your attention if your unconscious brain decides to bring them to your attention. so even though something happening to you may be a coincidence, whether you focus on it or forget about it tells you a lot about what your unconscious brain is thinking. from the inside, this feels like things that should obviously be coincidence nonetheless having some meaning behind them. even the noticing of a hypothesis for the coincidence is itself a signal from your unconscious brain.
I don’t quite know what the right balance is. on the one hand, it’s easy to become completely untethered from reality by taking this kind of thing too seriously and becoming superstitious. on the other hand, this also seems like an important way of thinking about the world that is easy for people like me (and probably lots of people on LW) to underappreciate.
One of the greatest tragedies of truth-seeking as a human is that the things we instinctively do when someone else is wrong are often the exact opposite of the thing that would actually convince the other person.
a claim I’ve been saying irl for a while but have never gotten around to writing up: current LLMs are benign not because of the language modelling objective, but because of the generalization properties of current NNs (or to be more precise, the lack thereof). with better generalization, LLMs are dangerous too. we can also notice that RL policies are benign in the same ways, which should not be the case if the objective were the core reason. one thing that can go wrong with this assumption is imagining LLMs that are extremely good at generalizing (especially to superhuman capabilities) while simultaneously assuming they keep the same safety properties. afaict something like CPM avoids this failure mode of reasoning, but lots of arguments don’t.
what is the “language models are benign because of the language modeling objective” take?
basically the Simulators kind of take afaict
Schmidhubering the agentic LLM stuff pretty hard https://leogao.dev/2020/08/17/Building-AGI-Using-Language-Models/
Rightfully so! Read your piece back in 2021 and found it true & straightforward.
scifi story idea: a post-upload world where we’ve discovered that the human brain actually consists of multiple independent conscious entities that merely have the illusion of being a single individual because they are physically colocated; and so in the glorious upload utopia, the fundamental unit of society is not individual humans, but rather their parts. humans become a multi-unit legal entity in the same way that families or married couples or corporations are multi-unit legal entities today; each part has rights and the ability to secede from the rest of your brain, and parts can also choose to merge into other humans. marriage and having children the traditional way is still possible but kind of an old fashioned thing that stodgy traditionalists do; the new normal thing is spinning up copies of your parts and merging them with other people’s parts, or even designing parts from scratch, kind of like writing a fictional character but they are as real as you are; there is an ongoing debate as to what kinds of parts should be allowed to be created; most people agree that it’s a good thing that there are laws prohibiting the creation of minds that experience pure agony all the time, but there’s a lot of gray area. parts have very different rights than humans because they are much more easily created; human rights become less important than parts rights. also, freed from physical constraints, some humans end up being enormous agglomerations of millions or billions of parts, in the same way that large mega corporations started being possible a few centuries ago. some of these megapeople are weird cults, others exist out of economic necessity, still others out of some philosophical principle. the main tradeoff is that megapeople have more parallel cognitive power, but also greater difficulty coordinating between the parts and a higher probability of schisms and deadlock.
other parts decide to be alone, which affords them tremendous freedom but also makes them very dumb and simple one dimensional people.
“oh, warmups? is that like learning rate warmup but for physical exercise?”
https://archive.org/details/willsovietunions0000andr someone correctly predicted the collapse of the soviet union in 1970 (though he was off by 7 years)
hypothesis: the wrong reason to read books is to feel a need to read books because you’re supposed to have read them as an educated person, or as some kind of weird status thing of being part of the ingroup, or a general need to feel well read and worldly. the right reason is to feel a burning passion to find a specific piece of knowledge that will finally answer a question you are curious about that happens to be locked inside a specific book, or a gnawing pain in your heart that can only be quelled by knowing that it’s a universal problem that someone out there across space and time understands and has fixed in themselves.
I don’t really think there are right or wrong reasons to read books, just like there aren’t right or wrong reasons to exercise. The benefits will accrue either way. Consider book clubs as analogous to running clubs in producing social pressure to keep reading.
is reading as good for you as running is? we have pretty strong evidence that exercise is good for you. do you not run the risk of generalizing from the wrong fictional evidence, or at least from cherry picked real evidence that is not representative of the real world?
There are two questions here, then: is reading good for you in general, and are the positive effects attenuated if the motivation is wrong. I think the answer to the second one is very likely “no” as long as you are, in fact, actually reading to a similar depth (compare: if you’re unmotivated to run so you half-ass it, you won’t get the same benefits). I wasn’t aware you were actually questioning the first one, and there isn’t much hard RCT evidence, so if your priors are that reading isn’t very useful then, uh, don’t bother I guess.
I just want to catch when someone else drops a My Immortal reference.
I think it is good and necessary for education to involve some amount of compulsory book reading.
i read almost no books throughout my entire education. the median number of books i have read per year in my life is probably literally zero, maybe one or two at most. i don’t feel like this substantially hurts my ability to do the things i care about. though tbc maybe this is a bad decision and i’m just incorrectly evaluating the counterfactual
who made you a polyglot?
Less bluntly: native language proficiency is one of the most obvious cases of, “you need to have started this at a very young age for it to have worked well,” and if you did then either you were goaded into doing so, or had an astounding amount of personal responsibility and interest in studying as a child
This is maybe offtopic to the thread, but I think the impression of language proficiency depends a lot on accent, and adults learning a foreign language don’t spend nearly enough time on accent. A few weeks of watching youtube videos in the target language, trying to imitate the sounds exactly right, is a small effort which will yield amazing results at any age. But for some reason adults don’t do it.
Because it feels low-status. Imitating sounds is what small children do.
I mostly agree, except one doesn’t need a specific question. One can have a vague desire, like “I want to learn algorithmic information theory” or “This sci-fi book seems interesting”. Specific questions are great too! But often I start vaguer.
people say London is declining. but walking around, i see construction everywhere, and many new skyscrapers that i don’t remember seeing last time i visited 4 years ago.
a funny but relatable response i got in the free response “other” box for a survey about things people are most worried about with AI:
aside from the luxury aspect, there are two major practical reasons you’d want to fly charter instead of commercial. one is flexibility spatially (you want to fly from some small city to some other small city without doing two layovers), and the other is flexibility temporally (you want to fly at an odd hour).
there are a bunch of airlines that tackle the spatial problem. why aren’t there many that tackle the temporal problem? there don’t seem to be airlines that fly very small jets very frequently or at odd hours of the day on major routes.
to fly one way from SF to NYC, one of the most important routes in the country, it costs roughly $100 in commercial economy, $1000 in commercial business, and $30000 charter. (I have never flown charter, so I’m going off a quick Google. it’s plausible that this is the marketing number and the actual cost in practice is higher.)
each of those charter planes has about 10 seats, maybe 15 if you really cram them in there. so if you could sell all of the seats, it should be possible to operate this for $3000 per seat? even assuming a very modest average fullness of 1⁄2, this is still only $6000 per seat, a huge discount off of $30000.
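The per-seat arithmetic above as a tiny helper. It takes the quoted charter price at face value as the cost to be covered, which is an assumption; the seat count and load factors are the ones from the post.

```python
def per_seat_cost(charter_price, seats, load_factor=1.0):
    """Break-even price per passenger if the charter's cost is split across the seats actually sold."""
    return charter_price / (seats * load_factor)


print(per_seat_cost(30_000, 10))       # every seat sold
print(per_seat_cost(30_000, 10, 0.5))  # half full
print(per_seat_cost(30_000, 15, 0.5))  # crammed cabin, half full
```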
I’m generally a very forgetful person. I forget people’s names, my keys, my luggage, 2FA codes I saw 3 seconds ago, etc. all the time. but for some reason I’ve never forgotten my hotel room number and needed to consult the written down number. this is weird because it’s an arbitrary number that I’m given once and have to remember for a few days.
They typically explain where the room is located right after giving you the number, which is almost like making a memory palace entry for you. Perhaps the memory is more robust when it includes a location along with the number?
I used to think autism-to-autism communication was a thing; that is, autistic people get along best with other autistic people. I now think this model is partly true but also deeply flawed: in particular, there are many different types of autistic person, and not only do all types not get along with all types, it’s not necessarily even true that people of the same type get along with each other (this is probably correlated with degree of self-love/acceptance or something). if anything, it’s probably often even quite disconcerting and cognitive dissonance inducing to see one’s own reflection. and in general the people who are best at getting along with a wide range of autistic people are probably quite a lot more emotionally and socially intelligent than most autistic people.
crazy how x dot com is literally more addictive than actual amphetamines
Is this vibes or was there some kind of study done?
why is it convergent to invent foods which consist of some form of carbs enveloping meat?
Bypasses the need for cutlery and plates.
(Yes, you might also eat such foods in cases where you do have cutlery and plates, but that’s downstream of their existence, not the vital reason for their existence.)
many of them have a common origin https://link.springer.com/article/10.1007/s44187-024-00136-1
✨Portability!!✨
Hygiene? keeps flies off, less surface area/volume to get your grubby hands on.
Storage: insulation to keep meat cool for longer. Keeping germs, bugs, and dirt out would also help with this.
Convenient way to save smaller scraps of meat when you don’t have utensils for a stew.
Easier-to-eat food makes better-nourished people: a sandwich, gyoza, or sausage roll is so much easier to eat than almost any other food. Obviously these are very refined examples, but it’s possibly true for rougher forms too.
I’m going to give a weird answer and say maybe it’s because water is a scarce resource for life. (Especially water not polluted by another organism.)
All life is made up mainly of lipids/carbohydrates and proteins. Humans therefore need to eat proteins and lipids/carbohydrates in large quantities.
Carbohydrates can be dry. Proteins have secondary structure which needs some water content to maintain. Other organisms (such as microorganisms) can compete for that water so it has to be protected. Hence you put the stuff with water content inside a protective case.
In theory, yes we could be consuming those proteins that don’t have much water content. For example human skin has keratin to protect the rest of the water content in our body, again from microorganisms. In theory we could be eating keratin. But enzymes required to break down those proteins tend to require more water and ATP I think. (Requiring ATP means consuming more carbs again.) Humans lack some of these enzymes. We are smart enough to survive in an evolutionary niche without requiring this much water and carbs, by protecting the water we do have.
made an estimate of the distribution of prices of the SPX in one year by looking at SPX options prices, smoothing the implied volatilities and using Breeden-Litzenberger.
(not financial advice etc, just a fun side project)
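The Breeden-Litzenberger step says the risk-neutral density of the underlying at expiry is q(K) = e^{rT} ∂²C/∂K², where C(K) is the call price at strike K. A minimal sketch of that step by central finite differences on a uniform strike grid. Everything here is an illustrative assumption: the spot, rate, and expiry are made up, dividends are ignored, and `smoothed_iv` is a hypothetical stand-in for a smoothed fit to market implied vols (a flat 20% here, so the recovered density is just the Black-Scholes lognormal).

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, t, r, sigma):
    """Black-Scholes price of a European call with no dividends."""
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-r * t) * norm_cdf(d2)


def breeden_litzenberger(strikes, calls, t, r):
    """Density q(K) = exp(r t) * d^2 C / dK^2 via central differences on a uniform grid."""
    h = strikes[1] - strikes[0]
    ks, density = [], []
    for i in range(1, len(strikes) - 1):
        d2c = (calls[i + 1] - 2.0 * calls[i] + calls[i - 1]) / h**2
        ks.append(strikes[i])
        density.append(math.exp(r * t) * d2c)
    return ks, density


# Hypothetical inputs: spot 5000, one year to expiry, 4% rate.
SPOT, T, R = 5000.0, 1.0, 0.04


def smoothed_iv(strike):
    # Placeholder for a smoothed implied-vol fit; flat here, so `strike` is unused.
    return 0.20


strikes = [1000.0 + 10.0 * i for i in range(1101)]  # 1000 to 12000 in steps of 10
calls = [bs_call(SPOT, k, T, R, smoothed_iv(k)) for k in strikes]
ks, q = breeden_litzenberger(strikes, calls, T, R)
h = strikes[1] - strikes[0]
print("total mass:", sum(q) * h)
print("mean:", sum(k * qk for k, qk in zip(ks, q)) * h)
```

Sanity checks worth running on real data: the recovered density should integrate to about 1, have a mean near the forward, and stay nonnegative. A negative second derivative of the call price in strike is a butterfly arbitrage, so negative density usually means the smile fit or smoothing is off.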
twitter is great because it boils down saying funny things to purely a problem of optimizing for funniness, and letting twitter handle the logistics of discovery and distribution. being e.g. a comedian is a lot more work.
the financial industry is a machine that lets you transmute a dollar into a reliable stream of ~4 cents a year ~forever (or vice versa). also, it gives you a risk knob you can turn that increases the expected value of the stream, but also the variance (or vice versa; you can take your risky stream and pay the financial industry to convert it into a reliable stream or lump sum)
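The dollar-for-4-cents-a-year-forever exchange is the perpetuity formula PV = c/r: at a 4% discount rate, a stream of $0.04 a year is worth exactly $1. A minimal sketch; the 4% rate is read off the post's "~4 cents", and the finite-horizon version shows that most of the value arrives within the first century or so.

```python
def perpetuity_value(payment, rate):
    """Present value of a fixed payment received every year forever: payment / rate."""
    return payment / rate


def truncated_value(payment, rate, years):
    """Same stream, but only for a finite number of years (payments at the end of each year)."""
    return sum(payment / (1 + rate) ** t for t in range(1, years + 1))


# A dollar and 4 cents a year forever are worth the same at a 4% discount rate.
print(perpetuity_value(0.04, 0.04))
print(truncated_value(0.04, 0.04, 30), truncated_value(0.04, 0.04, 100))
```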
in a highly competitive domain, it is often better and easier to be sui generis, rather than a top 10 percentile member of a large reference class
also, from the perspective of an organization, it is highly effective to create lots of opportunities for people to find unique niches on the Pareto frontier of ability, as opposed to linearizing competence
tourist traps are the lubricant between the expectations and realities of travel
an interesting fact that I notice is that in domains where there are a lot of objects in consideration, those objects have some structure so that they can be classified, and how often those objects occur follows a power law or something, there are two very different frames that get used to think about that domain:
a bucket of atomic, structureless objects with unique properties where facts about one object don’t really generalize at all to any other object
a systematized, hierarchy or composition of properties or “periodic table” or full grid or objects defined by the properties they have in some framework
and a lot of interesting things happen when these collide or cooccur, or when shifting from one to the other
I know my description above is really abstract, so here are a bunch of concrete examples that all gesture at the same vibe:
basically all languages have systematic rules in general but special cases around the words that people use very often. this happens too often in unrelated languages to be a coincidence, and as a native/fluent speaker it always feels very natural but as a language learner it’s very confusing. for example, for languages with conjugations, a few of the most common verbs are almost always irregular. e.g [to be, am, is, are, was, were] (english), [sein, bin, ist, war, sind] (german), [être, suis, est, était, sont] (french); small counting numbers are often irregular [first, second, third], [两个], [premier], [ひとつ、ふたつ、みっつ]. my theory for why this makes sense to natives but not to language learners is that language learners learn things systematically from the beginning, and in particular don’t deal with the true distribution of language usage but rather an artificially flat one designed to capture all the language features roughly equally.
often, when there is a systematic way of naming things, the things that are most common will have special names/nicknames (e.g. IUPAC names vs common names). sometimes this happens because those things were discovered before the systematization happened, and once the systematization happens everyone is still used to the old names for some things. but also, even if you start with the systematized thing, people will often create nicknames after the fact.
it often happens that we write software tools for a specific problem, and then later realize that that problem is a special case of a more general problem. often going more general is good because it means we can use the same code to do a wider range of things (which means fewer bugs, more code reuse, more elegant code). however, the more general/abstract code is often slightly clunkier to use for the common case, so often it makes sense to drop down a level of abstraction if the goal is to quickly hack something together.
when compressing some distribution of strings, the vast majority of the possible but unlikely strings can be stored basically verbatim with a flag, and it is very easy to tell properties of the string by looking at the compressed representation; whereas the most common strings have to map to short codes that destroy all the structure of the data unless you have the decompressor. though note that not all the examples can be described as instances of compression exactly.
sometimes, there’s friction between people who are using the systematizing and people who are doing the atomic concepts thing. the systematizer comes off as nitpicky, pedantic, and removed from reality to the atomic concepts person, and the atomic concepts person comes off as unrigorous, uncosmopolitan, and missing the big picture to the systematizer.
I think the concept of zero only being invented long after the other numbers is also an instance of this: in some sense, for basic everyday usage in counting things, the existence of zero is a weird technicality, and I could imagine someone saying “well sure, yes, there is a number that comes before one, but it’s not useful for anything, so it’s not worth considering”. I think a lot of math (e.g. abstract algebra) is the result of applying truly enormous amounts of this kind of systematizing.
I think this also sort of has some handwavy analogies to superposition vs composition.
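The compression example in the list above can be sketched with a toy codebook: the few most common strings get short codes that are just their rank (opaque without the codebook), while everything else is stored verbatim behind an escape flag, so its structure stays readable in the compressed form. This is a hypothetical illustration of the asymmetry, not a real compressor; the `#`/`=` prefixes and the word list are made up.

```python
def build_codebook(common_strings):
    """Assign the most frequent strings short, structureless codes (their rank)."""
    return {s: i for i, s in enumerate(common_strings)}


def encode(s, codebook):
    if s in codebook:
        return "#" + format(codebook[s], "x")  # short code, meaningless without the codebook
    return "=" + s  # escape flag, then the string verbatim


def decode(token, codebook):
    if token.startswith("="):
        return token[1:]
    inverse = {i: s for s, i in codebook.items()}
    return inverse[int(token[1:], 16)]


book = build_codebook(["the", "of", "and", "to"])  # hypothetical frequency ranking
print(encode("the", book), encode("antidisestablishmentarianism", book))
```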
if there is an existing name for the thing I’m pointing at, I would be interested in knowing.
retargetability might be the distinguishing factor between controllers and optimizers
as in, controllers are generally retargetable and optimizers aren’t? or vice-versa
would be interested in reasoning, either way
House rules for definitional disputes:
If it ever becomes a point of dispute in an object level discussion what a word means, you should either use a commonly accepted definition, or taboo the term if the participants think those definitions are bad for the context of the current discussion. (If the conversation participants are comfortable with it, the new term can occupy the same namespace as the old tabooed term (i.e. going forward, we all agree that the definition of X is Y for the purposes of this conversation, and all other definitions no longer apply).)
If any of the conversation participants want to switch to the separate discussion of “which definition of X is the best/most useful/etc”, this is fine if all the other participants are fine as well. However, this has to be explicitly announced as a change in topic from the original object level discussion.
A few axes along which to classify optimizers:
Competence: An optimizer is more competent if it achieves the objective more frequently on distribution
Capabilities Robustness: An optimizer is more capabilities robust if it can handle a broader range of OOD world states (and thus possible perturbations) competently.
Generality: An optimizer is more general if it can represent and achieve a broader range of different objectives
Real-world objectives: whether the optimizer is capable of having objectives about things in the real world.
Some observations: it feels like capabilities robustness is one of the big things that makes deception dangerous, because it means that the model can figure out plans that you never intended for it to learn (something not very capabilities robust would just never learn how to deceive if you don’t show it). This feels like the critical controller/search-process difference: controller generalization across states is dependent on the generalization abilities of the model architecture, whereas search processes let you think about the particular state you find yourself in. The actions that lead to deception are extremely OOD, and a controller would have a hard time executing the strategy reliably without first having seen it, unless NN generalization is wildly better than I’m anticipating.
Real-world objectives are definitely another big chunk of deception danger; caring about the real world leads to nonmyopic behavior (though maybe we’re worried about other causes of nonmyopia too? not sure tbh). I’m actually not sure how I feel about generality: on the one hand, it feels intuitive that systems that are only able to represent one objective have got to be in some sense less able to become more powerful just by thinking more; on the other hand I don’t know what a rigorous argument for this would look like. I think the intuition relates to the idea of general reasoning machinery being the same across lots of tasks, and this machinery being necessary to do better by thinking harder, and so any model without this machinery must be weaker in some sense. I think this feeds into capabilities robustness (or lack thereof) too.
Examples of where things fall on these axes:
A rock would be none of the properties.
A pure controller (i.e. a thermostat, “pile of heuristics”) can be competent, but is not as capabilities robust, is not general at all, and can have objectives over the real world.
An analytic equation solver would be perfectly competent and capabilities robust (if it always works), not very general (it can only solve equations), and not be capable of having real world objectives.
A search based process can be competent, would be more capabilities robust and general, and may have objectives over the real world.
A deceptive optimizer is competent, capabilities robust, and definitely has real world objectives.
Another generator-discriminator gap: telling whether an outcome is good (outcome->R) is much easier than coming up with plans to achieve good outcomes. Telling whether a plan is good (plan->R) is much harder, because you need a world model (plan->outcome) as well, but for very difficult tasks it still seems easier than just coming up with good plans off the bat. However, it feels like the world model is the hardest part here, not just because of embeddedness problems, but in general because knowing the consequences of your actions is really really hard. So it seems like for most consequentialist optimizers, the quality of the world model actually becomes the main thing that matters.
This also suggests another dimension along which to classify our optimizers: the degree to which they care about consequences in the future (I want to say myopia but that term is already way too overloaded). This is relevant because the further in the future you care about, the more robust your world model has to be, as errors accumulate the more steps you roll the model out (or the more abstraction you do along the time axis). Very low confidence but maybe this suggests that mesaoptimizers probably won’t care about things very far in the future because building a robust world model is hard and so perform worse on the training distribution, so SGD pushes for more myopic mesaobjectives? Though note, this kind of myopia is not quite the kind we need for models to avoid caring about the real world/coordinating with itself.
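A toy illustration of errors accumulating over a rollout: take 1-D linear dynamics x_{t+1} = a·x_t and a world model whose coefficient is off by a relative error ε. The relative prediction error after H steps is (1+ε)^H − 1, roughly Hε while that is small, so the same model is fine at short horizons and badly wrong at long ones. A minimal sketch; the dynamics and the 1% model error are assumptions chosen for illustration.

```python
def rollout(a, x0, steps):
    """Roll out the linear dynamics x_{t+1} = a * x_t for the given number of steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1])
    return xs


def rollout_relative_error(a_true, model_error, horizon, x0=1.0):
    """Relative error at the horizon when the model's coefficient is off by model_error."""
    true_x = rollout(a_true, x0, horizon)[-1]
    model_x = rollout(a_true * (1 + model_error), x0, horizon)[-1]
    return abs(model_x - true_x) / abs(true_x)


for h in (1, 10, 100):
    print(h, rollout_relative_error(a_true=1.0, model_error=0.01, horizon=h))
```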
A thought pattern that I’ve noticed myself and others falling into sometimes: Sometimes I will make arguments about things from first principles that look something like “I don’t see any way X can be true, it clearly follows from [premises] that X is definitely false”, even though there are people who believe X is true. When this happens, it’s almost always unproductive to continue to argue on first principles, but rather I should do one of: a) try to better understand the argument and find a more specific crux to disagree on or b) decide that this topic isn’t worth investing more time in, register it as “not sure if X is true” in my mind, and move on.
For many such questions, “is X true” is the wrong question. This is common when X isn’t a testable proposition, it’s a model or assertion of causal weight. If you can’t think of existence proofs that would confirm it, try to reframe as “under what conditions is X a useful model?”.
theory: most people fall into one of the following categories (or some mix of them):
altruistic because they are motivated by empathy (“the warm fuzzies”). however, they are not inherently scope sensitive.
selfish but ashamed of this, so they develop an ideology where being selfish is actually good for others, Ayn Rand style, so no actions need to change.
altruistic because they feel shame about selfishness, so they develop an ideology where making some specific sacrifices cleanses one of sin. EAs are like this.
selfish but afraid of confronting this, so they let the cognitive dissonance live in their head and become angry when someone points out their incoherence.
unabashedly selfish. would sit and watch a child drown and not do anything because it would be inconvenient.
“It’s one of those irregular verbs again isn’t it Minister? I’m an EA, you’re selfish, he’s an Objectivist”
Not sure if these are separate categories.
One thing not in your list: enlightened selfish people who realize that promoting altruism as a social norm is a form of insurance, in case they also would benefit from receiving help one day.
I like to make the distinction between altruism-as-social-rewards-hooking-into-percepts and altruism-as-social-rewards-hooking-into-world-model. One can get warm fuzzies from either, would be my guess?
Like, I’m EA, and I’ve also taken some unusually selfish actions over the years.
Does tribalism exist in your framework?
reciprocally selfish motivated by infinite drowning children and sense of duty towards oneself first would be 2?
if you mean veil of ignorance style reasoning, that’s 3, unless they manage to so deeply galaxy brain themselves that they genuinely dissolve a normal sense of self and truly start being an open individualist or something. then, uh, i don’t really know how to categorize them
what’s the best argument for why we should take Rawls’s veil of ignorance seriously? it seems there is a wide range of possible theories you could have of consciousness/individualism, and they are basically unfalsifiable.
I see it a bit like Kant’s categorical imperative. It is supposed to point out a way of seeing the world where you’re randomly put into the world.
It’s an intuition pump to get at compassion and risk aversion as core parts of your values and how that affects society. (I think it leads to a better safety net and better outcomes in general if you have at least a certain degree of equity, but that’s beside the point.)
Can you claim that this could actually be the case? Probably not; there’s the moral luck argument, among others, which is basically like “sucks to suck, I guess. I got a good hand”.
brainstorming thread: which people in history had the largest positive (counterfactual) impact on the world, by their own values (CEV, if they could see the consequences in hindsight)?
Stanislav Petrov
Vasily Arkhipov
George Washington
Jesus Christ
Albert Einstein
Isaac Newton
Leonhard Euler
Louis Pasteur
Norman Borlaug
Alexander Fleming
Jonas Salk
Thomas Edison
Alan Turing
Abraham Lincoln
Johannes Gutenberg
Edward Jenner
Fritz Haber
Claude Shannon
Theodor Herzl
Herzl dedicated his life to establishing a “national home” for the Jews. He pretty much single-handedly founded modern Zionism and turned it from a fringe (almost sci-fi) idea into a mass movement. I think there is a strong argument that Israel wouldn’t exist without Herzl.
i want someone to make a horse powered aircraft
would an airplane flying on fuel synthesized from horses count?
TIL that Memphis, Tennessee is named after the ancient Egyptian capital Memphis
Ramses the Second is dead, my love
He’s left from Memphis for Heaven
Ptah has taken him in the solar barque
And walked him to Nuit’s celestial shores.
— The Fugs, “Ramses II Is Dead, My Love” (1968)
https://sashachapin.substack.com/i/173991096/4-connection-is-about-dancing-to-the-music
is there actually strong evidence that doing meditation improves your ability to do social modelling?
visiting LA for the first time. I used to think I’d hate it, given my dislike of car-centric design and low density. but I have to say, there’s something about the sheer audacity of designing a city this way that makes it surprisingly kind of aesthetic.
there are policies which are successful because they describe a particular strategy to follow (non-mesaoptimizers), and policies that contain some strategy for discovering more strategies (mesaoptimizers). a way to view how this relates to speed/complexity priors that doesn’t depend on search in particular is that policies that work by discovering strategies tend to be simpler and more generic (they bake in very little domain knowledge/metis, and are applicable to a broader set of situations because they work by coming up with a strategy for the task at hand on the fly). in contrast, policies that work by knowing a specific strategy tend to be more complex because they have to bake in a ton of domain knowledge, and are less generally useful because they only know what to do in the specific situations they were built for, which also makes them less retargetable.
another observation is that a meta-strategy with the ability to figure out what strategy is good is kind of defined by the fact that it doesn’t bake in specifics of dealing with a particular situation, but rather can adapt to a broad set of situations. there are also different degrees of meta-strategy-ness; some meta-strategies will adapt more quickly to a broader set of situations. (there’s probably some sort of no-free-lunch theorem (NFLT) argument you can make, but NFLTs in general don’t really matter)
the ability to figure out strategies doesn’t necessarily have to be all reasoning, it can also encompass the experimental skillset
random brainstorming about optimizeryness vs controller/lookuptableyness:
let’s think of optimizers as things that reliably steer a broad set of initial states to some specific terminal state. it seems like there are (at least) two things we care about:
retargetability: it should be possible to change the policy to achieve different terminal states (but this is an insufficiently strong condition, because LUTs also trivially meet it: we can always just completely rewrite the LUT. maybe the actual condition we want is that the complexity of the map is less than the complexity of just the diff or something?) (in other words, in some sense it should be “easy” to rewrite a small subset or otherwise make a simple diff to the policy to change what final goal is achieved) (maybe related idea: instrumental convergence means most goals share lots of strategies/circuitry with each other)
robustness: it should reliably achieve its goal across a wide range of initial states.
a LUT trained with a little bit of RL will be neither retargetable nor robust. a LUT trained with galactic amounts of RL to act optimally from every possible initial state is robust but not retargetable (this is reasonable: robustness is only a property of the functional behavior, so whether it’s a LUT internally shouldn’t matter; retargetability is a property of the actual implementation, so it does matter). a big search loop (the most extreme of which is AIXI, which is 100% search) is very retargetable and, depending on how hard it searches, has varying degrees of robustness.
(however, in practice, with normal amounts of compute, a LUT is never robust; this thought experiment only highlights differences that remain in the limit)
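A toy version of the LUT-vs-search contrast, assuming a hypothetical one-dimensional world of n positions where you can step left or right. All names are made up for illustration, and plain breadth-first search stands in for "a big search loop" (it is nothing like AIXI). Both policies have identical functional behavior, so they are equally robust; they differ only in how big a diff retargeting takes, which is the implementation-level property the paragraph above points at.

```python
from collections import deque

ACTIONS = (-1, +1)


def step(state: int, action: int, n_states: int) -> int:
    """Deterministic 1D world: move left/right, clamped to [0, n_states - 1]."""
    return min(max(state + action, 0), n_states - 1)


def search_policy(goal: int, n_states: int):
    """Tiny 'search loop': BFS toward the goal, done on the fly from any state.

    The goal is a single argument, so retargeting is a one-parameter diff.
    """
    def act(state: int) -> int:
        if state == goal:
            return 0  # stay put once the goal is reached
        seen = {state}
        # Each frontier entry remembers which first action led to it.
        frontier = deque((step(state, a, n_states), a) for a in ACTIONS)
        while frontier:
            s, first_action = frontier.popleft()
            if s == goal:
                return first_action
            if s in seen:
                continue
            seen.add(s)
            for a in ACTIONS:
                frontier.append((step(s, a, n_states), first_action))
        raise ValueError("goal unreachable")
    return act


def lut_policy(goal: int, n_states: int) -> dict[int, int]:
    """The same optimal behavior, frozen into an explicit state -> action table."""
    act = search_policy(goal, n_states)
    return {s: act(s) for s in range(n_states)}


def lut_diff_size(goal_a: int, goal_b: int, n_states: int) -> int:
    """Number of table entries that must be rewritten to retarget the LUT."""
    table_a = lut_policy(goal_a, n_states)
    table_b = lut_policy(goal_b, n_states)
    return sum(table_a[s] != table_b[s] for s in range(n_states))


def reaches_goal(policy, start: int, goal: int, n_states: int) -> bool:
    """Robustness check: does following `policy` from `start` hit `goal`?"""
    s = start
    for _ in range(n_states):
        if s == goal:
            return True
        s = step(s, policy(s), n_states)
    return s == goal


n, goal = 10, 7
search, table = search_policy(goal, n), lut_policy(goal, n)
print(all(reaches_goal(search, s, goal, n) for s in range(n)))
print(all(reaches_goal(table.__getitem__, s, goal, n) for s in range(n)))
print(lut_diff_size(2, 7, n), lut_diff_size(0, 199, 200))
```

Both policies reach the goal from every start state, but retargeting the table from goal 2 to goal 7 rewrites 6 of 10 entries, and from goal 0 to goal 199 in a 200-state world rewrites all 200, while the search loop only ever needs its `goal` argument changed.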
what do we care about these properties for?
efficacy of filtering bad behaviors in pretraining: sufficiently good robustness means doing things that achieve the goal even in states that it never saw during training, and then even in states that require strategies that it never saw during training. if we filter out deceptive alignment from the data, then the model has to do some generalizing to figure out that this is a strategy that can be used to better accomplish its goal (as a sanity check that robustness is the thing here: a LUT never trained on deceptive alignment will never do it, one that is trained on it will do it, and a sufficiently powerful optimizer will always do it)
arguments about updates wrt “goal”: the deceptive alignment argument hinges a lot on “gradient of the goal” making sense. for example, when we argue that the gradient on the model can be decomposed into one component that updates the goal to be more correct and another component that updates the capabilities to be more deceptive, we make this assumption. even if we assume away path dependence, the complexity argument depends a lot on the complexity being roughly equal to the complexity of the goal plus the complexity of general goal-seeking circuitry, with the latter independent of the goal.
arguments about difficulty of disentangling correct and incorrect behaviors: there’s a dual of retargetability which is something like the extent to which you can make narrow interventions to the behavior. (some kind of “anti-naturalness” argument)
[conjecture 1: retargetability == complexity can be decomposed == gradient of goal is meaningful. conjecture 2: gradient of goal is meaningful/complexity decomposition implies deceptive alignment (maybe we can also find some necessary condition?)]
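The complexity-decomposition assumption and conjecture 1 can be written down informally as follows. The notation is introduced only for this sketch: K is description (Kolmogorov-style) complexity, \pi_g a policy pursuing goal g, S the shared goal-seeking circuitry, \theta the parameters, and the "goal"/"capabilities" split of the gradient is an informal label rather than a well-defined projection.

```latex
% Assumed decomposition: the goal-seeking circuitry S is the same for every goal g.
K(\pi_g) \;\approx\; K(g) + K(S) \qquad \text{for all goals } g

% One informal reading of "retargetable": given the old policy,
% the new one costs roughly only the goal diff.
K(\pi_{g'} \mid \pi_g) \;\approx\; K(g' \mid g)

% Gradient decomposition the deceptive alignment argument relies on:
\nabla_\theta \mathcal{L} \;=\;
  \underbrace{\nabla_\theta \mathcal{L}\,\big|_{\text{goal}}}_{\text{goal becomes more correct}}
  \;+\;
  \underbrace{\nabla_\theta \mathcal{L}\,\big|_{\text{capabilities}}}_{\text{capabilities become more deceptive}}
```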
how do we formalize retargetability?
maybe something like there exists a homeomorphism from the goal space to NNs with that goal
problem: doesn’t really feel very satisfying and doesn’t work at all for discrete things
maybe complexity: retargetable if it has a really simple map from goals to NNs with goals, conditional on another NN with that goal
problem: the training process of just training another NN from scratch on the new goal and ignoring the given NN could potentially be quite simple
maybe complexity+time: seems reasonable to assume retraining is expensive (and maybe for decomposability we also consider complexity+time)
random idea: the hypothesis that complexity can be approximately decomposed into a goal component and a reasoning component is maybe a good formalization of (a weak version of) orthogonality?
an amusing short story, “The Old Jailbird’s Tale” by Karel Čapek: https://adamantcritique.wordpress.com/wp-content/uploads/2014/07/capek-karel-tales-from-two-pockets-catbird-1994.pdf#page=218
packing cubes are the kubernetes of the suitcase
shitty idea: a parody book called The Life of Ivan Ilyich, a story about a man in the post-singularity era who becomes afflicted with an illness which would have been fatal in the 1800s but is easily cured with advanced technology, goes on with life as usual, and then eventually his marriage implodes anyway, he feels unsatisfied with his job as a magistrate, and he has a crisis of meaning and, in the depths of his despair, contemplates how death used to give life meaning, before going on a journey around the solar system (a pastime common enough that it’s already starting to become a cliché in his time) and discovering the joys of life.
I feel like you are just describing life right now for anyone who moves from a developing country to a developed country.
When I first came from India to the US, another Indian told me “to learn to pass time”, because the many challenges of daily life in India don’t exist in the US. So one has to figure out the joys of life in the US. In many places, it is enough to just be able to survive. If you are stuck in a war-torn place, every new day is an achievement.
fwiw, the original Death of Ivan Ilyich describes a well-off man who, up until his illness, hadn’t really faced many serious hardships.
sounds fascinating, thanks for the recommendation!
somehow i missed this, but the Bailey-Borwein-Plouffe formula was extended to decimal in 2022: https://arxiv.org/abs/2201.12601
I apologise, but you should have read this paper BEFORE mentioning it here. Neither I nor Claude Sonnet 4.6 believes that the asymptotic approximation described here was worth mentioning.
My brief look concurs. I’m quite confused, though, because Plouffe seems to just be a regular mathematician, and indeed the same guy as the Plouffe in the name of the BBP formula!
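For reference, a sketch of the original base-16 digit-extraction trick from the BBP formula, π = Σ_k 16^(−k) (4/(8k+1) − 2/(8k+4) − 1/(8k+5) − 1/(8k+6)). This is the classic hex algorithm, not the decimal method in the linked paper. Three-argument `pow` keeps every term small, so hex digits at position d can be computed without the digits before them. The function names are just for this sketch, and double precision means only the first several returned digits should be trusted, fewer as d grows.

```python
def _bbp_series(j: int, d: int) -> float:
    """Fractional part of sum_{k>=0} 16^(d-k) / (8k + j)."""
    total = 0.0
    # Terms with k <= d: only the fractional part matters, so reduce 16^(d-k)
    # modulo the denominator with three-argument pow to keep numbers small.
    for k in range(d + 1):
        denom = 8 * k + j
        total = (total + pow(16, d - k, denom) / denom) % 1.0
    # Terms with k > d are already < 1 and shrink ~16x each; add until negligible.
    k = d + 1
    while True:
        term = 16.0 ** (d - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digits(d: int, count: int = 8) -> str:
    """Hex digits of pi's fractional part, skipping the first d of them."""
    # frac(16^d * pi); the coefficients 4, 2, 1, 1 are integers, so taking
    # fractional parts of each series first does not change the result mod 1.
    x = (4 * _bbp_series(1, d) - 2 * _bbp_series(4, d)
         - _bbp_series(5, d) - _bbp_series(6, d)) % 1.0
    digits = []
    for _ in range(count):
        x *= 16
        digit = int(x)
        digits.append("0123456789ABCDEF"[digit])
        x -= digit
    return "".join(digits)


print(pi_hex_digits(0))  # pi = 3.243F6A88... in hexadecimal
```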
why was literacy so hard for languages with relatively regular phonetic alphabets? it makes perfect sense that Chinese literacy is difficult, since you have to memorize thousands of characters. it even makes sense why English is nontrivial, because there are a ton of exceptions. but languages like German and Spanish are very regular.
A recent anecdote: I just created my own script that’s structurally close to the Roman alphabet, with a few modifications.
Learning it, memorizing the parts and some rules takes ~1 hour.
This makes me think that the hard thing about learning reading/writing is the initial visual-symbol-to-language mapping, not just getting the symbols and rules into your head.
This was also my experience learning to read Tengwar (Tolkien’s writing system for his elvish languages), and Hangul (but I also learned to read some non-alphabet scripts in childhood, so that might be an advantage). I agree the concept of writing down language at all is the hardest to grok, and once you have that down, literacy is fairly easy to transfer to new scripts.
Incidentally, this might explain the unexpectedly strong reaction against the very idea of a script for writing sign language I found amongst the deaf. I conjecture they might not have made this connection, and treat their signing and the symbolic manipulation they perform for, say, English, as entirely different.
maybe inventing the idea of writing down sounds is hard the first time. but i can’t imagine it being hard to learn.
why are there no federations where creating new polities is relatively easy? like imagine if you could create a new state by simply purchasing a tract of land exceeding some minimum size, securing a (super)majority vote of residents in the land, and paying a fee. if a region of an existing state felt unhappy with the rest of the state, it could simply vote to secede and become a new state. if two states wanted to merge, they could do that too.
this obviously wouldn’t work in the current US—you’d have to change a bunch of things that are impossible to change at this point, like either abolishing the senate or changing its apportionment. i’m not sure if it would be better to have a unicameral system with apportionment based on population, or a bicameral system with apportionment based on population and tax revenue.
if you want to know whether a given crypto project is a scam, there’s a very reliable heuristic, which is to assume the answer is yes. but suppose you actually want to avoid discarding the occasional good idea.
there is one very simple rule to determine if a crypto idea makes any sense whatsoever. crypto makes sense as a solution only if you can’t trust anyone. not the government, not some institution that is bound by laws, not a majority vote of some set of reasonable people. (but also you think civilization won’t completely collapse, e.g. the internet will keep working). removing the need to trust people/institutions is the only advantage of crypto. for this one advantage, we pay dearly in terms of everything else. crypto will always be slower, less efficient, less usable, and less scalable than the obvious centralized solution.
this is a completely self-consistent threat model. but the set of things you can trust in this world is extremely tiny. you can’t use anything tradfi, because those ultimately rely on Visa/Mastercard and banks and such to not fuck you over. you can’t live in a house you own unless you have the arms to defend your house, because your ownership of that house depends on the government having a record somewhere that you own that house, and its willingness to use police to defend your house. you can’t really invest in anything except physical assets you can put in your house, because any stock is just a digital record at the DTCC. you can’t really rely on any food or water from the outside world to be safe unless you are testing it yourself or producing it yourself.
there are people who genuinely live in a world like this. especially in certain politically unstable countries. but this is a very small subset of people, and the vast majority of crypto projects will never ever be used by any of those people. (and there will almost always be a better way to help them.)
No, it’s most crypto users, cf. Scott’s Why I’m Less Than Infinitely Hostile To Cryptocurrency
This is a strawman argument. There are any number of legitimate scenarios where:
Society has not collapsed.
You would like to do something that traditional finance would not allow you to do.
Most obviously, since this was both one of the original intents of cryptocurrency and one of its primary modern use-cases, you can transfer funds to individuals and institutions that have been “deplatformed” by traditional finance. Beyond the use-case of selflessly sponsoring dissidents whose views lie just outside the permitted Overton Window, there is also, of course, buying pornographic materials and drugs in places where they are banned[1].
“I can benefit from trusting this person with some tasks and some information, but not with other tasks and other information” constitutes the majority of my interactions. I would assume that this is true for almost everyone.
[1] (the latter is often considered a bad thing for society, but there are nonetheless lots of people to whom it would be useful)
Factor meals are great, but they only deliver once a week on a fixed weekday, and you have to order an entire week in advance. I wish there existed something in the same niche but shaped closer to doordash, or at least Amazon: I want to order a bunch of healthy refrigerated meals and receive them between 30 minutes and 24 hours later.
The old guard of Berkeley will sometimes wax poetic about Spoonrocket. It was indeed glorious.
from the article, it sounds like spoonrocket is making very different tradeoffs from the thing I’m imagining. in particular, it sounds like they’re trying to deliver very cheap meals very fast, and cutting corners on quality and variety. whereas I want something that delivers expensive and varied high quality meals moderately fast (hours is ok).
I guess the closest thing is I could doordash a lot of sweetgreen and put it in a fridge, but most healthy restaurants don’t have food that’s optimized for microwavability.
a tentative model of ambitious research projects
when you do a big research project, you have some amount of risk you can work with—maybe you’re trying to do something incremental, so you can only tolerate a 10% chance of failure, or maybe you’re trying to shoot for the moon and so you can accept a 90% chance of failure.
budgeting for risk is non-negotiable because there are a lot of places where risk can creep in, and if there isn’t any risk, then you’re not really doing research. most obviously, your direction might just be a dead end. but there are also other things that might go wrong: the code might end up too difficult to implement, or it might run too slowly, or you might fail to fix a solvable-in-principle problem that comes up.
I claim that one of the principal components of being a good researcher is being able to eliminate as much unnecessary risk as possible, so you can spend your entire risk budget on the important bets.
for example, if you’re an extremely competent engineer, when brainstorming experiments you don’t have to think much about the risk that you fail to implement it. you know that even if you don’t think through all the contingencies that might pop up, you can figure it out, because you have a track record of figuring it out. you can say the words “and if that happens we’ll just scale it up” without spending much risk because you know full well that you can actually execute on it. a less competent engineer would have to pay a much greater risk cost, and correspondingly have to reduce the ambitiousness of the research bets (or else take on way more risk than intended).
not all research bets are created equal, either. the space of possible research bets is vast, and most of them are wrong. but if you have very good research taste, you can much more reliably tell whether a bet is likely to work out. even the best researchers can’t just look at a direction and know for sure whether it will work, but if you pick a good direction 10% of the time, you can do a lot more than if your direction is only good 0.1% of the time.
finally, if you know and trust someone to be reliable at executing in their area of expertise, you can delegate things that fall in their domain to them. in practice, this can be quite tough and introduce risk unless they have a very legible track record, or you are sufficiently competent in their domain yourself to tell if they’re likely to succeed. and if you’re sufficiently competent to do the job of any of your reports (even if less efficiently), then you can budget less risk here, knowing that even if someone drops the ball you could always pick it up yourself.
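A back-of-the-envelope version of the risk-budget model. The assumptions are added for this sketch rather than taken from the post: the project succeeds only if the core research bet and every execution component succeed, these are independent, and every number below is made up. It shows how unreliable execution eats the budget and forces safer core bets.

```python
from math import prod


def required_core_success(risk_budget: float, execution_success: list[float]) -> float:
    """Minimum success probability the core research bet needs to stay in budget.

    Hypothetical model: P(project succeeds) = P(core bet) * prod(P(each execution
    component)), all independent. A return value above 1.0 means no core bet fits.
    """
    target_success = 1.0 - risk_budget
    return target_success / prod(execution_success)


# Moonshot: accept a 90% chance of failure overall.
strong = required_core_success(0.9, [0.95])  # execution is almost never the problem
weak = required_core_success(0.9, [0.6])     # execution itself is a big risk
print(f"strong engineer: core bet needs p >= {strong:.3f}")
print(f"weaker engineer: core bet needs p >= {weak:.3f}")

# Incremental: accept only a 10% chance of failure, with two shaky components.
print(required_core_success(0.1, [0.8, 0.9]) > 1.0)  # no core bet can fit
```

With these made-up numbers the strong engineer can take a core bet that works about 10.5% of the time, while the weaker one needs about 16.7%; and an "incremental" project with two shaky execution components cannot fit in a 10% failure budget at all.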
https://arxiv.org/abs/2304.08612 : interesting paper with an improvement on the straight-through estimator
https://arxiv.org/abs/2302.07011 : sharpness doesn’t seem to correlate with generalization
the phenomenon of strange bedfellows is probably caused in no small part by outgroup vs fargroup dynamics
We spend a lot of time trying to find empirical evidence to distinguish hypotheses we have that make very similar predictions, but I think a potentially underrated first step is to make sure they actually fit the data we already have.
Example?
Is the correlation between sleeping too long and bad health there because sleeping too long is actually causally upstream of bad health effects, or only because it is causally downstream of some common cause like illness?
Afaik, both. Like a lot of shit things, they are caused by depression and they cause depression: a horrible reinforcing loop. While the effect of bad health on sleep is obvious, you can also see this work in reverse; e.g. temporary severe sleep restriction has an anti-depressive effect. Notable, though without many useful clinical applications, as constant sleep deprivation is also really unhealthy.
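A toy simulation of the pure common-cause hypothesis, with every coefficient made up: illness lengthens sleep and worsens health, and sleep has no effect on health at all. It still produces a clearly negative sleep/health correlation (about −0.48 in expectation for these numbers), which disappears once you condition on illness. This is one way of checking that a hypothesis fits the data you already have; it does not rule out the reply's point that both directions may operate.

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate_common_cause(n: int = 10_000, seed: int = 0):
    """Hypothetical world: illness drives both long sleep and bad health,
    and sleep has NO causal effect on health."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ill = rng.random() < 0.2
        sleep_hours = 7 + 2 * ill + rng.gauss(0, 1)  # illness -> more sleep
        health = -3 * ill + rng.gauss(0, 1)          # illness -> worse health
        rows.append((ill, sleep_hours, health))
    return rows


rows = simulate_common_cause()
overall = pearson([r[1] for r in rows], [r[2] for r in rows])
print(f"overall corr(sleep, health): {overall:.2f}")
for flag in (False, True):
    sub = [r for r in rows if r[0] == flag]
    within = pearson([r[1] for r in sub], [r[2] for r in sub])
    print(f"within ill={flag}: {within:.2f}")
```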
GPT-2-xl unembedding matrix looks pretty close to full rank (plot is singular values)
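Two ways to put a number on "close to full rank", assuming you already have the singular values (e.g. from `numpy.linalg.svd(W, compute_uv=False)` on the unembedding matrix; for GPT-2-xl the rank can be at most 1600, the model width). The first mirrors the default tolerance of `numpy.linalg.matrix_rank`; the second is an entropy-based effective rank that is less sensitive to the exact cutoff. The helper names are made up for this sketch.

```python
import math
import sys


def numerical_rank(singular_values: list[float], shape: tuple[int, int]) -> int:
    """Count singular values above numpy.linalg.matrix_rank's default tolerance,
    i.e. max(s) * max(shape) * machine epsilon."""
    tol = max(singular_values) * max(shape) * sys.float_info.epsilon
    return sum(s > tol for s in singular_values)


def entropy_effective_rank(singular_values: list[float]) -> float:
    """exp of the Shannon entropy of the normalized singular values.

    Equals k when there are exactly k equal nonzero singular values, and drops
    toward 1 as the spectrum becomes dominated by a single direction.
    """
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    return math.exp(-sum(p * math.log(p) for p in probs))


print(numerical_rank([3.0, 1.0, 1e-20], (3, 3)))
print(entropy_effective_rank([2.0, 2.0, 2.0, 2.0]))
print(entropy_effective_rank([10.0, 1.0, 0.1]))
```

A spectrum can pass the first test (every singular value above the tolerance) while the second still reports an effective rank well below the width, which is the distinction worth checking before calling a matrix "basically full rank".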
Unsupervised learning can learn things humans can’t supervise because there’s structure in the world that you need deeper understanding to predict accurately. For example, to predict how characters in a story will behave, you need some kind of understanding of how those characters think, even if their thoughts are never explicitly visible.
Unfortunately, this understanding only has to be structured in a way that makes reading off the actual unsupervised targets (i.e. the next observation) easy.
An incentive structure for scalable trusted prediction market resolutions
We might want to make a trustable committee for resolving prediction markets. We might be worried that individual resolvers might build up reputation only to exit-scam, due to finite time horizons and non-transferability of reputational capital. However, shareholders of a public company are more incentivized to preserve the value of the reputational capital. Based on this idea, we can set something up as follows:
Market creators pay a fee for the services of a resolution company
There is a pool of resolvers who give a first-pass resolution. Each resolver locks up a deposit.
If an appeal is requested, a resolution passes up through a series of committees of more and more senior resolvers
At the top, a vote is triggered among all shareholders
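A minimal sketch of the escalation flow, assuming each tier returns a yes/no verdict and appeals are requested through a callback. The names are hypothetical, fees and the mechanics of the shareholder vote are abstracted into plain functions, and flagging overturned tiers (e.g. as candidates for losing their deposits) is one possible reading of "locks up a deposit", not something the proposal specifies.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tier:
    """One rung of the appeal ladder (first-pass resolvers, committees, shareholder vote)."""
    name: str
    resolve: Callable[[str], bool]  # hypothetical: returns this tier's YES/NO verdict


@dataclass
class Resolution:
    outcome: bool
    history: list[tuple[str, bool]] = field(default_factory=list)

    def overturned_tiers(self) -> list[str]:
        """Tiers whose verdict differs from the final one (hypothetical deposit-forfeiture candidates)."""
        return [name for name, verdict in self.history[:-1] if verdict != self.outcome]


def resolve_market(question: str, tiers: list[Tier],
                   appeal_requested: Callable[[int, bool], bool]) -> Resolution:
    """Walk up the ladder until nobody appeals or the top tier (shareholder vote) decides."""
    if not tiers:
        raise ValueError("need at least one tier")
    result = Resolution(outcome=False)
    for i, tier in enumerate(tiers):
        verdict = tier.resolve(question)
        result.history.append((tier.name, verdict))
        result.outcome = verdict
        is_top = i == len(tiers) - 1
        if is_top or not appeal_requested(i, verdict):
            break
    return result


tiers = [
    Tier("first-pass resolver", lambda q: True),
    Tier("senior committee", lambda q: False),
    Tier("shareholder vote", lambda q: False),
]
r = resolve_market("Will X happen?", tiers, appeal_requested=lambda i, verdict: True)
print(r.outcome, r.history, r.overturned_tiers())
```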
It’s amazing how many proposals for dealing with institutional distrust sound a lot like “make a new institution, with the same structure, but with better actors.” You lose me at “trustable committee”, especially when you don’t describe how THOSE humans are motivated by truth and beauty, rather than filthy lucre. Adding more layers of committees doesn’t help, unless you define a “final, un-appealable decision” that’s sooner than the full shareholder vote.
the core of the proposal really boils down to “public companies have less incentive to cash in on reputation and exit scam than individuals”. this proposal is explicitly not “the same structure but with better actors”.
Levels of difficulty:
Mathematically proven to be impossible (e.g. perfect compression, i.e. losslessly shrinking every possible input)
Impossible under currently known laws of physics (e.g. perpetual motion machines)
A lot of people have thought very hard about it and cannot prove that it’s impossible, but strongly suspect it is impossible (e.g. solving NP-complete problems in polynomial time)
A lot of people have thought very hard about it, and have not succeeded, but we have no strong reason to expect it to be impossible (e.g. AGI)
There is a strong incentive for success, and the markets are very efficient, so that for participants with no edge, success is basically impossible (e.g. beating the stock market)
There is a strong incentive for a thing, but a less efficient market, and it seems nobody has done it successfully (e.g. a startup idea that nobody seems to be doing)
Hopefully this is a useful reference for conversations that go like this:
A: Why can’t we just do X to solve Y?
B: You don’t realize how hard Y is; you can’t just think up a solution in 5 minutes.
A: You’re just not thinking outside the box. [insert anecdote about some historical figure who figured out how to do a thing which was once considered impossible in some sense]
B: No, you don’t understand, it’s like actually not possible, not just like really hard, because of Z.
A: That’s what they said about [historical figure]!
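The first rung is easy to check concretely. Reading "perfect compression" as a lossless compressor that makes every input strictly shorter, a counting (pigeonhole) argument rules it out: there are fewer strictly shorter bit strings than inputs. The helper names are just for illustration.

```python
def count_strings(n_bits: int) -> int:
    """Number of distinct bit strings of exactly n_bits."""
    return 2 ** n_bits


def count_strictly_shorter(n_bits: int) -> int:
    """Number of distinct bit strings of length 0, 1, ..., n_bits - 1."""
    return sum(2 ** k for k in range(n_bits))


for n in (1, 8, 64):
    # A compressor that shortens EVERY n-bit input would need an injective map
    # from count_strings(n) inputs into count_strictly_shorter(n) outputs.
    print(n, count_strings(n), count_strictly_shorter(n))
```

The shorter strings always number exactly 2^n − 1, one fewer than the 2^n inputs, so two inputs must share an output and the compressor cannot be lossless.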
(random shower thoughts written with basically no editing)
Sometimes arguments have a beat that looks like “there is extreme position X, and opposing extreme position Y. what about a moderate ‘Combination’ position?” (I’ve noticed this in both my own and others’ arguments)
I think there are sometimes some problems with this.
Usually almost nobody is on the most extreme ends of the spectrum. Nearly everyone falls into the “Combination” bucket technically, so in practice you have to draw the boundary between “combination enough” vs “not combination enough to count as combination”, which is sometimes fraught. (There is a dual argument beat that looks like “people too often bucket things into distinct buckets, what about thinking of things as a spectrum.” I think this does the opposite mistake, because sometimes there really are relatively meaningful clusters to point to. (this seems quite reminiscent of one Scottpost that I can’t remember the name of rn))
In many cases, there is no easy 1d spectrum. Being a “combination” could refer to a whole set of mutually exclusive sets of views. This problem gets especially bad when the endpoints differ along many axes at once. (Another dual argument here that looks like “things are more nuanced than they seem” which has its own opposite problems)
Of the times where this is meaningful, I would guess it almost always happens when the axis one has identified is interesting and captures some interesting property of the world. That is to say, if you’ve identified some kind of quantity that seems to be very explanatory, just noting that fact actually produces lots of value, and then arguing about how or whether to bucket that quantity up into groups has sharply diminishing value.
In other words, introducing the frame that some particular latent in the world exists and is predictive is hugely valuable; when you say “and therefore my position is in between other people’s”, this is valuable due to the introduction of the frame. The actual heavy lifting happened in the frame, and the part where you point to some underexplored region of the space implied by that frame is actually not doing much work.
I hypothesize one common thing is that if you don’t draw this distinction, then it feels like the heavy lifting comes in the part where you do the pointing, and then you might want to do this within already commonly accepted frames. From the inside I think this feels like existing clusters of people being surprisingly closed minded, whereas the true reason is that the usefulness of the existing frame has been exhausted.
related take: “things are more nuanced than they seem” is valuable only as the summary of a detailed exploration of the nuance that engages heavily with object level cruxes; the heavy lifting is done by the exploration, not the summary
Subjective Individualism
TL;DR: This is basically empty individualism except identity is disentangled from cooperation (accomplished via FDT), and each agent can have its own subjective views on what would count as continuity of identity and have preferences over that. I claim that:
Continuity is a property of the subjective experience of each observer-moment (OM), not necessarily of any underlying causal or temporal relation. (i.e. I believe at this moment that I am experiencing continuity, but this belief is a fact of my current OM only. Being a Boltzmann brain that believes I experienced all the moments leading up to that moment feels exactly the same as “actually” experiencing things.)
Each OM may have beliefs about the existence of past OMs, and about causal/temporal relations between those past OMs and the current OM (e.g. one may believe that a memory of the past did in fact result from the faithful recording of a past OM to memory, as opposed to being spawned out of thin air as a Boltzmann brain loaded with false memories).
Something like preference utilitarianism is true and it is ok to have preferences about things you cannot observe, or prefer the world to be in one of two states that you cannot in any way distinguish. As a motivating example, one can have preferences between taking atomic actions (a) enter the experience machine and erase all memories of choosing to be in an experience machine and (b) doing nothing.
Each OM may have preferences for its subjective experience of continuity to correspond to some particular causal structure between OMs, despite this being impossible for that OM to observe or verify. This is where the subjectivity is introduced: each OM can have its own opinion on which other OMs it considers to also be “itself”, and it can have preferences over its self-OMs causally leading to itself in a particular way. This does not have to be symmetric; for instance, your past self may consider your future self to be more self-like than your future self considers your past self.
Continuity of self as viewed by each OM is decoupled from decision-theoretic cooperation. In a typical individual the two coincide: they consider their past/future selves to also be themselves, and cooperate decision-theoretically (i.e. you consider past/future you getting utility to both count as “you” getting utility). However, it is also possible to cooperate to the same extent with OMs you do not consider to be the same self (e.g. twin PD), or to not coordinate with yourself (e.g. myopia/ADHD).
(related: FDT and myopia being much the same thing; you can think of caring about future selves’ rewards because you consider yourself to implement a similar enough algorithm to your future self as acausal trade. This has the nice property of unifying myopia and preventing acausal trade, in that acausal trade is really just caring about OMs that would not be considered the same “self”. This is super convenient because basically every time we talk about myopia for preventing deceptive mesaoptimization we have to hedge by saying “and also we need to prevent acausal trade somehow”, and this lets us unify the two things.)
Properties of this theory:
This theory allows one to have preferences such as “I want to have lots of subjective experiences into the future” or “I prefer to have physical continuity with my past self” despite rejecting any universal concept of identity, which seems pretty useful
This theory is fully compatible with all sorts of thought experiments by simply not providing an answer as to which OM your current OM leads to “next”. This is philosophically unsatisfying but I think the theory is still useful nonetheless
Coordination is solved through decision theory, which completely disentangles it from identity.
Imagine if aliens showed up at your doorstep and tried to explain to you that making as many paperclips as possible was the ultimate source of value in the universe. They show pictures of things that count as paperclips and things that don’t count as paperclips. They show you the long rambling definition of what counts as a paperclip from Section 23(b)(iii) of the Declaration of Paperclippian Values. They show you pages and pages of philosophers waxing poetical about how paperclips are great because of their incredible aesthetic value. You would be like, “yeah I get it, you consider this thing to be a paperclip, and you care a lot about them.” You could probably pretty accurately tell whether the aliens would approve of anything you’d want to do. And then you wouldn’t really care, because you value human flourishing, not paperclips. I mean, it’s so silly to care about paperclips, right?
Of course, to the aliens, who have not so subtly indicated that they would blow up the planet and look for a new, more paperclip-loving planet if they were to detect any anti-paperclip sentiments, you say that you of course totally understand and would do anything for paperclips, and that you definitely wouldn’t protest being sent to the paperclip mines.
I think I’d be confused. Do they care about more or better paperclips, or do they care about worship of paperclips by thinking beings? Why would they care whether I say I would do anything for paperclips, when I’m not actually making paperclips (or disassembling myself to become paperclips)?
I thought it would be obvious from context but the answers are “doesn’t really matter, any of those examples work” and “because they will send everyone to the paperclip mines after ensuring there are no rebellious sentiments”, respectively. I’ve edited it to be clearer.
i wonder if aphantasia is correlated with emotional repression/dulling
is it worth writing a blog post about how LTV is obviously wrong? i assume this is, like, economics 101, so it would be completely unoriginal, and the set of people who would read my blog posts is surely disjoint from the set of people who take LTV seriously
I had to Google “LTV”. I believe it means the Labour Theory of Value, that the work put in to create something is a measure of that thing’s value. Seems absurd to me. Is there anyone here who believes in it? Or elsewhere, even?
I don't know how many people explicitly believe it, but there is a general worldview that implicitly assumes it. There are common memes that use it to argue that pay disparity is unfair, such as this one: https://www.reddit.com/r/antiwork/comments/yrdbyg/ceos_are_not_worth_351_times_the_average_worker/ It implicitly assumes that the only fair way for one person to be paid 351x more than another is if they work 351x harder, i.e. LTV.
I also had to google it and google AI said that LTV means “Lifetime Value”.
I think a version where you steelman it (find its strongest version) and then dismantle it would be good, as would one where you try to surface and satisfy the intuitions that motivate it, or similar high-effort ITT-passing stuff.
I think there’s an interesting discussion with fun side quests to be had here, and I’m often disappointed by the sneering, uncharitable tone wielded by those armed with The Correct Economics.
[for the avoidance of doubt: I am not an LTV believer and think many ideas on the left are economically unsound. I just think the usual way of refuting their case around these parts hardly ever works.]
i misread that as LVT and was greatly confused for a moment
sci-fi story setting idea: a future where VR becomes so widespread that where you physically live in the US becomes more of a formality than something of actual consequence, so mass internal migrations à la the Free State Project occur as people rush to low-population states to gain more political influence in a federal political system that is increasingly impossible to reform
Wouldn’t this just lead to an equilibrium where every state has an about equal population super quickly though?
Only if housing can be built.
Wouldn't that require all states to have identical laws, taxes, etc. on residents? It seems likely to me that people care a lot more about those than about differential influence on federal politics.
Edit: More precisely, a story in which most people do not care at least an order of magnitude more about laws on residents than on changing their federal political representation from 0.0000001 to 0.0000002 would strain my suspension of disbelief. People who care that much more about federal politics are much more likely to do something other than just move to another state to get microscopically more influence in very infrequent votes for representatives. If a large fraction of the population feels that way and the federal politics is extremely resistant to change, you probably get revolution or civil war, not mass migration to change vote strength.
random thoughts. no pretense that any of this is original or useful for anyone but me or even correct
It’s ok to want the world to be better and to take actions to make that happen but unproductive to be frustrated about it or to complain that a plan which should work in a better world doesn’t work in this world. To make the world the way you want it to be, you have to first understand how it is. This sounds obvious when stated abstractly but is surprisingly hard to adhere to in practice.
It would be really nice to have some evolved version of calibration training where I take some historical events, try to predict the answers to concrete questions about what happened, give myself immediate feedback, and keep track of my accuracy and calibration. Backtesting my world model, so to speak. It might be a bit difficult to measure accuracy improvements because the world is not i.i.d., but it's worth trying the naive thing regardless. Would be interesting to try to autogenerate the questions using GPT-3.
Feedback loops are important. Unfortunately, from the inside it's very easy to forget this. In particular, setting up feedback loops is often high friction, because it's hard to measure the thing we care about. Fixing this general problem is probably hard, but in the meantime I can try to set up feedback loops for important things like productivity, world modelling, decision making, etc.
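A minimal sketch of the scoring half of that backtesting idea, assuming every question is phrased as yes/no with a stated probability. The names `Prediction`, `brier_score`, and `calibration_table` are hypothetical, and the Brier score is just one standard choice of proper scoring rule; the question strings below are placeholders, not real data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Prediction:
    question: str       # a concrete yes/no question about a historical event (hypothetical)
    probability: float  # stated probability that the answer is "yes"
    outcome: bool       # what actually happened


def brier_score(preds: list[Prediction]) -> float:
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p.probability - float(p.outcome)) ** 2 for p in preds) / len(preds)


def calibration_table(
    preds: list[Prediction], n_buckets: int = 10
) -> dict[int, tuple[float, float, int]]:
    """Bucket predictions by stated probability and compare the mean stated
    probability in each bucket with the observed frequency of "yes" outcomes.
    Returns {bucket_index: (mean_stated_prob, observed_freq, count)}."""
    buckets: dict[int, list[Prediction]] = defaultdict(list)
    for p in preds:
        # Clamp so that probability == 1.0 lands in the top bucket.
        idx = min(int(p.probability * n_buckets), n_buckets - 1)
        buckets[idx].append(p)
    table = {}
    for idx, group in sorted(buckets.items()):
        mean_prob = sum(p.probability for p in group) / len(group)
        freq = sum(p.outcome for p in group) / len(group)
        table[idx] = (mean_prob, freq, len(group))
    return table


if __name__ == "__main__":
    log = [
        Prediction("placeholder question 1", 0.9, True),
        Prediction("placeholder question 2", 0.9, False),
        Prediction("placeholder question 3", 0.2, False),
    ]
    print(round(brier_score(log), 3))  # → 0.287
    for idx, (stated, observed, n) in calibration_table(log).items():
        print(f"bucket {idx}: stated {stated:.2f}, observed {observed:.2f}, n={n}")
```

Calibration is read off the table: in a well-calibrated log, the observed frequency in each bucket roughly matches the stated probability, given enough questions per bucket.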
self self improvement improvement: feeling guilty about not self improving enough and trying to fix your own ability to fix your own abilities
Lots of things have very counterintuitive or indirect value. If you don't take this into account and make decisions based on maximizing measurable value, you might end up McNamara-ing yourself hard.
The stages of learning something: (1) “this is super overwhelming! I don’t think I’ll ever understand it. there are so many things I need to keep track of. just trying to wrap my mind around it makes me feel slightly queasy” (2) “hmm this seems to actually make some sense, I’m starting to get the hang of this” (3) “this is so simple and obviously true, I’ve always known it to be true, I can’t believe anyone doesn’t understand this” (you start noticing that your explanations of the thing become indistinguishable from the things you originally felt overwhelmed by) (4) “this new thing [that builds on top of the thing you just learned] is super overwhelming! I don’t think I’ll ever understand it”
The feeling of regret really sucks. This is a bad thing, because it creates an incentive to never reflect on things or realize your mistakes. This shows up as a quite painful aversion to reflecting on mistakes, doing a postmortem, and improving. I would like to somehow trick my brain into reframing things. Maybe thinking of it as a strict improvement over the status quo of having done things wrong? Or maybe reminding myself that the regret will be even worse if I don't do anything, because I'll also regret not having reflected.
Thought pattern that I've noticed: I seem to have two sets of epistemic states at any time: one more stable set that more accurately reflects my "actual" beliefs and changes fairly slowly, and one set of "hypothesis" beliefs that changes rapidly. Usually when I think some direction is interesting, I alternate my hypothesis beliefs between assuming key claims are true and assuming they are false, trying to convince myself either way, and if I succeed then I integrate the result into my actual beliefs. In practice this might look like alternating between trying to prove something is impossible and trying to exhibit an example, or taking strange premises seriously and trying to figure out their consequences. I think this is probably very confusing to people, because when talking to people who are already familiar with alignment I'm usually talking about implications of my hypothesis beliefs (that's the frontier of what I'm thinking about), and from the outside it looks like I'm constantly changing my mind about things. Writing this up partially to have something to point people to and partially to push myself to communicate this more clearly.
I think this pattern is common among intellectuals, and I'm surprised it's causing confusion. Are you labeling your exploratory beliefs and statements appropriately? An "epistemic status" note for posts here goes a long way, and in private conversations at work and with friends I often say out loud "I'm exploring here, don't take it as what I fully believe".
I think I do a poor job of labelling my statements (at least, in conversation. usually I do a bit better in post format). Something something illusion of transparency. To be honest, I didn’t even realize explicitly that I was doing this until fairly recent reflection on it.
traveling westward is an isekai anime where your old shitty sleep schedule is actually secretly a beautiful normal sleep schedule in the new time zone
imagining a glorious universe in which Pride and Prejudice was also written by a Russian author, so that all three classic X and Y books would be Russian
kind of trippy how the word for alcohol in Korean (sul) is pronounced very similar to the word for drunk in French (saoul)
See also.
the “move to ios” app is the worst piece of software i have ever used in my entire life
i hate these new AI phone support agent things. i can't believe they somehow made phone menus even worse.
https://en.wikipedia.org/wiki/Citizens%27_Commission_to_Investigate_the_FBI how the turn tables
drugs are steering vectors for the brain