From what I understand, JVN, Poincaré, and Terence Tao all had/have issues with perceptual intuition/mental visualization. JVN had “the physical intuition of a doorknob,” Poincaré was tested by Binet and had extremely poor perceptual abilities, and Tao (at least as a child) mentioned finding mental rotation tasks “hard.”
I also fit a (much less extreme) version of this pattern, which is why I’m interested in this in the first place. I am (relatively) good at visual pattern recognition and math, but I have aphantasia and have an average visual working memory. I felt insecure about this for a while, but seeing that much more intelligent people than me had a similar (but more extreme) cognitive profile made me feel better.
Does anybody have a satisfactory explanation for this profile beyond a simplistic “tradeoffs” explanation?
Edit: Some claims about JVN/Poincaré may have been hallucinated, but they are based at least somewhat on reality. See my reply to Steven
(Not really answering your question, just chatting.)
What’s your source for “JVN had ‘the physical intuition of a doorknob’”? Nothing shows up on google. I’m not sure quite what that phrase is supposed to mean, so context would be helpful. I’m also not sure what “extremely poor perceptual abilities” means exactly.
You might have already seen this, but Poincaré writes about “analysts” and “geometers”:
It is impossible to study the works of the great mathematicians, or even those of the lesser, without noticing and distinguishing two opposite tendencies, or rather two entirely different kinds of minds. The one sort are above all preoccupied with logic; to read their works, one is tempted to believe they have advanced only step by step, after the manner of a Vauban who pushes on his trenches against the place besieged, leaving nothing to chance. The other sort are guided by intuition and at the first stroke make quick but sometimes precarious conquests, like bold cavalrymen of the advance guard.
The method is not imposed by the matter treated. Though one often says of the first that they are analysts and calls the others geometers, that does not prevent the one sort from remaining analysts even when they work at geometry, while the others are still geometers even when they occupy themselves with pure analysis. It is the very nature of their mind which makes them logicians or intuitionalists, and they can not lay it aside when they approach a new subject.
Not sure exactly how that relates, if at all. (What category did Poincaré put himself in? It’s probably in the essay somewhere, I didn’t read it that carefully. I think geometer, based on his work? But Tao is extremely analyst, I think, if we buy this categorization in the first place.)
I’m no JVN/Poincaré/Tao, but if anyone cares, I think I’m kinda aphantasia-adjacent, and I think that fact has something to do with why I’m naturally bad at drawing, and why, when I was a kid doing math olympiad problems, I was worse at Euclidean geometry problems than my peers who got similar overall scores.
Oh, I was actually hoping you’d reply! I may have hallucinated the exact quote I mentioned, but here is something from Ulam: “Ulam on physical intuition and visualization,” on Steve Hsu’s blog. And I might have hallucinated the thing about Poincaré being tested by Binet; that might just be an urban legend I didn’t verify. You can find Poincaré’s struggles with coordination and dexterity in “Men of Mathematics,” but that’s a lot less extreme than the story I passed on. I am confident in Tao’s preference for analysis over visualization. If you have the time, look up “Terence Tao” on Gwern’s website.
I’m not very familiar with the field of neuroscience, but it seems to me that we’re probably pretty far from being able to provide a satisfactory answer to these questions. Is that true from your understanding of where the field is at? What sorts of techniques/technology would we need to develop in order for us to start answering these questions?
Apologies in advance if this is a midwit take. Chess engines are “smarter” than humans at chess, but they aren’t automatically better at real-world strategizing as a result. They don’t take over the world. Why couldn’t the same be true for STEMlord LLM-based agents?
It doesn’t seem like any of the companies are anywhere near AI that can “learn” or generalize in real time like a human or animal. Maybe a superintelligent STEMlord could hack their way around learning, but that still doesn’t seem the same as or as dangerous as fooming, and it also seems much easier to monitor. Does it not seem plausible that the current paradigm drastically accelerates scientific research while remaining tools? The counter is that people will just use the tools to try and figure out learning. But we don’t know how hard learning is, and the tools could also enable people to make real progress on alignment before learning is cracked.
Welcome to Less Wrong. Sometimes I like to go around engaging with new people, so that’s what I’m doing.
On a sentence-by-sentence basis, your post is generally correct. It seems like you’re disagreeing with something you’ve read or heard. But I don’t know what you read, so I can’t understand what you’re arguing for or against. I could guess, but it would be better if you just said.
hi, thank you! i guess i was thinking about claims that “AGI is imminent and therefore we’re doomed.” it seems like if you define AGI as “really good at STEM” then it is obviously imminent. but if you define it as “capable of continuous learning like a human or animal,” that’s not true. we don’t know how to build it and we can’t even run a fruit-fly connectome on the most powerful computers we have for more than a couple of seconds without the instance breaking down: how would we expect to run something OOMs more complex and intelligent? “being good at STEM” seems like a much, much simpler and less computationally intensive task than continuous, dynamic learning. tourist is great at codeforces, but he obviously doesn’t have the ability to take over the world (i am making the assumption that anyone with the capability to take over the world would do so). the second is a much, much fuzzier, more computationally complex task than the first.
i had just been in a deep depression for a while (it’s embarrassing, but this started with GPT-4) because i thought some AI in the near future was going to wake up, become god, and pwn humanity. but when i think about it from this perspective, that future seems much less likely. in fact, the future (at least in the near-term) looks very bright. and i can actually plan for it, which feels deeply relieving to me.
For me, depression has been independent of the probability of doom. I’ve definitely been depressed, but I’ve been pretty cheerful for the past few years, even as the apparent probability of near-term doom has been mounting steadily. I did stop working on AI, and tried to talk my friends out of it, which was about all I could do. I decided not to worry about things I can’t affect, which has clarified my mind immensely.
The near-term future does indeed look very bright.
Hey Carl, sorry to bother you; what I’m about to say is pretty irrelevant to the discussion, but I’m a high-school student looking to gather good research experience and I wanted to ask a few questions. Is there any place I can reach out to you other than here? I would greatly appreciate any and all help!
You shouldn’t worry about whether something “is AGI”; it’s an ill-defined concept. I agree that current models are lacking the ability to accomplish long-term tasks in the real world, and this keeps them safe. But I don’t think this is permanent, for two reasons.
Current large-language-model type AI is not capable of continuous learning, it is true. But AIs which are capable of it have been built. AlphaZero is perhaps the best example; it learns to play games to a superhuman level in a few hours. It’s a topic of current research to try to combine them.
Moreover, tool-type AIs tend to be developed into agents, because it’s more useful to direct an agent than a tool. This is fleshed out more fully here: https://gwern.net/tool-ai
Much of my probability of non-doom is resting on people somehow not developing agents.
MuZero doesn’t seem categorically different from AlphaZero. It has to do a little bit more work at the beginning, but if you don’t get any reward for breaking the rules, you will learn not to break the rules. If MuZero is continuously learning, then so is AlphaZero. Also, the games used were still computationally simple, OOMs simpler than an open-world game, let alone a true world model. AFAIK MuZero doesn’t work on open-ended, open-world games. And AlphaStar never got to superhuman performance at human speed either.
I am in violent agreement. Nowhere did I say that MuZero could learn a world model as complicated as those LLMs currently enjoy. But it could learn continuously, and execute pretty complex strategies. I don’t know how to combine that with the breadth of knowledge or cleverness of LLMs, but if we could, we’d be in trouble.
Fun Fact of the Day: Kanye West’s WAIS score is within two points of a Fields Medalist’s (the Fields Medalist is Richard Borcherds; their respective IQs are 135 and 137).
Extra Fun Fact: Kanye West was bragging about this to Donald Trump in the Oval Office. He revealed that his digit span was only 92.5 (which is what makes me think he actually had a psychologist-administered WAIS).
Extra Extra Fun Fact: Richard Borcherds was administered the WAIS-R by Sacha Baron Cohen’s first cousin.
Possible, but seems unlikely. Unless there’s some verified record, the mere fact he may have taken a valid test is very weak evidence that his claimed scores are accurate and not exaggerated.
What if Trump is channeling his inner Doctor Strange and is crashing the economy in order to slow AI progress and buy time for alignment? Eliezer calls for an AI pause, Trump MAKES an AI pause. I rest my case that Trump is the most important figure in the history of AI alignment.
Yes, the likely outcome of a long tariff regime is that China replaces the U.S. as the hegemon + AI race leader, and they can’t read LessWrong or EA blogs there, so all this work is useless.
I think a lot of people are confused by good and courageous people and don’t understand why some people are that way. But I don’t think the answer is that confusing. It comes down to strength of conscience. For some people, the emotional pain of not doing what they think is right hurts them 1000x more than any physical pain. They hate doing what they think is wrong more than they hate any physical pain.
So if you want to be an asshole, you can say that good and courageous people, otherwise known as heroes, do it out of their own self-interest.
Sure. The people I’m talking about choose to care as much as they do. Good and courageous people can choose to not have hope and not care about others, but they choose to care.
Contrary view: The use of self-torture to promote goodness is an s-risk. The kingdom of heaven looks like people doing good deeds for each other out of love and delight, not out of guilt- and shame-avoidance.
If you’re making fun of what I’ve expressed about S-risks, go fuck yourself. If you’re not, then I think you’re naive. Anger is the main way change happens. You’ve just been raised in a society that got ravaged by Russian psy-ops that the elites encouraged to weaken the population. It can feel good to uplift others while simultaneously feeling fucking awful knowing that innocent people are suffering.
And just to be fucking clear, if you were making fun of me, please say it like a fucking man and not some fucking castrated male. If you were making fun of me you’re a low T faggot who’s not as smart as he thinks he is. There are 10 million Chinese people smarter than you.
To be clear, I only intend the last paragraph if you were being a bitch. If not then consider that it’s only addressed to a hypothetical cunty version of you.
Moderator warning: This is well outside the bounds of reasonable behavior on LW. I can tell you’re in a pretty intense emotional state, and I sympathize, but I think that’s clouding your judgment pretty badly. I’m not sure what it is you think you’re seeing in the grandparent comment, but whatever it is I don’t think it’s there. Do not try to write on LW while in that state.
Also, if it is true that a lot of people are confused by good and courageous people, I am unclear where the confusion comes from. Good behaviour gets rewarded from childhood, and bad behaviour gets punished. Not perfectly, of course, and in some places and times very imperfectly indeed, but being seen as a good person by your community’s definition of “good” has many social rewards; we’re social creatures… I am unclear where the mystery is.
Were the confused people raised by wolves/non-social animals?
I don’t actually buy the premise that a lot of people are confused by moral courage, on reflection.
This doesn’t match my experience of what good people are generally like. I find them to be often happy to do what they are doing, rather than extremely afraid of not doing it, as I imagine would be the case if their reasons for behaving as they do were related to avoidance of pain.
There are of course exceptions. But if thinking I had done the wrong thing was extremely painful to me, literally “1000x more than any physical pain” I predict I’d quite possibly land on the strategy “avoid thinking about matters of right and wrong, so as to reliably avoid finding out I’d done wrong.” A nihilistic worldview where nothing was right or wrong and everything I might do is fine, would be quite appealing. Also, since one can’t change the past, any discovery that I’d done something wrong in the past would be an unfixable, permanent source of extreme pain for the rest of my life. In that situation, I’d probably rationalize the past behaviour as somehow being good, actually, in order to make the pain stop… which does not pattern-match to being a good person long term, but rather the opposite, being someone who is pathologically unable to admit fault, and has a large bag of tricks to avoid blame.
How rare good people are depends heavily on how high your bar for qualifying as a good person is. Many forms of good-person behaviour are common, some are rare. A person who has never done anything they later felt guilty about (who has a functioning conscience) is exceedingly rare. In my personal experience, I have found people to vary on a spectrum from “kind of bad and selfish quite often, but feels bad about it when they think about it and is good to people sometimes” to “consistently good, altruistic and honest, but not perfect, may still let you down on occasion”, with rare exceptions falling outside this range.
How far along is the development of autonomous underwater drones in America? I’ve read statements by American military officials about wanting to turn the Taiwan Strait into a drone-infested death trap. And I read someone (not an expert) who said that China is racing against time to try and invade before autonomous underwater drones take off. Is that true? Are they on track?
I’m weighing my career options, and the two issues that seem most important to me are factory farming and preventing misuse/s-risks from AI. Working for a lab-grown meat startup seems like a very high-impact line of work that could also be technically interesting. I think I would enjoy that career a lot.
However, I believe that S-risks from human misuse of AI and neuroscience introduce scenarios that dwarf factory-farming in awfulness. I think that there are lots of incredibly intelligent people working on figuring out how to align AIs to who/what we want. But I don’t think there’s nearly the same amount of effort being made towards the coordination problem/preventing misuse. So naturally, I’d really like to work on solving this, but I just don’t even know how I’d start tackling this problem. It seems much harder and much less straightforward than “help make lab-grown meat cheap enough to end factory farming.” So, any advice would be appreciated.
I am pretty good at math. At a T20 math program I was chosen for special mentorship and research opportunities over several people who made Top 500 on the Putnam, due to me being deemed “more talented” (as nebulous as that phrase is, I was significantly faster in lectures than them, digested graduate texts much more quickly, and solved competition-style problems they couldn’t). My undergrad got interrupted by a health crisis so I never got a chance to actually engage in research or dedicated Putnam prep, but I believe most (maybe all, if I’m being vain) of my professors would have considered me the brightest student in my year. I don’t know a lot about programming or ML at this point, but I am confident I could learn. I’m two years into my undergrad and will likely be returning next year.
My default drive-by recommendation is that you try to get involved in research related to these issues. You could try to get advice from Chi Nguyen, who works on s-risk and is friendly and thoughtful; you can contact her here.
I got into reading about near death experiences and it seems a common theme is that we’re all one. Like each and every one of us is really just part of some omniscient god that’s so omniscient and great that god isn’t even a good enough name for it: experiencing what it’s like to be small. Sure, why not. That’s sort of intuitive to me. Given that I can’t verify the universe exists and can only verify my experience it doesn’t seem that crazy to say experience is fundamental.
But if that’s the case then I’m just left with an overwhelming sense of why. Why make a universe with three spatial dimensions? Why make yourself experience suffering? Why make yourself experience hate? Why filter your consciousness through a talking chimpanzee? If I’m an omniscient entity why would I choose this? Surely there’s got to be infinitely more interesting things to do. If we’re all god then surely we’d never get bored just doing god things.
So you can take the obvious answer that everything exists. But then you’re left with other questions. Why are we in a universe that makes sense? Why don’t we live in a cartoon operating on cartoon logic? Does that mean there’s a sentient SpongeBob? And then there’s the more pressing concern of astronomical suffering. Are there universes where people are experiencing hyperpain? Surely god wouldn’t want to experience I Have No Mouth and I Must Scream. It doesn’t seem likely to me that there are sentiences living in cartoons, so I’ll use that to take the psychologically comforting position that not everything we can imagine exists.
But if that’s the case then why this? Why this universe? Why this amount of suffering? If there’s a no-go zone of experience where is it? I have so many questions and I don’t know where the answers are.
I just can’t wrap my head around people who work on AI capabilities or AI control. My worst fear is that AI control works, power inevitably concentrates, and then the people who have the power abuse it. What is outlandish about this chain of events? It just seems like we’re trading X-risk for S-risks, which seems like an unbelievably stupid idea. Do people just not care? Are they genuinely fine with a world with S-risks as long as it’s not happening to them? That’s completely monstrous and I can’t wrap my head around it. The people who work at the top labs make me ashamed to be human. It’s a shandah.
This probably won’t make a difference, but I’ll write this anyways. If you’re working on AI-control, do you trust the people who end up in charge of the technology to wield it well? If you don’t, why are you working on AI control?
I don’t understand how working on “AI control” here is any worse than working on AI alignment (I’m assuming you don’t feel the same about alignment since you don’t mention it).
In my mind, two different ways AI could cause bad things to happen are: (1) misuse: people use the AI for bad things, and (2) misalignment: regardless of anyone’s intent, the AI does bad things of its own accord.
Both seem bad. Alignment research and control are both ways to address misalignment problems; I don’t see how they differ for the purposes of your argument (though maybe I’m failing to understand your argument).
Addressing misalignment slightly increases people’s ability to misuse AI, but I think the effect is fairly small and outweighed by the benefit of decreasing the odds a misaligned AI takes catastrophic actions.
It’s not. Alignment is de facto capabilities (the principal-agent problem makes aligned employees more economically valuable), and unless we have a surefire way to ensure that the AI is aligned to some “universal,” or even cultural, values, it’ll be aligned by default to Altman, Amodei, et al.
We don’t know of an alignment target that everyone can agree on, so solving alignment pretty much guarantees misuse by at least some people’s lights.
I mean “not solving alignment” pretty much guarantees misuse by everyone’s lights? (In both cases conditional on building ASI)
It pretty much guarantees extinction, but people can have different opinions on how bad that is relative to disempowerment, S-risks, etc.
Most s-risk scenarios vaguely analogous to historical situations don’t happen in a post-AGI world, because there humans aren’t useful for anything, either economically or in terms of maintaining power (unlike how they were throughout human history). It’s not useful for the entities in power to do any of the things with traditionally terrible side effects.
Absence of feedback loops for treating people well (at the level of humanity as a whole) is its own problem, but it’s a distinct kind of problem. It doesn’t necessarily settle poorly (at the level of individuals and smaller communities) in a world with radical abundance, if indeed even a tiny fraction of the global resources gets allocated to the future of humanity, which is the hard part to ensure.
I might be misunderstanding, but doesn’t this sort of assume that all tyranny is purely about resources?
No matter the level of abundance, it’s not clear that this makes power any less appealing to the power-hungry, or suffering any less enjoyable to the sadists. So I don’t see why power-centralisation in the wrong hands would not be a problem in a post-AGI world.
Power-centralisation in a post-AGI world is not about wielding humans, unlike in a pre-AGI world. Power is no longer power over humans doing your bidding, because humans doing your bidding won’t give you power. By orthogonality, any terrible thing can in principle be someone’s explicit intended target (an aspiration, not just a habit shaped by circumstance), but that’s rare. Usually the terrible things are (a side effect of) an instrumentally useful course of action that has other intended goals, even where in the final analysis the justification doesn’t quite work.
How bad do you think power centralization is? It’s not obvious to me that power centralization guarantees S-risk. In general, I feel pretty confused about how a human god-emperor would behave, especially because many of the reasons that pushed past dictators to draconian rule may not apply when ASI is in the picture. For example, draconian dictators often faced genuine threats to their rule from rival factions, requiring brutal purges and surveillance states to maintain power, or they were stupid / overly paranoid (an ASI advisor could help them have better epistemics), etc. I’m keen to understand your POV better.
I think most people who work on control think that it’s a necessary intermediate step towards alignment, because aligning ASI will require the use of (potentially not yet aligned) AI.
I partly agree in spirit.
Yes, concentrated power* is bad, and I for one am 100% always keeping this top of mind.
*EDIT: Too much unchecked power is bad.
But when it comes to control, it’s not at all as simple as you put it. Sure, solving control issues is not enough, but it is not bad on its own either.
First, S-risks from rogue AI seem just as likely, so why would control be a worse outcome? Maybe I misunderstand. If so, you should be clearer about what you mean.
Secondly, and more importantly, control problems need to be solved even for current and human-level AIs.
Thirdly, if we fear SI (ASI), then having control solutions applied to its progenitors can buy precious time.
Fourth, you can go for control solutions pre-SI that are decentralized.
An idea I wrote about recently is something as simple as batteries. (It illustrates the point easily.) You can rely on one battery, or on 100. You can put batteries in APIs, critical GPU clocks, etc. across various data centers, and have that service be operated by local authorities.
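To make the battery picture concrete, here is a minimal sketch of the k-of-n idea; the names, thresholds, and renewal scheme are hypothetical, purely to illustrate:

```python
# Minimal sketch of the "many batteries" idea: a GPU cluster keeps operating only
# while at least k of n independent local authorities have recently renewed their
# approval. All names, thresholds, and the renewal mechanism here are hypothetical.
import time
from dataclasses import dataclass

@dataclass
class Authority:
    name: str
    last_renewal: float  # Unix timestamp of this authority's most recent approval

def cluster_may_run(authorities: list[Authority], k: int, max_age_s: float = 3600.0) -> bool:
    """True only if at least k authorities have renewed within the last max_age_s seconds."""
    now = time.time()
    fresh = [a for a in authorities if now - a.last_renewal <= max_age_s]
    return len(fresh) >= k

# Example: five local authorities, any three of which must keep renewing approval.
authorities = [Authority(f"authority-{i}", time.time()) for i in range(5)]
print(cluster_may_run(authorities, k=3))  # True while approvals are fresh
```

The point is just that continued operation depends on a quorum rather than on any single party.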
Control solutions are factors in a game.
The end.
PS. Sometimes here it seems people unconsciously have belief in belief: that there are just one or two outcomes and one or two solutions, and that everything will resolve in one or two steps. Black-and-white thinking, in other words. We must watch out for this fallacy and remain vigilant.
What do you think is realistic if alignment is possible? Would the large corporations make a loving machine or a money-and-them-aligned machine?
I think it leads to S-risks. I think people will remain in charge and use AI as a power-amplifier. The people most likely to end up with power like having power. They like having control over other people and dominating them. This is completely apparent if you spend the (unpleasant) time reading the Epstein documents that the House has released. We need societal and governmental reform before we even think about playing with any of this technology.
The answer to the world’s problems isn’t a bunch of individuals who are good at puzzles solving a puzzle, after which we get utopia. It involves people recognizing the humanity of everyone around them and working on societal and governmental reform. And sure, this stuff sounds like a long shot, but we’ve got to try. I wish I had a less vague answer, but I don’t.
I don’t think you need to worry about individual humans aligning ASI only with themselves, because this is probably much more difficult than ensuring it has any moral value system which resembles a human one. It is much more difficult to justify only caring about Sam Altman’s interests than about humans or life forms in general, which makes it unlikely, in my opinion, that specifying this kind of allegiance in a way that is stable under self-modification is possible.
Is Intology a legitimate research lab? Today they talked about having an AI researcher that performed better than humans on RE-Bench at 64-hour time horizons. This seems really unbelievable to me. The AI system is called Locus.
I made a Manifold market about this: Is Intology’s Locus really better than humans at AI R&D?
Looks like the Manifold market on this is at 9% it’s really better than a human, with 8 participants.
I wouldn’t be surprised if it’s good enough to be noteworthy, though!
Per its LinkedIn it’s a tiny 2–10 member lab. Their only previous contribution was Zochi, a model for generating experiments and papers, with one paper seemingly accepted into ACL 2025. But there’s barely any transparency on what their model actually is, even in their technical report.
I personally see red flags with Intology too, the main one being that such performance from a tiny lab is hard to believe. On RE-Bench they compare against Sonnet 4.5, which has the best performance thus far per its model card, so them achieving superhuman results seems strange. Then there’s the fact that there seems to be no paper since these are early results, the fact that the results are all self-reported with minimal verification (a single Tsinghua student checked the kernels), and we have no technical details on the system itself or even what the underlying model is.
Another smaller lab with seemingly big contributions I can think of would be Sakana AI, but even they have far more employees and many more contributions, plus actual detailed papers for their models. And even they had an issue at one point where their CUDA Engineer system reported a 100x CUDA speedup that turned out to be cheating. Here Intology claims to get 20x–100x speedups like candy.
I just don’t understand why the people there would lie about something like this. This isn’t even very believable. It looks like the guy who founded it was a bright ML PhD and if he’s not telling the truth why would he throw away his reputation over this? Maybe it’s real but I’m pretty skeptical. I looked at their Zochi paper and I don’t see that they offered any proof that the papers they attributed to Zochi were written by Zochi.
It’s happened before; see Reflexion (I hope I’m remembering the name right) hyping up their supposed real-time learner model only for it to be a lie. Tons of papers overpromise and don’t seem to face lasting consequences. But yeah, I also don’t know why Intology would be lying; still, the fact that there’s no paper and that their deployment plans are waitlist-based and super vague (and the fact that no one ever talks about Zochi despite their beta program being old by this point) means we likely won’t ever know. They say they plan on sharing Locus’ discoveries “in the coming months”, but until they actually do there’s no way to verify beyond checking their kernel samples on GitHub.
For now I’m heavily, heavily skeptical. Agentic scaffolds don’t usually magically 10x frontier models’ performance, and we know the absolute best current models are still far from RE-Bench human performance (per their model cards, in which they also use proper scaffolding for the benchmark).
people lie about some crazy shit
Making the (tenuous) assumption that humans remain in control of AGI, won’t it just be an absolute shitshow of attempted power grabs over who gets to tell the AGI what to do? For example, supposing OpenAI is the first to AGI, is it really plausible that Sam Altman will be the one actually in charge when there will have been multiple researchers interacting with the model much earlier and much more frequently? I have a hard time believing every researcher will sit by and watch Sam Altman become more powerful than anyone ever dreamed of when there’s a chance they’re a prompt away from having that power for themselves.
You’re assuming that:
- There is a single AGI instance running.
- There will be a single person telling that AGI what to do
- The AGI’s obedience to this person will be total.
I can see these assumptions holding approximately true if we get really really good at corrigibility and if at the same time running inference on some discontinuously-more-capable future model is absurdly expensive. I don’t find that scenario very likely, though.
I see no reason why any of these will be true at first. But the end-goal for many rational agents in this situation would be to make sure 2 and 3 are true.
Correct, those goals are instrumentally convergent.
what is the plan for making task-alignment go well? i am much more worried about the possibility of being at the mercy of some god-emperor with a task-aligned AGI slave than I am about having my atoms repurposed by an unaligned AGI. the incentives for blackmail and power-consolidation look awful.
Why? I figure all the AI labs worry mostly about how to get the loot, without ensuring that there’s going to be any loot in the first place. Thus there won’t be any loot, and we’ll go extinct without any human getting to play god-emperor. It seems to me like trying to build an AGI tyranny is an alignment-complete challenge, and since we’re not remotely on track to solving alignment, I don’t worry about that particular bad ending.
the difficulty of alignment is still unknown. it may be totally impossible, or maybe some changes to current methods (deliberative alignment or constitutional ai) + some R&D automation can get us there.
The question is not whether alignment is impossible (though I would be astonished if it was), but rather whether it’s vastly easier to increase capabilities to AGI/ASI than it is to align AGI/ASI, and ~all evidence points to yes. And so the first AGI/ASI will not be aligned.
Your argument is actually possible, but what evidence do you have that makes it the likely outcome?
The very short answer is that the people with the most experience in alignment research (Eliezer and Nate Soares) say that without an AI pause lasting many decades the alignment project is essentially hopeless because there is not enough time. Sure, it is possible the alignment project succeeds in time, but the probability is really low.
Eliezer has said that AIs based on the deep-learning paradigm are probably particularly hard to align, so it would probably help to get a ban or a long pause on that paradigm even if research in other paradigms continues, but good luck getting even that because almost all of the value currently being provided by AI-based services are based on deep-learning AIs.
One would think that it would be reassuring to know that the people running the labs are really smart and obviously want to survive (and have their children survive) but it is only reassuring before one listens to what they say and reads what they write about their plans on how to prevent human extinction and other catastrophic risks. (The plans are all quite inadequate.)
This seems way overdetermined. For example, AI labs have proven extremely successful at spending arbitrary amounts of money to increase capabilities (<-> scaling laws), and there’s been no similar ability to convert arbitrary amounts of money into progress on alignment.
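For concreteness, those scaling laws are usually written as a power law in training compute; the form below is one common approximation, and the constants are illustrative placeholders rather than fitted values:

$$L(C) \approx L_{\infty} + \left(\frac{C_0}{C}\right)^{\alpha}$$

where $L$ is pretraining loss, $C$ is training compute, $L_{\infty}$ is the irreducible term, and $\alpha$ is a small positive exponent, so spending predictably buys capability; there is no analogous curve for converting spending into alignment progress.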
You’re probably right, but I guess my biggest concern is the first superhuman alignment researchers being aligned/dumb enough to explain to the companies how control works. It really depends on whether self-awareness is present as well.
Everything feels so low-stakes right now compared to future possibilities, and I am envious of people who don’t realize that. I need to spend less time thinking about it, but I still can’t wrap my head around people rolling a die which might have S-risks on it. It just seems like a -inf EV decision. I do not understand the thought process of people who see -inf and just go “yeah I’ll gamble that.” It’s so fucking stupid.
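To spell out the arithmetic behind “-inf EV” (a toy formalization; the labels and probabilities are placeholders):

$$\mathbb{E}[U] = p_{\text{good}}\,U_{\text{good}} + p_{\text{ext}}\,U_{\text{ext}} + p_{S}\,(-\infty) = -\infty \quad \text{whenever } p_{S} > 0$$

On this accounting, no finite upside compensates: any nonzero probability of an S-risk outcome drags the whole expectation to negative infinity.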
They are not necessarily “seeing” -inf in the way you or me are. They’re just kinda not thinking about it, or think that 0 (death) is the lowest utility can realistically go.
What looks like an S-risk to you or me may not count as -inf for some people.
I think humanity’s actions right now are most comparable to those of a drug addict. We as a species don’t have the necessary equivalent of executive function and self-control to abstain from racing towards AGI. And if we’re gonna do it anyway, those that shout about how we’re all gonna die just ruin everyone’s mood.
Or, for that matter, to abstain from burning fossil fuels without limit. We happen to not live on a planet with enough carbon to trigger a Venus-like cascade, but if that weren’t the case I don’t know if we could stop ourselves from doing that either.
The thing is, any kind of large-scale coordination to that effect seems more and more like it would require a degree of removal of agency from individuals that I’d call dystopian. You can’t be human and free without a freedom to make mistakes. But the higher the stakes, and the greater the technological power we wield, the less tolerant our situation becomes of mistakes. So the alternative would be that we need to willingly choose to slow down, or abort entirely, certain branches of technological progress—choosing shorter and more miserable lives over the risk of having to curtail our freedom. But of course, for the most part (not unreasonably!), we don’t really want to take that trade-off, and ask “why not both?”.
True but that’s just for relatively “mild” S-risks like “a dystopia in which AI rules the world, sees all and electrocutes anyone who commits a crime by the standards of the year it was created in, forever”. It’s a bad outcome, you could classify it as S-risk, but it’s still among the most aligned AIs imaginable and relatively better than extinction.
I simply don’t think many people think about what an S-risk literally worse than extinction would look like. To be fair, I also think these aren’t very likely outcomes, as they would require an AI very aligned to human values—if aligned for evil.
No, I mean, I think some people actually hold that any existence is better than non-existence, so death is -inf for them and existence, even in any kind of hellscape, is above-zero utility.
I just think any such people lack imagination. I am 100% confident there exists an amount of suffering that would have them wish for death instead; they simply can’t conceive of it.
One way to make this work is to just not consider your driven-to-madness future self an authority on the matter of what’s good or not. You can expect to start wishing for death, and still take actions that would lead you to this state, because present!you thinks that existing in a state of wishing for death is better than not existing at all.
I think that’s perfectly coherent.
I mean, I guess it’s technically coherent, but it also sounds kind of insane. That way Dormammu lies.
Why would one even care about their future self if they’re so unconcerned about that self’s preferences?
This just boils down to “humans aren’t aligned,” and that fact is why this would never work, but I still think it’s worth bringing up. Why are you required to get a license to drive, but not to have children? I don’t mean this literally; I’m just referring to how casually the decision to have children is treated by much of society. Bringing someone into existence is vastly higher stakes than driving a car.
I’m sure this isn’t implementable, but parents should at least be screened for personality disorders before they’re allowed to have children. And sure, that’s a slippery slope, and sure, many of the most powerful people just want workers to furnish their quality of life regardless of the workers’ QOL. But bringing a child into the world whom you can’t properly care for can lead to a lifetime of avoidable suffering.
I was just reading about “genomic liberty,” and the idea that parents would choose to make their kids’ IQ lower than it could be, or would even choose for their children to have disabilities like their own, is completely ridiculous. It just made me think, “those people shouldn’t have the liberty of being parents.” Bringing another life into existence is not a casual decision like choosing where to work or live. And the obligation should be to the children, not the parents.
Historically, attempts to curtail this right have led to really, really dark places. Part of living in a society with rights and laws is that people will do bad things the legal system has no ability to prevent. And on net, that’s a good thing. See also.
There is also the related problem of intelligence being negatively correlated with fertility, which leads to a dysgenic trend. Even if preventing people below a certain level of intelligence from having children were realistically possible, it would make another problem more severe: the fertility of smarter people is far below replacement, leading to quickly shrinking populations. Though fertility is likely partially heritable, and would go up again after some generations, once the descendants of the (currently rare) high-fertility people start to dominate.
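(A minimal toy sketch of that last selection dynamic, with entirely made-up parameters and a one-parent-per-child simplification, just to show the direction of the effect, not to forecast anything:)

```python
# Toy model of selection on a partially heritable fertility trait.
# All numbers are invented for illustration; one parent per child,
# so "replacement" here is 1.0 expected children per person.
import numpy as np

rng = np.random.default_rng(0)
h = 0.3  # assumed heritability of the trait

def baseline(n):
    # fresh draws of the trait, centered below replacement
    return rng.normal(0.9, 0.4, size=n).clip(0)

pop = baseline(10_000)                    # each person's expected number of children
for gen in range(10):
    kids = rng.poisson(pop)               # realized children per person
    parent_trait = np.repeat(pop, kids)   # each child gets one parent's trait value
    # child trait = partly inherited, partly a fresh draw from the baseline
    pop = h * parent_trait + (1 - h) * baseline(parent_trait.size)
    print(f"gen {gen}: pop={pop.size:6d}  mean fertility={pop.mean():.3f}")

# The population keeps shrinking in this toy run, but mean fertility drifts
# upward over generations, because high-fertility lineages make up a growing
# share of each new generation.
```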
>be me, omnipotent creator
>decide to create
>meticulously craft laws of physics
>big bang
>pure chaos
>structure emerges
>galaxies form
>stars form
>planets form
>life
>one cell
>cell eats other cell, multicellular life
>fish
>animals emerge from the oceans
>numerous opportunities for life to disappear, but it continues
>mammals
>monkeys
>super smart monkeys
>make tools, control fire, tame other animals
>monkeys create science, philosophy, art
>the universe is beginning to understand itself
>AI
>Humans and AI together bring superintelligence online
>everyone holds their breath
>superintelligence turns everything into paper clips
>mfw infinite kek
From what I understand, JVN, Poincaré, and Terence Tao all had/have issues with perceptual intuition/mental visualization. JVN had “the physical intuition of a doorknob,” Poincaré was tested by Binet and had extremely poor perceptual abilities, and Tao (at least as a child) mentioned finding mental rotation tasks “hard.”
I also fit a (much less extreme) version of this pattern, which is why I’m interested in this in the first place. I am (relatively) good at visual pattern recognition and math, but I have aphantasia and have an average visual working memory. I felt insecure about this for a while, but seeing that much more intelligent people than me had a similar (but more extreme) cognitive profile made me feel better.
Does anybody have a satisfactory explanation for this profile beyond a simplistic “tradeoffs” explanation?
Edit: Some claims about JVN/Poincare may have been hallucinated, but they are based at least somewhat on reality. See my reply to Steven
(Not really answering your question, just chatting.)
What’s your source for “JVN had ‘the physical intuition of a doorknob’”? Nothing shows up on google. I’m not sure quite what that phrase is supposed to mean, so context would be helpful. I’m also not sure what “extremely poor perceptual abilities” means exactly.
You might have already seen this, but Poincaré writes about “analysts” and “geometers”:
Not sure exactly how that relates, if at all. (What category did Poincaré put himself in? It’s probably in the essay somewhere, I didn’t read it that carefully. I think geometer, based on his work? But Tao is extremely analyst, I think, if we buy this categorization in the first place.)
I’m no JVN/Poincaré/Tao, but if anyone cares, I think I’m kinda aphantasia-adjacent, and I think that fact has something to do with why I’m naturally bad at drawing, and why, when I was a kid doing math olympiad problems, I was worse at Euclidean geometry problems than my peers who got similar overall scores.
Oh, I was actually hoping you’d reply! I may have hallucinated the exact quote I mentioned, but here is something from Ulam: “Ulam on physical intuition and visualization,” on Steve Hsu’s blog. And I might have hallucinated the thing about Poincaré being tested by Binet; that might just be an urban legend I didn’t verify. You can find Poincaré’s struggles with coordination and dexterity in “Men of Mathematics,” but that’s a lot less extreme than the story I passed on. I am confident in Tao’s preference for analysis over visualization. If you have the time, look up “Terence Tao” on Gwern’s website.
I’m not very familiar with the field of neuroscience, but it seems to me that we’re probably pretty far from being able to provide a satisfactory answer to these questions. Is that true from your understanding of where the field is at? What sorts of techniques/technology would we need to develop in order for us to start answering these questions?
In case anyone else is going looking, here is the relevant account of Tao as a child and here is a screenshot of the most relevant part:
Apologies in advance if this is a midwit take. Chess engines are “smarter” than humans at chess, but they aren’t automatically better at real-world strategizing as a result. They don’t take over the world. Why couldn’t the same be true for STEMlord LLM-based agents?
It doesn’t seem like any of the companies are anywhere near AI that can “learn” or generalize in real time like a human or animal. Maybe a superintelligent STEMlord could hack their way around learning, but that still doesn’t seem the same as or as dangerous as fooming, and it also seems much easier to monitor. Does it not seem plausible that the current paradigm drastically accelerates scientific research while remaining tools? The counter is that people will just use the tools to try and figure out learning. But we don’t know how hard learning is, and the tools could also enable people to make real progress on alignment before learning is cracked.
Welcome to Less Wrong. Sometimes I like to go around engaging with new people, so that’s what I’m doing.
On a sentence-by-sentence basis, your post is generally correct. It seems like you’re disagreeing with something you’ve read or heard. But I don’t know what you read, so I can’t understand what you’re arguing for or against. I could guess, but it would be better if you just said.
hi, thank you! i guess i was thinking about claims that “AGI is imminent and therefore we’re doomed.” it seems like if you define AGI as “really good at STEM,” then it is obviously imminent. but if you define it as “capable of continuous learning like a human or animal,” that’s not true. we don’t know how to build it, and we can’t even run a fruit-fly connectome on the most powerful computers we have for more than a couple of seconds without the instance breaking down: how would we expect to run something OOMs more complex and intelligent? “being good at STEM” seems like a much, much simpler and less computationally intensive task than continuous, dynamic learning. tourist is great at codeforces, but he obviously doesn’t have the ability to take over the world (i am making the assumption that anyone with the capability to take over the world would do so). the second (taking over the world) is a much, much fuzzier, more computationally complex task than the first (being great at codeforces).
i had just been in a deep depression for a while (it’s embarrassing, but this started with GPT-4) because i thought some AI in the near future was going to wake up, become god, and pwn humanity. but when i think about it from this perspective, that future seems much less likely. in fact, the future (at least in the near-term) looks very bright. and i can actually plan for it, which feels deeply relieving to me.
For me, depression has been independent of the probability of doom. I’ve definitely been depressed, but I’ve been pretty cheerful for the past few years, even as the apparent probability of near-term doom has been mounting steadily. I did stop working on AI, and tried to talk my friends out of it, which was about all I could do. I decided not to worry about things I can’t affect, which has clarified my mind immensely.
The near-term future does indeed look very bright.
Hey Carl, sorry to bother you; what I’m about to say is pretty irrelevant to the discussion, but I’m a high school student looking to gather good research experience and I wanted to ask a few questions. Is there any place I can reach out to you other than here? I would greatly appreciate any and all help!
You shouldn’t worry about whether something “is AGI”; it’s an ill-defined concept. I agree that current models are lacking the ability to accomplish long-term tasks in the real world, and this keeps them safe. But I don’t think this is permanent, for two reasons.
Current large-language-model-type AI is not capable of continuous learning, it is true. But AIs which are capable of it have been built. AlphaZero is perhaps the best example; it learns to play games to a superhuman level in a few hours. Combining the two approaches is a topic of current research.
Moreover, tool-type AIs tend to be developed into agents, because it’s more useful to direct an agent than a tool. This is fleshed out more fully here: https://gwern.net/tool-ai
Much of my probability of non-doom is resting on people somehow not developing agents.
Whoops, meant MuZero instead of AlphaZero.
MuZero doesn’t seem categorically different from AlphaZero. It has to do a little more work at the beginning, but if you don’t get any reward for breaking the rules, you will learn not to break the rules. If MuZero is continuously learning, then so is AlphaZero. Also, the games used were still computationally simple, OOMs simpler than an open-world game, let alone a true world model. AFAIK MuZero doesn’t work on open-ended, open-world games. And AlphaStar never got to superhuman performance at human speed either.
I am in violent agreement. Nowhere did I say that MuZero could learn a world model as complicated as those LLMs currently enjoy. But it could learn continuously, and execute pretty complex strategies. I don’t know how to combine that with the breadth of knowledge or cleverness of LLMs, but if we could, we’d be in trouble.
Fun Fact of the Day: Kanye West’s WAIS score is within two points of a Fields Medalist’s (the Fields Medalist is Richard Borcherds; their respective IQs are 135 and 137).
Extra Fun Fact: Kanye West was bragging about this to Donald Trump in the Oval Office. He revealed that his digit span was only 92.5 (which is what makes me think he actually had a psychologist-administered WAIS).
Extra Extra Fun Fact: Richard Borcherds was administered the WAIS-R by Sacha Baron Cohen’s first cousin.
(For reference, 135 is 2.33 SDs, which works out to about 1 in 100, i.e. you’re the WAISest person in the room with 100 randomly chosen adults. Cf. https://tsvibt.blogspot.com/2022/08/the-power-of-selection.html#samples-to-standard-deviations )
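(If anyone wants to sanity-check that arithmetic, here is a minimal sketch, assuming the usual mean-100, SD-15 scaling and a normal distribution; it is not a claim about any particular test report:)

```python
# Sanity check of "135 is about 2.33 SDs, i.e. roughly 1 in 100",
# assuming WAIS-style scaling (mean 100, SD 15) and a normal distribution.
from scipy.stats import norm

score = 135
z = (score - 100) / 15        # standard deviations above the mean
tail = norm.sf(z)             # fraction of adults expected to score at or above 135
print(f"z = {z:.2f}, upper tail = {tail:.4f} (about 1 in {1 / tail:.0f})")
```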
Interesting, seems believable. Being intelligent probably helps a lot with being a successful musician.
Possible, but seems unlikely. Unless there’s some verified record, the mere fact he may have taken a valid test is very weak evidence that his claimed scores are accurate and not exaggerated.
What if Trump is channeling his inner doctor strange and is crashing the economy in order to slow AI progress and buy time for alignment? Eliezer calls for an AI pause, Trump MAKES an AI pause. I rest my case that Trump is the most important figure in the history of AI alignment.
Trump shot an arrow into the air; it fell to Earth, he knows not where...
Probably one of the best succinct summaries of every damn week that man is president lmao
If that was his goal, he has better options.
Yes, the likely outcome of a long tariff regime is that China replaces the U.S. as the hegemon and AI race leader, and they can’t read LessWrong or EA blogs there, so all this work is useless.
LessWrong is uncensored in China.
VPNs exist and are probably widely used in China + much of “all this work” is on ArXiv etc.
I think a lot of people are confused by good and courageous people and don’t understand why some people are that way. But I don’t think the answer is that confusing. It comes down to strength of conscience. For some people, the emotional pain of not doing what they think is right hurts them 1000x more than any physical pain. They hate doing what they think is wrong more than they hate any physical pain.
So if you want to be an asshole, you can say that good and courageous people, otherwise known as heroes, do it out of their own self-interest.
People can just decide to do things of their own volition, without peculiar arrangements of pain or pleasure being in charge of their will.
Sure. The people I’m talking about choose to care as much as they do. Good and courageous people can choose to not have hope and not care about others, but they choose to care.
I claim that I am unusually Good (people who know me well would agree—many of them have said as much, unprompted). This is not how it works for me.
Contrary view: The use of self-torture to promote goodness is an s-risk. The kingdom of heaven looks like people doing good deeds for each other out of love and delight, not out of guilt- and shame-avoidance.
If you’re making fun of what I’ve expressed about S-risks, go fuck yourself. If you’re not, then I think you’re naive. Anger is the main way change happens. You’ve just been raised in a society that got ravaged by Russian psy-ops that the elites encouraged to weaken the population. It can feel good to uplift others while simultaneously feeling fucking awful knowing that innocent people are suffering.
And just to be fucking clear, if you were making fun of me, please say it like a fucking man and not some fucking castrated male. If you were making fun of me you’re a low T faggot who’s not as smart as he thinks he is. There are 10 million Chinese people smarter than you.
To be clear, I only intend the last paragraph if you were being a bitch. If not then consider that it’s only addressed to a hypothetical cunty version of you.
Moderator warning: This is well outside the bounds of reasonable behavior on LW. I can tell you’re in a pretty intense emotional state, and I sympathize, but I think that’s clouding your judgment pretty badly. I’m not sure what it is you think you’re seeing in the grandparent comment, but whatever it is I don’t think it’s there. Do not try to write on LW while in that state.
I understand. I’ll try to keep it more civil.
Also, if it is true that a lot of people are confused by good and courageous people, I am unclear where the confusion comes from. Good behaviour gets rewarded from childhood, and bad behaviour gets punished. Not perfectly, of course, and in some places and times very imperfectly indeed, but being seen as a good person by your community’s definition of “good” has many social rewards, we’re social creatures… I am unclear where the mystery is.
Were the confused people raised by ~~wolves~~ non-social animals?
I don’t actually buy the premise that a lot of people are confused by moral courage, on reflection.
This doesn’t match my experience of what good people are generally like. I find them to be often happy to do what they are doing, rather than extremely afraid of not doing it, as I imagine would be the case if their reasons for behaving as they do were related to avoidance of pain.
There are of course exceptions. But if thinking I had done the wrong thing was extremely painful to me, literally “1000x more than any physical pain” I predict I’d quite possibly land on the strategy “avoid thinking about matters of right and wrong, so as to reliably avoid finding out I’d done wrong.” A nihilistic worldview where nothing was right or wrong and everything I might do is fine, would be quite appealing. Also, since one can’t change the past, any discovery that I’d done something wrong in the past would be an unfixable, permanent source of extreme pain for the rest of my life. In that situation, I’d probably rationalize the past behaviour as somehow being good, actually, in order to make the pain stop… which does not pattern-match to being a good person long term, but rather the opposite, being someone who is pathologically unable to admit fault, and has a large bag of tricks to avoid blame.
It’s not fear. It’s anger. Also good people are rare. The people you think of as good people are likely just friendly.
How rare good people are depends heavily on how high your bar for qualifying as a good person is. Many forms of good-person behaviour are common, some are rare. A person who has never done anything they later felt guilty about (who has a functioning conscience) is exceedingly rare. In my personal experience, I have found people to vary on a spectrum from “kind of bad and selfish quite often, but feels bad about it when they think about it and is good to people sometimes” to “consistently good, altruistic and honest, but not perfect, may still let you down on occasion”, with rare exceptions falling outside this range.
How far along is the development of autonomous underwater drones in America? I’ve read statements by American military officials about wanting to turn the Taiwan Strait into a drone-infested death trap. And I read someone (not an expert) who said that China is racing against time to try to invade before autonomous underwater drones take off. Is that true? Are they on track?
I’m weighing my career options, and the two issues that seem most important to me are factory farming and preventing misuse/s-risks from AI. Working for a lab-grown meat startup seems like a very high-impact line of work that could also be technically interesting. I think I would enjoy that career a lot.
However, I believe that S-risks from human misuse of AI and neuroscience introduce scenarios that dwarf factory-farming in awfulness. I think that there are lots of incredibly intelligent people working on figuring out how to align AIs to who/what we want. But I don’t think there’s nearly the same amount of effort being made towards the coordination problem/preventing misuse. So naturally, I’d really like to work on solving this, but I just don’t even know how I’d start tackling this problem. It seems much harder and much less straightforward than “help make lab-grown meat cheap enough to end factory farming.” So, any advice would be appreciated.
What are your skill sets?
Forethought has done work recently related to preventing S-risk arising from AI.
I’m pretty in favor of trying to tackle the most important cause area.
I am pretty good at math. At a T20 math program, I was chosen for special mentorship and research opportunities over several people who made Top 500 on the Putnam, due to being deemed “more talented” (as nebulous as that phrase is: I was significantly faster in lectures, digested graduate texts much more quickly, and could solve competition-style problems they couldn’t). My undergrad got interrupted by a health crisis, so I never got a chance to actually engage in research or dedicated Putnam prep, but I believe most (maybe all, if I’m being vain) of my professors would have considered me the brightest student in my year. I don’t know a lot about programming or ML at this point, but I am confident I could learn. I’m two years into my undergrad and will likely be returning next year.
My default drive-by recommendation is that you try to get involved in research related to these issues. You could try to get advice from Chi Nguyen, who works on s-risk and is friendly and thoughtful; you can contact her here.
Thank you so much! I will contact her.
I got into reading about near death experiences and it seems a common theme is that we’re all one. Like each and every one of us is really just part of some omniscient god that’s so omniscient and great that god isn’t even a good enough name for it: experiencing what it’s like to be small. Sure, why not. That’s sort of intuitive to me. Given that I can’t verify the universe exists and can only verify my experience it doesn’t seem that crazy to say experience is fundamental.
But if that’s the case then I’m just left with an overwhelming sense of why. Why make a universe with three spatial dimensions? Why make yourself experience suffering? Why make yourself experience hate? Why filter your consciousness through a talking chimpanzee? If I’m an omniscient entity why would I choose this? Surely there’s got to be infinitely more interesting things to do. If we’re all god then surely we’d never get bored just doing god things.
So you can take the obvious answer that everything exists. But then you’re left with other questions. Why are we in a universe that makes sense? Why don’t we live in a cartoon operating on cartoon logic? Does that mean there’s a sentient SpongeBob? And then there’s the more pressing concern of astronomical suffering. Are there universes where people are experiencing hyperpain? Surely god wouldn’t want to experience I Have No Mouth and I Must Scream. It doesn’t seem likely to me that there are sentiences living in cartoons, so I’ll use that to take the psychologically comforting position that not everything we can imagine exists.
But if that’s the case then why this? Why this universe? Why this amount of suffering? If there’s a no-go zone of experience where is it? I have so many questions and I don’t know where the answers are.