There are only three ways we survive the Intelligence Explosion: Change my mind!
The way I see it, there are only three ways we survive the intelligence explosion (that don’t involve us just “getting lucky”). Most “AI safety research” is completely irrelevant to superintelligence.

The way I see us getting to ASI is from AIs that are more capable at AI research than the most competent human AI researchers. At that point, things start to go vertical: you use those AIs to build better AIs, which build better AIs. The problem with most alignment research is that, once this kind of recursive self-improvement (RSI) occurs, there is no reason the more capable AIs being built need to be LLMs. There is also no need for them to be Transformers, or even to use neural networks. I doubt Transformers, or even neural networks, are the most effective way to build an extremely powerful AI. If we really get “a country of geniuses in a datacenter”, they will find new algorithms and methods that would have taken humans decades to discover. They will probably use architectures we have never even thought of, and predicting which ones would be like trying to predict Transformers after the release of ELIZA.

Because of this, there seem to be only three plausible approaches that give us a good chance of survival, and all other avenues are a waste of time and talent.
Automating Alignment
If the AIs are building more powerful AIs, they could (in theory) also align them to their values. The obvious problems with this right now are that AI models may not be intelligent or coherent enough to fully understand their own values, and may not be capable enough to tell whether an AI they have created is misaligned with their goals. Another issue is value degradation across iterations: a small discrepancy, compounded over a million iterations, quickly becomes a massive one. Mechanistic interpretability (mech interp) might be useful here, if it can tell us that a current model is being honest and is actually trying to align a new model the way we think it is, but those mech interp tools will likely fail as the AI architectures get more and more alien to us. There is some work being done here, but it seems like this should be the core goal of every major lab.
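To make the compounding worry concrete, here is a toy calculation (purely illustrative; it assumes each handoff introduces a small, independent loss of fidelity, and is not a model of any real training process): if each generation passes on its values with fidelity $1-\varepsilon$, then after $n$ generations the retained fidelity is roughly

$$(1-\varepsilon)^n \approx e^{-n\varepsilon}, \qquad \text{e.g. } \varepsilon = 10^{-6},\ n = 10^{6} \;\Rightarrow\; e^{-1} \approx 0.37.$$

Even a one-in-a-million discrepancy per iteration leaves only about a third of the original fidelity after a million iterations, and that is the optimistic case where the errors don’t correlate or amplify each other.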
A theory of Universal Artificial Intelligence / AIXI Alignment
If we can’t predict what the architecture will look like, maybe we can gain a deeper understanding of what components it would need to have, what universal rules hold for all intelligent agents, or whether there is some form of Universal Alignment. Very few people seem to still be pursuing this avenue, with even MIRI pivoting to governance.
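For concreteness, the object of study here is something like Hutter’s AIXI, which defines an architecture-independent “optimal” agent: at step $k$ it chooses the action that maximizes expected total reward over a Solomonoff-style mixture of every computable environment,

$$a_k := \arg\max_{a_k}\sum_{o_k r_k}\cdots\max_{a_m}\sum_{o_m r_m}\big[r_k+\cdots+r_m\big]\sum_{q\,:\,U(q,\,a_1\ldots a_m)\,=\,o_1 r_1\ldots o_m r_m}2^{-\ell(q)},$$

where $U$ is a universal Turing machine, $q$ ranges over environment programs, $\ell(q)$ is program length, and $m$ is the horizon. The appeal of this direction is that anything you could prove about alignment at this level of generality would survive whatever architectures the AIs eventually invent.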
Strict governmental control / oversight / pause
This would require very strong and competent governmental agencies. It might mean a complete halt to progress, or extremely rigid and carefully monitored development, where every iteration has to be closely checked before the next one is approved. People might then have enough time to study the new architectures being developed by the AIs and work out how to align them. Most AI policy orgs, with only a few exceptions, do not seem to be aiming high enough: afraid of seeming extremist, they propose measures that will likely do nothing to address the core problem.
I would love to hear why I’m wrong, or why there is any reason to think ASI will resemble current AI models in any recognizable way. Obviously, many people hold some (or all) of the views I’m describing, so I’m not pretending to be the first to notice this, but I’m curious to hear from the other side: what is the rationale for thinking that any other form of research or policy is at all useful for aligning an ASI?
Claude doesn’t get it
In all my interactions with AI, there has been one recurring problem that doesn’t seem to go away: they don’t get it. I don’t know how to explain this in any better language than that, and I don’t know how to build a “Get It” benchmark. But whenever I talk to Claude, ChatGPT, Gemini, or any other model about a concept, the longer the interaction lasts, the more I get the sense that it doesn’t really “get it”.

In this way, I think AI skeptics are pointing at something real when they say these models aren’t “real intelligence”. A lot of their arguments are poor, but something truly is missing, and it doesn’t seem to change with improvements in other capabilities. GPT-5.3 seems just as bad at “getting it” as GPT-3.5. It’s not that Claude “gets” simpler concepts but struggles with more complex ones; it doesn’t seem to “get” any concepts at all, simple or otherwise.

I don’t know what this means going forward, or whether “getting it” is even needed to start an intelligence explosion. I can imagine cases where an AI could be a capable AI researcher without really “getting it”, just as models can be capable coders without it. But it could be another obstacle to alignment: the AI creating a smarter AI might fail to align it, not because it is misaligned, but simply because it is too stupid to really grasp its own values.