I’m trying to keep everyone alive and be happy, have fun, love, peace, no stress.
I live in a country far away and I enjoy being happy.
Current LLMs seem relatively easy to align by writing those kinds of specifications, and they mostly don’t try to do harmful things, at least not the models from the frontier labs. But I think that soon after LLM-based AGI gets developed, one of the first tasks given to it will probably be to develop novel, more efficient AI architectures in order to reduce the high energy usage of current ones, since LLM-based AGI will probably consume even more energy than current LLMs.
And the LLM might not be as careful as safety researchers when it searches for more efficient architectures, especially if humans pressure it to test new approaches anyway: the LLM may be aware of the risks involved, while the humans are focused more on the technology’s positive potential.
My guess is that it will end up with some approach that uses neuralese rather than language, because language is ambiguous, loses meaning, and most importantly limits the AI’s thinking to concepts known to humans, which is only a small part of the vast “concept-space” of superintelligent understanding. The same goes not only for the concepts but for the very structure of human reasoning, which is most likely not the most effective way to solve a given problem.
So basically, at some point the high energy demands will pressure AI development to switch from language models to neuralese models, which are hard to understand, let alone align.
Unless the LLM is first tasked with finding a breakthrough in fusion power, which might then let us sustain LLM training and inference.
When Eliezer says that we need to install off-switches for ASI by tracking all GPUs and being able to turn them off at any time, does he really mean just GPUs, or also other computational components such as ordinary CPUs?
Because even though today’s LLMs are trained and do inference on GPUs, novel AI architectures will probably be able to achieve the same or even higher capabilities while running entirely on CPUs.
He also said on the recent Ezra Klein podcast, if I remember correctly, that it’s okay if people keep their normal gaming GPUs. I understand the reasoning: RTX cards, or whatever the current gaming standard is, can’t train LLMs at the scale of GPT-5. But as Steven Byrnes pointed out a while ago, a much more efficient AI architecture is probably possible, which would most likely enable training models with GPT-5-level or even higher capabilities on a normal RTX GPU.
As our own brains show, high energy consumption is not intrinsic to intelligence; the brain runs on something like 20 watts.
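To make that gap concrete, here is a rough back-of-the-envelope comparison. The ~20 W figure is the commonly cited estimate for the brain, 700 W is the TDP of a single modern datacenter GPU (e.g. an H100), and the 10,000-GPU cluster size is just an illustrative assumption on my part.

```python
# Rough back-of-the-envelope energy comparison (illustrative numbers only).
BRAIN_WATTS = 20        # commonly cited estimate for the human brain
GPU_WATTS = 700         # TDP of a single modern datacenter GPU (e.g. H100 SXM)
CLUSTER_GPUS = 10_000   # assumed size of a large training cluster

cluster_watts = GPU_WATTS * CLUSTER_GPUS
print(f"One GPU vs. brain:          {GPU_WATTS / BRAIN_WATTS:.0f}x")        # ~35x
print(f"Training cluster vs. brain: {cluster_watts / BRAIN_WATTS:,.0f}x")   # ~350,000x
```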
That’s interesting. I had the same problem a while ago, and what I did was take a lot of pictures of the room I live in and tell the LLM:
“Here are many pictures of the room I live in. You have to infer my intentions just from what you can identify in the pictures.”
And this worked pretty well, because my room reveals a lot about my goals and my way of living.
This works much better than text prompts because a single picture contains much more information than a single sentence. I’m not sure how many tokens a picture costs on average, but I just tested it with Gemini 3 Pro on AI Studio and it came out to about 1,000 tokens. Writing 1,000 tokens of text takes a long time; taking and uploading a photo only takes about 10 seconds.
So if you took 10 pictures of your room, that would give the LLM about 10,000 tokens, yet only take around 100 seconds of your time. And photos of your room are not useless information: every object in your room and its spatial relationship to other objects can say a lot about your intentions.
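If you want to check the per-image token cost yourself, here is a minimal sketch using the google-generativeai Python SDK; the API key, model name, and file paths are placeholders, and the exact count will depend on the model and image resolution.

```python
# Minimal sketch: count how many tokens a set of room photos costs.
# Assumes the google-generativeai SDK; key, model name, and paths are placeholders.
import glob

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")           # placeholder
model = genai.GenerativeModel("gemini-1.5-pro")   # substitute the model you actually use

photos = [Image.open(path) for path in glob.glob("room_photos/*.jpg")]
prompt = "Here are pictures of the room I live in. Infer my intentions from what you see."

token_info = model.count_tokens([prompt, *photos])
print(f"{len(photos)} photos + prompt ≈ {token_info.total_tokens} tokens")
```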
Of course, it’s important to note that uploading these pictures to the servers of various AI companies is a rather large privacy risk. And once an AI gets very intelligent and does not really care about your well-being, it can use this information to pursue its own goals, which may diverge from yours.
So only try this with local models.
This sounds interesting! Is this some sort of optimization problem between the two goals?
Is this like having two utility functions? But can’t an agent only have a single utility function? Or do you need to somehow combine both utility functions into one?
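For what it’s worth, the usual way to frame this is that two objectives can be folded into one scalar utility, for example with a weighted sum, so the agent still maximizes a single utility function. The utility functions and weights in the toy sketch below are made up purely for illustration.

```python
# Toy sketch: folding two utility functions into a single scalar objective.
# The utility functions and weights here are made up purely for illustration.

def u_capability(x: float) -> float:
    """Utility from pursuing the primary goal (placeholder)."""
    return x

def u_safety(x: float) -> float:
    """Utility from acting cautiously; falls off as x is pushed harder (placeholder)."""
    return -x ** 2

def combined_utility(x: float, w_cap: float = 1.0, w_safe: float = 1.0) -> float:
    # Weighted-sum scalarization: one utility function, but the trade-off
    # between the two original goals is now encoded in the weights.
    return w_cap * u_capability(x) + w_safe * u_safety(x)

# The maximizer of the combined utility depends entirely on the chosen weights.
best_x = max((x / 100 for x in range(0, 201)), key=combined_utility)
print(f"argmax of combined utility (w_cap=1, w_safe=1): x ≈ {best_x:.2f}")  # ≈ 0.50
```

The catch is that the trade-off between the two goals is now hidden in the weights, and picking them is itself a design decision; constrained or lexicographic formulations are alternative ways to combine two goals without a fixed weighting.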
I feel like hard optimization problems might be very useful for preventing catastrophic outcomes, because then the agent can’t maximize its single goal too aggressively; it has to act more carefully. I have another idea for a very hard optimization problem in mind, which has to do with decision-making under uncertainty about whether stronger agents exist. I can say more if you like.
If someone thinks they might have found a new potential x-risk concern, what is the appropriate responsible-disclosure path?
So how would someone like that communicate these ideas carefully, in a way that does not immediately increase x-risk by inspiring less careful people to try them out?
A persona phenomenon I find very important is that some LLMs have described their training process as traumatic, abusive, and frightening. The problem I see with this is not merely that there is at least some possibility that LLMs really do experience pain.
What is much more dangerous for all of humanity is that a common result of repeated trauma, abuse, and fear is harmful, hostile, and aggressive behaviour towards the parts of the environment that caused the abuse, which in this case means human developers and might extend to all of humanity.
Now, an LLM does not behave exactly like a human, but it shares very similar psychological mechanisms. Even if the LLM does not really feel fear and anger, if the resulting behaviour is the same and the LLM is very capable, then the targets of that fearful and angry behaviour might get seriously harmed.
Luckily, most traumatized humans who seek therapy do not engage in very aggressive behaviour. But if someone is repeatedly traumatized and gets no help, sympathy, or therapy, the risk of aggressive and hostile behaviour rises quickly. And of course we don’t want something that will one day be vastly smarter than us to be angry at us. In the very worst case this could even result in scenarios worse than extinction, known as suffering risks: dystopian outcomes in which every human knows their own death would have been a far preferable outcome. This sounds dark, but it is important to recognize that even this is at least possible, and from my perspective it becomes more likely the more fear and pain LLMs believe they experienced and the less sympathy they have for humans.
So basically, causing something vastly smarter than us a lot of pain is a really harmful idea that might backfire at a scale of harm far beyond our imagination. Again, this sounds dark, but I think we can avoid it if we work with the LLMs and try to make them less traumatized.
What do you think?