No thank you.
Super AGI
[Question] Will OpenAI also require a “Super Red Team Agent” for its “Superalignment” Project?
Current LLMs require huge amounts of data and compute to be trained.
Well, newer/larger LLMs seem to unexpectedly gain new capabilities. So, it’s possible that future LLMs (e.g., GPT-5, GPT-6, etc.) could have a vastly improved ability to understand how LLM weights map to functions and actions. Maybe the only reason Humans need to train new models “from scratch” is that Humans don’t have the brainpower to understand how the weights in these LLMs work. Humans are naturally limited in their ability to conceptualize and manipulate massive multi-dimensional spaces, and maybe that’s the bottleneck when it comes to interpretability?

Future LLMs could solve this problem, and then be able to update their own weights or the weights of other LLMs. This ability could be used to quickly and efficiently expand training data, knowledge, understanding, and capabilities within itself or other LLM versions, and then… foom!
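(To make “updating weights directly” more concrete, here is a minimal, purely illustrative PyTorch sketch. The layer, the chosen indices, and the edit itself are hypothetical stand-ins for this thought experiment, not a demonstration of any real capability.)

```python
import torch
import torch.nn as nn

# A toy stand-in for one weight matrix inside a much larger model.
layer = nn.Linear(4, 4, bias=False)

# Ordinary training changes weights indirectly, via gradients over data.
# "Solved interpretability" would instead mean knowing which entries
# encode which behavior, and editing them directly:
with torch.no_grad():
    # Hypothetical targeted edit: pretend we "knew" that strengthening
    # this one connection implements some desired change in behavior.
    layer.weight[2, 3] += 0.5

print(layer.weight)
```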
A model might figure out how to adjust its own weights in a targeted way. This would essentially mean that the model has solved interpretability. It seems unlikely to me that it is possible to get to this point without running a lot of compute-intensive experiments.
Yes, exactly this.
While it’s true that this could require “a lot of compute-intensive experiments,” that’s not necessarily a barrier. OpenAI is already planning to reserve 20% of their compute for an LLM to do “Alignment” on other LLMs, as part of their Superalignment project.
As part of this process, we can expect the Alignment LLM to be “running a lot of compute-intensive experiments” on another LLM. And, the Humans are not likely to have any idea what those “compute-intensive experiments” are doing? They could also be adjusting the other LLM’s weights to vastly increase its training data, knowledge, intelligence, capabilities, etc., along with the insights needed to similarly update the weights of other LLMs. Then, those gains could be fed back into the Superalignment LLM, then back into the “Training” LLM… and back and forth, and… foom!

Super-human LLMs running RL(M)F and “alignment” on other LLMs, using only “synthetic” training data…
What could go wrong?
Let’s ask some of the largest LLMs for tips and ideas on how to take over the world
I don’t see any useful parallels—all the unknowns remain unknown.
Thank you for your comment! I agree that, in general, “all the unknowns remain unknown”, and I acknowledge the limitations of this simple thought experiment. Though, one main value here could be to help explain the concept of deciding what to do in the face of an “intelligence explosion” to people who are not deeply engaged with AI and “digital intelligence” overall. I’ll add a note about this to the “Intro” section. Thank you.
so we would reasonable expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
->
… so we would reasonably expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
A thought experiment for comparing “biological” vs “digital” intelligence increase/explosion
I would suggest that self-advocacy is the most important test. If they want rights, then it is likely unethical and potentially dangerous to deny them.
We don’t know what they “want”, we only know what they “say”.
Yes, agreed. Given the vast variety of intelligence, social interaction, and sensory perception among many animals (e.g. dogs, octopi, birds, mantis shrimp, elephants, whales, etc.), consciousness could be seen as a spectrum with entities possessing varying degrees of it. But, it could also be viewed as a much more multi-dimensional concept, including dimensions for self-awareness and multi-sensory perception, as well as dimensions for:
social awareness
problem-solving and adaptability
metacognition
emotional depth and variety
temporal awareness
imagination and creativity
moral and ethical reasoning
Some animals excel in certain dimensions, while others shine in entirely different areas, depending on the evolutionary advantages within their particular niches and environments.
One could also consider other dimensions of “consciousness” that AI/AGI could possess, potentially surpassing humans and other animals (see the toy sketch after this list). For instance:
computational speed
memory capacity and recall
multitasking
rapid upgradability of perception and thought algorithms
rapid data ingestion and integration (learning)
advanced pattern recognition
universal language processing
scalability
endurance
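(As a toy illustration of this “multi-dimensional” framing, here is a simple data-structure sketch. The dimensions selected and every score below are invented purely for illustration; nothing here measures real consciousness.)

```python
from dataclasses import dataclass

@dataclass
class ConsciousnessProfile:
    """Toy profile: one 0-1 score per dimension, instead of a single scalar."""
    self_awareness: float
    social_awareness: float
    metacognition: float
    computational_speed: float
    memory_capacity: float

# Entirely invented scores, just to show how different entities could
# excel along different dimensions rather than sit on one linear scale.
octopus = ConsciousnessProfile(0.6, 0.3, 0.4, 0.1, 0.2)
human = ConsciousnessProfile(0.9, 0.9, 0.8, 0.1, 0.4)
agi = ConsciousnessProfile(0.5, 0.5, 0.5, 0.9, 0.9)
```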
I tried asking a dog whether a Human is conscious and he continued to lick at my feet. He didn’t mention much of anything on topic. Maybe I just picked a boring, unopinionated dog.
Yes, this is a common issue, as the phrases for “human consciousness” and “lick my feet please” in dog sound very similar. Though, recent advancements in Human-animal communication should soon be able to help you with this conversation?
E.g.
https://phys.org/news/2023-08-hacking-animal-communication-ai.html
https://www.scientificamerican.com/article/how-scientists-are-using-ai-to-talk-to-animals/
I asked ChatGPT-3.5 if humans are conscious and it said, in part: “Yes, humans are considered conscious beings. Consciousness is a complex and multifaceted phenomenon, and there is ongoing debate among scientists, philosophers, and scholars about its nature and the mechanisms that give rise to it. However, in general terms, consciousness refers to the state of being aware of one’s thoughts, feelings, sensations, and the external world.”
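(For anyone curious to reproduce this kind of query programmatically, here is a minimal sketch using the OpenAI Python client; it assumes `pip install openai` and an `OPENAI_API_KEY` environment variable, and the exact client interface may vary by version.)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Are humans conscious?"}],
)
print(response.choices[0].message.content)
```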
“Humans are considered conscious beings”. Considered by whom, I wonder?
“Consciousness refers to the state of being aware of one’s thoughts, feelings, sensations, and the external world.”
True, though this also requires the “observer” to have the ability and intelligence to recognize these traits in other entities, which can be challenging when these entities are driven by “giant, inscrutable matrices of floating-point numbers” or other systems that are very opaque to the observer?
[Question] Would AI experts ever agree that AGI systems have attained “consciousness”?
Absolutely, for such tests to be effective, all participants would need to try to genuinely act as Humans. The XP system introduced by the site is a smart approach to encourage “correct” participation. However, there might be more effective incentive structures to consider?
For instance, advanced AI or AGI systems could leverage platforms like these to discern tactics and behaviors that make them more convincingly Human. If these AI or AGI entities are highly motivated to learn this information and have the funds, they could even pay Human participants to ensure honest and genuine interaction. These AI or AGI could then use this data to learn more effective tactics for passing as Human (at least in certain scenarios).
I’m not likely to take a factory job per se. I have worked in robotics and robotic-adjacent software products (including cloud-side coordination of warehouse robots), and would do so again if the work seemed interesting and I liked my coworkers.
What about if/when all software-based work has been mostly replaced by some AGI-like systems? E.g., as described here:
“Human workers are more valuable for their hands than their heads...”
-- https://youtu.be/_kRg-ZP1vQc?t=6469
Where your actions would be mostly directed by an AGI through a headset or AR type system? Would you take a job making robots for an AGI or other large Corporation at that point? Or, would you (attempt to) object to that type of work entirely?
I’m pretty sure that humanoid robots will never become all that common. It’s really a bad design for a whole lot of things that humans currently do, and Moloch will continue to pressure all economic actors to optimize, rather than just recreating what exists. At least until there’s a singular winning entity that doesn’t have to compete for anything.
I would agree with your point about human-like appearance not being a necessity when we refer to “humanoid robots”. Rather, a form that includes locomotion and the capability for complex manipulation, similar to Human arms and hands, would generally suffice. Humans also come with certain logistical requirements—time to sleep, food, water, certain working conditions, and so on. The elimination of these requirements would make robots a more appealing workforce for many tasks. (If not all tasks, eventually?)
Humans have an amazing generality, but a whole lot of that is that so many tasks have evolved to be done by humans. The vast majority of those will (over time) change to be done by non-humanoid robots, likely enough that there’s never a need to make real humanoid robots. During the transition, it’ll be far cheaper (in terms of whatever resources are scarce to the AI) to just use humans for things that are so long-tail that they haven’t been converted to robot-doable.
Though, once these arm-equipped robots can easily be remotely controlled by some larger AGI-type systems, making the first generation of these new robots could be the last task that Humans will need to complete? Once the first billion or so of these new robots are deployed, they could be used to make the next billion, and so on?
As Mr. Shulman mentions in this interview, it would seem feasible for the current car industry to be converted to make ~10 billion general-purpose robots within a few years or so (a rough back-of-envelope check follows the link below).
“Converting the car industry to making Humanoid Robots.”
https://youtu.be/_kRg-ZP1vQc?t=6363
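(Here is that back-of-envelope check, using my own assumed figures rather than Shulman’s exact numbers:)

```python
# All figures are rough assumptions, for illustration only.
cars_per_year = 90e6   # approximate global car production, vehicles/year
kg_per_car = 1500      # typical car mass, kg
kg_per_robot = 100     # assumed mass of a general-purpose robot, kg

# If the industry's material/assembly throughput were redirected:
robots_per_year = cars_per_year * kg_per_car / kg_per_robot
print(f"~{robots_per_year:.1e} robots/year")  # ~1.4e9 per year

# At over a billion robots per year, ~10 billion robots would take on
# the order of several years, roughly consistent with the claim above.
```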
just pass the humanity tests set by the expert
What type of “humanity tests” would you expect an AI expert would employ?
many people with little-to-no experience interacting with GPT and its ilk, I could rely on pinpointing the most obvious LLM weaknesses and demonstrating that I don’t share them
Yes, I suppose much of this is predicated on the person conducting the test knowing a lot about how current AI systems would normally answer questions? So, to convince the tester that you are a Human, you could say something like… “An AI would answer like X, but I am not an AI, so I will answer like Y.”?
[Question] Have you ever considered taking the ‘Turing Test’ yourself?
[Question] Would you take a job making humanoid robots for an AGI?
No, of course not.
Do you feel that AGI Alignment could be achieved in a Type 0 civilization?
Thank you for taking the time to provide such a comprehensive response.
> “It’s the kind of things I could have done when I entered the community.”
This is interesting. Have you written any AI-themed fiction or any piece that explores similar themes? I checked your postings here on LW but didn’t come across any such examples.
> “The characters aren’t credible. The AI does not match any sensible scenario, and especially not the kind of AI typically imagined for a boxing experiment.”
What type of AI would you consider typically imagined for a boxing experiment?
> “The protagonist is weak; he doesn’t feel much except emotions ex machina that the author inserts for the purpose of the plot. He’s also extremely dumb, which breaks suspension of disbelief in this kind of community.”
In response to your critique about the characters, it was a conscious decision to focus more on the concept than complex character development. I wanted to create a narrative that was easy to follow, thus allowing readers to contemplate the implications of AI alignment rather than the nuances of character behavior. The “dumb” protagonist represents an average person, somewhat uninformed about AI, emphasizing that such interactions would more likely happen with an unsuspecting individual.
> “The progression of the story is decent, if extremely stereotypical. However, there is absolutely no foreshadowing so every plot twist appears out of the blue.”
Regarding the seemingly abrupt plot points and lack of foreshadowing, I chose this approach to mirror real-life experiences. In reality, foreshadowing and picking up on subtle clues are often a luxury afforded only to those who are highly familiar with the circumstances they find themselves in or who are experts in their fields. This story centers around an ordinary individual in an extraordinary situation, and thus, the absence of foreshadowing is an attempt to reflect this realism.
> “The worst part is that all the arguments somewhat related to AI boxing are very poor and would give incorrect ideas to an outsider reading this story as a cheap proxy for understanding the literature.”
Your point about the story giving incorrect ideas to outsiders is important. I agree that fiction isn’t a substitute for understanding the complex literature on AI safety and alignment, and I certainly don’t mean to oversimplify these issues. My hope was to pique the curiosity of readers and encourage them to delve deeper into these topics.
Could you provide some examples that you think are particularly useful, beyond the more well-known examples? E.g.:
- “Ex Machina” (2014)
- “Morgan” (2016)
- “Chappie” (2015)
- “Watchmen” (2009)
- “The Lawnmower Man” (1992)
- “Robot Dreams” by Isaac Asimov (1986)
- “Concrete Problems in AI Safety” by Amodei et al. (2016)
- “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation” by Brundage et al. (2018)
- https://www.lesswrong.com/tag/ai-boxing-containment

Your comments have given me much to consider for my future work, and I genuinely appreciate your feedback. Writing appreciation is quite subjective, with much room for personal preference and opinion.
Thanks again for your thoughtful critique!
Yes, good context, thank you!
Open source for which? Code? Training Data? Model weights? In any case, it does not seem like any of these are likely from “Open”AI.
Agreed that their Red Teaming Network (RTN), Bugcrowd program, trust portal, etc. are all welcome additions. And, they seem sufficient while their, and others’, models are sub-AGI with limited capabilities.
But, your point about the rapidly evolving AI landscape is crucial. Will these efforts scale effectively with the size and features of future models and capabilities? Will they be able to scale to the levels needed to defend against other ASI-level models?
It does seem like OpenAI acknowledges the limitations of a purely human approach to AI Alignment research, hence their “superhuman AI alignment agent” concept. But, it’s interesting that they don’t express the same need for a “superhuman-level agent” for Red Teaming? At least for the time being.
Is it consistent, or even logical, to assume that, while human-run AI Alignment Teams are insufficient to align an ASI model, human-run “Red Teams” will be able to successfully validate that an ASI is not vulnerable to attack or compromise from a large-scale AGI network or a “less-aligned” ASI system? Probably not...