They all hope they’ll have enough time to discuss possible plans with the very smart AI systems which are coming.
Ilya has been very explicit about it, but all AI lab leaders must be hoping for that…
If our experience of qualia reflect some poorly understood phenomenon in physics, it could be part of a cluster of related phenomena, not all of which manifest in human cognition.
Right.
We don’t have as precise an understanding of qualia as we do of electrons
It’s a big understatement; we are still at a “pre-Galilean stage” in that “field of science”. I do hope this will change sooner rather than later, but the current state of our understanding of qualia is dismal.
the things we have said about what we mean when we say “qualia” might not be sufficient to determine whether said phenomenon counts as qualia or not.
Oh, yes, we are absolutely not ready to tackle this. This does not mean that the question is unimportant, but it does mean that to the extent the question is important, we are in a really bad situation.
My hope is that the need to figure out “AI subjectivity” would push us to try to move faster on understanding the nature of qualia, understanding the space of possible qualia, and all other related questions.
I am a Camp 2 “qualia realist” (so I don’t think it’s “non-physical”; I think this is “undiscovered physics”, although it is possible that we need to add new primitives to our overall “picture of the world”, just as electric charge or mass are primitives; I don’t think we can be sure we have already discovered all the primitives; it might or might not be the case).
But when Camp 2 people talk about whether AIs are conscious or not, they mean the question of whether the AIs are “sentient”, i.e. whether “qualia”, a “subjective reality”, are present, without implying a particular nature of that reality. (Conditional on answering “yes”, one would also like to figure out “what it is like to be a computational process running an LLM inference”, another typical Camp 2 question.)
Now, there is also a functional question (is their cognition similar to human cognition?), which is more or less Camp 1/Camp 2 neutral; in this sense one could make further improvements to the model architecture, but these systems are already pretty similar to people in many respects, so it is not surprising that they behave similarly. That’s not a “hard problem”: they do behave more or less as if they were already conscious, because their architecture is already pretty similar to ours (a hierarchy of attention processes and all that). But that’s orthogonal to our Camp 2 concerns.
One of the main complications of conversations about consciousness is that people seem to be stratified into two camps:
Many conversations about consciousness make sense for only one of those camps.
I would hazard a guess that you belong to Camp 1 in this classification:
not because it is ontologically fundamental
The newly commercial[1] lmarena still has not posted the scores for the new R1.
One starts to wonder if they are deliberately throttling the rates at which it is sampled for their 1-to-1 competitions.
(With all the problems with lmarena[2], it would not be a bad way to compare it, first of all against the old R1 and the new V3.)
See e.g. “The Leaderboard Illusion”, https://arxiv.org/abs/2504.20879
Sam Altman talks to Jack Kornfield and Soren Gordhamer, https://www.youtube.com/watch?v=ZHz4gpX5Ggc
The video is new, but the talk seems to have happened in April 2023, so adjust accordingly [...]
Anyway, here is a GPT-4o brief summary of the YouTube transcript: https://chatgpt.com/share/684243bb-dbe8-8010-8dbd-6e595c00ef94
METR CEO Beth Barnes on 80,000 Hours at https://www.youtube.com/watch?v=jXtk68Kzmms
One thing she is discussing at the 03:36:51 mark is “What METR isn’t doing that other people have to step up and do”.
In this sense, METR’s “Measuring AI Ability to Complete Long Tasks” is not done often enough.
They did that evaluation in “Details about METR’s preliminary evaluation of o3 and o4-mini”, https://metr.github.io/autonomy-evals-guide/openai-o3-report/, on April 16 (an eternity ago, at the current pace), and the results are consistent with the beginning of a super-exponential trend. But there have been no evaluations like that since, and we have had Claude 4 and remarkable Gemini 2.5 Pro updates in the meantime; it seems crucial to understand what is going on with that potentially super-exponential trend and how rapidly the doubling period is shrinking (if it is actually shrinking).
So, perhaps, we should start asking who could help in this sense...
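To make the question concrete, here is a minimal sketch of how one could check whether the doubling period of the task-length horizon is shrinking. This is my own back-of-the-envelope framing, not METR’s methodology, and the data points below are made up purely for illustration:

```python
# A minimal sketch (not METR's methodology) of checking whether the doubling
# period of the "task-length horizon" is shrinking, given hypothetical data.
import numpy as np

# Hypothetical illustrative data: dates (as decimal years) and 50%-success
# task horizons in minutes. These numbers are NOT real measurements.
t = np.array([2023.0, 2023.5, 2024.0, 2024.5, 2025.0, 2025.3])
horizon = np.array([4.0, 8.0, 18.0, 45.0, 120.0, 240.0])

y = np.log2(horizon)  # doublings are linear in log2 space

# Exponential trend: log2(horizon) linear in t => constant doubling period.
lin_coeffs = np.polyfit(t, y, 1)
lin_resid = y - np.polyval(lin_coeffs, t)

# Super-exponential trend: log2(horizon) quadratic in t, with a positive
# leading coefficient => the doubling period shrinks over time.
quad_coeffs = np.polyfit(t, y, 2)
quad_resid = y - np.polyval(quad_coeffs, t)

print("doubling period from the linear fit:", 1.0 / lin_coeffs[0], "years")
print("residual sum of squares, linear vs quadratic:",
      float(lin_resid @ lin_resid), float(quad_resid @ quad_resid))
# If the quadratic fit is substantially better and its leading coefficient is
# positive, that is (weak) evidence for a shrinking doubling period.
```

With only a handful of evaluation points the comparison is obviously underpowered, which is exactly why more frequent evaluations of this kind would help.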
Yes, hopefully the authors will fix it in the post.
Meanwhile, the correct link seems to be https://www.lesswrong.com/posts/nuDJNyG5XLQjtvaeg/is-alignment-reducible-to-becoming-more-coherent
I suppose it’s better to at least know you need a plan and think to build a bunker, even if you don’t realize that the bunker will do you absolutely no good against the AGI itself, versus not even realizing you need a plan. And the bunker does potentially help against some other threats, especially in a brief early window?
I think Ilya realizes very clearly that the bunker is not against the AGI itself, but only against the turmoil of the “transition period”. He seems to be quite explicit in the quoted article ‘We’re Definitely Going to Build a Bunker Before We Release AGI’: The true story behind the chaos at OpenAI (emphasis mine):
“We’re definitely going to build a bunker before we release AGI,” Sutskever replied. Such a powerful technology would surely become an object of intense desire for governments globally. The core scientists working on the technology would need to be protected. “Of course,” he added, “it’s going to be optional whether you want to get into the bunker.”
They are not going to “disappear into their own universe”.
But if they want to survive, they’ll need to establish a reasonable society which controls dangerous technologies, and to establish some degree of harmony among its members. Otherwise they will obliterate themselves together with our Solar System.
So a good chunk of the problem of existential safety will be worked on by entities smarter than humans and better equipped to solve it than we are.
The open question is whether we’ll be included in their “circle of care”.
I think there is a good chance of that, but it depends a lot on how the ASI society will be structured and what its values will be.
We all tend to think of the ASI society as structured into well-defined, long-persisting individuals. If we assume that the structure is indeed mostly individual-based (as almost all existential risk discourse assumes), then there are several realistic paths for all kinds of individuals, human and non-human, to be included in the “circle of care”.
One problem is that this assumption, that the ASI society is mostly structured as well-defined persistent individuals with long-term interests, is questionable, and without it we just don’t know how to reason about this whole situation.
Ah, yes, you are right. And it’s actually quite discouraging that
Gemini 2.5 Pro loses coherence at 35k with my prompts
because I thought Gemini 2.5 Pro was supposed to be the model which had finally mostly fixed the recall problems in long contexts (if I remember correctly).
So you seem to be saying that this recall depends much more strongly on the nature of the input than one would infer from just briefly looking at published long-context benchmarks… That’s useful to keep in mind.
I think for long-term coherence one typically needs specialized scaffolding.
Here is an example: https://www.lesswrong.com/posts/7FjgMLbqS6Z6yYKau/recurrentgpt-a-loom-type-tool-with-a-twist
Basically, one wants to accumulate some kind of “state of the virtual world in question” as a memory while the story unfolds. Although, I can imagine that if models start having “true long context” (that is, long context without recall deterioration), and if that context is long enough to include the whole story, this might become unnecessary. So one might want to watch for the emergence of such models (I think we are finally starting to see some tangible progress in this sense).
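As a minimal sketch of what such scaffolding can look like (this is my own simplification, not the RecurrentGPT implementation; `llm` below is a placeholder for whatever completion API one actually uses):

```python
# A minimal sketch of "world-state" scaffolding for long-form generation:
# keep a compact state summary and update it after each generated chunk,
# so the model never has to rely on recalling the full story from context.

def llm(prompt: str) -> str:
    """Placeholder for whatever chat/completion API one actually uses."""
    raise NotImplementedError

def write_story(premise: str, n_chunks: int = 10) -> str:
    world_state = (f"Premise: {premise}\n"
                   "Characters: (none yet)\n"
                   "Open plot threads: (none yet)")
    story = []
    for _ in range(n_chunks):
        chunk = llm(
            "You are writing a long story one chunk at a time.\n"
            f"Current world state:\n{world_state}\n\n"
            "Write the next ~500-word chunk, consistent with the world state."
        )
        story.append(chunk)
        # Fold the new chunk back into a compact state summary instead of
        # carrying the whole story in the context window.
        world_state = llm(
            "Update this world state so it stays short but captures everything "
            "needed to continue the story consistently.\n"
            f"Old world state:\n{world_state}\n\nNew chunk:\n{chunk}"
        )
    return "\n\n".join(story)
```

The design choice is simply to trade raw context length for an explicitly maintained, bounded-size memory.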
Yes, any “neuromorphic formalism” would do (basically, one considers stream-oriented functional programs, asks for the streams in question to admit linear combinations, and the programs end up being fairly compact high-end neural machines with a small number of weights).
I can point you to a version I’ve done, but when people translate small specialized programs into small custom-synthesized Transformers, that’s in the same spirit. Or when people craft compact neural cellular automata with a small number of parameters, that is also in the same spirit.
Basically, as long as programs themselves end up being expressible as some kind of sparse connectivity tensors, you can consider their linear combinations and series.
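Here is a deliberately oversimplified toy illustration (plain numpy, not the actual stream-based formalism): if a “program” is just a connectivity matrix acting on a stream of vectors, then programs of the same shape admit linear combinations, and a weighted series of such programs is again a program of the same kind.

```python
# Toy illustration: "programs" as connectivity matrices acting on a stream.
# Programs of the same shape admit linear combinations and weighted series.
import numpy as np

def run_program(W: np.ndarray, stream: np.ndarray) -> np.ndarray:
    """Apply a 'program' (connectivity matrix) to each element of an input stream."""
    return np.array([np.tanh(W @ x) for x in stream])

rng = np.random.default_rng(0)
stream = rng.normal(size=(5, 4))          # a short stream of 4-dimensional inputs

W_shift = np.roll(np.eye(4), 1, axis=0)   # "program" 1: cyclically permute coordinates
W_scale = 0.5 * np.eye(4)                 # "program" 2: damp the signal

# A linear combination of the two programs is itself a program of the same kind.
W_mixed = 0.7 * W_shift + 0.3 * W_scale
print(run_program(W_mixed, stream).shape)  # (5, 4)
```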
On one hand, it does not have to be a black box. If we use one of the programming formalisms that allow taking linear combinations of programs, we can decompose programs into series and retain quite a bit of structure this way.
I think, capability-wise something close to “glass boxes” (in terms of structure, but not necessarily behavior) can be done.
But the implications for safety are uncertain. On one hand, even very simple dynamical systems can have complex and unpredictable behavior, and even more so when we add self-modification to the mix. So the structure can be transparent, but this might not always translate into transparency of behavior.
And then, the ability to understand these systems is a double-edged sword (it is a strong capability booster, and makes it much easier to improve those AI systems).
When people discuss the AI-2027 study on LessWrong, we mostly see the arguments from the conservative side, that timelines are too aggressive, that the progress will be less explosive.
We don’t see many arguments from the radical side saying that the AI-2027 study is too conservative. Your discussion of the likely dynamics of the “intelligence explosion” suggests that you view that aspect of the AI-2027 study as too conservative.
Do you also view their timelines as likely being too conservative?
I think the OP’s claim is that (some part of) this is created by one of Facebook’s AIs, and not by Facebook’s human users.
The newsfeed is another of Facebook’s AIs, but it is expected that different Facebook AIs are aware of each other’s activity (although one can imagine a situation where different Facebook AIs don’t bother to inform each other).
This algorithm synthesized the post in question, so this algorithm knows that the post is synthetic.
Even with a comment by Max More.
Thanks!
The link seems to be missing.
I think the “singleton” case is generally not analyzed sufficiently in the literature. It is treated as something magical, without an internal structure that could be discussed. A rationalist analysis would want to do better than that.
Nobody is asking what might be inside: would it still be a Minsky-style “society of mind”, and if so, what might the relationships be between the various components of that “society of mind”, and so on.
In particular, how would it evolve its own internal structure, and its distribution of goals, and so on.
People seem to be hypnotized by it being an “all-powerful God”, and this somehow prevents them from trying to think about how it might work (given that the Universe will still not be fully known, there will still be quite a bit of value in open-endedness, in discovery, and so on).
But all this does not imply that we can rely upon stratification into individuals being the most likely default scenario.
Still, the bulk of the risk is self-destruction of the whole ecosystem of super-intelligent AIs together with everything else, regardless of how it is structured and stratified. A singleton is just as likely to stumble into unsafe experiments in fundamental physics if its internal critics are not strong enough.
An ecosystem of super-intelligent AIs (regardless of how it is structured and stratified) which is decent enough to navigate this main risk is not a bad starting point from the viewpoint of human interests as well. Something within it is sufficiently healthy if it can reliably avoid self-destruction; see my earlier note for more details: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential