You are correct. I misremembered the post, and I should edit it to clarify.
𝕮𝖎𝖓𝖊𝖗𝖆
[Question] What is the LessWrong Logo(?) Supposed to Represent?
[Yann Lecun] A Path Towards Autonomous Machine Intelligence
[Question] Why Are Posts in the Sequences Tagged [Personal Blog] Instead of [Frontpage]?
My reading material for today/this week (depending on how accessible it is for me):
“Simple Explanation of the No-Free-Lunch Theorem and Its Implications”
I want to learn more about NFLT, and how it constrains simple algorithms for general intelligence.
(Thank God for Sci-Hub).
For my Ensemble General Intelligence model, I was mostly imagining #2 instead of #3.
I said of my ensemble general intelligence model:
It could also dynamically generate narrow optimisers on the fly for the problem sets.
General intelligence might be described as an algorithm for picking (a) narrow optimiser(s) to apply to a given problem set (given x examples from said set).
I did not intend to imply that the set of narrow optimisers the general optimiser is selecting from is represented within the agent. I was thinking of a rough mathematical model for how you can describe it.
That there exists a (potentially infinite) set of all possible narrow optimisers a general intelligence might generate/select from, and there exists a function mapping problem sets (given x examples of said set) to narrow optimisers, does not imply that any such representation is stored internally in the agent, nor that the agent implements a lookup table.
I equivocated between selection and generation. In practice I imagine generation, but the mathematics of selection are easier to reason about.
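To gesture at the selection framing a little more precisely (the notation below is mine and purely illustrative, not a claim about implementation):

```latex
% Selection framing of general intelligence (illustrative notation only).
% \mathcal{P}: the set of real-world problem sets.
% \mathcal{O}: the (potentially infinite) set of narrow optimisers.
% S_x(P): x sample instances drawn from a problem set P.
\[
  G \;:\; \bigl\{\, S_x(P) \;:\; P \in \mathcal{P},\; x \in \mathbb{N} \,\bigr\}
  \;\longrightarrow\; \mathcal{O},
  \qquad
  o \;=\; G\bigl(S_x(P)\bigr).
\]
% The existence of such a map says nothing about how it is implemented:
% \mathcal{O} need not be represented inside the agent, and G need not be a
% lookup table; G may just as well generate o on the fly.
```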
I imagine that trying to implement the ensemble-of-specialised-optimisers approach explicitly is impractical in the real world because there are too many possible problem sets. I never considered it a viable model of general intelligence.
I might add this clarification when next I’m on my laptop.
It seems to me that the qualm is not about #2 vs #3 as models for humans, but about how easily transfer learning happens for the relevant models of general intelligence, and about what progress looks like for the class of general intelligences that manifest in our world.
Currently, I think that it’s possible to improve the meta optimisation processes for generating object level optimisation processes, but this doesn’t imply that an improvement to a particular object level optimisation process will transfer across domains.
This is important because improving object level processes and improving meta level processes are different. And improving meta level processes mostly looks like learning a new domain quicker as opposed to improved accuracy in all extant domains. Predictive accuracy still doesn’t transfer across domains the way it would for a simple optimiser.
I can probably make this distinction clearer and elaborate on it more in the OP.
I’ll think on this issue more in the morning.
The section I’m least confident/knowledgeable about is the speculation around applicability of NFL theorems and exploitation of structure/regularity, so I’ll avoid discussing it.
I simply do not think it’s a discussion I can contribute meaningfully to.
Future me with better models of optimisation processes would be able to reason better around it.
To illustrate how this matters, consider two scenarios:
A. There are universal non composite algorithms for predicting stimuli in the real world. Becoming better at prediction transfers across all domains.
B. There are narrow algorithms good at predicting stimuli in distinct domains. Becoming a good predictor in one domain doesn’t easily transfer to other domains.
Human intelligence being an ensemble makes it seem like we live in a world that looks more like B than like A.
Predicting diverse stimuli involves composing many narrow algorithms. Specialising a neural circuit for predicting stimuli in one domain doesn’t easily transfer to predicting new domains.
The above has been my main takeaway from learning about how cognition works in humans (I’m still learning, but it seems to me like future learning would only deepen this insight instead of completely changing it).
We’re actually an ensemble of many narrow systems. Some are inherited because they were very useful in our evolutionary history.
But a lot are dynamically generated and regenerated. Our brain has the ability to rewire itself, create and modify its neural circuitry.
We constantly self modify our cognitive architectures (just without any conscious control over it). Maybe our meta machinery for coordinating and generating object level machinery remains intact?
This changes a lot about what I think is possible for intelligence. What “strongly superhuman intelligence” looks like.
Scepticism of Simple “General Intelligence”
Introduction
I’m fundamentally sceptical that general intelligence is simple.
By “simple”, I mostly mean “non composite”. General intelligence would be simple if there were universal/general optimisers for real world problem sets that weren’t ensembles/compositions of many distinct narrow optimisers.
AIXI and its approximations are in this sense “not simple” (even if their Kolmogorov complexity might appear to be low).
Thus, I’m sceptical that efficient cross domain optimisation that isn’t just gluing a bunch of narrow optimisers together is feasible.
General Intelligence in Humans
Our brain is an ensemble of some inherited (e.g. circuits for face recognition, object recognition, navigation, text recognition, place recognition, etc.) and some dynamically generated narrow optimisers (depending on the individual: circuits for playing chess, musical instruments, soccer, typing, etc.; neuroplasticity more generally).
We probably do have some general meta machinery as a higher layer (I guess for stuff like abstraction, planning, learning new tasks/rewiring our neural circuits, inference, synthesising concepts, pattern recognition, etc.).
But we fundamentally learn/become good at new tasks by developing specialised neural circuits to perform those tasks, not leveraging a preexisting general optimiser.
(This is a very important difference).
We already self modify (just not in a conscious manner) and our ability to do general intelligence at all is strongly dependent on our self modification ability.
Our general optimiser is just a system/procedure for dynamically generating narrow optimisers to fit individual tasks.
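As a loose sketch of what I mean, in code (purely illustrative; the names, the caching scheme and the toy tasks are my own assumptions, not a claim about how brains implement any of this):

```python
# Illustrative sketch: the "general optimiser" as meta machinery that reuses
# or generates narrow optimisers per task, rather than a single universal
# algorithm applied to everything.

from typing import Callable, Dict, List, Optional, Tuple

Example = Tuple[float, float]            # (input, target) pairs for a task
NarrowOptimiser = Callable[[float], float]


def generate_narrow_optimiser(examples: List[Example]) -> NarrowOptimiser:
    """Fit a task-specific predictor (here: a trivial nearest-example lookup)."""
    def predict(x: float) -> float:
        nearest = min(examples, key=lambda ex: abs(ex[0] - x))
        return nearest[1]
    return predict


class EnsembleGeneralOptimiser:
    """Meta machinery: reuses an existing circuit for a known task, or
    generates (and caches) a new narrow optimiser for a novel task."""

    def __init__(self, inherited: Optional[Dict[str, NarrowOptimiser]] = None):
        # "Inherited" circuits stand in for the evolutionarily built-in ones
        # (face recognition, navigation, ...); the rest are generated on demand.
        self.circuits: Dict[str, NarrowOptimiser] = dict(inherited or {})

    def solve(self, task_id: str, examples: List[Example], x: float) -> float:
        if task_id not in self.circuits:
            self.circuits[task_id] = generate_narrow_optimiser(examples)
        return self.circuits[task_id](x)


# Two unrelated "domains" handled by the same meta machinery; the competence
# lives entirely in the per-task circuits it generated.
agent = EnsembleGeneralOptimiser()
print(agent.solve("doubling", [(1.0, 2.0), (2.0, 4.0)], 2.2))    # -> 4.0
print(agent.solve("negation", [(1.0, -1.0), (3.0, -3.0)], 1.4))  # -> -1.0
```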
Two Models of General Intelligence
This is an oversimplification, but to help gesture at what I’m talking about, I’d like to consider two distinct ways in which general intelligence might manifest.
A. Simple Intelligence: There exists a class of non compositional optimisation algorithms that are universal optimisers for the domains that actually manifest in the real world (these algorithms need not be universal for arbitrary domains).
General intelligence is implemented by universal (non composite) optimisers of this kind.
B. Ensemble Intelligence: General intelligence is implemented as an ensemble of many narrow optimisers, together with meta machinery for dynamically generating, selecting and coordinating them.
General Intelligence and No Free Lunch Theorems
This suggests that reality is perhaps not so regular that we can easily escape the No Free Lunch theorems. The more the NFL theorems are a practical constraint, the more you’d expect general intelligence to look like an ensemble of narrow optimisers rather than a simple (non composite) universal optimiser.
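For reference, the Wolpert–Macready statement (as I understand it, so treat this as a reference sketch rather than gospel) averages uniformly over all objective functions, which is exactly where a structured reality gets its escape hatch:

```latex
% No Free Lunch for search/optimisation (Wolpert & Macready, 1997),
% informally restated: for any two algorithms a_1, a_2 and any number of
% evaluations m, summing over all objective functions f : X \to Y,
\[
  \sum_{f} P\bigl(d_m^{y} \mid f, m, a_1\bigr)
  \;=\;
  \sum_{f} P\bigl(d_m^{y} \mid f, m, a_2\bigr),
\]
% where d_m^y is the sequence of cost values observed after m evaluations.
% The averaging is uniform over all f; a world whose problems come from a
% structured, non-uniform distribution can favour some algorithms over others.
```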
People have rejected no free lunch theorems by specifying that reality was not a random distribution. There was intrinsic order and simplicity. It’s why humans could function as general optimisers in the first place.
But the ensemble-like nature of human intelligence suggests that reality is not so simple and ordered as to admit a single algorithm that does efficient cross domain optimisation.
We have an algorithm for generating algorithms. That is itself an algorithm, but the above suggests it’s not a simple one.
Conclusion
It seems to me that there is no simple general optimiser in humans.
Perhaps none exists in principle.
I unchecked the “moderators may promote to frontpage” option because this is a Twitter thread, but if you think LessWrong would benefit from it being made a frontpage post, do let me know.
[LQ] Some Thoughts on Messaging Around AI Risk
Twitter Cross Posting
Introduction
I’ll start reposting threads from my Twitter account to LW with no/minimal editing.
Twitter Relevant Disclaimers
I’ve found that Twitter incentivises me to be
More:
Snarky
Brazen
Aggressive
Confident
Exuberant
Less:
Nuanced
Modest
The overall net effect is that content written originally for Twitter has predictably lower epistemic standards than content I’d write for LessWrong. However, trying to polish my Twitter content for LessWrong takes too much effort (my takeoff dynamics sequence [currently at 14–16 posts] started as an attempt to polish my takeoff dynamics thread for a LW post).
As I think it’s better to write than to not write, I’ve decided to publish my low-quality-but-still-LessWrong-relevant Twitter content here with no/minimal editing. I hedge against the predictably low quality by publishing it as shortform/to my personal blog instead of to the front page.
Testing
I’ll post the unedited/minimally edited threads as both blog posts (promotion to front page disabled) and shortform pieces. I’ll let the reception to the posts in both venues decide which one I keep going forward. I’ll select positively for:
Visibility
Engagement
And negatively against:
Hostility
Annoyance
Some Thoughts on Messaging Around AI Risk
Disclaimer
Stream-of-consciousness-like. This is an unedited repost of a thread on my Twitter account. The stylistic and semantic incentives of Twitter influenced it.
Some thoughts on messaging around alignment with respect to advanced AI systems
A 🧵
Terminology:
* SSI: strongly superhuman intelligence
* ASI: AI with decisive strategic advantage (“superintelligence”)
* “Decisive strategic advantage”:
A vantage point from which an actor can unilaterally determine the future trajectory of Earth-originating intelligent life.
Misaligned ASI poses a credible existential threat. Few things in the world actually pose a genuine threat of human extinction. Even global thermonuclear war might not cause it. The fundamentally different nature of AI risk...
That we have a competent entity that is optimising at cross-purposes with human welfare.
One which might find the disempowerment of humans to be instrumentally beneficial or for whom humans might be obstacles (e.g. we are competing with it for access to the earth’s resources).
An entity that would actively seek to thwart us if we tried to neutralise it. Nuclear warheads wouldn’t try to stop us from disarming them.
Pandemics might be construed as seeking to continue their existence, but they aren’t competent optimisers. They can’t plan or strategise. They can’t persuade individual humans or navigate the complexities of human institutions.
That’s not a risk scenario that is posed by any other advanced technology we’ve previously developed. Killing all humans is really hard. Especially if we actually try for existential security.
Somewhere like New Zealand could be locked down to protect against a superpandemic, and might be spared in a nuclear holocaust. Nuclear Winter is pretty hard to trigger, and it’s unlikely that literally every urban centre in the world will be hit.
Global thermonuclear war may very well trigger civilisational collapse, and derail humanity for centuries, but actual extinction? That’s incredibly difficult.
It’s hard to “accidentally kill all humans”. Unless you’re trying really damn hard to wipe out humanity, you will probably fail at it.
The reason why misaligned ASI is a _credible_ existential threat — a bar which few other technologies meet — is because of the “competent optimiser”. Because it can actually try really damn hard to wipe out humanity.
And it’s really good at steering the future into world states ranked higher in its preference ordering.
By the stipulation that it has decisive strategic advantage, it’s already implicit that should it decide on extinction, it’s at a vantage point from which it can execute such a plan.
But. It’s actually really damn hard to convince people of this. The inferential distance that needs to be bridged is often pretty large.
If concrete risk scenarios are presented, then they’ll be concretely discredited. And we do not have enough information to conclusively settle the issues of disagreement.
For example, if someone poses the concrete threat of developing superweapons via advanced nanotechnology, someone can reasonably object that developing new advanced technology requires considerable:
* Infrastructure investment
* R & D, especially experiment and other empirical research
* Engineering
* Time
An AI could not accomplish all of this under stealth, away from the prying eyes of human civilisation.
“Developing a novel superweapon in stealth mode completely undetected is pure fantasy” is an objection that I’ve often heard. And it’s an objection I’m sympathetic to somewhat. I am sceptical that intelligence can substitute for experiment (especially in R & D).
For any other concrete scenario of AI induced extinction one can present, reasonable objections can be formulated. And because we don’t know what SSIs are capable of, we can’t settle the facts of those objections.
If instead, the scenarios are left abstract, then people will remain unconvinced about the threat. The “how?” will remain unanswered.
Because of cognitive uncontainability — because some of the strategies available to the AI are strategies that we would never have thought of* — I find myself loath to specify concrete threat scenarios (they probably wouldn’t be how the threat manifests).
https://arbital.com/p/uncontainability/
* It should be pointed out that some of the tactics AlphaGo used against Ke Jie were genuinely surprising and unlike tactics that had previously manifested in human games.
In the rigidly specified and fully observable environment of a Go board, AlphaGo was already uncontainable for humans. In bare reality with all its unbounded choice, an SSI would be even more so.
https://arbital.com/p/real_world/
It is possible that — should the AI be far enough in the superhuman domain — we wouldn’t even be able to comprehend its strategy (in much the same way scholars of the 10th century could not understand the design of an air conditioner).
https://arbital.com/p/strong_uncontainability/
Uncontainability is reason to be wary of an existential risk from SSIs even if I can’t formulate airtight scenarios illustrating said risk. However, it’s hardly a persuasive argument to convince someone who didn’t already take AI risk very seriously.
Furthermore, positing that an AI has “decisive strategic advantage” is already assuming the conclusion. If you posited that an omnicidal maniac had decisive strategic advantage, then you’ve also posited a credible existential threat.
It is obvious that a misaligned AI system with considerable power over humanity is a credible existential threat to humanity.
What is not obvious is that an advanced AI system would acquire “considerable power over humanity”. Emergence of superintelligence is not self-evident.
I think the possibility of SSI is pretty obvious, so I will not spend much time justifying it. I will list a few arguments in favour though.
Note: “brain” = “brain of homo sapiens”.
Arguments:
* Brain size limited by the birth canal
* Brain produced by a process not optimising for general intelligence
* Brain very energy constrained (20 watts)
* Brain isn’t thermodynamically optimal
* Brain could be optimised further via IES:
https://forum.effectivealtruism.org/topics/iterated-embryo-selection
Discussions of superintelligence often come with the implicit assumption that “high cognitive powers” when applied to the real world either immediately confer decisive strategic advantage, or allow one to quickly attain it:
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities#:~:text=2.%20%C2%A0A%20cognitive%20system%20with%20sufficiently%20high%20cognitive%20powers%2C%20given%20any%20medium%2Dbandwidth%20channel%20of%20causal%20influence%2C%20will%20not%20find%20it%20difficult%20to%20bootstrap%20to%20overpowering%20capabilities%20independent%20of%20human%20infrastructure.
My honest assessment is that the above hypothesis is very non obvious without magical thinking:
https://www.lesswrong.com/posts/rqAvheoRHwSDsXrLw/i-no-longer-believe-intelligence-to-be-magical
Speaking honestly as someone sympathetic to AI x-risk (I’ve decided to become a safety researcher because I take the threat seriously), many of the proposed vectors I’ve heard people pose for how an AI might attain decisive strategic advantage seem “magical” to me.
I don’t buy those arguments and I’m someone who alieves that misaligned advanced AI systems can pose an existential threat.
Of course, just because we can’t formulate compelling airtight arguments for SSI quickly attaining decisive strategic advantage doesn’t mean it won’t.
Hell, our inability to find such arguments isn’t particularly compelling evidence either; uncontainability suggests that this is something we’d find difficult to determine beforehand.
Uncontainability is a real and important phenomenon, but it may prove too much. If my best justification for why SSI poses a credible existential threat is “uncontainability”, I can’t blame any would-be interlocutors for being sceptical.
Regardless, justifications aside, I’m still left with a conundrum; I’m unable to formulate arguments for x-risk from advanced AI systems that I am fully satisfied with. And if I can’t fully persuade myself of the credible existential threat, then how am I to persuade others?
I’ve been thinking that maybe I don’t need to make an airtight argument for the existential threat. Advanced AI systems don’t need to pose an existential threat for safety or governance work to be worthwhile.
If I simply wanted to make the case for why safety and governance are important, then it is sufficient to demonstrate only that misaligned SSI would be very bad.
Some ways in which misaligned SSI can be bad that are worth discussing:
* Disempowering humans (individuals, organisations, states, civilisation)
Humanity losing autonomy and the ability to decide its future is something we can generally agree is bad. With advanced AI, this may manifest on scales ranging from individuals up to civilisation.
An argument for disempowerment can be made via systems with longer time horizons, more coherent goal driven behaviour, better planning ability/strategic acumen, etc. progressively acquiring more resources, influence and power, reducing what’s left in human hands.
In the limit, most of the power and economic resources will belong to such systems. Such human disempowerment will be pretty bad, even if it’s not an immediate existential catastrophe.
I think Joe Carlsmith made a pretty compelling argument for why agentic, planning systems are especially risky along this front:
https://docs.google.com/document/d/1smaI1lagHHcrhoi6ohdq3TYIZv0eNWWZMPEy8C8byYg/edit#
* Catastrophic scenarios (e.g. > a billion deaths; a less stringent requirement than literally all humans dying)
Misaligned AI systems could play a destabilising role in geopolitics, exacerbating the risk of thermonuclear war.
Alternatively, they could be involved in the development, spread or release of superpandemics.
It’s easier to make the case for AI triggering catastrophe via extant vectors.
* Infrastructure failure (cybersecurity, finance, information technology, energy, etc.)
Competent optimisers with comprehensive knowledge of the intricacies of human infrastructure could cause serious damage by leveraging said infrastructure in ways no humans can.
Consider the sheer breadth of their knowledge. LLMs can be trained on e.g. the entirety of Wikipedia, Arxiv, the internet archive, open access journals, etc.
An LLM with human-like abilities to learn knowledge from text would have a breadth of knowledge several orders of magnitude above the most well-read human. They’d be able to see patterns and make inferences that no human could.
The ability of SSI to navigate (and leverage) human infrastructure would be immense. If said leverage was applied in ways that were unfavourable towards humans...
* Assorted dystopian scenarios
(I’m currently drawing a blank here, but that is entirely due to my lack of imagination [I’m not sure what counts as sufficiently dystopian as to be worth mentioning alongside the other failure modes I listed]).
I don’t think arguing for an existential threat that people find hard to grasp gains that much more (or really any) mileage over arguing for other serious risks that people can more easily intuit.
Unless we’re playing a cause Olympics* where we need to justify why AI Safety in particular is most important, stressing the “credible existential threat” aspect of AI safety may be counterproductive?
(* I doubt we’ll be in such a position except relative to effective altruists, and they’d probably be more sympathetic to the less-than-airtight arguments for an existential threat we can provide).
I’m unconvinced that it’s worth trying to convince others that misaligned advanced AI systems pose an existential threat (as opposed to just being really bad).
What I’m currently working on:
The sequence has an estimated length of 30K–60K words (it’s hard to estimate because I’m not even done preparing the outlines yet).
I’m at ~8.7K words written currently (across 3 posts [the screenshots are my outlines]) and I guess I’m only 5% of the way through the entire sequence.
Beware the planning fallacy, though: the sequence could easily grow significantly longer than I currently expect.
I work full time until the end of July and will be starting a Masters in September, so here’s to hoping I can get the bulk of the piece completed when I have more time to focus on it in August.
Currently, I try for some significant writing [a few thousand words] on weekends and fill in my outlines on weekdays. I try to add a bit more each day, just continuously working on it, until it spontaneously manifests. I also use weekdays to think about the sequence.
So, the twelve posts I’ve currently planned could very well have ballooned in scope by the time I can work on it full time.
Weekends will also be when I have the time for extensive research/reading for some of the posts.
Persuasion carried out through ordinary means of communication is capability deployed via the human interface in my taxonomy of real-world capabilities.
Here, the idea is influencing humans (or other agents) via means other than communication.
Hmm, I’ll edit/reword that section.
I reworded it to express uncertainty.
High Powers Over Physics
To be clear, I do have a few ideas, but they’re hypotheses I privileged, so I don’t want to poison your thinking by mentioning them.
Maybe in three days (or whenever I finish the mega essay), I’ll lay them out and explore why I’m dissatisfied with them.
I was not aware that we had published any books!
A few questions:
Are there any other books?
Why are the books not prominently linked from the home page?
Where can I download/order them in ebook form?
I don’t read any physical books.