[Question] Inscrutability was always inevitable, right?
Here’s a 2022 Eliezer Yudkowsky tweet (in context, “secure” means “secure against jailbreaks”; source; h/t Cole Wyeth here):
I find this confusing.
Here’s a question: are object-level facts about the world, like “tires are usually black”, encoded directly in the human-created AGI source code?
If yes, then (1) yo, be real, and (2) even if it happened, the source code would wind up with wildly unprecedented length and complexity, so much so that it becomes basically inscrutable, just because it’s a complicated world (e.g. there are lots of types of tires, not all of them are black, they stop looking black when they get muddy, etc. etc.). See Yuxi Liu’s ruthless “obituary” of Cyc.
If no, then the human-created source code must be defining a learning algorithm of some sort. And then that learning algorithm will figure out for itself that tires are usually black etc. Might this learning algorithm be simple and legible? Yes! But that was true for GPT-3 too, which Eliezer is clearly putting in the “inscrutable” category in this tweet.
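To spell out that last point with a toy sketch (my own illustration, nothing to do with GPT-3’s actual codebase): the hand-written part of a learning system can be a handful of legible lines, while everything the system ends up “knowing” lives in the learned numbers rather than in the source code.

```python
import numpy as np

# Toy illustration (not GPT-3's actual training code): the human-written
# learning algorithm below is short and legible. Note that no fact like
# "tires are usually black" appears anywhere in the source.
rng = np.random.default_rng(0)

# Stand-in "world data": some regularity for the model to pick up.
X = rng.normal(size=(1000, 32))
true_W = rng.normal(size=(32, 8))
Y = X @ true_W + 0.1 * rng.normal(size=(1000, 8))

W = np.zeros((32, 8))  # the learnable parameters
learning_rate = 0.01

for step in range(500):
    pred = X @ W                       # forward pass
    grad = X.T @ (pred - Y) / len(X)   # gradient of mean squared error
    W -= learning_rate * grad          # gradient descent update

# The code above is scrutable; the 256 learned floats in W are just numbers.
# Swap in a transformer and scale W up to 175 billion entries, and you get
# the picture in the tweet.
print(W[:2, :4])
```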
So what is he talking about?
Here are some options!
Maybe the emphasis is on the words “matrices” and “floating point”? Like, if there were 175 billion arrows defining the interconnections of an unprecedentedly enormous unlabeled Pearlian causal graph, would we feel better about “mortal humans” aligning that? If so, why? “Node 484280985 connects to Node 687664334 with weight 0.034” is still inscrutable! How does that help?
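To make that concrete, here’s a toy illustration (mine, not Eliezer’s): the same numbers written as a weight matrix and as an unlabeled edge list carry exactly the same information, and neither rendering is any easier to read than the other.

```python
import numpy as np

# The same tiny block of weights, shown two ways. Neither representation
# is more scrutable; what's missing in both cases is any human-meaningful
# labeling of the nodes.
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 3))  # stand-in for one tiny layer

# "Giant matrix" view:
print(weights)

# "Causal graph" view: one edge per matrix entry, identical information.
for i in range(weights.shape[0]):
    for j in range(weights.shape[1]):
        print(f"Node {i} connects to Node {j} with weight {weights[i, j]:.3f}")
```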
Maybe the emphasis is on “billion”, and e.g. if it were merely 175 million inscrutable floating-point numbers, then mortal humans could align it? Seems far-fetched to me. What about 175,000 inscrutable parameters? Maybe that would help, but there are a LOT of things like “tires are usually black, except when muddy, etc.”. Would that really fit in 175,000 parameters? No way! The average adult knows 20,000 words, which would leave fewer than nine parameters per word before we even get to the facts about each one!
Maybe the emphasis is on “mortal humans”, and what Eliezer meant was that, of course tons of inscrutable parameters were always inevitably gonna be part of AGI, and of course mortal humans will not be able to align such a thing, and that’s why we’re doomed and should stop AI research until we invent smarter humans. But … I did actually have the impression that Eliezer thinks that the large numbers of inscrutable parameters were a bad decision, as opposed to an inevitability. Am I wrong?
Maybe the emphasis is on “inscrutable”, and he’s saying that 175 billion floating point numbers is fine per se, and the problem is that the field of interpretability to date has not developed to the point where we can, umm, scrute them?
Maybe this is just an off-the-cuff tweet from 2022 and I shouldn’t think too hard about it? Could be!
Or something else? I dunno.
Prior related discussions on this forum: “Glass box learners want to be black box” (Cole Wyeth, 2025); “Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds” (1a3orn, 2023); “Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc” (Wentworth, 2022) (including my comment on the latter).