Well, I had to think about this for longer than five seconds, so that’s already a huge victory.
If I try to compress your idea down to a few sentences:
The humans ask the AI to produce design tools, rather than designs, such that there’s a bunch of human cognition that goes into picking out the particular atomic arrangements or synthesis pathways; and we can piecewise verify that the tool is making accurate predictions; and the tool is powerful enough that we can build molecular nanotech and an uploader by using the tool for an amount of time too short for Facebook to catch up and destroy the world. The AI that does this is purportedly sufficiently good at meta-engineering to build the tool, but not good enough at larger strategy that it can hack its way through the humans using just the code of the tool. The way in which this attacks a central difficulty is by making it harder for the AI to just build unhelpful nanotech using the capabilities that humans use to produce helpful nanotech.
Sound about right?
Yes, sounds right to me. It’s also true that one of the big unproven assumptions here is that we could create an AI strong enough to build such a tool, but too weak to hack humans. I find it plausible, personally, but I don’t yet have an easy-to-communicate argument for it.
Why can’t a narrow AI (maybe like Drexler’s proposal) create the tool safely?
I don’t know of a reason we couldn’t do this with a narrow AI. I have no idea how, but it’s possible in principle so far as I know. If anyone can figure out how, they could plausibly execute the pivotal act described above, which would be a very good thing for humanity’s chances of survival.
EDIT: Needless to say, but I’ll say it anyway: doing this via narrow AI is vastly preferable to using a general AI. It both carries much less risk and means you don’t have to expend an insane amount of effort on checking.
The humans ask the AI to produce design tools, rather than designs (...) we can piecewise verify that the tool is making accurate predictions (...) The way in which this attacks a central difficulty is by making it harder for the AI to just build unhelpful nanotech

I think this is a good way to put things, and it’s a concept that can be made more general and built upon.
Like, we can also have AIs produce:
Tools that make other tools
Tools that help to verify other tools
Tools that look for problems with other tools (in ways that don’t guarantee finding all problems, but can help find many; a toy sketch follows this list)
Tools that help approximate brain emulations (or get us part of the way there), or predict what a human would say when responding to questions in some restricted domain
Etc, etc
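To gesture at what the “tools that look for problems with other tools” item could mean in practice, here is a minimal sketch, assuming (purely for illustration) that a tool’s outputs can be reduced to numeric predictions over encodable cases. Everything in it (the `flag_suspect_cases` helper, the predictor/checker split, the case encoding) is hypothetical and invented for this comment, not anyone’s actual proposal:

```python
# Toy sketch (all names hypothetical): one AI-produced "checker" tool
# hunting for problems in another AI-produced "predictor" tool. It can
# surface many failures without guaranteeing it finds them all.

from typing import Callable, Iterable

Prediction = float
Case = dict  # hypothetical encoding of one design problem

def flag_suspect_cases(
    predictor: Callable[[Case], Prediction],
    checker: Callable[[Case], Prediction],
    verified: Iterable[tuple[Case, Prediction]],
    tolerance: float = 0.01,
) -> list[Case]:
    """Return cases where the tools disagree with each other or with
    human-verified ground truth; these go to humans for review."""
    suspects = []
    for case, truth in verified:
        p, c = predictor(case), checker(case)
        # Piecewise verification: each prediction is checked against a
        # known answer, so trust is earned case by case rather than
        # granted to the tool as a whole.
        if abs(p - truth) > tolerance or abs(p - c) > tolerance:
            suspects.append(case)
    return suspects
```

The point of the sketch is just the shape of the trust structure: the checker need not itself be trusted; it only has to surface disagreements cheaply enough that humans can spend their verification effort where it matters.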
Maybe you have already thought through such strategies very extensively, but AFAIK you don’t make that clear in any of your writings, and the inferential distance required to realize the full power of techniques like these is not trivial.
I have written more about this concept in this post in this series. I’m not sure whether any of the concepts/ideas in the series are new, but it seems to me that several of them are at the very least under-discussed.
Useless; none of these abstractions help find an answer.
From what I know of security, any system requiring secrecy is already implicitly flawed.
(Naturally, if this doesn’t apply and you backchanneled your idea for some legitimate meta-reason, I withdraw my objection.)
I think secrecy is rarely a long-term solution because it’s fragile, but it can definitely have short-term uses? For example, I’m sure that some insights into AI have the capacity to advance both alignment and capabilities; if you have such an insight then you might want to share it secretly with alignment researchers while avoiding sharing it publicly because you’d rather Facebook AI not enhance its capabilities. And so the secrecy doesn’t have to be a permanent load-bearing part of a system; instead it’s just that every day the secrecy holds up is one more day you get to pull ahead of Facebook.