We can verify nanotech designs (possibly with AI assistance) and reject any that look dangerous or are too difficult to understand. Also commit to destroying the AGI if it gives us something bad enough.
Also, maybe nanotech has important limitations or weaknesses that allow for monitoring and effective defences against it.
Why is not possible to check whether those nanobots are dangerous beforehand? In bjotech we already do that. For instance, if someone would try to synthesise some DNA sequences from certain bacteria, all alarms would go off.
Sorry, I might have not been clear enough. I understand that a machine would give us the instructions to create those fabricators but maybe not the designs. But what makes you think that those factories won’t have controls of what’s being produced in them?
Controls that who wrote? How good is our current industrial infrastructure at protecting against human-level exploitation, either via code or otherwise?
Can you verify code to be sure there’s no virus in it? It took years of trial and error to patch up some semblance of internet security. A single flaw in your nanotech factory is all a hostile AI would need.
We’ll have advanced AI by then we could use to help verify inputs or the design, or, as I said, we could use stricter standards, if nanotechnology is recognized as potentially dangerous.
A single flaw and them all humans die at once? I don’t see how. Or better put, I can conceive many reasons why this plan fails. Also, I don’t see how see build those factories in the first place and we can’t use that time window to make the AGI to produce explicit results on AGI safety
Or better put, I can conceive many reasons why this plan fails.
Then could you produce a few of the main ones, to allow for examination?
Also, I don’t see how see build those factories in the first place and we can’t use that time window to make the AGI to produce explicit results on AGI safety
What’s the time window in your scenario? As I noted in a different comment, I can agree with “days” as you initially stated. That’s barely enough time for the EA community to notice there’s a problem.
Anything (edit: except solutions of mathematical problems) that’s not difficult to understand isn’t powerful enough to be valuable.
Not to mention the AGI has the ability to fool both us and our AI into thinking it’s easy to understand and harmless, and then it will kill us all anyway.
Anything that’s not difficult to understand isn’t powerful enough to be valuable.
This is not necessarily true. Consider NP problems: those where the solution is relatively small and easy to verify, but where there’s a huge search space for potential solutions and no one knows any search algorithms much better than brute force. And then, outside the realm of pure math/CS, I’d say science and engineering are full of “metaphorically” NP problems that fit that description: you’re searching for a formula relating some physical quantities, or the right mixture of chemicals for a strong alloy, or some drug-molecule that affects the human body the desired way; and the answer probably fits into 100 characters, but obviously brute-force searching all 100-character strings is impractical.
If we were serious about getting useful nanotech from an AGI, I think we’d ask it to produce its designs alongside formal proofs of safety properties that can be verified by a conventional program.
Consider NP problems: those where the solution is relatively small and easy to verify, but where there’s a huge search space for potential solutions and no one knows any search algorithms much better than brute force.
That’s a good point. We can use the AGI to solve open math problems for us whose solutions we can easily check. Such an AGI would still be unsafe for other reasons though. But yeah, I didn’t remember this, and I was thinking about physical problems (like nanosystems).
For difficult problems in physical universe though, we can’t easily non-empirically check the solution. (For example, it’s not possible to non-empirically check if a molecule affects the human body in a desired way, and I’d expect that non-empirically checking if a nanosystem is safe would be at least as hard.)
For the physical world, I think there is a decent-sized space of “problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources”. In particular, I think this class of questions is pretty safe: “Here are 1000 possible vaccine formulations / new steel-manufacturing processes / drug candidates / etc. that human researchers came up with and would try out if they had the resources. Can you tell us which will work the best?”
So, if it tells us the best answer, then we verify it works well, and save on the costs of hundreds of experiments; if it tells us a bad answer, then we discover that in our testing and we’ve learned something valuable about the AGI. If its answers are highly constrained, like “reply with a number from 1 to 1000 indicating which is the best possibility, and [question-specific, but, using an example] two additional numbers describing the tensile strength and density of the resulting steel”, then that should rule out it being able to hack the human readers; and since these are chosen from proposals humans would have plausibly tried in the first place, that should limit its ability to trick us into creating subtle poisons or ice-nine or something.
For the physical world, I think there is a decent-sized space of “problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources”.
I agree that would be highly valuable from our current perspective (even though extremely low-value compared to what a Friendly AI could do, since it could only select a course of action that humans already thought of and humans are the ones who would need to carry it out).
So such an AI won’t kill us by giving us that advice, but it will kill us in other ways.
(Also, the screen itself will have to be restricted to only display the number, otherwise the AI can say something else and talk itself out of the box.)
Please notice that I never said that an AGI won’t be unsafe.
If you admit that it is possible that at some point we can be using AGIs to verify certain theorems, then we pretty much agree. Other people wouldn’t agree with that because they will tell you that humanity ends as soon as we have an AGI, and this is the idea I am trying to fight against
I think no one, incuding EY, doesn’t think “humanity ends as soon as we have an AGI”. Actual opinion is “Agentic AGI that optimize something and ends humanity in the process will probably by default be created before we will solve alignment or will be able to commit pivotal act that prevents the creation of such AGI”. As I understand, EY thinks that we probably can create non-agentic or weak AGI that will not kill us all, but it will not prevent strong agentic AGI that will.
We can verify nanotech designs (possibly with AI assistance) and reject any that look dangerous or are too difficult to understand. Also commit to destroying the AGI if it gives us something bad enough.
Also, maybe nanotech has important limitations or weaknesses that allow for monitoring and effective defences against it.
You aren’t going to get designs for specific nanotech, you’re going to get designs for generic nanotech fabricators.
Why is not possible to check whether those nanobots are dangerous beforehand? In bjotech we already do that. For instance, if someone would try to synthesise some DNA sequences from certain bacteria, all alarms would go off.
Can you reread what I wrote?
Sorry, I might have not been clear enough. I understand that a machine would give us the instructions to create those fabricators but maybe not the designs. But what makes you think that those factories won’t have controls of what’s being produced in them?
Controls that who wrote? How good is our current industrial infrastructure at protecting against human-level exploitation, either via code or otherwise?
How do the fabricators work? We can verify their inputs, too, right?
Can you verify code to be sure there’s no virus in it? It took years of trial and error to patch up some semblance of internet security. A single flaw in your nanotech factory is all a hostile AI would need.
We’ll have advanced AI by then we could use to help verify inputs or the design, or, as I said, we could use stricter standards, if nanotechnology is recognized as potentially dangerous.
A single flaw and them all humans die at once? I don’t see how. Or better put, I can conceive many reasons why this plan fails. Also, I don’t see how see build those factories in the first place and we can’t use that time window to make the AGI to produce explicit results on AGI safety
Then could you produce a few of the main ones, to allow for examination?
What’s the time window in your scenario? As I noted in a different comment, I can agree with “days” as you initially stated. That’s barely enough time for the EA community to notice there’s a problem.
Anything (edit: except solutions of mathematical problems) that’s not difficult to understand isn’t powerful enough to be valuable.
Not to mention the AGI has the ability to fool both us and our AI into thinking it’s easy to understand and harmless, and then it will kill us all anyway.
This is not necessarily true. Consider NP problems: those where the solution is relatively small and easy to verify, but where there’s a huge search space for potential solutions and no one knows any search algorithms much better than brute force. And then, outside the realm of pure math/CS, I’d say science and engineering are full of “metaphorically” NP problems that fit that description: you’re searching for a formula relating some physical quantities, or the right mixture of chemicals for a strong alloy, or some drug-molecule that affects the human body the desired way; and the answer probably fits into 100 characters, but obviously brute-force searching all 100-character strings is impractical.
If we were serious about getting useful nanotech from an AGI, I think we’d ask it to produce its designs alongside formal proofs of safety properties that can be verified by a conventional program.
That’s a good point. We can use the AGI to solve open math problems for us whose solutions we can easily check. Such an AGI would still be unsafe for other reasons though. But yeah, I didn’t remember this, and I was thinking about physical problems (like nanosystems).
For difficult problems in physical universe though, we can’t easily non-empirically check the solution. (For example, it’s not possible to non-empirically check if a molecule affects the human body in a desired way, and I’d expect that non-empirically checking if a nanosystem is safe would be at least as hard.)
For the physical world, I think there is a decent-sized space of “problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources”. In particular, I think this class of questions is pretty safe: “Here are 1000 possible vaccine formulations / new steel-manufacturing processes / drug candidates / etc. that human researchers came up with and would try out if they had the resources. Can you tell us which will work the best?”
So, if it tells us the best answer, then we verify it works well, and save on the costs of hundreds of experiments; if it tells us a bad answer, then we discover that in our testing and we’ve learned something valuable about the AGI. If its answers are highly constrained, like “reply with a number from 1 to 1000 indicating which is the best possibility, and [question-specific, but, using an example] two additional numbers describing the tensile strength and density of the resulting steel”, then that should rule out it being able to hack the human readers; and since these are chosen from proposals humans would have plausibly tried in the first place, that should limit its ability to trick us into creating subtle poisons or ice-nine or something.
There was a thread two months ago where I said similar stuff, here: https://www.lesswrong.com/posts/4T59sx6uQanf5T79h/interacting-with-a-boxed-ai?commentId=XMP4fzPGENSWxrKaA
I agree that would be highly valuable from our current perspective (even though extremely low-value compared to what a Friendly AI could do, since it could only select a course of action that humans already thought of and humans are the ones who would need to carry it out).
So such an AI won’t kill us by giving us that advice, but it will kill us in other ways.
(Also, the screen itself will have to be restricted to only display the number, otherwise the AI can say something else and talk itself out of the box.)
Please notice that I never said that an AGI won’t be unsafe.
If you admit that it is possible that at some point we can be using AGIs to verify certain theorems, then we pretty much agree. Other people wouldn’t agree with that because they will tell you that humanity ends as soon as we have an AGI, and this is the idea I am trying to fight against
The AGI will kill us in other ways than its theorem proofs being either-hard-to-check-or-useless, but it will kill us nevertheless.
I think no one, incuding EY, doesn’t think “humanity ends as soon as we have an AGI”. Actual opinion is “Agentic AGI that optimize something and ends humanity in the process will probably by default be created before we will solve alignment or will be able to commit pivotal act that prevents the creation of such AGI”. As I understand, EY thinks that we probably can create non-agentic or weak AGI that will not kill us all, but it will not prevent strong agentic AGI that will.