How long do maximisers maximise for?
For example, if an ASI is told to "make as many paperclips as possible", over what time horizon is it maximising? The next second? The next year? Indefinitely?
If a paperclip maximiser only cared about making as many paperclips as possible over the next hour, say, and this goal restarted every hour, maybe it would never be optimal for it to spend time on things such as disempowering humanity: it only ever cares about the next hour, and disempowering humanity would take too long.
Would a paperclip maximiser rather make a thousand paperclips today, or disempower humanity, take over, and make a billion paperclips tomorrow?
Is there perhaps some way an ASI could be given something to maximise up to a set point in the future, with that time gradually increased, so that it might be easier to spot when it starts heading towards undesirable actions?
For example, if a paperclip maximiser is told to "make as many paperclips as possible in the next hour", it might just use the tools it has available, without bothering with extreme actions like human extinction, because those would take too long. We could then gradually increase the time limit, even second by second if necessary. If, in this hypothetical, 10 hours is the point at which human disempowerment, extinction, etc. become optimal, perhaps 9.5 hours is the point at which actions that are bad, but less bad than extinction, become optimal. This might give us a kind of warning shot.
There are problems I see with this. Just because it wasn't optimal to wipe out humanity when maximising over the next 5 hours one day doesn't mean it won't be optimal when maximising over the next 5 hours some other day. Also, what is optimal might jump from completely safe to terrible simply by adding another minute to the time limit, with very little or no shades of grey in between.
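To make that last worry concrete, here is a toy numerical sketch (purely illustrative: the production rates, the takeover setup time, and the `best_plan` helper are all invented for this example, not claims about any real system). It sweeps the horizon upward, as proposed above, and shows how the optimal plan can flip abruptly.

```python
# Toy model: an agent maximising paperclips over a horizon of H hours
# picks whichever of two plans yields more paperclips:
#   "just clip": produce at a modest rate for the whole horizon
#   "takeover":  spend SETUP_HOURS seizing resources first, then produce
#                at a much higher rate for whatever time remains.
# All numbers are made up for illustration.

CLIP_RATE = 1_000          # paperclips/hour using existing tools (assumed)
TAKEOVER_RATE = 1_000_000  # paperclips/hour after a takeover (assumed)
SETUP_HOURS = 10.0         # hours a takeover takes before production starts (assumed)

def best_plan(horizon_hours: float) -> tuple[str, float]:
    """Return (plan name, paperclip total) for the better plan over the horizon."""
    just_clip = CLIP_RATE * horizon_hours
    takeover = max(0.0, horizon_hours - SETUP_HOURS) * TAKEOVER_RATE
    return ("takeover", takeover) if takeover > just_clip else ("just clip", just_clip)

# Sweep the horizon upward and watch where the optimal plan flips.
for h in [9.0, 10.0, 10.005, 10.01, 10.02, 12.0]:
    plan, total = best_plan(h)
    print(f"horizon {h:>6.3f} h -> {plan:9s} ({total:,.0f} clips)")
```

In this toy model the flip happens between a 10.01-hour and a 10.02-hour horizon, with no intermediate "less bad" plan in between, which is exactly the shades-of-grey problem described above: a gradual sweep only gives a warning shot if such intermediate plans exist.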
(THIS IS A POST ABOUT S-RISKS AND WORSE THAN DEATH SCENARIOS)
Putting the disclaimer there, as I don’t want to cause suffering to anyone who may be avoiding the topic of S-risks for their mental well-being.
To preface this: I have no technical expertise and have only been looking into AI and its potential effects for a bit under 2 months. I also have OCD, which undoubtedly has some effect on my reasoning. I am particularly worried about S-risks, and I just want to make sure that my concerns are not being overlooked by the people working on this stuff.
Here are some scenarios which occur to me:
Studying brains may be useful to an AI (I have a feeling this was brought up in a post about S-risks around a month ago).
I'd assume that in a clippy scenario, gaining information would be a good sub-goal, as would amassing resources and making sure it isn't turned off, to name a few. The brain is incredibly complex, and if, for example, consciousness is far more complex than some think and not replicable through machines, an AI could want to know more about this. If an AI did want to know more about the brain, and tried to find out more by doing tests on brains, this could lead to very bad outcomes. What if it takes the AI a very long time to run all these tests? What if the tests cause suffering? What if the AI can't work out what it wants to know and just keeps on doing tests forever? I'd imagine this is more of a risk to humans due to our brain complexity, although it could also apply to other animals.
Another thing that occurs to me is that if a superintelligent AI is aligned in a way that puts moral judgment on intent, this could lead to extreme punishments. For example, if an ASI is told that attempted crime is as bad as the crime itself, could it extrapolate that attempting to damn someone to hell is as bad as actually damning someone to hell? If it did, then perhaps it would conclude that eternal torture is a proportional punishment for saying "I damn you to hell", which is something many people will have said at some point or another to someone they hate.
I have seen some religious people argue that an eternal hell is justified because, although the sinner has only caused finite harm, if they were allowed to carry on forever they would cause harm forever. This is an example of how putting moral judgment on intent, or on what someone would do, can be used to justify infinite punishment.
I consider it absolutely vital that eternal suffering never happens, whether to a human, some other organism, an AI, or anything else with the capacity for suffering that I may have missed. I do not take much comfort from the idea that, while eternal suffering may happen, it could be counterbalanced or dwarfed by the amount of eternal happiness.
I just want to make sure that the scenarios I have described are not being overlooked. I am aware there may be reasons that they are either impossible or simply highly improbable, and I do not know whether some of the things I have mentioned actually make sense or are valid concerns. Since I do not know, I want to make sure that, if they are valid, the people who could do something about them are aware.
So, as this thread is specifically for asking questions, my question is essentially: are people in the AI safety community aware of these specific scenarios, or at least aware enough of similar scenarios that these kinds of outcomes can be avoided?