Depending on the agent implementation, you may find that it is demotivated to achieve any useful outcome if it is power-limited. Half-assing things seems pointless and futile; such actions aren’t sane responses to the world. E.g. trying to put out a fire when all you have is a squirt gun.
I’m someone who is mainly moving in the opposite direction (from AI to climate change). I see AGI as a lot harder to achieve than most people do, mainly because the potential political ramifications will slow development, and because I think it will need experiments with novel hardware, making it more visible than just coding. So I see it as relatively easy to stop, at least inside a country. Multi-nationally it would be trickier.
Some advice: I would try to frame your effort as “understanding AGI risk”. While you currently think there is risk, keeping an open mind about the status of the risk is important. If AGI turns out to be free of existential risk, it could help with climate adaptation, even if it is not in time for climate mitigation.
Edit: You could frame it just as understanding AI, and put together independent briefs on each project so policy makers can understand the likely impacts, both positive and negative, and the state of play. Getting a good reputation and maintaining independence might be hard, though.
A theory I read in “Energy and Civilisation” by Vaclav Smil is that humans could evolve big brains because tools and techniques (like cooking) provided a higher-quality diet, reducing the need for a complicated gut.
This connects to the Principled Intelligence hypothesis, because things like hunting or maintaining a fire require cooperation and communication. Maintaining the knowledge of those things across a tribe also required consistent communication. If you don’t all have the same word for ‘hot’ and use it in the same way, lots of people are going to get burned or go hungry. So you need a norm-policing mechanism for the meanings of words in order for language to be at all useful for coordination and for the transfer of the culture needed for survival.
This is not norm policing against people trying to trick you, just norm policing against gibberish. It probably works against both to an extent.
All the tribes where a modicum of norm policing didn’t go to fixation couldn’t coordinate or communicate at all; they went back to the old process of eating roots and were outcompeted by those that did.
My view is that you have to build an AI with a bunch of safeguards to stop it destroying *itself* while it doesn’t yet have great knowledge of the world or of the consequences of its actions. So some of the arguments about companies/governments skimping on safety don’t hold in the naive sense.
So, things like: how do you
Stop a robot jumping off something too high
Stop an AI DoSing its own network connection
Stop a robot disassembling itself
when it is not vastly capable? Solving these problems would give you a bunch of knowledge about safeguards and how to build them. I wrote about some of these problems here.
Only when you expect a system to radically gain capability without needing any safeguards does it make sense to expect a dangerous AI to be created by a team with no experience of safeguards or how to embed them.
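To make the self-protective-safeguard idea concrete, here is a minimal sketch of an action-veto layer of the kind the list above gestures at. All names (`Action`, `SAFE_DROP_HEIGHT_M`, the action kinds) are hypothetical, chosen for illustration; a real robot or agent stack would hook such checks into its planner rather than a standalone function.

```python
# Minimal sketch of a self-protective safeguard layer: a veto check that
# runs before the agent executes an action. Limits and names are assumed
# for illustration only.
from dataclasses import dataclass
from typing import Optional

SAFE_DROP_HEIGHT_M = 0.5     # assumed height before risk of self-damage
MAX_REQUESTS_PER_SEC = 50    # assumed rate before self-inflicted DoS

@dataclass
class Action:
    kind: str                    # e.g. "jump", "net_request", "remove_part"
    drop_height_m: float = 0.0
    requests_per_sec: float = 0.0
    part_is_self: bool = False

def veto(action: Action) -> Optional[str]:
    """Return a reason to block the action, or None if it looks safe."""
    if action.kind == "jump" and action.drop_height_m > SAFE_DROP_HEIGHT_M:
        return "drop too high: risk of self-damage"
    if action.kind == "net_request" and action.requests_per_sec > MAX_REQUESTS_PER_SEC:
        return "request rate would DoS the agent's own connection"
    if action.kind == "remove_part" and action.part_is_self:
        return "refusing to disassemble own hardware"
    return None

# A capable system might learn these limits; an early, naive one needs
# them hard-coded, which is exactly the safeguard-building experience
# the argument says teams would accumulate.
assert veto(Action("jump", drop_height_m=2.0)) is not None
assert veto(Action("jump", drop_height_m=0.2)) is None
```

The point is not the specific checks but that a team shipping even a weak agent is forced to design, test, and maintain this kind of layer, accumulating safeguard experience along the way.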
As a data point for why this might be occurring: I may be an outlier, but I’ve not had much luck getting replies or useful dialogue from X-risk-related organisations in response to my attempts at communication.
My expectation, currently, is that if I apply I won’t get a response, and I will have wasted the time spent composing an application. I won’t get any more information than I previously had.
If this isn’t just me, you might want to encourage organisations to be more communicative.
My view is more or less the one Eliezer points to here:
The big big problem is, “Nobody knows how to make the nice AI.” You ask people how to do it, they either don’t give you any answers or they give you answers that I can shoot down in 30 seconds as a result of having worked in this field for longer than five minutes.
There are probably no fire alarms for “nice AI designs” either, just like there are no fire alarms for AI in general.
Why should we expect people to share “nice AI designs”?
For longer time frames where there might be visible development, the public needs to trust the political regulators of AI to have their interests at heart. Otherwise they may try to make it a party-political issue, which I think would be terrible for sane global regulation.
I’ve come across pretty strong emotion when talking about AGI even when talking about safety, which I suspect will come bubbling to the fore more as time goes by.
It may also help morale of the thoughtful people trying to make safe AI.
I think part of the problem is that corporations are the main source of innovation, and they have incentives to insert themselves into the things they invent so that they can extract tolls and sustain their business.
Compare email and Facebook Messenger: two different types of invention, with different abilities to extract tolls. However, if you can’t extract a toll, it is unlikely you can build a business around innovation in that area.
I had been thinking about metrics for measuring progress towards shared agreed outcomes as a method of co-ordination between potentially competitive powers to avoid arms races.
I passed the draft around to a couple of the usual suspects in the AI metrics/risk-mitigation space in hopes of getting collaborators, but no joy. I learnt that Jack Clark of OpenAI is looking at that kind of thing as well and is a lot better positioned to act on it, so I have hopes around that.
Moving on from that, I’m thinking that we might need a broad base of support from people (depending upon the scenario), so being able to explain how people could still have meaningful lives post-AI is important for building that support. So I’ve been thinking about that.
To me, closed-loop living is impossible not due to taxes but due to the desired technology level. I could probably go buy a plot of land and try to recreate Iron Age technology, but most likely I would injure myself, need medical attention, and have to re-enter society.
Taxes also aren’t an impediment to closed-loop living as long as the waste from the tax is returned. If you have land with a surplus of sunlight or other energy, you can take in waste and create useful things (food, etc.) with it. The greater loop of taxes has to be closed, as well as the lesser loop.
From an infosec point of view, you tend to rely on responsible disclosure. That is, you tell the people who will be most affected, or who can solve the problem for others; they create countermeasures; and then you release those countermeasures to everyone else (which gives away the vulnerability as well), who should be in a position to quickly update/patch.
Otherwise you are relying on security via obscurity. People may be vulnerable and not know it.
There doesn’t seem to be a similar pipeline for non-computer security threats.
Similarly, it is not irrational to want to form a cartel or political in-group. Quite the opposite. It’s like the concept of an economic moat, but for humans.
And so you get the patriarchy, and the reaction to it, feminism. This leads to the culture wars that we have today. So it is locally optimal but leads to problems in the greater system.
How do we escape this kind of trap?
I’m reminded of the quote by George Bernard Shaw.
“The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.”
I think it would be interesting to look at the reasons and occasions not to follow “standard” incentives.
I’ve been re-reading a sci-fi book with an interesting existential-risk scenario in which most people are going to die, but some may survive.
If you are a person on Earth in the book, you have the choice of helping people out and definitely dying, or trying desperately to be one of the ones who survive (even if you personally might not be the best person to help humanity survive).
In that situation I would definitely be in the “helping the people better suited to surviving” camp: following orders, because the situation was too complex to keep in one person’s head. Danger is fine, because you are literally a dead person walking.
It becomes harder when the danger isn’t so clear and present. I’ll think about it a bit more.
The title of the book is frirarirf (rot13)
rm -f double-post
She asked my advice, on Facebook, on how to do creative work on AI safety. I gave her advice as best I could.
She seemed earnest and nice. I am sorry for your loss.
Dulce et Decorum Est Pro Humanitate Mori?
As you might be able to tell from the paraphrased quote, I’ve been taught about some of the bad things that can happen when this is taken too far.
Therefore the important thing is how we, personally, would engage with that decision if it came from outside.
For me it depends on my opinion of the people on the outside. There are four things I weigh:
Epistemic rigour. With lots of crucial considerations around existential risk, do I believe that the outside has good views on the state of the world? If they do not, they/I may be doing more harm than good.
Are they trying to move to better equilibria? Do they believe in winner-take-all, or are they trying to plausibly pre-commit to sharing the winnings (with other people who are trying to plausibly pre-commit to sharing the winnings)? Are they trying to avoid the race to the bottom? It doesn’t matter if they can’t, but not trying at all means they may miss out on better outcomes.
Feedback mechanisms: How is the outside trying to make itself better? It may not be good enough in the first two items, but do they have feedback mechanisms to improve them?
Moral uncertainty. What is their opinion on moral theory? They/I may do some truly terrible things if they are too sure of themselves.
My likelihood of helping humanity when following orders stems from those considerations. It is a weighty decision.
I’m interested in seeing where you go from here. With the old LessWrong demographic, I would predict you would struggle, due to cryonics/life extension being core to many people’s identities.
I’m not so sure about current LW though. The fraction of the EA crowd that is total utilitarian probably won’t be receptive.
I’m curious what it is that your intuitions do value highly. It might be better to start with that.
Has anyone done work on an AI readiness index? This could track many things, like the state of AI safety research and the roll-out of policy across the globe. It might have to be a bit Doomsday Clock-ish (going backwards and forwards as we understand more), but it might help to have a central place to collect the knowledge.
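To sketch what such an index might look like mechanically, here is a toy aggregation over a few indicators. The indicator names, scores, and weights are entirely made up for illustration; a real index would need agreed definitions and data sources for each.

```python
# Toy sketch of an AI readiness index: a weighted average over indicator
# scores in [0, 1]. All indicators and weights below are hypothetical.
scores = {
    "safety_research_state": 0.4,        # assumed score
    "policy_rollout": 0.2,               # assumed score
    "international_coordination": 0.1,   # assumed score
}
weights = {
    "safety_research_state": 0.5,
    "policy_rollout": 0.3,
    "international_coordination": 0.2,
}

def readiness(scores, weights):
    """Weighted average of indicator scores. Like a Doomsday Clock, the
    result can move backwards as well as forwards as scores are revised."""
    total = sum(weights.values())
    return sum(scores[k] * weights[k] for k in scores) / total

print(round(readiness(scores, weights), 3))  # prints 0.28
```

The interesting design questions are less the arithmetic than who maintains the scores, and how disagreement about an indicator's value is surfaced rather than hidden in the average.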
Out of curiosity what is the upper bound on impact?