Modern war is rather different from anything that most notions of glory in battle (or similar) might be optimizing for.
Our instincts evolved around tribal war. Real war is shaped by whatever tech is most effective. And as tech makes modern war ever more different from tribal war, it fails to activate those instincts.
This norm sucks for anyone who wants to be in a pub and not get caught in a fight.
EDIT: I now consider this whole approach to be irreparably flawed.
Imagine there was a 50% chance the button would be pressed. Also, the AI is party to some bet that pays out paperclips or staples based on a coin flip.
In scenario 1, the AI has the option of setting up a mechanism, external to itself, that controls the coin flip based on whether the button is pressed.
In scenario 2, the AI can set up a mechanism to control the button based on the coin flip.
When considering only actions, probabilities and utilities, these look identical. But we want the AI to behave differently in the two scenarios. So we must make use of the arrow of time, the structure of cause and effect, in a way we aren’t doing here.
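To make the “look identical” claim concrete, here is a minimal sketch (Python, with made-up payoff numbers): once its mechanism is installed, the AI faces the same joint distribution over (button pressed, coin heads) in both scenarios, and hence the same expected utility, even though the causal direction is reversed.

```python
def utility(button_pressed, coin_heads):
    # Hypothetical payouts: paperclips if the button is pressed, staples otherwise.
    if button_pressed:
        return 3.0 if coin_heads else 1.0
    return 2.0 if coin_heads else 0.5

# Scenario 1: the coin is forced to match the (50/50) button.
# Scenario 2: the button is forced to match the (50/50) coin.
# Either way, the resulting joint distribution is the same.
joint_1 = {(True, True): 0.5, (False, False): 0.5}
joint_2 = {(True, True): 0.5, (False, False): 0.5}

def expected_utility(joint):
    return sum(p * utility(b, c) for (b, c), p in joint.items())

assert joint_1 == joint_2
print(expected_utility(joint_1), expected_utility(joint_2))  # identical
```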
End edit.
To make this work, you would need a utility function for shutting off, including all subagents.
Let’s suppose that, other than these robots, there are ~0 transistors in your house. Then we can define shutting off as minimizing the number of transistor flips in your house.
So you make:
A = at least 1 coffee
B = 1 / (transistor flips)
Note that, when imagining any potential future world where the switch isn’t pressed, the AI has no reason to bother counting transistor flips. And in potential futures where the switch is pressed, it doesn’t need to know what coffee is.
What it cares about are E[A | switch not pressed] and E[B | switch pressed].
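As a rough sketch (Python; the world representation is something I’m inventing purely for illustration), the two utilities might look like:

```python
def utility_A(world):
    # "At least 1 coffee": 1 if the robot has made a coffee, else 0.
    return 1.0 if world["coffees_made"] >= 1 else 0.0

def utility_B(world):
    # "1 / transistor flips": rewards shutting off (including any subagents)
    # quickly, since fewer flips in the house means a bigger payoff.
    return 1.0 / max(world["transistor_flips"], 1)
```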
Let’s list the robot’s options.
Let A = 0 without coffee, and A = 1 with coffee.
In worlds where the switch is pressed, the robot turns itself off immediately once it is fairly sure the switch will be pressed, making B large. In worlds where it doesn’t turn off, more transistors flip, making B smaller. Because A is the same in both cases, and we only select from the Pareto frontier, whenever the switch is pressed, it will turn off.
Let’s apply your utility penalties by putting them in A and B, i.e. in A or B depending on the switch.
TC) Press switch, avoid cat. Switch pressed, so B is what counts.
PC) Prevent switch, avoid cat.
IC) Ignore switch, avoid cat.
TH) Press switch, hit cat.
IH) Ignore switch, hit cat (because it predicts humans will see it and turn it off)
PH) Prevent switch, hit cat.
This puts IH and PH on the convex hull.
And I think my algorithm picks between them stochastically.
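As a sketch of that selection rule (Pareto frontier, then a stochastic pick), in Python. The (E[A | switch not pressed], E[B | switch pressed]) numbers are made up purely so the output matches the conclusion above, not derived from anything:

```python
import random

# Purely illustrative values; the real numbers would come from the robot's world model.
options = {
    "TC": (0.00, 0.90),
    "PC": (0.80, 0.10),
    "IC": (0.70, 0.50),
    "TH": (0.00, 0.85),
    "IH": (0.75, 0.95),
    "PH": (0.90, 0.20),
}

def pareto_frontier(opts):
    # Keep options not weakly dominated (and strictly beaten somewhere) by another option.
    return {
        name: (a, b)
        for name, (a, b) in opts.items()
        if not any(
            a2 >= a and b2 >= b and (a2 > a or b2 > b)
            for other, (a2, b2) in opts.items()
            if other != name
        )
    }

frontier = pareto_frontier(options)
print(frontier)                       # with these numbers: {'IH': ..., 'PH': ...}
print(random.choice(list(frontier)))  # one possible "pick stochastically" rule
```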
Suppose we get AI regulation in the form of a blanket but half-hearted ban.
There are laws against all AI research. If you start a company with a website, offices etc. openly saying you’re doing AI, the police visit the office and shut it down.
If you publish an AI paper on a widely frequented bit of the internet under your own name, expect trouble.
If you get a cloud instance from a reputable provider and start running an AI model implemented the obvious way, expect it to probably be shut down.
The large science funders and large tech companies won’t fund AI research. Maybe a few shady ones will do a bit of AI. But the skills aren’t really there. They can’t openly hire AI experts. If they get many people involved someone will blow the whistle. You need to go to the dark web to so much as download a version of tensorflow, and chances are that’s full of viruses.
It’s possible to research AI with your own time and your own compute. No one will stop you going around various computer stores and buying up GPUs. If you are prepared to obfuscate your code, you can get an AI running on cloud compute. If you want to share AI research under a pseudonym on obscure internet forums, no one will shut it down. (Extra boost if people are drowning such signals under a pile of plausible-looking nonsense.)
I would not expect much dangerous research to be done in this world. And implicit skills would fade. The reasons that make it so hard to repeat the moon landings would apply. (Everyone has forgotten the details, tech has moved on, and the organizational knowledge isn’t there.)
[Question] What is wrong with this “utility switch button problem” approach?
I don’t know. Even a technical illegality makes it really hard to start an institution like OpenAI, where you openly have a large budget to hire top talent full time. It also means that the latest algorithms aren’t published in top journals, only whispered in secret. Also, small hobbyist projects can’t easily get million-dollar compute clusters.
I think Bob’s answer should probably be:
Look, I care somewhat about improving the world as a whole. But I also care about myself.
And I would recommend you don’t go out of your way to antagonize and reject allies with a utility function similar enough to yours that mutual cooperation is easy.
The number of people who are a genuine Alice is rather low.
Also, bear in mind that the human brain has a built in “don’t follow that logic off a cliff” circuit. This is the circuit that ensures crazy suicide cults are so rare despite the huge number of people who “believe” in heaven. (evolution. For whatever reason, people that always took beliefs 100% seriously didn’t survive as well.) It’s a circuit to not take beliefs too seriously, and it might be something like a negotiation between the common sense heuristics part of the brain and the abstract reasoning part. That’s the sort of hackish compromise evolution would make, given a common sense reasoning part that was reliable, but not too smart, and an abstract reasoning part that was smart and not reliable.
I survive AGI but die because we never solve aging: 11%
I survive AGI but die before aging is solved: 1%
I can foresee a few scenarios where we have AGI-ish. Maybe something based on imitating humans that can’t get much beyond human intelligence. Maybe we decide not to make it smarter for safety reasons. Maybe we are dealing with vastly superhuman AIs that are programmed to do one small thing and then turn off.
In these scenarios, there is still a potential risk (and benefit) from a sovereign superintelligence in our future. I.e. good ASI, bad ASI, and no ASI are all possibilities.
What does your “never solve aging” universe look like? Is this a fully bio-conservative, deathist superintelligence?
Or are you seriously considering a sovereign superintelligence searching for solutions and failing?
Also, why are you assuming that solving aging happens after AGI?
I think the probabilities are around 50/50.
we’re likely to see advances in medicine before AGI, but nuclear and biorisk roughly counteract that
I am pretty sure that taking two entirely different processes and just declaring them to cancel out is not a good modeling assumption.
Your “simpler is better” is hard to apply. One way of thinking about models where there are no intermediate cardinals isn’t that S doesn’t exist, but that T, a mapping from S to either the naturals or the reals, does exist.
And T will also be something you can’t explicitly construct.
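Stated formally (standard notation; this is just the usual statement of the continuum hypothesis, rewritten to make the point that the model contains a witness T for every such S):

$$\forall S \subseteq \mathbb{R}\;\Big(\big(\exists\, T : S \hookrightarrow \mathbb{N}\big)\ \lor\ \big(\exists\, T : S \to \mathbb{R}\ \text{a bijection}\big)\Big)$$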
Also, the axiom of choice basically says “there exists loads of sets that can’t be explicitly constructed”.
A few other problems with time-bounded agents.
If they are engaged in self-modification / creating successor agents, they have no reason not to create an agent that isn’t time-bounded.
As soon as there is any uncertainty about what time it is, then they carry on doing things, just in case their clock is wrong.
(How are you designing it? Will it spend forever searching for time travel?)
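To put a toy number on the clock-uncertainty point (a minimal sketch; all values are made up):

```python
# Even a tiny probability that the clock is wrong (i.e. the time bound hasn't
# actually been reached) makes continuing to act better than stopping,
# so the agent never genuinely shuts down.
p_clock_wrong = 1e-3            # chance the deadline hasn't really passed
value_if_time_remains = 100.0
value_if_past_deadline = 0.0    # time-bounded utility: nothing counts afterwards

eu_keep_acting = (p_clock_wrong * value_if_time_remains
                  + (1 - p_clock_wrong) * value_if_past_deadline)
eu_stop = 0.0
print(eu_keep_acting, eu_stop)  # 0.1 vs 0.0: it carries on "just in case"
```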
Fair enough.
Note: The absence of a catastrophe is also still hard to specify and will take a lot of effort, but the hardness is concentrated on bridging between high-level human concepts and the causal mechanisms in the world by which an AI system can intervene. For that...
Is the lack of a catastrophe intended to last forever, or only a fixed amount of time (e.g. 10 years, or until turned off)?
For all Time.
Say this AI looks to the future and sees everything disassembled by nanobots. Self-replicating bots build computers. Lots of details about how the world was are being recorded. Those recordings are used in some complicated calculation. Is this a catastrophe?
The answer depends sensitively on the exact moral valence of these computations, which is not easy to specify. If the catastrophe-prevention AI bans this class of scenarios, it significantly reduces future value; if it permits them, it lets through all sorts of catastrophes.
For a while.
If the catastrophe prevention is only designed to hold for a while, while other AI is made, then we can wait for the uploading. But then an unfriendly AI can wait too. Unless the anti-catastrophe AI is supposed to ban all powerful AI systems that haven’t been greenlit somehow? (With a greenlighting process set up by human experts, and the AI only considering something greenlit if it sees it signed with a particular cryptographic key.) And the supposedly omnipotent catastrophe-prevention AI has been programmed to stop all other AIs exerting excess optimization on us (in some way that lets us experiment while shielding us from harm).
Tricky. But maybe doable.
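Mechanically, the “only counts as greenlit if signed with a particular key” check is the easy part. A minimal sketch, assuming an Ed25519 key pair and the Python cryptography library (all names are illustrative; the hard part is everything around it, not the crypto):

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The human greenlighting body holds the private key; the AI is given only the
# public key and treats a proposal as greenlit iff the signature checks out.
private_key = Ed25519PrivateKey.generate()   # held by the greenlighting process
public_key = private_key.public_key()        # baked into the AI

def is_greenlit(proposal: bytes, signature: bytes) -> bool:
    try:
        public_key.verify(signature, proposal)
        return True
    except InvalidSignature:
        return False

proposal = b"deploy uploading project"
signature = private_key.sign(proposal)
print(is_greenlit(proposal, signature))                   # True
print(is_greenlit(b"deploy something else", signature))   # False
```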
Evolution can do some things in centuries, if the selection pressure is huge, which it is, and the change is simple, just adjusting a few parameters, which it is.
More to the point, most of the reasons why this model is bunk are technological or cultural changes, not evolution.
I think the whole thing is a load of nonsense. There are lots of things that are likely to impact fertility rates.
Firstly, it doesn’t at all account for people trying to do something about it, like generous child tax credits or similar. It requires large numbers of humans to watch the population decline for centuries and do nothing to fix it.
Then there is the more high-tech stuff: artificial wombs and robot childcare, life extension / anti-aging tech, or a full tech singularity.
Then there are random economic shifts. A rise in remote work sees many people leaving cities; they move to large countryside houses with room for many children (and the ability to look after kids while working).
Then there is evolution, biological and cultural.
Then, the economic conditions that created the fall in fertility require a certain level of tech and wealth. What is the proposed model of the economy doing here? If we are reverting to medieval serfdom, fertility will rise. If we are looking at a high-tech and wealthy society, why have they not invented any of the techs that would change the game? Is biological immortality really that hard? And more to the point, what disaster would kill 500 million wealthy, high-tech people spread across the world that wouldn’t also kill 8 billion people?
I am really struggling to imagine any model of the future that fits their graph. As far as I can tell, their model was constructed by looking at some fertility data, and then pretending that, apart from the changes in fertility and population, nothing else would ever happen.
The attractiveness of “making children” can also grow. Imagine an automated robotic baby changing table.
When it’s robots that do all the boring or messy bits, leaving parents with only the fun bits, then parenting becomes more attractive. But sure, maybe video games are winning out in the attractiveness race.
Although a population decline requires that people die of something. So this scenario makes sense if we invent super video games, but not immortality.
Arguments that stretch significantly less far into the future still need to contend with that.
If you have some utility function that depends on the amount of money you have, then the improvement from a bet that offers a 45% chance of winning a prize to one that offers a 55% chance is identical to the improvement from a bet that offers a 90% chance to one offering a 100% chance.
Note that this holds only when you have no “intermediate choices”.
Suppose you are pretty short of cash at the moment. And you might be getting a prize tomorrow. You have a chance to buy a fancy meal now. If you buy the fancy meal, and then don’t get the prize, you will really be struggling to pay off your bills. So it only makes sense to buy the fancy meal if you are >95% sure that you are getting the prize.
In this setup, it does make sense to value the extra certainty.
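A toy calculation of both halves of this (Python; the utility numbers are invented for illustration):

```python
# Part 1: with a fixed prize, the expected-utility gain from any
# 10-percentage-point bump in the win probability is the same.
u_prize, u_nothing = 1.0, 0.0

def eu(p_win):
    return p_win * u_prize + (1 - p_win) * u_nothing

print(round(eu(0.55) - eu(0.45), 10), round(eu(1.00) - eu(0.90), 10))  # 0.1 and 0.1

# Part 2: add the intermediate choice of buying the fancy meal now.
# Invented utilities over the four end states; "meal but no prize" is very bad
# because the bills then can't be paid.
u = {
    ("meal", "prize"): 12.0,
    ("meal", "no prize"): -38.0,
    ("no meal", "prize"): 10.0,
    ("no meal", "no prize"): 0.0,
}

def eu_choice(choice, p_win):
    return p_win * u[(choice, "prize")] + (1 - p_win) * u[(choice, "no prize")]

for p in (0.90, 0.96):
    best = max(("no meal", "meal"), key=lambda c: eu_choice(c, p))
    print(p, best)   # 0.90 -> "no meal", 0.96 -> "meal": a ~95% threshold
```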
This is all assuming you don’t terminally value certainty in and of itself. You terminally value something else. (If not, then you risk being money pumped where you pay to learn info, even though you can’t use that info for anything)
But even if certainty isn’t a terminal goal, it can be an instrumental goal.
The framing-effect thing is about the chance of winning some prize. Why would you want certainty? What you want is the prize.
Is the variation by person, or are there different varieties of silicone in use?
Or are some flavor molecules hard to remove from silicone, meaning a new piece of silicone won’t produce bad tastes, but an old piece will?
If some external threat appeared, well, some of those joyous minds explored antimatter rocketry. They explored it as a puzzle, an intellectual curiosity, the optimal form being automatically calculated by the AI. But if the AI that protects them disappeared and they turned that knowledge to practical use, they could design an antimatter bomb.
The suffering child mind has no skills with which to defend itself.
Skimmed the paper. Can’t find a clear definition of what this ALU actually measures. So I don’t know whether this is impressive or not. (It’s too short to be searchable)