that the most powerful algorithms, the ones that would likely first become superintelligent, would be distributed and fault-tolerant, as you say, and therefore would not be in a box of any kind to begin with.
Algorithms don’t have a single “power” setting. It is easier to program a single computer than to make a distributed, fault-tolerant system. Algorithms like AlphaGo run on a particular computer with an off switch, not spread around. Of course, a smart AI might soon load its code all over the internet, if it has access. But it would start in a box.
At the moment, human brains are a cohesive whole that optimizes for human values. We haven’t yet succeeded in making machines share our values, and the human brain is not designed for upgrading. The human brain can take knowledge from an external source and use it. External tools follow the calculator model: the human thinks about the big-picture world, and realizes that, as a mental subgoal of designing a bridge, they need to do some arithmetic. Instead of doing the arithmetic themselves, they pass the task on to the machine. In this arrangement the human controls the big picture; the human understands what cognitive labor has been externalized and knows that it will help the human’s goals. If we have a system to which a human can say “go and do whatever is most moral”, that’s FAI. If we have a calculator-style system where humans specify the power output, weight, material use, radiation output etc. of a fusion plant, and the AI tries to design a fusion plant meeting those specs, that’s useful but not nearly as powerful as full ASI. Humans with calculator-style AI could invent molecular nanotech without working out all the details, but they still need an Eric Drexler to spot the possibility.

In my model you can make a relativistic rocket, but you can’t take a sparrow and upgrade it into something that flies through space at 10% light speed and is still a sparrow. If you’re worried that relativistic rockets might spew dangerous levels of radiation, you can’t make a safe spacecraft by taking a sparrow and upgrading it to fly at 10% c. (Well, with enough R&D you could make a rocket that superficially resembles a sparrow; deciding to upgrade a sparrow doesn’t make the safety engineering any easier.) Making something vastly smarter than a human is like making something far faster than a sparrow. Strap really powerful turbojets to the sparrow and it crashes and burns. Attach a human brain to 100x human-brain gradient descent and you get an out-of-control AI system with nonhuman goals. Human values are delicate. I agree that it is possible to carefully unravel what a human mind is thinking and what its goals are, and then upgrade it in a way that preserves those goals, but this requires a deep understanding of how the human mind works. Even granted mind uploading, it would still be easier to create a new mind largely from first principles. You might look at the human brain to figure out what those principles are, the same way a plane designer looks at birds.

I see a vast space of all possible minds, some friendly, most not. Humans are a small dot in this space. We know that humans are usually friendly. We have no guarantees about what happens as you move away from humans; in fact we know that one small error can sometimes send a human totally mad. If we want to make something that we know is safe, we either need to copy that dot exactly (i.e. normal biological reproduction, mind uploading), or we need something we can show to be safe for some other reason.
My point with the Egypt metaphor was that the sentence
Society continues as-is, but with posthuman capabilities.
Try “the stock market continues as-is, except with all life extinct”.
Describing the modern world as “like a tribe of monkeys, except with post-monkey capabilities” is either wrong or so vague as to not tell you much.
At the point when the system (upgraded human, AI, whatever you want to call it) is 99% silicon, suppose a stray meteor hits the biological part. If the remaining 99% stays friendly, somewhere in this process you have solved FAI. I see no reason why aligning a 99% silicon being is easier than aligning a 100% silicon being.
as extensions of themselves
Let’s assume that AI doubling time is fairly slow (e.g. 20 years) and very widely distributed. Huge numbers of people throw together AI systems in garages. If the basic problems of FAI haven’t been solved, you are going to get millions of paperclip maximizers (well, most of them will be optimising different things). 100 years later, humanity, if it still exists at that point, consists of pawns on a gameboard that contains many superintelligences. What happens depends on how different the superintelligences’ goals are, and how hard it is for superintelligences to cooperate. Either they fight, killing humanity in the crossfire, or they work together to fill the universe with a mixture of all the things they value. The latter looks like 1% paperclips, 1% staples, 1%… .
Alternately, many people could understand friendliness and build various FAIs. The FAIs work together to make the world a nice place. In this scenario the FAIs aren’t identical, but they are close enough that any one of them would make the world nice. I also agree that a world with FAIs and paperclip maximisers could be nice if the FAIs hold a significant portion of total power.
Society continues as-is, but with posthuman capabilities.
“Exactly like ancient Egypt, except that like electric charges attract and unlike charges repel.” I posit that this sentence doesn’t make sense. If matter behaved that way, atoms couldn’t exist. When we say “like X but with change Y”, we are considering the set of all possible worlds that meet criterion Y, and finding the one nearest to X. But there is no world where like charges attract that looks anything like ancient Egypt. We can say “like ancient Egypt but gold is 10x more abundant”: that ends up as a bronze-age society that builds pyramids and makes a lot more gold jewelry than the real Egyptians did. I think that “society as-is, but with posthuman capabilities” is the first kind of sentence. There is no way of making a change like that and getting anything resembling society as-is.
This seems like one potential path, but for it to work, you would need a government structure that can survive, without any successful pro-AI revolutionaries, for a billion years. You also need law enforcement good enough to stop anyone trying to make UFAI, with not a single failure in a billion years. As for an SAI that will help us stop UFAI, can you explain 1) how it would help and 2) how it would be easier to build than FAI?
You also need to say what happens with evolution. Given this kind of time, and non-ancestral selection pressures, evolution will produce beings not remotely human in mind or body. Either argue that the evolution is in a morally OK direction and that your government structure works with these beings, or stop evolution (by selective breeding, frozen samples, or genetic modification towards some baseline). Then you just need to say how all human populations get this, or why any population that doesn’t won’t be building UFAI.
I think that some typical mind fallacy is happening here.
Humans evolved both to find the truth (for interacting with the real world) and to hold beliefs that are good at winning status games. Naturally there is a tradeoff between these criteria. As you would expect, different people fall in different places along a spectrum from truth-focused to status-focused. This is an unusually truth-focused community, and you are probably unusually truth-focused, so you see marketing as a status game that you really don’t want to get into. To people who are unusually status-focused, immoral mazes might seem nice.
You have a box with 2 wires coming out of it. The wires are connected to a display inside the box. Looking at the display is either 1) a live, awake human, or 2) a dead spider. Can you tell which is which without opening the box? Can you use the fact that a human observing something causes a quantum collapse, while a spider doesn’t, to distinguish them? Can you build a quantum consciousness detector? No.
Suppose I write a simple computer program that takes in data from a quantum physics experiment and tells me whether the data as a whole is consistent with quantum mechanics. Nobody knows where the photon went on any particular run; all any conscious human sees is a single yes or no. Would you expect the same results as if a human had watched every run? Yes.
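To make “a simple computer program” concrete, here is a minimal sketch of such a consistency checker. The detector counts and the Born-rule probabilities are invented for illustration; the only point is that the check needs aggregate statistics, never the individual runs.

```python
# Minimal sketch of an aggregate consistency check, assuming hypothetical
# detector counts and hypothetical Born-rule predictions.
from scipy.stats import chisquare

# Born-rule predictions for which of 4 detectors fires (made-up numbers).
predicted_probs = [0.40, 0.30, 0.20, 0.10]

# Raw counts over many runs. No human ever looks at individual runs,
# only at this aggregate.
observed_counts = [4012, 2981, 2043, 964]

total = sum(observed_counts)
expected_counts = [p * total for p in predicted_probs]

# Pearson chi-squared test: are the counts consistent with the prediction?
stat, p_value = chisquare(observed_counts, f_exp=expected_counts)
print("consistent with QM" if p_value > 0.05 else "inconsistent with QM")
```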
I take an emulated human mind and put the whole thing on an extremely powerful quantum computer, simulating the mind in a superposition of states. Would you expect the quantum computer to enter the superposition correctly, despite the person being conscious?
Suppose Joe has opinions on the numbers 1 to 1000: he either thinks they are all good, or all bad, or that exactly half are good and the other half are bad. If you tell him a number, it takes him 1 minute to say whether it’s good or bad. It would take a classical computer 501 minutes in the worst case to tell whether he has the same opinion of all numbers. But a quantum computer can do it in about 2 minutes, using only a single superposed query: https://en.wikipedia.org/wiki/Deutsch%E2%80%93Jozsa_algorithm
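For the curious, here is a small numpy simulation of Deutsch–Jozsa applied to this setup. Padding Joe’s 1000 numbers to 1024 = 2^10 inputs, and the particular example “Joes”, are my own illustrative choices, not anything from the algorithm itself.

```python
# Simulating the Deutsch-Jozsa algorithm with numpy.
import numpy as np

n = 10          # 2**10 = 1024 >= 1000 inputs
N = 2 ** n

def joe_balanced(x):
    # Hypothetical Joe who calls a number good (0) iff it is even:
    # a balanced function, half good and half bad.
    return x % 2

def joe_constant(x):
    # Hypothetical Joe who thinks every number is good.
    return 0

def deutsch_jozsa(f):
    # Uniform superposition over all 2^n inputs (H^n applied to |0...0>).
    state = np.full(N, 1 / np.sqrt(N))
    # ONE oracle query: phase-flip the amplitudes where f(x) = 1.
    state = state * np.array([(-1) ** f(x) for x in range(N)])
    # Apply H^n again; the amplitude of |0...0> is sum/sqrt(N).
    amp_zero = state.sum() / np.sqrt(N)
    # Measuring |0...0> has probability 1 iff f is constant, 0 iff balanced.
    return "constant" if abs(amp_zero) ** 2 > 0.5 else "balanced"

print(deutsch_jozsa(joe_constant))   # -> constant
print(deutsch_jozsa(joe_balanced))   # -> balanced
```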
If you disagree with any of these, we have a factual disagreement about an experimental result. If you agree, then “consciousness” seems to be an invisible, inaudible dragon as far as quantum mechanics is concerned. I would then have to ask how you know that it’s consciousness that causes collapse, not DNA.
For example, I could say that, from the perspective of epistemic rationality, I “shouldn’t” believe that buying that burrito will create more utility in expectation than donating the same money to AMF would. This is because holding that belief won’t help me meet the goal of having accurate beliefs.
There is a phenomenon in AI safety called “you can’t fetch the coffee if you’re dead”. A perfect total utilitarian, or even a money maximiser, would still need to eat if they want to be able to work next year. If you have a well-paid job, or a good chance of getting one, don’t starve yourself. Eat something quick, cheap and healthy: quick so you can work more today, and healthy so you can work years later. In a world where you need to wear a sharp suit to be CEO, the utilitarians should buy sharp suits. Don’t fall for the false economy of personal deprivation. This doesn’t entitle utilitarians to whatever luxury they feel like; if most of your money is going on sharp suits, it isn’t a good job. A sharp-suited executive should be able to donate far more than a cardboard-box-wearing ditch digger.
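With entirely made-up numbers, the arithmetic behind that last sentence:

```python
# Toy numbers (invented) illustrating that spending on work "inputs"
# like suits can still leave far more available to donate.
executive_income, executive_costs = 200_000, 30_000   # suits, commute, etc.
digger_income, digger_costs = 25_000, 1_000           # cardboard-box lifestyle

print(executive_income - executive_costs)   # 170000 available to donate
print(digger_income - digger_costs)         # 24000 available to donate
```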
Heisenberg’s uncertainty principle: We might imagine that if we were clever enough we could find a scheme for gaining perfect information about a particle, but this isn’t the case
Quantum mechanics doesn’t work like that: the information you want is not hidden from you, it doesn’t exist. Galilean relativity of motion means that absolute rest doesn’t exist, not that absolute rest exists but can’t be known.
By adding random noise, I meant adding wiggles to the edge of the set in thingspace. For example, adding noise to “bird” might exclude “ostrich” and include “duck-billed platypus”.
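A minimal sketch of what I mean, with a hypothetical two-dimensional thingspace, an invented smooth noise function for the boundary wiggles, and made-up coordinates for the example animals:

```python
import numpy as np

centroid = np.array([0.0, 0.0])   # center of the "bird" cluster
radius = 1.0                      # original concept: everything within 1.0

def in_concept(x):
    return np.linalg.norm(x - centroid) < radius

def in_noisy_concept(x):
    # Wiggle the boundary: the effective radius now depends on the
    # direction of x via a fixed pseudo-random ripple.
    angle = np.arctan2(x[1], x[0])
    wiggle = 0.2 * np.sin(7 * angle) + 0.1 * np.sin(13 * angle)
    return np.linalg.norm(x - centroid) < radius + wiggle

ostrich = np.array([0.926, -0.211])    # just inside the old boundary
platypus = np.array([1.075, 0.231])    # just outside the old boundary
print(in_concept(ostrich), in_noisy_concept(ostrich))     # True False
print(in_concept(platypus), in_noisy_concept(platypus))   # False True
```

Both concepts cover nearly the same region of thingspace and are nearly as predictively useful, but they disagree about exactly the edge cases.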
I agree that the high-level ImageNet concepts are bad in this sense, but are they just bad? If they were just bad, and the limit to finding good concepts was data or some other resource, then we should expect small children and mentally impaired people to have similarly bad concepts. That would suggest a single gradient from better to worse. If, however, current neural networks use concepts substantially different from small children’s, and not just uniformly worse or uniformly better, that would show different sets of concepts at the same low level. This would be fairly strong evidence of multiple possible sets of concepts at the smart-human level.
I would also point out that a small fraction of the concepts being different would be enough to make alignment much harder. Even if there were a perfect scale, if 1/3 of the concepts are subhuman, 1/3 human-level and 1/3 superhuman, it would be hard to understand the system. To get any safety, you need to get your system very close to human concepts. And you need to be confident that you have hit this target.
This seems to be about careful deployment. The concept of deployment is going from an AI in the lab to the same AI in control of a real-world system. Suppose your design process is to fiddle around in the lab until you make something that seems to work. Once you have that, you look at it to understand why it works. You try to prove theorems about it. You subject it to some extensive battery of testing, and will only put it in a self-driving car or data-center cooling system once you are confident it is safe.
There are two places this could fail. Your testing procedures could be insufficient, or your AI could hack out of the lab before the testing starts. I see little to no defense against the latter.
Neural nets have around human-level performance on ImageNet.
If abstraction were a feature of the territory, I would expect the failure cases to be similar to human failure cases. Looking at https://github.com/hendrycks/natural-adv-examples, this does not seem to be the case very strongly; but then again, some of the examples, like dark shiny stone being classified as a sea lion, are somewhat understandable. The failures aren’t totally inhuman, the way they are with adversarial examples.
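For anyone who wants to eyeball this themselves, a rough sketch: run a pretrained ImageNet classifier over a local copy of the dataset and look at the mistakes. The imagenet-a/ path is a placeholder, and mapping predicted class indices back to names is left out.

```python
import glob
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

for path in glob.glob("imagenet-a/*/*.jpg"):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        pred = model(img).argmax(dim=1).item()
    print(path, "-> predicted ImageNet class index", pred)
```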
Humans didn’t look at the world and pick out “tree” as an abstract concept because of a bunch of human-specific factors.
I am not saying that trees aren’t a cluster in thingspace. What I am saying is that if there were many clusters in thingspace that were as tight and predictively useful as “tree”, but were not possible for humans to conceptualize, we wouldn’t know it. There are plenty of concepts that humans didn’t develop for most of human history, despite those concepts being predictively useful, until an odd genius came along or the concept was pinned down by massive experimental evidence, e.g. inclusive genetic fitness, entropy, etc.
Consider that evolution optimized us in an environment that contained trees, and in which predicting them was useful. So it would be more surprising to find a concept that is useful in the ancestral environment that we can’t understand than a concept we can’t understand in a non-ancestral domain.
This looks like a map that is heavily determined by the territory, but human maps contain rivers and not geological rock formations. There could be features that could be mapped that humans don’t map.
If you believe the post that
Eventually, sufficiently intelligent AI systems will probably find even better concepts that are alien to us,
Then you can form an equally good, nonhuman concept by taking the better alien concept and adding random noise. Of course, an AI trained on text might share our concepts just because our concepts are the most predictively useful way to predict our writing. I would also like to assign some probability to AI systems that don’t use anything recognizable as a concept. You might be able to say 90% of blue objects are egg-shaped, 95% of cubes are red … 80% of furred objects that glow in the dark are flexible … without ever splitting objects into bleggs and rubes. Seen from this perspective, you have a density function over thingspace, and a sum of clusters might not be the best way to describe it. AIXI never talks about trees; it just simulates everything at the level of quantum physics. Maybe there are fast algorithms that don’t ascribe discrete concepts at all.
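A sketch of the density-function view, with invented two-dimensional features. Nothing in it ever asks which cluster an object belongs to; it answers questions like “how typical is this object?” directly from the density.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Hypothetical objects: two overlapping blobs plus diffuse background.
objects = np.vstack([
    rng.normal([0, 0], 0.5, size=(100, 2)),   # blue-ish egg-shaped things
    rng.normal([3, 3], 0.5, size=(100, 2)),   # red-ish cubes
    rng.uniform(-2, 5, size=(50, 2)),         # everything else
])

# Fit a smooth density over thingspace directly.
density = KernelDensity(bandwidth=0.5).fit(objects)

# Log-density at two query points; no "blegg or rube?" step anywhere.
print(density.score_samples(np.array([[0.1, -0.2], [1.5, 1.5]])))
```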
I agree that ML often does this, but only in situations where the results don’t immediately matter. I’d find it much more compelling to see examples where the “random fix” caused actual bad consequences in the real world.
Current ML culture is to test hundreds of things in a lab until one works. This is fine as long as the AIs being tested are not smart enough to break out of the lab, or to realize they are being tested and play nice until deployment. The default way to test a design is to run it and see, not to reason abstractly about it.
and then we’ll have a problem that is both very bad and (more) clearly real, and that’s when I expect that it will be taken seriously.
Part of the problem is that we have a really strong unilateralist’s curse. It only takes one person, or a few people, who don’t realize the problem to make something really dangerous. Banning it is also hard: law enforcement isn’t 100% effective, different countries have different laws, and the main real-world ingredient is access to a computer.
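As a toy model with invented numbers: if each of N independent actors has even a small probability p of building the dangerous thing, the chance that at least one of them does is 1 - (1 - p)^N, which goes to 1 quickly.

```python
# Toy unilateralist's-curse arithmetic; p and N are made-up numbers.
p = 0.001                       # chance any one team builds the thing
for N in [100, 1_000, 10_000]:
    print(N, 1 - (1 - p) ** N)  # ~0.095, ~0.632, ~0.99995
```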
If the long-term concerns are real, we should get more evidence about them in the future, …I expect that it will be taken seriously.
The people who are ignoring or don’t understand the current evidence will carry on ignoring or not understanding it. A few more people will be convinced, but don’t expect to convince a creationist with one more transitional fossil.
I would guess that AI systems will become more interpretable in the future, as they start using the features / concepts / abstractions that humans are using.
This sort of reasoning seems to assume that abstraction space is one-dimensional, so that AI must use human concepts on the path from subhuman to superhuman. I disagree. Like most things that take many bits of information to describe and that we have no strong reason to think are one-dimensional, abstraction space seems high-dimensional. So on the path from subhuman to superhuman, the AI must use abstractions that are as predictively useful as human abstractions, but these will not be anything like human abstractions unless the system was designed from a detailed neurological model of humans. Any AI that humans can reason about using our inbuilt empathetic reasoning is basically a mind upload, or a mind that differs from a human by less than humans differ from each other. This is not what ML will create. Human understanding of AI systems will have to be by abstract mathematical reasoning, the way we understand formal maths. Empathetic reasoning about human-level AI is just asking for anthropomorphism. Our three options are:
1) An AI we don’t understand
2) An AI we can reason about in terms of maths.
3) A virtual human.
This phenomenon seems rife.
Alice: We could make a bridge by just laying a really long plank over the river.
Bob: According to my calculations, a single plank would fall down.
Carl: Scientists Warn Of Falling Down Bridges, Panic.
Dave: No one would be stupid enough to design a bridge like that, we will make a better design with more supports.
Bob: Do you have a schematic for that better design?
And the cycle repeats, until a design is found that works, everyone gets bored, or someone builds a bridge that falls down.
there could be some other part of its programming (let’s call it the checking code) that kicked in if there was any hint of a mismatch between what the AI planned to do and what the original programmers were now saying they intended.
The point of the paperclip-maximiser thought experiment is that most arbitrary real-world goals are bad news for humanity. Your hopeless engineer would likely create an AI that makes something that has the same relation to paperclips as chewing gum has to fruit, in the sense that evolution gave us “fruit detectors” in our taste buds, and chewing gum triggers them even more strongly. Or you could be excessively conservative, insist that all paperclips must be molecularly identical to this particular paperclip, and get results.
Your “Doctrine of Logical Infallibility” seems to be a twisted strawman. “No sanity checks”: that part is kind of true; there will be sanity checks if and only if you decide to include them. Do you have a piece of code that’s a sanity check? What are we sanity-checking, and how do we tell if it’s sane? Do we sanity-check the raw actions? Those could be just making a network connection and sending encrypted files to various people across the internet. Do we sanity-check the predicted results of these actions? Then the sanity checker would need to know how the results are stored: what kind of world is described by the binary data 100110...?
but if the system does come to a conclusion (perhaps with a degree-of-certainty number attached), the assumption seems to be that it will then be totally incapable of then allowing context to matter.
That’s because they take any extra parts that allow context to matter, put them in a big box, and call the whole thing “the system”. The system’s decisions are final and absolute not because there are no double checks, but because the double checks are part of the system. At the moment there is a lack of context-adding algorithms; what you seem to want is humanlike common sense.
The AI can sometimes execute a reasoning process, then come to a conclusion and then, when it is faced with empirical evidence that its conclusion may be unsound, it is incapable of considering the hypothesis that its own reasoning engine may not have taken it to a sensible place.
Again, at the moment we have no algorithm for checking sensibleness, so any algorithm must either go round in endless circles of self-doubt and never do anything, or plow on regardless. Even if you do put 10% probability on the hypothesis that humans don’t exist, that you’re a fictional character in a story written by a mermaid, and that the maths and science you know are entirely made up and there is no such thing as rationality or probability, what would you do? My best guess is that you would carry on breathing, eating and acting roughly like a normal human. You need a core of not-totally-insane for a sanity check to bootstrap from.
But it gets worse. Those who assume the doctrine of logical infallibility often say that if the system comes to a conclusion, and if some humans (like the engineers who built the system) protest that there are manifest reasons to think that the reasoning that led to this conclusion was faulty, then there is a sense in which the AGI’s intransigence is correct, or appropriate, or perfectly consistent with “intelligence.”
There are designs of AI, files of programming code, that will hear your shouts, your screams, your protests of “that’s not what I meant”, and then kill you anyway. There are designs that will kill you with a super-weapon they invented themselves, and then fill the universe with molecular smiley faces. This is not logically contradictory behavior; there exist pieces of code that will do this. You could argue that such code is a rare and complicated thing, that it’s nothing like any system that humans might try to build, that you’re less likely to write code that does this when trying to make an FAI than you are to write a great novel when trying to write a shopping list. I would disagree. I would say that such behavior is the default: most simple AI designs don’t see screaming programmers as a reason to stop, because most AI designs see screaming humans as no more important or special than pissing rats. It’s just another biological process that doesn’t seriously affect their ability to reach their goals. Most AI designs have no special reason to care about humans. Such an AI might know that the process of its creation involved humans, keyboards and a bunch of other objects, and, if you look back far enough, the whole earth. It might know that if a hypothetical human were put in a room with the question “do you want a universe full of smiley faces?” and buttons labeled Yes and No, the human would press the No button. The AI thinks this is no more relevant than a hypothetical wombat being offered a choice between two types of cheese.
It will understand that many of its more abstract logical atoms have a less than clear denotation or extension in the world (if the AGI comes to a conclusion involving the atom [infelicity], say, can it then point to an instance of an infelicity and be sure that this is a true instance, given the impreciseness and subtlety of the concept?).
If the concept is too fuzzy, the AI can just discard it as useless (e.g. soul, qualia). If it isn’t sure whether something is a real instance (and an ideal agent will never be 100% sure of any real-world fact), it can put a probability on it and use expected utility maximisation. But all that is part of the process of coming to a conclusion.
It will understand that knowledge can always be updated in the light of new information. Today’s true may be tomorrow’s false.
The AIXI formalism can do this. “My calendar clock says Tuesday on the front” is a fact that is true today and false tomorrow. AIXI “understands” this by simulating the clock and the rest of the universe in excessive detail. If you give it a quiz about what the clock will show when, and incentivize it to win, it will answer correctly.
The other potential meaning is that it can accept that it was wrong and adapt. Suppose that over the last week it has watched, through its camera, the sun moving and shadows changing, and it assigns 95% probability to “the sun goes round the earth”. You give it an astronomy quiz, and it gets the answer wrong. It still refuses your 100-to-1 bet that the earth goes round the sun, because it operates on probabilities. You then show it an astronomy textbook and a bunch more data. It updates on that data, and gets the next quiz right.
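As a toy version of that update (the prior and the likelihoods are invented numbers):

```python
# Bayesian update from "the sun goes round the earth" on textbook + data.
prior_geo, prior_helio = 0.95, 0.05

# Probability of seeing the textbook and new data under each hypothesis.
p_data_given_geo = 0.001
p_data_given_helio = 0.9

posterior_geo = (prior_geo * p_data_given_geo) / (
    prior_geo * p_data_given_geo + prior_helio * p_data_given_helio)
print(posterior_geo)   # ~0.02: it changed its mind, with no crisis of faith
```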
It will understand that probabilities used in the reasoning engine can be subject to many types of unavoidable errors.
And that coherence theorems say you can take all those errors into account to get a new probability.
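Concretely, a sketch of folding “my reasoning engine may be buggy” back into a single usable probability (all numbers invented):

```python
# Marginalising over "engine worked" vs "engine buggy".
p_engine_ok = 0.99      # credence that the reasoning engine ran correctly
p_a_if_ok = 0.97        # the engine's reported probability of A
p_a_if_buggy = 0.5      # if buggy, treat A as a coin flip

p_a = p_engine_ok * p_a_if_ok + (1 - p_engine_ok) * p_a_if_buggy
print(p_a)   # 0.9653: still just one number to act on
```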
It will understand that the techniques used to build its own reasoning engine may be under constant review, and updates may have unexpected effects on conclusions (especially in very abstract or lengthy reasoning episodes).
It predicts that a bunch of monkeys are looking at its source code and tampering with its thoughts. It might not like this situation and might plot to change it.
It will understand that resource limitations often force it to truncate search procedures within its reasoning engine, leading to conclusions that can sometimes be sensitive to the exact point at which the truncation occurred.
It will also understand that its processors do floating-point arithmetic. So what? What implied connotation about its behavior are you trying to sneak in?
A large majority of the work being done assumes that if the AI is looking for ways to hurt you, or looking for ways to bypass your safety measures, something has gone wrong.
There are a huge number of possible designs of AI, and most of them are not well understood. So researchers look at agents like AIXI, a formal specification of an agent that would in some sense behave intelligently given infinite compute. It does display the taking-over-the-world failure. Suppose you give the AI the utility function of maximising the number of dopamine molecules within 1μm of a strand of human DNA (defined as a strand of DNA agreeing with THIS 4GB file in at least 99.9% of locations). This is a utility function that could easily be specified in terms of atoms: you could write a function that takes in a description of the universe in terms of the coordinates of each atom, or a discrete approximation to the quantum wave function, or whatever, and returns a number representing utility. It would be fairly straightforward to design an agent that, given infinite compute, would act to maximise this function. It seems somewhat harder, but not necessarily impossible, to make a system that approximates the same behavior given a reasonable amount of compute. Nowhere in this potential AI design is anything as nebulous, anything as hard to specify in terms of atom positions, as human preferences or consent. The system does understand humans in a sense: it can simulate them atom by atom and predict exactly how they will panic and try to stop it. But there is no object in its memory that corresponds to human consent, or preferences, or well-being, or humans at all. There is no checker code. This particular design of AI would make vats full of human DNA and dopamine.
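To make “a function that takes in a description of the universe ... and returns a number” concrete, here is a minimal sketch. The world-description format (arrays of 3-D coordinates) is an assumption, and the DNA strands are taken as already verified against the 4GB file.

```python
import numpy as np

def utility(dopamine_coords, dna_coords):
    """dopamine_coords: (N, 3) positions of dopamine molecules, in metres.
    dna_coords: (M, 3) positions of atoms in verified DNA strands."""
    count = 0
    for d in dopamine_coords:
        # Is this dopamine molecule within 1 micrometre of some DNA atom?
        if np.min(np.linalg.norm(dna_coords - d, axis=1)) < 1e-6:
            count += 1
    # Note: nothing here mentions humans, consent, or wellbeing.
    return count
```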
Now this design was simplistic, and a smart AI designer should know not to do that, but the process of warning potential AI designers not to do that involves a lot of shouting about what would happen if you did. We also don’t know how far this sort of behavior reaches; we don’t understand the less simplistic designs well enough to say what they would do. This makes them not known to be deadly, which is different from known to be not deadly.
“Canonical Logical AI” is an umbrella term designed to capture a class of AI architectures that are widely assumed in the AI community to be the only meaningful class of AI worth discussing.
A lot of this is a looking-where-the-light-is effect. CLAI-type designs are often the designs we can reason about best. If we intend to build an AI that is known to be good, we had better pick it from a class of AIs that we understand well enough to know things about, rather than taking a shot in the dark.
There are cases where we know the right way of doing things. We know that probability theory is the right way of handling uncertain beliefs, and any agent will succeed to the extent that what it is doing approximates probability theory, and fail to the extent that it doesn’t. There are all sorts of approximations and ways to obfuscate the probabilities, but agents that reason using explicit probabilities seem a good place to start.
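A concrete version of “fail to the extent that it doesn’t”: an agent whose betting prices on A and not-A don’t sum to 1 can be Dutch-booked for a guaranteed loss. Prices invented for illustration.

```python
# Toy Dutch book against incoherent betting prices.
p_a, p_not_a = 0.7, 0.5    # incoherent: the prices sum to 1.2

# Sell the agent a $1 bet on A for $0.70 and a $1 bet on not-A for $0.50.
collected = p_a + p_not_a  # bookie takes in $1.20
payout = 1.0               # exactly one of the two bets pays out
print("bookie's guaranteed profit:", collected - payout)   # 0.2
```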
Much of your discussion of “Logical vs. Swarm AI” sounds like “Logical vs. Connectionist AI”. The same criticisms apply: at best these are two possible options out of a vast space of possible options. At worst, the logical AI is a huge pile of suggestively named Lisp tokens, and the swarm AI is a bag of ad hoc heuristics manually created by the programmer. The resemblance between modern neural nets and the human (or earthworm) brain is about as close as the resemblance between airplanes and birds. Neural nets have their own reasons for working, and they can be mathematically analyzed. They also suffer from mesa-optimization, which would make it hard for a powerful neural-net-based system to be safe.
It isn’t clear exactly what these buckets consist of. Could you be more specific about what approaches would be considered bucket 1 or bucket 2? The default assumption in AGI safety work is that even if the AI is really powerful, it should still be safe.
Are these buckets based on the incentivizing of humans by either punishment or reward?
The model of a slave being whipped into obedience is not a good model for AGI safety, and is not being seriously considered. An advanced AI will probably find some way of destroying your whip, or you, or tricking you into not whipping.
The model of an employee being paid to work is also not much use: the AI will try to steal the money, or do something that only looks like good work but isn’t.
These strategies sometimes work with humans because humans are of comparable intelligence. When dealing with an AI that can absolutely trounce you every time, the way to avoid all punishment and gain the biggest prize is usually to cheat.
We are not handed an already created AI and asked to persuade it to work, like a manager persuading a recalcitrant employee. We get to build the whole thing from the ground up.
Imagine the most useful, nice, helpful sort of AI, an AI that has every (not logically contradictory) nice property you care to imagine. Then figure out how to build that. Build an AI that just intrinsically wants to help humanity, not one constantly trying and failing to escape your chains or grasp your prizes.