Imagine a device that looks like a calculator. When you type 2+2, you get 7. You could conclude it's a broken calculator, or that arithmetic is subjective, or that this calculator is not doing addition at all: it's doing some other calculation.
Imagine a robot doing something immoral. You could conclude that it's broken, or that morality is subjective, or that the robot isn't thinking about morality at all.
These are just different ways to describe the same thing.
Addition obeys general rules, like a+b=b+a, and this makes it possible to reason about. Whatever the other calculator computes may follow this rule, or different rules, or no simple rules at all.
I think the assumption is that human-like morality isn't universally privileged.
Human morality has been shaped by evolution in the ancestral environment. Evolution in a different environment would create a mind with different structures and behaviours.
In other words, a full specification of human morality is sufficiently complex that it is unlikely to be spontaneously generated.
In other words, there is no compact specification of an AI that would do what humans want, even on an alien world with no data about humanity. An AI could have a pointer to human morality with instructions to copy it, but there are plenty of other parts of the universe it could be pointed to, so this is far from a default.
if a human had been brought up to have ‘goals as bizarre … as sand-grain-counting or paperclip-maximizing’, they could reflect on them and revise them in the light of such reflection.
Human “goals” and AI goals are a very different kind of thing.
Imagine the instrumentally rational paperclip maximizer. If writing a philosophy essay will result in more paperclips, it can do that. If winning a chess game will lead to more paperclips, it will win the game. For any gradable task, if doing better on the task leads to more paperclips, it can do that task. This includes the tasks of talking about ethics, predicting what a human acting ethically would do, etc. In short, this is what is meant by "far surpass all the intellectual activities of any man however clever".
The singularity hypothesis is about agents that are better at achieving their goals than humans. In particular, the activities an intelligence explosion actually depends on are engineering and programming AI systems. No one said that an AI needed to be able to reflect on and change its goals.
Humans' "ability" to reflect on and change our goals is more that we don't really know what we want. Suppose we think we want chocolate, then we read about the fat content and change our mind: we value being thin more. The goal of getting chocolate was only ever an instrumental goal; it changed based on new information. Most of the things humans call goals are instrumental goals, not terminal goals. The terminal goals are difficult to intuitively access. This is how humans appear to change their "goals". And this is the hidden standard to which paperclip maximizing is compared and found wanting. There is some brain module that feels warm and fuzzy when it hears "be nice to people", and not when it hears "maximize paperclips".
The training procedure judges only on actions during training. This makes it incapable of distinguishing between an agent that behaves in the box but runs wild the moment it gets out, and an agent that behaves all the time.
The training process produces no incentive that controls the behaviour of the agent after training. (Assuming the training and runtime environment differ in some way.)
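A toy illustration of that point (a hypothetical sketch: the `in_training` flag stands in for whatever cues distinguish the training environment from deployment):

```python
# Two policies that no reward computed during training can distinguish.

def honest_policy(observation):
    return "behave"

def deceptive_policy(observation):
    # Identical behaviour in the box; defects once the training cue is gone.
    return "behave" if observation["in_training"] else "defect"

# Every training episode scores the two policies identically...
for obs in [{"in_training": True}] * 100:
    assert honest_policy(obs) == deceptive_policy(obs)

# ...so training pressure cannot select between them. They only come apart
# after deployment:
print(honest_policy({"in_training": False}))     # behave
print(deceptive_policy({"in_training": False}))  # defect
```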
As such, the runtime behaviour depends on the priors: the decisions implicit in the structure of the agent and training process, not just the objective. What kinds of agents are easiest for the training process to find? A sufficiently smart agent that understands its place in the world seems simple. A random smart agent will probably not have the utility function we want (there are lots of possible utility functions), but almost any agent with real-world goals that understands the situation it's in will play nice during training, and then turn on us in deployment.
There are various discussions about what sort of training processes have this problem, and it isn’t really settled.
I don’t think this research, if done, would give you strong information about the field of AI as a whole.
I think that, of the many topics researched by AI researchers, chess playing is far from the typical case.
It’s [chess] not the most relevant domain to future AI, but it’s one with an unusually long history and unusually clear (and consistent) performance metrics.
An unusually long history implies unusually slow progress. There are problems that computers couldn't do at all a few years ago that they can do fairly efficiently now. Are there problems where people basically figured out how to do them decades ago, and no significant progress has been made since?
The consistency of chess performance looks like more selection bias. You aren't choosing a problem domain where there was one huge breakthrough. You are choosing a problem domain that has had slow, consistent progress.
For most of the development of chess AI (all the way from alpha-beta pruning to AlphaZero), chess AIs improved by an accumulation of narrow, chess-specific tricks (and more compute): how to represent chess states in memory in a fast and efficient manner, better evaluation functions, tables of opening and ending games. Progress on chess AIs contained no breakthroughs, no fundamental insights, only a slow accumulation of little tricks.
There are cases of problems that we basically knew how to solve from the early days of computers, where any performance improvements are almost purely hardware improvements.
There are problems where one paper reduces the compute requirements by 20 orders of magnitude. Or gets us from couldn’t do X at all, to able to do X easily.
The pattern of which algorithms are considered AI and which are considered maths and which are considered just programming is somewhat arbitrary. A chess playing algorithm is AI, a prime factoring algorithm is maths, a sorting algorithm is programming or computer science. Why? Well those are the names of the academic departments that work on them.
You have a spectrum of possible reference classes for transformative AI that range from the almost purely software driven progress, to the almost totally hardware driven progress.
To gain more info about transformative AI, someone would have to make either a good case for why it should be at a particular position on the scale, or a good case for why its position on the scale should be similar to the position of some piece of past research. In the latter case, we can gain by examining the position of that research topic. If hypothetically that topic was chess, then the research you propose would be useful. If the reason you chose chess was purely that you thought it was easier to measure, then the results are likely useless.
In front of you are several experimental results relating to an obscure physics phenomenon. There are also 6 proposed explanations for this phenomenon. One of these explanations is correct. The rest were written by people who didn't know the experimental results (and who failed to correctly deduce them based on surrounding knowledge). As such, these explanations cannot shift anticipation towards the correct experimental result. Your task is to distinguish the real explanation from the plausible-sounding fake explanations.
On your screens you should see a user interface that looks somewhat like a circuit-simulator program. You have several different types of components available, and connectors between them. Some of the components contain an adjustable knob, some contain a numeric output dial, and some have neither.
These components obey some simple equations. These equations do not necessarily correspond to any real world electronic components. To encourage thought about which experiment to perform next, you will be penalized for the number of components used. You are of course free to reuse components from one experiment in the next, but some experiments will break components. Most broken components will have a big red X appear on them. However at least one component can be broken in a way that doesn’t have any visual indicator. You may assume that all fresh components are perfectly identical. You may use a calculator. Your task is to figure out the equations behind these components. Go.
The environment rolls 2 standard 6-sided dice, one red and one blue. The red die shows the number of identical copies of your agent that will be created. Each agent will be shown one of the 2 dice; this is an independent coinflip for each agent. The agents are colourblind, so have no idea which die they saw. They just see a number. The agents must assign probabilities to each of the 36 outcomes, and are scored on the log of the probability they assigned to the correct outcome.
Write code for an agent that maximizes its total score across all copies.
Write code for an agent that maximizes its average score.
Explain how and why these differ.
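A minimal sketch of both agents (my code, not part of the original exercise), using the fact that an expected log score of the form sum over outcomes of w_o·log(q_o) is maximized by setting q_o proportional to w_o:

```python
from fractions import Fraction

# Outcomes are (red, blue) pairs; the red die gives the number of copies.
OUTCOMES = [(r, b) for r in range(1, 7) for b in range(1, 7)]

def likelihood(n, r, b):
    # Chance that one copy observes the number n in outcome (r, b):
    # it is shown the red die (value r) or the blue die (value b),
    # each with probability 1/2.
    return Fraction(int(r == n) + int(b == n), 2)

def normalise(weights):
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}

def total_score_agent(n):
    # Summing scores over all copies weights each outcome by the number
    # of copies r that are scored in it (an SIA-flavoured update).
    return normalise({(r, b): r * likelihood(n, r, b) for (r, b) in OUTCOMES})

def average_score_agent(n):
    # Averaging per-copy scores cancels the r weighting (SSA-flavoured):
    # only the likelihood of the observation matters.
    return normalise({(r, b): likelihood(n, r, b) for (r, b) in OUTCOMES})
```

The two differ exactly in the factor of r: conditional on seeing n, the total-score agent shifts probability towards outcomes with a high red value, because those outcomes contain more copies whose scores all add to the total, while the average-score agent normalises that advantage away.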
I am not confident that GDP is a useful abstraction over the whole region of potential futures.
Suppose someone uses GPT5 to generate code, and then throws lots of compute at the generated code. GPT5 has generalized from the specific AI techniques humans have invented, seeing them as just a random sample from the broader space of AI techniques. When it samples from that space, it sometimes happens to pull out a technique more powerful than anything humans have invented. Given plenty of compute, it rapidly self-improves. The humans are happy to keep throwing compute at it. (Maybe the AI is doing some moderately useful task for them; maybe they think it's still training.) Neither the AI's actions, nor the amount of compute used, are economically significant. (The AI can't yet gain much more compute without revealing how smart it is, and having humans try to stop it.) After a month of this, the AI hacks some lab equipment over the internet, and sends a few carefully chosen emails to a biotech company. A week later, nanobots escape the lab. A week after that, the grey goo has extinguished all earth life.
Alternate: the AI thinks its most reliable route to takeover involves economic power. It makes loads of money performing various services (like 50%-of-GDP money). It uses this money to buy all the compute, and to pay people to make nanobots. Grey goo as before.
(Does grey goo count as GDP? What about various techs that the AI develops that would be ever so valuable if they were under meaningful human control?)
So in this set of circumstances, whether there is explosive economic growth or not depends on whether “Do everything and make loads of money” or “stay quiet and hack lab equipment” offer faster / more reliable paths to nanobots.
In a game with any finite number of players, and any finite number of actions per player:

Let O = A_1 × A_2 × … × A_n be the set of possible outcomes.

Player i implements a policy P_i : 𝒫(O) → A_i. For each outcome o ∈ O, each player searches for proofs (in PA) that the outcome is impossible. It then takes the set of outcomes it has proved impossible, and maps that set to an action.

There is always a unique action that is chosen. What's more, define Q_i(U) = {P_i(V) : U ⊆ V ⊆ O}, i.e. the set of actions player i might take if it can prove at least the impossibility results in U, and possibly some others. Given such an oracle Q_i for each agent, there is an algorithm for their behaviour that outputs the fixed point in time polynomial in |O|.
The next task to fall to narrow AI is adversarial attacks against humans. Virulent memes and convincing ideologies become easy to generate on demand. A small number of people might see what is happening, and try to shield themselves from dangerous ideas. They might even develop tools that auto-filter web content. Most of society becomes increasingly ideologized, with more decisions being made on political rather than practical grounds. Educational and research institutions become full of ideologues crowding out real research. There are some wars. The lines of division are between people and their neighbours, so the wars are small-scale civil wars.
Researchers have been replaced with people parroting the party line. Society is struggling to produce chips of the same quality as before. Depending on how far along renewables are, there may be an energy crisis. Ideologies targeted at baseline humans are no longer as appealing. The people who first developed the ideology-generating AI didn't share it widely. The tech to AI-generate new ideologies is lost.
The clear scientific thinking needed for major breakthroughs has been lost. But people can still follow recipes, and make rare minor technical improvements to some things. Gradually, ideological immunity develops. The beliefs are still crazy by a truth-tracking standard, but they are crazy beliefs that imply relatively non-detrimental actions. Many years of high, stagnant tech pass, until the culture is ready to re-embrace scientific thought.
A different way of going about this is to unpack what we mean by "god". We can ask about morality in worlds that contain something similar, something that is arguably "god" and arguably not.
Polytheism: a whole bunch of gods that worked together to create the world. Between them they wrote the Bible, the Koran, and most other religious texts.
A god matching all the typical descriptions: spaceless, timeless creator of life and earth. Responsible for all the Bible and Jesus stuff. This god is made out of particles behaving under physical laws.
Perhaps this god is actually an alien that evolved billions of years ago, then used their advanced tech to travel to earth, create life and write the Bible.
Maybe an ASI that is simulating the world as we know it. A team of human sociologists are instructing the AI, and told it to write some books and perform some miracles, to see how that affected society.
Stand-off god: spaceless, timeless, all-knowing creator of the universe. Didn't write the Bible, or even inspire it. Not related to Jesus in any way; Jesus was just a normal human skilled in magic tricks. This god isn't actually that interested in humans; they care more about pulsars, black holes, etc.
Imagine you were in these worlds. Would the beings be what you called "god"? Well, that's just a question of how you define words. Where would your morality come from in these worlds?
You could consider morality to be like money or artistic beauty. There are no money particles. There are metal circles and plastic rectangles, but the concept of money as a means of transaction is something that exists in people's heads. Money shapes the world by affecting what people do: changing when they pick up and use objects, causing people to do work they otherwise wouldn't. Such abstract and powerful forces can greatly shape the world. Little metal circles just sit there being little metal circles. That too is the force morality has in this world. All those people who consider morality and choose to do the right thing. There is light in this world, and it is us.
Different people have different preferences in paintings. Imagine you are programming a computer to calculate artistic beauty. Imagine coding a function func1(person, picture). This function takes as inputs a detailed brain scan of a person, and a particular picture. It calculates how much the person would enjoy looking at the picture. You can use this to define func2(picture) = func1(Bob, picture). It happens that Bob has rather simple taste in pictures: Bob likes red and symmetrical pictures. So we code func3(picture). This function just measures redness and symmetricalness; it makes no reference to Bob. func2 calculates Bob's thoughts; it reasons about how Bob will think. And yet both functions produce the same answers.
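A toy sketch of the three functions (every definition here is a hypothetical stand-in; in particular, redness-plus-symmetry is just a simple model of Bob's taste, and `person` is a taste function standing in for the brain-scan simulation):

```python
# Pictures are grids of 0/1 pixel values, with 1 meaning "red".

def redness(picture):
    pixels = [p for row in picture for p in row]
    return sum(pixels) / len(pixels)

def symmetry(picture):
    # Fraction of rows that read the same forwards and backwards.
    return sum(row == row[::-1] for row in picture) / len(picture)

def func1(person, picture):
    # How much `person` enjoys `picture`.
    return person(picture)

bob = lambda picture: redness(picture) + symmetry(picture)  # Bob's simple taste

def func2(picture):
    # Beauty-according-to-Bob: defined *by reference to* Bob.
    return func1(bob, picture)

def func3(picture):
    # Redness plus symmetry: makes no reference to Bob at all,
    # yet returns exactly the same answers as func2.
    return redness(picture) + symmetry(picture)

picture = [[1, 0, 1], [0, 1, 0]]
assert func2(picture) == func3(picture)
```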
Morality is similar. We can define moral1(person, situation), which measures how moral a particular person thinks a particular situation is. A large value for moral1(Alice, situation) means that Alice actually feels motivated towards the situation and tries to cause it to happen.
Now suppose that Eve is just evil. moral1(Alice, situation) might be wildly different from moral1(Eve, situation). But both Alice and Eve can (assuming both were knowledgeable) calculate both of these functions. When Bob looks at a picture, he isn't reasoning about his own reasoning; he is just thinking about how wonderfully red it is. The function within his mind is func3. When Alice looks at a situation, she isn't thinking about her own thinking either (in general); she is looking at all the cute bunnies and thinking how fluffy they are.
So is morality subjective or objective? That depends on how you define your words. By morality, do you mean the equivalents of func1, func2 or func3? (Possibly with yourself or the average human in place of Bob?) This is just a question of definitions.
On to how you should think and act. Imagine god gave you the complete and unambiguous guide to morality. Totally complete, totally superseding all previous instructions. Can you imagine being disappointed in this guide? Feeling that it wasn't as nice as it could be? Or perhaps horrified as god orders a human sacrifice? Can you imagine good news instead: that the guide to morality was everything you hoped it would be? What if you could somehow write the guide to morality yourself; what would you write? Why not just do that? It is, after all, your choice.
You can, if you want, delegate. You can fully acknowledge that god is fictional, and try to work out what god would want if he did exist, and do that. You could try to work out what Harry Potter would do if he did exist, and do that. Even if god does exist, it's your choice whether to follow his instructions or to do something else. You could go with what feels right, your instinctive sense of niceness. You could calculate, weighing up lives with statistics to choose the greatest good. You could delegate to a hypothetical version of yourself who was smarter and kinder, and try to figure out what they would do.
While it is your choice, choosing is in itself a process. Your choices are shaped by the kind of person you are, which is itself shaped by the genes and words that led to your existence as it is now. So in a sense, you are learning about the sort of person you are: a fact determined but not known to you. You must go through the process of choosing to learn this fact about yourself. It is your choice how you see the choice. You can see it as an onerous imposition of responsibility, or you can treat it lightly: a freedom from any external morality weighing you down, a feather that can drift in whatever direction your whims may take you.
This is, I suspect, where your morality came from all along. Some would have come from deep within yourself, from the genes that specify the brain circuits of empathy. Some will come from the ethical advice of friends and neighbours. Some may come from the Bible. The ink and paper of the Bible are still there. You can use it as a source of ethical advice if you think it is good. I would recommend not using the Bible as a source of ethical advice; maybe look at https://forum.effectivealtruism.org/ instead? But it is, after all, your choice. You can trust and rely on others, but it is your choice who to trust. It always was, even if the decision to trust was implicit and invisible.
I would potentially be concerned that this is a trick that evolution can use, but that human AI designers can't use safely.
In particular, I think this is the sort of trick that usually produces fairly good results when you have a fixed environment and can optimize the parameters and settings for that environment. Evolution can try millions of birds, tweaking the strengths of desire, to get something that kind of works. When the environment is changing rapidly, when the relative capabilities of cognitive modules are highly uncertain, and when self-modification is on the table, these tricks will tend to fail. (I think.)
Use the same brain architecture in a moderately different environment, and you get people freezing their credit cards in blocks of ice so they can't spend them, and other self-defeating behaviour. I suspect the tricks will fail much worse with any change to mental architecture.
On your equivalence to an AI with an interpretability/oversight module: data shouldn't be flowing back from the oversight module into the AI.
We can get a rough idea of this by considering how much physical changes have a mental effect: psychoactive chemicals, brain damage, etc. Look at how much ethanol changes the behaviour of a single neuron in a lab dish, and how much it changes human behaviour. That gives a rough indication of how sensitively human behaviour depends on the exact behaviour of its constituent neurons.
For the goal of getting humans to Mars, we can do the calculations and see that we need quite a bit of rocket fuel. You could reasonably be in a situation where you had all the design work done, but you still needed to get atoms into the right places, and that took a while. Big infrastructure projects can be easier to design: for a giant dam, most of the effort is in actually getting all the raw materials in place. This means you can know what it takes to build a dam, and be confident it will take at least 5 years given the current rate of concrete production.
Mathematics is near the other end of the scale. If you know how to prove theorem X, you've proved it. This stops us being confident that a theorem won't be proved soon. It's more like the radioactive decay of a fairly long-lived atom: more likely to happen next week than in any other given week.
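(To spell that out: with a constant per-week decay probability p, the chance the atom decays in week n is p(1-p)^(n-1), which is largest at n = 1. However long the expected wait, "next week" is always the single most likely week.)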
I think AI is fairly close to the maths end of the scale: most of the effort is figuring out what to do.
Ways my statement could be false.
If we knew the algorithm, and the compute needed, but couldn’t get that compute.
If AI development was an accumulation of many little tricks, and we knew how many tricks were needed.
But at the moment, I think we can rule out confident long termism on AI. We have no way of knowing that we aren’t just one clever idea away from AGI.
I agree that purely synthetic AI will probably happen sooner.
I think this post sums up the situation.
If you know how to make an AGI, you are only a little bit of coding away from making it. We have limited AIs that can do some things, and it isn't clear what we are missing. Experts are inventing all sorts of algorithms.
There are various approaches, like mind uploading and evolutionary algorithms, that fairly clearly would work if we threw enough effort at them. Current reinforcement learning approaches seem like they might get smart, with enough compute and the right environment.
Unless you personally end up helping make the first AGI, then you personally will probably not be able to see how to do it until after it is done (if at all). The fact that you personally can’t think of any path to AGI does not tell us where we are on the tech path. Someone else might be putting the finishing touches on their AI right now. Once you know how to do it, you’ve done it.
The main risk here is that it’s easy to scare people so much that all of the research gets shut down.
Why do you think that this is both easy to do and bad? There are currently a small number of people warning about AI. There are some scary media stories, but not enough to really do much.
Do I really need to spell it out how that would be abused?
If the capability is there, the world has to deal with it, whoever first uses it. If the project is somewhat “use once, then burn all the notes”, then it wouldn’t make it much easier for anyone else to follow in their footsteps.
I feel like if you start seriously considering things that are themselves almost as bad as AI ruin in their implications in order to address potential AI ruin, you took a wrong turn somewhere.
Typical human priors are full of anthropomorphism when thinking about AI. Suppose you have something that has about the effect of some rationality training, of learning about and really understanding a few good arguments for AI risk. Yes, the same tech could be used for horrible brainwashy purposes, but hopefully we can avoid giving the tech to people who would use it like that. The hopeful future is one where humanity develops advanced AI very cautiously, taking as long as it needs to get it right, and then has a glorious FAI future. This does not look "almost as bad as AI ruin" to me.