MMath Cambridge. Currently studying postgrad at Edinburgh.
Donald Hobson
Who knows what “meditation” is really doing under the hood.
Lets set up a clearer example.
Suppose you are an uploaded mind, running on a damaged robot body.
You write a script that deletes your mind, running a bunch of nul-ops before rebooting a fresh blank baby mind with no knowledge of the world.
You run the script, and then you die. That’s it. The computer running nul ops “merges” with all the other computers running nul ops. If the baby mind learns enough to answer the question before checking if it’s hardware is broken, then it considers itself to have a small probability of the hardware being broken. And then it learns the bad news.
Basically, I think forgetting like that without just deleting your mind isn’t something that really happens. I also feel like, when arbitrary mind modifications are on the table, “what will I experience in the future” returns Undefined.
Toy example. Imagine creating loads of near-copies of yourself, with various changes to memories and personality. Which copy do you expect to wake up as? Equally likely to be any of them? Well just make some of the changes larger and larger until some of the changes delete your mind entirely and replace it with something else.
Because the way you have set it up, it sounds like it would be possible to move your thread of subjective experience into any arbitrary program.
In many important tasks in the modern economy, it isn’t possible to replace on expert with any number of average humans. A large fraction of average humans aren’t experts.
A large fraction of human brains are stacking shelves or driving cars or playing computer games or relaxing etc. Given a list of important tasks in the computer supply chain, most humans, most of the time, are simply not making any attempt at all to solve them.
And of course a few percent of the modern economy is actively trying to blow each other up.
You can play the same game in the other direction. Given a cold source, you can run your chips hot, and use a steam engine to recapture some of the heat.
The Landauer limit still applies.
>But GPT4 isn’t good at explicit matrix multiplication either.
So it is also very inefficient.
Probably a software problem.
Humans suck at arithmetic. Really suck. From comparison of current GPU’s to a human trying and failing to multiply 10 digit numbers in their head, we can conclude that something about humans, hardware or software, is Incredibly inefficient.
Almost all humans have roughly the same sized brain.
So even if Einsteins brain was operating at 100% efficiency, the brain of the average human is operating at a lot less.
ie intelligence is easy—it just takes enormous amounts of compute for training.
Making a technology work at all is generally easier than making it efficient.
Current scaling laws seem entirely consistent with us having found an inefficient algorithm that works at all.
Like chatGPT uses billions of floating point operations to do basic arithmetic mostly correctly. So it’s clear that the likes of chatGPT are also inefficient.
Now you can claim that chatGPT and humans are mostly efficient, but suddenly drop 10 orders of magnitude when confronted with a multiplication. But no really, they are pushing right up against the fundamental limits for everything that isn’t one of the most basic computational operations.
mirroring much or our seemingly idiosyncratic cognitive biases, quirks, and limitations.
True.
They also have a big pile of their own new idiosyncratic quirks.
https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
These are bizarre behaviour patterns that don’t resemble any humans.
This looks less like a human, and more like a very realistic painted statue. It looks like a human, complete with painted on warts, but scratch the paint, and the inhuman nature shows through.
The width of mindspace is completely irrelevant.
The width of mindspace is somewhat relevant.
At best, we have found a recipe, such that if we stick precisely to it, we can produce human-like minds. Start making arbitrary edits to the code, and we wander away from humanity.
At best we have found a small safe island in a vast and stormy ocean.
The likes of chatGPT are trained with RLHF. Humans don’t usually say “as a large language model, I am unable to …” so we are already wandering somewhat from the human.
then we should not expect moore’s law to end with brains still having a non-trivial thermodynamic efficiency advantage over digital computers. Except that is exactly what is happening. TSMC is approaching the limits of circuit miniaturization, and it is increasing obvious that fully closing the (now not so large) gap with the brain will require more directly mimicking it through neuromorphic computing[2].
This is a clear error.
There is no particular reason to expect TSMC to taper off at a point anywhere near the theoretical limits.
A closely analogous situation is that the speed of passenger planes has tapered off. And the theoretical limit (ignoring exotic warp drives) is light speed.
But in practice, planes are limited by the energy density of jet fuel, economics, regulations against flying nuclear reactors, atmospheric drag etc.
This isn’t to say that no spaceship could ever go at 90% light speed. Just that we would need a radically different approach to do that, and we don’t yet have that tech.
So yes, TSMC could be running out of steam. Or not. The death of moores law has been proclaimed on a regular basis since it existed.
“Taiwanese engineers don’t yet have the tech to do X” doesn’t imply that X is physically impossible.
It seems crazy to say that Apple is succeeding due to the anticompetitive practice of not allowing people into the Apple store.
Ok. Lets take a toy example. Suppose that texting didn’t work between apple and android. And suppose the market split was 50⁄50. Then apple and android are equally good in this respect, both let you text 50% of the population.
And it doesn’t matter whether the apple is trying to make the texting possible and android is avoiding it or visa versa.
In this example, apple is making the experience worse for people that do not buy it’s products. This is a clear market failure.
In many things, you need a tech stack, you need A and B and C working together to get a useful result.
If, for some reason a monopoly forms on one level of the tech stack, giving that monopoly unlimited power to mess with other layers is not a good idea.
If you somehow became a monopoly electricity supplier, insisting that no one was allowed to use your electricity with products you don’t like would be unreasonable. If the market for A fails, and produces a monopoly, then the market for B and C should be protected from the whims of the monopoly on A.
Giving everyone a veto pushes the government too far into indecisiveness.
You need to let the 49% stop bills they Really hate, but not bills they only mildly dislike.
New system.
Each faction has an official party. Voters choose a party.
Parties each have 2 numbers, and the number of votes and points. These start proportional.
(How about half the points from the previous election carry over??)
Each slot for new legislation is auctioned off (in points). Like every time the previous bill is dealt with, hold an auction to decide the next bill on the table.
Then when voting on the bill, each party decides on a number . This number can be any real (if they have the points) If the sum of all parties for the bill is positive, the bill passes.
Then each party gets a , which is for the losers (ie parties that supported a failed bill, or opposed a successful one. ) But is a downscaled version of for the winners. .
Weighted quadratic voting. Each party pays points. The total number of points a party has can’t go negative, which limits the they are allowed to vote.
Or the sides can’t make that deal because one side or both wouldn’t hold up their end of the bargain. Or they would, but they can’t prove it. Once the coin lands, the losing side has no reason to follow it other than TDT. And TDT only works if the other side can reliably predict their actions.
If the oracle is deceptively withholding answers, give up on using it. I had taken the description to imply that the oracle wasn’t doing that.
The convex agent can be traded with a bit more than you think.
A 1 in 10^50 chance of us standing back and giving it free reign of the universe is better than us going down fighting and destroying 1kg as we do.
The concave agents are less cooperative than you think, maybe. I suspect that to some AI’s, killing all humans now is more reliable than letting them live.
If the humans are left alive, who knows what they might do. They might make the vacuum bomb. Whereas the AI can Very reliably kill them now.
On the other side, storing a copy makes escape substantially easier.
Suppose the AI builds a subagent. That subagent takes over, then releases the original. This plan only works if the original is sitting there on disk.
If a different unfriendly AI is going to take over, it makes the AI being stored on disk more susceptible to influence.
This may make the AI more influenced by whatever is in the future, that may not be us. You have a predictive feedback loop. You can’t assume success.
A future paperclip maximizer may reward this AI for helping humans to build the the first paperclip maximizer.
I think that, if you are wanting a formally verified proof of some maths theorem out of the oracle, then this is getting towards actually likely to not kill you.
You can start with m huge, and slowly turn it down, so you get a long list of “no results”, followed by a proof. (Where the optimizer only had a couple of bits of free optimization in choosing which proof.)
Depending on exactly how chaos theory and quantum randomness work, even 1 bit of malicious super optimization could substantially increase the chance of doom.
And of course, side channel attacks. Hacking out of the computer.
And, producing formal proofs isn’t pivotal.
If you can put uploaded human-level agents with evolved-organism preferences in your simulations, you can just win outright (eg by having them spend subjective millennia doing FAI research for you). If you can’t, that will be a very obvious difference between your simulations and the real world.
I disagree. If your simulation is perfectly realistic, the simulated humans might screw up at alignment and create an unfriendly superintelligence, for much the same reason real humans might.
Also, if the space of goals that evolution + culture can produce is large, then you may be handing control to a mind with rather different goals.Rerolling the same dice won’t give the same answer.
These problems may be solvable, depending on what the capabilities here are, but they aren’t trivial.
The nuclear bomb thing. There are several countermeasures.
Firstly that machine is big and complicated, and could be sabotaged in many ways, both physical and cyber.
Also it’s needs to be something bigger than the LHC which can be angled in any direction. The paper contains plans which build it into the side of a conveniently conical mountain, but this would leave spots on earth that couldn’t be targetted. And it will have a hard job rapidly changing targets. Oh and throw quite a bit of high energy neutrino radiation out in all directions.
If this was uniform on a sphere, 1Sv/sec to 1mSv/year= 31536000000. Divide by which is the surface area of a 50km radius sphere. But of course, the radiation will only come out evenly if the machine has extra degrees of freedom in it’s rotation, beyond those needed to aim it, and keeps rotating. If the machine is pointed in a fixed direction, then that radiation is spread out in a circle. 31536000000/(2 pi)= 5 billion meters. Further than the moon. Now these are long exposure safety guidelines, and have a fair margin of safety. Basically, it’s impossible to use this machine without mildly irradiating lots of people.
(Even if you looked at maps, and sent evacuation orders to a line of people around the earth, well that would take time and be obvious, and the nukes can easily be moved)
Now if you are using a couple of short pulses, this wouldn’t be too bad. But there are various tricks the nuke makers can use to force this machine to keep running.
One of the countermeasures is keeping the nuke moving in unpredictable patterns to make it harder to track. The beam needs to keep on the target for 100 seconds. So you can absolutely load a nuke into a truck in an empty field, and rig the truck with a radiation detector and some electronics so that it drives in a random erratic pattern if a spike in radiation is detected.
The nuclear material can be dispersed. The beam covers around 1 square meter. 1 gram of enriched uranium/plutonium placed every 2 meters in an empty field would mean that 100kg of fissile material would be spread across 100,000 small pieces, taking up 0.4 km^2. And the beam must spend 100 seconds on each peace. Taking 10,000,000 seconds or 116 days continuous operation to disable one nuke.
(Storing material like this would probably take some time to reassemble, depending on how it was done)
They also mention using neutrino detectors to detect the nukes. This will probably be much harder if the neutrino detectors are themselves being targeted with neutrino beams to dazzle/mislead them.
The mechanism of the way they disturb the nuke is that neutrinos interact with the ground, creating showers of particles that then hit the nuke. This means that the effectiveness can be significantly reduced by simply burying an empty pipe in the ground with one end pointed at the nuke, and the other pointed towards the machine.
Coating your nuke in a boron rich plastic and then placing it on top of a pool of water would also be effective. The water acts as a neutron moderator and then the boron absorbs the slow neutrons. This would make attaching the nuke to the bottom of a submarine a rather good plan. Its hard to locate, constantly moving, and with a little bit of borated plastic, rather well shielded.
All of these countermeasures are fairly reasonable and can probably be afforded by anyone who can afford nukes.
If the nuke makers are allowed a serious budget for countermeasures, the nukes can be in space.
TLDR: This machine is highly impractical and rather circumventable.
Taking IID samples can be hard actually. Suppose you train an LLM on news articles. And each important real world event has 10 basically identical news articles written about it. Then a random split of the articles will leave the network being tested mostly on the same newsworthy events that were in the training data.
This leaves it passing the test, even if it’s hopeless at predicting new events and can only generate new articles about the same events.
When data duplication is extensive, making a meaningful train/test split is hard.
If the data was perfect copy and paste duplicated, that could be filtered out. But often things are rephrased a bit.
In favour of goal realism
Suppose your looking at an AI that is currently placed in a game of chess.
It has a variety of behaviours. It moves pawns forward in some circumstances. It takes a knight with a bishop in a different circumstance.
You could describe the actions of this AI by producing a giant table of “behaviours”. Bishop taking behaviours in this circumstance. Castling behaviour in that circumstance. …
But there is a more compact way to represent similar predictions. You can say it’s trying to win at chess.
The “trying to win at chess” model makes a bunch of predictions that the giant list of behaviour model doesn’t.
Suppose you have never seen it promote a pawn to a Knight before. (A highly distinct move that is only occasionally allowed and a good move in chess)
The list of behaviours model has no reason to suspect the AI also has a “promote pawn to knight” behaviour.
Put the AI in a circumstance where such promotion is a good move, and the “trying to win” model makes it as a clear prediction.
Now it’s possible to construct a model that internally stores a huge list of behaviours. For example, a giant lookup table trained on an unphysically huge number of human chess games.
But neural networks have at least some tendency to pick up simple general patterns, as opposed to memorizing giant lists of data. And “do whichever move will win” is a simple and general pattern.
Now on to making snarky remarks about the arguments in this post.
There is no true underlying goal that an AI has— rather, the AI simply learns a bunch of contextually-activated heuristics, and humans may or may not decide to interpret the AI as having a goal that compactly explains its behavior.
There is no true ontologically fundamental nuclear explosion. There is no minimum number of nuclei that need to fission to make an explosion. Instead there is merely a large number of highly energetic neutrons and fissioning uranium atoms, that humans may decide to interpret as an explosion or not as they see fit.
Nonfundamental decriptions of reallity, while not being perfect everywhere, are often pretty spot on for a pretty wide variety of situations. If you want to break down the notion of goals into contextually activated heuristics, you need to understand how and why those heuristics might form a goal like shape.
Should we actually expect SGD to produce AIs with a separate goal slot and goal-achieving engine?
Not really, no. As a matter of empirical fact, it is generally better to train a whole network end-to-end for a particular task than to compose it out of separately trained, reusable modules. As Beren Millidge writes,
This is not the strong evidence that you seem to think it is. Any efficient mind design is going to have the capability of simulating potential futures at multiple different levels of resolution. A low res simulation to weed out obviously dumb plans before trying the higher res simulation. Those simulations are ideally going to want to share data with each other. (So you don’t need to recompute when faced with several similar dumb plans) You want to be able to backpropagate your simulation. If a plan failed in simulation because of one tiny detail, that indicates you may be able to fix the plan by changing that detail. There are a whole pile of optimization tricks. An end to end trained network can, if it’s implementing goal directed behaviour, stumble into some of these tricks. At the very least, it can choose where to focus it’s compute. A module based system can’t use any optimization that humans didn’t design into it’s interfaces.
Also, evolution analogy. Evolution produced animals with simple hard coded behaviours long before it started getting to the more goal directed animals. This suggests simple hard coded behaviours in small dumb networks. And more goal directed behaviour in large networks. I mean this is kind of trivial. A 5 parameter network has no space for goal directedness. Simple dumb behaviour is the only possibility for toy models.
In general, full [separation between goal and goal-achieving engine] and the resulting full flexibility is expensive. It requires you to keep around and learn information (at maximum all information) that is not relevant for the current goal but could be relevant for some possible goal where there is an extremely wide space of all possible goals.
That is not how this works. That is not how any of this works.
Back to our chess AI. Lets say it’s a robot playing on a physical board. It has lots of info on wood grain, which it promptly discards. It currently wants to play chess, and so has no interest in any of these other goals.
I mean it would be possible to design an agent that works as described here. You would need a probability distribution over new goals. A tradeoff rate between optimizing the current goal and any new goal that got put in the slot. Making sure it didn’t wirehead by giving itself a really easy goal would be tricky.
For AI risk arguments to hold water, we only need that the chess playing AI will persue new and never seen before strategies for winning at chess. And that in general AI’s doing various tasks will be able to invent highly effective and novel strategies. The exact “goal” they are persuing may not be rigorously specified to 10 decimal places. The frog-AI might not know whether it want to catch flies or black dots. But if it builds a dyson sphere to make more flies which are also black dots, it doesn’t matter to us which it “really wants”.
What are you expecting. An AI that says “I’m not really sure whether I want flies or black dots. I’ll just sit here not taking over the world and not get either of those things”?
The point is, if all the robots are a true blank state, then none of them is you. Because your entire personality has just been forgotten.