MMath Cambridge. Currently studying postgrad at Edinburgh.
Donald Hobson
One thing that’s kind of in the powerful non-fooming corrigible AI bucket is a lot of good approximations to the higher complexity classes.
There is a sense in which, if you had an incredibly fast 3-SAT algorithm, you could use it with a formal proof checker to prove arbitrary mathematical statements. You could use your fast 3-SAT solver plus a fluid dynamics simulator to design efficient aerofoils. There are a lot of interesting search, optimization and simulation things that you could do trivially, if you had infinite compute.
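As a rough sketch of the kind of reduction involved (everything here is made up for illustration: `brute_force_sat` is a slow stand-in for the hypothetical incredibly fast 3-SAT oracle, and the step of encoding “the proof checker accepts some proof of length ≤ n” as a CNF formula is assumed rather than shown), the standard trick is that a yes/no SAT oracle already gets you the witness, i.e. the proof itself:

```python
from itertools import product

def brute_force_sat(clauses, n_vars, fixed=None):
    # Slow stand-in for the hypothetical incredibly fast 3-SAT oracle.
    # `clauses` is a list of clauses, each a list of signed variable indices
    # (e.g. [1, -2, 3] means x1 OR NOT x2 OR x3). Returns True iff satisfiable,
    # respecting any variables already pinned down in `fixed`.
    fixed = fixed or {}
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, bits))}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

def extract_witness(clauses, n_vars, oracle=brute_force_sat):
    # Turn a yes/no oracle into an actual satisfying assignment (which, under
    # the assumed encoding, would be the bits of a machine-checkable proof)
    # by fixing one variable at a time and re-asking the oracle.
    if not oracle(clauses, n_vars):
        return None
    fixed = {}
    for v in range(1, n_vars + 1):
        fixed[v] = True
        if not oracle(clauses, n_vars, fixed):
            fixed[v] = False
    return fixed

# Tiny example: (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
print(extract_witness([[1, 2], [-1, 3], [-2, -3]], 3))
```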
There is a sense in which an empty python terminal is already a corrigible AI. It does whatever you tell it to. You just have to tell it in python. This feels like it’s missing something. But when you try to say what is missing, the line between a neat programming language feature and a corrigible AI seems somehow blurry.
Given that alignment is theoretically solvable (probably), and not currently solved, almost any argument about alignment failure is going to have an
“and the programmers didn’t have a giant breakthrough at the last minute” assumption.
If the simulation approach is to be effective, it probably has to have pretty high fidelity, in which case sim behaviours are likely to be pretty representative of the real world behaviour
Yes. I expect that, before smart AI does competent harmful actions (as opposed to flailing randomly, which can also do some damage), there will exist, somewhere within the AI, a pretty detailed simulation of what is going to happen.
Reasons humans might not read the simulation and shut it down.
A previous competent harmful action intended to prevent this.
The sheer number of possible actions the AI considers.
Default difficulty of a human understanding the simulation.
Let’s consider an optimistic case. You have found a magic computer and have programmed in the laws of quantum field theory. You have added various features so you can put a virtual camera and microphone at any point in the simulation. Let’s say you have a full VR setup. There is still a lot of room for all sorts of subtle indirect bad effects to slip under the radar, because the world is a big place and you can’t watch all of it.
Also, you risk any prediction of a future infohazard becoming a current day infohazard.
At the other extreme, it’s a total black box. Some utterly inscrutable computation, perhaps learned from training data. Well, in the worst case, the whole AI, from data in to action out, is one big homomorphically encrypted black box.
“AI Psychosis”. Is ad hominem a thing here in less wrong?
It felt similar. So more intended as a hypothesis than an insult, but sure. I can see how you saw it that way.
Yes, actions that benefit the ecosystem in fact benefit the species and are in fact rewarded. Digging holes to hide food: reward for individual plus reward to ecosystem.
You seem to be mixing up several different claims here.
Claim 1) Evolution inherently favors actions that benefit the ecosystem, whether or not those actions benefit the individual. (false)
Claim 2) It so happens that every action that benefits the ecosystem also benefits the individual.
Claim 3) There exists an action that benefits the ecosystem, and also the individual.
I don’t feel that “benefits the ecosystem” is well defined. The ecosystem is not an agent with a well defined goal that can receive benefits. What does it mean to “benefit the planet Mars”? Ecosystems contain a variety of animals that are often at odds with each other.
What quantity exactly is going up when an ecosystem “benefits”? Total biomass? Genetic diversity? Individual animals’ hedonic wellbeing? Similarity to how the ecosystem would look without human interference?
If the ecosystem survives, you survive.
That is wildly not true. Plenty of animals die all the time while their ecosystem survives.
If the ecosystem dies, you probably die with it.
Sure. You might escape. But probably. Yes.
The problem is, the overall situation is like a many-player prisoner’s dilemma. Often the actions of any individual animal make barely a hill of beans of difference to the overall environment, but it all adds up.
Cooperation can evolve, in social settings, with small groups of animals that all know each other and punish the defectors.
A lot of the time, this isn’t the case, and nature is stuck on a many-way defect.
It also serves a purpose for mating.
It does seem to be mostly mating.
The things animals do in general are not necessarily necessary for their survival, but they are usually necessary for the good health of the ecosystem they are part of.
That seems strange. That’s not how evolution usually works.
Movement is the first action of the universe. Memesis is the second. (Chaos is the third).
Have you had long discussions with ChatGPT? This sounds like the sort of thing a person suffering from AI psychosis might say.
Peacocks feathers are a defense mechanism
Really?
By the way, how many r’s are in raspberry?
This isn’t a good theory.
While humans aren’t literally the only animals to make tools, the difference between human tools and the basically pointy sticks that other animals make is vast.
There isn’t any great reason to expect there to exist a narratively satisfying “what makes us human”.
“Waste” is a non-apple. You have a very specific definition of efficiency, and anything that doesn’t fit this model is “waste”. Sure, by your model, a cathedral is waste. But spending the same amount of effort and resources digging holes and then filling them in again would also be waste. By your definition, almost any activity that doesn’t spread your genes is waste. So your theory is non-predictive as it doesn’t explain why humans build cathedrals rather than something else.
Also, culture and “waste” aren’t uniquely human. From peacock tails to bird songs to magpies collecting shiny things to orcas balancing fish on their heads. Lots of animals have something resembling culture.
Learning to copy is easy. Learning to flawlessly assess if some behavior is useful is hard. So it’s no surprise that many animals learn by copying each other. And the actions they copy aren’t always useful to survival, but are usually pretty good.
I think this might be a case of “no one got fired for …”
No one got fired for designing the most generic boring beige office. No great skill is needed to pull it off. I think there is a common risk minimization strategy of producing bland neutral inoffensive designs.
One thing that might be interesting is asking for SVGs, and seeing if the errors in these maps match up with corresponding errors in the SVGs, suggesting a single global data store.
Also, this is a good reminder of what a huge and bewildering variety of LLMs there are these days.
This propulsion system won’t work like traditional systems that rely on a reaction mass and, therefore, conservation of momentum. Instead it will work more like reverse osmosis.
Reverse osmosis devices are used to make fresh water. They also conserve momentum.
Conservation of momentum isn’t just how conventional rockets work. It’s a law that we suspect applies universally and without exception.
In the reverse osmosis analogy, the “solution” is space-time and the “solute” is the vacuum fluctuations. In osmosis, the solution will try to equalize the concentration of solute.
Wordy analogy based reasoning of this kind does not reliably produce correct answers.
At best, reasoning like this can be used to generate a suggestion for what equations to consider. Because if the upside is a Nobel Prize, and the downside is wasting a few hours, it’s worth a go even if it’s probably wrong.
This relies on a person that understands the maths of quantum field theory.
In practice, there are a lot of people going “I have the ideas, I just need someone to add the maths”, and not that many people who understand the maths.
Looking at your equations, I think I can spot at least 1 mistake.
You say F = T∇S, from “Verlinde’s entropic gravity”. Verlinde’s work isn’t something I am familiar with, so I can’t say whether this is correct or not.
But, if it is, this is the force at one point. To calculate the overall force, we must take the integral.
If S tends to a constant (the background entropy of empty space) sufficiently fast as the distance from your spacecraft increases, then we can use the gradient theorem https://en.wikipedia.org/wiki/Gradient_theorem to show that all the forces must inevitably cancel out.
Consider this diagram.
The entropy needs to be the same at the far left and far right of the graph, because empty spacetime, far from any influence, has a fixed entropy. Your spacecraft sits in the middle. A small amount of your spacecraft on the far left experiences a steep gradient, and so a strong rightwards force. A larger amount of your spacecraft in the middle and right experiences a weaker leftward force.
And so it all adds up to 0 force in total.
This sort of everything-cancelling-out behavior is an inevitability, assuming the equation F = T∇S and:
Flat spacetime.
T and S converge to a constant (and do so at least cubically fast: 2x the distance means at most 1/8th as much variation in T and S).
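To make that concrete in the simplest 1D case, and with the extra simplifying assumption (on top of the ones above) that T is a constant background temperature, the total force is just the gradient theorem applied to S:

$$F_{\text{total}} = \int_{-\infty}^{\infty} T\,\frac{\partial S}{\partial x}\,dx = T\,\bigl[S(x)\bigr]_{-\infty}^{+\infty} = T\,(S_\infty - S_\infty) = 0$$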
Be a little careful with this. It’s possible to make the AI do all sorts of strange things via unusual world models. E.g. a paperclip-maximizing AI can believe “everything you see is a simulation, but the simulators will make paperclips in the real world if you do X”.
If you’re confident that the world model is true, I think this isn’t a problem.
Imagine a system where you just type in the word “tires”, and you get a list of everything the AI knows about tires, as English sentences or similar.
You can change the sentence from “tires are usually black” to “tires are usually pink”, and the AI’s beliefs change accordingly.
This is, in some sense, a very scrutable AI system. And it’s not obviously impossible.
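A toy sketch of what such an interface could look like (all names and the trivial storage scheme here are made up; a real system would need some inference machinery sitting behind the store):

```python
class BeliefStore:
    # Toy "scrutable AI" interface: every belief is a (subject, relation, value)
    # triple, readable back as an English sentence and editable in place.
    def __init__(self):
        self.beliefs = []

    def add(self, subject, relation, value):
        self.beliefs.append((subject, relation, value))

    def query(self, topic):
        # Everything the system "knows" that mentions the topic.
        return [f"{s} {r} {v}" for (s, r, v) in self.beliefs if topic in (s, v)]

    def edit(self, old, new):
        # Swap one belief for another; downstream behaviour is assumed to
        # follow from whatever the store currently says.
        self.beliefs = [new if b == old else b for b in self.beliefs]

kb = BeliefStore()
kb.add("tires", "are usually", "black")
kb.add("tires", "are usually dyed with", "amorphous carbon")
print(kb.query("tires"))
kb.edit(("tires", "are usually", "black"), ("tires", "are usually", "pink"))
print(kb.query("tires"))  # the AI's "beliefs" about tires now say pink
```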
Except suppose the AI went “tires are usually pink. Tires are usually dyed with amorphous carbon. Therefore amorphous carbon is pink.” and carries on like that, deducing a bizarre alternate-reality version of chemistry where an electron’s charge takes one value if it’s spin up, and a different value if it’s spin down. And it somehow all works out into a self-consistent system of physics. And every fact you told the AI somehow matches up. (Many, many facts you didn’t tell the AI are wildly different.)
Suddenly it doesn’t feel so scrutable.
As makes sense: if cults did not have high attrition rates, they would long ago have dominated the world due to exponential growth.
It seems to me that this did happen, and then, after a while, Christianity and the like mellowed out.
I mean, the church did encourage its most fanatical believers to take a vow of celibacy, so it’s not totally implausible that a bit of evolution happened.
Since evolution found the human brain algorithm, and evolution only does local search, the human brain algorithm must be built out of many innovations that are individually useful. So we shouldn’t expect the human brain algorithm to be an all-or-nothing affair.
If humans are looking at parts of the human brain, and copying it, then it’s quite possible that the last component we look at is the critical piece that nothing else works without. A modern steam engine was developed step by step from simpler and cruder machines. But if you take apart a modern steam engine, and copy each piece, it’s likely that it won’t work at all until you add the final piece, depending on the order you recreate pieces in.
It’s also possible that rat brains have all the fundamental insights. To get from rats to humans, evolution needed to produce lots of genetic code that grew extra blood vessels to supply the oxygen and that prevented brain cancer. (Also, evolution needed to spend time on alignment.) A human researcher can just change one number, and maybe buy some more GPUs.
One thing I disagree with is the idea that there is only one “next paradigm AI” with specific properties.
I think there is a wide spectrum of next-paradigm AIs, some safer than others. Brain-like AIs are just one option out of a large possibility space.
And if the AI is really brain-like, that suggests making an AI that’s altruistic for the same reason some humans are. Making a bunch of IQ-160, 95th-percentile-kindness humans, and basically handing the world over to them, sounds like a pretty decent plan.
But they still involve some AI having a DSA at some point. So they still involve a giant terrifying single point of failure.
A single point of failure also means a single point of success.
It could be much worse. We could have 100s of points of failure, and if anything goes wrong at any of those points, we are doomed.
It includes chips that have neither been already hacked into, nor secured, nor had their rental price massively bid upwards. It includes brainwashable humans who have neither been already brainwashed, nor been defended against further brainwashing.
We are already seeing problems with ChatGPT-induced psychosis. And seeing LLMs that kinda hack a bit.
What does the world look like if it is saturated with moderately competent hacking/phishing/brainwashing LLMs? Yes, a total mess. But a mess with less free energy, perhaps? Especially if humans have developed some better defenses. Probably still a lot of free energy, but less.
These posts are mainly exploring my disagreement with a group of researchers who think of LLMs[2] as being on a smooth, continuous path towards ASI.
I’m not sure exactly what it means for LLMs to be on a “continuous path towards ASI”.
I’m pretty sure that LLMs aren’t the pinnacle of possible mind design.
So the question is: will better architectures be invented by a human or by an LLM, and how scaled will the LLM be when this happens?
An increased birth rate probably goes in the “benefits” section. I would guess that alcohol begins more lives than it ends.
If you generate random syntactically valid ZFC statements, you can adjust the rough complexity by varying the formula size. And, if you’re training a small toy model first, you can filter out the problems that are so easy the toy model can already solve them.
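A minimal sketch of that pipeline (the grammar below is a made-up toy stand-in for real ZFC syntax, and `too_easy` is whatever “can the toy model already solve this” check you actually have):

```python
import random

VARS = ["x", "y", "z"]

def random_formula(size, rng=random):
    # Random syntactically valid formula, with `size` controlling how many
    # connectives/quantifiers get used, and hence the rough complexity.
    if size <= 0:
        a, b = rng.choice(VARS), rng.choice(VARS)
        return rng.choice([f"({a} in {b})", f"({a} = {b})"])
    op = rng.choice(["and", "or", "implies", "not", "forall", "exists"])
    if op == "not":
        return f"(not {random_formula(size - 1, rng)})"
    if op in ("forall", "exists"):
        return f"({op} {rng.choice(VARS)}. {random_formula(size - 1, rng)})"
    left = size // 2
    return f"({random_formula(left, rng)} {op} {random_formula(size - 1 - left, rng)})"

def generate_problems(n, size, too_easy):
    # Keep only the formulas the existing toy model can't already handle.
    out = []
    while len(out) < n:
        f = random_formula(size)
        if not too_easy(f):
            out.append(f)
    return out

# Dummy filter standing in for "the toy model solves it": reject short formulas.
print(generate_problems(3, size=5, too_easy=lambda f: len(f) < 25))
```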
Some of the scenarios I was thinking about included people who started dating due to alcohol, and then later have an intended pregnancy.
But also, it does depend on whether we are looking at the local “do these people want to be pregnant” or the societal “what would a drop in birthrate do to society”.