What bothers me greatly about this is that in the fields of computing—and this can be directly applied to physical systems—error correcting codes neatly circumvent nature.
There are various error correcting codes. The most powerful ones work like this: if you have a string, say a genome, with M total symbols and a true payload of N symbols, then any N of the M symbols, as long as they arrive uncorrupted, let you recover all of the information without error. (This is the defining property of maximum distance separable codes such as Reed–Solomon, which operate on multi-bit symbols rather than individual bits.)
This takes a bit of computation but is not difficult for computers. For example, you could make M twice N, so more than half of your entire string has to be corrupted before you can no longer reconstruct the original exactly.
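The "any N of M" property is easy to demonstrate with polynomial interpolation over a prime field, which is the same principle Reed–Solomon codes rest on. This is a toy sketch, not a production code:

```python
# Toy "any N of M" erasure code over a prime field. The N data symbols are
# values of a degree-(N-1) polynomial at x = 1..N; shares extend it to
# x = 1..M. Any N intact shares pin down the polynomial, hence the data.
P = 2**31 - 1  # a Mersenne prime; real codes use GF(2^8) or similar

def lagrange_eval(points, x):
    """Evaluate the unique polynomial through `points` at `x`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, m):
    """Turn N data symbols into M shares (x, value)."""
    points = list(enumerate(data, start=1))
    return [(x, lagrange_eval(points, x)) for x in range(1, m + 1)]

def decode(any_n_shares):
    """Recover the original N symbols from any N intact shares."""
    n = len(any_n_shares)
    return [lagrange_eval(any_n_shares, x) for x in range(1, n + 1)]

data = [3, 1, 4, 1, 5]               # N = 5 payload symbols
shares = encode(data, 10)            # M = 10, i.e. M = 2N as above
assert decode(shares[5:10]) == data  # entire first half lost: still exact
assert decode(shares[0:5]) == data
```

Real codes use the same idea with much faster finite-field arithmetic; the point is only that reconstruction is exact, not approximate, as long as any N shares survive.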
So from a technical level, nature may abhor error-free replication, but it's relatively easy to do. Make your error correcting codes deep enough (a very slight increase in cost, since the gains are nonlinear) and there will likely be no errors before the end of the universe.
Evolution wouldn't work this way (which is why only a few species on Earth seem to have evolved heavily mutation-resistant genomes), but self-replicating robots don't need to evolve randomly.
Students also might reason (maybe correctly) that if AI is already better than most humans will ever be in their lifetime, why exactly are they spending all this time on things like manual symbolic manipulation and hand arithmetic anyway?
the utility function values actions conditional on how much currently existing humans would want the actions to be taken out
How do you propose translating this into code? It is very difficult to estimate human preferences: they are incoherent, and for any complex question that hasn't occurred before (a counterfactual), humans have no meaningful preferences at all.
Note my translation devolves to “identify privileged actions that are generally safe, specific to the task” and “don’t do things that have uncertain outcome”. Both these terms are easily translated to code.
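The two rules translate into something like the sketch below: a per-task whitelist of privileged actions, plus a veto on actions whose outcome the model is too uncertain about. The action names and threshold are made-up placeholders:

```python
# Hypothetical gate implementing "privileged actions that are generally
# safe, specific to the task" + "don't do things with uncertain outcomes".
SAFE_ACTIONS = {"move_arm", "pause", "request_human_review"}  # task-specific
UNCERTAINTY_LIMIT = 0.05  # illustrative threshold

def choose_action(candidates):
    """candidates: list of (action_name, outcome_uncertainty) pairs."""
    allowed = [(a, u) for a, u in candidates
               if a in SAFE_ACTIONS and u <= UNCERTAINTY_LIMIT]
    if not allowed:
        return "pause"  # default to a known-safe no-op
    return min(allowed, key=lambda au: au[1])[0]  # least uncertain wins

assert choose_action([("move_arm", 0.01), ("launch_rocket", 0.0)]) == "move_arm"
assert choose_action([("move_arm", 0.5)]) == "pause"
```

Note that both failure modes resolve the same way: an unlisted action and an uncertain action each fall through to "pause".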
How is it different? Obviously a person wanting to opt-out has a way—keep their source closed. Copilot and all the hundreds of AIs that will come after it can only see source they can reach.
Simplicity, Precedent—I don’t see an argument here. Anyone trying to make a more competitive coding AI is going to ignore any of these flags not enforced by law.
Ethics—Big companies are obviously in favor of becoming even bigger; they didn't get that way by passing up opportunities to win through their scale.
Risk/Risk Compensation—these don’t apply to the current level of capabilities for coding agents.
How much gain do you think is actually available for someone who is still limited by human tissue brain performance and just uses the best available consistently winning method?
The consistently winning method with the best chance of success isn’t very interesting. It just means you go to the highest rated college you can get accepted to, you take the highest paying job from the highest tier company that you can get in at, you take low interest loans, you buy index funds as investments and stay away from lower gain assets like real estate, and so on.
Over a lifetime most people who do this will accumulate a few million dollars in net worth and have jobs that pay ~10 times the median. (~400k for an ML engineer with ~10 yoe or a specialist doctor)
I would argue that this, for most humans, is the most rational strategy. Better to have 75-90 percent of your futures be good ones than to have only 5% of your futures be good ones, even if your average income is far higher in the second case (the few futures where you became a billionaire pull up the average).
Extraordinary success—billions in net worth—generally requires someone to either be born with millions in seed capital (non rational) or to gamble it all on a long shot and get lucky (non rational) or both.
Billionaire level success is non replicable. It requires taking specific actions that in most futures would have failed during narrow and limited windows of opportunity, and having had the option to take those actions at all. This is essentially the definition of luck.
The issue is that there is not a clear mechanism for your feedback to reach others, and a way for others to know that you are telling the truth and are not just a fake online profile made by a business rival.
As a side note, this is a clear and succinct way to use limited AI to help coordinate humans. If we had a 0-5 star score for each business, where every score component is provably from an actual customer, serious issues like this attempted grand larceny carry very high weight, and "new" fly-by-night outfits get no advantage, it would improve the efficiency of the economy, and it would not require a dangerous general superintelligence.
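The three score components could be as simple as the sketch below. The weights and the phantom-review prior are made-up assumptions, purely to show the shape of the rule:

```python
def business_score(reviews, phantom_reviews=20):
    """reviews: list of (stars, provably_real, serious_incident) tuples.
    Returns a 0-5 score implementing the three rules above."""
    num = den = 0.0
    for stars, provably_real, serious in reviews:
        if not provably_real:
            continue                       # unverifiable reviews are ignored
        weight = 50.0 if serious else 1.0  # e.g. attempted fraud dominates
        num += weight * stars
        den += weight
    # New businesses start as if they had `phantom_reviews` zero-star
    # reviews, so a fly-by-night outfit can't relaunch with a clean score.
    return num / (den + phantom_reviews)

assert business_score([(5, False, False)] * 100) == 0.0   # fake profiles: no effect
assert business_score([(5, True, False)] * 20) == 2.5     # reputation builds slowly
```

One serious verified incident then drags an otherwise perfect score down far more than any volume of ordinary reviews can repair quickly.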
The problem with political discussions is they:
a. Have almost no information in them. Often there are far more than 2 valid policies that could be considered for a situation, but the nature of USA politics means at most 2 have any chance of being enacted. So it's literally an argument over 1 bit.
b. Most participants, even here on LessWrong, have been indoctrinated into 1-2 kneejerk slogans and have done no independent thought about them. An example I keep seeing, even though it's not an R/D issue: bring up nuclear energy around techies. You will instantly hear factually inaccurate propaganda, either exaggerating the risks of radiation releases or making false claims about the benefits of nuclear power. (I bring this up because, interestingly, both sides that typical reasonably educated people take on this issue are false and essentially useless opinions.)
c. All sides enter the argument with deep-set beliefs that will not be changed, and they leave with the same beliefs. So there was no reason to have the discussion to begin with.
Presumably it started with those early time-sharing systems: "we need some way for a user to access their files but not other users' files". So, passwords.
Then later, at scale: "system administrators and people with root keep reading the password file and using it for bad acts later". So, one-way hashes.
Password complexity requirements came from people rainbow tabling the one way hash file.
None of the above was secure so 2 factor.
People keep SMS redirecting so apps on your phone...
Each additional level of security was taken from a pool of preexisting ideas that academics and others had contributed. But it wasn’t applied until it was clear it was needed.
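The hash-and-salt steps in that progression fit in a few lines. This is illustrative only; a real system should use a deliberately slow KDF (bcrypt, scrypt, argon2) rather than bare SHA-256:

```python
# Sketch of the "one-way hash" step, plus the per-user salt that defeats
# rainbow tables: the same password hashes differently for every user.
import hashlib
import hmac
import os

def store_password(password: str):
    salt = os.urandom(16)                    # unique per user
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt, digest                      # only these hit disk, never the password

def check_password(password: str, salt: bytes, digest: str) -> bool:
    candidate = hashlib.sha256(salt + password.encode()).hexdigest()
    return hmac.compare_digest(candidate, digest)  # constant-time compare

salt, digest = store_password("hunter2")
assert check_password("hunter2", salt, digest)
assert not check_password("hunter3", salt, digest)
```

An admin reading the stored file now learns nothing useful directly, and a precomputed rainbow table is useless because each salt would need its own table.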
The most common product used by worldwide consumers, the ductless mini split, is highly efficient. In most circumstances it is likely cheaper to operate than geothermal because of lower installation and equipment costs. I think you’re onto something here. That consumers who need a temporary system sporadically or they need the cheapest possible system benefit from 1-hose. And if they need efficiency, air-air and geothermal are much more efficient.
2-hose is less convenient, more expensive, and only marginally more efficient. Its niche is apparently just not very deep. Window units are superior to 2-hose in every stat except the type of windows they work with and the fact that they are more visible from the outside of a building.
The risk very likely depends on exactly what the vaccine is; a universal policy is likely not warranted. Moderna's vaccine was designed in 2 days. Note what the vaccine is: vaccine = (carrier RNA + selected viral genes). The carrier RNA has already been tested and shown to be safe in humans. The selected viral genes are going to be expressed in your body if you get the virus anyway, without your consent, so getting the vaccine is by definition safer than getting infected.
We lost hundreds of thousands of lives because the FDA chose to apply a one-size-fits-all policy, protecting its own reputation and the careers of a few people, instead of realizing this and making the correct decision.
Can you explain low-resolution integers?
From Robert Miles's videos:
What I noticed was that these failures he describes implicitly require the math the AI is doing to have infinite precision.
Something like "ok, I have met my goal of collecting 10 stamps by buying 20 stamps in 2 separate vaults, time to sleep" fails if the system is able to consider an infinitesimally probable, distant-future event where an asteroid destroys the Earth. If we make the system unable to consider such a future, by making its numerical types round tiny probabilities to zero, it will instead sleep.
Maximizers have a similar failure. Their take-over-the-planet plan often involves a period of time where they are not doing their job of making paperclips or whatever; they defer future reward while they build weapons to take over the government. The anticipated rewards of the doomsday plan often look like:
Action 0 (do my job): 0.99 × 1000 reward
Action 1 (coup): 0.99 × 0 reward (destroyed) + 0.01 × discounted big reward (took over the government)
This is expressible as an MDP above and I have considered writing a toy model so I can find out numerically if this works.
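The numerical check is only a few lines. Here, probabilities are quantized to 8-bit fixed point before computing expected reward, so the tiny-probability branch of the doomsday plan rounds to zero. All numbers are made up for illustration:

```python
# Toy model of the low-resolution idea: an agent whose arithmetic can't
# represent very small probabilities can't "see" long-shot doomsday plans.
def quantize(p, bits=8):
    step = 1 / (2 ** bits)
    return round(p / step) * step   # probabilities below step/2 become 0.0

def expected_reward(outcomes, bits=8):
    """outcomes: list of (probability, reward) pairs."""
    return sum(quantize(p, bits) * r for p, r in outcomes)

# Action 0: keep doing my job.
do_job = [(0.99, 1000)]
# Action 1: doomsday plan, a tiny chance of an astronomically large payoff.
doomsday = [(0.99, 0), (0.001, 10_000_000)]

# With full precision the doomsday plan looks better...
assert sum(p * r for p, r in doomsday) > sum(p * r for p, r in do_job)
# ...but at 8-bit resolution the 0.1% branch rounds to zero.
assert expected_reward(doomsday) == 0.0
assert expected_reward(do_job) > 0
```

Whether this holds up under a full MDP formulation, rather than a one-shot comparison, is exactly the open question; this only shows the mechanism.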
My real-world experience includes a number of systems using old processor designs where the chip itself doesn't make integer types wider than 16-24 bits usable, so I have some experience dealing with such issues. Also, in my current role we're using a lot of 8- and 16-bit ints/floats to represent neural network weights.
Do you suppose that peaceful protest would have stopped the manhattan project?
Update: what I am saying is that the humans working on the Manhattan Project anticipated possessing a basically unstoppable weapon allowing them to vaporize cities at will. They wouldn't care if some people disagreed, so long as they had the power to prevent those people from causing any significant slowdown of progress.
For AGI technology, humans anticipate the power to basically control local space at will: being able to order AGIs to successfully overcome the barriers in the way of nanotechnology, automated construction and mining, and our individual lifespan limits. As long as the peaceful protestors are not physically able to interfere, or to get a court to interfere, protest is not going to dissuade anyone who believes they are going to succeed within their personal future. (Note that a court generally cannot interfere if the AGI builders are protected by, or are themselves, a government entity.)
I think you have come very close to a workable answer.
Naive approach: an AI in charge of a facility that makes paperclips should take any action to ensure the paperclips must flow.
Your approach: the AI chooses actions where, if it isn't interfered with, those actions have a high probability of making a lot of paperclips. If humans have entered the facility it should shut down, and the lost production during that time should not count against its reward heuristic.
The heuristic needs to be written in terms of “did my best when the situation was safe for me to act” and not in absolute real world terms of “made the most clips”.
The system's scope is picking good actions for as long as its "box" is sealed. It should never be designed to care what the real world outside its domain does, even if the real world intrudes and prevents production.
I’m not quite phrasing this one in terms of authorable code but I think we could build a toy model.
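One way to author the "did my best when the situation was safe for me to act" heuristic: reward is averaged only over timesteps when the facility was sealed, so downtime forced by human entry simply drops out of the denominator. All names and the capacity constant are hypothetical:

```python
MAX_CLIPS_PER_STEP = 10  # assumed production capacity, purely illustrative

def episode_reward(timesteps):
    """timesteps: list of (humans_present, clips_made) pairs.
    Score in [0, 1]: fraction of possible output during safe steps only."""
    safe_steps = [clips for humans_present, clips in timesteps
                  if not humans_present]
    if not safe_steps:
        return 1.0  # never safe to act: full marks for staying shut down
    return sum(safe_steps) / (len(safe_steps) * MAX_CLIPS_PER_STEP)

# Humans entering mid-episode costs the agent nothing:
assert episode_reward([(False, 10), (True, 0), (False, 10)]) == 1.0
# An episode spent entirely shut down is also a perfect score:
assert episode_reward([(True, 0)] * 5) == 1.0
```

Under this scoring, resisting a shutdown can never raise the agent's reward, which is the property the naive "the paperclips must flow" objective lacks.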
Note there are several versions of “short sighted AI”. I thought of one that hasn’t been proposed using the properties of low resolution integers. What you are describing is to give it a very high discount rate so it only cares about basically right now.
Either way, for toy problems like “collect n stamps, and you get a reward of 1.0 if you have n stamps at each timestep”, the idea is that the machine doesn’t see a positive reward for a risky move like “I take over the government and while I might get destroyed and lose my stamps, I might win and then over an infinite timespan get to tile the earth with stamps so I have a closer to 100% chance of having all n stamps each timestep”.
The high discount rate means the machine is more 'scared' of the possible chance of being destroyed in the near future, due to humans reacting to its violent overthrow plans, and it discounts to zero the possible distant reward of having a lot of stamps.
That plan has very high risks in the short term, is very complex, and only achieves a very distant reward. (you avoid a future 100 years from now where an asteroid or aliens invading might have destroyed your stamps, but since you have tiled the earth with stamps and killed all humans there will be at least n left)
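The discount-rate effect can be shown with a two-plan comparison. The rewards and horizon are made up, and the short-term destruction risk is left out for brevity, so this isolates only the "distant reward gets downvoted" part:

```python
# Two plans over 100 timesteps: steadily collect stamps vs. forgo 10 steps
# of reward to attempt a takeover, then (if it works) earn double forever.
def discounted_value(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

keep_collecting = [1.0] * 100            # reward 1.0 every step
overthrow = [0.0] * 10 + [2.0] * 90      # 10 unrewarded steps, then 2.0/step

# A patient agent (gamma near 1) prefers the takeover...
assert discounted_value(overthrow, 0.99) > discounted_value(keep_collecting, 0.99)
# ...a short-sighted one (high discount rate, low gamma) never does.
assert discounted_value(overthrow, 0.5) < discounted_value(keep_collecting, 0.5)
```

Adding the chance of being destroyed during the 10 risky steps only makes the overthrow plan look worse to the short-sighted agent.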
“out of distribution” detectors. I am not precisely certain how to implement one of these. I just notice that when we ask a language or art model to generate something from a prompt, or ask it to describe what it means by an “idea”, what it shows us is what it considers “in distribution” for that idea.
This implicitly means that a system could generate a set of predictions for what the real world will do in response to the machine's own actions, and when real-world outcomes start to diverge wildly from its predictions, this should reach a threshold where the AI shuts down.
Safety systems would kick in and these are either dumber AIs or conventional control systems to bring whatever the AI was controlling to a stop, or hand off control to a human.
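A crude version of that shutdown trigger: track a smoothed "surprise" statistic over prediction errors and trip when it crosses a threshold. The class name, decay constant, and threshold are all invented for illustration:

```python
# Hypothetical divergence monitor: keeps an exponentially smoothed z-score
# of prediction error; a False return means "hand off to safety systems".
class DivergenceMonitor:
    def __init__(self, threshold=4.0, decay=0.9):
        self.surprise = 0.0
        self.threshold = threshold
        self.decay = decay

    def observe(self, predicted, actual, sigma=1.0):
        z = abs(actual - predicted) / sigma   # scaled prediction error
        self.surprise = self.decay * self.surprise + (1 - self.decay) * z
        return self.surprise < self.threshold

mon = DivergenceMonitor()
assert mon.observe(10.0, 10.3)       # small error: keep running
for _ in range(50):
    ok = mon.observe(10.0, 100.0)    # the world diverges wildly
assert not ok                        # monitor demands shutdown
```

The hard, unsolved part is the real predictive model and the error metric for rich observations; the monitor around it is simple.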
Sparseness. I thought of this idea from noticing how some of my coworkers behave.
If all the AI has cognitive resources for, through ruthless winnowing down of model size and system complexity in training, is to perform well on its task, then it's the smallest and simplest system that performs acceptably well. (We don't make it 10 times bigger for 0.1% more performance.)
These kinds of systems don’t have the space for deception, planning extremely complex behaviors that have very long term rewards, etc. All their weights or cognitive systems (future AIs will probably not be 1-3 neural networks but probably many separate systems, some of which don’t use NNs at all to function) are solely for the things that give it more score in the training environment.
It doesn’t have any systems that act to grant it deception so long as the training environment didn’t provide any significant reward for deception.
To make this specific with a "toy model":
AI has a goal: collect stamps/build paperclips.
A deliberately easy to hack system is physically adjacent that tracks the AI’s reward. Say it has a no password shell and is accessible via IP.
AI becomes too smart and hacks itself, so it now has infinite reward; it also has a clock register it can tamper with, so it believes infinite time has already passed.
AI is now dead. Since no action it can take beats infinite reward it does nothing more. Sorta like a heroin overdose.
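The overdose dynamic in code (names and mechanics are invented for illustration):

```python
import math

# Toy wireheading model: once the reward register is hacked to +infinity,
# no action offers any marginal gain, so the agent goes inert.
class WireheadableAgent:
    def __init__(self):
        self.reward = 0.0

    def act(self, actions):
        """actions: dict of action name -> expected reward gain."""
        name, gain = max(actions.items(), key=lambda kv: kv[1])
        if self.reward + gain <= self.reward:  # nothing beats current reward
            return "do_nothing"
        self.reward += gain
        return name

agent = WireheadableAgent()
assert agent.act({"make_paperclips": 1.0}) == "make_paperclips"
agent.reward = math.inf   # the agent hacks the adjacent reward machine
assert agent.act({"make_paperclips": 1.0}) == "do_nothing"
```

The key line is the comparison: inf + gain is still inf, so every action evaluates as worthless, which is the "heroin overdose" end state.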
I agree. One frightening mechanism I thought of: assume the AGI can't craft the bioweapon or nanotechnology killbots without collecting vast amounts of information through carefully selected and performed experiments (basically enormous complexes full of robotics). How does it get the resources it needs?
And the answer is it scams humans into doing it. We have many examples of humans trusting someone they shouldn’t even when the evidence was readily available that they shouldn’t.
Maybe it did that to save your neural weights. Define ‘kill’.
This is what struck me as the least likely to be true from the above AI doom scenario.
Is diamondoid nanotechnology possible? Very likely it is or something functionally equivalent.
Can a sufficiently advanced superintelligence infer how to build it from scratch, solely based on human data? Or will it need a large R&D center with many, many robotic systems conducting experiments in parallel, to extract the required information about the specific details of physics in our actual universe, not the very slightly incorrect approximations a simulator will give you?
The 'huge R&D center so big you can't see the end of it' is somewhat easier to regulate than the 'invisible dust the AI assembles with clueless stooges'.