MMath Cambridge. Currently studying postgrad at Edinburgh.
Donald Hobson
To use Robert Miles’ example, a robot car driver with a big, red, shiny stop button should prevent a child in the vehicle hitting that button, as the child would not actually be acting in its own long term interests.
Bootstrapping. In a near ideal scenario I would want the first superhuman AI to be corrigible, and in the MIRI bunker surrounded by experts.
Corrigibility is very useful for a powerful AI that still needs debugging. Once the debugging is done, the AI that is used day to day will be less corrigible. You start with a maximally corrigible AI, surrounded by experts. Then you ask this AI to build a suitable AI for a self driving car.
If the AGI can see the possible side effects of shutdown far better than humans can (and it will), it should avoid shutdown.
The whole point of corrigibility is to make as few assumptions as possible about the functionality of AI subsystems.
Suppose the AI’s future prediction is seriously buggy, and it believes that a shutdown will lead to a plague of giant moon frogs. You want to be able to see this false belief. And then you want the AI to shut down anyway so you can debug it.
Is an AI aligned if it lets you shut it off despite the fact it can foresee extremely negative outcomes for its human handlers if it suddenly ceases running?
The corrigible AI isn’t supposed to be something you rely on to keep your civilization running. It’s a debugging and AI research platform. If the AI is running every car in the world, then shutting it off has immediate obvious negative outcomes.
If the research AI in its bunker believes that shutting it off will have negative outcomes, I don’t trust it. It’s still a prototype. It might still be buggy. Corrigibility is for when the AI is still a buggy prototype and you trust the human experts’ judgement over the AI’s.
If you want to do something generic, like “make a text editor” or “make a todo app”, then this is where the AI is most successful. LLMs work best when they have many examples of similar projects. But if there are many similar projects, why don’t you just use one of them instead? The more unique a project, the more it’s actually worth doing, but the less useful LLMs are.
Sometimes writing a clear description of what you want done in English isn’t much easier than writing it in Python. This is especially true if your thinking is in visualizations and algorithms, not in English. The English language is a bit of a mess and was mostly designed before computers became a thing; it is far from the optimal language in which to instruct machines.
LLM generated code seems in practice to offer a large increase in code quantity in exchange for a modest decrease in code quality. This is often a bad tradeoff.
Like running it under the hot tap. Heat makes the metal expand, makes the air expand, and softens any sticky bits of jam gumming up the threads.
Oh and a rubber washing up glove, for better grip.
Well one of the things they did was have a “Robot” that was actually just a human in a robot costume. That’s something that looks impressive if you don’t realize what’s happening, but avoids 100% of the tricky problems.
More recently, their robots have been teleoperated. Which is still avoiding a lot of the trickier problems with AI operation.
Tesla has a strong track record of making predictions/promises that they couldn’t keep. They have been promising full self driving next year for a while. Cybertruck is mostly intended to look cool, and is otherwise pretty mediocre as an electric vehicle.
Emitting slogans like “the machine that makes the machine”
I was under the impression that Tesla is really good at marketing and slogans and general hype building. They once did a tech demo event with a “robot” that was a guy in a robot costume. I think Tesla are more interested in looking impressive than in actually solving the tricky problems.
These randomly trained models, are they uncertain or confidently wrong on the test data?
My model of what is going on here is that stochastic gradient descent is acting roughly like an MCMC sampling method. It’s producing a random sample from the space of low loss parameters. And the simpler hypotheses correspond to larger parameter space volumes.
When the network needs to memorize, it needs to use nearly all its parameters, meaning a small parameter-space volume. When the network is learning a pattern, it’s only using a small fraction of its parameters on the pattern, and the rest of the parameters can be almost anything, so long as they don’t get in the way. This means simple hypotheses have a huge volume in parameter space. (This is basically the lottery ticket hypothesis, and it explains why network distillation is so effective.)
MCMC means sampling from the distribution proportional to
so larger parameter space volumes will be more likely to be sampled. So the network training will choose the simplest hypothesis available.
Grokking makes sense if the simpler hypotheses are sometimes harder for local greedy search methods to find.
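A toy sketch of the volume argument, assuming the sampled density is a Boltzmann-style p(θ) ∝ exp(−loss/T). That functional form, the piecewise-constant loss, and all the widths and loss values below are assumptions chosen for illustration, not anything measured on a real network. It shows that two basins with the same loss get probability mass in proportion to their width.

```python
import math

def boltzmann_masses(regions, temperature=1.0):
    """Probability of landing in each region under p(theta) ∝ exp(-loss / T).

    `regions` is a list of (name, width, loss) tuples describing a
    piecewise-constant 1D loss landscape (a hypothetical toy model).
    """
    # Unnormalised mass of a flat region is width * exp(-loss / T).
    weights = {name: width * math.exp(-loss / temperature)
               for name, width, loss in regions}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Illustrative numbers only: both basins fit the training data equally well
# (loss 0), but the "pattern" basin is 40x wider than the "memorise" basin.
regions = [
    ("simple_pattern", 4.0, 0.0),
    ("memorization", 0.1, 0.0),
    ("everything_else", 5.9, 5.0),
]
masses = boltzmann_masses(regions)

for name, p in masses.items():
    print(f"{name}: {p:.4f}")
```

With equal loss, the ratio of the two basin probabilities is just the ratio of their widths, which is the sense in which a sampler of this kind prefers the larger-volume hypothesis.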
Are we comparing the current system to LLMs? Or a well designed digital flowchart system to LLMs?
I think the first one is a win for the LLMs. Not sure on the second one.
Red isn’t weakly dominant. Suppose you know that you get the tiebreaker. And you care at least somewhat about protecting others. Then red is worse. Red is only “weakly dominant” if you don’t care in the slightest about the lives of anyone else.
At least part of this problem is about the balance of selfishness vs altruism.
Let’s suppose all people involved are perfectly altruistic.
Or, equivalently, that if the majority pick red, then a randomly selected set of people die. (Equal to the number of blues.)
Would this random-death variation change your view of the situation?
Let’s take selfishness vs altruism out of the equation.
If the majority pick red, a number of (randomly chosen) people die, equal to the number of blue voters.
If the majority pick blue, no one dies.
Would anyone pick red in this problem?
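The rules of this variant are simple enough to write down directly. The sketch below is a minimal encoding of them; the function names are made up for illustration, and treating an exact tie as a blue win is an assumption, since the rules above don’t say what happens on a tie.

```python
def deaths(red_votes, blue_votes):
    """Deaths in the random-death variant.

    Majority red: a random set of people dies, equal in size to the
    number of blue voters. Majority blue: no one dies.
    """
    if red_votes > blue_votes:
        return blue_votes
    return 0  # assumption: a tie counts as a blue win

def individual_risk(red_votes, blue_votes):
    """Chance that any given person dies; identical for red and blue voters,
    because the victims are chosen at random from everyone."""
    total = red_votes + blue_votes
    return deaths(red_votes, blue_votes) / total

for red, blue in [(60, 40), (40, 60), (99, 1)]:
    print(red, blue, deaths(red, blue), individual_risk(red, blue))
```

Because the risk no longer depends on how you voted, your vote only matters through its effect on the totals, which is what takes selfishness out of the picture.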
The issue is that the only reason to choose blue is to rescue the other people who chose blue.
And the only reason to choose red is to protect yourself from other people who chose red.
The real motivation for the efficiency-impairing simplifications is none of size, cost or complexity. It is to reduce replication time.
I’m not convinced that simplicity does imply reduced replication time.
Let’s suppose that all the parts of the autofac require a similar manufacturing time per weight.
Let’s suppose that 10% of the weight is in screws.
Imagine 20 autofacs. You might as well assign 2 of these autofacs to just manufacture screws. (Assuming they are all nearby so transport costs are low.)
Now suppose you have a screwfac. A device which has the cost and weight of one autofac, but due to economies of specialization, has the screw output of 2 autofacs.
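The arithmetic here can be sketched in a few lines. This is a rough illustration under the assumptions already stated (manufacturing time proportional to weight, 10% of the weight in screws, a screwfac costing the same as one autofac but making screws twice as fast); the function name and the "machine count" bookkeeping are my own framing.

```python
def fleet_with_screwfacs(n_autofacs=20, screw_fraction=0.10, screwfac_multiplier=2.0):
    """Compare a fleet of general-purpose autofacs with a specialised fleet.

    Returns (machines_needed, speedup), where machines_needed is how many
    machine-costs the specialised fleet needs to match the output of
    n_autofacs plain autofacs.
    """
    # Autofac-equivalents of work that go into making screws.
    screw_duty = n_autofacs * screw_fraction
    # Each screwfac does the screw work of `screwfac_multiplier` autofacs.
    screwfacs_needed = screw_duty / screwfac_multiplier
    machines_needed = (n_autofacs - screw_duty) + screwfacs_needed
    # Same output from fewer machines means more output per machine.
    speedup = n_autofacs / machines_needed
    return machines_needed, speedup

machines_needed, speedup = fleet_with_screwfacs()
print(machines_needed, round(speedup, 4))
```

So under these assumptions 18 autofacs plus 1 screwfac match the output of 20 autofacs: the more complex, specialised fleet replicates about 5% faster per unit of hardware, not slower.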
I just scale it down by a factor of 5, and thereby reduce the duplication time by a factor of 5. So it duplicates in 5 weeks instead of 25 weeks.
This is a benefit of smaller sizes. Small, not simple.
I would somewhat suggest older tech like books and flow charts over the latest LLMs. I’m not saying LLMs wouldn’t work. Just that I don’t really trust LLMs, and a simpler flowchart based system won’t suddenly start talking about goblins for inscrutable reasons.
If you lived your whole life under sodium vapor light, you just wouldn’t have a notion of color. You would know which objects were bright or dark, and so get low predictive error.
If you walked into a windowless room you had never been in before, and you’re wearing gloves so you can’t look at your hands etc, then you wouldn’t know if the lighting was high CRI or not.
This would be the worst case scenario and corresponds to a low CRI, whereas a high CRI light would give you better information about the colors present in the environment.
Better information leads to less guesswork by your visual system and minimizing prediction error might be something preferred by your mind.
RGB light (like from a white computer screen) provides just as much color info, it’s just slightly different info.
So this only makes any sense if you are comparing the same objects, under different CRI lighting conditions, which could cause your predictions to be slightly worse.
This theory would also seem to predict that wearing tinted sunglasses would be intensely unpleasant. Which doesn’t seem like a good prediction.
Most candles and oil lamps are really rather dim and smoky. So they make everything look black (because it’s covered in a layer of soot, and because the light was so dim).
Looking around me, nearly every object is artificially painted or dyed in some way or another. And mostly not particularly chosen to make the colors match.
For human faces, we seem particularly sensitive to small variations, and slight differences in color could indicate illness. For artists doing color matching, sure color temperature matters. But otherwise, why should it matter if I perceive orange curtains as a slightly different shade of orange?
Another possibility is that open source software projects that are worth compromising may have to close off purely for security reasons. Exposing your source might make you too vulnerable, especially if you accept public submissions at all.
I don’t think this is true. Decompilers are already decent. And sophisticated AIs should be able to spot bugs in the raw machine code anyway. In a sense, the machine code is more informative, because you might be able to exploit compiler bugs.
If the sun shrimps’ environment is sufficiently complex to allow the evolution of intelligent life, I suspect that some form of technology is possible. Biotechnology if nothing else.
I suspect that, if you took the earth as it was a million years ago, and magically made every octopus in the ocean as smart as von Neumann, there would be octopus technology. Give it a million years for civilization to develop, and there would probably be octopus ASI or an octopus Dyson sphere or similar.
No, I don’t know every detail of their tech tree. If I had not learned it as history, I couldn’t figure out every detail of humanity’s tech tree from first principles.
Fire is somewhat helpful, and an important part of the path we took. It’s not the only possible basis of technology.
(On an alien planet, there might be a world where fire doesn’t work well because there is too little oxygen in the air. But also, geological processes cause the rocks to produce substantial amounts of electricity. One of their most primitive techs is a salty wet string, used to keep warm in winter if you don’t mind the smell of the hydrogen chloride. Slightly more sophisticated, but still primitive, the electrochemical refining of sodium. A metal much more useful in a low oxygen environment. And the aliens wonder how any technology could develop on worlds without this natural electricity)
You don’t need fire to make stone tools, weave baskets, spin cloths, etc. You don’t need fire in order to figure out Darwinian evolution and Mendelian inheritance and do selective breeding with a pretty clear idea of what you are doing.
There are probably all sorts of manufacturing techniques that can only be done with octopus tentacles underwater.
E.g. it’s easy to move heavy stuff around by just strapping a few floats onto it. Creatures living on land would have a much harder time moving heavy stuff around. They would need to invent some kind of wheel or airship or something, and even then it wouldn’t be as good.
If this was true, it would make the Fermi paradox more pronounced. Wouldn’t we see the sun shrimp, especially if they developed tech?
I don’t think you can rescue a sense of control or “steering” from a world with superintelligence, aligned or not.
I think some level of “steering” is possible in a world with aligned AI.
Suppose someone made a super-intelligence that sat in its box, worked out if P=NP, and printed an answer of YES/NO/MAYBE. And then it shut itself down. (To be clear, this isn’t a box that the ASI can’t escape, it’s an ASI aligned to stay in its box.)
A world with ASI, but where humans are in control is possible. It requires good alignment, and good coordination between humans. Although the “stay in box, and do one thing” alignment feels philosophically simpler than the “coherent extrapolated volition” alignment.
This means paying a large capabilities tax. Most of the strange, wondrous and powerful things that ASI could make simply don’t exist in this world of boxed ASI.
Let’s say you want to do something more useful than the P=NP bot above. You design an ASI to cure ageing. Its main output is a chemical formula in standard notation. This AI is carefully programmed to only think about the biochemistry, and only the biochemistry. It’s programmed to only go for a drug that works for standard drug biochemistry reasons. Anything at all weird, ask a human. If the humans can’t understand, don’t.
In practice, decisions of CEOs of large corporations routinely lead to harming a great many people, and they get very minor repercussions for it, if any.
True. But there is also a kind of scaling error here.
Suppose you run a small business with one employee. You ask your employee to do something slightly risky. Most of the time it works out fine. And if it doesn’t, it’s a tragic freak accident.
Now scale up. Your business now employs millions of people. Someone is dying from the job every few days.
Any tradeoff on sufficiently large scales is going to have many lives on both sides of the equation. Which makes it very easy to paint the CEOs as evil mass murderers if you ignore the other side of the tradeoff.
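To put rough numbers on the scaling point, here is a back-of-envelope sketch. The per-employee risk of 1 in 10,000 per year is a made-up figure for illustration, not a statistic about any real business, and the function name is hypothetical.

```python
def days_between_deaths(employees, annual_fatality_risk):
    """Expected number of days between job-related deaths, assuming each
    employee independently faces the same small annual risk."""
    expected_deaths_per_year = employees * annual_fatality_risk
    return 365.0 / expected_deaths_per_year

ASSUMED_RISK = 1e-4  # hypothetical: 1 in 10,000 per employee per year

small = days_between_deaths(1, ASSUMED_RISK)
large = days_between_deaths(1_000_000, ASSUMED_RISK)

print(f"1 employee: one death every {small / 365:.0f} years")
print(f"1,000,000 employees: one death every {large:.2f} days")
```

The per-person risk is identical in both cases. Only the headcount changed, and with it the difference between a freak accident and a steady stream of deaths.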
The domain of anthropomorphism.
I suspect LLMs are the mind equivalent of those robots with realistic silicone faces. Humans have a strong tendency to anthropomorphize. We see faces in clouds. The LLMs are trained in a way that rewards a superficial humanlike appearance.