You should be deeply embarrassed if your model outputs an obviously wrong or obviously time-inconsistent answer even in a hypothetical situation.
Suppose you have a particle accelerator that goes up to half the speed of light. You notice an effect whereby faster particles become harder to accelerate.
You curve-fit this effect and find that $\frac{c}{\sqrt{c^2-v^2}}$ and $1+\frac{1}{2}\frac{v^2}{c^2}+\frac{1}{2}\frac{v^4}{c^4}$ both fit the data, with the first fitting slightly better. However, when you test the first formula on the case of a particle travelling at twice the speed of light, you get back nonsensical imaginary numbers. Clearly the real formula must be the second one. (The real formula is actually the first one.)
A good model will often give a nonsensical answer when asked a nonsensical question, and nonsensical questions don’t always look nonsensical.
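To make the thought experiment concrete, here is a minimal sketch (illustrative units, noise level and sample points, not from the original comment) of fitting both candidate formulas to low-speed data and then extrapolating to the impossible case:

```python
# Illustrative sketch: both formulas fit v <= c/2 data closely, but only the first
# breaks down visibly when extrapolated to the nonsensical case v = 2c.
import numpy as np

c = 1.0                                    # work in units where c = 1
v = np.linspace(0.0, 0.5, 50)              # the accelerator only reaches c/2
rng = np.random.default_rng(0)
data = c / np.sqrt(c**2 - v**2) * (1 + 0.001 * rng.standard_normal(v.size))

formula1 = lambda v: c / np.sqrt(c**2 - v**2)
formula2 = lambda v: 1 + v**2 / (2 * c**2) + v**4 / (2 * c**4)

print(np.max(np.abs(formula1(v) - data)))  # small: fits the low-speed data well
print(np.max(np.abs(formula2(v) - data)))  # also small: fits almost as well

print(formula1(np.array([2.0 * c])))       # nan: sqrt of a negative (imaginary in exact arithmetic)
print(formula2(np.array([2.0 * c])))       # 11.0: an ordinary-looking, meaningless number
```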
I would like to propose a model that is more flattering to humans, and more similar to how other parts of human cognition work. When we see a simple textual mistake, like a repeated “the”, we don’t notice it by default. Human minds correct simple errors automatically without consciously noticing that they are doing it. We round to the nearest pattern.
I propose that automatic pattern matching to the closest thing that makes sense is happening at a higher level too. When humans skim semi-contradictory text, they produce a more consistent world model that doesn’t quite match up with what is said.
Language feeds into a deeper, sensible world-model module within the human brain, whereas GPT-2 doesn’t really have a coherent world model.
If the whole reason you didn’t want to open the window was the energy put into heating/cooling the air, why not use a heat exchanger? I reckon it could be done using a desktop fan, a stack of thin aluminium plates, and a few pieces of cardboard or plastic to block air flow.
Within a narrow field where data is plentiful, learning rationality is much less powerful than learning from piles of data. Imagine three people, A, B and C. A doesn’t know any chess or rationality. B has studied game theory, Bayes’ theorem, principles of decision theory and all-round rationality; they have never played chess before, and have just been told the rules. C has been playing chess for years.
I would expect C to win easily. It’s much easier to learn from experience, and to remember your teachers’ experience, than it is to deduce what good chess strategies are from first principles. The only time I would expect B to win is if they were playing Nim, or some other game with a simple winning strategy, and C had an intuition for this strategy but sometimes made mistakes. I would expect B to beat A, however.
Rationality is learning to squeeze every last drop of usefulness out of your data, and doing this is less effective than just grabbing more data when data is plentiful. Financial markets are another data-rich domain. Many people at hedge funds already know game theory, and they also have a detailed knowledge of financial minutiae. Wannabe rationalists: if you want to be a banker, go ahead. But don’t expect to beat the market from rationality alone, any more than you can deduce good chess moves from first principles and beat a grandmaster without ever having played before.
Rationality comes into its own because it applies a small boost to many domains of skill, not a big boost to any one. It also works much better in the absence of piles of data.
The everyday world is roughly inexploitable, and very data-rich. The regions where you would expect rationality to do well are the ones where there isn’t a pile of data so large that even a scientist can’t ignore it: the Fermi paradox, AGI design, interpretations of quantum mechanics, philosophical zombies, etc.
There is also a cultural element, in that the people who know the most rationality have more important things to do than using it to gain a slight advantage in business. Many of the people here would rather be discussing AI alignment, or the Fermi paradox, or black holes, or anything interesting really, than being an investment banker. All the people who get to be skilled rationalists value knowledge for its own sake and are pursuing that.
You would also need many data points to gain good evidence, unless rationality were just magic. I am faced with a tricky choice and choose option 1. It’s quite good. Would I have chosen option 2 if I hadn’t learned rationality? How good was option 2 anyway? It’s hard to spot when rationality has helped someone.
In conclusion, the lack of “Rationality gave me magic powers” clickbait is not significant evidence that we are doing something wrong. A large randomized controlled trial finding that rationality didn’t work would be worrying.
From an outside view, you have given a long list of wordy philosophical arguments, all of which involve terms that you haven’t defined. The success rate for arguments like that isn’t great.
We can be reasonably certain that the world is made up of some kind of fundamental part obeying simple mathematical laws. I don’t know which laws, but I expect there to be some set of equations, of which quantum mechanics and relativity are approximations, that predicts every detail of reality.
The minds of humans, including myself, are part of reality. Look at a philosopher talking about consciousness or qualia in great detail. “A Philosopher talking about qualia” is a high level approximate description of a particular collection of quantum fields or super-strings (or whatever reality is made of).
You can choose a set of similar patterns of quantum fields and call them qualia. This makes a quale the same type of thing as a word or an apple. You have some criteria about what patterns of quantum fields do or don’t count as an X. This lets you use the word X to describe the world. There are various details about how we actually discriminate based on sensory experience. All of our idea of what an apple is comes from our sensory experience of apples, correlated with sensory experience of people saying the word “apple”. This is a feature of the map, not the territory.
I am a mind. A mind is a particular arrangement of quantum fields that selects actions based on some utility function stored within it. Deep Blue would be a simpler example of a mind. The point is that minds are mechanistic (“mind” is an implicitly defined set of patterns of quantum fields, like “apple”), and minds also contain goals embedded within their structure. My goals happen to make various references to other minds; in particular they say to avoid an implicitly defined set of states that my map calls minds in pain.
I would use a definition of qualia in which they were some real, neurological phenomena. I don’t know enough neurology to say which.
Instead of just voting comments up and down, can we vote comments north, south, east, west, past and future to make a full 4D voting system? Position the comments in their appropriate position on the screen, using drop shadows to indicate depth. Access the inbuilt compasses on smartphones to make sure the direction is properly aligned. Use the GPS to work out the velocity and gravitational field exposure to make proper relativistic calculations. The comments voted into the future should only show up after a time delay, while those voted into the past should show up before they are posted. A potential feature for Karma $3.0+\sqrt{2}i$.
If you add ad hoc patches until you can’t imagine any way for it to go wrong, you get a system that is too complex to imagine. This is the “I can’t figure out how this fails” scenario. It is going to fail for reasons that you didn’t imagine.
If you understand why it can’t fail, for deep fundamental reasons, then it’s likely to work.
This is the difference between the security mindset and ordinary paranoia: the difference between adding complications until you can’t figure out how to break the code, and proving that breaking the code is impossible (assuming the adversary can’t get your one-time pad, it’s only used once, your randomness is really random, your adversary doesn’t have anthropic superpowers, etc.).
I would think that the chance of serious failure in the first scenario was >99%, and in the second (assuming you’re doing it well, and the assumptions you rely on are things you have good reason to believe), <1%.
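The one-time pad is the standard example of the second kind of security; here is a minimal sketch (illustrative, not from the original comment) of what “provably unbreakable, given the assumptions” looks like in code:

```python
# A one-time pad: secure by proof (given the stated assumptions), not by piling on
# complications until no attack comes to mind.
import secrets

def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    # The pad must be truly random, at least as long as the message, kept secret,
    # and never reused -- exactly the assumptions the proof relies on.
    assert len(pad) >= len(message)
    return bytes(m ^ p for m, p in zip(message, pad))

def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    return otp_encrypt(ciphertext, pad)   # XOR with the same pad is its own inverse

pad = secrets.token_bytes(32)
ciphertext = otp_encrypt(b"attack at dawn", pad)
print(otp_decrypt(ciphertext, pad))       # b'attack at dawn'
```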
I suspect that the social institutions of Law and Money are likely to become increasingly irrelevant background to the development of ASI.
If you believe that there is a good chance of immortal utopia, and a large chance of paperclips, in the next 5 years, then the threat that the cops might throw you in jail (on the off chance that they are still in power) is negligible.
The law is blind to safety.
The law is bureaucratic and ossified. It is probably not employing much top talent, as it’s hard to tell top talent from the rest if you aren’t as good yourself (and it doesn’t have the budget or glamour to attract them). Telling whether an organization is on track not to destroy the world is HARD. The safety protocols are being invented on the fly by each team, and the system is very complex, technical and only half built. The teams that would destroy the world aren’t idiots; they are still producing long papers full of maths and talking about the importance of safety a lot. There are no examples to work with, or understood laws.
Likely as not (not really, too much conjunction here), you get some random inspector with a checklist full of things that sound like a good idea to people who don’t understand the problem. All AI work has to have an emergency stop button that turns the power off. (The idea of an AI circumventing this was not considered by the person who wrote the list.)
All the law can really do is tell what public image an AI group wants to present, provide funding to everyone, and get in everyone’s way. Telling cops to “smash all GPUs” would have an effect on AI progress. The fund vs smash axis is about the only lever they have. They can’t even tell an AI project from a maths convention from a normal programming project if the project leaders are incentivized to obfuscate.
After ASI, governments are likely only relevant if the ASI was programmed to care about them. Neither paperclippers nor FAI will care about the law. The law might be relevant if we had tasky ASI that was not trivial to leverage into a decisive strategic advantage. (An AI that can put a strawberry on a plate without destroying the world, but that’s about the limit of its safe operation.)
Such an AI embodies an understanding of intelligence and could easily be accidentally modified to destroy the world. Such scenarios might involve ASI and timescales long enough for the law to act.
I don’t know how the law can handle something that can easily destroy the world, has some economic value (if you want to flirt with danger) and, with further research, could grant supreme power. The discovery must be limited to a small group of people (law of large numbers: among enough non-experts, one will do something stupid). I don’t think the law could notice what it was; after all, the robot in front of the inspector only puts strawberries on plates. They can’t tell how powerful it would be with an unbounded utility function.
1) Extinction caused by climate change is not on the table. Low-tech humans can survive everywhere from the jungle to the Arctic. Some humans will survive.
2) I suspect that climate change won’t cause massive social collapse. It might well knock 10% off world GDP, but it won’t stop us having an advanced high-tech society. At the moment, it’s not causing damage on that scale, and I suspect that in a few decades we will have biotech, renewables or other techs that will make everything fine. I suspect that the damage caused by climate change won’t increase by more than 2 or 3 times in the next 50 years.
3) If you are skilled enough to be a scientist, inventing a solar panel that’s 0.5% more efficient does a lot more good than showing up to protests. Protests need many people to work; inventors can change the world by themselves. Policy advisors and academics can suggest action in small groups. Even working a normal job and sending your earnings to a well-chosen charity is likely to be more effective.
4) Quite a few people are already working on global warming. It seems unlikely that a problem would be solved if 10,000,001 people worked on it, but not if only 10,000,000 did. Most of the really easy work on global warming is already being done. This was not the case with AI risk as of 10 years ago, for example. (It’s got a few more people working on it since then, still nothing like climate change.)
Different minds use different criteria to evaluate an argument. Suppose that half the population were perfect rationalists, whose criteria for judging an argument depended only on Occam’s razor and Bayesian updates. The other half are hard-coded biblical literalists, who only believe statements based on religious authority. So half the population will consider “Here are the short equations, showing that this concept has low Kolmogorov complexity” to be a valid argument; the other half consider “Pope Clement said …” to be a strong argument.
Suppose that any position that has strong religious and strong rationalist arguments for it is so obvious that no one is doubting or discussing it. Then most propositions believed by half the population have strong rationalist support, or strong religious support, but not both. If you are a rationalist and see one fairly good rationalist argument for X, you search for more info about X. Any religious arguments get dismissed as nonsense.
The end result is that the rationalists are having a serious discussion about AI risk among themselves. The religious dismiss AI as ludicrous based on some Bible verse.
The religious people are having a serious discussion about the second coming of Christ and judgement day, which the rationalists dismiss as ludicrous.
The end result is a society where most of the people who have read much about AI risk think it’s a thing, and most of the people who have read much about judgement day think it’s a thing.
If you took some person from one side and forced them to read all the arguments on the other, they still wouldn’t believe. Each side has the good arguments under their criteria of what a good argument is.
The rationalists say that the religious have poor epistemic luck, and there is nothing we can do to help them now; when super-intelligence comes it can rewire their brains. The religious say that the rationalists are cursed by the devil; when judgement day comes, they will be converted by the glory of God.
The rationalists are designing a super-intelligence, the religious are praying for judgement day.
Bad ideas and good ones can have similar social dynamics, because most of the social dynamics around an idea depend on human nature.
You’re treating the low-bandwidth oracle as an FAI with a bad output cable. You can ask it if another AI is friendly, if you trust it to give you the right answer. As there is no obvious way to reward the AI for correct friendliness judgements, you risk running an AI that isn’t friendly, but still meets the reward criteria.
The low bandwidth is to reduce manipulation. Don’t let it control you with a single bit.
Criticism of the singularity narrative has been raised from various angles. Kurzweil and Bostrom seem to assume that intelligence is a one-dimensional property and that the set of intelligent agents is totally ordered in the mathematical sense.
Amongst humans, physical fitness isn’t a single dimension: one person can be better at sprinting, while another is better at high jumping. But there is a strong positive correlation, so we can roughly talk about how physically fit someone is.
This is a case of the concept that Slate Star Codex describes as ambijectivity.
So we can talk about intelligence as if it were a single parameter, if we have reason to believe that the dimensions of intelligence are strongly correlated. One reason these dimensions might be correlated is if there were some one-size-fits-all algorithm.
A neural network algorithm that can take 1000 images of object A and 1000 images of object B, and then learn to distinguish them, is fairly straightforward to make. Making a version that works if and only if none of the pictures contain cats would be harder: you would have to add an extra algorithm that detected cats and made the system fail if a cat was detected. So you have a huge number of dimensions of intelligence: the ability to distinguish dogs from teapots, chickens from cupcakes, etc. But it is actively harder to make a system that performs worse on cat-related tasks, as you have to put in a special case that says “if you see a cat, then break”.
Another reason to expect the dimensions of intelligence to be correlated is that they were all produced by the same process. Suppose there were 100 dimensions of intelligence, and that an AI with intelligence (x, x, x, ..., x) was smart enough to make an AI of intelligence (2x, 2x, 2x, ..., 2x). Here you get exponential growth, and the reason the dimensions are correlated is that they were controlled by the same AI. If the AI is made of many separate modules, and each module has a separate level of ability, this model holds.
There are also economic reasons to expect correlation if resources are fungible. Suppose you are making a car. You can buy a range of different gearboxes, and different engines, at different prices. Do you buy a state-of-the-art engine and a rusty mess of a gearbox? No, the best way to get a functioning car on your budget is to buy a fairly good gearbox and a fairly good engine. The same might apply to an AI: the easiest place to improve might be where it is worst.
Is Conway’s Life, with a random starting state, interpretable? If you zoom in on any single square, it is trivial to predict what it will do. Zoom out and you need a lot of compute. There is no obvious way to predict whether a cell will be on in 1,000,000 timesteps without brute-force simulating the whole thing (at least the past light cone). What would an interpretability tool for Conway’s Life look like?
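For concreteness, a minimal sketch of the brute-force approach (the grid size, step count and toroidal wrap-around are illustrative assumptions, not part of the question):

```python
# Brute-force simulation of Conway's Life: the only obvious way to learn the fate
# of one cell is to step the entire grid forward, which is the point.
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    # Count the eight neighbours of every cell (wrapping around the edges).
    neighbours = sum(np.roll(np.roll(grid, dx, 0), dy, 1)
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # A cell is alive next step if it has 3 neighbours, or is alive with 2 neighbours.
    return ((neighbours == 3) | ((neighbours == 2) & (grid == 1))).astype(int)

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(100, 100))   # random starting state
for _ in range(1000):                        # predicting step 1,000,000 scales the same way
    grid = life_step(grid)
print(grid[50, 50])   # the state of one cell, known only by simulating all of them
```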
Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card, and it’s blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Invert the prospect theory curve to get from betting behaviour back to implicit probability. The people should perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50:50.
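For comparison, a minimal sketch of the normative Bayesian baseline (assuming, as an extra assumption of this sketch, that the shown side of the drawn card is chosen uniformly at random): the ideal update is a constant factor of 2 in odds, i.e. log 2 in log odds, whatever the pack composition, so any dependence of the inferred updates on the pack composition is the effect being measured.

```python
# Baseline Bayesian update for the two-sided card experiment.
import math

def posterior_blue_blue(p_bb: float) -> float:
    """P(card is blue on both sides | the side shown is blue),
    assuming the displayed side is chosen uniformly at random."""
    p_rb = 1.0 - p_bb                                 # red-blue cards
    return (p_bb * 1.0) / (p_bb * 1.0 + p_rb * 0.5)   # BB always shows blue, RB half the time

for p_bb in (0.5, 0.9):                               # a 50:50 pack and a mostly-one-type pack
    post = posterior_blue_blue(p_bb)
    shift = math.log(post / (1 - post)) - math.log(p_bb / (1 - p_bb))
    print(p_bb, round(post, 3), round(shift, 3))      # the log-odds shift is always log 2 ~= 0.693
```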