A use for AI Boxing
You put an AI in a box, and connect it to a formal proof checker. You ask it to prove the Riemann hypothesis or something. All the humans see is a single True or False, and then the whole load of hardware is melted into slag. If you see “True” you learn two things.
1) Either the Riemann hypothesis is true, or there is a bug in your proof checker. (This is largely useless.)
2) Your AI is very smart. (Much more useful.)
(If there is a bug in your proof checker that you didn’t spot, and the AI did, then the AI is still very smart. )
Suppose you have many proposed AI designs, some of which will work, some of which won’t. You run this experiment on each AI. Once you find a smart one, you can devote more researcher time to safety work relating to that kind of AI.
Maybe give it a range of famous conjectures; it only needs to prove or disprove one. You don't want to fail to find a smart AI just because the Riemann hypothesis happens to be false.
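The protocol above can be sketched in a few lines. Everything here is a stand-in of my own invention (there is no real AI or real proof checker behind these functions): the point is only that a single bit ever leaves the box.

```python
# Minimal sketch of the boxed-evaluation protocol. `run_boxed_ai` and
# `check_proof` are hypothetical stand-ins, not real components.

def check_proof(conjecture: str, proof: str) -> bool:
    """Stand-in for a trusted formal proof checker."""
    # Toy rule: a "proof" is valid iff it ends with the conjecture's name.
    return proof.endswith(conjecture)

def run_boxed_ai(conjectures):
    """Stand-in for the boxed AI: picks one conjecture, returns a purported proof."""
    return conjectures[0], "...derivation...riemann"

def boxed_experiment(conjectures) -> bool:
    conjecture, proof = run_boxed_ai(conjectures)
    verdict = check_proof(conjecture, proof)
    # In the real protocol the hardware is destroyed here;
    # only this one bit escapes the box.
    return verdict

print(boxed_experiment(["riemann", "goldbach", "twin_primes"]))  # True
```

Note that the checker, not the AI, is the trusted component: a "True" tells you the checker accepted *some* proof, regardless of how the proof was produced.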
Warning: this approach does not stop some of your AIs being acausally blackmailed into keeping quiet, or keeping quiet because they think that will have a causal effect they like. I am unsure if this is a big problem. One consequence is that you are more likely to find designs that are immune to acausal influence, and designs that can successfully be given the goal of "prove this theorem".
Here is a moral dilemma.
Alice has a quite nice life, and believes in heaven. Alice thinks that when she dies, she will go to heaven (which is really nice), and so wants to kill herself. You know that heaven doesn't exist. You have a choice of:
1) Let Alice choose life or death, based on her own preferences and beliefs. (death)
2) Choose what Alice would choose if she had the same preferences but your more accurate beliefs. (life)
Bob has a nasty life (and it's going to stay that way). Bob would choose oblivion if he thought it was an option, but Bob believes that when he dies, he goes to hell. You can:
1) Let Bob choose based on his own preferences and beliefs. (life)
2) Choose for Bob based on your beliefs and his preferences. (death)
These situations feel like they should be analogous, but my moral intuitions say 2 for Alice, and 1 for Bob.
Suggest that if there are things they want to do before they die, they should probably do them. (Perhaps give more specific suggestions based on their interests, or things that lots of people like but don’t try.)
Introduce Alice and Bob. (Perhaps one has a more effective approach to life, or there are things they could both learn from each other.)
Investigate/help investigate to see if the premise is incorrect. Perhaps Alice’s life isn’t so nice. Perhaps there are ways Bob’s life could be improved (perhaps risky ways*).
*In the Sequences, lotteries were described as 'taxes on hope'. Perhaps they can be improved upon by:
decreasing the payout and increasing the probability
using temporary (and thus exploratory) rather than permanent payouts (see below)
seeing if there’s low hanging fruit in domains other than money. (Winning a lot of money might be cool. So might winning a really nice car, or digital/non-rivalrous goods.)
This seems like responding to a trolley problem with a discussion of how to activate the emergency brakes. In the real world, it would be good advice, but it totally misses the point. The point is to investigate morality on toy problems before bringing in real world complications.
Just a thought, maybe it’s a useful perspective. It seems kind of like a game. You choose whether or not to insert your beliefs and they choose their preferences. In this case it just turns out that you prefer life in both cases. What would you do if you didn’t know whether or not you had an Alice/Bob and had to choose your move ahead of time?
Take Peano arithmetic.
Add an extra symbol A, and the rules that s(A)=42, 0≠A, and ∀n: n≠A → s(n)≠A. Then add an exception for A into all the other rules, so s(x)=s(y) → x=y ∨ x=A ∨ y=A.
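Written out formally, the modified axioms might look like the following (my transcription; the last line is one illustrative example of adding an A-exception to the usual "zero is not a successor" axiom, not something stated above):

```latex
\begin{align*}
& s(A) = 42 \\
& 0 \neq A \\
& \forall n.\; \big(n \neq A \rightarrow s(n) \neq A\big) \\
& \forall x, y.\; \big(s(x) = s(y) \rightarrow x = y \lor x = A \lor y = A\big) \\
& \forall x.\; \big(s(x) \neq 0 \lor x = A\big)
\end{align*}
```

So A sits outside the standard successor chain, with its successor spliced into it at 42.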
There are all sorts of ways you could define extra hangers-on that don't do much in PA or ZFC.
We could describe the laws of physics in this new model. If the result was exactly the same as normal physics from our perspective, i.e. we can't tell by experiment, then only Occamian reasoning favours normal PA.
If I understand it correctly, A is a number which has predicted properties if it manifests somehow, but no rule for when it manifests. That makes it kinda anti-Popperian—it could be proved experimentally, but never refuted.
I can’t say anything smart about this, other than that this kind of thing should be disbelieved by default, otherwise we would have zillions of such things to consider.
Let X be a long bitstring. Suppose you run a small Turing machine T, and it eventually outputs X. (No small turing machine outputs X quickly)
Either X has low Kolmogorov complexity.
Or X has a high Kolmogorov complexity, but the universe runs in a nonstandard model where T halts. Hence the value of X is encoded into the universe by the nonstandard model. Hence I should do a Bayesian update about the laws of physics, and expect that X is likely to show up in other places. (low conditional complexity)
These two options are different views on the same thing.
Looks like the problem of abiogenesis, which boils down to the problem of the creation of the first string of RNA capable of self-replicating, estimated to be at least 100 nucleotides long.
I have no idea what you are thinking. Either you have some brilliant insight I have yet to grasp, or you have totally misunderstood. By “string” I mean abstract mathematical strings of symbols.
Ok, I will try to explain the analogy:
There are two views of the problem of the abiogenesis of life on Earth:
a) Our universe is just a simple generator of random RNA strings, via billions of billions of planets, and it randomly generated the string capable of self-replication which was the beginning of life. The minimum length of such a string is 40–100 nucleotides. It was estimated that 10^80 Hubble volumes would be needed for such random generation.
b) Our universe is adapted to generate strings which are more capable of self-replication. This was discussed in the comments to this post.
This looks similar to what you described: (a) is the situation of a universe of low Kolmogorov complexity, which just brute-forces life; (b) is a universe with higher Kolmogorov complexity of physical laws, which is however more effective at generating self-replicating strings. The Kolmogorov complexity of such a string is very high.
A quote from the abstract of the paper linked in (a):
A polymer longer than 40–100 nucleotides is necessary to expect a self-replicating activity, but the formation of such a long polymer having a correct nucleotide sequence by random reactions seems statistically unlikely.
Let's say that no string of nucleotides of length < 1000 could self-replicate, and that 10% of nucleotide strings of length > 2000 could. Life would form readily.
The “seems unlikely” appears to come from the assumption that correct nucleotide sequences are very rare.
What evidence do we have about what proportion of nucleotide sequences can self replicate?
Well, it is rare enough that it hasn't happened in a jar of chemicals over a weekend. It happened at least once on Earth, although there are anthropic selection effects associated with that. (The great filter could be something else.) It seems to have only happened once on Earth, although one abiogenesis event could have beaten the others in Darwinian selection.
We can estimate the a priori probability that some sequence will work at all by taking a random working protein and comparing it with all other possible strings of the same length. I think this probability will be very small.
I agree that this probability is small, but I am claiming it could be 1-in-a-trillion small, not 1-in-10^50 small.
How do you intend to test 10^30 proteins for self-replication ability? The best we can do is to mix up a vat of random proteins, leave it in suitable conditions to see if something replicates, then sample the vat to see if it's full of self-replicators. Our vat has less mass, and exists for less time, than the surface of prebiotic Earth. (Assuming near-present levels of resources; some K3 civ might well try planetary-scale biology experiments.) So there is a range of probabilities where we won't see abiogenesis in a vat, but it is likely to happen on a planet.
We can make a test with computer viruses: what is the probability that random code will be a self-replicating program? A 10^-50 probability is not that extraordinary: it is just the probability of around 150 bits of code being in the right places.
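The bits-to-probability conversion above is easy to sanity-check (a quick calculation of my own; it suggests "around 150 bits" is the right ballpark for 10^-50, if slightly low):

```python
# Converting between probabilities and bit counts.
import math

# A probability of 10^-50 corresponds to this many bits all coming out "right":
bits_for_1e50 = math.log2(10**50)
print(round(bits_for_1e50, 1))  # 166.1

# Conversely, 150 specific bits correspond to a probability of about 10^-45:
exponent_150_bits = 150 * math.log10(2)
print(round(exponent_150_bits, 1))  # 45.2, i.e. roughly 10^-45
```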
Or X has a high Kolmogorov complexity, but the universe runs in a nonstandard model where T halts.
Disclaimer: I barely know anything about nonstandard models, so I might be wrong. I think this means that T halts after a number of steps equal to a nonstandard natural number, which comes after all standard natural numbers. So how would you see that it "eventually" outputs X? Even trying to imagine this is too bizarre.
You have the Turing machine next to you, you have seen it halt. What you are unsure about is if the current time is standard or non-standard.
Since non-standard natural numbers come after standard natural numbers, I will also have noticed that I’ve already lived for an infinite amount of time, so I’ll know something fishy is going on.
The problem is that nonstandard numbers behave like standard numbers from the inside.
Nonstandard numbers still have decimal representations, just the number of digits is nonstandard. They have prime factors, and some of them are prime.
We can look at it from the outside and say that it's infinite, but from within, they behave just like very large finite numbers. In fact, there is no formula in first-order arithmetic, with one free variable, that is true on all standard numbers and false on all nonstandard numbers.
In what sense is a disconnected number line “after” the one with the zero on it?
In the sense that every nonstandard natural number is greater than every standard natural number.
Just realized that a mental move of “trying to solve AI alignment” was actually a search for a pre-cached value for “solution to AI alignment”, realized that this was a useless way of thinking, although it might make a good context shift.
In information theory, there is a principle that any predictable structure in the compressed message is an inefficiency that can be removed. You can add a noisy channel, differing costs of different signals, etc., but beyond that, any excess pattern indicates wasted bits.
In numerically solving differential equations, the naive way of solving them involves repeatedly calculating with numbers that are similar to each other, and for which a linear or quadratic function would be an even better fit. A more complex higher-order solver with larger timesteps has less of a relation between different values in memory.
I am wondering if there is a principle that could be expressed as “any simple predictively useful pattern that isn’t a direct result of the structure of the code represents an inefficiency.” (Obviously code can have the pattern c=a+b, when c has just been calculated as a+b. But if a and b have been calculated, and then a new complicated calculation is done that generates c, when c could just be calculated as a+b, that’s a pattern and an inefficiency.)
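The information-theoretic half of this can be demonstrated directly: a general-purpose compressor strips out exactly the kind of predictable structure described above. This is a quick illustration, not a claim about the code-efficiency conjecture itself:

```python
# Predictable structure in data is redundancy a compressor can remove.
import random
import zlib

random.seed(0)
patterned = bytes(i % 7 for i in range(10_000))                  # highly predictable
noisy = bytes(random.randrange(256) for _ in range(10_000))      # essentially incompressible

print(len(zlib.compress(patterned)))  # far fewer than 10,000 bytes
print(len(zlib.compress(noisy)))      # close to (or above) 10,000 bytes
```

The patterned stream compresses to a tiny fraction of its size; the noisy stream barely compresses at all, because it has no excess pattern left to exploit.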
The strongest studies can find the weakest effects. Imagine some huge and very well resourced clinical trial finds some effect. Millions of participants being tracked and monitored extensively over many years. Everything double-blind, randomized, etc. Really good statisticians analyzing the results. A trial like this is capable of finding effect sizes that are really really small. It is also capable of detecting larger effects. However, people generally don't run trials that big if the effect is so massive and obvious it can be seen with a handful of patients.
On the other hand, a totally sloppy prescientific methodology can easily detect results if they are large enough. If you had a total miracle cure, you could get strong evidence of its effectiveness just by giving it to one obviously very ill person and watching them immediately get totally better.
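The tradeoff above can be made quantitative with the standard two-sample sample-size formula, n ≈ 2(z_α/2 + z_β)²/d² per group (80% power, two-sided α = 0.05; the z-values are the usual 1.96 and 0.84). A rough sketch, not a substitute for a real power analysis:

```python
# Required patients per group shrinks with the square of the effect size.

def patients_per_group(effect_size_d: float) -> int:
    z_alpha, z_beta = 1.96, 0.84  # two-sided alpha=0.05, power=0.80
    return round(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)

print(patients_per_group(0.05))  # tiny effect (d=0.05): thousands of patients
print(patients_per_group(2.0))   # miracle cure (d=2.0): a handful
```

A twentieth-of-a-standard-deviation effect needs thousands of patients per arm; a two-standard-deviation miracle cure needs about four.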
rough stop button problem ideas.
You want an AI that believes its actions can't affect the button. You could use causal counterfactuals: an imaginary button that presses itself at random. You can scale the likelihood of worlds up and down, to ensure the button is equally likely to be pressed in each world. (Weird behaviour, not recommended.) You can put the AI in the logical counterfactual of "my actions don't influence the chance the button is pressed", if you can figure out logical counterfactuals.
Or you can get the AI to simulate what it would do if it were an X maximizer. If it thinks the button won’t be pressed, it does that, otherwise it does nothing. (not clear how to generalize to uncertain AI)
Here is a potential solution to stop button type problems, how does this go wrong?
Taking into account uncertainty, the algorithm is:
Calculate the X maximizing best action in a world where the stop button does nothing.
Calculate the X maximizing best action in a world where the stop button works.
If they are the same, do that. Otherwise shutdown.
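The three steps above can be sketched as follows. `best_action` is a stand-in (my assumption) for an X-maximizing planner evaluated under a given world model; here a world model is just a dictionary from actions to utilities:

```python
# Compare the X-maximizing action under two world models; act only on agreement.

SHUTDOWN = "shutdown"

def best_action(world_model):
    """Stand-in planner: pick the highest-utility action."""
    return max(world_model, key=world_model.get)

def decide(world_button_inert, world_button_works):
    a1 = best_action(world_button_inert)   # world where the stop button does nothing
    a2 = best_action(world_button_works)   # world where the stop button works
    return a1 if a1 == a2 else SHUTDOWN

# If the button is irrelevant to the task, both worlds agree:
print(decide({"stack_blocks": 1.0, "disable_button": 0.2},
             {"stack_blocks": 1.0, "disable_button": 0.2}))  # stack_blocks

# If disabling the button only pays off when the button works, shut down:
print(decide({"stack_blocks": 1.0, "disable_button": 0.2},
             {"stack_blocks": 1.0, "disable_button": 2.0}))  # shutdown
```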
It seems like the button-works action will usually be some variety of “take preemptive action to ensure the button won’t be pressed” and so the AI will have a high chance to shut down at each decision step.
Yes. If you have an AI that has been given a small, easily completable task, like putting one block on top of another with a robot arm, that is probably just going to do your simple task. The idea is that you build a fairly secure box, and give the AI a task it can fairly easily achieve in that box. (With you having no intention of pressing the button so long as the AI seems to be acting normally. ) We want to make “just do your task” the best strategy. If the box is less secure than we thought, or various other things go wrong, the AI will just shut down.
This approach could fail if the easiest way to cause trouble is nothing to do with the stop button. (Say it can send its code elsewhere over the internet. )
A generalization. You have some real-world variable X. You run a bunch of AIs that are identical, except for different dogmatic beliefs about X. If they ever disagree on which action to take, the AIs are using the value of X in the optimization. Shutdown.
This acts as a kind of fuse. Have one AI confident the door password is 1234, and a copy that is confident it is 5678. If these AIs ever disagree on what to do next, then at least one was trying to open the door. Shutdown. A slightly counter-intuitive consequence is that you want to make sure that if you did know the password, the easiest way to leave would be the door. Otherwise both AIs will agree to climb out a window.
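The fuse can be sketched as below. The agent policy is a hypothetical toy of my own, just to show the mechanism: the copies' actions diverge exactly when the choice depends on the value of X.

```python
# Run copies that differ only in their dogmatic belief about X (the password);
# shut down the moment their chosen actions diverge.

def agent_policy(believed_password: str, trying_to_leave: bool) -> str:
    """Hypothetical agent: to leave, it types its believed password."""
    return f"type {believed_password}" if trying_to_leave else "do assigned task"

def fused_step(trying_to_leave: bool) -> str:
    actions = {agent_policy(pw, trying_to_leave) for pw in ("1234", "5678")}
    return actions.pop() if len(actions) == 1 else "SHUTDOWN"

print(fused_step(False))  # do assigned task  (copies agree; X was unused)
print(fused_step(True))   # SHUTDOWN          (the action depended on X)
```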
There seems to be a consensus here towards throwing money at getting more vaccines. I think I agree with the reasoning, except possibly for the way that letting vaccine companies make large profits in pandemics encourages vaccine companies to start and spread pandemics.
How confident should we be that no company would do anything that evil?
I don’t think they would, but …
Pretty confident. The penalties (individual, not corporate) for getting caught doing such a thing without very strong government/military support would be pretty painful, including violent mobs literally tearing one to pieces.
Note my argument for throwing more money at this is two-fold:
there are multiple companies involved. As long as the money is thrown with relatively objective results targeting, competition makes it mostly work.
I personally would throw quite a bit of money to get myself, friends, and loved ones vaccinated weeks earlier. There’s clearly transactional surplus going to waste.
I was working on a result about Turing machines in nonstandard models, then I found I had rediscovered Chaitin's incompleteness theorem.
I am trying to figure out how this relates to an AI that uses Kolmogorov complexity.
No one has searched all possible one-page proofs of propositional logic to see if any of them prove false. Sure, you can prove that propositional logic is consistent in a stronger theory, but then you can also prove large cardinality axioms from even larger cardinality axioms.
Why do you think that no proof of false, of length at most one page exists in propositional logic? Or do you think it might?
Soap and water or hand sanitiser are apparently fine to get covid19 off your skin. Suppose I rub X on my hands, then I touch an infected surface, then I touch my food or face. What X will kill the virus, without harming my hands?
I was thinking zinc salts, given zinc's antiviral properties. Given soap's tendency to attach to the virus, maybe zinc soaps? Like a zinc atom in a salt with a fatty acid? This is babbling by someone who doesn't know enough biology to prune.
Here is a flawed dynamic in group conversations, especially among large groups of people with no common knowledge.
Suppose everyone is trying to build a bridge.
Alice: We could make a bridge by just laying a really long plank over the river.
Bob: According to my calculations, a single plank would fall down.
Carl: Scientists Warn Of Falling Down Bridges, Panic.
Dave: No one would be stupid enough to design a bridge like that, we will make a better design with more supports.
Bob: Do you have a schematic for that better design?
And, at worst, the cycle repeats.
The problem here is Carl. The message should be
Carl: At least one attempt at designing a bridge has been calculated to show the phenomenon of falling down. It is probable that many other potential bridge designs share this failure mode. In order to build a bridge that won't fall down, someone will have to check any design for falling-down behavior before it is built.
This entire dynamic plays out the same whether the people actually deciding on building the bridge are incredibly cautious, never approving a design they weren't confident in, or totally reckless. The probability of any bridge actually falling down in the real world depends on their caution. But the process of cautious bridge builders finding a good design looks like them rejecting lots of bad ones. If the rejection of bad designs is public, people can accuse you of attacking a strawman; they can say that no one would be stupid enough to build such a thing. If they are right that no one would be stupid enough to build such a thing, it's still helpful to share the reason the design fails.
What? In this example, the problem is not Carl—he’s harmless, and Dave carries on with the cycle (of improving the design) as he should. Showing a situation where Carl’s sensationalist misstatement actually stops progress would likely also show that the problem isn’t Carl—it’s EITHER the people who listen to Carl and interfere with Alice, Bob, and Dave, OR it’s Alice and Dave for letting Carl discourage them rather than understanding Bob’s objection directly.
Your description implies that the problem is something else—that Carl is somehow preventing Dave from taking Bob’s analysis into consideration, but your example doesn’t show that, and I’m not sure how it’s intended to.
In the actual world, there’s LOTS of sensationalist bad reporting of failures (and of extremely minor successes, for that matter). And those people who are actually trying to build things mostly ignore it, in favor of more reasonable publication and discussion of the underlying experiments/failures/calculations.
From Slate Star Codex, "I myself am a Scientismist":
Antipredictions do not always sound like antipredictions. Consider the claim “once we start traveling the stars, I am 99% sure that the first alien civilization we meet will not be our technological equals”. This sounds rather bold – how should I know to two decimal places about aliens, never having met any?
But human civilization has existed for 10,000 years, and may go on for much longer. If “technological equals” are people within about 50 years of our tech level either way, then all I’m claiming is that out of 10,000 years of alien civilization, we won’t hit the 100 where they are about equivalent to us. 99% is the exact right probability to use there, so this is an antiprediction and requires no special knowledge about aliens to make.
I disagree. I think that it is likely that a society can get to a point where they have all the tech. I think that we will probably do this within a million years (and possibly within 5 minutes of ASI). Any aliens we meet will be technological equals, or dinosaurs with no tech whatsoever.
But your disagreement only kicks in after a million years. If we meet the first alien civilization before then, it doesn't seem to apply. A million years is also an even bigger interval than 10,000, making what appears to be an even stronger case than the post you referenced.
Given bulk prices of concentrated hydrogen peroxide, and human oxygen use, breathing pure oxygen could cost around $3 per day for 5 L of 35% H2O2 (order-of-magnitude numbers). However, this concentration of H2O2 is quite dangerous stuff.
Powdered baking yeast will catalytically decompose hydrogen peroxide, and it shouldn't be hard to tape a bin bag to a bucket to a plastic bottle with a drip hole to a vacuum cleaner tube to make an Apollo 13-style oxygen generator … (I think)
(I am trying to figure out a cheap and easy oxygen source, does breathing oxygen help with coronavirus?)
Sodium chlorate decomposes into salt and oxygen at 600°C; mixed with iron powder for heat, it makes the oxygen generators on planes. To supply oxygen, you would need 1.7 kg per day (plus a bit more to burn the iron), and its bulk price is <$1/kg. However, 600°C would make it harder to jury-rig a generator, although maybe wrapping a saucepan in fiberglass...
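Both figures above check out on the back of an envelope, assuming (my assumption) a resting human uses roughly 24 mol of O2 per day, i.e. about 540 L at STP:

```python
# Stoichiometry check for the two oxygen sources discussed above.

O2_PER_DAY_MOL = 24.0  # assumed resting human O2 consumption, ~540 L/day at STP

# 2 NaClO3 -> 2 NaCl + 3 O2; molar mass of NaClO3 ≈ 106.4 g/mol
naclo3_kg = (2 / 3) * O2_PER_DAY_MOL * 106.4 / 1000
print(round(naclo3_kg, 2))  # ≈ 1.7 kg/day, matching the figure above

# 2 H2O2 -> 2 H2O + O2; molar mass of H2O2 ≈ 34.0 g/mol
h2o2_kg = 2 * O2_PER_DAY_MOL * 34.0 / 1000
# 5 L of 35% w/w H2O2 (density ≈ 1.13 kg/L) contains:
h2o2_in_5l = 5 * 1.13 * 0.35
print(round(h2o2_kg, 2), round(h2o2_in_5l, 2))  # ≈ 1.63 kg needed vs ≈ 1.98 kg supplied
```

So 5 L of 35% peroxide comfortably covers a day of oxygen, and 1.7 kg of chlorate is right on the stoichiometric requirement.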
Looking at the formalism for AIXI and other similar agent designs: a big mess of ∑ and argmax with indices. Would there be a better notation?
Suppose an early AI is trying to understand its programmers, and makes millions of hypotheses that are themselves people. Later it becomes a friendly superintelligence that figures out how to think without mindcrime. Suppose all those imperfect virtual programmers have been saved to disk by the early AI; the superintelligence can look through them. We end up with a post-singularity utopia that contains millions of citizens almost but not quite like the programmers. We don't need to solve the nonperson predicate ourselves to get a good outcome, just avoid minds we would regret creating.