MMath Cambridge. Currently studying postgrad at Edinburgh.
Donald Hobson
I don’t think this works in the infinite limit. With a truly unlimited amount of compute, insane things happen. I wouldn’t trust that a randomly initialized network wasn’t already a threat.
For example, bulk randomness can produce deterministic-seeming laws over the distribution. (Statistical mechanics). These laws can in turn support the formation and evolution of life.
That, or a sufficiently large neural net could just have all sorts of things hiding in it by sheer probability.
The win scenario here is that these techniques work well enough that we get LLMs that can just tell us how to solve alignment properly.
If you really have preferences over all possible histories of the universe, then technically you can do anything.
Money pumping thus only makes sense in a context where your preferences are over a limited subset of reality.
Suppose you go to a pizza place. The only 2 things you care about are which kind of pizza you end up eating, and how much money you leave with. And you have cyclic preferences about pizza flavor A<B<C<A.
Your waiter offers you a 3 way choice between pizza flavors A, B, C. Then they offer to let you change your choice for $1, repeating this offer N times. Then they make your pizza.
Without loss of generality, you originally choose A.
For N=1, you change your choice to B, having been money pumped for $1. For N=2, you know that if you change to B the first time, you will then change to C, so you refuse the first offer, and then change to B. The same goes for any constant known N>2: repeatedly refuse, then switch at the last minute.
Suppose the waiter will keep asking and keep collecting $1 until you refuse to switch. Then, you will wish you could commit yourself to paying $1 exactly once, and then stopping. But if you are the sort of agent that switches, you will keep switching forever, paying an infinite amount of money and never getting any pizza.
Suppose the waiter rolls a die. If they get a 6, they let you change your pizza choice for $1, and roll the die again. As soon as they get some other number, they stop rolling and make your pizza. Under a slight strengthening of the idea of cyclical preferences to cover decisions under uncertainty, you will keep going around in the cycle until the die stops coming up 6.
So there is some small chance of being money pumped.
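To put a rough number on that chance, here is a quick sketch, assuming you accept every offer (which is what myopic cyclic preferences would have you do):

import random

# Expected number of $1 payments in the dice version, assuming you accept
# every offer: the waiter keeps offering as long as they keep rolling 6s,
# so the number of payments is geometrically distributed.
p_six = 1 / 6
expected_payments = p_six / (1 - p_six)   # = 0.2, i.e. about 20 cents on average
print(expected_payments)

# Monte Carlo check of the same quantity.
def one_game():
    payments = 0
    while random.randint(1, 6) == 6:
        payments += 1
    return payments

print(sum(one_game() for _ in range(100_000)) / 100_000)   # ≈ 0.2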
Money pumping an agent with irrational cyclic preferences is quite tricky if the agent isn’t looking myopically 1 step ahead, but can foresee how the money pumping ends in the long term.
but my intuition suggests that giant amounts of transistors shouldn’t be the fastest way to compute almost everything,
You really want your computation devices to be small, fast, cheap and reliable. And transistors are very small and fast and reliable. Also, binary has a lot of advantages, and transistors can do arbitrary logic gates.
Also, special purpose components have a lot of limitations.
Your gravity sort only works if the computer is the right way up.
Analogue processes in general are hard to do to reasonable precision. That sort of optics-based Fourier transform hardware would not only be low precision, and probably nondeterministic, it would also be fixed size. If you want to do bigger or smaller Fourier transforms, tough.
It’s hard to replace general purpose components with special purpose ones because there are so many different things you might want to compute. Modern computers can do loads of tasks well enough. A device that could do all sorting magically and instantly, and made your computer 10% more expensive, would still probably not be worth it. How much of your computer’s time is actually spent on sorting?
A lot of the code currently run isn’t a neat maths thing like sorting or a Fourier transform. It’s full of thousands of messy details (e.g. the Linux kernel, Firefox, most other packages). It would be possible to make specialized hardware with a specific version of all the details built into it. But other than that, what you want is a general instruction-following machine.
Now, Bell could patch over this problem. For instance, they could pick a bunch of extra functions of the variables and require that those also be uncorrelated.
One neat thing: if you require ALL functions to be uncorrelated, then this is equivalent to saying the mutual information is 0.
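Roughly, for discrete random variables X and Y (my notation, a sketch of the standard equivalence rather than anything from the exchange above):

E[f(X) g(Y)] = E[f(X)] E[g(Y)] for all f, g
⟺ P(X=x, Y=y) = P(X=x) P(Y=y) for all x, y (take f and g to be indicator functions of single values)
⟺ I(X;Y) = 0.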
One thing that makes this more complicated is that you seem to be talking about omnipresent simulated clones. But in such a scenario, a large fraction of my utility would concern the clones. So any task that requires too much boring manual detail work is likely to just not get done. Or are they hypothetical clones in some way? Is this about what the clones could do, not about what they would do?
Yeah. Probably.
I think I wasn’t using the symbol ⇒ as an inequality comparison but as an arrow or something. Not sure.
Looked up Blum’s speedup theorem.
Simplified structure of the proof: consider a listing of all Turing machines T_1, T_2, …
The problem to which the speedup theorem applies involves simulating the first N Turing machines, but applying MUCH more compute to those that come first in the listing.
So you’re simulating Turing machine i for 2↑↑(N−i) (note the double up-arrow) steps.
By caching the outputs of the first few Turing machines, you can get a faster algorithm.
But, at some point, you get a Turing machine that never halts, but can’t be proved never to halt in ZFC.
At this point, your slower algorithm has to actually run the Turing machine, and your faster algorithm just magically knows that it never halts.
Blum’s speedup comes from replacing a long-running computation with a lookup table full of uncomputable data. Computability theory’s “there exists an algorithm” allows you to magically know any finite amount of data.
Given Blum’s infinite sequence of ever-faster algorithms, there is a fastest algorithm that can be proved to work in ZFC.
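For reference, the usual statement of the theorem (roughly, from memory, so treat the exact quantifiers as a sketch): for every total computable function f, there is a computable 0–1 valued function P such that for every program i computing P, there is another program j computing P with

f(C_j(x)) ≤ C_i(x) for all but finitely many inputs x,

where C_i(x) is the number of steps program i takes on input x. So no single program computing P is anywhere near optimal.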
In sports with a weight-class, people do dubious things like dehydrating themselves to lose weight.
What unhealthy tricks might be used to cut down on compute-weight?
Aren’t social games potentially sufficiently non-zero-sum that it’s fine for everyone to play together?
(Think parents letting their small children win at easy games?)
Yes. Sorry, wrong words. I was more meaning Solomonoff induction here.
For instance, there is no computable predictor which always learns to do as well as every other computable predictor on all sequences.
What’s going on here is quite interesting. For any number N, a version of AIXI with time and length bound N is in some sense optimal for its compute.
So (in the limit), what you want is a really big N.
One way to do that is to pick a probability distribution over Turing machines, then store Chaitin’s constant (p(halt)) to some number of digits. Start running more and more Turing machines for longer and longer until that much halting probability has accumulated, and the time taken is your number.
If we take this probability distribution over Turing machines as our prior, then the Turing machines that this algorithm does badly on are the ones that don’t halt in time N. And each extra digit of Chaitin’s constant, on average, halves the measure of such Turing machines.
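A rough sketch of why (my notation): writing Ω = Σ_{M halts} p(M) for the halting probability under that prior, and Ω_k for Ω truncated to k binary digits,

Ω_k ≤ Ω < Ω_k + 2^(−k).

If you run machines until the halting mass you have observed reaches Ω_k, the measure of machines that will eventually halt but have not yet done so is below 2^(−k). So each extra digit roughly halves the measure of halting machines that the resulting time bound misses.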
Would an elegant implementation of AGI be safe? What about an elegant and deeply understood implementation?
My impression is that it’s possible for an elegant and deeply understood implementation to block particular failure modes. An AI with a tightly bounded probability distribution can’t get Pascal-mugged.
Being well understood isn’t sufficient. Often it just means you know how the AI will kill you.
And “Safe” implies blocking all failure modes.
I noticed my toothbrush. I tend to brush hard, so I go through toothbrushes quickly; the bristles were all splayed out rather than straight. Wizard power would be making my own toothbrush, out of something which wouldn’t wear out so easily.
They do make various different firmnesses of toothbrush.
In general, there are large economies of scale from mass production. Society needs to be seriously screwing something up before making it yourself is a viable option for a lot of things. You can make your own toothbrush as a hobby, but it will probably be cheaper and easier to just buy more toothbrushes. Or, run hot water over the bristles, squeeze them back into shape, then run cold water over them. Or just keep chewing on a flat looking toothbrush.
(Also, you want your toothbrush to be less wear resistant than your teeth, as your toothbrush is easier to replace)
Wizard-power is pretty weak when you are just making something for yourself. It only really shines with economies of scale. Some of that economy of scale is in owning the tools. Some is in understanding the particular details of the domain.
I’d have to run the numbers to check that 200 flips is enough to give a high-confidence estimate of
It isn’t enough. See plot. Also, 200 not being enough flips is part of what makes this interesting. With a million flips, this would pretty much just be the exact case. The fact that it’s only 200 flips gives you a tradeoff in how many label_bits to include.
Here is the probability distribution over the number of heads, plotted for each of your coins.
Python code:

import numpy as np
import matplotlib.pyplot as plt

ps = np.linspace(0, 1, 32)   # the 32 coin biases
q = np.arange(201)           # possible head counts out of 200 flips

def f(p):
    # Distribution of the number of heads in 200 flips of a coin with
    # head-probability p, built up by repeated convolution.
    a = np.array([1 - p, p])
    b = a
    for i in range(199):
        b = np.convolve(b, a)
    return b

ff = [f(p) for p in ps]
for dist in ff:
    plt.plot(q, dist)
plt.xlabel("heads")
plt.ylabel("prob")
plt.show()
The modern world does intentional large-scale chemical engineering, and this is meaningfully different in its scale and quickly-shifting nature than nearly anything that came before 1661. You can’t necessarily use the same analogical and cultural reasoning you would have used in 1660 to tell you what’s safe and what isn’t, in a world where everything you touch was chemically engineered.
You should be less confident in your understanding of the long-term consequences of eating Red Dye Number 40 than in your understanding of the long-term consequences of eating non-GMO wheat.
I’m not confident in this. I doubt analogical and cultural reasoning were ever that great. People in 1660 made use of lead and mercury, tobacco and ethanol.
If something is sufficiently obviously and rapidly poisonous, cultural knowledge will pick that up, but cultural knowledge like this isn’t nearly as accurate as modern science. Most of our best knowledge about the health effects of wheat comes from modern science.
The phenomenon of caring a lot about subtle long-term health effects is relatively recent, driven in part by increased wealth.
Medieval peasants ate wheat not because they had access to some cultural knowledge that wheat has no long-term health effects, but because there weren’t many other easy sources of calories around. The agricultural calorie economics basically forced them to eat mostly grain.
And, what was best to eat in a medieval context isn’t necessarily what’s best to eat now. (E.g. ancient traditions about drinking beer because the water wasn’t safe. This kind of cultural knowledge can go out of date for reasons other than artificial additives to foods.)
In the long term, I think the goal should be an AI that is nice to humans, because it was programmed to be nice to humans. This AI is not in any way reliant on the productive abilities of humans. In this scenario, capitalism isn’t really relevant. The AI is a singleton.
if metaethicists really were serious about resolving their disputes they should contract a software engineer (or something) to help implement on GitHub a metaethics version of Table 2
There is a progression from philosophy to maths to engineering. But this sounds like you’re anxious to skip to the engineering. As the old adage goes: engineering must be done; this is engineering; therefore this must be done.
If the LLM is just spitting out random opinions it found on r/philosophy, how is this useful? If we want a bunch of random opinions, we can check r/philosophy ourselves.
This plan sounds like a rush to engineer something without the philosophy, resulting in entirely the wrong thing being produced.
and then accept that real-world engineering solutions tend to be “dirty” and inelegant remixes plus kludgy optimisations to handle edge cases,
Because the tricky thing here isn’t making an algorithm to produce the right answer, but deciding what the right answer is.
Suppose I had an algorithm that could perfectly predict what Joe Public would think about any ethics dilemma, given 1 minute to think. Is this algorithm a complete solution to meta-ethics?
No.
In the Bomb example, CDT supposedly picks the right box, despite Omega’s prediction. I think the bomb question is broken in some way.
I think paying in counterfactual mugging is actually the right decision. (In bizarre hypothetical land. In real life, don’t pay; it’s a two-tailed coin.)
Here is a hypothetical. I go up to you and say:
I know omega will play counterfactual mugging with you tomorrow.
If you pay me $1 now, and then don’t pay omega, I will set your car on fire.
If you don’t pay me any money now, I will go away without setting your car on fire.
You think about that and realize: if you don’t pay me $1, then the expected value is $0. (No cars burned, no payments to or from Omega.)
But if you do pay me $1 now, then you had better pay Omega $100 tomorrow if Omega asks. So Omega will predict this. E(U) = 0.5 × $10,000 − 0.5 × $100 − $1 = $4,949.
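A quick check of that arithmetic, as a sketch assuming the usual counterfactual mugging setup (fair coin, $100 payment, $10,000 payout):

# If you pay the $1 now, you are committed to paying Omega $100 when asked,
# so Omega predicts this and pays out $10,000 in the other branch of the coin.
ev_pay = 0.5 * 10_000 - 0.5 * 100 - 1   # = 4949.0
# If you refuse the $1, you also won't pay Omega, so nothing happens either way.
ev_refuse = 0.0
print(ev_pay, ev_refuse)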
Also, the bomb example is malformed. It involved Omega predicting your actions, but also leaving you a note.
Suppose your algorithm is “read the note, and go for the box without the bomb”. What does Omega do in that case? This hypothetical, as you have it, seems to involve the perfect predictor Omega failing to predict the actions of CDT, EDT and VDT.
Also, for intuitions (and possibly sensible behavior), scale matters. That is, I would 1-box if the quantities were $1 and $1,000,000. But 2-box if the quantities were $999,999 and $1,000,000. This isn’t consistent with being certain of any 1 decision theory. But it is very sensible behavior if you’re uncertain which decision theory is true in some sense.
Your bad example takes 5 minutes at a party. Your “good” example takes 8 weeks of work. It is not hard, in general, to get a better answer by investing more effort.
A specific example, worked out in full detail, can exhibit the presence of security holes, but not their absence. If the system is a complicated mess, it can be very hard to find a security hole, but also very hard to prove it doesn’t have one. (And it’s quite likely it does have one.)
When speculating about the risks of future AI, the easiest proofs of concept will be rather toy and of arguable relevance. More sophisticated proofs of concept on less toy examples might be dangerous to create.
If you see a bunch of potential threats, it’s not guaranteed that all those threats are real. But they are all likely enough to be real that you have to plan for them. The list of speculations will contain some false positives. The list of fully worked-out exploits will contain false negatives.