MMath Cambridge. Currently studying postgrad at Edinburgh.
Donald Hobson
Probability space has 2 metrics
A Data limited future
Shannon mutual information doesn’t really capture my intuitions either. Take a random number X, and a cryptographically strong hash function. Calculate hash(X) and hash(X+1).
Now these two variables share lots of mutual information. But if I just delete X, there is no way an agent with limited compute can find or exploit the link. I think mutual information gives false positives, where Pearson correlation gave false negatives.
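Here is a minimal sketch of that setup (SHA-256 standing in for the cryptographically strong hash; the specific choices are mine, just for illustration):

```python
# Minimal sketch: two hashes that share (in principle) lots of mutual information,
# but whose link is computationally inaccessible once X is deleted.
import hashlib
import secrets

def h(n: int) -> str:
    # Hash the decimal representation of an integer with SHA-256.
    return hashlib.sha256(str(n).encode()).hexdigest()

x = secrets.randbelow(2 ** 128)   # the random number X
a = h(x)                          # hash(X)
b = h(x + 1)                      # hash(X + 1)
del x                             # delete X

# a and b are deterministic functions of the same X, so information-theoretically
# they share essentially all of their entropy.  But verifying that a and b are
# "linked" now requires inverting SHA-256, which a compute-limited agent cannot do.
print(a)
print(b)
```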
So Pearson correlation ⇒ actual exploitable info ⇒ Shannon mutual info.
So one potential lesson is to keep track of which direction your formalisms deviate from reality in. Are they intended to have no false positives, or no false negatives? Some mathematical approximations, like polynomial time = runnable in practice, fail in both directions but are still useful when not being Goodharted too much.
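A quick illustration of the false-negative end of that chain (a toy example of my own, not from the original post): Y = X² on a symmetric X has zero Pearson correlation but clearly nonzero mutual information.

```python
# Toy example: Y = X^2 on a symmetric X.  Pearson correlation is exactly zero
# (a false negative), while the mutual information is clearly nonzero.
import math
from collections import Counter

xs = [-2, -1, 0, 1, 2]                 # X uniform on this small set
p = 1 / len(xs)

# Covariance of X and Y = X^2 (zero by symmetry, so the correlation is zero too).
mean_x = sum(xs) * p
mean_y = sum(x * x for x in xs) * p
cov = sum((x - mean_x) * (x * x - mean_y) for x in xs) * p
print("cov(X, X^2) =", cov)            # 0.0

# Mutual information I(X; Y) = H(Y), since Y is a deterministic function of X.
py = Counter(x * x for x in xs)        # distribution of Y
h_y = -sum((c * p) * math.log2(c * p) for c in py.values())
print("I(X; X^2) =", round(h_y, 3), "bits")   # ~1.522 bits
```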
Another reason someone might stick to the rules is if they think the rules carry more wisdom than their own judgement. Suppose you knew you weren’t great at verbal discussions, and could be persuaded into a lot of different positions by a smart fast-talker, if you engaged with the arguments at all. You also trust that the rules were written by smart wise experienced people. Your best strategy is to stick to the rules and ignore their arguments.
Someone comes along with a phone that’s almost out of battery and a sob story about how they need it to be charged. They ask if they can just plug it in to your computer for a bit to charge it. If you refuse, citing “rule 172) no customer can plug any electronics into your computer”, then you look almost like a blankface. If you let them plug the phone in, you run the risk of malware. If you understand the risk of malware, you could refuse because of that. But if you don’t understand that, the best you can do is follow rules that were written for some good reason, even if you don’t know what it was.
Beware using words off the probability distribution that generated them.
[META] Building a rationalist communication system to avoid censorship
On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.
This makes the hidden assumption that “resources” is a good abstraction in this scenario.
It is being assumed that the amount of resources an agent “has” is a well-defined quantity, that agents can only grow their resources slowly by reinvesting them, and that an agent can weather any sabotage attempts by agents with far fewer resources.
I think this assumption is blatantly untrue.
Companies can be sabotaged in all sorts of ways. Money or material resources can be subverted, so that while they are notionally in the control of X, they end up benefiting Y, or just stolen. Taking over the world might depend on being the first party to develop self replicating nanotech, which might require just insight and common lab equipment.
Don’t think “The US military has nukes, the AI doesn’t, so the US military has an advantage”, think “one carefully crafted message and the nukes will land where the AI wants them to, and the military commanders will think it their own idea.”
You should be deeply embarrassed if your model outputs an obviously wrong or obviously time-inconsistent answer even in a hypothetical situation.
Suppose you have a particle accelerator that goes up to half the speed of light. You notice an effect whereby faster particles become harder to accelerate.
You curve fit this effect and get two candidate formulas: $m(v)=\frac{m_0}{\sqrt{1-v^2/c^2}}$ and $m(v)=m_0\left(1+\frac{v^2}{2c^2}\right)$. Both fit the data, though the first one fits the data slightly better. However, when you test the first formula on the case of a particle travelling at twice the speed of light, you get back nonsensical imaginary numbers. Clearly the real formula must be the second one. (The real formula is actually the first one.)
A good model will often give a nonsensical answer when asked a nonsensical question, and nonsensical questions don’t always look nonsensical.
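Here is a quick sketch of the trap, using the two formulas above (the data, noise level and availability of scipy are my assumptions, just for illustration):

```python
# Fit both candidate formulas to data at v <= 0.5c, then ask about v = 2c.
# The true (relativistic) formula "misbehaves"; the wrong one stays tidy and finite.
import numpy as np
from scipy.optimize import curve_fit

c = 1.0
v = np.linspace(0.0, 0.5, 50)                        # speeds up to 0.5c, where we have data
m_true = 1.0 / np.sqrt(1 - (v / c) ** 2)             # "measured" effective mass
m_obs = m_true + np.random.normal(0, 1e-3, v.shape)  # small measurement noise

def relativistic(v, m0):
    return m0 / np.sqrt(1 - (v / c) ** 2)

def quadratic(v, m0):
    return m0 * (1 + v ** 2 / (2 * c ** 2))

(m0_rel,), _ = curve_fit(relativistic, v, m_obs, p0=[1.0])
(m0_quad,), _ = curve_fit(quadratic, v, m_obs, p0=[1.0])

# Both fit the v <= 0.5c data well.  Extrapolating to v = 2c:
print(relativistic(2.0, m0_rel))   # nan: a nonsensical answer to a nonsensical question
print(quadratic(2.0, m0_quad))     # a tidy, finite, wrong answer
```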
Speculations against GPT-n writing alignment papers
Potential Alignment mental tool: Keeping track of the types
I would like to propose a model that is more flattering to humans, and more similar to how other parts of human cognition work. When we see a simple textual mistake, like a repeated “the”, we don’t notice it by default. Human minds correct simple errors automatically without consciously noticing that they are doing it. We round to the nearest pattern.
I propose that this automatic pattern matching to the closest thing that makes sense is happening at a higher level too. When humans skim semi-contradictory text, they produce a more consistent world model that doesn’t quite match up with what is said.
Language feeds into a deeper, sensible world-model module within the human brain, and GPT-2 doesn’t really have a coherent world model.
OK, so maybe this is a cool new way to look at certain aspects of GPT ontology… but why this primordial ontological role for the penis? I imagine Freud would have something to say about this. Perhaps I’ll run a GPT-4 Freud simulacrum and find out (potentially) what.
My guess is that humans tend to use a lot of vague euphemisms when talking about sex and genitalia.
In a lot of contexts, “Are they doing it?” would refer to sex, because humans often prefer to keep some level of plausible deniability.
Which leaves some belief that vagueness implies sexual content.
AI that shouldn’t work, yet kind of does
Suppose Alex!20 reads about PlayPumps, and vows to give some money to them every month. Alex!30 learns that actually, this charity is doing harm (on net). If he went back in time and gave Alex!20 a short presentation, Alex!20 wouldn’t make the vow. Alex!20’s actual goal was to make the world a better place, and he thought PlayPumps did that. Making simple vows that bind your behaviour restricts your freedom to act on the best available evidence. The rational thing to do is to actively check that such actions make sense, based on the best available evidence. As soon as some evidence suggests a new charity may be more effective, say oops and switch.
I mean, I would say that
“Partly because mass is good on rational merits (the utility gained from meeting up with fellow humans, thinking about ethics, meditating through prayer, singing with the congregation).”
is questionable. It reads like the excuse of someone who never really said oops and decided they had made a mistake. I am sure that there are lots of clubs and knitting groups you could go to. I suspect that the rest of the activities are not helpful to actually getting ethics and rationality right. (It wouldn’t help a mathematician to sing songs about how “2+2=7” every week.) The human brain is incapable of listening to and singing about obvious nonsense every week without being somewhat influenced by it. And I suspect that influence may not be in a good direction.
Fiction: My alternate earth story.
Clickbait might not be destroying our general Intelligence
My thoughts on OpenAI’s Alignment plan
If you ask GPT-n to produce a design for a fusion reactor, all the prompts that talk about fusion are going to say that a working reactor hasn’t yet been built, or imitate cranks or works of fiction.
It seems unlikely that a text predictor could pick up enough info about fusion to be able to design a working reactor, without figuring out that humans haven’t made any fusion reactors that produce net power.
If you did somehow get a response, the level of safety you would get is the level a typical human (conditional on the prompt) would display. If some information is an obvious infohazard, such that no human capable of coming up with it would share it, then such data won’t be in GPT-n’s training dataset, and won’t be predicted. However, the process of conditioning might amplify tiny probabilities of human failure.
Suppose that any easy design of fusion reactor could be turned into a bomb, and ignore cranks and fiction. Then suppose 99% of people who invented a fusion reactor would realize this, and stay quiet. The other 1% would write an article that starts with “To make a fusion reactor …”. Then this prompt will cause GPT-n to generate the article that a human who didn’t notice the danger would come up with.
This also applies to dangers like leaking radiation, or just blowing up randomly if your materials weren’t pure enough.
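A back-of-envelope version of that amplification effect, using the 99%/1% split from the example above (the exact numbers are only illustrative):

```python
# Conditioning on "the article exists" selects exactly the failure case.
p_notices = 0.99                 # fraction of capable inventors who notice the bomb risk
p_writes_given_notices = 0.0     # they stay quiet
p_writes_given_oblivious = 1.0   # the oblivious 1% write it up

p_writes = (p_notices * p_writes_given_notices
            + (1 - p_notices) * p_writes_given_oblivious)   # 0.01

# Probability the author failed to notice the danger, given that the article exists:
p_oblivious_given_writes = (1 - p_notices) * p_writes_given_oblivious / p_writes
print(p_oblivious_given_writes)  # 1.0 -- a 1% base rate of failure becomes certainty under the prompt
```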
If the whole reason you didn’t want to open the window was the energy put in to heating/cooling the air, why not use a heat exchanger? I reckon it could be done using a desktop fan, a stack of thin aluminium plates, and a few pieces of cardboard or plastic to block air flow.
There is a subtlety here. Large updates from extremely unlikely to quite likely are common. Large updates from quite likely to exponentially sure are harder to come by. Let’s pick an extreme example: suppose a friend builds a coin tossing robot. The friend sends you a 1MB file, claiming it is the sequence of coin tosses. Your probability assigned to this particular sequence being the way the coin landed will jump straight from $2^{-8,000,000}$ to somewhere between 1% and 99% (depending on the friend’s level of trustworthiness and engineering skill). Note that the probability you assign to several other sequences increases too. For example, it’s not that unlikely that your friend accidentally put a “not” in their code, so your probability on the exact opposite sequence should also be $\gg 2^{-8,000,000}$. It’s not that unlikely that they turned the sequence backwards, or xored it with pi, or … Do you see the pattern? You are assigning high probability to the sequences with low conditional Kolmogorov complexity relative to the existing data.
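A toy version of that update (the posterior numbers are made up, only the shape matters): a handful of simple transforms of the reported sequence soak up almost all of the probability mass, each jumping up from $2^{-8,000,000}$.

```python
# Toy version of the update.  Before the friend's message every 8,000,000-bit
# sequence has prior 2**-8_000_000; afterwards, nearly all the mass sits on a few
# low-conditional-complexity transforms of the reported sequence.
# (The posterior numbers below are invented for illustration.)

posterior = {
    "sequence exactly as reported":             0.90,
    "a stray 'not': every bit flipped":         0.02,
    "sequence recorded backwards":              0.02,
    "xored with pi, or some other simple bug":  0.01,
    "anything else (split over ~2**8_000_000 sequences)": 0.05,
}

for hypothesis, p in posterior.items():
    print(f"{p:5.2f}  {hypothesis}")

# Each named sequence has jumped from 2**-8_000_000 to a macroscopic probability,
# but note that none of them gets anywhere near 1 - 2**-8_000_000.
```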
Now think about what it would take to get a probability of $1 - 2^{-8,000,000}$ on the coin landing that particular sequence. All sorts of wild and wacky hypotheses have probability $> 2^{-8,000,000}$: from the boring stuff like a dodgy component or other undetected bug, to more exotic hypotheses like aliens tampering with the coin tossing robot, or dark lords of the matrix directly controlling your optic nerve. You can’t get this level of certainty about anything ever. (Modulo concerns about what it means to assign p < 1 to probability theory itself.)
You can easily update from exponentially close to 0, but you can’t update to exponentially close to 1. This may have something to do with there being exponentially many very unlikely theories to start off with, but only a few likely ones.
If you have 3 theories that predict much the same observations, and all other theories predict something different, you can easily update to “probably one of these 3”. But you can’t tell those 3 apart. In AIXI, any Turing machine has a parade of slightly more complex, slightly less likely Turing machines trailing along behind it. The hypothesis “all the primes, and Graham’s number” is only slightly more complex than “all the primes”, and is very hard to rule out.
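Rough arithmetic for that last point, under a 2^-length simplicity prior (the bit counts below are invented for illustration):

```python
# Under an AIXI-style 2**-length prior, the penalty for a slightly more complex
# trailing hypothesis is only 2**-(extra bits).  Bit counts here are made up.

len_primes = 200                  # bits to encode "all the primes"
len_primes_plus_graham = 230      # bits to encode "all the primes, and Graham's number"

prior_ratio = 2.0 ** -(len_primes_plus_graham - len_primes)
print(prior_ratio)                # 2**-30, roughly 1e-9

# 2**-30 is small, but astronomically larger than 2**-8_000_000 -- so no ordinary
# amount of evidence about the primes pushes the trailing hypothesis anywhere
# near "exponentially close to 0", and you never get exponentially close to 1.
```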