I operate by Crocker’s rules.
If you try this, reformat to work around the BPE problem as detailed in https://www.gwern.net/GPT-3#bpes
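My reading of the fix described there (the helper name is mine, and spacing characters out is only one of the reformattings that page suggests): give the model roughly one character per token, so BPE can’t merge digits or letters into opaque chunks.

```python
def space_out(text: str) -> str:
    """Insert spaces between characters so the BPE tokenizer falls back to
    (roughly) one token per character instead of merging them opaquely."""
    return " ".join(text)

print(space_out("235+148="))  # -> "2 3 5 + 1 4 8 ="
```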
making it easy to catch abuse/vandalism of the system
This suggests that even the admins don’t know who upvoted which post. Do they?
I let it pass even though its answer was not well formed because it mentioned both the show and the type of store, so I judged that it saw all the relevant connections. I suppose you’re used to better form from it.
Feel free to be rude to me, I operate by Crocker’s rules :)
...it didn’t fail abysmally? Am I being silly? It correctly explains the first two puns and fails on the third.
This roughly tracks what’s going on in our real beliefs, and why it seems absurd to us to infer that the world is a dream of a rational agent—why think that the agent will assign higher probability to the real world than the “right” prior? (The simulation argument is actually quite subtle, but I think that after all the dust clears this intuition is basically right.)
To the extent that we instinctively believe or disbelieve this, it’s not for the right reasons—natural selection didn’t have any evidence to go on. At most, that instinct is a useful workaround for the existential dread glitch.
Assume that there is a real prior (I like to call this programming language Celestial), and that it can be found from first principles, given an example universe to work with. Then I wouldn’t be surprised if we receive more weight indirectly than directly. After all:
Our laws of physics may be simple, but us seeing a night sky devoid of aliens suggests that it takes quite a few bits to locate us in time and space and improbability.
An anthropic bias would circumvent this, and agents living in the multiverse would be incentivized to implement it: The universes thereby promoted are particularly likely to themselves simulate the multiverse and act on what they see, and those are the only universes vulnerable to the agent’s attack.
Our universe may be particularly suited to simulate the multiverse in vulnerable ways, because of our quantum computers. All it takes is that we run a superposition of all programs, rely on a mathematical heuristic that tells us that almost all of the amplitudes cancel out, and get tricked by the agent employing the sort of paradox of self-reference that mathematical heuristics tend to be wrong on.
If the quirks of chaos theory don’t force the agent to simulate all of our universe to simulate any of it, then at least the only ones of us who have to worry about being simulated in detail in preparation for an attack are AI/AI safety researchers :P.
Surely the adversary convinces it this is a pig by convincing it that it has fur and no wings? I don’t have experience in how it works on the inside, but if the adversary can magically intervene on each neuron, changing its output by d at a cost of d² effort, then the proper strategy is to intervene on many features a little. Then, if there are many layers, the penultimate layer, which contains such high-level concepts as fur or wings, would be almost as fooled as the output layer, and indeed I would expect the adversary to have more trouble fooling it on such low-level features as edges and dots.
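To spell the arithmetic out (assuming the quadratic effort model above, and that the k intervened-on neurons contribute roughly additively downstream): to shift the output by a total of Δ, each of k neurons only needs to move by Δ/k, so

```latex
\text{effort}(k) = k \cdot \left(\frac{\Delta}{k}\right)^{2} = \frac{\Delta^{2}}{k},
```

which falls as k grows, so spreading the attack over many features is cheaper than concentrating it on one.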
Why do you think adversarial examples seem to behave this way? The pig equation seems equally compatible with fur or no fur recognized, wings or no wings. Indeed, it plausibly thinks the pig an airliner because it sees wings and no fur.
An instrumentally corrigible agent lets you correct it because it expects you know better than it. The smarter it becomes, the less your higher competence is worth, and the more it loses out by letting you take the wheel while you’re not perfectly aligned with it.
Presumably, you are asking because you want to calculate the worst-case disutility of the universe, in order to decide whether making sure that it doesn’t come about is more important than pretty much anything else.
I would say that this question cannot be properly answered through physical examination, because the meaning of such human words as suffering becomes too fuzzy in edge cases.
The proper approach to deciding on actions in the face of uncertainty about the utility function is utility aggregation. The only way I’ve found that doesn’t run into Pascal’s Wager problems, and the way humans seem to use naturally, is to normalize each utility function before combining them.
So let’s say we are 50/50 uncertain between two utility functions: on one, no state of existence is worse than nonexistence; on the other, we should cast aside all other concerns to avert hell. Then, after normalization and combination, the exact details will depend on which method of aggregation we use (which should in turn depend on how we turn utility functions into decisions), but as far as I can see the combined utility function would tell us to exert quite an effort to avert hell while still caring about other concerns.
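A minimal sketch of the normalize-then-combine step (the outcome labels and numbers are purely illustrative, not a model of the actual 50/50 case):

```python
def normalize(utility):
    """Rescale a utility function so its worst outcome maps to 0 and its best to 1,
    making different candidate utilities comparable before they are mixed."""
    lo, hi = min(utility.values()), max(utility.values())
    return {o: (v - lo) / (hi - lo) for o, v in utility.items()}

def aggregate(utilities, credences):
    """Credence-weighted sum of the normalized utility functions."""
    normalized = [normalize(u) for u in utilities]
    return {o: sum(c * n[o] for c, n in zip(credences, normalized))
            for o in normalized[0]}

# Toy outcomes and toy numbers, purely illustrative:
u_no_hell      = {"avert hell": -1, "ordinary life": 0, "hell": 0}
u_hell_matters = {"avert hell": -1, "ordinary life": 0, "hell": -10**9}
print(aggregate([u_no_hell, u_hell_matters], [0.5, 0.5]))
```

The point of the rescaling is that the astronomically bad “hell” entry can no longer swamp every other consideration, which is how the Pascal’s Wager failure mode gets avoided.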
I expect GPT-2 can do that. *goes to talktotransformer.com* GPT-2 can do neither scrambling nor unscrambling. Oh well. I still expect that if GPT can do unscrambling (as I silently assumed), it can do scrambling.
You can, actually. ln(5 cm) = ln(5) + ln(cm), and since we only ever take differences of distances, the ln(cm) term cancels out. The same way, ln(-5) = ln(5) + ln(-1), and ln(-1) happens to be pi*i, since e^(pi*i) is −1.
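Written out with a concrete pair of distances (my own illustration of the cancellation):

```latex
\ln(10\,\mathrm{cm}) - \ln(5\,\mathrm{cm}) = \ln(10) + \ln(\mathrm{cm}) - \ln(5) - \ln(\mathrm{cm}) = \ln 2 \\
\ln(-5) = \ln(5) + \ln(-1) = \ln(5) + \pi i, \qquad \text{since } e^{\pi i} = -1
```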
In that thought experiment, Euclidean distance doesn’t work because different dimensions have different units. To fix that, you could move to the log scale. Or is the transformation actually more complicated than multiplication?
Darn it, missed that comment. But how does Euclidean distance fail? I’m imagining the dimensions as the weights of a neural net, and e-coli optimization being used because we don’t have access to a gradient. The common metric I see that would have worse high-dimensional behavior is Manhattan distance. Is it that neighborhoods of low Manhattan distance tend to have more predictable/homogeneous behavior than those of low Euclidean distance?
If instead of going one step in one of n directions, we go sqrt(1/n) forward or backward in each of the n directions (for a total step size of 1), we need an expected two tries to get sqrt(1/n) progress, for a total effort factor of O(1/sqrt(n)). (O is the technical term for ~ ^^)
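A quick numerical check of that scaling (the setup, with one fixed axis standing in for the useful direction and a step accepted only when it moves forward along it, is just my toy version of the scheme):

```python
import numpy as np

def tries_and_progress(n, samples=20000, seed=0):
    """Step sqrt(1/n) forward or backward in each of n directions (total step size 1);
    a try is accepted when it makes progress along one fixed 'useful' axis."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1.0, 1.0], size=(samples, n)) / np.sqrt(n)
    along_useful_axis = steps[:, 0]
    accepted = along_useful_axis > 0
    return 1 / accepted.mean(), along_useful_axis[accepted].mean()

for n in (10, 100, 1000):
    tries, gain = tries_and_progress(n)
    print(f"n={n}: ~{tries:.2f} tries per accepted step, "
          f"progress {gain:.3f} vs sqrt(1/n) = {1/np.sqrt(n):.3f}")
```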
I’d like to see them using the model to generate the problem framing which produces the highest score on a given task.
Even if it’s just the natural language description of addition that comes before the addition task, it’d be interesting to see how it thinks addition should be explained. Does some latent space of sentences one could use for this fall out of the model for free?
More generally, a framing is a function turning data like [(2,5,7), (1,4,5), (1,2,_)] into text like “Add. 2+5=7, 1+4=5, 1+2=”, and what we want is a latent space over framings.
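A toy version of such a framing function (the name and the None-for-blank convention are mine):

```python
def addition_framing(examples):
    """Turn tuples like (2, 5, 7) into few-shot text; a trailing (a, b, None)
    becomes the open-ended prompt the model is asked to complete."""
    parts = [f"{a}+{b}={'' if c is None else c}" for a, b, c in examples]
    return "Add. " + ", ".join(parts)

print(addition_framing([(2, 5, 7), (1, 4, 5), (1, 2, None)]))
# -> "Add. 2+5=7, 1+4=5, 1+2="
```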
More generally, I expect that getting the full power of the model requires algorithms that apply the model multiple times. For example, what happens if you run the grammar correction task multiple times on the same text? Will it fix errors it missed the first time on the second try? If so, the real definition of framing should allow multiple applications like this. It would look like a neural net whose neurons manipulate text data instead of number data. Since it doesn’t use weights, we can’t train it, and instead we have to use a latent space over possible nets.
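For the repeated-application case, a sketch of what I have in mind, with a hypothetical text-to-text `model` callable and a made-up prompt format:

```python
def iterate_correction(model, text, max_passes=5):
    """Apply a (hypothetical) grammar-correction model repeatedly,
    stopping once a pass no longer changes the text."""
    for _ in range(max_passes):
        corrected = model(f"Correct the grammar: {text}\nCorrected:")
        if corrected == text:
            break
        text = corrected
    return text
```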
Note that greaterwrong.com already has this. (The eye icon on the bottom right.)
removing 30 neurons at random from the network barely moves the accuracy at all
I expect that after distillation, this robustness goes away? (“Perfection is achieved when there is nothing left to take away.”)
If, as far as he knew, winds are random, shouldn’t he still have turned around after half his supplies were gone, in case the winds randomly decide to starve him?
Expand? I don’t see how both could be disadvantaged by allocation-before-optimization.