Yeah, that’s my view.
To the extent that we instinctively believe or disbelieve this, it’s not for the right reasons—natural selection didn’t have any evidence to go on. At most, that instinct is a useful workaround for the existential dread glitch.
To the extent that we believe this correctly, it’s for the same reasons that we are able to do math and philosophy correctly (or at least more correctly than chance :) despite natural selection not caring about it much. It’s the same reason that you can correctly make arguments like the one in your comment.
I think that under the counting measure, the vast majority of people like us are in simulations (ignoring subtleties with infinities that make that statement meaningless).
I think that under a more realistic measure, it’s unclear whether or not most people like us are in simulations.
Those statements are unrelated to what I was getting at in the post though, which is more like: the simulation argument rests on us being the kind of people who are likely to be simulated. We don’t think that everyone should believe they are in a simulation just because simulators are more likely to simulate realistic-looking worlds than reality is to produce realistic-looking worlds; that inference seems absurd.
The whole thing is kind of a complicated mess and I wanted to skip it by brushing aside the simulation argument. Maybe I should have just not mentioned it at all, given that the simulation argument makes such a mess of it. I don’t expect to be able to get clarity in this thread either :)
I think the reason why the hypothesis that the world is a dream seems absurd has very little to do with likelihood ratios and everything to do with heuristics like “don’t trust things that sound like what a crazy person, drug-addled person, or mystic would say.”
It’s not the hypothesis that’s absurd, it’s this particular argument.
That’s right—you still only get a bound on average quality, and you need to do something to cope with failures so rare they never appear in training (here’s a post reviewing my best guesses).
But before, you weren’t even in the game: it wouldn’t have mattered how well adversarial training worked, because you didn’t even have the knowledge to tell whether a given behavior was good or bad. You weren’t even getting the right behavior on average.
(In the OP I think the claim “the generalization is now coming entirely from human beliefs” is fine; I meant generalization from one distribution to another. “Neural nets are fine” was sweeping these issues under the rug. Though note that in the real world the distribution will change between neural net training and deployment; that’s just exactly the normal robustness problem. The point of this post is just to get it down to only a robustness problem, one you could solve with some kind of generalization of adversarial training; the reason to set it up as in the OP was to make the issue clearer.)
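(For concreteness, here is a minimal sketch of the kind of adversarial training loop I mean; every component below is a toy stand-in, not a real implementation. Note that the loop only fixes failures the adversary can actually find, which is why the rare-failure problem above remains.)

```python
# Toy sketch of adversarial training (all components are hypothetical stand-ins).
# The "model" should output 1 exactly on multiples of 7; the adversary searches
# for inputs where it fails, and we train on each counterexample found.
import random

def judge(x):
    """Stand-in for the human/overseer: the intended behavior."""
    return 1 if x % 7 == 0 else 0

def find_failure(model, tries=200):
    """Stand-in adversary: random search for an input the model gets wrong."""
    for _ in range(tries):
        x = random.randrange(10_000)
        if model.get(x, 0) != judge(x):
            return x
    return None  # no failure found within this adversary's budget

model = {}  # toy "model": a lookup table patched as failures are found
for _ in range(1_000):
    x = find_failure(model)
    if x is None:
        break  # the adversary can no longer find failures within its reach
    model[x] = judge(x)  # "train" on the counterexample

# Failures outside the adversary's reach survive training:
print(model.get(14_000_000, 0) == judge(14_000_000))  # False
```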
So even when you talk about amplifying f, you mean a certain way of extending human predictions to more complicated background information (e.g. via breaking down Z into chunks and then using copies of f that have been trained on smaller Z), not fine-tuning f to make better predictions.
That’s right, f is either imitating a human, or it’s trained by iterated amplification / debate—in any case the loss function is defined by the human. In no case is f optimized to make good predictions about the underlying data.
My impression is that your hope is that if Z and f start out human-like, then this is like specifying the “programming language” of a universal prior, so that a search for highly-predictive Z, decoded through f, will give something that uses human concepts in predicting the world.
Z should always be a human-readable (or amplified-human-readable) latent; it will necessarily remain human-readable because it has no purpose other than to help a human make predictions. f is going to remain human-like because it’s predicting what the human would say (or what the human-consulting-f would say etc.).
The amplified human is like the programming language of the universal prior, Z is like the program that is chosen (or slightly more precisely: Z is like a distribution over programs, described in a human-comprehensible way) and f is an efficient distillation of the intractable ideal.
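(Roughly, and in my notation rather than anything official: if Prior_H(Z) is the amplified human’s judgment of how plausible the background text Z is, and P_H(y|x, Z) is their prediction given Z, the analogy is to searching for

$$Z^* = \operatorname{arg\,max}_Z \left[ \log \mathrm{Prior}_H(Z) + \sum_i \log P_H(y_i \mid x_i, Z) \right],$$

with f trained as a fast distillation of P_H.)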
I’m not totally sure what actually distinguishes f and Z, especially once you start jointly optimizing them. If f incorporates background knowledge about the world, it can do better at prediction tasks. Normally we imagine f having many more parameters than Z, and so being more likely to squirrel away extra facts, but if Z is large then we might imagine it containing computationally interesting artifacts like patterns that are designed to train a trainable f on background knowledge in a way that doesn’t look much like human-written text.
f is just predicting P(y|x, Z), it’s not trying to model D. So you don’t gain anything by putting facts about the data distribution in f—you have to put them in Z so that it changes P(y|x,Z).
Now, maybe you can try to ensure that Z is at least somewhat textlike via making sure it’s not too easy for a discriminator to tell from human text, or requiring it to play some functional role in a pure text generator, or whatever.
The only thing Z does is get handed to the human for computing P(y|x,Z).
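(A runnable toy illustration of that information flow; all names here are hypothetical. The point is that the only way facts about the data can improve predictions is by appearing in Z, because f’s training signal is the human rather than the data.)

```python
# Toy version (hypothetical names): Z is human-readable text, and f only
# imitates the human's use of Z; it is never fit to the data directly.
from math import log

def human_predict(x: str, Z: str) -> float:
    """Stand-in for the (amplified) human: P(y=1 | x, Z)."""
    return 0.9 if f"{x} is common" in Z else 0.1

def f(x: str, Z: str) -> float:
    """The distilled model, trained so f(x, Z) ~= human_predict(x, Z).
    Here it just calls the human directly, to show the interface."""
    return human_predict(x, Z)

def score_Z(Z: str, data) -> float:
    """Z is scored by how well the human-given-Z predicts the data, so
    facts about the data distribution have to live in Z to help at all."""
    return sum(log(f(x, Z)) if y == 1 else log(1 - f(x, Z)) for x, y in data)

data = [("rain", 1), ("snow", 0)]
print(score_Z("rain is common", data))  # higher (better) log score
print(score_Z("snow is common", data))  # lower log score
```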
The difference is that you can draw as many samples as you want from D* and they are all iid. Neural nets are fine in that regime.
It seems even worse than any of that. If your AI wanted anything at all it might debate well in order to survive. So if you are banking on it single-mindedly wanting to win the debate, then you are already in deep trouble.
I don’t understand why the second wave can’t be explained by an increase in testing. Before, only people who were sick were allowed to be tested; those people correlate more with hospital visits, which correlate with deaths, so the case graph more closely follows the death graph.
US positive test rate is up from 4.4% to 7.4%: https://coronavirus.jhu.edu/testing/individual-states
It used to be the case that 4.4% of people you tested had COVID-19.
Now you test more people, who look less risky on average, and find that 7.4% of people you test have COVID-19. The people you would have tested in the old days are the riskiest subgroup, so more than 7.4% of them have COVID-19.
So it sure seems like the infection rate went up by at least 7.4/4.4 ≈ 1.7x, i.e. roughly +70%.
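In symbols: if p_old is the positivity rate among the people who would have been tested under the old criteria, then

$$p_{\text{old, now}} > 7.4\% \quad\Rightarrow\quad \frac{p_{\text{old, now}}}{p_{\text{old, before}}} > \frac{7.4}{4.4} \approx 1.68,$$

i.e. at least a ~70% rise in infections within that same (riskiest) group.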
My impression is that most individual investors and pension funds put a significant part of their portfolio into bonds.
I’d love to get evidence on that and it seems important.
Your position doesn’t sound right to me. You don’t need many people changing their allocations moderately to totally swamp a 1% change in inflows.
My guess would be that more than 10% of investors, weighted by total equity holdings, adjust their exposure deliberately, but I’d love to know the real numbers.
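To put illustrative numbers on the earlier point (mine, just for scale): if investors holding 20% of all wealth each shifted their equity allocation down by 5 percentage points, that alone would move 0.2 × 0.05 = 1% of total wealth out of equities, i.e. as much as the entire ~1% change in inflows.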
Do you think something like IDA is the only plausible approach to alignment? If so, I hadn’t realized that, and I’d be curious to hear more arguments (or just intuitions). The aligned overseer you describe is supposed to make treachery impossible by recognizing it, so it seems your concern is equivalent to the concern: “any agent (we make) that learns to act will be treacherous if treachery is possible.” Are all learning agents fundamentally out to get you? I suppose that’s a live possibility to me, but it seems to me there is a possibility we could design an agent that is not inclined to treachery, even if the treachery wouldn’t be recognized.
No, but what are the approaches to avoiding deceptive alignment that don’t go through competitiveness?
I guess the obvious one is “don’t use ML,” and I agree that doesn’t require competitiveness.
Edit: even so, having two internal components that are competitive with each other (e.g. overseer and overseee) does not require competitiveness with other projects.
No, but now we are starting to play the game of throttling the overseee (to avoid it overpowering the overseer) and it’s not clear how this is going to work and be stable. It currently seems like the only appealing approach to getting stability there is to ensure the overseer is competitive.
This argument seems to prove too much. Are you saying that if society has learned how to do artificial induction at a superhuman level, then by the time we give a safe planner that induction subroutine, someone will have already given that induction routine to an unsafe planner? If so, what hope is there as prediction algorithms relentlessly improve? In my view, the whole point of AGI Safety research is to try to come up with ways to use powerful-enough-to-kill-you artificial induction in a way that it doesn’t kill you (and helps you achieve your other goals). But it seems you’re saying that there is a certain level of ingenuity where malicious agents will probably act with that level of ingenuity before benign agents do.
I’m saying that if you can’t protect yourself from an AI in your lab, under conditions that you carefully control, you probably couldn’t protect yourself from AI systems out there in the world.
The hope is that you can protect yourself from an AI in your lab.
So competitiveness still matters somewhat, but here’s a potential disagreement we might have: I think we will probably have at least a few months, and maybe more than a year, where the top one or two teams have AGI (powerful enough to kill everyone if let loose), and nobody else has anything more valuable than an Amazon Mechanical Turk worker.
Definitely a disagreement. I think that before anyone has an AGI that could beat humans in a fistfight, tons of people will have systems much, much more valuable than a Mechanical Turk worker.
The way I map these concepts, this feels like an elision to me. I understand what you’re saying, but I would like to have a term for “this AI isn’t trying to kill me”, and I think “safe” is a good one. That’s the relevant sense of “safe” when I say “if it’s safe, we can try it out and tinker”. So maybe we can recruit another word to describe an AI that is both safe itself and able to protect us from other agents.
I mean that we don’t have any process that looks like debate that could produce an agent that wasn’t trying to kill you without being competitive, because debate relies on using aligned agents to guide the training process (and if they aren’t competitive then the agent-being-trained will, at least in the limit, converge to an equilibrium where it kills you).
The main reason I’m personally confused is that 2 months ago I thought there was real uncertainty about whether we’d be able to keep the pandemic under control. Over the last 2 months that uncertainty has gradually been resolved in the negative, without much positive news about people’s willingness to throw in the towel rather than continuing to panic and do lockdowns, and yet over that period SPY has continued moving up.
I’m making no attempt at all to estimate prices based on fundamentals and I’m honestly not even sure how that exercise is supposed to work. Interest rates are very low and volatility isn’t that high so it seems like you would have extremely high equity prices if e.g. most investors were rational with plausible utility functions. But equity prices are never nearly as high as that kind of analysis would suggest.
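(One standard back-of-the-envelope of this kind, as a rough sketch: with the textbook constant-growth formula, a claim on dividends D growing at rate g, discounted at required return r, is worth

$$P = \frac{D}{r - g},$$

so as rational investors’ required return r falls toward plausible growth rates g, the implied price blows up, and actual prices never get anywhere near that.)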
I think people’s annual income is on average <20% of their net worth ($100T vs $20T), maybe more like 15%.
So 2 months of +20% savings amounts to <1% increase in total savings, right?
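Checking that with the numbers above (reading “+20%” as saving an extra 20% of income): $20T × (2/12) × 20% ≈ $0.67T, which is about 0.7% of $100T of net worth, so under 1%.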
If that’s right, this doesn’t seem very important relative to small changes in people’s average allocation between equities/debt/currency, which fluctuates by tens of percent over the normal business cycle.
Normally I expect higher savings rates to represent concern about having money in the future, which will be accompanied by a move to safer assets. And of course volatility is currently way up, so rational investors probably couldn’t afford to invest nearly as much in stocks unless they were being compensated with significantly higher returns (which should involve prices only returning to normal levels as volatility falls).
So what if AI Debate survives this concern? That is, suppose we can reliably find a horizon-length for which running AI Debate is not existentially dangerous. One worry I’ve heard raised is that human judges will be unable to effectively judge arguments way above their level. My reaction to this is that I don’t know, but it’s not an existential failure mode, so we could try it out and tinker with evaluation protocols until it works, or until we give up. If we can run AI Debate without incurring an existential risk, I don’t see why it’s important to resolve questions like this in advance.
There are two reasons to worry about this:

1. The purpose of research now is to understand the landscape of plausible alignment approaches, and from that perspective viability is as important as safety.

2. I think it is unlikely for a scheme like debate to be safe without being approximately competitive—the goal is to get honest answers which are competitive with a potential malicious agent, and then to use those answers to ensure that the malicious agent can’t cause trouble and that the overall system is stable under malicious perturbations. If your honest answers aren’t competitive, then you can’t do that, and your situation isn’t qualitatively different from a human trying to directly supervise a much smarter AI.
In practice I doubt the second consideration matters—if your AI could easily kill you in order to win a debate, probably someone else’s AI has already killed you to take your money (and long before that your society totally fell apart). That is, safety separate from competitiveness mostly matters in scenarios where you have very large leads / very rapid takeoffs.
Even if you were the only AI project on earth, I think competitiveness is the main thing responsible for internal regulation and stability. For example, it seems to me you need competitiveness for any of the plausible approaches for avoiding deceptive alignment (since they require having an aligned overseer who can understand what a treacherous agent is doing). More generally, trying to maintain a totally sanitized internal environment seems a lot harder than trying to maintain a competitive internal environment where misaligned agents won’t be at a competitive advantage.