China could definitely win a war with Taiwan now, but it might not go smoothly or cheaply. I imagine they want plans/preparations that meet a high standard for decisive victory. But if Taiwan formally declared independence, I think they would start a war immediately.
Joey KL
This is great! Might I suggest basis points for the numerator unit, since the quantities are so small. Then the unit can be “bips per billion“, with your reference quantity at 20bps/$B.
What’s an example of one account by which China currently thinks the rest of the world is low in quality?
What’s more, it may be that the competitive equilibrium of increasingly advanced AI never leads to a decisive strategic advantage over other countries trying to keep up. For example, countries with increasingly automated economies may still want to trade with less automated countries, which enriches those countries enough to stay somewhat relevant by automating at a delay. Or less automated economies might have a “catch up” advantage where they can copy the advancements of the most automated economies more easily than they could have developed them themselves. This is how I would describe the geopolitical effects of the industrial revolution: it was a major advantage for the countries first to industrialize, but not an overwhelming / decisive one.
This doesn’t seem right to me. Middle powers will only become irrelevant insofar as the ASI built by great powers isn’t aligned to the interests of middle powers. But it could be that it will be: it could be aligned to respect property rights, or human rights, sovereign rights, or values which the middle powers hold in common with the great powers; or it could be aligned to serve citizens / politicians of the great powers who are sympathetic to the middle power. And to the extent that the ASI won’t be by default, the middle powers can use their leverage to try to get it aligned with those things rather than to slow down / stop it entirely.
Have you read Ender’s Game?
It’s about a child prodigy being tricked into destroying an alien planet by having it presented to him as just a simulation.
Oh, I didn’t see your footnote. I didn’t know about the US Marshalls vs Marshall of the US Supreme Court distinction. That’s interesting and confusing!
The US Marshalls are charged with carrying out court orders but are actually part of the executive branch, which makes it even less plausible they would materially stand up to the president.
Huh, that seems totally wrong to me. This seems like about as straightforwardly a case of incorrigibility as I can imagine.
Yeah, I think would have just said “someone who works at an AI capabilities company”. (I’m not trying to make this a big deal or controversy, I just find myself very irritated by how they get attention by antagonizing AI safety people in ways that seem bad faith to me.)
Sure, I don’t think there’s much problem with engaging with the point on substance, but I would avoid publicizing their brand.
I think there should be some kind of discourse sanction on Mechanize (as in, broadly avoid engaging with their ideas and mentioning them) because I think they are intentionally operating in bad faith as a brand as a publicity stunt.
Finally, we’ve optimized the Long Horizon Software Development capability, from the famous METR graph “Don’t Optimize The Long Horizon Software Development Capability”
What I’m imagining is: we train AIs on a mix of environments that admit different levels of reward hacking. When training, we always instruct our AI to do, as best as we understand it, whatever will be reinforced. For capabilities, this beats never using hackable environments, because it’s really expensive to use very robust environments; for alignment, it beats telling it not to hack, because that reinforces disobeying instructions.
In the limit, this runs into problems where we have very limited information about what reward hacking opportunities are present in the training environments, so the only instruction we can be confident is consistent with the grader is “do whatever will receive a high score from the grader”, which will… underspecify… deployment behavior, to put it mildly.
But, in the middle regime of partial information about how reward-hackable our environments are, I think “give instructions that match the reward structure as well as possible” is a good, principled alignment tactic.
Basically, I think this tactic is a good way to more safely make use of hackable environments to advance the capabilities of models.
Hmm, I think I disagree with “If you can still tell that an environment is being reward hacked, it’s not the dangerous kind of reward hacking.” I think there will be a continuous spectrum of increasingly difficult to judge cases, and a continuous problem of getting better at filtering out bad cases, such that “if you can tell” isn’t a coherent threshold. I’d rather talk about “getting better at distinguishing” reward hacking.
I think we just have different implicit baselines here. I’m judging the technique as: “if you are going to train AI on an imperfect reward signal, do you want to instruct them to do what you want, or to maximize the reward signal?” and I think you clearly want the later for simple, elegant reasons. I agree it’s still a really bad situation to be training on increasingly shoddy reward signals at scale, and that it’s very important to mitigate this, and this isn’t at all a sufficient mitigation. I just think it’s a principled mitigation.
update: I flew Delta today and the wifi wasn’t very good. I think I misremembered it being better than it is.
I disagree entirely. I don’t think it’s janky or ad-hoc at all. That’s not to say I think it’s a robust alignment strategy, I just think it’s entirely elegant and sensible.
The principle behind it seems to be: if you’re trying to train an instruction following model, make sure the instructions you give it in training match what you train it to do. What is janky or ad hoc about that?
Huh, I have pretty good experiences using wifi flying Delta?
I disagree with “don’t believe what you can’t explain”. I think being successful where others have failed often requires executing on intuitions that you can’t easily justify. I think this should be encouraged, as long as you’re adequately internalizing the risk of failure. (In the sense of economic internalize, not psychological internalize.)
I’m somewhat skeptical of the importance of ultra high CRI. I don’t doubt that 80 vs 95 makes a difference, but I’m not sold on 95 vs 98, since it seems susceptible to placebo and is confounded in the wild by other lighting design factors. What convinced you?