I will be hosting a readthrough of this sequence on MIRIxDiscord again, PM for a link.
Diffractor
So, here’s some considerations (not an actual policy)
It’s instructive to look at the case of nuclear weapons, and the key analogies or disanalogies to math work. For nuclear weapons, the basic theory is pretty simple and building the hardware is the hard part, while for AI, the situation seems reversed. The hard part there is knowing what to do in the first place, not scrounging up the hardware to do it.
First, a chunk from Wikipedia:
Most of the current ideas of the Teller–Ulam design came into public awareness after the DOE attempted to censor a magazine article by U.S. anti-weapons activist Howard Morland in 1979 on the “secret of the hydrogen bomb”. In 1978, Morland had decided that discovering and exposing this “last remaining secret” would focus attention onto the arms race and allow citizens to feel empowered to question official statements on the importance of nuclear weapons and nuclear secrecy. Most of Morland’s ideas about how the weapon worked were compiled from highly accessible sources—the drawings which most inspired his approach came from the Encyclopedia Americana. Morland also interviewed (often informally) many former Los Alamos scientists (including Teller and Ulam, though neither gave him any useful information), and used a variety of interpersonal strategies to encourage informational responses from them (i.e., asking questions such as “Do they still use sparkplugs?” even if he wasn’t aware what the latter term specifically referred to)....
When an early draft of the article, to be published in The Progressive magazine, was sent to the DOE after falling into the hands of a professor who was opposed to Morland’s goal, the DOE requested that the article not be published, and pressed for a temporary injunction. After a short court hearing in which the DOE argued that Morland’s information was (1). likely derived from classified sources, (2). if not derived from classified sources, itself counted as “secret” information under the “born secret” clause of the 1954 Atomic Energy Act, and (3). dangerous and would encourage nuclear proliferation...Through a variety of more complicated circumstances, the DOE case began to wane, as it became clear that some of the data they were attempting to claim as “secret” had been published in a students’ encyclopedia a few years earlier....
Because the DOE sought to censor Morland’s work—one of the few times they violated their usual approach of not acknowledging “secret” material which had been released—it is interpreted as being at least partially correct, though to what degree it lacks information or has incorrect information is not known with any great confidence.
So, broad takeaways from this: The Streisand effect is real. A huge part of keeping something secret is just having nobody suspect that there is a secret there to find. This is much trickier for nuclear weapons, which are of high interest to the state, while it’s more doable for AI stuff (and I don’t know how biosecurity has managed to stay so low-profile). This doesn’t mean you can just wander around giving the rough sketch of the insight; in math, it’s not too hard to reinvent things once you know what you’re looking for. But AI math does have a huge advantage here: it’s a really broad field and hard to search through (I think my roommate said that so many papers get submitted to NeurIPS that you couldn’t read through them all in time for the next NeurIPS conference), and, in order to reinvent something from scratch without having the fundamental insight, you need to be pointed in the exact right direction, and even then you’ve got a good shot at missing it (see: the time-lag between the earliest neural net papers and the development of backpropagation, or, in the process of making the Infra-Bayes post, stumbling across concepts that could have been found months earlier if some time-traveler had said the right three sentences at the time).
Also, secrets can get out through really dumb channels. Putting important parts of the H-bomb structure in a student’s encyclopedia? Why would you do that? Well, probably because there’s a lot of people in the government and people in different parts have different memories of which stuff is secret and which stuff isn’t.
So, due to AI work being insight/math-based, security would be based a lot more on just… not telling people things. Or even alluding to them. Although, there is an interesting possibility raised by the presence of so much other work in the field. For nuclear weapons work, things seem to be either secret or well-known among those interested in nuclear weapons. But AI has a big intermediate range between “secret” and “well-known”. See all those arXiv papers with like, 5 citations. So, for something that’s kinda iffy (not serious enough to apply full secrecy, given the costs of the slowdown in research that full secrecy brings, but not benign enough to be comfortable giving a big presentation at NeurIPS about it), it might be possible to intentionally target that range. I don’t think it’s a binary between “full secret” and “full publish”; there are probably intermediate options available.
Of course, if it’s known that an organization is trying to fly under the radar with a result, you get the Streisand effect in full force. But, just as well-known authors may have pseudonyms, it’s probably possible to just publish a paper on arXiv (or something similar) under a pseudonym and not have it referenced anywhere by the organization as an official piece of research they funded. And it would be available for viewing and discussion and collaborative work in that form, while also (with high probability) remaining pretty low-profile.
Anyways, I’m gonna set a 10-minute timer to have thoughts about the guidelines:
Ok, the first thought I’m having is that this is probably a case where Inside View is just strictly better than Outside View. Making a policy ahead of time that can just be followed requires whoever came up with the policy to have a good classification, in advance, of all the relevant categories of result and what to do with them, and that seems pretty dang hard to do, especially because novel insights, almost by definition, are not something you expected to see ahead of time.
The next thought is that working something out for a while and then going “oh, this is roughly adjacent to something I wouldn’t want to publish, when developed further” isn’t quite as strong of an argument for secrecy as it looks like, because, as previously mentioned, even fairly basic additional insights (in retrospect) are pretty dang tricky to find ahead of time if you don’t know what you’re looking for. Roughly, the odds of someone finding the thing you want to hide scale with the number of people actively working on it, so that case seems to weigh in favor of publishing the result, but not actively publicizing it to the point where you can’t befriend everyone else working on it. If one of the papers published by an organization could be built on to develop a serious result… well, you’d still have the problem of not knowing which paper it is, or what unremarked-on direction to go in to develop the result, if it was published as normal and not flagged as anything special. But if the paper got a whole bunch of publicity, the odds go up that someone puts the pieces together spontaneously. And, if you know everyone working on the paper, you’ve got a saving throw if someone runs across the thing.
There is a very strong argument for talking to several other people if you’re unsure whether it’d be good to publish/publicize, because it reduces the problem of “person with laxest safety standards publicizes” to “organization with the laxest safety standards publicizes”. This isn’t a full solution, because there’s still a coordination problem at the organization level, and it gives incentives for organizations to be really defensive about sharing their stuff, including safety-relevant stuff. Further work on the inter-organization level of “secrecy standards” is very much needed. But within an organization, “have personal conversation with senior personnel” sounds like the obvious thing to do.
So, current thoughts: There’s some intermediate options available instead of just “full secret” or “full publish” (publish under pseudonym and don’t list it as research, publish as normal but don’t make efforts to advertise it broadly) and I haven’t seen anyone mention that, and they seem preferable for results that would benefit from more eyes on them, that could also be developed in bad directions. I’d be skeptical of attempts to make a comprehensive policy ahead of time, this seems like a case where inside view on the details of the result would outperform an ahead-of-time policy. But, one essential aspect that would be critical on a policy level is “talk it out with a few senior people first to make the decision, instead of going straight for personal judgement”, as that tamps down on the coordination problem considerably.
There’s a difference between “consistency” (it is impossible to derive both X and notX for any sentence X; testing this requires a halting oracle, because there are always more proof paths to check) and “propositional consistency”, which merely requires that there are no contradictions discoverable by boolean algebra alone. So A∧B is propositionally inconsistent with notA, and propositionally consistent with A. If there’s some clever way to prove that B implies notA, it wouldn’t affect the propositional consistency of A∧B with A at all. Propositional consistency of a set of sentences can be verified in exponential time.
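To make the exponential-time claim concrete, here is a minimal sketch of a brute-force propositional consistency check. The tuple encoding of formulas and all function names are made up for this illustration; the only real idea is "treat every non-boolean sentence as an opaque atom and try all truth assignments".

```python
from itertools import product

# Hypothetical formula encoding for this sketch:
#   "A"                                    an atom (any sentence, treated as opaque)
#   ("not", f)
#   ("and", f, g), ("or", f, g), ("implies", f, g)

def atoms(formula):
    """Collect the names of all atoms appearing in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(sub) for sub in formula[1:]))

def evaluate(formula, assignment):
    """Evaluate a formula under a dict mapping atom names to booleans."""
    if isinstance(formula, str):
        return assignment[formula]
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], assignment)
    left = evaluate(formula[1], assignment)
    right = evaluate(formula[2], assignment)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "implies":
        return (not left) or right
    raise ValueError(f"unknown connective: {op}")

def propositionally_consistent(sentences):
    """True if some truth assignment to the atoms satisfies every sentence.

    Tries all 2^n assignments over n atoms, hence exponential time.
    """
    names = sorted(set().union(*(atoms(s) for s in sentences)))
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(evaluate(s, assignment) for s in sentences):
            return True
    return False

a_and_b = ("and", "A", "B")
print(propositionally_consistent([a_and_b, ("not", "A")]))  # False
print(propositionally_consistent([a_and_b, "A"]))           # True
```

Note that a clever proof of "B implies notA" changes nothing here unless that implication is itself added to the set as a sentence, at which point boolean algebra alone finds the contradiction.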
I found a paper about this exact sort of thing. Escardó and Oliva call that type signature, (X → R) → X, a “selection functional”, and the type signature (X → R) → R is called a “quantification functional”, and there are several interesting things you can do with them, like combining multiple selection functionals into one in a way that looks reminiscent of game theory (i.e., if ε has type signature (X → R) → X, and δ has type signature (Y → R) → Y, then ε ⊗ δ has type signature (X × Y → R) → X × Y).
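Here is a rough sketch of the combining operation, as I understand it from the paper: a selection functional takes an outcome function X → R and returns an element of X, and the product of two of them picks a pair by having the second one respond optimally to each candidate first move. The helper names and the toy payoff table are my own, not from the paper.

```python
def argmax_over(candidates):
    """Selection functional that returns a candidate maximizing the outcome function."""
    def select(outcome):
        return max(candidates, key=outcome)
    return select

def argmin_over(candidates):
    """Selection functional that returns a candidate minimizing the outcome function."""
    def select(outcome):
        return min(candidates, key=outcome)
    return select

def product_selection(eps, delta):
    """Combine eps : (X -> R) -> X and delta : (Y -> R) -> Y
    into a selection functional of type ((X, Y) -> R) -> (X, Y)."""
    def select(outcome):  # outcome takes x and y and returns a value in R
        def reply(x):
            # What delta picks once x is fixed.
            return delta(lambda y: outcome(x, y))
        # eps picks x anticipating delta's reply to it.
        x = eps(lambda candidate: outcome(candidate, reply(candidate)))
        return (x, reply(x))
    return select

# Toy two-move game (made-up numbers): first mover maximizes, second mover minimizes.
payoff = {("L", "l"): 3, ("L", "r"): 0, ("R", "l"): 2, ("R", "r"): 1}
game = product_selection(argmax_over(["L", "R"]), argmin_over(["l", "r"]))
print(game(lambda x, y: payoff[(x, y)]))  # ('R', 'r')
```

With a maximizer and a minimizer plugged in, the product is just backward induction on a two-move game, which is where the game-theory flavor comes from.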
See if this works.
Agreed. The bargaining solution for the entire game can be very different from adding up the bargaining solutions for the subgames. If there’s a subgame where Alice cares very much about victory in that subgame (interior decorating choices) and Bob doesn’t care much, and another subgame where Bob cares very much about it (food choice) and Alice doesn’t care much, then the bargaining solution of the entire relationship game will end up being something like “Alice and Bob get some relative weights on how important their preferences are, and in all the subgames, the weighted sum of their utilities is maximized. Thus, Alice will be given Alice-favoring outcomes in the subgames where she cares the most about winning, and Bob will be given Bob-favoring outcomes in the subgames where he cares the most about winning”
And in particular, since it’s a sequential game, Alice can notice if Bob isn’t being fair, and enforce the bargaining solution by going “if you’re not aiming for something sorta like this, I’ll break off the relationship”. So, from Bob’s point of view, aiming for any outcome that’s too Bob-favoring has really low utility since Alice will inevitably catch on. (this is the time-extended version of “give up on achieving any outcome that drives the opponent below their BATNA”) Basically, in terms of raw utility, it’s still a bargaining game deep down, but once both sides take into account how the other will react, the payoff matrix for the restaurant game (taking the future interactions into account) will look like “it’s a really bad idea to aim for an outcome the other party would regard as unfair”
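A toy illustration of the weighted-sum picture from the previous comment. All the utilities are made up, and the equal weights are an assumption rather than something derived from an actual bargaining solution. The comparison point is splitting each subgame 50/50 by coin flip, which is what the Nash bargaining solution gives for each of these particular subgames taken in isolation with a (0, 0) disagreement point.

```python
# Hypothetical (alice_utility, bob_utility) for each outcome of each subgame.
SUBGAMES = {
    "decorating": {"alice_style": (10.0, 0.0), "bob_style": (0.0, 1.0)},
    "food": {"alice_pick": (1.0, 0.0), "bob_pick": (0.0, 10.0)},
}

def weighted_sum_choices(subgames, alice_weight, bob_weight):
    """In every subgame, pick the outcome maximizing the weighted sum of utilities."""
    choices = {}
    for name, outcomes in subgames.items():
        def score(outcome, outcomes=outcomes):
            alice_u, bob_u = outcomes[outcome]
            return alice_weight * alice_u + bob_weight * bob_u
        choices[name] = max(outcomes, key=score)
    return choices

def total_utilities(subgames, choices):
    """Sum each player's utility over the chosen outcomes."""
    alice_total = sum(subgames[name][choice][0] for name, choice in choices.items())
    bob_total = sum(subgames[name][choice][1] for name, choice in choices.items())
    return alice_total, bob_total

def coin_flip_utilities(subgames):
    """Expected totals if every subgame is settled by a uniform lottery over its outcomes."""
    alice_total = sum(sum(u[0] for u in o.values()) / len(o) for o in subgames.values())
    bob_total = sum(sum(u[1] for u in o.values()) / len(o) for o in subgames.values())
    return alice_total, bob_total

choices = weighted_sum_choices(SUBGAMES, alice_weight=1.0, bob_weight=1.0)
print(choices)                             # {'decorating': 'alice_style', 'food': 'bob_pick'}
print(total_utilities(SUBGAMES, choices))  # (10.0, 10.0)
print(coin_flip_utilities(SUBGAMES))       # (5.5, 5.5)
```

Both players do better under the whole-game weighted sum than under subgame-by-subgame splitting, because each one gets their way exactly where they care the most.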
This post is still endorsed, it still feels like a continually fruitful line of research. A notable aspect of it is that, as time goes on, I keep finding more connections and crisper ways of viewing things which means that for many of the further linked posts about inframeasure theory, I think I could explain them from scratch better than the existing work does. One striking example is that the “Nirvana trick” stated in this intro (to encode nonstandard decision-theory problems), has transitioned from “weird hack that happens to work” to “pops straight out when you make all the math as elegant as possible”. Accordingly, I’m working on a “living textbook” (like a textbook, but continually being updated with whatever cool new things we find) where I try to explain everything from scratch in the crispest way possible, to quickly catch up on the frontier of what we’re working on. That’s my current project.
I still do think that this is a large and tractable vein of research to work on, and the conclusion hasn’t changed much.
I think I have a contender for something which evades the conditional-threat issue stated at the end, as well as obvious variants and strengthenings of it, and which would be threat-resistant in a dramatically stronger sense than ROSE.
There’s still a lot of things to check about it that I haven’t done yet. And I’m unsure how to generalize to the n-player case. And it still feels unpleasantly hacky, according to my mathematical taste.
But the task at least feels possible, now.
EDIT: it turns out it was still susceptible to the conditional-threat issue, but then I thought for a while and came up with a different contender that feels a lot less hacky, and that provably evades the conditional-threat issue. Still lots of work left to be done on it, though.
Yes, pink is gas and purple is mass, but also the gas there makes up the dominant component of the visible mass in the Bullet Cluster, far outweighing the stars.
Also, physicists have come up with a whole lot of possible candidates for dark matter particles. The supersymmetry-based ones took a decent kicking at the LHC, and I’m unsure of the motivations for some of the other ones, but the two that look most promising (to me, others may differ in opinion) are axions and sterile neutrinos, as those were conjectured to plug holes in the Standard Model, so they’ve got a stronger physics motivation than the rest. But again, it might be something no physicist saw coming.
For axions, there’s something in particle physics called the strong CP problem, where there’s no theoretical reason whatsoever why strong-force interactions shouldn’t break CP symmetry. And yet, as far as we can tell, the CP-symmetry-breakingness of the strong-force interaction is precisely zero. Axions were postulated as a way to deal with this, and for certain mass ranges, they would work. They’d be extremely light particles.
And for sterile neutrinos, there’s a weird thing we’ve noticed where all the other quarks and leptons can have left-handed or right-handed chirality, but neutrinos only come in the left-handed form; nobody’s ever found a right-handed neutrino. Also, in the vanilla Standard Model, neutrinos are supposed to be massless. And as it turns out, if you introduce some right-handed neutrinos and do a bit of physics fiddling, something called the seesaw mechanism shows up, which has the two effects of making ordinary neutrinos very light (and they are indeed thousands of times lighter than any other elementary particle with mass), and the right-handed neutrinos very heavy (so it’s hard to make them at a particle accelerator). Also, since the weak interaction (the major way we know neutrinos are a thing) is sensitive to chirality, the right-handed neutrinos don’t really do much of anything besides have gravity and have slight interactions with neutrinos, which are already hard to detect. So that’s another possibility.
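For a feel of how the seesaw works numerically, here is a small sketch of the simplest one-generation version: the two mass eigenvalues of the matrix [[0, m_D], [m_D, M]], where m_D is a Dirac mass and M is the heavy right-handed mass. The specific input values (100 GeV and 10^14 GeV) are illustrative assumptions, not measured numbers.

```python
import math

def seesaw_masses(dirac_mass, majorana_mass):
    """Return (light, heavy) mass magnitudes for the 2x2 matrix [[0, m_D], [m_D, M]].

    The eigenvalues are (M +/- sqrt(M^2 + 4 m_D^2)) / 2, and their product
    has magnitude m_D^2, so the light one is m_D^2 / heavy.
    """
    root = math.sqrt(majorana_mass**2 + 4.0 * dirac_mass**2)
    heavy = 0.5 * (majorana_mass + root)
    # Computed as m_D^2 / heavy to avoid subtracting two nearly equal numbers.
    light = dirac_mass**2 / heavy
    return light, heavy

# Illustrative inputs in GeV (assumptions): electroweak-scale m_D, very heavy M.
light_gev, heavy_gev = seesaw_masses(dirac_mass=100.0, majorana_mass=1e14)
print(f"light state: about {light_gev * 1e9:.3g} eV")
print(f"heavy state: about {heavy_gev:.3g} GeV")
```

For M much larger than m_D, the light mass is approximately m_D²/M, so pushing the heavy state up pushes the light state down, hence "seesaw".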
Yup, this turned out to be a crucial consideration that makes the whole project look a lot less worthwhile. If ventilation at a bad temperature is available, it’s cheaper to just get a heat exchanger and ventilate away and eat the increased heating costs during winter than to do a CO2 stripper.
There’s still a remaining use case for rooms without windows that aren’t amenable to just feeding an air duct outside, but that’s a lot more niche than my original expectations. Gonna edit the original post now.
You’re pretending that it’s what nature is doing when you update your prior. It works when sentences are shown to you in an adversarial order, but there’s the weird aspect that this prior expects the sentences to go back to being drawn from some fixed distribution afterwards. It doesn’t do a thing where it goes “ah, I’m seeing a bunch of blue blocks selectively revealed, even though I think there’s a bunch of red blocks, the next block I’ll have revealed will probably be blue”. Instead, it just sticks with its prior on red and blue blocks.
There’s a misconception here: it isn’t about finding sentences of the form φ→ψ and ¬ψ, because if you do that, it immediately disproves φ. It’s actually about merely finding many instances of φ→ψ where ψ has low probability, and this lowers the probability of φ. This is kind of like how finding out about the Banach-Tarski paradox (something you assign low probability to) may lower your degree of belief in the axiom of choice.
The particular thing that prevents trolling is that in this distribution, there’s a fixed probability of drawing on the next round no matter how many implications and ’s you’ve found so far. So the way it evades trolling is a bit cheaty, in a certain sense, because it believes that the sequence of truth or falsity of math sentences that it sees is drawn from a certain fixed distribution, and doesn’t do anything like believing that it’s more likely to see a certain class of sentences come up soon.
Edited. Thanks for that. I guess I managed to miss both of those, I was mainly going off of the indispensable and extremely thorough Atomic Rockets site having extremely little discussion of intergalactic missions as opposed to interstellar missions.
It looks like there are some spots where me and Armstrong converged on the same strategy (using lasers to launch probes), but we seem to disagree about how big of a deal dust shielding is, how hard deceleration is, and what strategy to use for deceleration.
My preferred way of resolving it is treating the process of “arguing over which equilibrium to move to” as a bargaining game, and just finding a ROSE point from that bargaining game. If there are multiple ROSE points, well, fire up another round of bargaining. This repeated process should very rapidly have the disagreement points close in on the Pareto frontier, until everyone is just arguing over very tiny slices of utility.
This is imperfectly specified, though, because I’m not entirely sure what the disagreement points would be, because I’m not sure how the “don’t let foes get more than what you think is fair” strategy generalizes to >2 players. Maaaybe disagreement-point-invariance comes in clutch here? If everyone agrees that an outcome as bad or worse than their least-preferred ROSE point would happen if they disagreed, then disagreement-point-invariance should come in to have everyone agree that it doesn’t really matter exactly where that disagreement point is.
Or maybe there’s some nice principled property that some equilibria have, which others don’t, that lets us winnow down the field of equilibria somewhat. Maybe that could happen.
I’m still pretty unsure, but “iterate the bargaining process to argue over which equilibrium to go to; you don’t get an infinite regress because you rapidly home in on the Pareto frontier with each extra round you add” is my best bad idea for it.
EDIT: John Harsanyi had the same idea. He apparently had some example where there were multiple CoCo equilibria, and his suggestion was that a second round of bargaining could be initiated over which equilibrium to pick, but that in general, it’d be so hard to compute the n-person Pareto frontier for large n that an equilibrium might be stable because nobody can find a different equilibrium nearby to aim for.
So this problem isn’t unique to ROSE points in full generality (CoCo equilibria have the exact same issue), it’s just that ROSE is the only one that produces multiple solutions for bargaining games, while CoCo only returns a single solution for bargaining games. (bargaining games are a subset of games in general)
Yeah, “transferable utility games” are those where there is a resource, and the utilities of all players are linear in that resource (so that everyone’s utilities can be redenominated in that resource, modulo a shift factor). I believe the post mentioned this.
In the proof of Lemma 3, it should be
“Finally, since , we have that . Thus, and are both equal to .”
instead.
I’d be extremely interested in the quantitative analysis you’ve done so far.
For 1, the mental model for non-relativistic but high speeds should be “a shallow crater is instantaneously vaporized out of the material going fast” and for relativistic speeds, it should be the same thing but with the vaporization directed in a deeper hole (energy doesn’t spread out as much, it keeps in a narrow cone) instead of in all directions. However, your idea of having a spacecraft as a big flat sheet and being able to tolerate having a bunch of holes being shot in it is promising. The main issue that I see is that this approach is incompatible with a lot of things that (as far as we know) can only be done with solid chunks of matter, like antimatter energy capture, or having sideways boosting-rockets, and once you start armoring the solid chunks in the floaty sail, you’re sort of back in the same situation. So it seems like an interesting approach and it’d be cool if it could work, but I’m not quite sure it can (not entirely confident that it couldn’t, just that it would require a bunch of weird solutions to stuff like “how does your sheet of tissue boost sideways at 0.1% of lightspeed”).
For 2, the problem is that the particles which are highly penetrating are either unstable (muons, kaons, neutrons...) and will fall apart well before arrival (and that’s completely dodging the issue of making bulk matter out of them), or they are stable (neutrinos, dark matter), and don’t interact with anything, and since they don’t really interact with anything, this means they especially don’t interact with themselves (well, at least we know this for neutrinos), so they can’t hold together any structure, nor can they interact with matter at the destination. Making a craft out of neutrinos is ridiculously more difficult than making a craft out of room-temperature air. If they can go through a light-year of lead without issue, they aren’t exactly going to stick to each other. Heck, I think you’d actually have better luck trying to make a spaceship out of pure light.
For 3, it’s because in order to use ricocheting mass to power your starcraft, you need to already have some way of ramping the mass up to relativistic speeds so it can get to the rapidly retreating starcraft in the first place, and you need an awful lot of mass. Light already starts off at the most relativistic speed of all, and around a star you already have astronomical amounts of light available for free.
For 4, there sort of is, but mostly not. The gravity example has the problem of the speeding up of the craft when it has the two stars ahead of it perfectly counterbalancing the backwards deceleration when the two stars are behind it. For potentials like gravity or electrical fields or pretty much anything you’d want to use, there’s an inverse-square law for them, which means that they aren’t really relevant unless you’re fairly close to a star. The one instance I can think of where something like your approach is the case is the electric sail design in the final part. In interstellar space, it brakes against the thin soup of protons as usual, but nearby a star, the “wind” of particles streaming out from the star acts as a more effective brake and it can sail on that (going out), or use it for better deceleration (coming in). Think of it as a sail slowing a boat down when the air is stationary, and slowing down even better when the wind is blowing against you.
Task completed.
If there’s something wrong with some theory, isn’t it quite odd that looking around at different parts of the universe seems to produce such a striking level of agreement on how much missing mass there is? If there was some out-of-left-field thing, I’d expect it to have confusing manifestations in many different areas and astronomers angsting about dramatically inconsistent measurements, I would not expect the CMB to end up explained away (and the error bars on those measurements are really really small) by the same 5:1 mix of non-baryonic matter vs baryonic matter the astronomers were postulating for everything else.
In other words, if you were starting out blind, the “something else will be found for a theory” bucket would not start out with most of its probability mass on “and in every respect, including the data that hasn’t come in yet since it’s the 1980s now, it’s gonna look exactly like the invisible mass scenario”. It’s certainly not ruled out, but it has taken a bit of a beating.
Also, physics is not obligated to make things easy to find. Like how making a particle accelerator capable of reaching the GUT scale to test Grand Unified Theories takes a particle accelerator the size of a solar system.