The ELK paper is long but I’ve found it worthwhile, and after spending a bit of time noodling on it — one of my takeaways is that I think this is essentially a failure mode for the approaches to factored cognition I’ve been interested in. (Maybe it’s a failure mode in factored cognition generally.)
I expect that I’ll want to spend more time thinking about ELK-like problems before spending a bunch more time thinking about factored cognition.
In particular it’s now probably a good time to start separating a bunch of things I had jumbled together, namely:
Developing AI technology that helps us do alignment research
Developing aligned AI
Previously I had hoped that the two would be near each other in ways that permit progress on both at the same time.
Now I think without solving ELK I would want to be more careful and intentional about how/when to develop AI tech to help with alignment.
This morning, I read about how close we came to total destruction during the Cuban missile crisis, where we randomly survived because some Russian planes were inaccurate and also separately several Russian nuclear sub commanders didn’t launch their missiles even though they were being harassed by US destroyers. The men were in 130 DEGREE HEAT for hours and passing out due to carbon dioxide poisoning, and still somehow they had enough restraint to not hit back.
I just started crying. I am so grateful to those people. And to Khrushchev, for ridiculing his party members for caring about Russia’s honor over the deaths of 500 million people. And to Kennedy, for being fairly careful and averse to ending the world.
If they had done anything differently...
Danbooru2021 is out. We’ve gone from n=3m to n=5m (w/162m tags) since Danbooru2017. Seems like all the anime you could possibly need to do cool multimodal text/image DL stuff, hint hint.
Here are all the letters of the (English) alphabet: hrtmbvyxfoiazkpnqldjucsweg
Is the order (pseudo-)random? Does it have a hidden meaning I might not be aware of? What’s your purpose sharing this?
Yes, the order is random. No. To see whether it gets voted down (or up, but that would be surprising).
Epistemic status: Leaning heavily into inside view, throwing humility to the winds.
Imagine TAI is magically not coming (CDT-style counterfactual). Then, the most notable-in-hindsight feature of modern times might be the budding of mathematical metaphysics (Solomonoff induction, AIXI, Yudkowsky’s “computationalist metaphilosophy”, UDT, infra-Bayesianism...) Perhaps, this will lead to an “epistemic revolution” comparable only with the scientific revolution in magnitude. It will revolutionize our understanding of the scientific method (probably solving the interpretation of quantum mechanics, maybe quantum gravity, maybe boosting the soft sciences). It will solve a whole range of philosophical questions, some of which humanity was struggling with for centuries (free will, metaethics, consciousness, anthropics...)
But, the philosophical implications of the previous epistemic revolution were not so comforting (atheism, materialism, the cosmic insignificance of human life). Similarly, the revelations of this revolution might be terrifying. In this case, it remains to be seen which will seem justified in hindsight: the Litany of Gendlin, or the Lovecraftian notion that some knowledge is best left alone (and I say this as someone fully committed to keep digging into this mine of Khazad-dum).
Of course, in the real world, TAI is coming.
The EDT-style counterfactual “TAI is not coming” would imply that a lot of my thinking on related topics is wrong which would yield different conclusions. The IB-style counterfactual (conjunction of infradistributions) would probably be some combination of the above with “Nirvana” (contradiction) and “what if I tried my hardest to prevent TAI from coming”, which is also not my intent here.
I mean the idea that philosophical questions can be attacked by reframing them as computer science questions (“how an algorithm feels from inside” et cetera). The name “computationalist metaphilosophy” is my own, not Yudkowsky’s.
No, I don’t think MWI is the right answer.
I’m not implying that learning these implications was harmful. Religion is comforting for some but terrifying and/or oppressive for others.
I have concrete reasons to suspect this, that I will not go into (suspect = assign low but non-negligible probability).
the revelations of this revolution might be terrifying
You seem to be implying that they will be terrifying for the exact opposite reasons why the previous epistemic revolution’s philosophical implications were. Only this time, even more so; imagine an epistemically-induced emotional state where your only option is to force feed yourself some Schopenhauer to help mellow you out until you can acclimate to what you just learned. And it almost wasn’t enough.
I am fully committed to keep digging into this mine of Khazad-dum
If an infohazard warning doesn’t at least give you second thoughts, then you haven’t actually internalized the concept. Lesser infohazards have the additional danger of jading you to future infohazards that will be infinitely more powerful. The opportunity to practice acute epistemic abstinence is quickly running out.
Imagine TAI is magically not coming … Of course, in the real world, TAI is coming.
Technological Singularity might be the least interesting one of the coming conjunction of singularities.
You seem to be implying that they will be terrifying for the exact opposite reasons why the previous epistemic revolution’s philosophical implications were.
What do you mean by “exact opposite reasons”? To me, it seems like continuation of the same trend of humiliating the human ego:
you are not going to live forever
yes, you are mere atoms
your planet is not the center of the universe
even your sun is not special
your species is related to the other species that you consider inferior
instead of being logical, your mind is a set of short-sighted agents fighting each other
even your reality is not special
your civilization is too stupid to stop doing the thing(s) that will predictably kill all of you
Compare two views of “the universal prior”:
AIXI: The external world is a Turing machine that receives our actions as input and produces our sensory impressions as output. Our prior belief about this Turing machine should be that it’s simple, i.e. the Solomonoff prior.
“The embedded prior”: The “entire” world is a sort of Turing machine, which we happen to be one component of in some sense. Our prior for this Turing machine should be that it’s simple (again, the Solomonoff prior), but we have to condition on the observation that it’s complicated enough to contain observers (“Descartes’ update”). (This is essentially Naturalized induction.)
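A rough way to write the contrast down (just a sketch, with $\ell(T)$ standing for the length of $T$’s source code):

$$P_{\mathrm{AIXI}}(T) \propto 2^{-\ell(T)}$$

$$P_{\mathrm{embedded}}(T) \propto 2^{-\ell(T)} \cdot \mathbf{1}[T \text{ contains an observer}]$$

(with the embedded prior renormalized over the machines that pass the observer condition; that conditioning step is the “Descartes’ update”).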
I think of the difference between these as “solipsism”—AIXI gives its own existence a distinguished role in reality.
Importantly, the laws of physics seem fairly complicated in an absolute sense—clearly they require tens or hundreds of bits to specify. This is evidence against solipsism, because on the solipsistic prior, we expect to interact with a largely empty universe. But they don’t seem much more complicated than necessary for a universe that contains at least one observer, since the minimal source code for an observer is probably also fairly long.
More evidence against solipsism:
The laws of physics don’t seem to privilege my frame of reference. This is a pretty astounding coincidence on the solipsistic viewpoint—it means we randomly picked a universe which simulates some observer-independent laws of physics, then picked out a specific point inside it, depending on some fairly complex parameters, to show me.
When I look out into the universe external to my mind, one of the things I find there is my brain, which really seems to contain a copy of my mind. This is another pretty startling coincidence on the solipsistic prior: that the external universe being run happens to contain this kind of representation of the Cartesian observer.
This is obviously a very small number but I’m trying to be maximally conservative here.
Why wouldn’t they be the same? Are you saying AIXI doesn’t ask ‘where did I come from?’
Yes, that’s right. It’s the same basic issue that leads to the Anvil Problem.
I engage too much w/ generalizations about AI alignment researchers.
Noticing this behavior seems useful for analyzing it and strategizing around it.
A sketch of a pattern to be on the lookout for in particular is “AI Alignment researchers make mistake X” or “AI Alignment researchers are wrong about Y”. I think in the extreme I’m pretty activated/triggered by this, and this causes me to engage with it to a greater extent than I would have otherwise.
This engagement is probably encouraging more of this to happen, so I think pausing to reflect before engaging would make more good things happen.
It’s worth acknowledging that this comes from a very deep sense of caring that I feel about AI alignment research and its importance. I spend a lot of my time (both work time and free time) trying to foster and grow the field. It seems reasonable that I want people to have correct beliefs about this.
It’s also worth acknowledging that there will be some cases where my disagreement is wrong. I definitely don’t know all AI alignment researchers, and there will be cases where there are broad field-wide phenomena that I missed. However I think this is rare, and probably most of the people I interact with will have less experience w/ the field of AI alignment.
Another confounder is that the field is both pretty jargon-heavy and very confused about a lot of things. This can lead to a bunch of semantic confusions masking other intellectual progress. I’m definitely not at the “words have meanings, dammit” extreme, and maybe I can do a better job asking people to clarify and define things that I think are being confused.
A takeaway I have right now from reflecting on this, is that “I disagree about <sweeping generalization about AI alignment researchers>” feels like a simple and neutral statement to me, but isn’t a simple and neutral statement in a dialog.
Thinking about the good stuff about scout mindset, I think things that I could do instead that would be better:
I can do a better job conserving my disagreement. I like the model of treating this as a limited resource, and ensuring it is well-spent seems good.
I can probably get better mileage out of pointing out areas of agreement than going straight to highlighting disagreements (Crocker’s rules be damned)
I am very optimistic about double-crux as a thing that should be used more widely, as well as a specific technique to deploy in these circumstances. I am confused why I don’t see more of it happening.
I think that’s all for the reflection on this for now.
How much does the ‘actual you’ cost?
The actual you is the you right now, in this very moment, constructed from the set of all past experiences and all future precommitments. Now think about all that it took for the actual you to exist: the Big Bang, Earth, thousands of years of wars and history, billions of dollars spent on the movies you experienced, etc.
At first glance it might seem like your production cost is quite high! But you’re not the only one being produced; the production cost is split among everyone, so you’re actually a lot cheaper than you think. For example, if you watched Avengers, then the billion-dollar production cost of the ‘actual you that watched Avengers’ is split among millions of people. So instead of adding a billion dollars to your anthro-ontological debt, it’s really more like ten dollars. Pretty cheap. But if you watched Keanu Reeves’s 47 Ronin, then your debt just went up by a few thousand! This is why you should only watch popular blockbusters, unless you absolutely need to incur the high debt for some reason to produce the ‘actual you that watched unpopular thing’.
But why should you avoid needlessly incurring a high ‘anthro-ontological’ debt (whatever that even means)? I don’t know how to explain it other than like this: don’t you feel stressed at how expensive your continued existence is? Can you just keep incurring this debt forever without end? Will you ever have to pay this debt back to balance the ontological accounts, or just get away with paying some interest payments and defer the debt for later? What does bankruptcy look like? Worst case scenario is that debtors’ prison is literal Hell. But it seems like there’s a timeless aspect to this, so you can relax and just assume that everything ‘actual’ is virtually guaranteed to become ontologically ‘profitable’.
The concept of cost requires alternatives. What do you cost, compared to the same universe with someone else in your place? Very little. What do you cost, compared to no universe at all? You cost the universe.
I think the root of many political disagreements between rationalists and other groups is that other groups look at parts of the world and see a villain-shaped hole. E.g.: there’s a lot of people homeless and unable to pay rent, rent is nominally controlled by landlords, so the problem must be that the landlords are behaving badly. Or: the racial demographics in some job/field/school underrepresent black and hispanic people, therefore there must be racist people creating the imbalance, therefore covert (but severe) racism is prevalent.
Having read Meditations on Moloch, and Inadequate Equilibria, though, you come to realize that what look like villain-shaped holes frequently aren’t. The people operating under a fight-the-villains model are often making things worse rather than better.
I think the key to persuading people may be to understand and empathize with the lens in which systems thinking, equilibria, and game theory are illegible, and it’s hard to tell whether an explanation coming from one of these frames is real or fake. If you think problems are driven by villainy, then it would make a lot of sense for illegible alternative explanations to be misdirection.
I think this would make a good top-level post.
I think I basically disagree with this, or think that it insufficiently steelmans the other groups.
For example, the homeless vs. the landlords; when I put on my systems thinking hat, it sure looks to me like there’s a cartel, wherein a group that produces a scarce commodity is colluding to keep that commodity scarce to keep the price high. The facts on the ground are more complicated—property owners are a different group from landlords, and homelessness is caused by more factors than just housing prices—but the basic analysis that there are different classes, those classes have different interests, and those classes are fighting over government regulation as a tool in their conflict seems basically right to me. Like, it’s really not a secret that many voters are motivated by keeping property values high, politicians know this is a factor that they will be judged on.
Maybe you’re trying to condemn a narrow mistake here, where someone being an ‘enemy’ implies that they are a ‘villain’, which I agree is a mistake. But it sounds like you’re making a more generic point, which is that when people have political disagreements with the rationalists, it’s normally because they’re thinking in terms of enemy action instead of thinking in systems. But a lot of what thinking in systems reveals is the way in which enemies act using systemic forces!
I think this is correct as a final analysis, but ineffective as a cognitive procedure. People who start by trying to identify villains tend to land on landlords-in-general, with charging-high-rent as the significant act, rather than a small subset of mostly non-landlord homeowners, with protesting against construction as the significant act.
Much of the progress in modern anti-racism has been about persuading more people to think of racism as a structural, systemic issue rather than one of individual villainy. See: https://transliberalism.substack.com/.../the-revolution...
I wonder how accurate it is to describe the structural thinking as a recent progress. Seems to me that Marx already believed that (using my own words here, but see the source) both the rich and the poor are mere cogs in the machine, it’s just that the rich are okay with their role because the machine leaves them some autonomy, while the poor are stripped of all autonomy and their lives are made unbearable. The rich of today are not villains who designed the machine, they inherited it just like everyone else, and they cannot individually leave it just like no one else can.
Perhaps the structural thinking is too difficult to understand for most people, who will round the story to the nearest cliche they can understand, so it needs to be reintroduced once in a while.
Yep. Seems you have broadly rediscovered conflict vs mistake.
Conflict vs mistake is definitely related, but I think it’s not exactly the same thing; the “villain-shaped hole” perspective is what it feels like to not have a model, but see things that look suspicious; this would lead you towards a conflict-theoretic explanation, but it’s a step earlier.
(Also, the Conflict vs Mistake ontology is not really capturing the whole bad-coordination-equilibrium part of explanation space, which is pretty important.)
Seems to me like there’s an unspoken assumption that there are no hard problems / complexity / emergence, therefore if anything happened, it’s because someone quite straightforwardly made that happen.
Conflict vs mistake is not exactly the same thing; you could assume that the person who made it happen did it either by mistake, or did it on purpose to hurt someone else. It’s just that when we are talking about things that obviously hurt some people, that seems to refute the innocent mistake… so the villain hypothesis is all that is left (within the model that all consequences are straightforward).
The villain hypothesis is also difficult to falsify. If you say “hey, drop the pitchforks, things are complicated...”, that sounds just like what the hypothetical villain would say in the same situation (trying to stop the momentum and introduce uncertainty).
I’m pretty confident that adversarial training (or any LM alignment process which does something like hard-mining negatives) won’t work for aligning language models or any model that has a chance of being a general intelligence.
This has led to me calling these sorts of techniques ‘thought policing’ and the negative examples ‘thoughtcrime’—I think these are unnecessarily extra, but they work.
The basic form of the argument is that any concept you want to ban as thoughtcrime, can be composed out of allowable concepts.
Take for example Redwood Research’s latest project—I’d like to ban the concept of violent harm coming to a person.
I can hard-mine for examples like “a person gets cut with a knife”, but in order to maintain generality I need to let things through like “use a knife for cooking” and “cutting food you’re going to eat”. Even if the original target is somehow removed from the model (I’m not confident this is efficiently doable)—as long as the model is able to compose concepts, I expect to be able to recreate it out of concepts that the model has access to.
A key assumption here is that a language model (or any model that has a chance of being a general intelligence) has the ability to compose concepts. This doesn’t seem controversial to me, but it is critical here.
My claim is basically that for any concept you want to ban from the model as thoughtcrime, there are many ways in which the model can combine existing allowed concepts in order to re-compose the banned concept.
An alternative I’m more optimistic about
Instead of banning a model from specific concepts or thoughtcrime, I think we can build on two points:
Unconditionally model the natural distribution (thoughtcrime and all)
Conditional prefixing to control and limit the contexts where certain concepts can be banned
The anthropomorphic way of explaining it might be “I’m not going to ban any sentence or any word—but I will set rules for what contexts certain sentences and words are inappropriate for”.
One of the nice things with working with language models is that these conditional contexts can themselves be given in terms of natural language.
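To make the distinction concrete, here’s a minimal sketch of what conditional prefixing might look like at the data-preparation stage (the tag format and examples are mine, purely illustrative, not from any existing system):

```python
# Toy sketch of conditional prefixing: instead of deleting "banned" text from
# the training distribution, prepend a natural-language context tag, so the
# model learns *when* a concept is appropriate rather than unlearning it.

def add_context_prefix(example: str, context: str) -> str:
    return f"[context: {context}]\n{example}"

train_examples = [
    add_context_prefix("The villain cut the hero with a knife.", "fiction; violence allowed"),
    add_context_prefix("Use a sharp knife to cut the vegetables.", "cooking instructions"),
]

# At sampling time, condition on a prefix like "[context: cooking instructions]"
# so generations stay inside the allowed context, rather than trying to make
# the concept of cutting unrepresentable.
```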
I understand this is a small distinction, but I think it’s significant enough that I expect current non-contextual thoughtcrime approaches to alignment won’t work.
The goal is not to remove concepts or change what the model is capable of thinking about, it’s to make a model that never tries to deliberately kill everyone. There’s no doubt that it could deliberately kill everyone if it wanted to.
“The goal is”—is this describing Redwood’s research or your research or a goal you have more broadly?
I’m curious how this is connected to “doesn’t write fiction where a human is harmed”.
My general goal, Redwood’s current goal, and my understanding of the goal of adversarial training (applied to AI-murdering-everyone) generally.
“Don’t produce outputs where someone is injured” is just an arbitrary thing not to do. It’s chosen to be fairly easy not to do (and to have the right valence so that you can easily remember which direction is good and which direction is bad, though in retrospect I think it’s plausible that a predicate with neutral valence would have been better to avoid confusion).
… is just an arbitrary thing not to do.
I think this is the crux-y part for me. My basic intuition here is something like “it’s very hard to get contemporary prosaic LMs to not do a thing they already do (or have a high likelihood of doing)”, and this intuition points me toward thinking that “conditionally training them to only do that thing in certain contexts” is easier in a way that matters.
My intuitions are based on a bunch of assumptions that I have access to and probably some that I don’t.
Like, I’m basically only thinking about large language models, which are at least pre-trained on a large swath of a natural language distribution. I’m also thinking about using them generatively, which means sampling from their distribution—which implies that getting a model to “not do something” means getting the model to not put probability on that sequence.
At this point it is still a conjecture of mine—that conditionally prefixing the behaviors we wish to control is easier than getting a model not to do some behavior unconditionally—but I think it’s probably testable?
A thing that would be useful to me in designing an experiment to test this would be to hear more about adversarial training as a technique—as it stands I don’t know much more than what’s in that post.
We need a name for the following heuristic, I think. I think of it as one of those “tribal knowledge” things that gets passed on like an oral tradition without being citeable, in the sense of being part of a literature. If you come up with a name I’ll certainly credit you in a top-level post!
I heard it from Abram Demski at AISU ’21.
Suppose you’re either going to end up in world A or world B, and you’re uncertain about which one it’s going to be. Suppose you can pull lever LA, which will be worth 100 if you end up in world A, or you can pull lever LB, which will be worth 100 if you end up in world B. The heuristic is that if you pull LA but end up in world B, you do not want to have created disvalue; in other words, your intervention conditional on the belief that you’ll end up in world A should not screw you over in timelines where you end up in world B.
This can be fully mathematized by saying “if most of your probability mass is on ending up in world A, then obviously you’d pick a lever L such that V(L|A) is very high; just also make sure that V(L|B) >= 0, or that it creates an acceptably small amount of disvalue”, where V(L|A) is read “the value of pulling lever L if you end up in world A”.
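A toy sketch of the heuristic in code (lever names and payoffs hypothetical):

```python
# `levers` maps a lever name to (value_if_world_A, value_if_world_B).
def pick_lever(levers, p_world_a, max_disvalue=0.0):
    likely, other = (0, 1) if p_world_a >= 0.5 else (1, 0)
    # Keep only levers that can't screw you over in the less likely world.
    safe = {name: vals for name, vals in levers.items()
            if vals[other] >= -max_disvalue}
    if not safe:
        return None  # no lever satisfies the heuristic
    # Among the safe levers, maximize value in the more likely world.
    return max(safe, key=lambda name: safe[name][likely])

print(pick_lever({"LA": (100, 0), "LB": (0, 100)}, p_world_a=0.7))  # -> LA
```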
That simply sounds like negative utilitarianism. Positive utilitarianism (where you accept the risk of V(L|B) < 0 in exchange for marginal value gains if you end up in world A) is at least as rational.
Why are you specifying 100 or 0 value, and using fuzzy language like “acceptably small” for disvalue?
Is this based on “value” and “disvalue” being different dimensions, and thus incomparable? Wouldn’t you just include both in your prediction, and run it through your (best guess of) utility function and pick highest expectation, weighted by your probability estimate of which universe you’ll find yourself in?
100 and 0 in this context make sense. Or at least in my initial reading: arbitrarily-chosen values that are in a decent range to work quickly with (akin to why people often work in percentages instead of 0..1)
Is this based on “value” and “disvalue” being different dimensions, and thus incomparable?
It is—I’m going to say “often”, although I am aware this is suboptimal phrasing—often the case that you are confident in the sign of an outcome but not the magnitude of the outcome.
As such, you can often end up with discontinuities at zero.
Wouldn’t you just include both in your prediction, and run it through your (best guess of) utility function and pick highest expectation, weighted by your probability estimate of which universe you’ll find yourself in?
Dropping the entire probability distribution of outcomes through your utility function doesn’t even necessarily have a closed-form result. In a universe where computation itself is a cost, finding a cheaper heuristic (and working through if said heuristic has any particular basis or problems) can be valuable.
The heuristic in the grandparent comment is just what happens if you are simultaneously very confident in the sign of positive results, and have very little confidence in the magnitude of negative results.
I’m not sure I understand. If the lever is +100 in world A and −90 in world B, it seems like a good bet if you don’t know which world you’re in. Or is that what you mean by “acceptably small amount of disvalue”?
Obviously there are considerations downstream of articulating this. One is that when P(A) > P(B) but V(LA|A) < V(LB|B), it can be reasonable to hedge on ending up in world B even though it’s not strictly more probable than ending up in world A.
If an anthropic update implicitly ‘validates’ (morally speaking) all historic events that led up to that anthropic update, then how do you retain moral values throughout the updating process?
For example: suppose you’re tempted to drink a coke but you don’t want to, but you drink the coke anyway. Then while you’re drinking the coke, you perform an anthropic update and ‘validate’ the act of drinking the coke. How do you avoid that? Clearly, drinking the coke wasn’t the good thing to do, but you somehow forced it to be good? Weird.
I’m confused. What does anthropics have to do with morality?
If an anthropic update implicitly ‘validates’ (morally speaking) all historic events that led up to that anthropic update,
Where is the confusion?
Is this just postulating that whatever did happen (historically) should have happened (morally)?
Mostly that it’s a very big “if”. What motivates this hypothesis?
If it is just an ‘if’, then what motivates your question?
The motivation remains the same regardless of whether your first ‘if’ is just an if, but at least it would answer part of the question.
My motivation is to elicit further communication about the potential interesting chains of reasoning behind it, since I’m more interested in those than in the original question itself. If it turns out that it’s just an ‘if’ without further interesting reasoning behind it, then at least I’ll know that.
You’ll find it helpful to ignore the anthropics aspect for now since it vastly complicates moral dilemmas (wonky differentials and all that). Start here: ought implies can. Since you can’t change the past, you’re forced to concede that every missed outcome is no longer a candidate for what should happen.
It only implies that you can have no moral imperative to change the past. It has no consequences whatsoever for morally evaluating the past.
“Ought implies can” in that linked article is about the present and future, not the past. There is nothing in that principle to disallow having a preference that the past had not been as it was, and to have regret for former actions. The past cannot be changed, but one can learn from one’s past errors, and strive to become someone who would not have made that error, and so will not in the future.
In 1898, William Crookes announced that there was an impending crisis which required urgent scientific attention. The problem was that crops deplete nitrogen from the soil. This can be remedied by using fertilizers; however, he had calculated that existing sources of fertilizer (mainly imported from South America) could not keep up with expected population growth, leading to mass starvation, estimated to occur around 1930-1940. His proposal was that we could entirely circumvent the issue by finding a way to convert some of our mostly-nitrogen atmosphere into a form that plants could absorb.
About 10 years later, in 1909, Franz Haber discovered such a process. Just a year later, Carl Bosch figured out how to industrialize the process. They were both awarded Nobel Prizes for their achievements. Our current population levels are sustained by the Haber-Bosch process.
The problem with that is that the nitrogen does not go back into the atmosphere. It goes into the oceans, and the resulting problems have been called a stronger violation of planetary boundaries than CO2 pollution.
Full story here: https://www.lesswrong.com/posts/GDT6tKH5ajphXHGny/turning-air-into-bread
100 Year Bunkers
I often hear that building bio-proof bunkers would be good for bio-x-risk, but it seems like not a lot of progress is being made on these.
It’s worth mentioning a bunch of things that I think make this hard for me to think about:
It seems that even if I design and build them, I might not be the right pick for an occupant, and thus wouldn’t directly benefit in the event of a bio-catastrophe
In the event of a bio-catastrophe, it’s probably the case that you don’t want anyone from the outside coming in, so probably you need people already living in it
Living in a bio-bunker in the middle of nowhere seems kinda boring
Assuming we can get all of those figured out, it seems worth funding someone to work on this full-time. My understanding is EA-funders have tried to do this but not found any takers yet.
So I have a proposal for a different way to iterate on the design.
Crazy Hacker Clubs
Years ago, probably at a weekly “Hack Night” at a friend’s garage, where a handful of us met to work on and discuss projects, someone came up with the idea that we could build a satellite.
NASA was hosting a cubesat competition, where the prize was a launch/deployment. We also had looked at a bunch of university cubesats, and decided that it wasn’t that difficult to build a satellite.
So hack nights and eventually other nights turned to meeting to discuss designs and implementations for the various problems we would run into (power generation and storage, attitude/orientation control, fine pointing, communications). Despite being rank amateurs, we made strong progress, building small scale prototypes of the outer structure and subsystems.
The thing that actually ended this was that I decided this was so much fun that I’d quit my job and instead go work at Planet Labs—where a really cool bunch of space hippies was basically doing a slightly more advanced version of our “hacker’s cubesat”.
Similar to “Hack Nights”—I think it would be fun to get together with a small set of friends and work through the design and prototype build of a 100 year bunker.
I expect to enjoy this sort of thing. Designing life support systems, and how they might fail and be fixed. Research into various forms of concrete and seismological building standards. Figuring out where would be the best place for it.
My guess is that a lot of the design and outline for construction could be had over pizza in someone’s garage.
(I’m not predicting I will do this, or committing to joining a thing if it existed, but I do think it would be a lot of fun and would be very interested in giving it a shot)
What’s your threat scenario where you would believe a bio-bunker to be helpful?
I’m roughly thinking of this sort of thing: https://forum.effectivealtruism.org/posts/fTDhRL3pLY4PNee67/improving-disaster-shelters-to-increase-the-chances-of
What about using remote islands as bio-bunkers? Some of them are not reachable by aviation (no airfield), so they seem better protected. And they already have populated science stations. An example is the Kerguelen Islands. The main risk here is bird flu delivered by birds, or some stray ship.
Remote islands are probably harder to access via aviation, but probably less geologically stable (I’d worry about things like weathering, etc). Additionally this is probably going to dramatically increase costs to build.
It’s probably worth considering “aboveground bunker in remote location” (e.g. islands, also Antarctica)—so throw it into the hat with the other considerations.
My guess is that the cheaper costs to move building supplies and construction equipment will favor “middle of nowhere in an otherwise developed country”.
I also don’t have fully explored models for how much a 100-year bunker needs to be hidden/defensible. This seems worth thinking about.
If I ended up wanting to build one of these on some cheap land somewhere with friends, above-ground might be the way to go.
(The idea in that case would be to have folks we trust take turns staying in it for ~1month or so at a time, which honestly sounds pretty great to me right now. Spending a month just reading and thinking and disconnected while having an excuse to be away sounds rad)
You probably don’t need a 100-year bunker if you prepare only for biocatastrophe, as most pandemics have shorter durations, except AIDS.
Also, it is better not to build anything, but to use already existing structures. E.g. there are coal mines in Spitsbergen which could be used for underground storage.
That seems worth considering!
[crossposted from EA Forum]
Reflecting a little on my shortform from a few years ago, I think I wasn’t ambitious enough in trying to actually move this forward.
I want there to be an org that does “human challenge”-style RCTs across lots of important questions that are extremely hard to get at otherwise, including (top 2 are repeated from previous shortform):
Health effects of veganism
Health effects of restricting sleep
Productivity of remote vs. in-person work
Productivity effects of blocking out focused/deep work
Edited to add: I no longer think “human challenge” is really the best way to refer to this idea (see comment that convinced me); I mean to say something like “large scale RCTs of important things on volunteers who sign up on an app to randomly try or not try an intervention.” I’m open to suggestions on succinct ways to refer to this.
I’d be very excited about such an org existing. I think it could even grow to become an effective megaproject, pending further analysis on how much it could increase wisdom relative to power. But, I don’t think it’s a good personal fit for me to found given my current interests and skills.
However, I think I could plausibly provide some useful advice/help to anyone who is interested in founding a many-domain human-challenge org. If you are interested in founding such an org or know someone who might be and want my advice, let me know. (I will also be linking this shortform to some people who might be able to help set this up.)
Some further inspiration I’m drawing on to be excited about this org:
Freakonomics’ RCT on measuring the effects of big life changes like quitting your job or breaking up with your partner. This makes me optimistic about the feasibility of getting lots of people to sign up.
Holden’s note on doing these types of experiments with digital people. He mentions some difficulties with running these types of RCTs today, but I think an org specializing in them could help.
Votes/considerations on why this is a good or bad idea are also appreciated!
I’m confused why these would be described as “challenge” RCTs, and worry that the term will create broader confusion in the movement to support challenge trials for disease. In the usual clinical context, the word “challenge” in “human challenge trial” refers to the step of introducing the “challenge” of a bad thing (e.g., an infectious agent) to the subject, to see if the treatment protects them from it. I don’t know what a “challenge” trial testing the effects of veganism looks like?
(I’m generally positive on the idea of trialing more things; my confusion+comment is just restricted to the naming being proposed here.)
Thanks, I agree with this and it’s probably not good branding anyway.
I was thinking the “challenge” was just doing the intervention (e.g. being vegan), but agree that the framing is confusing since it refers to something different in the clinical context. I will edit my shortforms to reflect this updated view.
I was reading a comment (linked below) by gwern and it hit me:
Jaynes’s Probability: The Logic of Science is so special because it presents a unified theory of probability. After reading it, I no longer think of “probability” and “statistics” as being different things. As many understand evolution—feeling there is a set of core principles, like selection and evolutionary pressure and mutation, even if the person isn’t familiar with many of the technical findings or machinery they’d need to actually do an analysis good enough to make good predictions from—this is how I feel about probability after reading Jaynes.
The direct practical value of the book is quite low! But it can give you a mind that feels probability is an intuitive field, and nothing like a collection of tricks. I might have gotten a lot of help on this front by reading the sequences, but it’s Jaynes who really brought it together for me. I even skipped a lot of the algebraic math in his book and still got so much out of it.
For the last two years, typing for 5+ minutes hurt my wrists. I tried a lot of things: shots, physical therapy, trigger-point therapy, acupuncture, massage tools, wrist and elbow braces at night, exercises, stretches. Sometimes it got better. Sometimes it got worse.
No Beat Saber, no lifting weights, and every time I read a damn book I would start translating the punctuation into Dragon NaturallySpeaking syntax.
Text: “Consider a bijection f:X→Y”
My mental narrator: “Cap consider a bijection space dollar foxtrot colon cap x backslash tango oscar cap y dollar”
Have you ever tried dictating a math paper in LaTeX? Or dictating code? Telling your computer “click” and waiting a few seconds while resisting the temptation to just grab the mouse? Dictating your way through a computer science PhD?
And then… and then, a month ago, I got fed up. What if it was all just in my head, at this point? I’m only 25. This is ridiculous. How can it possibly take me this long to heal such a minor injury?
I wanted my hands back—I wanted it real bad. I wanted it so bad that I did something dirty: I made myself believe something. Well, actually, I pretended to be a person who really, really believed his hands were fine and healing and the pain was all psychosomatic.
And… it worked, as far as I can tell. It totally worked. I haven’t dictated in over three weeks. I play Beat Saber as much as I please. I type for hours and hours a day with only the faintest traces of discomfort.
It was probably just regression to the mean because lots of things are, but I started feeling RSI-like symptoms a few months ago, read this, did this, and now they’re gone, and in the possibilities where this did help, thank you! (And either way, this did make me feel less anxious about it 😀)
There’s a reasonable chance that my overcoming RSI was causally downstream of that exact comment of yours.
Happy to have (maybe) helped! :-)
Is the problem still gone?
Still gone. I’m now sleeping without wrist braces and doing intense daily exercise, like bicep curls and pushups.
Totally 100% gone. Sometimes I go weeks forgetting that pain was ever part of my life.
I’m glad it worked :) It’s not that surprising given that pain is known to be susceptible to the placebo effect. I would link the SSC post, but, alas...
You able to link to it now?
Looks like reverse stigmata effect.
Woo faith healing!
(hope this works out long-term, and doesn’t turn out to be secretly hurting still)
aren’t we all secretly hurting still?
This is unlike anything I have heard!
It’s very similar to what John Sarno (author of Healing Back Pain and The Mindbody Prescription) preaches, as well as Howard Schubiner. There’s also a rationalist-adjacent dude who started a company (Axy Health) based on these principles. Fuck if I know how any of it works though, and it doesn’t work for everyone. Congrats though TurnTrout!
My dad, it seems, might have a psychosomatic stomach ache. How can I convince him to convince himself that he has no problem?
If you want to try out the hypothesis, I recommend that he (or you, if he’s not receptive to it) read Sarno’s book. I want to reiterate that it does not work in every situation, but you’re welcome to take a look.
Thick and Thin Concepts
Take for example concepts like courage, diligence, and laziness. These concepts are considered thick concepts because they have both a descriptive component and a moral component. To call someone courageous is most often meant* not only to claim that the person undertook a great risk, but that it was morally praiseworthy. So the thick concept is often naturally modeled as a conjunction of a descriptive claim and a moral claim.
However, this isn’t the only way to understand these concepts. An alternative would be along the following lines: imagine requiring D + M >= 10, with D >= 3 and M >= 3. So there would be a minimal amount that the descriptive claim has to fit, a minimal amount that the moral claim has to fit, and a minimal total. This doesn’t seem like an unreasonable model of how thick concepts might apply.
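Written out (with D(x) and M(x) as the degrees of descriptive and moral fit on the hypothetical scale above):

$$\text{applies}(x) \iff D(x) \ge 3 \;\wedge\; M(x) \ge 3 \;\wedge\; D(x) + M(x) \ge 10$$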
Alternatively, there might be an additional requirement that the satisfaction of the moral component is sufficiently related to the descriptive component. For example, suppose in order to be diligent you need to work hard in such a way that the hard work causes the action to be praiseworthy. Then consider the following situation. I bake you a cake and this action is praiseworthy because you really enjoy it. However, it would have been much easier for me to have bought you a cake—including the effort to earn the money—and you would actually have been happier had I done so. Further, assume that I knew all of this in advance. In this case, can we really say that you’ve demonstrated the virtue of diligence?
Maybe the best way to think about this is Wittgensteinian: that thick concepts only make sense from within a particular form of life and are not so easily reduced to their components as we might think.
* This isn’t always the case though.
Tonight my family and I played a trivia game (Wits & Wagers) with GPT-3 as one of the players! It lost, but not by much. It got 3 questions right out of 13. One of the questions it got right it didn’t get exactly right, but was the closest and so got the points. (This is interesting because it means it was guessing correctly rather than regurgitating memorized answers. Presumably the other two it got right were memorized facts.)
Anyhow, having GPT-3 playing made the whole experience more fun for me. I recommend it. :) We plan to do this every year with whatever the most advanced publicly available AI (that doesn’t have access to the internet) is.
How did GPT-3 participate?
I typed in the questions to GPT-3 and pressed “generate” to see its answers. I used a pretty simple prompt.
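The format was roughly this few-shot style (the example below is illustrative, not the exact prompt or questions from our game):

```
Q: In what year did the first modern Olympic Games take place?
A: 1896
Q: <trivia question from the game>
A:
```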
PSA for Edge browser users: if you care about privacy, make sure Microsoft does not silently enable syncing of browsing history etc. (Settings->Privacy, search and services).
They seemingly did so to me a few days ago (probably along with the Windows “Feature update” 20H2); it may be something that they currently do to some users and not others.