Yes, with that operationalisation, the update has no impact on actions. (Which makes it even more clear that the parsimonious choice is to skip it.)
(I do think it’s unclear whether selfish agents “should” be updateless in transparent Newcomb.)
Yeah. It might be clearer to think about this as a 2-by-2 grid, with “Would you help a recent copy of yourself that has had one divergent experience from you?” on one axis and “Would you help a version of yourself that would naively be seen as non-existent?” (e.g. in transparent Newcomb problems) on another.
It seems fairly clear that it’s reasonable to answer “yes” to both of these.
It’s possible that a selfish agent could sensibly answer “no” to both of them.
But perhaps we can exclude the other options.
Answering “yes” to the former and “no” to the latter would correspond to only caring about copies of yourself that ‘exist’ in the naive sense. (This is what the version of EDT+SSA that I wrote about in my top-level comment would do.) Perhaps this could be excluded as relying on philosophical confusion about ‘existence’.
Answering “no” to the former and “yes” to the latter might correspond to something like… only caring about versions of yourself that you have some particular kind of (counterfactual) continuity or connection with. (I’m making stuff up here.) Anyway, maybe this could be excluded as necessarily having to rely on some confusions about personal identity.
Interesting! Here’s one way to look at this:
EDT+SSA-with-a-minimal-reference-class behaves like UDT in anthropic dilemmas where updatelessness doesn’t matter.
I think SSA with a minimal reference class is roughly equivalent to “notice that you exist; exclude all possible worlds where you don’t exist; renormalize”
In large worlds where your observations have sufficient randomness that observers of all kinds exist in all worlds, the SSA update step cannot exclude any world. You’re updateless by default. (This is the case in the 99% example above.)
In small or sufficiently deterministic worlds, the SSA update step can exclude some possible worlds.
In “normal” situations, the fact that it excludes worlds where you don’t exist doesn’t have any implications for your decisions — because your actions will normally not have any effects in worlds where you don’t exist.
But in situations like transparent Newcomb problems, this means that you will now not care about non-existent copies of yourself.
Basically, EDT behaves fine without updating. Excluding worlds where you don’t exist is one kind of updating that you can do that doesn’t change your behavior in normal situations. Whether you do this or not will determine whether you act updateless in situations like transparent Newcomb that happen in small or sufficiently deterministic worlds. (In large and sufficiently random worlds, you’ll act updateless regardless.)
Viewed like this, the SSA part of EDT+SSA looks unnecessary and strange. Especially since I think you do want to act updateless in situations like transparent Newcomb.
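To make that concrete, here’s a toy sketch (Python, with made-up world probabilities) of the “notice that you exist; exclude all possible worlds where you don’t exist; renormalize” step — just the bookkeeping, nothing decision-theoretic:

```python
# Toy illustration of SSA-with-a-minimal-reference-class as an update rule:
# "notice that you exist; exclude all possible worlds where you don't exist; renormalize".
# World probabilities and existence flags are made up for illustration.

priors = {
    "world_A": 0.5,  # you exist here
    "world_B": 0.3,  # you exist here
    "world_C": 0.2,  # you don't exist here
}
i_exist_in = {"world_A": True, "world_B": True, "world_C": False}

# Exclude worlds where you don't exist...
surviving = {w: p for w, p in priors.items() if i_exist_in[w]}
# ...and renormalize.
total = sum(surviving.values())
posterior = {w: p / total for w, p in surviving.items()}

print(posterior)  # {'world_A': 0.625, 'world_B': 0.375}
```

Worlds like world_C are exactly the kind that a transparent-Newcomb-style problem makes decision-relevant, and this update rule throws them away up front.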
Re your edit: That bit seems roughly correct to me.
If we are in a simulation, SIA doesn’t have strong views on late filters for unsimulated reality. (This is my question (B) above.) And since SIA thinks we’re almost certainly in a simulation, it’s not crazy to say that SIA doesn’t have strong views on late filters for unsimulated reality. SIA is very ok with small late filters, as long as we live in a simulation, which SIA says we probably do.
But yeah, it is a little bit confusing, in that we care more about late-filters-in-unsimulated-reality if we live in unsimulated reality. And in the (unlikely) case that we do, then we should ask my question (C) above, in which case SIA does have strong views on late filters.
I think it’s important to be clear about what SIA says in different situations, here. Consider the following 4 questions:
A) Do we live in a simulation?
B) If we live in a simulation, should we expect basement reality to have a large late filter?
C) If we live in basement reality, should we expect basement reality (ie our world) to have a large late filter?
D) If we live in a simulation, should we expect the simulation (ie our world) to have a large late filter?
In this post, you persuasively argue that SIA answers “yes” to (A) and “not necessarily” to (B). However, (B) is almost never decision-relevant, since it’s not about our own world. What about (C) and (D)? (It’s easier to see how those could be decision-relevant for someone who buys SIA. I personally agree with you that something like Anthropic Decision Theory is the best way to reason about decisions, but responsible usage of SIA+CDT is one way to get there, in anthropic dilemmas.)
To answer (C): If we condition on living in basement reality, then SIA favors hypotheses that imply many observers in basement reality. The simulated copies are entirely irrelevant, since we have conditioned them away. (You can verify this with Bayes’ theorem.) So we are back with the SIA doomsday argument again, and we face large late filters.
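For what it’s worth, here’s a minimal sketch of that Bayes’-theorem check (Python, with made-up observer counts): the SIA weight of prior × total observers, multiplied by the probability of being a basement observer given the hypothesis, leaves just prior × basement observers, so the simulated copies drop out:

```python
# Toy check that, after conditioning on being in basement (unsimulated) reality,
# SIA weights hypotheses by their number of *basement* observers only --
# the simulated copies cancel out. All numbers are made up for illustration;
# "large late filter" is given more basement observers at our stage, as in the
# SIA doomsday argument.

def sia_posterior_given_basement(hypotheses):
    weights = {}
    for name, h in hypotheses.items():
        total = h["n_basement"] + h["n_simulated"]
        # SIA step: prior * total observers in my situation,
        # then condition on basement: multiply by n_basement / total.
        weights[name] = h["prior"] * total * (h["n_basement"] / total)
        # The 'total' factors cancel, leaving prior * n_basement.
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}

hypotheses = {
    "small_late_filter": {"prior": 0.5, "n_basement": 1e3, "n_simulated": 1e12},
    "large_late_filter": {"prior": 0.5, "n_basement": 1e6, "n_simulated": 1e12},
}
print(sia_posterior_given_basement(hypotheses))
# large_late_filter dominates, and changing n_simulated doesn't affect the answer.
```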
To answer (D): Detailed simulations of civilisations that spread to the stars are vastly more expensive than detailed simulations of early civilisations. This means that the latter are likely to be far more common, and we’re almost certainly living in a simulation where we’ll never spread to the (simulated) stars. (This is plausibly because the simulation will be turned off before we get the chance.) You could discuss what terminology to use for this, but I’d be inclined to call this a large late filter, too.
So my preferred framing isn’t really that the simulation hypothesis “undercuts” the SIA doomsday argument. It’s rather that the simulation hypothesis provides one plausible mechanism for it: that we’re in a simulation that will end soon. But that’s just a question of framing/terminology. The main point of this comment is to provide answers to questions (C) and (D).
The point of the defection/cooperation thing isn’t just that cooperation is a kindness to Sam Altman personally, which can be overridden by the greater good. The point is that generally cooperative behavior, and generally high amounts of trust, can make everyone better off. If it were true, as you said, that:
he’s a head of state in control of WMDs and should be (and expect to be) treated as such
and as a consequence, he e.g. expected someone to record him during the Q&A, then he would presumably not have done the Q&A in the first place, or would have shared much less information. This would have led to humanity learning less about OpenAI’s plans.
And this is definitely not a single-shot interaction. This was Sam’s second Q&A at an ACX meetup, and there was no reason to expect it to be the last. Moreover, there have been a lot of interactions between the alignment community (including some who write on LW) and OpenAI in the past. And given that OpenAI’s decisions about alignment-related things matter a lot (as you say), it seems important to keep up good relations and high degrees of trust.
honestly curious about reasons for downvotes if anyone is willing to share
I initially downvoted, have since retracted it. Since trust/cooperation can be quite fragile and dependent on expectations about how people will behave, I became worried when I read you as basically announcing that you and other people should and will defect in the future. And I wanted a quick way to mark disagreement with that, to communicate that this isn’t a generally accepted point of view. But your point was phrased in a perfectly civil manner, so I should really have just taken the time to write a response, sorry.
As a general point, these notes should not be used to infer anything about what Sam Altman thought was important enough to talk a lot about, or what his general tone/attitude was. This is because:
The notes are filtered through what the note-takers thought was important. There’s a lot of stuff that’s missing.
What Sam spoke about was mostly a function of what he was asked about (it was a Q&A after all). If you were there live you could maybe get some idea of how he was inclined to interpret questions, what he said in response to more open questions, etc. But here, the information about what questions were asked is entirely missing.
General attitude/tone is almost completely destroyed by the compression of answers into notes.
For example, IIRC, the thing about GPT being empathic was in response to some question like “How can we make AI empathic?” (i.e., it was not his own idea to bring up empathy). The answer was obviously much longer than the notes’ summary (and so less dismissive). And directionally, it is certainly the case already that GPT-3 will act more empathic if you tell it to do so.
Did he really speak that little about AI Alignment/Safety? Does anyone have additional recollections on this topic?
He did make some general claims that it was one of his top few concerns, that he felt like OpenAI had made some promising progress on alignment over the last year, that it was still an important goal for OpenAI’s safety work to catch up with its capabilities work, that it was good for more people to go into safety work, etc. Not very many specifics as far as I can remember.
Thanks, all this seems reasonable, except possibly:
Merging (maybe via BCI) most likely path to a good outcome.
Which in my mind still carries connotations like ~”merging is an identifiable path towards good outcomes, where the most important thing is to get the merging right, and that will solve many problems along the way”. Which is quite different from the claim “merging will likely be a part of a good future”, analogous to e.g. “pizza will likely be a part of a good future”. My interpretation was closer to the latter (although, again, I was uncertain how to interpret this part).
Were you there throughout the post-Q&A discussion? (I missed it.)
I wrote down some places where my memory disagreed with the notes. (The notes might well be more accurate than my memory, but I thought I’d flag these in case other people’s memories agree with mine. Also, this list is not exhaustive; e.g. there are many things in the notes that I don’t remember, but where I’d be unsurprised if I just missed them.)
AGI will not be a binary moment. We will not agree on the moment it did happen. It will be gradual. Warning sign will be, when systems become capable of self-improvement.
I don’t remember hearing that last bit as a generic warning sign, but I might well have missed it. I do remember hearing that if systems became capable of self-improvement (sooner than expected?), that could be a big update towards believing that fast take-off is more likely (as mentioned in your next point).
AGI will not be a pure language model, but language will be the interface.
I remember both these claims as being significantly more uncertain/hedged.
AGI (program able to do most economically useful tasks …) in the first half of the 2030ies is his 50% bet, bit further out than others at OpenAI.
I remembered this as being a forecast for ~transformative AI, and as explicitly not being “AI that can do anything that humans can do”, which could take quite a bit longer. (Your description of AGI is sort-of in-between those, so it’s hard to tell whether it’s inconsistent with my memory.)
Merging via CBI most likely path to a good outcome.
I was a bit confused about this answer in the Q&A, but I would not have summarized it like this. I remember claims that some degree of merging with AI is likely to happen conditional on a good outcome, and maybe a claim that CBI was the most likely path towards merging.
The 1.8% number comes from your own calculations, though, right? Shouldn’t we be comparing the lizardman constant with the reported percentages, rather than this calculated number?
In this case, that might be 2.8%. But I don’t know what the methodology of the survey was. If they just asked a bunch of random people and got them to self-report whether they had covid, maybe we should actually use the percentage of people who claimed to have long covid among everyone asked, which could be lower than 1.8%.
Of course, all of these numbers are smaller than the lizardman constant anyway.
This page (I don’t know how trustworthy it is) claimed on 19th May:
“Three days [ago] Goa issued a Government Order for mass prophylaxis using ivermectin for the people of Goa”, said Dr Suryakant. Goa is the first state in India to use ivermectin in this way, he emphasised. Uttar Pradesh was the first state to issue a Government Order for the use of ivermectin for treatment and prophylaxis for asymptomatic and mild cases of covid-19 and for prophylaxis of health care workers and home contacts.
On your “WHY”, you seem to be presenting reasons why other people not believing your model shouldn’t count as strong evidence against it. Which is all fair. But I’m still curious what positive evidence there is for believing your model in the first place. Maybe this would be obvious if I knew more biology, but as it is, I don’t know why I should place higher credence in your model than in any other model (e.g. the one at the bottom of this comment, if that counts).
...Then that bimodal response could directly and cleanly justify claiming “antibody response was 3.5-fold higher” in some very fuzzy and general way (because 28% x 3.5 = 98%)
As far as I can tell, “antibody response was 3.5-fold higher” just means that, on average, people in the extended dosing schedule had 3.5x more antibodies. I can’t tell whether you interpret it in some other way, or if you think this is a misleading way to describe things, or if you’re making some other point...?
The graph you included as a supporting claim was, I think, just the B panel from the totality of Figure 2 which is nice in many ways.
The data in Panel A therefore seems consistent to me that “eventually” there is some roughly normal and acceptable level of “vaccinated at all, in an essentially bimodal way” that two doses reaches faster than typical?
Ok now I’m confused.
Do you think that all people on these graphs have reached a “normal and acceptable level of ‘vaccinated at all, in an essentially bimodal way’ ”?
If so, do you not think that there’s any important immunity difference between a single-vaccinated person around 1-10 on the graph and a doubly-vaccinated person around 1000-10000?
Or if you think that only some of the people on this graph are immune, where do you think the line between immune and not-immune should be drawn on these graphs? (The distribution seems to be fairly continuous everywhere, to me, so it seems arbitrary to draw the line anywhere.)
Or if you think the important immunity difference isn’t captured by antibody-levels, what is it about?
And re “that two doses reaches faster than typical”: are you implying that the single-dosed people’s antibody response would’ve kept increasing beyond the 5-6 week mark and eventually gotten as high as the doubly-vaccinated people’s? That seems unlikely to me. (Other than maybe for the few people whose antibodies did increase, but I’m happy to ignore them until I understand the most normal response curve better.)
My hunch is that extended Bleed3 would show a decline from the extended Bleed2 measurement…
The thing I’m asking for is: what’s the best second epicycle to add? What is the mechanism? If someone is already seroconverted, what would you measure to detect “that their mechanistic biological state is not ALREADY in the configuration that you’d be hoping to cause to improve via the administration of a third dose”?
Here’s one suggestion:
1. The more antibodies you have, the lower the probability of getting sick, the lower the probability of getting severe disease, etc.
2. More vaccines increase the number of antibodies you have.
3. Therefore you want to have more vaccines.
I would’ve thought (1) to be fairly uncontroversial? And the linked study seems to provide good evidence for (2) when going from 1 to 2 doses, increasing antibodies by roughly a factor of 100. And of course adding more vaccines will eventually stop adding more antibodies. But right now I don’t have any reason to believe in a big difference between going from 1->2 vaccines vs going from 2->3 vaccines (other than 2 vaccines being the general standard). So I wouldn’t be surprised if taking a 3rd vaccine could increase your antibodies by another order of magnitude.
Maybe you think this doesn’t provide enough of a “mechanism”? Biology being complicated, I’m very happy to take empirical data for what it is, and make extrapolations even if I don’t know what the mechanism is. Personally, I also don’t feel like I have any more mechanism for “vaccines have a fixed probability of causing antibodies if you don’t already have them, otherwise they don’t do much” than for “vaccines typically increase antibodies by a lot regardless of whether you already have them or not”. So when the evidence clearly indicates the latter, I will definitely believe it.
And yeah, also, if someone has the option, I agree that it seems probably better to get a different vaccine than the same vaccine again!
Huh, I’m pretty surprised by this model. Why do you think it’s correct?
Here’s an image of some measure of people’s antibody responses from page 9 of this paper, where the first set of points is people’s response 5-6 wks after dose 1, and the second set of points is people’s response 2-3 wks after dose 2.
It looks like people who get an antibody response to the first dose still get a much improved response from a second dose. And there’s no sign of a bimodal response to any of the doses. Is that consistent with your model?
Also, the way vaccines can protect from severe disease without protecting from infection seems to suggest that there’s more than a binary question of response/not-response.
How many were asymptomatic? And how did people know about them?
(The human baseline is a loss of 0.7 bits, with lots of uncertainty on that figure.)
I’d like to know what this figure is based on. In the linked post, Gwern writes:
The pretraining thesis argues that this can go even further: we can compare this performance directly with humans doing the same objective task, who can achieve closer to 0.7 bits per character.
But in that linked post, there’s no mention of “0.7” bits in particular, as far as I or cmd-f can see. The most relevant passage I’ve read is:
Claude Shannon found that each character was carrying more like 1 (0.6-1.3) bit of unguessable information (differing from genre to genre); Hamid Moradi found 1.62-2.28 bits on various books; Brown et al 1992 found <1.72 bits; Teahan & Cleary 1996 got 1.46; Cover & King 1978 came up with 1.3 bits; and Behr et al 2002 found 1.6 bits for English and that compressibility was similar to this when using translations in Arabic/Chinese/French/Greek/Japanese/Korean/Russian/Spanish (with Japanese as an outlier). In practice, existing algorithms can make it down to just 2 bits to represent a character, and theory suggests the true entropy was around 0.8 bits per character.
I’m not sure what the relationship is between supposedly unguessable information and human performance, but assuming that all these sources were actually just estimating human performance, and without looking into the sources more… this isn’t just lots of uncertainty, but vast amounts of uncertainty, where it’s very plausible that GPT-3 has already beaten humans. This wouldn’t be that surprising, given that GPT-3 must have memorised a lot of statistical information about how common various words are, which humans certainly don’t know by default.
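To get a feel for how wide that range is, here’s the same per-character figures converted into per-character perplexity (2^bits) — a rough illustrative conversion only; an actual comparison to GPT-3’s reported loss would additionally need a tokens-per-character assumption, which I’m not attempting here:

```python
# Convert the cited bits-per-character estimates into per-character perplexity,
# just to make the spread vivid. Perplexity = 2 ** (bits per character).
estimates_bits_per_char = {
    "Shannon (low end of range)": 0.6,
    "the quoted 0.7 'human baseline'": 0.7,
    "Cover & King 1978": 1.3,
    "Teahan & Cleary 1996": 1.46,
    "Behr et al 2002 (English)": 1.6,
    "Moradi (high end of range)": 2.28,
}
for source, bits in estimates_bits_per_char.items():
    print(f"{source}: {bits} bits/char ~= perplexity {2 ** bits:.2f}")
# The high end is roughly 3x the low end in perplexity terms -- a huge gap
# when model improvements are often measured in hundredths of a bit.
```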
I have a lot of respect for people looking into a literature like this and forming their own subjective guess, but it’d be good to know if that’s what happened here, or if there is some source that pinpoints 0.7 in particular as a good estimate.
Terminologically, I like topic/content/purpose. Where ‘purpose’ includes potential results from the job (including pay) and how much you care about and are motivated by them. It could be difficult to split content and purpose, though. E.g. being able to see and talk with the people you’re helping could be very motivating, but it doesn’t fit purely into either content or purpose.
SIA isn’t needed for that; standard probability theory will be enough (as our becoming grabby is evidence that grabbiness is easier than expected, and vice-versa).
I think there’s a confusion with SIA and reference classes and so on. If there are no other exact copies of me, then SIA is just standard Bayesian update on the fact that I exist. If theory T_i has prior probability p_i and gives a probability q_i of me existing, then SIA changes its probability to q_i*p_i (and renormalises).
Yeah, I agree with all of that. In particular, SIA updating on us being alive on Earth is exactly as if we sampled a random planet from space, discovered it was Earth, and discovered it had life on it. Of course, there are also tons of planets that we’ve seen that don’t look like they have life on them.
But “Earth is special” theories also get boosted: if a theory claims life is very easy but only on Earth-like planets, then those also get boosted.
I sort-of agree with this, but I don’t think it matters in practice, because we update down on “Earth is unlikely” when we first observe that the planet we sampled was Earth-like.
Here’s a model: Assume that there’s a conception of “Earth-like planet” such that life-on-Earth is exactly equal evidence for life emerging on any Earth-like planet, and 0 evidence for life emerging on other planets. This is clearly a simplification, but I think it generalises. “Earth-like planet” could be any rocky planet, any rocky planet with water, any rocky planet with water that was hit by an asteroid X years into its lifespan, etc.
Now, if we sample a planet (Earth) and notice that it’s Earth-like and has life on it, we do two updates:
Noticing that Earth is an Earth-like planet should update us towards thinking that Earth-like planets are common in the universe.
Noticing that life emerged on Earth should update us towards thinking that life has a high probability of emerging on Earth-like planets.
If we don’t know anything else about the universe yet, these two updates should collectively imply an update towards life-is-common that is just as big as if we hadn’t done this decomposition, and just updated on the hypothesis “how common is life?” in the first place.
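Here’s a quick sketch (Python, with a made-up prior grid over f = fraction of planets that are Earth-like and q = chance of life emerging on an Earth-like planet) of why the decomposed update adds up to the same thing as updating on the joint observation directly:

```python
import numpy as np

# Toy grid of hypotheses; the grid values and uniform prior are made up for illustration.
f_vals = np.linspace(0.05, 0.95, 10)   # f: fraction of planets that are Earth-like
q_vals = np.linspace(0.05, 0.95, 10)   # q: P(life | Earth-like planet)
F, Q = np.meshgrid(f_vals, q_vals)
prior = np.ones_like(F) / F.size

def normalise(p):
    return p / p.sum()

# Update directly on "the sampled planet (Earth) is Earth-like AND has life":
post_joint = normalise(prior * F * Q)

# Update in two steps: first on "it's Earth-like", then on "life emerged there":
post_step = normalise(normalise(prior * F) * Q)

# Both give the same posterior, and hence the same expected commonness of life (f*q):
print(np.allclose(post_joint, post_step))                 # True
print((post_joint * F * Q).sum(), (post_step * F * Q).sum())
```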
Now, let’s say we start observing the rest of the universe. Let’s assume this happens via sampling random planets and observing (a) whether they are/aren’t Earth-like (b) whether they do/don’t have life on them.
If we sample a non-Earth-like planet, we update towards thinking that Earth-like planets aren’t common.
If we sample an Earth-like planet without life, we update towards thinking that Earth-like planets have a lower probability of supporting life.
I haven’t done the math, but I’m pretty sure that it doesn’t matter which of these we observe. The update on “How common is life?” will be the same regardless. So the existence of “Earth is special”-hypotheses doesn’t matter for our best guess of “How common is life?”, if we only consider the impact of observing planets with/without Earth-like features and life.
Of course, observing planets isn’t the only way we can learn about the universe. We can also do science, and reason about the likely reasons that life emerged, and how common those things ought to be.
That means that if you can come up with a strong theoretical argument (that isn’t just based on observing how many planets are Earth-like and/or had life on them, including Earth) that some feature of Earth significantly boosts the probability of life and that that feature is extremely rare in the universe at-large, then that would be a solid argument for why to expect life to be rare in the universe. However, note that you’d have to argue that it was extremely rare. If we’re assuming that grabby aliens could travel over many galaxies, then we’ve already observed evidence that grabby life is sufficiently rare to not yet have appeared in any of a very large number of planets in any of a very large number of galaxies. Your theoretical reasons to expect life to be rare would have to assert that it’s even rarer than that to impact the results.
Good point, I didn’t think about that. That’s the old SIA argument for there being a late filter.
The reason I didn’t think about it is because I use SIA-like reasoning in the first place because it pays attention to the stakes in the right way: I think I care about acting correctly in universes with more copies of me almost-proportionally more. But I also care more about universes where civilisations-like-Earth are more likely to colonise space (ie become grabby), because that means that each copy of me can have more impact. That kind-of cancels out the SIA argument for a late filter, mostly leaving me with my priors, which points toward a decent probability that any given civilisation colonises space in a grabby manner.
Also: if Earth-originating intelligence ever becomes grabby, that’s a huge Bayesian update in favor of other civilisations becoming grabby, too. So regardless of how we describe the difference between T1 and T2, SIA will definitely think that T1 is a lot more likely once we start colonising space, if we ever do that.
But by “theory of the universe”, Robin Hanson meant not only the theory of how the physical universe was, but the anthropic probability theory. The main candidates are SIA and SSA. SIA is indifferent between T1 and T2. But SSA prefers T1 (after updating on the time of our evolution).
SIA is not indifferent between T1 and T2. There are way more humans in world T1 than in world T2 (since T2 requires life to be very uncommon, which would imply that humans are even more uncommon), so SIA thinks world T1 is much more likely. After all, the difference between SIA and SSA is that SIA thinks that universes with more observers are proportionally more likely; so SIA will always think aliens are more likely than SSA does.
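As a toy numerical version of that (observer counts made up): if T1 and T2 start with equal priors but T1 contains vastly more observers in our situation, the SIA weighting swamps the prior:

```python
# SIA: posterior odds = prior odds * ratio of observers in my epistemic situation.
# The observer counts below are made up; only their ratio matters.
prior = {"T1_life_common": 0.5, "T2_life_rare": 0.5}
n_observers = {"T1_life_common": 1e9, "T2_life_rare": 1e3}

weights = {t: prior[t] * n_observers[t] for t in prior}
z = sum(weights.values())
posterior = {t: w / z for t, w in weights.items()}
print(posterior)  # T1 gets probability ~0.999999; SSA would not apply this weighting.
```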
Previously, I thought this was in conflict with the fact that humans didn’t seem to be particularly early (ie., if life is common, it’s surprising that there aren’t any aliens around 13.8 billion years into the universe’s life span). I ran the numbers, and concluded that SIA still thought that we’d be very likely to encounter aliens (though most of the linked post instead focuses on answering the decision-relevant question “how much of potentially-colonisable space would be colonised without us?”, evaluated ADT-style).
After having read Robin’s work, I now think humans probably are quite early, which would imply that (given SIA/ADT) it is highly overdetermined that aliens are common. As you say, Robin’s work also implies that SSA agrees that aliens are common. So that’s nice: no matter which of these questions we ask, we get a similar answer.
Thanks, computer-speed deliberation being a lot faster than space-colonisation makes sense. I think any deliberation process that uses biological humans as a crucial input would be a lot slower, though; slow enough that it could well be faster to get started with maximally fast space colonisation. Do you agree with that? (I’m a bit surprised at the claim that colonization takes place over “millennia” at technological maturity; even if the travelling takes millennia, it’s not clear to me why launching something maximally fast – that you presumably already know how to build, at technological maturity – would take millennia. Though maybe you could argue that millennia-scale travelling time implies millennia-scale variance in your arrival time, in which case launching decades or centuries after your competitors doesn’t cost you too much expected space?)
If you do agree, I’d infer that your mainline expectation is that we successfully enforce a worldwide pause before mature space-colonisation; since the OP suggests that biological humans are likely to be a significant input into the deliberation process, and since you think that the beaming-out-info schemes are pretty unlikely.
(I take your point that, as far as space-colonisation is concerned, such a pause probably isn’t strictly necessary.)