RobertM (LessWrong dev & admin as of July 5th, 2022)
So in general I'm noticing a pattern where you make claims about things that happened, but it turns out those things didn't happen, or there's no evidence that they happened and no reason one would believe they did a priori, and you're actually just making an inference and presenting it as the state of reality. These seem to universally be inferences which cast others' motives or actions in a negative light. They seem to be broadly unjustified by the provided evidence and surrounding context, or to rely on models of reality (both physical and social) which I think are very likely in conflict with the models held by the people those inferences are about.

Sometimes you draw correspondences between the beliefs and/or behaviors of different people or groups, in what seems like an attempt to justify the belief/behavior of the first, or to frame the second as a hypocrite for complaining about the first (though you don't usually say why these comparisons are relevant). These correspondences turn out to be only superficial similarities which lack any of the mechanistic similarities that would make them useful for comparison, or actually conceal the fact that the two parties in question have opposite beliefs on a more relevant axis.

For lack of a better framing, it seems like you're failing to disentangle your social reality from the things that actually happened. I am deeply sympathetic to the issues you experienced, but the way this post (and the previous post) was written makes it extremely difficult to engage with productively, since so many of the relevant claims turn out not to be claims about things that actually occurred, despite their appearance as such.
I went through the post and picked out some examples (stopping only because it was taking too much of my evening, not because I ran out), with these being the most salient:
> 1. Many others have worked to conceal the circumstances of their deaths

As far as I can tell, this is almost entirely unsubstantiated, with the possible exception of Maia, and in that case it would have been Ziz's circle doing the concealment, not any of the individuals you express specific concerns about.
> 2. My psychotic break in which I imagined myself creating hell was a natural extension of this line of thought.
The way this is written makes it sound like you think that it ought to have been a (relatively) predictable consequence.
> 3. By the law of excluded middle, the only possible alternative hypothesis is that the problems I experienced at MIRI and CFAR were unique or at least unusually severe, significantly worse than companies like Google for employees’ mental well-being.
In theory, the problems you experienced could have come from sources other than your professional environment. That is a heck of a missing middle.
> 4. This view is rapidly becoming mainstream, validated by research performed by MAPS and at Johns Hopkins, and FDA approval for psychedelic psychotherapy is widely anticipated in the field.
This seems to imply that Michael’s view on the subject corresponds in most relevant ways to the views taken by MAPS/etc. I don’t know what Michael’s views on the subject actually are, but on priors I’m extremely skeptical that the correspondence is sufficient to make this a useful comparison (which, as an appeal to authority, is already on moderately shaky grounds).
> 5. including a report from a friend along the lines of “CFAR can’t legally recommend that you try [a specific psychedelic], but...”
Can you clarify what relationship this friend had with CFAR? This could be concerning if they were a CFAR employee at the time. If they were not a CFAR employee, were they quoting someone who was? If neither, I’m not sure why it’s evidence of CFAR’s views on the subject.
> 7. MIRI leaders were already privately encouraging me to adopt a kind of conflict theory in which many AI organizations were trying to destroy the world
This is not supported by your later descriptions of those interactions.
First, at no point do you describe any encouragement to adopt a conflict-theoretic view. I assume this section is the relevant one: “MIRI leaders including Eliezer Yudkowsky and Nate Soares told me that this was overly naive, that DeepMind would not stop dangerous research even if good reasons for this could be given. Therefore (they said) it was reasonable to develop precursors to AGI in-house to compete with organizations such as DeepMind in terms of developing AGI first. So I was being told to consider people at other AI organizations to be intractably wrong, people who it makes more sense to compete with than to treat as participants in a discourse.” This does not describe encouragement to adopt a conflict-theoretic view. It describes encouragement to adopt some specific beliefs (e.g. about DeepMind’s lack of willingness to integrate information about AI safety into their models and then behave appropriately, and possible ways to mitigate the implied risks), but these are object-level claims, not ontological claims.
Second, this does not describe a stated belief that “AI organizations were trying to destroy the world”. Approximately nobody believes that AI researchers at e.g. DeepMind are actively trying to destroy the world. A more accurate representation of the prevailing belief would be something like “they are doing something that may end up destroying the world, which, from their perspective, would be a totally unintentional and unforeseen consequence of their actions”. This distinction is important, I’m not just nitpicking.
> 8. I was given ridiculous statements and assignments including the claim that MIRI already knew about a working AGI design and that it would not be that hard for me to come up with a working AGI design on short notice just by thinking about it, without being given hints.
I would be pretty surprised if this turned out to be an accurate summary of those interactions. In particular, that:
1) MIRI (Nate) believed, as of 2017, that it would be possible to develop AGI given known techniques & technology in 2017, with effectively no new research or breakthroughs required, just implementation, and
2) You were told that you should be able to come up with such a design yourself on short notice without any help or collaborative effort.

Indeed, the anecdote you later relay about your interaction with Nate does not support either of those claims, though it carries its own confusions (why would he encourage you to think about how to develop a working AGI using existing techniques if it would be dangerous to tell you outright? The fact that there's an extremely obvious contradiction here makes me think that there was a severe miscommunication on at least one side of this conversation).
> 10. His belief that mental states somewhat in the direction of psychosis, such as those had by family members of schizophrenics, are helpful for some forms of intellectual productivity is also shared by Scott Alexander and many academics.
This seems to be (again) drawing a strong correspondence between Michael's beliefs and actions taken on the basis of those beliefs, and Scott's beliefs. Scott's citation of research "showing greater mental modeling and verbal intelligence in relatives of schizophrenics" does not imply that Scott thinks it is a good idea to attempt to induce sub-clinical schizotypal states in people—in fact I would bet a lot of money that Scott thinks doing so is an extremely bad idea, which is a more relevant basis on which to compare his beliefs with Michael's.
> 11. Scott asserts that Michael Vassar discourages people from seeking mental health treatment. Some mutual friends tried treating me at home for a week as I was losing sleep and becoming increasingly mentally disorganized before deciding to send me to a psychiatric institution, which was a reasonable decision in retrospect.
Were any of those people Michael Vassar? If not, I’m not sure how it’s intended to rebut Scott’s claim (though in general I agree that Scott’s claim about Michael’s discouragement could stand to be substantiated in some way). If so, retracted, but then why is that not specified here, given how clearly it rebuts one of the arguments?
> 13. This is inappropriately enforcing the norms of a minority ideological community as if they were widely accepted professional standards.
This does not read like a charitable interpretation of Scott’s concerns. To wit, if I was friends with someone who I knew was a materialist atheist rationalist, and then one day they came to me and started talking about auras and demons in a way which made it sound like they believed they existed (the way materialist phenomena exist, rather than as “frames”, “analogies”, or “I don’t have a better word for this purely internal mental phenomenon, and this word, despite all the red flags, has connotations that are sufficiently useful that I’m going to use it as a placeholder anyways”), I would update very sharply on them having had a psychotic break (or similar). The relevant reference class for how worrying it is that someone believes something is not what the general public believes, it’s how sharp a departure it is from that person’s previous beliefs (and in what direction—a new, sudden, literal belief in auras and demons is a very common symptom of a certain cluster of mental illnesses!). Note: I am not making any endorsement about the appropriateness of involuntary commitment on the basis of someone suddenly expressing these beliefs. I’m not well-calibrated on the likely distribution of outcomes from doing so.
Moving on from the summary:
> I notice that I have encountered little discussion, public or private, of the conditions of Maia Pasek’s death. To a naive perspective this lack of interest in a dramatic and mysterious death would seem deeply unnatural and extremely surprising, which makes it strong evidence that people are indeed participating in this cover-up.
I do not agree that this is the naive perspective. People, in general, do not enjoy discussing suicide. I have no specific reason to believe that people in the community enjoy this more than average, or expect to get enough value out of it to outweigh the unpleasantness. Unless there is something specifically surprising about a specific suicide that seems relevant to the community more broadly, my default expectation would be that people largely don't talk about it (the same way they largely don't talk about most things not relevant to their interests). As far as I can tell, Maia was not a public figure in a way that would, by itself, be sufficient temptation to override people's generalized dispreference for gossip on the subject.

Until today I had not seen any explanation of the specific chain of events preceding Maia's suicide; a reasonable prior would have been "mentally ill person commits suicide, possibly related to experimental brain hacking they were attempting to do to themselves (as partially detailed on their own blog)". This seems like fairly strong supporting evidence for that prior.

I suppose the relevant community interest here is "consider not doing novel neuropsychological research on yourself, especially if you're already not in a great place". I agree with that as a useful default heuristic, but it's one that seems "too obvious to say out loud". Where do you think a good place for a PSA is?
In general it’s not clear what kind of cover-up you’re imagining. I have not seen any explicit or implicit discouragement of such discussion, except in the (previously mentioned) banal sense that people don’t like discussing suicide and hardly need additional reasons to avoid it.
> While there is a post about Jay’s death on LessWrong, it contains almost no details about Jay’s mental state leading up to their death, and does not link to Jay’s recent blog post. It seems that people other than Jay are also treating the circumstances of Jay’s death as an infohazard.
Jay, like Maia, does not strike me as a public figure. The post you linked is strongly upvoted (+124 at time of writing). It seems to be written as a tribute, which is not the place where I would link to Jay’s most recent blog post if I were trying to analyze causal factors upstream of Jay’s suicide. Again, my prior is that nothing about the lack of discussion needs an explanation in the form of an active conspiracy to suppress such discussion.
> There is a very disturbing possibility (with some evidence for it) here, that people may be picked off one by one (sometimes in ways they cooperate with, e.g. through suicide), with most everyone being too scared to investigate the circumstances.
Please be specific: what evidence? This is an extremely serious claim. For what it's worth, I don't agree that people can be picked off in "ways they cooperate with, e.g. through suicide", unless you mean that Person A would make a concerted effort to convince Person B that Person B ought to commit suicide. But, to the extent that you believe this, the two examples of suicides you listed seem to be causally downstream of "self-hacking with friends(?) in similarly precarious mental states", not "someone was engaged in a cover-up (of what?) and decided it would be a good idea to try to get these people to commit suicide (why?) despite the incredible risks and unclear benefits".
> These considerations were widely regarded within MIRI as an important part of AI strategy. I was explicitly expected to think about AI strategy as part of my job. So it isn’t a stretch to say that thinking about extreme AI torture scenarios was part of my job.
S-risk concerns being potentially relevant factors in research does not imply that thinking about specific details of AI torture scenarios would be part of your job. Did someone at MIRI make a claim that imagining S-risk outcomes in graphic detail was necessary or helpful to doing research that factored in S-risks as part of the landscape? Roll to disbelieve.
> Another part of my job was to imagine myself in the role of someone who is going to be creating the AI that could make everything literally the worst it could possibly be, in order to avoid doing that, and prevent others from doing so.
Similarly, roll to disbelieve that someone else at MIRI suggested that you imagine this without significant missing context which would substantially alter the naïve interpretation of that claim.
Skipping the anecdote with Nate & AGI, as I addressed it above, but following that:
> Nate and others who claimed or implied that they had such information did not use it to win bets or make persuasive arguments against people who disagreed with them, but instead used the shared impression or vibe of superior knowledge to invalidate people who disagreed with them.
Putting aside the question of whether Nate or others actually claimed or implied that they had a working model of how to create an AGI with 2017-technology, if they had made such a claim, I am not sure why you would expect them to try to use that model to win bets or make persuasive arguments. I would in fact expect them to never say anything about it to the outside world, because why on earth would you do that given, uh, the entire enterprise of MIRI?
> But I was systematically discouraged from talking with people who doubted that MIRI was for real or publicly revealing evidence that MIRI was not for real, which made it harder for me to seriously entertain that hypothesis.
Can you concretize this? When I read this sentence, an example interaction I can imagine fitting this description would be someone at MIRI advising you in a conversation to avoid talking to Person A, because Person A doubted that MIRI was for real (rather than for other reasons, like “people who spend a lot of time interacting with Person A seem to have a curious habit of undergoing psychotic breaks”).
> In retrospect, I was correct that Nate Soares did not know of a workable AGI design.
I am not sure how either of the excerpts supports the claim that "Nate Soares did not know of a workable AGI design" (which, to be clear, I agree with, but for totally unrelated reasons described earlier). Neither of them makes any explicit or implicit claims about knowledge (or lack thereof) of AGI design.
> In a recent post, Eliezer Yudkowsky explicitly says that voicing “AGI timelines” is “not great for one’s mental health”, a new additional consideration for suppressing information about timelines.
This is not what Eliezer says. Quoting directly: “What feelings I do have, I worry may be unwise to voice; AGI timelines, in my own experience, are not great for one’s mental health, and I worry that other people seem to have weaker immune systems than even my own.”
This is making a claim that AI timelines themselves are poor for one’s mental health (presumably, the consideration of AI timelines), not the voicing of them.
> Researchers were told not to talk to each other about research, on the basis that some people were working on secret projects and would have to say so if they were asked what they were working on. Instead, we were to talk to Nate Soares, who would connect people who were working on similar projects. I mentioned this to a friend later who considered it a standard cult abuse tactic, of making sure one’s victims don’t talk to each other.
The reason cults attempt to limit communication between their victims is to prevent the formation of common knowledge of specific abusive behaviors that the cult is engaging in, and similar information-theoretic concerns. Taking for granted the description of MIRI’s policy (and application of it) on internal communication about research, this is not a valid correspondence; they were not asking you to avoid discussing your interactions with MIRI (or individuals within MIRI) with other MIRI employees, which could indeed be worrying if it were a sufficiently general ask (rather than e.g. MIRI asking someone in HR not to discuss confidential details of various employees with other employees, which would technically fit the description above but is obviously not what we’re talking about).
> It should be noted that, as I was nominally Nate’s employee, it is consistent with standard business practices for him to prevent me from talking with people who might distract me from my work; this goes to show the continuity between “cults” and “normal corporations”.
I can't parse this in a way which makes it seem remotely like "standard business practices". I disagree that it is a standard business practice to actively discourage employees from talking to people who might distract them from their work, largely because employees do not generally have a problem with being distracted from their work because they are talking to specific people. I have worked at a number of different companies, each very different from the last in terms of size, domain, organizational culture, etc, and there was not a single occasion where I felt the slightest hint that anyone above me on the org ladder thought I ought not to talk to certain people to avoid distractions, nor did I ever feel like that was part of the organization's expectations of me.
> MIRI researchers were being very generally denied information (e.g. told not to talk to each other) in a way that makes more sense under a “bad motives” hypothesis than a “good motives” hypothesis. Alternative explanations offered were not persuasive.
Really? This seems to totally disregard explanations unrelated to whether someone has “good motives” or “bad motives”, which are not reasons that I would expect MIRI to have at the top of their list justifying whatever their info-sec policy was.
> By contrast, Michael Vassar thinks that it is common in institutions for people to play zero-sum games in a fractal manner, which makes it unlikely that they could coordinate well enough to cause such large harms.
This is a bit of a sidenote but I’m not sure why this claim is interesting w.r.t. AI alignment, since the problem space, almost by definition, does not require coordination with intent to cause large harms, in order to in fact cause large harms. If the claim is that “institutions are sufficiently dysfunctional that they’ll never be able to build an AGI at all”, that seems like a fully-general argument against institutions ever achieving any goals that require any sort of internal and/or external coordination (trivially invalidated by looking out the window).
> made a medication suggestion (for my sleep issues) that turned out to intensify the psychosis in a way that he might have been able to predict had he thought more carefully
I want to note that while this does not provide much evidence for the claim that Michael Vassar actively seeks to induce psychotic states in people, it is in fact a claim that Michael Vassar was directly, causally upstream of your psychosis worsening, which is worth considering in light of what this entire post seems to be arguing against.
I don’t actually see very much of an argument presented for the extremely strong headline claim:
> This post aims to show that, over the next decade, it is quite likely that most democratic Western countries will become fascist dictatorships—this is not a tail risk, but the most likely overall outcome.
You draw an analogy between the "by induction"/"line go up" AI risk argument, and the increase in far-right political representation in Western democracies over the last couple decades. But the "by induction"/"line go up" argument for AI risk is not the reason one should be worried; one should be worried for specific causal reasons to expect unaligned ASI to cause extremely bad outcomes. There is no corresponding causal model presented for why fascist dictatorship is the default future outcome for most Western democracies.
Like, yes, it is a bit silly to see "line go up" and stick one's fingers in one's ears. It certainly can happen here. Donald Trump being elected in 2024 seems like the kind of thing that might do it, though I'd probably be happy to bet at 9:1 against. But if that doesn't happen, I don't know why you expect some other Republican candidate to do it, given that none of them seem particularly inclined.
> Meanwhile I find Duncan vaguely fascinating like he is a very weird bug
I don’t know[1] for sure what purpose this analogy is serving in this comment, and without it the comment would have felt much less like it was trying to hijack me into associating Duncan with something viscerally unpleasant.
[1] My guess is that it's meant to convey something like your internal emotional experience, with regards to Duncan, to readers.
I think this post neglects one of the most serious risks: that adopting a strategy is a correlated decision across agents, that others will correctly see that happening, and that the downside risk is significantly magnified by those dynamics.
Naive 1st-order utilitarianism gives you the wrong answer here. Do not illegally skip out on paying your income taxes in order to donate to charity. Spend your cognitive resources getting a better job, or otherwise legally optimizing for more income. Being a software engineer at many tech companies will enable you to donate six figures per year while maintaining a comfortable lifestyle, without restricting your ability to engage with financial infrastructure, own property, or travel.
At a high level, I’m sort of confused by why you’re choosing to respond to the extremely simplified presentation of Eliezer’s arguments that he presented in this podcast.
I do also have some object-level thoughts.
> When capabilities advances do work, they typically integrate well with the current alignment[1] and capabilities paradigms. E.g., I expect that we can apply current alignment techniques such as reinforcement learning from human feedback (RLHF) to evolved architectures.
But not only do current implementations of RLHF not manage to robustly enforce the desired external behavior of models that would be necessary to make versions scaled up to superintelligence safe, we have approximately no idea what sort of internal cognition they generate as a pathway to those behaviors. (I have a further objection to your argument about dimensionality which I'll address below.)
> However, I think such issues largely fall under "ordinary engineering challenges", not "we made too many capabilities advances, and now all our alignment techniques are totally useless". I expect future capabilities advances to follow a similar pattern as past capabilities advances, and not completely break the existing alignment techniques.
But they don’t need to completely break the previous generations’ alignment techniques (assuming those techniques were, in fact, even sufficient in the previous generation) for things to turn out badly. For this to be comforting you need to argue against the disjunctive nature of the “pessimistic” arguments, or else rebut each one individually.
> The manifold of mind designs is thus:
> - Vastly more compact than mind design space itself.
> - More similar to humans than you'd expect.
> - Less differentiated by learning process detail (architecture, optimizer, etc), as compared to data content, since learning processes are much simpler than data.
This can all be true, while still leaving the manifold of “likely” mind designs vastly larger than “basically human”. But even if that turned out to not be the case, I don’t think it matters, since the relevant difference (for the point he’s making) is not the architecture but the values embedded in it.
> It also assumes that the orthogonality thesis should hold in respect to alignment techniques—that such techniques should be equally capable of aligning models to any possible objective.
> This seems clearly false in the case of deep learning, where progress on instilling any particular behavioral tendencies in models roughly follows the amount of available data that demonstrate said behavioral tendency. It's thus vastly easier to align models to goals where we have many examples of people executing said goals.
The difficulty he’s referring to is not one of implementing a known alignment technique to target a goal with no existing examples of success (generating a molecularly-identical strawberry), but of devising an alignment technique (or several) which will work at all. I think you’re taking for granted premises that Eliezer disagrees with (model value formation being similar to human value formation, and/or RLHF “working” in a meaningful way), and then saying that, assuming those are true, Eliezer’s conclusions don’t follow? Which, I mean, sure, maybe, but… is not an actual argument that attacks the disagreement.
> As far as I can tell, the answer is: don't reward your AIs for taking bad actions.
As you say later, this doesn’t seem trivial, since our current paradigm for SotA basically doesn’t allow for this by construction. Earlier paradigms which at least in principle[1] allowed for it, like supervised learning, have been abandoned because they don’t scale nearly as well. (This seems like some evidence against your earlier claim that “When capabilities advances do work, they typically integrate well with the current alignment[1] and capabilities paradigms.”)
> As it happens, I do not think that optimizing a network on a given objective function produces goals orientated towards maximizing that objective function. In fact, I think that this almost never happens.
I would be surprised if Eliezer thinks that this is what happens, given that he often uses evolution as an existence proof that this exact thing doesn’t happen by default.
I may come back with more object-level thoughts later. I also think this skips over many other reasons for pessimism which feel like they ought to apply even under your models, e.g. "will the org that gets there even bother doing the thing correctly" (& others laid out in Ray's recent post on organizational failure modes). But for now, some positives (not remotely comprehensive):
In general, I think object-level engagement with arguments is good, especially when you can attempt to ground it against reality.
Many of the arguments (e.g. the section on evolution) seem like they point to places where it might be possible to verify the correctness of existing analogical reasoning. Even if it's not obvious how the conclusion changes, helping figure out whether any specific argument is locally valid is still good.
The claim about transformer modularity is new to me and very interesting if true.
[1] Though obviously not in practice, since humans will still make mistakes, will fail to anticipate many possible directions of generalization, etc, etc.
I think the reason you're confused is that the comment did not originally read like that; based on my recollection it was edited to add (most of?) the second paragraph after the fact. It was originally a mostly content-free slur.
You’re making a claim about both:
- what sorts of cognitive capabilities can exist in reality, and
- whether current (or future) training regimes are likely to find them.
It sounds like you agree that the relevant cognitive capabilities are likely to exist, though maybe not for prime number factorization, and that it’s unclear whether they’d fit inside current architectures.
I do not read Eliezer as making a claim that future GPT-n generations will become perfect (or approximately perfect) text predictors. He is specifically rebutting claims others have made, that GPTs/etc can not become ASI, because e.g. they are “merely imitating” human text. This is not obviously true; to the extent that there exist some cognitive capabilities which are physically possible to instantiate in GPT-n model weights which can solve these prediction problems, and are within the region of possible outcomes of our training regimes (+ the data used for them), then it is possible that we will find them.
Putting aside the concerns about potential backfire effects of unilateral action[1], calling the release of gene drive mosquitoes "illegal" is unsubstantiated. What that claim actually cashes out to is "every single country where Anopheles gambiae are a substantial vector for the spread of malaria has laws that narrowly prohibit the release of mosquitoes". The alternative interpretation, that "every single country will stretch obviously unrelated laws as far as necessary to throw the book at you if you do this", may be true, but isn't very interesting, since that can be used as a fully general argument against doing anything ever.[2]
[1] Which I'm inclined to agree with, though notably I haven't actually seen a cost/benefit analysis from any of those sources.
[2] Though you're more likely to have the book thrown at you for some things than for others, and it'd be silly to deny that we have non-zero information about what those things are in advance. I still think the distinction is substantial.
Have you sat down for 5 minutes and thought about how you, as an AGI, might come up with a way to wrest control of the lightcone from humans?
EDIT: I ask because your post (and commentary on this thread) seems to be doing this thing where you’re framing the situation as one where the default assumption is that, absent a sufficiently concrete description of how to accomplish a task, the task is impossible (or extremely unlikely to be achieved). This is not a frame that is particularly useful when examining consequentialist agents and what they’re likely to be able to accomplish.
I think I can take a stab at this; timing myself out of curiosity.
@Said: let me draw an analogy to a fictional online interaction (without implying that the comment that started all of this is analogous in *all* relevant ways to the fictional one):
Author Andy: "...a destructive mode of communication."
Commenter Cody: “What do you mean by destructive in that context?”
If Andy had written something like "a tangerine mode of communication," it would be understandable if Cody (and most other readers) had *literally* no referents for "tangerine" which would cause that sentence to parse at all. If Andy had instead written something like "...a mode of communication that harms the ability of conversational participants to reach agreement on the definition of terms [x, y, and z]," and Cody asked what "harms" meant in that context, as an outsider, it would be very difficult to understand where the communication had broken down, because "harms" is a widely-used term with referents that map relatively cleanly to the concepts at play, even if it is not the most common use for the term. "Destructive" is a more interesting case, because it is rarely used as a modifier to "mode of communication", but if Cody were to claim that there was no "plausible interpretation" or "standard usage" he could assume, it would be difficult to understand how to help him construct the mental machinery to map the dictionary definition (as, for example, a "standard usage") of "destructive" as an adjective to another concept. "Destructive" has a widely-known and well-accepted definition, and while Cody is not claiming that he does not know that definition (or any others), he is claiming that *none* of the definitions he knows produce coherent output when used to modify "mode of communication".
This is what this looks like, from the outside. You are claiming that you have no referents for “authentic” which produce a coherent-in-context (note: no claim about whether it is justified) interpretation for the given sentence(s). Authentic has a dictionary definition of “genuine”; if we replace “authenticity” with “being genuine” in
> Similarly, why should "that which can be destroyed by authenticity" be destroyed? Because authenticity is fundamentally more real and valuable than what it replaces, which must be implemented on a deeper level than "what my current beliefs think."
...it seems to be a coherent claim (though, again, no claim on whether it is sufficiently justified). If you have the same problem with "genuine", then perform another substitution: "truthful self-representation" (that substitution tied together the external context with the modifier, which is maybe a sign that it's a clearer way of communicating the mapping? Need to think about that...). It is difficult to understand what kind of answer you are looking for when you ask "what is the standard usage of authenticity", because this is a query that is trivially resolved by a dictionary lookup/google search. If the answer that procedure provides is insufficient to provide a mapping to the broader context the term is used in, then repeating it back to you won't help; it's clear that your confusion is elsewhere (this is gesturing in the direction of a definition for "shape of confusion"). If you don't see any way in which any plausible definition/referent for "authentic", set in that context, allows you to generate expectations from the resulting sentence(s) (for example, being able to come up with hypothetical situations which would *not* be accurately described as such), then there's either incompatible mental machinery, or a more subtle misunderstanding. I don't think that's the case, though. I believe you know (or could look up) the definition of "authentic", and I believe that if you ran the iterated procedure of substituting synonyms (or sufficiently close referents in concept-space, accounting for the surrounding context), you would quickly find an interpolation that was sufficiently coherent. It is possible that you ran this procedure and decided that the predictions that could be generated by the result were very "fuzzy" (the distribution of possible expectations would be extremely wide; you would have trouble cleaving reality at a sensible set of joints). If so, this is the point where I would describe to the author of the original post what my interpretation of the claim was, with some hint as to the shape of the distribution of generated expectations my interpretation would imply, so that the author could help me narrow the boundaries of that distribution (or point me to another spot on the map entirely, if my interpretation was completely wrong rather than insufficiently well-specified).
...almost an hour, and I don’t think I did a great job, but maybe this crosses some inferential distance.
I'm deeply uncertain about how often it's worth litigating the implied meta-level concerns; I'm not at all uncertain that this way of expressing them was inappropriate. I don't want to see sniping like this on LessWrong, and especially not in comment threads like this.
Consider this a warning to knock it off.
This is not directly related to the current situation, but I think is in part responsible for it.
Said claims that it is impossible to guess what someone might mean by something they wrote, if for some reason the reader decided that the writer likely didn’t intend the straightforward interpretation parsed by the reader. It’s somewhat ambiguous to me whether Said thinks that this is impossible for him, specifically, or impossible for people (either most or all).
Relevant part of the first comment making this point:
> (B) Alice meant something other than what it seems like she wrote.
> What might that be? Who knows. I could try to guess what Alice meant. However, that is impossible. So I won't try. If Alice didn't mean the thing that it seems, on a straightforward reading, like she meant, then what she actually meant could be anything at all.
Relevant part of the second comment:
> "Impossible" in a social context means "basically never happens, and if it does happen then it is probably by accident" (rather than "the laws of physics forbid it!"). Also, it is, of course, possible to guess what someone means by sheer dumb luck—picking an interpretation at random out of some pool of possibilities, no matter how unlikely-seeming, and managing by chance to be right.
> But, I can't remember a time when I've read what someone said, rejected the obvious (but obviously wrong) interpretation, tried to guess what they actually meant, and succeeded. When I've tried, the actual thing that (as it turned out) they meant was always something which I could never have even imagined as a hypothesis, much less picked out as the likeliest meaning. (And, conversely, when someone else has tried to interpret my comments in symmetric situations, the result has been the same.)
> In my experience, this is true: for all practical purposes, either you understand what someone meant, or it's impossible to guess what they could've meant instead.
For the sake of argument, I will accept that Said finds this impossible. With that said, the idea that this is impossible—or that it “basically never happens, and if it does happen then it is probably by accident”—is incompatible with my experience, and the experience of approximately anybody I have queried on the subject. (Said may object here, and claim that people are not reliable reporters. And yet conversations happen anyways; I’ve done this before in situations where there was no possible double illusion of transparency. This is not to say that there are no trade-offs; I would not be surprised if Said finds himself confidently holding an incorrect understanding of others’ claims less often than most people.)
My guess is that this is responsible for a large part of what many consider to be objectionable about Said's conversational style. Many other objections presented in the comments here (and in the past) seem confused, wrong, or misguided. It might be slightly more pleasant to read Said's comments if he added some trimmings of "niceness" to them, but I agree with him that that sort of thing carries meaningful costs. Rather, I think the bigger problem with the way Said responds to other people's writing, when he is e.g. seeking clarification or arguing a point, is that he does not believe in the value of interpretive labor, and therefore doesn't think it's valuable to do any upfront work to reduce how much interpretive labor his interlocutors will need to do, since according to him, that should in any case be "zero".
This basically doesn’t work when you’re trying to communicate with people who do, in fact, successfully[1] do interpretive labor, and therefore expect their conversational partners to share in that effort, to some degree.
Separately, and more to the matter at hand, although I think that there were supererogatory paths that Duncan could have taken to reduce escalation at various points, I do think that Said’s claim that Duncan advocated for a norm of interaction accurately described as “don’t ask people for examples of their claims” was obviously unsupported by his linked evidence. After Duncan calls this out, Said doubles down, and then later (in the comments on this post) tries to offload this onto a distinction between whether he was making a claim about what Duncan literally wrote, vs. what could straightforwardly be inferred about Duncan’s intentions (based on what he wrote).
I find this uncompelling given that Said has also admitted (in the comments here) that his literal claim was indeed a strawman, while at the same time the entire thread was precipitated by gjm indicating that he thought the claim was a strawman. Said claims to have then given a more “clarified and narrow form” of his claim in response to gjm’s comment:
> If "asking people for examples of their claims" doesn't fit Duncan's stated criteria for what constitutes acceptable engagement/criticism, then it is not pretending, but in fact accurate, to describe Duncan as advocating for a norm of "don't ask people for examples of their claims". (See, for example, this subthread on this very post, where Duncan alludes to good criticism requiring that the critic "[put] forth at least half of the effort required to bridge the inferential gap between you and the author as opposed to expecting them to connect all the dots themselves". Similar descriptions and rhetoric can be found in many of Duncan's recent posts and comments.)
> Duncan has, I think, made it very clear that that a comment that just says "what are some examples of this claim?" is, in his view, unacceptable. That's what I was talking about. I really do not think it's controversial at all to ascribe this opinion to Duncan.
If Said is referring to the parenthetical starting with "See, for example", then I am sorry to say that adding such a parenthetical in the context of repeating the original claim nearly verbatim ("to describe Duncan as advocating for a norm of 'don't ask people for examples of their claims'", and "'what are some examples of this claim?' is, in his view, unacceptable") does not count as clarifying or narrowing his claim, but is simply performing the same motion that Duncan took issue with, which is attempting to justify a false claim with evidence that would support a slightly-related but importantly different claim.

I'm leaving out a lot of salient details because this is, frankly, exhausting. I think the dynamics around Killing Socrates were not great, but I also have less well-formed thoughts there.
[1] Sometimes—often enough that it's worth relying on, at least.
> We circulated a document about the project to various groups in the field, and invited people from OpenAI, DeepMind, Anthropic, Open Philanthropy, FTX Future Fund, ARC, and MIRI, as well as some independent researchers to participate in the discussions.
Is this a complete set of the organizations you reached out to? Because, given this...
> People from ARC, DeepMind, and OpenAI, as well as one independent researcher agreed to participate.
> Most people we were in touch with were very interested in participating. However, after checking with their own organizations, many returned saying their organizations would not approve them sharing their positions publicly.
...the implication is that, of the researchers you reached out to at {Anthropic, OpenPhil, FTX FF, MIRI}, those who expressed interest were unable to participate because their organization wouldn’t let them share their (individual? organizational?) opinions publicly? This is quite surprising! Is there any more detail you can share here, such as the specific concerns expressed?
Meta-comment:
It’s difficult to tell, having spent some time (but not a very large amount of time) following this back-and-forth, whether much progress is being made in furthering Eliezer’s and Paul’s understanding of each other’s positions and arguments. My impression is that there has been some progress, mostly from Paul vetoing Eliezer’s interpretations of Paul’s agenda, but by nature this is a slow kind of progress—there are likely many more substantially incorrect interpretations than substantially correct ones, so even if you assume progress toward a correct interpretation to be considerably faster than what might be predicted by a random walk, the slow feedback cycle still means it will take a while.
My question is why the two of you haven't sat down for a weekend (or as many as necessary) to hash out the cruxes and whatever confusion surrounds them. This seems to be a very high-value course of action: if, upon reaching a correct understanding of Paul's position, Eliezer updates in that direction, it's important that it happen as soon as possible. Likewise, if Eliezer manages to convince Paul of catastrophic flaws in his agenda, that may be even more important.
I am pretty concerned that most of the public discussion about risk from e.g. the practice of open sourcing frontier models is focused on misuse risk (particularly biorisk). Misuse risk seems like it could be a real thing, but it's not where I see most of the negative EV, when it comes to open sourcing frontier models. I also suspect that many people doing comms work which focuses on misuse risk are focusing on it in a way that is strongly disproportionate to how much of the negative EV they themselves see coming from it, relative to all sources.
I think someone should write a summary post covering “why open-sourcing frontier models and AI capabilities more generally is -EV”. Key points to hit:
- (1st order) directly accelerating capabilities research progress
- (1st order) we haven't totally ruled out the possibility of hitting "sufficiently capable systems" which are at least possible in principle to use in +EV ways, but which if made public would immediately have someone point them at improving themselves and then we die. (In fact, this is very approximately the mainline alignment plan of all 3 major AGI orgs.)
- (2nd order) generic "draws in more money, more attention, more skilled talent, etc" which seems like it burns timelines
And, sure, misuse risks (which in practice might end up being a subset of the second bullet point, but not necessarily so). But in reality, LLM-based misuse risks probably don’t end up being x-risks, unless biology turns out to be so shockingly easy that a (relatively) dumb system can come up with something that gets ~everyone in one go.
> Over the years roughly between 2015 and 2020 (though I might be off by a year or two), it seemed to me like numerous AI safety advocates were incredibly rude to LeCun, both online and in private communications.
I’d be interested to see some representative (or, alternatively, egregious) examples of public communications along those lines. I agree that such behavior is bad (and also counterproductive).
> I am not covering training setups where we purposefully train an AI to be agentic and autonomous. I just think it's not plausible that we just keep scaling up networks, run pretraining + light RLHF, and then produce a schemer.[2]
Like Ryan, I’m interested in how much of this claim is conditional on “just keep scaling up networks” being insufficient to produce relevantly-superhuman systems (i.e. systems capable of doing scientific R&D better and faster than humans, without humans in the intellectual part of the loop). If it’s “most of it”, then my guess is that accounts for a good chunk of the disagreement.
> By contrast, some lines of research where I've seen compelling critiques (and haven't seen compelling defences) of their core intuitions, and therefore don't recommend to people:
> - Cooperative inverse reinforcement learning (the direction that Stuart Russell defends in his book Human Compatible); critiques here and here.
> - John Wentworth's work on natural abstractions; exposition and critique here, and another here.
The first critique of natural abstractions says:
> Concluding thoughts on relevance to alignment: While we've made critical remarks on several of the details, we also want to reiterate that overall, we think (natural) abstractions are an important direction for alignment and it's good that someone is working on them! In particular, the fact that there are at least four distinct stories for how abstractions could help with alignment is promising.
The second says:
> I think this is a fine dream. It's a dream I developed independently at MIRI a number of years ago, in interaction with others. A big reason why I slogged through a review of John's work is because he seemed to be attempting to pursue a pathway that appeals to me personally, and I had some hope that he would be able to go farther than I could have.
Neither of them seemed, to me, to be critiques of the “core intuitions”; rather, the opposite: both suggested that the core intuitions seemed promising; the weaknesses were elsewhere. That suggests that natural abstractions might be a better than average target for incoming researchers, not a worse one.
I have some other disagreements, but those are model-level disagreements; that piece of advice in particular seems to be misguided even under your own models. I think I agree with the overall structure and most of the prioritization (though would put scalable oversight lower, or focus on those bits that Joe points out are the actual deciding factors for whether that entire class of approaches is worthwhile—that seems more like “alignment theory with respect to scalable oversight”).
I think this is perhaps better evidence for race dynamics:
> "The race starts today, and we're going to move and move fast," Nadella said.
> Also, it looks like the Eliezer-style doomsdayists may have to update on the ChatGPT having turned out to be an obsequious hallucination-prone generalist, rather than… whatever they expected the next step to be.
I don’t recall Eliezer (or anyone worth taking seriously with similar views) expressing confidence that whatever came after the previous generation of GPT would be anything in particular (except “probably not AGI”). Curious if you have a source for something like this.
This post, and many of @AnthonyRepetto’s subsequent replies to comments on it, seem to be attacking a position that the named individuals don’t hold, while stridently throwing out a bunch of weird accusations and deeply underspecified claims. “Bayes is persistently wrong”—about what, exactly?
Content like this should include specific, uncontroversial examples of all the claimed intellectual bankruptcy, and not include a bunch of random (and wrong) snipes.
I’m rate-limiting your ability to comment to once per day. You may consider this a warning; if the quality of your argumentation doesn’t improve then you will no longer be welcome to post on the site.