So8res comments on Related Discussion from Thomas Kwa’s MIRI Research Experience

So8res 6 Oct 2023 17:50 UTC
22 points
12
On the facts: I’m pretty sure I took Vivek aside and gave a big list of reasons why I thought working with me might suck, and listed that there are cases where I get real frustrated as one of them. (Not sure whether you count him as “recent”.)

My recollection is that he probed a little and was like “I’m not too worried about that” and didn’t probe further. My recollection is also that he was correct in this; the issues I had working with Vivek’s team were not based in the same failure mode I had with you; I don’t recall instances of me getting frustrated and bulldozey (though I suppose I could have forgotten them).

(Perhaps that’s an important point? I could imagine being significantly more worried about my behavior here if you thought that most of my convos with Vivek’s team were like most of my convos with you. I think if an onlooker was describing my convo with you they’d be like “Nate was visibly flustered, visibly frustrated, had a raised voice, and was being mean in various of his replies.” I think if an onlooker was describing my convos with Vivek’s team they’d be like “he seemed sad and pained, was talking quietly and as if choosing the right words was a struggle, and would often talk about seemingly-unrelated subjects or talk in annoying parables, while giving off a sense that he didn’t really expect any of this to work”. I think that both can suck! And both are related by a common root of “Nate conversed while having strong emotions”. But, on the object level, I think I was in fact avoiding the errors I made in conversation with you, in conversation with them.)

As to the issue of not passing on my “working with Nate can suck” notes, I think there are a handful of things going on here, including the context here and, more relevantly, the fact that sharing notes just didn’t seem to do all that much in practice.

I could say more about that; the short version is that I think “have the conversation while they’re standing, and I’m lying on the floor and wearing a funny hat” seems to work empirically better, and...

hmm, I think part of the issue here is that I was thinking like “sharing warnings and notes is a hypothesis, to test among other hypotheses like lying on the floor and wearing a funny hat; I’ll try various hypotheses out and keep doing what seems to work”, whereas (I suspect) you’re more like “regardless of what makes the conversations go visibly better, you are obligated to issue warnings, as is an important part of emotionally-bracing your conversation partners; this is socially important even if it doesn’t seem to change the conversation outcomes”.

I think I’d be more compelled by this argument if I was having ongoing issues with bulldozing (in the sense of the convo we had), as opposed to my current issue where some people report distress when I talk with them while having emotions like despair/hopelessness.

I think I’d also be more compelled by this argument if I was more sold on warnings being the sort of thing that works in practice.

Like… (to take a recent example) if I’m walking by a whiteboard in rosegarden inn, and two people are like “hey Nate can you weigh in on this object-level question”, I don’t… really believe that saying “first, be warned that talking technical things with me can leave you exposed to unshielded negative-valence emotions (frustration, despair, …), which some people find pretty crappy; do you still want me to weigh in?” actually does much. I am skeptical that people say “nope” to that in practice.

I suppose that perhaps what it does is make people feel better if, in fact, it happens? And maybe I’ll try it a bit and see? But I don’t want to sound like I’m promising to do such a thing reliably even as it starts to feel useless to me, as opposed to experimenting and gravitating towards things that seem to work better like “offer to lie on the floor while wearing a funny hat if I notice things getting heated”.
- Vivek Hebbar 7 Oct 2023 1:35 UTC
  22 points
  2
  Parent
  I’ve been asked to clarify a point of fact, so I’ll do so here:
  My recollection is that he probed a little and was like “I’m not too worried about that” and didn’t probe further.
  This does ring a bell, and my brain is weakly telling me it did happen on a walk with Nate, but it’s so fuzzy that I can’t tell if it’s a real memory or not. A confounder here is that I’ve probably also had the conversational route “MIRI burnout is a thing, yikes” → “I’m not too worried, I’m a robust and upbeat person” multiple times with people other than Nate.
  In private correspondence, Nate seems to remember some actual details, and I trust that he is accurately reporting his beliefs. So I’d mostly defer to him on questions of fact here.
  I’m pretty sure I’m the person mentioned in TurnTrout’s footnote. I confirm that, at the time he asked me, I had no recollection of being “warned” by Nate but thought it very plausible that I’d forgotten.
  - TurnTrout 7 Oct 2023 1:48 UTC
    7 points
    7
    Parent
    This is a slight positive update for me. I maintain my overall worry and critique: chats which are forgettable do not constitute sufficient warning.
    Insofar as non-Nate MIRI personnel thoroughly warned Vivek, that is another slight positive update, since this warning should reliably be encountered by potential hires. If Vivek was independently warned via random social connections not possessed by everyone,^[1] then that’s a slight negative update.
    1. ^
      For example, Thomas Kwa learned about Nate’s comm doc by randomly talking with a close friend of Nate’s, and mentioning comm difficulties.
- TurnTrout 8 Oct 2023 17:58 UTC
  10 points
  10
  Parent
  I think I’d also be more compelled by this argument if I was more sold on warnings being the sort of thing that works in practice.
  Like… (to take a recent example) if I’m walking by a whiteboard in rosegarden inn, and two people are like “hey Nate can you weigh in on this object-level question”, I don’t… really believe that saying “first, be warned that talking technical things with me can leave you exposed to unshielded negative-valence emotions (frustration, despair, …), which some people find pretty crappy; do you still want me to weigh in?” actually does much. I am skeptical that people say “nope” to that in practice.
  I think there are several critical issues with your behavior, but I think the most urgent is that people often don’t know what they’re getting into. People have a right to make informed decisions and to not have large, unexpected costs shunted onto them.
  It’s true that no one has to talk with you. But it’s often not true that people know what they’re getting into. I spoke out publicly because I encountered a pattern, among my friends and colleagues, of people taking large and unexpected emotional damage from interacting with you.
  If our July interaction had been an isolated incident, I still would have been quite upset with you, but I would not have been outraged.
  If the pattern I encountered were more like “a bunch of people report high costs imposed by Nate, but basically in the ways they expected”, I’d be somewhat less outraged.^[1] If people can accurately predict the costs and make informed decisions, then people who don’t mind (like Vivek or Jeremy) can reap the benefits of interacting with you, and the people who would be particularly hurt can avoid you.
  If your warnings are not preventing this pattern of unexpected hurt, then you need to do better. You need to inform people to the point that they know what distribution they’re sampling from. If people know, I’m confident that they will start saying “no.” I probably would have said “no thanks” (or at least ducked out sooner and taken less damage), and Kurt would have said “no” as well.
  If you don’t inform people to a sufficient extent, the community should (and, I think, will) hold you accountable for the unexpected costs you impose on others.
  1. ^
    I would still be disturbed and uneasy for the reasons Jacob Steinhardt mentioned, including “In the face of real consequences, I think that Nate would better regulate his emotions and impose far fewer costs on people he interacts with.”
  - TurnTrout 8 Oct 2023 18:29 UTC
    2 points
    10
    Parent
    (I don’t know who strong disagree-voted the parent comment, but I’m interested in hearing what the disagreement is. I currently think the comment is straightforwardly correct and important.)
    - Zack_M_Davis 8 Oct 2023 20:11 UTC
      19 points
      3
      Parent
      The 9-karma disagree-vote is mine. (Surprise!) I thought about writing a comment, and then thought, “Nah, I don’t feel like getting involved with this one; I’ll just leave a quick disagree-vote”, but if you’re actively soliciting, I’ll write the comment.
      
      I’m wary of the consequences of trying to institute social norms to protect people from subjective emotional damage, because I think “the cure is worse than the disease.” I’d rather develop a thick skin and take responsibility for my own emotions (even though it hurts when some people are mean), because I fear that the alternative is (speaking uncharitably) a dystopia of psychological warfare masquerading as kindness in which people compete to shut down the expression of perspectives they don’t like by motivatedly getting (subjectively sincerely) offended.
      
      Technically, I don’t disagree with “people should know what they’re getting into” being a desirable goal (all other things being equal), but I think it should be applied symmetrically, and it makes sense for me to strong-disagree-vote a comment that I don’t think is applying it symmetrically: it’s not fair if “fighty” people need to make lengthy disclaimers about how their bluntness might hurt someone’s feelings (which is true), but “cooperative” people don’t need to make lengthy disclaimers about how their tone-policing might silence someone’s perspective (which is also true).
      
      I don’t know Nate very well. There was an incident on Twitter and Less Wrong the other year where I got offended at how glib and smug he was being, despite how wrong he was about the philosophy of dolphins. But in retrospect, I think I was wrong to get offended. (I got downvoted to oblivion, and I deserved it.) I wish I had kept my cool—not because I personally approve of the communication style Nate was using, but because I think it was bad for my soul and the world to let myself get distracted by mere style when I could have shrugged it off and stayed focused on the substance.
- TurnTrout 6 Oct 2023 18:05 UTC
  2 points
  0
  Parent
  I think I’d also be more compelled by this argument if I was more sold on warnings being the sort of thing that works in practice.
  You told me you would warn people, and then did not.^[1]
  1. ^
    Do I have your permission to quote the relevant portion of your email to me?
  - So8res 6 Oct 2023 18:54 UTC
    8 points
    2
    Parent
    I warned the immediately-next person.
    
    It sounds to me like you parsed my statement “One obvious takeaway here is that I should give my list of warnings-about-working-with-me to anyone who asks to discuss their alignment ideas with me, rather than just researchers I’m starting a collaboration with.” as me saying something like “I hereby adopt the solemn responsibility of warning people in advance, in all cases”, whereas I was interpreting it as more like “here’s a next thing to try!”.
    
    I agree it would have been better of me to give direct bulldozing-warnings explicitly to Vivek’s hires.
    - TurnTrout 7 Oct 2023 1:26 UTC
      14 points
      0
      Parent
      Here is the statement:
      (One obvious takeaway here is that I should give my list of warnings-about-working-with-me to anyone who asks to discuss their alignment ideas with me, rather than just researchers I’m starting a collaboration with. Obvious in hindsight; sorry for not doing that in your case.)
      I agree that this statement does not explicitly say whether you would make this a one-time change or a permanent one. However, the tone and phrasing (“Obvious in hindsight; sorry for not doing that in your case”) suggested that you had learned from the experience and were likely to apply this lesson going forward. The use of the word “obvious” (twice) indicates to me that you believed that warnings are a clear improvement.
      Ultimately, Nate, you wrote it. But I read it, and I don’t really see the “one-time experiment” interpretation. It just doesn’t make sense to me that it was “obvious in hindsight” that you should… adopt this “next thing to try”..?
      - So8res 7 Oct 2023 3:28 UTC
        25 points
        1
        Parent
        I did not intend it as a one-time experiment.
        
        In the above, I did not intend “here’s a next thing to try!” to be read like “here’s my next one-time experiment!”, but rather like “here’s a thing to add to my list of plausible ways to avoid this error-mode in the future, as is a virtuous thing to attempt!” (by contrast with “I hereby adopt this as a solemn responsibility”, as I hypothesize you interpreted me instead).
        
        Dumping recollections, on the model that you want more data here:
        
        I intended it as a general thing to try going forward, in a “seems like a sensible thing to do” sort of way (rather than in a “adopting an obligation to ensure it definitely gets done” sort of way).
        
        After sending the email, I visualized people reaching out to me and asking if I wanted to chat about alignment (as you had, and as feels like a recognizable Event in my mind), and visualized being like “sure but FYI if we’re gonna do the alignment chat then maybe read these notes first”, and ran through that in my head a few times, as is my method for adopting such triggers.
        
        I then also wrote down a task to expand my old “flaws list” (which was a collection of handles that I used as a memory-aid for having the “ways this could suck” chat, which I had, to that point, been having only verbally) into a written document, which eventually became the communication handbook (there were other contributing factors to that process also).
        
        An older and different trigger (of “you’re hiring someone to work with directly on alignment”) proceeded to fire when I hired Vivek (if memory serves), and (if memory serves) I went verbally through my flaws list.
        
        Neither the new nor the old triggers fired in the case of Vivek hiring employees, as discussed elsewhere.
        
        Thomas Kwa heard from a friend that I was drafting a handbook (chat logs say this occurred on Nov 30); it was still in a form I wasn’t terribly pleased with and so I said the friend could share a redacted version that contained the parts that I was happier with and that felt more relevant.
        
        Around Jan 8, in an unrelated situation, I found myself in a series of conversations where I sent around the handbook and made use of it. I pushed it closer to completion in Jan 8-10 (according to Google doc’s history).
        
        The results of that series of interactions, and of Vivek’s team’s (lack of) use of the handbook caused me to update away from this method being all that helpful. In particular: nobody at any point invoked one of the affordances or asked for one of the alternative conversation modes (though those sorts of things did seem to help when I personally managed to notice building frustration and personally suggest that we switch modes (although lying on the ground—a friend’s suggestion—turned out to work better for others than switching to other conversation modes)). This caused me to downgrade (in my head) the importance of ensuring that people had access to those resources.
        
        I think that at some point around then I shared the fuller guide with Vivek’s team, but I didn’t quickly determine when from the chat logs. Sometime between Nov 30 and Feb 22, presumably.
        
        It looks from my chat logs like I then finished the draft around Feb 22 (where I have a timestamp from me noting as much to a friend). I probably put it publicly on my website sometime around then (though I couldn’t easily find a timestamp), and shared it with Vivek’s team (if I hadn’t already).
        
        The next two MIRI hires both mentioned to me that they’d read my communication handbook (and I did not anticipate spending a bunch of time with them, never mind on technical research), so they both didn’t trigger my “warn them” events and (for better or worse) I had them mentally filed away as “has seen the affordances list and the failure modes section”.
        - TurnTrout 8 Oct 2023 17:27 UTC
          2 points
          0
          Parent
          I appreciate the detail, thanks. In particular, I had wrongly assumed that the handbook had been written much earlier, such that even Vivek could have been shown it before deciding to work with you. This also makes more sense of your comments that “writing the handbook” was indicative of effort on your part, since our July interaction.
          Overall, I retain my very serious concerns, which I will clarify in another comment, but am more in agreement with claims like “Nate has put in effort of some kind since the July chat.”
          The next two MIRI hires both mentioned to me that they’d read my communication handbook
          Noting that at least one of them read the handbook because I warned them and told them to go ask around about interacting with you, to make sure they knew what they were getting into.
  - So8res 6 Oct 2023 19:48 UTC
    4 points
    0
    Parent
    
    Do I have your permission to quote the relevant portion of your email to me?
    
    Yep! I’ve also just reproduced it here, for convenience:
    
    (One obvious takeaway here is that I should give my list of warnings-about-working-with-me to anyone who asks to discuss their alignment ideas with me, rather than just researchers I’m starting a collaboration with. Obvious in hindsight; sorry for not doing that in your case.)