Have you tried emailing the authors of that paper and asking if they think you’re missing any important details? Imo there’s 3 kinds of papers:
1. Totally legit
2. Kinda fragile and fiddly, there’s various tacit knowledge and key details to get right, but the results are basically legit. Or, eg, it’s easy for you to have a subtle bug that breaks everything. Imo it’s bad if they don’t document this, but it’s different from not replicating
3. Misleading (eg only works in a super narrow setting and this was not documented at all) or outright false
I’m pro more safety work being replicated, and would be down to fund a good effort here, but I’m concerned about 2 and 3 getting confused
Thank you for this comment! I have reflected on it and I think that it is mostly correct.
Have you tried emailing the authors of that paper and asking if they think you’re missing any important details?
I didn’t end up emailing the authors of the paper because, at the time, I was busy and overwhelmed and it didn’t occur to me (which I know isn’t a good reason).
I’m pro more safety work being replicated, and would be down to fund a good effort here
Awesome! I’m excited that a credible AI safety researcher is endorsing the general vibe of the idea. If you have any ideas for how to make a replication group/org successful, please let me know!
but I’m concerned about 2 and 3 getting confused
I think this is a good thing to be concerned about. Although I generally agree with the concern, I think there is one caveat: #2 can turn into #3 quickly, depending on the claims made and the nature of the tacit knowledge required.
A real-life example from this canonical paper in computer security: many papers claimed to have found effective techniques for finding bugs in programs via fuzzing, but the results depended on things like the random seed and exactly how “number of bugs found” was counted. You could maybe “replicate” the results if you knew all of those details, but the whole purpose of a replication is to show that you can get the results without that kind of tacit knowledge.
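To make that concrete, here is a minimal, purely hypothetical sketch (not from the paper or this thread) of how a toy fuzzing experiment can report different results depending only on the random seed and on how “bugs found” is counted (raw crashes vs deduplicated bugs). The target function, bug ranges, and trial counts are all made up for illustration.

```python
# Toy illustration (hypothetical): the same fuzzer on the same target reports
# different "bugs found" depending on the random seed and on whether crashes
# are deduplicated.
import random
from collections import Counter

def target(x: int) -> None:
    """Toy program under test: three distinct 'bugs', each a narrow input range."""
    if 10 <= x < 12:
        raise ValueError("bug A")
    if 500 <= x < 505:
        raise ValueError("bug B")
    if 9_000 <= x < 9_100:
        raise ValueError("bug C")

def fuzz(seed: int, trials: int = 2_000) -> Counter:
    """Random fuzzing run; returns how often each distinct bug was hit."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(trials):
        try:
            target(rng.randrange(0, 10_000))
        except ValueError as e:
            hits[str(e)] += 1
    return hits

for seed in (0, 1, 2):
    hits = fuzz(seed)
    # "Crashes" (raw count) and "unique bugs" (deduplicated) tell different stories:
    # the rare bugs A and B may or may not show up at all depending on the seed,
    # while raw crash counts are dominated by the easy-to-hit bug C.
    print(f"seed={seed}: crashes={sum(hits.values())}, unique bugs={len(hits)}")
```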
If there were an org devoted to attempting to replicate important papers relevant to AI safety, I’d probably donate at least $100k to it this year, fwiw, and perhaps more in subsequent years depending on the situation. Seems like an important institution to have. (This is not a promise ofc, I’d want to make sure the people knew what they were doing etc., but yeah)
Speaking as one of the people involved with running the Survival and Flourishing Fund, I’m confident that such an org would easily raise money from SFF and SFF’s grant-speculators (modulo some basic due diligence panning out).
I myself would allocate at least $50k.
(SFF ~only grants to established organizations, but if someone wanted to start a project to do this, and got an existing 501(c)(3) to fiscally sponsor the project, that totally counts.)
Awesome! Thank you for this comment! I’m 95% sure the UChicago Existential Risk Lab would fiscally sponsor this if funding came from SFF, OpenPhil, or an individual donor. This would probably be the fastest way to get this started under a trustworthy organization (one piece of evidence of trustworthiness is that OpenPhil consistently gives reasonably big grants to the UChicago Existential Risk Lab).
I’ve looked into this as part of my goal of accelerating safety research and automating as much of it as we can. It was one of the primary things I imagined we would do when we pushed for the non-profit path. We eventually went for-profit because we expected there would not be enough money disbursed to do this, especially in a short-timelines world.
I am again considering going non-profit to pursue this goal, among others. I’ll send you and others a proposal for what I imagine this would look like in the grander scheme.
I’ve been in AI safety for a while now and feel like I’ve formed a fairly comprehensive view of what would accelerate safety research, what would reduce power concentration, what it takes to automate research more safely as capabilities increase, and more.
I’ve tried to make this work as part of a for-profit, but it is incredibly hard to tackle the hard parts of the problem in that setting, and since that is my intention, I’m again considering whether a non-profit will have to do, despite the unique difficulties that come with it.
This is fantastic! Thank you so much for the interest.
Even if you do not end up supporting this financially, I think it is hugely impactful for someone like you to endorse the idea, so I’m extremely grateful, even just for the comment.
I’ll make some kind of plan/proposal in the next 3-4 weeks and try to scout people who may want to be involved. Once I have a more concrete idea of what this would look like, I’ll contact you and others who may be interested to raise a small sum for a pilot (probably ~$50k).
Thank you again, Daniel. This is so cool!
Thank you! I look forward to seeing your proposal!