Daniel Kokotajlo comments on Daniel Kokotajlo’s Shortform

Daniel Kokotajlo 21 Jan 2025 18:47 UTC
139 points
26
I first encountered this tweet taped to the wall in OpenAI’s office where the Superalignment team sat:

RIP Superalignment team. Much respect for them.
- leogao 21 Jan 2025 20:15 UTC
  114 points
  12
  lol i was the one who taped it to the wall. it’s one of my favorite tweets of all time
- Roman Malov 21 Jan 2025 20:51 UTC
  6 points
  0
  I am a bit confused. If the question is, ‘Will this alignment paradigm work with superintelligence?’ is the recommendation from the tweet to try it and see if it works?
  - leogao 21 Jan 2025 21:46 UTC
    67 points
    50
    the tweet is making fun of people who are too eager to do something EMPIRICAL and SCIENTIFIC and ignore the pesky little detail that their empirical thing actually measures something subtly but importantly different from what they actually care about
    - RedMan 23 Jan 2025 3:08 UTC
      5 points
      0
      We won’t let our lack of data stop us from running our analysis program!
  - whestler 21 Jan 2025 22:05 UTC
    14 points
    12
    The tweet is sarcastically recommending that, instead of investigating the actual hard problem, they should investigate a much easier problem which superficially sounds the same.
    In the context of AI safety (and the fact that the superalignment team is gone), the post is suggesting that OpenAI isn’t actually addressing the hard alignment problem, instead opting to tune their models to avoid outputting offensive or dangerous messages in the short term, which might seem like a solution to a layperson.