I first encountered this tweet taped to the wall in OpenAI’s office where the Superalignment team sat:
RIP Superalignment team. Much respect for them.
lol i was the one who taped it to the wall. it’s one of my favorite tweets of all time
I am a bit confused. If the question is, ‘Will this alignment paradigm work with superintelligence?’ is the recommendation from the tweet to try it and see if it works?
the tweet is making fun of people who are too eager to do something EMPIRICAL and SCIENTIFIC and ignore the pesky little detail that their empirical thing actually measures something subtly but importantly different from what they actually care about
We won’t let our lack of data stop us from running our analysis program!
The tweet is sarcastically recommending that, rather than investigating the actual hard problem, they should investigate a much easier problem which superficially sounds the same.
In the context of AI safety (and the fact that the superalignment team is gone), the post is suggesting that OpenAI isn’t actually addressing the hard alignment problem, instead opting to tune their models to avoid outputting offensive or dangerous messages in the short term, which might seem like a solution to a layperson.