Writing a short explainer on the original / true motivation of the paper-clip maximizer could be very valuable.
I stepped into the “recursive self-improvement” workshop at ICLR, and the panelists were basically dunking on an extremely straw-manned paper-clipping-as-outer-misalignment threat model. I think a really good viral blog post could make it more obvious that dunking on that version is a bad-faith or naive argument.