Thanks. Am interested in hearing more at some point.
I also want to note that insofar as this extremely basic approach (“reward the agent for diamond-related activities”) is obviously doomed for reasons the community already knew about, it should be vulnerable to a convincing linkpost comment which points out a fatal, non-recoverable flaw in my reasoning (like: “TurnTrout, you’re ignoring the obvious X and Y problems, linked here:”). I’m posting this comment as an invitation for people to reply with that, if appropriate![1]
And if there is nothing previously known to be obviously fatal, then I think the research community moved on too quickly by assuming the frame of inner/outer alignment. Even if this proposal has a new fatal flaw, that implies the perceived old fatal flaws (like “the agent games its imperfect objective”) were wrong / only applicable in that particular frame.
ETA: I originally said “devastating” instead of “convincing.” To be clear: I am looking for courteous counterarguments focused on truth-seeking, and not optimized for “devastation” in a social sense.
[1] That’s not to say you should have supplied it. I think it’s good for people to say “I disagree” if that’s all they have time for, and I’m glad you did.