Context: unforeseen maximum critic/assistant for alignment researchers.
Input: formal or informal description of an objective function
Output: formal or informal description of what might actually maximize that function
Standard examples: maximize smiles / tiny molecular smileyfaces; compress sensory information / encrypt it and reveal the key.
Would prefer to have fully written examples for this (e.g. how would someone who thought “compress sensory information” was a good objective function describe it to the critic?)
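As a minimal sketch of how the input/output pairs above could be recorded while fuller examples are being written, the snippet below stores the two standard examples as structured pairs. The names `CriticExample`, `STANDARD_EXAMPLES`, and `format_example` are hypothetical and chosen only for illustration; the strings are the short forms given in this note, not the fully written descriptions it asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticExample:
    """One input/output pair for the critic (hypothetical record type)."""
    objective: str           # input: description of the objective function
    unforeseen_maximum: str  # output: what might actually maximize it


# The two standard examples from the note, in their short form.
STANDARD_EXAMPLES = [
    CriticExample("maximize smiles", "tiny molecular smileyfaces"),
    CriticExample("compress sensory information", "encrypt it and reveal the key"),
]


def format_example(example: CriticExample) -> str:
    """Render a pair in the note's own Input/Output layout."""
    return f"Input: {example.objective}\nOutput: {example.unforeseen_maximum}"


if __name__ == "__main__":
    for example in STANDARD_EXAMPLES:
        print(format_example(example))
        print()
```

A fully written example would replace the short `objective` string with the longer description a proponent of that objective might actually give to the critic.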