Great post, thank you. It was helpful to understand the pattern patch → new jailbreak → repeat as supporting the claim that current models fail to really internalize reinforced human values.
Great post, thank you. It was helpful to understand the pattern patch → new jailbreak → repeat as supporting the claim that current models fail to really internalize reinforced human values.