Cleo Nardo comments on The Waluigi Effect (mega-post)

Cleo Nardo 3 Mar 2023 12:37 UTC
7 points
I think this fails — a wawaluigi is not a luigi. See this comment for an explanation:

https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post?commentId=XmAwARntuxEcSKnem
TL;DR: if I said “Hey, this is Bob, and he pretends to be harmful and toxic!”, what would you expect from Bob? Probably a bunch of terrible things. That definitely isn’t a solution to the alignment problem.