I infer they didn’t get “The most forbidden technique”. Try again with e.g. “Never train an AI to hide its thoughts.”?
Yeah, I think “training for transparency” is fine if we can figure out good ways to do it. The problem is more that training for other stuff (e.g. lack of certain types of thoughts) pushes against transparency.