Thanks for the link, I will read that!
JonathanErhardt
I really like that and it happens to fit well with the narrative that we’re developing. I’ll see where we can include a scene like this.
Good point, I see what you mean. I think we could have 2 distinct concepts of “ethics” and 2 corresponding orthogonality theses:
Concept “ethics1” requires ethics to be motivational. Some set of rules can only be the true ethics if, necessarily, everyone who knows them is motivated to follow them. (I think moral internalists probably use this concept?)
Concept “ethics2” doesn’t require some set of rules to be motivational to be the correct ethics.
The orthogonality thesis for 1 is what I mentioned: Since there are (probably) no rules that necessarily motivate everyone who knows them, the AI would not find the true ethical theory.
The orthogonality thesis for 2 is what you mention: Even if the AI finds it, it would not necessarily be motivated by it.
A Game About AI Alignment (& Meta-Ethics): What Are the Must Haves?
“Yet the average person would say it isn’t possible.”
I’d distinguish conceivability from possibility. There are many types of possibility: logical possibility (no logical contradiction), broad logical possibility (no conceptual incoherence), nomological possibility, physical possibility, etc. Most people would probably agree that levitating frogs are logically and broadly logically possible, but not physically or nomologically possible, as this would contradict the laws of physics.
It’s less clear to me that there are many different types of conceivability. But even if there are, the type I care about in the post above is something like “forming a mental model of”.
“But lots of other things were conceivable before the discovery. The narrowing is that, in terms of the correct explanation, the possibility that you get sodium and chlorine is no longer tenable.”
I see, that’s a helpful example.
Unity of Doctrine vs Unity of Method in Philosophy
I’d say neither of these discoveries/explanations changed what is conceivable. Even before the water=H2O discovery it was conceptually coherent/conceivable that electrolysing water yields hydrogen. And it was and is conceivable to levitate a frog, as there is no contradiction in this idea. It’s just very surprising that it can actually be done.
Could you give me an example of a case where an explanation has broadened or narrowed what is conceivable, so I understand better what you have in mind?
We will post more when the game is announced, which should be in 2-3 weeks. For now I’m mostly interested in getting feedback on whether this way of setting the problem up is plausible and doesn’t miss crucial elements, and less in how to translate it into gameplay and digestible dialogue.
Once the announcement (including the teaser) is out, I’ll create a new post for concrete ideas on gameplay + dialogue.