And yes, I want to spend at least a little time on very abstract concepts, perhaps ones dealing with how agentic beings interact with each other.
I propose “honesty”. Justification:
It just seems fundamental to lots of alignment work (deception, Claude being honest according to its constitution, also it’s one of the H’s in HHH).
It’s genuinely unclear to me whether “honesty” makes any sense from the POV of a goal-directed agent, especially a superintelligent one.
Example: consider ant traps. They make ants think they are carrying home tasty nutrients, while in fact they’re carrying poison, so the ants are deceived. Would we say that a human setting up a trap is “dishonest”?
I’d say not really, because honesty happens in communication between agents, and we don’t consider setting up a trap an “act of communication”.
But why don’t we consider this communication? Probably because we don’t think of ants as “agents we communicate with”.
OK, but why are ants not agents we communicate with? And why would a superintelligent AI treat humans differently?
So I’m worried that if honesty makes sense only between agents-on-a-similar-level-who-trade-with-each-other, then all our efforts to make AIs honest and not deceptive are useless.
One possible reason why this view might make a lot of sense:
Suppose we’re living in a simulation
The simulation is likely optimized for good performance
The most straightforward optimization is just not-computing-the-same-thing-many-times
So if you tile the galaxy with billions of identical, happy beings, it might be that they are actually being (happily) computed only once and rendered in many places for the in-simulation entity to see. A similar argument goes for sufficiently similar beings.
Or, to put that differently: when people talk about the “resources we can turn into happiness”, they usually mean matter/energy. But if we’re in a simulation, the relevant resource might actually be the compute that is being used to simulate our reality, and the easier our reality is to compress, the fewer effective resources we have.
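
A toy sketch of the two ideas above, just to make them concrete. Everything here is an assumption made for illustration: `simulate_being` and `render_galaxy` are made-up names, caching by exact state stands in for "not computing the same thing many times", and zlib-compressed size is only a crude proxy for how compressible a reality is. It says nothing about how an actual simulation would work.

```python
import random
import zlib
from functools import lru_cache

compute_calls = 0  # how many times the "expensive" step actually runs


@lru_cache(maxsize=None)
def simulate_being(state):
    """Hypothetical expensive computation of one being's experience."""
    global compute_calls
    compute_calls += 1
    return f"experience of {state}"


def render_galaxy(beings):
    """Render every being; identical states are served from the cache."""
    return [simulate_being(b) for b in beings]


# A galaxy tiled with identical happy beings: rendered 100,000 times.
tiled = render_galaxy([("happy", 1.0)] * 100_000)
calls_for_tiled = compute_calls

# A galaxy of 1,000 distinct beings: each one has to be computed.
compute_calls = 0
simulate_being.cache_clear()
varied = render_galaxy([("being", i) for i in range(1_000)])
calls_for_varied = compute_calls

# Compressed size as a rough stand-in for "effective resources" used.
rng = random.Random(0)  # fixed seed so the run is repeatable
tiled_world = b"happy being;" * 10_000
noisy_world = bytes(rng.getrandbits(8) for _ in range(len(tiled_world)))
tiled_size = len(zlib.compress(tiled_world))
noisy_size = len(zlib.compress(noisy_world))

print(calls_for_tiled, calls_for_varied)  # 1 1000
print(tiled_size, noisy_size)
```

The tiled galaxy is rendered 100,000 times but computed once, and the tiled world compresses to far fewer bytes than a noisy world of the same raw size. In this toy model, that is the sense in which tiling "uses" very little compute.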