Former Software Engineer, now working on AI Safety.
Questions I’m undecided on, whose resolution could shift my focus, ranked by decreasing magnitude of shift:
Should I be a solipsist? If no other sentiences exist then morality is moot.
Are human values worth preserving? Would it be so (morally) bad if our lightcone were paved with a (misaligned) superintelligent AI’s values instead of human values?
Could technical alignment research be exploited by a TAI to align a superintelligence to its values?
If the technical alignment problem is unsolvable, would this preclude an intelligence explosion, since a TAI would have no way to align a superintelligence to its values?
If the technical alignment problem is solvable and we partially solve it, would that promote a misaligned AGI’s takeover by making it easier for that AGI to align AIs smarter than itself?
What’s the appropriate level of alarm when we detect misbehaviour in AIs (via CoT monitoring, interpretability or any other technique)? Almost everything we do in response (directly train against visible misbehaviour, discard the model and start afresh) creates selection pressure towards being stealthy. It would be good to quantify the number of bits of optimisation pressure that gets us into the different regimes (do not misbehave, or misbehave covertly), and how many we have already expended (rough sketch below).
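A minimal back-of-the-envelope sketch of one way to cash out "bits of optimisation pressure" against visible misbehaviour. This is my own toy framing, not an established metric: it assumes each discard-and-retrain round independently filters out models whose misbehaviour is visible, and the pass probability and function name are purely illustrative.

```python
import math

def bits_of_selection_pressure(pass_probability: float, rounds: int) -> float:
    """Rough bits of pressure applied against visibly misbehaving models.

    Assumes each round independently discards models whose misbehaviour is
    visible, so a visibly-misbehaving model survives one round with
    probability `pass_probability`, i.e. ~log2(1/pass_probability) bits/round.
    """
    return rounds * math.log2(1.0 / pass_probability)

# Illustrative numbers only: if monitoring catches visible misbehaviour 90% of
# the time (pass_probability = 0.1) and we have discarded/retrained 5 times,
# we have applied roughly 16.6 bits towards "don't misbehave visibly".
print(bits_of_selection_pressure(pass_probability=0.1, rounds=5))
```

The open question is at what number of bits this pressure stops producing "do not misbehave" and starts producing "misbehave covertly".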
Full control of ASI by a few model developers is IMO the main prosaic worry. The crux (between extinction and prosaic risks) is whether alignment is hard.
> (if we successfully built AI that cared about us in the right way)
I’m unsure whether this means caring about human flourishing/agency or being fully obedient to its developer, but we only need the latter.