Former Software Engineer, now working on AI Safety.
Questions I’m undecided on, whose resolution could shift my focus, ranked by decreasing magnitude of the potential shift:
Should I be a solipsist? If no other sentiences exist, morality is moot.
Are human values worth preserving? Would it be so (morally) bad if our lightcone were shaped by a (misaligned) superintelligent AI’s values instead of human values?
Could technical alignment research be exploited by a transformative AI (TAI) to align a superintelligence to its own values?
If the technical alignment problem is unsolvable, would that preclude an intelligence explosion, since a TAI would have no way to align a superintelligence to its values?
If the technical alignment problem is solvable and we partially solve it, would that aid a misaligned AGI’s takeover by making it easier for that AGI to align AIs smarter than itself?
What’s the appropriate level of alarm when we detect misbehaviour in AIs (via CoT monitoring, interpretability, or any other technique)? Almost everything we do in response (directly train against visible misbehaviour, discard the model and start afresh) creates selection pressure towards stealth. It would be good to quantify how many bits of optimisation pressure push us into different regimes (do not misbehave, or misbehave covertly), and how many we have already expended.
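A minimal sketch of the kind of accounting I have in mind, assuming the usual information-theoretic convention that selecting an outcome of prior probability p supplies roughly -log2(p) bits of optimisation pressure (the function name and the numbers below are illustrative, not measurements):

```python
import math

def bits_of_selection(pass_probability: float, rounds: int = 1) -> float:
    """Rough bits of optimisation pressure applied by keeping only
    models/behaviours that pass a visible-misbehaviour check.

    Selecting an outcome with prior probability p contributes about
    -log2(p) bits; independent selection rounds add up.
    """
    return rounds * -math.log2(pass_probability)

# Toy numbers: if a model passes our visible-misbehaviour checks 1 time in 4,
# and we discard/retrain until it has passed 10 such rounds, we have applied
# ~20 bits of pressure toward "no *visible* misbehaviour" -- a target that
# also includes "misbehave covertly".
print(bits_of_selection(pass_probability=0.25, rounds=10))  # -> 20.0
```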
Only very prosaic risks, not catastrophic ones (as a customer I would not care at all about the likelihood of catastrophic risks from my vendor; that affects humanity regardless of any customer relationship).
Only if a particular secret is useful for preventing both catastrophes and prosaic ‘safety failures’ would this be a consideration for us: catastrophic risk increasing because companies compete for an edge on prosaic safety risks.