At some point, we need to actually align an AI system. But my claim is that this AI system doesn’t need to be much smarter than us, and it doesn’t need to be able to do much more work than we can evaluate.
IMO even if this is true, AIs are very clearly misaligned right now. And insofar as the very underdeveloped world of model evals doesn't show that, I have personal experience of telling them to do something and having them routinely fuck me over in ways subtle and malicious enough that I do think it's intentional.