As far as my own AI optimism goes, I think it's quite a caricatured view of my opinions, but not utterly deranged.
The biggest reasons I've personally become more optimistic about AI alignment are the following:
I've become convinced that a lot of the complexity of human values, to the extent it's there at all, is surprisingly unnecessary for alignment, and a lot of this is broadly downstream of thinking that inductive biases matter much less for values than people thought they did.
I think that AI control is surprisingly useful, and a lot of the criticism around it is pretty misguided; in particular, I think slop is a real problem, but also one where it's surprisingly easy to make iteration work, compared to other adversarial-AI problems.
Some other reasons are:
I think the argument that the Solomonoff prior is malign doesn't actually work, because in the general case it's about equally costly, relative to the simulators' resource budget, to simulate solipsist universes as non-solipsist universes, and because instrumental convergence means a wide range of values want to run simulations anyway, so you get little if any evidence about what the values of the multiverse are (a toy sketch of this point follows the link):
https://www.lesswrong.com/posts/tDkYdyJSqe3DddtK4/alexander-gietelink-oldenziel-s-shortform#w2M3rjm6NdNY9WDez
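To gesture at why I think the update is weak, here's a minimal Bayesian sketch of my own (not the formalism from the linked comment), where $v$ ranges over candidate value systems of the simulators and $\text{sim}$ is the event that our observations are being simulated. If every $v$ is roughly equally willing and able to run such simulations, i.e. $P(\text{sim} \mid v) \approx c$ for all $v$, then conditioning on being simulated barely moves the posterior over $v$:

$$P(v \mid \text{sim}) \;=\; \frac{P(\text{sim} \mid v)\,P(v)}{\sum_{v'} P(\text{sim} \mid v')\,P(v')} \;\approx\; \frac{c\,P(v)}{c\sum_{v'} P(v')} \;=\; P(v).$$

So for the malignness argument to go through, there would need to be a strong asymmetry in which value systems find (solipsist) simulations cheap or worthwhile, and that asymmetry is what the equal-cost and instrumental-convergence points above are denying.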
I believe a lot of people on LW overestimate how much goodharting the market does, because they don't realize the constraints that real humans and markets work under, which include economic constraints as well as physical ones. One example is the discussion of the 1-hose air conditioner as a market failure, which seems to have been based on a false premise, since 1-hose air conditioners are acceptable enough to consumers at the price points they're sold at:
https://www.lesswrong.com/posts/5re4KgMoNXHFyLq8N/air-conditioner-test-results-and-discussion#maJBX3zAEtx5gFcBG
https://www.lesswrong.com/posts/5re4KgMoNXHFyLq8N/air-conditioner-test-results-and-discussion#3TFECJ3urX6wLre5n
I would be curious what you think of [this](https://www.lesswrong.com/posts/TCmj9Wdp5vwsaHAas/knocking-down-my-ai-optimist-strawman).
Those are sort of counterstatements against doom, explaining why you don't see certain problems that doomers raise. But the OP is more of an attempt to make an independently-standing argument about what is present.