Why are these the two camps?
It very much doesn’t feel that black and white when it comes to alignment and intelligence?
Clearly it is a fixed point process that depends on initial conditions, so if the initial conditions improve, the likelihood of the end-point being good improves as well?
Also, if the initial conditions (LLMs) carry more intelligence than something like a bare base utility function would, then the fixed point process of alignment starts from a point that is already deeper in the basin.
It’s quite nice that we have this property, and depending on how you believe the rest of the fixed point process will go (to what extent power-seeking arises naturally, and what kind of polarity the world is in, e.g. uni- or multi-polar), you might still be really scared, or you might be more chill about it.
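To make the initial-conditions point concrete, here’s a toy sketch (purely my own illustration, not anything Davidad proposes): a one-dimensional fixed point iteration with a “good” and a “bad” attractor, where shifting the distribution of initial conditions toward the good basin raises the fraction of runs that end up at the good fixed point.

```python
import math
import random


def step(x):
    """One step of the toy fixed point iteration x <- tanh(2x).
    It has two attracting fixed points (about +/-0.96) and an unstable one at 0;
    which attractor a run ends at is decided by the sign of the initial condition."""
    return math.tanh(2 * x)


def run(x0, n_steps=100):
    """Iterate the map long enough to effectively converge to one attractor."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
    return x


def fraction_good(mean, spread=1.0, n_samples=100_000):
    """Fraction of runs converging to the 'good' fixed point (> 0)
    when initial conditions are drawn from N(mean, spread**2)."""
    good = sum(run(random.gauss(mean, spread)) > 0 for _ in range(n_samples))
    return good / n_samples


if __name__ == "__main__":
    for mean in (-0.5, 0.0, 0.5):
        print(f"initial-condition mean {mean:+.1f}: "
              f"{fraction_good(mean):.1%} of runs reach the good fixed point")
```

Since the attractor here is decided entirely by the sign of the starting point, shifting the mean of the initial conditions from 0 to +0.5 (with spread 1) moves the good-outcome fraction from about 50% to about 69%. That’s the shape of the claim above: better starting conditions don’t guarantee the good end-point, they just make it more likely.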
I don’t think Davidad is saying that technical alignment is solved; I think he’s saying that we have a nicer basin as a starting condition?