Thinking of it as “a property” will mislead you about how Max’s strategy works. It needs to become the AI’s only top-level goal in order to work as Max imagines.
It sure looks like AI growers know how to instill some goals in AIs. I’m confused as to why you think they don’t. Maybe you’re missing the part where the shards that want corrigibility are working to overcome any conflicting shards?
I find it quite realistic that the AI growers would believe at the end of Red Heart that they probably had succeeded (I’ll guess that they ended up 80% confident?). That doesn’t tell us what probability we should put on it. I’m sure that in that situation Eliezer would still believe that the AI is likely not corrigible.
> I don’t know what year the novel is actually set in,

It’s an alternate timeline where AI capabilities have progressed faster than ours, likely by a couple of years.
Note this Manifold market on when the audiobook will be released.
Thanks for the clarifications.
I’m not sure that having some shards that want corrigibility work to overcome the conflicting shards is a useful strategy if we don’t know how to make any of the shards want corrigibility in the first place.
The alternate-timeline setting definitely makes the story’s pace of AI progress feel more realistic, so thanks for pointing that out.
There are issues with corrigibility anyway, even as a sole goal. To simplify some of the objections: total obedience fails, and self-shutdown or paralysis also fails.
(The Alignment Forum would have the specifics of CAST. I haven’t read those either, since I don’t grant the premises of CAST enough to begin with. Corrigibility is not bad, but as a sole goal it doesn’t make sense to me on several levels.)
Thank you for the reading tip!