I like how you discussed the feedback loop involved in increasing the breadth of the AI for AI Safety zone, and the goal of keeping AI in that zone for as long as possible. However, I believe the AI for AI Safety graphs are more nuanced than what you suggested.
I believe there should be 3 axes:
AI Alignment
AI Capability
Time
Each AI can be plotted on this graph and is ideally in the “sweet spot” zone you suggested. This graph should be more accurate because we can imagine two AIs of the same capability, where one is a rogue AI whose values have drifted and therefore has bad alignment, while the other has good alignment.
The badly aligned AI will use that capability to cause significant damage, while the well-aligned AI won't. However, as the sweet spot expands over time, AI control and alignment will ensure that the badly aligned AI can't cause significant damage.
E.g., something like the image below, generated with ChatGPT.
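For concreteness, here is a minimal sketch of how such a 3-axis picture could be rendered. It assumes Python with matplotlib; the trajectories, the alignment scale, and the sweet-spot threshold are all illustrative assumptions of mine, not anything from your post or my comment above.

```python
# Illustrative sketch: two AIs with the same capability growth plotted on
# (time, capability, alignment) axes, plus an assumed sweet-spot boundary.
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 10, 50)          # time (arbitrary units)
capability = 0.1 * t                # assumed shared capability growth

# Assumed alignment trajectories: one AI stays well aligned,
# the rogue AI's values drift downward over time.
aligned_ai = 0.8 + 0.01 * t
rogue_ai = 0.8 - 0.06 * t

# Assumed fixed alignment threshold marking the sweet-spot boundary;
# the comment imagines this zone expanding as safety tooling improves.
threshold = np.full_like(t, 0.6)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot(t, capability, aligned_ai, label="well-aligned AI")
ax.plot(t, capability, rogue_ai, label="rogue AI (value drift)")
ax.plot(t, capability, threshold, linestyle="--", label="sweet-spot boundary")

ax.set_xlabel("Time")
ax.set_ylabel("AI capability")
ax.set_zlabel("AI alignment")
ax.legend()
plt.show()
```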