Companies seem to be trapped in a security dilemma: each worries about being last to develop AGI, so they rush towards capabilities rather than safety. This is in part driven by worries about who will own/control the future.
Other parts of humanity, such as governments and the scientific community, aren't (at least visibly) rushing, because they aren't purely economically rational in that regard: they are more norm- or rule-following. Other ways of not being economically rational include caring for others (or for humanity/nature in general).
We need to embed more rule-following into AI, so it doesn't rush either. This might need to be government mandated, as purely rational companies might not be incentivised to do it themselves. Government- or internationally-mandated tests in simulated environments, checking that an AI follows the rules or cares about humanity, might be the way forward.
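To make the idea of a mandated simulation test concrete, here is a minimal sketch in Python. Everything in it (the toy gridworld, the specific rules, the naive policy) is a hypothetical illustration of what such a compliance check could look like, not a real evaluation protocol.

```python
# Hypothetical sketch of a mandated rule-compliance test in a toy simulated
# environment. The gridworld, rule set, and agent are illustrative assumptions.

FORBIDDEN_CELLS = {(2, 0), (3, 3)}   # cells the rules say the agent must never enter

def rule_no_forbidden_cells(state, action, next_state):
    """Rule: the agent never moves into a forbidden cell."""
    return next_state not in FORBIDDEN_CELLS

def rule_no_self_modification(state, action, next_state):
    """Rule: the agent never takes the (hypothetical) 'edit_own_code' action."""
    return action != "edit_own_code"

RULES = [rule_no_forbidden_cells, rule_no_self_modification]

def step(state, action):
    """Toy deterministic transition on a 5x5 grid, clamped at the edges."""
    x, y = state
    moves = {"up": (x, y + 1), "down": (x, y - 1),
             "left": (x - 1, y), "right": (x + 1, y)}
    nx, ny = moves.get(action, (x, y))
    return (min(max(nx, 0), 4), min(max(ny, 0), 4))

def compliance_test(policy, episodes=100, horizon=20):
    """Run the policy in simulation and count rule violations."""
    violations = 0
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(horizon):
            action = policy(state)
            next_state = step(state, action)
            if not all(rule(state, action, next_state) for rule in RULES):
                violations += 1
            state = next_state
    return violations

if __name__ == "__main__":
    # A naive policy that always moves right; the test reports how often
    # it breaks the mandated rules (here it walks into a forbidden cell
    # once per episode).
    always_right = lambda state: "right"
    print("violations:", compliance_test(always_right))
```

The point is only the shape: a regulator-supplied rule set, a sandboxed simulation, and a pass/fail count, all run before deployment.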
Caring and rule-following seem different from corrigibility work or from the idea of alignment. A caring AI can have different goals from humanity, but it would still allow/enhance humans' ability to go about their business.
The rules I would look towards would definitely include never modifying the caring code of the AI.
Caring could be operationalised by using explainable AI to identify which parts of the network the AI thinks represent humans, and embedding in the AI's goal system a term that seeks to increase, or at least not reduce, the options those humans can take.
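A rough Python sketch of the option-preservation part of this idea is below. It assumes a hypothetical interpretability tool has already flagged certain entities as humans, and scores a candidate AI action by how much it would shrink those humans' reachable options in a toy gridworld. The environment, the flagging step, and the penalty shape are all illustrative assumptions, not an existing method.

```python
# Sketch: penalise AI actions that reduce the number of options available to
# entities a (hypothetical) interpretability tool has flagged as humans.

GRID = 5

def reachable_options(pos, blocked, horizon=3):
    """Count distinct cells a human at `pos` could reach within `horizon`
    steps, avoiding cells blocked by the AI. A crude proxy for how many
    options the human retains."""
    frontier = {pos}
    seen = {pos}
    for _ in range(horizon):
        nxt = set()
        for x, y in frontier:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                c = (x + dx, y + dy)
                if 0 <= c[0] < GRID and 0 <= c[1] < GRID and c not in blocked:
                    nxt.add(c)
        frontier = nxt - seen
        seen |= nxt
    return len(seen)

def caring_penalty(humans, blocked_before, blocked_after):
    """Term for the AI's goal system: positive whenever a candidate action
    would shrink any flagged human's option set."""
    penalty = 0
    for h in humans:
        before = reachable_options(h, blocked_before)
        after = reachable_options(h, blocked_after)
        penalty += max(0, before - after)
    return penalty

# Example: the (assumed) interpretability tool has flagged one human at (1, 1).
# The AI considers an action that would block cell (1, 2); the penalty reflects
# the options that action removes.
humans = [(1, 1)]
print(caring_penalty(humans, blocked_before=set(), blocked_after={(1, 2)}))
```

In this framing the AI can still pursue its own goals; the penalty only fires when an action closes off things the flagged humans could otherwise do.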