Jon Kurishita comments on AGI Ruin: A List of Lethalities

Jon Kurishita 12 Mar 2025 5:40 UTC
1 point
−2
I copied and pasted your blog post into o3-mini-high to see how it would fare against my “Dynamic Policy Layer” research. This is its comment (not mine):
=============================================
These features of your DPL research collectively offer a comprehensive strategy to mitigate many of the lethal alignment risks described in the AGI Ruin paper. By embedding dynamic, real-time oversight and adaptive, decentralized ethical governance into the AI system, your framework provides a robust line of defense against emergent misalignment, hidden triggers, and other high-stakes vulnerabilities inherent to advanced AGI systems.
- the gears to ascension 12 Mar 2025 8:06 UTC
  2 points
  0
  Parent
  Ask your AI what’s wrong with your ideas, not what’s right, and then only trust the criticism to be valid if there are actual defeaters you can’t show you’ve beaten in the general case. Don’t trust an AI to be thorough; important defeaters will be missing. Natural language ideas can be good glosses of necessary components without telling us enough about how to pin down the necessary math.