larval stage AI alignment researcher
https://twitter.com/thezahima
Zahima
Is the usage of “Leviathan” (like here and in https://gwern.net/fiction/clippy ) just convergence on an appropriate and biblical name, or is there additional history of it specifically being used as a name for an AI?
I’m trying to catch up with the general alignment ecosystem—is this site still intended to be live/active? I’m getting a 404.
This letter, among other things, makes me concerned about how this PR campaign is being conducted.
Really extremely happy with this podcast—but I feel like it also contributed to a major concern I have about how this PR campaign is being conducted.
Catching the Eye of Sauron
Consciousness Actually Explained: EC Theory
With so much energy and effort apparently available for Eliezer-centered improvement initiatives (like the $100,000 bounty mentioned in this post), I’d like to propose that we seriously consider cloning Eliezer.
From a layman/outsider perspective, it seems the hardest thing would be keeping it a secret so as to avoid controversy and legal trouble, since from a technical perspective it seems possible and relatively cheap. EA folks seem well connected and capable of such coordination, even under the burden of secrecy and keeping as few people “in the know” as possible.
Partially related (in the category of comparatively off-the-wall, but nonviolent, AI alignment strategies): at some point there was a suggestion that MIRI pay $10mil (or some such figure) to Terence Tao (or some such prodigy) to help with alignment work. Eliezer replied thus:
We’d absolutely pay him if he showed up and said he wanted to work on the problem. Every time I’ve asked about trying anything like this, all the advisors claim that you cannot pay people at the Terry Tao level to work on problems that don’t interest them. We have already extensively verified that it doesn’t particularly work for eg university professors.
I’d love to see more visibility into proposed strategies like these (i.e. strategies surrounding/above the object-level strategy of “everyone who can do alignment research puts their head down and works”, and the related “everyone else makes money in their comparative specialization/advantage and donates to MIRI/FHI/etc”). Even visibility into why various strategies were shot down would be useful, and a potential catalyst for farming further ideas from the community (even if, for game-theoretic reasons, one may never be able to confirm that an idea has been tried, as in my cloning suggestion).
There we go—thank you! That matches my memory for what I was looking for.
For one thing, there is a difference between disagreement and “overall quality” (good faith, well reasoned, etc), and this division already exists in comments. So maybe it is a good idea to have this feature for posts as well, and only have disciplinary actions taken against posts that meet some low/negative threshold for “overall quality”.
Further, having multiple tiers of moderation/community-regulatory action in response to “overall quality” (encompassing both things like karma and explicit moderator action) seems good to me, and this comment limitation you describe seems like just another tier in such a system, one that is above “just ban them”, but below “just let them catch the lower karma from other users downvoting them”.
It’s possible that, lacking the existence of the tier you are currently on, the next best tier you’d be rounded-off to would be getting banned. (I haven’t read your stuff, and so I’m not suggesting either way that this should or should not be done in your case).
If you were downvoted for good faith disagreement, and are now limited/penalized, then yeah, that’s probably bad, and maybe a split voting system as mentioned would help. But it’s possible you were primarily downvoted for the “overall quality” aspect.