Thank you for the reply.
You make it sound like Elon Musk founded OpenAI without speaking to anyone in X-risk
I didn’t know about that; it was a good move from EA. Why not try it again? Again, I’m not saying that we definitely need to make a badge on Twitter. First of all, we can try to change Elon’s models, and after that we can think about what to do next.
2. Musk’s inability to follow arguments related to why Neurolink is not a good plan to avoid AI risk.
Well, if it is conditional on “there are widespread concerns and regulations about AGI” and “Neuralink works and can significantly enhance human intelligence”, then I can clearly see how it would decrease AI risk. Imagine Yudkowsky with significantly enhanced capabilities, working with several other AI safety researchers and communicating at the speed of thought. Of course, this would require that no one else gets their hands on the technology for a while, and that we build it before AGI becomes a thing. But it is still possible, and I can clearly see how anybody in 2016 could have been unable to predict current ML progress and therefore placed their bets on something long-term, like Neuralink.
If you push AI safety to be something that’s about signaling, you will unlikely get effective action related to it.
If you can’t use the signal until you pass “a really good exam that shows your understanding of the topic”, why would it be a bad signal? Some exams don’t fall that badly to Goodhart’s law: for example, you can’t pass a test on calculating integrals without actual practical skill. My idea around the badge was more like “trick people into thinking it is easy and that they can get another social signal, then watch them realize the problem after investigating it”.
And the whole idea of the post isn’t about the “badge”; it’s about “talking with powerful people to explain our models to them”.
I think the problem is not that an unaligned AGI doesn’t understand human values. It might understand them better than an aligned one, and it might understand all the consequences of its actions; the problem is that it will not care about them. Moreover, a detailed understanding of human values has instrumental value: it is much easier to deceive and pursue your goal when you have a clear picture of “what will look bad and might result in countermeasures”.