Some feedback:
As others have pointed out, more concise responses would be better.
I feel like this chatbot over-relies on analogies related to your job.
Some of the outputs feel a bit incoherent. For example, it talks about jailbreaking, then in the very next sentence says that an AI faking alignment is a disaster waiting to happen. It jumps from jailbreaking to alignment faking, but those are pretty different issues.
Personally, I wouldn’t link to Yudkowsky’s list of lethalities. If you want to use something for persuasion, it needs to be either easy for a layperson to understand or carry a sense of authority (like “world’s leading scientists and Nobel prize winners believe [X] is true”), and I don’t think Yudkowsky’s list meets either criterion.
Also, if that’s how “memetic warfare” will be done in the future—via debate-bots—then I don’t see how AI safety people are going to win, given that anti-AI-safety people have many billions of dollars to burn.