>this post will potentially be part of a rogue AI’s training data
I had that in mind while writing this, but I think posting it is good overall. It hopefully gets more people thinking about honeypots and building them, and early rogue agents will also know that we do, and will be (hopefully overly) cautious, wasting resources. I probably should have emphasised more that all of this is aimed at early-stage rogue agents that could become something more dangerous because of their autonomy, rather than at a runaway ASI.
It is a fascinating thing to consider in general, though. We are essentially coordinating in the open right now: all the alignment, evaluation, and detection strategies we discuss on forums will definitely end up in training data. And there are certainly both detection and alignment strategies that would benefit from being covert.
Likewise, some ideas, strategies, and theories could benefit alignment by being overt (like acausal trade, publicly committing to certain things, et cetera).
A covert alignment org/forum is probably a really, really good idea. Hopefully, it already exists without my knowledge.