I’ve been working in this direction for a while, though what I’m imagining is a lot more elaborate. If anyone would like to help out, send me a DM and I can invite you to a Discord server where we talk about this stuff. If you do DM me, please let me know who you are and what you do.
I wrote some brief notes about it in the Accelerating Alignment section here: https://www.lesswrong.com/posts/jXjeYYPXipAtA2zmj/jacquesthibs-s-shortform?commentId=iLJDjBQBwFod7tjfz
And cover some of the philosophy in the beginning of this post: https://www.lesswrong.com/posts/a2io2mcxTWS4mxodF/results-from-a-survey-on-tool-use-and-workflows-in-alignment
Additionally, I left a comment about the general LLMs-for-alignment approach on John’s recent post: https://www.lesswrong.com/posts/KQfYieur2DFRZDamd/why-not-just-build-weak-ai-tools-for-ai-alignment-research?commentId=DXt7mBkW7WiL36nBN