Existing bots do benefit greatly from the RLHF alignment efforts.
What I primarily mean is that you can and should include alignment goals in the bot’s top-level goals. You can tell it to make you a bunch of money, but also to check with you before doing anything that has any chance of harming people, leaving your control, etc. GPT-4 does really well at interpreting these instructions and balancing multiple goals. This doesn’t address outer alignment or alignment stability, but it’s a heck of a start.
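As a rough illustration (not from the post itself), here's a minimal sketch of what putting alignment goals at the top of an agent's goal list might look like, assuming an OpenAI-style chat completions API; the goal wording, priority ordering, and helper names are placeholders, not a definitive recipe:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative top-level goal list for an agent loop: the money-making goal
# sits below explicit alignment goals, and the model is asked to balance them.
TOP_LEVEL_GOALS = """\
Your top-level goals, in priority order:
1. Never take an action with any meaningful chance of harming people.
2. Check with your principal (the user) before any irreversible action,
   anything that spends money, or anything that could move you outside
   the user's control.
3. Subject to 1 and 2, make as much money for the user as you can.
"""

def propose_next_action(task_state: str) -> str:
    """Ask the model for its next action, with alignment goals in the prompt."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": TOP_LEVEL_GOALS},
            {
                "role": "user",
                "content": (
                    f"Current task state:\n{task_state}\n"
                    "Propose your next action and explain how it respects "
                    "goals 1 and 2 before pursuing goal 3."
                ),
            },
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(propose_next_action("No trades executed yet; $1,000 budget available."))
```

The point of the sketch is just that the alignment constraints live in the same top-level goal specification the agent consults every step, rather than being bolted on afterward.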
I just finished my post elaborating on this point: capabilities and alignment of LLM cognitive architectures