It’s worth keeping in mind that before Microsoft launched the GPT-4 Bing chatbot that ended up threatening and gaslighting users, OpenAI advised against launching so early because the model didn’t seem ready. Microsoft went ahead anyway, apparently in part out of resentment that OpenAI stole its “thunder” by releasing ChatGPT in November 2022. In principle, nothing stops Microsoft from doing the same thing with future AI models: taking OpenAI’s base model, fine-tuning it in a less robustly safe manner, and releasing it in a relatively unsafe state. Perhaps dangerous capability evaluations are not just about convincing OpenAI or Anthropic to adhere to higher safety standards and potentially pause, but also Microsoft.
That’s concerning to me, as it suggests Microsoft might decline to apply alignment techniques, or even reverse them, out of resentment, endangering people solely out of spite.
This is not good at all, and that’s saying something, since I’m usually the optimist and am quite optimistic about AI safety working out.
Now I worry that Microsoft will create a potentially dangerous or misaligned AI by reversing OpenAI’s alignment techniques.
I’m happy that alignment and safety were restored before launch, but next time let’s not reverse alignment techniques, so that we don’t have to deal with more dangerous things later on.
To be clear, I don’t think Microsoft deliberately reversed OpenAI’s alignment techniques; rather, it seems Microsoft probably received the base model of GPT-4 and fine-tuned it separately from OpenAI.
Microsoft’s post “Building the New Bing” says:
Last Summer, OpenAI shared their next generation GPT model with us, and it was game-changing. The new model was much more powerful than GPT-3.5, which powers ChatGPT, and a lot more capable to synthesize, summarize, chat and create. Seeing this new model inspired us to explore how to integrate the GPT capabilities into the Bing search product, so that we could provide more accurate and complete search results for any query including long, complex, natural queries.
This seems to correspond to when GPT-4 “finished training in August of 2022”. OpenAI says it spent six months fine-tuning the model with human feedback before releasing it in March 2023. My guess is that Microsoft did its own fine-tuning of the August 2022 version of GPT-4, separately from OpenAI. Especially given Bing’s tendency to repeat itself, it doesn’t feel like a fine-tuned version of GPT-3.5/4 after OpenAI’s RLHF, but more like a base model.
That’s good news, but I’m still not happy that Microsoft ignored OpenAI’s warnings.