[Question] How hard would it be to adapt GPT-3 to handle audio?

GPT-3 currently works only on text. If OpenAI wanted to make it handle audio with similar performance, how much work would that likely involve?