I’m wondering. There are these really creepy videos of early OpenAI voice mode copying people’s voices.
https://www.youtube.com/shorts/RbCoIa7eXQE
I wonder if they’re a result of OpenAI failing to do this loss-masking with their voice models, and then messing up turn-tokenization somehow.
If you do enough training without masking the user tokens, you’d expect to get a model that’s as good at simulating users as it is at being a helpful assistant.
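To make the masking idea concrete, here’s a minimal sketch of how user-turn loss-masking is commonly done for text chat models (the `-100` ignore-index convention is from Hugging Face transformers; the `build_labels` helper and the token IDs are made up for illustration):

```python
# Sketch of loss-masking a chat transcript. Assumes the common convention
# (e.g. in HF transformers) that a label of -100 is ignored by the
# cross-entropy loss, so the model is never trained to imitate the user.
IGNORE_INDEX = -100

def build_labels(turns):
    """turns: list of (role, token_ids) pairs.
    Returns (input_ids, labels): user tokens get IGNORE_INDEX in labels,
    assistant tokens are kept as prediction targets."""
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # train to predict assistant tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask user tokens
    return input_ids, labels

turns = [("user", [11, 12, 13]), ("assistant", [21, 22])]
ids, labels = build_labels(turns)
print(ids)     # [11, 12, 13, 21, 22]
print(labels)  # [-100, -100, -100, 21, 22]
```

If this masking step is skipped, the loss rewards the model for predicting user tokens too, which for an audio model means learning to reproduce the user’s voice.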