Did speech-to-text circa 2015 use deep learning? If not, how did it work?
Before deep learning (DL), text-to-speech used hidden Markov models (HMMs). These were replaced by LSTMs relatively early in the DL revolution (random 2014 paper). In 2015 there were likely still many HMM-based systems around, but apparently at least Google was already using DL-based text-to-speech.
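To give a rough idea of how the HMM approach works, here is a toy sketch of Viterbi decoding, which finds the most likely sequence of hidden states given a sequence of observations. In a classic HMM speech system the hidden states roughly correspond to sub-units of phones and the observations are acoustic feature frames. Real systems used continuous features with Gaussian mixture emission models rather than the two discrete symbols assumed here, and all the probabilities below are made-up illustration values.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a list of observation indices.

    start_p[s]    : P(first state is s)
    trans_p[i][j] : P(next state is j | current state is i)
    emit_p[s][o]  : P(observation o | state s)
    Works in log space so long sequences do not underflow.
    All probabilities must be non-zero (math.log(0) raises).
    """
    n = len(start_p)
    # Best log-score of any path ending in each state at time 0.
    scores = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in range(n)]
    backptrs = []
    for o in obs[1:]:
        new_scores, ptrs = [], []
        for j in range(n):
            # Pick the best predecessor state for state j.
            best_i = max(range(n), key=lambda i: scores[i] + math.log(trans_p[i][j]))
            new_scores.append(scores[best_i] + math.log(trans_p[best_i][j]) + math.log(emit_p[j][o]))
            ptrs.append(best_i)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: scores[s])
    best_log_prob = scores[state]
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path, best_log_prob

# Hypothetical toy model: state 0 and state 1 stand for two phone segments,
# observations 0 and 1 are two quantized acoustic symbols.
start_p = [0.9, 0.1]
trans_p = [[0.7, 0.3], [0.05, 0.95]]
emit_p = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1, 1]

path, log_prob = viterbi(obs, start_p, trans_p, emit_p)
print(path)  # [0, 0, 1, 1]
```

Broadly, the shift to DL kept this kind of sequence decoding at first and swapped the emission model for a neural network, before end-to-end models later removed the HMM altogether.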