Azure TTS

Jan 15, 2021

What is TTS?

TTS stands for Text-To-Speech. We use TTS all the time: whenever we hear Siri, Google Assistant, or GPS directions, we’re hearing the output of TTS.

TTS is all around us because it’s super easy for computers to output text (see: “Hello World”) and people typically find voice easy to use.

TTS isn’t a new technology. Wikipedia’s article on speech synthesis lists 1975 as the year when commercial text-to-speech first became available.

But if you had an early GPS with TTS, or you listened to phone prompts (“You entered 0-2-3, is that right?”), you’d sometimes hear an obviously generated voice. These TTS voices were created by having an actor record many syllables and then stitching the recordings back together. As a result, the pauses and the changes in pitch and speed that come naturally to all of us were missing.

Enter the next generation of TTS with Azure TTS.

Azure Text to Speech

Azure Text to Speech is part of the next generation of text-to-speech services, which use deep neural networks to produce sound. The advantage of this approach is the ability to generate voices from fewer samples and to simulate the changes in pitch and speed that make up accents.
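To make the pitch and speed part a little more concrete: the service accepts SSML (Speech Synthesis Markup Language), where a `prosody` element can hint at rate and pitch. The snippet below is only a rough sketch of building such a document with Python's standard library. The `build_ssml` helper is a hypothetical name made up for illustration, the voice name `en-US-JennyNeural` is an assumption, and actually sending the SSML to Azure (which needs a key and region) is deliberately left out. It is not necessarily how the code in the repo does it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice="en-US-JennyNeural", rate="medium",
               pitch="medium", lang="en-US"):
    """Wrap plain text in an SSML document with rate and pitch hints.

    `build_ssml` is a hypothetical helper for this post; the default
    voice name is an assumption and may need to be swapped for a voice
    available in your region.
    """
    speak = ET.Element(
        "speak",
        {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang},
    )
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    prosody = ET.SubElement(voice_el, "prosody",
                            {"rate": rate, "pitch": pitch})
    # ElementTree escapes characters such as & and < when serializing.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("You entered 0-2-3, is that right?",
                     rate="slow", pitch="high"))
```

The `rate` and `pitch` values used here ("slow", "medium", "high") are labels from the SSML specification; the resulting string is what you would hand to the speech service as the request body or pass to an SDK call that accepts SSML.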

Demo

Here’s a demo of the TTS service in action.

Code

All code is available on GitHub