In providing voice services to my customers, I'm often tasked with creating audio files for IVR/auto-attendant menu recordings. When there is no professional voice talent required, I primarily use the rather excellent Google Cloud Text-to-Speech API.
Now, my Kazoo phone system does support the Google TTS API for generating media files, but it has no option to change pitch, speed, or various other parameters. The Google API itself exposes all of those options, but unfortunately it's not a user-friendly API, and quite a pain to use with curl or the like. So, I decided to write an application to handle the job for me.
While Elixir is my favorite programming language, it is not very well suited to command-line applications, so Go was the obvious choice. I went to the Google documentation for their Text-to-Speech API, and was pleasantly greeted with an introduction to using their very nice SDK to interact with it. Google is a heavy user of Go, so the Go sample code is the first they show.
I decided I wanted my application to be able to read from standard input, as that can be a boon to facilitating automation and integrating programs into scripts. Later, I added text file input, and an option for reading SSML (Speech Synthesis Markup Language). The ability to use SSML is necessary if you want to format your input text in order to add pauses, emphasis, and some other things, without messing with the audio data after the fact.
After a few hours of coding, my application was working, and dubbed Google Squawk. A few more hours of tweaks, and it did everything I wanted, including:
- SSML Support
- Reading from File or Standard Input
- Output to File or Standard Output
- Language Selection
- Voice Gender Selection
- Specific Voice Selection
- Speed Selection
- Pitch Selection
- Audio Format Selection (MP3, OPUS/OGG, PCM, μLaw, ALaw)
- Sample Rate Selection
- Volume/Gain Adjustment
- Listing of All Available Voices
All that is required to use it is Google Cloud credentials (normally a service account credentials JSON file). You can specify the path to the file in an environment variable, and then run gsquawk, and generate Text-to-Speech audio files to your heart's content! :-)
I have released the code on GitHub.
Being a Go app, it should compile and run on Linux, BSD, Windows, or Mac.
Give it a star on GitHub and drop me a line if you use it and find it beneficial.