Tuesday, August 2, 2022

Google AI Audiobook Creation: A Review

By James M. Jackson

Several months ago, Google Play announced it was beta testing Artificial Intelligence (AI) narrators for audiobooks. My first publisher did not do audiobooks, and when I became my own publisher, I didn’t have sufficient sales to justify directly paying a quality narrator; nor did I feel it was right to convince someone to spend a lot of time doing the narration on a royalty basis. Whether those were good decisions, only an alternate universe knows.

However, when Google Play made its announcement, I had a catalogue of books available to choose from and picked my novella “Low Tide at Tybee” for the experiment. It’s part of the Seamus McCree series but reads very well as a standalone. And it’s short, so if the experiment failed, I would have wasted less time than if I chose a novel.

AI readers are nothing new. For years, I’ve been using Microsoft Word’s version as part of my editing process. But boy does it produce some howler pronunciations. Google takes the ePub version of your book (which must be available at its store) and does its “magic.” You can choose your narrator based on age, sex, and country (US, British, Australian, Indian for English; Mexico and Spain for Spanish). You also pick a narration speed between 50% and 150% of normal. I chose “Mike,” an “American male, Age 31-45” who uses “medium speed and medium pitch.” I raised the speed to 150%.

When Mike reads a word like “read,” which it can pronounce as “reed” or “red,” it chooses one but gives you an easy method to select the other. It’s easy to go through all the occurrences and pick the right one. Some words it simply mispronounces without giving you an alternative (especially common with place names). Google provides you with two ways to correct the error. If you have a microphone, you can speak the word and Google translates what it hears into its phonetic symbolism. I found that approach to work about 75% of the time; the other quarter, it came up short.

When pronouncing doesn’t work, you can give it a preferred phonetic spelling, which for me requires ingenuity and trial and error before I come up with the right symbolism. Here is an example. It pronounces the word “leaden” something like “leeden” using the symbols: ˈli:dən. The accent on the first syllable (ˈ) was fine, the second syllable dən was also okay. I asked it how it pronounced the word “led.” It told me ˈlɛd, which I combined with dən to create ˈlɛd:dən.

Mike often messes up abbreviations. Seamus used to work for Criminal Investigations Group, shortened to CIG. Mike pronounced CIG as “cig,” like the first part of cigarette. I did a global search and replace (which is easy) to insert spaces between C and I, and I and G. Mike then pronounced each letter. In another book, Mike didn’t recognize the III in Albert Cunningham III as “the third,” instead stuttering the Is. Another global search and replace inserted spaces and fixed the problem.
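For anyone who would rather make the same fix in the manuscript text before uploading, here is a minimal Python sketch of the idea. It is not Google’s editor or anything Google provides; the function name `space_out` and the sample sentence are made up for illustration.

```python
import re

def space_out(text: str, abbreviations: list[str]) -> str:
    """Insert spaces between the letters of each listed abbreviation."""
    for abbr in abbreviations:
        spaced = " ".join(abbr)  # "CIG" becomes "C I G"
        # \b limits the match to whole words, so "CIG" inside "CIGAR" is left alone.
        text = re.sub(rf"\b{re.escape(abbr)}\b", spaced, text)
    return text

# Hypothetical sample sentence, not a quote from any of the books.
sample = "Seamus worked for CIG before Albert Cunningham III called."
print(space_out(sample, ["CIG", "III"]))
```

The whole-word match matters: a plain find-and-replace on “CIG” would also break up any longer word that happens to contain those letters.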

Sometimes, and I haven’t figured out the trigger, Mike pronounces the word “I” as “the first.” (Which makes me wonder why he wouldn’t pronounce III as “the third.”) I wrestle Mike into submission by changing that I to eye, which Mike always gets right. Another glitch: Mike occasionally overrides its preferred pronunciation (or the one I have provided) with something else. I think the word before it may cause the glitch. I haven’t found a fix for that issue other than changing words or word order. And twice, it has refused to pop up the editor to change a word. The first time, the universal fix of turning the program off and turning it back on again corrected the issue. As of this writing, I have not gotten the second situation to work, resulting in Mike pronouncing the word “content” like what might be in a bottom drawer rather than how a cat feels being scratched behind the ears.

Those issues are frustrating and real, but relatively rare. I have also noticed over the months that Google’s AI has learned, and words Mike routinely mispronounced when I first used it for Low Tide at Tybee it gets correct while creating audiobooks for the later novels.

The area where AI suffers is inflection. Take a sentence like, “She squinted and drew out her reply, ‘Really.’” A human narrator would elongate the pronunciation of “really.” Not Mike. Sometimes with sentences that end in question marks, Mike lilts the question at the end. Other times it speaks the sentence as if it ended with a period. Sometimes Mike now pauses for ellipses and em-dashes. When I first started, he never paused.

When I first started, it took about three hours of my time for every hour of narration. With Granite Oath, it only required about an hour and a quarter for each hour of narration. Part of that is because I have a list of mispronounced words and can rapidly substitute the correct phonetics. But much of it is because Mike is making many fewer errors—it’s learning, exactly what artificial intelligence is supposed to do.

How good is the result? One member of my Readers Group who listens to lots of audiobooks gave Low Tide at Tybee a test listen. He said he was pleasantly surprised that it was better than some human narrators, but not as good as the best ones. I’m fine with that and have priced my audiobooks to reflect the compromise of somewhat lower quality and significantly lower production costs.

Here are links to the completed audiobooks if you want to pick one to check it out. The novels are $5.99 each and the novellas $2.99 each.

Ant Farm    Bad Policy    Cabin Fever    Doubtful Relations    Granite Oath (pre-order)

Furthermore (novella)    Low Tide at Tybee (novella)

So, what do you think?

* * * * *

James M. Jackson authors the Seamus McCree series. Full of mystery and suspense, these thrillers explore financial crimes, family relationships, and what happens when they mix. You can sign up for his newsletter and find more information about Jim and his books at https://jamesmjackson.com.

4 comments:

  1. Fascinating! Thanks for experimenting with your own work.

  2. I echo Margaret, Jim. This is fascinating and I appreciate your experiment and thorough report.

  3. And you are also very welcome, Molly. :)
