By James M. Jackson
However, when Google Play made its announcement, I had a
catalogue of books available to choose from and picked my novella “Low Tide at
Tybee” for the experiment. It’s part of the Seamus McCree series but reads very
well as a standalone. And it’s short, so if the experiment failed, I would have
wasted less time than if I had chosen a novel.
AI readers are nothing new. For years, I’ve been using
Microsoft Word’s version as part of my editing process. But boy, does it produce
some howler pronunciations. Google takes the ePub version of your book (which
must be available in its store) and does its “magic.” You can choose your
narrator based on age, sex, and country (US, British, Australian, Indian for
English; Mexico and Spain for Spanish). You also pick a narration speed between
50% and 150% of normal. I chose “Mike,” an “American male, Age 31-45” who uses
“medium speed and medium pitch.” I raised the speed to 150%.
When Mike reads a word like “read,” which can be pronounced
“reed” or “red,” it picks one but gives you an easy way to select the other.
It’s easy to go through all the occurrences and pick the right one. Some words
it simply mispronounces without offering an alternative (especially common
with place names). Google provides two ways to correct the error. If
you have a microphone, you can speak the word, and Google translates what it
hears into phonetic symbols. I found that approach works about 75% of
the time; the other quarter, it comes up short.
When speaking the word doesn’t work, you can type in a preferred
phonetic spelling, which for me requires ingenuity and trial and error before I
come up with the right symbols. Here is an example. Mike pronounced the word
“leaden” something like “leeden,” using the symbols ˈli:dən. The stress mark on the
first syllable (ˈ) was fine, and the second syllable, dən, was also okay. I asked it
how it pronounced the word “led.” It told me ˈlɛd, which I combined with dən to
create ˈlɛd:dən.
Mike often messes up abbreviations. Seamus used to work for
Criminal Investigations Group, shortened to CIG. Mike pronounced CIG as “cig,”
like the first part of cigarette. I did a global search and replace (which is
easy) to insert spaces between C and I, and I and G. Mike then pronounced each
letter. In another book, Mike didn’t recognize the III in Albert Cunningham III
as “the third,” instead stuttering the Is. Another global search and replace
inserted spaces and fixed the problem.
Sometimes, and I haven’t figured out the trigger, Mike
pronounces the word “I” as “the first.” (Which makes me wonder why he wouldn’t
pronounce III as “the third.”) I wrestle Mike into submission by changing that
I to eye, which Mike always gets right. In another glitch, Mike occasionally
overrides its preferred pronunciation (or the one I have provided) with
something else. I think the preceding word may trigger it. I haven’t
found a fix for that issue other than changing words or word order. And twice,
it has refused to pop up the editor to change a word. The first time, the global
fix of turning the program off and back on again corrected the issue.
As of this writing, I have not gotten the second case to work, so
Mike pronounces the word “content” like what might be in a bottom drawer
rather than how a cat feels being scratched behind the ears.
Those issues are frustrating and real, but relatively rare.
I have also noticed over the months that Google’s AI has learned. Words Mike
routinely mispronounced when I first used it for Low Tide at Tybee, it now
gets right while creating audiobooks for the later novels.
The area where AI suffers is inflection. Take a sentence like, “She squinted and
drew out her reply, ‘Really.’” A human narrator would elongate the pronunciation
of “really.” Not Mike. With sentences that end in question marks, Mike sometimes
lilts the question at the end; other times it speaks the sentence the same way as
if it ended with a period. Mike now sometimes pauses for ellipses and em-dashes.
When I first started, he never paused.
When I first started, it took about three hours of my time
for every hour of narration. With Granite Oath, it only required about
an hour and a quarter for each hour of narration. Part of that is because I
have a list of mispronounced words and can rapidly substitute the correct
phonetics. But much of it is because Mike is making far fewer errors: it’s
learning, exactly what artificial intelligence is supposed to do.
How good is the result? One member of my Readers Group who
listens to lots of audiobooks gave Low Tide at Tybee a test listen. He
said he was pleasantly surprised that it was better than some human narrators,
but not as good as the best ones. I’m fine with that and have priced my
audiobooks to reflect the compromise of somewhat lower quality and
significantly lower production costs.
Here are links to the completed audiobooks if you want to
pick one and check it out. The novels are $5.99 each and the novellas $2.99
each.
Ant Farm | Bad Policy | Cabin Fever | Doubtful Relations | Granite Oath (pre-order) |
Furthermore (novella) | Low Tide at Tybee (novella)
So, what do you think?
* * * * *
James M. Jackson authors the Seamus McCree series. Full of mystery and suspense, these thrillers explore financial crimes, family relationships, and what happens when they mix. You can sign up for his newsletter and find more information about Jim and his books at https://jamesmjackson.com.
Fascinating! Thanks for experimenting with your own work.
You're very welcome, Margaret.
I echo Margaret, Jim. This is fascinating and I appreciate your experiment and thorough report.
And you are also very welcome, Molly. :)
Thank you so much for this comprehensive review. I hope that your books continue to succeed!