Before this year, most mechanical voices were constructed from recordings of a person speaking, with sentences formed as combinations of recorded phrases and names. Now phonetically synthesized voices are becoming more common. These voices are not recordings of people, but are constructed digitally to sound like a person speaking. They are so convincing that you might not notice you are listening to a synthesized voice rather than one assembled from recordings.
Voice designers have to choose a dialect for each voice they synthesize, though they likely wish they didn’t have to. A dialect serves mainly to help establish the individual identity of a person, but mechanical voices have identity enough just by virtue of their mechanical rhythm. The purpose of a mechanical voice is to convey information, so voice designers aim for a version of a broadcasting dialect, meant to be clear and understandable to the broadest possible audience, including non-native speakers of the language.
The synthesized version of a broadcasting dialect is a dialect nonetheless, and just as people influence each other’s dialects, people can’t help being influenced by synthesized dialects (and vice versa — voice designers are inevitably influenced by the human dialects around them). People tend to pronounce words the way they have heard them pronounced, and now synthesized pronunciations are part of that mix. This, I am guessing, will tend to reduce dialectal variation. It could especially help standardize specific words that are correctly pronounced in several different ways. This effect is similar to the way spell-checking dictionaries have standardized the spelling of some words that before 2002 had multiple accepted spellings. I have a feeling, though, that the influence of synthesized dialects will not be as simple as this.
This post originally appeared in Rick Aster’s World.