EUROSPEECH 2003 - GENEVA
Unit Size in Unit Selection Speech Synthesis
S P Kishore and Alan W Black
Language Technologies Research Center
International Institute of Information Technology, Hyderabad
and ISRI, Carnegie Mellon University
kishore@iiit.net
Language Technologies Institute, Carnegie Mellon University
awb@cs.cmu.edu
Abstract
In this paper, we address the choice of unit size in unit selection speech synthesis. We discuss the development of a Hindi speech synthesizer and our experiments with different choices of unit: syllable, diphone, phone and half phone. Perceptual tests conducted to evaluate the quality of the synthesizers built with different unit sizes indicate that the syllable synthesizer performs better than the phone, diphone and half phone synthesizers, and that the half phone synthesizer performs better than the diphone and phone synthesizers.

1. Background

Most of the information in the digital world is accessible only to those who can read and understand a particular language. Language technologies can provide solutions in the form of natural interfaces, so that digital content can reach the masses and people speaking different languages can exchange information.

These technologies play a crucial role in multilingual societies such as India, which has about 1652 dialects/native languages. While Hindi, written in the Devanagari script, is the official language, the other 17 languages recognized by the constitution of India are: 1) Assamese, 2) Tamil, 3) Malayalam, 4) Gujarati, 5) Telugu, 6) Oriya, 7) Urdu, 8) Bengali, 9) Sanskrit, 10) Kashmiri, 11) Sindhi, 12) Punjabi, 13) Konkani, 14) Marathi, 15) Manipuri, 16) Kannada and 17) Nepali. Seamless integration of speech recognition, machine translation and speech synthesis systems could facilitate the exchange of information between two people speaking different languages. Our overall goal is to develop speech recognition and speech synthesis systems for most of these languages.

In this paper, we discuss the development of a Hindi speech synthesizer using unit selection techniques and, in particular, address the choice of unit size in unit selection synthesis.

2. Synthesis Framework

This work is done within the FestVox voice building framework [1], which offers general tools for building unit selection synthesizers in new languages. The unit selection paradigm used here is a cluster-based technique: units of the same type (phones, diphones, syllables or any other unit) are clustered based on their acoustic differences [2], and the clusters are then indexed by high-level features such as phonetic and prosodic context. Voices generated by this system may be run in the Festival Speech Synthesis System [3].

FestVox offers a language-independent method for building synthetic voices, with mechanisms to describe the phonetic and syllabic structure of a language abstractly. It is this flexibility in building voices for new languages that we exploit in this paper.

3. Hindi Synthesis

The basic units of the writing system in Indian languages are characters, which are orthographic representations of speech sounds. A character in an Indian language script is close to a syllable and typically takes one of the following forms: C, V, CV, VC, CCV and CVC, where C is a consonant and V is a vowel. All Indian language scripts share a common phonetic base, and a universal phoneset consists of about 35 consonants and about 18 vowels. Hindi has five vowels, five long vowels, two diphthongs, four semivowels and 31 consonants. A few more vowels and consonants exist in Hindi, but we did not consider them, as they are rarely used today.

3.1. Letter to Sound Rules

The scripts of Indian languages are phonetic in nature: there is a more or less one-to-one correspondence between what is written and what is spoken. However, in Hindi the inherent vowel (short /a/) associated with a consonant is not pronounced in some contexts. This is referred to as Inherent Vowel Suppression (IVS) or schwa deletion. For example, the word kamala [lotus] is mapped to the sequence of consonant and vowel sounds /k/ /a/ /m/ /a/ /l/, dropping the inherent vowel associated with /l/.

A set of heuristic rules for detecting IVS of a consonant character is listed below. These rules were derived by observing a few hundred Hindi words, and the rule set may not be a complete description of the phenomenon.

1 No two successive characters undergo IVS.
2 Characters in the first position of a word never undergo IVS; IVS occurs only for characters in middle and final positions.
3 For characters in final position, the inherent vowel (/a/) is always suppressed.
4 For characters in a word-middle position, IVS occurs if the next character in the word is not the last character, or if the next character has a vowel other than /a/.
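Read together, rules 1 to 4 amount to a single pass over the characters of a word. The following is a minimal Python sketch of one possible reading, not the authors' implementation. It assumes that a word arrives already segmented into characters, each represented by a hypothetical `Akshara` record holding a romanized consonant label and, if present, an explicit vowel sign; that only a consonant carrying the inherent vowel is a candidate for IVS; and that the pass runs right to left, so that rule 1 can consult the decision already made for the following character. The names `Akshara`, `ivs_flags` and `to_phones` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional

INHERENT = "a"  # the inherent (short) vowel /a/


@dataclass
class Akshara:
    """One written character (hypothetical representation, for illustration).

    consonant: romanized consonant label, or None for an independent vowel.
    vowel: explicit vowel sign, or None if the consonant carries the inherent /a/.
    """
    consonant: Optional[str]
    vowel: Optional[str] = None

    @property
    def has_inherent_vowel(self) -> bool:
        return self.consonant is not None and self.vowel is None

    @property
    def vowel_sound(self) -> str:
        return INHERENT if self.vowel is None else self.vowel


def ivs_flags(word: List[Akshara]) -> List[bool]:
    """Return True at each position whose inherent vowel is suppressed."""
    n = len(word)
    suppressed = [False] * n
    # Scan right to left (an assumption); rule 2: index 0 is never visited,
    # so the first character always keeps its vowel.
    for i in range(n - 1, 0, -1):
        if not word[i].has_inherent_vowel:
            continue  # explicit vowel sign: nothing to suppress
        if i == n - 1:
            suppressed[i] = True  # rule 3: final position
        elif suppressed[i + 1]:
            continue  # rule 1: no two successive characters undergo IVS
        elif i + 1 != n - 1 or word[i + 1].vowel_sound != INHERENT:
            suppressed[i] = True  # rule 4: word-middle position
    return suppressed


def to_phones(word: List[Akshara]) -> List[str]:
    """Expand characters into phones, dropping suppressed inherent vowels."""
    phones: List[str] = []
    for char, dropped in zip(word, ivs_flags(word)):
        if char.consonant is not None:
            phones.append(char.consonant)
        if not dropped:
            phones.append(char.vowel_sound)
    return phones


kamala = [Akshara("k"), Akshara("m"), Akshara("l")]
print(to_phones(kamala))  # ['k', 'a', 'm', 'a', 'l']
```

For kamala, the final /l/ loses its vowel by rule 3, which in turn blocks IVS on /m/ by rule 1, reproducing /k/ /a/ /m/ /a/ /l/. Scanning right to left is a design choice of this sketch: rule 3 fixes the final character first, and each earlier decision then depends only on characters already decided. A one-character word is left unchanged here because rule 2 is given precedence over rule 3; the rules as stated do not settle that case.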