How to use GamaTTS
- Modes of execution
- Gnuspeech phonetic format
- MBROLA phonetic format
- Description of the data files (in data/voice/english/*)
- Notes
Modes of execution
Execute gama_tts --help to see how to run the program.
There are three modes of execution:
- Text-to-speech
gama_tts tts ...
In this mode, the input text is converted to speech audio. First, the text parser converts the text to a phonetic string. Then the control model converts this string to vocal tract model parameters. Finally, the vocal tract model converts these parameters to speech audio.
- Synthesis from phonetic string
gama_tts pho ...
In this mode, the control model converts the input phonetic string to vocal tract model parameters. Then the vocal tract model converts these parameters to speech audio.
The input format is selected in the file vtm_control_model.txt, field phonetic_string_format. The available choices are gnuspeech and mbrola.
Note: This mode does not use a Posture rewriter.
- Synthesis from vocal tract model parameters
gama_tts vtm ...
In this mode, the vocal tract model converts the input parameters to speech audio.
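The three modes run successively shorter tails of the same pipeline. The sketch below only encodes the stage order described above. The stage labels and the names PIPELINE, FIRST_STAGE, stages_for_mode and describe are hypothetical, chosen for this illustration, and are not GamaTTS identifiers.

```python
# Stage order for each gama_tts mode, as described in "Modes of execution".
# Stage labels are illustrative only; they are not names used inside GamaTTS.
PIPELINE = [
    # (stage, input, output)
    ("text parser", "text", "phonetic string"),
    ("control model", "phonetic string", "vocal tract model parameters"),
    ("vocal tract model", "vocal tract model parameters", "speech audio"),
]

# Index of the first pipeline stage that each mode runs.
FIRST_STAGE = {"tts": 0, "pho": 1, "vtm": 2}


def stages_for_mode(mode: str) -> list[str]:
    """Return the names of the stages executed by the given mode."""
    if mode not in FIRST_STAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name, _src, _dst in PIPELINE[FIRST_STAGE[mode]:]]


def describe(mode: str) -> str:
    """Return the data flow of a mode as 'input -> ... -> output'."""
    stages_for_mode(mode)  # validates the mode name
    steps = PIPELINE[FIRST_STAGE[mode]:]
    chain = [steps[0][1]] + [dst for _name, _src, dst in steps]
    return " -> ".join(chain)


if __name__ == "__main__":
    for mode in FIRST_STAGE:
        print(f"{mode}: {describe(mode)}")
```

Each mode simply skips the stages whose output it already receives as input.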
Gnuspeech phonetic format
- Valid Posture names may contain any Unicode character except:
U+0000 to U+0020 (control characters and space)
U+007F (DEL)
' (apostrophe)
_ (underscore)
* (asterisk)
. (period)
/ (slash)
0-9 (digits)
- Postures are separated by _ (underscore).
- Other symbols:
/0      | Set current tone group type: statement
/1      | Set current tone group type: exclamation
/2      | Set current tone group type: question
/3      | Set current tone group type: continuation
/4      | Set current tone group type: semicolon
/_      | New foot
/*      | New marked foot
//      | New tone group
/c      | New chunk
/l      | Last foot in tone group marker
/w      | New word
/fNN.NN | Foot tempo
/rNN.NN | Rule tempo
.       | New syllable
NN.NN   | Tempo of the next posture
- Example:
The quick black dog jumps over the lazy brown fox.
/c // /0 # /w dh_uh /w /_k_w_i_k /w /_b_l_aa_k /w /_d_o_g /w /_j_a_m_p_s /w /_uh_uu.v_uh_r /w dh_uh /w /_l_e_i.z_i /w /_b_r_ah_uu_n /w /l /*f_o_k_s # // /c
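The rules above are enough to split a Gnuspeech phonetic string into tokens. The following is a minimal tokenizer sketch, not the GamaTTS parser. It assumes that whitespace only separates tokens, that a tempo number may sit directly before a posture, and that any other character must belong to a posture name. The token kind names (new_foot, posture_tempo, and so on) and all function names are hypothetical.

```python
from dataclasses import dataclass

# "/x" symbols that take no argument; kind names are invented for this sketch.
SLASH_SYMBOLS = {
    "0": "tone_group_statement",
    "1": "tone_group_exclamation",
    "2": "tone_group_question",
    "3": "tone_group_continuation",
    "4": "tone_group_semicolon",
    "_": "new_foot",
    "*": "new_marked_foot",
    "/": "new_tone_group",
    "c": "new_chunk",
    "l": "last_foot",
    "w": "new_word",
}


@dataclass
class Token:
    kind: str
    value: object = None  # posture name or tempo, when the token carries one


def is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"  # ASCII digits only, as in the posture-name rule


def is_valid_posture_char(ch: str) -> bool:
    code = ord(ch)
    return not (code <= 0x20 or code == 0x7F or ch in "'_*./" or is_digit(ch))


def is_valid_posture_name(name: str) -> bool:
    return bool(name) and all(is_valid_posture_char(c) for c in name)


def read_number(s: str, i: int) -> tuple[float, int]:
    """Read NN or NN.NN starting at s[i]; return (value, index after it)."""
    j = i
    while j < len(s) and is_digit(s[j]):
        j += 1
    # Only treat "." as a decimal point if a digit follows it;
    # otherwise it is left for the "new syllable" rule.
    if j + 1 < len(s) and s[j] == "." and is_digit(s[j + 1]):
        j += 1
        while j < len(s) and is_digit(s[j]):
            j += 1
    if j == i:
        raise ValueError(f"expected a number at position {i}")
    return float(s[i:j]), j


def tokenize(s: str) -> list[Token]:
    tokens: list[Token] = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch.isspace() or ch == "_":  # token and posture separators
            i += 1
        elif ch == "/":
            nxt = s[i + 1] if i + 1 < len(s) else ""
            if nxt in SLASH_SYMBOLS:
                tokens.append(Token(SLASH_SYMBOLS[nxt]))
                i += 2
            elif nxt in ("f", "r"):  # /fNN.NN foot tempo, /rNN.NN rule tempo
                value, i = read_number(s, i + 2)
                kind = "foot_tempo" if nxt == "f" else "rule_tempo"
                tokens.append(Token(kind, value))
            else:
                raise ValueError(f"unknown symbol after '/' at position {i}")
        elif ch == ".":
            tokens.append(Token("new_syllable"))
            i += 1
        elif is_digit(ch):  # NN.NN: tempo of the next posture
            value, i = read_number(s, i)
            tokens.append(Token("posture_tempo", value))
        else:  # longest run of valid posture characters
            j = i
            while j < len(s) and is_valid_posture_char(s[j]):
                j += 1
            if j == i:
                raise ValueError(f"invalid character {ch!r} at position {i}")
            tokens.append(Token("posture", s[i:j]))
            i = j
    return tokens
```

Applied to the example string, this yields one new_word token per word and treats "#" as an ordinary posture name, since "#" is not in the excluded character list.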
MBROLA phonetic format
The phonetic string contains lines in the format:
phoneme duration intonation_point_pos_1 intonation_point_value_1 intonation_point_pos_2 intonation_point_value_2...
Comments start with a semicolon. The intonation points are optional.
The duration is in milliseconds. The intonation point position is a percentage of the phoneme's duration (0 = start of the phoneme, 100 = end). The intonation point value is in Hz.
Example:
; This is a test.
pau 220
dh 37 0 99
ax 44 50 111
k 133 0 111
w 48
ih 52 50 131
k 58
b 95 0 128
r 44
aw 183 50 137
n 84
f 108 0 130
aa 165 50 131
k 115
s 144 100 124
pau 220
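A minimal sketch of reading this format. It turns each line into a phoneme, a duration, and (position, Hz) pairs, then converts the pairs to absolute times. The sketch assumes the MBROLA convention that the position is a percentage of the phoneme's own duration, and it also strips trailing comments. The names PhoLine, parse_pho and pitch_targets are hypothetical and do not describe GamaTTS's pho1 parser.

```python
from dataclasses import dataclass, field


@dataclass
class PhoLine:
    phoneme: str
    duration_ms: float
    # Intonation points as (position in % of this phoneme's duration, pitch in Hz).
    points: list[tuple[float, float]] = field(default_factory=list)


def parse_pho(text: str) -> list[PhoLine]:
    """Parse lines of the form: phoneme duration [pos value] [pos value] ..."""
    result = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # A phoneme, a duration, then zero or more (position, value) pairs.
        if len(fields) < 2 or len(fields) % 2 != 0:
            raise ValueError(f"line {lineno}: malformed entry {raw!r}")
        numbers = [float(f) for f in fields[1:]]
        points = list(zip(numbers[1::2], numbers[2::2]))
        result.append(PhoLine(fields[0], numbers[0], points))
    return result


def pitch_targets(lines: list[PhoLine]) -> list[tuple[float, float]]:
    """Convert intonation points to (absolute time in ms, pitch in Hz)."""
    targets = []
    start = 0.0  # start time of the current phoneme
    for ph in lines:
        for pos, hz in ph.points:
            targets.append((start + ph.duration_ms * pos / 100.0, hz))
        start += ph.duration_ms
    return targets
```

For the example above, the first target is (220.0, 99.0): "dh" starts after the 220 ms pause, and its point is at 0 %. The last target is (1530.0, 124.0), at the end of "s".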
Description of the data files (in data/voice/english/*)
Note: Only some parameters are shown here.
_index.txt
File that indicates the location of other files/directories.
artic.xml
Contains the articulatory database.
It can be modified with a plain text editor, but the use of
GamaTTS:Editor is recommended.
interactive.txt
Configuration file for the interactive vocal tract model.
jack.txt
Configuration for JACK audio.
vtm.txt
Contains the parameters for the vocal tract model.
model
The vocal tract model type.
vocal_tract_length_offset
This value is added to the vocal tract length.
loss_factor
Defines the acoustic loss inside the vocal tract.
vtm_control_model.txt
Contains the parameters for the vocal tract model controller.
variant_name
Example: If variant_name = male, the file variant/male.txt will
be loaded.
tempo
Values greater than 1.0 will speed up the speech.
pitch_offset
Modifies the voice pitch.
drift_deviation
drift_lowpass_cutoff
Control the random perturbations in the intonation
(requires intonation_drift = 1).
phonetic_string_format
gnuspeech or mbrola.
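The fields above are written as "name = value" (for example, variant_name = male). The sketch below assumes that this is the whole file syntax, with "#" starting a comment. That syntax is an assumption, not a documented GamaTTS format. The sketch shows how variant_name selects a file under variant/ and checks phonetic_string_format. The function names are hypothetical.

```python
from pathlib import Path


def parse_config(text: str) -> dict[str, str]:
    """Parse 'key = value' lines; '#' starts a comment (assumed syntax)."""
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value': {raw!r}")
        params[key.strip()] = value.strip()
    return params


def variant_path(voice_dir: Path, params: dict[str, str]) -> Path:
    """variant_name = male selects <voice_dir>/variant/male.txt."""
    return voice_dir / "variant" / f"{params['variant_name']}.txt"


def phonetic_format(params: dict[str, str]) -> str:
    """Return phonetic_string_format after checking it is one of the two choices."""
    fmt = params.get("phonetic_string_format")
    if fmt not in ("gnuspeech", "mbrola"):
        raise ValueError(f"phonetic_string_format must be gnuspeech or mbrola, got {fmt!r}")
    return fmt
```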
variant/baby.txt
variant/female.txt
variant/large_child.txt
variant/male.txt
variant/small_child.txt
Contain the variant parameters.
vocal_tract_length
glottal_pulse_tp
Rise time, in % of the period.
glottal_pulse_tn_min
Fall time, in % of the period, for the highest pulse amplitude.
glottal_pulse_tn_max
Fall time, in % of the period, for the lowest pulse amplitude.
These parameters modify the glottal pulse shape.
reference_glottal_pitch
Modifies the voice pitch.
breathiness
intonation_factor
The intonation curve is multiplied by this factor.
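The glottal pulse parameters can be illustrated with a sketch of one pulse period. This follows the polynomial (Rosenberg-type) pulse used in the original Gnuspeech tube model: a rise of 3x² − 2x³ over glottal_pulse_tp percent of the period, a fall of 1 − x² over the fall time, and zero for the rest. The linear interpolation of the fall time between glottal_pulse_tn_max (amplitude 0) and glottal_pulse_tn_min (amplitude 1) is an assumption. GamaTTS's actual implementation may differ, and the function names are hypothetical.

```python
def fall_time(amplitude: float, tn_min: float, tn_max: float) -> float:
    """Fall time in % of the period: tn_max at amplitude 0, tn_min at amplitude 1.

    Linear interpolation between the two is an assumption of this sketch.
    """
    a = min(max(amplitude, 0.0), 1.0)
    return tn_max - a * (tn_max - tn_min)


def glottal_pulse(size: int, tp: float, tn: float) -> list[float]:
    """One period of a polynomial glottal pulse; tp and tn are in % of the period."""
    if tp < 0 or tn < 0 or tp + tn > 100:
        raise ValueError("need tp >= 0, tn >= 0 and tp + tn <= 100")
    rise = size * tp / 100.0  # samples in the opening phase
    fall = size * tn / 100.0  # samples in the closing phase
    table = []
    for i in range(size):
        if i < rise:
            x = i / rise
            table.append(3 * x**2 - 2 * x**3)  # smooth rise from 0 to 1
        elif i < rise + fall:
            x = (i - rise) / fall
            table.append(1 - x**2)  # parabolic fall from 1 towards 0
        else:
            table.append(0.0)  # closed phase: no glottal flow
    return table
```

With these assumptions, a louder pulse gets a shorter fall, so the glottis stays open for a smaller fraction of the period.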
intonation_rhythm/
intonation.txt
Contains constants for the intonation.
rhythm.txt
Contains constants for the rhythm calculation.
tone_group_param-continuation.txt
tone_group_param-exclamation.txt
tone_group_param-question.txt
tone_group_param-semicolon.txt
tone_group_param-statement.txt
Intonation parameters for each tone group.
If random_intonation = 0 in vtm_control_model.txt, only the
first line in each file will be used. If random_intonation = 1,
the line will be randomly selected.
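A sketch of the line-selection rule follows. It also maps the Gnuspeech tone group symbols /0 to /4 to the file names listed above. It assumes that every non-blank line of a tone group file is one parameter set and that the files contain no comment lines. Both are assumptions, and the function names are hypothetical.

```python
import random

# Tone group types in the order of the Gnuspeech symbols /0 ... /4.
TONE_GROUPS = ["statement", "exclamation", "question", "continuation", "semicolon"]


def tone_group_file(type_index: int) -> str:
    """Map the N of a /N tone group symbol to its parameter file."""
    return f"intonation_rhythm/tone_group_param-{TONE_GROUPS[type_index]}.txt"


def select_param_line(text: str, random_intonation: int, rng: random.Random) -> str:
    """Return the parameter line to use, following the random_intonation rule."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]  # assumes no comments
    if not lines:
        raise ValueError("empty tone group parameter file")
    if random_intonation == 0:
        return lines[0]  # always the first line
    if random_intonation == 1:
        return rng.choice(lines)  # any line, chosen at random
    raise ValueError("random_intonation must be 0 or 1")
```

Passing an explicit random.Random makes the random choice reproducible when testing.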
pho1_parser/
Configuration for the phonetic string parser (MBROLA format).
pho1.txt
Main configuration file.
phoneme_map-*.txt
Contains the mapping between the input phonemes and the
internal phonemes (Postures).
phonetic_string_parser/
Configuration for the phonetic string parser (Gnuspeech format).
rewrite.txt
Contains rules that may modify the phonetic string.
text_parser/
Configuration for the text to phonetic string converter.
abbreviations.txt
abbreviations_with_number.txt
external_text_parser.txt
Configuration for an external text parser, used when the MBROLA
phonetic string format is selected.
main_dictionary.txt
The main dictionary, which relates words to postures.
special_acronyms.txt
suffix_list.txt
text_parser.txt
Configuration for the text parser, used when the Gnuspeech phonetic
string format is selected.
dictionary_1_file
dictionary_2_file
dictionary_3_file
Indicate the dictionary files. The dictionaries are searched in the order 1, 2, 3.
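The search order means that an entry in an earlier dictionary hides any entry for the same word in a later one. The sketch below shows this with plain Python mappings standing in for the dictionary files. The entry format (postures joined by underscores) and the name lookup are hypothetical. What GamaTTS does when no dictionary contains the word is not described here, so the sketch just returns None.

```python
from typing import Mapping, Optional, Sequence


def lookup(word: str, dictionaries: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Return the entry from the first dictionary (1, then 2, then 3) that has the word."""
    for dictionary in dictionaries:
        if word in dictionary:
            return dictionary[word]  # earlier dictionaries take precedence
    return None  # not found anywhere; the real fallback is not covered here
```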
Notes
- When setting noise parameters like breathiness or frication volume, the value indicates the amplitude of the noise source (the pseudo-random number generator). The amplitude after the filtering can be significantly lower.
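A small numerical illustration of why the filtered level can be much lower than the source amplitude. Uniform noise in [−A, A] has an RMS of A/√3. Passing white noise of variance σ² through the one-pole lowpass y[n] = (1 − a)·x[n] + a·y[n − 1] gives an output variance of σ²(1 − a)/(1 + a). This filter has unity gain at DC, so the drop comes only from removed high-frequency energy. The one-pole filter is an assumed stand-in: the filters GamaTTS actually applies to its noise sources are not described here.

```python
import math
import random


def white_noise(amplitude: float, n: int, rng: random.Random) -> list[float]:
    """Uniform noise in [-amplitude, amplitude], like a scaled PRNG output."""
    return [rng.uniform(-amplitude, amplitude) for _ in range(n)]


def one_pole_lowpass(x: list[float], a: float) -> list[float]:
    """y[n] = (1 - a) * x[n] + a * y[n - 1]; unity gain at DC."""
    y, prev = [], 0.0
    for sample in x:
        prev = (1.0 - a) * sample + a * prev
        y.append(prev)
    return y


def rms(x: list[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


if __name__ == "__main__":
    rng = random.Random(1)
    amplitude, a = 0.5, 0.9
    x = white_noise(amplitude, 200_000, rng)
    y = one_pole_lowpass(x, a)
    print(f"source RMS   {rms(x):.4f}  theory {amplitude / math.sqrt(3):.4f}")
    print(f"filtered RMS {rms(y):.4f}  theory "
          f"{amplitude / math.sqrt(3) * math.sqrt((1 - a) / (1 + a)):.4f}")
```

With a = 0.9, the theoretical ratio is √(0.1/1.9) ≈ 0.23. The filtered noise therefore has less than a quarter of the source RMS, even though the amplitude setting is the same.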