How to use GamaTTS
- Modes of execution
- Gnuspeech phonetic format
- MBROLA phonetic format
- Description of the data files (in data/voice/english/*)
- Notes
Modes of execution
Execute gama_tts --help to see how to run the program.
There are three modes of execution:
- Text-to-speech
gama_tts tts ...
In this mode, the input text is converted to speech audio. First the text parser converts the text to a phonetic string. Then the control model converts this string to vocal tract model parameters. Finally, the vocal tract model converts these parameters to speech audio.
- Synthesis from phonetic string
gama_tts pho ...
In this mode, the control model converts the input phonetic string to vocal tract model parameters. Then the vocal tract model converts these parameters to speech audio.
The input format is selected in the file vtm_control_model.txt, field phonetic_string_format. The available choices are gnuspeech and mbrola.
Note: This mode does not use a Posture rewriter.
- Synthesis from vocal tract model parameters
gama_tts vtm ...
In this mode the vocal tract model converts the input parameters to speech audio.
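The pho mode reads its input in whichever format phonetic_string_format selects. The sketch below shows one way to read that field from a script, assuming vtm_control_model.txt holds one "name = value" pair per line (as suggested by the examples variant_name = male and random_intonation = 0 later in this document). The helper names are hypothetical, not part of GamaTTS, and the real comment syntax of the configuration files is not handled.

```python
from __future__ import annotations

from pathlib import Path

# The two values accepted by phonetic_string_format, per this document.
VALID_FORMATS = {"gnuspeech", "mbrola"}


def parse_key_values(text: str) -> dict[str, str]:
    """Parse 'name = value' lines into a dict.

    Assumption: one 'name = value' pair per line. Lines without '='
    (for example blank lines) are skipped.
    """
    params: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue
        params[name.strip()] = value.strip()
    return params


def phonetic_string_format(params: dict[str, str]) -> str:
    """Return the input format used by 'gama_tts pho' (hypothetical helper)."""
    fmt = params.get("phonetic_string_format", "")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unknown phonetic_string_format: {fmt!r}")
    return fmt


def read_control_model(voice_dir: str) -> dict[str, str]:
    """Read <voice_dir>/vtm_control_model.txt (hypothetical helper)."""
    path = Path(voice_dir) / "vtm_control_model.txt"
    return parse_key_values(path.read_text(encoding="utf-8"))
```

For example, phonetic_string_format(read_control_model("data/voice/english")) would return "gnuspeech" or "mbrola", provided the file follows the assumed layout.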
Gnuspeech phonetic format
- Valid Posture names may contain any Unicode characters except: U+0000 to U+0020 (control characters and space), U+007F (DEL), ' (apostrophe), _ (underscore), * (asterisk), . (period), / (slash), and the digits 0-9.
- Postures are separated by _ (underscore).
- Other symbols:
  /0       Set current tone group type: statement
  /1       Set current tone group type: exclamation
  /2       Set current tone group type: question
  /3       Set current tone group type: continuation
  /4       Set current tone group type: semicolon
  /_       New foot
  /*       New marked foot
  //       New tone group
  /c       New chunk
  /l       Last foot in tone group marker
  /w       New word
  /fNN.NN  Foot tempo
  /rNN.NN  Rule tempo
  .        New syllable
  NN.NN    Tempo of the next posture
- Example:
The quick black dog jumps over the lazy brown fox.
/c // /0 # /w dh_uh /w /_k_w_i_k /w /_b_l_aa_k /w /_d_o_g /w /_j_a_m_p_s /w /_uh_uu.v_uh_r /w dh_uh /w /_l_e_i.z_i /w /_b_r_ah_uu_n /w /l /*f_o_k_s # // /c
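The rules above can be sketched as a small tokenizer that splits a Gnuspeech phonetic string into markers, postures, syllable breaks and tempo values. This is a simplified illustration, not GamaTTS's own parser: it assumes that spaces and _ act only as separators and that a tempo is a run of digits with an optional fractional part; the token kind names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Two-character symbols from the table above.
MARKERS = {
    "/0": "tone_group_statement",
    "/1": "tone_group_exclamation",
    "/2": "tone_group_question",
    "/3": "tone_group_continuation",
    "/4": "tone_group_semicolon",
    "/_": "new_foot",
    "/*": "new_marked_foot",
    "//": "new_tone_group",
    "/c": "new_chunk",
    "/l": "last_foot_marker",
    "/w": "new_word",
}
DIGITS = "0123456789"
# Assumption: a tempo is digits with an optional fractional part (NN.NN).
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")
# Characters (besides U+0000-U+0020) that cannot appear in a Posture name.
FORBIDDEN = set("\x7f'_*./" + DIGITS)


@dataclass
class Token:
    kind: str  # "marker", "posture", "syllable", "tempo", "foot_tempo", "rule_tempo"
    value: str


def is_posture_char(ch: str) -> bool:
    return ord(ch) > 0x20 and ch not in FORBIDDEN


def tokenize(s: str) -> list[Token]:
    tokens: list[Token] = []
    i, n = 0, len(s)
    while i < n:
        ch = s[i]
        if ord(ch) <= 0x20 or ch == "_":
            i += 1  # spaces and '_' only separate items
        elif s.startswith(("/f", "/r"), i):
            m = NUMBER.match(s, i + 2)
            if m is None:
                raise ValueError(f"missing tempo value at {i}")
            kind = "foot_tempo" if s[i + 1] == "f" else "rule_tempo"
            tokens.append(Token(kind, m.group()))
            i = m.end()
        elif s[i:i + 2] in MARKERS:
            tokens.append(Token("marker", MARKERS[s[i:i + 2]]))
            i += 2
        elif ch == ".":
            tokens.append(Token("syllable", "."))
            i += 1
        elif ch in DIGITS:
            m = NUMBER.match(s, i)  # always matches: ch is a digit
            tokens.append(Token("tempo", m.group()))
            i = m.end()
        elif is_posture_char(ch):
            j = i
            while j < n and is_posture_char(s[j]):
                j += 1
            tokens.append(Token("posture", s[i:j]))
            i = j
        else:
            raise ValueError(f"invalid character {ch!r} at {i}")
    return tokens
```

Applied to the example string above, this sketch yields 41 posture tokens (starting and ending with #), 10 new_word markers (one per word of the sentence) and 2 syllable breaks.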
MBROLA phonetic format
The phonetic string contains lines in the format:
phoneme duration intonation_point_pos_1 intonation_point_value_1 intonation_point_pos_2 intonation_point_value_2...
Comments start with a semicolon. The intonation points are optional.
The duration is in milliseconds. The intonation point position is a percentage of the phoneme duration (0 is the start of the phoneme, 100 is its end). The intonation point value is the pitch in Hz.
Example:
; This is a test.
pau 220
dh 37 0 99
ax 44 50 111
k 133 0 111
w 48
ih 52 50 131
k 58
b 95 0 128
r 44
aw 183 50 137
n 84
f 108 0 130
aa 165 50 131
k 115
s 144 100 124
pau 220
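The format above can be read with a few lines of Python. The sketch below parses each line into a phoneme, a duration and its intonation points, then converts the points to absolute times. It assumes that ';' starts a comment running to the end of the line and that each position is a percentage of its own phoneme's duration; the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PhoLine:
    phoneme: str
    duration_ms: float
    # (position in % of this phoneme's duration, pitch in Hz) pairs
    points: list[tuple[float, float]] = field(default_factory=list)


def parse_pho(text: str) -> list[PhoLine]:
    """Parse lines of 'phoneme duration [pos value]...'; ';' starts a comment."""
    result: list[PhoLine] = []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # phoneme + duration + (pos, value) pairs -> always an even count
        if len(fields) < 2 or len(fields) % 2 != 0:
            raise ValueError(f"malformed line: {raw!r}")
        values = [float(x) for x in fields[1:]]
        points = list(zip(values[1::2], values[2::2]))
        result.append(PhoLine(fields[0], values[0], points))
    return result


def pitch_targets(lines: list[PhoLine]) -> list[tuple[float, float]]:
    """Convert intonation points to (absolute time in ms, pitch in Hz)."""
    targets: list[tuple[float, float]] = []
    start = 0.0
    for ph in lines:
        for pos, hz in ph.points:
            targets.append((start + ph.duration_ms * pos / 100.0, hz))
        start += ph.duration_ms
    return targets
```

For the example above, the durations add up to 1750 ms, and the final intonation point (s at 100 %) falls at 1530 ms.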
Description of the data files (in data/voice/english/*)
Note: Only some parameters are shown here.
- _index.txt: File that indicates the location of other files/directories.
- artic.xml: Contains the articulatory database. It can be modified with a plain text editor, but the use of GamaTTS:Editor is recommended.
- interactive.txt: Configuration file for the interactive vocal tract model.
- jack.txt: Configuration for JACK audio.
- vtm.txt: Contains the parameters for the vocal tract model.
  - model: The vocal tract model type.
  - vocal_tract_length_offset: This value is added to the vocal tract length.
  - loss_factor: Defines the acoustic loss inside the vocal tract.
- vtm_control_model.txt: Contains the parameters for the vocal tract model controller.
  - variant_name: Name of the variant file to load. Example: if variant_name = male, the file variant/male.txt will be loaded.
  - tempo: Values greater than 1.0 speed up the speech.
  - pitch_offset: Modifies the voice pitch.
  - drift_deviation, drift_lowpass_cutoff: Control the random perturbations in the intonation (requires intonation_drift = 1).
  - phonetic_string_format: gnuspeech or mbrola.
- variant/baby.txt, variant/female.txt, variant/large_child.txt, variant/male.txt, variant/small_child.txt: Contain the variant parameters.
  - vocal_tract_length
  - glottal_pulse_tp: Rise time, in % of the period.
  - glottal_pulse_tn_min: Fall time, in % of the period, for the highest pulse amplitude.
  - glottal_pulse_tn_max: Fall time, in % of the period, for the lowest pulse amplitude.
    These three glottal_pulse_* parameters modify the glottal pulse shape.
  - reference_glottal_pitch: Modifies the voice pitch.
  - breathiness
  - intonation_factor: The intonation curve is multiplied by this factor.
- intonation_rhythm/
  - intonation.txt: Contains constants for the intonation.
  - rhythm.txt: Contains constants for the rhythm calculation.
  - tone_group_param-continuation.txt, tone_group_param-exclamation.txt, tone_group_param-question.txt, tone_group_param-semicolon.txt, tone_group_param-statement.txt: Intonation parameters for each tone group. If random_intonation = 0 in vtm_control_model.txt, only the first line in each file is used. If random_intonation = 1, a line is selected at random.
- pho1_parser/: Configuration for the phonetic string parser (MBROLA format).
  - pho1.txt: Main configuration file.
  - phoneme_map-*.txt: Contains the mapping between the input phonemes and the internal phonemes (Postures).
- phonetic_string_parser/: Configuration for the phonetic string parser (Gnuspeech format).
  - rewrite.txt: Contains rules that may modify the phonetic string.
- text_parser/: Configuration for the text-to-phonetic-string converter.
  - abbreviations.txt
  - abbreviations_with_number.txt
  - external_text_parser.txt: Configuration for an external text parser, used when the MBROLA phonetic string format is selected.
  - main_dictionary.txt: The main dictionary, which relates words to postures.
  - special_acronyms.txt
  - suffix_list.txt
  - text_parser.txt: Configuration for the text parser, used when the Gnuspeech phonetic string format is selected.
    - dictionary_1_file, dictionary_2_file, dictionary_3_file: Indicate the dictionaries (the dictionaries are searched in the order 1, 2, 3).
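A few of the behaviors described in this list can be sketched in Python: the glottal pulse timing implied by the glottal_pulse_* percentages, the tone group line selection controlled by random_intonation, and the ordered dictionary search. The function names are hypothetical, the dictionaries are modeled as in-memory dicts rather than the real file format, and the linear interpolation of the fall time between glottal_pulse_tn_max and glottal_pulse_tn_min is an assumption, not taken from the GamaTTS source.

```python
from __future__ import annotations

import random


def glottal_pulse_times(pitch_hz: float, amplitude: float, tp: float,
                        tn_min: float, tn_max: float) -> tuple[float, float]:
    """Return (rise_ms, fall_ms) of one glottal pulse.

    tp, tn_min and tn_max are percentages of the period, as in the variant
    files. Assumption: the fall time moves linearly from tn_max (amplitude 0)
    to tn_min (amplitude 1).
    """
    if pitch_hz <= 0 or not 0.0 <= amplitude <= 1.0:
        raise ValueError("pitch must be > 0 and amplitude in [0, 1]")
    period_ms = 1000.0 / pitch_hz
    tn = tn_max - amplitude * (tn_max - tn_min)
    return period_ms * tp / 100.0, period_ms * tn / 100.0


def select_tone_group_line(lines: list[str], random_intonation: int,
                           rng: random.Random | None = None) -> str:
    """Pick a line from a tone_group_param-*.txt file."""
    if not lines:
        raise ValueError("empty tone group parameter file")
    if random_intonation == 0:
        return lines[0]  # only the first line is used
    return (rng or random.Random()).choice(lines)


def lookup(word: str, dictionaries: list[dict[str, str]]) -> str | None:
    """Search the dictionaries in order (dictionary_1_file first)."""
    for dictionary in dictionaries:
        if word in dictionary:
            return dictionary[word]
    return None
```

With illustrative values tp = 40, tn_min = 16 and tn_max = 32 at 100 Hz (a 10 ms period), the rise time is 4 ms and the fall time ranges from 3.2 ms at the lowest amplitude to 1.6 ms at the highest.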
Notes
- When setting noise parameters like breathiness or frication volume, the value indicates the amplitude of the noise source (the pseudo-random number generator). The amplitude after the filtering can be significantly lower.
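The note above can be illustrated with a small experiment: generate uniform noise with a given amplitude, pass it through a lowpass filter, and compare the RMS levels. The one-pole filter and the uniform distribution here are arbitrary stand-ins chosen for the sketch, not the filters or generator GamaTTS actually uses.

```python
from __future__ import annotations

import math
import random


def rms(xs: list[float]) -> float:
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def one_pole_lowpass(xs: list[float], alpha: float) -> list[float]:
    """y[n] = y[n-1] + alpha * (x[n] - y[n-1]); smaller alpha = lower cutoff."""
    y = 0.0
    out = []
    for x in xs:
        y += alpha * (x - y)
        out.append(y)
    return out


def noise_levels(amplitude: float, alpha: float, n: int = 20000,
                 seed: int = 1) -> tuple[float, float]:
    """RMS of uniform noise in [-amplitude, amplitude], before and after filtering."""
    rng = random.Random(seed)
    noise = [amplitude * rng.uniform(-1.0, 1.0) for _ in range(n)]
    return rms(noise), rms(one_pole_lowpass(noise, alpha))
```

For white noise, this filter scales the RMS level by about sqrt(alpha / (2 - alpha)), so with alpha = 0.1 the filtered noise is only about 0.23 times as strong as the source, even though the source amplitude setting is unchanged.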