How to use GamaTTS

  1. Modes of execution
  2. Gnuspeech phonetic format
  3. MBROLA phonetic format
  4. Description of the data files (data/english/vtm0 to vtm5)
  5. Notes

Modes of execution

Execute gama_tts --help to see how to run the program.

There are three modes of execution:

  1. Text-to-speech
    gama_tts tts ...

    In this mode, the input text is converted to speech audio. First the text parser converts the text to a phonetic string. Then the control model converts this string to vocal tract model parameters. Finally, the vocal tract model converts these parameters to speech audio.

  2. Synthesis from phonetic string
    gama_tts pho ...

    In this mode, the control model converts the input phonetic string to vocal tract model parameters. Then the vocal tract model converts these parameters to speech audio.

    The input format is selected by the field phonetic_string_format in the file vtm_control_model.txt. The available choices are gnuspeech and mbrola.

    Note: This mode does not use a Posture rewriter.

  3. Synthesis from vocal tract model parameters
    gama_tts vtm ...

    In this mode, the vocal tract model converts the input parameters to speech audio.
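
The three modes differ only in where they enter the same processing chain. The following Python sketch summarizes this. The stage names are illustrative labels taken from the descriptions above, not GamaTTS identifiers.

```python
# Processing chain described above, in order.
# These names are labels for illustration only, not GamaTTS identifiers.
STAGES = ["text parser", "control model", "vocal tract model"]

# First stage executed by each mode.
MODE_FIRST_STAGE = {
    "tts": "text parser",        # input: text
    "pho": "control model",      # input: phonetic string
    "vtm": "vocal tract model",  # input: vocal tract model parameters
}


def stages_for(mode):
    """Return the stages that a mode runs, from its entry point onward."""
    start = STAGES.index(MODE_FIRST_STAGE[mode])
    return STAGES[start:]


if __name__ == "__main__":
    for mode in ("tts", "pho", "vtm"):
        print(mode, "->", " -> ".join(stages_for(mode)))
```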


Gnuspeech phonetic format

- Valid Posture names may contain any Unicode character except:

    0000-0020
    007F (DEL)
    '    (apostrophe)
    _    (underscore)
    *
    .
    /
    0-9

- Postures are separated by _ (underscore).

- Other symbols:

/0 Set current tone group type: statement
/1 Set current tone group type: exclamation
/2 Set current tone group type: question
/3 Set current tone group type: continuation
/4 Set current tone group type: semicolon
/_ New foot
/* New marked foot
// New tone group
/c New chunk
/l Last foot in tone group marker
/w New word
/fNN.NN Foot tempo
/rNN.NN Rule tempo
. New syllable
NN.NN Tempo of the next posture

- Example:

The quick black dog jumps over the lazy brown fox.
/c // /0 # /w dh_uh /w /_k_w_i_k /w /_b_l_aa_k /w /_d_o_g /w /_j_a_m_p_s /w /_uh_uu.v_uh_r /w dh_uh /w /_l_e_i.z_i /w /_b_r_ah_uu_n /w /l /*f_o_k_s # // /c
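
The rules above can be illustrated with a small tokenizer. This is a minimal Python sketch written only from the symbol list in this section. It is not the GamaTTS parser. It assumes that tempo values are written as digits with an optional fractional part, and it rejects any character that the list does not cover (for example the apostrophe).

```python
import re

TONE_GROUP_TYPES = {"0": "statement", "1": "exclamation", "2": "question",
                    "3": "continuation", "4": "semicolon"}
MARKERS = {"_": "new foot", "*": "new marked foot", "/": "new tone group",
           "c": "new chunk", "l": "last foot in tone group", "w": "new word"}

# Assumption: tempo values look like "1", "1.5" or "12.25".
NUMBER = r"\d+(?:\.\d+)?"

TOKEN = re.compile(
    r"(?P<space>[ \t\r\n]+)"
    r"|/(?P<tone>[0-4])"
    rf"|/(?P<tempo_kind>[fr])(?P<tempo>{NUMBER})"
    r"|/(?P<marker>[_*/clw])"
    r"|(?P<syllable>\.)"
    rf"|(?P<posture_tempo>{NUMBER})"
    r"|(?P<separator>_)"
    # Posture name: any run of characters outside the excluded set.
    r"|(?P<posture>[^\x00-\x20\x7f'_*./0-9]+)"
)


def tokenize(phonetic_string):
    """Split a Gnuspeech-format phonetic string into (kind, value) tuples."""
    tokens = []
    pos = 0
    while pos < len(phonetic_string):
        m = TOKEN.match(phonetic_string, pos)
        if m is None:
            raise ValueError(
                f"unexpected character {phonetic_string[pos]!r} at offset {pos}")
        pos = m.end()
        if m.group("tone"):
            tokens.append(("tone group type", TONE_GROUP_TYPES[m.group("tone")]))
        elif m.group("tempo_kind"):
            kind = "foot tempo" if m.group("tempo_kind") == "f" else "rule tempo"
            tokens.append((kind, float(m.group("tempo"))))
        elif m.group("marker"):
            tokens.append(("marker", MARKERS[m.group("marker")]))
        elif m.group("syllable"):
            tokens.append(("marker", "new syllable"))
        elif m.group("posture_tempo"):
            tokens.append(("posture tempo", float(m.group("posture_tempo"))))
        elif m.group("posture"):
            tokens.append(("posture", m.group("posture")))
        # Whitespace and "_" separators produce no token.
    return tokens


EXAMPLE = ("/c // /0 # /w dh_uh /w /_k_w_i_k /w /_b_l_aa_k /w /_d_o_g "
           "/w /_j_a_m_p_s /w /_uh_uu.v_uh_r /w dh_uh /w /_l_e_i.z_i "
           "/w /_b_r_ah_uu_n /w /l /*f_o_k_s # // /c")

if __name__ == "__main__":
    for kind, value in tokenize(EXAMPLE):
        print(f"{kind:16} {value}")
```

The tone group alternative is tried before the generic marker alternative, so /0 to /4 are never read as a marker followed by a tempo.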


MBROLA phonetic format

The phonetic string contains lines in the format:
phoneme duration intonation_point_pos_1 intonation_point_value_1 intonation_point_pos_2 intonation_point_value_2...
Comments start with a semicolon. The intonation points are optional.

The duration is in milliseconds. The intonation point position is defined as the percentage of the time between phonemes. The intonation point value is in Hz.

Example:

    ; This is a test.
    pau 220
    dh   37   0  99
    ax   44  50 111
    k   133   0 111
    w    48
    ih   52  50 131
    k    58
    b    95   0 128
    r    44
    aw  183  50 137
    n    84
    f   108   0 130
    aa  165  50 131
    k   115
    s   144 100 124
    pau 220
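
As an illustration of the line format, here is a minimal Python sketch that reads such a phonetic string. It is not the GamaTTS parser. It assumes that everything after a semicolon on a line is a comment and that the fields are separated by whitespace. The names Phoneme and parse_pho are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Phoneme:
    name: str
    duration_ms: float
    # List of (position in %, value in Hz) pairs; may be empty.
    intonation_points: list = field(default_factory=list)


def parse_pho(text):
    """Parse MBROLA-format lines into a list of Phoneme objects."""
    phonemes = []
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        # Assumption: a semicolon starts a comment that runs to end of line.
        line = raw_line.split(";", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # phoneme + duration, then zero or more (position, value) pairs.
        if len(fields) < 2 or len(fields) % 2 != 0:
            raise ValueError(f"line {line_number}: malformed entry {raw_line!r}")
        values = [float(v) for v in fields[2:]]
        points = list(zip(values[0::2], values[1::2]))
        phonemes.append(Phoneme(fields[0], float(fields[1]), points))
    return phonemes


SAMPLE = """\
; This is a test.
pau 220
dh   37   0  99
ax   44  50 111
w    48
"""

if __name__ == "__main__":
    parsed = parse_pho(SAMPLE)
    for p in parsed:
        print(p.name, p.duration_ms, p.intonation_points)
    print("total duration (ms):", sum(p.duration_ms for p in parsed))
```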

Description of the data files (data/english/vtm0 to vtm5)

Note: Only some parameters are shown here.

artic.xml
    Contains the articulatory database.

    It can be modified with a plain text editor, but the use of
    GamaTTS:Editor is recommended.

interactive.txt
    Configuration file for the interactive vocal tract model.

voice/baby.txt
voice/female.txt
voice/large_child.txt
voice/male.txt
voice/small_child.txt
    Contain the voice parameters.

        vocal_tract_length

        glottal_pulse_tp
            Rise time, in % of the period.
        glottal_pulse_tn_min
            Fall time, in % of the period - for the highest pulse
            amplitude.
        glottal_pulse_tn_max
            Fall time, in % of the period - for the lowest pulse
            amplitude.

            These parameters modify the glottal pulse shape.

        reference_glottal_pitch
            Modifies the voice pitch.

        breathiness

        loss_factor
            Defines the acoustic loss inside the vocal tract.

        intonation_factor
            The intonation curve is multiplied by this factor.

vtm_control_model.txt
    Contains the parameters for the vocal tract model controller.

        voice_name
            Example: If voice_name = male, the voice file named
            voice/male.txt will be loaded.
        tempo
            Values greater than 1.0 will speed up the speech.
        pitch_offset
            Modifies the voice pitch.

        drift_deviation
        drift_lowpass_cutoff
            Control the random perturbations in the intonation
            (requires intonation_drift = 1).

        phonetic_string_format
            gnuspeech or mbrola.
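
The relation between voice_name and the voice file can be sketched as follows. This Python example assumes that the configuration file consists of "key = value" lines, as the notation used in this document suggests. The actual file syntax may differ, and parse_config and voice_file are hypothetical helpers, not part of GamaTTS.

```python
from pathlib import Path


def parse_config(text):
    """Read 'key = value' lines into a dict (assumed file syntax)."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            config[key.strip()] = value.strip()
    return config


def voice_file(config, data_dir):
    """Return the voice file selected by voice_name: voice/<voice_name>.txt."""
    return Path(data_dir) / "voice" / (config["voice_name"] + ".txt")


def check_phonetic_string_format(config):
    """Accept only the two formats listed in this document."""
    fmt = config["phonetic_string_format"]
    if fmt not in ("gnuspeech", "mbrola"):
        raise ValueError(f"unknown phonetic_string_format: {fmt!r}")
    return fmt


if __name__ == "__main__":
    # Illustrative content only, not a copy of a shipped file.
    cfg = parse_config("voice_name = male\nphonetic_string_format = gnuspeech\n")
    print(voice_file(cfg, "data/english/vtm5"))
    print(check_phonetic_string_format(cfg))
```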

vtm.txt
    Contains the parameters for the vocal tract model.

        model
            The vocal tract model type.
        vocal_tract_length_offset
            This value is added to the vocal tract length.

intonation_rhythm/
    intonation.txt
        Contains constants for the intonation.
    rhythm.txt
        Contains constants for the rhythm calculation.
    tone_group_param-continuation.txt
    tone_group_param-exclamation.txt
    tone_group_param-question.txt
    tone_group_param-semicolon.txt
    tone_group_param-statement.txt
        Intonation parameters for each tone group.
        If random_intonation = 0 in vtm_control_model.txt, only the
        first line in each file is used. If random_intonation = 1,
        a line is selected at random.
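
The selection rule for the tone group parameter files can be sketched in Python as follows. This is an illustration, not the GamaTTS code. It assumes that blank lines are ignored and that the random choice is uniform, which this document does not specify.

```python
import random


def select_tone_group_line(lines, random_intonation, rng=random):
    """Pick the parameter line used for a tone group.

    lines: the lines of one tone_group_param-*.txt file.
    random_intonation: 0 selects the first line, 1 selects a random line.
    rng: source of randomness (pass random.Random(seed) for repeatable runs).
    """
    candidates = [line for line in lines if line.strip()]
    if not candidates:
        raise ValueError("no parameter lines available")
    if random_intonation:
        return rng.choice(candidates)  # assumption: uniform choice
    return candidates[0]


if __name__ == "__main__":
    # Placeholder strings stand in for real parameter lines.
    lines = ["params A", "params B", "params C"]
    print(select_tone_group_line(lines, 0))
    print(select_tone_group_line(lines, 1, random.Random(1234)))
```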

pho1_parser/
    Configuration for the phonetic string parser (MBROLA format).

        pho1.txt
            Main configuration file.
        phoneme_map-*.txt
            Contains the mapping between the input phonemes and the
            internal phonemes (Postures).

phonetic_string_parser/
    Configuration for the phonetic string parser (Gnuspeech format).

        rewrite.txt
            Contains rules that may modify the phonetic string.

text_parser/
    Configuration for the text to phonetic string converter.

        abbreviations_with_number.txt
        abbreviations.txt
        external_text_parser.txt
            Configuration for an external text parser, used when the MBROLA
            phonetic string format is selected.
        main_dictionary.txt
            The main dictionary, which relates words to postures.
        special_acronyms.txt
        suffix_list.txt
        text_parser.txt
            Configuration for the text parser, used when the Gnuspeech phonetic
            string format is selected.

                dictionary_1_file
                dictionary_2_file
                dictionary_3_file
                    Select the dictionary files. The dictionaries are
                    searched in the order 1, 2, 3.
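
The search order can be sketched in Python as below. The sketch assumes that the first dictionary containing the word determines the result, and it represents each dictionary as an in-memory mapping. The dictionary file format is not covered here, and lookup is a hypothetical helper.

```python
def lookup(word, dictionaries):
    """Search the dictionaries in order and return the first entry found.

    dictionaries: mappings from word to posture string, given in the
    order dictionary_1_file, dictionary_2_file, dictionary_3_file.
    Returns None when no dictionary contains the word.
    """
    for dictionary in dictionaries:
        if word in dictionary:
            return dictionary[word]
    return None


if __name__ == "__main__":
    # Illustrative entries, using posture strings from the example above.
    dictionary_1 = {"the": "dh_uh"}
    dictionary_2 = {"the": "dh_i", "dog": "d_o_g"}
    dictionary_3 = {"fox": "f_o_k_s"}
    order = [dictionary_1, dictionary_2, dictionary_3]
    for word in ("the", "dog", "fox", "cat"):
        print(word, "->", lookup(word, order))
```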


Notes