Introduction
About
GamaTTS is an experimental articulatory speech synthesizer that started as a C++ port of Gnuspeech. The port is based on the original TTS_Server (developed for NeXTSTEP), which was written in C (70% of the lines of code) and ObjC (30%).
GamaTTS:Editor was later developed to edit the articulatory database, using parts of the source code of Monet, an editor in Gnuspeech written in ObjC. During development, the available Monet documentation was used as a reference.
Some of the changes in GamaTTS (compared with Gnuspeech) are:
- Added features:
  - Support for multiple vocal tract models.
  - The vocal tract parameters are arbitrary, except the first, which must be the pitch.
  - Support for external front-ends, using the MBROLA input file format.
  - Partial support for Unicode text / IPA.
  - Many hard-coded values have been moved to configuration files.
  - Real-time manipulation of vocal tract parameters during speech synthesis.
  - The control sample rate can be changed from 250 Hz to 333 Hz, 500 Hz or 1000 Hz.
- Removed features:
  - Meta-parameters, which were not working in Gnuspeech. The idea was to use high-level parameters such as "tongue height", "tongue position", or "lung pressure" as meta-parameters. In GamaTTS such parameters should be used as normal parameters, and the mapping to low-level parameters must be done inside the vocal tract model.
  - In the Interactive Vocal Tract Model (the replacement for TRAcT in Gnuspeech), there is no diagram of the vocal tract geometry. Figures showing the frequency response of vocal tract model components, and other information, are also missing. Settings that are specific to a vocal tract model must be changed in the configuration files. These limitations are a consequence of supporting multiple vocal tract models.
  - Some parameters that can be changed in the GUI in Monet must be changed in the configuration files in GamaTTS:Editor.
  - Marked Postures were removed. It seems that they were not being used in Gnuspeech.
  - The dictionary is held in memory at runtime instead of in a database such as ndbm/gdbm. For this reason, some data may be lost in case of power failure. This change was made to remove the external dependency.
  - The original code could use a DSP, but GamaTTS uses only the CPU, because modern CPUs are fast enough.
  - In the Synthesis Window, the curves for Special Transitions and normal Transitions are shown in the same figure (with Special Transitions in red); they cannot be shown in separate figures.
  - The output sound is always mono. It can be converted to stereo using JACK effects.
Due to these and other changes, most of the file formats are no longer compatible with Gnuspeech. However, vocal tract models 0, 1 and 2 produce speech almost equivalent to Gnuspeech's output.
Description
The main modules are:
- Text Parser:
  Converts the input text to a phonetic string. This string contains phonemes and control codes that indicate, for example, phoneme duration or the start of a word.
- VTM Control Model:
  Converts the phonetic string to vocal tract model parameters. This module controls the vocal tract model.
- VTM (vocal tract model):
  Converts the VTM parameters to speech audio, using a simulation of the acoustics of the human vocal tract.
- GamaTTS:Editor is used to modify the articulatory database, because manually editing the database would be very difficult.
The VTM parameters can be adjusted to produce a "schwa" sound, for example. If the parameter values remain constant, the output will be a continuous sound. In the Control Model, such a configuration of the VTM is called a Posture.
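As a rough illustration (a Python sketch, not GamaTTS code, which is C++), a Posture can be pictured as a fixed set of VTM parameter values that is repeated at every control period. The parameter names and values below are made up for the example; only the convention that the first parameter is the pitch and the 250 Hz control rate come from the feature list.

```python
CONTROL_RATE_HZ = 250  # control sample rate mentioned in the feature list

# Hypothetical posture: names and values are illustrative only.
# The first parameter is the pitch, as required by GamaTTS.
SCHWA_LIKE = {"pitch": 0.0, "param_a": 0.8, "param_b": 1.2}

def hold_posture(posture, duration_s, control_rate_hz=CONTROL_RATE_HZ):
    """Return one copy of the posture's parameter values per control period."""
    n_frames = round(duration_s * control_rate_hz)
    return [dict(posture) for _ in range(n_frames)]

frames = hold_posture(SCHWA_LIKE, 0.1)
print(len(frames))  # 25 control frames for 100 ms at 250 Hz
```

Because every frame is identical, a vocal tract model driven by these frames would produce a steady, continuous sound.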
To produce speech, the VTM parameters must change over time (this is called articulation). In the Control Model, the way the parameters change from Posture to Posture is defined by Transitions.
Transitions use Transition Points to define the piecewise linear function that controls a VTM parameter over time. The time of each Point can be defined using constants, but this is not very flexible. For this reason, the Control Model uses Equations to define the times. Each Equation is a formula that calculates a time from the durations of the Postures involved in the Transition.
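The idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the GamaTTS implementation: the three equations are invented, and expressing each Point's value as a fraction of the way from the first Posture's value to the second's is an assumption made for the example.

```python
from bisect import bisect_right

# Hypothetical Equations: each maps the durations (ms) of the two
# Postures involved in the Transition to a time (ms).
def eq_start(d1, d2):
    return 0.0

def eq_onset(d1, d2):
    return 0.5 * d1

def eq_end(d1, d2):
    return d1 + 0.5 * d2

# Transition Points: (equation for the time, fraction of the way from
# the first Posture's value to the second's). The fraction is an assumption.
TRANSITION = [(eq_start, 0.0), (eq_onset, 0.25), (eq_end, 1.0)]

def evaluate(transition, d1, d2, v1, v2, t):
    """Piecewise linear value of one parameter at time t (ms).

    Assumes the equations yield strictly increasing times.
    """
    times = [eq(d1, d2) for eq, _ in transition]
    values = [v1 + frac * (v2 - v1) for _, frac in transition]
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t) - 1  # segment containing t
    t0, t1 = times[i], times[i + 1]
    return values[i] + (values[i + 1] - values[i]) * (t - t0) / (t1 - t0)

# Postures lasting 80 ms and 120 ms; the parameter goes from 100 to 200.
for t in (0, 20, 40, 90, 140):
    print(t, evaluate(TRANSITION, 80, 120, 100.0, 200.0, t))
```

Changing the Posture durations moves the Points in time without editing the Transition itself, which is the flexibility that constant times would not give.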
The Control Model must decide which Transition to use for each Posture sequence and for each parameter. Rules make this selection: they contain boolean expressions that match a sequence of Postures. Boolean expressions can also match Categories, which are groups of Postures.
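A minimal Python sketch of this kind of rule matching is shown below. It is not taken from GamaTTS: the Posture, Category, parameter, and Transition names are placeholders, and picking the first Rule that matches is an assumption made for the example.

```python
# Hypothetical Categories: named groups of Postures (names are made up).
CATEGORIES = {
    "vowel": {"a", "e", "uh"},
    "stop": {"p", "t", "k"},
}

# Small combinators that build boolean tests on a single Posture name.
def posture(name):
    return lambda p: p == name

def category(name):
    return lambda p: p in CATEGORIES[name]

def either(test_a, test_b):
    return lambda p: test_a(p) or test_b(p)

def anything():
    return lambda p: True

# Each Rule: one test per position in the Posture sequence, plus the
# Transition chosen for each parameter (all names are placeholders).
RULES = [
    {
        "tests": [category("stop"), category("vowel")],
        "transitions": {"pitch": "flat", "param_a": "fast_rise"},
    },
    {
        "tests": [anything(), either(posture("uh"), category("stop"))],
        "transitions": {"pitch": "flat", "param_a": "slow_fall"},
    },
    {
        "tests": [anything(), anything()],  # fallback that matches any pair
        "transitions": {"pitch": "flat", "param_a": "linear"},
    },
]

def select_transitions(rules, postures):
    """Return the per-parameter Transitions of the first Rule that matches."""
    for rule in rules:
        tests = rule["tests"]
        if len(postures) == len(tests) and all(
            test(p) for test, p in zip(tests, postures)
        ):
            return rule["transitions"]
    return None

print(select_transitions(RULES, ["t", "a"])["param_a"])   # fast_rise
print(select_transitions(RULES, ["a", "uh"])["param_a"])  # slow_fall
print(select_transitions(RULES, ["a", "e"])["param_a"])   # linear
```

Matching on Categories keeps the rule list short: one Rule covers every stop-to-vowel pair instead of one Rule per pair of Postures.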