2024-11-27

Gnuspeech (external project)

Gnuspeech is an articulatory speech synthesizer. As far as I know, the project implemented the first articulatory text-to-speech (TTS) software. It was developed in the 1990s, around 30 years ago as of 2024. The original authors are David R. Hill, Craig R. Taube-Schock and Leonard Manzara.

The synthesizer was previously closed-source commercial software, available only for NeXT computers. After the demise of NeXT, the software was donated to the GNU project. Its vocal tract model is simple because the NeXT was a very slow computer by today's standards (the CPUs of the 90s operated at clock frequencies of tens of MHz). The relatively low complexity of the model allows low-latency synthesis on modern personal computers.

License: GNU GPLv3-or-later.


1. Ports

The original TTS system had two implementations of the vocal tract model (tube model): one written in assembly, which executed on a 56k DSP, and another written in C, which executed on the CPU. The DSP tube model generates better speech, with more balanced fricatives and plosives. The OS X and C++ ports below are based on the C tube model.


2. Synthesis examples

- The Chaos by Gerard Nolst Trenité (short version) synthesized by the original TTS system for NeXTSTEP, running inside Previous, a 68k NeXT emulator. The speech was synthesized using the DSP tube model.

English - Male MP3

The video "Gnuspeech on NeXTSTEP" shows the software running on the emulator.

- The Chaos by Gerard Nolst Trenité (short version) synthesized by GnuspeechSA 0.1.8.

English - Male MP3
English - Female MP3
English - Large child MP3
English - Small child MP3
English - Baby MP3

Performance of GnuspeechSA 0.1.8 using the English male voice:
CPU: Ryzen 5 1600AF (the software uses only one thread).
Duration of the synthesized speech: 421 s.
Time to synthesize the speech (includes initialization): 6.6 s (63x faster than real-time).
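
The speed figure above is the ratio between the duration of the synthesized speech and the time taken to synthesize it. As a minimal sketch of that arithmetic, using the rounded values listed above (the function name is only illustrative and is not part of GnuspeechSA):

```python
def speedup_over_real_time(speech_seconds: float, synthesis_seconds: float) -> float:
    """Return how many times faster than real time the synthesis ran."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis time must be positive")
    return speech_seconds / synthesis_seconds


# Rounded figures from the measurement above.
speech_duration_s = 421.0   # duration of the synthesized speech
synthesis_time_s = 6.6      # time to synthesize, including initialization

speedup = speedup_over_real_time(speech_duration_s, synthesis_time_s)
print(f"{speedup:.1f}x faster than real time")
```

With these rounded inputs the ratio comes out slightly below 64, which matches the 63x stated above when the fractional part is dropped.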


3. Documentation