Gnuspeech (external project)
Gnuspeech is an articulatory speech synthesizer. As far as I know, the project implemented the first articulatory text-to-speech (TTS) software. It was developed in the 1990s, around 30 years ago (as of 2024). The original authors are David R. Hill, Craig R. Taube-Schock, and Leonard Manzara.
The synthesizer was previously closed-source commercial software, available only for NeXT computers. After the demise of NeXT, the software was donated to the GNU project. It has a simple vocal tract model because NeXT computers were very slow (CPUs of the 1990s ran at clock frequencies of tens of MHz). The relatively low complexity of the model allows low-latency synthesis on modern personal computers.
License: GNU GPLv3-or-later.
1. Ports
The original TTS system had two implementations of the vocal tract model (tube model), one that executed on a 56k DSP, written in assembly, and another that executed on the CPU, written in C. The DSP tube model generates better speech, with more balanced fricatives/plosives. The OS X and C++ ports below are based on the C tube model.
- OS X (ObjC)
This port includes the synthesizer and the GUI tools.
[ download (gnuspeech-*.tar.gz) | git repository ]
- GnuspeechSA (stand-alone, multi-platform C++)
GnuspeechSA is a command-line speech synthesizer.
Tested on Linux+GNU x86_64.
[ git repository ]
2. Synthesis examples
- The Chaos by Gerard Nolst Trenité (short version) synthesized by the original TTS system for NeXTSTEP, running inside Previous, a 68k NeXT emulator. The speech was synthesized using the DSP tube model.
English - Male | MP3 |
The video "Gnuspeech on NeXTSTEP" shows the software running on the emulator.
- The Chaos by Gerard Nolst Trenité (short version) synthesized by GnuspeechSA 0.1.8.
English - Male | MP3 |
English - Female | MP3 |
English - Large child | MP3 |
English - Small child | MP3 |
English - Baby | MP3 |
Performance of GnuspeechSA 0.1.8 using the English male voice:
CPU: Ryzen 5 1600AF (the software uses only one thread).
Duration of the synthesized speech: 421 s.
Time to synthesize the speech (includes initialization): 6.6 s (63x faster than real time).
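The speed figure above is the ratio of the speech duration to the synthesis time. A minimal sketch of that arithmetic in Python (the function name `speedup_over_real_time` is hypothetical and not part of GnuspeechSA):

```python
def speedup_over_real_time(speech_duration_s: float, synthesis_time_s: float) -> float:
    """Return how many seconds of speech are produced per second of computation.

    Hypothetical helper for illustration only; it is not part of GnuspeechSA.
    """
    if synthesis_time_s <= 0:
        raise ValueError("synthesis time must be positive")
    return speech_duration_s / synthesis_time_s


# Figures reported above for GnuspeechSA 0.1.8 (English male voice).
factor = speedup_over_real_time(421.0, 6.6)
print(f"{factor:.1f}x real time")  # 63.8x real time, i.e. 63x when truncated
```

Because the 6.6 s measurement includes initialization, the ratio for the synthesis step alone would be somewhat higher.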