Gnuspeech (external project)

Gnuspeech is an articulatory speech synthesizer. As far as I know, the project implemented the first articulatory text-to-speech (TTS) software. It was developed in the 1990s, around 30 years ago (as of 2023). The original authors are David R. Hill, Craig R. Taube-Schock, and Leonard Manzara.

Gnuspeech was previously closed-source commercial software, available only for NeXT computers. After the demise of NeXT, the software was donated to the GNU project. It uses a simple vocal tract model because the NeXT was a very slow computer, even considering its DSP. The CPUs of the 1990s ran at clock frequencies of tens of MHz (not a typo), around 100x lower than those of 2023. The relatively low complexity of the model makes it not too difficult to understand, and it allows low-latency synthesis on modern personal computers.

License: GNU GPLv3-or-later.


1. Ports

The original TTS system had two implementations of the vocal tract model (tube model): one written in assembly that ran on a 56k DSP, and another written in C that ran on the CPU. The DSP tube model generates better speech, with more balanced fricatives/plosives. The OS X and C++ ports below are based on the C tube model.


2. Synthesis examples

- The Chaos by Gerard Nolst Trenité (short version) synthesized by the original TTS system for NeXTSTEP, running inside Previous, a 68k NeXT emulator. The speech was synthesized using the DSP tube model.

English - Male MP3

The video "Gnuspeech on NeXTSTEP" shows the software running on the emulator.

- The Chaos by Gerard Nolst Trenité (short version) synthesized by GnuspeechSA 0.1.8.

English - Male MP3
English - Female MP3
English - Large child MP3
English - Small child MP3
English - Baby MP3

Performance of GnuspeechSA 0.1.8 using the English male voice:
CPU: Ryzen 5 1600AF (the software uses only one thread).
Duration of the synthesized speech: 421 s.
Time to synthesize the speech (includes initialization): 6.6 s (about 64x faster than real-time).
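
The speed-up figure above is simply the duration of the synthesized speech divided by the time taken to synthesize it. A minimal sketch of that arithmetic follows. The function name real_time_factor is hypothetical and is not part of GnuspeechSA; the two input values are the measurements listed above.

```python
def real_time_factor(speech_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of speech produced per second of computation.

    Hypothetical helper for illustration only; not part of GnuspeechSA.
    """
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return speech_seconds / synthesis_seconds


# Measurements reported above for GnuspeechSA 0.1.8 (English male voice).
speech_duration_s = 421.0
synthesis_time_s = 6.6

rtf = real_time_factor(speech_duration_s, synthesis_time_s)
print(f"{rtf:.1f}x faster than real-time")
```

Because the synthesis time includes initialization, the ratio for a longer text would probably be somewhat higher, and for a very short text lower.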


3. Documentation