2025-01-13

Gnuspeech (external project)

Gnuspeech is an articulatory speech synthesizer. As far as I know, the project implemented the first articulatory text-to-speech (TTS) software. It was developed in the 1990s, around 30 years ago (as of 2024). The original authors are David R. Hill, Craig R. Taube-Schock and Leonard Manzara.

The synthesizer was previously closed-source commercial software, available only for NeXT computers. After the demise of NeXT, the software was donated to the GNU project. It has a simple vocal tract model, because the NeXT was a very slow computer (CPUs of the 1990s ran at clock frequencies of tens of MHz). The relatively low complexity of the model allows low-latency synthesis on modern personal computers.

License: GNU GPLv3-or-later


1. Ports

The original TTS system had two implementations of the vocal tract model (tube model): one written in assembly that ran on a 56k DSP, and another written in C that ran on the CPU. The DSP tube model generates better speech, with more balanced fricatives/plosives. The OS X and C++ ports below are based on the C tube model.


2. Synthesis examples

- The Chaos by Gerard Nolst Trenité (short version) synthesized by the original TTS system for NeXTSTEP, running inside Previous, a 68k NeXT emulator. The speech was synthesized using the DSP tube model.

English - Male MP3

The video "Gnuspeech on NeXTSTEP" shows the software running on the emulator.

- The Chaos by Gerard Nolst Trenité (short version) synthesized by GnuspeechSA 0.1.8.

English - Male MP3
English - Female MP3
English - Large child MP3
English - Small child MP3
English - Baby MP3

Performance of GnuspeechSA 0.1.9 using the English male voice:
CPU: Ryzen 7 5700X (the software uses only one thread)

Compiler       Speech duration   Synthesis time   Real-time factor
GCC 12.2       421 s             4.3 s            98x
Clang 14.0.6   421 s             3.1 s            136x
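
The real-time factor in the measurements above appears to be the speech duration divided by the synthesis time, rounded to the nearest integer. The short Python sketch below reproduces the two values under that assumption. The function name real_time_factor and the measurements dictionary are only illustrative and are not part of GnuspeechSA.

```python
def real_time_factor(speech_duration_s: float, synthesis_time_s: float) -> float:
    """Return seconds of speech produced per second of computation."""
    if synthesis_time_s <= 0:
        raise ValueError("synthesis time must be positive")
    return speech_duration_s / synthesis_time_s


# Values copied from the measurements above:
# (speech duration in seconds, synthesis time in seconds).
measurements = {
    "GCC 12.2": (421.0, 4.3),
    "Clang 14.0.6": (421.0, 3.1),
}

for compiler, (duration, synth_time) in measurements.items():
    factor = real_time_factor(duration, synth_time)
    print(f"{compiler}: {round(factor)}x")
```

A factor well above 1 means the speech is generated much faster than it takes to play back, which is consistent with the low-latency claim made earlier.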

3. Documentation