---
license: other
pipeline_tag: text-to-speech
language:
- en
---

# NeuTTS Nano (English)

[![NeuTTSNano_Intro](neutts-nano.png)](https://www.youtube.com/watch?v=_USVA-JA0o0)

[🚀 Spaces Demo](https://huggingface.co/spaces/neuphonic/neutts-nano-multilingual-collection), [🔧 GitHub](https://github.com/neuphonic/neutts)

[Q8 GGUF version](https://huggingface.co/neuphonic/neutts-nano-q8-gguf), [Q4 GGUF version](https://huggingface.co/neuphonic/neutts-nano-q4-gguf)

*Created by [Neuphonic](http://neuphonic.com/) - building faster, smaller, on-device voice AI*

State-of-the-art Voice AI has been locked behind web APIs for too long. The **NeuTTS Nano Multilingual Collection** is a collection of super-fast, highly realistic, **on-device** TTS speech language models with **instant voice cloning** - built to run smoothly on CPUs and edge devices. With a compact backbone and an efficient LM + codec design, Nano models deliver strong naturalness and cloning quality at a fraction of the compute, making them ideal for embedded voice agents, assistants, toys, and privacy-sensitive applications.

> [!NOTE]
> This model is **English only**: see the [multilingual collection page](https://huggingface.co/collections/neuphonic/neutts-nano-multilingual-collection) for other languages.

## Key Features

- ⚡️ **Ultra-fast for on-device** — built for real-time or better-than-real-time generation on laptop-class CPUs
- 🗣 **High realism for its size** — natural, expressive speech in a compact footprint
- 👫 **Instant voice cloning** — create a new speaker from just a few seconds of audio
- 📦 **GGUF/GGML-friendly deployment** — easy to run locally via CPU-first tooling
- 🔒 **Local-first + compliance-friendly** — keep audio and text on-device

> [!CAUTION]
> Websites like neutts.com are popping up, and they are not affiliated with Neuphonic, our GitHub, or this repo.
>
> We are on neuphonic.com only. Please be careful out there! 🙏

## Model Details

NeuTTS Nano models are designed for **maximum speed per parameter** while retaining strong speaker similarity and naturalness:

- **Backbone**: compact LM backbone tuned for TTS token generation (Nano class)
- **Audio Codec**: [NeuCodec](https://huggingface.co/neuphonic/neucodec) - our open-source neural audio codec that achieves exceptional audio quality at low bitrates using a single codebook
- **Format**: quantisations available in GGUF format for efficient on-device inference
- **Responsibility**: Watermarked outputs
- **Inference Speed**: Optimised for real-time generation on CPUs
- **Power Consumption**: Designed for mobile and embedded devices

### Parameter Count (Nano class)

- **Active params (backbone only):** **~116.8M**
- **Total params (backbone + tied embeddings/head):** **~228.7M**

## Get Started with NeuTTS

1. **Install System Dependencies (required): `espeak-ng`**

> [!CAUTION]
> `espeak-ng` is an updated version of `espeak`; as of February 2026, its latest version is 1.52.0. Older versions of `espeak` and `espeak-ng` can exhibit significant phonemisation issues, particularly for non-English languages, so updating your system `espeak-ng` to the latest available version is highly recommended.

> [!NOTE]
> On macOS Ventura and later (`brew`), Ubuntu version 25 or Debian version 13 (`apt`), and Windows (`choco`/`winget`), the commands below install the latest version of `espeak-ng`. On a different or older operating system, you may need to install from source: see https://github.com/espeak-ng/espeak-ng/blob/master/docs/building.md

   Please refer to the following link for instructions on how to install `espeak-ng`:

   https://github.com/espeak-ng/espeak-ng/blob/master/docs/guide.md

   ```bash
   # macOS
   brew install espeak-ng

   # Ubuntu/Debian
   sudo apt install espeak-ng

   # Windows install
   # via chocolatey (https://community.chocolatey.org/packages?page=1&prerelease=False&moderatorQueue=False&tags=espeak)
   choco install espeak-ng
   # via winget
   winget install -e --id eSpeak-NG.eSpeak-NG
   # via msi (add the install to PATH, or follow the "Windows users who installed via msi" note below)
   # find the msi at https://github.com/espeak-ng/espeak-ng/releases
   ```

   Windows users who installed via msi, or whose install is not on PATH, need to run the following (see https://github.com/bootphon/phonemizer/issues/163):
   ```pwsh
   $env:PHONEMIZER_ESPEAK_LIBRARY = "c:\Program Files\eSpeak NG\libespeak-ng.dll"
   $env:PHONEMIZER_ESPEAK_PATH = "c:\Program Files\eSpeak NG"
   setx PHONEMIZER_ESPEAK_LIBRARY "c:\Program Files\eSpeak NG\libespeak-ng.dll"
   setx PHONEMIZER_ESPEAK_PATH "c:\Program Files\eSpeak NG"
   ```

2. **Install NeuTTS**
   ```bash
   pip install neutts
   ```

   Or for a local editable install, clone the [neutts repository](https://github.com/neuphonic/neutts) and run in the base folder:
   ```bash
   pip install -e .
   ```

   Alternatively, to install all dependencies, including `onnxruntime` and `llama-cpp-python` (equivalent to steps 3 and 4 below):

   ```bash
   pip install "neutts[all]"
   ```

   or for an editable install:

   ```bash
   pip install -e ".[all]"
   ```

3. **(Optional) Install `llama-cpp-python` to use `.gguf` models.**

   ```bash
   pip install "neutts[llama]"
   ```

   Note that this installs `llama-cpp-python` without GPU support. To install with GPU support (e.g., CUDA, MPS) please refer to:
   https://pypi.org/project/llama-cpp-python/

4. **(Optional) Install `onnxruntime` to use the `.onnx` decoder.**
   ```bash
   pip install "neutts[onnx]"
   ```
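
After the steps above, a quick check can confirm what is actually available on the machine. The following is a minimal sketch, not part of the NeuTTS package: the helper names `find_espeak` and `check_modules` are hypothetical, and the import names `llama_cpp` and `onnxruntime` are the ones used by `llama-cpp-python` and `onnxruntime` respectively.

```python
import importlib.util
import os
import shutil


def find_espeak():
    """Return the path of an espeak-ng (or legacy espeak) binary on PATH, or None."""
    return shutil.which("espeak-ng") or shutil.which("espeak")


def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("espeak binary:", find_espeak() or "not found on PATH")
    # Only relevant if you followed the Windows msi instructions in step 1.
    print("PHONEMIZER_ESPEAK_LIBRARY:",
          os.environ.get("PHONEMIZER_ESPEAK_LIBRARY", "not set"))
    # llama_cpp and onnxruntime are optional (steps 3 and 4).
    for name, found in check_modules(["neutts", "llama_cpp", "onnxruntime"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

This only checks that the binary and modules can be found; it does not check the `espeak-ng` version, so the caution in step 1 about outdated versions still applies.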

## Examples

To get started with the example scripts, clone the neutts repository and navigate into the project directory:

   ```bash
   git clone https://github.com/neuphonic/neutts.git
   cd neutts
   ```

Several examples are available, including a Jupyter notebook in the `examples` folder.

### Basic Example

Run the basic example script to synthesize speech:

```bash
python -m examples.basic_example \
  --input_text "My name is Andy. I'm 25 and I just moved to London. The underground is pretty confusing, but it gets me around in no time at all." \
  --ref_audio samples/jo.wav \
  --ref_text samples/jo.txt
```

To specify a particular model repo for the backbone, add the `--backbone` argument. Available backbones are listed in the [NeuTTS Nano Multilingual Collection](https://huggingface.co/collections/neuphonic/neutts-nano-multilingual-collection) on Hugging Face.

> [!CAUTION]
> It is highly recommended to use a same-language reference for best performance: see [this readme section](https://github.com/neuphonic/neutts/tree/main?tab=readme-ov-file#example-reference-files) for appropriate example references.

### Simple One-Code Block Usage

```python
from neutts import NeuTTS
import soundfile as sf

tts = NeuTTS(
    backbone_repo="neuphonic/neutts-nano",
    backbone_device="cpu",
    codec_repo="neuphonic/neucodec",
    codec_device="cpu",
)

input_text = "My name is Andy. I'm 25 and I just moved to London. The underground is pretty confusing, but it gets me around in no time at all."

ref_text_path = "samples/jo.txt"
ref_audio_path = "samples/jo.wav"

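# Read the transcript of the reference audio
with open(ref_text_path, "r", encoding="utf-8") as f:
    ref_text = f.read().strip()

# Encode the reference audio
ref_codes = tts.encode_reference(ref_audio_path)

# Synthesise the input text in the reference voice and save it at 24 kHz
wav = tts.infer(input_text, ref_codes, ref_text)
sf.write("test.wav", wav, 24000)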
```
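
To see how close a given machine gets to real-time generation, one option is to time the `infer` call and compare it with the duration of the audio produced. The sketch below is illustrative only: `real_time_factor` and `timed_infer` are hypothetical helpers, not part of NeuTTS, and they assume that `infer` returns a one-dimensional array of samples at 24 kHz (the rate used when writing `test.wav` above).

```python
import time

SAMPLE_RATE = 24000  # assumed output rate, taken from the sf.write call above


def real_time_factor(elapsed_s, num_samples, sample_rate=SAMPLE_RATE):
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    audio_s = num_samples / sample_rate
    if audio_s <= 0:
        raise ValueError("no audio was generated")
    return elapsed_s / audio_s


def timed_infer(tts, text, ref_codes, ref_text):
    """Call tts.infer with the same arguments as above and return (wav, rtf)."""
    start = time.perf_counter()
    wav = tts.infer(text, ref_codes, ref_text)
    elapsed = time.perf_counter() - start
    # Assumes len(wav) is the number of audio samples.
    return wav, real_time_factor(elapsed, len(wav))
```

With the objects from the block above, `wav, rtf = timed_infer(tts, input_text, ref_codes, ref_text)` gives the audio plus a single number to compare across devices or quantisations. The first call may include warm-up overhead, so timing a second call is usually more representative.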

## Tips

NeuTTS Nano requires two inputs:

1. A reference audio sample (`.wav` file)
2. A text string

The model then synthesises the text as speech in the style of the reference audio. This is what enables NeuTTS Nano’s instant voice cloning capability.

### Example Reference Files

You can find some ready-to-use samples in the `samples` folder:

- `samples/dave.wav`
- `samples/jo.wav`

### Guidelines for Best Results

For optimal performance, reference audio samples should be:

- **Mono channel**
- **16–44 kHz** sample rate
- **3–15 seconds** in length
- Saved as a **`.wav`** file
- **Clean** — minimal to no background noise
- **Natural, continuous speech** — like a monologue or conversation, with few pauses, so the model can capture tone effectively
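
The measurable guidelines above (channel count, sample rate, duration) can be checked before encoding a reference. This is a minimal sketch using Python's standard `wave` module; `check_reference` is a hypothetical helper, and the 44,100 Hz upper bound is an assumption about what "44 kHz" means here. Noise level and speech naturalness still need to be judged by ear.

```python
import wave


def check_reference(path, min_rate=16_000, max_rate=44_100, min_s=3.0, max_s=15.0):
    """Return a list of guideline violations for a PCM .wav file (empty if none)."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        duration_s = wf.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not min_rate <= rate <= max_rate:
        problems.append(f"sample rate {rate} Hz is outside {min_rate}-{max_rate} Hz")
    if not min_s <= duration_s <= max_s:
        problems.append(f"duration {duration_s:.1f} s is outside {min_s}-{max_s} s")
    return problems
```

For example, `check_reference("samples/jo.wav")` returns an empty list when the file meets all three criteria. The `wave` module reads uncompressed PCM files only, so other encodings raise `wave.Error` rather than producing a report.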