Chatterbox TTS API Service
This is a high-performance text-to-speech (TTS) service based on Chatterbox-TTS. It provides an OpenAI TTS-compatible API interface, an enhanced interface supporting voice cloning, and a clean web user interface.
This project aims to offer developers and content creators a private, powerful, and easy-to-integrate TTS solution.

Project URL: https://github.com/jianchang512/chatterbox-api
Using with pyVideoTrans
This project can serve as a powerful TTS backend, providing high-quality English voiceovers for pyVideoTrans.
- Start this project: Ensure the Chatterbox TTS API service is running locally (http://127.0.0.1:5093).
- Update pyVideoTrans: Make sure your pyVideoTrans is upgraded to v3.73 or higher.
- Configure pyVideoTrans:
  - In the pyVideoTrans menu, go to TTS Settings -> Chatterbox TTS.
  - API Address: Enter the address of this service; the default is http://127.0.0.1:5093.
  - Reference Audio (Optional): If you want to use voice cloning, enter the filename of the reference audio here (e.g., `my_voice.wav`). Ensure the audio file is placed in the `chatterbox` folder in the pyVideoTrans root directory.
  - Adjust Parameters: Tune `cfg_weight` and `exaggeration` as needed for optimal results.
Parameter Tuning Suggestions:
- General Scenarios (TTS, Voice Assistants): The default settings (`cfg_weight=0.5`, `exaggeration=0.5`) work for most cases.
- Fast-Paced Reference Audio: If the reference audio is fast, try lowering `cfg_weight` to around 0.3 to improve speech rhythm.
- Expressive/Dramatic Speech: Try a lower `cfg_weight` (e.g., 0.3) and a higher `exaggeration` (e.g., 0.7 or higher). Increasing `exaggeration` often speeds up speech, while lowering `cfg_weight` helps balance it toward a more measured and clearer pace.
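As a concrete illustration of these suggestions, the short sketch below expresses them as presets for the form fields accepted by the `/v2/audio/speech_with_prompt` interface (described in the API section further down); the preset names and exact values are only examples, not settings shipped with the project.

```python
# Illustrative presets for the tuning suggestions above, expressed as the form
# fields accepted by the /v2/audio/speech_with_prompt interface (see below).
# The preset names and exact values are examples, not project defaults.
PRESETS = {
    "general":    {"cfg_weight": "0.5", "exaggeration": "0.5"},  # default, most scenarios
    "fast_ref":   {"cfg_weight": "0.3", "exaggeration": "0.5"},  # fast-paced reference audio
    "expressive": {"cfg_weight": "0.3", "exaggeration": "0.7"},  # dramatic, more emphatic speech
}

def build_form_data(text: str, preset: str = "general", response_format: str = "mp3") -> dict:
    """Build the multipart form fields for the voice-cloning endpoint."""
    return {"input": text, "response_format": response_format, **PRESETS[preset]}
```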
✨ Features
- Two API Interfaces:
  - OpenAI-Compatible Interface: `/v1/audio/speech`, seamlessly integrates with existing workflows using the OpenAI SDK.
  - Voice Cloning Interface: `/v2/audio/speech_with_prompt`, generates speech with the same voice characteristics from a short uploaded reference audio.
- Web User Interface: Provides an intuitive frontend for quick testing and use of TTS features without coding.
- Flexible Output Formats: Supports generating audio in `.mp3` and `.wav` formats.
- Cross-Platform Support: Detailed installation guides for Windows, macOS, and Linux.
- One-Click Windows Deployment: Offers a compressed package for Windows users with all dependencies and startup scripts for out-of-the-box use.
- GPU Acceleration: Supports NVIDIA GPUs (CUDA), with a one-click upgrade script for Windows users.
- Seamless Integration: Easily integrates as a backend service with tools like pyVideoTrans.
🚀 Quick Start
Method 1: Windows Users (Recommended, One-Click Start)
We provide a portable package win.7z for Windows users containing all dependencies, greatly simplifying installation.
Download and Extract:
Baidu Netdisk Download (Includes models, total 5.9G): https://pan.baidu.com/s/1daWaP5hk6dVWZk2NicotMA?pwd=1wsd
The package includes the models and the required environment files, so it is large. Download both parts and extract them together, and avoid Chinese characters in the extraction path.
Baidu Netdisk Download (No models, 460MB, auto-downloads after startup, requires internet access): https://pan.baidu.com/s/1wLQb8YMD_Z_geRXJ_l-tlw?pwd=dbtp
GitHub Download (No models, 460MB, auto-downloads after startup, requires internet access): https://github.com/jianchang512/chatterbox-api/releases/download/0.2/chatterbox-win-NoModels-1005.7z
Start the Service:
- Double-click the `启动服务.bat` (Start Service) script in the root directory.
When you see information like the following in the command window, the service has started successfully:

```
✅ Model loading complete.
Service started successfully, HTTP address: http://127.0.0.1:5093
```
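Once the window reports the service has started, you can confirm it is reachable from Python with a quick check like the following (a minimal sketch, assuming the default address above and that the `requests` package is installed):

```python
# Minimal reachability check for the local Chatterbox TTS API service.
# Assumes the default address http://127.0.0.1:5093 from the startup log above.
import requests

try:
    resp = requests.get("http://127.0.0.1:5093", timeout=5)
    print("Service is up, HTTP status:", resp.status_code)
except requests.ConnectionError:
    print("Service is not reachable - is it running on port 5093?")
```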
Method 2: macOS, Linux, and Manual Installation Users
For macOS, Linux users, or Windows users preferring manual setup, follow these steps.
1. Prerequisites
- Python: Ensure Python 3.9 or higher is installed.
- ffmpeg: A required audio/video processing tool.
  - macOS (using Homebrew): `brew install ffmpeg`
  - Debian/Ubuntu: `sudo apt-get update && sudo apt-get install ffmpeg`
  - Windows (manual): Download ffmpeg and add it to the system PATH environment variable.
2. Installation Steps
```bash
# 1. Clone the project repository
git clone https://github.com/jianchang512/chatterbox-api.git
cd chatterbox-api

# 2. Create and activate a Python virtual environment (recommended)
python3 -m venv venv
# on Windows:
# venv\Scripts\activate
# on macOS/Linux:
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Start the service
python app.py
```

Once the service starts successfully, you will see the service address http://127.0.0.1:5093 in the terminal.
⚡ Upgrade to GPU Version (Optional)
If your computer has a CUDA-capable NVIDIA GPU with the NVIDIA drivers and CUDA Toolkit correctly installed, you can upgrade to the GPU version for significant performance gains.
Windows Users (One-Click Upgrade)
- First, ensure you have successfully run `启动服务.bat` at least once to complete the basic environment setup.
- Double-click the `安装N卡GPU支持.bat` (Install NVIDIA GPU Support) script.
- The script will automatically uninstall the CPU version of PyTorch and install the GPU version compatible with CUDA 12.6.
Linux Manual Upgrade
After activating the virtual environment, run the following commands:
```bash
# 1. Uninstall the existing CPU version of PyTorch
pip uninstall -y torch torchaudio

# 2. Install PyTorch matching your CUDA version
# The following command is for CUDA 12.6; get the correct command from the PyTorch website for your CUDA version
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu126
```

Visit the PyTorch website for installation commands suitable for your system.
After upgrading, restart the service. You will see `Using device: cuda` in the startup logs.
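To confirm that the upgraded PyTorch build can actually see your GPU, you can run a short check inside the activated virtual environment (a minimal sketch using PyTorch's standard `torch.cuda` helpers):

```python
# Quick check that the installed PyTorch build detects the NVIDIA GPU.
import torch

if torch.cuda.is_available():
    print("CUDA is available:", torch.cuda.get_device_name(0))
else:
    print("CUDA is not available; PyTorch will run on the CPU.")
```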
📖 Usage Guide
1. Web Interface
After starting the service, open http://127.0.0.1:5093 in your browser to access the Web UI.
- Input Text: Enter the text you want to convert in the text box.
- Adjust Parameters:
  - `cfg_weight`: (Range 0.0 - 1.0) Controls speech rhythm. Lower values result in slower, more measured speech. For fast-paced reference audio, lower this value (e.g., 0.3).
  - `exaggeration`: (Range 0.25 - 2.0) Controls emotional and intonational exaggeration. Higher values mean more expressive speech and potentially faster speed.
- Voice Cloning: Click "Choose File" to upload a reference audio file (e.g., .mp3, .wav). If provided, the service uses the cloning interface.
- Generate Speech: Click the "Generate Speech" button, wait a moment, then listen online or download the generated MP3 file.
2. API Calls
Interface 1: OpenAI-Compatible Interface (/v1/audio/speech)
This interface does not require reference audio and can be called directly using the OpenAI SDK.
Python Example (openai SDK):
```python
from openai import OpenAI

# Point the client to our local service
client = OpenAI(
    base_url="http://127.0.0.1:5093/v1",
    api_key="not-needed"  # An API key is not required, but the SDK expects one
)

response = client.audio.speech.create(
    model="chatterbox-tts",  # This parameter is ignored
    voice="en",
    speed=0.5,               # Corresponds to the cfg_weight parameter
    input="Hello, this is a test from the OpenAI compatible API.",
    instructions="0.5",      # (Optional) Corresponds to exaggeration; note it must be a string
    response_format="mp3"    # Optional: 'mp3' or 'wav'
)

# Save the audio stream to a file
response.stream_to_file("output_api1.mp3")
print("Audio saved to output_api1.mp3")
```
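If you prefer not to use the OpenAI SDK, the same endpoint can also be called with a plain HTTP POST. The sketch below assumes the endpoint accepts the same JSON body the OpenAI SDK sends; it is an illustrative alternative, not a separate interface:

```python
# Call the OpenAI-compatible endpoint directly with `requests`.
# Assumes the server accepts the same JSON fields the OpenAI SDK would send.
import requests

payload = {
    "model": "chatterbox-tts",   # ignored by the service
    "input": "Hello from a plain HTTP request.",
    "voice": "en",
    "speed": 0.5,                # corresponds to cfg_weight
    "response_format": "mp3",    # 'mp3' or 'wav'
}

resp = requests.post("http://127.0.0.1:5093/v1/audio/speech", json=payload)
resp.raise_for_status()
with open("output_api1_requests.mp3", "wb") as f:
    f.write(resp.content)
```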
print("Audio saved to output_api1.mp3")Interface 2: Voice Cloning Interface (/v2/audio/speech_with_prompt)
This interface requires uploading both text and a reference audio file in multipart/form-data format.
Python Example (requests library):
```python
import requests

API_URL = "http://127.0.0.1:5093/v2/audio/speech_with_prompt"
REFERENCE_AUDIO = "path/to/your/reference.mp3"  # Replace with your reference audio path

form_data = {
    'input': 'This voice should sound like the reference audio.',
    'cfg_weight': '0.5',
    'exaggeration': '0.5',
    'response_format': 'mp3'  # Optional: 'mp3' or 'wav'
}

with open(REFERENCE_AUDIO, 'rb') as audio_file:
    files = {'audio_prompt': audio_file}
    response = requests.post(API_URL, data=form_data, files=files)

if response.ok:
    with open("output_api2.mp3", "wb") as f:
        f.write(response.content)
    print("Cloned audio saved to output_api2.mp3")
else:
    print("Request failed:", response.text)
```