
This project provides a web UI and an API for Kokoro TTS. It supports voiceovers in 8 languages: Chinese, English, Japanese, French, Italian, Portuguese, Spanish, and Hindi.

Project address: https://github.com/jianchang512/kokoro-uiapi

Web Interface

Default UI address after startup: http://127.0.0.1:5066

  • Supports voiceovers for text and SRT subtitles
  • Supports online listening and downloading
  • Supports subtitle alignment

Installation Instructions

Windows

On Windows 10/11, download the integration package and double-click start.bat to launch. GPU acceleration requires an NVIDIA graphics card with CUDA 12 installed.

Baidu Netdisk Download: https://pan.baidu.com/s/1jTB84E3-gaLqFrl32f4sDw?pwd=xnwp

GitHub Download (models not included; they are downloaded online, which requires a VPN): https://github.com/jianchang512/kokoro-uiapi/releases/download/v0.1/kokoro-uiapi-noModels-v0.2.7z

Linux/MacOS

First, ensure Python 3.8 or later is installed (3.10 or 3.11 recommended).

On Linux, pre-install ffmpeg using apt install ffmpeg or yum install ffmpeg.

On MacOS, install ffmpeg using brew install ffmpeg.

  1. Pull the source code: git clone https://github.com/jianchang512/kokoro-uiapi
  2. Create and activate a virtual environment:
    cd kokoro-uiapi
    python3 -m venv venv
    . venv/bin/activate
  3. Install dependencies: pip3 install -r requirements.txt
  4. Start: python3 app.py
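
Once step 4 is running, a quick scripted check can confirm that the service is accepting connections. The following is a minimal sketch, assuming the default host and port given in this document (127.0.0.1:5066). The helper name is_listening is illustrative and not part of the project.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 5066, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_listening():
        print("Something is listening on 127.0.0.1:5066; the web UI should be reachable.")
    else:
        print("Nothing is listening on 127.0.0.1:5066; is app.py running?")
```

This only checks that the port is open; it does not verify that the process behind it is kokoro-uiapi.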

Usage in pyVideoTrans

  1. First, start this project. For the Windows integration package, double-click start.bat. For source code installations, execute python3 app.py.

  2. Upgrade pyVideoTrans to v3.48 or later. Open Menu -- TTS Settings -- Kokoro TTS and set the HTTP address to http://127.0.0.1:5066.

OpenAI API Compatibility

The API is compatible with the OpenAI TTS API.

Default API address after startup: http://127.0.0.1:5066/v1/audio/speech

Request method: POST

Request data: application/json

{
    "input": "Text to be voiced",
    "voice": "Voice actor ID",
    "speed": 1.0
}

The speed field sets the speech speed and defaults to 1.0.

A successful response returns MP3 audio data.
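
The request described above can also be sent without any SDK. The following is a minimal sketch using only the Python standard library, assuming the default address and the three JSON fields listed in this document. The function names build_speech_request and synthesize are illustrative and not part of the project.

```python
import json
import urllib.request

# Default API address given in this document.
API_URL = "http://127.0.0.1:5066/v1/audio/speech"


def build_speech_request(text: str, voice: str, speed: float = 1.0,
                         url: str = API_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body holds input, voice and speed."""
    body = json.dumps({"input": text, "voice": voice, "speed": speed}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str, speed: float = 1.0) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice, speed)
    # urlopen raises urllib.error.HTTPError on a non-2xx status
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the service running, a call such as synthesize("Hello, dear friends", "af_bella", "test_http.mp3") should write the MP3 data to disk. The 120-second timeout is an arbitrary choice for this sketch.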

OpenAI SDK Usage Example

from openai import OpenAI
client = OpenAI(
    api_key='123456',
    base_url='http://127.0.0.1:5066/v1'
)

try:
    response = client.audio.speech.create(
        model='tts-1',
        input='Hello, dear friends',
        voice='zf_xiaobei',
        response_format='mp3',
        speed=1.0
    )
    with open('./test_openai.mp3', 'wb') as f:
        f.write(response.content)
    print("MP3 file saved successfully to test_openai.mp3")
except Exception as e:
    print(f"An error occurred: {e}")

Voice Actor List

English Voice Actors:


af_alloy
af_aoede
af_bella
af_jessica
af_kore
af_nicole
af_nova
af_river
af_sarah
af_sky
am_adam
am_echo
am_eric
am_fenrir
am_liam
am_michael
am_onyx
am_puck
am_santa
bf_alice
bf_emma
bf_isabella
bf_lily
bm_daniel
bm_fable
bm_george
bm_lewis

Chinese Voice Actors:

zf_xiaobei
zf_xiaoni
zf_xiaoxiao
zf_xiaoyi
zm_yunjian
zm_yunxi
zm_yunxia
zm_yunyang

Japanese Voice Actors:

jf_alpha
jf_gongitsune
jf_nezumi
jf_tebukuro
jm_kumo

French Voice Actors: ff_siwis

Italian Voice Actors: if_sara, im_nicola

Hindi Voice Actors: hf_alpha, hf_beta, hm_omega, hm_psi

Spanish Voice Actors: ef_dora, em_alex, em_santa

Portuguese Voice Actors: pf_dora, pm_alex, pm_santa
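
The IDs above follow a visible pattern: the first letter matches the language group each voice is listed under. The following sketch encodes that pattern so a script can pick a voice that fits the language of its text. The mapping is read off the lists in this document rather than from any official specification, and the helper name language_of is illustrative.

```python
# First letter of a voice ID -> language group, as inferred from the lists above.
LANGUAGE_BY_PREFIX = {
    "a": "English",
    "b": "English",
    "z": "Chinese",
    "j": "Japanese",
    "f": "French",
    "i": "Italian",
    "h": "Hindi",
    "e": "Spanish",
    "p": "Portuguese",
}


def language_of(voice: str) -> str:
    """Return the language group of a voice ID, based on its first letter."""
    prefix = voice[:1].lower()
    if prefix not in LANGUAGE_BY_PREFIX:
        raise ValueError(f"Unrecognized voice ID: {voice!r}")
    return LANGUAGE_BY_PREFIX[prefix]
```

For example, language_of("zf_xiaobei") returns "Chinese" and language_of("bm_george") returns "English".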

Proxy and VPN

Source code deployments need to download the voice .pt files from huggingface.co. If that site is not directly reachable from your network, set up a global or system proxy beforehand.
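
One way to apply a proxy to this project alone is to launch app.py with the standard proxy environment variables set. The following is a sketch under two assumptions: that the project's download code honors HTTP_PROXY and HTTPS_PROXY (common Python HTTP clients such as requests do, but this has not been confirmed for this project), and that a proxy is listening at the hypothetical address shown. The helper names are illustrative.

```python
import os
import subprocess
import sys

# Hypothetical local proxy address; replace it with your own.
PROXY_URL = "http://127.0.0.1:7890"


def env_with_proxy(proxy_url: str) -> dict:
    """Return a copy of the current environment with the proxy variables set."""
    env = os.environ.copy()
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = proxy_url
    return env


def start_app(proxy_url: str = PROXY_URL) -> int:
    """Run app.py from the current directory with the proxy applied; return its exit code."""
    return subprocess.call([sys.executable, "app.py"], env=env_with_proxy(proxy_url))
```

Calling start_app() from the kokoro-uiapi directory, inside the activated virtual environment, starts the service with the proxy applied only to that process.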

Alternatively, you can download the model in advance and extract it to the directory where app.py is located.

Model download address: https://github.com/jianchang512/kokoro-uiapi/releases/download/v0.1/moxing--jieya--dao--app.py--mulu.7z

Credit