This project provides a web UI and an API for Kokoro TTS, supporting text-to-speech in 8 languages: Chinese, English, Japanese, French, Italian, Portuguese, Spanish, and Hindi.
Project address: https://github.com/jianchang512/kokoro-uiapi
Web Interface
Default UI address after startup: http://127.0.0.1:5066
- Supports text-to-speech for both text and SRT subtitles
- Supports online listening and downloading
- Supports subtitle alignment
Installation
Windows
For Win10/11, you can directly download the integrated package and double-click start.bat to launch it. If you need GPU acceleration, please ensure you have an NVIDIA graphics card and have installed CUDA 12.
GitHub Download: https://github.com/jianchang512/kokoro-uiapi/releases/v0.1
Linux/macOS
First, ensure that Python 3.8+ is installed on the system (3.10-3.11 is recommended).
On Linux, pre-install ffmpeg with
apt install ffmpeg
or
yum install ffmpeg
On macOS, install ffmpeg with
brew install ffmpeg
- Clone the source code:
git clone https://github.com/jianchang512/kokoro-uiapi
- Create and activate a virtual environment:
cd kokoro-uiapi
python3 -m venv venv
. venv/bin/activate
- Install dependencies:
pip3 install -r requirements.txt
- Start the application:
python3 app.py
Usage with pyVideoTrans
- First, start this project. For the Windows integrated package, double-click start.bat. For source code installations, execute python3 app.py.
- Upgrade pyVideoTrans to v3.48+. Open Menu -> TTS Settings -> Kokoro TTS and fill in the HTTP address as http://127.0.0.1:5066.
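Before entering the address in pyVideoTrans, it can help to confirm that something is actually listening on the default port. The following is a minimal sketch using only the Python standard library; the helper name `is_listening` is hypothetical and not part of this project, and a successful TCP connection only shows that the port is open, not that the service is fully ready.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 5066, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_listening()` with no arguments checks the default address 127.0.0.1:5066 given above.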
OpenAI API Compatibility
The API is compatible with the OpenAI TTS API.
Default API address after startup: http://127.0.0.1:5066/v1/audio/speech
Request method: POST
Request body: JSON (Content-Type: application/json) with the following fields:
{
  "input": text to be voiced,
  "voice": voice role name (see Voice Roles below),
  "speed": speech rate, default is 1.0
}
Returns MP3 audio data on success.
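The request described above can also be sent without the OpenAI SDK. The following is a minimal sketch using only the Python standard library. The function names `build_speech_request` and `synthesize` are hypothetical, the 120-second timeout is an arbitrary choice, and the sketch assumes the server accepts exactly the three documented fields.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5066/v1/audio/speech"  # default address from this README


def build_speech_request(text: str, voice: str, speed: float = 1.0,
                         url: str = API_URL) -> urllib.request.Request:
    """Build the POST request with the JSON fields input, voice, and speed."""
    payload = {"input": text, "voice": voice, "speed": speed}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str, speed: float = 1.0) -> None:
    """Send the request and write the returned MP3 bytes to out_path."""
    request = build_speech_request(text, voice, speed)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the service running, `synthesize("Hello, dear friends!", "af_bella", "test.mp3")` should write the returned audio to test.mp3.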
OpenAI SDK Example
from openai import OpenAI

client = OpenAI(
    api_key='123456',  # placeholder; the OpenAI client requires a value
    base_url='http://127.0.0.1:5066/v1'  # local kokoro-uiapi endpoint
)

try:
    response = client.audio.speech.create(
        model='tts-1',
        input='Hello, dear friends!',
        voice='zf_xiaobei',
        response_format='mp3',
        speed=1.0
    )
    # Write the returned MP3 bytes to disk.
    with open('./test_openai.mp3', 'wb') as f:
        f.write(response.content)
    print("MP3 file saved successfully to test_openai.mp3")
except Exception as e:
    print(f"An error occurred: {e}")
Voice Roles
English Voice Roles:
af_alloy
af_aoede
af_bella
af_jessica
af_kore
af_nicole
af_nova
af_river
af_sarah
af_sky
am_adam
am_echo
am_eric
am_fenrir
am_liam
am_michael
am_onyx
am_puck
am_santa
bf_alice
bf_emma
bf_isabella
bf_lily
bm_daniel
bm_fable
bm_george
bm_lewis
Chinese Voice Roles:
zf_xiaobei
zf_xiaoni
zf_xiaoxiao
zf_xiaoyi
zm_yunjian
zm_yunxi
zm_yunxia
zm_yunyang
Japanese Voice Roles:
jf_alpha
jf_gongitsune
jf_nezumi
jf_tebukuro
jm_kumo
French Voice Roles: ff_siwis
Italian Voice Roles: if_sara, im_nicola
Hindi Voice Roles: hf_alpha, hf_beta, hm_omega, hm_psi
Spanish Voice Roles: ef_dora, em_alex, em_santa
Portuguese Voice Roles: pf_dora, pm_alex, pm_santa
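In the lists above, the first letter of each voice name is consistent within a language, so a client can pick the language from the voice name alone. The following is a minimal sketch of such a lookup. The mapping is inferred from the lists in this document, the helper name `voice_language` is hypothetical, and the function does not check that the voice actually exists on the server.

```python
# Language by first letter of the voice name, inferred from the lists above.
LANGUAGE_BY_PREFIX = {
    "a": "English",
    "b": "English",
    "z": "Chinese",
    "j": "Japanese",
    "f": "French",
    "i": "Italian",
    "h": "Hindi",
    "e": "Spanish",
    "p": "Portuguese",
}


def voice_language(voice: str) -> str:
    """Return the language a voice role belongs to, based on its first letter."""
    prefix, sep, _name = voice.partition("_")
    # Every listed voice has a two-letter prefix followed by an underscore.
    if not sep or len(prefix) != 2 or prefix[0] not in LANGUAGE_BY_PREFIX:
        raise ValueError(f"unrecognized voice role: {voice!r}")
    return LANGUAGE_BY_PREFIX[prefix[0]]
```

For example, `voice_language("zf_xiaobei")` returns "Chinese", while a name that does not follow the pattern raises ValueError.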
Proxy/VPN
Source code deployments download the voice .pt files from huggingface.co, so set up a global or system proxy in advance if that site is not directly reachable.
Alternatively, download the model in advance and extract it to the directory where app.py is located.
Model download address: https://github.com/jianchang512/kokoro-uiapi/releases/download/v0.1/moxing--jieya--dao--app.py--mulu.7z
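For the proxy route, one option is to start app.py with the standard proxy environment variables set for that process only. The following is a sketch under assumptions: it assumes the download code honors HTTP_PROXY/HTTPS_PROXY (as requests-based clients usually do), the helper names `proxy_env` and `start_app` are hypothetical, and the proxy URL you pass must be replaced with your own proxy's address.

```python
import os
import subprocess
import sys

# Standard proxy variables; both cases are set because tools differ in which they read.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def proxy_env(proxy_url, base_env=None):
    """Return a copy of the environment with the proxy variables set to proxy_url."""
    env = dict(os.environ if base_env is None else base_env)
    for name in PROXY_VARS:
        env[name] = proxy_url
    return env


def start_app(proxy_url, app_dir="."):
    """Run app.py from app_dir with the proxy applied to that process only."""
    return subprocess.run(
        [sys.executable, "app.py"],
        cwd=app_dir,
        env=proxy_env(proxy_url),
        check=False,
    )
```

For example, `start_app("http://127.0.0.1:7890", "kokoro-uiapi")` would launch the app through a local proxy on port 7890; that address is only a placeholder.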