CosyVoice Open Source: https://github.com/FunAudioLLM/CosyVoice
CosyVoice-api Open Source: https://github.com/jianchang512/cosyvoice-api
Supports Chinese, English, Japanese, Korean, and Cantonese. The corresponding language codes are `zh|en|jp|ko|yue`.
Using CosyVoice in Video Translation Software
- First, update the software to version 2.08+.
- Ensure that the CosyVoice project is deployed, that `api.py` from CosyVoice-api has been placed in it, and that `api.py` has been started successfully (the API service must be running before it can be used in the translation software).
- Open the video translation software, go to Settings (top left) -- CosyVoice, and fill in the API address, which is `http://127.0.0.1:9233` by default.
- Fill in the reference audio and the corresponding text.
Reference audio format:
Each line consists of two parts separated by a `#` symbol: the first part is the path to the WAV audio, and the second part is its corresponding text content. Multiple lines may be entered.
The optimal duration for the WAV audio is 5-15 seconds. If the audio is placed in the root directory of the CosyVoice project (i.e., the same directory as `webui.py`), enter just the file name here.
If it is placed in a `wavs` directory under the root directory, enter `wavs/audio_name.wav`.
Reference audio example:
1.wav#Hello dear friends
wavs/2.wav#Hello friends
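The `path#text` lines above can be parsed mechanically. A minimal Python sketch (the helper name is hypothetical, not part of CosyVoice):

```python
def parse_reference_lines(text):
    """Parse reference-audio config lines of the form 'path#text'."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only on the first '#', so the spoken text may itself contain '#'.
        path, _, speech_text = line.partition("#")
        pairs.append((path, speech_text))
    return pairs

print(parse_reference_lines("1.wav#Hello dear friends\nwavs/2.wav#Hello friends"))
# → [('1.wav', 'Hello dear friends'), ('wavs/2.wav', 'Hello friends')]
```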
- After filling in, select CosyVoice as the dubbing channel on the main interface, and select the corresponding role. The "clone" role copies the timbre from the original video.
For other systems, please deploy CosyVoice first. The specific deployment method is as follows:
Source Code Deployment of the Official CosyVoice Project
Deployment uses conda, and this method is strongly recommended; otherwise the installation may fail and you may run into many problems. Some dependencies, such as `pynini`, cannot be installed successfully with pip on Windows.
1. Download and Install Miniconda
Miniconda is a conda management tool. On Windows it installs like any regular program: just click Next until the installation completes.
Download address: https://docs.anaconda.com/miniconda/
After downloading, double-click the .exe file.
The only thing to note is that on the options screen you need to check the top two checkboxes; otherwise the subsequent steps become troublesome. The second checkbox means "add the conda command to the system environment variables". If you do not check it, you will not be able to use the `conda` command directly.
Then click "install" and wait for it to complete before closing.
2. Download the CosyVoice Source Code
First create an empty directory, for example a folder `D:/py` on the D drive. The following uses this as an example.
Open the CosyVoice open source address: https://github.com/FunAudioLLM/CosyVoice
After downloading and decompressing, copy all the files in the CosyVoice-main directory to D:/py.
3. Create and Activate a Virtual Environment
Open the `D:/py` folder, type `cmd` in the folder's address bar, and press Enter. A black cmd window will open.
In this window, enter the command `conda create -n cosyvoice python=3.10` and press Enter. This creates a virtual environment named "cosyvoice" with Python 3.10.
Then enter the command `conda activate cosyvoice` and press Enter to activate the virtual environment. Only after activation can you continue to install and start the project; otherwise errors are inevitable.
The sign that activation succeeded is that the command prompt gains a "(cosyvoice)" prefix.
4. Install the `pynini` Module
On Windows this module can only be installed with the conda command, which is why conda was recommended at the start.
In the cmd window that is already open and activated, enter the command `conda install -y -c conda-forge pynini==2.1.5 WeTextProcessing==1.0.3` and press Enter.
Note: during installation a confirmation prompt may appear; enter `y` and press Enter, as shown in the figure below.
5. Install Other Dependencies Using the Alibaba Cloud Mirror
Open the `requirements.txt` file and delete the last line, `WeTextProcessing==1.0.3`; otherwise the installation is certain to fail, because that module depends on `pynini`, which cannot be installed with pip on Windows. Then add three lines, `Matcha-TTS`, `flask`, and `waitress`, to the `requirements.txt` file.
Then enter the command `pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com` and press Enter. After a long wait, barring accidents, the installation will succeed.
6. Download the `api.py` File and Place It in the Project
Download the `api.py` file from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the same directory as `webui.py`.
Start the API Service
Enter the command `python api.py` and press Enter to execute it.
The API interface address is `http://127.0.0.1:9233`.
API Interface List
Synthesize Text Based on Built-in Roles
Interface address: `/tts`
Synthesizes text into speech directly, without timbre cloning.
Required parameters:
`text`: the text to be synthesized into speech
`role`: one of '中文女' (Chinese Female), '中文男' (Chinese Male), '日语男' (Japanese Male), '粤语女' (Cantonese Female), '英文女' (English Female), '英文男' (English Male), '韩语女' (Korean Female)
Successful return: WAV audio data
Sample code
import requests

data = {
    "text": "你好啊亲爱的朋友们",
    "role": "中文女"
}
response = requests.post('http://127.0.0.1:9233/tts', data=data, timeout=3600)
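Since `/tts` only accepts the built-in role names, it can help to validate the `role` value before posting. A hypothetical helper (not part of the API), using the role list above:

```python
# Built-in role names accepted by /tts (from the parameter list above).
BUILTIN_ROLES = {'中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女'}

def build_tts_payload(text, role):
    """Build the form payload for /tts, rejecting unknown roles early."""
    if role not in BUILTIN_ROLES:
        raise ValueError(f"unknown role: {role}")
    return {"text": text, "role": role}
```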
Same Language Timbre Cloning Synthesis
- Address: /clone_eq
The pronunciation language of the reference audio is the same as the text language to be synthesized. For example, the reference audio is Chinese pronunciation, and the Chinese text needs to be synthesized into speech based on this audio.
- Required parameters:
`text`: the text to be synthesized into speech
`reference_audio`: the reference audio used to clone the timbre, given as a path relative to `api.py`. For example, if you are referencing `1.wav` and the file is in the same folder as `api.py`, fill in `1.wav`
`reference_text`: the text content corresponding to the reference audio
Successful return: WAV data
Sample code
import requests

data = {
    "text": "你好啊亲爱的朋友们。",
    "reference_audio": "10.wav",
    "reference_text": "希望你过的比我更好哟。"
}
response = requests.post('http://127.0.0.1:9233/clone_eq', data=data, timeout=3600)
Different Language Timbre Cloning:
- Address: /clone
The pronunciation language of the reference audio is different from the text language to be synthesized. For example, you need to synthesize an English text into speech based on the reference audio of Chinese pronunciation.
- Required parameters:
`text`: the text to be synthesized into speech
`reference_audio`: the reference audio used to clone the timbre, given as a path relative to `api.py`. For example, if you are referencing `1.wav` and the file is in the same folder as `api.py`, fill in `1.wav`
Successful return: WAV data
Sample code
import requests

data = {
    "text": "親友からの誕生日プレゼントを遠くから受け取り、思いがけないサプライズと深い祝福に、私の心は甘い喜びで満たされた!",
    "reference_audio": "10.wav"
}
response = requests.post('http://127.0.0.1:9233/clone', data=data, timeout=3600)
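All three endpoints return raw WAV bytes in the response body. A minimal sketch for saving the result to disk, assuming a `requests.Response` object (the helper name is hypothetical):

```python
def save_wav(response, path):
    """Write the WAV bytes returned by the API to a file."""
    # Fail fast on an HTTP error instead of saving an error body as audio.
    response.raise_for_status()
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```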
Compatible with OpenAI TTS
- Interface address: /v1/audio/speech
- Request method: POST
- Request type: Content-Type: application/json
- Request parameters
`input`: the text to synthesize
`model`: fixed to tts-1; kept for OpenAI parameter compatibility, not actually used
`speed`: speech rate, default 1.0
`response_format`: return format, fixed to WAV audio data
`voice`: for plain synthesis, one of '中文女' (Chinese Female), '中文男' (Chinese Male), '日语男' (Japanese Male), '粤语女' (Cantonese Female), '英文女' (English Female), '英文男' (English Male), '韩语女' (Korean Female).
When used for cloning, fill in the path of the reference audio relative to `api.py`. For example, if you are referencing `1.wav` and the file is in the same folder as `api.py`, fill in `1.wav`
- Sample code
from openai import OpenAI

client = OpenAI(api_key='12314', base_url='http://127.0.0.1:9233/v1')

with client.audio.speech.with_streaming_response.create(
    model='tts-1',
    voice='中文女',
    input='你好啊,亲爱的朋友们',
    speed=1.0
) as response:
    with open('./test.wav', 'wb') as f:
        for chunk in response.iter_bytes():
            f.write(chunk)