MegaTTS3 is an open-source Chinese and English voice cloning project from ByteDance that delivers impressive results. However, the official installation documentation can be a bit sparse, especially for Windows users, leading to installation challenges. This tutorial aims to guide you through these hurdles, enabling you to successfully install and use MegaTTS3 on Windows.
Before we begin, let's clarify some basic concepts used throughout this tutorial:
- CMD Console (Command Prompt):
- How to open: In the address bar of the folder you want to work in (e.g., `D:/python/megatts3`), delete the existing path, type `cmd`, and press Enter.
- Purpose: This opens a black window, which is the CMD console. All commands in this tutorial are typed here and executed by pressing Enter.
- Executing Commands:
- Type a specified line of text (i.e., the "command") into the CMD console and press Enter.
Initial Installation and Configuration
Strongly Recommended: Use Miniconda to deploy MegaTTS3 on Windows to avoid unnecessary issues. The following tutorial is based on Miniconda.
Example Path: This tutorial assumes your working directory (where you install MegaTTS3) is `D:/python/megatts3`. If your path is different, adjust the commands accordingly.
Step 1: Install Miniconda
Download Miniconda:
- Visit https://www.anaconda.com/download/success#miniconda in your browser.
- Locate the Miniconda Installers section and click the download link for your operating system.
Install Miniconda:
- Double-click the downloaded `.exe` installation file.
- Click Next through the initial screens and click I Agree on the license agreement page.
- Crucial Step: Among the installation options, make sure to check the second checkbox, "Add Miniconda3 to my PATH environment variable". Ignore the red warning text next to it; checking this box is essential, because it lets you run `conda` directly from the CMD console.
- Continue clicking Next or Install until the installation is complete.
Step 2: Download MegaTTS3 Source Code
Visit the Official Repository:
- Open https://github.com/bytedance/MegaTTS3.
Download the Code:
- Click the green `<> Code` button and select Download ZIP.
Extract and Place Files:
- Extract the downloaded `MegaTTS3-main.zip` file.
- Copy all the files and subfolders inside the extracted `MegaTTS3-main` folder to your prepared working directory, e.g., `D:/python/megatts3`.
- After copying, the `D:/python/megatts3` folder should contain folders like `assets`, `checkpoints`, and `tts`.
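Before moving on, you can optionally confirm the folder layout. The following is a minimal Python sketch that you could run with the Python that Miniconda installed; the function name is hypothetical, and it only checks for the three folders named above, on the assumption that those are the ones that matter at this stage.

```python
from pathlib import Path

# Folders this tutorial says should exist after copying the MegaTTS3-main contents.
EXPECTED_DIRS = ("assets", "checkpoints", "tts")

def missing_dirs(root):
    """Return the expected sub-folders that are absent from the working directory."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]

if __name__ == "__main__":
    # Adjust this path if your working directory differs.
    missing = missing_dirs("D:/python/megatts3")
    print("Layout OK" if not missing else f"Missing folders: {missing}")
```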
Step 3: Create and Activate a Virtual Environment
Open CMD Console:
- Navigate to your working directory `D:/python/megatts3`.
- Type `cmd` in the address bar and press Enter.
Create a Virtual Environment:
- In the CMD console, enter the following command to create an environment named `megatts3env` using Python 3.10:
```
conda create -n megatts3env python=3.10
```
During installation, if prompted with `Proceed ([y]/n)?`, type `y` and press Enter.
Activate the Virtual Environment:
- After creation, enter the following command to activate the environment (you must run this step every time you use MegaTTS3):
```
conda activate megatts3env
```
Upon successful activation, `(megatts3env)` will appear before the command prompt.
Note: All installation and execution commands below must be executed in the CMD console with the `(megatts3env)` environment activated!
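To double-check which environment is active, the sketch below reads `CONDA_DEFAULT_ENV`, the variable that `conda activate` sets to the active environment's name, and compares the interpreter version with the 3.10 used above. The function name and its defaults are illustrative, not part of MegaTTS3.

```python
import os
import sys

def check_environment(expected_env="megatts3env", environ=None, version=None):
    """Return a list of problems with the current conda environment (empty list = OK)."""
    environ = os.environ if environ is None else environ
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    problems = []
    # `conda activate` sets CONDA_DEFAULT_ENV to the name of the active environment.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    if version != (3, 10):
        problems.append(f"Python {version[0]}.{version[1]} found, the tutorial uses 3.10")
    return problems

if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```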
Step 4: Install Dependencies
Special Note: Direct installation on Windows according to the official repository documentation usually fails. Be sure to follow the command execution order below strictly.
Install pynini:
- In the activated CMD console, enter and execute:
```
conda install -y -c conda-forge pynini==2.1.5
```
- Wait for the command to complete.
Install WeTextProcessing 1.0.3:
- Continue in the CMD console; enter and execute:
```
pip install WeTextProcessing==1.0.3
```
- Wait for the command to complete.
Modify requirements.txt and Install Remaining Dependencies:
- Open the `requirements.txt` file in your working directory (`D:/python/megatts3`) using Notepad or another text editor.
- Find and delete the line containing `WeTextProcessing==1.0.4.1`.
- Save and close the file.
- Return to the CMD console and execute the following command to install the remaining dependencies:
```
pip install -r requirements.txt
```
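If you prefer not to edit `requirements.txt` by hand, the same edit can be sketched in Python. This is a hypothetical helper, not part of MegaTTS3; it assumes each requirement sits on its own line in the usual `name==version` form and drops every line whose package name matches, ignoring case.

```python
import re

def remove_requirement(text, package="WeTextProcessing"):
    """Drop every requirement line for `package`, keeping all other lines unchanged."""
    kept = []
    for line in text.splitlines(keepends=True):
        # The package name is whatever precedes the first version operator, marker or space.
        name = re.split(r"[<>=!~;\[\s]", line.strip(), maxsplit=1)[0]
        if name.lower() != package.lower():
            kept.append(line)
    return "".join(kept)
```

Run from the working directory, you would read `requirements.txt`, pass its text through `remove_requirement`, write the result back, and then run `pip install -r requirements.txt` as above.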
Set Environment Variables:
- Copy and paste the following command into the CMD console in full, and then press Enter to execute it. Note: If your installation directory is not `D:/python/megatts3`, change the path in the command to your actual path.
```
conda env config vars set PYTHONPATH="D:/python/megatts3;%PYTHONPATH%"
```
- After the variable is set, close the current CMD window, open a new CMD window, and reactivate the environment with `conda activate megatts3env` so that the environment variable takes effect.
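After reopening CMD and reactivating the environment, you can verify that the variable took effect. A small sketch, assuming Windows' `;` path separator; it normalizes slashes and case so that `D:/python/megatts3` and `D:\python\megatts3` compare equal. The helper name is hypothetical.

```python
import os

def on_pythonpath(project_dir, pythonpath=None, sep=";"):
    """Check whether project_dir appears in a PYTHONPATH string (';'-separated on Windows)."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")

    def norm(p):
        # Treat / and \ alike and ignore case and trailing slashes, as Windows paths do.
        return p.strip().replace("\\", "/").rstrip("/").lower()

    target = norm(project_dir)
    return any(norm(entry) == target for entry in pythonpath.split(sep) if entry.strip())

if __name__ == "__main__":
    print(on_pythonpath("D:/python/megatts3"))
```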
Check: If all the above steps complete without errors (yellow WARN messages can be ignored), the dependencies are installed successfully. If you encounter a red error, carefully check whether you followed the execution order strictly, especially whether you correctly modified the `requirements.txt` file.
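For a quick check that the pinned versions actually landed, the sketch below uses `importlib.metadata` for the pip-installed WeTextProcessing and, since pynini came from conda rather than pip, checks it only by module name with `importlib.util.find_spec`. `find_spec` confirms that the module can be located, not that its native libraries load, so treat a pass as a good sign rather than proof. The helper names are illustrative.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not found."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

def report(pins, modules):
    """Compare pinned distribution versions and check that modules can be located."""
    lines = []
    for dist, wanted in pins.items():
        found = installed_version(dist)
        status = "OK" if found == wanted else "MISMATCH"
        lines.append(f"{dist}: wanted {wanted}, found {found} [{status}]")
    for mod in modules:
        ok = importlib.util.find_spec(mod) is not None
        lines.append(f"import {mod}: {'OK' if ok else 'MISSING'}")
    return lines

if __name__ == "__main__":
    # Version pinned in Step 4; pynini is checked by module name only.
    print("\n".join(report({"WeTextProcessing": "1.0.3"}, ["pynini"])))
```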
Step 5: Download Pre-trained Models
Hint: The model files are hosted on Hugging Face Hub, which may be inaccessible from within mainland China without a VPN.
- Make sure your CMD console is again in the activated `(megatts3env)` state.
- Execute the following command to download the model files to the `checkpoints` folder in your working directory:
```
huggingface-cli download ByteDance/MegaTTS3 --local-dir ./checkpoints --local-dir-use-symlinks False
```
- Wait patiently for the download to complete.
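If `huggingface-cli` gives you trouble, the `huggingface_hub` Python package that provides it also offers `snapshot_download`, which performs the same download. The sketch below wraps it in a function that is not called automatically, plus a standard-library helper that counts files and bytes under `checkpoints` so you can tell whether anything arrived. The function names are hypothetical, and the expected total size is not stated here, so compare against the file listing on the Hugging Face repository page.

```python
from pathlib import Path

def download_models(target="./checkpoints"):
    """Python counterpart of the huggingface-cli command (requires huggingface_hub)."""
    from huggingface_hub import snapshot_download  # imported lazily so the helper below works without it
    return snapshot_download(repo_id="ByteDance/MegaTTS3", local_dir=target)

def summarize(folder):
    """Return (file_count, total_bytes) for all files under folder; (0, 0) if it does not exist."""
    root = Path(folder)
    if not root.is_dir():
        return 0, 0
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)

if __name__ == "__main__":
    count, size = summarize("./checkpoints")
    print(f"{count} files, {size / 1024**3:.2f} GiB in ./checkpoints")
```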
Step 6: (Optional) Add GPU Acceleration Support
If your computer is equipped with an NVIDIA graphics card and you have installed CUDA version 12.x, you can accelerate voice synthesis by installing the GPU version.
- Make sure the CMD console is activated with `(megatts3env)`.
- Execute the following command:
pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
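To confirm that the GPU build is in use, you can run a short check in the activated environment. This sketch relies on standard PyTorch calls (`torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.get_device_name`); if `cuda_available` is still `False` after the reinstall, the GPU build of PyTorch or the NVIDIA driver is the likely place to look.

```python
def cuda_report():
    """Summarize whether the installed PyTorch build can see an NVIDIA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,  # CUDA wheels typically carry a suffix such as '+cu126'
        "cuda_build": torch.version.cuda,    # None for CPU-only builds
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }

if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```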
At this point, all installation and configuration work is complete!
Start the MegaTTS3 Web Service
Each time you want to use MegaTTS3, you need to start it according to the following steps.
Open CMD Console:
- Navigate to your MegaTTS3 working directory (e.g., `D:/python/megatts3`).
- Type `cmd` in the address bar and press Enter.
Activate Virtual Environment:
- Execute the command: `conda activate megatts3env`
(Recommended) Modify Gradio Listening Address:
- It is strongly recommended to do this before the first startup: open the file `D:\python\megatts3\tts\gradio_api.py` with a code editor or Notepad.
- Scroll to the end of the file, find `server_name="0.0.0.0"`, and change it to `server_name="127.0.0.1"`.
- Reason: Using `0.0.0.0` on Windows may produce a large amount of irrelevant error output, or even cause startup to fail. `127.0.0.1` is usually more stable.
- Save the file after making the change.
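The edit can also be scripted. A minimal sketch, assuming the line in `gradio_api.py` is written as `server_name="0.0.0.0"` (single or double quotes, optional spaces around `=`); it reports how many occurrences it replaced, so a count of 0 tells you the pattern did not match. Run it from `D:/python/megatts3`.

```python
import re
from pathlib import Path

def patch_server_name(source, new_host="127.0.0.1"):
    """Replace server_name="0.0.0.0" with new_host; return (new_source, replacement_count)."""
    pattern = r'server_name\s*=\s*["\']0\.0\.0\.0["\']'
    return re.subn(pattern, f'server_name="{new_host}"', source)

if __name__ == "__main__":
    path = Path("tts/gradio_api.py")  # relative to the working directory
    if path.exists():
        text, n = patch_server_name(path.read_text(encoding="utf-8"))
        path.write_text(text, encoding="utf-8")
        print(f"{n} occurrence(s) replaced")
    else:
        print("tts/gradio_api.py not found; run this from your working directory")
```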
Start the Program:
- In the activated CMD console, execute:
```
python tts/gradio_api.py
```
- If startup succeeds, the CMD console shows output indicating that the service is running; Gradio normally prints the local URL it is listening on.
Access the Web Interface:
- Open http://127.0.0.1:7929 in your browser.
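If the page does not load, a quick way to tell whether the service is up at all is to test the port directly. The sketch below uses only the standard `socket` module and tries a TCP connection to `127.0.0.1:7929`, the address used in this tutorial.

```python
import socket

def is_listening(host="127.0.0.1", port=7929, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False

if __name__ == "__main__":
    print("MegaTTS3 web UI reachable" if is_listening() else "Nothing listening on 127.0.0.1:7929")
```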
Using MegaTTS3 for Voice Cloning
Understanding Voice Source
MegaTTS3 is currently a "semi-open-source" project: you cannot clone an arbitrary voice sample that you provide yourself. You can only use the voices (latents) that ByteDance has pre-processed and published on a specific page, because the WaveVAE encoder needed to turn new audio into latents has not been released.
- Official Explanation: This is done for security and legal compliance reasons.
- If you want to clone your own voice: You need to submit your audio in the manner specified by the official guidelines, wait for them to review and approve it, and then download and use it after it is placed on the Latents page. (See below for details)
Download Available Voice Files
Access the Google Drive Folder:
- You need a Google account (registration is free) and, if you are in mainland China, a VPN to access Google services.
- Open the latents page: https://drive.google.com/drive/folders/1QhcHWcy20JfqWjgqZX1YM3I6i9u4oNlr
- It contains three subfolders (`librispeech_testclean_40`, `official_test_case`, `user_batch_1-3`), which hold all currently available voices.
Select and Download Files:
- Open any folder, browse the `.wav` audio files inside, and listen to choose the voice you want to clone (right-click a wav file and choose Open with > Preview to listen to it).
- Important: When you download a `.wav` file (e.g., `speaker_xxx.wav`), you must also download the `.npy` file with the same name (i.e., `speaker_xxx.npy`). The two files are used as a pair, and both are required.
- Save the downloaded `.wav` and `.npy` files on your computer.
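Because the `.wav` and `.npy` files only work as pairs, it can help to scan your download folder for any file missing its partner. A sketch in Python; the folder `D:/python/megatts3/voices` and the function name are hypothetical, so point it wherever you saved the files.

```python
from pathlib import Path

def unpaired_files(folder):
    """List .wav files lacking a same-named .npy and vice versa (MegaTTS3 needs both)."""
    folder = Path(folder)
    wavs = {p.stem for p in folder.glob("*.wav")}
    npys = {p.stem for p in folder.glob("*.npy")}
    return {
        "wav_without_npy": sorted(wavs - npys),
        "npy_without_wav": sorted(npys - wavs),
    }

if __name__ == "__main__":
    voices = Path("D:/python/megatts3/voices")  # hypothetical download folder
    if voices.is_dir():
        print(unpaired_files(voices))
    else:
        print(f"{voices} does not exist")
```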
Synthesize Speech in the Web Interface
Open the Web Interface:
- Make sure the MegaTTS3 service is running and open http://127.0.0.1:7929 in your browser.
Upload Voice Files:
- Find the upload area on the page.
- Click the "Upload.wav" area and select the `.wav` file you just downloaded.
- Click the "Upload.npy" area and select the `.npy` file with the same name as the `.wav` file.
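If the upload is rejected or synthesis fails, a truncated `.npy` download is one possible cause. The sketch below reads only the header of a `.npy` file, following NumPy's documented `.npy` layout (magic string, two version bytes, header length, then a Python dict literal), so it needs nothing beyond the standard library. The helper name is hypothetical, and it does not validate the array data itself.

```python
import ast
import struct

def read_npy_header(path):
    """Parse the header of a .npy file with the standard library (no numpy needed)."""
    with open(path, "rb") as f:
        if f.read(6) != b"\x93NUMPY":
            raise ValueError(f"{path} is not a .npy file")
        major, _minor = f.read(2)
        # Format 1.x stores the header length in 2 bytes, 2.x/3.x in 4 bytes (little-endian).
        size_fmt = "<H" if major == 1 else "<I"
        (header_len,) = struct.unpack(size_fmt, f.read(struct.calcsize(size_fmt)))
        header = f.read(header_len).decode("latin1" if major < 3 else "utf8")
    # The header is a dict literal with 'descr', 'fortran_order' and 'shape' keys.
    return ast.literal_eval(header)
```

For example, `read_npy_header("speaker_xxx.npy")["shape"]` returns the stored array's shape; a `ValueError` means the file does not start like a `.npy` file.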
Enter Text and Synthesize:
- In the "Input Text" input box, enter the Chinese or English text you want this voice to read.
- Click the "Submit" button to execute.
Get Results:
- Wait a moment while the synthesis runs in the background.
- Once it finishes, you can play the generated speech directly in the upper right corner, or use the download button to save it as an audio file.
You have now successfully installed and used MegaTTS3 on Windows for voice cloning!
Upload the Voice You Want to Clone Yourself
If the voice you want to clone does not exist, you can upload it yourself.
- First, convert the audio of the voice you want to clone to WAV format. The clip must not exceed 24 seconds; a length of 5-24 seconds is recommended.
- Make sure the audio content is legal and does not infringe copyright, and that the recording is free of background noise, clearly pronounced, and contains a single speaker.
- Open https://drive.google.com/drive/folders/1gCWL1y_2xu9nIFhUX_OW5MbcFuB7J5Cl, drag and drop your prepared wav file into it, and wait for approval before you can use it.
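Before uploading, you can check the clip length against the guideline above. A minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV only (other encodings raise `wave.Error`); the thresholds mirror the 5-24 second recommendation, and the helper names are hypothetical.

```python
import wave

def wav_info(path):
    """Return duration in seconds, channel count and sample rate of a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        frames, rate = w.getnframes(), w.getframerate()
        return {"seconds": frames / rate, "channels": w.getnchannels(), "sample_rate": rate}

def duration_ok(seconds, low=5.0, high=24.0):
    """Apply the tutorial's guideline: at most 24 s, ideally 5-24 s."""
    if seconds > high:
        return "too long"
    return "ok" if seconds >= low else "short (under 5 s, allowed but not recommended)"
```

For example, `duration_ok(wav_info("my_voice.wav")["seconds"])` (file name hypothetical) tells you whether a clip fits before you upload it.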
After ByteDance approves the audio, it will create an `.npy` file with the same name and put both the wav and npy files into the `user_batch_1-3` folder of the latents page mentioned above. You can then download the wav file and its matching npy file to clone the voice.