MegaTTS3: A Semi-Open-Source Voice Cloning Toolkit - From Zero to Hero | pyVideoTrans - Open Source Video Translation Tool - pyvideotrans.com - github.com/jianchang512/pyvideotrans

MegaTTS3 is an open-source Chinese and English voice cloning project from ByteDance that delivers impressive results. However, the official installation documentation can be a bit sparse, especially for Windows users, leading to installation challenges. This tutorial aims to guide you through these hurdles, enabling you to successfully install and use MegaTTS3 on Windows.

Before we begin, let's clarify some basic concepts used throughout this tutorial:

CMD Console (Command Prompt):
- How to open: In the address bar of the folder you want to work in (e.g., D:/python/megatts3), delete the existing path, type cmd, and press Enter.
- Purpose: This will open a black window, which is the CMD console. All commands mentioned in this tutorial are entered and executed here by pressing Enter.
Executing Commands:
- Type a specified line of text (i.e., the "command") into the CMD console and press Enter.

Initial Installation and Configuration

Strongly Recommended: Use Miniconda to deploy MegaTTS3 on Windows to avoid unnecessary issues. The following tutorial is based on Miniconda.

Example Path: This tutorial assumes your working directory (where you're installing MegaTTS3) is D:/python/megatts3. If your path is different, adjust the commands accordingly.

Step 1: Install Miniconda

Download Miniconda:
- Visit https://www.anaconda.com/download/success#miniconda in your browser.
- Locate the Miniconda Installers section and click the download link for your operating system.
Install Miniconda:
- Double-click the downloaded .exe installation file.
- Click Next through the initial screens and click I Agree on the license agreement page.
- Crucial Step: During the installation options, make sure to check the second checkbox: "Add Miniconda3 to my PATH environment variable". Ignore the red warning text next to it; it's essential to check this box.
- Continue clicking Next or Install until the installation is complete.
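
To confirm that the PATH option took effect, you can look for the conda executable from Python. This is a minimal sketch using only the standard library; the function name find_conda is made up for illustration, and it should be run from a CMD window opened after the installer has finished.

```python
import shutil


def find_conda():
    """Return the full path of the conda executable found on PATH, or None."""
    return shutil.which("conda")


location = find_conda()
if location is None:
    print("conda was not found on PATH; re-check the installer option.")
else:
    print(f"conda found at: {location}")
```

If the message says conda was not found, the checkbox was probably left unchecked and the installer needs to be run again.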

Step 2: Download MegaTTS3 Source Code

Visit the Official Repository:
- Open https://github.com/bytedance/MegaTTS3.
Download the Code:
- Click the green <> Code button and select Download ZIP.
Extract and Place Files:
- Extract the downloaded MegaTTS3-main.zip file.
- Copy all the files and subfolders inside the extracted MegaTTS3-main folder to your prepared working directory, e.g., D:/python/megatts3.
- After copying, the D:/python/megatts3 folder should contain folders like assets, checkpoints, and tts.
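
The folder check above can also be scripted. The sketch below assumes the tutorial's example path and the three folder names mentioned in the text; the helper name missing_subfolders is hypothetical.

```python
from pathlib import Path

# Folder names taken from the tutorial text.
EXPECTED_FOLDERS = ("assets", "checkpoints", "tts")


def missing_subfolders(root, expected=EXPECTED_FOLDERS):
    """Return the expected folder names that do not exist directly under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


# The tutorial's example working directory; change it if yours differs.
missing = missing_subfolders("D:/python/megatts3")
print("Missing folders:", missing or "none")
```

An empty result suggests the files were copied to the right level; a non-empty one usually means the MegaTTS3-main folder itself was copied instead of its contents.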

Step 3: Create and Activate a Virtual Environment

Open CMD Console:
- Navigate to your working directory D:/python/megatts3.
- Type cmd in the address bar and press Enter.
Create a Virtual Environment:
- In the CMD console, enter the following command to create an environment named megatts3env using Python 3.10:
```bash
conda create -n megatts3env python=3.10
```
During installation, if prompted with Proceed ([y]/n)?, type y and press Enter.
Activate the Virtual Environment:
- After creation, enter the following command to activate the environment (you need to execute this step to activate the virtual environment every time you run MegaTTS3):
```bash
conda activate megatts3env
```
Upon successful activation, (megatts3env) will appear before the command prompt.
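
If you are unsure whether the right environment is active, a small Python check can confirm it. This sketch assumes the environment name used in this tutorial and relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the function name environment_problems is made up for illustration.

```python
import os
import sys


def environment_problems(version_info, conda_env, expected_env="megatts3env"):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if tuple(version_info[:2]) != (3, 10):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} is active, expected 3.10"
        )
    if conda_env != expected_env:
        problems.append(
            f"active conda environment is {conda_env!r}, expected {expected_env!r}"
        )
    return problems


# CONDA_DEFAULT_ENV is absent when no conda environment is active.
found = environment_problems(sys.version_info, os.environ.get("CONDA_DEFAULT_ENV"))
for problem in found:
    print("Problem:", problem)
if not found:
    print("Environment looks correct.")
```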

Note: All installation and execution commands below must be executed in the CMD console with the (megatts3env) environment activated!

Step 4: Install Dependencies

Special Note: Direct installation on Windows according to the official repository documentation usually fails. Be sure to follow the command execution order below strictly.

Install pynini:
- In the activated CMD console, enter and execute:
```bash
conda install -y -c conda-forge pynini==2.1.5
```
- Wait for the command to complete.
Install WeTextProcessing 1.0.3:
- Continue in the CMD console, enter and execute:
```bash
pip install WeTextProcessing==1.0.3
```
- Wait for the command to complete.
Modify requirements.txt and Install Remaining Dependencies:
- Open the requirements.txt file in your working directory (D:/python/megatts3) using Notepad or another text editor.
- Find and delete the line containing WeTextProcessing==1.0.4.1.
- Save and close the file.
- Return to the CMD console and execute the following command to install the remaining dependencies:
```bash
pip install -r requirements.txt
```

[Screenshot: the WeTextProcessing==1.0.4.1 line in requirements.txt must be deleted, otherwise the installation fails.]
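
If you prefer not to edit requirements.txt by hand, the same change can be sketched in Python. The helper names drop_requirement and rewrite_requirements are hypothetical, the sample package names are placeholders, and the file is assumed to be UTF-8 text.

```python
import re
from pathlib import Path


def drop_requirement(requirements_text, package="WeTextProcessing"):
    """Return requirements_text without the lines that name the given package."""
    kept = []
    for line in requirements_text.splitlines():
        # Keep only the package name that precedes any version specifier.
        name = re.split(r"[=<>!~\[; ]", line.strip(), maxsplit=1)[0]
        if name.lower() != package.lower():
            kept.append(line)
    return "\n".join(kept) + "\n"


def rewrite_requirements(path="requirements.txt"):
    """Rewrite the file in place without the WeTextProcessing line."""
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    file.write_text(drop_requirement(text), encoding="utf-8")


# Demonstration on an in-memory sample with placeholder package names.
sample = "examplepkg==1.0\nWeTextProcessing==1.0.4.1\notherpkg\n"
print(drop_requirement(sample), end="")
```

Calling rewrite_requirements() from the working directory applies the change to the real file; keeping a backup copy of requirements.txt first is a sensible precaution.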

Set Environment Variables:
- Copy and paste the following command completely into the CMD console, and then press Enter to execute. Note: If your installation directory is not D:/python/megatts3, please modify the path in the command to your actual path.
```bash
conda env config vars set PYTHONPATH="D:/python/megatts3;%PYTHONPATH%"
```
- After the variable is set, close the current CMD window, open a new one, and reactivate the environment with conda activate megatts3env so that the environment variable takes effect.
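
After reactivating the environment, you can check that the variable really contains your working directory. The sketch below assumes Windows-style paths separated by semicolons and the tutorial's example directory; the function name pythonpath_has is made up for illustration.

```python
import ntpath
import os


def pythonpath_has(pythonpath_value, directory, sep=";"):
    """Return True if directory is one of the entries of a Windows-style PYTHONPATH."""
    wanted = ntpath.normcase(ntpath.normpath(directory))
    entries = [entry.strip() for entry in pythonpath_value.split(sep) if entry.strip()]
    # Compare case-insensitively and treat / and \ as the same separator.
    return any(ntpath.normcase(ntpath.normpath(entry)) == wanted for entry in entries)


current = os.environ.get("PYTHONPATH", "")
print("working directory on PYTHONPATH:", pythonpath_has(current, "D:/python/megatts3"))
```

A result of False in a freshly opened and activated CMD window suggests the command was run with a different path or outside the megatts3env environment.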

Check: If all the above steps complete without errors (yellow WARN messages can be ignored), the dependencies are installed successfully. If you see a red error, check carefully that you followed the execution order exactly, and in particular that you modified the requirements.txt file correctly.

Installation Complete

Step 5: Download Pre-trained Models

Hint: The model files are hosted on Hugging Face Hub, which may be inaccessible from within mainland China without a VPN.

Make sure your CMD console is in the activated (megatts3env) state again.
Execute the following command to download the model files to the checkpoints folder in your working directory:
```bash
huggingface-cli download ByteDance/MegaTTS3 --local-dir ./checkpoints --local-dir-use-symlinks False
```
Wait patiently for the download to complete.
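
Once the download finishes, a quick look at the folder can tell you whether anything arrived. This sketch only counts files and bytes; it does not know which files the model needs, and the helper name summarize_folder is hypothetical.

```python
from pathlib import Path


def summarize_folder(folder):
    """Return (file_count, total_bytes) for all files below folder, or (0, 0) if missing."""
    folder = Path(folder)
    if not folder.is_dir():
        return (0, 0)
    files = [path for path in folder.rglob("*") if path.is_file()]
    return (len(files), sum(path.stat().st_size for path in files))


# Run from the working directory so that ./checkpoints resolves correctly.
count, size = summarize_folder("./checkpoints")
print(f"{count} files, {size / 1_000_000:.1f} MB in ./checkpoints")
```

A count of zero, or a total of only a few kilobytes, usually points to an interrupted or blocked download.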

Step 6: (Optional) Add GPU Acceleration Support

If your computer is equipped with an NVIDIA graphics card and you have installed CUDA version 12.x, you can accelerate voice synthesis by installing the GPU version.

Make sure the (megatts3env) environment is activated in the CMD console.
Execute the following command:

```bash
pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
```
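
To see whether the reinstalled PyTorch can actually use the graphics card, you can query it from Python. This is a minimal sketch; the function name describe_torch is made up for illustration, and it reports a missing installation instead of failing.

```python
def describe_torch():
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    try:
        import torch
    except (ImportError, OSError):
        return {
            "installed": False,
            "version": None,
            "cuda_build": None,
            "cuda_available": False,
        }
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```

If cuda_available is False while cuda_build shows a version, the graphics driver is the likely place to look; if cuda_build is None, the CPU-only build is still installed.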

At this point, all installation and configuration work is complete!

Start the MegaTTS3 Web Service

Each time you want to use MegaTTS3, you need to start it according to the following steps.

Open CMD Console:
- Navigate to your MegaTTS3 working directory (e.g., D:/python/megatts3).
- Type cmd in the address bar and press Enter.
Activate Virtual Environment:
- Execute the command: conda activate megatts3env
(Recommended) Modify Gradio Listening Address:
- It is strongly recommended to perform this step before the first startup: Open the file D:\python\megatts3\tts\gradio_api.py with a code editor or Notepad.
- Scroll to the end of the file and find server_name="0.0.0.0" and modify it to server_name="127.0.0.1".
- Reason: Using 0.0.0.0 on Windows may cause a large amount of irrelevant error information to be output, or even cause the startup to fail. Modifying it to 127.0.0.1 is usually more stable.
- Save the file after modification.

[Screenshots: the line to change from 0.0.0.0 to 127.0.0.1, and the same line after it has been correctly modified.]
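
The edit to gradio_api.py can also be expressed as a small text substitution. The sketch below assumes the address appears as a server_name keyword argument in single or double quotes; the exact form of the line in the real file may differ, and the function name use_loopback and the sample line are hypothetical.

```python
import re

# Matches server_name="0.0.0.0" or server_name='0.0.0.0', with optional spaces around "=".
_PATTERN = re.compile(r"""(server_name\s*=\s*)(["'])0\.0\.0\.0\2""")


def use_loopback(source_text):
    """Return source_text with server_name 0.0.0.0 replaced by 127.0.0.1."""
    # \g<1> is the "server_name=" prefix, \g<2> the original quote character.
    return _PATTERN.sub(r"\g<1>\g<2>127.0.0.1\g<2>", source_text)


# Hypothetical sample line for demonstration only.
sample = 'app.launch(server_name="0.0.0.0", server_port=7929)'
print(use_loopback(sample))
```

To apply it to the real file, read tts/gradio_api.py as text, pass the content through use_loopback, and write the result back after making a backup copy.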

Start the Program:
- In the activated CMD console, execute:
```bash
python tts/gradio_api.py
```

If the startup is successful, the CMD console shows startup output indicating that the service is running.

Access the Web Interface:
- Open http://127.0.0.1:7929 in your browser.
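
If the page does not load, it helps to know whether anything is listening on the port at all. The sketch below tries a plain TCP connection to the address and port used in this tutorial; the function name is_listening is made up for illustration, and a successful connection only shows that some program is listening there, not that it is MegaTTS3.

```python
import socket


def is_listening(host="127.0.0.1", port=7929, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("service reachable:", is_listening())
```

Run it from a second CMD window while the first one keeps the service running.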

Using MegaTTS3 for Voice Cloning

Understanding Voice Source

MegaTTS3 is currently a "semi-open-source" project. This means that you cannot directly clone an arbitrary voice sample of your own. You can only use the voices whose latents (.npy files) ByteDance has officially pre-processed and published on a specific page.

Official Explanation: This is done for security and legal compliance reasons.
If you want to clone your own voice: You need to submit your audio in the manner specified by the official guidelines, wait for them to review and approve it, and then download and use it after it is placed on the Latents page. (See below for details)

Download Available Voice Files

Access the Google Drive Folder:
- You need a Google account (free to register if you don't have one) and, where Google services are blocked, a VPN to reach them.
- Open the website (i.e., the latents page): https://drive.google.com/drive/folders/1QhcHWcy20JfqWjgqZX1YM3I6i9u4oNlr
- There are three subfolders here (librispeech_testclean_40, official_test_case, user_batch_1-3), which contain all currently available voices.
Select and Download Files:
- Enter any folder, browse the .wav audio files inside, and listen to them to pick the voice you want to clone. To listen, right-click a .wav file and choose Open with, then Preview.
- Important: When you decide to download a .wav file (e.g., speaker_xxx.wav), you must also download the .npy file with the same name (i.e., speaker_xxx.npy). These two files are used in pairs and are indispensable.
- Save the downloaded .wav and .npy files on your computer.
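
Because every .wav needs its matching .npy, it can be worth checking the download folder for files that lost their partner. This is a minimal sketch using only the standard library; the function name unpaired_voice_files is hypothetical.

```python
from pathlib import Path


def unpaired_voice_files(folder):
    """Return sorted names of .wav/.npy files whose same-named partner is missing."""
    folder = Path(folder)
    wavs = {path.stem: path.name for path in folder.glob("*.wav")}
    npys = {path.stem: path.name for path in folder.glob("*.npy")}
    lonely = [name for stem, name in wavs.items() if stem not in npys]
    lonely += [name for stem, name in npys.items() if stem not in wavs]
    return sorted(lonely)
```

Calling unpaired_voice_files on the folder where you saved the downloads returns an empty list when every file has its partner; any name it lists still needs its counterpart from the latents page.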

Synthesize Speech in the Web Interface

Open the Web Interface:
- Make sure the MegaTTS3 service is running and open http://127.0.0.1:7929 in your browser.
Upload Voice Files:
- Find the upload area on the page.
- Click the "Upload.wav" area and select the .wav file you just downloaded.
- Click the "Upload.npy" area and select the .npy file with the same name as the .wav file.
Enter Text and Synthesize:
- In the "Input Text" input box, enter the Chinese or English text you want this voice to read.
- Click the "Submit" button to execute.
Get Results:
- Wait a moment while the synthesis runs in the background.
- Once completed, you can directly play the generated speech in the upper right corner, or find the download button to save it as an audio file.

You have now successfully installed and used MegaTTS3 on Windows for voice cloning!

Upload the Voice You Want to Clone Yourself

If the voice you want to clone is not among the published voices, you can submit it yourself.

First, convert the audio of the voice you want to clone to WAV format. The clip must not exceed 24 seconds; a length of 5-24 seconds is recommended.
Make sure the audio content is legal and does not infringe copyright, and that the recording has a single speaker, clear pronunciation, and no background noise.
Open https://drive.google.com/drive/folders/1gCWL1y_2xu9nIFhUX_OW5MbcFuB7J5Cl , drag and drop the prepared WAV file into the folder, and wait for approval before using it.

[Screenshot: dragging and dropping the WAV file into the Google Drive folder to upload it.]

Once ByteDance approves the submission, it creates an .npy file with the same name and places both the .wav and .npy files in the user_batch_1-3 folder of the latents page mentioned above. You can then download the .wav file and its matching .npy file and use them for cloning.
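
The length requirement from the first step can be checked before uploading. The sketch below uses Python's standard wave module, which reads uncompressed PCM .wav files only; the function names are made up for illustration, and the limits are the 5-24 seconds recommended in this tutorial.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_ok(seconds, minimum=5.0, maximum=24.0):
    """Return True if the duration lies inside the recommended range."""
    return minimum <= seconds <= maximum
```

For example, duration_ok(wav_duration_seconds("my_voice.wav")) returns True when the hypothetical file my_voice.wav is between 5 and 24 seconds long.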
