
MegaTTS3 is an open-source Chinese-English voice cloning project from ByteDance with impressive results. However, the official installation documentation is brief, especially for Windows, where many users report installation difficulties. This tutorial walks through installing and using MegaTTS3 on Windows and works around those problems.

Before starting, let's understand a few basic concepts used in this tutorial:

  • CMD Console (Command Prompt):
    • How to open: In the address bar of the folder you want to work in (e.g., D:/python/megatts3), delete the original path, type cmd, and press Enter.
    • Purpose: A black window will pop up; this is the CMD console. All commands mentioned in this tutorial are entered here and executed by pressing Enter.
  • Execute a Command:
    • Type a specific line of text (the "command") into the CMD console and press Enter.

Initial Installation and Configuration

Strongly Recommended: Use Miniconda to deploy MegaTTS3 on Windows systems to avoid many unnecessary issues. The following tutorial is based on Miniconda.

Example Path: This tutorial assumes your working directory (where MegaTTS3 is installed) is D:/python/megatts3. If your path is different, please modify the paths in the commands accordingly.

Step 1: Install Miniconda

  1. Download Miniconda:

    • Visit in your browser: https://www.anaconda.com/download/success#miniconda
    • Find the Miniconda Installers section on the page and click the download link.
  2. Install Miniconda:

    • Double-click the downloaded .exe installer file.
    • Click Next through the steps, and click I Agree on the license agreement page.
    • Crucial Step: On the installation options page, be sure to check the second checkbox, "Add Miniconda3 to my PATH environment variable", and keep the first checkbox selected as well. Ignore the red warning text next to it.
    • Continue clicking Next or Install until the installation is complete.
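
To confirm that the PATH option took effect, a small check like the one below can be run from a newly opened CMD console. It is only a sketch: `find_conda` is a helper name made up for this tutorial, and it relies solely on the standard-library `shutil.which` lookup.

```python
import shutil


def find_conda():
    """Return the full path of the conda executable found on PATH, or None."""
    return shutil.which("conda")


if __name__ == "__main__":
    conda_path = find_conda()
    if conda_path is None:
        print("conda was not found on PATH; check the PATH option from the installer.")
    else:
        print(f"conda found at: {conda_path}")
```

If conda is not found, open a new CMD window first, since windows opened before the installation do not see the updated PATH.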

Step 2: Download MegaTTS3 Source Code

  1. Access the Official Repository:

    • Open the URL https://github.com/bytedance/MegaTTS3
  2. Download the Code:

    • Click the green <> Code button, then select Download ZIP.
  3. Extract and Place Files:

    • Extract the downloaded MegaTTS3-main.zip file.
    • Copy all files and subfolders inside the extracted MegaTTS3-main folder to your prepared working directory, e.g., D:/python/megatts3. Copy the contents of the inner folder that directly holds the project files, not a wrapper folder around it.
    • After copying, the D:/python/megatts3 folder should contain folders like assets, checkpoints, tts, etc.
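
The folder layout described above can be checked with a short script. This is a minimal sketch: the folder names are the three mentioned in this tutorial (the repository contains more), and `missing_dirs` is a hypothetical helper, not part of MegaTTS3.

```python
from pathlib import Path

# Folder names taken from the tutorial text; the real repository holds more.
EXPECTED_DIRS = ("assets", "checkpoints", "tts")


def missing_dirs(work_dir, expected=EXPECTED_DIRS):
    """Return the expected folder names that are not present in work_dir."""
    root = Path(work_dir)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    work_dir = "D:/python/megatts3"  # adjust to your own working directory
    missing = missing_dirs(work_dir)
    if missing:
        print(f"Missing in {work_dir}: {', '.join(missing)}")
    else:
        print(f"{work_dir} contains the expected folders.")
```

If every folder is reported missing, the files were most likely copied one level too deep (for example into D:/python/megatts3/MegaTTS3-main).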

Step 3: Create and Activate a Virtual Environment

  1. Open CMD Console:

    • Navigate to your working directory D:/python/megatts3.
    • Type cmd in the address bar and press Enter.
  2. Create Virtual Environment:

    • In the CMD console, enter the following command to create an environment named megatts3env using Python 3.10:
bash
conda create -n megatts3env python=3.10

During installation, if prompted with Proceed ([y]/n)?, type y and press Enter.

  3. Activate Virtual Environment:
    • After creation, enter the following command to activate the environment (you must execute this step to activate the virtual environment every time before running MegaTTS3):
bash
conda activate megatts3env

Upon successful activation, the command prompt will show (megatts3env) at the beginning.

Note: All following installation and run commands must be executed in the CMD console with the (megatts3env) environment activated!
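
Besides looking at the prompt, the active environment can be checked from Python itself. The sketch below assumes that conda exports the `CONDA_DEFAULT_ENV` variable on activation (its usual behaviour); `environment_report` is a name invented for this example.

```python
import os
import sys


def environment_report(expected_env="megatts3env", expected_python=(3, 10)):
    """Return a list of problems with the current environment (empty if none)."""
    problems = []
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    if active_env != expected_env:
        problems.append(f"active conda environment is {active_env!r}, expected {expected_env!r}")
    if tuple(sys.version_info[:2]) != tuple(expected_python):
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        wanted = ".".join(str(part) for part in expected_python)
        problems.append(f"Python version is {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = environment_report()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("megatts3env is active and uses Python 3.10.")
```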

Step 4: Install Dependency Libraries

Special Note: Installing directly according to the official repository documentation on Windows usually fails. Please strictly follow the order below.

  1. Install pynini:

    • In the activated CMD console, enter and execute:
      bash
      conda install -y -c conda-forge pynini==2.1.5
    • Wait for the command to complete.
  2. Install WeTextProcessing 1.0.3:

    • Continue in the CMD console, enter and execute:
      bash
      pip install WeTextProcessing==1.0.3
    • Wait for the command to complete.
  3. Modify requirements.txt and Install Remaining Dependencies:

    • Open the requirements.txt file in your working directory (D:/python/megatts3) with Notepad or another text editor.
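    • Find and delete the line containing WeTextProcessing==1.0.4.1. Version 1.0.3 is already installed from the previous step, and leaving this line in makes the install command below fail.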
    • Save and close the file.
    • Return to the CMD console and execute the following command to install the remaining dependencies:
      bash
      pip install -r requirements.txt

  4. Set Environment Variable:
    • Copy the entire command below, paste it into the CMD console, and press Enter to execute. Note: If your installation directory is not D:/python/megatts3, modify the path in the command to your actual path.
      bash
      conda env config vars set PYTHONPATH="D:/python/megatts3;%PYTHONPATH%"
    • After the variable is set, close the current CMD window, open a new one, and run conda activate megatts3env again so that the environment variable takes effect.
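
The manual requirements.txt edit from step 3 can also be scripted. The following is a sketch rather than an official tool: `drop_requirement` is a hypothetical helper that removes any line starting with the package name, and the script writes a requirements.txt.bak backup before changing the file. Run it from the working directory.

```python
from pathlib import Path


def drop_requirement(lines, package="WeTextProcessing"):
    """Return the lines with every entry for the given package removed."""
    prefix = package.lower()
    return [line for line in lines if not line.strip().lower().startswith(prefix)]


if __name__ == "__main__":
    req = Path("requirements.txt")
    if not req.is_file():
        print("requirements.txt not found in the current folder.")
    else:
        original = req.read_text(encoding="utf-8").splitlines(keepends=True)
        kept = drop_requirement(original)
        # Keep a backup next to the file before overwriting it.
        req.with_name("requirements.txt.bak").write_text("".join(original), encoding="utf-8")
        req.write_text("".join(kept), encoding="utf-8")
        print(f"Removed {len(original) - len(kept)} line(s).")
```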

Check: If the above steps complete without errors (yellow WARN messages can be ignored), the dependencies are installed successfully. If you see red errors, check that you followed the order exactly and, in particular, that you edited requirements.txt correctly.
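
After reopening CMD and reactivating the environment, a quick way to see whether the PYTHONPATH setting arrived is to inspect it from Python. This is a sketch under the tutorial's example path; `pythonpath_contains` is an illustrative helper name.

```python
import os


def pythonpath_contains(work_dir, pythonpath=None):
    """Return True if work_dir is one of the entries in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = os.path.normcase(os.path.normpath(work_dir))
    # os.pathsep is ";" on Windows, matching the separator used in the command above.
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(os.path.normcase(os.path.normpath(entry)) == target for entry in entries)


if __name__ == "__main__":
    work_dir = "D:/python/megatts3"  # adjust to your own working directory
    if pythonpath_contains(work_dir):
        print("PYTHONPATH includes the working directory.")
    else:
        print("PYTHONPATH does not include the working directory yet.")
```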

Step 5: Download Pre-trained Models

Hint: Model files are hosted on Hugging Face Hub, which is inaccessible from some regions without a VPN.

  • Ensure the (megatts3env) environment is active in your CMD console.
  • Execute the following command to download model files to the checkpoints folder in your working directory:
    bash
    huggingface-cli download ByteDance/MegaTTS3 --local-dir ./checkpoints --local-dir-use-symlinks False
  • Wait patiently for the download to complete.
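
Once the download finishes, a rough sanity check is to count the files under checkpoints and add up their size. The sketch below does not know which model files MegaTTS3 expects, so it only reports totals; `summarize_folder` is a hypothetical helper.

```python
from pathlib import Path


def summarize_folder(folder):
    """Return (file_count, total_bytes) for all files under folder, recursively."""
    root = Path(folder)
    if not root.is_dir():
        return 0, 0
    files = [path for path in root.rglob("*") if path.is_file()]
    return len(files), sum(path.stat().st_size for path in files)


if __name__ == "__main__":
    count, total = summarize_folder("./checkpoints")
    print(f"checkpoints: {count} file(s), {total / (1024 ** 2):.1f} MiB in total")
```

A folder holding only a handful of kilobytes suggests the download was interrupted or blocked.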

Step 6: (Optional) Add GPU Acceleration Support

If your computer has an NVIDIA graphics card and CUDA 12.x installed, you can install the GPU version to accelerate speech synthesis.

  • Ensure the (megatts3env) environment is active in the CMD console.
  • Execute the following command:
    bash
    pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
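
Whether the GPU build is actually in use can be checked with PyTorch's own `torch.cuda.is_available()`. The sketch below wraps that call in a helper (`describe_torch_device`, a name chosen for this example) so that it also gives a readable answer when PyTorch is missing.

```python
def describe_torch_device():
    """Return a one-line description of the device PyTorch will use."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"CUDA is available: {name} (torch {torch.__version__})"
    return f"CUDA is not available; torch {torch.__version__} will run on the CPU."


if __name__ == "__main__":
    print(describe_torch_device())
```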

That's it! All installation and configuration work is complete!


Starting the MegaTTS3 Web Service

You need to start MegaTTS3 following these steps every time you want to use it.

  1. Open CMD Console:

    • Navigate to your MegaTTS3 working directory (e.g., D:/python/megatts3).
    • Type cmd in the address bar and press Enter.
  2. Activate Virtual Environment:

    • Execute the command: conda activate megatts3env
  3. (Recommended) Modify Gradio Listening Address:

    • Strongly recommended before first startup: Open the file D:\python\megatts3\tts\gradio_api.py with a code editor or Notepad.
    • Scroll to the end of the file, find server_name="0.0.0.0" and change it to server_name="127.0.0.1".
    • Reason: Using 0.0.0.0 on Windows may cause numerous irrelevant error messages or even startup failure. Changing it to 127.0.0.1 is generally more stable.
    • Save the file after modification.

  4. Start the Program:
    • In the activated CMD console, execute:
      bash
      python tts/gradio_api.py
    • If startup is successful, the CMD console shows startup output indicating that the service is running.
  5. Access the Web Interface:

    • Open this address in your browser: http://127.0.0.1:7929.
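
If the page does not load, it helps to know whether anything is listening on the port at all. The following sketch uses only the standard-library `socket` module; the host and port are the ones from this tutorial, and `is_port_open` is an illustrative helper name. Run it in a second CMD window while the service is starting.

```python
import socket


def is_port_open(host="127.0.0.1", port=7929, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on 127.0.0.1:7929.")
    else:
        print("Nothing is listening on 127.0.0.1:7929 yet.")
```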

Using MegaTTS3 for Voice Cloning

Understanding Voice Source

MegaTTS3 is currently a "semi-open source" project. This means you cannot directly clone an arbitrary voice sample of your own. You can only use voices (latents) that ByteDance has pre-processed and published on a specific page.

  • Official Explanation: This is done for security and legal compliance reasons.
  • If you want to clone your own voice: You need to submit your audio according to the official method, wait for their review and placement on the Latents page, then you can download and use it. (Specific method described below)

Downloading Usable Voice Files

  1. Access the Google Drive Folder:

    • You need VPN access to Google services and a Google account (free to register if you don't have one).
    • Open the URL (i.e., the latents page): https://drive.google.com/drive/folders/1QhcHWcy20JfqWjgqZX1YM3I6i9u4oNlr
    • There are three subfolders here (librispeech_testclean_40, official_test_case, user_batch_1-3) containing all currently available voices.
  2. Select and Download Files:

    • Enter any folder, browse the .wav audio files, listen and select the voice you want to clone (right-click on the wav file -> Open with -> Preview to listen).
    • Important: When you decide to download a specific .wav file (e.g., speaker_xxx.wav), you must also download the corresponding .npy file with the same name (i.e., speaker_xxx.npy). These two files are paired and both are essential.
    • Save the downloaded .wav and .npy files on your computer.
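
Because every .wav needs its same-named .npy, a quick pairing check on the download folder can catch a forgotten file. This is a small sketch; `unpaired_wavs` is a hypothetical helper, and the folder in the example is simply the current directory.

```python
from pathlib import Path


def unpaired_wavs(folder):
    """Return the names of .wav files in folder that lack a same-named .npy file."""
    root = Path(folder)
    return sorted(
        wav.name for wav in root.glob("*.wav") if not wav.with_suffix(".npy").is_file()
    )


if __name__ == "__main__":
    folder = "."  # the folder where you saved the downloaded voice files
    missing = unpaired_wavs(folder)
    if missing:
        print("No matching .npy for:", ", ".join(missing))
    else:
        print("Every .wav file has a matching .npy file.")
```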

Synthesizing Speech in the Web Interface

  1. Open the Web Interface:

    • Ensure the MegaTTS3 service is running and open http://127.0.0.1:7929 in your browser.
  2. Upload Voice Files:

    • Find the upload area on the page.
    • Click the "Upload.wav" area and select the .wav file you just downloaded.
    • Click the "Upload.npy" area and select the .npy file with the same name as the .wav file.
  3. Input Text and Synthesize:

    • In the "Input Text" input box, enter the Chinese or English text you want this voice to read.
    • Click the "Submit" button to execute.
  4. Get the Result:

    • Wait a short while; the synthesis process runs in the background.
    • After completion, you can directly play the generated speech in the upper right corner, or find the download button to save it as an audio file.

You have now successfully installed and used MegaTTS3 for voice cloning on Windows!

Uploading Your Own Voice for Cloning

If the voice you wish to clone is not available, you can upload it yourself.

  1. First, convert the audio file of the voice you want to clone to WAV format. The clip must be no longer than 24 seconds; a length of 5 to 24 seconds is recommended.
  2. Ensure the audio content is legal, does not infringe copyright, has no background noise, features clear pronunciation, and contains only one speaker.
  3. Open this URL: https://drive.google.com/drive/folders/1gCWL1y_2xu9nIFhUX_OW5MbcFuB7J5Cl, drag and drop your prepared WAV file into it, then wait for review and approval before it can be used.
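
The duration limit from step 1 can be checked before uploading. The sketch below uses the standard-library `wave` module, which reads uncompressed PCM WAV files only; the 5 and 24 second bounds come from this tutorial, while the function names are made up for the example.

```python
import sys
import wave

MIN_SECONDS = 5.0   # recommended lower bound from the tutorial
MAX_SECONDS = 24.0  # upper limit from the tutorial


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def duration_ok(seconds):
    """Return True if the duration lies within the recommended range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python check_wav.py your_voice.wav")
    for wav_path in sys.argv[1:]:
        seconds = wav_duration_seconds(wav_path)
        verdict = "OK" if duration_ok(seconds) else "outside the 5-24 second range"
        print(f"{wav_path}: {seconds:.1f} s ({verdict})")
```

The script name check_wav.py is only a placeholder; save the code under any name you like.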

After ByteDance approves the file, they create a corresponding .npy file with the same name and place both the .wav and .npy files into the user_batch_1-3 folder on the latents page mentioned above. You can then download this .wav file and its .npy file and use them for cloning.