
An ideal translated video has accurate, well-timed subtitles, a voice-over whose tone matches the original, and tight synchronization between subtitles, audio, and visuals.

This guide walks through the four essential steps of video translation and gives the recommended configuration for each.

Step 1: Speech Recognition

  • Goal: Convert the audio in the video into a subtitle file in the original language.

  • Corresponding Control Element: "Speech Recognition" row

  • Best Configuration:

    • Select faster-whisper(local)
    • Choose model large-v2, large-v3, or large-v3-turbo
    • For speech segmentation mode, select Overall Recognition
    • Check Voice Denoising (time-consuming)
    • Check Preserve Original Background Sound (time-consuming)
    • If the video is in Chinese, also check Chinese Re-segmentation
  • Note: Processing will be extremely slow unless you have an NVIDIA graphics card, a configured CUDA environment, and CUDA acceleration enabled. Insufficient video memory may cause crashes.
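
The output of this step is a list of timed text segments saved as a subtitle file. As a minimal sketch of what that file contains, assuming the common SRT format, the snippet below renders segments as numbered cues. The `Segment` class and both function names are hypothetical illustrations and are not part of pyVideoTrans or faster-whisper.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance (hypothetical container for illustration)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Calling `to_srt([Segment(0.0, 1.25, "Hello")])` yields a single cue numbered 1 that runs from `00:00:00,000` to `00:00:01,250`.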

Step 2: Subtitle Translation

  • Goal: Translate the subtitle file generated in the first step into the target language.

  • Corresponding Control Element: "Translation Channel" row

  • Best Configuration:

    • Preferred Choice: If you have a VPN and know how to configure a proxy, select the Gemini AI channel and use the gemini-1.5-flash model, set in Menu - Translation Settings - Gemini Pro.
    • Second Best: If you don't have a VPN or don't know how to configure a proxy, select OpenAI ChatGPT in "Translation Channel" and use a chatgpt-4o series model in Menu - Translation Settings - OpenAI ChatGPT (requires a third-party relay).
    • Alternative: If you cannot find a suitable third-party relay, consider a China-based AI service such as Moonshot AI or DeepSeek.
    • In Menu - Tools/Options - Advanced Options, check the two items shown in the accompanying screenshot.

    How to use Gemini AI: https://pyvideotrans.com/gemini.html

Step 3: Voice-Over

  • Goal: Generate voice-over based on the translated subtitle file.

  • Corresponding Control Element: "Voice-Over Channel" row

  • Best Configuration:

    • Chinese or English: F5-TTS(local), select clone for voice-over role
    • Japanese or Korean: CosyVoice(local), select clone for voice-over role
    • Other languages: clone-voice(local), select clone for voice-over role
    • These three channels preserve the emotional tone of the original video better than the others, with F5-TTS giving the best results.

    Each of these channels requires installing the corresponding F5-TTS/CosyVoice/clone-voice integration package separately. See the documentation: https://pyvideotrans.com/f5tts.html
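
The channel recommendations above amount to a small lookup by target language. The sketch below encodes them as a table. The channel names are copied from the list above, while the language codes, the constant names, and the function name are assumptions made for illustration and do not correspond to any pyVideoTrans API.

```python
# Channel names come from the recommendations above.
# The two-letter language codes and all identifiers are hypothetical.
RECOMMENDED_CHANNEL = {
    "zh": "F5-TTS(local)",
    "en": "F5-TTS(local)",
    "ja": "CosyVoice(local)",
    "ko": "CosyVoice(local)",
}
FALLBACK_CHANNEL = "clone-voice(local)"


def recommended_channel(target_language: str) -> str:
    """Return the suggested voice-over channel for a target language code."""
    # Reduce tags such as "zh-CN" or "EN" to their base code ("zh", "en").
    code = target_language.strip().lower().split("-")[0]
    return RECOMMENDED_CHANNEL.get(code, FALLBACK_CHANNEL)
```

In every case the voice-over role is still set to clone, as described above.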

Step 4: Subtitle, Voice-Over, and Visual Synchronization

  • Goal: Synchronize subtitles, voice-over, and visuals.
  • Corresponding Control Element: "Synchronization" row
  • Best Configuration:
    • English sentences usually take longer to speak than their Chinese originals, so when translating from Chinese to English you can raise the Voice-Over Speed value (e.g., 10 or 15) to speed up the voice-over.
    • Check the three options: Video Extension, Voice-Over Acceleration, and Video Slowdown to force alignment of subtitles, audio, and visuals.
    • In Menu - Tools/Options - Advanced Options - Subtitle/Audio/Visual Alignment area, apply the settings shown in the accompanying screenshot.
    • Maximum Audio Acceleration Multiple and Video Slowdown Multiple can be adjusted according to the actual situation (default value is 3).

    Fine-tune which options are checked and what values they use according to the actual speaking speed in the video.
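
To make the interaction between the acceleration and slowdown limits concrete, here is a minimal sketch of one possible alignment strategy: speed up the dubbed clip first, slow the video only if that is not enough, and report whatever still does not fit. This is an illustration under stated assumptions and not pyVideoTrans's actual algorithm; `AlignmentPlan`, `plan_alignment`, and the parameter names are hypothetical, and only the default limit of 3 comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class AlignmentPlan:
    audio_speed: float     # factor applied to the dubbed clip (>1 means faster)
    video_slowdown: float  # factor applied to the video span (>1 means slower)
    overflow: float        # seconds of dubbing that still do not fit


def plan_alignment(
    slot: float,
    dubbed: float,
    max_audio_speed: float = 3.0,
    max_video_slowdown: float = 3.0,
) -> AlignmentPlan:
    """Fit a dubbed clip of `dubbed` seconds into a subtitle slot of `slot` seconds."""
    if slot <= 0 or dubbed <= 0:
        raise ValueError("durations must be positive")
    if dubbed <= slot:
        # The dubbing already fits; nothing needs to change.
        return AlignmentPlan(1.0, 1.0, 0.0)

    # 1. Speed up the audio, but never beyond the configured maximum.
    audio_speed = min(dubbed / slot, max_audio_speed)
    sped_up = dubbed / audio_speed

    # 2. Slow the video to cover what remains, again within its maximum.
    video_slowdown = min(max(sped_up / slot, 1.0), max_video_slowdown)
    stretched_slot = slot * video_slowdown

    # 3. Anything left over would need the video to be extended.
    overflow = max(0.0, sped_up - stretched_slot)
    return AlignmentPlan(audio_speed, video_slowdown, overflow)
```

For example, a 3-second dubbed line in a 2-second slot needs only a 1.5x audio speed-up, while a 12-second line in a 1-second slot hits both limits of 3 and still leaves 1 second over, which is the situation the Video Extension option is meant to absorb.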

Output Video Quality Control

  • The default output uses lossy compression. For lossless output, go to Menu - Tools/Options - Advanced Options - Video Output Control area and set Video Transcoding Loss Control to 0.
  • Note: If the original video is not in MP4 format, or if hard subtitles are embedded, the video must be re-encoded, which causes some loss, though it is usually negligible. Raising video quality significantly slows processing and increases the output file size.
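
The loss setting behaves like a constant rate factor (CRF) in an H.264 encoder, where 0 is lossless and larger values trade quality for smaller files. Assuming that mapping, which is not confirmed in the text above, the sketch below builds an equivalent ffmpeg command line without running it. The function name and file names are hypothetical; the ffmpeg options used (`-c:v libx264`, `-crf`, `-preset`, `-c:a copy`) are standard.

```python
def build_ffmpeg_command(
    source: str,
    output: str,
    crf: int = 0,
    preset: str = "medium",
) -> list[str]:
    """Build an ffmpeg argument list that re-encodes video with libx264.

    Assumption: the "Video Transcoding Loss Control" value corresponds to
    x264's CRF, whose 8-bit range is 0 (lossless) to 51 (lowest quality).
    """
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51")
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it exists
        "-i", source,
        "-c:v", "libx264",
        "-crf", str(crf),   # 0 = lossless, larger = more loss
        "-preset", preset,  # slower presets give smaller files
        "-c:a", "copy",     # leave the audio stream untouched
        output,
    ]
```

The returned list can be passed to `subprocess.run` on a machine with ffmpeg installed. Expect a CRF of 0 to produce much larger files and longer encoding times, in line with the note above.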