
An ideal translated video has accurate subtitles of appropriate length, a dubbed voice whose tone matches the original speaker, and subtitles, audio, and visuals that stay precisely synchronized.

This guide details the four steps of video translation and provides optimal configuration recommendations for each step.

Step 1: Speech Recognition

  • Goal: Convert speech in the video into subtitle files in the corresponding language.

  • Corresponding Control Element: "Speech Recognition" row

  • Best Configuration for Non-Chinese:

    • Select faster-whisper(local)
    • Choose model large-v2, large-v3, or large-v3-turbo
    • Select speech segmentation mode Overall Recognition
    • Check Preserve Original Background Audio (time-consuming)
  • Best Configuration for Chinese:

    • Select Ali FunASR
    • Select speech segmentation mode Overall Recognition
    • Check Preserve Original Background Audio (time-consuming)
  • Best Configuration for Less Common Languages:

    • Select Gemini Large Model Recognition
  • Note: Processing is extremely slow unless you have an NVIDIA GPU and a properly configured CUDA environment for acceleration. Recognition may also crash if the GPU does not have enough VRAM.
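
The result of this step is a subtitle file. As a rough illustration of what that means, the sketch below turns recognized segments into subtitle text. It assumes the subtitles use the common SRT format and that the recognizer yields (start, end, text) tuples with times in seconds. The function names are hypothetical and are not taken from the application.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from an iterable of (start_sec, end_sec, text) tuples.

    Assumption: the input shape is illustrative; real recognizers may
    return richer objects.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 4.0, "Welcome to the video")]))
```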

Step 2: Subtitle Translation

  • Goal: Translate the subtitle file generated in Step 1 into the target language.

  • Corresponding Control Element: "Translation Channel" row

  • Best Configuration:

    • First Choice: If you have a VPN and know how to configure a proxy, use the gemini-2.5-flash model, set under Menu - Translation Settings - Gemini pro (the Gemini AI channel).
    • Second Choice: If you don't have a VPN or cannot configure a proxy, select DeepSeek in the "Translation Channel" row.

    How to use GeminiAI: https://pyvideotrans.com/gemini.html

Step 3: Dubbing

  • Goal: Generate dubbing based on the translated subtitle file.

  • Corresponding Control Element: "Dubbing Channel" row

  • Best Configuration:

    • Edge-TTS: Free and supports all languages
    • Chinese or English: F5-TTS/Index-TTS(local)
    • Japanese or Korean: CosyVoice(local)

    The local channels require the corresponding F5-TTS/CosyVoice/clone-voice integration packages to be installed separately. See documentation: https://pyvideotrans.com/f5tts.html
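
The recommendations above amount to a small lookup with Edge-TTS as the fallback. The sketch below writes that out. The language codes, the function name, and the `local_packages_installed` flag are illustrative assumptions, not settings in the application.

```python
# Channel names are copied from the list above; the language codes are assumptions.
LOCAL_DUBBING_CHANNELS = {
    "zh": "F5-TTS/Index-TTS(local)",
    "en": "F5-TTS/Index-TTS(local)",
    "ja": "CosyVoice(local)",
    "ko": "CosyVoice(local)",
}


def recommend_dubbing_channel(language_code: str, local_packages_installed: bool = False) -> str:
    """Return the suggested dubbing channel for a target language.

    Falls back to Edge-TTS when no local integration package is installed
    or when the language has no local recommendation.
    """
    if local_packages_installed:
        return LOCAL_DUBBING_CHANNELS.get(language_code.lower(), "Edge-TTS")
    return "Edge-TTS"


if __name__ == "__main__":
    print(recommend_dubbing_channel("ja", local_packages_installed=True))
```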

Step 4: Synchronize Subtitles, Dubbing, and Video

  • Goal: Synchronize subtitles, dubbing, and video.
  • Corresponding Control Element: "Sync Alignment" row
  • Best Configuration:
    • When translating from Chinese to English, set the Dubbing Speed value (e.g., 10 or 15) to speed up the dubbing, because the English translation usually takes longer to speak than the Chinese original.
    • Check both Speed Up Dubbing and Slow Down Video options to force alignment of subtitles, audio, and video.
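
To see why enabling both options helps, consider a subtitle whose dubbed audio runs longer than its original time slot. The sketch below shows one plausible way to split the overrun between the audio and the video. It is an assumption made for illustration and not necessarily the algorithm the application uses. The function name and the even split are hypothetical.

```python
def plan_alignment(slot_s: float, dub_s: float,
                   speed_up_dubbing: bool = True,
                   slow_down_video: bool = True):
    """Return (audio_speed_factor, video_stretch_factor) for one subtitle.

    slot_s: duration of the original subtitle slot, in seconds.
    dub_s:  duration of the generated dubbing clip, in seconds.
    A factor of 1.0 means "leave unchanged".
    """
    if dub_s <= slot_s:
        return 1.0, 1.0  # the dubbing already fits

    if speed_up_dubbing and slow_down_video:
        # Assumed strategy: meet in the middle so neither change is drastic.
        target = (slot_s + dub_s) / 2
    elif speed_up_dubbing:
        target = slot_s   # audio absorbs the whole overrun
    elif slow_down_video:
        target = dub_s    # video absorbs the whole overrun
    else:
        return 1.0, 1.0   # nothing enabled; the clip will overrun

    audio_speed = dub_s / target      # > 1.0 plays the dubbing faster
    video_stretch = target / slot_s   # > 1.0 makes the video segment longer
    return audio_speed, video_stretch


if __name__ == "__main__":
    print(plan_alignment(2.0, 3.0))
```

With a 2-second slot and a 3-second dubbing clip, both sides are adjusted to 2.5 seconds. The audio is sped up by a factor of about 1.2 and the video segment is stretched by a factor of 1.25, instead of a 1.5 change on one side alone.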

Output Video Quality Control

  • By default, the output video is encoded with lossy compression. For lossless output, go to Menu - Tools - Advanced Options - Video Output Control and set Video Transcoding Loss Control to 0.
  • Note: If the original video is not in mp4 format or uses embedded hard subtitles, video encoding conversion will cause some loss, but it is usually minimal. Increasing video quality significantly reduces processing speed and increases output video size.
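
In FFmpeg-based tools, a quality setting like this often corresponds to the encoder's constant rate factor (CRF). Whether Video Transcoding Loss Control is passed to the encoder as CRF is an assumption here. With the libx264 encoder, `-crf 0` requests lossless encoding, and larger values give smaller but lower-quality files. The sketch below only builds the command line and does not run it. The function name and file names are hypothetical.

```python
def build_ffmpeg_command(src: str, dst: str, crf: int = 23) -> list:
    """Build an ffmpeg command that re-encodes video with libx264 at a given CRF.

    crf=0 is lossless for 8-bit libx264; 23 is the encoder's default.
    The audio stream is copied unchanged.
    """
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51 for 8-bit libx264")
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-c:v", "libx264",
        "-crf", str(crf),
        "-c:a", "copy",
        dst,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(build_ffmpeg_command("input.mp4", "output.mp4", crf=0)))
```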