An ideal translated video features accurate and appropriately timed subtitles, voice-over tones consistent with the original, and perfect synchronization between subtitles, audio, and visuals.
This guide will walk you through the four essential steps of video translation, providing optimal configuration suggestions for each.
Step 1: Speech Recognition
Goal: Convert the audio in the video into a subtitle file in the original language.
Corresponding Control Element: "Speech Recognition" row
Best Configuration:
- Select `faster-whisper(local)`
- Choose model `large-v2`, `large-v3`, or `large-v3-turbo`
- For speech segmentation mode, select `Overall Recognition`
- Check `Voice Denoising` (time-consuming)
- Check `Preserve Original Background Sound` (time-consuming)
- If the video is in Chinese, also check `Chinese Re-segmentation`
Note: Without an NVIDIA graphics card and a properly configured CUDA environment with CUDA acceleration enabled, processing will be extremely slow. Insufficient video memory may lead to crashes.
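The output of this step is an SRT subtitle file in the original language. As an illustration only (not pyvideotrans's actual code), recognized speech segments map onto SRT entries roughly like this:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Build SRT text from a list of (start_sec, end_sec, text) tuples,
    as produced by a speech-recognition pass."""
    entries = []
    for i, (start, end, text) in enumerate(segments, 1):
        entries.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(entries)

print(segments_to_srt([(0.0, 2.5, "Hello world."), (2.5, 5.0, "Second line.")]))
```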
Step 2: Subtitle Translation
Goal: Translate the subtitle file generated in the first step into the target language.
Corresponding Control Element: "Translation Channel" row
Best Configuration:
- Preferred Choice: If you have a VPN and know how to configure it, use the `gemini-1.5-flash` model in Gemini Pro (Gemini AI channel) under Menu - Translation Settings.
- Second Best: If you don't have a VPN or don't know how to configure a proxy, select `OpenAI ChatGPT` in "Translation Channel" and use the `gpt-4o` series models in Menu - Translation Settings - OpenAI ChatGPT (requires a third-party relay).
- Alternative: If you cannot find a suitable third-party relay, consider using domestic AIs like Moonshot AI or DeepSeek.
- In Menu - Tools/Options - Advanced Options, check the following two items:
How to use GeminiAI: https://pyvideotrans.com/gemini.html
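Whichever channel you choose, the principle is the same: only the dialogue lines of the SRT file are sent to the model, while index and timestamp lines pass through untouched. A minimal Python sketch, with a placeholder `translate` callable standing in for the real API call:

```python
def translate_srt(srt_text, translate):
    """Translate only the subtitle text lines of an SRT file.
    Index lines and timestamp lines are copied through unchanged.
    `translate` is any callable str -> str, e.g. a wrapper around
    whichever translation channel is configured above."""
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.isdigit() or "-->" in stripped:
            out.append(line)              # structural line: keep as-is
        else:
            out.append(translate(line))   # dialogue line: translate
    return "\n".join(out)

sample = "1\n00:00:01,000 --> 00:00:02,500\nBonjour le monde\n"
print(translate_srt(sample, lambda s: "Hello world"))
```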
Step 3: Voice-Over
Goal: Generate voice-over based on the translated subtitle file.
Corresponding Control Element: "Voice-Over Channel" row
Best Configuration:
- Chinese or English: `F5-TTS(local)`, select `clone` for the voice-over role
- Japanese or Korean: `CosyVoice(local)`, select `clone` for the voice-over role
- Other languages: `clone-voice(local)`, select `clone` for the voice-over role
- The above three channels best preserve the emotional tone of the original video, with `F5-TTS` providing the best results.
You also need to install the corresponding `F5-TTS`/`CosyVoice`/`clone-voice` integration package; see the documentation: https://pyvideotrans.com/f5tts.html
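The channel choice above boils down to a small language-to-channel mapping; a sketch (the ISO-style language codes are illustrative, not pyvideotrans's own identifiers):

```python
def pick_voice_channel(lang_code: str) -> str:
    """Recommended local TTS channel per target language,
    following the guide above."""
    if lang_code in ("zh", "en"):
        return "F5-TTS(local)"
    if lang_code in ("ja", "ko"):
        return "CosyVoice(local)"
    return "clone-voice(local)"

print(pick_voice_channel("ja"))
```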
Step 4: Subtitle, Voice-Over, and Visual Synchronization
Goal: Synchronize subtitles, voice-over, and visuals.
Corresponding Control Element: `Synchronization` row
Best Configuration:
- When translating from Chinese to English, you can set the `Voice-Over Speed` value (e.g., `10` or `15`) to speed up the voice-over, because English sentences are usually longer.
- Check the three options `Video Extension`, `Voice-Over Acceleration`, and `Video Slowdown` to force alignment of subtitles, audio, and visuals.
- In the Menu - Tools/Options - Advanced Options - Subtitle/Audio/Visual Alignment area, `Maximum Audio Acceleration Multiple` and `Video Slowdown Multiple` can be adjusted according to the actual situation (default value is 3).
It is recommended to fine-tune which options are checked and which values are set based on the actual speaking speed in the video.
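The forced alignment performed by the three checkboxes can be pictured as two moves: first speed the dubbed audio up (capped by `Maximum Audio Acceleration Multiple`), then slow the video for whatever gap remains (capped by `Video Slowdown Multiple`). A simplified Python sketch of that logic, not the program's actual algorithm:

```python
def align(sub_duration, audio_duration, max_audio_speed=3.0, max_video_slowdown=3.0):
    """Return (audio_speed, video_slowdown) factors so that, after speeding
    the dubbed audio up and slowing the video down, both span the same time:
    audio_duration / audio_speed == sub_duration * video_slowdown
    (when the caps permit). The defaults of 3 mirror the option defaults above."""
    if audio_duration <= sub_duration:
        return 1.0, 1.0  # dubbed audio already fits the subtitle slot
    speed = min(audio_duration / sub_duration, max_audio_speed)
    slowdown = min(audio_duration / (speed * sub_duration), max_video_slowdown)
    return speed, slowdown

print(align(3.0, 12.0))  # audio sped up 3x, remaining gap closed by slowing video
```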
Output Video Quality Control
- The default output is lossy compression. For lossless output, in the Menu - Tools - Advanced Options - Video Output Control area, set `Video Transcoding Loss Control` to 0.
- Note: If the original video is not in MP4 format or uses embedded hard subtitles, video encoding conversion will cause some loss, but the loss is usually negligible. Improving video quality will significantly reduce processing speed and increase output video size.
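Setting the loss control to 0 corresponds to lossless encoding; with ffmpeg's libx264 encoder, for example, that is CRF 0. A sketch of an equivalent manual command, built in Python (the filenames are placeholders, and the audio stream is simply copied):

```python
import shlex

def lossless_transcode_cmd(src: str, dst: str) -> str:
    """Build an ffmpeg command line for lossless H.264 output:
    with libx264, -crf 0 means lossless transcoding."""
    args = ["ffmpeg", "-i", src, "-c:v", "libx264", "-crf", "0",
            "-c:a", "copy", dst]
    return shlex.join(args)

print(lossless_transcode_cmd("input.mkv", "output.mp4"))
```

Note that lossless output produces very large files, which matches the guide's warning about output size.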