An ideal translated video features accurate and appropriately timed subtitles, voice-over tones consistent with the original, and perfect synchronization between subtitles, audio, and visuals.
This guide will walk you through the four essential steps of video translation, providing optimal configuration suggestions for each.
Step 1: Speech Recognition
Goal: Convert the audio in the video into a subtitle file in the original language.
Corresponding Control Element: "Speech Recognition" row
Best Configuration:
- Select `faster-whisper(local)`
- Choose model `large-v2`, `large-v3`, or `large-v3-turbo`
- For speech segmentation mode, select `Overall Recognition`
- Check `Voice Denoising` (time-consuming)
- Check `Preserve Original Background Sound` (time-consuming)
- If the video is in Chinese, also check `Chinese Re-segmentation`
Note: Without an NVIDIA graphics card and a properly configured CUDA environment with CUDA acceleration enabled, processing will be extremely slow. Insufficient video memory may lead to crashes.
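The output of this step is an SRT subtitle file in the original language. As an illustration only (not pyvideotrans's actual code), recognized speech segments map onto SRT entries roughly like this:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Build SRT text from a list of (start_sec, end_sec, text) tuples,
    as produced by a speech-recognition pass."""
    entries = []
    for i, (start, end, text) in enumerate(segments, 1):
        entries.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(entries)

print(segments_to_srt([(0.0, 2.5, "Hello world."), (2.5, 5.0, "Second line.")]))
```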
Step 2: Subtitle Translation
Goal: Translate the subtitle file generated in the first step into the target language.
Corresponding Control Element: "Translation Channel" row
Best Configuration:
- Preferred Choice: If you have a VPN and know how to configure it, use the `gemini-1.5-flash` model in Gemini Pro (Gemini AI channel) under Menu - Translation Settings.
- Second Best: If you don't have a VPN or don't know how to configure a proxy, select `OpenAI ChatGPT` in "Translation Channel" and use the `gpt-4o` series models in Menu - Translation Settings - OpenAI ChatGPT (requires a third-party relay).
- Alternative: If you cannot find a suitable third-party relay, consider using domestic AIs like Moonshot AI or DeepSeek.
- In Menu - Tools/Options - Advanced Options, check the following two items:
How to use GeminiAI: https://pyvideotrans.com/gemini.html
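Whichever channel you choose, the principle is the same: only the dialogue lines of the SRT file are sent to the model, while index and timestamp lines pass through untouched. A minimal Python sketch, with a placeholder `translate` callable standing in for the real API call:

```python
def translate_srt(srt_text, translate):
    """Translate only the subtitle text lines of an SRT file.
    Index lines and timestamp lines are copied through unchanged.
    `translate` is any callable str -> str, e.g. a wrapper around
    whichever translation channel is configured above."""
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.isdigit() or "-->" in stripped:
            out.append(line)              # structural line: keep as-is
        else:
            out.append(translate(line))   # dialogue line: translate
    return "\n".join(out)

sample = "1\n00:00:01,000 --> 00:00:02,500\nBonjour le monde\n"
print(translate_srt(sample, lambda s: "Hello world"))
```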
Step 3: Voice-Over
Goal: Generate voice-over based on the translated subtitle file.
Corresponding Control Element: "Voice-Over Channel" row
Best Configuration:
- Chinese or English: `F5-TTS(local)`, select `clone` for the voice-over role
- Japanese or Korean: `CosyVoice(local)`, select `clone` for the voice-over role
- Other languages: `clone-voice(local)`, select `clone` for the voice-over role
- The above three channels best preserve the emotional tone of the original video, with `F5-TTS` providing the best results.
You also need to install the corresponding `F5-TTS`/`CosyVoice`/`clone-voice` integration package; see the documentation: https://pyvideotrans.com/f5tts.html
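The channel choice above boils down to a small language-to-channel mapping; a sketch (the ISO-style language codes are illustrative, not pyvideotrans's own identifiers):

```python
def pick_voice_channel(lang_code: str) -> str:
    """Recommended local TTS channel per target language,
    following the guide above."""
    if lang_code in ("zh", "en"):
        return "F5-TTS(local)"
    if lang_code in ("ja", "ko"):
        return "CosyVoice(local)"
    return "clone-voice(local)"

print(pick_voice_channel("ja"))
```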
Step 4: Subtitle, Voice-Over, and Visual Synchronization
Goal: Synchronize subtitles, voice-over, and visuals.
Corresponding Control Element: `Synchronization` row
Best Configuration:
- When translating from Chinese to English, you can set the `Voice-Over Speed` value (e.g., `10` or `15`) to speed up the voice-over, because English sentences are usually longer.
- Check the three options `Video Extension`, `Voice-Over Acceleration`, and `Video Slowdown` to force alignment of subtitles, audio, and visuals.
- In the Menu - Tools/Options - Advanced Options - Subtitle/Audio/Visual Alignment area, `Maximum Audio Acceleration Multiple` and `Video Slowdown Multiple` can be adjusted according to the actual situation (default value is 3).
It is recommended to fine-tune which options are checked and which values are set based on the actual speaking speed in the video.
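The forced alignment performed by the three checkboxes can be pictured as two moves: first speed the dubbed audio up (capped by `Maximum Audio Acceleration Multiple`), then slow the video for whatever gap remains (capped by `Video Slowdown Multiple`). A simplified Python sketch of that logic, not the program's actual algorithm:

```python
def align(sub_duration, audio_duration, max_audio_speed=3.0, max_video_slowdown=3.0):
    """Return (audio_speed, video_slowdown) factors so that, after speeding
    the dubbed audio up and slowing the video down, both span the same time:
    audio_duration / audio_speed == sub_duration * video_slowdown
    (when the caps permit). The defaults of 3 mirror the option defaults above."""
    if audio_duration <= sub_duration:
        return 1.0, 1.0  # dubbed audio already fits the subtitle slot
    speed = min(audio_duration / sub_duration, max_audio_speed)
    slowdown = min(audio_duration / (speed * sub_duration), max_video_slowdown)
    return speed, slowdown

print(align(3.0, 12.0))  # audio sped up 3x, remaining gap closed by slowing video
```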
Output Video Quality Control
- The default output is lossy compression. For lossless output, in the Menu - Tools - Advanced Options - Video Output Control area, set `Video Transcoding Loss Control` to 0.
- Note: If the original video is not in MP4 format or uses embedded hard subtitles, video encoding conversion will cause some loss, but the loss is usually negligible. Improving video quality will significantly reduce processing speed and increase output video size.
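Setting the loss control to 0 corresponds to lossless encoding; with ffmpeg's libx264 encoder, for example, that is CRF 0. A sketch of an equivalent manual command, built in Python (the filenames are placeholders, and the audio stream is simply copied):

```python
import shlex

def lossless_transcode_cmd(src: str, dst: str) -> str:
    """Build an ffmpeg command line for lossless H.264 output:
    with libx264, -crf 0 means lossless transcoding."""
    args = ["ffmpeg", "-i", src, "-c:v", "libx264", "-crf", "0",
            "-c:a", "copy", dst]
    return shlex.join(args)

print(lossless_transcode_cmd("input.mkv", "output.mp4"))
```

Note that lossless output produces very large files, which matches the guide's warning about output size.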