Understanding the Main Interface Options
As shown in the image above, here's what each option does:
- Select Video: Choose the original video you want to translate. The video must have clear human speech, without excessive noise. Otherwise, the recognition result won't be very accurate. Note that if there is no speech, it won't work, regardless of whether there are subtitles or not, because this software works by recognizing human speech to generate subtitles. You can hold down the Ctrl key to select multiple videos at once, but the spoken language in all videos must be the same.
- Translation Channel: FreeGoogle and Microsoft can be used directly without a proxy or configuration. Other translation channels are either free but require a proxy (like Google), or require configuration (like Baidu Translate, Tencent Translate, etc.). If you don't understand, it's recommended to choose Microsoft or FreeGoogle.
- Original Language: Select the language spoken in the video. For example, if the human speech in the video is English, you must select English here.
- Target Language: Select the target language to translate to. For example, if you want to translate the video to Chinese audio and embed Chinese subtitles, you should select Chinese Simplified here.
- Network Proxy Address: If you're using services that are inaccessible in your region (like Google or Gemini), you must fill in the proxy address. For example, if you're using V2Ray, fill in something like http://127.0.0.1:10809. If you don't understand proxies, don't fill this in and avoid using services that are inaccessible in your region.
- Voiceover Channel: edgeTTS is free and can be used directly without configuration. Other voiceover channels require configuration or installation. If you don't understand, it's recommended to choose edgeTTS.
- Voiceover Role: Choose the speaker role. Different roles have different timbres. You need to select the target language first and then select the role.
- faster Mode: The mode used to recognize human speech in the video. If you don't understand, just select the default "faster" mode.
- tiny: The model used to recognize human speech in the video. By default, only the "tiny" model for "faster" mode is included. It's recommended to choose "medium" or a larger model for higher accuracy. If you selected "faster" mode or "openai" mode, you need to download the other models to the "models" directory under the software directory. Model download address: https://github.com/jianchang512/stt/releases/tag/0.0 If you don't understand and just want to try it out, choose "tiny" here; no download is required, and it can be used directly.
- Overall Recognition: Keep the default setting. No need to change.
- Embed Subtitles: The way subtitles are embedded into the video. Soft subtitles require player support to be displayed and cannot be displayed in web pages. Hard subtitles are displayed no matter where you play them, including in web pages.
- Video End: The duration of the voiceover may be longer than the original video duration. Selecting this extends the video by 10ms at the end until the voiceover is finished. It's recommended to select this.
- Voiceover Auto Speed Up: The duration of the voiceover may be longer than the original language duration. Selecting this forces the speech speed to be increased to achieve consistency. The maximum acceleration can be modified in Menu -> Tools/Advanced Settings -> Advanced Settings.
- Video Auto Slow Down: Select this to slow down the video to align the video with the audio and subtitles. The slow-down rate can also be controlled in the Advanced Settings menu.
- Keep Background Sound: Select this to keep the original background sound in the video, such as background music. If you select this, the processing speed will be slower, especially for larger videos.
- CUDA Acceleration: If you have an NVIDIA graphics card on your Windows or Linux machine, you can use it for acceleration. You need to install the CUDA environment on your machine. See the installation tutorial at https://pyvideotrans.com/gpu.html
- Clean Up Generated Files: If you repeatedly execute the process on the same video, you can select this to delete the previously generated files and regenerate them.
- Shutdown After Completion: Whether to shut down the computer after the task is completed.
- Start Processing: After everything is set up, click "Start" to execute the process.
- Import Subtitles: If you want to use existing local subtitles, you can click "Import". After importing, it will use them directly and no longer perform recognition.
- Overall Voiceover Speed: A percentage offset relative to the normal speed. For example, 10 means the speed is increased by 10%, and -10 means it is decreased by 10%.
- Volume +: Adjust the volume up or down relative to the normal volume. Only effective under edgeTTS.
- Pitch +: Adjust the pitch up or down relative to the normal pitch. Only effective under edgeTTS.
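
To make the percentage convention used by Overall Voiceover Speed and Volume + concrete, here is a minimal Python sketch. It is not taken from the software's source code. The function names are hypothetical, and the signed-percentage string form (such as "+10%") is only an assumption about how such an offset might be handed to a TTS backend.

```python
def speed_factor(percent: int) -> float:
    """Turn a percentage offset (10, -10, ...) into a playback-rate multiplier."""
    return 1.0 + percent / 100.0


def adjusted_duration(seconds: float, percent: int) -> float:
    """Estimate how long a clip lasts once the speed offset is applied."""
    factor = speed_factor(percent)
    if factor <= 0:
        raise ValueError("speed offset must be greater than -100")
    return seconds / factor


def signed_percent(value: int) -> str:
    """Format an offset as a signed percentage string (assumed form, e.g. '+10%')."""
    return f"{value:+d}%"


if __name__ == "__main__":
    # A 10-second voiceover at +10% speed plays in roughly 9.09 seconds.
    print(round(adjusted_duration(10.0, 10), 2))
    print(signed_percent(10), signed_percent(-10))
```

With this convention, a positive speed value shortens the voiceover and a negative value lengthens it, which is worth keeping in mind when combining it with Voiceover Auto Speed Up.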