Why Audio, Subtitles, and Video Are Out of Sync

When text is translated from one language to another, sentence length changes, and the time needed to speak the sentence usually changes with it. For example, a Chinese sentence and its English translation almost never have the same length, and the two generally take different amounts of time to pronounce.

Chinese: 有多远滚多远 (duō yuǎn gǔn duō yuǎn) - "Get as far away as possible"

English: Get out of here as far as you can!

Chinese: 滚远点 (gǔn yuǎn diǎn) - "Get lost"

Japanese: ここから出て行け。 (Koko kara dete ike.) - "Get out of here."

If the Chinese speech in the original video takes 2 seconds and the translated English dubbing takes 4 seconds, the dubbing cannot fit the original segment without adjustment, so audio, subtitles, and video drift out of sync.

How to Synchronize Them (Even if the Result Isn't Perfect)

Take the example above: the segment lasts 2 seconds before translation and 4 seconds after. If all you need is alignment, and you do not care about speech rate or video playback speed, you can simply speed up the audio by a factor of 2. The 4-second dubbing is shortened to 2 seconds and matches the original segment. Alternatively, you can slow down the video so that the original 2-second segment is stretched to 4 seconds, which also achieves alignment.
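The arithmetic behind both options is the same ratio. The following is a minimal sketch of that calculation, not the software's actual code; the function name `stretch_ratio` and the variable names are hypothetical and chosen only for illustration.

```python
def stretch_ratio(original_s: float, dubbed_s: float) -> float:
    """Return how many times longer the dubbed audio is than the original segment.

    A result of 1.0 means the dubbing already fits and nothing needs to change.
    """
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    return max(1.0, dubbed_s / original_s)


original_s, dubbed_s = 2.0, 4.0
ratio = stretch_ratio(original_s, dubbed_s)

# Option A: speed up the audio by `ratio`; the dubbing shrinks to the original length.
sped_up_audio_s = dubbed_s / ratio

# Option B: slow down the video by `ratio`; the segment grows to the dubbing length.
slowed_video_s = original_s * ratio

print(ratio, sped_up_audio_s, slowed_video_s)  # 2.0 2.0 4.0
```

Either option removes the mismatch completely, but only by putting the whole adjustment on one side, which is why the result can sound or look unnatural.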

Specific Steps for Audio Acceleration for Alignment:

  1. In the software interface, select "Automatic Audio Acceleration" and uncheck "Automatic Video Slowdown."
  2. Open the menu Tools - Options, and set the maximum audio acceleration multiple to 100.

This achieves synchronization, but the drawback is obvious: the speech rate fluctuates from sentence to sentence, because each sentence is accelerated by a different amount.

Steps for Video Slowdown for Alignment:

  1. Uncheck "Automatic Audio Acceleration" in the software interface and select "Automatic Video Slowdown."

  2. Open the menu Tools - Options, and set the maximum video slowdown multiple to 20.

This also achieves alignment. The speech rate stays unchanged and the video slows down instead, but the video playback speed now fluctuates from segment to segment.

If you just want simple alignment and don't care about the effect, you can use these two methods.

A Better, More Acceptable Synchronization Method

The two methods above are not practical for real use: audio that is too fast or video that is too slow gives a poor viewing experience. For better results, enable both "Automatic Audio Acceleration" and "Automatic Video Slowdown" so that neither side has to absorb the whole difference.

Specific Steps:

  1. When using faster mode or openai mode, prefer the medium model or a larger one, and select "Overall Recognition".

  2. In the software interface, select "Automatic Audio Acceleration" and "Automatic Video Slowdown," and set a relatively small overall acceleration value, such as 10%.

  3. Open the menu Tools - Options, and set the maximum audio acceleration multiple to 1.8, which means speech is accelerated to at most 1.8 times the normal speed. You can change it to any value greater than 1, such as 1.5 or 2.
  4. Open the menu Tools - Options, and set the maximum video slowdown multiple to 2, which means the video is slowed to no less than 0.5 times the normal speed. You can change it to any value greater than 1, such as 3 or 5.
  5. Even after steps 1-4, a segment may still be out of alignment because both maximum values are capped. When the cap is reached and the segment still does not line up, the software stops adjusting it and simply delays what follows. You can then continue to adjust the picture and subtitle related options under Tools - Options.
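To make the effect of the two caps concrete, here is a rough sketch of how a capped, combined adjustment could be computed for one segment. It is an illustration under stated assumptions, not the software's implementation: the order (accelerate audio first, then slow the video) is assumed, and `AlignmentPlan`, `plan_alignment`, and the parameter names are hypothetical. The defaults 1.8 and 2.0 are the values from steps 3 and 4.

```python
from dataclasses import dataclass


@dataclass
class AlignmentPlan:
    audio_rate: float      # speed multiplier applied to the dubbed audio (>= 1)
    video_slowdown: float  # stretch multiplier applied to the video segment (>= 1)
    overflow_s: float      # seconds of dubbing that still do not fit


def plan_alignment(
    original_s: float,
    dubbed_s: float,
    max_audio_rate: float = 1.8,
    max_video_slowdown: float = 2.0,
) -> AlignmentPlan:
    """Split the mismatch between audio speed-up and video slowdown, within caps."""
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    if max_audio_rate < 1 or max_video_slowdown < 1:
        raise ValueError("caps must be at least 1")

    needed = dubbed_s / original_s
    if needed <= 1:
        # The dubbing already fits inside the original segment.
        return AlignmentPlan(1.0, 1.0, 0.0)

    # Assumed order: accelerate the audio first, up to its cap.
    audio_rate = min(needed, max_audio_rate)
    audio_after_s = dubbed_s / audio_rate

    # Then stretch the video to cover what remains, up to its cap.
    video_slowdown = min(audio_after_s / original_s, max_video_slowdown)
    video_after_s = original_s * video_slowdown

    # Whatever is left cannot be absorbed and has to be handled as a delay.
    overflow_s = max(0.0, audio_after_s - video_after_s)
    return AlignmentPlan(audio_rate, video_slowdown, overflow_s)


print(plan_alignment(2.0, 4.0))   # fits within both caps
print(plan_alignment(2.0, 10.0))  # both caps reached, some overflow remains
```

With a 2-second original and a 4-second dubbing, the audio runs at 1.8x and the video only needs a slight slowdown. With a 10-second dubbing, both caps are hit and about 1.6 seconds are left over, which corresponds to the situation described in step 5.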

Is There a Perfect Synchronization Method?

Apart from manual intervention, such as streamlining the translation or adding transition shots, there is currently no perfect method that can be automated programmatically.

For the program to guarantee all of the following automatically, in videos of any length and for any language pair, seems like an impossible task: an acceptable audio acceleration range, an acceptable video slowdown range, and mouth movements that match the start of speech. Apart from manual adjustment, there is no perfect method.