
Difference between Whole Recognition and Equal Division

Whole Recognition:

This method delivers the best speech recognition results but consumes the most computer resources. If you are working with a large video and using the large-v3 model, it may cause the program to crash.

During recognition, the entire audio file is passed to the model. The model internally uses VAD (Voice Activity Detection) for segmentation, recognition, and sentence breaking. The default silence split is 200ms, and the maximum sentence length is 3s. These settings can be configured in the Menu -- Tools/Options -- Advanced Options -- VAD area.
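To make the two VAD settings concrete, here is a minimal sketch of silence-based splitting. It is not the model's actual VAD: it assumes speech detection has already been done and is supplied as a list of per-frame flags, and the function name `split_on_silence` and its parameters are hypothetical. Only the defaults (200ms silence, 3s maximum length) come from the text above.

```python
from typing import List, Tuple


def split_on_silence(
    speech: List[bool],
    frame_ms: int = 10,          # assumed frame size, not taken from the program
    min_silence_ms: int = 200,   # default silence split from the text
    max_segment_ms: int = 3000,  # default maximum sentence length from the text
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) segments; speech[i] is True if frame i has speech."""
    segments: List[Tuple[int, int]] = []
    start = None        # frame index where the current segment began
    last_speech = None  # index of the most recent speech frame
    silence_frames = min_silence_ms // frame_ms
    max_frames = max_segment_ms // frame_ms

    for i, is_speech in enumerate(speech):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
            # Force a break once the segment reaches the maximum length.
            if i - start + 1 >= max_frames:
                segments.append((start * frame_ms, (i + 1) * frame_ms))
                start = None
        elif start is not None and i - last_speech >= silence_frames:
            # Enough continuous silence: close the segment at the last speech frame.
            segments.append((start * frame_ms, (last_speech + 1) * frame_ms))
            start = None

    if start is not None:
        segments.append((start * frame_ms, (last_speech + 1) * frame_ms))
    return segments
```

In this sketch, a pause shorter than 200ms stays inside one segment, while a longer pause ends the segment. Continuous speech longer than 3s is cut at the 3s mark.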

Equal Division:

As the name suggests, this method cuts the audio file into segments of equal length and passes each segment to the model. The OpenAI model always uses equal division: when using the OpenAI model, whether you choose "Whole Recognition" or "Pre-Segmentation," the program will use "Equal Division" instead.

With equal division, each segment is 10 seconds long, and the silent segment interval is 500ms. These settings can be configured in the Menu -- Tools/Options -- Advanced Options -- VAD area.

Note: With the setting at 10 seconds, each subtitle will generally be around 10 seconds long. However, the duration of each audio clip may not be exactly 10 seconds, because of pronunciation duration and the removal of silence at the end of the audio.
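The following sketch illustrates why clips can come out shorter than the configured length. It is an illustration under stated assumptions, not the program's implementation: `equal_chunks` and `trim_trailing_silence` are hypothetical names, speech detection is assumed to be given as per-frame flags, and the 500ms silent interval setting is not modeled.

```python
from typing import List, Tuple


def equal_chunks(total_ms: int, chunk_ms: int = 10_000) -> List[Tuple[int, int]]:
    """Cut [0, total_ms) into fixed-length windows; the last one may be shorter."""
    return [(s, min(s + chunk_ms, total_ms)) for s in range(0, total_ms, chunk_ms)]


def trim_trailing_silence(
    start_ms: int,
    end_ms: int,
    speech: List[bool],
    frame_ms: int = 10,  # assumed frame size, not taken from the program
) -> Tuple[int, int]:
    """Shrink the end of a chunk back to its last frame that contains speech."""
    first = start_ms // frame_ms
    last = min(end_ms // frame_ms, len(speech))
    while last > first and not speech[last - 1]:
        last -= 1
    return (start_ms, last * frame_ms)


# A 25-second file yields two full chunks and one 5-second remainder.
chunks = equal_chunks(25_000)

# A 10-second chunk whose final second is silent is trimmed to 9 seconds.
flags = [True] * 900 + [False] * 100
trimmed = trim_trailing_silence(0, 10_000, flags)
```

Here `chunks` contains a final 5-second window, and `trimmed` ends at 9000ms, so neither clip is exactly 10 seconds long even though the setting is 10 seconds.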