Speaker Recognition | pyVideoTrans-Open Source Video Translation Tool -pyvideotrans.com github.com/jianchang512/pyvideotrans

Speaker Recognition/Separation

Starting with v3.74, three speech recognition channels (Ali FunASR Chinese, Deepgram.com, and Gemini Large Model Recognition) support speaker recognition. After the audio is transcribed, a speaker label is placed before the text of each subtitle.

1
00:00:01,920 --> 00:00:06,800
[spk0] Organic molecules have been discovered in the Five Old Star System. How many more people are we away from third-type contact?

2
00:00:07,260 --> 00:00:12,940
[spk1] The Weibo mission has been underway for nearly half a year, and many photos that were difficult to capture in the past have been transmitted recently.

3
00:00:13,460 --> 00:00:21,380
[spk0] In early June, astronomers published this photo in Nature, showing a ring of orange light outside the blue core.

In the subtitles above, [spk0] indicates the first speaker, [spk1] the second speaker, and so on.
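
If you need to post-process such subtitles, for example to group lines by speaker, the label can be separated from the text with a few lines of Python. The following is a minimal sketch, not part of pyVideoTrans: the function name `parse_speaker_srt` and the returned dictionary keys are hypothetical, and it assumes the label always has the form `[spkN]` at the very start of the subtitle text, as in the sample above.

```python
import re

# Matches a leading speaker label such as "[spk0] " at the start of the text.
# Assumption: labels always look like [spkN], as in the sample subtitles.
SPEAKER_TAG = re.compile(r"^\[spk(\d+)\]\s*")


def parse_speaker_srt(srt_text):
    """Split SRT text into cues and pull the speaker index out of each one."""
    cues = []
    # SRT cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip blocks without index, timing, and text lines
        start, _, end = lines[1].partition(" --> ")
        text = " ".join(line.strip() for line in lines[2:])
        match = SPEAKER_TAG.match(text)
        speaker = int(match.group(1)) if match else None
        if match:
            text = text[match.end():]  # drop the label from the text
        cues.append({
            "index": int(lines[0].strip()),
            "start": start.strip(),
            "end": end.strip(),
            "speaker": speaker,  # None when the cue has no label
            "text": text,
        })
    return cues
```

Calling `parse_speaker_srt` on the contents of a labeled `.srt` file returns one dictionary per subtitle. Cues without a label get `speaker` set to `None`, so the same function also works on output from channels that do not support speaker recognition.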

Note:

Ali FunASR Chinese: Supports Chinese speech only.
Deepgram.com: Supports multiple languages, but performs poorly on Chinese.
Gemini Large Model Recognition: Supports any language.

Due to the limitations of current models, speaker recognition is not always accurate, so check the speaker labels before relying on them.
