Speaker Recognition/Diarization
Starting with v3.74, the Alibaba FunASR Chinese, Deepgram.com, and Gemini Large Model Recognition channels in the speech recognition feature support speaker recognition. After the audio is transcribed, each subtitle is prefixed with a label identifying its speaker:
1
00:00:01,920 --> 00:00:06,800
[spk0]Organic molecules were found in the five old star system. We are one step closer to a close encounter of the third kind.

2
00:00:07,260 --> 00:00:12,940
[spk1]Webb has really begun its imaging missions, and many photos that were hard to capture in the past have recently been sent back.

3
00:00:13,460 --> 00:00:21,380
[spk0]In early June, astronomers published this photo in Nature, showing a ring of orange light around a blue core.
In the subtitles above, [spk0] indicates the first speaker, [spk1] indicates the second speaker, and so on.
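If you post-process the exported subtitles, the labels are easy to separate from the text. The Python sketch below assumes a standard SRT file (cues separated by blank lines) in which the label, when present, is the first thing in a cue's text. `Cue`, `parse_srt`, and `speaking_time_ms` are hypothetical helper names for illustration, not part of the software.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# An SRT timing line such as "00:00:01,920 --> 00:00:06,800".
TIMING_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)
# Assumed label format: "[spkN]" at the very start of the cue text.
SPEAKER_RE = re.compile(r"^\[spk(\d+)\]\s*")


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    speaker: Optional[int]  # None if the cue has no [spkN] label
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the parts of an SRT timestamp to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> List[Cue]:
    """Split SRT text into cues and separate each cue's [spkN] label from its text."""
    cues = []
    # Drop a UTF-8 BOM and normalize Windows line endings before splitting.
    normalized = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.split("\n")]
        if len(lines) < 3 or not lines[0].isdigit():
            continue  # skip blocks that are not index/timing/text cues
        timing = TIMING_RE.search(lines[1])
        if timing is None:
            continue
        parts = timing.groups()
        text = "\n".join(lines[2:])
        label = SPEAKER_RE.match(text)
        speaker = int(label.group(1)) if label else None
        if label:
            text = text[label.end():]  # remove the label from the subtitle text
        cues.append(
            Cue(int(lines[0]), _to_ms(*parts[:4]), _to_ms(*parts[4:]), speaker, text)
        )
    return cues


def speaking_time_ms(cues: List[Cue]) -> Dict[Optional[int], int]:
    """Sum cue durations per speaker number (None collects unlabeled cues)."""
    totals: Dict[Optional[int], int] = {}
    for cue in cues:
        totals[cue.speaker] = totals.get(cue.speaker, 0) + cue.end_ms - cue.start_ms
    return totals
```

Applied to the three-cue sample written as a real SRT file (with blank lines between cues), `parse_srt` yields speakers 0, 1, 0, and `speaking_time_ms` reports 12,800 ms for spk0 and 5,680 ms for spk1.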
Note:
- Alibaba FunASR Chinese: Supports Chinese speech only.
- Deepgram.com: Supports multiple languages, but accuracy is poor for Chinese.
- Gemini Large Model Recognition: Supports any language.
Because of the limitations of current models, speaker recognition is not always accurate.
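Because labels can be wrong (for instance, one person may end up under two different labels), you may want to correct them in the exported SRT file before using it. The Python sketch below does this with a plain search-and-replace. It assumes labels appear only as `[spkN]` at the start of a subtitle line; `relabel` is a hypothetical helper, not a feature of the software.

```python
import re
from typing import Dict

# A speaker label at the start of any line, e.g. "[spk2]".
LABEL_RE = re.compile(r"^\[spk(\d+)\]", re.MULTILINE)


def relabel(content: str, mapping: Dict[int, str]) -> str:
    """Rewrite [spkN] labels in SRT text.

    `mapping` maps a speaker number to a new label (without brackets);
    speakers missing from `mapping` keep their original label.
    """
    def replace(match):
        number = int(match.group(1))
        if number in mapping:
            return f"[{mapping[number]}]"
        return match.group(0)  # leave unmapped labels untouched

    return LABEL_RE.sub(replace, content)
```

For example, `relabel(text, {2: "spk0"})` merges cues labeled spk2 into spk0, and `relabel(text, {0: "Host", 1: "Guest"})` replaces the numbered labels with names.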