
Custom Speech Recognition API

Starting with v3.56, Gladia's speech recognition service is supported through this custom speech recognition channel. See the corresponding tutorial for specific instructions on how to use it.

If the built-in speech recognition methods do not meet your needs, you can connect your own speech recognition API. Fill in the relevant information under Menu - Speech Recognition Settings - Custom Speech Recognition API.


Fill in your API address, starting with http. The software sends the audio to that address as a WAV file (16 kHz sampling rate, 1 channel) under the key name "audio". If your API requires key verification, enter the key in the key box. It is appended to the API address as a query parameter, in the form api_url?sk=password.

requests.post(api_url, files={"audio": open(audio_file, 'rb')})
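The way the key ends up in the request URL can be sketched as follows. This is only an illustration: the helper name build_request_url is hypothetical, and the handling of an address that already has a query string is an assumption, since the text above only states that the key is appended as sk=password.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_request_url(api_url: str, sk: str = "") -> str:
    """Return api_url with the key appended as an ``sk`` query parameter.

    Hypothetical helper: existing query parameters are preserved, which is
    an assumption about the desired behavior, not documented behavior.
    """
    if not sk:
        # No key configured: the address is used unchanged.
        return api_url
    parts = urlsplit(api_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("sk", sk))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(build_request_url("http://127.0.0.1:5000/asr", "password"))
# prints http://127.0.0.1:5000/asr?sk=password
```

On the server side, the key can then be read from the sk query parameter and compared with the expected value before the audio is processed.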

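Your API must return data in JSON format. On failure, set code to 1 and msg to the reason the recognition failed. On success, set code to 0 and return the recognized subtitles in data, where each entry carries its text and its time range.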

Returned on failure:

res={
	"code":1,
	"msg":"Reason for error"
}

Returned on success:

res={
	"code":0,
	"data":[
		{
			"text":"Subtitle text",
			"time":"00:00:01,000 --> 00:00:06,500"
		},
		{
			"text":"Subtitle text",
			"time":"00:00:06,900 --> 00:00:12,200"
		},
		...more
	]
}
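On the API side, both response shapes could be produced with a few small helpers. This is a minimal sketch, assuming your recognizer yields segments as (start_seconds, end_seconds, text) tuples. The function names to_srt_time, success_response and failure_response are hypothetical and not part of the software.

```python
import json


def to_srt_time(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (the format used in "time")."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def success_response(segments) -> str:
    """Build the success JSON from (start_seconds, end_seconds, text) tuples."""
    data = [
        {"text": text, "time": f"{to_srt_time(start)} --> {to_srt_time(end)}"}
        for start, end, text in segments
    ]
    return json.dumps({"code": 0, "data": data}, ensure_ascii=False)


def failure_response(reason: str) -> str:
    """Build the failure JSON with code 1 and the reason in msg."""
    return json.dumps({"code": 1, "msg": reason}, ensure_ascii=False)


print(success_response([(1.0, 6.5, "Subtitle text")]))
print(failure_response("Reason for error"))
```

json.dumps always emits double-quoted strings, so the output is valid JSON. ensure_ascii=False keeps non-ASCII subtitle text readable in the response body.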
