XiaoHongShu, a popular social media and e-commerce platform, has open-sourced FireRedASR, an automatic speech recognition (ASR) project with excellent performance on Chinese speech. Previously, only a smaller Attention-based Encoder-Decoder (AED) model was available; more recently, a larger variant built on a large language model (LLM) was released, which significantly improves recognition accuracy.
This ASR model has been integrated into a convenient all-in-one package and can be easily used within the pyVideoTrans video translation software.
All-in-One Package Download and Model Details
Model Sizes:
- AED model (model.pth.tar): 4.35GB
- LLM model, which consists of two parts:
  - XiaoHongShu recognition model (model.pth.tar): 3.37GB
  - Qwen2-7B model (4 files): 17GB in total
The total model size is approximately 21GB. Even when compressed into 7z format, it still exceeds 10GB. Due to size limitations, it cannot be uploaded to GitHub or cloud storage. Therefore, the all-in-one package only contains the program's core files and does not include any model files.
After downloading the package, please follow the steps below to download the model files separately and place them in the specified location.
Note: The model files are hosted on huggingface.co. This website may not be directly accessible in some regions, and you may need a VPN to download them.
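Taken together, the steps below place five model files under pretrained_models/ inside the package directory. As a convenience, a small sketch like the following (paths copied from this guide; the helper name is my own) can report which files are still missing before you launch the program:

```python
from pathlib import Path

# Expected model files, relative to the all-in-one package directory
# (paths as described in this guide).
EXPECTED_FILES = [
    "pretrained_models/FireRedASR-AED-L/model.pth.tar",
    "pretrained_models/FireRedASR-LLM-L/model.pth.tar",
] + [
    f"pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct/model-{i:05d}-of-00004.safetensors"
    for i in range(1, 5)
]

def missing_models(package_dir: str) -> list[str]:
    """Return the expected model files that are not yet present on disk."""
    root = Path(package_dir)
    return [p for p in EXPECTED_FILES if not (root / p).is_file()]

if __name__ == "__main__":
    # Run from inside the extracted all-in-one package directory.
    for p in missing_models("."):
        print("missing:", p)
```

Running it from the package root before the first launch avoids discovering a missing 17GB download only after the program fails to start.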
Download the All-in-One Package Core Files
The core files are relatively small, at 1.7GB. You can directly download them in your browser from the following address:
https://github.com/jianchang512/fireredasr-ui/releases/download/v0.3/fireredASR-2025-0224.7z
After downloading, extract the archive. You should see a file structure similar to the image below:
Download the AED Model
Downloading the AED model is straightforward and involves downloading only one file.
Download the model.pth.tar file from:
https://huggingface.co/FireRedTeam/FireRedASR-AED-L/resolve/main/model.pth.tar?download=true
Place the downloaded model.pth.tar file into the pretrained_models/FireRedASR-AED-L folder within the all-in-one package directory.
After downloading, the file storage location should look like this:
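If you prefer the command line to a browser download, the same file can be fetched with a short script. This is only a sketch: the URL and target folder are copied from the step above, and urllib offers no resume support, so a dropped connection means restarting the 4.35GB download.

```python
import urllib.request
from pathlib import Path

# URL and destination exactly as described in the AED download step.
AED_URL = (
    "https://huggingface.co/FireRedTeam/FireRedASR-AED-L/"
    "resolve/main/model.pth.tar?download=true"
)
AED_TARGET = Path("pretrained_models/FireRedASR-AED-L/model.pth.tar")

def download_aed(url: str = AED_URL, target: Path = AED_TARGET) -> Path:
    """Download the AED model into the folder the package expects."""
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, target)  # ~4.35GB; this can take a while
    return target

if __name__ == "__main__":
    # Run from inside the all-in-one package directory.
    print("saved to", download_aed())
```

The same VPN caveat applies here: the script reaches huggingface.co directly, so it will fail in regions where that site is blocked.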
Download the LLM Model
Downloading the LLM model is slightly more complex, requiring a total of 5 files (1 XiaoHongShu model file + 4 Qwen2 model files).
1. Download the XiaoHongShu Model (model.pth.tar):
Download address: https://huggingface.co/FireRedTeam/FireRedASR-LLM-L/resolve/main/model.pth.tar?download=true
Place the downloaded model.pth.tar file into the pretrained_models/FireRedASR-LLM-L folder within the all-in-one package. Make sure the folder name contains LLM so you do not place the file in the wrong location.
The file storage location should look like this:
2. Download the Qwen2 Model (4 files):
Download the following 4 files and place them into the pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct folder within the all-in-one package:
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00001-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00002-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00003-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00004-of-00004.safetensors?download=true
After downloading, the Qwen2-7B-Instruct folder should contain 4 files, as shown in the image below:
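The four links above follow a single naming pattern, so they can also be fetched in a loop rather than one by one. The sketch below builds the URLs from that pattern and downloads each file into the folder named above; as with the AED script, urllib cannot resume an interrupted transfer.

```python
import urllib.request
from pathlib import Path

# Base URL and target folder copied from the Qwen2 download step above.
BASE = "https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main"
TARGET_DIR = Path("pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct")

def qwen2_urls() -> list[str]:
    """Build the four safetensors URLs listed above."""
    return [
        f"{BASE}/model-{i:05d}-of-00004.safetensors?download=true"
        for i in range(1, 5)
    ]

if __name__ == "__main__":
    # Run from inside the all-in-one package directory.
    TARGET_DIR.mkdir(parents=True, exist_ok=True)
    for url in qwen2_urls():
        name = url.split("/")[-1].split("?")[0]  # strip the ?download=true suffix
        print("downloading", name)
        urllib.request.urlretrieve(url, TARGET_DIR / name)  # ~17GB in total
```

Expect the full set to take a long time on most connections; verify afterwards that all four .safetensors files are present before launching.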
Launch the All-in-One Package
Once all model files have been downloaded and correctly placed, double-click the 启动.bat file ("启动" means "launch") in the all-in-one package directory to start the program.
After the program starts, it will automatically open the address http://127.0.0.1:5078
in your browser. If you see the interface below, it indicates that the program has started successfully and is ready for use.
Using it in Video Translation Software
If you want to use the FireRedASR model in the pyVideoTrans video translation software, follow these steps:
Ensure you have downloaded and placed the model files as described above and that you have successfully launched the all-in-one package.
Open the pyVideoTrans software.
In the software menu, select Menu -> Speech Recognition Settings -> OpenAI Speech Recognition & Compatible AI.
In the settings interface, fill in the relevant information as shown in the image below.
After filling in the information, click Save.
In the speech recognition channel selection, choose OpenAI Speech Recognition.
API Address: the default is http://127.0.0.1:5078/v1
Using with OpenAI SDK
```python
from openai import OpenAI

# Point the OpenAI SDK at the local FireRedASR service; the api_key
# value is a placeholder required by the SDK, not a real key.
client = OpenAI(api_key='123456',
                base_url='http://127.0.0.1:5078/v1')

with open("5.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json",
        timeout=86400  # generous timeout: long audio can take a while
    )

print(transcript.text)
```