Recognition Accuracy Optimization of Power Grid Dispatching Voice Interaction System Based on Wav2Vec 2.0 and Conformer
DOI:
https://doi.org/10.52152/4384
Keywords:
Power Grid Dispatching, Voice Recognition Optimization, Wav2Vec 2.0 Model, Conformer Model, Feature Learning
Abstract
Current power grid dispatching voice interaction systems adapt poorly to the terminology specific to the dispatching field and are easily disturbed by environmental noise, which leads to command recognition errors and missing keywords. To address this, this paper constructs a deep fusion model based on Wav2Vec 2.0 self-supervised pre-training and the Conformer structure, aiming to improve the accuracy of Automatic Speech Recognition (ASR). First, the Wav2Vec 2.0 model is pre-trained in a self-supervised manner on the raw dispatching voice signal to extract features that capture its low-level time-domain and frequency-domain representations. Then, the extracted voice features are fed into a Conformer structure fine-tuned on a dispatching-field corpus to achieve high-precision modeling of long-distance context. Finally, a dictionary of power grid professional terminology is embedded in the decoding stage, and spectrogram enhancement and background noise synthesis mechanisms are combined to achieve end-to-end joint optimization. The results show that the accuracy, recall, and F1 score of the proposed speech recognition model are 92.3%, 89.1%, and 90.7%, respectively; the average Word Error Rate (WER), Character Error Rate (CER), and Weighted WER are 10.8%, 5.7%, and 13.8%, respectively; the F1 score for term recognition reaches 90.7%; the Top-3 recognition rate is above 0.75; and the complete instruction recognition rate reaches 84.6%. Under an extremely low signal-to-noise ratio of -5 dB, the WER is held to 42.1%. These results indicate that the proposed method can effectively improve the accuracy and scene adaptability of ASR, provide reliable support for high-precision voice interaction in power grid dispatching, help improve the safety and reliability of power facility operations, and reduce work delays caused by misoperation or poor communication.
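The abstract reports WER and CER and a test condition at -5 dB signal-to-noise ratio, but does not say how these were computed. As a minimal sketch, assuming the standard edit-distance definitions of WER and CER and simple additive noise mixing, the quantities could be obtained as follows. The function names (`wer`, `cer`, `mix_at_snr`) and the example command are hypothetical illustrations, not the authors' implementation, and the Weighted WER is omitted because its weighting scheme is not given here.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add.

    Both inputs are equal-length lists of samples; this is an assumed,
    simplified stand-in for background noise synthesis.
    """
    if len(speech) != len(noise) or not speech:
        raise ValueError("speech and noise must be non-empty and equal length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / p_noise)
    return [s + scale * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    # Hypothetical dispatching command with one dropped word.
    print(wer("open breaker two zero one", "open breaker two one"))
    print(cer("abc", "abd"))
```

At -5 dB the scaled noise carries roughly three times the power of the speech, which gives a sense of how severe the reported low-SNR condition is.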
License
Copyright (c) 2025 Min Gao, Chenguang Zhu, Lei Chen, Weizhe Sun, Wengang Wang (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.