Improving the Precision of Hidden Danger Recognition in Power Dispatch Duty Logs through RLHF Multi-round Human Feedback Mechanism

Authors

  • Siwu Yu, Electric Power Research Institute of Guizhou Power Grid Co., Guiyang, 550002, China
  • Yumin He, Electric Power Research Institute of Guizhou Power Grid Co., Guiyang, 550002, China
  • Guobang Ban, Electric Power Research Institute of Guizhou Power Grid Co., Guiyang, 550002, China
  • Jintong Ma, Electric Power Research Institute of Guizhou Power Grid Co., Guiyang, 550002, China
  • Guanghui Xi, Electric Power Research Institute of Guizhou Power Grid Co., Guiyang, 550002, China
  • Lingwen Meng, Electric Power Research Institute of Guizhou Power Grid Co., Guiyang, 550002, China
  • Shasha Luo, Electric Power Research Institute of Guizhou Power Grid Co., Guiyang, 550002, China
  • Siqi Guo, Electric Power Research Institute of Guizhou Power Grid Co., Guiyang, 550002, China

DOI:

https://doi.org/10.52152/4409

Keywords:

Power dispatch, Duty log, Hidden danger recognition, Multiple rounds of human feedback, Reinforcement learning from human feedback

Abstract

Hidden danger identification in power dispatch duty logs suffers from insufficient accuracy because the professional text is semantically complex and expert annotations are scarce. This paper proposes an optimization method based on multi-round RLHF (Reinforcement Learning from Human Feedback): a reward model trained on interactive expert feedback drives the fine-tuning of a BERT model, and active learning screens high-value samples so that hidden danger recognition accuracy improves continuously. A multi-dimensional reward function based on semantic similarity and hidden danger severity is designed, and the reward model is trained on real-time expert scores of the model output to quantify recognition accuracy. With the reward model as the optimization target, the PPO (Proximal Policy Optimization) algorithm fine-tunes the pre-trained BERT model over multiple rounds. Active learning combines uncertainty sampling and diversity sampling, giving priority to log texts with low prediction confidence and large semantic differences. Expert annotations, reward model outputs, and actively selected samples are fed jointly into the training cycle, so model performance improves round by round. Experiments show that the multi-round RLHF framework significantly improves the precision and recall of hidden danger identification, mitigates the scarcity of expert annotations, and achieves high coverage of long-tail hidden dangers, demonstrating strong semantic understanding of professional text and practical value.
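
The abstract names two computational pieces without giving their formulas: a multi-dimensional reward built from semantic similarity and hidden danger severity, and PPO fine-tuning against that reward. The sketch below illustrates both under stated assumptions. The linear weighting (`w_sem`, `w_sev`), the 0 to 4 severity scale, and the function names `hazard_reward` and `ppo_clip_loss` are hypothetical, not the authors' formulation. The loss is the standard PPO clipped surrogate objective, written per sample over plain log-probabilities rather than over BERT outputs.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hazard_reward(pred_emb: Sequence[float], ref_emb: Sequence[float],
                  pred_severity: int, ref_severity: int,
                  max_severity: int = 4,
                  w_sem: float = 0.6, w_sev: float = 0.4) -> float:
    """Hypothetical two-term reward in [0, 1] for one recognized hidden danger.

    Semantic term: cosine similarity between the model output embedding and the
    expert reference embedding, rescaled from [-1, 1] to [0, 1].
    Severity term: 1 minus the normalized gap between predicted and expert
    severity levels (0..max_severity), so larger misgradings cost more.
    """
    semantic = (cosine_similarity(pred_emb, ref_emb) + 1.0) / 2.0
    severity = 1.0 - abs(pred_severity - ref_severity) / max_severity
    return w_sem * semantic + w_sev * severity


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch.

    ratio = pi_new(a|s) / pi_old(a|s) is computed from log-probabilities; the
    clipped term keeps one update from moving the policy too far from the
    policy that generated the samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

In an RLHF loop, each output's advantage would typically be its reward-model score minus a baseline; how the authors compute advantages and attach them to BERT's predictions is not stated in the abstract.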

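The active learning step, which prioritizes logs with low prediction confidence and large semantic differences, can be sketched as a two-stage filter. This is a minimal illustration under assumptions not stated in the abstract: uncertainty is measured as predictive entropy, diversity as greedy farthest-point (k-center) selection over sentence embeddings, and the candidate pool size is a free parameter. The function `select_samples` is hypothetical; the paper's actual scoring and combination rule may differ.

```python
import math
from typing import List, Optional, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = less confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_samples(probs: Sequence[Sequence[float]],
                   embeddings: Sequence[Sequence[float]],
                   budget: int, pool_size: Optional[int] = None) -> List[int]:
    """Hypothetical uncertainty-then-diversity selection of logs for expert review.

    1. Uncertainty: rank unlabeled logs by predictive entropy and keep the top
       `pool_size` (default 3 * budget).
    2. Diversity: from that pool, greedily add the log farthest from everything
       already chosen (k-center / farthest-point), starting from the most
       uncertain one, until `budget` logs are selected.
    Returns indices into the unlabeled pool.
    """
    if budget <= 0 or not probs:
        return []
    if pool_size is None:
        pool_size = 3 * budget
    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    candidates = order[:max(pool_size, budget)]
    selected = [candidates[0]]
    # Distance from each candidate to its nearest already-selected log.
    nearest = {i: euclidean(embeddings[i], embeddings[selected[0]]) for i in candidates}
    while len(selected) < min(budget, len(candidates)):
        nxt = max((i for i in candidates if i not in selected), key=lambda i: nearest[i])
        selected.append(nxt)
        for i in candidates:
            nearest[i] = min(nearest[i], euclidean(embeddings[i], embeddings[nxt]))
    return selected


if __name__ == "__main__":
    # Toy pool: log 1 is confidently classified, log 2 nearly duplicates log 0.
    probs = [[0.5, 0.5], [0.99, 0.01], [0.5, 0.5], [0.6, 0.4]]
    embs = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.0], [5.0, 5.0]]
    print(select_samples(probs, embs, budget=2, pool_size=3))  # prints [0, 3]
```

Filtering by uncertainty first and then spreading picks across embedding space keeps the expert from reviewing several near-identical low-confidence logs in the same round.
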
Published

2025-07-25

Section

Articles