Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Deep reinforcement learning for scheduling of a steel plant in the electricity spot market

Shah, Margi ORCID: https://orcid.org/0000-0003-2222-8412, Zhou, Yue ORCID: https://orcid.org/0000-0002-6698-4714, Wu, Jianzhong ORCID: https://orcid.org/0000-0001-7928-3602 and Mowbray, Max 2026. Deep reinforcement learning for scheduling of a steel plant in the electricity spot market. Engineering 10.1016/j.eng.2025.12.038

PDF - Accepted Post-Print Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

The steel industry, characterized by its substantial energy consumption, is grappling with rising energy costs and the imperative to decarbonize. However, the scheduling of a steel plant is challenged by the complexity and interdependency of its processes with various uncertainties. This study introduces a deep reinforcement learning (DRL) methodology specifically designed to optimize scheduling in the presence of the exogenous uncertainties brought by electricity prices and on-site renewable generation. The scheduling problem is formulated as a partially observable Markov decision process (POMDP), which enables decision-making despite the state not being fully observable. The attention mechanism is utilized to abstract a representation of a window of observations upon which decisions are conditioned. The control space is defined by domain knowledge-informed heuristic rules, and evolutionary search is utilized for the purpose of policy optimization. The case study considers an electric arc furnace (EAF)-based steel plant with various problem sizes and processing times for steelmaking tasks. The performance of the proposed method is compared with a traditional mixed integer linear programming (MILP) approach and the policy gradient method, proximal policy optimization (PPO). The proposed method is evaluated under uncertainty conditions arising from market prices and on-site renewable energy sources. Case study results reveal that the proposed DRL strategy effectively integrates uncertainties into real-time decision-making, achieving a desirable performance level with minimal online computational cost.
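
The two mechanisms named in the abstract, attention over a window of observations and evolutionary search over policy parameters, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-feature observations (price, renewable output), the two heuristic rules, the synthetic data, and the simple elitist search are all assumptions chosen for illustration.

```python
import numpy as np

def attention_pool(window, query):
    """Scaled dot-product attention pooling over a window of observations.

    window: (T, d) array of the last T observations; query: (d,) vector.
    Returns the pooled context (d,) and the attention weights (T,).
    """
    scores = window @ query / np.sqrt(window.shape[1])
    scores = scores - scores.max()  # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ window, weights

def select_rule(params, window, n_rules):
    """Pick one of n_rules heuristic rules from the pooled observation window."""
    d = window.shape[1]
    query = params[:d]                  # first d parameters form the query
    W = params[d:].reshape(n_rules, d)  # remaining parameters score each rule
    context, _ = attention_pool(window, query)
    return int(np.argmax(W @ context))

def evolve(fitness, dim, generations=30, pop_size=16, sigma=0.1, seed=0):
    """Simple elitist evolutionary search (hypothetical stand-in).

    Perturbs the incumbent with Gaussian noise and keeps a candidate only
    if it is strictly fitter, so the best fitness never decreases.
    """
    rng = np.random.default_rng(seed)
    best = rng.normal(size=dim)
    best_fit = fitness(best)
    history = [best_fit]
    for _ in range(generations):
        candidates = best + sigma * rng.normal(size=(pop_size, dim))
        fits = np.array([fitness(c) for c in candidates])
        i = int(np.argmax(fits))
        if fits[i] > best_fit:
            best, best_fit = candidates[i], float(fits[i])
        history.append(best_fit)
    return best, history

# Synthetic toy data (not from the paper): windows of [price, renewable output].
# Target rule 0 ("run now") if the latest price is below the window mean, else 1.
T, d, n_rules = 6, 2, 2
data_rng = np.random.default_rng(1)
windows = data_rng.uniform(0.0, 1.0, size=(20, T, d))
targets = np.array([0 if w[-1, 0] < w[:, 0].mean() else 1 for w in windows])

def toy_fitness(params):
    """Fraction of toy windows on which the policy picks the target rule."""
    picks = np.array([select_rule(params, w, n_rules) for w in windows])
    return float(np.mean(picks == targets))

best_params, history = evolve(toy_fitness, dim=d + n_rules * d)
```

Because the search only evaluates the fitness of sampled parameters, it needs no gradients, which is one reason evolutionary methods are a natural fit when actions are discrete choices among heuristic rules.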

Item Type: Article
Date Type: Published Online
Status: In Press
Schools: Schools > Engineering
Publisher: Elsevier
ISSN: 2095-8099
Date of First Compliant Deposit: 23 February 2026
Date of Acceptance: 23 December 2025
Last Modified: 23 Feb 2026 12:45
URI: https://orca.cardiff.ac.uk/id/eprint/185116
