Deep Reinforcement Learning-Based MPPT for PV Systems Under Partial Shading: A Hybrid Metaheuristic-Optimized Control Framework

Notice

This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.

Volume: 12 | Issue: 01 | Year 2026 | Subscription
International Journal of Electrical Power System and Technology
Received Date: 2025-12-03
Acceptance Date: 2026-02-20
Published On: 2026-04-03

By: TajPrakash Verma and Rohit Kumar.

Assistant Professor, Department of Electrical Engineering, B.I.E.T. Lucknow, Uttar Pradesh
Student, Department of Electrical Engineering, B.I.E.T. Lucknow, Uttar Pradesh

Abstract

Partial shading and rapidly changing environmental conditions make Maximum Power Point Tracking (MPPT) in photovoltaic (PV) arrays particularly difficult. Under these conditions, conventional algorithms often fail to find the global maximum power point, converge slowly, or exhibit steady-state oscillations. To achieve reliable, real-time MPPT under non-uniform irradiance, this work proposes a hybrid intelligent MPPT architecture that combines a deep recurrent reinforcement learning (DRRL) controller, metaheuristic-based hyperparameter optimization, and a high-fidelity digital twin training environment. The DRRL agent uses an actor–critic architecture augmented with Long Short-Term Memory (LSTM) sequence modelling to capture temporal dependencies in irradiance, temperature, and voltage–current dynamics, and it learns temporally consistent control policies from real-world and synthetic irradiance profiles. A population-based metaheuristic optimizer that incorporates evolutionary and Dandelion-inspired search automatically tunes learning rates, exploration strategies, and reward parameters to improve tracking accuracy and convergence speed while avoiding local optima. A digital twin of the PV string and power-electronics interface supports safe offline training and transfer learning to hardware-in-the-loop (HIL) configurations.
Compared with traditional and modern ML-based MPPT techniques, extensive simulation studies and HIL experiments under standardized partial-shading scenarios show that the proposed framework improves tracking efficiency, shortens transient settling time, and reduces steady-state oscillations. Finally, we discuss sample efficiency, safe exploration, and on-board implementation limits, and outline directions for future work.
Keywords: Deep reinforcement learning, partial shading, MPPT, photovoltaic systems, metaheuristic optimization.
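
The difficulty described in the abstract, that conventional trackers can miss the global optimum under partial shading, can be illustrated with a minimal sketch. It is not the authors' digital twin or controller. It assumes a hypothetical two-module series string with ideal bypass diodes and a simplified single-diode model; all parameter values (ISC_STC, VOC, A, V_BYPASS, the 40% shading level) are illustrative assumptions, and the string is swept over current rather than voltage purely for convenience. A plain hill climber stands in for a conventional perturb-and-observe style tracker.

```python
import math

# Hypothetical module parameters (illustrative only, not taken from the paper).
ISC_STC = 8.0    # short-circuit current at full irradiance, A
VOC = 37.0       # open-circuit voltage at full irradiance, V
A = 2.0          # lumped diode factor (ideality * cells * thermal voltage), V
V_BYPASS = 0.7   # forward drop of a conducting bypass diode, V
I0 = ISC_STC / (math.exp(VOC / A) - 1.0)  # saturation current implied by VOC


def module_voltage(current, irradiance):
    """Voltage of one module at a given string current (simplified model)."""
    iph = ISC_STC * irradiance  # photocurrent scales with irradiance
    if current < iph:
        return A * math.log((iph - current) / I0 + 1.0)
    return -V_BYPASS  # module cannot carry this current: bypass diode conducts


def string_power(current, irradiances):
    """Power of a series string: common current times the summed voltages."""
    return current * sum(module_voltage(current, g) for g in irradiances)


def hill_climb(powers, start):
    """Move to the better neighbour until neither neighbour improves power."""
    k = start
    while True:
        best = k
        if k + 1 < len(powers) and powers[k + 1] > powers[best]:
            best = k + 1
        if k - 1 >= 0 and powers[k - 1] > powers[best]:
            best = k - 1
        if best == k:
            return k
        k = best


# One fully lit module and one module shaded to 40% irradiance (assumed).
irradiances = [1.0, 0.4]
currents = [k * 0.01 for k in range(800)]  # 0.00 A to 7.99 A
powers = [string_power(i, irradiances) for i in currents]

# Interior grid points that are strictly higher than both neighbours.
peaks = [k for k in range(1, len(powers) - 1)
         if powers[k - 1] < powers[k] > powers[k + 1]]

local_idx = hill_climb(powers, start=1)  # start near open circuit
global_idx = max(range(len(powers)), key=powers.__getitem__)

print("local maxima at currents (A):", [round(currents[k], 2) for k in peaks])
print("hill climber stops at %.2f A, %.1f W"
      % (currents[local_idx], powers[local_idx]))
print("global scan finds    %.2f A, %.1f W"
      % (currents[global_idx], powers[global_idx]))
```

Under these assumed parameters the power curve has two local maxima, and the climber started near open circuit settles on the lower one while the exhaustive scan finds the higher one. A full scan is too slow to repeat under fast-changing irradiance, which is the gap that learned or metaheuristic trackers such as the one proposed here aim to close.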


Citation:

How to cite this article: TajPrakash Verma and Rohit Kumar. Deep Reinforcement Learning-Based MPPT for PV Systems Under Partial Shading: A Hybrid Metaheuristic-Optimized Control Framework. International Journal of Electrical Power System and Technology. 2026; 12(01): -p.

How to cite this URL: TajPrakash Verma and Rohit Kumar. Deep Reinforcement Learning-Based MPPT for PV Systems Under Partial Shading: A Hybrid Metaheuristic-Optimized Control Framework. International Journal of Electrical Power System and Technology. 2026; 12(01): -p. Available from: https://journalspub.com/publication/ijepst/article=24683
