A theoretical demonstration for reinforcement learning of PI control dynamics for optimal speed control of DC motors by using Twin Delay Deep Deterministic Policy Gradient Algorithm

dc.authorid: Alagoz, Baris Baykant/0000-0001-5238-6433
dc.authorid: Herencsar, Norbert/0000-0002-9504-2275
dc.authorid: Kavuran, Gurkan/0000-0003-2651-5005
dc.authorwosid: Alagoz, Baris Baykant/ABG-8526-2020
dc.authorwosid: Herencsar, Norbert/A-6539-2009
dc.authorwosid: Kavuran, Gurkan/S-6935-2016
dc.contributor.author: Tufenkci, Sevilay
dc.contributor.author: Alagoz, Baris Baykant
dc.contributor.author: Kavuran, Gurkan
dc.contributor.author: Yeroglu, Celaleddin
dc.contributor.author: Herencsar, Norbert
dc.contributor.author: Mahata, Shibendu
dc.date.accessioned: 2024-08-04T20:53:08Z
dc.date.available: 2024-08-04T20:53:08Z
dc.date.issued: 2023
dc.department: İnönü Üniversitesi (en_US)
dc.description.abstract: To benefit from the advantages of Reinforcement Learning (RL) in industrial control applications, RL methods can be used for optimal tuning of classical controllers based on simulation scenarios of operating conditions. In this study, the Twin Delay Deep Deterministic (TD3) policy gradient method, an effective actor-critic RL strategy, is implemented to learn optimal Proportional Integral (PI) controller dynamics from a Direct Current (DC) motor speed control simulation environment. For this purpose, the PI controller dynamics are introduced to the actor network by using the PI-based observer states from the control simulation environment. A suitable Simulink simulation environment is adapted to perform the training process of the TD3 algorithm. The actor network learns the optimal PI controller dynamics by using a reward mechanism that implements the minimization of the optimal control objective function. A setpoint filter is used to describe the desired setpoint response, and step disturbance signals with random amplitude are incorporated into the simulation environment to improve disturbance rejection skills through experience-based learning. When the training task is completed, the optimal PI controller coefficients are obtained from the weight coefficients of the actor network. The performances of the optimal PI dynamics learned by the TD3 algorithm and by the Deep Deterministic Policy Gradient algorithm are compared. Moreover, the control performance improvement of this RL-based PI controller tuning method (RL-PI) is demonstrated relative to the performances of both integer-order and fractional-order PI controllers tuned by several popular metaheuristic optimization algorithms such as Genetic Algorithm, Particle Swarm Optimization, Grey Wolf Optimization, and Differential Evolution. (en_US)
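The abstract describes an actor network that acts on PI-based observer states, so that after training its weights can be read off as the PI coefficients. The idea can be sketched as follows; this is a minimal illustration, not the paper's implementation: the first-order DC motor model (gain K, time constant tau), the specific gains, and the function name `simulate_pi` are all hypothetical, and the TD3 training loop and Simulink environment are omitted entirely — the "actor weights" here simply stand in for learned PI gains.

```python
# Sketch: a linear "actor" over the PI observer states [e, integral of e]
# is exactly a PI control law, so its two weights are (Kp, Ki).
# Plant: assumed first-order DC motor speed model, dw/dt = (-w + K*u)/tau.

def simulate_pi(actor_weights, setpoint=1.0, dt=0.01, steps=2000):
    kp, ki = actor_weights        # actor weights play the role of PI gains
    K, tau = 2.0, 0.5             # hypothetical motor gain and time constant
    speed, integ = 0.0, 0.0
    for _ in range(steps):
        e = setpoint - speed      # observer state 1: tracking error
        integ += e * dt           # observer state 2: integral of error
        u = kp * e + ki * integ   # linear actor output = PI control signal
        speed += dt * (-speed + K * u) / tau  # Euler step of motor dynamics
    return speed

# With stable gains the closed loop settles to the setpoint.
final_speed = simulate_pi((2.0, 5.0))
```

In the paper's scheme, TD3 would adjust these two weights through the reward signal; here they are fixed only to show how a trained linear actor maps directly onto a PI controller.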
dc.identifier.doi: 10.1016/j.eswa.2022.119192
dc.identifier.issn: 0957-4174
dc.identifier.issn: 1873-6793
dc.identifier.scopus: 2-s2.0-85141914275 (en_US)
dc.identifier.scopusquality: Q1 (en_US)
dc.identifier.uri: https://doi.org/10.1016/j.eswa.2022.119192
dc.identifier.uri: https://hdl.handle.net/11616/100993
dc.identifier.volume: 213 (en_US)
dc.identifier.wos: WOS:000890664400010 (en_US)
dc.identifier.wosquality: Q1 (en_US)
dc.indekslendigikaynak: Web of Science (en_US)
dc.indekslendigikaynak: Scopus (en_US)
dc.language.iso: en (en_US)
dc.publisher: Pergamon-Elsevier Science Ltd (en_US)
dc.relation.ispartof: Expert Systems With Applications (en_US)
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institution Faculty Member (en_US)
dc.rights: info:eu-repo/semantics/closedAccess (en_US)
dc.subject: Deep reinforcement learning (en_US)
dc.subject: DC motor (en_US)
dc.subject: PI controller (en_US)
dc.subject: Twin-delayed deep deterministic policy gradient (en_US)
dc.subject: Metaheuristic optimization (en_US)
dc.title: A theoretical demonstration for reinforcement learning of PI control dynamics for optimal speed control of DC motors by using Twin Delay Deep Deterministic Policy Gradient Algorithm (en_US)
dc.type: Article (en_US)
