Actor Double Critic Architecture for Dialogue System

Publication year: 1402 (2023)
Document type: Journal article
Language: English
Views: 97

The full text of this article is available as a 10-page PDF.

National scientific document ID: JR_JECEI-11-2_011

Indexing date: 4 Tir 1402 (June 25, 2023)

Abstract:

Background and Objectives: Most recent dialogue policy learning methods are based on reinforcement learning (RL). However, basic RL algorithms such as the deep Q-network (DQN) have drawbacks in environments with large state and action spaces, such as dialogue systems. Most policy-based methods are slow because they estimate the action value by computing the sum of discounted rewards for each action. In value-based RL methods, function approximation errors lead to overestimation in value estimation and, ultimately, suboptimal policies. Some works try to resolve these problems by combining RL methods, but most of them were applied in game environments or focused only on combining DQN variants. This paper presents, for the first time in a dialogue system, a method named Double Actor-Critic (DAC) that combines actor-critic and double DQN and significantly improves the stability, speed, and performance of dialogue policy learning.

Methods: In the actor-critic part, to overcome the slow learning of plain DQN, the critic approximates the value function and evaluates the quality of the policy used by the actor, so the actor can learn the policy faster. To overcome the overestimation issue of DQN, double DQN is employed. Finally, for smoother updates, a heuristic loss is introduced that chooses the minimum of the actor-critic loss and the double DQN loss.

Results: Experiments on a movie ticket booking task show that the proposed method learns more stably, without the performance drop that follows overestimation, and reaches the learning threshold in fewer training episodes.

Conclusion: Unlike previous works that mostly proposed combinations of DQN variants, this study combines DQN variants with actor-critic to benefit from both policy-based and value-based RL methods and to overcome their two main issues: slow learning and overestimation. Experimental results show that the proposed method can hold a more accurate conversation with a user as a dialogue policy learner.
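Since the abstract names the key mechanism (taking the minimum of the actor-critic loss and the double-DQN loss), a minimal sketch may help make it concrete. The sketch below is written in PyTorch and is illustrative only, not the authors' implementation: the QNet architecture, the dac_loss helper, the smooth-L1 TD loss, and the TD-advantage weighting of the actor term are all assumptions; only the final torch.min over the two losses reflects the heuristic loss described in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Small MLP used here for both the actor (as action logits) and the
    critic (as Q-values); an assumption, not the paper's architecture."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def dac_loss(actor, critic, critic_target, batch, gamma=0.99):
    """Hypothetical DAC update: min(actor-critic loss, double-DQN loss)."""
    s, a, r, s_next, done = batch  # tensors; a holds integer action ids

    # Double DQN target: the online critic selects the next action,
    # the target critic evaluates it (this decoupling curbs overestimation).
    with torch.no_grad():
        next_a = critic(s_next).argmax(dim=1, keepdim=True)
        q_next = critic_target(s_next).gather(1, next_a).squeeze(1)
        td_target = r + gamma * (1.0 - done) * q_next

    q_sa = critic(s).gather(1, a.unsqueeze(1)).squeeze(1)
    ddqn_loss = F.smooth_l1_loss(q_sa, td_target)

    # Actor-critic term: policy gradient weighted by the TD advantage,
    # so the actor learns from the critic's evaluation instead of
    # waiting for full discounted returns.
    log_pi = F.log_softmax(actor(s), dim=1).gather(1, a.unsqueeze(1)).squeeze(1)
    advantage = (td_target - q_sa).detach()
    ac_loss = -(log_pi * advantage).mean()

    # Heuristic combination from the abstract: back-propagate whichever
    # of the two losses is currently smaller, for a smoother update.
    return torch.min(ddqn_loss, ac_loss)
```

In a training loop, dac_loss would be computed on each minibatch sampled from a replay buffer and minimized with a standard optimizer, with critic_target periodically synced from critic as in ordinary double DQN.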

Authors

Y. Saffari

Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran.

J. Salimi Sartakhti

Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran.
