王宇新
Paper type: Conference paper
Publication date: 2006-06-21
Indexed in: EI, CPCI-S, Scopus
Volume: 2
Pages: 205-205
Keywords: reinforcement learning; MAS; actor-critic; RoboCup; function approximation
Abstract: The Actor-Critic method combines the fast convergence of value-based learning (the Critic) with the directed policy search of policy-gradient methods (the Actor), making it well suited to problems with large state spaces. In this paper, the Actor-Critic method with tile-coding linear function approximation is analysed and applied to a RoboCup simulation subtask named "Soccer Keepaway". Experiments on Soccer Keepaway show that the policy learned by the Actor-Critic method outperforms both the policy learned by value-based Sarsa(lambda) and the benchmark policies.
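To make the technique concrete, below is a minimal sketch of one-step actor-critic with tile-coding linear function approximation. This is not the paper's implementation: the environment (a hypothetical 1-D corridor with a goal at the right end, standing in for Soccer Keepaway), the tiling sizes, and all learning rates are illustrative assumptions. The critic is a linear value function over active tiles; the actor is a softmax over linear action preferences, updated with the TD error.

```python
import random, math

# Illustrative sketch only (not the paper's code): one-step actor-critic
# with tile coding on an assumed 1-D corridor task over [0, 1).
N_TILINGS = 8
N_TILES = 10                                  # tiles per tiling

def tiles(s):
    """Active tile indices for state s in [0, 1), one per tiling."""
    active = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * N_TILES)    # stagger each tiling slightly
        idx = int((s + offset) * N_TILES) % N_TILES
        active.append(t * N_TILES + idx)
    return active

N_FEATURES = N_TILINGS * N_TILES
ACTIONS = (-0.05, +0.05)                      # step left / step right

def v(w, s):
    """Critic: linear value estimate w . phi(s) over active tiles."""
    return sum(w[i] for i in tiles(s))

def policy(theta, s):
    """Actor: softmax over linear action preferences theta_a . phi(s)."""
    prefs = [sum(theta[a][i] for i in tiles(s)) for a in range(len(ACTIONS))]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def train(episodes=300, alpha_w=0.1 / N_TILINGS,
          alpha_t=0.05 / N_TILINGS, gamma=1.0, seed=0):
    rng = random.Random(seed)
    w = [0.0] * N_FEATURES
    theta = [[0.0] * N_FEATURES for _ in ACTIONS]
    for _ in range(episodes):
        s, steps = 0.1, 0
        while s < 1.0 and steps < 1000:
            probs = policy(theta, s)
            a = rng.choices(range(len(ACTIONS)), probs)[0]
            s2 = min(max(s + ACTIONS[a], 0.0), 1.0)
            done = s2 >= 1.0
            r = 0.0 if done else -1.0         # -1 per step until the goal
            delta = r + (0.0 if done else gamma * v(w, s2)) - v(w, s)
            for i in tiles(s):                # critic: TD(0) update
                w[i] += alpha_w * delta
            for b in range(len(ACTIONS)):     # actor: policy-gradient update
                grad = (1.0 if b == a else 0.0) - probs[b]
                for i in tiles(s):
                    theta[b][i] += alpha_t * delta * grad
            s, steps = s2, steps + 1
    return w, theta

w, theta = train()
# After training, the actor should prefer moving right near the start state.
p_right = policy(theta, 0.1)[1]
```

The staggered tilings give a coarse-coded, generalising state representation, so the linear critic and actor only touch a handful of weights per step; this is what makes the approach tractable for the large state spaces the abstract refers to.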