1. 置信域方法(Trust Region Methods)
[1]将置信域方法用到强化学习中,并取到了非常好的结果.
1.1 优化问题
1.2 置信域
1.3 置信域方法的过程
References
[1] Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization[C]//International conference on machine learning. PMLR, 2015: 1889-1897.
[2] GitHub - wangshusen/DeepLearning