Mathematics Seminar No. 2215: Value-Gradient Formulation for Optimal Control Problem and its Machine-Learning Algorithm

Created: 2021/11/25  龚惠英

Title: Value-Gradient Formulation for Optimal Control Problem and its Machine-Learning Algorithm

Speaker: Prof. 周翔 (City University of Hong Kong)

Time: Thursday, November 25, 2021, 10:30

Place: G507

Inviter: 余长君

Organizer: Department of Mathematics, College of Sciences

Abstract: An optimal control problem is typically cast as a nonlinear Hamilton-Jacobi-Bellman (HJB) PDE satisfied by the value function. In this talk, we motivate a focus on the gradient of the value function and derive a PDE system for this (vector-valued) gradient, the value-gradient function. The system is closed and enjoys a nice component-decoupling property. Like the linear HJB equation, the value-gradient PDE system can be solved by the method of characteristics: a single characteristic curve produces data for both the value and the value-gradient. Supplemented by this additional value-gradient data, the value function is then computed by minimizing the sum of two mean square errors between the data and the parametric function approximations. A few numerical examples show that taking the value-gradient into account improves both robustness and accuracy. Linear convergence of the iterative algorithm is proved under mild conditions. This is joint work with A. Bensoussan, P. Yam, and JY Han.
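The fitting step in the abstract, minimizing the sum of two mean square errors, can be illustrated with a minimal sketch. This is not the speaker's algorithm. Everything here is an assumption made for illustration: the 1-D toy value function V(x) = x^2, the quadratic polynomial model, the synthetic data standing in for characteristic-curve samples, and the names `fit_value_with_gradient` and `weight`. Because the model is linear in its parameters, the combined loss reduces to one stacked least-squares problem.

```python
import numpy as np

def fit_value_with_gradient(x, v, g, weight=1.0):
    """Fit V_theta(x) = t0 + t1*x + t2*x^2 by minimizing
    mean((V_theta(x) - v)^2) + weight * mean((V_theta'(x) - g)^2).

    Hypothetical helper: x are sample points, v value data, g gradient data.
    """
    n = x.size
    # Feature matrix for the value and its derivative with respect to x.
    phi = np.stack([np.ones_like(x), x, x**2], axis=1)
    dphi = np.stack([np.zeros_like(x), np.ones_like(x), 2.0 * x], axis=1)
    # Stack both residual blocks; the scaling turns the sums into means.
    A = np.vstack([phi / np.sqrt(n), np.sqrt(weight / n) * dphi])
    b = np.concatenate([v / np.sqrt(n), np.sqrt(weight / n) * g])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta

# Synthetic data (assumed): V(x) = x^2, so the gradient is 2x.
x = np.linspace(-1.0, 1.0, 21)
theta = fit_value_with_gradient(x, x**2, 2.0 * x)
print(theta)
```

With a neural-network model in place of the polynomial, the same two-term loss would be minimized by a gradient-based optimizer rather than a linear solve.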


