Mathematics Seminar No. 2341: Empirical Gittins Index Strategies with ε-Explorations for the Multi-armed Bandit Problem

Created: 2023/03/08 by 龚惠英

Title: Empirical Gittins Index Strategies with ε-Explorations for Multi-armed Bandit Problem

Speaker: Prof. 吴贤毅 (East China Normal University)

Time: Friday, March 10, 2023, 14:00

Place: Room F309, Main Campus

Inviter: Prof. 余长君

Organizer: Department of Mathematics, College of Sciences

Abstract: The machine learning/statistics literature has so far largely considered multi-armed bandit (MAB) problems in which the rewards from every arm are assumed to be independent and identically distributed. For more general MAB models in which every arm evolves according to a rewarded Markov process, it is well known that the optimal policy is to pull an arm with the highest Gittins index. When the underlying distributions are unknown, an empirical Gittins index rule with ε-exploration (abbreviated as the empirical ε-Gittins index rule) is proposed to solve such MAB problems. The procedure combines ε-exploration (for exploration) with empirical Gittins indices (for exploitation), the latter computed by applying the Largest-Remaining-Index algorithm to the estimated underlying distributions. Convergence of the empirical Gittins indices to the true Gittins indices, and of the expected discounted total rewards under the empirical ε-Gittins index rule to those under the oracle Gittins index rule, is established. A numerical simulation study illustrates the behavior of the proposed policies, and their performance compared with the ε-mean-reward policy is discussed.
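
The abstract only outlines the procedure, so the following is a minimal illustrative sketch (Python/NumPy, not the speaker's implementation), assuming arms with finite state spaces and known or estimated transition matrices. The function gittins_indices computes one arm's Gittins indices via the Largest-Remaining-Index algorithm, and epsilon_gittins_choice performs the ε-exploration/exploitation step; all function, variable, and parameter names here are hypothetical.

import numpy as np

def gittins_indices(P, r, beta):
    # Largest-Remaining-Index algorithm for a single arm with finite state space.
    # P: (n, n) transition matrix, r: (n,) one-step rewards, beta: discount factor in (0, 1).
    # Returns G with G[x] = Gittins index of state x, defined as
    #   G(x) = sup_{tau >= 1} E[ sum_{t < tau} beta^t r(X_t) | X_0 = x ]
    #                        / E[ sum_{t < tau} beta^t        | X_0 = x ].
    r = np.asarray(r, dtype=float)
    n = r.size
    G = np.empty(n)
    remaining = set(range(n))

    # The state with the largest one-step reward has the largest index, equal to that reward.
    top = max(remaining, key=lambda s: r[s])
    G[top] = r[top]
    cont = [top]                      # continuation set: states already known to have higher indices
    remaining.remove(top)

    # Repeatedly identify the state with the largest remaining index.
    while remaining:
        C = np.array(cont)
        Q = P[np.ix_(C, C)]                          # transitions restricted to the continuation set
        M = np.linalg.inv(np.eye(len(C)) - beta * Q)
        v = M @ r[C]                                 # discounted reward collected while staying in C
        w = M @ np.ones(len(C))                      # discounted time spent while staying in C

        best, best_val = -1, -np.inf
        for i in remaining:
            num = r[i] + beta * P[i, C] @ v          # E[ sum_{t < tau} beta^t r(X_t) | X_0 = i ]
            den = 1.0 + beta * P[i, C] @ w           # E[ sum_{t < tau} beta^t        | X_0 = i ]
            if num / den > best_val:
                best, best_val = i, num / den
        G[best] = best_val
        cont.append(best)
        remaining.remove(best)
    return G

def epsilon_gittins_choice(indices, states, eps, rng):
    # indices[k][x]: (empirical) Gittins index of arm k in state x; states[k]: current state of arm k.
    # With probability eps pick an arm uniformly at random (exploration);
    # otherwise pull the arm whose current state has the largest index (exploitation).
    K = len(indices)
    if rng.random() < eps:
        return int(rng.integers(K))
    return int(np.argmax([indices[k][states[k]] for k in range(K)]))

In the empirical rule described in the abstract, P and r would be replaced by estimates updated from the observed transitions and rewards of each arm, and the exploration probability ε could, for example, be decreased over time; the specific estimation and scheduling choices are those of the speaker's work, not of this sketch.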
