turkmath.org

Mathematical Events in Turkey


29 April 2022, 18:00


Bilkent University Analysis Seminars

Near Optimality of Finite Memory Policies for Partially Observed Stochastic Control Problems under Filter Stability

Ali Devran Kara
University of Michigan, United States

The talk focuses on partially observed Markov decision processes (POMDPs). In POMDPs, the existence of optimal policies has in general been established by converting the original partially observed stochastic control problem to a fully observed one on the belief space, leading to a belief-MDP. However, computing an optimal policy for this fully observed model using classical methods is challenging even if the original system has finite state and action spaces, since the state space of the fully observed belief-MDP model is always uncountable. We provide an approximation technique for POMDPs that uses a finite window of past information variables. We establish near optimality of finite window control policies in POMDPs under filter stability conditions and the assumption that the measurement and action sets are finite (and the state space is real vector valued). We also establish a rate of convergence result relating the finite window memory size to the approximation error bound; under the filter stability conditions this rate is exponential. Here, filter stability refers to the correction of an incorrectly initialized filter for a partially observed stochastic dynamical system (controlled or control-free) as measurements accumulate.
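
For orientation, here is a schematic rendering of the objects named above; the notation (hidden state X_t, measurements Y_t, actions U_t, window size N, constants C and rho) is assumed for illustration and is not taken verbatim from the talk:

```latex
% Belief state at time t: the posterior of the hidden state given the
% observation/action history; this is the state of the belief-MDP.
\pi_t(\cdot) = P\left(X_t \in \cdot \mid Y_0,\dots,Y_t,\; U_0,\dots,U_{t-1}\right)

% Finite-window approximation: keep only the last N measurements and actions.
I_t^N = \left(Y_{t-N},\dots,Y_t,\; U_{t-N},\dots,U_{t-1}\right)

% Schematic form of the near-optimality result: under filter stability the
% approximation error decays exponentially in the window size N.
\left| J^*(\pi_0) - J^N(\pi_0) \right| \le C\,\rho^N, \qquad 0 \le \rho < 1
```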

Finally, we establish the convergence of the associated Q-learning algorithm for control policies that use such a finite history of past observations and control actions (by viewing the finite window as a 'state'), and we show near optimality of the resulting limit Q functions under the filter stability condition.
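
Below is a minimal sketch in Python of the finite-memory Q-learning idea, treating the window of recent observations and actions as the 'state'; the environment interface (env.reset(), env.step()) and all parameter names are illustrative assumptions, not the authors' implementation:

```python
import random
from collections import defaultdict, deque

def finite_window_q_learning(env, actions, window=3, episodes=5000,
                             alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning over finite-window 'states' (a sketch).

    Assumes env.reset() -> obs and env.step(a) -> (obs, reward, done);
    this interface is a modeling assumption, not taken from the talk.
    """
    Q = defaultdict(float)  # keyed by (window_tuple, action)

    for _ in range(episodes):
        obs = env.reset()
        # Finite window of past (observation, action) pairs, used as a 'state'.
        hist = deque([(obs, None)], maxlen=window)
        done = False
        while not done:
            s = tuple(hist)
            # Epsilon-greedy action selection on the window-state.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda u: Q[(s, u)])
            obs, r, done = env.step(a)
            hist.append((obs, a))
            s_next = tuple(hist)
            best_next = 0.0 if done else max(Q[(s_next, u)] for u in actions)
            # Standard Q-learning update; its limit is the finite-window
            # approximate Q function discussed in the abstract.
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```

Under the filter stability conditions discussed in the talk, the limit of this update is near-optimal, with error decaying exponentially in the window size.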

While many experimental results exist for POMDPs, to our knowledge (i) the near optimality with an explicit rate of convergence (in the memory size) and its relation to filter stability, and (ii) the asymptotic convergence (to the approximate MDP value function) of such finite-memory Q-learning algorithms are new to the literature.

Joint work with Serdar Yuksel (Queen's University).


NOTE: To request the event link, please send a message to goncha@fen.bilkent.edu.tr

Optimal Control, English
Zoom

Posted by botan, 26.04.2022



CONTACT

With a simple data entry you can announce, free of charge on turkmath.org, the events of your academic unit or working group taking place in our country, the scholarships, awards, and academic job opportunities you wish to advertise, or the mathematicians you are hosting. Please do not hesitate to get in touch to obtain the information needed to log in to the system, or to share your views and suggestions. Click here for the list of contributors.

Özkan Değer ozkandeger@gmail.com

SUPPORTERS

Organizing Committee of the 31st Journees Arithmetiques Conference

If you would like to donate, become a sponsor, or place an advertisement to help cover the website's costs and keep it in service, please get in touch.


©2013-2024 turkmath.org
All rights reserved