Monte Carlo Tree Search

Dynamic Programming and Reinforcement Learning (MA338) - University of Essex

Machine learning has become a prominent tool in data analytics. One major category, reinforcement learning (also called adaptive learning), is widely used in industry to maximize some notion of cumulative reward. This module covers the conceptual background of reinforcement learning, namely Markov decision processes (MDPs) and dynamic programming (DP). Modern reinforcement learning approaches and typical applications are also covered in the teaching and laboratory sessions.

Reinforcement learning has been studied under the heading of dynamic programming for decades. DP is built on a divide-and-conquer principle, which fits well with the computing concepts taught in MSc Optimization and Analytics and MSc Data Science. The stochastic version of DP is closely linked to stochastic processes: it likewise describes the status of a problem by stages, states and transition matrices, but it also allows decisions to be made throughout the process. The module therefore fits naturally into the current course structure and complements what we already offer by linking several topics (maths and computing, deterministic and stochastic) together.

This module can be used, at least as an option, by the computational pathway of G100. It will add one more compulsory module to the MSc Optimization and Data Analytics and one more optional module to the MSc Data Science (the largest MSc program we have so far). In fact, the module can be taken by any MSc student, with or without an optimization background. The divide-and-conquer idea behind it is easy to grasp from the very beginning (even without knowing linear programming), while the later content links largely to statistics (e.g. regression to approximate the value-to-go), machine learning (e.g. neural networks to predict what will happen in later stages) and stochastic processes (e.g. stochastic dynamic programming, where information is revealed as time goes on).
DP also has wide applications in finance, so the module can also be made available as an option for MSc Maths and Finance.