Meta Reinforcement Learning through Memory
PhD thesis, by Emilio Parisotto
Modern deep reinforcement learning (RL) algorithms, despite being at the forefront of artificial intelligence capabilities, typically require a prohibitive amount of training samples to reach a human-equivalent level of performance. This severe data inefficiency is the major obstruction to deep RL’s practical application: it is often near impossible to apply deep RL to any domain without at least a simulator available. Motivated to address this critical data inefficiency, in this thesis we work towards the design of meta-learning agents that are capable of rapidly adapting to new environments. In contrast to standard reinforcement learning, meta-learning operates over distributions of environments: specific tasks are sampled from the distribution, and the meta-learner is directly optimized to improve the speed of policy improvement on them. By exploiting a distribution of tasks which share common substructure with the tasks of interest, the meta-learner can adjust its own inductive biases to enable rapid adaptation at test time.
This thesis focuses on the design of meta-learning algorithms which exploit memory as the main mechanism driving rapid adaptation in novel environments. Meta-learning with inter-episodic memory is a class of meta-learning methods that leverage a memory architecture conditioned on the entire interaction history within a particular environment to produce a policy. The learning dynamics driving policy improvement in a particular task are thus subsumed by the computational process of the sequence model, essentially offloading the design of the learning algorithm to the architecture. While conceptually straightforward, meta-learning with inter-episodic memory is highly effective and remains a state-of-the-art method.
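The mechanism can be sketched concretely. The following is a minimal illustration in the spirit of RL²-style agents, not the architecture from this thesis; `RecurrentPolicy`, the toy environment, and all hyperparameters are invented for the example. A recurrent policy consumes (observation, previous action, previous reward) tuples, and its hidden state is deliberately carried across episode boundaries, so all within-task "learning" happens inside the recurrence rather than via gradient updates:

```python
import numpy as np

class RecurrentPolicy:
    """Tiny Elman-style recurrent policy. Its hidden state is the agent's
    memory, and is the ONLY thing that adapts while solving a task."""

    def __init__(self, obs_dim, n_actions, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (hidden, obs_dim + n_actions + 1))
        self.W_h = rng.normal(0.0, 0.1, (hidden, hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_actions, hidden))
        self.n_actions = n_actions

    def initial_state(self):
        return np.zeros(self.W_h.shape[0])

    def step(self, h, obs, prev_action, prev_reward, rng):
        # Condition on (observation, previous action, previous reward),
        # so reward information can steer behaviour within the task.
        x = np.concatenate([obs, np.eye(self.n_actions)[prev_action],
                            [prev_reward]])
        h = np.tanh(self.W_in @ x + self.W_h @ h)
        logits = self.W_out @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return h, int(rng.choice(self.n_actions, p=p))

def run_task(policy, env_reset, env_step, episodes, steps, rng):
    """Roll out several episodes of ONE sampled task. The hidden state is
    carried across episode boundaries: this is the inter-episodic memory."""
    h = policy.initial_state()
    prev_a, prev_r = 0, 0.0
    returns = []
    for _ in range(episodes):
        obs, total = env_reset(), 0.0
        for _ in range(steps):
            h, a = policy.step(h, obs, prev_a, prev_r, rng)
            obs, r = env_step(obs, a)
            prev_a, prev_r = a, r
            total += r
        returns.append(total)        # note: h is NOT reset here
    return returns

# Toy "task": a 2-armed bandit where arm 1 always pays 1 (observations are
# dummies). The meta-training outer loop that would update the weights
# across many sampled tasks is omitted.
rng = np.random.default_rng(0)
policy = RecurrentPolicy(obs_dim=1, n_actions=2)
rets = run_task(policy,
                env_reset=lambda: np.zeros(1),
                env_step=lambda obs, a: (np.zeros(1), float(a == 1)),
                episodes=3, steps=5, rng=rng)
print(len(rets))  # 3
```

Meta-training would then backpropagate through this whole multi-episode rollout, which is exactly what lets the recurrence implement a learning algorithm.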
We present and discuss several techniques for meta-learning through memory. The first part of the thesis focuses on the “embodied” class of environments, where an agent has a physical manifestation in an environment resembling the natural world. We exploit this highly structured set of environments to work towards the design of a monolithic embodied agent architecture that has the capabilities of rapid memorization, planning and state inference. In the second part of the thesis, we shift focus to methods that apply in general environments without strong common substructure. First, we re-examine the modes of interaction a meta-learning agent has with the environment, proposing to replace the typically sequential processing of interaction history with a concurrent execution framework in which multiple agents act in the environment in parallel. Next, we discuss the use of a general and powerful sequence model for inter-episodic memory, the gated transformer, demonstrating large improvements in performance and data efficiency. Finally, we develop a method that significantly reduces the training cost and acting latency of transformer models in (meta-)reinforcement learning settings, with the aim of both (1) making their use more widespread within the research community and (2) unlocking their use in real-time and latency-constrained applications, such as robotics.
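The gating idea behind the gated transformer can be illustrated with a GRU-style gate that replaces the usual residual connection `x + sublayer(x)`. This is a sketch of the commonly used GTrXL-style gate; the weight shapes, initialization, and bias value here are illustrative choices, not taken from the thesis:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUGate:
    """GRU-style gate used in place of a residual connection:
    layer output = gate(x, sublayer(x)) rather than x + sublayer(x)."""

    def __init__(self, dim, bias=2.0, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        self.Wr, self.Ur = rng.uniform(-s, s, (2, dim, dim))
        self.Wz, self.Uz = rng.uniform(-s, s, (2, dim, dim))
        self.Wg, self.Ug = rng.uniform(-s, s, (2, dim, dim))
        # A positive bias pushes the update gate towards zero at
        # initialization, so the layer starts close to the identity map,
        # which is what stabilizes transformer training in RL.
        self.bias = bias

    def __call__(self, x, y):
        """x: sublayer input (the skip path); y: sublayer output."""
        r = sigmoid(self.Wr @ y + self.Ur @ x)               # reset gate
        z = sigmoid(self.Wz @ y + self.Uz @ x - self.bias)   # update gate
        h = np.tanh(self.Wg @ y + self.Ug @ (r * x))         # candidate
        return (1.0 - z) * x + z * h

# With a large bias the gate is nearly the identity on x at initialization:
gate = GRUGate(dim=8, bias=12.0)
x = np.ones(8)
y = np.random.default_rng(1).normal(size=8)
out = gate(x, y)
print(np.allclose(out, x, atol=0.05))  # True
```

Starting near the identity means early training behaves like a shallow network, and the gate learns to mix in attention output only as it becomes useful.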
PhD thesis, by Rémi Coulom
This thesis is a study of practical methods to estimate value functions with feedforward neural networks in model-based reinforcement learning. Focus is placed on problems in continuous time and space, such as motor-control tasks. In this work, the continuous TD(lambda) algorithm is refined to handle situations with discontinuous states and controls, and the vario-eta algorithm is proposed as a simple but efficient method to perform gradient descent. The main contributions of this thesis are experimental successes that clearly indicate the potential of feedforward neural networks to estimate high-dimensional value functions. Linear function approximators have often been preferred in reinforcement learning, but their successes have been restricted to relatively simple mechanical systems, or have required a lot of prior knowledge. The method presented in this thesis was tested successfully on an original task of learning to swim by a simulated articulated robot, with 4 control variables and 12 independent state variables.
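The thesis refines a continuous-time formulation of TD(lambda); as a point of reference, the familiar discrete-time update with accumulating eligibility traces and a linear value function looks like this (a generic textbook sketch, not the continuous algorithm developed in the thesis):

```python
import numpy as np

def td_lambda(features, rewards, alpha=0.1, gamma=0.95, lam=0.9):
    """One pass of TD(lambda) with accumulating eligibility traces over a
    single trajectory, for a linear value function V(s) = w . phi(s).
    features[t] is phi(s_t); the trajectory ends in a terminal state."""
    features = np.asarray(features, dtype=float)
    w = np.zeros(features.shape[1])
    e = np.zeros_like(w)                          # eligibility trace
    for t in range(len(rewards)):
        v = features[t] @ w
        v_next = features[t + 1] @ w if t + 1 < len(features) else 0.0
        delta = rewards[t] + gamma * v_next - v   # TD error
        e = gamma * lam * e + features[t]         # decay, then accumulate
        w += alpha * delta * e                    # credit past states too
    return w

# Tiny chain s0 -> s1 -> terminal, reward 1 on the final transition:
feats = [[1.0, 0.0], [0.0, 1.0]]                  # one-hot phi(s0), phi(s1)
w = td_lambda(feats, rewards=[0.0, 1.0])
print(w[1] > w[0] > 0.0)  # True: both states gain value, s1 the most
```

The trace is what lets a single TD error update the values of all recently visited states at once, with credit decaying by gamma*lambda per step.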
Title: Reinforcement Learning for Portfolio Management
Abstract: In this thesis, we develop a comprehensive account of the expressive power, modelling efficiency, and performance advantages of so-called trading agents (i.e., Deep Soft Recurrent Q-Network (DSRQN) and Mixture of Score Machines (MSM)), based on both traditional system identification (model-based approach) as well as on context-independent agents (model-free approach). The analysis provides conclusive support for the ability of model-free reinforcement learning methods to act as universal trading agents, which are not only capable of reducing the computational and memory complexity (owing to their linear scaling with the size of the universe), but also serve as generalizing strategies across assets and markets, regardless of the trading universe on which they have been trained. The relatively low volume of daily returns in financial market data is addressed via data augmentation (a generative approach) and a choice of pre-training strategies, both of which are validated against current state-of-the-art models. For rigour, a risk-sensitive framework which includes transaction costs is considered, and its performance advantages are demonstrated in a variety of scenarios, from synthetic time-series (sinusoidal, sawtooth and chirp waves) and simulated market series (surrogate data based), through to real market data (S&P 500 and EURO STOXX 50). The analysis and simulations confirm the superiority of universal model-free reinforcement learning agents over current portfolio management models in asset allocation strategies, with an achieved performance advantage of as much as 9.2% in annualized cumulative returns and 13.4% in annualized Sharpe ratio.
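For reference, the two headline metrics can be computed from a series of daily returns as follows. This is a sketch of the standard definitions, assuming 252 trading days per year and a zero risk-free rate; it is not code from the thesis:

```python
import numpy as np

def annualized_metrics(daily_returns, periods_per_year=252):
    """Annualized compounded return and annualized Sharpe ratio from a
    series of daily simple returns (risk-free rate taken to be zero)."""
    r = np.asarray(daily_returns, dtype=float)
    growth = np.prod(1.0 + r)                     # total compounded growth
    ann_return = growth ** (periods_per_year / len(r)) - 1.0
    ann_sharpe = np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)
    return ann_return, ann_sharpe

# One synthetic "year" of noisy daily returns with a small positive drift:
rng = np.random.default_rng(0)
r = rng.normal(loc=0.0005, scale=0.01, size=252)
ann_ret, ann_sharpe = annualized_metrics(r)
print(f"annualized return {ann_ret:.1%}, Sharpe {ann_sharpe:.2f}")
```

Annualizing the Sharpe ratio by sqrt(252) assumes roughly i.i.d. daily returns; with autocorrelated strategy returns this is only an approximation.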
Comments: Imperial College London MEng Thesis 2018
Subjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG); Machine Learning (stat.ML)
Reinforcement learning (RL) has recently emerged as a generic yet powerful solution for learning complex decision-making policies, providing the key foundational underpinnings of recent successes in various domains, such as game playing and robotics. ...
In this thesis, we address these challenges in the deep reinforcement learning setting by modifying the underlying optimization problem that agents solve, incentivizing them to explore in safer or more-efficient ways.
A Reinforcement Learning Perspective. Wanqi Xue, School of Computer Science and Engineering. A thesis submitted to the Nanyang Technological University in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2023.
Recent years have witnessed tremendous successes of AI and machine learning, especially reinforcement learning (RL), in solving many decision-making and control tasks. However, many RL algorithms are still miles away from being applied to practical autonomous systems, which usually involve more complicated scenarios with model uncertainty and multiple decision-makers by nature. In this talk, I ...
Over the past decade, Deep Reinforcement Learning (RL) has driven many advances in sequential decision-making, including remarkable applications in superhuman Go-playing, robotic control, and automated algorithm discovery. However, despite these successes, deep RL is also notoriously
Deep Learning and Reward Design for Reinforcement Learning, by Xiaoxiao Guo. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan, 2017. Doctoral Committee: Professor Satinder Singh Baveja, Co-Chair; Professor Richard L. Lewis, Co-Chair
the learning process. Lastly, HER cannot be applied for sequential manipulation tasks, which significantly limits its practical application. 1.2 Research Objective. This thesis is about enabling manipulators to learn new challenging skills from sparse feedback using deep reinforcement learning algorithms. We aim
DEEP REINFORCEMENT LEARNING FOR ADAPTIVE CONTROL IN ROBOTICS By Luke Bhan Thesis Submitted to the Faculty of the Graduate School of Vanderbilt University in partial fulfillment of the requirements for the degree of MASTER of SCIENCE in Computer Science May 13, 2022 Nashville, Tennessee Approved: Gautam Biswas, Ph.D. Marcos Quinnones-Grueiro, Ph.D.
Reinforcement learning (RL) is a powerful, generic approach to discovering optimal policies in complex sequential decision-making problems. Recently, with flexible function approximators such as neural networks, RL has greatly expanded its realm of applications, from playing computer games with pixel inputs, to mastering the game of Go, to learning parkour movements by simulated humanoids.
Efficient Reinforcement Learning using Gaussian Processes. Marc Peter Deisenroth. Dissertation, November 22, 2010; revised October 23, 2011. Original version available at ... Uwe D. Hanebeck for accepting me as an external PhD student and for his longstanding support since my undergraduate student times. I am deeply grateful to my supervisor Dr. Carl ...
In this thesis we aim to improve generalisation in deep reinforcement learning. Generalisation is a fundamental challenge for any type of learning, determining how acquired knowledge can be transferred to new, previously unseen situations. We focus on reinforcement learning, a framework describing
Deep multi-agent reinforcement learning. Abstract: A plethora of real world problems, such as the control of autonomous vehicles and drones, packet delivery, and many others consists of a number of agents that need to take actions based on local observations and can thus be formulated in the multi-agent reinforcement learning (MARL) setting.
In this thesis we tackle these issues in the context of on-policy Deep Reinforcement Learning (DRL), both theoretically and algorithmically. This work addresses both the discounted and average reward criteria. In the first part of this thesis, we develop theory for average reward on-policy reinforcement learning by extending recent results
these DNNs have been applied to reinforcement learning tasks with state-of-the-art results using Deep Q-Networks (DQNs) based on the Q-Learning algorithm. However, the DQN training process is different from standard DNNs and poses significant challenges for certain reinforcement learning environments.
of developing autonomous learning. In this thesis we will present methods to increase the autonomy of reinforcement learning algorithms, i.e., learning without expert pre-engineering, by addressing the issues discussed above. The key points of our research address (1) techniques to deal with multiple conflicting reward
In this thesis, a deep multi-agent reinforcement learning (deep MARL) architecture is proposed to solve the dynamic scheduling problem (DSP). The deep reinforcement learning (DRL) algorithm is used to train the decentralized scheduling agents, to capture the relationship between information on the factory floor and scheduling objectives, with ...
Meta Reinforcement Learning through Memory. Thesis posted on 2022-12-02, authored by Emilio Parisotto.
This model represents one small but important step towards more useful dynamics models in model-based reinforcement learning. This thesis concludes with future directions on the synergy of prediction and control in MBRL, primarily focused on state-abstractions, temporal correlation, and future prediction methodologies.
PhD thesis, Faculty of Science (FNWI), Institute ... The machine learning paradigm of reinforcement learning (RL) enables learning (neural network) policies for decision making through continuous interaction with the environment. However, if the rewards that are received as feedback are sparse, improving the policy gets difficult and ...
This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling their application to complex robot-learning problems. Reinforcement learning agents are adaptive, reactive, and self-supervised. The aim of this dissertation is to extend the state of the art of reinforcement learning and enable ...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem: maximize the expected total reward with respect to the parameters of the policy. The first part of the thesis is concerned with making policy gradient methods ... Reinforcement learning can be viewed as a special case of optimizing an expectation,