• DSpace@MIT Home
  • MIT Libraries
  • Doctoral Theses


Data Efficient Reinforcement Learning

dc.contributor.advisor: Shah, Devavrat
dc.contributor.author: Xu, Zhi
dc.date.accessioned: 2022-01-14T14:38:37Z
dc.date.available: 2022-01-14T14:38:37Z
dc.date.issued: 2021-06
dc.date.submitted: 2021-06-23T19:41:22.526Z
dc.identifier.uri: https://hdl.handle.net/1721.1/138930
dc.description.abstract: Reinforcement learning (RL) has recently emerged as a generic yet powerful solution for learning complex decision-making policies, providing the key foundational underpinnings of recent successes in various domains, such as game playing and robotics. However, many state-of-the-art algorithms are data-hungry and computationally expensive, requiring large amounts of data to succeed. While this is feasible in certain scenarios, in applications arising in the social sciences and healthcare, for example, where available data is sparse, collecting more data can be costly or infeasible. With surging interest in applying RL to broader domains, it is imperative to develop an informed view of how data is used in algorithmic design. This thesis therefore studies the data efficiency of RL from a structural perspective. Advancing this direction requires us to understand when and why algorithms succeed in the first place and, building on that understanding, to further improve the data efficiency of RL.

To this end, the thesis begins by taking inspiration from empirical successes. We consider the popular use of simulation-based Monte Carlo Tree Search (MCTS) in RL, as exemplified by the remarkable achievement of AlphaGo Zero, and probe the data efficiency of incorporating this key ingredient. Specifically, we investigate the correct form in which to utilize such a tree structure for estimating values and characterize the corresponding data complexity. These results further enable us to analyze the data complexity of an RL algorithm that combines MCTS with supervised learning, as done in AlphaGo Zero.

Having developed this understanding, we next improve the algorithmic design of simulation-based, data-efficient RL algorithms that have access to a generative model, for both bounded and unbounded spaces. Our first contribution is a structural framework built on a novel lens of low-rank representation of the Q-function. The proposed data-efficient RL algorithm exploits the low-rank structure to perform pseudo-exploration by querying/simulating only a selected subset of state-action pairs, via a new matrix estimation technique. Remarkably, this leads to a significant (exponential) improvement in data complexity. Turning to unbounded spaces, one must first address the unique conceptual challenges posed by unbounded domains. Inspired by classical queueing systems, we propose an appropriate notion of stability for quantifying the "goodness" of policies. By leveraging the stability structure of the underlying systems, we then design efficient, adaptive algorithms, with a modified, efficient Monte Carlo oracle, that guarantee the desired stability with a favorable data complexity, polynomial in the parameters of interest. Altogether, through new analytical tools and structural frameworks, this thesis contributes to the design and analysis of data-efficient RL algorithms.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Data Efficient Reinforcement Learning
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

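The abstract above credits much of the improvement in data complexity to a low-rank view of the Q-function: query a Monte Carlo oracle at only a selected subset of state-action pairs, then recover the rest by matrix estimation. The thesis's own estimator is not reproduced in this record; as a rough illustration of the underlying idea only, the following Python sketch completes a synthetic low-rank Q matrix from roughly 30% of its entries using an iterative hard rank-r SVD projection. All sizes, noise levels, and names are illustrative assumptions.

    import numpy as np

    def complete_rank_r(Q_obs, mask, rank, n_iters=200):
        """Iteratively impute missing entries with a hard rank-r SVD projection."""
        Q_hat = np.zeros_like(Q_obs)
        for _ in range(n_iters):
            filled = np.where(mask, Q_obs, Q_hat)   # keep queried entries, impute the rest
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            Q_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        return Q_hat

    rng = np.random.default_rng(0)
    S, A, r = 50, 20, 3
    Q_true = rng.normal(size=(S, r)) @ rng.normal(size=(r, A))   # rank-3 "Q-function"
    mask = rng.random((S, A)) < 0.3                              # query only ~30% of (s, a) pairs
    Q_obs = np.where(mask, Q_true + 0.01 * rng.normal(size=(S, A)), 0.0)  # noisy oracle estimates

    Q_hat = complete_rank_r(Q_obs, mask, rank=r)
    print("relative error:", np.linalg.norm(Q_hat - Q_true) / np.linalg.norm(Q_true))

On this toy instance the reconstruction error is typically small even though most entries are never queried, which is the sense in which low-rank structure buys data efficiency.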


2022 PhD Thesis Award: Kaiqing Zhang, "Reinforcement Learning for Multi-Agent and Robust Control Systems: Towards Large-scale and Reliable Autonomy"



Sample-Efficient Deep Reinforcement Learning for Continuous Control


Reinforcement learning (RL) is a powerful, generic approach to discovering optimal policies in complex sequential decision-making problems. Recently, with flexible function approximators such as neural networks, RL has greatly expanded its realm of applications, from playing computer games with pixel inputs, to mastering the game of Go, to learning parkour movements by simulated humanoids. However, the common RL approaches are known to be sample intensive, making them difficult to apply to real-world problems such as robotics. This thesis makes several contributions toward developing RL algorithms for learning in the wild, where sample-efficiency and stability are critical. The key contributions include Normalized Advantage Functions (NAF), extending Q-learning to continuous action problems; Interpolated Policy Gradient (IPG), unifying prior policy gradient algorithm variants through theoretical analyses of bias and variance; and Temporal Difference Models (TDM), interpreting a parameterized Q-function as a generalized dynamics model for novel temporally abstracted model-based planning. Importantly, this thesis highlights that these algorithms can be seen as bridging gaps between branches of RL: model-based with model-free, and on-policy with off-policy. The proposed algorithms not only achieve substantial improvements over prior approaches, but also provide novel perspectives on how to mix different branches of RL effectively to gain the best of both worlds. NAF has subsequently been shown to be able to train two 7-DoF robot arms to open doors using only 2.5 hours of real-world experience, making it one of the first demonstrations of deep RL approaches on real robots.
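Among these, NAF has a particularly compact core idea that is easy to show concretely: the Q-function is constrained to be quadratic in the action, so the greedy action is available in closed form as the network's mu(s) head (Gu et al., 2016). The numpy sketch below evaluates that decomposition for a single state, with the three network heads stubbed by fixed arrays; the values are illustrative assumptions, not taken from the thesis.

    import numpy as np

    def naf_q(a, v, mu, L):
        """Q(s, a) = V(s) - 0.5 * (a - mu)^T P (a - mu), with P = L @ L.T positive semidefinite."""
        P = L @ L.T
        d = a - mu
        return v - 0.5 * d @ P @ d

    # Stubbed outputs of the three network heads at some state s (illustrative values only).
    v = 1.7                                    # V(s): state-value head
    mu = np.array([0.3, -0.1])                 # mu(s): greedy-action head
    L = np.array([[0.9, 0.0],                  # L(s): lower-triangular Cholesky-factor head
                  [0.2, 1.1]])

    print(naf_q(mu, v, mu, L))                   # 1.7: Q is maximized at a = mu(s)
    print(naf_q(np.array([1.0, 1.0]), v, mu, L)) # lower for any other action

Because the maximizing action is mu(s) by construction, the max over actions in the Q-learning target never requires a numerical optimization, which is what makes this continuous-action extension of Q-learning practical.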


  • AI Collection
  • Oxford Thesis Collection

Deep multi-agent reinforcement learning

A plethora of real-world problems, such as the control of autonomous vehicles and drones, packet delivery, and many others, consist of a number of agents that need to take actions based on local observations, and can thus be formulated in the multi-agent reinforcement learning (MARL) setting. Furthermore, as more machine learning systems are deployed in the real world, they will start having an impact on each other, effectively turning most decision-making problems into multi-agent proble...


  • DeepMARL.pdf (PDF, 4.8 MB)


Nanyang Technological University

Title: Deep reinforcement learning-based dynamic scheduling
Authors: Liu, R.
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Liu, R. (2022). Deep reinforcement learning-based dynamic scheduling. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/158353
Abstract: Attempts to address the production scheduling problem have thus far relied on simplifying assumptions, such as a static environment and a fixed problem size, which compromise schedule performance in practice because of the many unpredictable disruptions to the system. Thus, the study of scheduling in the presence of real-time events, termed dynamic scheduling, continues to attract attention given the agility, flexibility, and timeliness modern production systems must deliver. Additionally, the changing nature of manufacturing systems raises new challenges for existing scheduling strategies. At the front end, the development of advanced data creation and exchange frameworks, such as the Internet of Things and cyber-physical systems, and their application to industrial environments have created an abundance of industrial data, while at the back end, edge and cloud computing technologies greatly enhance the capacity to process that data. Industrial data must be mined and analyzed so that the investment in infrastructure is not wasted and the production system is managed more effectively and in real time. Many data-driven technologies have been adopted in scheduling research; a promising candidate among them is reinforcement learning (RL), which can build a direct mapping from observations of the environment to actions that improve performance. In this thesis, a deep multi-agent reinforcement learning (deep MARL) architecture is proposed to solve the dynamic scheduling problem (DSP). A deep reinforcement learning (DRL) algorithm is used to train the decentralized scheduling agents to capture the relationship between information on the factory floor and scheduling objectives, with the aim of making real-time decisions for a manufacturing system with frequent unexpected events. Two major aspects of applying deep MARL to the DSP are addressed in this work: the conversion from the traditional static scheduling problem (SSP) to dynamic scheduling in a practical context, and the adaptation of existing deep MARL algorithms to solve the scheduling problem in such an environment. Some impractical constraints of traditional studies are removed to create a research context closer to actual practice, resulting in a scheduling problem of variable size and scope. Specialized state and action representations that can handle the ever-changing problem specification are developed, and the criteria for feature selection in a dynamic environment are discussed. Recent progress in DRL and MARL research is integrated into the proposed approach after selection and adaptation. In addition, various improvements to the common deep MARL architecture are proposed, including a lightweight multilayer perceptron (MLP) encoder that efficiently handles unstructured industrial data, a training scheme under the multi-agent architecture that improves the stability of training and overall performance, and knowledge-based reward-shaping techniques that decompose the joint reward signal into individual utilities to speed up learning and encourage cooperative behavior between agents. Simulation studies are then conducted for ablation and validation. In the first stage, the performance of the proposed approach, either as individual components or as an integrated model, is tested in iterative simulation runs, within each of which a unique production instance is created; a set of DRL-based approaches from recent publications is run in parallel.
Results suggest that the contribution of each improvement is significant, and the integrated architecture delivers stronger performance than peer DRL-based approaches. For validation, a set of priority rules that perform strongly in specific contexts and are widely applied in actual production scheduling is used as the benchmark. The proposed approach also provides a performance gain over the strongest rule, with a minor increase in computation cost and negligible latency in decision-making.
URI: https://hdl.handle.net/10356/158353
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext

Files in This Item:

  • 11.98 MB, Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.
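One component the abstract singles out is knowledge-based reward shaping that decomposes the joint reward signal into individual utilities. The thesis's actual decomposition is not given in this record, so the Python sketch below substitutes a simple proportional-contribution rule over a tardiness-improvement joint reward; the agents, numbers, and utility rule are all assumptions for illustration.

    # All quantities below are illustrative assumptions, not the thesis's reward design.

    def joint_reward(tardiness_before, tardiness_after):
        # Joint scheduling objective: reduction in total tardiness after a decision round.
        return tardiness_before - tardiness_after

    def individual_utilities(contributions, joint):
        # Split the joint reward in proportion to each agent's local contribution.
        total = sum(contributions.values()) or 1.0   # guard against all-zero contributions
        return {agent: joint * c / total for agent, c in contributions.items()}

    # Three machine agents, each reducing tardiness in its local queue by some amount.
    contribs = {"machine_1": 4.0, "machine_2": 1.0, "machine_3": 0.0}
    r_joint = joint_reward(tardiness_before=20.0, tardiness_after=15.0)
    print(individual_utilities(contribs, r_joint))
    # -> {'machine_1': 4.0, 'machine_2': 1.0, 'machine_3': 0.0}

Giving each agent a utility tied to its own contribution, rather than the raw joint reward, is one standard way to reduce the credit-assignment noise that slows multi-agent learning.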

Carnegie Mellon University

Meta Reinforcement Learning through Memory

Modern deep reinforcement learning (RL) algorithms, despite being at the forefront of artificial intelligence capabilities, typically require a prohibitive number of training samples to reach a human-equivalent level of performance. This severe data inefficiency is the major obstruction to deep RL's practical application: it is often nearly impossible to apply deep RL to any domain without at least a simulator available. Motivated to address this critical data inefficiency, in this thesis we work towards the design of meta-learning agents that are capable of rapidly adapting to new environments. In contrast to standard reinforcement learning, meta-learning operates over distributions of environments, from which specific tasks are sampled, and the meta-learner is directly optimized to improve the speed of policy improvement on those tasks. By exploiting a distribution of tasks which share common substructure with the tasks of interest, the meta-learner can adjust its own inductive biases to enable rapid adaptation at test time.

This thesis focuses on the design of meta-learning algorithms which exploit memory as the main mechanism driving rapid adaptation in novel environments. Meta-learning with inter-episodic memory is a class of meta-learning methods that leverage a memory architecture conditioned on the entire interaction history within a particular environment to produce a policy. The learning dynamics driving policy improvement in a particular task are thus subsumed by the computational process of the sequence model, essentially offloading the design of the learning algorithm to the architecture. While conceptually straightforward, meta-learning using inter-episodic memory is highly effective and remains a state-of-the-art method.

We present and discuss several techniques for meta-learning through memory. The first part of the thesis focuses on the "embodied" class of environments, where an agent has a physical manifestation in an environment resembling the natural world. We exploit this highly structured set of environments to work towards the design of a monolithic embodied agent architecture with the capabilities of rapid memorization, planning, and state inference. In the second part of the thesis, we turn to methods that apply in general environments without strong common substructure. First, we re-examine the modes of interaction a meta-learning agent has with the environment, proposing to replace the typically sequential processing of interaction history with a concurrent execution framework in which multiple agents act in the environment in parallel. Next, we discuss the use of a general and powerful sequence model for inter-episodic memory, the gated transformer, demonstrating large improvements in performance and data efficiency. Finally, we develop a method that significantly reduces the training cost and acting latency of transformer models in (meta-)reinforcement learning settings, with the aim of both (1) making their use more widespread within the research community and (2) unlocking their use in real-time and latency-constrained applications, such as robotics.
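To make the inter-episodic memory mechanism concrete, here is a toy, untrained Python sketch in the spirit of RL^2-style memory agents, not the thesis's architecture: a GRU hidden state is reset at task boundaries but persists across episodes within a task, so the recurrence itself can implement fast adaptation. The bandit task, the minimal GRU cell, and the random weights are stand-in assumptions; meta-training of the weights is omitted.

    import numpy as np

    rng = np.random.default_rng(1)

    def gru_cell(x, h, W, U, b):
        # Minimal GRU update; W, U, b pack the update/reset/candidate weights.
        z = 1.0 / (1.0 + np.exp(-(W[0] @ x + U[0] @ h + b[0])))   # update gate
        r = 1.0 / (1.0 + np.exp(-(W[1] @ x + U[1] @ h + b[1])))   # reset gate
        h_new = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])         # candidate state
        return (1.0 - z) * h + z * h_new

    n_arms, hidden = 2, 8
    W = rng.normal(0.0, 0.3, (3, hidden, n_arms + 1))   # input: one-hot prev action + prev reward
    U = rng.normal(0.0, 0.3, (3, hidden, hidden))
    b = np.zeros((3, hidden))
    W_pi = rng.normal(0.0, 0.3, (n_arms, hidden))       # linear policy head

    for task in range(3):                    # new bandit task: memory is reset
        p_arms = rng.random(n_arms)          # hidden reward probabilities of this task
        h = np.zeros(hidden)
        a, rew = 0, 0.0
        for episode in range(5):             # memory persists across episodes within the task
            for t in range(10):
                x = np.zeros(n_arms + 1)
                x[a], x[-1] = 1.0, rew       # feed back previous action and reward
                h = gru_cell(x, h, W, U, b)
                logits = W_pi @ h
                probs = np.exp(logits - logits.max())
                probs /= probs.sum()
                a = int(rng.choice(n_arms, p=probs))
                rew = float(rng.random() < p_arms[a])

The key design point is visible in the loop structure: because h carries reward history across episode boundaries, a meta-trained version of these weights can encode an exploration-then-exploitation strategy entirely inside the recurrent dynamics.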

Degree Type

  • Dissertation

Department

  • Machine Learning

Degree Name

  • Doctor of Philosophy (PhD)

Categories

  • Artificial Intelligence and Image Processing

CC BY 4.0

Reinforcement Learning Using Neural Networks, with Applications to Motor Control

PhD thesis, by Rémi Coulom

This thesis is a study of practical methods for estimating value functions with feedforward neural networks in model-based reinforcement learning. Focus is placed on problems in continuous time and space, such as motor-control tasks. In this work, the continuous TD(lambda) algorithm is refined to handle situations with discontinuous states and controls, and the vario-eta algorithm is proposed as a simple but efficient method to perform gradient descent. The main contributions of this thesis are experimental successes that clearly indicate the potential of feedforward neural networks to estimate high-dimensional value functions. Linear function approximators have often been preferred in reinforcement learning, but their success has been restricted to relatively simple mechanical systems, or has required substantial prior knowledge. The method presented in this thesis was tested successfully on an original task: learning to swim with a simulated articulated robot, with 4 control variables and 12 independent state variables.
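As a concrete, much-simplified illustration of the family of methods the abstract describes, the Python sketch below runs discrete-time TD(lambda) with eligibility traces over the parameters of a tiny feedforward value network on a random-walk task. It is an assumption-laden stand-in: neither the continuous-time TD(lambda) refinement nor the vario-eta gradient method from the thesis is implemented here.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, hidden = 10, 16                     # random walk on states 0..9 (0 and 9 terminal)
    W1 = rng.normal(0.0, 0.5, (hidden, n_states))
    W2 = rng.normal(0.0, 0.5, hidden)
    gamma, lam, alpha = 0.99, 0.8, 0.05

    def value_and_grads(s):
        x = np.eye(n_states)[s]                   # one-hot state encoding
        z = np.tanh(W1 @ x)
        v = W2 @ z
        dW2 = z                                   # dV/dW2
        dW1 = np.outer(W2 * (1.0 - z**2), x)      # dV/dW1, backprop through tanh
        return v, dW1, dW2

    for episode in range(500):
        s = n_states // 2
        e1, e2 = np.zeros_like(W1), np.zeros_like(W2)   # eligibility traces over parameters
        while 0 < s < n_states - 1:
            s_next = s + rng.choice([-1, 1])
            r = 1.0 if s_next == n_states - 1 else 0.0  # reward only at the right end
            v, dW1, dW2 = value_and_grads(s)
            v_next = 0.0 if s_next in (0, n_states - 1) else value_and_grads(s_next)[0]
            delta = r + gamma * v_next - v              # TD error
            e1 = gamma * lam * e1 + dW1                 # decay and accumulate traces
            e2 = gamma * lam * e2 + dW2
            W1 += alpha * delta * e1
            W2 += alpha * delta * e2
            s = s_next

    # Learned values should increase from the left terminal towards the rewarded right terminal.
    print([round(float(value_and_grads(s)[0]), 2) for s in range(1, n_states - 1)])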

(Only the first pages of the thesis are in French; the rest is in English.)

For those who cannot run the win32 demos below, some avi movies demonstrating the movements of swimmers (DivX codec required):

A few interactive (win32) swimmer demos (click in the window to change swimming direction):

  • swimmer3.exe
  • swimmer4-Slow.exe
  • swimmer4-Fast.exe
  • sw5-l0.exe (beginner)
  • sw5-l5.exe (expert)
  • sw5-l6.exe (performance drop)

Source code of the swimmer simulator:

  • swimmer.tar.bz2
  • RARSRLDemo.exe



  • Corpus ID: 60875658

Reinforcement learning for robots using neural networks

  • Longxin Lin
  • Published 1992
  • Computer Science, Engineering


1,012 Citations



Quantitative Finance > Portfolio Management

Title: Reinforcement Learning for Portfolio Management

Abstract: In this thesis, we develop a comprehensive account of the expressive power, modelling efficiency, and performance advantages of so-called trading agents (i.e., the Deep Soft Recurrent Q-Network (DSRQN) and the Mixture of Score Machines (MSM)), based on both traditional system identification (the model-based approach) and context-independent agents (the model-free approach). The analysis provides conclusive support for the ability of model-free reinforcement learning methods to act as universal trading agents, which are not only capable of reducing computational and memory complexity (owing to their linear scaling with the size of the universe), but also serve as generalizing strategies across assets and markets, regardless of the trading universe on which they have been trained. The relatively low volume of daily returns in financial market data is addressed via data augmentation (a generative approach) and a choice of pre-training strategies, both of which are validated against current state-of-the-art models. For rigour, a risk-sensitive framework which includes transaction costs is considered, and its performance advantages are demonstrated in a variety of scenarios, from synthetic time-series (sinusoidal, sawtooth and chirp waves) and simulated market series (based on surrogate data) through to real market data (S&P 500 and EURO STOXX 50). The analysis and simulations confirm the superiority of universal model-free reinforcement learning agents over current portfolio management models in asset allocation strategies, with an achieved performance advantage of as much as 9.2% in annualized cumulative returns and 13.4% in annualized Sharpe ratio.
Comments: Imperial College London MEng Thesis 2018
Subjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1909.09571 [q-fin.PM]
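For context on the two headline numbers in the abstract, the Python sketch below computes an annualized cumulative return and an annualized Sharpe ratio on synthetic daily portfolio returns, net of a simple proportional transaction cost. The cost model, turnover series, and all figures are assumptions, not the thesis's evaluation pipeline.

    import numpy as np

    rng = np.random.default_rng(0)
    T, cost_rate = 252, 1e-3                      # one trading year; 10 bps proportional cost
    daily_ret = rng.normal(5e-4, 0.01, T)         # synthetic gross daily portfolio returns
    turnover = rng.uniform(0.0, 0.2, T)           # fraction of the book traded each day
    net_ret = daily_ret - cost_rate * turnover    # net-of-cost daily returns

    ann_cum_return = np.prod(1.0 + net_ret) - 1.0
    ann_sharpe = np.sqrt(252) * net_ret.mean() / net_ret.std(ddof=1)
    print(f"annualized cumulative return: {ann_cum_return:.2%}")
    print(f"annualized Sharpe ratio: {ann_sharpe:.2f}")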



COMMENTS

  1. Data Efficient Reinforcement Learning

Reinforcement learning (RL) has recently emerged as a generic yet powerful solution for learning complex decision-making policies, providing the key foundational underpinnings of recent successes in various domains, such as game playing and robotics. ... Xu-zhihu-PhD-EECS-2021-thesis.pdf (Size: 16.46 MB, Format: PDF, Description: Thesis PDF)

  2. Exploration and Safety in Deep Reinforcement Learning

    In this thesis, we address these challenges in the deep reinforcement learning setting by modifying the underlying optimization problem that agents solve, incentivizing them to explore in safer or more-efficient ways.

  3. PDF Robust and Adaptive Decision-Making: A Reinforcement Learning Perspective

A Reinforcement Learning Perspective Wanqi Xue School of Computer Science and Engineering A thesis submitted to the Nanyang Technological University in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2023. Statement of Originality

  4. 2022 PhD Thesis Award: Kaiqing Zhang, "Reinforcement Learning for Multi

    Recent years have witnessed tremendous successes of AI and machine learning, especially reinforcement learning (RL), in solving many decision-making and control tasks. However, many RL algorithms are still miles away from being applied to practical autonomous systems, which usually involve more complicated scenarios with model uncertainty and multiple decision-makers by nature. In this talk, I ...

  5. Towards efficient and robust reinforcement learning via synthetic

    Over the past decade, Deep Reinforcement Learning (RL) has driven many advances in sequential decision-making, including remarkable applications in superhuman Go-playing, robotic control, and automated algorithm discovery. However, despite these successes, deep RL is also notoriously

  6. PDF Deep Learning and Reward Design for Reinforcement Learning

Deep Learning and Reward Design for Reinforcement Learning by Xiaoxiao Guo A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan 2017 Doctoral Committee: Professor Satinder Singh Baveja, Co-Chair Professor Richard L. Lewis, Co-Chair

  7. PDF Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse

the learning process. Lastly, HER cannot be applied for sequential manipulation tasks, which significantly limits its practical application. 1.2 Research Objective This thesis is about enabling manipulators to learn new challenging skills from sparse feedback using deep reinforcement learning algorithms. We aim

  8. PDF Deep Reinforcement Learning for Adaptive Control In Robotics

    DEEP REINFORCEMENT LEARNING FOR ADAPTIVE CONTROL IN ROBOTICS By Luke Bhan Thesis Submitted to the Faculty of the Graduate School of Vanderbilt University in partial fulfillment of the requirements for the degree of MASTER of SCIENCE in Computer Science May 13, 2022 Nashville, Tennessee Approved: Gautam Biswas, Ph.D. Marcos Quinnones-Grueiro, Ph.D.

  9. Sample-Efficient Deep Reinforcement Learning for Continuous Control

    Reinforcement learning (RL) is a powerful, generic approach to discovering optimal policies in complex sequential decision-making problems. Recently, with flexible function approximators such as neural networks, RL has greatly expanded its realm of applications, from playing computer games with pixel inputs, to mastering the game of Go, to learning parkour movements by simulated humanoids.

  10. PDF Efficient Reinforcement Learning using Gaussian Processes

Efficient Reinforcement Learning using Gaussian Processes Marc Peter Deisenroth Dissertation November 22, 2010 Revised October 23, 2011 Original version available at ... Uwe D. Hanebeck for accepting me as an external PhD student and for his longstanding support since my undergraduate student times. I am deeply grateful to my supervisor Dr. Carl ...

  11. Inductive biases and generalisation for deep reinforcement learning

    In this thesis we aim to improve generalisation in deep reinforcement learning. Generalisation is a fundamental challenge for any type of learning, determining how acquired knowledge can be transferred to new, previously unseen situations. We focus on reinforcement learning, a framework describing

  12. Deep multi-agent reinforcement learning

    Deep multi-agent reinforcement learning. Abstract: A plethora of real world problems, such as the control of autonomous vehicles and drones, packet delivery, and many others consists of a number of agents that need to take actions based on local observations and can thus be formulated in the multi-agent reinforcement learning (MARL) setting.

  13. PDF On-Policy Deep Reinforcement Learning

In this thesis we tackle these issues in the context of on-policy Deep Reinforcement Learning (DRL), both theoretically and algorithmically. This work addresses both the discounted and average reward criteria. In the first part of this thesis, we develop theory for average reward on-policy reinforcement learning by extending recent results

  14. Reinforcement Learning with Deep Q-Networks

these DNNs have been applied to reinforcement learning tasks with state-of-the-art results using Deep Q-Networks (DQNs) based on the Q-Learning algorithm. However, the DQN training process is different from standard DNNs and poses significant challenges for certain reinforcement learning environments.

  15. PDF Reinforcement Learning with Sparse and Multiple Rewards

of developing autonomous learning. In this thesis we will present methods to increase the autonomy of reinforcement learning algorithms, i.e., learning without expert pre-engineering, by addressing the issues discussed above. The key points of our research address (1) techniques to deal with multiple conflicting reward

  16. Deep reinforcement learning-based dynamic scheduling

    In this thesis, a deep multi-agent reinforcement learning (deep MARL) architecture is proposed to solve the dynamic scheduling problem (DSP). The deep reinforcement learning (DRL) algorithm is used to train the decentralized scheduling agents, to capture the relationship between information on the factory floor and scheduling objectives, with ...

  17. Meta Reinforcement Learning through Memory

    Meta Reinforcement Learning through Memory. Download (20.02 MB) thesis. posted on 2022-12-02, 11:40 authored by Emilio Parisotto. Modern deep reinforcement learning (RL) algorithms, despite being at the forefront of artificial intelligence capabilities, typically require a prohibitive amount of training samples to reach a human-equivalent level ...

  18. Reinforcement Learning Using Neural Networks, with Applications to

    PhD thesis, by Rémi Coulom. Abstract. This thesis is a study of practical methods to estimate value functions with feedforward neural networks in model-based reinforcement learning. Focus is placed on problems in continuous time and space, such as motor-control tasks. In this work, the continuous TD(lambda) algorithm is refined to handle ...

  19. Tech Reports

This model represents one small, but important, step towards more useful dynamics models in model-based reinforcement learning. This thesis concludes with future directions on the synergy of prediction and control in MBRL, primarily focused on state abstractions, temporal correlation, and future prediction methodologies.

  20. Reinforcement learning and planning for autonomous agent navigation

PhD thesis, Faculty of Science (FNWI), Institute ... The machine learning paradigm of reinforcement learning (RL) enables learning (neural network) policies for decision making through continuous interaction with the environment. However, if the rewards that are received as feedback are sparse, improving the policy gets difficult and ...

  21. Reinforcement learning for robots using neural networks

This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling its application to complex robot-learning problems. Reinforcement learning agents are adaptive, reactive, and self-supervised. The aim of this dissertation is to extend the state of the art of reinforcement learning and enable ...

  22. PDF Optimizing Expectations: From Deep Reinforcement Learning to Stochastic

This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem: maximize the expected total reward with respect to the parameters of the policy. The first part of the thesis is concerned with making policy gradient meth- ... Reinforcement learning can be viewed as a special case of optimizing an expectation,

  23. [1909.09571] Reinforcement Learning for Portfolio Management

    Reinforcement Learning for Portfolio Management. In this thesis, we develop a comprehensive account of the expressive power, modelling efficiency, and performance advantages of so-called trading agents (i.e., Deep Soft Recurrent Q-Network (DSRQN) and Mixture of Score Machines (MSM)), based on both traditional system identification (model-based ...