Basketball is one of the world's most popular sports because of the agility and speed demonstrated by the players. This agility and speed makes designing controllers to realize robust control of basketball skills a challenge for physics-based character animation. The highly dynamic behaviors and precise manipulation of the ball that occur in the game are difficult to reproduce for simulated players.

In this paper, we present an approach for learning robust basketball dribbling controllers from motion capture data. Our system decouples a basketball controller into locomotion control and arm control components and learns each component separately.

To achieve robust control of the ball, we develop an efficient pipeline based on trajectory optimization and deep reinforcement learning and learn non-linear arm control policies. We also present a technique for learning skills and the transition between skills simultaneously. Our system is capable of learning robust controllers for various basketball dribbling skills, such as dribbling between the legs and crossover moves.

The resulting control graphs enable a simulated player to perform transitions between these skills and respond to user interaction.

CMU Reinforcement Learning

Paper | Video. ACM Transactions on Graphics, 37(4).

This course assumes some familiarity with reinforcement learning, numerical optimization, and machine learning.


We strongly encourage all students to participate in discussion and to ask and answer questions through Piazza.

Class goals

- Implement and experiment with existing algorithms for learning control policies guided by reinforcement, expert demonstrations, or self-trials.
- Evaluate the sample complexity, generalization, and generality of these algorithms.
- Be able to understand research papers in the field of robotic learning.

Particular focus will be placed on incorporating true sensory signals from vision or tactile sensing, and on exploring the synergy between learning from simulation and learning from real experience.

Schedule

The following schedule is tentative; it will change based on time constraints and the interests of the people in the class. Reading materials and lecture notes will be added as lectures progress.

- Monte Carlo learning: value function (VF) estimation and optimization. Recitation: OpenAI Gym.
- Planning and learning: Dyna, Monte Carlo tree search.
- VF approximation, deep learning, convnets, back-propagation.
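The Monte Carlo value-estimation topic above can be illustrated with a minimal sketch. The random-walk environment, episode count, and state numbering here are illustrative assumptions, not course material:

```python
import random

# First-visit Monte Carlo value estimation on a toy 1-D random walk.
# States 0..4; episodes start in state 2 and move left/right uniformly;
# reaching state 4 yields return 1, state 0 yields 0 (both terminal).

def run_episode():
    state, visited = 2, []
    while state not in (0, 4):
        visited.append(state)
        state += random.choice((-1, 1))
    return visited, (1.0 if state == 4 else 0.0)

def mc_value_estimate(n_episodes=20000, seed=0):
    random.seed(seed)
    returns = {s: [] for s in (1, 2, 3)}
    for _ in range(n_episodes):
        visited, g = run_episode()
        for s in set(visited):  # first-visit: count each state once per episode
            returns[s].append(g)
    return {s: sum(r) / len(r) for s, r in returns.items()}

values = mc_value_estimate()
print(values)  # true values are 0.25, 0.5, 0.75 for states 1, 2, 3
```

With enough episodes the sample averages converge to the true state values, which is the core idea behind Monte Carlo VF estimation.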

- Deep learning, convnets, optimization tricks.
- Deep Q-learning: double Q-learning, replay memory. Recitation: Homework 2 overview, TensorFlow.
- Continuous actions, variational autoencoders, multimodal stochastic policies.
- Optimal control, trajectory optimization.
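The replay memory mentioned above can be sketched in a few lines. The class name, capacity, and toy transitions are illustrative assumptions, not code from the course:

```python
import random
from collections import deque

# Minimal experience replay memory, as used in deep Q-learning.
class ReplayMemory:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall off

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling breaks the temporal correlation between updates
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=100)
for t in range(50):
    memory.push(t, t % 4, 1.0, t + 1, False)
batch = memory.sample(8)
print(len(memory), len(batch))  # 50 8
```

Sampling mini-batches uniformly from past experience, rather than learning from consecutive transitions, is one of the tricks that stabilizes deep Q-learning.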

- End-to-end policy optimization through back-propagation.
- Exploration and exploitation.
- Hierarchical RL and transfer learning.

Molecules are composed of functional groups that behave similarly in different contexts. We are using machine learning to develop quantum chemical methods that take better advantage of this molecular similarity. In particular, we are developing neural networks that use quantum chemistry as an integral part of their prediction process. By training the neural network on data from ab initio computations, the neural net learns to predict molecular properties at a small fraction of the cost of ab initio theory.

We are also developing ways to use machine learning to control experimental conditions and drive reactions to desired outcomes. We use electronic structure theory to probe the structure property relationships of conjugated polymers. This often involves extracting information on the structure and electronic properties of these materials from spectroscopy and other measurements. In our most recent work, we have been exploring how incorporation of heteroatoms into the conjugated backbone influences their properties, and how these properties may be modified by varying the sequence of the polymer units.



How a Computer Learns To Dribble: Practice, Practice, Practice

By Byron Spice

Basketball players need lots of practice before they master the dribble, and it turns out that's true for computer-animated players as well.

By using deep reinforcement learning, players in video basketball games can glean insights from motion capture data to sharpen their dribbling skills. In this case, the system learns from motion capture of the movements performed by people dribbling basketballs.

This trial-and-error learning process is time consuming, requiring millions of trials, but the results are arm movements that are closely coordinated with physically plausible ball movement.


Players learn to dribble between their legs, dribble behind their backs and do crossover moves, as well as how to transition from one skill to another. Motion capture data already add realism to state-of-the-art video games. But these games also include disconcerting artifacts, Liu noted, such as balls that follow impossible trajectories or that seem to stick to a player's hand.


A physics-based method has the potential to create more realistic games, but getting the subtle details right is difficult. That's especially so for dribbling a basketball because player contact with the ball is brief and finger position is critical. Some details, such as the way a ball may continue spinning briefly when it makes light contact with the player's hands, are tough to reproduce. And once the ball is released, the player has to anticipate when and where the ball will return.

Liu and Hodgins opted to use deep reinforcement learning to enable the model to pick up these important details. Artificial intelligence programs have used this form of deep learning to figure out a variety of video games and the AlphaGo program famously employed it to master the board game Go.

The motion capture data used as input was of people doing things such as rotating the ball around the waist, dribbling while running and dribbling in place both with the right hand and while switching hands. This capture data did not include the ball movement, which Liu explained is difficult to record accurately. Instead, they used trajectory optimization to calculate the ball's most likely paths for a given hand motion. The program learned the skills in two stages — first it mastered locomotion and then learned how to control the arms and hands and, through them, the motion of the ball.

This decoupled approach is sufficient for actions such as dribbling or perhaps juggling, where the interaction between the character and the object doesn't affect the character's balance. Further work is required to address sports, such as soccer, where balance is tightly coupled with game maneuvers, Liu said.

Abstract: Deep reinforcement learning has achieved many successes over recent years.

However, its high sample complexity and the difficulty of specifying a reward function have limited its application. In this talk, I will take a representation learning perspective on these issues. Is it possible to map the raw observation, potentially high-dimensional, to a low-dimensional representation from which learning will be more efficient?

Is it beneficial to define a reward function based on the representation? The talk will be in three parts. First, I will talk about how to combine a variety of self-supervised auxiliary tasks to learn a better representation for the control tasks at hand.

Second, I will talk about how to utilize an indicator reward function, a simple but strong baseline for learning goal-conditioned policies without explicit reward specification. Finally, I will briefly introduce SoftGym, our recently proposed benchmark for deformable object manipulation, highlighting the challenges of learning from high-dimensional representations.
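The indicator reward described above can be sketched as follows; the tolerance value and 2-D state representation are illustrative assumptions, not details from the talk:

```python
import math

# Indicator reward for goal-conditioned RL: the agent receives reward 1
# only when its state is within a small tolerance of the goal, with no
# hand-designed reward shaping.
def indicator_reward(state, goal, tol=0.05):
    dist = math.dist(state, goal)  # Euclidean distance (Python 3.8+)
    return 1.0 if dist <= tol else 0.0

print(indicator_reward((0.0, 0.0), (0.01, 0.02)))  # 1.0
print(indicator_reward((0.0, 0.0), (1.0, 1.0)))    # 0.0
```

The appeal is that no reward engineering is needed; the cost is a sparse learning signal, which is exactly why it serves as a baseline rather than a full solution.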

Suzanne Lyons Muth. PhD Speaking Qualifier February.


Friday, February 14, WEH.


To realize the dreams and impact of AI requires creating autonomous systems that can learn to make good decisions. In this advanced topics in AI class, we will start with a short background in reinforcement learning and sequential decision making under uncertainty. We will then quickly move on to state-of-the-art approaches for some of the critical challenges in applying reinforcement learning to the real world.

These challenges include leveraging old data to make new decisions, highly sample efficient RL, and safe and risk sensitive RL. Prerequisites: The course is mainly intended for graduate students in computer science, machine learning, and robotics. Undergraduates and students in other relevant areas are welcome to join if you have the relevant background. It is useful, but not required, to have taken one or more of the following classes or their equivalent : Machine Learning, Statistical Techniques in Robotics, and Artificial Intelligence.

Creativity and enthusiastic participation are required.

Late Policy

You will have 4 late days without penalty to be used across the entire semester. These can only be used for homeworks, not the project deliverables. After those late days are used, you will be penalized according to the following policy: (1) a homework is worth full credit at the start of class on the due date; (2) it is worth half credit for the next 48 hours; (3) it is worth zero credit after that.
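The late policy above amounts to a simple step function; this sketch (the function name is made up, the thresholds come from the policy text) makes the arithmetic explicit:

```python
# Credit multiplier for a homework submitted `hours_late` hours after the
# deadline, once all free late days are used: full credit on time, half
# credit within the next 48 hours, zero credit after that.
def homework_credit(hours_late):
    if hours_late <= 0:
        return 1.0
    if hours_late <= 48:
        return 0.5
    return 0.0

print(homework_credit(0), homework_credit(24), homework_credit(72))  # 1.0 0.5 0.0
```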

Collaboration Unless otherwise specified, written homeworks can be discussed with others but must be written up individually. You must write on your homework the student names you collaborated with.

Piazza: We will use the Piazza course management system.

As a result, expertise in deep learning is fast changing from an esoteric desirable to a mandatory prerequisite in many advanced academic settings, and a large advantage in the industrial job market. In this course we will learn about the basics of deep neural networks, and their applications to various AI tasks.

By the end of the course, it is expected that students will have significant familiarity with the subject, and be able to apply Deep Learning to a variety of tasks. They will also be positioned to understand much of the current literature on the topic and extend their knowledge through further study.


If you are only interested in the lectures, you can watch them on the YouTube channel listed below. Course description from student point of view The course is well rounded in terms of concepts. It helps us understand the fundamentals of Deep Learning. The course starts off gradually with MLPs and it progresses into the more complicated concepts such as attention and sequence-to-sequence models.

We get complete hands-on experience with PyTorch, which is very important for implementing Deep Learning models.


As a student, you will learn the tools required for building Deep Learning models. The homeworks usually have two components: Autolab and Kaggle. The Kaggle components allow us to explore multiple architectures and understand how to fine-tune and continuously improve models. The tasks for all the homeworks were similar, and it was interesting to learn how the same task can be solved using multiple Deep Learning approaches.


Overall, at the end of this course you will be confident enough to build and tune Deep Learning models. TAs: Advait Gadhikar : agadhika andrew.


There are 14 quizzes in all. We will retain your best 12 scores. Quizzes will generally (but not always) be released on Friday and due 48 hours later. Quizzes are scored by the number of correct answers.

Assignments

There will be five assignments in all. Assignments will include Autolab components, where you must complete designated tasks, and a Kaggle component, where you compete with your colleagues. Autolab components are scored according to the number of correctly completed parts.

These will translate to scores of 80, 60, 40, and 0 respectively. Scores will be interpolated linearly between these cutoffs. This is intended to encourage students to begin working on their assignments early.
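The linear interpolation between cutoffs can be sketched as follows. The performance thresholds here are made-up examples, since the actual Kaggle cutoffs are not given in the text; only the score levels 80/60/40/0 come from the policy:

```python
# Map a Kaggle performance to a score by linear interpolation between
# cutoffs. `cutoffs` are performance thresholds in descending order,
# `scores` the corresponding credit at each threshold.
def interpolated_score(perf, cutoffs, scores=(80.0, 60.0, 40.0, 0.0)):
    if perf >= cutoffs[0]:
        return scores[0]
    for (hi, lo), (s_hi, s_lo) in zip(zip(cutoffs, cutoffs[1:]),
                                      zip(scores, scores[1:])):
        if perf >= lo:
            frac = (perf - lo) / (hi - lo)   # position within the bracket
            return s_lo + frac * (s_hi - s_lo)
    return scores[-1]                        # below the lowest cutoff

# A performance of 0.75, halfway-ish between the 60- and 80-point
# thresholds of this hypothetical scale, lands between 60 and 80:
print(interpolated_score(0.75, cutoffs=(0.9, 0.7, 0.5, 0.3)))
```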


On-time deadline: People who submit by this deadline are eligible for up to five bonus points. These points will be computed by interpolation between the A cutoff and the highest performance obtained for the HW.
