We have informal meetings almost every Thursday at 5pm in Wean Hall room 7220.
The purpose of the meetings is to make members of the CMU robot learning community, as well as visitors, aware of interesting papers and results that have come to the speakers' attention, and to provide a forum for the informal exchange of ideas.
Food is provided :)
If you have any questions concerning our meetings, please contact Geoff Gordon at "ggordon AT cs DOT cmu DOT edu". If you'd like to suggest a change to this web page, please contact Joelle Pineau or Max Likhachev at "maxim+ AT cs DOT cmu DOT edu".
Spring 2003
Here is a tentative schedule of upcoming presentations.
Most standard policy search reinforcement learning algorithms exhibit non-covariant behavior during the learning process. That is, the "steepest descent" rule depends on the exact parameterization of the policy class we are considering, and not simply on the policy class itself. One may analyze these algorithms in information-geometric terms and discover that there is indeed a version of "steepest descent" that is natural and covariant. Although the approach we describe here is motivated by purely theoretical considerations, the resulting algorithm obeys Vapnik's maxim that "Nothing is more practical than a good theory": in practice, the resulting covariant algorithm has led to the best known performance in some benchmark RL applications and often outperforms canonical gradient methods by orders of magnitude.
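As a rough illustration of the covariant update, here is a minimal numpy sketch of my own (not code from the paper; the function names and the ridge term are assumptions):

```python
import numpy as np

def estimate_fisher(scores):
    """Estimate the Fisher matrix F(theta) = E[s s^T] from sampled
    score vectors s = grad_theta log pi(a|s; theta)."""
    S = np.asarray(scores)
    return S.T @ S / len(S)

def natural_gradient_step(theta, grad, fisher, step=0.1, ridge=1e-6):
    """Precondition the vanilla policy gradient by the inverse Fisher
    matrix; the resulting direction is invariant to smooth
    reparameterizations of the policy class."""
    direction = np.linalg.solve(fisher + ridge * np.eye(len(theta)), grad)
    return theta + step * direction
```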
We investigate methods for planning in a Markov Decision Process where the cost function is chosen by an adversary after we fix our policy. As a running example, we consider a robot path planning problem where costs are influenced by sensors that an adversary places in the environment. We formulate the problem as a zero-sum matrix game where rows correspond to deterministic policies for the planning player and columns correspond to cost vectors the adversary can select. For a fixed cost vector, fast algorithms (such as value iteration) are available for solving MDPs. We develop efficient algorithms for matrix games where such best response oracles exist. We show that for our path planning problem these algorithms are at least an order of magnitude faster than direct solution of the linear programming formulation.
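To make the oracle idea concrete, here is a minimal sketch of one such scheme, a double-oracle loop over a growing restricted game. The scipy LP encoding, the function names, and the termination test are my assumptions, not necessarily the paper's exact algorithm:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    """Optimal mixed strategy for the minimizing row player of payoff
    matrix M, via the standard LP: min v s.t. x^T M[:,j] <= v, x in simplex."""
    n_rows, n_cols = M.shape
    c = np.zeros(n_rows + 1); c[-1] = 1.0           # minimize v
    A_ub = np.hstack([M.T, -np.ones((n_cols, 1))])  # x^T M[:,j] - v <= 0
    b_ub = np.zeros(n_cols)
    A_eq = np.append(np.ones(n_rows), 0.0).reshape(1, -1)  # sum x = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n_rows + [(None, None)])
    return res.x[:n_rows], res.x[-1]

def double_oracle(payoff, row_best_response, col_best_response,
                  row0, col0, tol=1e-6):
    """Solve the game restricted to small sets of rows (policies) and
    columns (cost vectors); ask each best-response oracle for a
    counter-strategy; stop when neither player gains more than tol."""
    rows, cols = [row0], [col0]
    while True:
        M = np.array([[payoff(r, c) for c in cols] for r in rows])
        x, v = solve_zero_sum(M)       # planner's mixture over policies
        y, _ = solve_zero_sum(-M.T)    # adversary's mixture, via transposed game
        r_new = row_best_response(cols, y)  # e.g. value iteration on mixed costs
        c_new = col_best_response(rows, x)
        row_gain = v - sum(w * payoff(r_new, c) for c, w in zip(cols, y))
        col_gain = sum(w * payoff(r, c_new) for r, w in zip(rows, x)) - v
        if row_gain <= tol and col_gain <= tol:
            return x, rows, y, cols, v
        if row_gain > tol: rows.append(r_new)
        if col_gain > tol: cols.append(c_new)
```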
Particle filters are used extensively for tracking the state of nonlinear dynamic systems. This paper presents a new particle filter that maintains samples in the state space at dynamically varying resolution for computational efficiency. Resolution within the state space varies by region, depending on the belief that the true state lies within each region. Where belief is strong, resolution is fine. Where belief is low, resolution is coarse, abstracting multiple similar states together. The resolution of the state space is dynamically updated as the belief changes. The proposed algorithm makes an explicit bias-variance tradeoff to select between maintaining samples in a biased generalization over a region of state space versus in a high-variance specialization at fine resolution. Samples are maintained at a coarser resolution when the bias introduced by the generalization to a coarse resolution is outweighed by the reduction in variance, and at a finer resolution when it is not. Maintaining samples in abstraction prevents potential hypotheses from being eliminated prematurely for lack of a sufficient number of particles. Empirical results show that our variable resolution particle filter requires significantly lower computation for performance comparable to a classical particle filter.
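For contrast, here is a minimal sketch of the classical (bootstrap) particle filter that the variable resolution filter refines; the variable resolution version replaces this flat sample set with samples over abstract regions. The names and the resampling threshold are my choices:

```python
import numpy as np

def particle_filter_step(particles, weights, sample_dynamics, likelihood, obs, rng):
    """One predict-reweight-resample step of a bootstrap particle filter."""
    # Predict: push each sample through the stochastic dynamics model.
    particles = np.array([sample_dynamics(p, rng) for p in particles])
    # Reweight by the observation likelihood and renormalize.
    weights = weights * np.array([likelihood(obs, p) for p in particles])
    weights = weights / weights.sum()
    # Resample only when the effective sample size collapses.
    n = len(particles)
    if 1.0 / np.sum(weights ** 2) < 0.5 * n:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights
```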
Planetary rovers operate in environments where human intervention is expensive, slow, unreliable, or impossible. It is therefore essential to monitor the behavior of these robots so that contingencies may be addressed before they result in catastrophic failures. This monitoring needs to be efficient since there is limited computational power available on rovers. We propose an efficient particle filter for monitoring faults that combines the Unscented Kalman Filter (UKF) [7] and the Variable Resolution Particle Filter (VRPF) [16]. We begin by using the UKF to obtain an improved proposal distribution for a particle filter which tracks discrete fault variables as part of its state space. This requires computing an unscented transform for every particle and every possible discrete transition to a fault or nominal state at each instant in time. Since there are potentially a large number of faults that may occur at any instant, this approach does not scale well. We use the VRPF to address this concern. The VRPF tracks abstract states that may represent single states or sets of states. There are many fewer transitions between states when they are represented in abstraction. We show that the VRPF in conjunction with a UKF proposal improves performance and may potentially be used in large state spaces. Experimental results show a significant improvement in efficiency.
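For reference, a minimal numpy sketch of the unscented transform that would be computed per particle and per candidate discrete transition (this is the standard scaled sigma-point formulation; the parameter defaults are conventional choices, not taken from the paper):

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f
    using 2n+1 deterministically chosen sigma points."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * cov)   # columns are sigma-point offsets
    pts = [mean] + [mean + L[:, i] for i in range(n)] \
                 + [mean - L[:, i] for i in range(n)]
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    ys = np.array([f(p) for p in pts])
    y_mean = wm @ ys
    d = ys - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov
```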
This talk consists of two parts. In the first part I will present some recent work we did on particle filter-based people tracking. In the context of tracking multiple people using a combination of anonymous and ID sensors installed throughout the environment, particles represent assignments between objects and observations (similar to multi-hypothesis tracking). To estimate the location of people using GPS (outdoors) or very noisy, sparse indoor data, we project particles onto graph structures. Such a representation has two key advantages. First, fewer samples are needed to estimate a location on the graph. Second, the graph provides a discretization of continuous trajectories, thereby making the learning of long-term behaviour patterns much easier. In the second part I will describe a hierarchical Bayesian approach used for multi-robot exploration. The technique allows multiple robots to merge their maps even under complete uncertainty about their relative start locations.
This paper presents a new family of approximate monitoring algorithms that combines the best qualities of the particle filtering and Boyen-Koller methods. Our algorithms maintain an approximate representation of the belief state in the form of sets of factored particles that correspond to samples of clusters of state variables. Empirical results show that the factored particle method outperforms ordinary particle filtering.
Martin Zinkevich, "Online Convex Programming and Generalized Infinitesimal Gradient Ascent," CMU Technical Report CMU-CS-03-110.
Convex programming involves a convex set F and a convex function c : F -> R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is in fact a generalization of infinitesimal gradient ascent; the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
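The algorithm itself is short: each round, commit to a point, observe that round's cost, take a gradient step, and project back onto F. A minimal sketch (function names and the 1/sqrt(t) step-size schedule are my choices, though that schedule is the standard one for this style of analysis):

```python
import numpy as np

def online_gradient_descent(project, x0, cost_grads, eta=lambda t: 1.0 / np.sqrt(t)):
    """Online convex programming: commit to x_t, then see the round-t
    cost and step against its gradient, projecting back into F."""
    x, played = x0, [x0]
    for t, grad in enumerate(cost_grads, start=1):
        g = grad(x)                  # gradient of this round's cost at our point
        x = project(x - eta(t) * g)  # projected gradient step back into F
        played.append(x)
    return played

# Example feasible set F: the unit ball, with Euclidean projection.
project_ball = lambda z: z / max(1.0, np.linalg.norm(z))
```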
Second half of last week's talk.
This paper describes a method for reinforcement learning in which policies and value functions are represented along selected trajectories. Local linear models are used to estimate the dynamics of the system. Given a trajectory, the first and second derivatives of the value function and the first derivatives of the policy can be updated in a backwards sweep that is derived from the discrete time Riccati equations. In addition to explaining this approach and describing its application to two simple tasks (pendulum swinging and hopping), I'll attempt to relate it to some of the control theory that we learned last semester.
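To make the backward sweep concrete, here is a minimal sketch of the underlying discrete-time Riccati recursion in the linear-quadratic case; along a trajectory, A and B would come from the local linear models (the function and variable names are mine):

```python
import numpy as np

def lqr_backward_sweep(A, B, Q, R, Qf, horizon):
    """Finite-horizon discrete-time LQR via the backward Riccati
    recursion. P_t plays the role of the value function's second
    derivative, K_t of the local policy's first derivative."""
    P, gains = Qf, []
    for _ in range(horizon):
        # u_t = -K_t x_t with K_t = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()          # gains[t] is the feedback matrix at time t
    return gains, P
```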
Additional reading:
Atkeson, "Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming", NIPS 94.
Stengel, R. F., "Optimal Control and Estimation", pp.270-284.
Fall 2002
Stengel, R. F., "Optimal Control and Estimation", Section 3.4.
Stengel, R. F., "Optimal Control and Estimation", Chapter 2.
Stengel, R. F., "Optimal Control and Estimation", Sections 3.1-3.4.
Stengel, R. F., "Optimal Control and Estimation", Sections 2.3, 2.5, 2.6; Example 2.6.1.
In this talk I will motivate the need for learning symbolic maps that represent the world in terms of objects. I will describe an algorithm that learns a map in terms of high-level objects -- walls and doors -- rather than in terms of occupancy grids. The algorithm uses a probabilistic generative model that describes the shape and motion properties of the objects in the environment. It also includes a realistic sensor model, which addresses the phenomenon of geometric occlusion and specifies how the objects in the environment generate the robot's laser range readings. I will show how the algorithm uses EM to construct a map of an office environment containing both static walls and dynamic doors.
I will present some of our recent work in mapping environments where objects change location over time. Using a set of static maps collected by a mobile robot, we learn high-fidelity shape models of both objects and classes of objects. I will discuss our techniques for leveraging multiple observations to improve models, solving the data association problem between observations and objects, and clustering objects into classes. I will also show a technique for determining how many objects and classes exist even when only a subset of objects is present in any map.
Spring 2002
This paper presents a method for automatically determining K, the number of generating HMMs, and for learning the parameters of those HMMs. An initial estimate of K is obtained by unsupervised clustering of the time series using dynamic time warping (DTW) to assess similarity. In addition to producing an estimate of K, this process yields an initial partitioning of the data. As later sections will explain, DTW is related to HMM training algorithms but is weaker in several respects. Therefore, the clusters based on DTW are likely to contain mistakes. These initial clusters serve as input to a process that trains one HMM on each cluster and iteratively moves time series between clusters based on their likelihoods given the various HMMs. Ultimately the process converges to a final clustering of the data and a generative model (the HMMs) for each of the clusters.
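For readers unfamiliar with DTW, the similarity measure behind the initial clustering is a short dynamic program. A textbook sketch (not the paper's code):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series: the cost
    of the cheapest monotone alignment of one series onto the other."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the best of the three predecessor alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```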
The traditional programming-language framework assumes that sufficient information is available at the time of actual computation; however, it fails to model systems in which uncertainty is inherent. In order to model such systems, various probabilistic modeling languages have been developed. The problem with all these approaches is that they are built primarily upon a discrete computational framework. As a consequence, it is hard to encode continuous probabilistic quantities, which frequently arise in robotics applications, under this discrete framework. In this talk, I will introduce a different approach to expressing probabilistic quantities. The main idea is simple: instead of probability measures, we exploit sampling functions, that is, measurable mappings from Borel sets to sample domains. This approach brings both discrete and continuous probability distributions (and even weird distributions which belong to neither class) under a unified framework of computation. I will also discuss an application of this approach to a real robotics problem, and its drawbacks (and hopefully strengths).
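The idea transfers to any language with first-class functions. A small Python sketch of my own (the talk's actual language and combinators may differ): a distribution is just a function from a random source to a sample, so discrete, continuous, and mixed distributions compose uniformly.

```python
import random

# A distribution = a sampling function from a random source to a sample.
def uniform(lo, hi):
    return lambda rng: lo + (hi - lo) * rng.random()

def gaussian(mu, sigma):
    return lambda rng: rng.gauss(mu, sigma)

def mixture(p, d1, d2):
    """Compose distributions: with probability p sample d1, else d2.
    The result is itself just another sampling function."""
    return lambda rng: d1(rng) if rng.random() < p else d2(rng)

# E.g. a range sensor that usually reads ~2m but occasionally returns junk.
rng = random.Random(0)
noisy_range = mixture(0.95, gaussian(2.0, 0.05), uniform(0.0, 10.0))
readings = [noisy_range(rng) for _ in range(1000)]
```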
Visualization allows quick determination of an appropriate data model, and projection is an intuitive way to visualize high-dimensional data. I will present new projection techniques for classification and regression data. In contrast to unsupervised methods like PCA, these projections deliberately try to preserve the dependence or 'mutual information' between the input and output variables, thereby revealing what sort of classifier or regression function you should be using. For classification, they are typically much more informative than Fisher projection, which assumes the classes are Gaussian with equal covariance. Furthermore, with the right choice of objective function, the optimal projections can be computed quickly.
This paper addresses the problem of inverse reinforcement learning (IRL) in Markov decision processes, that is, the problem of extracting a reward function given observed, optimal behavior. IRL may be useful for apprenticeship learning to acquire skilled behavior, and for ascertaining the reward function being optimized by a natural system. We first characterize the set of all reward functions for which a given policy is optimal. We then derive three algorithms for IRL. The first two deal with the case where the entire policy is known; we handle tabulated reward functions on a finite state space and linear functional approximation of the reward function over a potentially infinite state space. The third algorithm deals with the more realistic case in which the policy is known only through a finite set of observed trajectories. In all cases, a key issue is degeneracy -- the existence of a large set of reward functions for which the observed policy is optimal. To remove degeneracy, we suggest some natural heuristics that attempt to pick a reward function that maximally differentiates the observed policy from other, sub-optimal policies. This results in an efficiently solvable linear programming formulation of the IRL problem. We demonstrate our algorithms on simple discrete/finite and continuous/infinite state problems.
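For the tabulated finite-state case, the LP can be set up in a few dozen lines. The sketch below follows my reading of that formulation; the variable names, the scipy encoding, and the penalty weight are my assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def irl_lp(P, policy, gamma, r_max=1.0, lam=1.1):
    """Sketch of finite-state IRL as an LP: find a reward R that keeps
    the observed policy optimal and maximizes its advantage over the
    runner-up action, with an L1 penalty on R to fight degeneracy.
    P[a] is the n x n transition matrix of action a; policy[s] is the
    observed action in state s."""
    n_actions, n = P.shape[0], P.shape[1]
    P_star = np.array([P[policy[s], s] for s in range(n)])   # rows under policy
    G = np.linalg.inv(np.eye(n) - gamma * P_star)            # discounted occupancy
    # Variables: R (n), t (n), u (n). Minimize -(sum t) + lam * (sum u).
    c = np.concatenate([np.zeros(n), -np.ones(n), lam * np.ones(n)])
    A_ub, b_ub = [], []
    for s in range(n):
        for a in range(n_actions):
            if a == policy[s]:
                continue
            row_R = (P[policy[s], s] - P[a, s]) @ G
            # Optimality of the observed policy: row_R @ R >= 0.
            A_ub.append(np.concatenate([-row_R, np.zeros(2 * n)])); b_ub.append(0.0)
            # t_s lower-bounds the advantage over every other action.
            t_block = np.zeros(n); t_block[s] = 1.0
            A_ub.append(np.concatenate([-row_R, t_block, np.zeros(n)])); b_ub.append(0.0)
    for s in range(n):  # |R_s| <= u_s, encoded as two linear constraints
        e = np.zeros(n); e[s] = 1.0
        A_ub.append(np.concatenate([e, np.zeros(n), -e])); b_ub.append(0.0)
        A_ub.append(np.concatenate([-e, np.zeros(n), -e])); b_ub.append(0.0)
    bounds = [(-r_max, r_max)] * n + [(None, None)] * n + [(0, r_max)] * n
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    return res.x[:n]
```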
This paper presents a state-space perspective on the kinodynamic planning problem, and introduces a randomized path planning technique that computes collision-free kinodynamic trajectories for high degree-of-freedom problems. By using a state-space formulation, the kinodynamic planning problem is treated as a 2n-dimensional nonholonomic planning problem, derived from an n-dimensional configuration space. The state space serves the same role as the configuration space for basic path planning; however, standard randomized path planning techniques do not directly apply to planning trajectories in the state space. We have developed a randomized planning approach that is particularly tailored to kinodynamic problems in state spaces, although it also applies to standard nonholonomic and holonomic planning problems. The basis for this approach is the construction of a tree that attempts to rapidly and uniformly explore the state space, offering benefits similar to those obtained by successful randomized planning methods, but applying to a much broader class of problems. Some preliminary results are discussed for an implementation that determines kinodynamic trajectories for hovercraft and satellites in cluttered environments, resulting in state spaces of up to 12 dimensions.
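The core tree-growing loop is compact. A generic sketch (the sampling, nearest-neighbor, and extension routines are placeholders that would encode the dynamics and collision constraints for a particular system):

```python
def rrt(sample_state, nearest, extend, x_init, goal_test, max_iters=5000):
    """Bare-bones rapidly-exploring tree: repeatedly sample the state
    space, find the nearest tree node, and extend it toward the sample
    under the system's dynamics; uniform sampling drives the tree to
    explore rapidly and uniformly."""
    nodes, parents = [x_init], [None]
    for _ in range(max_iters):
        x_rand = sample_state()                 # random target state
        i_near = nearest(nodes, x_rand)         # index of closest tree node
        x_new = extend(nodes[i_near], x_rand)   # integrate dynamics toward target
        if x_new is None:                       # collision / constraint violation
            continue
        nodes.append(x_new)
        parents.append(i_near)
        if goal_test(x_new):                    # walk parents back to the root
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parents[i]
            return path[::-1]
    return None
```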
To classify a large number of unlabeled examples we combine a limited number of labeled examples with a Markov random walk representation over the unlabeled examples. The random walk representation exploits any low dimensional structure in the data in a robust, probabilistic manner. We develop and compare several estimation criteria/algorithms suited to this representation. These include, in particular, multi-way classification with an average margin criterion, which permits a closed-form solution. The time scale of the random walk regularizes the representation and can be set through a margin-based criterion favoring unambiguous classification. We also extend this basic regularization by adapting time scales for individual examples. We demonstrate the approach on synthetic examples and on text classification problems.
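A minimal sketch of the representation itself: build a kNN graph over the points, row-normalize it into a transition matrix, and raise it to the t-th power (construction details such as symmetrization are simplified relative to the paper, and the names are mine):

```python
import numpy as np

def random_walk_representation(X, t, k=5, sigma=1.0):
    """t-step Markov random walk over a kNN graph: row i of the result
    is the distribution over points reached from x_i after t steps; t
    controls the amount of smoothing/regularization."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    # Keep only each point's k nearest neighbours (plus itself).
    keep = np.argsort(d2, axis=1)[:, :k + 1]
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(len(X))[:, None], keep] = True
    W = np.where(mask, W, 0.0)
    A = W / W.sum(axis=1, keepdims=True)   # one-step transition matrix
    return np.linalg.matrix_power(A, t)
```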
Fall 2001
We have a draft schedule of upcoming presentations.
Spring 2001
Fall 2000
I'll present a paper by Daphne Koller, Lerner and Angelov from UAI-99: "A General Algorithm for Approximate Inference and its Applications to Hybrid Bayes Nets".
POMDPs provide a useful framework for decision-making in the presence of uncertainty; however, finding an optimal policy is computationally infeasible for large problems. I will present a hierarchical approach to POMDPs which takes advantage of structure in the domain to find policies for complex tasks. The idea is to divide the action set along domain-specific task decompositions. A POMDP problem is thereby broken down into a hierarchically related set of smaller problems (i.e. subtasks). A value function is learned locally (i.e. related to the local set of actions) for each subtask, thereby yielding local policies.
This extends the work on hierarchical reinforcement learning (Dietterich, NIPS'99) to the POMDP framework; the task decomposition and state abstraction are similar, but I propose significantly different planning and execution algorithms to ensure consistent belief states between subtasks, and to accommodate partial observability of subtask completion. This work is applied to the problem of dialogue management, for which I will present results showing that planning time was significantly reduced compared to a conventional POMDP, while the policy performed much better than when using a greedy heuristic approximation.
I'll explain what Gaussian processes are, and why they are interesting. I may also talk about Gaussian process networks (which won the ICML-2000 best paper award), and how to overcome the computational problem of learning Gaussian processes using exciting and fabulous new CMU algorithms!
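As background, textbook GP regression fits in a dozen lines; the Cholesky factorization below is exactly the O(n^3) cost that fast approximation algorithms attack. This is a standard sketch, not any particular CMU algorithm:

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential kernel between two sets of input points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_predict(X, y, X_star, kernel=rbf, noise=1e-2):
    """Posterior mean and variance at test inputs X_star, given noisy
    observations y at training inputs X."""
    K = kernel(X, X) + noise * np.eye(len(X))
    K_s = kernel(X_star, X)
    L = np.linalg.cholesky(K)                       # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(kernel(X_star, X_star)) - np.sum(v ** 2, axis=0)
    return mean, var
```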
An overview of two papers:
Recognition of Planar Object Classes, M.C. Burl and P. Perona, CVPR 96, San Francisco, CA, June 1996. ftp://vision.caltech.edu/pub/tech-reports-vision/CVPR96-recog.ps.gz
Unsupervised Learning of Models for Recognition, M. Weber, M. Welling and P. Perona, ECCV 2000, Dublin, Ireland, June 2000. ftp://vision.caltech.edu/pub/tech-reports/ECCV00-recog.ps.gz
An overview of the ICML-2000 paper by Ng and Jordan.
Spring 2000
In January 2000 the Nomad robot was deployed to the ice sheets of Antarctica and made the first ever autonomous field identification of a meteorite by a robot. Nomad's rock/meteorite classifier is based on a Bayes network which addresses several issues unique to classifying rocks from a mobile robotic vehicle: the ambiguity of rock classes, the non-trivial costs of deploying sensors, noisy data, limited training examples and the need to incorporate human expert opinions.
I'll also describe the Robotic Antarctic Meteorite Search project and the latest results from the recent 2000 expedition to the Elephant Moraine in Antarctica.
Human beings take for granted their ability to move around and interact with each other in a wide range of environments. These same environments are tremendous sources of uncertainty for mobile robots. Not only does navigation introduce positional uncertainty, but systems that try to interact with human beings are faced with the tremendously noisy and ambiguous behaviours that humans exhibit.
The Partially Observable Markov Decision Process (POMDP) constitutes a planning methodology that attempts to model uncertainty in the way that a system acts and senses the world. POMDPs are unfortunately intractable for most real-world problems due to the overwhelming computational complexity of planning in belief spaces. I propose a method for finding an approximation of the optimal POMDP policy by compressing the full belief state into statistics that represent the belief space compactly, and demonstrate the utility of this technique in the domains of mobile robot navigation and speech dialogue management.
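As an illustration of the compression idea, here is a tiny sketch; the specific pair of statistics (most likely state plus belief entropy) is my example of such a compact summary, not necessarily the statistics used in the talk:

```python
import numpy as np

def compress_belief(b):
    """Collapse a full belief vector over states into two summary
    statistics; planning then happens in this tiny space rather than
    in the full belief simplex."""
    b = np.asarray(b, dtype=float)
    nz = b[b > 0]
    entropy = -np.sum(nz * np.log(nz))   # how uncertain the belief is
    return int(np.argmax(b)), entropy    # (most likely state, uncertainty)
```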
I will present three papers by Matthew Brand on several novel techniques for learning probabilistic models and pattern discovery. Entropic priors are derived from the basic premise that the world is predictable, and actually make sense, unlike most other priors people have been using. Parameter extinction is a related technique that produces sparse probability tables and might even make the learned models interpretable by humans. Finally, deterministic annealing overcomes shallow local maxima in likelihood and produces quasi-global maxima, which improves the quality of the models. The combination of these techniques results in a method for learning the structure of probabilistic models, which is much better and faster than the usual test-and-discard approach.
The three papers are available online, if you want to check them out in advance: http://www.merl.com/projects/structure/index.html
I will present some of the work done in Stuart Russell's group on traffic surveillance. One of the key problems in this context is to detect that cars observed at different cameras are actually the same car (object identification). They apply techniques similar to those used in the data association community to solve this problem.
Mobile robot with omnicam vision and gesture-based interaction.
Paper by M. I. Jordan, Z. Ghahramani, T.S. Jaakkola and L.K. Saul
In M.I. Jordan (editor) Learning in Graphical Models, p. 105, Dordrecht: Kluwer Academic Publishers (1998).
This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models. We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, showing how upper and lower bounds can be found for local probabilities, and discussing methods for extending these bounds to bounds on global probabilities of interest. Finally, we return to the examples and demonstrate how variational algorithms can be formulated in each case.
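To give the flavor of such bounds, here is a minimal mean-field sketch for a small Boltzmann machine: fixed-point updates for the variational means yield a lower bound on the log partition function. This is my own toy example in the spirit of the paper, not its code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_bound(W, b, iters=200):
    """Mean-field lower bound on log Z for p(s) proportional to
    exp(0.5 s'Ws + b's), s in {0,1}^n, W symmetric with zero diagonal.
    Iterates the fixed-point updates mu_i <- sigmoid(b_i + (W mu)_i)."""
    mu = np.full(len(b), 0.5)
    for _ in range(iters):
        mu = sigmoid(b + W @ mu)
    eps = 1e-12
    entropy = -np.sum(mu * np.log(mu + eps) + (1 - mu) * np.log(1 - mu + eps))
    # E_q[energy] + H(q) <= log Z for the fully factored q(s) = prod Bern(mu_i)
    bound = 0.5 * mu @ W @ mu + b @ mu + entropy
    return bound, mu
```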
Talks from previous semesters are linked below:
ALL
1999: Spring, Fall
1998: Spring, Fall
1997: Fall