We have informal meetings almost every Thursday at 5pm in Wean Hall room 7220.
The purpose of the meetings is to make members of the CMU robot learning community, as well as visitors, aware of interesting papers and results that have come to the speakers' attention, and to provide a forum for the informal exchange of ideas.
Food is provided :)
If you have any questions concerning our meetings, please contact Geoff Gordon at "ggordon AT cs DOT cmu DOT edu". If you'd like to suggest a change to this web page, please contact Joelle Pineau or Max Likhachev at "maxim+ AT cs DOT cmu DOT edu".
Spring 2003
Here is a tentative schedule of upcoming presentations.
Most standard policy search reinforcement learning algorithms exhibit non-covariant behavior during the learning process. That is, the "steepest descent" rule depends on the exact parameterization of the policy class we are considering, and not simply on the policy class itself. One may analyze these algorithms in information-geometric terms and discover that there is indeed a version of "steepest descent" that is natural and covariant. Although the approach we describe here is motivated by purely theoretical considerations, the resulting algorithm obeys Vapnik's maxim that "Nothing is more practical than a good theory": in practice, the resulting covariant algorithm has led to the best known performance in some benchmark RL applications and often outperforms canonical gradient methods by orders of magnitude.
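As a rough illustration of the covariant update, here is a minimal numpy sketch of my own (not code from the paper; the function names and the ridge term are assumptions):

```python
import numpy as np

def estimate_fisher(scores):
    """Estimate the Fisher matrix F(theta) = E[s s^T] from sampled
    score vectors s = grad_theta log pi(a|s; theta)."""
    S = np.asarray(scores)
    return S.T @ S / len(S)

def natural_gradient_step(theta, grad, fisher, step=0.1, ridge=1e-6):
    """Precondition the vanilla policy gradient by the inverse Fisher
    matrix; the resulting direction is invariant to smooth
    reparameterizations of the policy class."""
    direction = np.linalg.solve(fisher + ridge * np.eye(len(theta)), grad)
    return theta + step * direction
```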
We investigate methods for planning in a Markov Decision Process where the cost function is chosen by an adversary after we fix our policy. As a running example, we consider a robot path planning problem where costs are influenced by sensors that an adversary places in the environment. We formulate the problem as a zero-sum matrix game where rows correspond to deterministic policies for the planning player and columns correspond to cost vectors the adversary can select. For a fixed cost vector, fast algorithms (such as value iteration) are available for solving MDPs. We develop efficient algorithms for matrix games where such best response oracles exist. We show that for our path planning problem these algorithms are at least an order of magnitude faster than direct solution of the linear programming formulation.
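To make the oracle idea concrete, here is a minimal sketch of one such scheme, a double-oracle loop over a growing restricted game. The scipy LP encoding, the function names, and the termination test are my assumptions, not necessarily the paper's exact algorithm:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    """Optimal mixed strategy for the minimizing row player of payoff
    matrix M, via the standard LP: min v s.t. x^T M[:,j] <= v, x in simplex."""
    n_rows, n_cols = M.shape
    c = np.zeros(n_rows + 1); c[-1] = 1.0           # minimize v
    A_ub = np.hstack([M.T, -np.ones((n_cols, 1))])  # x^T M[:,j] - v <= 0
    b_ub = np.zeros(n_cols)
    A_eq = np.append(np.ones(n_rows), 0.0).reshape(1, -1)  # sum x = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n_rows + [(None, None)])
    return res.x[:n_rows], res.x[-1]

def double_oracle(payoff, row_best_response, col_best_response,
                  row0, col0, tol=1e-6):
    """Solve the game restricted to small sets of rows (policies) and
    columns (cost vectors); ask each best-response oracle for a
    counter-strategy; stop when neither player gains more than tol."""
    rows, cols = [row0], [col0]
    while True:
        M = np.array([[payoff(r, c) for c in cols] for r in rows])
        x, v = solve_zero_sum(M)       # planner's mixture over policies
        y, _ = solve_zero_sum(-M.T)    # adversary's mixture, via transposed game
        r_new = row_best_response(cols, y)  # e.g. value iteration on mixed costs
        c_new = col_best_response(rows, x)
        row_gain = v - sum(w * payoff(r_new, c) for c, w in zip(cols, y))
        col_gain = sum(w * payoff(r, c_new) for r, w in zip(rows, x)) - v
        if row_gain <= tol and col_gain <= tol:
            return x, rows, y, cols, v
        if row_gain > tol: rows.append(r_new)
        if col_gain > tol: cols.append(c_new)
```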
Particle filters are used extensively for tracking the state of nonlinear dynamic systems. This paper presents a new particle filter that maintains samples in the state space at dynamically varying resolution for computational efficiency. Resolution within the state space varies by region, depending on the belief that the true state lies within each region. Where belief is strong, resolution is fine. Where belief is low, resolution is coarse, abstracting multiple similar states together. The resolution of the state space is dynamically updated as the belief changes. The proposed algorithm makes an explicit bias-variance tradeoff to select between maintaining samples in a biased generalization over a region of state space versus in a high-variance specialization at fine resolution. Samples are maintained at a coarser resolution when the bias introduced by the generalization to a coarse resolution is outweighed by the reduction in variance, and at a finer resolution when it is not. Maintaining samples in abstraction prevents potential hypotheses from being eliminated prematurely for lack of a sufficient number of particles. Empirical results show that our variable resolution particle filter requires significantly lower computation for performance comparable to a classical particle filter.
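For contrast, here is a minimal sketch of the classical (bootstrap) particle filter that the variable resolution filter refines; the variable resolution version replaces this flat sample set with samples over abstract regions. The names and the resampling threshold are my choices:

```python
import numpy as np

def particle_filter_step(particles, weights, sample_dynamics, likelihood, obs, rng):
    """One predict-reweight-resample step of a bootstrap particle filter."""
    # Predict: push each sample through the stochastic dynamics model.
    particles = np.array([sample_dynamics(p, rng) for p in particles])
    # Reweight by the observation likelihood and renormalize.
    weights = weights * np.array([likelihood(obs, p) for p in particles])
    weights = weights / weights.sum()
    # Resample only when the effective sample size collapses.
    n = len(particles)
    if 1.0 / np.sum(weights ** 2) < 0.5 * n:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights
```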
Planetary rovers operate in environments where human intervention is expensive, slow, unreliable, or impossible. It is therefore essential to monitor the behavior of these robots so that contingencies may be addressed before they result in catastrophic failures. This monitoring needs to be efficient since there is limited computational power available on rovers. We propose an efficient particle filter for monitoring faults that combines the Unscented Kalman Filter (UKF) [7] and the Variable Resolution Particle Filter (VRPF) [16]. We begin by using the UKF to obtain an improved proposal distribution for a particle filter which tracks discrete fault variables as part of its state space. This requires computing an unscented transform for every particle and every possible discrete transition to a fault or nominal state at each instant in time. Since there are potentially a large number of faults that may occur at any instant, this approach does not scale well. We use the VRPF to address this concern. The VRPF tracks abstract states that may represent single states or sets of states. There are many fewer transitions between states when they are represented in abstraction. We show that the VRPF in conjunction with a UKF proposal improves performance and may potentially be used in large state spaces. Experimental results show a significant improvement in efficiency.
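For reference, a minimal numpy sketch of the unscented transform that would be computed per particle and per candidate discrete transition (this is the standard scaled sigma-point formulation; the parameter defaults are conventional choices, not taken from the paper):

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f
    using 2n+1 deterministically chosen sigma points."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * cov)   # columns are sigma-point offsets
    pts = [mean] + [mean + L[:, i] for i in range(n)] \
                 + [mean - L[:, i] for i in range(n)]
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    ys = np.array([f(p) for p in pts])
    y_mean = wm @ ys
    d = ys - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov
```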
This talk consists of two parts. In the first part I will present some recent work we did on particle filter-based people tracking. In the context of tracking multiple people using a combination of anonymous and ID sensors installed throughout the environment, particles represent assignments between objects and observations (similar to multi-hypothesis tracking). To estimate the location of people using GPS (outdoors) or very noisy, sparse indoor data, we project particles onto graph structures. Such a representation has two key advantages. First, fewer samples are needed to estimate a location on the graph. Second, the graph provides a discretization of continuous trajectories, thereby making the learning of long-term behaviour patterns much easier. In the second part I will describe a hierarchical Bayesian approach used for multi-robot exploration. The technique allows multiple robots to merge their maps even under complete uncertainty about their relative start locations.
This paper presents a new family of approximate monitoring algorithms that combines the best qualities of the particle filtering and Boyen-Koller methods. Our algorithms maintain an approximate representation of the belief state in the form of sets of factored particles that correspond to samples of clusters of state variables. Empirical results show that the factored particle method outperforms ordinary particle filtering.
Martin Zinkevich, "Online Convex Programming and Generalized Infinitesimal Gradient Ascent," CMU Technical Report CMU-CS-03-110.
Convex programming involves a convex set F and a convex function c : F -> R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is in fact a generalization of infinitesimal gradient ascent; the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
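The algorithm itself is short: each round, commit to a point, observe that round's cost, take a gradient step, and project back onto F. A minimal sketch (function names and the 1/sqrt(t) step-size schedule are my choices, though that schedule is the standard one for this style of analysis):

```python
import numpy as np

def online_gradient_descent(project, x0, cost_grads, eta=lambda t: 1.0 / np.sqrt(t)):
    """Online convex programming: commit to x_t, then see the round-t
    cost and step against its gradient, projecting back into F."""
    x, played = x0, [x0]
    for t, grad in enumerate(cost_grads, start=1):
        g = grad(x)                  # gradient of this round's cost at our point
        x = project(x - eta(t) * g)  # projected gradient step back into F
        played.append(x)
    return played

# Example feasible set F: the unit ball, with Euclidean projection.
project_ball = lambda z: z / max(1.0, np.linalg.norm(z))
```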
Second half of last week's talk.
This paper describes a method for reinforcement learning in which policies and value functions are represented along selected trajectories. Local linear models are used to estimate the dynamics of the system. Given a trajectory, the first and second derivatives of the value function and the first derivatives of the policy can be updated in a backwards sweep that is derived from the discrete time Riccati equations. In addition to explaining this approach and describing its application to two simple tasks (pendulum swinging and hopping), I'll attempt to relate it to some of the control theory that we learned last semester.
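To make the backward sweep concrete, here is a minimal sketch of the underlying discrete-time Riccati recursion in the linear-quadratic case; along a trajectory, A and B would come from the local linear models (the function and variable names are mine):

```python
import numpy as np

def lqr_backward_sweep(A, B, Q, R, Qf, horizon):
    """Finite-horizon discrete-time LQR via the backward Riccati
    recursion. P_t plays the role of the value function's second
    derivative, K_t of the local policy's first derivative."""
    P, gains = Qf, []
    for _ in range(horizon):
        # u_t = -K_t x_t with K_t = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()          # gains[t] is the feedback matrix at time t
    return gains, P
```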
Additional reading:
Atkeson, "Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming", NIPS 94.
Stengel, R. F., "Optimal Control and Estimation", pp.270-284.
Fall 2002
Stengel, R. F., "Optimal Control and Estimation", Section 3.4.
Stengel, R. F., "Optimal Control and Estimation", Chapter 2.
Stengel, R. F., "Optimal Control and Estimation", Sections 3.1-3.4.
Stengel, R. F., "Optimal Control and Estimation", Sections 2.3, 2.5, 2.6; Example 2.6.1.
In this talk I will motivate the need for learning symbolic maps that represent the world in terms of objects. I will describe an algorithm that learns a map in terms of high-level objects -- walls and doors -- rather than in terms of occupancy grids. The algorithm uses a probabilistic generative model that describes the shape and motion properties of the objects in the environment. It also includes a realistic sensor model, which addresses the phenomenon of geometric occlusion and specifies how the objects in the environment generate the robot's laser range readings. I will show how the algorithm uses EM to construct a map of an office environment containing both static walls and dynamic doors.
I will present some of our recent work in mapping environments where objects change location over time. Using a set of static maps collected by a mobile robot, we learn high-fidelity shape models of both objects and classes of objects. I will discuss our techniques for leveraging multiple observations to improve models, solving the data association problem between observations and objects, and clustering objects into classes. I will also show a technique for determining how many objects and classes exist even when only a subset of objects is present in any map.
Spring 2002
This paper presents a method for automatically determining K, the number of generating HMMs, and for learning the parameters of those HMMs. An initial estimate of K is obtained by unsupervised clustering of the time series using dynamic time warping (DTW) to assess similarity. In addition to producing an estimate of K, this process yields an initial partitioning of the data. As later sections will explain, DTW is related to HMM training algorithms but is weaker in several respects. Therefore, the clusters based on DTW are likely to contain mistakes. These initial clusters serve as input to a process that trains one HMM on each cluster and iteratively moves time series between clusters based on their likelihoods given the various HMMs. Ultimately the process converges to a final clustering of the data and a generative model (the HMMs) for each of the clusters.
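For readers unfamiliar with DTW, the similarity measure behind the initial clustering is a short dynamic program. A textbook sketch (not the paper's code):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series: the cost
    of the cheapest monotone alignment of one series onto the other."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the best of the three predecessor alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```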
The traditional programming-language framework assumes that sufficient information is available at the time of actual computation; however, it fails to model systems in which uncertainty is inherent. In order to model such systems, various probabilistic modeling languages have been developed. The problem with all these approaches is that they are built primarily upon a discrete computational framework. As a consequence, it is hard to encode continuous probabilistic quantities, which frequently arise in robotics applications, under this discrete framework. In this talk, I will introduce a different approach to expressing probabilistic quantities. The main idea is simple: instead of probability measures, we exploit sampling functions, that is, measurable mappings from Borel sets to sample domains. This approach brings both discrete and continuous probability distributions (and even weird distributions which belong to neither class) under a unified framework of computation. I will also discuss an application of this approach to a real robotics problem, and its drawbacks (and hopefully strengths).
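The idea transfers to any language with first-class functions. A small Python sketch of my own (the talk's actual language and combinators may differ): a distribution is just a function from a random source to a sample, so discrete, continuous, and mixed distributions compose uniformly.

```python
import random

# A distribution = a sampling function from a random source to a sample.
def uniform(lo, hi):
    return lambda rng: lo + (hi - lo) * rng.random()

def gaussian(mu, sigma):
    return lambda rng: rng.gauss(mu, sigma)

def mixture(p, d1, d2):
    """Compose distributions: with probability p sample d1, else d2.
    The result is itself just another sampling function."""
    return lambda rng: d1(rng) if rng.random() < p else d2(rng)

# E.g. a range sensor that usually reads ~2m but occasionally returns junk.
rng = random.Random(0)
noisy_range = mixture(0.95, gaussian(2.0, 0.05), uniform(0.0, 10.0))
readings = [noisy_range(rng) for _ in range(1000)]
```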
Visualization allows quick determination of an appropriate data model, and projection is an intuitive way to visualize high-dimensional data. I will present new projection techniques for classification and regression data. In contrast to unsupervised methods like PCA, these projections deliberately try to preserve the dependence or 'mutual information' between the input and output variables, thereby revealing what sort of classifier or regression function you should be using. For classification, they are typically much more informative than Fisher projection, which assumes the classes are Gaussian with equal covariance. Furthermore, with the right choice of objective function, the optimal projections can be computed quickly.
This paper addresses the problem of inverse reinforcement learning (IRL) in Markov decision processes, that is, the problem of extracting a reward function given observed, optimal behavior. IRL may be useful for apprenticeship learning to acquire skilled behavior, and for ascertaining the reward function being optimized by a natural system. We first characterize the set of all reward functions for which a given policy is optimal. We then derive three algorithms for IRL. The first two deal with the case where the entire policy is known; we handle tabulated reward functions on a finite state space and linear functional approximation of the reward function over a potentially infinite state space. The third algorithm deals with the more realistic case in which the policy is known only through a finite set of observed trajectories. In all cases, a key issue is degeneracy -- the existence of a large set of reward functions for which the observed policy is optimal. To remove degeneracy, we suggest some natural heuristics that attempt to pick a reward function that maximally differentiates the observed policy from other, sub-optimal policies. This results in an efficiently solvable linear programming formulation of the IRL problem. We demonstrate our algorithms on simple discrete/finite and continuous/infinite state problems.
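For the tabulated finite-state case, the LP can be set up in a few dozen lines. The sketch below follows my reading of that formulation; the variable names, the scipy encoding, and the penalty weight are my assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def irl_lp(P, policy, gamma, r_max=1.0, lam=1.1):
    """Sketch of finite-state IRL as an LP: find a reward R that keeps
    the observed policy optimal and maximizes its advantage over the
    runner-up action, with an L1 penalty on R to fight degeneracy.
    P[a] is the n x n transition matrix of action a; policy[s] is the
    observed action in state s."""
    n_actions, n = P.shape[0], P.shape[1]
    P_star = np.array([P[policy[s], s] for s in range(n)])   # rows under policy
    G = np.linalg.inv(np.eye(n) - gamma * P_star)            # discounted occupancy
    # Variables: R (n), t (n), u (n). Minimize -(sum t) + lam * (sum u).
    c = np.concatenate([np.zeros(n), -np.ones(n), lam * np.ones(n)])
    A_ub, b_ub = [], []
    for s in range(n):
        for a in range(n_actions):
            if a == policy[s]:
                continue
            row_R = (P[policy[s], s] - P[a, s]) @ G
            # Optimality of the observed policy: row_R @ R >= 0.
            A_ub.append(np.concatenate([-row_R, np.zeros(2 * n)])); b_ub.append(0.0)
            # t_s lower-bounds the advantage over every other action.
            t_block = np.zeros(n); t_block[s] = 1.0
            A_ub.append(np.concatenate([-row_R, t_block, np.zeros(n)])); b_ub.append(0.0)
    for s in range(n):  # |R_s| <= u_s, encoded as two linear constraints
        e = np.zeros(n); e[s] = 1.0
        A_ub.append(np.concatenate([e, np.zeros(n), -e])); b_ub.append(0.0)
        A_ub.append(np.concatenate([-e, np.zeros(n), -e])); b_ub.append(0.0)
    bounds = [(-r_max, r_max)] * n + [(None, None)] * n + [(0, r_max)] * n
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    return res.x[:n]
```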
This paper presents a state-space perspective on the kinodynamic planning problem, and introduces a randomized path planning technique that computes collision-free kinodynamic trajectories for high degree-of-freedom problems. By using a state-space formulation, the kinodynamic planning problem is treated as a 2n-dimensional nonholonomic planning problem, derived from an n-dimensional configuration space. The state space serves the same role as the configuration space for basic path planning; however, standard randomized path planning techniques do not directly apply to planning trajectories in the state space. We have developed a randomized planning approach that is particularly tailored to kinodynamic problems in state spaces, although it also applies to standard nonholonomic and holonomic planning problems. The basis for this approach is the construction of a tree that attempts to rapidly and uniformly explore the state space, offering benefits similar to those obtained by successful randomized planning methods, but applying to a much broader class of problems. Some preliminary results are discussed for an implementation that determines kinodynamic trajectories for hovercraft and satellites in cluttered environments, resulting in state spaces of up to 12 dimensions.
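The core tree-growing loop is compact. A generic sketch (the sampling, nearest-neighbor, and extension routines are placeholders that would encode the dynamics and collision constraints for a particular system):

```python
def rrt(sample_state, nearest, extend, x_init, goal_test, max_iters=5000):
    """Bare-bones rapidly-exploring tree: repeatedly sample the state
    space, find the nearest tree node, and extend it toward the sample
    under the system's dynamics; uniform sampling drives the tree to
    explore rapidly and uniformly."""
    nodes, parents = [x_init], [None]
    for _ in range(max_iters):
        x_rand = sample_state()                 # random target state
        i_near = nearest(nodes, x_rand)         # index of closest tree node
        x_new = extend(nodes[i_near], x_rand)   # integrate dynamics toward target
        if x_new is None:                       # collision / constraint violation
            continue
        nodes.append(x_new)
        parents.append(i_near)
        if goal_test(x_new):                    # walk parents back to the root
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parents[i]
            return path[::-1]
    return None
```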
To classify a large number of unlabeled examples we combine a limited number of labeled examples with a Markov random walk representation over the unlabeled examples. The random walk representation exploits any low dimensional structure in the data in a robust, probabilistic manner. We develop and compare several estimation criteria/algorithms suited to this representation. These include, in particular, multi-way classification with an average margin criterion, which permits a closed-form solution. The time scale of the random walk regularizes the representation and can be set through a margin-based criterion favoring unambiguous classification. We also extend this basic regularization by adapting time scales for individual examples. We demonstrate the approach on synthetic examples and on text classification problems.
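A minimal sketch of the representation itself: build a kNN graph over the points, row-normalize it into a transition matrix, and raise it to the t-th power (construction details such as symmetrization are simplified relative to the paper, and the names are mine):

```python
import numpy as np

def random_walk_representation(X, t, k=5, sigma=1.0):
    """t-step Markov random walk over a kNN graph: row i of the result
    is the distribution over points reached from x_i after t steps; t
    controls the amount of smoothing/regularization."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    # Keep only each point's k nearest neighbours (plus itself).
    keep = np.argsort(d2, axis=1)[:, :k + 1]
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(len(X))[:, None], keep] = True
    W = np.where(mask, W, 0.0)
    A = W / W.sum(axis=1, keepdims=True)   # one-step transition matrix
    return np.linalg.matrix_power(A, t)
```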
Fall 2001
We have a draft schedule of upcoming presentations.
Spring 2001
Fall 2000
I'll present a paper by Daphne Koller, Lerner and Angelov from UAI-99: "A General Algorithm for Approximate Inference and its Applications to Hybrid Bayes Nets".
POMDPs provide a useful framework for decision-making in the presence of uncertainty; however, finding an optimal policy is computationally infeasible for large problems. I will present a hierarchical approach to POMDPs which takes advantage of structure in the domain to find policies for complex tasks. The idea is to divide the action set along domain-specific task decompositions. A POMDP problem is thereby broken down into a hierarchically related set of smaller problems (i.e. subtasks). A value function is learned locally (i.e. related to the local set of actions) for each subtask, thereby yielding local policies.
This extends the work on hierarchical reinforcement learning (Dietterich, NIPS'99) to the POMDP framework; the task decomposition and state abstraction are similar, but I propose significantly different planning and execution algorithms to ensure consistent belief states between subtasks, and to accommodate partial observability of subtask completion. This work is applied to the problem of dialogue management, for which I will present results showing that planning time was significantly reduced compared to a conventional POMDP, while the policy performed much better than when using a greedy heuristic approximation.
I'll explain what Gaussian processes are, and why they are interesting. I may also talk about Gaussian process networks (which won the ICML-2000 best paper award), and how to overcome the computational problem of learning Gaussian processes using exciting and fabulous new CMU algorithms!
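As background, textbook GP regression fits in a dozen lines; the Cholesky factorization below is exactly the O(n^3) cost that fast approximation algorithms attack. This is a standard sketch, not any particular CMU algorithm:

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential kernel between two sets of input points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_predict(X, y, X_star, kernel=rbf, noise=1e-2):
    """Posterior mean and variance at test inputs X_star, given noisy
    observations y at training inputs X."""
    K = kernel(X, X) + noise * np.eye(len(X))
    K_s = kernel(X_star, X)
    L = np.linalg.cholesky(K)                       # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(kernel(X_star, X_star)) - np.sum(v ** 2, axis=0)
    return mean, var
```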
An overview of two papers:
Recognition of Planar Object Classes, M.C. Burl and P. Perona, CVPR 96, San Francisco, CA, June 1996. ftp://vision.caltech.edu/pub/tech-reports-vision/CVPR96-recog.ps.gz
Unsupervised Learning of Models for Recognition, M. Weber, M. Welling and P. Perona, ECCV 2000, Dublin, Ireland, June 2000. ftp://vision.caltech.edu/pub/tech-reports/ECCV00-recog.ps.gz
An overview of the ICML-2000 paper by Ng and Jordan.
Spring 2000
In January 2000 the Nomad robot was deployed to the ice sheets of Antarctica and made the first ever autonomous field identification of a meteorite by a robot. Nomad's rock/meteorite classifier is based on a Bayes network which addresses several issues unique to classifying rocks from a mobile robotic vehicle: the ambiguity of rock classes, the non-trivial costs of deploying sensors, noisy data, limited training examples and the need to incorporate human expert opinions.
I'll also describe the Robotic Antarctic Meteorite Search project and the latest results from the recent 2000 expedition to the Elephant Moraine in Antarctica.
Human beings take for granted their ability to move around and interact with each other in a wide range of environments. These same environments are tremendous sources of uncertainty for mobile robots. Not only does navigation introduce positional uncertainty, but systems that try to interact with human beings are faced with the tremendously noisy and ambiguous behaviours that humans exhibit.
The Partially Observable Markov Decision Process (POMDP) constitutes a planning methodology that attempts to model uncertainty in the way that a system acts and senses the world. POMDPs are unfortunately intractable for most real-world problems due to the overwhelming computational complexity of planning in belief spaces. I propose a method for finding an approximation of the optimal POMDP policy by compressing the full belief state into statistics that represent the belief space compactly, and demonstrate the utility of this technique in the domains of mobile robot navigation and speech dialogue management.
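As an illustration of the compression idea, here is a tiny sketch; the specific pair of statistics (most likely state plus belief entropy) is my example of such a compact summary, not necessarily the statistics used in the talk:

```python
import numpy as np

def compress_belief(b):
    """Collapse a full belief vector over states into two summary
    statistics; planning then happens in this tiny space rather than
    in the full belief simplex."""
    b = np.asarray(b, dtype=float)
    nz = b[b > 0]
    entropy = -np.sum(nz * np.log(nz))   # how uncertain the belief is
    return int(np.argmax(b)), entropy    # (most likely state, uncertainty)
```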
I will present three papers by Matthew Brand on several novel techniques for learning probabilistic models and pattern discovery. Entropic priors are derived from the basic premise that the world is predictable, and actually make sense, unlike most other priors people have been using. Parameter extinction is a related technique that produces sparse probability tables and might even make the learned models interpretable by humans. Finally, deterministic annealing overcomes shallow local maxima in likelihood and produces quasi-global maxima, which improves the quality of the models. The combination of these techniques results in a method for learning the structure of probabilistic models, which is much better and faster than the usual test-and-discard approach.
The three papers are available online, if you want to check them out in advance: http://www.merl.com/projects/structure/index.html
I will present some of the work done in Stuart Russell's group on traffic surveillance. One of the key problems in this context is to detect that cars observed at different cameras are actually the same car (object identification). They apply techniques similar to those used in the data association community to solve this problem.
Mobile robot with omnicam vision and gesture-based interaction.
Paper by M. I. Jordan, Z. Ghahramani, T.S. Jaakkola and L.K. Saul
In M.I. Jordan (editor) Learning in Graphical Models, p. 105, Dordrecht: Kluwer Academic Publishers (1998).
This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models. We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, showing how upper and lower bounds can be found for local probabilities, and discussing methods for extending these bounds to bounds on global probabilities of interest. Finally, we return to the examples and demonstrate how variational algorithms can be formulated in each case.
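To give the flavor of such bounds, here is a minimal mean-field sketch for a small Boltzmann machine: fixed-point updates for the variational means yield a lower bound on the log partition function. This is my own toy example in the spirit of the paper, not its code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_bound(W, b, iters=200):
    """Mean-field lower bound on log Z for p(s) proportional to
    exp(0.5 s'Ws + b's), s in {0,1}^n, W symmetric with zero diagonal.
    Iterates the fixed-point updates mu_i <- sigmoid(b_i + (W mu)_i)."""
    mu = np.full(len(b), 0.5)
    for _ in range(iters):
        mu = sigmoid(b + W @ mu)
    eps = 1e-12
    entropy = -np.sum(mu * np.log(mu + eps) + (1 - mu) * np.log(1 - mu + eps))
    # E_q[energy] + H(q) <= log Z for the fully factored q(s) = prod Bern(mu_i)
    bound = 0.5 * mu @ W @ mu + b @ mu + entropy
    return bound, mu
```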
Talks from previous semesters are linked below:
ALL
1999: Spring, Fall
1998: Spring, Fall
1997: Fall