Probabilistic Pointing Target Prediction via Inverse Optimal
Control
Brian D. Ziebart, Anind K. Dey, and J. Andrew Bagnell
International Conference on Intelligent User Interfaces
(IUI 2012)
[pdf]
Abstract
Numerous interaction techniques have been developed that
make "virtual" pointing at targets in graphical user interfaces
easier than analogous physical pointing tasks by invoking
target-based interface modifications. These pointing facilitation
techniques crucially depend on methods for estimating the
relevance of potential targets. Unfortunately, many of
the simple methods employed to date are inaccurate in common
settings with many selectable targets in close proximity.
In this paper, we bring recent advances in statistical machine
learning to bear on this underlying target relevance estimation
problem. By framing past target-driven pointing trajectories
as approximate solutions to well-studied control problems,
we learn the probabilistic dynamics of pointing trajectories
that enable more accurate predictions of intended targets.
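As a rough, self-contained illustration of the underlying relevance estimation step (my own sketch, not the paper's learned model), Bayes' rule turns a likelihood of the partially observed cursor trajectory under each candidate target into a posterior over targets; the heading-based likelihood below is only a stand-in for a trajectory model learned via inverse optimal control, and kappa is an assumed concentration parameter.

import numpy as np

def heading_log_likelihood(trajectory, target, kappa=4.0):
    # Placeholder likelihood: rewards cursor motion headed toward the target.
    steps = np.diff(trajectory, axis=0)
    to_target = target - trajectory[:-1]
    cos = np.einsum('ij,ij->i', steps, to_target) / (
        np.linalg.norm(steps, axis=1) * np.linalg.norm(to_target, axis=1) + 1e-9)
    return kappa * cos.sum()

def target_posterior(trajectory, targets, prior=None):
    # trajectory: (T, 2) cursor samples; targets: list of (2,) target centers.
    prior = np.full(len(targets), 1.0 / len(targets)) if prior is None else prior
    log_post = np.log(prior) + np.array(
        [heading_log_likelihood(trajectory, np.asarray(t)) for t in targets])
    log_post -= log_post.max()                 # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

A pointing facilitation technique could then, for example, expand or magnetize only the few targets whose posterior probability exceeds a threshold.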
Bibtex
@inproceedings{ziebart2012probabilistic,
author = {Brian D. Ziebart and Anind K. Dey and J. Andrew Bagnell},
title = {Probabilistic Pointing Target Prediction via Inverse Optimal
Control},
year = {2012},
booktitle = {Proc. of the International Conference on Intelligent User
Interfaces}
}
Best Paper Award Nominee
Factorized Decision Forecasting via Combining Value-based and
Reward-based Estimation
Brian D. Ziebart
Allerton Conference on Communication, Control and Computing
(Allerton 2011)
[pdf]
Abstract
A powerful recent perspective for predicting sequential decisions learns the parameters of decision problems
that produce observed behavior as (near) optimal solutions. Under this perspective, behavior is explained in terms of utilities,
which can often be defined as functions of state and action
features to enable generalization across decision tasks. Two
approaches have been proposed from this perspective: estimate
a feature-based reward function and recursively compute values
from it, or directly estimate a feature-based value function.
In this work, we investigate the combination of these two approaches into a single learning task using directed information
theory and the principle of maximum entropy. This enables
uncovering which type of estimate is most appropriate -- in
terms of predictive accuracy and/or computational benefit -- for
different portions of the decision space.
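In abridged notation of my own (not taken verbatim from the paper), the two estimation routes being combined can be contrasted as follows: a reward-based estimate learns weights theta on reward features and computes soft values recursively, while a value-based estimate parameterizes the value function directly from features; either induces a stochastic policy used for forecasting.

\pi(a \mid s) \;\propto\; e^{Q(s,a)}, \qquad
Q_{\theta}(s,a) \;=\; \theta^{\top} f(s,a) + \mathbb{E}_{s' \mid s,a}\Big[\log \sum_{a'} e^{Q_{\theta}(s',a')}\Big] \quad \text{(reward-based, recursive)}, \qquad
Q_{w}(s,a) \;=\; w^{\top} g(s,a) \quad \text{(value-based, direct)}.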
Bibtex
@inproceedings{ziebart2011factorized,
author = {Brian D. Ziebart},
title = {Factorized Decision Forecasting via Combining Value-based
and Reward-based Estimation},
year = {2011},
booktitle = {Proc. of the Allerton Conference on Communication,
Control and Computing}
}
Process-Conditioned Investing with Incomplete Information
using Maximum Causal Entropy
Brian D. Ziebart
International Workshop on Bayesian Inference and Maximum Entropy
Methods in Science and Engineering (MaxEnt 2011)
[pdf]
Abstract
Investing to optimally maximize the growth rate of wealth based on sequences of event
outcomes has many information-theoretic interpretations. Namely, the mutual information characterizes the benefit of additional side information being available when making investment decisions
in settings where the probabilistic relationships between side information and event outcomes
are known. Additionally, the relative variant of the principle of maximum entropy provides the
optimal investment allocation in the more general setting where the relationships between side information and event outcomes are only partially known. In this paper, we build upon recent
work characterizing the growth rates of investment in settings with inter-dependent side information and event outcome sequences. We consider the extension to settings with inter-dependent
event outcomes and side information where the probabilistic relationships between side information and event outcomes are only partially known. We introduce the principle of minimum relative
causal entropy to obtain the optimal worst-case investment allocations for this setting. We present
efficient algorithms for obtaining these investment allocations using convex optimization techniques
and dynamic programming, illustrating a close connection to optimal control theory.
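For reference, the classical results this builds upon, in Cover-and-Thomas-style notation (the paper's causally conditioned, partially known setting is not reproduced here): the growth (doubling) rate of a betting allocation b against odds o, its log-optimal choice, and the growth-rate gain from side information Y.

W(\mathbf{b}, p) \;=\; \sum_{x} p(x)\, \log\big(b(x)\, o(x)\big), \qquad
b^{*}(x) \;=\; p(x), \qquad
\Delta W \;=\; W^{*}(X \mid Y) - W^{*}(X) \;=\; I(X; Y).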
Bibtex
@inproceedings{ziebart2011process,
author = {Brian D. Ziebart},
title = {Process-Conditioned Investing with Incomplete Information
using Maximum Causal Entropy},
year = {2011},
booktitle = {Proc. of the International Workshop on Bayesian Inference and
Maximum Entropy Methods in Science and Engineering}
}
Computational Rationalization: The Inverse Equilibrium
Problem
Kevin Waugh, Brian D. Ziebart, and J. Andrew Bagnell
International Conference on Machine Learning (ICML 2011).
[pdf]
Best Paper Award
(An earlier version appeared in the Workshop on Decision Making with
Multiple Imperfect Decision Makers at NIPS 2010.)
Abstract
Modeling the purposeful behavior of imperfect agents from a small
number of observations is a challenging task. When restricted to
the single-agent decision-theoretic setting, inverse optimal
control techniques assume that observed behavior is an
approximately optimal solution to an unknown decision problem.
These techniques learn a utility function that explains the
example behavior and can then be used to accurately predict
or imitate future behavior in similar observed or unobserved
situations.
In this work, we consider similar tasks in competitive and
cooperative multi-agent domains. Here, unlike single-agent
settings, a player cannot myopically maximize its reward -- it
must speculate on how the other agents may act to influence the
game's outcome. Employing the game-theoretic notion of regret and
the principle of maximum entropy, we introduce a technique for
predicting and generalizing behavior, as well as recovering
a reward function in these domains.
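Loosely, and in my own notation rather than the paper's, the construction can be read as choosing the maximum entropy joint-action distribution among those for which no agent has more than epsilon regret against any fixed deviation:

\max_{\sigma} \; H(\sigma) \quad \text{s.t.} \quad
\mathbb{E}_{a \sim \sigma}\big[u_i(a_i', a_{-i}) - u_i(a)\big] \;\le\; \epsilon
\quad \forall\, i,\ \forall\, a_i'.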
Bibtex
@inproceedings{waugh2011computational,
author = {Kevin Waugh and Brian D. Ziebart and J. Andrew Bagnell},
title = {Computational Rationalization: The Inverse Equilibrium Problem},
year = {2011},
booktitle = {Proc. of the International Conference on Machine Learning}
}
Maximum Causal Entropy Correlated Equilibria for
Markov Games
Brian D. Ziebart, J. Andrew Bagnell, and Anind K. Dey
International Conference on Autonomous Agents and
Multiagent Systems (AAMAS 2011).
[pdf]
(An earlier version appeared in the Interactive Decision Theory and
Game Theory Workshop at AAAI 2010.)
Abstract
Motivated by a machine learning perspective -- that game-theoretic
equilibrium constraints should serve as guidelines for predicting
agents' strategies -- we introduce maximum causal entropy correlated
equilibria (MCECE), a novel solution concept for general-sum Markov games.
In line with this perspective, an MCECE strategy profile is a
uniquely-defined joint probability distribution over actions for
each game state that minimizes the worst-case prediction of agents'
actions under log-loss.
Equivalently, it maximizes the worst-case growth rate for gambling
on the sequences of agents' joint actions under uniform odds.
We present a convex optimization technique for obtaining MCECE
strategy profiles that resembles value iteration in finite-horizon
games. We assess the
predictive benefits of our approach by predicting the strategies
generated by previously proposed correlated equilibria solution
concepts, and compare against those previous approaches on that
same prediction task.
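In abridged notation (mine, not verbatim from the paper), an MCECE strategy profile maximizes the causal entropy of the agents' joint actions given the revealed game states, subject to the standard correlated-equilibrium deviation constraints, where Q_i denotes player i's action value:

\max_{\pi} \; H(A^{T} \,\|\, S^{T}) \quad \text{s.t.} \quad
\mathbb{E}_{\pi}\big[\, Q_i(s, a) - Q_i(s, (a_i', a_{-i})) \mid s, a_i \,\big] \;\ge\; 0
\quad \forall\, i,\, s,\, a_i,\, a_i'.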
Bibtex
@inproceedings{ziebart2011maximum,
author = {Brian D. Ziebart and J. Andrew Bagnell and Anind K. Dey},
title = {Maximum Causal Entropy Correlated Equilibria for {M}arkov Games},
year = {2011},
booktitle = {Proc. of the International Conference on Autonomous Agents
and Multiagent Systems}
}
Learning Patterns of Pick-ups and Drop-offs to Support Busy
Family Coordination
Scott Davidoff, Brian D. Ziebart, John Zimmerman, and Anind K. Dey
SIGCHI Conference on Human Factors in Computing
Systems (CHI 2011).
[pdf]
Abstract
Part of being a parent is taking responsibility for arranging and
supplying transportation of children between various events.
Dual-income parents frequently develop routines to help manage
transportation with a minimal amount of attention. On days when
families deviate from their routines, effective logistics can often
depend on knowledge of the routine location, availability and
intentions of other family members. Since most families rarely
document their routine activities, that needed information is often
unavailable and coordination breakdowns are much more likely to occur.
To address this problem we demonstrate the feasibility of learning
family routines using mobile phone GPS. We describe how we (1) detect
pick-ups and drop-offs; (2) predict which parent will perform a
future pick-up or drop-off; and (3) infer if a child will be left at
an activity. We discuss how these routine models give digital
calendars, reminder and location systems new capabilities to help
prevent breakdowns, and improve family life.
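A toy sketch of step (2) -- far simpler than the models in the paper, with entirely hypothetical features -- showing how routine features derived from phone GPS logs could feed a classifier that predicts which parent will perform an upcoming pick-up:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per event: [day_of_week, hour, mom_did_it_last_time,
# mom_distance_km, dad_distance_km]; label 1 means mom performs the pick-up.
X = np.array([[0, 15, 1, 2.0, 12.0],
              [2, 17, 0, 9.5,  1.5],
              [4, 15, 1, 1.0, 11.0],
              [1, 17, 0, 8.0,  2.0]])
y = np.array([1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[3, 15, 1, 1.5, 10.0]])[0, 1])  # P(mom does the pick-up)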
Bibtex
@inproceedings{davidoff2011learning,
author = {Scott Davidoff and Brian D. Ziebart and John Zimmerman and
Anind K. Dey},
title = {Learning Patterns of Pick-ups and Drop-offs to Support Busy
Family Coordination},
year = {2011},
booktitle = {Proc. of the SIGCHI Conference on Human Factors in Computing
Systems}
}
Modeling Purposeful Adaptive Behavior with
the Principle of Maximum Causal Entropy
Brian D. Ziebart
PhD Thesis. Department of Machine Learning. December 2010.
[pdf]
School of Computer Science Distinguished
Dissertation Award, Honorable Mention
Abstract
Predicting human behavior from a small amount of training examples
is a challenging machine learning problem. In this thesis, we
introduce the principle of maximum causal entropy, a general
technique for applying information theory to decision-theoretic,
game-theoretic, and control settings where relevant information
is sequentially revealed over time. This approach guarantees
decision-theoretic performance by matching purposeful measures of
behavior (Abbeel & Ng, 2004), and/or enforces game-theoretic
rationality constraints (Aumann, 1974), while otherwise being
as uncertain as possible, which minimizes worst-case predictive
log-loss (Grunwald & Dawid, 2003).
We derive probabilistic models for decision, control, and
multi-player game settings using this approach. We then develop
corresponding algorithms for efficient inference that include
relaxations of the Bellman equation (Bellman, 1957), and simple
learning algorithms based on convex optimization. We apply the
models and algorithms to a number of behavior prediction tasks.
Specifically, we present empirical evaluations of the approach in
the domains of vehicle route preference modeling using over
100,000 miles of collected taxi driving data, pedestrian motion
modeling from weeks of indoor movement data, and robust prediction
of game play in stochastic multi-player games.
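In abridged notation, the core optimization behind the thesis's models pairs maximal causal entropy with the feature-expectation matching of Abbeel & Ng (symbols mine; S^T and A^T denote the state and action sequences):

\max_{\{P(a_t \mid s_{1:t},\, a_{1:t-1})\}} \; H(A^{T} \,\|\, S^{T})
\quad \text{s.t.} \quad
\mathbb{E}_{P}\Big[\sum_{t} f(s_t, a_t)\Big] \;=\; \tilde{\mathbb{E}}\Big[\sum_{t} f(s_t, a_t)\Big].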
Bibtex
@phdthesis{ziebart2010modelingB,
author = {Brian D. Ziebart},
title = {Modeling Purposeful Adaptive Behavior with the Principle of
Maximum Causal Entropy},
year = {2010},
month = {Dec},
school = {Machine Learning Department, Carnegie Mellon University}
}
Modeling Interaction via the Principle of Maximum Causal Entropy
Brian D. Ziebart, J. Andrew Bagnell, and Anind K. Dey
International Conference on Machine Learning
(ICML 2010).
[pdf]
Best Student Paper Award, Runner-Up
(An earlier version appeared in the Workshop on Probabilistic Approaches
for Robotics and Control at NIPS 2009.)
Abstract
The principle of maximum entropy provides a powerful framework
for statistical models of joint, conditional, and marginal
distributions. However, there are many important distributions
with elements of interaction and feedback where its applicability
has not been established. This work presents the principle of
maximum causal entropy -- an approach based on causally
conditioned probabilities that can appropriately model the
availability and influence of sequentially revealed side
information. Using this principle, we derive Maximum Causal
Entropy Influence Diagrams, a new probabilistic graphical
framework for modeling decision making in settings with latent
information, sequential interaction, and feedback. We describe
the theoretical advantages of this model and demonstrate its
applicability for statistically framing inverse optimal control
and decision prediction tasks.
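The causally conditioned probability and causal entropy underlying the approach follow the standard directed-information definitions (notation abridged):

P(A^{T} \,\|\, S^{T}) \;\triangleq\; \prod_{t=1}^{T} P(a_t \mid s_{1:t},\, a_{1:t-1}),
\qquad
H(A^{T} \,\|\, S^{T}) \;=\; \mathbb{E}\big[-\log P(A^{T} \,\|\, S^{T})\big].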
Bibtex
@inproceedings{ziebart2010modeling,
author = {Brian D. Ziebart and J. Andrew Bagnell and Anind K. Dey},
title = {Modeling Interaction via the Principle of Maximum Causal Entropy},
year = {2010},
booktitle = {Proc. of the International Conference on Machine Learning},
pages = {1255--1262}
}
Planning-based Prediction for Pedestrians
Brian D. Ziebart, Nathan Ratliff, Garratt Gallagher, Christoph Mertz,
Kevin Peterson, J. Andrew Bagnell,
Martial Hebert, Anind K. Dey, and Siddhartha Srinivasa
International Conference on Intelligent Robots and Systems
(IROS 2009).
[pdf]
Abstract
We present a novel approach for determining
robot movements that efficiently accomplish the robot's tasks
while not hindering the movements of people within the
environment. Our approach models the goal-directed trajectories
of pedestrians using maximum entropy inverse optimal control.
The advantage of this modeling approach is the generality of
its learned cost function to changes in the environment and
to entirely different environments. We employ the predictions
of this model of pedestrian trajectories in a novel incremental
planner and quantitatively show the improvement in hindrance-sensitive
robot trajectory planning provided by our approach.
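A minimal sketch (assuming a grid-world discretization; not the paper's code) of the soft value iteration that, under a learned cost field, yields a stochastic goal-directed policy whose rollouts serve as pedestrian trajectory predictions:

import numpy as np

def soft_value_iteration(cost, goal, iters=200):
    # cost: (H, W) nonnegative per-cell traversal cost; goal: (row, col) index.
    # Returns soft values V; the induced policy is pi(a|s) = exp(Q(s,a) - V(s)).
    H, W = cost.shape
    V = np.full((H, W), -1e9)
    V[goal] = 0.0
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(iters):
        Q = np.empty((H, W, len(moves)))
        for k, (di, dj) in enumerate(moves):
            nxt_i = np.clip(np.arange(H)[:, None] + di, 0, H - 1)
            nxt_j = np.clip(np.arange(W)[None, :] + dj, 0, W - 1)
            Q[:, :, k] = -cost + V[nxt_i, nxt_j]   # step cost plus next-cell value
        m = Q.max(axis=2)
        V = m + np.log(np.exp(Q - m[:, :, None]).sum(axis=2))  # soft (log-sum-exp) backup
        V[goal] = 0.0                               # absorbing goal state
    return V

Rolling the induced policy forward from a pedestrian's current cell gives a time-indexed occupancy distribution that the planner can penalize when choosing robot trajectories.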
Bibtex
@inproceedings{bziebart2009planning,
author = {Brian D. Ziebart and Nathan Ratliff and Garratt Gallagher and
Christoph Mertz and Kevin Peterson and J. Andrew Bagnell and
Martial Hebert and Anind K. Dey and Siddhartha Srinivasa},
title = {Planning-based Prediction for Pedestrians},
year = {2009},
booktitle = {Proc. of the International Conference on Intelligent Robots
and Systems}
}
Inverse Optimal Heuristic Control for Imitation Learning
Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew Bagnell,
Martial Hebert, Anind K. Dey, and Siddhartha Srinivasa
Artificial Intelligence and Statistics (AISTATS 2009).
[pdf]
Abstract
One common approach to imitation learning is
behavioral cloning (BC), which employs straightforward
supervised learning (i.e., classification)
to directly map observations to controls. A second
approach is inverse optimal control (IOC),
which formalizes the problem of learning sequential
decision-making behavior over long horizons
as a problem of recovering a utility function
that explains observed behavior. This paper
presents inverse optimal heuristic control
(IOHC), a novel approach to imitation learning
that capitalizes on the strengths of both
paradigms. It employs long-horizon IOC-style
modeling in a low-dimensional space where inference
remains tractable, while incorporating an
additional descriptive set of BC-style features to
guide a higher-dimensional overall action selection.
We provide experimental results demonstrating
the capabilities of our model on a simple
illustrative problem as well as on two real
world problems: turn-prediction for taxi drivers,
and pedestrian prediction within an office environment.
Bibtex
@inproceedings{ratliff2009inverse,
author = {Nathan Ratliff and Brian D. Ziebart and Kevin Peterson and
J. Andrew Bagnell and Martial Hebert and Anind K. Dey and
Siddhartha Srinivasa},
title = {Inverse Optimal Heuristic Control for Imitation Learning},
year = {2009},
booktitle = {Proc. AISTATS},
pages = {424--431}
}
Navigate Like a Cabbie: Probabilistic Reasoning from Observed
Context-Aware Behavior
Brian D. Ziebart, Andrew Maas, Anind K. Dey, and J. Andrew Bagnell.
International Conference on Ubiquitous Computing (Ubicomp 2008).
[pdf]
Abstract
We present PROCAB, an efficient method for Probabilistically
Reasoning from Observed Context-Aware Behavior. It
models the context-dependent utilities and underlying reasons
that people take different actions. The model generalizes to
unseen situations and scales to incorporate rich
contextual information. We train our model using the route
preferences of 25 taxi drivers demonstrated in over 100,000
miles of collected data, and demonstrate the performance of
our model by inferring: (1) the decision at the next intersection,
(2) the route to a known destination, and (3) the destination given a
partially traveled route.
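Destination inference from a partially traveled route, for example, is a direct application of Bayes' rule over the model's route distribution (notation mine, with \zeta_{A \to B} denoting the observed partial route from A to B):

P(\text{dest} \mid \zeta_{A \to B}) \;\propto\; P(\zeta_{A \to B} \mid \text{dest})\; P(\text{dest}).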
Bibtex
@inproceedings{bziebart2008navigate,
author = {Brian D. Ziebart and Andrew Maas
and J. Andrew Bagnell and Anind K. Dey},
title = {Navigate Like a Cabbie: Probabilistic Reasoning from
Observed Context-Aware Behavior},
year = {2008},
booktitle = {Proc. Ubicomp},
pages = {322--331}
}
Fast Planning for Dynamic Preferences
Brian D. Ziebart, Anind K. Dey, and J. Andrew Bagnell.
International Conference on Automated Planning and Scheduling
(ICAPS 2008).
[pdf]
Abstract
We present an algorithm that quickly finds optimal plans for
unforeseen agent preferences within graph-based planning
domains where actions have deterministic outcomes and action
costs are linearly parameterized by preference parameters.
We focus on vehicle route planning for drivers with personal
trade-offs for different types of roads, and specifically
on settings where these preferences are not known until planning
time. We employ novel bounds (based on the triangle
inequality and on the concavity of the optimal plan cost
in the space of preferences) to enable the reuse of previously
computed optimal plans that are similar to the new plan
preferences. The resulting lower bounds are employed to guide
the search for the optimal plan up to 60 times more efficiently
than previous methods.
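The concavity-based bound follows because the optimal plan cost is a pointwise minimum of functions that are linear in the preference parameters w (notation mine, with f(p) the vector of feature counts, e.g. distance on each road type, of plan p):

V^{*}(w) \;=\; \min_{p \in \mathcal{P}} \; w^{\top} f(p)
\quad\Longrightarrow\quad
V^{*}\Big(\sum_{i} \lambda_i w_i\Big) \;\ge\; \sum_{i} \lambda_i V^{*}(w_i)
\quad \text{for } \lambda_i \ge 0,\ \sum_{i} \lambda_i = 1,

so optimal costs already computed for nearby preference vectors provide lower bounds that prune the search under the new preferences.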
Bibtex
@inproceedings{ziebart2008fast,
author = {Brian D. Ziebart and J. Andrew Bagnell and Anind K. Dey},
title = {Fast Planning for Dynamic Preferences},
year = {2008},
booktitle = {Proc. of the International Conference on Automated Planning
and Scheduling},
pages = {412--419}
}
Maximum Entropy Inverse Reinforcement Learning
Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey.
AAAI Conference on Artificial Intelligence (AAAI 2008).
[pdf]
(An earlier version appeared in the Workshop on Robotic Challenges for
Machine Learning at NIPS 2007.)
Abstract
Recent research has shown the benefit of framing problems
of imitation learning as solutions to Markov Decision Problems.
This approach reduces learning to the problem of recovering a
utility function that makes the behavior induced
by a near-optimal policy closely mimic demonstrated behavior.
In this work, we develop a probabilistic approach based
on the principle of maximum entropy. Our approach provides
a well-defined, globally normalized distribution over decision
sequences, while providing the same performance guarantees
as existing methods.
We develop our technique in the context of modeling real-world
navigation and driving behaviors where collected data
is inherently noisy and imperfect. Our probabilistic approach
enables modeling of route preferences as well as a powerful
new approach to inferring destinations and routes based on
partial trajectories.
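In the commonly cited deterministic-dynamics form, the model and its learning gradient are (notation abridged):

P(\zeta \mid \theta) \;=\; \frac{e^{\theta^{\top} f_{\zeta}}}{Z(\theta)},
\qquad
\nabla_{\theta} \log \mathcal{L} \;=\; \tilde{f} \;-\; \sum_{\zeta} P(\zeta \mid \theta)\, f_{\zeta}
\;=\; \tilde{f} \;-\; \sum_{s} D_{s}\, f_{s},

where \tilde{f} is the empirical feature count, f_{\zeta} sums features along path \zeta, and D_{s} are the expected state visitation frequencies computed by a forward-backward style dynamic program.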
Bibtex
@inproceedings{ziebart2008maximum,
author = {Brian D. Ziebart and Andrew Maas
and J. Andrew Bagnell and Anind K. Dey},
title = {Maximum Entropy Inverse Reinforcement Learning},
year = {2008},
booktitle = {Proc. AAAI},
pages = {1433--1438}
}
Learning Selectively Conditioned Forest Structures with
Applications to DBNs and Classification
Brian D. Ziebart, Anind K. Dey, and J. Andrew Bagnell.
Uncertainty in Artificial Intelligence (UAI 2007).
[pdf]
Abstract
Dealing with uncertainty in Bayesian Network
structures using maximum a posteriori
(MAP) estimation or Bayesian Model Averaging (BMA)
is often intractable due to
the superexponential number of possible directed
acyclic graphs. When the prior is
decomposable, two classes of graphs where
efficient learning can take place are tree structures
and fixed orderings with limited
in-degree. We show how MAP estimates
and BMA for selectively conditioned forests
(SCF), a combination of these two classes,
can be computed efficiently for ordered sets of
variables. We apply SCFs to temporal data
to learn Dynamic Bayesian Networks having
an intra-timestep forest and inter-timestep
limited in-degree structure, improving model
accuracy over DBNs without the combination
of structures. We also apply SCFs to Bayes
Net classification to learn selective forest-augmented
Naive Bayes classifiers. Based on empirical evidence, we argue
that the built-in feature selection of selective augmented Bayes
classifiers makes them preferable to similar non-selective classifiers.
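As a loosely related sketch (my own; not the SCF procedure itself), forest structure learning of the Chow-Liu flavor can be done with a Kruskal-style maximum spanning forest over empirical mutual information, keeping only edges whose weight beats a prior-derived penalty:

import numpy as np
from itertools import combinations

def mutual_information(x, y):
    # Empirical MI between two discrete variables given as 1-D integer arrays.
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def max_spanning_forest(data, penalty=0.0):
    # data: (N, D) discrete observations; returns the selected edges (i, j).
    D = data.shape[1]
    edges = sorted(((mutual_information(data[:, i], data[:, j]), i, j)
                    for i, j in combinations(range(D), 2)), reverse=True)
    parent = list(range(D))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    chosen = []
    for w, i, j in edges:
        if w <= penalty:                     # selectivity: drop weak edges
            break
        ri, rj = find(i), find(j)
        if ri != rj:                         # avoid cycles (keep it a forest)
            parent[ri] = rj
            chosen.append((i, j))
    return chosen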
Bibtex
@inproceedings{bziebart2007learning,
author = {Brian D. Ziebart and Anind K. Dey and J. Andrew Bagnell},
title = {Learning Selectively Conditioned Forest Structures with
Applications to DBNs and Classification},
year = {2007},
booktitle = {Proc. UAI},
pages = {458--465}
}
Learning Automation Policies for Pervasive Computing Environments
Brian D. Ziebart, Dan Roth, Roy H. Campbell, and Anind K. Dey.
IEEE International Conference on Autonomic Computing
(ICAC 2005).
[pdf]
Abstract
If current trends in cellular phone technology, personal
digital assistants, and wireless networking are indicative
of the future, we can expect our environments to contain
an abundance of networked computational devices and resources.
We envision these devices acting in an orchestrated
manner to meet users' needs, pushing the level of interaction
away from particular devices and towards interactions
with the environment as a whole. Computation will be based
not only on input explicitly provided by the user, but also
on contextual information passively collected by networked
sensing devices. Configuring the desired responses to different
situations will need to be easy for users. However,
we anticipate that the triggering situations for many desired
automation policies will be complex, unforeseen functions
of low-level contextual information. This is problematic
since users, though easily able to perceive triggering
situations, will not be able to define them as functions of the
devices' available contextual information, even when such
a function (or a close approximation) does exist.
In this paper, we present an alternative approach for
specifying the automation rules of a pervasive computing
environment using machine learning techniques. Using this
approach, users generate training data for an automation
policy through demonstration, and, after training is completed,
a learned function is employed for future automation.
This approach enables users to automate the environment
based on changes in the environment that are complex,
unforeseen combinations of contextual information. We developed
our learning service within Gaia, our pervasive
computing system, and deployed it within our prototype pervasive
computing environment. Using the system, we were
able to have users demonstrate how sound and lighting controls
should adjust to different applications used within the
environment, the users present, and the locations of those users,
and then automate those demonstrated preferences.
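A toy sketch of the general idea, far removed from the actual Gaia service and with entirely hypothetical context features: demonstrations pair sensed context with the settings the user chose, and a learned classifier then replays those preferences automatically.

from sklearn.tree import DecisionTreeClassifier

# Hypothetical context per demonstration: [app_id, num_people_present,
# presenter_in_room]; labels are the lighting settings the user selected.
demos_X = [[0, 1, 0], [1, 5, 1], [1, 3, 1], [0, 2, 0]]
demos_y = ["lights_full", "lights_dim", "lights_dim", "lights_full"]

policy = DecisionTreeClassifier().fit(demos_X, demos_y)
print(policy.predict([[1, 4, 1]])[0])   # automation decision for a new context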
Bibtex
@inproceedings{bziebart2005learning,
author = {Brian D. Ziebart and Dan Roth and Roy H. Campbell and
Anind K. Dey},
title = {Learning Automation Policies for Pervasive Computing Environments},
year = {2005},
booktitle = {Proc. of the International Conference on Autonomic Computing}
}
Towards a Pervasive Computing Benchmark
Anand Ranganathan, Jalal Al-Muhtadi, Jacob Biehl, Brian Ziebart,
Roy H. Campbell, and Brian Bailey.
PerWare '05 Workshop on Support for Pervasive Computing at
PerCom 2005.
[pdf]