Proto-social responses
Kismet's behavior system is configured to emulate those key action
patterns observed in an infant's initial repertoire that allow him/her
to interact socially with the caregiver. Because the infant's initial
responses are often described in ethological terms, the architecture
of the behavior system adopts several key concepts from ethology
regarding the organization of behavior (esp. Lorenz and Tinbergen). To
implement this organization, we have categorized a variety of infant
proto-social responses into four categories: affective, exploratory,
protective, and regulatory. With respect to Kismet,
the affective responses are important because they allow the caregiver
to attribute feelings to the robot, which encourages the human to
modify the interaction to bring Kismet into a positive emotional
state. The exploratory responses are important because they allow the
caregiver to attribute curiosity, interest, and desires to the
robot. The human can use these responses to direct the interaction
towards things and events in the world. The protective responses are
important for keeping the robot away from damaging stimuli, and also
for eliciting concern and caring responses from the caregiver. The regulatory
responses are important for pacing the interaction at a level that is
suitable for both human and robot.
In addition, Kismet needs skills that allow it to engage the caregiver
in tightly coupled dynamic interactions. Turn-taking is one such skill
that is critical to this process. It enables the robot to respond to
the human's attempts at communication in a tightly temporally
correlated and contingent manner. If the communication modality is
facial expression, then the interaction may take the form of an
imitative game. If the modality is vocal, then proto-dialogs can be
established. This dynamic is a cornerstone of the social learning
process that transpires between infant and adult.
The Behavior System
In the current implementation of the behavior system there are three
primary branches, each specialized for addressing a different need (as
defined by the homeostatic regulation processes). Each branch
comprises multiple levels, with the deepest branch having three.
Each level of the hierarchy serves a different function, and
addresses a different set of issues. As one moves down in depth, the
behaviors serve to more finely tune the relation between the robot and
its environment, and in particular, the relation between the robot and
the human.
Level Zero: Functional Level
Level Zero is the functional level that establishes which need
Kismet's behavior will be directed towards satiating: specifically,
whether the robot should engage people and satiate the
social-drive, engage toys and satiate the
stimulation-drive, or rest and satiate the
fatigue-drive. To make this decision, each behavior receives
input from its affiliated drive. The larger the magnitude of the
drive, the more urgently that need must be addressed, and the greater
the contribution the drive makes to the activation of the
behavior. Environmental factors (the presence or absence of the
satiatory stimulus) also contribute to which behavior becomes
active.
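
The arbitration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the additive combination rule, the gain values, and the names FunctionalBehavior and select_functional_behavior are hypothetical and are not taken from Kismet's implementation. The sketch shows a larger drive magnitude producing a larger activation, with the presence of the satiatory stimulus adding a further contribution.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical gains; the text does not give numeric values.
DRIVE_GAIN = 1.0
STIMULUS_GAIN = 0.5


@dataclass
class FunctionalBehavior:
    name: str               # e.g. "satiate-social"
    drive_magnitude: float  # magnitude of the affiliated drive
    stimulus_present: bool  # is the satiatory stimulus currently available?

    def activation(self) -> float:
        # The larger the drive magnitude, the larger its contribution.
        level = DRIVE_GAIN * abs(self.drive_magnitude)
        # Environmental factor: assumed here to be a fixed additive bonus.
        if self.stimulus_present:
            level += STIMULUS_GAIN
        return level


def select_functional_behavior(behaviors: List[FunctionalBehavior]) -> FunctionalBehavior:
    # Assumed winner-take-all choice among the Level Zero behaviors.
    return max(behaviors, key=lambda b: b.activation())


behaviors = [
    FunctionalBehavior("satiate-social", drive_magnitude=0.8, stimulus_present=False),
    FunctionalBehavior("satiate-stimulation", drive_magnitude=0.5, stimulus_present=True),
    FunctionalBehavior("satiate-fatigue", drive_magnitude=0.1, stimulus_present=False),
]
print(select_functional_behavior(behaviors).name)  # prints satiate-stimulation
```

In this example the social drive is stronger, yet the toy that is actually present tips the competition towards satiate-stimulation, which is the kind of interplay between drives and environment the text describes.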
The behaviors at Level One are responsible for establishing a good
intensity of interaction with the environment. As shown in the figure,
satiate-social and
satiate-stimulation pass activation to their behavior group
below. At this level, the behavior group consists of three
types of behaviors: searching behaviors set the current
task to explore the environment and bring the robot into contact with
the desired stimulus; avoidance behaviors set the task to
move the robot away from stimuli that are too intense, undesirable,
or threatening; and engagement behaviors set the task of
interacting with desirable, good intensity stimuli.
Each search behavior establishes the goal of finding the desired
stimuli. Thus, the goal of the seek-people behavior is to seek
out skin-toned stimuli, and the goal of the seek-toys behavior
is to seek out colorful stimuli. When active, each adjusts the gains of
the attention system to facilitate these goals. Each search behavior
receives contributions from releasers (signaling the current absence
of the desired stimulus) or from low-arousal affective states (such as
boredom and sorrow) that signal a prolonged absence of the
sought-after stimulus.
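
One way to picture how a search behavior biases attention is as a set of per-feature gains that the active behavior scales. The following is a minimal sketch, assuming an attention system that scores a stimulus as a weighted sum of feature responses. The feature names, the default gains, the boost factor of 2.0, and the functions attention_gains and saliency are all hypothetical.

```python
from typing import Dict

# Hypothetical default gains for two feature channels.
DEFAULT_GAINS: Dict[str, float] = {"skin_tone": 1.0, "color": 1.0}

# Hypothetical boost each search behavior applies to its preferred feature.
SEARCH_BIAS: Dict[str, Dict[str, float]] = {
    "seek-people": {"skin_tone": 2.0},
    "seek-toys": {"color": 2.0},
}


def attention_gains(active_behavior: str) -> Dict[str, float]:
    """Return the feature gains in effect while the given behavior is active."""
    gains = dict(DEFAULT_GAINS)
    for feature, factor in SEARCH_BIAS.get(active_behavior, {}).items():
        gains[feature] *= factor
    return gains


def saliency(feature_responses: Dict[str, float], gains: Dict[str, float]) -> float:
    """Weighted sum of feature responses (an assumed combination rule)."""
    return sum(gains[name] * value for name, value in feature_responses.items())


face = {"skin_tone": 0.6, "color": 0.1}
toy = {"skin_tone": 0.1, "color": 0.6}

people_gains = attention_gains("seek-people")
print(saliency(face, people_gains) > saliency(toy, people_gains))  # prints True
```

With equally strong raw responses, the face wins attention while seek-people is active and the toy wins while seek-toys is active.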
Each avoidance behavior, avoid-stimulus for both the social and
stimulation hierarchies, establishes the goal of putting distance
between the robot and the offending stimulus or event. The presence of
an offensive stimulus or event contributes to the activation of an
avoidance behavior through its releaser. At this level, an offending
stimulus is either undesirable (not of the correct type), threatening
(very close and moving fast), or annoying (too close or moving too
fast to be visually tracked effectively). The behavioral response
recruited to cope with the situation depends upon the nature of the
offense. The coping strategy is defined within the behavior group one
more level down.
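
The three kinds of offense can be written as a small classifier. This is a sketch under assumptions: the thresholds, their units, and the order in which the tests are applied are hypothetical. The text only states the qualitative conditions (very close and moving fast; too close or too fast to track; not of the correct type).

```python
from typing import Optional

# Hypothetical thresholds in arbitrary units; the text gives no numbers.
THREAT_DISTANCE = 0.3   # "very close"
THREAT_SPEED = 1.0      # "moving fast"
ANNOY_DISTANCE = 0.5    # "too close"
ANNOY_SPEED = 1.0       # "too fast to be visually tracked effectively"


def classify_offense(is_desired_type: bool, distance: float, speed: float) -> Optional[str]:
    """Return the kind of offense a stimulus presents, or None if it is acceptable."""
    # Threat: very close AND moving fast, regardless of stimulus type.
    if distance < THREAT_DISTANCE and speed > THREAT_SPEED:
        return "threatening"
    # Annoyance: too close OR moving too fast to track.
    if distance < ANNOY_DISTANCE or speed > ANNOY_SPEED:
        return "annoying"
    # Undesirable: simply not the type of stimulus being sought.
    if not is_desired_type:
        return "undesirable"
    return None


print(classify_offense(is_desired_type=True, distance=0.2, speed=2.0))   # prints threatening
print(classify_offense(is_desired_type=False, distance=1.0, speed=0.1))  # prints undesirable
```

Checking for a threat first is a design choice made for this sketch, so that the most urgent condition takes precedence when several apply.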
The goal of the engagement behaviors, engage-people or
engage-toys, is to orient and maintain the robot's attention
on the desired stimulus. These are the consummatory behaviors of the
Level One group. With the desired stimulus found, and any offensive
conditions removed, the robot can engage in play behaviors with the
desired stimulus.
Level Two: The Protective Behaviors
There are three types of protective behaviors that co-exist within the protective
behavior group. Each represents a different coping
strategy that is responsible for handling a particular kind of
offense. Each coping strategy receives contributions from its
affiliated releaser as well as from its affiliated emotion process.
When active, the goal set by the escape behavior is to flee from
the offending stimulus. This behavior sends a request to the motor
system to perform the fleeing response where the robot closes its
eyes, grimaces, and turns its head away from a threatening
stimulus. It doesn't matter whether this stimulus is skin-toned or
colorful -- if anything is very close and moving fast, then it is
interpreted as a threat by the low-level visual perception system.
The withdraw behavior is active when the robot finds itself in
an unpleasant but not threatening situation. Often this corresponds
to a situation where the robot's visual processing abilities are
overtaxed. For instance, if a person is too close to the robot, the
eye detector has difficulty locating the person's eyes. Alternatively,
if a person is waving a toy too fast to be tracked effectively, the
excessive amount of motion is classified as "annoying" by the
low-level visual processes. The primary function of this response is to
send a social cue to the human that they are offending the robot and
thereby encourage the person to modify their behavior.
The reject behavior is active when the robot is being offered an
undesirable stimulus. The affiliated emotion process is
disgust. It is similar to the situation where an infant will not
accept the food it is offered. It has nothing to do with the offered
stimulus being noxious; it is simply not what the robot is after.
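
The competition among the three coping strategies can be sketched as follows. Each strategy sums a contribution from its releaser and one from its affiliated emotion process, and the most active strategy wins if it clears a threshold. The additive rule, the gains, the threshold, and the function names are assumptions made for illustration only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical activation threshold below which no coping strategy is selected.
ACTIVATION_THRESHOLD = 0.5


def coping_activation(releaser: float, emotion: float,
                      releaser_gain: float = 1.0, emotion_gain: float = 1.0) -> float:
    """Combine releaser and emotion contributions (assumed to be additive)."""
    return releaser_gain * releaser + emotion_gain * emotion


def select_coping_strategy(inputs: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Pick the most active strategy from {name: (releaser, emotion)}, if any."""
    best_name, best_level = None, ACTIVATION_THRESHOLD
    for name, (releaser, emotion) in inputs.items():
        level = coping_activation(releaser, emotion)
        if level > best_level:
            best_name, best_level = name, level
    return best_name


# A person waving a toy too fast: the "annoying" releaser is strongly active.
inputs = {
    "escape": (0.0, 0.1),
    "withdraw": (0.9, 0.4),
    "reject": (0.0, 0.0),
}
print(select_coping_strategy(inputs))  # prints withdraw
```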
Level Two: The Play Behaviors
Kismet exhibits different play patterns when engaging toys versus
people. Kismet will readily track and occasionally vocalize while its
attention is drawn to a colorful toy, but it will not exhibit the
repertoire of envelope displays that characterize vocal play. These
proto-dialog behaviors and social cues are reserved for interactions
with people. The difference in the manner Kismet interacts with people versus
toys provides observable evidence that these two categories of stimuli
are distinguished by Kismet.
In this section we focus our discussion on those four behaviors within
the social-play behavior group. This behavior group
encapsulates Kismet's engagement strategies for establishing
proto-dialogs during face-to-face exchanges. They finely tune the
relation between the robot and the human to support interactive games
at a level where both partners perform well.
The first engagement task is the call-to-person behavior. This
behavior is relevant when a person is in view of the robot but too far
away for face-to-face exchange. The goal of the behavior is to lure the
person into face-to-face interaction range (ideally, about three feet
from the robot). To accomplish this, Kismet sends a social cue, the
calling display, directed to the person within calling range.
The display is designed to attract a person's attention.
The second task is the greeting behavior. Its goal is to socially acknowledge the human and to
initiate a close interaction. When active, it makes a request of the
motor system to perform the greeting display. This behavior is
relevant when the person has just entered into face-to-face
interaction range. It is also relevant if the social-play
behavior group has just become active and a person is already within
face-to-face range. The display involves making eye contact with the
person and smiling at them while waving the ears gently. It often
immediately follows the success of the call-to-person
behavior. It is a transient response, only issued once, as its
completion signals the success of this behavior.
The third task is attentive-regard. This behavior is active when
the person has already established a good face-to-face interaction
distance with the robot but remains silent. The goal of the behavior
is to visually attend to the person and to appear open to
interaction. To accomplish this, it sends a request to the motor
system to hold gaze on the person, ideally looking into the person's
eyes if the eye detector can locate them. The robot watches the person
intently and vocalizes occasionally. If the person does speak, this
behavior loses the competition to the vocal-play behavior.
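
The relevance conditions of the social-play tasks can be summarized as a simple selector. This is a sketch, not the competition mechanism itself: the fixed priority order, the boolean inputs, and the function name are assumptions, and the behavior that performs the greeting display is labeled "greeting" here as a placeholder. The greeting_done flag models the fact that the greeting is issued only once.

```python
from typing import Optional


def select_social_play_task(person_in_view: bool, in_face_range: bool,
                            has_spoken: bool, greeting_done: bool) -> Optional[str]:
    """Return the social-play task whose relevance conditions currently hold."""
    if not person_in_view:
        return None
    # Person visible but too far away for a face-to-face exchange.
    if not in_face_range:
        return "call-to-person"
    # Person is in range and has not yet been greeted (transient, issued once).
    if not greeting_done:
        return "greeting"
    # Person in range and speaking: vocal-play wins the competition.
    if has_spoken:
        return "vocal-play"
    # Person in range but silent.
    return "attentive-regard"


print(select_social_play_task(True, False, False, False))  # prints call-to-person
print(select_social_play_task(True, True, False, True))    # prints attentive-regard
```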
The fourth task is vocal-play. The goal of this behavior is to
carry out a proto-dialog with the person. It is relevant when the
person is within face-to-face interaction distance and has spoken. To
perform this task successfully, the vocal-play behavior must
closely regulate turn-taking with the human. This involves a close
interaction with the perceptual system to perceive the relevant
turn-taking cues from the person (i.e., that a person is present and
whether or not there is speech occurring), and with the motor system
to send the relevant turn-taking cues back to the person. There are
four turn-taking phases this behavior must recognize and respond to:
1) Relinquish speaking turn; 2) Attend to human's speech; 3) Reacquire
speaking turn; and 4) Deliver speech. Each phase is recognized using
distinct perceptual cues, and each phase involves making specific
display requests of the motor system.
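
The four turn-taking phases lend themselves to a small state machine. The sketch below is an assumption-laden illustration: the text names the phases and the perceptual cues (whether speech is occurring) but does not specify the transitions, so the rules in next_phase are hypothetical. The caller is assumed to start the robot's utterance on entering the deliver phase and to issue the matching display request for each phase.

```python
from enum import Enum


class Phase(Enum):
    RELINQUISH = "relinquish speaking turn"
    ATTEND = "attend to human's speech"
    REACQUIRE = "reacquire speaking turn"
    DELIVER = "deliver speech"


def next_phase(phase: Phase, human_speaking: bool, robot_speaking: bool) -> Phase:
    """Advance the turn-taking cycle by one step (assumed transition rules)."""
    if phase is Phase.RELINQUISH:
        # Wait for the human to take the offered turn.
        return Phase.ATTEND if human_speaking else Phase.RELINQUISH
    if phase is Phase.ATTEND:
        # Keep listening until the human stops speaking.
        return Phase.ATTEND if human_speaking else Phase.REACQUIRE
    if phase is Phase.REACQUIRE:
        # If the human resumes, go back to listening; otherwise speak.
        return Phase.ATTEND if human_speaking else Phase.DELIVER
    # Phase.DELIVER: hand the turn back once the robot's utterance ends.
    return Phase.DELIVER if robot_speaking else Phase.RELINQUISH


phase = Phase.RELINQUISH
# Each tuple is (human_speaking, robot_speaking) for one time step.
for cues in [(True, False), (False, False), (False, False), (False, True), (False, False)]:
    phase = next_phase(phase, *cues)
print(phase.name)  # prints RELINQUISH
```

The example walks through one full exchange: the human speaks, falls silent, the robot reacquires the turn, delivers its utterance, and relinquishes the turn again.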