DARPA Communicator Testbed
Log Standard Proposal (v11)
Introduction
This document proposes standards for logfile contents and format. We try
to determine the smallest set of data that is sufficient to re-run a system
and that still supports meaningful metrics. This set may vary depending on
how much of the system is to be re-run and on what we would like to measure.
In the process we attempt to establish a standard format which all logfiles
can be converted to (or generated in, although we foresee that at least a
minimal amount of inferencing might be required to render the logs in this
form). A further goal of this document is to provide a standard that is
flexible and general enough to be used in different domains.
To accomplish this goal, we propose an XML DTD which records the basic
events in a Communicator-compliant system. Elements in the log can be
annotated with type information indicating that a data element is
"significant" from the point of view of annotators (and annotation tools).
To clarify, we will use the following terms (the definitions are by no
means final and are open to suggestion):
- Session - The interaction of a user with the system. In our current
  demonstration, this is the equivalent of a phone call. A session is
  composed of a set of turns.
- Turn - The set of operations performed by the system in the course of
  processing and presenting a single dialogue participant's utterance.
- Operation - Every command executed by the system within a turn. Every
  operation can send and receive data.
- Message - Messages are items sent from a server to the hub, and their
  replies. In contrast to operations, messages are initiated by servers.
- Event - Examples of events are internal hub errors, locks, alarm
  expirations, alarm enabling/disabling, and alarm resets.
- Data - A set of key/value pairs.
The definition of "turn" requires special
attention. In some accounts, a turn is an exchange between user and system.
In a robust dialogue context, this definition fails to be adequate when
the user or system barges in with follow-up information, etc., or when
the dialogue involves more than two parties (a situation which we shouldn't
rule out). We propose that the term "turn" in the context of these log
files be reserved for the processing of a single participant's utterance
(either user or system). This definition is not without its problems. For
instance, it's not clear whether a call to the backend belongs at the end
of the processing of a user's utterance (because it's the presentation
of the utterance to the backend) or the beginning of the processing of
the system's utterance (because it's the source of the system's response).
We can currently think of nothing that this decision hinges on in the data
analysis, and recommend that either interpretation be recognized at the
moment.
Content
Here we discuss the granularity of data to be logged in an end-to-end
system. The contents of this table were derived mainly from the information
needed by MITRE to do its own internal evaluation and will probably change
as the perspectives of other sites are incorporated. Every log should
contain enough information to determine the following (here "input" refers
to the user sending information to the system and "output" refers to the
system sending information to the user). Ideally, all this information
should be extractable from the log file without any site-specific analysis.
In this table, we describe the data to be logged, whether it is obligatory
or optional, and how we propose to standardize access to the data:
| Data | Obligatory | Standard access |
| --- | --- | --- |
| Duration of session | yes | readable directly off the XML representation proposed below |
| Duration of turn (input or output) | yes | readable directly off the XML representation proposed below |
| Duration of generation of output (in a phone demo, the time the synthesizer takes to generate the audio file) | yes | see 1 |
| Duration of display of output (in a phone demo, how long it takes to play the audio file) | yes | see 2 |
| Duration of recognition of input (in a phone demo, how long it takes the recognizer to produce its hypotheses) | yes | see 3 |
| Duration of arbitrary operations | no | readable directly off the XML representation proposed below |
| Number of turns within a session | yes | readable directly off the XML representation proposed below |
| Number of sessions (in our current model each session is its own logfile) | yes | readable directly off the XML representation proposed below |
| The audio files corresponding to the user input and system output and their formats. The audio files should be stored and distributed with the logs, and the pathnames of these files should be relative to the log. | yes | accessed given an arbitrary search of the logged data (see the "audio_input" and "audio_output" values for the type attribute of the GC_DATA tag, as well as the "mime_type" attribute) |
| The text of the user input chosen by the system | yes | accessed given an arbitrary search of the logged data (see the "text_input" value for the type attribute of the GC_DATA tag) |
| The text of the system output | yes | accessed given an arbitrary search of the logged data (see the "text_output" value for the type attribute of the GC_DATA tag) |
| All possible input sentences (from the recognizer) up to a certain limit (TBD) (N/A to systems that use a word lattice) | no | accessed given an arbitrary search of the logged data (see the "text_input_hypothesis" value for the type attribute of the GC_DATA tag) |
| Indication of whether the parse succeeded | no | see 4 |
| The full input interpretation | no | accessed given an arbitrary search of the logged data |
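As an illustration of what "readable directly off the XML representation" could mean in practice, the following is a minimal sketch using Python's standard XML parser. The sample log, the helper names (`duration`, `session_metrics`) and the turn ids and times are made up for this example; only the element and attribute names come from this proposal.

```python
import xml.etree.ElementTree as ET

# Made-up sample log; element and attribute names follow this proposal.
SAMPLE_LOG = """\
<GC_LOG>
  <GC_SESSION id="129.10.2.200:1010:3" stime="930254422.72" etime="930254434.79">
    <GC_TURN id="0" stime="930254422.72" etime="930254424.79"/>
    <GC_TURN id="1" stime="930254425.10" etime="930254430.35"/>
  </GC_SESSION>
</GC_LOG>
"""


def duration(element):
    """Return etime - stime for any element that carries both attributes."""
    return float(element.get("etime")) - float(element.get("stime"))


def session_metrics(log_xml):
    """Summarise each GC_SESSION: duration, turn count and turn durations."""
    root = ET.fromstring(log_xml)
    metrics = []
    for session in root.iter("GC_SESSION"):
        turns = session.findall("GC_TURN")
        metrics.append({
            "id": session.get("id"),
            "duration": duration(session),
            "num_turns": len(turns),
            "turn_durations": [duration(turn) for turn in turns],
        })
    return metrics
```

The result is in whatever unit the log uses for its time attributes; no site-specific knowledge is needed beyond the element names.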
The elements which may pose minor complications carry a numbered reference
("see 1" through "see 4") in place of a standard access method. Here we
make tentative proposals for each of these:
1. Duration of output generation. In a system where there is a single,
   obvious call to the synthesizer, this is simply the duration of that
   operation, but this is only one possible configuration. We propose that
   the "type" attribute be added to the GC_OPERATION element and that a
   "virtual" operation be generated by a postprocess phase with a
   distinguished type (say, "synthesis_duration"); alternatively, we could
   introduce a new XML element (say, GC_EVENT) reserved for these "virtual"
   events.
2. Duration of output presentation. In the MIT system, this is an inference
   from notifications posted by the audio server (playing_has_begun,
   playing_has_ended; see the Communicator documentation for the MIT audio
   server). This could be handled similarly to output generation, or we
   could add optional start and end time attributes to the GC_DATA element
   which contains the audio file.
3. Duration of recognition. Again, we propose to handle this similarly to
   output generation.
4. Indication of whether the parse succeeded. Again, this is frequently an
   inference. We can insert a distinguished GC_DATA element (say, with a
   type of "input_parse_successful").
We believe that this sort of proposal will
allow sites to gather data in the form they prefer, and augment it with
sharable semantics in such a way that individual sites' data will retain
its site-specific integrity.
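The "virtual" operation proposed for output generation could be produced by a postprocess along the following lines. This is only a sketch: which logged operation names count as synthesis is site-specific, so the set `SYNTHESIS_OPERATIONS` and the function name `add_virtual_operation` are hypothetical, and copying server and location from the earliest matching operation is an assumption made to satisfy the required attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical: each site would list its own synthesis-related operation names.
SYNTHESIS_OPERATIONS = {"synthesize"}


def add_virtual_operation(turn, source_names, virtual_type):
    """Append a GC_OPERATION spanning every operation in `turn` whose name is
    in `source_names`. Returns the new element, or None if nothing matched."""
    matches = [op for op in turn.findall("GC_OPERATION")
               if op.get("name") in source_names]
    if not matches:
        return None
    first = min(matches, key=lambda op: float(op.get("stime")))
    last = max(matches, key=lambda op: float(op.get("etime")))
    # The virtual operation starts with the earliest match and ends with the
    # latest one; its distinguished type marks it as inferred.
    return ET.SubElement(turn, "GC_OPERATION", {
        "type": virtual_type,
        "name": virtual_type,
        "turnid": turn.get("id"),
        "server": first.get("server"),
        "location": first.get("location"),
        "stime": first.get("stime"),
        "etime": last.get("etime"),
    })
```

The same function could be called with a different set of names and a different type to handle recognition and presentation durations.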
Format
We believe that XML would be a good candidate language for this format for
many reasons, among them the growing supply of viewers and editors and the
variety of parsers available in many programming languages.
We propose that operations should be logged
as single XML elements. For example:
<GC_OPERATION name="paraphrase_reply" server="nl" location="localhost:11000"
turnid="-1" stime="941473394.66"
etime="941473394.69" tidx="3">
<GC_DATA key=":reply_string" dtype="string">
Hi! Welcome to Mitre's Travel demonstration.
This call is being recorded for
system development. You may hang
up or ask for help at any time. How can I
help you?
</GC_DATA>
</GC_OPERATION>
Since in our distributed architecture
messages are sent asynchronously, and many events may occur before the
completion of an operation, some caching (or post processing) will be necessary
to log operations as single elements.
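The caching step might look like the sketch below, which holds the attributes recorded when an operation is dispatched and emits one GC_OPERATION element when its reply arrives. The class `OperationCache`, its method names, and the choice of (token index, operation name) as the pairing key are all assumptions for illustration; a real logger would pair on whatever the Hub actually exposes.

```python
import xml.etree.ElementTree as ET


class OperationCache:
    """Hold operations that have started until their reply arrives."""

    def __init__(self):
        self._pending = {}   # (tidx, name) -> attributes seen at dispatch
        self.completed = []  # finished GC_OPERATION elements, in reply order

    def started(self, tidx, name, server, location, turnid, stime):
        """Record the attributes known when the operation is dispatched."""
        self._pending[(tidx, name)] = {
            "name": name, "server": server, "location": location,
            "turnid": str(turnid), "stime": repr(stime), "tidx": str(tidx),
        }

    def finished(self, tidx, name, etime, data=None):
        """Close the pending operation and emit it as a single XML element."""
        attrs = self._pending.pop((tidx, name))
        attrs["etime"] = repr(etime)
        op = ET.Element("GC_OPERATION", attrs)
        for key, value in (data or {}).items():
            child = ET.SubElement(op, "GC_DATA", {"key": key, "dtype": "string"})
            child.text = str(value)
        self.completed.append(op)
        return op
```

Other messages and events may be logged between the two calls; the operation still appears in the log as one element once it completes.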
Next we define the main entities in the logfile and their formats. A DTD
is also available which defines these terms and their relations. We assume
that all time values use a standard base time known as "the epoch" (January
1, 1970, 00:00:00 GMT) and are given in milliseconds since that time. Note
that the sample values in this document (for example, 941473394.66) appear
to be seconds since the epoch with a fractional part.
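For reading logged times, a small conversion helper can be useful. The sketch below assumes that a time attribute counts seconds since the epoch with an optional fractional part, which is how the sample values in this document (such as 941473394.66) appear to be written; a log that stored whole milliseconds would need the value divided by 1000 first. The function name `parse_log_time` is hypothetical.

```python
from datetime import datetime, timezone


def parse_log_time(value):
    """Convert a logged time attribute to a timezone-aware UTC datetime.

    Assumes `value` is seconds since January 1, 1970, 00:00:00 GMT, with an
    optional fractional part.
    """
    return datetime.fromtimestamp(float(value), tz=timezone.utc)
```

Under that assumption, the sample value 941473394.66 falls on November 1, 1999, which agrees with the date embedded in the sample audio file name elsewhere in this document.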
GC_SESSION
A session represents an interaction of a user with the system. In our
current demo, this is the equivalent of a phone call. The elements in this
table refer to the XML DTD.
| Name | Description | Type | Required |
| --- | --- | --- | --- |
| id | We should attempt to determine a unique identifier for sessions. MIT's solution is an identifier of the form IP:process id:session counter. Process ids might not be trivial to obtain in every programming language and OS; however, "equivalent" data are usually available. | string | yes |
| stime | time when session started | milliseconds | yes |
| etime | time when session finished | milliseconds | yes |
Example:
<GC_SESSION
id="129.10.2.200:1010:3"
stime="930254422.720000"
etime="930254434.790000">
...
</GC_SESSION>
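A session identifier of the IP:process id:session counter form could be generated as in the sketch below. The function name `make_session_id` and the per-process counter are assumptions for illustration; the caller supplies the IP address, since how a server discovers its own address is platform-dependent.

```python
import os
from itertools import count

# Per-process counter; it restarts at 1 whenever the process restarts.
_session_counter = count(1)


def make_session_id(ip, pid=None):
    """Build an 'IP:process id:session counter' identifier."""
    if pid is None:
        pid = os.getpid()  # or whatever "equivalent" the platform offers
    return f"{ip}:{pid}:{next(_session_counter)}"
```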
GC_TURN
A turn covers the processing of a single participant's utterance (either
user or system), as discussed in the introduction. The elements in this
table refer to the XML DTD.
| Name | Description | Type | Required |
| --- | --- | --- | --- |
| id | A unique identifier within each session | number | yes |
| stime | time when turn started | milliseconds | yes |
| etime | time when turn ended | milliseconds | yes |
Example:
<GC_TURN
id="-01"
stime="930254422.720000"
etime="930254424.790000">
...
</GC_TURN>
GC_OPERATION
Every command executed by the system within a turn. All operations can send
and receive data, frames or audio files. The elements in this table refer
to the XML DTD.
| Name | Description | Type | Required |
| --- | --- | --- | --- |
| type | the type of operation being executed (specific values TBD) | string | no |
| turnid | the turn id that this operation was executed under | number | yes |
| stime | time when operation started | milliseconds | yes |
| etime | time when operation ended | milliseconds | yes |
| server | the name (according to the program file) of the server that executed the operation | string | yes |
| location | the server (real server name or IP address) and its port (server_name:port_number) | string | yes |
| name | the name of the operation | string | yes |
| tidx | the token index associated with the operation | number | no |
| reply_type | valid values of reply_type include normal, destroy, and error | string | no |
| reply_status | valid values are normal, error, destroy and asynchronous | string | no |
| type_start_task | valid values of type_start_task are task and total, and indicate whether the measurement is of on-task time or total call time | string | no |
| type_end_task | indicates the end of the task | string | no |
| type_new_turn | valid values of type_new_turn are user and system | string | no |
| type_start_utt | valid values of type_start_utt are user and system | string | no |
| type_end_utt | valid values of type_end_utt are user and system | string | no |
| type_prompt | indicates the system is prompting for a key; the value of type_prompt is the key being prompted | string | no |
Example:
<GC_OPERATION type_new_turn="system" name="paraphrase_reply"
server="nl" location="localhost:11000"
turnid="-1" stime="941473394.66"
etime="941473394.69" tidx="3">
<GC_DATA type_utt_text="system" key=":reply_string"
dtype="string">
Hi! Welcome to Mitre's Travel demonstration.
This call is being recorded for
system development. You may hang
up or ask for help at any time. How can I
help you?
</GC_DATA>
</GC_OPERATION>
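To show how the annotated attributes could be used, the sketch below pulls utterance text out of logged operations by looking for GC_DATA children tagged with a given type_utt_text value. The function name `utterances` is hypothetical, and collapsing line breaks into single spaces is an assumption about how the logged text should be normalized.

```python
import xml.etree.ElementTree as ET


def utterances(root, speaker):
    """Yield (operation name, text) for each GC_DATA child of a GC_OPERATION
    whose type_utt_text attribute equals `speaker`."""
    for op in root.iter("GC_OPERATION"):
        for data in op.findall("GC_DATA"):
            if data.get("type_utt_text") == speaker:
                # Collapse the line breaks and padding seen in logged text.
                yield op.get("name"), " ".join((data.text or "").split())
```

Called with "system" on the example above, this would yield the paraphrase_reply operation together with its greeting as one line of text.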
GC_MESSAGE
Messages are items sent from a server to the hub, and their replies. In
contrast to operations, messages are initiated by servers. The elements in
this table refer to the XML DTD.
| Name | Description | Type | Required |
| --- | --- | --- | --- |
| type | the type of message being issued (specific values TBD) | string | no |
| turnid | the turn id that this message was issued under | number | yes |
| time | time when message issued | milliseconds | yes |
| server | the name of the server that issued the message | string | yes |
| location | the server (real server name or IP address) and its port (server_name:port_number) | string | yes |
| name | the name of the message | string | yes |
| direction | server_to_hub or hub_to_server | string | yes |
| tidx | the token index associated with the message | number | no |
| reply_type | valid values of reply_type include normal, destroy, and error | string | no |
| reply_status | valid values are normal, error, destroy and asynchronous | string | no |
| type_start_task | valid values of type_start_task are task and total, and indicate whether the measurement is of on-task time or total call time | string | no |
| type_end_task | indicates the end of the task | string | no |
| type_new_turn | valid values of type_new_turn are user and system | string | no |
| type_start_utt | valid values of type_start_utt are user and system | string | no |
| type_end_utt | valid values of type_end_utt are user and system | string | no |
| type_prompt | indicates the system is prompting for a key; the value of type_prompt is the key being prompted | string | no |
Example:
<GC_MESSAGE name="filelog" direction="server_to_hub"
server="audio"
location="localhost:15000" turnid="-1"
time="941473396.48" tidx="6">
<GC_DATA key=":synth_log_filename" dtype="string">
/home/communicator/test/Travel-demo/../../logs/travel_cfone/19991101/001/travel_cfone-19991101-001-synth--01-001.wav
</GC_DATA>
</GC_MESSAGE>
GC_EVENT
Examples of events are internal hub errors, locks, alarm expirations, alarm
enabling/disabling, and alarm resets. The elements in this table refer to
the XML DTD.
| Name | Description | Type | Required |
| --- | --- | --- | --- |
| etype | the name of hub event (SYSTEM_ERROR, LOCK, etc.) | string | yes |
| turnid | the turn id under which this event occurred | number | yes |
| time | time when the event occurred | milliseconds | yes |
| name | the type of the event (operation) | string | yes |
| server | the name of the server that issued the message | string | no |
| location | the server (real server name or IP address) and its port (server_name:port_number) | string | no |
| tidx | the token index associated with the event | number | no |
| type_start_task | valid values of type_start_task are task and total, and indicate whether the measurement is of on-task time or total call time | string | no |
| type_end_task | indicates the end of the task | string | no |
| type_new_turn | valid values of type_new_turn are user and system | string | no |
| type_start_utt | valid values of type_start_utt are user and system | string | no |
| type_end_utt | valid values of type_end_utt are user and system | string | no |
| type_prompt | indicates the system is prompting for a key; the value of type_prompt is the key being prompted | string | no |
Example:
<GC_EVENT etype="LOCK" server="audio" location="localhost:15000"
turnid="-1"
time="941473396.19" name=":hub_get_session_lock"
tidx="5"/>
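Because every GC_EVENT carries an etype, a simple tally of hub events per log is straightforward. The following sketch (the function name `count_events` is hypothetical) counts events by etype anywhere below a given element.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_events(root):
    """Count GC_EVENT elements under `root`, grouped by their etype."""
    return Counter(event.get("etype") for event in root.iter("GC_EVENT"))
```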
GC_ANNOT
GC_ANNOT tags contain human annotations, and as such are not present in the
raw (unannotated) log files. GC_ANNOT is included in this specification to
support folding human annotation files in with their associated log files.
The elements in this table refer to the XML DTD.
| Name | Description | Type | Required |
| --- | --- | --- | --- |
| turnid | the turn id to which this annotation applies | number | yes |
| tidx | the token index associated with the annotation | number | no |
| type_task_completion | human annotation indicating whether the task was successfully completed or not | string | no |
Examples:
<GC_ANNOT type_task_completion="1"/>
<GC_ANNOT turnid="2" tidx="129">
<GC_DATA type_utt_text="transcription" dtype="string">
i'd like a flight from boston to
san francisco
</GC_DATA>
</GC_ANNOT>
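Folding an annotation file into its log could proceed as in the sketch below. It relies on the DTD allowing GC_ANNOT under both GC_SESSION and GC_TURN; the function name `fold_annotations` and the policy of attaching an annotation with no matching turn to the session itself are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def fold_annotations(session, annotations):
    """Attach each GC_ANNOT to the GC_TURN whose id equals its turnid.

    Annotations with no turnid, or with no matching turn, are appended to
    the session. Returns the number of annotations placed inside a turn.
    """
    turns = {turn.get("id"): turn for turn in session.findall("GC_TURN")}
    placed = 0
    for annot in annotations:
        parent = turns.get(annot.get("turnid"))
        if parent is None:
            session.append(annot)
        else:
            parent.append(annot)
            placed += 1
    return placed
```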
GC_DATA
A key/value pair. This datatype can be used to represent the information
involved in an operation, as well as the contents of a GC_FRAME or GC_LIST.
The elements in this table refer to the XML DTD.
| Name | Description | Type | Required |
| --- | --- | --- | --- |
| key | the name of this data point | string | yes |
| turnid | the turn id under which this data point was logged | number | no |
| time | time stamp for this data point | milliseconds | no |
| type | valid values of type include audio_input, audio_output, text_input, text_output, text_input_hypothesis, and concept. See the Content section. | string | no |
| mime_type | the mime type of the data | string | no |
| direction | valid values are in and out | string | no |
| dtype | the data type; valid values include integer, string, etc. (full list of values TBD) | string | no |
| type_utt_text | valid values of type_utt_text are transcription, system and asr | string | no |
| type_error_msg | valid value is true | string | no |
| type_help_msg | valid value is true | string | no |
Example:
<GC_DATA key=":reply_string" dtype="string">
Hi! Welcome to Mitre's Travel demonstration. This
call is being recorded for
system development. You may hang up or ask for help
at any time. How can I
help you?
</GC_DATA>
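The "arbitrary search of the logged data" described in the Content section might be sketched as follows. Since the DTD declares the type attribute of GC_DATA as NMTOKENS, the sketch treats it as a space-separated list of tokens; the function name `find_data` is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_data(root, wanted_type):
    """Return every GC_DATA under `root` whose type attribute lists
    `wanted_type` among its space-separated tokens."""
    return [data for data in root.iter("GC_DATA")
            if wanted_type in (data.get("type") or "").split()]
```

For example, `find_data(log, "text_output")` would collect the system output text regardless of which operation or message carried it.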
GC_FRAME
This structure would allow for recording of frames. The elements in this
table refer to the XML DTD.
| Name | Description | Type | Required |
| --- | --- | --- | --- |
| frame_type | Galaxy frame type | string | no |
| name | the name of the frame | string | no |
| turnid | the turn id in which this frame appears | number | no |
Example:
<GC_DATA key=":rec_scores">
  <GC_FRAME name="scores" frame_type="clause">
    <GC_DATA key=":acoustic_score" dtype="string">
      "-617.9270"
    </GC_DATA>
    <GC_DATA key=":ngram_score" dtype="string">
      "-17.4465"
    </GC_DATA>
    <GC_DATA key=":nwords" dtype="integer">
      8
    </GC_DATA>
    <GC_DATA key=":total_score" dtype="string">
      "-651.3735"
    </GC_DATA>
    <GC_DATA key=":nphones" dtype="integer">
      36
    </GC_DATA>
  </GC_FRAME>
</GC_DATA>
GC_LIST
This structure would allow for recording of lists. The elements in this
table refer to the XML DTD.
| Name | Description | Type | Required |
| --- | --- | --- | --- |
| name | the name of the list | string | no |
| turnid | the turn id in which this list appears | number | no |
Example:
<GC_DATA key=":nbest_list">
  <GC_LIST name=":nbest_list">
    <GC_DATA key=":nbest_list[0]" dtype="string">
      can i get this american flight
    </GC_DATA>
    <GC_DATA key=":nbest_list[1]" dtype="string">
      can i get this american difference
    </GC_DATA>
    <GC_DATA key=":nbest_list[2]" dtype="string">
      can i did this american difference
    </GC_DATA>
    <GC_DATA key=":nbest_list[3]" dtype="string">
      can i get this american that flight
    </GC_DATA>
  </GC_LIST>
</GC_DATA>
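Nested GC_DATA, GC_FRAME and GC_LIST elements can be turned back into native values with a short recursive walk. The sketch below is one possible reading: frames become dictionaries keyed by the key attribute, lists become lists, and only the integer dtype is converted, since the full list of dtype values is still TBD. The function name `data_value` is hypothetical.

```python
import xml.etree.ElementTree as ET


def data_value(data):
    """Convert one GC_DATA element into a Python value."""
    frame = data.find("GC_FRAME")
    if frame is not None:
        # A frame maps each child's key to that child's value.
        return {child.get("key"): data_value(child)
                for child in frame.findall("GC_DATA")}
    items = data.find("GC_LIST")
    if items is not None:
        # A list keeps its children in document order.
        return [data_value(child) for child in items.findall("GC_DATA")]
    text = " ".join((data.text or "").split())
    if data.get("dtype") == "integer":
        return int(text)
    return text  # other dtypes are left as text in this sketch
```

Note that string values keep any quotation marks that were logged as part of the text, as in the score values of the frame example.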
Code support
MITRE volunteers to work with sites to produce
the appropriate conversion tools from MIT logfiles to the proposed logfile
standard. If more appropriate, we will produce a new logging module for
the Hub which will simplify this process; however, we don't envision this
to be necessary.
Document Type Definition (DTD)
Below we provide an XML DTD to define the
above types.
<?xml version="1.0"?>
<!ELEMENT GC_LOG (GC_SESSION)*>
<!ATTLIST GC_LOG logfile_version CDATA #IMPLIED>
<!ELEMENT GC_SESSION ( GC_TURN | GC_ANNOT )*>
<!ATTLIST GC_SESSION id NMTOKEN #REQUIRED>
<!-- time could be defined as CDATA if we chose to use a non
millisecond format -->
<!ATTLIST GC_SESSION stime NMTOKEN #REQUIRED>
<!ATTLIST GC_SESSION etime NMTOKEN #REQUIRED>
<!ELEMENT GC_TURN ( GC_ANNOT | GC_OPERATION | GC_MESSAGE | GC_EVENT
)*>
<!ATTLIST GC_TURN id NMTOKEN #REQUIRED>
<!ATTLIST GC_TURN stime NMTOKEN #REQUIRED>
<!ATTLIST GC_TURN etime NMTOKEN #REQUIRED>
<!ELEMENT GC_ANNOT (GC_DATA)*>
<!-- GC_ANNOT can have a sequence of one or more GC_DATA tags
or it can be empty -->
<!ATTLIST GC_ANNOT type_task_completion CDATA #IMPLIED>
<!ATTLIST GC_ANNOT turnid NMTOKEN #IMPLIED>
<!ATTLIST GC_ANNOT tidx NMTOKEN #IMPLIED>
<!ELEMENT GC_OPERATION (GC_DATA)*>
<!ATTLIST GC_OPERATION type NMTOKENS #IMPLIED>
<!ATTLIST GC_OPERATION turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION server CDATA #REQUIRED>
<!ATTLIST GC_OPERATION location NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION name CDATA #REQUIRED>
<!ATTLIST GC_OPERATION tidx NMTOKEN #IMPLIED>
<!ATTLIST GC_OPERATION reply_type CDATA #IMPLIED>
<!ATTLIST GC_OPERATION reply_status CDATA #IMPLIED>
<!ATTLIST GC_OPERATION stime NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION etime NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION type_start_task CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_end_task CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_new_turn CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_start_utt CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_end_utt CDATA #IMPLIED>
<!ATTLIST GC_OPERATION type_prompt CDATA #IMPLIED>
<!ELEMENT GC_MESSAGE (GC_DATA)*>
<!ATTLIST GC_MESSAGE type NMTOKENS #IMPLIED>
<!ATTLIST GC_MESSAGE turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE server CDATA #REQUIRED>
<!ATTLIST GC_MESSAGE location NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE name CDATA #REQUIRED>
<!ATTLIST GC_MESSAGE direction NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE tidx NMTOKEN #IMPLIED>
<!ATTLIST GC_MESSAGE reply_type CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE reply_status CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE time NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE type_start_task CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_end_task CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_new_turn CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_start_utt CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_end_utt CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE type_prompt CDATA #IMPLIED>
<!ELEMENT GC_EVENT (GC_DATA)*>
<!ATTLIST GC_EVENT etype NMTOKEN #REQUIRED>
<!ATTLIST GC_EVENT turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_EVENT server CDATA #IMPLIED>
<!ATTLIST GC_EVENT location NMTOKEN #IMPLIED>
<!ATTLIST GC_EVENT time NMTOKEN #REQUIRED>
<!ATTLIST GC_EVENT name CDATA #REQUIRED>
<!ATTLIST GC_EVENT tidx NMTOKEN #IMPLIED>
<!ATTLIST GC_EVENT type_start_task CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_end_task CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_new_turn CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_start_utt CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_end_utt CDATA #IMPLIED>
<!ATTLIST GC_EVENT type_prompt CDATA #IMPLIED>
<!ELEMENT GC_DATA ANY>
<!ATTLIST GC_DATA key CDATA #REQUIRED>
<!ATTLIST GC_DATA type NMTOKENS #IMPLIED>
<!ATTLIST GC_DATA mime_type NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA direction NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA dtype NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA time NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA turnid NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA type_utt_text CDATA #IMPLIED>
<!ATTLIST GC_DATA type_error_msg CDATA #IMPLIED>
<!ATTLIST GC_DATA type_help_msg CDATA #IMPLIED>
<!ELEMENT GC_FRAME (GC_DATA)*>
<!ATTLIST GC_FRAME frame_type NMTOKEN #IMPLIED>
<!ATTLIST GC_FRAME name CDATA #IMPLIED>
<!ATTLIST GC_FRAME turnid NMTOKEN #IMPLIED>
<!ELEMENT GC_LIST (GC_DATA)*>
<!ATTLIST GC_LIST name CDATA #IMPLIED>
<!ATTLIST GC_LIST turnid NMTOKEN #IMPLIED>
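A validating XML parser is the proper way to check a log against this DTD. As a dependency-free sketch of one part of that check, the code below reads the #REQUIRED attributes out of single-line ATTLIST declarations written in the style used above and reports elements that lack them. The function names are hypothetical, and the regular expression is an assumption that only covers this one-attribute-per-declaration style; it does not check content models or attribute types.

```python
import re
import xml.etree.ElementTree as ET

# Matches declarations such as: <!ATTLIST GC_TURN id NMTOKEN #REQUIRED>
REQUIRED_ATTLIST = re.compile(
    r"<!ATTLIST\s+(\S+)\s+(\S+)\s+\S+\s+#REQUIRED\s*>")


def required_attributes(dtd_text):
    """Map each element name to the set of its #REQUIRED attribute names."""
    required = {}
    for element, attribute in REQUIRED_ATTLIST.findall(dtd_text):
        required.setdefault(element, set()).add(attribute)
    return required


def missing_attributes(root, required):
    """List (tag, attribute) pairs for required attributes that are absent."""
    problems = []
    for element in root.iter():
        for attribute in sorted(required.get(element.tag, ())):
            if element.get(attribute) is None:
                problems.append((element.tag, attribute))
    return problems
```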