A feature structure is a general purpose data structure which identifies and groups together individual
features, each of which associates a name with one or more values. Because of the generality
of feature structures, they can be used to represent many different kinds of information,
but they are of particular usefulness in the representation of linguistic analyses,
especially where such analyses are partial, or underspecified. Feature structures represent the interrelations among various pieces of information,
and their instantiation in markup provides a metalanguage for the generic representation of analyses and interpretations. Moreover, this instantiation
allows feature values to be of specific types, and for restrictions to be placed on the values for particular features, by means
of feature system declarations.
This chapter is organized as follows. Following this introduction, section 19.2 Elementary Feature Structures and the Binary Feature Value introduces the elements fs and f, used to represent feature structures and features respectively, together with the
elementary binary feature value. Section 19.3 Other Atomic Feature Values introduces elements for representing other kinds of atomic feature values such as
symbolic, numeric, and string values. Section 19.4 Feature Libraries and Feature-Value Libraries introduces the notion of predefined libraries or groups of features or feature values along with methods for referencing their
components. Section 19.5 Feature Structures as Complex Feature Values introduces complex values, in particular feature-structures as values, thus enabling
feature structures to be recursively defined. Section 19.7 Collections as Complex Feature Values discusses other complex values, in particular values which are collections, organized
as sets, bags, and lists. Section 19.8 Feature Value Expressions discusses how the operations of alternation, negation, and collection of feature
values may be represented. Section 19.9 Default Values discusses ways of representing underspecified, default, or uncertain values. Section
19.10 Linking Text and Analysis discusses how analyses may be linked to other parts of an encoded text. Section 19.11 Feature System Declaration describes the feature system declaration, a construct which provides for the validation of typed feature structures. Formal
definitions for all the elements introduced in this chapter are provided in section
19.12 Formal Definition and Implementation.
The fundamental elements used to represent a feature structure analysis are f (for feature), which represents a feature-value pair, and fs (for feature structure), which represents a structure made up of such feature-value pairs. The fs element has an optional type attribute which may be used to represent typed feature structures, and may contain
any number of f elements. An f element has a required name attribute and an associated value. The value may be simple: that is, a single binary, numeric, symbolic (i.e. taken
from a restricted set of legal values), or string value, or a collection of such values,
organized in various ways, for example, as a list; or it may be complex, that is,
it may itself be a feature structure, thus providing a degree of recursion. Values
may be under-specified or defaulted in various ways. These possibilities are all described
in more detail in this and the following sections.
Feature and feature-value representations (including feature structure representations)
may be embedded directly at any point in an XML document, or they may be collected
together in special-purpose feature or feature-value libraries. The components of such libraries may then be referenced from other feature or feature-value
representations, using the feats or fVal attribute as appropriate.
We begin by considering the simple case of a feature structure which contains binary-valued
features only. The following three XML elements are needed to represent such a feature
structure:
- fs (feature structure) represents a feature structure, that is, a collection of feature-value pairs organized as a structural unit.
type specifies the type of the feature structure.
feats (features) references the feature-value specifications making up this feature structure.
- f (feature) represents a feature value specification, that is, the association of a name with a value of any of several different types.
fVal (feature value) references any element which can be used to represent the value of
a feature.
- binary (binary value) represents the value part of a feature-value specification which can
contain either of exactly two possible values.
The attributes feats and fVal are not discussed in this section: they provide an alternative way of indicating
the content of an element, as further discussed in section 19.4 Feature Libraries and Feature-Value Libraries.
An fs element containing f elements with binary values can be straightforwardly used to encode the matrices of feature-value specifications for phonetic segments, such as the following for
the English segment [s].
+---            ---+
| consonantal   +  |
| vocalic       -  |
| voiced        -  |
| anterior      +  |
| coronal       +  |
| continuant    +  |
| strident      +  |
+---            ---+
This representation may be encoded in XML as follows:
<fs type="phonological_segments"> <f name="consonantal"> <binary value="true"/> </f> <f name="vocalic"> <binary value="false"/> </f> <f name="voiced"> <binary value="false"/> </f> <f name="anterior"> <binary value="true"/> </f> <f name="coronal"> <binary value="true"/> </f> <f name="continuant"> <binary value="true"/> </f> <f name="strident"> <binary value="true"/> </f></fs>
Note that
fs elements may have an optional
type attribute to indicate the kind of feature structure in question, whereas
f elements must have a
name attribute to indicate the name of the feature. Feature structures need not be typed,
but features must be named. Similarly, the
fs element may be empty, but the
f element must specify its value either directly as content, by means of the
fVal attribute, or implicitly by reference to a feature system declaration.
The restriction of specific features to specific types of values (e.g. the restriction
of the feature strident to a binary value) requires additional validation, as does any restriction on the
features available within a feature structure of a particular type (e.g. whether a
feature structure of type phonological segment necessarily contains a feature voiced). Such validation may be carried out at the document level, using special purpose
processing, at the schema level using additional validation rules, or at the declarative
level, using an additional mechanism such as the feature-system declaration discussed in 19.11 Feature System Declaration.
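The first of these options, special purpose processing at the document level, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule tables VALUE_ELEMENT and REQUIRED and the function check_fs are hypothetical names invented here, and they are not part of the feature system declaration mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules (not defined by this chapter): the value element each
# feature must use, and the features each type of feature structure requires.
VALUE_ELEMENT = {"strident": "binary", "voiced": "binary"}
REQUIRED = {"phonological_segments": {"voiced", "strident"}}


def check_fs(fs):
    """Return a list of human-readable problems found in one <fs> element."""
    problems = []
    seen = set()
    for f in fs.findall("f"):
        name = f.get("name")
        if name is None:
            problems.append("f element without a name attribute")
            continue
        seen.add(name)
        expected = VALUE_ELEMENT.get(name)
        children = list(f)
        # Only the first child is inspected; values given by fVal are ignored.
        if expected and children and children[0].tag != expected:
            problems.append(
                f"{name}: expected <{expected}>, found <{children[0].tag}>")
    for missing in sorted(REQUIRED.get(fs.get("type"), set()) - seen):
        problems.append(f"missing required feature {missing}")
    return problems


good = ET.fromstring(
    '<fs type="phonological_segments">'
    '<f name="voiced"><binary value="false"/></f>'
    '<f name="strident"><binary value="true"/></f></fs>')
bad = ET.fromstring(
    '<fs type="phonological_segments">'
    '<f name="strident"><symbol value="feminine"/></f></fs>')
```

Here check_fs(good) reports nothing, while check_fs(bad) reports both the wrong value type for strident and the absence of voiced.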
Although we have used the term binary for this kind of value, and its representation in XML uses values such as true and false (or, equivalently, 1 and 0), it should be noted that such values are not restricted to propositional assertions.
As this example shows, this kind of value is intended for use with any binary-valued
feature.
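As a minimal sketch of how such a structure might be read by software, the following Python fragment turns a feature structure containing only binary values into a dictionary. The function name binary_features is an assumption made for this illustration; the only facts taken from the text are that true and 1, and false and 0, are equivalent spellings.

```python
import xml.etree.ElementTree as ET

TRUE_FORMS = {"true", "1"}
FALSE_FORMS = {"false", "0"}


def binary_features(fs):
    """Map each binary-valued feature name in <fs> to a Python bool."""
    result = {}
    for f in fs.findall("f"):
        binary = f.find("binary")
        if binary is None:
            continue  # not a directly supplied binary value; skipped here
        raw = binary.get("value")
        if raw in TRUE_FORMS:
            result[f.get("name")] = True
        elif raw in FALSE_FORMS:
            result[f.get("name")] = False
        else:
            raise ValueError(f"not a binary value: {raw!r}")
    return result


# The encoding of the English segment [s] given earlier in this section.
segment_s = ET.fromstring("""
<fs type="phonological_segments">
  <f name="consonantal"><binary value="true"/></f>
  <f name="vocalic"><binary value="false"/></f>
  <f name="voiced"><binary value="false"/></f>
  <f name="anterior"><binary value="true"/></f>
  <f name="coronal"><binary value="true"/></f>
  <f name="continuant"><binary value="true"/></f>
  <f name="strident"><binary value="true"/></f>
</fs>
""")

features = binary_features(segment_s)
```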
Features may take other kinds of atomic value. In this section, we define elements
which may be used to represent: symbolic values, numeric values, and string values. The module defined by this chapter allows for the specification of additional datatypes
if necessary, by extending the underlying class model.featureVal.single. If this is done, it is recommended that only the basic W3C datatypes should be used;
more complex datatyping should be represented as feature structures.
The
symbol element is used for the value of a feature when that feature can have any of a small,
finite set of possible values, representable as character strings. For example, the
following might be used to represent the claim that the Latin noun form
mensas (tables) has accusative case, feminine gender, and plural number:
<fs> <f name="case"> <symbol value="accusative"/> </f> <f name="gender"> <symbol value="feminine"/> </f> <f name="number"> <symbol value="plural"/> </f></fs>
More formally, this representation shows a structure in which three features (
case,
gender, and
number) are used to define morpho-syntactic properties of a word. Each of these features
can take one of a small number of values (for example, case can be
nominative,
genitive,
dative,
accusative, etc.) and it is therefore appropriate to represent the values taken in this instance
as
symbol elements. Note that, instead of using a symbolic value for grammatical number, one
could have named the feature
singular or
plural and given it an appropriate binary value, as in the following example:
<fs> <f name="case"> <symbol value="accusative"/> </f> <f name="gender"> <symbol value="feminine"/> </f> <f name="singular"> <binary value="false"/> </f></fs>
Whether one uses a binary or symbolic value in situations like this is largely a matter
of taste.
The
string element is used for the value of a feature when that value is a string drawn from
a very large or potentially unbounded set of possible strings of characters, so that
it would be impractical or impossible to use the
symbol element. The string value is expressed as the content of the
string element, rather than as an attribute value. For example, one might encode a street
address as follows:
<fs> <f name="address"> <string>3418 East Third Street</string> </f></fs>
The
numeric element is used when the value of a feature is a numeric value, or a range of such
values. For example, one might wish to regard the house number and the street name
as different features, using an encoding like the following:
<fs> <f name="houseNumber"> <numeric value="3418"/> </f> <f name="streetName"> <string>East Third Street</string> </f></fs>
If the numeric value to be represented falls within a specific range (for example
an address that spans several numbers), the
max attribute may be used to supply an upper limit:
<fs> <f name="houseNumber"> <numeric value="3418" max="3440"/> </f> <f name="streetName"> <string>East Third Street</string> </f></fs>
It is also possible to specify that the numeric value (or values) represented should
(or should not) be truncated. For example, assuming that the daily rainfall in mm
is a feature of interest for some address, one might represent this by an encoding
like the following:
<fs> <f name="dailyRainFall"> <numeric value="0.0" max="1.3"
trunc="false"/> </f></fs>
This represents any of the infinite number of numeric values falling between 0 and
1.3; by contrast
<fs> <f name="dailyRainFall"> <numeric value="0.0" max="1.3"
trunc="true"/> </f></fs>
represents only two possible values: 0 and 1.
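The difference between the two rainfall encodings can be made concrete with a short Python sketch. It rests on one reading of the description above, namely that with truncation the range stands for the whole numbers from the truncated lower bound to the truncated upper bound; the function numeric_values is a hypothetical helper, not something defined by this chapter.

```python
import math


def numeric_values(value, max_value=None, trunc=False):
    """Describe what a <numeric> element stands for.

    With trunc=True the result is the list of whole numbers in the range.
    Otherwise the range is continuous, so only its (low, high) bounds are
    returned.
    """
    high = value if max_value is None else max_value
    if trunc:
        return list(range(math.trunc(value), math.trunc(high) + 1))
    return (value, high)


truncated = numeric_values(0.0, 1.3, trunc=True)
continuous = numeric_values(0.0, 1.3, trunc=False)
```

Under this reading, truncated is [0, 1], matching the two possible values mentioned above, and continuous is the pair of bounds (0.0, 1.3).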
Some communities of practice, notably those with a strong computer-science bias, prefer
to dissociate the information on the value of the given feature from the specification
of the data type that this value represents. In such cases, feature values can be
provided directly as textual content of
f, with the assumption that the data type is specified by the schema. The following
is an example taken from ISO 24612, presenting the symbolic values for Active Voice
and Simple Present Tense in the untyped form:
<fs> <f name="voice">active</f> <f name="tense">SimPre</f></fs>
As noted above, additional processing is necessary to ensure that appropriate values
are supplied for particular features, for example to ensure that the feature singular is not given a value such as <symbol value="feminine"/>. There are two ways of attempting to ensure that only certain combinations of feature
names and values are used. First, if the total number of legal combinations is relatively
small, one can predefine all of them in a construct known as a feature library, and then reference the combination required using the feats attribute in the enclosing fs element, rather than give it explicitly. This method is suitable in the situation
described above, since it requires specifying a total of only ten (5 + 3 + 2) combinations
of features and values. Similarly, to ensure that only feature structures containing
valid combinations of feature values are used, one can put definitions for all valid
feature structures inside a feature value library (so called, since a feature structure may be the value of a feature). A total of
30 feature structures (5 × 3 × 2) is required to enumerate all the possible combinations
of individual case, gender and number values in the preceding illustration. We discuss
the use of such libraries and their representation in XML further in section 19.4 Feature Libraries and Feature-Value Libraries below.
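The two counts can be checked with a small Python sketch. The text only says that there are five case values, three gender values, and two number values; the particular value names listed below (for example ablative and neuter) are assumptions supplied to make the example runnable.

```python
from itertools import product

# Assumed value inventories: 5 cases, 3 genders, 2 numbers.
VALUES = {
    "case": ["nominative", "genitive", "dative", "accusative", "ablative"],
    "gender": ["masculine", "feminine", "neuter"],
    "number": ["singular", "plural"],
}

# A feature library lists every individual feature-value pair: 5 + 3 + 2.
feature_library = [(name, v) for name, vs in VALUES.items() for v in vs]

# A feature-value library lists every complete structure: 5 * 3 * 2.
feature_value_library = [
    dict(zip(VALUES, combo)) for combo in product(*VALUES.values())
]
```

The feature library grows with the sum of the value inventories, whereas the library of complete feature structures grows with their product, which is why the second approach becomes unwieldy much sooner.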
However, the most general method of attempting to ensure that only legal combinations
of feature names and values are used is to provide a feature-system declaration, as discussed in 19.11 Feature System Declaration.
Whether at the level of feature-system declarations, feature- and feature-value libraries,
or individual features, it is possible to align both feature names and their values
with standardized external data category repositories.
In the following example, both the feature
part_of_speech and its value
NN (standing for
‘common noun’) are aligned with the respective definitions provided by the
CLARIN Concept Registry (CCR).
<fs> <f name="part_of_speech"
datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"> <symbol valueDatcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"
value="NN"/> </f><!-- ... --></fs>
Since the above representation takes up a lot of space and quickly becomes redundant
and error-prone, it is possible to delegate the task of aligning with external repositories
to elements such as
fLib,
fvLib,
fDecl, or
fsDecl to reduce the feature representation at hand and to increase its readability at the
same time, as shown in the example below.
<fs><!-- ... --> <f name="POS" fVal="#common_noun"/><!-- ... --></fs>
The value common_noun is best listed (as an xml:id) either in a library of feature values (fvLib, see the following section) or in a taxonomy element.
As the examples in the preceding section suggest, the direct encoding of feature structures
can be verbose. Moreover, it is often the case that particular feature-value combinations,
or feature structures composed of them, are re-used in different analyses. To reduce
the size and complexity of the task of encoding feature structures, one may use the
feats attribute of the fs element to point to one or more of the feature-value specifications for that element.
This indirect method of encoding feature structures presumes that the f elements are assigned unique xml:id values, and are collected together in fLib elements (feature libraries). In the same way, feature values of whatever type can be collected together in fvLib elements (feature-value libraries). If a feature has as its value a feature structure or other value which is predefined
in this way, the fVal attribute may be used to point to it, as discussed in the next section. The following
elements are used for representing feature libraries and feature-value libraries:
- fLib (feature library) assembles a library of f (feature) elements.
- fvLib (feature-value library) assembles a library of reusable feature value elements (including
complete feature structures).
For example, suppose a feature library for phonological feature specifications is
set up as follows.
<fLib n="phonological features"> <f xml:id="CNS1" name="consonantal"> <binary value="true"/> </f> <f xml:id="CNS0" name="consonantal"> <binary value="false"/> </f> <f xml:id="VOC1" name="vocalic"> <binary value="true"/> </f> <f xml:id="VOC0" name="vocalic"> <binary value="false"/> </f> <f xml:id="VOI1" name="voiced"> <binary value="true"/> </f> <f xml:id="VOI0" name="voiced"> <binary value="false"/> </f> <f xml:id="ANT1" name="anterior"> <binary value="true"/> </f> <f xml:id="ANT0" name="anterior"> <binary value="false"/> </f> <f xml:id="COR1" name="coronal"> <binary value="true"/> </f> <f xml:id="COR0" name="coronal"> <binary value="false"/> </f> <f xml:id="CNT1" name="continuant"> <binary value="true"/> </f> <f xml:id="CNT0" name="continuant"> <binary value="false"/> </f> <f xml:id="STR1" name="strident"> <binary value="true"/> </f> <f xml:id="STR0" name="strident"> <binary value="false"/> </f><!-- ... --></fLib>
Then the feature structures that represent the analysis of the phonological segments
(phonemes)
/t/,
/d/,
/s/, and
/z/ may be defined as follows.
<fs feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT0 #STR0"/><fs feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT0 #STR0"/><fs feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT1 #STR1"/><fs feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT1 #STR1"/>
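A processor that meets such an fs element has to follow each pointer in feats back to the feature library. The following Python sketch shows one way this might be done; the names index_library and expand_feats are hypothetical, and the sketch assumes that every pointer is a local reference of the form #id to an f element holding a binary value.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

FLIB = ET.fromstring("""
<fLib n="phonological features">
  <f xml:id="CNS1" name="consonantal"><binary value="true"/></f>
  <f xml:id="CNS0" name="consonantal"><binary value="false"/></f>
  <f xml:id="VOC1" name="vocalic"><binary value="true"/></f>
  <f xml:id="VOC0" name="vocalic"><binary value="false"/></f>
  <f xml:id="VOI1" name="voiced"><binary value="true"/></f>
  <f xml:id="VOI0" name="voiced"><binary value="false"/></f>
  <f xml:id="ANT1" name="anterior"><binary value="true"/></f>
  <f xml:id="ANT0" name="anterior"><binary value="false"/></f>
  <f xml:id="COR1" name="coronal"><binary value="true"/></f>
  <f xml:id="COR0" name="coronal"><binary value="false"/></f>
  <f xml:id="CNT1" name="continuant"><binary value="true"/></f>
  <f xml:id="CNT0" name="continuant"><binary value="false"/></f>
  <f xml:id="STR1" name="strident"><binary value="true"/></f>
  <f xml:id="STR0" name="strident"><binary value="false"/></f>
</fLib>
""")


def index_library(flib):
    """Map each xml:id in the library to its <f> element."""
    return {f.get(XML_ID): f for f in flib.iter("f")}


def expand_feats(fs, index):
    """Resolve the feats pointers of <fs> into a name -> bool dictionary."""
    expanded = {}
    for pointer in fs.get("feats", "").split():
        f = index[pointer.lstrip("#")]  # KeyError signals a dangling pointer
        expanded[f.get("name")] = f.find("binary").get("value") in ("true", "1")
    return expanded


INDEX = index_library(FLIB)
segment_s = expand_feats(
    ET.fromstring('<fs feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT1 #STR1"/>'),
    INDEX)
```

The dictionary segment_s carries the same information as the directly encoded feature structure for [s] given in section 19.2.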
The preceding are but four of the 128 logically possible fully specified phonological
segments using the seven binary features listed in the feature library. Presumably
not all combinations of features correspond to phonological segments (there are no
strident vowels, for example). The legal combinations, however, can be collected together,
each one represented as an identifiable
fs element within a
feature-value library, as in the following example:
<fvLib xml:id="fsl1"
n="phonological segment definitions"><!-- ... --> <fs xml:id="T.DF"
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT0 #STR0"/> <fs xml:id="D.DF"
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT0 #STR0"/> <fs xml:id="S.DF"
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT1 #STR1"/> <fs xml:id="Z.DF"
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT1 #STR1"/><!-- ... --></fvLib>
Once defined, these feature structure values can also be reused. Other
f elements may invoke them by reference, using the
fVal attribute; for example, one might use them in a feature value pair such as:
<f name="dental-fricative" fVal="#T.DF"/>
rather than expanding the hierarchy of the component phonological features explicitly.
The feature structure that concludes section
19.3 Other Atomic Feature Values above, identifying the value of some part of speech to be a common noun, may be used
in tandem with a feature-value library, which offers a way to encode a grammatical
tagset, in this case containing labels for parts of speech:
<fvLib n="POS values"> <symbol xml:id="common_noun" value="NN"
datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"/> <symbol xml:id="proper_noun" value="NP"
datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/><!-- ... --></fvLib>
Such a feature-value library combines the standard short symbolic label for a part
of speech (e.g.,
NN) with a mnemonic identifier that can be referenced by means of
fVal, and with a persistent identifier, maintained in a public reference taxonomy repository
together with the basic definition of the given concept.
Feature structures stored in the way presented in this section may also be associated
with the text which they are intended to annotate, either by a link from the text
(for example, using the TEI global ana attribute), or by means of stand-off annotation techniques (for example, using the
TEI link element): see further section 19.10 Linking Text and Analysis below.
Note that when features or feature structures are linked to in this way, the effect
is as if the item linked to had been copied into the place from which it is linked.
This form of linking should be distinguished from the phenomenon of structure-sharing, where it is desired to indicate that some part of an annotation structure appears
simultaneously in two or more places within the structure. This kind of annotation
should be represented using the vLabel element, as discussed in 19.6 Re-entrant Feature Structures below.
Features may have complex values as well as atomic ones; the simplest such complex
value is represented by supplying an fs element as the content of an f element, or (equivalently) by supplying the identifier of an fs element as the value for the fVal attribute on the f element. Structures may be nested as deeply as appropriate, using this mechanism.
For example, an fs element may contain or point to an f element, which may contain or point to an fs element, which may contain or point to an f element, and so on.
To illustrate the use of complex values, consider the following simple model of a
word, as a structure combining surface form information, a syntactic category, and
semantic information. Each word analysis is represented as a
<fs type='word'> element, containing three features named
surface,
syntax, and
semantics. The first of these has an atomic string value, but the other two have complex values,
represented as nested feature structures of types
category and
act respectively:
<fs type="word"> <f name="surface"> <string>love</string> </f> <f name="syntax"> <fs type="category"> <f name="pos"> <symbol value="verb"/> </f> <f name="val"> <symbol value="transitive"/> </f> </fs> </f> <f name="semantics"> <fs type="act"> <f name="rel"> <symbol value="LOVE"/> </f> </fs> </f></fs>
This analysis does not tell us much about the meaning of the symbols
verb or
transitive. It might be preferable to replace these atomic feature values by feature structures.
Suppose therefore that we maintain a feature-value library for each of the major syntactic
categories (N, V, ADJ, PREP):
<fvLib n="Major category definitions"><!-- ... --> <fs xml:id="N" type="noun"><!-- noun features defined here --> </fs> <fs xml:id="V" type="verb"><!-- verb features defined here --> </fs></fvLib>
This library allows us to use shortcut codes (
N,
V, etc.) to reference a complete definition for the corresponding feature structure.
Each definition may be explicitly contained within the
fs element, as a number of
f elements. Alternatively, the relevant features may be referenced by their identifiers,
supplied as the value of the
feats attribute, as in these examples:
<!-- ... -->
<fs xml:id="ADJ" type="adjective" feats="#F1 #F2"/>
<fs xml:id="PREP" type="preposition" feats="#F1 #F3"/>
<!-- ... -->
This ability to re-use feature definitions within multiple feature structure definitions
is an essential simplification in any realistic example. In this case, we assume the
existence of a feature library containing specifications for the basic feature categories
like the following:
<fLib n="categorial features"> <f xml:id="NN-1" name="nominal"> <binary value="true"/> </f> <f xml:id="NN-0" name="nominal"> <binary value="false"/> </f> <f xml:id="VV-1" name="verbal"> <binary value="true"/> </f> <f xml:id="VV-0" name="verbal"> <binary value="false"/> </f><!-- ... --></fLib>
With such libraries in place, and assuming the availability of similarly predefined
feature structures for transitivity and semantics, the preceding example could be
considerably simplified:
<fs type="word"> <f name="surface"> <string>love</string> </f> <f name="syntax"> <fs type="category"> <f name="pos" fVal="#V"/> <f name="val" fVal="#TRNS"/> </fs> </f> <f name="semantics"> <fs type="act"> <f name="rel" fVal="#LOVE"/> </fs> </f></fs>
Although in principle the fVal attribute could point to any kind of feature value, its use is not recommended for
simple atomic values.
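The recursion described in this section, together with the equivalence of a nested fs and a pointer to one, can be sketched in Python as follows. This is an illustration only: the wrapping root element, the single verbal feature given to the V definition, and the function names value_of and feature_value are assumptions made for the example, and the type attribute is simply discarded.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# A hypothetical document holding one library entry and one word analysis.
DOC = ET.fromstring("""
<root>
  <fvLib n="Major category definitions">
    <fs xml:id="V" type="verb">
      <f name="verbal"><binary value="true"/></f>
    </fs>
  </fvLib>
  <fs type="word">
    <f name="surface"><string>love</string></f>
    <f name="syntax">
      <fs type="category">
        <f name="pos" fVal="#V"/>
      </fs>
    </f>
  </fs>
</root>
""")

BY_ID = {el.get(XML_ID): el for el in DOC.iter() if el.get(XML_ID)}


def value_of(el):
    """Convert one value element to a Python value, recursing into <fs>."""
    if el.tag == "fs":
        return {f.get("name"): feature_value(f) for f in el.findall("f")}
    if el.tag == "string":
        return (el.text or "").strip()
    if el.tag == "binary":
        return el.get("value") in ("true", "1")
    return el.get("value")  # symbol, numeric: keep the raw attribute value


def feature_value(f):
    """Take the value of <f> from its fVal pointer or from its first child."""
    pointer = f.get("fVal")
    if pointer:
        return value_of(BY_ID[pointer.lstrip("#")])
    return value_of(f[0])


word = value_of(DOC.find("fs"))
```

Because feature_value and value_of call each other, structures nested to any depth are handled by the same few lines, whether the nesting is written out or expressed by reference.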
Sometimes the same feature value is required at multiple places within a feature structure,
in particular where the value is only partially specified at one or more places. The
vLabel element is provided as a means of labelling each such re-entrancy point:
- vLabel (value label) represents the value part of a feature-value specification which appears
at more than one point in a feature structure.
For example, suppose one wishes to represent noun-verb agreement as a single feature
structure. Within the representation, the feature indicating (say) number appears
more than once. To represent the fact that each occurrence is another appearance of
the same feature (rather than a copy) one could use an encoding like the following:
<fs xml:id="NVA"> <f name="nominal"> <fs> <f name="nm-num"> <vLabel name="L1"> <symbol value="singular"/> </vLabel> </f><!-- other nominal features --> </fs> </f> <f name="verbal"> <fs> <f name="vb-num"> <vLabel name="L1"/> </f> </fs><!-- other verbal features --> </f></fs>
In the above encoding, the features named vb-num and nm-num exhibit structure sharing. Their values, given as vLabel elements, are understood to be references to the same point in the feature structure,
which is labelled by their name attribute.
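The following Python sketch illustrates what it means for the two features to share one value rather than hold two copies. It assumes, as in the example above, that exactly one vLabel of a given name carries the value as its content; the function names shared_values and resolve are hypothetical.

```python
import xml.etree.ElementTree as ET

NVA = ET.fromstring("""
<fs xml:id="NVA">
  <f name="nominal"><fs>
    <f name="nm-num"><vLabel name="L1"><symbol value="singular"/></vLabel></f>
  </fs></f>
  <f name="verbal"><fs>
    <f name="vb-num"><vLabel name="L1"/></f>
  </fs></f>
</fs>
""")


def shared_values(outer_fs):
    """Map each label name to the single value element that it labels."""
    labelled = {}
    for v_label in outer_fs.iter("vLabel"):
        if len(v_label):  # this occurrence carries the value as content
            labelled[v_label.get("name")] = v_label[0]
    return labelled


def resolve(f, labelled):
    """Return the value element that the <vLabel> inside <f> refers to."""
    return labelled[f.find("vLabel").get("name")]


labels = shared_values(NVA)
nm_num = resolve(NVA.find(".//f[@name='nm-num']"), labels)
vb_num = resolve(NVA.find(".//f[@name='vb-num']"), labels)
```

Both lookups return the very same element object, so a later change to the shared value would be visible from both features, which is the behaviour that distinguishes structure sharing from copying.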
The scope of the names used to label re-entrancy points is that of the outermost
fs element in which they appear. When a feature structure is imported from a feature
value library, or referenced from elsewhere (for example by using the
fVal attribute) the names of any sharing points it may contain are implicitly prefixed
by the identifier used for the imported feature structure, to avoid name clashes.
Thus, if some other feature structure were to reference the
fs element given in the example above, for example in this way:
<f name="class" fVal="#NVA"/>
then the labelled points in the example would be interpreted as if they had the name
NVAL1.
Complex feature values need not always be represented as feature structures. Multiple
values may also be organized as sets, bags or multisets, or lists of atomic values
of any type. The vColl element is provided to represent such cases:
- vColl (collection of values) represents the value part of a feature-value specification
which contains multiple values organized as a set, bag, or list.
A feature whose value is regarded as a set, bag, or list may have any positive number
of values as its content, or none at all (thus allowing for representation of the
empty set, bag, or list). The items in a list are ordered, and need not be distinct.
The items in a set are not ordered, and must be distinct. The items in a bag are neither
ordered nor distinct. Sets and bags are thus distinguished from lists in that the
order in which the values are specified does not matter for the former, but does matter
for the latter, while sets are distinguished from bags and lists in that repetitions
of values do not count for the former but do count for the latter.
If no value is specified for the org attribute, the assumption is that the vColl defines a list of values. If the vColl element is empty, the assumption is that it represents the null list, set, or bag.
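The three kinds of organization correspond closely to familiar data types, as the following Python sketch suggests. The mapping to tuple, frozenset, and Counter is a choice made for this illustration, the helper names are hypothetical, and only atomic members are handled; note also that repeated members of a set are silently collapsed here rather than reported.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def collection_value(vcoll):
    """Convert a <vColl> of atomic values to a comparable Python value."""
    items = [(c.tag, c.get("value") or (c.text or "").strip()) for c in vcoll]
    org = vcoll.get("org", "list")  # a missing org attribute means a list
    if org == "set":
        return frozenset(items)     # unordered, repetitions do not count
    if org == "bag":
        return Counter(items)       # unordered, repetitions count
    return tuple(items)             # ordered, repetitions count


def parse(org, *values):
    """Build a <vColl> of <symbol> values and convert it (test helper)."""
    attr = f' org="{org}"' if org else ""
    body = "".join(f'<symbol value="{v}"/>' for v in values)
    return collection_value(ET.fromstring(f"<vColl{attr}>{body}</vColl>"))


same_set = parse("set", "a", "b") == parse("set", "b", "a", "a")
same_bag = parse("bag", "a", "a") == parse("bag", "a")
same_list = parse("list", "a", "b") == parse("list", "b", "a")
```

With these definitions same_set is true, while same_bag and same_list are both false, reflecting the distinctions drawn above.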
To illustrate the use of the
org attribute, suppose that a feature structure analysis is used to represent a genealogical
tree, with the information about each individual treated as a single feature structure,
like this:
<fs xml:id="p027" type="person"> <f name="forenames"> <vColl> <string>Daniel</string> <string>Edouard</string> </vColl> </f> <f name="mother" fVal="#p002"/> <f name="father" fVal="#p009"/> <f name="birthDate"> <fs type="date" feats="#y1988 #m04 #d17"/> </f> <f name="birthPlace" fVal="#austintx"/> <f name="siblings"> <vColl org="set"> <fs copyOf="#pnb005"/> <fs copyOf="#prb001"/> </vColl> </f></fs>
In this example, the vColl element is first used to supply a list of ‘name’ feature values, which together constitute
the ‘forenames’ feature. Other features are defined by reference to values which we
assume are held in some external feature value library (not shown here). For example,
the vColl element is used a second time to indicate that the person's siblings should be regarded
as constituting a set rather than a list. Each sibling is represented by a feature
structure: in this example, each feature structure is a copy of one specified in the
feature value library.
If a specific feature contains only a single feature structure as its value, the component
features of which are organized as a set, bag, or list, it may be more convenient
to represent the value as a
vColl rather than as an
fs. For example, consider the following encoding of the English verb form
sinks, which contains an
agreement feature whose value is a feature structure which contains
person and
number features with symbolic values.
<fs type="word"> <f name="category"> <symbol value="verb"/> </f> <f name="tense"> <symbol value="present"/> </f> <f name="agreement"> <fs> <f name="person"> <symbol value="third"/> </f> <f name="number"> <symbol value="singular"/> </f> </fs> </f></fs>
If the names of the features contained within the
agreement feature structure are of no particular significance, the following simpler representation
may be used:
<fs type="word"> <f name="category"> <symbol value="verb"/> </f> <f name="tense"> <symbol value="present"/> </f> <f name="agreement"> <vColl org="set"> <symbol value="third"/> <symbol value="singular"/> </vColl> </f></fs>
The
vColl element is also useful in cases where an analysis has several components. In the following
example, the French word
auxquels has a two-part analysis, represented as a list of two values. The first specifies
that the word contains a preposition; the second that it contains a masculine plural
relative pronoun:
<fs> <f name="lex"> <symbol value="auxquels"/> </f> <f name="maf"> <vColl org="list"> <fs> <f name="cat"> <symbol value="prep"/> </f> </fs> <fs> <f name="cat"> <symbol value="pronoun"/> </f> <f name="kind"> <symbol value="rel"/> </f> <f name="num"> <symbol value="pl"/> </f> <f name="gender"> <symbol value="masc"/> </f> </fs> </vColl> </f></fs>
The set, bag, or list which has no members is known as the null (or empty) set, bag,
or list. A vColl element with no content and with no value for its feats attribute is interpreted as referring to the null set, bag, or list, depending on
the value of its org attribute.
If, for example, the individual described by the feature structure with identifier
p027 (above) had no siblings, we might specify the
siblings feature as follows.
<f name="siblings"> <vColl org="set"/></f>
A vColl element may also collect together one or more other vColl elements, if, for example one of the members of a set is itself a set, or if two
lists are concatenated together. Note that such collections pay no attention to the
contents of the nested vColl elements: if it is desired to produce the union of two sets, the vMerge element discussed below should be used to make a new collection from the two sets.
It is sometimes desirable to express the value of a feature as the result of an operation
over some other value (for example, as ‘not green’, or as ‘male or female’, or as
the concatenation of two collections). Three special purpose elements are provided
to represent disjunctive alternation, negation, and collection of values:
- vAlt (value alternation) represents the value part of a feature-value specification which
contains a set of values, only one of which can be valid.
- vNot (value negation) represents a feature value which is the negation of its content.
- vMerge (merged collection of values) represents a feature value which is the result of merging
together the feature values contained by its children, using the organization specified
by the org attribute.
The
vAlt element can be used wherever a feature value can appear. It contains two or more feature
values, any one of which is to be understood as the value required. Suppose, for example,
that we are using a feature system to describe residential property, using such features
as
number.of.bathrooms. In a particular case, we might wish to represent uncertainty as to whether a house
has two or three bathrooms. As we have already shown, one simple way to represent
this would be with a numeric maximum:
<f name="number.of.bathrooms"> <numeric value="2" max="3"/></f>
A more general way would be to represent the alternation explicitly, in this way:
<f name="number.of.bathrooms"> <vAlt> <numeric value="2"/> <numeric value="3"/> </vAlt></f>
The
vAlt element represents alternation over feature values, not feature-value pairs. If therefore
the uncertainty relates to two or more feature value specifications, each must be
represented as a feature structure, since a feature structure can always appear where
a value is required. For example, if it is uncertain whether the house
being described has two bathrooms or two bedrooms, a structure like the following
may be used:
<f name="rooms"> <vAlt> <fs> <f name="number.of.bathrooms"> <numeric value="2"/> </f> </fs> <fs> <f name="number.of.bedrooms"> <numeric value="2"/> </f> </fs> </vAlt></f>
Note that alternation is always regarded as exclusive: in the case above, the implication is
that having two bathrooms excludes the possibility of having two bedrooms and vice versa.
If inclusive alternation is required, a vColl element may be included in the alternation as follows:
<f name="rooms"> <vAlt> <fs> <f name="number.of.bathrooms"> <numeric value="2"/> </f> </fs> <fs> <f name="number.of.bedrooms"> <numeric value="2"/> </f> </fs> <vColl> <fs> <f name="number.of.bathrooms"> <numeric value="2"/> </f> </fs> <fs> <f name="number.of.bedrooms"> <numeric value="2"/> </f> </fs> </vColl> </vAlt></f>
This analysis indicates that the property may have two bathrooms, two bedrooms, or
both two bathrooms and two bedrooms.
As the previous example shows, the vAlt element can also be used to indicate alternations among
values of features organized as sets, bags or lists. Suppose we use a feature selling.points
to describe items that are mentioned to enhance a property's sales value, such as
whether it has a pool or a good view. Now suppose for a particular listing, the selling
points include an alarm system and a good view, and either a pool or a jacuzzi (but
not both). This situation could be represented, using the vAlt element, as follows.
<fs type="real_estate_listing"> <f name="selling.points"> <vColl org="set"> <string>alarm system</string> <string>good view</string> <vAlt> <string>pool</string> <string>jacuzzi</string> </vAlt> </vColl> </f></fs>
Now suppose the situation is like the preceding except that one is also uncertain
whether the property has an alarm system or a good view. This can be represented as
follows.
<fs type="real_estate_listing"> <f name="selling.points"> <vColl org="set"> <vAlt> <string>alarm system</string> <string>good view</string> </vAlt> <vAlt> <string>pool</string> <string>jacuzzi</string> </vAlt> </vColl> </f></fs>
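A collection containing several alternations denotes one of a number of concrete collections, one for each combination of choices. The following is a minimal sketch that enumerates them for the listing above, assuming a namespace-free fragment; the function name readings is hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

# The second listing from the text, as a standalone fragment without the TEI namespace.
LISTING = """
<fs type="real_estate_listing">
  <f name="selling.points">
    <vColl org="set">
      <vAlt><string>alarm system</string><string>good view</string></vAlt>
      <vAlt><string>pool</string><string>jacuzzi</string></vAlt>
    </vColl>
  </f>
</fs>
"""

def readings(v_coll):
    """Return every concrete set the collection may denote, one choice per vAlt."""
    choices = []
    for member in v_coll:
        if member.tag == "vAlt":
            choices.append([s.text.strip() for s in member.findall("string")])
        else:  # a plain <string> member offers exactly one choice
            choices.append([member.text.strip()])
    return [frozenset(combo) for combo in itertools.product(*choices)]

collection = ET.fromstring(LISTING).find("f/vColl")
possible_sets = readings(collection)
for reading in possible_sets:
    print(sorted(reading))
```

Two binary alternations give four readings, which illustrates why a stand-off technique becomes attractive as the number of uncertainties grows.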
If a large number of ambiguities or uncertainties need to be represented, involving
a relatively small number of features and values, it is recommended that a stand-off
technique, for example using the general-purpose alt element discussed in section 17.8 Alternation be used, rather than the special-purpose vAlt element.
The vNot element can be used wherever a feature value can appear. It contains any feature value
and returns the complement of its contents. For example, the feature number.of.bathrooms
in the following example has any whole numeric value other than 2:
<f name="number.of.bathrooms"> <vNot> <numeric value="2"/> </vNot></f>
Strictly speaking, the effect of the vNot element is to provide the complement of the feature
values it contains, rather than their negation. If a feature system declaration is available
which defines the possible values for the associated feature, then it is possible to say more
about the negated value. For example, suppose that the available values for the feature case
are declared to be nominative, genitive, dative, or accusative, whether in a TEI feature
system declaration or by some other means. Then the following two specifications are
equivalent:
(i) <f name="case"> <vNot> <symbol value="genitive"/> </vNot></f>
(ii) <f name="case"> <vAlt> <symbol value="nominative"/> <symbol value="dative"/> <symbol value="accusative"/> </vAlt></f>
If however no such system declaration is available, all that one can say about a feature
specified via negation is that its value is something other than the negated value.
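When the range of a feature is known, a negated value can be rewritten as an alternation over the remaining values. The following is a minimal sketch of that rewriting; the list DECLARED_CASES stands in for a feature system declaration and, like the function name expand_negation, is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed declared range for the feature "case", standing in for a feature system declaration.
DECLARED_CASES = ["nominative", "genitive", "dative", "accusative"]

def expand_negation(f_element, declared_values):
    """Rewrite <vNot><symbol/></vNot> as the list of remaining declared symbols."""
    negated = f_element.find("vNot/symbol").get("value")
    return [value for value in declared_values if value != negated]

spec = ET.fromstring('<f name="case"><vNot><symbol value="genitive"/></vNot></f>')
remaining = expand_negation(spec, DECLARED_CASES)
print(remaining)
```

Without a declared range no such expansion is possible, and the negation can only be read as "something other than genitive".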
Negation is always applied to a feature value, rather than to a feature-value pair.
The negation of an atomic value is the set of all other values which are possible
for the feature.
Any kind of value can be negated, including collections (represented by vColl elements) or feature structures (represented by fs elements). The negation of any complex value is understood to be the set of values
which cannot be unified with it. Thus, for example, the negation of the feature structure
F is understood to be the set of feature structures which are not unifiable with F.
In the absence of a constraint mechanism such as the Feature System Declaration, the
negation of a collection is anything that is not unifiable with it, including collections
of different types and atomic values. It will generally be more useful to require
that the organization of the negated value be the same as that of the original value,
for example that a negated set is understood to mean the set which is a complement
of the set, but such a requirement cannot be enforced in the absence of a constraint
mechanism.
The vMerge element can be used wherever a feature value can appear. It contains two or more
feature values, all of which are to be collected together. The organization of the
resulting collection is specified by the value of the org attribute, which need not necessarily be the same as that of its constituent values
if these are collections. For example, one can change a list to a set, or vice versa.
As an example, suppose that we wish to represent the range of possible values for
a feature ‘genders’ used to describe some language. It would be natural to represent the possible values
as a set, using the vColl element as in the following example:
<fs> <f name="genders"> <vColl org="set"> <symbol value="masculine"/> <symbol value="feminine"/> </vColl> </f></fs>
Suppose however that we discover for some language it is necessary to add a new possible
value, and to treat the value of the feature as a list rather than as a set. The
vMerge element can be used to achieve this:
<fs> <f name="genders"> <vMerge org="list"> <vColl org="set"> <symbol value="masculine"/> <symbol value="feminine"/> </vColl> <symbol value="neuter"/> </vMerge> </f></fs>
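The effect of vMerge can be pictured as flattening its children into one collection whose organization is given by org. The following is a minimal sketch under that reading, assuming a namespace-free fragment and symbol values only; merge_values is a hypothetical helper, and the order in which members of a set are placed into a list is simply document order here.

```python
import xml.etree.ElementTree as ET

# The vMerge example from the text, as a standalone fragment without the TEI namespace.
MERGE = """
<f name="genders">
  <vMerge org="list">
    <vColl org="set">
      <symbol value="masculine"/>
      <symbol value="feminine"/>
    </vColl>
    <symbol value="neuter"/>
  </vMerge>
</f>
"""

def merge_values(v_merge):
    """Flatten the children of a vMerge into one collection organized per its org attribute."""
    items = []
    for child in v_merge:
        if child.tag == "vColl":
            items.extend(sym.get("value") for sym in child.findall("symbol"))
        else:  # a single atomic value contributes one member
            items.append(child.get("value"))
    if v_merge.get("org") == "set":
        return set(items)
    return items  # "list" and "bag" both keep duplicates

merged = merge_values(ET.fromstring(MERGE).find("vMerge"))
print(merged)
```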
The value of a feature may be underspecified in a number of different ways. It may
be null, unknown, or uncertain with respect to a range of known possibilities, as
well as being defined as a negation or an alternation. As previously noted, the specification
of the range of known possibilities for a given feature is not part of the current
specification: in the TEI scheme, this information is conveyed by the feature system declaration. Using this, or some other system, we might specify (for example) that the range
of values for a feature includes symbols for masculine, feminine, and neuter, and
that the default value is neuter. With such definitions available to us, it becomes
possible to say that some feature takes the default value, or some unspecified value
from the list. The following special element is provided for this purpose:
- default (default feature value) represents the value part of a feature-value specification
which contains a defaulted value.
The value of an empty f element which also lacks an fVal attribute is understood to be the
most general case, i.e. any of the available values.
Thus, assuming the feature system defined above, the following two representations
are equivalent.
<f name="gender"/>
<f name="gender"> <vAlt> <symbol value="feminine"/> <symbol value="masculine"/> <symbol value="neuter"/> </vAlt></f>
If, however, the value is explicitly stated to be the default one, using the default element,
then the following two representations are equivalent:
<f name="gender"> <default/></f>
<f name="gender"> <symbol value="neuter"/></f>
Similarly, if the value is stated to be the negation of the default, then the following
two representations are equivalent:
<f name="gender"> <vNot> <default/> </vNot></f>
<f name="gender"> <vAlt> <symbol value="feminine"/> <symbol value="masculine"/> </vAlt></f>
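The three equivalences above can be reproduced by resolving each representation against the declared range and default. The following is a minimal sketch; the constants GENDER_RANGE and GENDER_DEFAULT stand in for a feature system declaration, the helper possible_values is hypothetical, and the fVal attribute is ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Assumed declaration for "gender": its range and default, as the text supposes.
GENDER_RANGE = ["feminine", "masculine", "neuter"]
GENDER_DEFAULT = "neuter"

def possible_values(holder):
    """Return the values allowed by the single value inside `holder` (an <f> or a <vNot>)."""
    if len(holder) == 0:  # empty element: any declared value
        return list(GENDER_RANGE)
    value = holder[0]
    if value.tag == "default":
        return [GENDER_DEFAULT]
    if value.tag == "symbol":
        return [value.get("value")]
    if value.tag == "vAlt":
        return [sym.get("value") for sym in value.findall("symbol")]
    if value.tag == "vNot":
        excluded = possible_values(value)
        return [v for v in GENDER_RANGE if v not in excluded]
    raise ValueError(f"unsupported value element: {value.tag}")

unspecified = possible_values(ET.fromstring('<f name="gender"/>'))
defaulted = possible_values(ET.fromstring('<f name="gender"><default/></f>'))
not_default = possible_values(
    ET.fromstring('<f name="gender"><vNot><default/></vNot></f>')
)
print(unspecified, defaulted, not_default)
```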
Text elements can be linked with feature structures using any of the linking methods
discussed elsewhere in these Guidelines (see for example sections 18.2 Global Attributes for
Simple Analyses and 18.4 Linguistic Annotation). In the simplest case, the ana attribute
may be used to point from any element to an annotation of it, as in the
following example:
<s n="00741">
 <w ana="#at0">The</w>
 <w ana="#ajs">closest</w>
 <w ana="#pnp">he</w>
 <w ana="#vvd">came</w>
 <w ana="#prp">to</w>
 <w ana="#nn1">exercise</w>
 <w ana="#vbd">was</w>
 <w ana="#to0">to</w>
 <w ana="#vvi">open</w>
 <w ana="#crd">one</w>
 <w ana="#nn1">eye</w>
 <phr ana="#av0">
  <w>every</w>
  <w>so</w>
  <w>often</w>
 </phr>
 <c ana="#pun">,</c>
 <w ana="#cjs">if</w>
 <w ana="#pni">someone</w>
 <w ana="#vvd">entered</w>
 <w ana="#at0">the</w>
 <w ana="#nn1">room</w>
<!-- ... -->
</s>
The values specified for the ana attribute reference components of a feature-structure library,
which represents all of the grammatical structures used by this encoding scheme. (For illustrative
purposes, we cite here only the structures needed for the first six words of the sample sentence):
<fvLib xml:id="C6" n="Claws 6 tags">
<!-- ... -->
 <fs xml:id="ajs" type="grammatical_structure" feats="#wj #ds"/>
 <fs xml:id="at0" type="grammatical_structure" feats="#wl"/>
 <fs xml:id="pnp" type="grammatical_structure" feats="#wr #rp"/>
 <fs xml:id="vvd" type="grammatical_structure" feats="#wv #bv #fd"/>
 <fs xml:id="prp" type="grammatical_structure" feats="#wp #bp"/>
 <fs xml:id="nn1" type="grammatical_structure" feats="#wn #tc #ns"/>
<!-- ... -->
</fvLib>
The components of each feature structure in the library are referenced in much the
same way, using the feats attribute to identify one or more f elements in the following
feature library (again, only a few of the available features are quoted here):
<fLib><!-- ... --> <f xml:id="fl-bv" name="verbbase"> <symbol value="main"/> </f> <f xml:id="fl-bp" name="prepbase"> <symbol value="lexical"/> </f> <f xml:id="fl-ds" name="degree"> <symbol value="superlative"/> </f> <f xml:id="fl-fd" name="verbform"> <symbol value="ed"/> </f> <f xml:id="fl-ns" name="number"> <symbol value="singular"/> </f> <f xml:id="fl-rp" name="prontype"> <symbol value="personal"/> </f> <f xml:id="fl-tc" name="nountype"> <symbol value="common"/> </f> <f xml:id="fl-wj" name="class"> <symbol value="adjective"/> </f> <f xml:id="fl-wl" name="class"> <symbol value="article"/> </f> <f xml:id="fl-wn" name="class"> <symbol value="noun"/> </f> <f xml:id="fl-wp" name="class"> <symbol value="preposition"/> </f> <f xml:id="fl-wr" name="class"> <symbol value="pronoun"/> </f> <f xml:id="fl-wv" name="class"> <symbol value="verb"/> </f><!-- ... --></fLib>
Alternatively, a stand-off technique may be used, as in the following example, where
a linkGrp element is used to link selected characters in the text ‘Caesar seized control’
with their phonological representations.
<s>
 <w xml:id="S1W1"><c xml:id="S1W1C1">C</c>ae<c xml:id="S1W1C2">s</c>ar</w>
 <w xml:id="S1W2"><c xml:id="S1W2C1">s</c>ei<c xml:id="S1W2C2">z</c>e<c xml:id="S1W2C3">d</c></w>
 <w xml:id="S1W3">con<c xml:id="S1W3C1">t</c>rol</w>.
</s>
<fvLib xml:id="FSL1" n="phonological segment definitions">
<!-- as in previous example -->
</fvLib>
<linkGrp type="phonology">
<!-- ... -->
 <link target="#S.DF #S1W3C1"/>
 <link target="#Z.DF #S1W2C3"/>
 <link target="#S.DF #S1W2C1"/>
 <link target="#Z.DF #S1W2C2"/>
<!-- ... -->
</linkGrp>
As this example shows, a stand-off solution requires that every component to be linked
to must be addressable in some way, by means of an XPointer. To handle the POS tagging
example above, for example, each annotated element might be given an identifier of
some sort, as follows:
<s xml:id="mds09" n="00741">
 <w xml:id="mds0901">The</w>
 <w xml:id="mds0902">closest</w>
 <w xml:id="mds0903">he</w>
 <w xml:id="mds0904">came</w>
 <w xml:id="mds0905">to</w>
 <w xml:id="mds0906">exercise</w>
<!-- ... -->
</s>
It would then be possible to link each word to its intended annotation in the feature
library quoted above, as follows:
<linkGrp type="POS-codes"><!-- ... --> <link target="#mds0901 #at0"/> <link target="#mds0902 #ajs"/> <link target="#mds0903 #pnp"/> <link target="#mds0904 #vvd"/> <link target="#mds0905 #prp"/> <link target="#mds0906 #nn1"/> <link target="#mds0907 #vbd"/> <link target="#mds0908 #to0"/> <link target="#mds0909 #vvi"/> <link target="#mds0910 #crd"/><!-- ... --></linkGrp>
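A processor consuming such stand-off links needs to turn each link into a pairing of word and annotation. The following is a minimal sketch over the first three links, assuming a namespace-free fragment and assuming that the word pointer comes first in each target value, as it does in this example (the phonology example earlier uses the opposite order); annotation_map is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The first three links from the text, as a standalone fragment without the TEI namespace.
LINKS = """
<linkGrp type="POS-codes">
  <link target="#mds0901 #at0"/>
  <link target="#mds0902 #ajs"/>
  <link target="#mds0903 #pnp"/>
</linkGrp>
"""

def annotation_map(link_grp):
    """Map each word identifier to the identifier of its annotation."""
    mapping = {}
    for link in link_grp.findall("link"):
        word_ref, tag_ref = link.get("target").split()
        mapping[word_ref.lstrip("#")] = tag_ref.lstrip("#")
    return mapping

pos_codes = annotation_map(ET.fromstring(LINKS))
print(pos_codes)
```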
The Feature System Declaration (FSD) is intended for use in conjunction with a TEI-conforming
text that makes use of fs (that is, feature structure) elements. The FSD serves three purposes:
- the encoder can list all of the feature names and feature values and give a prose
description as to what each represents.
- the encoder can define what it means to be a well-formed feature structure, and define
constraints which may be used to determine whether a particular feature structure
is valid relative to a given theory stated in typed feature logic. These may involve constraints
on the range of a feature value, constraints on what features are valid within certain
types of feature structures, or constraints that prevent the co-occurrence of certain
feature-value pairs.
- the encoder can define the intended interpretation of underspecified feature structures.
This involves defining default values (whether literal or computed) for missing features.
The scheme described in this chapter may be used to document any feature structure
system, but is primarily intended for use with the feature structure representation
defined by the ISO 24610-1:2006 standard, which corresponds with the recommendations
presented in these Guidelines, 19 Feature Structures. This chapter relies upon, but does not reproduce, formal definitions and descriptions
presented more thoroughly in the ISO standard, which should be consulted in case of
ambiguity or uncertainty.
The FSD serves an important function in documenting precisely what the encoder intended
by the system of feature structure markup used in an XML-encoded text. The FSD is
also an important resource which standardizes the rules of inference used by software
to validate the feature structure markup in a text, and to infer the full interpretation
of underspecified feature structures.
The reader should be aware that the terminology used in this document does not always closely
follow conventional practice in formal logic, and may also diverge from practice in
some linguistic applications of typed feature structures. In particular, the term
‘interpretation’ when applied to a feature structure is not an interpretation in the
model-theoretic sense, but is instead a minimally informative (or equivalently, most
general) extension of that feature structure that is consistent with a set of constraints
declared by an FSD. In linguistic application, such a system of constraints is the
principal means by which the grammar of some natural language is expressed. There
is a great deal of disagreement as to what, if any, model-theoretic interpretation
feature structures have in such applications, but the status of this formal kind of
interpretation is not germane to the present document. Similarly, the term ‘valid’
is used here as elsewhere in these Guidelines to identify the syntactic state of well-formedness
in the sense defined by the logic of typed feature structures itself, as distinct
from and in addition to the ‘well-formedness’ that pertains at the level of this encoding
standard. No appeal to any notion from formal semantics should be inferred.
We begin by describing how an encoded text is associated with one or more feature
system declarations. The second, third, and fourth sections describe the overall structure
of a feature system declaration and give details of how to encode its components.
The final section offers a full example; fuller discussion of the reasoning behind
FSDs and another complete example are provided in Langendoen and Simons (1995).
In order for application software to use feature system declarations to aid in the
automatic interpretation of encoded texts, or even for human readers to find the appropriate
declarations which document the feature system used in markup, there must be a formal
link from the encoded texts to the declarations. However, the schema which declares
the syntax of the Feature System itself should be kept distinct from the feature structure
schema, which is an application of that system.
A document containing typed feature structures may simply include a feature system
declaration documenting those feature structures. A more usual scenario, however,
is that the same feature system declaration (or parts of it) will be shared by many
documents. In either case, an fsDecl element for each distinct type of feature structure used must be provided and associated
with the type, which is the value used within each feature structure for its type attribute.
When the module defined in this chapter is included in an XML schema, the following
elements become available via the model.fsdDeclPart class:
- fsdDecl (feature system declaration) provides a feature system declaration comprising one
or more feature structure declarations or feature structure declaration links.
- model.fsdDeclPart groups elements which can occur as direct children of fsdDecl.
  - fLib (feature library) assembles a library of f (feature) elements.
  - fsDecl (feature structure declaration) declares one type of feature structure.
  - fsdLink (feature structure declaration link) associates the name of a typed feature structure
with a feature structure declaration for it.
  - fvLib (feature-value library) assembles a library of reusable feature value elements (including
complete feature structures).
The fsdDecl element serves as a wrapper for declaring feature systems and may be supplied either
within the header of a standard TEI document, or as a standalone document in its own
right. It contains one or more fsdLink or fsDecl elements and may hold several fLib or fvLib as well.
For example, suppose that a document example.xml contains feature structures of two types:
gpsg and lex. We might simply embed an fsDecl element for each within the header attached
to the document as follows:
<TEI xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader> <fileDesc><!-- example --> </fileDesc> <encodingDesc><!-- ... --> <fsdDecl> <fsDecl type="gpsg"><!-- information about this type --> </fsDecl> <fsDecl type="lex"><!-- information about this type --> </fsDecl> </fsdDecl><!-- ... --> </encodingDesc> </teiHeader> <text> <body><!-- ... --> <fs type="lex"><!-- an instance of the typed feature structure "lex" --> </fs><!-- ... --> </body> </text></TEI>
In this case there is an implicit link between the fs element and the corresponding fsDecl element because they share the same value for their type attribute and appear within the same document. This is a short cut for the more general
case which requires a more explicit link provided by means of the fsdLink element, as demonstrated below.
Now suppose that we wish to create a second document which includes feature structures
of the same type. Rather than duplicate the corresponding declarations, we will need
to provide a means of pointing to them from this second document. The easiest
way of accomplishing this is to add an XML identifier to each fsDecl element in example.xml:
<fsdDecl> <fsDecl type="gpsg" xml:id="GPSG"><!-- information about this type --> </fsDecl> <fsDecl type="lex" xml:id="LEX"><!-- information about this type --> </fsDecl></fsdDecl>
(Although in this case the XML identifier is simply an uppercase version of the type
name, there is no necessary connection between the two names. The only requirement
is that the XML identifier conform to the standards required for identifiers, and
that it be unique within the document containing it.)
In the fsdDecl for the second document, we can now include pointers to the fsDecl elements in the first:
<TEI xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader> <fileDesc><!-- doc2 --> </fileDesc> <encodingDesc><!-- ... --> <fsdDecl> <fsdLink type="gpsg"
target="example.xml#GPSG"/> <fsdLink type="lexx"
target="example.xml#LEX"/> </fsdDecl><!-- ... --> </encodingDesc> </teiHeader> <text> <body><!-- ... --> <fs type="lexx"><!-- an instance of the typed feature structure "lex" --> </fs><!-- ... --> </body> </text></TEI>
Note that in doc2.xml there is no requirement for the local name for a given type of feature
structures to be the same as that used by example.xml. We assume in this encoding that the
type called lexx in doc2.xml is declared as having identical constraints and other properties
to those declared for the type called lex in example.xml.
An fsdDecl may be given, as above, within the encoding description of the teiHeader element of a TEI document containing typed feature structures. Alternatively, it
may appear independently of any feature structures, as a document in its own right
with its own teiHeader. These options are both possible because the element is a member of both the model.encodingDescPart class and the model.resource class.
The current recommendations provide no way of enforcing uniqueness of the type values among fsDecl elements, nor of requiring that every type value specified on an fs element be also declared on an fsDecl element. Encoders requiring such constraints (which might have some obvious utility
in assisting the consistency and accuracy of tagging) are recommended to develop tools
to enforce them, using such mechanisms as Schematron assertions.
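As an alternative to Schematron, the same two checks can be scripted. The following is a minimal sketch, assuming a namespace-free document and treating the type values on both fsDecl and fsdLink as declarations; the sample document and the function name check_types are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document with one duplicated declaration and one undeclared type.
DOC = """
<TEI>
  <fsdDecl>
    <fsDecl type="gpsg"/>
    <fsDecl type="lex"/>
    <fsDecl type="lex"/>
  </fsdDecl>
  <text><body>
    <fs type="lex"/>
    <fs type="agr"/>
  </body></text>
</TEI>
"""

def check_types(root):
    """Return (types declared more than once, types used on fs but never declared)."""
    declared = Counter(
        el.get("type") for el in root.iter() if el.tag in ("fsDecl", "fsdLink")
    )
    duplicates = sorted(t for t, count in declared.items() if count > 1)
    used = {fs.get("type") for fs in root.iter("fs") if fs.get("type")}
    undeclared = sorted(used - set(declared))
    return duplicates, undeclared

duplicates, undeclared = check_types(ET.fromstring(DOC))
print("declared more than once:", duplicates)
print("used but not declared:", undeclared)
```

A real TEI document would require namespace-aware element names, and declarations reached through fsdLink targets in other files would need to be fetched separately.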
A feature system declaration contains one or more feature structure declarations,
each of which has up to three parts: an optional description (which gives a prose
comment on what that type of feature structure encodes), an obligatory set of feature
declarations (which specify range constraints and default values for the features
in that type of structure), and optional feature structure constraints (which specify
co-occurrence restrictions on feature values).
- fsDescr (feature system description (in FSD)) describes in prose what is represented by the
type of feature structure declared in the enclosing fsDecl.
- fDecl (feature declaration) declares a single feature, specifying its name, organization,
range of allowed values, and optionally its default value.
- fsConstraints (feature-structure constraints) specifies constraints on the content of valid feature
structures.
Feature declarations and feature structure constraints are described in the next two
sections. Note that the specification of similar fsDecl elements can be simplified by devising an inheritance hierarchy for the feature structure
types. Each fsDecl element may name one or more ‘basetypes’ from which it inherits feature declarations
and constraints (these are often called ‘supertypes’). For instance, suppose that
<fsDecl type="Basic"> contains <fDecl name="One"> and <fDecl name="Two">, and that <fsDecl type="Derived" baseTypes="Basic"> contains just <fDecl name="Three">. Then any instance of <fs type="Derived"> must include all three features. This is because <fsDecl type="Derived"> inherits the two feature declarations from <fsDecl type="Basic"> when it specifies a base type of Basic.
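The inheritance described here can be sketched as a recursive collection of feature names. The following is a minimal illustration, assuming a namespace-free fragment with abbreviated (empty) fDecl elements and an acyclic type hierarchy; declared_features is a hypothetical helper and does not attempt the unification of ranges or the conjoining of constraints.

```python
import xml.etree.ElementTree as ET

# The Basic/Derived example from the text, with fDecl content omitted for brevity.
FSD = """
<fsdDecl>
  <fsDecl type="Basic">
    <fDecl name="One"/>
    <fDecl name="Two"/>
  </fsDecl>
  <fsDecl type="Derived" baseTypes="Basic">
    <fDecl name="Three"/>
  </fsDecl>
</fsdDecl>
"""

def declared_features(fsd, type_name):
    """Collect feature names declared for a type, including those of its base types."""
    decl = next(d for d in fsd.findall("fsDecl") if d.get("type") == type_name)
    names = set()
    for base in decl.get("baseTypes", "").split():
        names |= declared_features(fsd, base)  # inherit from each base type
    names |= {f.get("name") for f in decl.findall("fDecl")}
    return names

fsd = ET.fromstring(FSD)
derived_features = declared_features(fsd, "Derived")
print(sorted(derived_features))
```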
The following sample shows the overall structure of a complete feature structure declaration:
<fsDecl type="SomeName"> <fsDescr>Describes what this type of fs represents</fsDescr> <fDecl name="featureOne"><!-- The declaration for featureOne --> </fDecl> <fDecl name="featureTwo"><!-- The declaration for featureTwo --> </fDecl> <fsConstraints><!-- The feature structure constraints go here --> </fsConstraints></fsDecl>
The attribute baseTypes gives the name of one or more types from which this type inherits feature specifications
and constraints; if this type includes a feature specification with the same name
as one inherited from any of the types specified by this attribute, or if more than
one specification of the same name is inherited, then the possible values of that
feature are determined by unification. Similarly, the set of constraints applicable
is derived by conjoining those specified explicitly within this element with those
implied by the baseTypes attribute. When no base type is specified, no feature specification or constraint
is inherited.
Although the present standard does provide for default feature values, feature inheritance
is defined to be monotonic.
The process of combining constraints may result in a contradiction, for example if
two specifications for the same feature specify disjoint ranges of values, and at
least one such specification is mandatory. In such a case, there is no valid feature
structure of the type being defined.
Every type specified by baseTypes must be a single word which is a legal XML name; for example, they cannot include
whitespace or begin with digits. Multiple base types are separated with spaces, e.g.
<fsDecl type="Sub" baseTypes="Super1 Super2">.
Each feature is declared in an fDecl element whose name attribute identifies the feature being declared; this matches the name attribute of the f elements it declares. An fDecl has three parts: an optional prose description (which should explain what the feature
and its values represent), an obligatory range specification (which declares what
values the feature is allowed to have), and an optional default specification (which
declares what default value should be supplied when the named feature does not appear
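in an fs). If, in a feature structure, a feature:
- is not optional (i.e. is obligatory),
- has no value provided, or the value default is provided, and
- either has no default specified, or has conditional defaults, none of the conditions
on which is met,
then the value of this feature in the feature structure's most general valid extension
is the most general value provided in its vRange, in the case of a unit organization, or the singleton set, bag, or list containing
that element, in the case of a complex organization. If the feature: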
- is optional,
- has no value provided, or the value default is provided, and
- either has a default specified, or has conditional defaults, one of the conditions
on which is met,
then this feature does have a value in the feature structure's most general valid
extension when it exists, namely the default value that pertains.
It is possible that a feature structure will not have a valid extension because the
default value that pertains to a feature is not consistent with that feature's declared
range. Additional tools are required for the enforcement of such criteria.
The following elements are used in feature system declarations:
The logic for validating feature values and for matching the conditions for supplying
default values is based on the operation of subsumption. Subsumption is a standard operation in feature-structure-based formalisms. Informally,
a feature structure FS subsumes all feature structures that are at least as informative as itself; that
is, all feature structures that specify all of the feature values that FS does with
values that are subsumed by the values that FS has, and that have all of the re-entrancies
(see 19.6 Re-entrant Feature Structures) that FS does (Carpenter (1992); see also Pereira (1987) and Shieber (1986)). A more formal definition is provided in ISO 24610-1:2006.
Following the spirit of the informal definition above, we can extend subsumption in
a straightforward way to cover alternation, negation, special primitive values, and
the use of attributes in the markup. For instance, a vAlt containing the value v subsumes v. The negation of a value v (represented by means of the vNot element discussed in section 19.8.2 Negation) subsumes any value that is not v; for example <vNot><numeric value='0'/></vNot> subsumes any numeric value other than zero. The value <fs type="X"/> subsumes any feature structure of type X, even if it is not valid.
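The informal definition, together with its extension to alternation and negation, can be sketched over a simplified in-memory model. In the following illustration, feature structures are Python dictionaries, atomic values are plain Python values, and the tuples ("alt", [...]) and ("not", v) stand for vAlt and vNot; this representation is an assumption made for the sketch, and re-entrancy, types, and collections are deliberately left out.

```python
def subsumes(general, specific):
    """Return True if `general` is no more informative than `specific`."""
    if isinstance(general, tuple) and general[0] == "alt":
        # An alternation subsumes anything subsumed by one of its options.
        return any(subsumes(option, specific) for option in general[1])
    if isinstance(general, tuple) and general[0] == "not":
        # A negated atomic value subsumes any value other than that value.
        return specific != general[1]
    if isinstance(general, dict):
        if not isinstance(specific, dict):
            return False
        # Every feature of `general` must appear in `specific` with a subsumed value.
        return all(
            name in specific and subsumes(value, specific[name])
            for name, value in general.items()
        )
    return general == specific  # atomic values subsume only themselves

noun = {"class": "noun"}
singular_noun = {"class": "noun", "number": "singular"}
print(subsumes(noun, singular_noun), subsumes(singular_noun, noun))
```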
As an example of feature declarations, consider the following extract from Gazdar et al.'s
Generalized Phrase Structure Grammar. In the appendix to their book, they propose a feature
system for English of which this is just a sampling:
feature value range
INV {+, -}
CONJ {and, both, but, either, neither, nor, or, NIL}
COMP {for, that, whether, if, NIL}
AGR CAT
PFORM {to, by, for, ...}
Feature specification defaults
FSD 1: [-INV]
FSD 2: ~[CONJ]
FSD 9: [INF, +SUBJ] --> [COMP for]
The INV feature, which encodes whether or not a sentence is inverted, allows only
the values plus (+) and minus (-). If the feature is not specified, then the default
rule (FSD 1 above) says that a value of minus is always assumed. The feature declaration
for this feature would be encoded as follows:
<fDecl name="INV"> <fDescr>inverted sentence</fDescr> <vRange> <vAlt> <binary value="true"/> <binary value="false"/> </vAlt> </vRange> <vDefault> <binary value="false"/> </vDefault></fDecl>
The value range is specified as an alternation (more precisely, an exclusive disjunction),
which can be represented by the binary feature value. That is, the value must be either true or false, but cannot be both
or neither.
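The way such a declaration might be applied to an instance can be sketched as follows: a supplied value is checked against the range, and a missing value is replaced by the default. This is a minimal illustration, assuming a namespace-free fragment; the helper names to_bool and resolve are hypothetical.

```python
import xml.etree.ElementTree as ET

# The INV declaration from the text, as a standalone fragment without the TEI namespace.
INV_DECL = """
<fDecl name="INV">
  <fDescr>inverted sentence</fDescr>
  <vRange>
    <vAlt><binary value="true"/><binary value="false"/></vAlt>
  </vRange>
  <vDefault><binary value="false"/></vDefault>
</fDecl>
"""

def to_bool(binary):
    """Convert a <binary> element to a Python bool."""
    return binary.get("value") == "true"

def resolve(f_decl, supplied=None):
    """Return the supplied value if it is in range, or the default if none is supplied."""
    allowed = [to_bool(b) for b in f_decl.findall("vRange/vAlt/binary")]
    if supplied is None:
        return to_bool(f_decl.find("vDefault/binary"))
    if supplied not in allowed:
        raise ValueError(f"{supplied!r} is outside the declared range")
    return supplied

inv = ET.fromstring(INV_DECL)
print(resolve(inv), resolve(inv, True))
```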
The CONJ feature indicates the surface form of the conjunction used in a construction.
The ~ in the default rule (see FSD 2 above) represents negation. This means that by
default the feature is not applicable, in other words, no conjunction is taking place.
Note that CONJ not being present is distinct from CONJ being present but having the
NIL value allowed in the value range. In their analysis, NIL means that the phenomenon
of conjunction is taking place but there is no explicit conjunction in the surface
form of the sentence. The feature declaration for this feature would be encoded as
follows:
<fDecl name="CONJ"> <fDescr>surface form of the conjunction</fDescr> <vRange> <vAlt> <symbol value="and"/> <symbol value="both"/> <symbol value="but"/> <symbol value="either"/> <symbol value="neither"/> <symbol value="nor"/> <symbol value="or"/> <symbol value="NIL"/> </vAlt> </vRange> <vDefault> <binary value="false"/> </vDefault></fDecl>
Note that the vDefault is not strictly necessary in this case, since the binary value of false
only serves to convey the information that the feature has no other legitimate value.
The COMP feature indicates the surface form of the complementizer used in a construction.
In value range, it is analogous to CONJ. However, its default rule (see FSD 9 above)
is conditional. It says that if the verb form is infinitival (the VFORM feature is
not mentioned in the rule since it is the only feature that can take INF as a value),
and the construction has a subject, then a for complement must be used. For instance, to make
John the subject of the infinitive in ‘It is necessary to go’, a for complement must be used;
that is, ‘It is necessary for John to go’. The feature declaration for this feature would be encoded as follows:
<fDecl name="COMP"> <fDescr>surface form of the complementizer</fDescr> <vRange> <vAlt> <symbol value="for"/> <symbol value="that"/> <symbol value="whether"/> <symbol value="if"/> <symbol value="NIL"/> </vAlt> </vRange> <vDefault> <if> <fs> <f name="VFORM"> <symbol value="INF"/> </f> <f name="SUBJ"> <binary value="true"/> </f> </fs> <then/> <symbol value="for"/> </if> </vDefault></fDecl>
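The conditional default can be evaluated by testing the condition against the features already present in a structure. The following is a minimal sketch, assuming a namespace-free fragment, atomic values only, and a plain equality test in place of full subsumption; the helpers atomic and conditional_default are hypothetical.

```python
import xml.etree.ElementTree as ET

# The conditional default of COMP, as a standalone fragment without the TEI namespace.
V_DEFAULT = """
<vDefault>
  <if>
    <fs>
      <f name="VFORM"><symbol value="INF"/></f>
      <f name="SUBJ"><binary value="true"/></f>
    </fs>
    <then/>
    <symbol value="for"/>
  </if>
</vDefault>
"""

def atomic(value_element):
    """Convert a <symbol> or <binary> element to a Python value."""
    if value_element.tag == "binary":
        return value_element.get("value") == "true"
    return value_element.get("value")

def conditional_default(v_default, features):
    """Return the default whose condition is met by `features`, or None if none applies."""
    for rule in v_default.findall("if"):
        children = list(rule)
        condition, consequent = children[0], children[2]  # <fs>, <then/>, value
        required = {f.get("name"): atomic(f[0]) for f in condition.findall("f")}
        if all(features.get(name) == value for name, value in required.items()):
            return atomic(consequent)
    return None

rules = ET.fromstring(V_DEFAULT)
with_subject = conditional_default(rules, {"VFORM": "INF", "SUBJ": True})
without_subject = conditional_default(rules, {"VFORM": "INF", "SUBJ": False})
print(with_subject, without_subject)
```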
The AGR feature stores the features relevant to subject-verb agreement. Gazdar et
al. specify the range of this feature as CAT. This means that the value is a
category, which is their term for a feature structure. This is actually too weak a statement.
Not just any feature structure is allowable here; it must be a feature structure for
agreement (which is defined in the complete example at the end of the chapter to contain
the features of person and number). The following feature declaration encodes this
constraint on the value range:
<fDecl name="AGR"> <fDescr>agreement for person and number</fDescr> <vRange> <fs type="Agreement"/> </vRange></fDecl>
That is, the value must be a feature structure of type Agreement. The complete example at the
end of this chapter includes the <fsDecl type="Agreement"> which includes <fDecl name="PERS"> and <fDecl name="NUM">.
The PFORM feature indicates the surface form of the preposition used in a construction.
Since PFORM is specified above as an open set, string is used in the range specification below rather than symbol.
<fDecl name="PFORM"> <fDescr>word form of a preposition</fDescr> <vRange> <vNot> <string/> </vNot> </vRange></fDecl>
This example makes use of a negated value:
<vNot><string/></vNot> subsumes any string that is not the empty string.
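To make the effect of the negated range concrete, the following sketch contrasts a value inside the range with one outside it. The preposition ‘to’ is an arbitrary choice for illustration.

```xml
<!-- Within the declared range: any non-empty string -->
<f name="PFORM"> <string>to</string> </f>

<!-- Outside the declared range: the empty string excluded by vNot -->
<f name="PFORM"> <string/> </f>
```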
For the reduced feature structure that concludes section 19.3 Other Atomic Feature Values
above and identifies the value of some part of speech to be a common noun, it is possible
to align the concept of part of speech with its definition and persistent identifier
using the targetDatcat attribute, which connects the modeled XML object with the appropriate
locus in a reference taxonomy, as shown below:
<fDecl name="POS"
  targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">
  <fDescr>part of speech (morphosyntactic category)</fDescr>
  <vRange>
    <vAlt>
      <symbol value="NN"
        datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>
      <symbol value="NP"
        datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>
      <!-- ... -->
    </vAlt>
  </vRange>
</fDecl>
The above example declares the feature ‘POS’ as instantiating the corresponding concept
defined in a reference taxonomy or ontology, and defines the range of values of the
feature at hand by listing the appropriate alternatives, together with their external
persistent identifiers.
Note that the class model.featureVal includes all possible single feature values, including feature structures, alternations
(vAlt) and complex collections (vColl).
Ensuring the validity of feature structures may require much more than simply specifying
the range of allowed values for each feature. There may be constraints on the co-occurrence
of one feature value with the value of another feature in the same feature structure
or in an embedded feature structure.
Such constraints on valid feature structures are expressed as a series of conditional
and biconditional tests in the fsConstraints part of an fsDecl. A particular feature structure is valid only if it meets all the constraints. The
cond element encodes the conventional if-then conditional of boolean logic which succeeds
when both the antecedent and consequent are true, or whenever the antecedent is false.
The bicond element encodes the biconditional (if and only if) operation of boolean logic. It
succeeds only when the corresponding if-then conditionals in both directions are true.
In feature structure constraints the antecedent and consequent are expressed as feature
structures; they are considered true if they subsume (see section 19.11.3 Feature Declarations) the feature structure in question, but in the case of consequents, this truth is
asserted rather than simply tested. That is to say, a conditional is enforced by determining
that the antecedent does not (and will never) subsume the given feature structure,
or by determining that the antecedent does subsume the given feature structure, and
then unifying the consequent with it (the result of which, if successful, will be
subsumed by the consequent). In practice, the enforcement of such constraints can
result in periods in which the truth of a constraint with respect to a given feature
structure is simply not known; in this case, the constraint must be persistently monitored
as the feature structure becomes more informative until either its truth value is
determined or computation fails for some other reason.
The following elements make up the fsConstraints part of an FSD:
- fsConstraints (feature-structure constraints) specifies constraints on the content of valid feature
structures.
- cond (conditional feature-structure constraint) defines a conditional feature-structure
constraint; the consequent and the antecedent are specified as feature structures
or feature-structure collections; the constraint is satisfied if both the antecedent
and the consequent subsume a given feature structure, or if the antecedent does not.
- bicond (bi-conditional feature-structure constraint) defines a biconditional feature-structure
constraint; both consequent and antecedent are specified as feature structures or
groups of feature structures; the constraint is satisfied if both subsume a given
feature structure, or if both do not.
- then separates the condition from the default in an if, or the antecedent and the consequent in a cond element.
- iff (if and only if) separates the condition from the consequence in a bicond element.
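The sketch below shows how these elements might fit together inside a single fsDecl. The type name ‘Example’ and the features A and B are hypothetical and serve only to show where each element goes; they are not part of the GPSG example used in this chapter.

```xml
<fsDecl type="Example">
  <fsDescr>hypothetical type used only to show where constraints are placed</fsDescr>
  <fDecl name="A">
    <vRange>
      <vAlt> <binary value="true"/> <binary value="false"/> </vAlt>
    </vRange>
  </fDecl>
  <fDecl name="B">
    <vRange>
      <vAlt> <binary value="true"/> <binary value="false"/> </vAlt>
    </vRange>
  </fDecl>
  <fsConstraints>
    <!-- conditional: if A is true, then B must be true -->
    <cond>
      <fs> <f name="A"> <binary value="true"/> </f> </fs>
      <then/>
      <fs> <f name="B"> <binary value="true"/> </f> </fs>
    </cond>
    <!-- biconditional: A is false if and only if B is false -->
    <bicond>
      <fs> <f name="A"> <binary value="false"/> </f> </fs>
      <iff/>
      <fs> <f name="B"> <binary value="false"/> </f> </fs>
    </bicond>
  </fsConstraints>
</fsDecl>
```

The constraints follow the feature declarations, and a feature structure of this type is valid only if it satisfies every cond and bicond listed.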
For an example of feature structure constraints, consider the following ‘feature co-occurrence
restrictions’ extracted from the feature system for English proposed by Gazdar, et
al. (1985:246–247):
FCR 1: [+INV] → [+AUX, FIN]
FCR 7: [BAR 0] ≡ [N] & [V] & [SUBCAT]
FCR 8: [BAR 1] → ~[SUBCAT]
The first constraint says that if a construction is inverted, it must also have an
auxiliary and a finite verb form. That is,
<cond>
  <fs>
    <f name="INV"> <binary value="true"/> </f>
  </fs>
  <then/>
  <fs>
    <f name="AUX"> <binary value="true"/> </f>
    <f name="VFORM"> <symbol value="FIN"/> </f>
  </fs>
</cond>
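Because the consequent of a conditional is asserted rather than merely tested, enforcing this constraint can add information to a feature structure. The following sketch shows one structure before and after enforcement, and a second structure for which enforcement fails. The instances are illustrative assumptions (including the assumption that an untyped antecedent subsumes a structure of type GPSG); no particular processing model is implied.

```xml
<!-- Given structure: the antecedent (INV is true) subsumes it -->
<fs type="GPSG">
  <f name="INV"> <binary value="true"/> </f>
</fs>

<!-- Result of unifying the given structure with the consequent -->
<fs type="GPSG">
  <f name="INV"> <binary value="true"/> </f>
  <f name="AUX"> <binary value="true"/> </f>
  <f name="VFORM"> <symbol value="FIN"/> </f>
</fs>

<!-- Here unification with the consequent fails, because AUX is
     already false; the structure therefore violates the constraint -->
<fs type="GPSG">
  <f name="INV"> <binary value="true"/> </f>
  <f name="AUX"> <binary value="false"/> </f>
</fs>
```

A structure in which INV is false satisfies the constraint without further checks, since the antecedent does not subsume it.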
The second constraint says that if a construction has a BAR value of zero (i.e., it
is a lexical, word-level category), then it must have a value for the features N, V,
and SUBCAT. By the same token, because it is a biconditional, if it has values for
N, V, and SUBCAT, it must have BAR='0'. That is,
<bicond>
  <fs>
    <f name="BAR"> <symbol value="0"/> </f>
  </fs>
  <iff/>
  <fs>
    <f name="N"> <binary value="true"/> </f>
    <f name="V"> <binary value="true"/> </f>
    <f name="SUBCAT"> <binary value="true"/> </f>
  </fs>
</bicond>
The final constraint says that if a construction has a BAR value of 1 (i.e., it is
a phrase), then the SUBCAT feature should be absent (~). This is not biconditional,
since there are other instances under which the SUBCAT feature is inappropriate. That
is,
<cond>
  <fs>
    <f name="BAR"> <symbol value="1"/> </f>
  </fs>
  <then/>
  <fs>
    <f name="SUBCAT"> <binary value="false"/> </f>
  </fs>
</cond>
Note that cond and bicond use the empty tags then and iff, respectively, to separate the antecedent and consequent. These are primarily for
the sake of enhancing human readability.
To summarize this chapter, the complete FSD for the example that has run through the
chapter is reproduced below:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A sample FSD based on an extract from Gazdar
          et al.'s GPSG feature system for English</title>
        <respStmt>
          <resp>encoded by</resp>
          <name>Gary F. Simons</name>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <p>This sample was first encoded by Gary F. Simons (Summer
          Institute of Linguistics, Dallas, TX) on January 28, 1991.
          Revised April 8, 1993 to match the specification of FSDs
          in version P2 of the TEI Guidelines. Revised again December 2004 to
          be consistent with the feature structure representation standard
          jointly developed with ISO TC37/SC4.</p>
      </publicationStmt>
      <sourceDesc>
        <p>This sample FSD does not describe a complete feature
          system. It is based on extracts from the feature system
          for English presented in the appendix (pages 245–247) of
          Generalized Phrase Structure Grammar, by Gazdar, Klein,
          Pullum, and Sag (Harvard University Press, 1985).</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <fsdDecl>
    <fsDecl type="GPSG">
      <fsDescr>Encodes a feature structure for the GPSG analysis
        of English (after Gazdar, Klein, Pullum, and Sag)</fsDescr>
      <fDecl name="INV">
        <fDescr>inverted sentence</fDescr>
        <vRange>
          <vAlt>
            <binary value="true"/>
            <binary value="false"/>
          </vAlt>
        </vRange>
        <vDefault>
          <binary value="false"/>
        </vDefault>
      </fDecl>
      <fDecl name="CONJ">
        <fDescr>surface form of the conjunction</fDescr>
        <vRange>
          <vAlt>
            <symbol value="and"/>
            <symbol value="both"/>
            <symbol value="but"/>
            <symbol value="either"/>
            <symbol value="neither"/>
            <symbol value="nor"/>
            <symbol value="or"/>
            <symbol value="NIL"/>
          </vAlt>
        </vRange>
        <vDefault>
          <binary value="false"/>
        </vDefault>
      </fDecl>
      <fDecl name="COMP">
        <fDescr>surface form of the complementizer</fDescr>
        <vRange>
          <vAlt>
            <symbol value="for"/>
            <symbol value="that"/>
            <symbol value="whether"/>
            <symbol value="if"/>
            <symbol value="NIL"/>
          </vAlt>
        </vRange>
        <vDefault>
          <if>
            <fs>
              <f name="VFORM"> <symbol value="INF"/> </f>
              <f name="SUBJ"> <binary value="true"/> </f>
            </fs>
            <then/>
            <symbol value="for"/>
          </if>
        </vDefault>
      </fDecl>
      <fDecl name="AGR">
        <fDescr>agreement for person and number</fDescr>
        <vRange>
          <fs type="Agreement"/>
        </vRange>
      </fDecl>
      <fDecl name="PFORM">
        <fDescr>word form of a preposition</fDescr>
        <vRange>
          <vNot>
            <string/>
          </vNot>
        </vRange>
      </fDecl>
      <fsConstraints>
        <cond>
          <fs>
            <f name="INV"> <binary value="true"/> </f>
          </fs>
          <then/>
          <fs>
            <f name="AUX"> <binary value="true"/> </f>
            <f name="VFORM"> <symbol value="FIN"/> </f>
          </fs>
        </cond>
        <bicond>
          <fs>
            <f name="BAR"> <symbol value="0"/> </f>
          </fs>
          <iff/>
          <fs>
            <f name="N"> <binary value="true"/> </f>
            <f name="V"> <binary value="true"/> </f>
            <f name="SUBCAT"> <binary value="true"/> </f>
          </fs>
        </bicond>
        <cond>
          <fs>
            <f name="BAR"> <symbol value="1"/> </f>
          </fs>
          <then/>
          <fs>
            <f name="SUBCAT"> <binary value="false"/> </f>
          </fs>
        </cond>
      </fsConstraints>
    </fsDecl>
    <fsDecl type="Agreement">
      <fsDescr>This type of feature structure encodes the features
        for subject-verb agreement in English</fsDescr>
      <fDecl name="PERS">
        <fDescr>person (first, second, or third)</fDescr>
        <vRange>
          <vAlt>
            <symbol value="1"/>
            <symbol value="2"/>
            <symbol value="3"/>
          </vAlt>
        </vRange>
      </fDecl>
      <fDecl name="NUM">
        <fDescr>number (singular or plural)</fDescr>
        <vRange>
          <vAlt>
            <symbol value="sg"/>
            <symbol value="pl"/>
          </vAlt>
        </vRange>
      </fDecl>
    </fsDecl>
  </fsdDecl>
</TEI>
The elements discussed in this chapter constitute a module of the TEI scheme which
is formally defined as follows:
- Module iso-fs: Feature structures
The selection and combination of modules to form a TEI schema is described in 1.2 Defining a TEI Schema.