Designing an XML Data Structure
This section covers some heuristics you can use when making XML design decisions.
Saving Yourself Some Work
Whenever possible, use an existing schema definition. It's usually a lot easier to ignore the things you don't need than to design your own from scratch. In addition, using a standard DTD makes data interchange possible, and may make it possible to use data-aware tools developed by others.
So, if an industry standard exists, consider referencing that DTD with an external parameter entity. One place to look for industry-standard DTDs is at the repository created by the Organization for the Advancement of Structured Information Standards (OASIS) at http://www.XML.org. Another place to check is CommerceOne's XML Exchange at http://www.xmlx.com, which is described as "a repository for creating and sharing document type definitions".
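Here is a minimal sketch of that approach. The document declares an external parameter entity in its internal subset and then references it, which pulls the outside DTD's declarations in at that point. The entity name orderDTD, the root element order, and the URL are hypothetical placeholders for whatever industry DTD you actually adopt.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- Hypothetical entity name and URL standing in for an industry DTD -->
  <!ENTITY % orderDTD SYSTEM "http://www.example.com/dtds/order.dtd">
  <!-- Referencing the parameter entity inserts that DTD's declarations here -->
  %orderDTD;
]>
<order>
  <!-- Content must follow the rules of the referenced DTD -->
</order>
```

One reason to use a parameter entity rather than naming the DTD directly in the DOCTYPE declaration is that it leaves room for your own local declarations in the same internal subset, alongside the standard ones.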
Note: Many more good thoughts on the design of XML structures are at the OASIS page, http://www.oasis-open.org/cover/elementsAndAttrs.html.
Attributes and Elements
One of the issues you will encounter frequently when designing an XML structure is whether to model a given data item as a subelement or as an attribute of an existing element. For example, you could model the title of a slide either as:
<slide> <title>This is the title</title> </slide>
or as:
<slide title="This is the title">...</slide>
In some cases, the different characteristics of attributes and elements make it easy to choose. Let's consider those cases first, and then move on to the cases where the choice is more ambiguous.
Forced Choices
Sometimes, the choice between an attribute and an element is forced on you by the nature of attributes and elements. Let's look at a few of those considerations:
The data contains substructures
In this case, the data item must be modeled as an element. It can't be modeled as an attribute, because attributes take only simple strings. So if the title can contain emphasized text like this:
The <em>Best</em> Choice
then the title must be an element.
The data contains multiple lines
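Here, too, it makes sense to use an element. Attribute values need to be simple, short strings, or else they become unreadable, if not unusable. In addition, a parser normalizes literal line breaks in an attribute value to spaces, so the original line structure would be lost.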
Multiple occurrences are possible
Whenever an item can occur multiple times, like paragraphs in an article, it must be modeled as an element. An element can have only one attribute with a given name, but it can have many subelements of the same type.
The data changes frequently
When the data will be frequently modified with an editor, it may make sense to model it as an element. Many XML-aware editors make it easy to modify element data, while attributes can be somewhat harder to get to.
The data is a small, simple string that rarely, if ever, changes
This is data that can be modeled as an attribute. However, just because you can does not mean that you should. Check the "Stylistic Choices" section that follows to be sure.
Using DTDs when the data is confined to a small number of fixed choices
Here is one time when it really makes sense to use an attribute. A DTD can prevent an attribute from taking on any value that is not in the preapproved list, but it cannot similarly restrict an element. (With a schema, on the other hand, both attributes and elements can be restricted.)
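Several of the forced choices above can be seen together in one small DTD fragment. This is a sketch, not a published DTD: the element names slide, title, em, and item and the type values tech, exec, and all are assumptions made for the example.

```xml
<!-- Illustrative DTD fragment; all names are hypothetical -->

<!-- Substructure: the title may mix text with <em> markup,
     so it has to be an element with mixed content -->
<!ELEMENT title (#PCDATA | em)*>
<!ELEMENT em    (#PCDATA)>

<!-- Multiple occurrences: a slide can hold any number of items,
     which a single attribute could not express -->
<!ELEMENT slide (title, item*)>
<!ELEMENT item  (#PCDATA)>

<!-- Fixed choices: an enumerated attribute type limits the values -->
<!ATTLIST slide type (tech | exec | all) #IMPLIED>
```

Against this DTD, a validating parser accepts <slide type="exec"> but reports an error for <slide type="marketing">. Nothing in the DTD, however, can restrict the text inside an item element to a fixed list of values.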
Stylistic Choices
As often as not, the choices are not as cut and dried as those shown above. When the choice is not forced, you need a sense of "style" to guide your thinking. The question to answer, then, is what makes good XML style, and why.
Defining a sense of style for XML is, unfortunately, as nebulous a business as defining "style" when it comes to art or music. There are a few ways to approach it, however. The goal of this section is to give you some useful thoughts on the subject of "XML style".
Visibility
One heuristic for thinking about XML elements and attributes uses the concept of visibility. If the data is intended to be shown--to be displayed to some end user--then it should be modeled as an element. On the other hand, if the information guides XML processing but is never seen by a user, then it may be better to model it as an attribute. For example, in order-entry data for shoes, shoe size would definitely be an element. On the other hand, a manufacturer's code number would be reasonably modeled as an attribute.
Consumer / Provider
Another way of thinking about the visibility heuristic is to ask who is the consumer and/or provider of the information. The shoe size is entered by a human sales clerk, so it's an element. The manufacturer's code number for a given shoe model, on the other hand, may be wired into the application or stored in a database, so that would be an attribute. (If it were entered by the clerk, though, it should perhaps be an element.)
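Applied to the shoe example, the two heuristics might produce markup like the following sketch. The names order, shoe, mfrCode, and size, and the values shown, are made up for illustration.

```xml
<!-- Hypothetical order-entry fragment for shoes -->
<order>
  <!-- mfrCode comes from the application or a database and is never
       shown to the customer, so it is modeled as an attribute -->
  <shoe mfrCode="A1234">
    <!-- The clerk enters the size and the customer sees it,
         so it is modeled as an element -->
    <size>10</size>
  </shoe>
</order>
```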
Container vs. Contents
Perhaps the best way of thinking about elements and attributes is to think of an element as a container. To reason by analogy, the contents of the container (water or milk) correspond to XML data modeled as elements. Such data is essentially variable. On the other hand, characteristics of the container (blue or white pitcher) can be modeled as attributes. That kind of information tends to change less often. Good XML style will, in some consistent way, separate each container's contents from its characteristics.
To show these heuristics at work: In a slideshow the type of the slide (executive or technical) is best modeled as an attribute. It is a characteristic of the slide that lets it be selected or rejected for a particular audience. The title of the slide, on the other hand, is part of its contents. The visibility heuristic is also satisfied here. When the slide is displayed, the title is shown but the type of the slide isn't. Finally, in this example, the consumer of the title information is the presentation audience, while the consumer of the type information is the presentation program.
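A slide marked up along those lines might look like this sketch; the type value exec and the title and item text are illustrative.

```xml
<!-- type: a characteristic of the container, read by the
     presentation program to select slides for an audience -->
<slide type="exec">
  <!-- title: part of the contents, displayed to the audience -->
  <title>Overview</title>
  <item>First point</item>
</slide>
```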
Normalizing Data
In Saving Yourself Some Work, you saw that it is a good idea to define an external entity that you can reference in an XML document. Such an entity has all the advantages of a modularized routine--changing that one copy affects every document that references it. The process of eliminating redundancies is known as normalizing, so defining entities is one good way to normalize your data.
In an HTML file, the only way to achieve that kind of modularity is with HTML links--but of course the document is then fragmented, rather than whole. XML entities, on the other hand, suffer no such fragmentation. The entity reference acts like a macro--the entity's contents are expanded in place, producing a whole document, rather than a fragmented one. And when the entity is defined in an external file, multiple documents can reference it.
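To make the macro analogy concrete, here is a minimal sketch of an internal entity that is declared once and expanded in place at each reference. The entity name company, its value, and the note element are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- Declared once... -->
  <!ENTITY company "Example Corp">
]>
<!-- ...and expanded in place wherever it is referenced -->
<note>&company; ships on Friday. Contact &company; support with questions.</note>
```

After the parser expands the references, the content of note reads "Example Corp ships on Friday. Contact Example Corp support with questions." Editing the single declaration changes every occurrence.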
The considerations for defining an entity reference, then, are pretty much the same as those you would apply to modularized program code:
- Whenever you find yourself writing the same thing more than once, think entity. That lets you write it one place and reference it multiple places.
- If the information is likely to change, especially if it is used in more than one place, definitely think in terms of defining an entity. An example is defining productName as an entity so that you can easily change the documents when the product name changes.
- If the entity will never be referenced anywhere except in the current file, define it in the local subset of the document's DTD, much as you would define a method or inner class in a program.
- If the entity will be referenced from multiple documents, define it as an external entity, the same way you would define any generally usable class as an external class.
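The placement rules in the list above can be sketched in a single document prolog. The entity productName follows the example in the list, and the value "HtmlEdit" is taken from the note later in this section; the root element memo, the para element, the entity legalNotice, and the file legal-notice.xml are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!-- Referenced only in this file: declare it in the local subset -->
  <!ENTITY productName "HtmlEdit">
  <!-- Shared by many documents: keep the text in its own file and
       declare it as an external entity (hypothetical file name) -->
  <!ENTITY legalNotice SYSTEM "legal-notice.xml">
]>
<memo>
  <para>Install the latest &productName; release.</para>
  &legalNotice;
</memo>
```

Keep in mind that a nonvalidating parser is not required to read external parsed entities, so whether &legalNotice; is actually expanded can depend on the parser and how it is configured.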
External entities produce modular XML that is smaller and easier to update and maintain. They can also make the resulting document somewhat more difficult to visualize, much as a good OO design can be easy to change once you understand it, but harder to wrap your head around at first.
You can also go overboard with entities. At an extreme, you could make an entity reference for the word "the"--it wouldn't buy you much, but you could do it.
Note: The larger an entity is, the less likely it is that changing it will have unintended effects. When you define an external entity that covers a whole section on installation instructions, for example, making changes to the section is unlikely to make any of the documents that depend on it come out wrong. Small inline substitutions can be more problematic, though. For example, if productName is defined as an entity, the name change can be to a different part of speech, and that can produce ungrammatical sentences! Suppose the product name is something like "HtmlEdit". That's a verb. So you write a sentence that becomes "You can HtmlEdit your file..." after the entity substitution occurs. That sentence reads fine, because the verb fits well in that context. But if the name is eventually changed to "HtmlEditor", the sentence becomes "You can HtmlEditor your file...", which clearly doesn't work. Still, even if such simple substitutions can sometimes get you in trouble, they can potentially save a lot of time. (One alternative would be to set up entities named productNoun, productVerb, productAdj, and productAdverb!)
Normalizing DTDs
Just as you can normalize your XML document, you can also normalize your DTD declarations by factoring out common pieces and referencing them with a parameter entity. Factoring out the DTDs (also known as modularizing or normalizing) gives the same advantages and disadvantages as normalized XML--easier to change, somewhat more difficult to follow.
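As an illustration, an attribute list shared by several elements can be factored into a parameter entity and referenced wherever it is needed. This sketch assumes the declarations live in an external DTD file, because in the internal subset a parameter-entity reference cannot appear inside a markup declaration. The names commonAttrs, id, and lang are hypothetical.

```xml
<!-- Hypothetical external DTD fragment -->

<!-- Factored out once: attributes that several elements share -->
<!ENTITY % commonAttrs
  "id   ID    #IMPLIED
   lang CDATA #IMPLIED">

<!ELEMENT slide (title, item*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT item  (#PCDATA)>

<!-- Each reference expands to the full attribute list -->
<!ATTLIST slide %commonAttrs;>
<!ATTLIST item  %commonAttrs;>
```

Adding an attribute to commonAttrs adds it to both slide and item at once, which is the "easier to change" side of the trade-off. The "harder to follow" side is that a reader has to look up commonAttrs to learn which attributes an item accepts.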
You can also set up conditionalized DTDs, which use INCLUDE and IGNORE conditional sections to switch groups of declarations on or off. If the number and size of the conditional sections are small relative to the size of the DTD as a whole, that can let you "single source" a DTD that you can use for multiple purposes. If the number of conditional sections gets large, though, the result can be a complex document that is difficult to edit.
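Conditional sections are the mechanism behind such DTDs: a parameter entity whose value is INCLUDE or IGNORE switches each section on or off. In this sketch, the draft and final entity names and the note element are hypothetical. Conditional sections are allowed only in the external subset, so the fragment is meant to live in an external DTD file.

```xml
<!-- Hypothetical external DTD with two variants -->

<!-- Defaults: the final variant is switched on -->
<!ENTITY % draft "IGNORE">
<!ENTITY % final "INCLUDE">

<!-- Draft variant: slides may carry reviewer notes -->
<![%draft;[
<!ELEMENT slide (title, item*, note*)>
<!ELEMENT note  (#PCDATA)>
]]>

<!-- Final variant: no reviewer notes -->
<![%final;[
<!ELEMENT slide (title, item*)>
]]>

<!-- Declarations shared by both variants -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT item  (#PCDATA)>
```

A draft document could select the other variant from its internal subset by declaring <!ENTITY % draft "INCLUDE"> and <!ENTITY % final "IGNORE">. This works because the internal subset is read before the external subset, and the first declaration of an entity is the one that takes effect. Both switches have to be flipped together; otherwise slide is declared twice, which a validating parser reports as an error.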
Summary
Congratulations! You have now created a number of XML files that you can use for testing purposes. Here's a table that describes the files you have constructed.
Table 7-5 Listing of Sample XML Files
File                  Contents
slideSample01.xml     A basic file containing a few elements and attributes, as well as comments.
slideSample02.xml     Includes a processing instruction.
SlideSampleBad1.xml   A file that is not well-formed.
slideSample03.xml     Includes a simple entity reference (&lt;).
slideSample04.xml     Contains a CDATA section.
slideSample05.xml     References either a simple external DTD for elements (slideshow1a.dtd), for use with a nonvalidating parser, or else a DTD that defines attributes (slideshow1b.dtd) for use with a validating parser.
slideSample06.xml     Defines two entities locally (product and products), and references slideshow1b.dtd.
slideSample07.xml     References an external entity defined locally (copyright.xml), and references slideshow1b.dtd.
slideSample08.xml     References xhtml.dtd using a parameter entity in slideshow2.dtd, producing a naming conflict, since title is declared in both.
slideSample09.xml     Changes the title element to slide-title, so it can reference xhtml.dtd using a parameter entity in slideshow3.dtd without conflict.
All of the material in The J2EE Tutorial for the Sun ONE Platform is copyright-protected and may not be published in other works without express written permission from Sun Microsystems.