XML: What Do Help Authors Need to Know? Part 2

By Scott Boggan


This article contains links to sample pages that require an XML viewer, such as Microsoft Internet Explorer 5, which is available at http://www.microsoft.com/windows/ie/default.htm. Screen captures of the sample pages are also available for readers who do not have an XML viewer. You can also download the source code files for all of the examples.

In Part 2 of our article, we’ll explore XML syntax in greater detail. We'll also study the power of Extensible Stylesheet Language (XSL), take a look at XML’s new linking technologies—XLink and XPointer—and briefly survey XML authoring tools. If you haven’t read Part 1 of this article and are new to XML, you may want to do so before continuing.

More XML Syntax

Our previous installment covered the basic syntax used to create XML documents. We obviously don’t have space to cover all of XML’s rules, but here are a few more important concepts of particular interest to Help authors.

Tag Attributes

XML tags may include optional attributes. In the following example, the <BOOK> tag has a GENRE attribute with the value non-fiction:

<BOOK GENRE="non-fiction">The Right Stuff</BOOK>

With all of the flexibility offered by XML, we could of course create our own tag named <GENRE>. So why use an attribute instead? Generally, attributes are valuable for adding information about an element that won’t be visible to the user. As we’ll see later when we discuss XSL, an attribute such as GENRE is extremely useful for processing the document. For example, we might use XSL to hide all of the books except those that are non-fiction. Or, we may want to sort a list of books according to genre.

Entities

Serious web jockeys are probably familiar with the concept of HTML entities. An entity reference begins with an ampersand and ends with a semicolon, such as the opening angle bracket (&lt;) or the quotation mark (&quot;).

XML also uses a few predefined entities, but takes the concept even further. One of the most powerful uses for entities is as a placeholder for text that you want to appear in multiple documents. For example, you might define an entity for a technical support phone number like this:

<!ENTITY PHONE "in the USA, call toll free at 800-555-1234">

This creates an entity named PHONE. You can insert the contents of that entity into a document using an entity reference like this:

<SUPPORT>For technical support &PHONE;</SUPPORT>

This results in a text string reading "For technical support in the USA, call toll free at 800-555-1234." Click here to view the XML document. For those readers without an XML processor, click here to view a screen capture of the XML document.

One of the benefits of using entities is that you can update multiple pages with one simple change; for example, you can change the phone number for different regions by updating a single entity.
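To show where the declaration lives, here is a minimal sketch of a complete document that declares the PHONE entity in its own internal DTD and then uses it. The choice of SUPPORT as the root element is taken from the example above; placing the declaration in the internal DTD is an assumption made to keep the sketch in one file.

```xml
<?xml version="1.0"?>
<!-- The DOCTYPE names the root element; the brackets hold the internal DTD. -->
<!DOCTYPE SUPPORT [
  <!-- Declare the entity once... -->
  <!ENTITY PHONE "in the USA, call toll free at 800-555-1234">
]>
<!-- ...and reference it wherever the text should appear. -->
<SUPPORT>For technical support &PHONE;</SUPPORT>
```

To share the entity across many documents, the same ENTITY line could instead be kept in an external DTD file that each document references from its DOCTYPE declaration. Note that not every XML processor reads external declarations, so this is worth testing in your target viewer.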

Playing By the Rules

Rules are important in any markup language, even for relatively unstructured languages such as HTML. HTML documents begin with an <HTML> tag and end with an </HTML> tag, and in between contain a Head element (which holds the Title) followed by a Body element. Each of these tags must appear in a specific order and each has a specific purpose. Similar rules govern other HTML tags, such as hyperlinks.

These rules are included in something called a Document Type Definition, or DTD. Many HTML authoring tools add a reference to a DTD at the top of each document, like the one below. This identifies the document as conforming to the World Wide Web Consortium’s (W3C) HTML 4.0 Transitional DTD; the EN indicates that the DTD itself is written in English:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

This sounds nice, but in practice Web browsers ignore references to HTML DTDs and allow authors to get away with very sloppy code.

Not so with XML processors, which are much more rigid about enforcing rules. There are two types of XML documents: well-formed and valid. Well-formed XML simply means that the document obeys XML’s basic syntax rules. Part 1 of this article covered those rules: every opening tag must have a closing tag, the opening and closing tags must match (including case), and the tags must not overlap. If an XML document is not well-formed, nothing will display; instead you’ll get an error message. (I’ve heard that Microsoft and Netscape, apparently tired of building browsers smart enough to clean up after sloppy code, demanded this of the W3C.)

Valid XML documents are well-formed documents that also conform to a DTD. Think of the DTD as a blueprint for the document structure: it defines the tags that can be used in the document and the order in which they appear. Here’s a sample DTD that you might use to document a task in a Help system:

Sample DTD

The first line says that this DTD is for a procedure. The second line defines a procedure element, and says that the procedure must contain a heading followed by a body. Next, the heading element is defined as having a title and an optional concept. The fact that the concept is optional is indicated by the question mark (?). The text #PCDATA (short for "parsed character data") says that the title and concept both contain text. The body of our procedure contains one or more steps, and zero or more tips or notes. As you can see, various symbols—the comma, question mark, plus sign, asterisk, and pipe symbol—are used to define the structure of the document.
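The structure described above can be sketched as an internal DTD followed by a small document that conforms to it. This is a reconstruction from the description, not the original listing: the element names step, tip, and note, their text-only content, and the sample wording are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE procedure [
  <!-- A procedure is a heading followed by a body (comma = "then"). -->
  <!ELEMENT procedure (heading, body)>
  <!-- A heading has a title and an optional concept (? = zero or one). -->
  <!ELEMENT heading   (title, concept?)>
  <!ELEMENT title     (#PCDATA)>
  <!ELEMENT concept   (#PCDATA)>
  <!-- One or more steps (+), then any number of tips or notes (| and *). -->
  <!ELEMENT body      (step+, (tip | note)*)>
  <!ELEMENT step      (#PCDATA)>
  <!ELEMENT tip       (#PCDATA)>
  <!ELEMENT note      (#PCDATA)>
]>
<procedure>
  <heading>
    <title>Printing a Help topic</title>
  </heading>
  <body>
    <step>Open the topic you want to print.</step>
    <step>Click Print.</step>
    <tip>You can also right-click the topic and choose Print.</tip>
  </body>
</procedure>
```

Removing the heading element, or placing the tip before the first step, would leave the document well-formed but no longer valid.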

Once you’ve created your DTD, there are two ways to access it. An internal DTD is stored in the document along with your XML data. More commonly, you’ll probably want to access an external DTD stored in a separate file. This makes the rules in your DTD available to multiple XML documents. An external DTD is not itself an XML document, although it may begin with a text declaration that resembles the XML declaration (for example, <?xml version="1.0" encoding="UTF-8"?>). To access an external DTD, you add a DOCTYPE declaration near the beginning of your XML file, after the XML declaration, like this:

<!DOCTYPE procedure SYSTEM "procedure.dtd">

Click here to view the XML document. For those readers without an XML processor, click here to view a screen capture of the XML document.

So do you need a DTD? For simple documents, probably not. You not only save the time it takes to build a DTD, but get slightly better performance, too. On the other hand, if you’re creating a large document set or work in a department with multiple authors, you may very well want a DTD to enforce consistency. Your DTD can prevent authors from breaking your style guidelines—for instance, preventing your new hire from writing a procedure that does not begin with a heading. Department standards are just the beginning—hundreds of DTDs are being developed for industries that want to create documents conforming to a common standard.

Note that Internet Explorer does not process DTDs correctly. For example, if you remove a required element from an XML document (say, the heading in our PROCEDURE.DTD), IE5 will display your document anyway. However, if you try to open an invalid document in a different XML processor, you’ll receive the following message:

"Element content is invalid according to the DTD/Schema."

Namespaces

Thousands and thousands of XML authors out on the Internet creating their own tags will undoubtedly lead to conflicts. A Help author’s <procedure> tag may be very different from a <procedure> tag used by a web site for medical professionals. Namespaces are designed to avoid confusion arising from tags with the same name. To use a namespace, prefix your tag with the namespace prefix; a procedure tag using a help namespace would look like <help:procedure/>. Additionally, the XML document must contain a declaration that binds the prefix to a namespace name:

<procedure xmlns:help="x-schema:helpSchema.xml"> 
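As an illustration of how prefixes keep same-named tags apart, here is a small sketch in which two different procedure elements coexist in one document. The manual, step, and name elements, the med prefix, and both example.com namespace names are hypothetical; any unique URI can serve as a namespace name.

```xml
<?xml version="1.0"?>
<!-- Each xmlns:prefix attribute binds a prefix to a namespace name. -->
<manual xmlns:help="http://example.com/ns/help"
        xmlns:med="http://example.com/ns/medical">
  <!-- A Help author's procedure: a task with steps. -->
  <help:procedure>
    <help:step>Click Print.</help:step>
  </help:procedure>
  <!-- A medical procedure: same local name, different namespace. -->
  <med:procedure>
    <med:name>Appendectomy</med:name>
  </med:procedure>
</manual>
```

A namespace-aware processor treats help:procedure and med:procedure as entirely different elements, so a stylesheet or script can handle each one separately.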

XSL: The Power to Format and Transform

To your reader, raw XML is about as exciting as dirt. As we saw in Part 1 of our article, plain XML documents appear as a simple tree view in a web browser. The key to presenting XML documents in a browser is to transform them into HTML. There are various options for transforming XML to HTML on both the client and the server (including Perl, Java, and various scripting languages), but for web browsers the likeliest choice is the Extensible Stylesheet Language, or XSL. XSL transforms XML in two ways.

You’ve Got the Look

The first use for XSL is to convert XML into something a web browser can display. In this case, XSL takes XML as input and outputs it as HTML. And it does this "on the fly": if you use IE5 to open an XML document formatted with XSL, you’ll see HTML on the screen, but view the source code and you’ll see XML. Slick, huh?

Let’s look at an example using our sample bookstore. We’ll create a simple XSL template to display our bookstore data (BOOKSTORE1.XSL):

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body bgcolor="ivory">
<h1>Amalgamated.com Online Bookstore</h1>
  <TABLE BORDER="2" CELLPADDING="5">
    <tr>
      <th>Title</th>
      <th>Author</th>
      <th>Price</th>
    </tr>
    <xsl:for-each select="BOOKSTORE/BOOK">
    <tr>
      <td><cite><xsl:value-of select="TITLE"/></cite></td>
      <td><xsl:value-of select="AUTHOR"/></td>
      <td>$<xsl:value-of select="PRICE"/></td>
    </tr>
    </xsl:for-each>
  </TABLE>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

The bulk of this document is very standard HTML, with a few additions. First, you’ll notice that the document begins with an XML declaration—XSL stylesheets are in fact XML documents. The next line identifies this document as an XSL stylesheet and references an XML namespace that IE5 recognizes. To understand the third line, recall that Part 1 of this article illustrated how XML documents are often structured hierarchically, like a tree. The tag <xsl:template match="/"> instructs the stylesheet to begin processing the document from the root (/) of that document tree. Buried in our HTML table code, you’ll find <xsl:for-each select="BOOKSTORE/BOOK">. This tag is used to locate a particular set of XML tags: all BOOK elements inside the BOOKSTORE element.

From here, the stylesheet begins matching XML elements using patterns. This pattern matching is at the heart of XSL. The <xsl:value-of> lines work down the XML tree and look for any branches that match the specified pattern. In this case, the select attribute starts at the BOOKSTORE/BOOK tree and locates the TITLE, AUTHOR, and PRICE. For each of these three children, the <xsl:value-of> element inserts the XML data into a <td> tag. Notice that we’re using the <cite> tag to format our book titles, and are prefixing the book prices with a dollar sign ($).

We’re almost there. Once we’ve created our XSL stylesheet, we need to reference it from our XML document. The stylesheet reference appears beneath the document’s XML declaration:

<?xml version="1.0"?>
<?xml-stylesheet href="bookstore.xsl" type="text/xsl"?>

Open our document, and voilà, our XML is now formatted as HTML!
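For readers who do not have Part 1 at hand, here is a sketch of the kind of input document the stylesheet assumes. The element and attribute names follow the patterns used in this article; the prices, the second book, and the FIRST_NAME element are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<?xml-stylesheet href="bookstore.xsl" type="text/xsl"?>
<!-- Illustrative data only: prices and FIRST_NAME are assumptions. -->
<BOOKSTORE>
  <BOOK GENRE="non-fiction">
    <TITLE>The Right Stuff</TITLE>
    <AUTHOR>
      <FIRST_NAME>Tom</FIRST_NAME>
      <LAST_NAME>Wolfe</LAST_NAME>
    </AUTHOR>
    <PRICE>11.99</PRICE>
  </BOOK>
  <BOOK GENRE="fiction">
    <TITLE>Moby-Dick</TITLE>
    <AUTHOR>
      <FIRST_NAME>Herman</FIRST_NAME>
      <LAST_NAME>Melville</LAST_NAME>
    </AUTHOR>
    <PRICE>8.99</PRICE>
  </BOOK>
</BOOKSTORE>
```

Each BOOK element here would become one table row, with the TITLE, AUTHOR, and PRICE values filling the three cells.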

Click here to view the XML document. For those readers without an XML processor, click here to view a screen capture of the XML document.

More Than Just a Pretty Face

Even more interesting than its formatting ability is XSL’s power to transform the structure of an XML document. For example, suppose that we want to filter our XML document to include only works of fiction. This is accomplished by modifying our XSL file to use the <xsl:if> element. In the following example, <xsl:if test="@GENRE[.='fiction']"> checks to see if the GENRE attribute is equal to fiction. For each book meeting this criterion, the XSL stylesheet inserts the TITLE, AUTHOR, and PRICE data into the table (BOOKSTORE2.XSL).

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body bgcolor="ivory">
<h1>Amalgamated.com Online Bookstore</h1>
  <TABLE BORDER="2" CELLPADDING="5">
    <tr>
      <th>Title</th>
      <th>Author</th>
      <th>Price</th>
    </tr>
    <xsl:for-each select="BOOKSTORE/BOOK">
    <xsl:if test="@GENRE[.='fiction']">
    <tr>
      <td><cite><xsl:value-of select="TITLE"/></cite></td>
      <td><xsl:value-of select="AUTHOR"/></td>
      <td>$<xsl:value-of select="PRICE"/></td>
    </tr>
    </xsl:if>
    </xsl:for-each>
  </TABLE>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

Click here to view the XML document. For those readers without an XML processor, click here to view a screen capture of the XML document.

Another use for XSL would be to sort our XML document. By default, our stylesheet lists the books in the same order that they appear in the XML file. But suppose that we want to arrange our books alphabetically by author’s last name? This is accomplished using XSL’s order-by attribute (BOOKSTORE3.XSL):

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body bgcolor="ivory">
  <TABLE BORDER="2" CELLPADDING="5">
    <tr>
      <th>Title</th>
      <th>Author</th>
      <th>Price</th>
    </tr>
    <xsl:for-each select="BOOKSTORE/BOOK" order-by="AUTHOR/LAST_NAME">
    <tr>
      <td><xsl:value-of select="TITLE"/></td>
      <td><xsl:value-of select="AUTHOR"/></td>
      <td><xsl:value-of select="PRICE"/></td>
    </tr>
    </xsl:for-each>
  </TABLE>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

Nothing too complex here: <xsl:for-each select="BOOKSTORE/BOOK" order-by="AUTHOR/LAST_NAME"> sorts the books using the LAST_NAME element of the AUTHOR element. Sorting numbers is a little more involved, since by default XSL treats them as text. Hence an $11.99 book will appear before a tome costing $8.99. This can be corrected by modifying the PRICE element in our XML file to identify the data as a number. The element content should then be a bare number, leaving the dollar sign for the stylesheet to add (as BOOKSTORE1.XSL does):

<PRICE dt:dt="number">8.99</PRICE>

This also requires that we add a namespace declaration to the root BOOKSTORE element:

<BOOKSTORE xmlns:dt="urn:schemas-microsoft-com:datatypes">

Click here to view the XML document. For those readers without an XML processor, click here to view a screen capture of the XML document.

Note that the sorting syntax used in IE5 does not comply with the current W3C specification; the current draft will use an <xsl:sort> element. In fact, the XSL specification has been a moving target for some time. As of this writing (February 2000), the W3C just wrapped up a working draft, so you can expect further changes prior to the 1.0 recommendation.
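For comparison, here is a sketch of how the same sort might be written with the <xsl:sort> element, using the syntax of the W3C XSLT 1.0 Recommendation. It assumes the bookstore structure used throughout this article and bare numeric PRICE values. The IE5 examples above rely on the earlier working-draft namespace, so this version is intended for processors that implement XSLT 1.0.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table border="2" cellpadding="5">
          <tr>
            <th>Title</th>
            <th>Author</th>
            <th>Price</th>
          </tr>
          <xsl:for-each select="BOOKSTORE/BOOK">
            <!-- xsl:sort elements must come first inside xsl:for-each. -->
            <!-- Primary key: author's last name, compared as text. -->
            <xsl:sort select="AUTHOR/LAST_NAME"/>
            <!-- Secondary key: price, compared as a number rather than text. -->
            <xsl:sort select="PRICE" data-type="number"/>
            <tr>
              <td><xsl:value-of select="TITLE"/></td>
              <td><xsl:value-of select="AUTHOR"/></td>
              <td>$<xsl:value-of select="PRICE"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Because the data-type attribute tells the processor to compare prices numerically, no datatype markup is needed in the XML file itself.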

What About CSS?

Just when many of us are getting familiar with CSS, along comes a new technology like XSL. This has many wondering whether CSS will be obsolete. That appears unlikely: XSL, CSS, XML, and HTML will all work together. The examples in this article have shown one common scenario: using XSL to generate HTML on the client. The next step is to format that HTML using CSS, as I’ve done in the following example by embedding the CSS formatting instructions right into the XSL document (BOOKSTORE4.XSL).

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<style>
  body { display: block; background: darkseagreen; color: black;
         font-family: Verdana; font-size: x-small; }
  h1   { display: block; color: white; font-family: Georgia; }
  th   { display: block; background: black; color: white; }
  td   { display: block; margin-bottom: -3em; margin-top:-1em; }
</style>
<body bgcolor="ivory">
. . .

Click here to view the XML document. For those readers without an XML processor, click here to view a screen capture of the XML document.

This hybrid approach will be very common until XML- and XSL-enabled browsers gain more market penetration. The W3C has a note clarifying the different technologies; see http://www.w3.org/TR/NOTE-XSL-and-CSS.

Just the Beginning

We’ve only scratched the surface of what’s possible with XSL. Like all XML technologies, it can be extended with scripting languages to create interactive data-driven pages. The next example uses XML, XSL, CSS, and JavaScript to create a page that allows the user to dynamically filter or sort the data. We don’t have space to examine the scripting used in this sample, but hopefully it will get you thinking about some of the possibilities.

Click here to view the XML document. For those readers without an XML processor, click here to view a screen capture of the XML document.

XLink and XPointer

Hyperlinks form the heart of hypertext, and not surprisingly XML brings new linking capabilities. There are two components to XML linking: XLink and XPointer. Both technologies require the usual "work in progress" caveats, and since they’re not even partially implemented in IE5 (or any other web browser) this part of the article will not feature any working examples.

XLink outlines the syntax for defining XML links, and supports a variety of new features:

  • Links that point to multiple topics. These "one-to-many links" are much like ALinks or Related Topics links.
  • Links that work in multiple directions. This is not like clicking "Back" in the browser: a bi-directional link can be traversed in either direction regardless of whether you went the other way first.
  • Links with special behaviors, such as the ability to define where the link appears. You can display the target topic in a new window or embed it beneath the link, similar to the DHTML "drop-down" effect common in many HTML Help systems.
  • Link databases that store the links for a collection of documents. These "linkbases" provide many possibilities for filtering, sorting, analyzing, and processing links.

XPointer’s job is to provide a method for addressing individual pieces of XML documents. With XPointer it's possible to create links to XML elements, character strings, and other parts of XML documents. This allows you to link to any part of a document even if the author of the target document did not provide an ID (such as "some_target.html#section2"). For example, the following code snippet links to the last code example in a document:

DESCENDANT(-1,EXAMPLE)

Links selected by element type have two advantages. First, because humans typically refer to things by type ("the fourth paragraph," or "second figure"), the links are clear and easy to understand. Second, they’re more robust: it’s easier to detect if the link is broken because the document has been edited.

One of the W3C representatives working on XML linking, Tim Bray, had an interesting example that I will pass along. It demonstrates several of the new links available in XML. From the end-user’s perspective, here’s what the link might look like:

Links available in XML

These links might be coded like this (although the exact syntax is certain to change, you’ll get the general idea):

Coding links in XML

Link 1 runs a CGI script and displays an English translation in a new browser window (SHOW="NEW"). Link 2 is a standard web link; the SHOW="REPLACE" code indicates that the document being linked to will replace the current document.

Link 3 and Link 4 demonstrate XPointer’s ability to link to portions of an XML document. Link 3 displays the first figure with the caption "TESUJI." The EMBED attribute means that the figure will appear beneath the link, similar to the DHTML "drop-down" effect common in many HTML Help systems. Link 4 links to the first three paragraphs following a tag containing the ID def-Tesuji. As you can imagine, these two links are easier to create and are less likely to break as documents are edited.
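To give a rough feel for the three display behaviors described above (new window, replace, and embed), here is a sketch that uses the attribute names from the later XLink 1.0 Recommendation rather than the draft syntax discussed in this article. The element names, link text, file names, and example.com URLs are all hypothetical, and the last link uses a plain ID fragment rather than a full XPointer expression.

```xml
<?xml version="1.0"?>
<!-- Element names and URLs are hypothetical; only the xlink:* attributes
     come from XLink 1.0. -->
<glossentry xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Opens the target in a new window when the reader activates the link. -->
  <ref xlink:type="simple"
       xlink:href="http://example.com/cgi-bin/translate?term=tesuji"
       xlink:show="new"
       xlink:actuate="onRequest">English translation</ref>

  <!-- Replaces the current document, like an ordinary web link. -->
  <ref xlink:type="simple"
       xlink:href="http://example.com/go/index.html"
       xlink:show="replace"
       xlink:actuate="onRequest">Go resources</ref>

  <!-- Embeds the target at the link; the fragment names an element by ID. -->
  <ref xlink:type="simple"
       xlink:href="glossary.xml#def-Tesuji"
       xlink:show="embed"
       xlink:actuate="onRequest">definition</ref>
</glossentry>
```

Whether a given viewer honors these attributes depends on its XLink support, so treat this as an outline of the idea rather than something to paste into a Help project.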

XLink and XPointer are hopefully nearing completion; see the W3C web site for more information.

XML Authoring Tools

As you might imagine, the authoring tool vendors are actively pursuing the XML market. The XML tool market currently features plenty of offerings at both the high and low end, with not much in the middle. Traditional SGML vendors, such as Interleaf and ArborText, are filling the high-end market niche. The high end also features offerings from many of the content management tools (such as Arbortext Epic and Chrystal Canterbury). Shareware and freeware tools such as XML Spy, XMLwriter, and XML Notepad occupy the low end. See http://www.xml.com/pub/pt/3 for reviews of many of these tools. As XML matures, we can expect to see many of the standard HTML authoring tools add support for XML and fill in the midrange market.

The XML support in Microsoft Office 2000 has garnered attention from many. Office uses XML tags to describe the formatting in Office documents. These tags are used as a common interchange format to preserve the appearance of Office documents during "round-tripping." But Office doesn’t use XML in the traditional sense—to describe the meaning of the data—and should not be confused with a real XML authoring tool.

XML Resources

Hopefully, this article has given you some idea of XML’s capabilities for Help authoring, and provided a basic idea of the technology. Here are further resources for you to study:

Tools

XML.com has reviewed many XML authoring tools: http://www.xml.com/pub/pt/3

A few high-end tools are:

Low-end options include:

Documentation

Sample Files

You can download the source code files for all of the examples in this article.


Scott Boggan is co-author of the award-winning Developing Online Help for Windows and a forthcoming book on HTML Help. He is a popular speaker at numerous conferences throughout the world and also teaches through the University of Washington. Scott is principal of HelpCraft (www.helpcraft.com), a training and consulting company.


Copyright © WinWriters. All Rights Reserved. sharon@winwriters.com