Integrating Partner Information Using XML and XSL
By Anne Gentle
BMC Software Inc., a company that writes utility tools for database administrators, wanted to reuse the error messages from partner database companies. Having learned that two of these database companies already used single-source files for their error messages, BMC Software integrated the information about the error messages from the database companies. We accomplished our goal by negotiating with our partner companies for the source files of the error message information. This session discusses how we took those source files and modified them to create simple XML files, then transformed them into HTML using XSL transforms within a BMC Software product.
In software documentation, users often refer to the company’s support web site for more detailed information, especially for error messages. BMC Software is a large database management vendor, and our users often look up error message information on the Oracle Technology Network web site or in the IBM documentation library. While researching online knowledge bases, we discovered a product that directly displayed error message documentation from Oracle rather than referring users to Oracle’s web site. While reading through their online information, we began to wonder about opportunities to reuse Oracle’s well-written documentation. If a third party knowledge base was reusing the information, why couldn’t we? This question began our quest for source files from partner companies.
This project allowed us to broaden our partnerships with other software companies and reuse their documentation, embedding it into our software console using XML files for the content and XSL transforms to display the information. Our goal was to have our database utility products help customers with troubleshooting tasks by providing detailed corrective actions for error messages.
By using XML tags wrapped around HTML tags, we could reformat the display of the information and embed the information in our products. This feature allows the product to display the information without requiring an Internet connection, username, and password for Oracle’s Technology Network or IBM’s documentation site.
The article is presented through the following sections:
Identifying Reusable Information
One way to identify information that you can reuse is to consult with technical support staff and customers, asking “What information do you repeatedly refer to outside of the documentation set?” Also, identify what web sites your users are reading consistently and determine if any of that content is reusable or searchable. Try to list other software tools that your users consistently work with in their day-to-day activities. Does the documentation for those tools offer users additional value? If the users could skip the Internet connection and find the information embedded in your software product, would they stay in your product’s interface for a longer period of time? These types of questions got us to look at ways to embed additional information that would be useful to our users.
Negotiating for Partner Information
BMC Software has an established partner relationship with Oracle, and one of our employees serves as the Oracle business liaison. We asked this liaison for help in finding the right contact at Oracle. Our business liaison contacted the corresponding liaison at Oracle, who put us in touch with the Technical Publications manager in charge of error message documentation. A similar liaison structure existed for IBM, and we were put in contact with the Technical Publications and Usability Director located in Toronto. While discussing the information, we offered our own documentation in case any of their products could reuse it. In this way, a free exchange of information could grow and continue.
Copyright and Contract Details
Once we contacted the correct managers, the partner companies wrote no-fee fair use licensing agreements that stated that we could use their material, formatted in our style, provided that we attribute the partner company. Because this was a new partnership, the licensing agreement was for a limited scope of content and duration, with an option to expand and extend the agreement clearly defined.
To attribute the source company, we include a copyright statement in the output of every file by placing a line in the XSL file. Every time the HTML is generated, a copyright line is displayed.
Examples:
Copyright 1996, 2002 Oracle Corporation.
Reprinted by permission from International Business Machines Corporation. Copyright 1993–2002.
Transforming the Information
Nearly all the database documentation for error messages contains a unique identifier, the error message text, the cause of the error, and the corrective action. These elements are available in every piece of error message documentation that we obtained from partner companies. Because of this simple structure, we took a straightforward approach to marking up the source files from the partner companies. We placed XML markers around the existing HTML, which had unique identifiers in the HTML tags. We performed text manipulation such as search and replace until the source file was ready for transformation. In this section, tool selection and source files are discussed.
XML and XSL Tools
The tool selection was shaped in part by the tools that our development teams were using, namely freeware and shareware tools. The source files (SGML, HTML, and XML) were manipulated using TextPad, a shareware tool available from www.textpad.com. TextPad allows you to use regular expressions to search for white space elements such as line breaks. Sometimes it was necessary to use carriage returns as part of a search identifier in order to ensure that the element was separate.
The XSL file was written primarily in TextPad, and the processor used for debugging was Instant Saxon, an open source command-line processor downloadable from saxon.sourceforge.net. We also tried MSXML 4.0, a Microsoft XML processor. We found that the Saxon processor had the best error messages for determining which line was incorrect and what was wrong with that line. For final testing and debugging we used XT, an implementation of XSLT in Java, available from www.blnz.com/xt/. Our product is a Java-based console that uses XT for transforms.
We did not fully utilize some trial versions of XSL debugging tools for this project because most of the trial licenses expired during the project’s lifetime. Also, some of the XSL debugging tools kept returning an error in the source file. We found the eXcelon Stylus Studio to be useful for viewing the output and the structure of the input files and for running the transforms in a GUI environment. Overall, the easiest method was to just write the XSL in TextPad and do command-line processing.
Choosing the XML Elements
We wanted a set of XML tags that worked for database messages from any partner company. Fortunately, the messages had common formatting that allowed us to mark up the source files using similar tag names. This common formatting consists of the following elements:
All error messages start with a three-letter identifier, so the first tag, which identifies the text of the error itself, is <NNNMSG>, where NNN is the three-letter identifier, such as ORA or CLI.
Next, the text of the message is included, which typically was a single sentence of information.
The cause of the error was the next chunk of information, so we used <CAUSE> for that element. The CAUSE element could contain paragraphs or even bulleted lists because it contained HTML tags that could be embedded directly during the XSL transform. The details of the XSL transform are discussed in the next section.
Finally, each error message had a corrective action. This element tag was <ACTION> and again we kept the HTML formatting by embedding any HTML tags within this <ACTION> tag.
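As a minimal sketch of what one marked-up record might look like, the following Python snippet parses a sample record with the standard xml.etree.ElementTree module and lists its elements. The <MESSAGE> wrapper element, the cause and action wording, and the function name are assumptions made for illustration; the tagging scheme described above specifies only the <NNNMSG>, <CAUSE>, and <ACTION> tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the <MESSAGE> wrapper and the cause/action wording
# are placeholders, not text taken from the partner source files.
SAMPLE_RECORD = """\
<MESSAGE>
  <ORAMSG>ORA-00936: missing expression</ORAMSG>
  <CAUSE><P>Placeholder cause text.</P></CAUSE>
  <ACTION><P>Placeholder corrective action.</P></ACTION>
</MESSAGE>
"""

def element_names(record_xml):
    """Return the tag names of the top-level children of one record."""
    record = ET.fromstring(record_xml)
    return [child.tag for child in record]

print(element_names(SAMPLE_RECORD))
```

Because the HTML inside <CAUSE> and <ACTION> is itself well-formed markup, an XML parser treats it as ordinary child elements, which is what makes it possible to carry the formatting through a transform.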
SGML Source Files
IBM gave us source files that were SGML files with specific tags for each section. By searching for their <msg> tag, and replacing it with <CLIMSG>, we could tag the source file so that the CLI code would identify the source of the information as being from IBM. This task only required a text editor such as TextPad.
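The same rename can be scripted instead of done by hand in a text editor. The sketch below is a Python stand-in, not the TextPad procedure we used; it assumes the <msg> tag carries no attributes, and the function name and sample text are hypothetical.

```python
import re

def retag(sgml_text, old="msg", new="CLIMSG"):
    """Rename <old> and </old> tags to <new> and </new>, leaving content untouched."""
    # The optional slash is captured so opening and closing tags are both renamed.
    pattern = re.compile(r"<(/?)" + re.escape(old) + r">")
    return pattern.sub(lambda m: "<" + m.group(1) + new + ">", sgml_text)

sample = "<msg>Placeholder message text</msg>"
print(retag(sample))
```

Matching the closing angle bracket as part of the pattern keeps the rename from touching longer tag names that merely begin with the same letters.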
One minor complication with these source files was IBM’s use of variables to refer to other books. To correctly link to the book that the source file referred to, we had to guess at the title, then look it up on IBM’s web site and replace the link with one that would work from the user’s browser.
In this example, the user was referred to the SQL Reference. Fortunately, the variable names were easily deciphered and the links to the books could be found in IBM’s large documentation library.
HTML Source Files
The Oracle source files were HTML files that had been generated from FrameMaker source using Quadralay WebWorks Publisher. Because of the detailed HTML markup, we could search and replace in these files as well. For example, searching for <DT CLASS="M"> revealed all the message text, which could then be replaced with <ORAMSG> to match our tagging scheme.
However, we found that simple text search and replace was not enough to prepare the source files, so we enlisted the help of a developer who knew Perl scripting. He wrote a Perl script that could better identify the nuggets of information that we needed and strip out unnecessary tags that were not easy to find using search and replace. For example, WebWorks Publisher creates anchor tags with random numbers used to uniquely identify a location. The Perl script could strip these out using regular expressions to search through the text.
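The kind of cleanup described here can be sketched with regular expressions. The example below is written in Python rather than Perl and is not the developer's script. The exact shape of the generated anchors (an empty <A NAME="digits"></A> pair) and the presence of a closing </DT> tag are assumptions, as are the function name and the sample line.

```python
import re

# Assumed shape of the generated anchors: <A NAME="digits"></A>.
ANCHOR = re.compile(r'<A\s+NAME="\d+">\s*</A>', re.IGNORECASE)
# Message text as described in the article: <DT CLASS="M">...</DT>.
MESSAGE_DT = re.compile(r'<DT CLASS="M">(.*?)</DT>', re.DOTALL)

def clean_oracle_html(html):
    """Strip numeric anchors, then retag message text as <ORAMSG>."""
    without_anchors = ANCHOR.sub("", html)
    return MESSAGE_DT.sub(r"<ORAMSG>\1</ORAMSG>", without_anchors)

sample = '<DT CLASS="M"><A NAME="1003145"></A>ORA-00936: missing expression</DT>'
print(clean_oracle_html(sample))
```

Removing the anchors first keeps the second pattern simple, because the message text is then the only content left inside the <DT> element.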
What? No DTD File?
With simple XML tags and basic HTML tags in the source files, there was no requirement for a Document Type Definition. Basically, when the XSL transform ran, it would stop when it missed an end tag. So, while debugging the XSL, the XML integrity was also tested. The XSL transform engine would report back which line of the XML failed and which tag it was looking for. By inspecting that line and tracing back to the beginning tag, we could figure out where the end tag error was and correct the source file. The XML was validated in this manner. Admittedly, the source files were not always clean, and this method may have been more time-consuming than the effort to validate using a DTD.
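A well-formedness check of this kind does not require running the full transform. The following sketch uses Python's standard XML parser as a stand-in for the XSL engine's error report; the function name and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

def find_wellformedness_error(xml_text):
    """Return None if xml_text parses, else (line_number, parser_message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, _column = err.position
        return line, str(err)
    return None

good = "<MESSAGE>\n<CAUSE><P>text</P></CAUSE>\n</MESSAGE>"
bad = "<MESSAGE>\n<CAUSE><P>text</CAUSE>\n</MESSAGE>"  # missing </P>

print(find_wellformedness_error(good))
print(find_wellformedness_error(bad))
```

As with the transform engine, the parser reports the line where it noticed the mismatch, which is not necessarily the line where the end tag was omitted, so tracing back to the opening tag is still required.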
Writing XSL Transforms
Because our XML tagging was simple, consisting of only three XML tags with HTML tags embedded within each XML element, we were able to essentially copy the HTML tags directly into HTML output using XSL. Our current Help system had an existing message template that used a two-row table for displaying the cause and action for each error message. By simply placing table tags around the HTML, we could obtain HTML output that was formatted nicely and fit into our current Help file look-and-feel. The key to this solution was the XSL element xsl:copy-of.
Using an exact text copy of the currently selected node, we could copy in the text directly, which included all the HTML tags.
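The effect of copying a node's contents, HTML tags included, into a two-row table can be approximated outside XSL. The sketch below is a Python analogue of that idea, not the XSL used in the product; the <MESSAGE> wrapper, the table markup, the function names, and the sample text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def inner_markup(element):
    """Serialize an element's text and child tags, without the element's own tag."""
    parts = [element.text or ""]
    # ET.tostring includes each child's trailing text, so nothing is lost.
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)

def cause_action_table(record_xml):
    """Build a two-row HTML table from one record's CAUSE and ACTION."""
    record = ET.fromstring(record_xml)
    rows = []
    for label, tag in (("Cause", "CAUSE"), ("Action", "ACTION")):
        rows.append("<tr><th>%s</th><td>%s</td></tr>"
                    % (label, inner_markup(record.find(tag))))
    return "<table>" + "".join(rows) + "</table>"

# Hypothetical record with placeholder wording.
sample = ("<MESSAGE><CAUSE><P>Placeholder cause.</P></CAUSE>"
          "<ACTION><UL><LI>Step one</LI><LI>Step two</LI></UL></ACTION></MESSAGE>")
print(cause_action_table(sample))
```

The paragraphs and bulleted lists embedded in the source pass through unchanged, so the only formatting added at output time is the surrounding table.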
To get one resulting HTML file from an XML file containing thousands of potential HTML output files, we employed a variable method.
When the code requests the transform, the command contains a variable for the message ID for which the user wants to know the cause and corrective action. In this example, xsl-msgid-target is the variable and ORA-00936 is the value.
The product inputs the desired error code as the value that the XSL transform uses to search through the file. Once the matching error code is found, only the hits that match that node are output in the HTML.
The transform looks through the XML source file until it finds the MSGID element that contains “ORA-00936.” Once that node is selected, the transform is run on the message text, cause, and action for that message ID and the resulting HTML file is displayed.
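The selection step can be illustrated with a short Python analogue, in which the msgid_target argument plays the role of the xsl-msgid-target variable. This is a sketch of the lookup logic only, not the XSL or the XT invocation used in the product. The <MESSAGES> and <MESSAGE> wrapper elements, the position of <MSGID> inside each record, and the placeholder wording are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout: the <MESSAGES>/<MESSAGE> wrappers and the position
# of <MSGID> are assumptions; most of the wording is placeholder text.
MESSAGES_XML = """\
<MESSAGES>
  <MESSAGE>
    <MSGID>ORA-00001</MSGID>
    <ORAMSG>Placeholder text for another message</ORAMSG>
    <CAUSE><P>Placeholder cause.</P></CAUSE>
    <ACTION><P>Placeholder action.</P></ACTION>
  </MESSAGE>
  <MESSAGE>
    <MSGID>ORA-00936</MSGID>
    <ORAMSG>missing expression</ORAMSG>
    <CAUSE><P>Placeholder cause.</P></CAUSE>
    <ACTION><P>Placeholder action.</P></ACTION>
  </MESSAGE>
</MESSAGES>
"""

def select_message(messages_xml, msgid_target):
    """Return the MESSAGE element whose MSGID equals msgid_target, or None."""
    root = ET.fromstring(messages_xml)
    for message in root.iter("MESSAGE"):
        if message.findtext("MSGID") == msgid_target:
            return message
    return None

hit = select_message(MESSAGES_XML, "ORA-00936")
print(hit.findtext("ORAMSG"))
```

Only the matching record is handed on for rendering, which is how a single source file holding thousands of messages yields one HTML page per request.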
Updating the Content
While negotiating for the source files, we selected the most recent database version as the source for the error documentation. In some cases, we knew that our users would be running an older version of the database, but we wanted the most recent documentation for errors despite a version mismatch. Fortunately, error messages are rarely changed or discarded so the majority of error messages are accurate for whichever version of the database our users are running.
Every time a new version is released, there are updates to the databases and to the error messages. We will negotiate for the updated source files when we determine a target database version for the majority of our users. This updating method means that we will have to redo the text manipulation of the source files for each type of database. However, the XSL transform does not need to be rewritten and is simply in a maintenance state.
Integrating the Information Into the Product
Once the XML and XSL files were in place and the transform into HTML was tested, we worked with the development team to determine how to present this information to the user. We wanted to help the user as soon as an error occurred. Because the product console already had a Messages window, we decided to include links from the Messages window to the error messages.
Figure 1: Messages window for product console
When the user clicks the blue message text, the XSL transform occurs and the product displays the HTML file in a browser window.
Figure 2: Resulting HTML display
The transform occurs when the product passes the message ID into the appropriate XML file.
Challenges in Implementation
We discovered several scenarios in our products that did not have an immediate solution. For example, a database could return three errors at once. The product listed all three errors in one line in the Messages window, rather than one error per line. In this scenario, the first message was the only one that was linked to an HTML page.
We also learned that some product teams were capturing database error messages and never displaying them to the user. Instead, a customized error message relating more directly to the product was displayed. In many ways, the more customized product error message helps the user uncover the corrective action more quickly than a database error message.
Start Up Costs
Obtaining the source files and manipulating the text took about ten hours of a writer’s time per database type. Tracking down the appropriate contacts from Oracle and IBM took a manager approximately forty hours of time per company. For the Oracle source files, we required approximately eight hours of a developer’s time to write the Perl scripts and debug them.
Writing the XSL transforms required time both to learn XSL and to debug the transforms. At the beginning of this project, we did not know XSL and learned by using examples in XSLT programming books and on the Internet. When we got stuck, we would ask developers on the team for assistance. It took about 40 hours to learn enough XSL to write and debug the transforms for this project. An experienced XSL programmer would have taken much less time to write the transforms.
Because this design was going into a console application used by many product teams, we worked closely with usability and product engineering to ensure seamless integration and adoption. While the current structure and organization of these messages meets our basic needs, we are still refining our implementation requirements.
Future Implementation Goals
We need to add error checking to our implementation. The way the XSL is currently implemented, there is no error checking in the transform itself. If an incorrect error code is input as the variable, the program itself has to do the error checking to make sure that the HTML that is returned is valid. In the future we may rewrite the XSL so that if the error code isn't found in the XML, an HTML page explaining that the error wasn't found could be returned.
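One possible shape for this fallback is sketched below in Python. It illustrates the intended behavior only and is not a planned implementation; the <MESSAGES>, <MESSAGE>, and <MSGID> layout, the page markup, and the wording of the fallback text are all assumptions.

```python
import html
import xml.etree.ElementTree as ET

def lookup_page(messages_xml, msgid_target):
    """Return an HTML page for msgid_target, or a 'not found' page."""
    root = ET.fromstring(messages_xml)
    for message in root.iter("MESSAGE"):
        if message.findtext("MSGID") == msgid_target:
            text = message.findtext("ORAMSG", default="")
            return "<html><body><h1>%s</h1><p>%s</p></body></html>" % (
                html.escape(msgid_target), html.escape(text))
    # No record matched: return a valid page instead of empty output.
    return ("<html><body><p>No documentation was found for message %s.</p>"
            "</body></html>" % html.escape(msgid_target))

# Hypothetical one-record file.
sample = ("<MESSAGES><MESSAGE><MSGID>ORA-00936</MSGID>"
          "<ORAMSG>missing expression</ORAMSG></MESSAGE></MESSAGES>")
print(lookup_page(sample, "ORA-99999"))
```

Returning a complete page in both cases means the calling program no longer has to inspect the output to decide whether the lookup succeeded.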
For additional reuse of the database messages, we would like for the user to be able to enter a database message ID in a web form and search for a specific message. The launch of this search feature should be from the Help menu. Ideally, a separate search would be used for each database.
We are investigating ways to expand adoption of this technology inside of our products. We also intend to pursue obtaining source files for additional database error messages and other reference material.
The Bottom Line
By reusing content from partner companies, we gained access to detailed information for over 7,000 Oracle error messages and over 2,500 IBM error messages. We can offer additional information to our users for nearly 10,000 database error messages that might occur while using our products.
This project was successful because we partnered both externally with technical publications departments in other companies and internally with our engineering and quality assurance groups. We improved our documentation team’s knowledge of single-source technology and broadened our abilities by learning XSL. In doing so, we delivered relevant, reusable information directly to our users with few hours, few people, and no monetary investment.
(1) Kay, Michael. XSLT Programmer’s Reference. 2nd ed. Wrox Press, 2001.
(2) Mangano, Sal. XSLT Cookbook. O’Reilly & Associates, 2002.
(3) XPath Expression Syntax Reference Page. Retrieved 28 January 2004 from saxon.sourceforge.net/saxon6.5.3/expressions.html
Anne Gentle completed her master's degree at Miami University in Technical and Scientific Communication (MTSC) in December 1995. She is a senior STC member who has volunteered as the Advisor for the Miami University Student STC Chapter and was awarded a Distinguished Chapter Service Award for her service there. She has also been the hospitality chair for both the Southwest Ohio Chapter and the Austin Chapter of the STC. She has worked as a technical communicator for more than eight years and has a special interest in using technology in interesting ways for communication purposes.