An XML Architecture for Technical Documentation: The Darwin Information Typing Architecture
DITA is an architecture for creating topic-oriented, information-typed content that can be reused and single-sourced in a variety of ways. It is also an architecture for creating new information types and describing new information domains, allowing groups to create very specific, targeted document type definitions using a process called specialization, while at the same time reusing common output transforms and design rules. We discuss several methods that can be used to extend DITA's basic topic types.
Since 1999 IBM has been working on the move from SGML-based and HTML-based authoring systems to an XML-based authoring system for hypertext topic-oriented information. This move was motivated by XML initiatives from the W3C, internal work on topic-oriented architectures and information typing, and the need for an authoring system with a low barrier to entry. We have described the details behind IBM's Darwin Information Typing Architecture (DITA) in previous publications (see the references at the end of this paper). In this paper we continue to provide basics on the evolving roles and responsibilities of an authoring community related to DITA, provide more details on the extensible nature of DITA, and provide information about how to get started with DITA.
Like all current workhorse documentation solutions, the Darwin Information Typing Architecture has been built on the shoulders of giants.
Among the many kinds of document markup languages used in early text processing systems, IBM contributed the concept of generalized markup languages that was formalized in 1986 in the original ISO specification of the Standard Generalized Markup Language, SGML. Through the early 1990s, a number of major SGML markup languages and applications were created, primarily for various companies and governments. At this time, IBM created its own implementation, IBMIDDoc, which stands for "IBM Information Development Document type." For the past decade, this DTD (Document Type Definition) and its internal workbench (including editors, tools, and interfaces) have been the mainstay of IBM's delivery of product information in a multitude of national languages.
With the advent of the World Wide Web in the early 1990s and the initial specification of XML in 1998, IBM's Information Development (ID) community began an internal workgroup to evaluate XML and recommend its future use. The result of this activity was a topic-oriented design that consisted of an extensible core language; it was named the Darwin Information Typing Architecture to acknowledge 1) its dependence on principles of specialization and inheritance, 2) its incorporation of current content typing methodologies, and 3) its processing architecture, which is scalable to any number of delivery needs or aggregating principles.
Principles of DITA's design also reflect the legacy of major influences in the world of information systems. An early design principle was to borrow, where possible, tag names that would be familiar to writers who had authored with HTML. Similarly, tag names that were familiar to those who had authored in IBMIDDoc were also favored. The terminology associated with DITA's topic orientation and information typing design comes from current best practices in research and academia. And DITA's processing model is based entirely on World Wide Web Consortium technologies, primarily XSLT, the standard transformation language, meaning that both the authoring and the production of deliverables can be based on standard function and easily available tools.
This architecture was initially published in the spring of 2000 on the IBM developerWorks™ site as a report and accompanying toolkit. Since then, IBM's teams have conducted usability sessions to improve the design, and used the DTDs in both prototypes and beta authoring systems for actual deliverables. Results of these activities, along with an updated specification on "domain specialization," were published in the spring of 2002. IBM's XML team continues to encourage the exploration and use of DITA through public forums, articles, and consultation.
In May 2004, a major milestone in DITA development was reached. At that time, IBM contributed DITA and its accompanying document models and schemas to the Organization for the Advancement of Structured Information Standards (OASIS), an international consortium that drives the development, convergence, and adoption of e-business (mostly XML) standards. OASIS then formed a DITA Technical Committee (TC) to further the development of DITA as an open standard. Currently the DITA TC is working toward releasing Draft 1.0 of an official [open] DITA Specification sometime during the winter of 2004. This draft will be based substantially on the materials contributed by IBM to OASIS.
The Promise of XML
Like many others, we were excited by the promise of XML: single sourcing, smart content, and interchangeability.
As we looked closer, however, we discovered problems with the promises. When XML promises single sourcing, it's despite XML languages that enshrine media-specific constructs such as chapters in books or screens in presentations. When XML promises smart content that can generate customized views and intelligent search, it's in the face of standard XML languages that only know about paragraphs and lists. When XML promises interchangeability, it's limited to those who accept a common-denominator standard like XHTML.
Simply choosing XML doesn't in itself deliver any of the touted benefits. In fact, the second and third promises – smart content and interchangeability – appear to be in fundamental conflict. In sum, the more useful your markup is to you, the more it will cost you, and the fewer people share that cost.
In designing DITA as an architecture, we took aim at all three of the key benefits to see if we could assemble an XML solution that could bypass this traditional tradeoff. There are three separate areas that needed to be addressed:
Fixing the content
While XML makes a big deal about separating form from content, media differences aren't just about fonts and page breaks: they're about how you structure content as well. If you turn a book into a set of Web pages, it's still going to read like a book, with all the standard limitations of a book, just transposed to a medium where they aren't necessary. As long as you author in media-specific structures, you'll be dragging that first medium's assumptions along with you to every new output form you attempt.
So what does media-neutral content look like? It focuses on tasks and concepts, not on chapters and appendixes. It follows the same basic information design principles that have informed good manual design and good online design for decades: task orientation, minimalism, and scenario-based development. If you author tasks and concepts, rather than sections and paragraphs, you have the makings of a topic collection that can be reordered for different needs, supporting different task flows for different users, and supporting different reading paths for different media. You don't have to add conditional processing directives to your source, and you can have truly descriptive markup that applies regardless of medium. You can read more about this in the publications listed in the references.
Fixing the context
DITA maps, which organize collections of topics, allow users to separate content from context. Using a map structure, rather than embedded links, to drive topic organization allows reuse and repurposing of topics in substantially different contexts and automatic context-dependent link generation. For more information, see the references.
Maps can be used in a number of ways:
While any DITA map can produce a printable PDF, a formal book model needs more bibliographic content and a more traditional book presentation. Such books can reuse topics provided as part of online help or other information systems. For traditional books, the public DITA toolkit provides the bookmap specialization of the DITA map, a basic implementation that can be built on for a sophisticated book solution.
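The idea of letting a map, rather than embedded links, supply a topic's context can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TopicRef`, `related_links`, and the file names are hypothetical and are not part of the DITA toolkit. The sketch derives parent, child, and sibling links from the map hierarchy alone, so the topic files themselves carry no links.

```python
from dataclasses import dataclass, field


@dataclass
class TopicRef:
    """One entry in a (hypothetical) map: a topic file plus nested entries."""
    href: str
    children: list["TopicRef"] = field(default_factory=list)


def related_links(root: TopicRef) -> dict[str, dict[str, list[str]]]:
    """Derive parent, child, and sibling links for every topic in the map."""
    links: dict[str, dict[str, list[str]]] = {}

    def visit(node, parent, siblings):
        links[node.href] = {
            "parent": [parent.href] if parent else [],
            "children": [child.href for child in node.children],
            "siblings": [s.href for s in siblings if s is not node],
        }
        for child in node.children:
            visit(child, node, node.children)

    visit(root, None, [root])
    return links


# The same topic file appears in two maps and picks up different links in each.
admin_map = TopicRef("admin_overview.dita",
                     [TopicRef("install.dita"), TopicRef("backup.dita")])
user_map = TopicRef("quick_start.dita",
                    [TopicRef("install.dita"), TopicRef("first_run.dita")])

admin_links = related_links(admin_map)
user_links = related_links(user_map)
```

Because `install.dita` contains no links of its own, it can be reused in both maps; its neighbors are decided entirely by whichever map is being processed.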
Fixing the design
Two of the key benefits – smart content and interchangeability – require a change in the way we design XML documents. Traditionally, people have either created their own DTD, used a standard one, or customized a standard one.
DITA avoids the tradeoff by using a technique called specialization, which takes the well-established principles of inheritance and polymorphism from object-oriented design and applies them to the world of information design. This lets you create new designs by specializing existing designs. At the end of it, you get a customized solution that is still compatible with the existing infrastructure. You can customize as much as you want without breaking the infrastructure, and without compromising interchangeability. In other words, you can make your markup specific to your content and business rules, and still get the benefits of using a standard language.
The two main principles of specialization are modularization and inheritance: break your design into modules based on information type (such as concept and task) and domain (such as programming and user interfaces). Then map the modules into a hierarchy, so you can say, for example, that both "concepts" and "tasks" are kinds of topics.
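As a rough illustration of mapping modules into a hierarchy, the following Python sketch records which type each information type specializes and answers "is a kind of" questions from that record. The registry and function names are assumptions made for this example, and `apiref` is an invented type; DITA itself expresses these relationships in its DTDs, not in Python.

```python
# Hypothetical registry: each information type names the type it specializes.
# "topic" is the root; "apiref" is an invented further specialization.
SPECIALIZES = {
    "topic": None,
    "concept": "topic",
    "task": "topic",
    "apiref": "concept",
}


def ancestry(info_type: str) -> list[str]:
    """Return the chain from info_type up to the root, most specific first."""
    chain = []
    current = info_type
    while current is not None:
        chain.append(current)
        current = SPECIALIZES[current]
    return chain


def is_kind_of(info_type: str, base: str) -> bool:
    """True if info_type is base itself or a specialization of base."""
    return base in ancestry(info_type)


apiref_chain = ancestry("apiref")
task_is_topic = is_kind_of("task", "topic")
task_is_concept = is_kind_of("task", "concept")
```

Here both `task` and `concept` report that they are kinds of `topic`, while `task` is not a kind of `concept`: the two are siblings in the hierarchy, not ancestors of one another.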
Anecdotally, we've also found specialization to be a lot faster than traditional XML development. The example specialization described in our previous publications took only one or two days to develop and test, as opposed to the months it would have taken using more traditional methods.
Fixing the processing
Specialization lets us get output from newly designed content immediately, because the processes designed for higher-level information types apply by default to new, specialized information types. However, when the existing treatment isn't exactly what you want, you can modularize and override your processing just as you did for your design. This lets you get specific processing when you need it, again without compromising reusability or interchangeability.
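The default-plus-override behavior described above can be sketched as a lookup that walks from a specialized type toward its ancestors until it finds a process. DITA's own transforms are written in XSLT; Python is used here only to keep the sketch short, and every name in it (`PARENT`, `render`, the handler functions) is an assumption made for illustration.

```python
# Hypothetical type hierarchy: each type names the type it specializes.
PARENT = {"topic": None, "concept": "topic", "task": "topic"}


def render_topic(title: str) -> str:
    """Generic process written once for the base information type."""
    return f"<h1>{title}</h1>"


def render_task(title: str) -> str:
    """Optional override supplying task-specific treatment."""
    return f"<h1>Task: {title}</h1>"


def render(info_type: str, title: str, handlers: dict) -> str:
    """Use the most specific handler available, falling back to ancestors."""
    current = info_type
    while current is not None:
        handler = handlers.get(current)
        if handler is not None:
            return handler(title)
        current = PARENT[current]
    raise LookupError(f"no handler for {info_type} or its ancestors")


base_only = {"topic": render_topic}
with_override = {"topic": render_topic, "task": render_task}

default_output = render("task", "Install the product", base_only)
override_output = render("task", "Install the product", with_override)
```

With only the base handler registered, a task still produces output through the topic process; adding one override changes the treatment for tasks without touching the base process or the handling of concepts.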
Delivering on the promise
Following the principles of information typing and specialization in DITA, you can create task-oriented, audience-oriented information that is reusable, useful, and standardized, all at the same time. For a more in-depth discussion of these three areas, see the references. The next sections of the paper discuss how you can implement these principles in a writing team, and how you can use specialization with other extension techniques to customize DITA for your needs.
Working with DITA on a Team
Now that we've discussed the uses DITA is designed to support, let's look at how its use affects a team, and the responsibilities and rewards for each of the roles on the team. The roles and responsibilities related to DITA include the following:
The following table describes the responsibilities and rewards associated with each of these roles.
A team that wants to extend DITA has several options. We've discussed how specialization lets you create new information types and domains, but it is only one of the ways to get the fit you need, as follows:
Compare these three approaches with designing a new DTD from scratch (entries in brackets are optional):
By contrast, if you want specialized markup and behavior without using DITA, you must design from scratch:
Before working with DITA, you need to set up a publishing environment. You'll need a minimum of an XML editor, an XSLT processor, and the DITA package.
This section lists some typical tools in each area. To check the full range of options, you might want to visit a Web site for XML resources such as XML Software:
Installing an Editor
In most scenarios, you'll need a text editor to create your XML documents. If you have the gift of typing ever-valid XML, you can use your favorite text editor such as vi or Microsoft® Notepad. Otherwise, you might want to get a validating XML editor or editing solution such as one of the following:
Each of these editors has its own installation program.
In addition, the FrameMaker-DITA user's group has developed enablement for DITA within FrameMaker. To find out more, join the group at:
Installing an XSLT Processor
The XSLT processor turns your XML into HTML and other formats.
Some popular XSLT processors are Java™ programs. If you use one of these XSLT processors, you'll need to install a Java runtime (which can be downloaded for free from IBM or Sun):
Some versions of Java come with an installation program.
Some popular Java-based XSLT processors are:
To install these programs, you unzip the packages and add one or more jar files to the CLASSPATH variable in the environment for your system. Look in the subdirectories of the unzipped packages to find the documentation that explains which jar files are required.
If you are using a Linux system, you might be interested in a native Linux XSLT processor:
The final layer in this foundation is the release for the DITA architecture:
Note: by the time you read this paper, there may be a more recent package. Check the XML Cover page for DITA to find out the details. You'll unzip the DITA package and reference the DTD and XSLT files in the subdirectories.
In your topic documents, use a SYSTEM document type (or an XML catalog) to point to one of the installed DITA DTDs. To transform your topic documents, use the appropriate command line for your XSLT processor to apply one of the DITA XSLT scripts to your document. That's all there is to it.
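As a minimal sketch of the first step, the following Python builds a small topic document whose SYSTEM document type points at a DTD file, then checks that the result is well formed. The path `dtd/topic.dtd`, the helper name `make_topic`, and the sample content are assumptions for illustration; substitute the location of the DTDs in your unzipped DITA package. Note that the standard-library parser used here checks well-formedness only and does not validate against the DTD.

```python
import xml.etree.ElementTree as ET


def make_topic(topic_id: str, title: str, paragraph: str,
               dtd_path: str = "dtd/topic.dtd") -> str:
    """Return a small topic document with a SYSTEM document type.

    dtd_path is a hypothetical location; point it at your installed DTD.
    """
    root = ET.Element("topic", id=topic_id)
    ET.SubElement(root, "title").text = title
    body = ET.SubElement(root, "body")
    ET.SubElement(body, "p").text = paragraph
    markup = ET.tostring(root, encoding="unicode")
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE topic SYSTEM "{dtd_path}">\n'
        f"{markup}\n"
    )


document = make_topic("install", "Installing the product",
                      "Unzip the package into a working directory.")

# Well-formedness check only: the external DTD is not read or applied.
parsed = ET.fromstring(document)
```

The resulting file is what you would hand to your XSLT processor together with one of the DITA XSLT scripts, using the command line documented for that processor.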
Building output with Ant
Once you start transforming XML to output formats regularly, you might want to consider using a build manager. Such tools take over the repetitive manual tasks of building output, making it easier to generate output and, at the same time, improving reliability by eliminating manual mistakes.
In the Java environment, the standard build manager is the Ant tool:
You can use Ant to build output from your DITA source files. To learn more, look at the DITA-ant.html file in the doc directory of the DITA distribution.
Enterprise Publishing Tools
What you've installed so far is enough to experiment with or even to use productively in a small group. If you're working with a large publishing organization, however, you'll want to install a CMS (content management system).
Several commercial CMS solutions support generic XML or DITA in particular. For instance, many of the vendors providing the XML editors listed earlier also provide either a CMS or CMS integration. Here is an additional CMS vendor:
For Open Source content management systems, you might check out the Apache Lenya project or another CMS listed on the XML Software site at the start of this section.
DITA delivers on the promise of XML by focusing on fixing the content, context, design, and processing problems in generic XML. Using DITA on a production team involves creating new roles and responsibilities on the technical writing team. In addition, a key attribute of DITA is its methods of extensibility, each with its own specific costs and benefits. A strength of DITA is its reliance on XML standard tooling to get started.
Interest in DITA has increased rapidly following the founding of the OASIS TC. The imminent release of DITA 1.0 will give DITA users a stable base to build on and a substantially improved public toolkit. Through a standard architecture designed for reuse and extensibility, users can both share the benefits of DITA now and help to shape DITA's future.
Priestley, M. Specializing topic types in DITA. http://www.ibm.com/developerworks/xml/library/x-dita2/
Hennum, E. Specializing domains in DITA. http://www.ibm.com/developerworks/xml/library/x-dita5/
Priestley, M., Hargis, G., and Carpenter, S. (2001). DITA: An XML-based Technical Documentation Authoring and Publishing Architecture. Technical Communication, Volume 48, No. 3, pp. 352-367.
Schell, D.A., Priestley, M., Day, D.R., and Hunt, J. (2001). Status and directions of XML in technical documentation in IBM: DITA. Conference proceedings, Make IT Easy 2001. http://xml.coverpages.org/IBM-Easy2001-dita-1819.pdf
Priestley, M., and Schell, D.A. (2002). Specialization in DITA: Technology, process, and policy. Proceedings of the 20th annual international conference on Computer documentation. ACM: SIGDOC.
Hennum, E., Priestley, M., and Schell, D.A. (2002). Specialization in DITA. Extreme Markup Languages 2002 conference proceedings.
Hennum, E., Day, D., Hunt, J., and Schell, D.A. (2004). Design patterns for information architecture with DITA map domains: Defining a type for collections of topics.
XML Cover page about DITA: http://xml.coverpages.org/dita.html
DITA user's group: http://groups.yahoo.com/group/dita-users/
OASIS DITA Technical Committee http://www.oasis-open.org/committees/dita
Main developerWorks site: http://www.ibm.com/developerworks/xml/library/x-dita1/
DITA DTDs and transforms: http://www-106.ibm.com/developerworks/xml/library/x-dita6/x-dita_downloads.html
Trademarks: IBM, developerWorks, and Lotus are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft is a registered trademark of Microsoft Corporation in the United States, other countries, or both. Other company, product or service names may be trademarks or service marks of others.