An Association News
Technical Review
 

XML and HTML's future
by Simon St. Laurent
XML Editor

The hype surrounding XML has largely promoted it as a replacement for HTML. Despite that hype, HTML is alive and well (for now), and XML is mostly being used behind the scenes while browser vendors slowly add direct support for it. XML's role in the future of the Web became clearer, however, when the W3C more or less declared it the successor to HTML.

In April 1998, the W3C held a "Future of HTML Workshop" in San Jose to determine the future direction of the ubiquitous markup language. A few months after that conference, the W3C HTML activity statement changed to reflect that HTML 4.0 is the end of the line for old-style HTML. As the activity statement put it, "The proposed way forward was to make a fresh start with the next generation of HTML, based upon a suite of XML tag-sets."

Designers and developers hungry for more tools to create pages and applications that fit their specifications precisely may be surprised to hear it, but HTML has grown too big and unwieldy. A forest of tags makes it very difficult to design new browsers, causing problems for current browser developers and making it difficult for new firms to enter the market. Worse yet, HTML browsers were designed early on to forgive users' mistakes, which made it easier for neophytes to get started but harder for browsers to interpret documents. Browser developers were forced to keep up not only with the standard but also with their competitors' interpretations of 'broken' syntax. As Dynamic HTML has developed, these broken-syntax traditions have made it harder for programmers to figure out how a document will be interpreted when they manipulate it with scripts.

The answer to the problem seems to lie in a combination of discipline and division of labor. XML imposes much stricter discipline on documents: it requires end tags, the empty-element tag syntax (<br/> instead of <br>) to mark elements without content, and clean nesting of elements. This makes it much, much easier for parsers to determine the structure of a document, instead of requiring them to read a stream and make best guesses about where each element should end. The division of labor separates content from presentation. Using style sheets, either Cascading Style Sheets (CSS) or the upcoming Extensible Stylesheet Language (XSL), designers can build extremely fine-tuned pages designed to look good on screen or in print, and reuse the style sheets as needed.
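To make the difference concrete, here is a minimal sketch using the XML parser in Python's standard library (xml.etree.ElementTree, a modern tool the article does not discuss). The function name is_well_formed and the sample strings are illustrative assumptions. A strict XML parser either builds a tree or rejects the document outright; it does not guess where a missing end tag belongs.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)  # raises ParseError on any well-formedness error
        return True
    except ET.ParseError:
        return False

samples = {
    # End tags present, empty element written as <br/>, clean nesting.
    "xml_style": "<p>First line<br/>Second <b>bold</b> line</p>",
    # HTML-style <br> with no end tag: the parser hits </p> while <br> is open.
    "missing_end_tag": "<p>First line<br>Second line</p>",
    # Overlapping elements: </b> arrives while <i> is still open.
    "bad_nesting": "<p><b>bold <i>both</b> italic</i></p>",
}

for name, markup in samples.items():
    print(name, is_well_formed(markup))
# xml_style True
# missing_end_tag False
# bad_nesting False
```

An HTML browser would quietly render all three samples; the XML parser accepts only the first.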

Designers and developers accustomed to the current order are probably going to be fairly unhappy, at least at first. Because of limited browser support and overlapping functionality with HTML, style sheets remain relatively unknown and underused. The discipline XML imposes is going to cause more serious problems. Adding a '/' to empty tags isn't very difficult, but much 'good' HTML is going to need some serious cleaning. Even the code generated by some tools is oddly structured, so it'll be a long while before the entire Web is XML-approved. (It's certainly an opportunity for tool builders!) Developers who have built server-side scripts are in for a new round of debugging, though fixing the HTML should at least be simpler than fixing logic problems.
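The gap between the easy fix and the hard one can be sketched in Python as follows. This is a naive illustration under stated assumptions, not a real cleanup tool: close_empty_tags and is_well_formed are hypothetical helpers, the element list is a subset of the elements HTML 4.0 declares EMPTY, and the regular expression assumes no '>' appears inside attribute values. Adding the slash is mechanical; an unclosed <p> still needs a human (or a smarter tool) to decide where the paragraph ends.

```python
import re
import xml.etree.ElementTree as ET

# Subset of HTML 4.0 elements declared EMPTY (they never take an end tag).
EMPTY_ELEMENTS = ("area", "base", "br", "col", "hr", "img", "input",
                  "link", "meta", "param")

# Matches a start tag for one of those elements, with or without a trailing
# slash. Naive assumption: no '>' characters inside attribute values.
EMPTY_TAG = re.compile(
    r"<(%s)\b([^>]*?)\s*/?>" % "|".join(EMPTY_ELEMENTS), re.IGNORECASE)

def close_empty_tags(html: str) -> str:
    """Rewrite HTML-style empty tags (<br>) as XML empty-element tags (<br/>)."""
    return EMPTY_TAG.sub(r"<\1\2/>", html)

def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

easy = "<p>Line one<br>Line two<hr></p>"
hard = "<div><p>First paragraph<p>Second paragraph</div>"

print(close_empty_tags(easy))                  # <p>Line one<br/>Line two<hr/></p>
print(is_well_formed(close_empty_tags(easy)))  # True
print(is_well_formed(close_empty_tags(hard)))  # False: the <p> tags are never closed
```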

Future HTML development will focus on creating a modular HTML, one better suited to the needs of smaller devices and applications, while still capable (when all the modules are in use) of providing support for documents as complex as current HTML. It will also be easier to add new modules, like the recently approved MathML and SMIL (the Synchronized Multimedia Integration Language) to applications that need them. It looks like the current HTML will be broken down into parts: a core module, a module for tables, a module for forms, and a module for multimedia. These divisions, of course, may change.

The activity statement also says: "The next generation of HTML will concentrate on structuring data rather than attending to the nuances of presentation and layout." This will make HTML information much friendlier to databases and other content-focused applications, and make it possible to greatly improve the models for forms (finally) and tables.

When will these changes take place? According to the W3C's plan, "The working group is expected to last for 18 months, starting from Summer 1998." The browser vendors are still preparing basic XML support for their products, and the transition is likely to be a long one. In the end, the Web will look the same to most users, though it may be more convenient. For designers and developers, though, the entire landscape is changing. Planning ahead by exploring XML and style sheets now can make the new landscape look more like a construction site and less like a bombing target.

Simon St. Laurent is a web developer, network administrator, computer book author, and XML troublemaker working in Ithaca, NY.  His books include XML: A Primer, Dynamic HTML: A Primer, and Cookies.
