Coding XML


Constrained authoring in XML

Posted by admin at 23 April 2009

The choice of clear text for the storage of XML documents, rather than any binary format, is important, if controversial for some. A binary solution would have been more efficient from a machine-processing point of view and would allow much smaller files. Having XML documents in plain text, however, means that users can intervene manually to examine or edit them if they wish. Keeping both the content and the containing markup as plain text allows users and programs alike to identify easily not only the structures foreseen by an author but also the intention of those structures.
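To make the point about plain text concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The e-mail-like document and its element names (`email`, `from`, `to`, `subject`, `body`) are hypothetical, chosen only for illustration. The same characters can be read by a person in any text editor and parsed by a program that recovers the named structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, stored as ordinary plain text.
# The element names spell out what each piece of content is.
doc_text = """<email>
  <from>Anna</from>
  <to>Marco</to>
  <subject>Project schedule</subject>
  <body>The review meeting moves to Friday.</body>
</email>"""

# A person can inspect (or edit) the raw text directly...
print(doc_text)

# ...and a program can parse the very same text and recover its structure.
root = ET.fromstring(doc_text)
fields = {child.tag: child.text for child in root}

print(root.tag)           # email
print(fields["subject"])  # Project schedule
```

Nothing beyond the text itself is needed to interpret it: the tag names carry the author's intention along with the content.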

In reality, many XML systems, whether they process or store XML, use proprietary binary formats internally to improve operational efficiency. XML in its native plain text form is most useful when exchanging information between systems.

As we saw in the opening chapter, there is no longer a premium on ‘terseness’, so the added verbosity of XML is not a problem from the point of view of storage. Literally ‘spelling things out’ is the key to XML’s role in interoperability between information systems.

Posted in: Uncategorized | No Comments »

Posted by admin at 20 April 2009

Much information content is in fact quite tightly structured, although those structures are often only implicit in the way authors prepare their content. As we saw in the example of e-mail in the previous chapter, some structures are more explicit, often because they are needed for the automatic processing of the content or to improve its presentation and management.

To ensure processability, and to underline the separation of content, form and presentation, it is useful to make authors respect agreed structures at the point of creation. In this way, authors are required to place all content within appropriate markup tags. This requires both that these structures are defined and that they are available to authors in a form they can conveniently use.
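What "respecting agreed structures" can mean in practice is sketched below in Python. Real XML systems express such rules in a DTD, XML Schema or RELAX NG; the `CONTENT_MODEL` dictionary, the `validate` function and the element names are hypothetical stand-ins, written in the spirit of a DTD declaration such as `<!ELEMENT article (title, author, para+)>`. Each pattern is a regular expression matched against the sequence of child element names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical agreed structure. Each pattern is matched against the
# space-separated sequence of child element names; an empty pattern
# means "text only, no child elements allowed".
CONTENT_MODEL = {
    "article": r"title author( para)+",
    "title": r"",
    "author": r"",
    "para": r"",
}

def validate(elem, model=CONTENT_MODEL, path=""):
    """Return a list of structural violations; an empty list means valid."""
    here = f"{path}/{elem.tag}"
    if elem.tag not in model:
        return [f"{here}: element not defined in the agreed structure"]
    children = " ".join(child.tag for child in elem)
    errors = []
    if re.fullmatch(model[elem.tag], children) is None:
        errors.append(f"{here}: children '{children}' do not match '{model[elem.tag]}'")
    # Check every child against its own rule as well.
    for child in elem:
        errors.extend(validate(child, model, here))
    return errors

good = ET.fromstring(
    "<article><title>Constrained authoring</title><author>admin</author>"
    "<para>First point.</para><para>Second point.</para></article>"
)
bad = ET.fromstring("<article><para>No title yet.</para></article>")
stray = ET.fromstring(
    "<article><title>T</title><author>A</author><para>x<b>y</b></para></article>"
)

print(validate(good))   # []
print(validate(bad))    # missing title and author
print(validate(stray))  # <b> is not part of the agreed structure
```

An authoring tool built on such rules can refuse, or at least flag, content that falls outside the structure as it is being written, rather than after the fact.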

Although it is possible to write an XML document directly, as can also be done with HTML, authoring tools have inevitably emerged to help the user concentrate more on content and less on defining the actual tags. Unfortunately, however, many XML-compliant applications still make such structured authoring look too much like the form-filling front end of a database and not enough like authoring tools. Further, what authoring functions do exist are more concerned with presentational constructs than with the business of providing relevant structures to a text. That the structures are agreed is important. Users could be constrained to use a structure defined by someone else, as we saw with the example of e-mail, but XML offers more: it allows users, user communities and enterprises to develop and agree shared structures to be used for given types of information and text.

Posted by admin at 19 April 2009

I remember my very first text processor. It was a small program that ran on a VAX750 minicomputer, and it very literally ‘processed’ my text. Three stages had to be rigorously followed:

  • Firstly, I would write my content in a simple text editor, oblivious to format, layout or pagination
  • Secondly, I would annotate or ‘mark up’ my text with processing instructions such as ‘from here: in bold’ and ‘from here: end bold’ or ‘define a page header’, and save the file
  • Finally, this ‘source file’ would be processed – or ‘parsed’ – through the dedicated word-processing batch program on the computer, generating a formatted output file designed to be passed to a printer.
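The three stages above can be sketched as a toy batch formatter in Python. The dot-command syntax (`.header`, `.bold`, `.endbold`), the `process` function and the page length are all hypothetical, invented for illustration; this is not the program that ran on the VAX. Because the output is plain text, bold is stood in for by upper case, and pages are separated by a form feed (`\f`), the traditional page-break character for line printers.

```python
def process(source: str, page_length: int = 3) -> str:
    """Batch-process a marked-up source file into paginated plain-text output."""
    header = ""
    bold = False
    body_lines = []
    # Pass 1: interpret the markup commands; everything else is content.
    for line in source.splitlines():
        if line.startswith(".header "):
            header = line[len(".header "):]
        elif line == ".bold":
            bold = True
        elif line == ".endbold":
            bold = False
        else:
            # Upper case stands in for bold in plain-text output.
            body_lines.append(line.upper() if bold else line)
    # Pass 2: paginate, repeating the page header at the top of each page.
    pages = []
    for start in range(0, len(body_lines), page_length):
        page = [header, "-" * len(header)] + body_lines[start:start + page_length]
        pages.append("\n".join(page))
    return "\f".join(pages)

# Stage 1 and 2: content written, then marked up, in a plain text file.
source = """.header Quarterly report
The figures are in.
.bold
Sales rose this quarter.
.endbold
Costs were stable.
Details follow."""

# Stage 3: the batch run produces output destined for the printer.
out = process(source, page_length=3)
print(out)
```

As with the original system, the author only sees the result after the batch run, which is exactly why predicting the final layout was hard.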

The principal drawbacks of this rather primitive system are obvious: very limited control over presentation and the difficulty of predicting results. However, this supposedly primitive approach embodies one of the central principles and strengths of XML: that content, structure and presentation are three distinct facets of information management and must be treated as such.

The output from this early 1980s system reflected the limited printing and processing possibilities of the time rather than any conceptual flaw. The same cannot be said of many popular so-called ‘WYSIWYG’ (‘what you see is what you get’) word-processing systems, where presentation and content are entirely mixed, with few opportunities to identify the underlying structure of the information. Text is not simply a stream of words, even if it might seem so to many authors and readers alike. It has structures, from the simplest titles and paragraphs upwards, and each is presented to the reader or user in a particular way. This structure will normally precede the actual content, even if only implicitly.
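The difference between recording appearance and recording structure can be shown with a small Python sketch using `xml.etree.ElementTree`. Both fragments and their element names (`b`, `section`, `title`, `para`, `emph`) are hypothetical, and the `STYLE` mapping with its `render` function is an illustrative stand-in for what a real stylesheet (XSLT or CSS, for example) would do. In the WYSIWYG-style fragment a heading and an emphasised word both end up as "bold", so a program cannot tell them apart; in the structural fragment titles are found directly, and presentation is applied afterwards in one place.

```python
import xml.etree.ElementTree as ET

# WYSIWYG-style markup: only the appearance is recorded.
presentational = ET.fromstring(
    "<doc><p><b>Constrained authoring in XML</b></p>"
    "<p>Authors must use <b>agreed</b> structures.</p></doc>"
)

# Structural markup: the same text, labelled by what each part is.
structural = ET.fromstring(
    "<doc><section><title>Constrained authoring in XML</title>"
    "<para>Authors must use <emph>agreed</emph> structures.</para></section></doc>"
)

# A heading and an emphasised word look identical as bold runs.
bold_runs = [b.text for b in presentational.iter("b")]
# With structural markup, titles are identified unambiguously.
titles = [t.text for t in structural.iter("title")]

# Hypothetical presentation rules, kept separate from the content.
STYLE = {"title": "h1", "para": "p", "emph": "em"}

def render(elem):
    """Map structural elements to HTML-like tags (no escaping, for brevity)."""
    inner = (elem.text or "") + "".join(render(c) + (c.tail or "") for c in elem)
    tag = STYLE.get(elem.tag)
    return f"<{tag}>{inner}</{tag}>" if tag else inner

print(bold_runs)  # ['Constrained authoring in XML', 'agreed']
print(titles)     # ['Constrained authoring in XML']
print(render(structural))
```

Changing the look of every title means editing one entry in `STYLE`, not every document; with purely presentational markup there is no such single point of control.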