3-99 (September 1999)
by Stefan Freisler

TO12: Word as HTML/XML/SGML-Editor

using MarkupKit 1.1 from SCHEMA

As insiders know well, the HTML export built into Word 97 produces poor results. The generated HTML contains many extraneous tags that make further processing difficult. Several macro programmers and software vendors have taken up the issue and offer converters that promise to solve the problem.

Most of these converters aim to reproduce the print layout as faithfully and as automatically as possible, so that users do not have to spend time and effort on configuring and structuring afterwards. Yet one look at the ‘fantastic’ output of such systems shows that exactly this configuring and structuring is a precondition for a sensible conversion.

Information that is not contained in a document will not suddenly appear through automatic conversion, and vendor announcements will not change this.

 
The Different Approach

Our approach builds on the fact that authors of longer texts usually structure their documents with paragraph styles and character styles. MarkupKit analyses these styles, so that by configuring the converter the user can produce syntactically correct, ‘clean’ HTML, XML and even SGML.

In practice

Once MarkupKit is installed, a new file type, "MarkupKit Document", appears in the File > Save As dialogue of Word 97. After the configuration files have been adapted to the styles used in the document, the user selects File > Save As > "MarkupKit Document", and the conversion runs invisibly behind the scenes.

The advantage of this close integration lies not only in ease of use but also in the ability to start the converter from Visual Basic for Applications (VBA), which allows further integration into customised software systems.

Easy configuration

To configure MarkupKit, you define a so-called "mapping" in XML syntax for every paragraph style, character style and special character used in Word. For example, if you have defined a paragraph style named "body text" in Word, the mapping is defined as follows:

<MAP TAGNAME="P" PARSTYLENAME="BODY TEXT">

</MAP>

The name of the paragraph style (PARSTYLENAME="BODY TEXT") is assigned to an HTML tag (TAGNAME="P"). That's all!
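To illustrate the idea behind the mapping above, here is a minimal Python sketch of a style-to-tag lookup. It is not MarkupKit's implementation. The wrapping `<MAPPINGS>` root element, the second mapping entry, the case-insensitive style matching and the function names `load_mapping` and `convert` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Mapping in the syntax shown above. The <MAPPINGS> root element and the
# "HEADING 1" entry are assumptions added for this sketch.
MAPPING_XML = """
<MAPPINGS>
  <MAP TAGNAME="P" PARSTYLENAME="BODY TEXT"></MAP>
  <MAP TAGNAME="H1" PARSTYLENAME="HEADING 1"></MAP>
</MAPPINGS>
"""

def load_mapping(xml_text):
    """Return a dict from upper-cased paragraph style name to tag name."""
    root = ET.fromstring(xml_text)
    return {m.get("PARSTYLENAME").upper(): m.get("TAGNAME")
            for m in root.iter("MAP")}

def convert(paragraphs, mapping):
    """Wrap each (style_name, text) pair in the tag mapped to its style."""
    lines = []
    for style, text in paragraphs:
        # Assumption: style names are matched case-insensitively.
        tag = mapping.get(style.upper())
        if tag is None:
            raise KeyError(f"no mapping defined for paragraph style {style!r}")
        # Escape &, < and > so the output stays syntactically correct.
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)

mapping = load_mapping(MAPPING_XML)
document = [("Heading 1", "Features"), ("body text", "Clean & simple markup.")]
print(convert(document, mapping))
```

Raising an error for an unmapped style, rather than guessing a tag, reflects the article's point that structure missing from the configuration cannot be invented by the converter.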

In addition to the converter plug-in for Word 97, MarkupKit includes a command-line version that converts RTF files from other word-processing systems.

 
Features of conversion

In addition to paragraph styles and character styles, MarkupKit processes the following features:

MarkupKit can not only transform referenced images into the appropriate image references, e.g. for HTML, but also automatically convert graphic objects inserted in Word into different graphic formats. For this purpose MarkupKit includes a graphics converter that supports several output formats.
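As a rough illustration of what "transforming a referenced image into an image reference" can mean for HTML, the following Python sketch rewrites a source image path to a target format and emits an `IMG` tag. It only rewrites the reference and does not convert any image data. The function name `image_reference`, the default target format and the example paths are hypothetical and not taken from MarkupKit.

```python
from pathlib import PurePosixPath
from xml.sax.saxutils import quoteattr

def image_reference(source_path, target_format="gif"):
    """Build an HTML image reference pointing at the converted file.

    Hypothetical helper: it swaps the file extension for the target
    format and quotes the result as an attribute value.
    """
    converted = PurePosixPath(source_path).with_suffix("." + target_format)
    return f"<IMG SRC={quoteattr(str(converted))}>"

print(image_reference("images/chart.wmf"))
```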

Conclusion

MarkupKit is a practical and reliable conversion tool. It shows its strengths if you have some knowledge of HTML and frequently need to convert a considerable number of texts.

MarkupKit was not designed to compete with converters that try to turn complete Word documents into visually demanding HTML. It is a converter for transforming the content and structure of Word documents into syntactically correct markup of any kind.

 

© TC Forum 1998-2001 - http://www.tc-forum.org - file last updated 17 Oct 1999