by Jeff Allen
There has been much discussion of Controlled Language (CL) in past issues of TC-Forum. With several years of experience as a translator, as a trainer of Controlled English writing and translation post-editing, and as a developer of Machine Translation (MT) and Translation Memory (TM) systems, I would like to clarify some points that do not seem to have been presented in other articles. These points do not cover every detail of possible CL systems, but I hope that they open up the discussion to cover both past and recent developments in CL system and application research and development.
Limited vocabulary CLs
In the early 1970s, Caterpillar developed Caterpillar Fundamental English (CFE), a restricted vocabulary of some 850 words, as a way of simplifying its technical English so that non-native English-speaking clients could read the documents more easily. This is similar to Ogden's Basic English of the 1930s (see http://web.marshallnet.com/~manor/basiceng/ramble.html) [broken link as of 13 JAN 00 - AvO].
Boeing and other companies in the aerospace industry have built upon Caterpillar's original work on CFE. The emphasis is on creating a core of lexical items that can be used throughout a document. A number of general technical writing rules (e.g. write short sentences) are also promoted, but the grammatical rules are not usually strictly enforced. If a conformance checker is used, it mainly checks for adherence to the approved vocabulary rather than overall grammatical structure.
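To make the idea of a vocabulary-only conformance check concrete, here is a minimal sketch in Python. It is not the checker used by Caterpillar, Boeing, or anyone else: the word list, the function name `find_unapproved_words`, and the tokenization rule are all assumptions made for illustration. It only shows the general principle of flagging words that fall outside an approved lexicon while ignoring grammatical structure.

```python
import re

# Hypothetical approved vocabulary, for illustration only.
# A real limited-vocabulary CL would list several hundred words.
APPROVED_WORDS = {"the", "engine", "stop", "before", "you", "remove", "filter", "oil"}


def find_unapproved_words(sentence, approved=APPROVED_WORDS):
    """Return words not in the approved vocabulary, in order of first appearance."""
    # Lowercase the sentence and keep alphabetic words (with an optional apostrophe part).
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    unapproved = []
    for word in words:
        if word not in approved and word not in unapproved:
            unapproved.append(word)
    return unapproved


if __name__ == "__main__":
    print(find_unapproved_words("Stop the engine before you remove the oil filter."))
    print(find_unapproved_words("Shut down the engine prior to removing the filter."))
```

Note that this sketch treats inflected forms such as "removing" as unapproved even though "remove" is in the list. Whether inflections are accepted is a design decision that a real lexicon would have to state explicitly.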
Extended vocabulary CL grammar conformance checker
Conformance checkers are the new wave of CL writing. The Simplified English Checker/Corrector (SECC) project was completed in 1994 and resulted in a basic conformance checker. It checks for grammatical structures that do not conform to Simplified English (SE) examples. It is interactive in that it indicates where the CL writing sample deviates. In addition, Caterpillar Technical English (CTE), which Caterpillar launched in the early 1990s, is quite different from the original CFE.
CTE started with a reduced vocabulary (8 000 general terms and 50 000 technical terms selected from a total of approximately 1 million terms) and a set number of constrained syntactic constructions in English that can be mapped into about 10 other languages. As indicated in a recent article on the subject (Kamprath et al., 1998), new technical terms are constantly being added to the CTE database for approval and then are submitted to human translators who then provide translations in their respective languages and add them to the multilingual database. The current number of English technical terms is approximately 70 000.
The objectives of CTE are better standardization of English terminology, better comprehension of the English documentation by native and non-native English readers, and easier translation into 13 target languages (by both MT *and* human translation processes). The goal of CTE is thus quite different from that of its predecessor.
Stop-and-Go Authoring
Most CL authoring systems today are called "Stop-and-Go" or "Red light / Green light" systems. The author works on the entire text and then submits it to the conformance checker. The checker then goes through each sentence one at a time and notifies the author of potential spelling mistakes, ambiguity pitfalls for translation, etc.
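The batch workflow described above can be sketched as follows. This is only an illustration under stated assumptions, not the design of any actual CL checker: the 20-word limit, the list of ambiguous words, the function name `check_text`, and the mapping of "no issues" to a green light are all hypothetical. The finished text is submitted as a whole, and the checker then reports on each sentence in turn.

```python
import re

MAX_WORDS = 20                          # hypothetical sentence-length limit
AMBIGUOUS_WORDS = {"right", "replace"}  # hypothetical list of translation pitfalls


def check_text(text):
    """Check a complete text sentence by sentence.

    Returns a list of (sentence_number, light, issues) tuples, where
    light is "green" if no issue was found and "red" otherwise.
    """
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [part.strip() for part in parts if part.strip()]

    report = []
    for number, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[A-Za-z']+", sentence)
        issues = []
        if len(words) > MAX_WORDS:
            issues.append(f"sentence has {len(words)} words (limit {MAX_WORDS})")
        for word in words:
            if word.lower() in AMBIGUOUS_WORDS:
                issues.append(f"possibly ambiguous word: {word}")
        light = "red" if issues else "green"
        report.append((number, light, issues))
    return report


if __name__ == "__main__":
    for number, light, issues in check_text("Stop the engine. Turn the right valve."):
        print(number, light, issues)
```

The point of the sketch is the order of events: the author receives no feedback while writing, only after the whole text has been submitted.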
Interactive Authoring Systems
Some research is being done (Hartley and Paris, in preparation) on the development of interactive authoring systems that could assist authors who are writing technical texts, similar to how Computer-Aided Translation (CAT) tools assist a human translator to produce the target translation of a source text.
Destined or Chosen for Translation
I make a distinction between texts that are "destined for translation" (i.e. it has been decided before writing starts that the original will be translated), and texts that are "chosen for translation" after writing the source text. When a company such as Caterpillar or General Motors decides that all manuals that are produced are destined to be translated from the very inception of the document, it is easier to persuade management and technical authoring staff to implement writing principles that will improve translatability of the texts.
If a text is meant to be produced and read only in the source language (e.g., the Starr Report and Clinton Rebuttal), yet someone decides to feed it through an MT system, as did AltaVista using the Systran Babelfish on-line translation service (Alberganti, 1998), the resulting text will most likely be quite unsatisfactory, because the text was not written with the intent that it would be translated, and especially not by a machine. The objective of CL applications for technical writing is to anticipate the need for document translation, and to create structural paradigms that allow a computational system to optimally retrieve target-language equivalents for texts written in a controlled source language.
Different Types of Translation Systems
I would be willing to discuss in a future issue the different types of translation systems (i.e. Fully Automated Machine Translation, Machine Assisted Human Translation, Human Assisted Machine Translation, etc.) and how CLs have contributed to the evolution of these translation strategies.
Conclusion
Controlled Language is not a single, immutable entity. It has evolved over the decades and has taken form in different applications and for different purposes. Many companies have taken the general concept and then customized it within their own environments to make it profitable for their specific needs. It is only now in the late 1990s that the different CL players are starting to work together by forming the National Consortium to Advance Controlled Language and Computer-Aided Translation Tools (NCCAT). Their focus is to create general CL and training principles that will allow for cross-industry standards in this emerging field.