MT Summit VIII: Paper type U
Ape: Reducing the Monkey Business in Post-Editing by Automating the Task Intelligently
Claus Povlsen and Annelise Bech*
Center for Sprogteknologi
Njalsgade 80, DK-2300 Copenhagen S
* Lingtech A/S
DK-1620 Copenhagen V
For a professional user of MT, quality, performance and cost efficiency are critical. It is therefore surprising that so little attention, both in theory and in practice, has been given to the task of post-editing machine-translated texts. This paper focuses on this important user aspect and demonstrates that substantial cuts in time and effort can be achieved by implementing intelligent automatic tools. Our point of departure is the PaTrans MT-system, developed by CST and used by the Danish translation company Lingtech. An intelligent post-editing facility, Ape, has been developed and added to the system. We outline and discuss this mechanism and its positive effects on the output. The underlying idea of the intelligent post-editing facility is to exploit the lexical and grammatical knowledge already present in the MT-system’s linguistic components. Conceptually, our approach is general, although its implementation remains system-specific. Surveys of post-editor satisfaction and cost-efficiency improvements, as well as a quantitative, benchmark-based evaluation of the effect of Ape, demonstrate the success of the approach and encourage further development.
Keywords: Machine translation, user aspects, post-editing.
Machine translation is now a serious alternative to manual translation. Many organisations and businesses employ MT-systems, and they do so for various purposes. Some use MT-systems for information purposes (gisting of material in “exotic” languages), some as a basis for decision-making (as to which documents should be given high-quality translations) and some use machine translation in their production (providing an output to be post-edited and finalised before “publication”/delivery to the client).
Lingtech, a professional translation company in Copenhagen, Denmark, is one of the pioneers in using MT in its production of high-quality technical translations. Since 1993, Lingtech has used PaTrans to translate technical texts, primarily patents, and currently some 3.5 million words are run through the system every year.
The output from the MT-system is post-edited and finalised by Lingtech staff before a final translation is delivered to the client. Thus, Lingtech has a considerable interest in the time and money spent on this task.
(Lingtech has previously reported on the improved cost-benefit obtained by providing ancillary tools and setting up a suitable work-flow for preparing texts for machine translation and extending dictionaries (Bech, 1997).)
From the user's point of view, steps to ease and reduce the post-editing workload are not only an issue of profit maximisation, although that is obviously an incentive which cannot be neglected in the business context. The quality of the final product as well as the ergonomics and job satisfaction of the post-editors are also at stake.
Over the years, Lingtech has carefully and systematically monitored and registered trouble-areas in relation to the so-called peripheral tasks of pre- and post-editing. In close collaboration with the development team at Center for Sprogteknologi, Lingtech has strived to continually automate and facilitate the tasks involved.
An interesting recent development is the introduction of an intelligent strategy for minimising the burden of post-editing. A dedicated program component, Ape, has been added to the system. It exploits the linguistic knowledge of PaTrans and the grammatical information in the translated text to correct a number of “mistakes” that would otherwise have had to be dealt with in manual post-editing.
In the next sections, we will set the context and outline the fundamentals of the intelligent post-editor.
PaTrans and Post-Editing
PaTrans is a fully automatic, transfer-based machine translation system. The user prepares texts for translation, codes any necessary dictionary entries and then submits the text for translation. There is no interaction with the system in the translation process; the user takes over again when the system has finished and produced a translated text. The user’s task is then to post-edit the output in order to produce a flawless finalised version.
Based on the Eurotra linguistic model, PaTrans has an analysis component (source grammar and lexicon), a transfer component, and a synthesis component (target grammar and lexicon). The linguistic strategy involves processing the input text so as to produce representations of the input sentences that are neutral with respect to surface word order and function words. The linguistic representation is an argument and complement ordered structure with valency-bound lexical items and with function words encoded as information on the relevant nodes. The representation(s) of sentences are transferred to the synthesis component, which produces a surface sentence in the target language.
To make the system usable in practice, various robustness features have been implemented in PaTrans so that the system always produces a translation of an input sentence. Input sentences that for various reasons cannot be processed correctly or completely by the linguistic components proper are treated by the failsoft component of the system.
Although the PaTrans system produces output of quite high quality, the user is inevitably, as with any MT-system, faced with output that needs to be post-edited.
From the commercial point of view, the time and effort required for getting from translated output to a final version of the translation is crucial. The less post-editing is required or the easier it is to correct flaws, the better the cost-efficiency of the whole operation.
In the case of PaTrans, the output consists both of sentences that are well-formed translations according to the linguistic components of the system and of failsofted sentences. For the post-editor, the failsofted output typically requires the greatest attention and effort. These parts of the text often present the post-editor with hard-to-understand sentences or muddled-up word order; this is the more tedious and time-consuming (and hence costly) aspect of post-editing. Not surprisingly, in a controlled survey of post-editing problems conducted by Lingtech amongst our staff, word order problems were ranked as the number one irritation factor in post-editing.
With this in mind, we set out to remedy the situation; the result is the intelligent post-editing component Ape.
Ape: Basic Idea and Functionality
The quality of failsofted sentences, in terms of correct word order, depends to a large degree on where in the PaTrans translation process the failsoft mechanism has been activated. The lowest translation quality results if it is activated after the argument structure representation of the input sentence has been generated.
After the linguistic processing has finished, Ape traverses the generated translation results; those that have been tagged as failsofted are sorted on the basis of the indices of the content words in the source input sentence. This index information is assigned to the words in the input sentence and preserved throughout the processing flow.
One could easily think of languages in which the idea of using the original word order as a knowledge source for sorting the translation results would be inappropriate. The relatively high degree of similarity between English and Danish led to the assumption that implementation of the reordering idea would result in significantly improved translation quality. Various differences between English and Danish word order can, however, be observed. The positioning of adverbs is one example:
'Unfortunately, the secondary chamber is injection moulded in two halves which has caused several problems'
'Uheldigvis sprøjtestøbes det sekundære kammer i to halvdele, hvilket har forårsaget flere problemer.'
The example illustrates the inversion phenomenon in Danish. Whenever, among other things, an adverb is topicalised, the word order changes (unlike in English) so that the finite verb precedes the subject of the sentence in main clauses. Even though it would be possible to treat this difference between English and Danish, it was considered to be a minor problem and left out of the first version of the Ape program.
The overall sorting algorithm in the current version of Ape is thus the following:
If the sentence is failsofted
    if the indexing is out of order
        then sort all the words (except adverbs) in the sentence so that their order corresponds to the word order in the source input sentence
In addition to information about indexing and failsoftedness, Ape has access to information about part of speech of the words in the translated sentences and their morphology.
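The sorting rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the PaTrans implementation: the `Token` class, the `ADV` tag, the function names and the toy sentence are all hypothetical, and the choice to leave adverbs in their original slots while the remaining words are sorted by source index is only one possible reading of "besides adverbs".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    text: str   # target-language word form
    index: int  # position of the corresponding word in the source sentence
    pos: str    # part-of-speech tag; "ADV" marks adverbs (hypothetical tag set)


def is_out_of_order(tokens: List[Token]) -> bool:
    """True if the non-adverb tokens do not follow the source word order."""
    indices = [t.index for t in tokens if t.pos != "ADV"]
    return any(a > b for a, b in zip(indices, indices[1:]))


def ape_reorder(tokens: List[Token], failsofted: bool) -> List[Token]:
    """Reorder a failsofted sentence by source index, keeping adverbs in place."""
    if not failsofted or not is_out_of_order(tokens):
        return list(tokens)
    # Sort only the movable (non-adverb) tokens; sorted() is stable.
    movable = iter(sorted((t for t in tokens if t.pos != "ADV"),
                          key=lambda t: t.index))
    # Adverbs keep their slot; every other slot takes the next sorted token.
    return [t if t.pos == "ADV" else next(movable) for t in tokens]


# Toy input (invented for illustration, not taken from PaTrans output).
sentence = [
    Token("mekanisme", 2, "N"),
    Token("hurtigt", 3, "ADV"),
    Token("indstille", 1, "V"),
]
reordered = [t.text for t in ape_reorder(sentence, failsofted=True)]
untouched = [t.text for t in ape_reorder(sentence, failsofted=False)]
```

Sentences that are not tagged as failsofted pass through unchanged, which mirrors the outer condition of the rule.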
The following examples illustrate the Ape functionality. In the first example the reordered words are 'ind i containeren'; in the second they are 'anvendte' and 'beskrevet'.
The original input sentence:
The technique using an inert gas to form the secondary chamber and then only forming the orifice as the secondary chamber is placed into the container.
The translation result before running Ape:
Teknikken ind i containeren der anvender en inert gas for at danne det sekundære kammer og derefter kun at danne mundingen idet det sekundære kammer placeres
The translation result after running Ape:
Teknikken der anvender en inert gas for at danne det sekundære kammer og derefter kun at danne mundingen idet det sekundære kammer placeres ind i containeren.
The post-edited translation:
Teknikken, der anvender en inert gas for at danne det sekundære kammer og derefter kun danner mundingen som det sekundære kammer, placeres inde i containeren.
The original input sentence:
The copper compounds used as anti-oxidants in this invention may be chosen from those described in the document as suitable for lubricants.
The translation result before running Ape:
De anvendte kobberforbindelser som antioxidanter i denne opfindelse kan vælges fra de i dokumentet som egnet til smøremidler beskrevet.
The translation result after running Ape:
De kobberforbindelser anvendte som antioxidanter i denne opfindelse kan vælges fra de beskrevet i dokumentet som egnet til smøremidler.
The post-edited translation:
Kobberforbindelserne som er anvendt som antioxidanter i denne opfindelse kan vælges fra dem som er beskrevet i dokumentet som egnede til smøremidler.
As can be seen, Ape has sorted the words so that their order corresponds to that of the source input sentence. Although the results are still not perfect, the reordering has profoundly improved the translation quality and at the same time reduced the manual post-editing work.
In particular, the reordering of described/beskrevet in the second example has made it possible to grasp the overall meaning of the sentence.
Treatment of indexless items
One implication of a transfer-based approach with stepwise refinement is that function words are elevated (featurised) during analysis of the source language, since they are regarded as language-specific (for a thorough description, see EUROTRA, 1991). The function words are then inserted during generation of the target language (synthesis).
Consequently, no index information exists about the position of the source function words. The following briefly describes how Ape treats indexless words.
Indexless items are handled in two ways:
a) Attachment of specific constructions as a single indexed item before running Ape.
b) Definition of additional constraints in the overall sorting algorithm.
The attachment strategy is quite simple. Indexless words are attached to the (possibly distant) following indexed word. The one exception to this strategy is auxiliary verbs, which are attached to the next indexed verb.
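The attachment step can be sketched as follows. This is an illustrative sketch only: the `Word` and `Unit` classes, the part-of-speech tags (`AUX`, `V`, and so on) and the function name are assumptions, as is the decision to place pending auxiliaries in front of the other attached words. The text does not say what happens to indexless words with no following indexed word, so the sketch simply returns them separately.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    index: Optional[int]  # None for indexless (function) words
    pos: str              # hypothetical tags: "AUX" auxiliary, "V" verb, ...


@dataclass
class Unit:
    head: Word                                          # the indexed word
    attached: List[str] = field(default_factory=list)   # indexless words in front


def attach_indexless(words: List[Word]) -> Tuple[List[Unit], List[str]]:
    """Group each indexless word with the following indexed word.

    Auxiliaries wait for the next indexed *verb*; all other indexless
    words go to the next indexed word of any category.
    """
    units: List[Unit] = []
    pending: List[str] = []      # waiting for the next indexed word
    pending_aux: List[str] = []  # auxiliaries waiting for the next indexed verb
    for w in words:
        if w.index is None:
            (pending_aux if w.pos == "AUX" else pending).append(w.text)
            continue
        attached, pending = pending, []
        if w.pos == "V":
            # Assumption: auxiliaries precede the other attached words.
            attached, pending_aux = pending_aux + attached, []
        units.append(Unit(w, attached))
    # Indexless words with nothing to attach to are returned unchanged.
    return units, pending_aux + pending


words = [
    Word("en", None, "DET"),
    Word("mekanisme", 2, "N"),
    Word("for", None, "P"),
    Word("at", None, "INF"),
    Word("indstille", 1, "V"),
]
units, leftovers = attach_indexless(words)
```

Once grouped in this way, each unit carries a single index and can be moved as a whole by the sorting step.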
The additional constraints on the overall sorting algorithm are the following:
If (the difference between the current word's index and that of the following word is one)
    if (the following word has initial indexless items and the current word has a higher index)
        then output those indexless items before outputting the current word.
As an illustration consider the following example of the extended sorting procedure (lower case letters represent indexless words, the numbers are indexed words and underscore expresses attachment):
The original input sentence:
'… adjusting a mechanism …'
The translation result before running Ape:
' … en(a)_ mekanisme(2) for(b)_ at(c)_ indstille(1) …'
In this example all the conditions are met. The difference between the index of the current word (2) and the index of the following word (1) is one, the following word (1) has initial indexless items (b, c), and the index of the current word is higher than that of the following word. The result after running Ape is:
' … for at indstille en mekanisme …'
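The worked example can be traced in a few lines of Python. This is a sketch of one possible reading of the extended rule, not the actual Ape code: the `Unit` class and both function names are hypothetical, and the sketch assumes that once the indexless items of the following word have been output, the two words appear in source order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    attached: List[str]  # indexless words attached in front of the head
    head: str            # the indexed word
    index: int           # source position of the head


def constraint_applies(current: Unit, following: Unit) -> bool:
    """Check the three conditions of the extended sorting rule."""
    return (
        abs(current.index - following.index) == 1  # indices differ by one
        and bool(following.attached)               # following word has indexless items
        and current.index > following.index        # current word has the higher index
    )


def order_pair(current: Unit, following: Unit) -> List[str]:
    """Output two adjacent units, applying the extended rule when it holds."""
    if constraint_applies(current, following):
        # The following word's indexless items come first, and the two
        # words then appear in source order (an assumed reading).
        return (following.attached + [following.head]
                + current.attached + [current.head])
    # Otherwise the pair is left as it is for the general sort to handle.
    return (current.attached + [current.head]
            + following.attached + [following.head])


# ' ... en(a)_ mekanisme(2) for(b)_ at(c)_ indstille(1) ...'
current = Unit(["en"], "mekanisme", 2)
following = Unit(["for", "at"], "indstille", 1)
result = " ".join(order_pair(current, following))
```

With the units from the example, `result` reproduces the reordered phrase given in the text.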
The treatment of indexless items is by definition language-specific, so in terms of generalisation the Ape handling of function words is restricted to Danish. The overall concept of reordering failsofted sentences based on the word order of the source input sentence, however, has (cf. below) a more general perspective.
Two types of evaluation of the effect of Ape have been carried out, reflecting the two convergent, yet complementary, interests of end-user and system developer.
The user-oriented evaluation is predominantly subjective and focuses on the qualitative aspects in terms of how the post-editor experiences the output when Ape has been applied. That is, the evaluation indicates the effect on post-editor satisfaction and task ergonomics.
Complementing this user-oriented evaluation is the calculation of the effect on cost-efficiency, an important parameter in the commercial setting.
Thus, Lingtech conducted focused interviews with a number of post-editors. The conclusion was, not surprisingly, that the post-editors felt that the number of “difficult” sentences had been reduced considerably.
Lingtech also calculated the improvements in cost-efficiency by comparing the average number of post-edited words before and after the introduction of Ape. Again, the results were encouraging, in that performance improvements of between 10 and 15% were recorded. In other words, after the introduction of Ape, the overall time spent on getting from machine-translated output to finalised text was reduced.
In order to obtain a quantitatively more precise measure of the Ape functionality, the following evaluation procedure is being performed. First, a representative corpus of failsofted sentences is identified and collected. Then parallel translation results are produced by running PaTrans with and without Ape, respectively. Both sets of results are then compared automatically with the post-edited, and thereby final, versions of the translations, which function as a benchmark representing the satisfaction rating. The comparison procedure goes through the parallel translation results, aligns them with the benchmark results, and then compares the performance of the two PaTrans systems. In this way an approximate measure of the Ape functionality, in terms of reordering adequacy and thereby of reduced post-editing, is achieved.
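The comparison against the benchmark might be sketched as follows. The similarity measure is not specified in the text, so this sketch substitutes a token-level `difflib.SequenceMatcher` ratio purely as a stand-in; the function names and the toy sentences are likewise assumptions, and sentence alignment is taken as given (one output per benchmark sentence, in the same order).

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(candidate: str, benchmark: str) -> float:
    """Token-level similarity in [0, 1] between an MT output and the benchmark."""
    return SequenceMatcher(None, candidate.split(), benchmark.split(),
                           autojunk=False).ratio()


def compare_systems(without_ape: List[str], with_ape: List[str],
                    benchmark: List[str]) -> Dict[str, float]:
    """Average similarity to the post-edited benchmark for both system variants."""
    if not benchmark or not (len(without_ape) == len(with_ape) == len(benchmark)):
        raise ValueError("the three lists must be non-empty and of equal length")
    n = len(benchmark)
    return {
        "without_ape": sum(similarity(c, b)
                           for c, b in zip(without_ape, benchmark)) / n,
        "with_ape": sum(similarity(c, b)
                        for c, b in zip(with_ape, benchmark)) / n,
    }


# Toy data for illustration only; a real run would use the collected
# corpus of failsofted sentences and their post-edited versions.
benchmark = ["for at indstille en mekanisme"]
without_ape = ["en mekanisme for at indstille"]
with_ape = ["for at indstille en mekanisme"]
scores = compare_systems(without_ape, with_ape, benchmark)
```

A higher average for the variant with Ape would indicate that its output is closer to the post-edited text and therefore needs less reordering by hand.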
The details of the latter evaluation procedure will be presented and elaborated on in the final version of this paper.
When machine translation is used to produce translations of publishable quality, post-editing (i.e. correcting flaws and mistakes in the machine-translated output) is an important task that deserves serious attention. As has been argued previously in Bech (1997) and also in this paper, cleverly automating peripheral tasks and providing suitable tools for the human tasks to be performed in relation to the commercial usage of machine translation are critical parameters for success.
In this paper we have presented an innovative approach to easing the burden of post-editing, going beyond providing an environment with pre-implemented short-cut key operations for the repetitive types of corrections to be made by the post-editor. The basic idea of the Ape strategy is to exploit the linguistic information present in the text to ‘repair’ different kinds of flaws which are tedious to deal with manually in post-editing. We have demonstrated the viability of the strategy by its practical implementation in PaTrans and its positive effect on output quality as reviewed by post-editors (qualitative evaluation) and in terms of a benchmark-based, quantitative evaluation.
As any serious, modern MT-system exploits linguistic knowledge in its processing, the fundamental idea behind the approach presented here is generalisable to other MT-systems and scenarios. Finally, the encouraging results we have obtained with our present version of Ape lead us to work on future developments and further enhancements.
Bech, A. (1997). MT from an Everyday User’s Point of View. In Proceedings of MT Summit VI (pp. 98–105). San Diego, CA: AMTA.
EUROTRA (1991). Copeland, C., Durand, J., Krauwer, S. & Maegaard, B. (Eds.), Studies in Machine Translation and Natural Language Processing, Vol. 1. Luxembourg: CEC.
Maegaard, B. & Hansen, V. (1995). PaTrans – Machine Translation of Patent Texts. From Research to Practical Application. In Convention Digest: Second Language Engineering Conference (pp. 1–8). London.
Povlsen, C., Underwood, N., Music, B. & Neville, A. (1998). Evaluating Text-type Suitability for Machine Translation: a Case Study on an English-Danish MT System. In Proceedings of the First International Conference on Language Resources & Evaluation (pp. 27–32). Granada.
Ørsnes, B., Music, B. & Maegaard, B. (1996). PaTrans – A Patent Translation System. In Proceedings of COLING (pp. 1115–1118). Copenhagen.
 PaTrans was developed by Center for Sprogteknologi for Lingtech. The system translates from English to Danish. The system is described in (Ørsnes et al., 1996; Maegaard & Hansen, 1995).
 For example, the input may be ungrammatical, contain lexical items that have not been coded, or fall outside the linguistic coverage of the system.
 Please bear in mind that both sentences are failsofted, so the translation results are not perfect.
 Since these words (and thereby their position) are to some extent language-specific, this information would in any case be inappropriate as a knowledge source for improving the translation results.
 In this evaluation setup we think of PaTrans as being two systems, PaTrans with and PaTrans without the Ape functionality.