Information Structure in MT (PhD project)

Patrizia Paggio
Center for Sprogteknologi, Njalsgade 80, DK-2300 Copenhagen S, Denmark


The goal of this thesis is to propose a treatment of information structure that can help an MT system generate more varied and coherent output than is possible in most current approaches. The main assumption behind this work is that the structure of the individual sentences in a written (technical) text is indicative of the way in which each sentence relates to the rest of the discourse. Thus, by suitably representing the information structure of a source sentence, it is possible to choose between competing syntactic structures in the target language that differ from a discourse perspective. On the basis of a discussion of previous approaches to the topic, a model is proposed whereby the information structure of a sentence can be represented in a unification-based formalism like HPSG by means of the three context-related features P_TOPIC, FOCUS and NEUTRAL. Specifications are then worked out for the analysis and generation of a number of linguistic constructions in Danish in terms of their syntactic, semantic and information structural properties. The issues dealt with range from the computation of the information structure of unmarked declarative sentences, which is achieved through heuristic principles based on definiteness, word order, and semantic properties of certain constituents, to the analysis and generation of clefts, existential sentences, topicalisation and extraposition. In all these cases, it is shown that adding the information structural layer of analysis to the syntactic and semantic analyses of the input makes it possible to account for the linguistic facts in a more accurate and complete way. Some of the specifications proposed have been implemented in the ALEP engineering platform; the positive results of the implementation provide additional confirmation of the soundness of the model adopted.

