Abstract Anaphora in Danish (The abstract Det)

Funded by the Danish Research Council for the Humanities

Project aims

The project's main aim is to develop a formal model of the use of Danish pronominal abstract anaphora.

What are abstract pronominal anaphora?

Abstract pronominal anaphora in Danish comprise the pronouns det (it/this/that), dette (this) and det her (this) when they point back to verb phrases, predicates in copula constructions, clauses, discourse segments or abstract pronouns (the antecedents). These anaphors refer to abstract entities such as events, states, situations, facts and propositions. An example of an abstract anaphor is the following:

og så prøvede jeg så at gå lidt i svømmehallen og det prøver jeg sådan ind imellem [Samtale med Lægen (Duncker & Hermann, 1996)]
(and then I tried to go a little to the swimming pool and I try that once in a while)

In this example, the pronoun det has as its antecedent the clause at gå lidt i svømmehallen (to go a little to the swimming pool) and refers to an activity.

Why are the Danish abstract anaphora interesting?

The Danish abstract anaphora are quite frequent, especially in spoken language, and they refer to the most central concepts in discourse. They are used in more contexts than the corresponding pronouns in English and Italian. Defining a model that identifies the entities abstract anaphora refer to is important for the automatic processing of discourse, and it might give better insight into the cognitive processes involved in the production and reception of discourse, where morphological, syntactic, semantic and pragmatic aspects interact.

Background

The project takes its starting point from Costanza Navarretta's PhD dissertation "The Use and Resolution of Intersentential Pronominal Anaphora in Danish Discourse". The dissertation indicates that some aspects of the Danish abstract anaphora need further investigation and cannot be accounted for by existing studies on abstract anaphora.

The project builds on cognitively based theories of the use of referring expressions, all of which presuppose that a speaker makes assumptions about the status of entities in the addressee's mental state and that these assumptions influence her/his choice of referring expressions, i.a. (Givón 1983; Prince 1981; Ariel 1988; Gundel et al. 1993).

The project also builds upon existing studies on abstract anaphora, especially (Webber 1991; Fraurud 1992; Asher 1993; Gundel et al. 2005).

Abstract anaphora annotation

In the project we have annotated third-person singular pronouns and their function in Danish and, to some extent, Italian data. For abstract anaphors we annotate information such as the antecedent of each anaphor and its syntactic type, and the referent and its semantic type.
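As a rough illustration of the information listed above, the following Python sketch encodes one annotated abstract anaphor, using the swimming-pool example. The class name, field names and label values are hypothetical, assembled from the antecedent and referent categories mentioned on this page; they are not the project's actual annotation scheme, which is described in the DAD report.

```python
from dataclasses import dataclass

# Hypothetical label sets, drawn from the categories named in the project
# description; the real DAD tagset may use different names and values.
ANTECEDENT_TYPES = {"verb phrase", "copula predicate", "clause",
                    "discourse segment", "abstract pronoun"}
REFERENT_TYPES = {"event", "state", "situation", "fact", "proposition", "activity"}


@dataclass(frozen=True)
class AbstractAnaphorAnnotation:
    """One annotated abstract anaphor (hypothetical field names)."""
    anaphor: str          # the pronoun: "det", "dette" or "det her"
    antecedent: str       # the text span the pronoun points back to
    antecedent_type: str  # syntactic type of the antecedent
    referent_type: str    # semantic type of the abstract referent

    def __post_init__(self) -> None:
        # Reject labels outside the illustrative label sets above.
        if self.antecedent_type not in ANTECEDENT_TYPES:
            raise ValueError(f"unknown antecedent type: {self.antecedent_type}")
        if self.referent_type not in REFERENT_TYPES:
            raise ValueError(f"unknown referent type: {self.referent_type}")


# The swimming-pool example encoded with the hypothetical labels.
example = AbstractAnaphorAnnotation(
    anaphor="det",
    antecedent="at gå lidt i svømmehallen",
    antecedent_type="clause",
    referent_type="activity",
)
```

Keeping the labels in closed sets makes it easy to catch typos in tag values; reading actual annotated files would additionally require the annotation tool's export format, which is not covered here.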

The annotated corpora

The annotated data comprise written texts (EU texts, legal texts, literary texts, financial newspaper articles and part of the Danish PAROLE corpus) as well as spoken dialogues and monologues:

  • Danish and Italian parallel EU texts (24,389 and 25,303 running words respectively)
  • Italian stories by Pirandello (9,018 words) and their Danish translations (9,933 words)
  • Danish texts from the legal domain (17,600 words)
  • extracts of newspaper and journal articles, novels and reports from the Danish PAROLE corpus (Keson and Norling, 1998) (12,570 words)
  • financial newspapers: extracts from the Italian Il Sole 24 Ore (MLCC corpus)
  • Danish dialogues and monologues from the DanPASS corpus and a dialogue from the Lanchart corpus
  • dialogues from the Italian AVIP corpus

Parts of the written Danish and Italian corpora are also annotated with coreference relations for all noun phrases.

The annotation tool

We use the PALinkA annotation tool developed by Constantin Orasan, University of Wolverhampton.

Reports

Costanza Navarretta and Sussi Olsen. The annotation of pronominal abstract anaphora in Danish texts and dialogues. DAD report 1. Centre for Language Technology, University of Copenhagen. January 2009, p.20.

Articles

Costanza Navarretta. Automatic recognition of the function of third-person singular pronouns in texts and spoken data. In: S. Lalitha Devi, A. Branco and R. Mitkov (eds.) Anaphora Processing and Applications. 7th Discourse Anaphora and Anaphor Resolution Colloquium, DAARC 2009, Goa, India, November 5-6, 2009, Proceedings. LNAI 5847, pp. 15-28. Springer Verlag, Berlin/Heidelberg, 2009.

Costanza Navarretta. Co-referential chains and discourse topic shifts in parallel and comparable corpora. Revista de Procesamiento de Lenguaje Natural, La Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN), 42:105-112, 2009.

Costanza Navarretta. Pronominal types and abstract reference in the Danish and Italian DAD Corpora. In C. Johansson (ed.) Proceedings of the Second Workshop on Anaphora Resolution (WAR II). NEALT Proceedings Series, Vol. 2, 2008, 63-71.

Costanza Navarretta and Sussi Olsen. Annotating abstract pronominal anaphora in the DAD project. In Proceedings of LREC 2008, May 28-30, 2008, Marrakesh, Morocco.

Costanza Navarretta. A contrastive analysis of abstract anaphora in Danish, English and Italian. In: A. Branco, T. McEnery, R. Mitkov and F. Silva (eds.) Proceedings of DAARC 2007, 6th Discourse Anaphora and Anaphor Resolution Colloquium, March 2007, Centro de Linguistica da Universidade do Porto, pp. 103-109.

Project participants and contact

Costanza Navarretta, costanza @ hum.ku.dk

Sussi Olsen, saolsen @ hum.ku.dk

Emil Holms Kanal 2, bygn. 22, 3., DK-2300 KBH S
Tel.: +45 35329090, Fax: +45 35329089