Cass NP recogniser

An NP recogniser collects the words that constitute noun phrases. For example,

"Den sorte kats mindste killing har en meget tyk mave." (Danish: "The black cat's smallest kitten has a very fat belly.")

  [NP
    [PRON_DEMO Den den]
    [ADJ sorte sort]
    [N_INDEF_SING_GEN kats kat]
    [ADJ mindste mindst]
    [N_INDEF_SING killing killing]]
  [V_PRES har have]
  [NP
    [PRON_UBST en en]
    [ADJ meget megen]
    [ADJ tyk tyk]
    [N_INDEF_SING mave mave]]
  [TEGN . .]

Or to illustrate more clearly:

NP[Den sorte kats mindste killing] har NP[en meget tyk mave]

Noun phrases (NP's) typically function as the subject and object of a sentence, so identifying them together with the verbs yields a rough analysis of the sentence. NP recognition can also be used in other areas, for example information retrieval, where the relation between compound words and their synonymous phrases can be especially relevant. For example, byrådsmedlem ("city council member") vs. medlem af byrådet ("member of the city council").

CST's NP recogniser is implemented in Cass, a finite-state chunk parser. The system is essentially language-independent, but the NP grammar is modelled on NP's found in the Danish Parole corpus.

The grammar identifies simple NP's, ranging from the start of the NP to its kernel. Relative clauses and coordinations of NP's are not found, but proper names in postposition and the first prepositional phrase after the kernel are recognised on an experimental basis.
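The idea of recognising a simple NP from its start to its kernel can be illustrated with a minimal Python sketch. This is not CST's grammar and not Cass syntax: the tag classes, the pattern `D?A*(?:GA*)*N`, and the function names are assumptions made for illustration, using only the tags that appear in the example sentence above. Treating every `PRON_` tag as a possible NP opener is a deliberate simplification.

```python
import re

# Hypothetical tag classes (not taken from the CST grammar):
# D = pronoun used as determiner, A = adjective,
# G = genitive noun, N = other noun (candidate kernel), O = anything else.
def tag_class(tag):
    if tag.startswith("PRON_"):
        return "D"
    if tag == "ADJ":
        return "A"
    if tag.startswith("N_"):
        return "G" if tag.endswith("_GEN") else "N"
    return "O"

# Assumed finite-state pattern for a simple NP: optional determiner,
# adjectives, any number of genitive nouns (each with following
# adjectives), and finally the kernel noun.
NP_PATTERN = re.compile(r"D?A*(?:GA*)*N")

def find_noun_phrases(tagged):
    """Return (start, end) token spans of simple NPs in a tagged sentence."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    return [(m.start(), m.end()) for m in NP_PATTERN.finditer(classes)]

def bracket(tagged):
    """Render the sentence with each recognised NP wrapped as NP[...]."""
    spans = dict(find_noun_phrases(tagged))  # start index -> end index
    out, i = [], 0
    while i < len(tagged):
        if i in spans:
            end = spans[i]
            out.append("NP[" + " ".join(w for w, _ in tagged[i:end]) + "]")
            i = end
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)

# The example sentence as (word, tag) pairs.
SENTENCE = [
    ("Den", "PRON_DEMO"), ("sorte", "ADJ"), ("kats", "N_INDEF_SING_GEN"),
    ("mindste", "ADJ"), ("killing", "N_INDEF_SING"), ("har", "V_PRES"),
    ("en", "PRON_UBST"), ("meget", "ADJ"), ("tyk", "ADJ"),
    ("mave", "N_INDEF_SING"), (".", "TEGN"),
]

if __name__ == "__main__":
    print(bracket(SENTENCE))
```

Because each tag is mapped to a single character, a match position in the class string is also a token index, so the regular expression engine does the work of the finite-state recogniser. A pattern like this stops at the kernel noun, so relative clauses, coordinations and trailing prepositional phrases fall outside the recognised span.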

More information

Report about the NP recogniser used in information retrieval (Danish)

User guide to the Danish Parole corpus (Danish)

Read more about content-based information retrieval in Ontoquery and about the relation between NP's and compound words in the VID project (Danish).

Contact: Dorte Haltrup Hansen

Emil Holms Kanal 2, building 22, 3, DK-2300 Copenhagen S