
1. Introduction


This document gives a short overview of the word lists that have been collected by the TEMAA project for testing purposes.

Word lists are used in TEMAA in two ways: to evaluate spelling checkers' positive lexical coverage, and to generate lists of misspelled words with which to test the same checkers' coverage of errors and the adequacy of their suggested replacements. We distinguish a number of sub-attributes of positive lexical coverage. For a few sub-attributes (in particular common word coverage and coverage of technical domains), we foresee the use of lists ordered and structured on the basis of frequency of occurrence, while for the remaining ones the relevant lists would be constructed manually or semi-manually. Examples of both list types have been produced by the project.

It was the original aim of the project to deliver a complete package for at least one language, to be selected from among English, Italian and Danish. The project has instead worked simultaneously with Danish and Italian. Thus, although full coverage for one language has not been achieved, the collection of word lists produced ranges over a broad selection of coverage sub-attributes and constitutes in our opinion a good exemplification of the methods set up by the project.

Along with word lists for the testing of lexical coverage, the project has worked out language-specific criteria to guide the automatic generation of misspellings. The results obtained are included as appendices to this document.

Section 2 lists the various word lists, each with a short description of its contents.

1.1. Contributors

The following authors have contributed to this report:

Introduction and overview of TEMAA test materials: Patrizia Paggio (CST) and Sandra Manzi (ISSCO);

Typology of Danish spelling errors: Patrizia Paggio and Uffe Sonne Svendsen (CST);

Typology of Italian spelling errors: Sandra Manzi (ISSCO).

Comments on all sections were provided by Nancy Underwood and Bente Maegaard (CST). General editing was carried out by Patrizia Paggio (CST).

