TEMAA D13 - 3 Methodology

A spelling checker should accept all valid words of a given language, reject all invalid words and, when it meets an invalid word, propose a plausible replacement that belongs to that language. Our tests therefore cover these three aspects: lexical coverage, error detection and suggestion adequacy.

3.1.1 Testing lexical coverage

Two word lists were available to ISSCO for testing coverage. The first, a fairly large list supplied by the Pisa Centre in Italy, consists of 244,191 words derived from a variety of texts (literature, newspapers, scientific articles, etc.)[4]. The second was compiled at ISSCO from ANSA news wire articles and comprises 16,530 words. This second list is more recent (the articles were posted on the World Wide Web between January and April 1995) and contains a number of neologisms.

Both spelling checkers were run on both lists to check coverage. Error generation, however, was carried out only on the smaller, more recent ANSA list.
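Measuring coverage amounts to counting how many list entries a checker accepts, and keeping the rejected ones for inspection. The sketch below assumes the checker can be queried word by word through a hypothetical `accepts` callable; the toy lexicon merely stands in for a real product and is illustrative data only.

```python
from typing import Callable, Iterable, List, Tuple


def coverage(words: Iterable[str], accepts: Callable[[str], bool]) -> Tuple[float, List[str]]:
    """Return the share of words the checker accepts and the words it rejects.

    `accepts` is a hypothetical stand-in for querying a spelling checker.
    """
    word_list = list(words)
    rejected = [w for w in word_list if not accepts(w)]
    if not word_list:
        return 0.0, rejected
    rate = (len(word_list) - len(rejected)) / len(word_list)
    return rate, rejected


# Toy lexicon standing in for a real checker (illustrative data only).
toy_lexicon = {"anni", "amico", "scuola"}
rate, rejected = coverage(["anni", "amico", "scuola", "ottantatreenne"], toy_lexicon.__contains__)
print(rate, rejected)  # 0.75 ['ottantatreenne']
```

Returning the rejected words, not just the rate, makes it possible to see which valid words (neologisms, proper names, spelled-out numbers) a product misses.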

3.1.2 Particular cases

3.1.2.1 Numbers

During testing we noticed that one checker more often rejects ordinal numbers written out in full (e.g., ottantatreesimo [83rd]) and age adjectives (e.g., ottantatreenne [83-year-old]). We therefore submitted to that checker a selection of such adjectives taken from the Pisa frequency list. The results are given below.
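How the selection was made is not stated. One plausible way to shortlist candidates from a frequency list is a suffix filter on the typical endings of ordinals (-esimo) and age adjectives (-enne); the sketch below assumes that approach, and the function name is hypothetical. Its output still needs manual review, since ordinary words such as the verb form venne share the -enne ending.

```python
import re
from typing import Iterable, List

# Assumed endings: ordinals in -esimo/-esima/-esimi/-esime and
# age adjectives in -enne/-enni. A heuristic, not the report's method.
CANDIDATE_SUFFIX = re.compile(r"(?:esim[oaie]|enn[ei])$")


def shortlist_number_words(words: Iterable[str]) -> List[str]:
    """Keep words whose ending suggests a spelled-out ordinal or an age adjective."""
    return [w for w in words if CANDIDATE_SUFFIX.search(w)]


sample = ["ottantatreesimo", "ottantatreenne", "venne", "casa", "ventesima"]
print(shortlist_number_words(sample))
# ['ottantatreesimo', 'ottantatreenne', 'venne', 'ventesima']
```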

3.1.2.2 Proper names

We submitted to both spelling checkers two lists, one of personal given names and one of the main Italian cities (those that are provincial capitals, see deliverable D12), as well as the adjectives that refer to those cities (e.g., Roma -> romano [Rome -> Roman]). The adjectives were derived manually, following the rules given in grammar books and dictionaries.

3.1.3 Error generation

To assess the quality of a spelling checker, we need to test not only how many correct words it accepts but also, and perhaps more importantly, how many incorrect words it recognises as incorrect. To this end, we inserted one error per word into (a subset of) the ANSA list, following the taxonomy of errors described in deliverable D12.

2. doubling of a "b" in a given context (e.g., amabile -> *amabbile [amiable])

3. insertion of a "g" in a given context (e.g., miliardo -> *migliardo [billion])

4, 5, 6. writing "cu"+vowel instead of "qu"+vowel and vice versa, and writing "qu" instead of "cqu" (e.g., innocua -> *innoqua [innocuous], scuola -> *squola [school], and acquario -> *aquario [aquarium], respectively)

7. exchange of the letter "m" with the letter "n" and vice versa (e.g., ancora -> *amcora [still], or amico -> *anico [friend])

Depending on the word, errors in the first and seventh categories may result either from a typing slip or from genuine ignorance of the correct spelling. Errors in the other categories generally reflect genuine ignorance of the spelling.
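The error-insertion step can be sketched as a set of rewrite rules, each producing a misspelling with exactly one error. The contexts used below (a "b" between vowels, "li" between vowels, and so on) are assumptions inferred from the examples; the exact contexts are defined in deliverable D12, and the first category is omitted because it is not described here. All function and rule names are hypothetical.

```python
import re
from typing import Dict, Optional


def _sub_once(pattern: str, repl: str, word: str) -> Optional[str]:
    """Apply the first match of `pattern`; return None if nothing changed."""
    new = re.sub(pattern, repl, word, count=1)
    return new if new != word else None


def double_b(word: str) -> Optional[str]:
    # Category 2 (assumed context: "b" between vowels): amabile -> amabbile
    return _sub_once(r"(?<=[aeiou])b(?=[aeiou])", "bb", word)


def insert_g(word: str) -> Optional[str]:
    # Category 3 (assumed context: "li" between vowels): miliardo -> migliardo
    return _sub_once(r"(?<=[aeiou])li(?=[aeiou])", "gli", word)


def cu_to_qu(word: str) -> Optional[str]:
    # "cu"+vowel written as "qu"+vowel: innocua -> innoqua, scuola -> squola
    return _sub_once(r"cu(?=[aeiou])", "qu", word)


def qu_to_cu(word: str) -> Optional[str]:
    # "qu"+vowel written as "cu"+vowel; "cqu" is left to cqu_to_qu
    return _sub_once(r"(?<!c)qu(?=[aeiou])", "cu", word)


def cqu_to_qu(word: str) -> Optional[str]:
    # "cqu" written as "qu": acquario -> aquario
    return _sub_once(r"cqu", "qu", word)


def swap_m_n(word: str) -> Optional[str]:
    # Category 7: exchange the first "m" or "n" (ancora -> amcora, amico -> anico)
    match = re.search(r"[mn]", word)
    if match is None:
        return None
    i = match.start()
    return word[:i] + ("m" if word[i] == "n" else "n") + word[i + 1:]


RULES = {
    "b_doubling": double_b,
    "g_insertion": insert_g,
    "cu_to_qu": cu_to_qu,
    "qu_to_cu": qu_to_cu,
    "cqu_to_qu": cqu_to_qu,
    "m_n_exchange": swap_m_n,
}


def one_error_variants(word: str) -> Dict[str, str]:
    """Return, for each applicable rule, a variant of `word` containing one error."""
    variants = {}
    for name, rule in RULES.items():
        variant = rule(word)
        if variant is not None:
            variants[name] = variant
    return variants


print(one_error_variants("amico"))  # {'m_n_exchange': 'anico'}
```

A generated variant can occasionally coincide with a valid word, so in practice it is worth filtering the output against a reference lexicon before using it to test error detection.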

3.2 Problematic areas

We ran ASCC on a list of incorrect words in which one letter substitution had taken place (namely, an "n" exchanged for an "m" or vice versa). Product B's spelling checker completes the coverage check, but at suggestion level 1 (that is, when every first-choice suggestion is recorded) it appears to try to replace the erroneous word by itself, in uppercase. For instance, for "amni" (obtained by replacing an "n" with an "m" in "anni" [years]) the checker suggests "ANmi" as a correction, and at the same time a dialogue window warns the user that this word is not present in any of the dictionaries in use. At this point ASCC can no longer work in batch mode as it was designed to, and testing error detection becomes much more time-consuming.

Because this unexpected behaviour requires human supervision at run time, we limited the number of suggestions examined when testing suggestion adequacy to 2 for Product B (instead of 5, as for Product A), to speed up testing.
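Error detection and suggestion adequacy can both be scored from pairs of (correct word, generated misspelling). The sketch below assumes hypothetical `accepts` and `suggest` callables, the latter returning a ranked suggestion list, and counts a suggestion as adequate when the correct word appears among the first n suggestions; n is the cut-off that was set to 2 for Product B and 5 for Product A. Measuring adequacy over detected errors only is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def score_errors(
    pairs: Sequence[Tuple[str, str]],
    accepts: Callable[[str], bool],
    suggest: Callable[[str], List[str]],
    n: int,
) -> Dict[str, float]:
    """Score (correct, misspelt) pairs for error detection and top-n suggestion adequacy."""
    detected = 0
    adequate = 0
    for correct, misspelt in pairs:
        if accepts(misspelt):
            continue  # the error went undetected
        detected += 1
        if correct in suggest(misspelt)[:n]:
            adequate += 1
    return {
        "detection_rate": detected / len(pairs) if pairs else 0.0,
        # Assumption: adequacy is measured over detected errors only.
        "adequacy_at_n": adequate / detected if detected else 0.0,
    }


# Toy data: the "amni" entry mimics the behaviour described for Product B,
# whose first suggestion is the erroneous word itself, partly in uppercase.
lexicon = {"anni", "amico"}
suggestions = {"amni": ["ANmi", "anni"], "anico": ["amico"]}
pairs = [("anni", "amni"), ("amico", "anico")]

for n in (1, 2):
    print(n, score_errors(pairs, lexicon.__contains__, lambda w: suggestions.get(w, []), n))
```

With n = 1 the toy checker reaches an adequacy of 0.5, with n = 2 it reaches 1.0, which illustrates how the chosen cut-off affects the adequacy figures of the two products.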

[4] We are grateful to the Istituto di Linguistica Computazionale del CNR in Pisa for making this data available to us.