3 Methodology
We have applied the methodology defined in the TEMAA Final Report (Deliverable
16).
A spelling checker should accept all valid words of a given language, reject
all non-valid words, and, in the presence of a non-valid word, propose a
plausible replacement word that belongs to that language. Our tests have
therefore taken into account these three aspects: lexical coverage, error
detection and suggestion adequacy.
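The first two aspects can be expressed as simple ratios. The following is a
minimal sketch only, not the ASCC implementation: the checker is modelled as a
hypothetical function `accepts(word)` that returns True when the word is
accepted, and the toy lexicon and function names are assumptions made for
illustration.

```python
from typing import Callable, Iterable


def lexical_coverage(accepts: Callable[[str], bool],
                     valid_words: Iterable[str]) -> float:
    """Share of valid words that the checker accepts."""
    words = list(valid_words)
    if not words:
        raise ValueError("empty word list")
    return sum(1 for w in words if accepts(w)) / len(words)


def error_detection(accepts: Callable[[str], bool],
                    invalid_words: Iterable[str]) -> float:
    """Share of non-valid words that the checker rejects."""
    words = list(invalid_words)
    if not words:
        raise ValueError("empty word list")
    return sum(1 for w in words if not accepts(w)) / len(words)


# Hypothetical stand-in for a real checker: a tiny lexicon lookup.
toy_lexicon = {"anni", "amico", "scuola"}


def toy_accepts(word: str) -> bool:
    return word.lower() in toy_lexicon


coverage = lexical_coverage(
    toy_accepts, ["anni", "amico", "scuola", "ottantatreenne"])
detection = error_detection(toy_accepts, ["amni", "anico", "squola"])
print(coverage, detection)
```

In this toy run the valid word ottantatreenne is missing from the lexicon, so
it lowers coverage without affecting error detection.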
Two lists were available to ISSCO to test coverage. The first, a fairly large
one supplied by the Pisa Centre in Italy, consists of 244,191 words and was
derived from a variety of texts (literature, newspapers, scientific articles,
etc.) [4]. The second list was derived at ISSCO from news wire articles (from
ANSA) and comprises 16,530 words. This latter list is more recent (the
articles were posted on the World Wide Web between January and April 1995),
and it contains a number of neologisms.
Both spelling checkers tested were run on both lists for coverage checking.
However, error generation was done only on the more recent, smaller list of
words.
During the testing, we noticed that one checker more frequently rejects
ordinal numbers written out in full (e.g., ottantatreesimo [83rd]) and age
adjectives (e.g., ottantatreenne [83-year-old]), so we submitted to the checker
a selection of such adjectives that were present in the Pisa frequency list.
The results are given below.
We submitted to both spelling checkers the two lists of personal given names
and of the main Italian cities (those that serve as provincial capitals; see
deliverable D12), as well as adjectives that refer to those cities (e.g.,
Roma -> romano [Rome -> Roman]). The adjectives were derived manually,
according to the rules given in grammar books and dictionaries.
In order to assess the quality of a spelling checker, we need to test the
product not only for the number of correct words it recognises, but also (and
maybe more importantly) for the number of incorrect words it recognises as
such. To this end, we inserted one error per word in (a subset of) the ANSA
list, according to the taxonomy of errors described in deliverable D12.
More precisely, the errors taken into account were:
1. undoubling of a double consonant (e.g., abisso -> *abiso [abyss])
2. doubling of a "b" in a given context (e.g., amabile -> *amabbile [loving])
3. insertion of a "g" in a given context (e.g., miliardo -> *migliardo
[billion])
4,5,6. use of "cu"+vowel instead of "qu"+vowel and vice versa, and replacement
of "cqu" by "qu" (e.g., innocua -> *innoqua [innocuous], scuola -> *squola
[school], and acquario -> *aquario [aquarium], respectively)
7. exchange of the letter m with the letter n and vice versa (e.g.,
ancora -> *amcora [still], or amico -> *anico [friend])
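The error categories listed above can be sketched as regular-expression
rewrite rules, each producing one error per generated word. This is an
illustrative sketch, not the generator actually used. The rule names and the
function `generate_errors` are hypothetical. The contexts for doubling "b"
(before "ile") and for inserting "g" (before "li"+vowel) are assumptions
inferred from the two examples, since the actual contexts are defined in
deliverable D12.

```python
import re
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[re.Match], str]]

# Each rule is (pattern, replacement function). Rule names are hypothetical.
RULES: Dict[str, Rule] = {
    # Double consonant reduced to a single one: abisso -> abiso
    "undouble": (r"([b-df-hj-np-tv-z])\1", lambda m: m.group(1)),
    # ASSUMED context: single "b" before "ile": amabile -> amabbile
    "double_b": (r"(?<!b)b(?=ile)", lambda m: "bb"),
    # ASSUMED context: "l" before "i"+vowel: miliardo -> migliardo
    "insert_g": (r"(?<!g)l(?=i[aeou])", lambda m: "gl"),
    # "cu"+vowel written as "qu"+vowel: scuola -> squola
    "cu_to_qu": (r"cu(?=[aeio])", lambda m: "qu"),
    # "qu"+vowel written as "cu"+vowel (skipping "cqu", handled below)
    "qu_to_cu": (r"(?<!c)qu(?=[aeio])", lambda m: "cu"),
    # "cqu" written as "qu": acquario -> aquario
    "cqu_to_qu": (r"cqu", lambda m: "qu"),
    # "m" and "n" exchanged: ancora -> amcora, amico -> anico
    "swap_mn": (r"[mn]", lambda m: "n" if m.group() == "m" else "m"),
}


def generate_errors(word: str, rule: str) -> List[str]:
    """Return one misspelling per place where the rule applies.

    Every returned string differs from `word` at exactly one match,
    so each generated word contains a single error.
    """
    pattern, repl = RULES[rule]
    return [word[:m.start()] + repl(m) + word[m.end():]
            for m in re.finditer(pattern, word)]


for word, rule in [("abisso", "undouble"), ("amabile", "double_b"),
                   ("miliardo", "insert_g"), ("innocua", "cu_to_qu"),
                   ("acquario", "cqu_to_qu"), ("anni", "swap_mn")]:
    print(word, rule, generate_errors(word, rule))
```

A word with several matching positions, such as anni under the m/n rule,
yields several single-error variants (amni and anmi) rather than one word with
every letter exchanged.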
Depending on the word, errors in the first and seventh categories can be due
either to a mis-typing or to a real lack of knowledge of the correct spelling.
Errors in the other categories are generally due to a real lack of knowledge.
We ran ASCC with a list of incorrect words in which one letter substitution
had taken place (namely, the exchange of every "n" with an "m" and vice versa).
Product B's spelling checker gets through the coverage checking, but when it
comes to suggestion level 1 (that is, recording every first choice of suggested
words), it seems to try to replace an erroneous word by itself, in uppercase.
For instance, for the word "amni" (which is the result of the exchange of an
n for an m in the word "anni" [years]), the checker suggests as a correction
the word "ANmi", and at the same time a dialogue window warns the user that
such a word is not present in any of the dictionaries in use. At this point,
ASCC is no longer able to work in batch mode as it was designed to do, and the
process of testing for error detection becomes much more time-consuming.
Because this unexpected behaviour requires human supervision at run time, we
restricted the number of suggestions used to test suggestion adequacy to 2
(instead of 5, as for Product A) in order to speed up the testing process.
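The effect of the suggestion cutoff can be illustrated with a small sketch. It
assumes, as one possible reading of suggestion adequacy, that a case counts as
adequate when the intended word appears among the first n suggestions. The
function name and the suggestion lists are invented for illustration and are
not output from either product.

```python
from typing import List, Sequence, Tuple

Case = Tuple[str, List[str]]  # (intended word, ranked suggestions)


def suggestion_adequacy(cases: Sequence[Case], n: int) -> float:
    """Share of cases whose intended word is among the first n suggestions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not cases:
        raise ValueError("no test cases")
    hits = sum(1 for intended, suggestions in cases
               if intended in suggestions[:n])
    return hits / len(cases)


# Invented suggestion lists, for illustration only.
cases = [
    ("anni", ["armi", "anni", "ammi"]),
    ("amico", ["amico"]),
    ("scuola", ["squalo", "suola", "scuola"]),
]

at_1 = suggestion_adequacy(cases, 1)
at_2 = suggestion_adequacy(cases, 2)
at_5 = suggestion_adequacy(cases, 5)
print(at_1, at_2, at_5)
```

Because scores can only stay the same or rise as n grows, figures obtained
with a cutoff of 2 and figures obtained with a cutoff of 5 are not directly
comparable.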
[4] We are grateful to the Istituto di Linguistica Computazionale del CNR in
Pisa for making this data available to us.