We chose this procedure because each list inevitably brings a certain amount of "noise", which makes for a plausible real-world simulation of a user's spelling-checker session. The ANSA-derived list would contain no proper names, but it would contain neologisms and possibly foreign words. In the construction of the Pisa list, capitalization was ignored, so no distinction is made between strings with upper-case and those with lower-case letters. Thus, for example, the strings "con", "Con", and "CON" [with] are all counted as instances of the same word and are listed under just one of the possible forms; the choice of which form is listed appears to be arbitrary. It was therefore not possible to sort capitalised words automatically as proper names. Instead, we chose to eliminate words with a very low frequency, since that group would consist mainly of foreign words, proper names, extremely archaic forms and misspellings.
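The construction described above can be illustrated with a minimal sketch. It is not the procedure actually used to build the Pisa list: the function name, the sample tokens and the `min_freq` cutoff are assumptions made for illustration, since the report does not state the frequency threshold that was applied.

```python
from collections import Counter


def build_frequency_list(tokens, min_freq=2):
    """Count tokens case-insensitively and drop very rare ones.

    min_freq is a hypothetical cutoff; the report does not give the
    threshold that was actually used.
    """
    counts = Counter()
    listed_form = {}
    for token in tokens:
        key = token.lower()
        counts[key] += 1
        # Keep the first surface form seen: which form gets listed is arbitrary.
        listed_form.setdefault(key, token)
    return {listed_form[k]: n for k, n in counts.items() if n >= min_freq}


# Illustrative tokens only, not taken from the Pisa list.
tokens = ["con", "Con", "CON", "Roma", "casa", "Casa"]
freq = build_frequency_list(tokens)
```

Because every spelling of a word is merged under one arbitrary form, a capitalised entry in the resulting list says nothing about whether the word is a proper name, which is why the frequency cutoff is used as a filter instead.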
Apart from the general vocabulary, we submitted lists of proper names to the two checkers: one of personal first names, and another of Italian cities with an administrative function ("capoluogo di provincia"). To test the proper "localization" of the spelling checkers, we manually derived the adjectives used for each of the cities, in all their inflected forms. A third list, consisting of ordinal numbers and age adjectives written out in full, was manually extracted from the Pisa list, since some of these have a spelling that might appear unusual for Italian (double vowels). Although ordinal and age numbers written out in full do not have a high frequency in the lists at our disposal, they must be correctly recognized by the checkers.
Throughout this report, the percentages given have been rounded to the nearest integer.
A final word of warning: the two checkers tested seem to differ in whether they treat words that are linked with a hyphen (e.g. "decreto-legge" [legislative decree]) or that end with an apostrophe (e.g. "dell'" [of the]) as one word or two. The total word count in the tables below therefore reflects the word count of the checker, and the same list yields a different figure depending on the spelling checker used. However, such words occur only in the Pisa-derived list, so the differing word counts concern only that list.
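The effect of these differing conventions on the word count can be sketched as follows. Neither checker's tokenizer is documented in this report, so the function, its two flags and the sample string are hypothetical and only illustrate how the same text can yield different totals.

```python
def count_words(text, split_hyphen=False, split_apostrophe=False):
    """Count whitespace-separated words under a given splitting policy.

    Both flags are assumptions for illustration; they do not describe
    the actual behaviour of either checker.
    """
    if split_hyphen:
        # "decreto-legge" is counted as two words.
        text = text.replace("-", " ")
    if split_apostrophe:
        # "dell'anno" is counted as "dell'" plus "anno".
        text = text.replace("'", "' ")
    return len(text.split())


sample = "decreto-legge dell'anno casa"
as_single_words = count_words(sample)
hyphen_split = count_words(sample, split_hyphen=True)
both_split = count_words(sample, split_hyphen=True, split_apostrophe=True)
```

With this sample the three policies give three different totals for the same text, which is the kind of discrepancy the warning above refers to.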

Lexical coverage
Product: System A

| List name | Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- | --- |
| Pisa-derived | 33377 | 29961 | 90 |
| ANSA-derived | 16527 | 15518 | 94 |

Lexical coverage
Product: System B

| List name | Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- | --- |
| Pisa-derived | 33377 | 29326 | 88 |
| ANSA-derived | 16527 | 15561 | 94 |

Proper names coverage: Personal names
Product: System A

| Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- |
| 609 | 565 | 93 |

Proper names coverage: Personal names
Product: System B

| Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- |
| 609 | 484 | 79 |

Proper names coverage: Names of cities
Product: System A

| Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- |
| 103 | 103 | 100 |

Proper names coverage: Names of cities
Product: System B

| Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- |
| 103 | 102 | 99 |

City adjectives
Product: System A

| Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- |
| 320 | 108 | 34 |

City adjectives
Product: System B

| Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- |
| 320 | 303 | 95 |

Ordinal and age adjectives
Product: System A

| Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- |
| 63 | 18 | 29 |

Ordinal and age adjectives
Product: System B

| Total no. of words | No. of words recognised | % of words recognised |
| --- | --- | --- |
| 63 | 61 | 97 |

Lexical coverage

| Product | Lists (Pisa- and ANSA-derived), % recognized | Proper names (people and cities), % recognized | Adjectives (city, age and ordinal), % recognized |
| --- | --- | --- | --- |
| System A | 91% | 94% | 33% |
| System B | 90% | 83% | 95% |
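
As a worked check, the System A row of the summary can be reproduced from the per-list counts reported earlier. The sketch below assumes that the summary pools the counts of the grouped lists (rather than averaging the per-list percentages) and rounds to the nearest integer; the function and variable names are illustrative.

```python
import math


def percent(recognised, total):
    """Percentage rounded to the nearest integer (halves round up)."""
    return math.floor(100 * recognised / total + 0.5)


def pooled_percent(pairs):
    """Pool (recognised, total) pairs before computing the percentage."""
    recognised = sum(r for r, _ in pairs)
    total = sum(t for _, t in pairs)
    return percent(recognised, total)


# (recognised, total) pairs for System A, copied from the per-list tables.
system_a = {
    "lists": [(29961, 33377), (15518, 16527)],
    "proper names": [(565, 609), (103, 103)],
    "adjectives": [(108, 320), (18, 63)],
}

summary_a = {group: pooled_percent(pairs) for group, pairs in system_a.items()}
```

Pooling weights each list by its size, so the large Pisa-derived list and the large personal-names list dominate their respective groups.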
We opted for a mixture of errors: some due to the mechanical mistyping of a word, others to a genuine lack of knowledge of the correct spelling.

Error coverage
Product: System A

| Error types | No. of errors generated | No. of errors signalled | % of errors signalled |
| --- | --- | --- | --- |
| 1. undouble a consonant | 4670 | 4546 | 97 |
| 2. bile -> bbile | 233 | 231 | 99 |
| 3. g insertion in li + vowel | 83 | 82 | 100 |
| 4. cu -> qu | 10 | 9 | 99 |
| 5. qu -> cu | 116 | 115 | 99 |
| 6. cqu -> qu | 36 | 33 | 92 |
| 7. m <-> n exchange | 105 | 100 | 95 |

Error coverage
Product: System B

| Error types | No. of errors generated | No. of errors signalled | % of errors signalled |
| --- | --- | --- | --- |
| 1. undouble a consonant | 4670 | 4567 | 98 |
| 2. bile -> bbile | 233 | 232 | 100 |
| 3. g insertion in li + vowel | 83 | 83 | 100 |
| 4. cu -> qu | 10 | 10 | 100 |
| 5. qu -> cu | 116 | 116 | 100 |
| 6. cqu -> qu | 36 | 34 | 94 |
| 7. m <-> n exchange | 105 | 101 | 96 |

Error coverage

| Product | Undouble consonant % | bile -> bbile % | g + li % | c/q related errors % | m/n exchange % |
| --- | --- | --- | --- | --- | --- |
| System A | 97 | 99 | 100 | 97 | 95 |
| System B | 98 | 100 | 100 | 99 | 96 |
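
The seven error types in the tables above lend themselves to mechanical generation from a list of correct words. The sketch below is one possible reading of each rule, not the generator actually used for the test: the exact contexts (for instance, restricting type 2 to a final "-bile", or exchanging m and n in any position) are assumptions, as are all function names and the sample words.

```python
import re


def undouble(word):
    """Type 1: reduce one doubled consonant to a single consonant."""
    for m in re.finditer(r"([b-df-hj-np-tv-z])\1", word):
        yield word[:m.start()] + word[m.start() + 1:]


def bile_to_bbile(word):
    """Type 2: double the b of a final -bile (assumed context)."""
    if word.endswith("bile") and not word.endswith("bbile"):
        yield word[:-4] + "bbile"


def g_insertion(word):
    """Type 3: insert g before li + vowel."""
    for m in re.finditer(r"(?<!g)li(?=[aeiou])", word):
        yield word[:m.start()] + "g" + word[m.start():]


def substitute(word, old, new):
    """Types 4-6: replace one occurrence of old with new at a time."""
    start = word.find(old)
    while start != -1:
        yield word[:start] + new + word[start + len(old):]
        start = word.find(old, start + 1)


def m_n_exchange(word):
    """Type 7: swap one m for n, or one n for m, at a time."""
    swap = {"m": "n", "n": "m"}
    for i, ch in enumerate(word):
        if ch in swap:
            yield word[:i] + swap[ch] + word[i + 1:]


def generate_errors(word):
    """Return the misspellings of word, grouped by error type."""
    return {
        "undouble": list(undouble(word)),
        "bile -> bbile": list(bile_to_bbile(word)),
        "g insertion": list(g_insertion(word)),
        "cu -> qu": list(substitute(word, "cu", "qu")),
        "qu -> cu": list(substitute(word, "qu", "cu")),
        "cqu -> qu": list(substitute(word, "cqu", "qu")),
        "m <-> n": list(m_n_exchange(word)),
    }
```

For example, `generate_errors("acqua")` yields "aqua" under "cqu -> qu" and "accua" under "qu -> cu". A real generator would also need to discard any output that happens to be a valid word, since such a string is not a spelling error a checker could be expected to signal.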

Suggestion adequacy
Product: System A

| Error types | No. of errors recognised | 1st sugg. correct (No.) | 1st sugg. correct (%) | Correct sugg. among 2nd-5th (No.) | Correct sugg. among 2nd-5th (%) | Correct sugg. not among first 5 (No.) | Correct sugg. not among first 5 (%) | No sugg. offered (No.) | No sugg. offered (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. undouble | 4546 | 3028 | 67 | 964 | 21 | 282 | 6 | 272 | 6 |
| 2. bile -> bbile | 231 | 204 | 88 | 1 | 0 | 3 | 1 | 23 | 10 |
| 3. g insertion + li + vowel | 82 | 61 | 74 | 8 | 10 | 5 | 6 | 8 | 10 |
| 4. cu -> qu | 9 | 7 | 78 | 1 | 11 | 0 | 0 | 1 | 11 |
| 5. qu -> cu | 115 | 98 | 85 | 0 | 0 | 3 | 3 | 14 | 12 |
| 6. cqu -> qu | 33 | 22 | 67 | 9 | 27 | 1 | 3 | 1 | 3 |
| 7. m <-> n | 100 | 48 | 48 | 31 | 31 | 20 | 20 | 1 | 1 |

Suggestion adequacy
Product: System B

| Error types | No. of errors recognised | 1st sugg. correct (No.) | 1st sugg. correct (%) | 2nd sugg. correct (No.) | 2nd sugg. correct (%) | Correct sugg. not among first 2 (No.) | Correct sugg. not among first 2 (%) | No sugg. offered (No.) | No sugg. offered (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. undouble | 4567 | 4055 | 89 | 69 | 2 | 305 | 7 | 138 | 3 |
| 2. bile -> bbile | 232 | 206 | 89 | 1 | 0 | 12 | 5 | 13 | 6 |
| 3. g insertion + li + vowel | 83 | 68 | 82 | 1 | 1 | 8 | 10 | 6 | 7 |
| 4. cu -> qu | 10 | 8 | 80 | 1 | 10 | 1 | 10 | -- | -- |
| 5. qu -> cu | 116 | 69 | 59 | 1 | 1 | 5 | 4 | 41 | 35 |
| 6. cqu -> qu | 34 | 30 | 88 | 1 | 3 | 3 | 9 | -- | -- |
| 7. m <-> n | 101 | 48 | 48 | 3 | 3 | 22 | 22 | 28 | 28 |

Suggestion adequacy

| Product | No. of errors recognised | 1st sugg. correct | 2nd sugg. correct | Correct sugg. among 2nd-5th | Sugg. not among first 2/5 | No sugg. offered |
| --- | --- | --- | --- | --- | --- | --- |
| A | 5116 | 68% | N.A. | 20% | (5) 6% | 6% |
| B | 5143 | 87% | 2% | N.A. | (2) 7% | 4% |
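
The System A row of this summary can be reproduced by summing the per-error-type counts from the System A suggestion table and dividing by the total number of recognised errors. The sketch below assumes rounding to the nearest integer; the variable names are illustrative.

```python
import math

# System A counts per error type, copied from the System A suggestion table:
# (1st sugg. correct, correct among 2nd-5th, not among first 5, no sugg. offered)
system_a = {
    "undouble": (3028, 964, 282, 272),
    "bile -> bbile": (204, 1, 3, 23),
    "g insertion": (61, 8, 5, 8),
    "cu -> qu": (7, 1, 0, 1),
    "qu -> cu": (98, 0, 3, 14),
    "cqu -> qu": (22, 9, 1, 1),
    "m <-> n": (48, 31, 20, 1),
}

# Sum each outcome column over all error types.
totals = [sum(column) for column in zip(*system_a.values())]

# Every recognised error falls into exactly one outcome column.
recognised = sum(totals)

# Share of each outcome, rounded to the nearest integer.
shares = [math.floor(100 * t / recognised + 0.5) for t in totals]
```

Because the undouble type accounts for 4546 of the 5116 recognised errors, the overall shares stay close to that row, and the much weaker first-suggestion rate on m/n exchanges barely moves the total.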