
1. Introduction

The purpose of this presentation of Danish spelling errors is to provide background material for understanding the corruption rules used by ASCC for the automatic generation of Danish spelling errors. These language-specific spelling errors should be seen as a complement to language-independent typing errors, which ASCC can also generate automatically.

For the sake of completeness, we list below the typing errors that ASCC can generate:

1. doubling: insertion of a letter X next to another X (special case of insertion)

2. singling: the opposite of doubling (special case of deletion)

3. deletion: of a randomly chosen letter

4. interchanging of two letters

5. insertion of a letter X next to a letter Y, where X and Y are close on the keyboard

6. substitution of X by Y where X and Y are close on the keyboard.

All these errors are mainly due to typing mistakes, and do not relate to language-specific factors. Our aim here is to enlarge this set by adding error types which pertain only to specific languages, in particular Danish.
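The six typing errors listed above can be illustrated with a minimal Python sketch. This is not ASCC's implementation: the function names, the small keyboard-neighbour table, and the choice to swap only adjacent letters in the interchange case are all assumptions made for illustration.

```python
import random

# Hypothetical excerpt of a QWERTY neighbour table (an assumption for
# illustration, not ASCC's actual keyboard data).
NEIGHBOURS = {"a": "qsz", "e": "wrd", "n": "bm", "o": "ip",
              "r": "et", "s": "ad", "t": "ry"}


def doubling(word, i):
    """1. Insert a copy of the letter at position i next to it."""
    return word[:i] + word[i] + word[i:]


def singling(word, i):
    """2. Reduce a doubled letter at positions i, i+1 to a single letter.

    Returns None if the letter at i is not doubled.
    """
    if i + 1 < len(word) and word[i] == word[i + 1]:
        return word[:i] + word[i + 1:]
    return None


def deletion(word, i):
    """3. Delete the letter at position i."""
    return word[:i] + word[i + 1:]


def interchange(word, i):
    """4. Swap the letters at positions i and i+1 (adjacency is assumed)."""
    if i + 1 >= len(word):
        return None
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def neighbour_insertion(word, i, rng):
    """5. Insert a keyboard neighbour of the letter at i right after it."""
    near = NEIGHBOURS.get(word[i])
    if not near:
        return None
    return word[:i + 1] + rng.choice(near) + word[i + 1:]


def neighbour_substitution(word, i, rng):
    """6. Replace the letter at i by one of its keyboard neighbours."""
    near = NEIGHBOURS.get(word[i])
    if not near:
        return None
    return word[:i] + rng.choice(near) + word[i + 1:]
```

In this sketch the position `i` is passed in explicitly so that each operation is reproducible; a randomly chosen letter, as in the deletion case, could be obtained with `rng.randrange(len(word))` on a seeded `random.Random` instance.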

We have based our classification on investigations of spelling errors made by students in Danish primary and high schools (e.g. Löb 1983 and Andersen et al. 1992) as well as on relevant textbooks for native Danish speakers (Togeby 1989). Therefore, the user group we are concerned with here is that of Danish native speakers with poor or imperfect spelling ability.

Unfortunately, no systematic data are yet available on spelling errors made by second language learners of Danish, and we shall therefore not be concerned with them here. Nor shall we deal with typing errors or any other medium-related errors.

Throughout our presentation, we shall relate the error types described to the issue of automatic generation and detection of spelling errors, which is one of the goals of the project. Two factors are thus crucial in determining whether a spelling error can be treated by our evaluation method:

whether the error can be generated automatically by some mechanical and systematic substitution, deletion or addition of letters;

whether the incorrect form is incorrect in all contexts.

For instance, among the spelling error categories identified by Löb (1983) for Danish are idiosyncratic errors for which no systematic mapping between the correct and the incorrect forms seems possible (e.g. *"indtasitter" for "intercitytoget", En: intercity train). Such errors cannot be generated automatically and therefore fall outside the scope of our evaluation package. In fact, they are also very difficult for any spelling checker to correct.

The second group of misspelled words that poses a problem in our case is that of the so-called "false negatives" or "real-word errors". A false negative is a misspelled word which is wrong in the current context, but may be correct in others. The correct and the "incorrect" words are often homophones, i.e. they have the same pronunciation but different orthographies (e.g. in Danish "at *terroriserer" for "at terrorisere", En: to terrorise). These errors are systematic, and can easily be generated automatically. However, since spelling checkers check words one at a time without taking the context into account, false negatives cannot be detected. Therefore, they are not treated in the evaluation package.
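The reason real-word errors escape a word-by-word checker can be shown with a small Python sketch. The toy lexicon and the function names are hypothetical and serve only to illustrate the second criterion above (the incorrect form must be incorrect in all contexts); they are not part of the evaluation package.

```python
# Hypothetical toy lexicon; a real checker would use a full word list.
LEXICON = {"at", "terrorisere", "terroriserer", "intercitytoget"}


def is_flagged(word, lexicon=LEXICON):
    """A context-free checker flags a word only if it is not in the lexicon."""
    return word.lower() not in lexicon


def detectable(candidates, lexicon=LEXICON):
    """Keep only the generated forms that are non-words, i.e. forms that a
    word-by-word checker can flag regardless of context."""
    return [w for w in candidates if is_flagged(w, lexicon)]
```

Under these assumptions, `is_flagged("terroriserer")` is false because the form exists in the lexicon, so the error in "at *terroriserer" passes unnoticed, whereas a non-word such as *"indtasitter" is flagged.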
