
3. Conclusions

In this report, we have discussed spelling errors made by Danish native speakers. Our main goal has been to formalise the error types described in the literature so that they could be implemented in ASCC as an addition to the rules for general typing errors already implemented in the system.

We believe we have fulfilled our goal, at least partially. The majority of the rules sketched out here can be implemented more or less directly. A few of them (e.g. syllable repetition and omission) cannot be expressed directly, but it should be possible to implement an approximation of them.

An issue we have only touched upon is that of the generative power of our rules. There are cases, in fact, where the rules cannot be constrained in such a way as to avoid generating errors which are unlikely to occur in real texts (at least as spelling errors). For example, one of the sources of error we have described here for Danish is confusion between the two participial endings t and et. We repeat here the rule that produces the relevant spelling error, namely:

Ct. > Cet example: slæbt > *slæbet (En: dragged)

The rule simply replaces a t ending with et, without taking into account the fact that the stem should be a verbal one. Therefore, it would also produce less principled, and probably less likely, errors, e.g. with an adjective:

hårdt > *hårdet (En: hard)

However, in our case it seems less important to exclude unlikely errors than to make sure that all the errors we know of are indeed generated.
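The overgeneration described above can be illustrated with a minimal sketch. It is not the ASCC implementation: the function name `corrupt_t_to_et`, the consonant inventory, and the reading of the rule notation (C as any consonant, the full stop as end of word) are assumptions made for illustration only.

```python
import re
from typing import Optional

# Assumed consonant inventory for this sketch; the report does not list one.
CONSONANTS = "bcdfghjklmnpqrstvwxz"

# Assumed reading of the rule "Ct. > Cet": a word-final t that follows a
# consonant is rewritten as et. No part-of-speech information is consulted.
RULE = re.compile(rf"(?<=[{CONSONANTS}])t$")


def corrupt_t_to_et(word: str) -> Optional[str]:
    """Return the corrupted form, or None if the rule does not apply."""
    corrupted, count = RULE.subn("et", word)
    return corrupted if count else None


# The rule applies to the participle and to the adjective alike,
# because it only looks at the letters, not at the stem category.
participle_error = corrupt_t_to_et("slæbt")
adjective_error = corrupt_t_to_et("hårdt")
```

Under these assumptions both slæbt and hårdt are corrupted, which is exactly the behaviour discussed above: the rule cannot tell a verbal stem from an adjectival one.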

Nevertheless, there are other ways in which the output of the rules can be constrained without introducing too much complexity into the formalism, for instance by making use of frequency information. Assuming, as seems reasonable, that letter combinations that are infrequent or illegal in a given language are unlikely to occur in misspellings, it could be useful to introduce the notion of unlikely letter combinations. The idea is that the output of the corruption rules is checked against a list of letter pairs considered illegal or very infrequent for the language under consideration, so that misspellings containing such letter pairs are discarded. For Danish, an interesting source to use in this context is Danske Bogstavpar, a frequency list of letter pairs in context.
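The filtering step could be sketched as follows. This is an illustration under stated assumptions, not a description of ASCC: the helper names `letter_pairs` and `filter_candidates` are hypothetical, and the pair list in the example is a made-up placeholder rather than data taken from Danske Bogstavpar.

```python
from typing import Iterable, List, Set


def letter_pairs(word: str) -> Set[str]:
    """Return the set of adjacent letter pairs occurring in the word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def filter_candidates(candidates: Iterable[str],
                      unlikely_pairs: Set[str]) -> List[str]:
    """Keep only the candidates that contain no unlikely letter pair."""
    return [w for w in candidates if not (letter_pairs(w) & unlikely_pairs)]


# Placeholder list for illustration only; a real list would be derived
# from a frequency resource for the language under consideration.
UNLIKELY_PAIRS = {"bq", "qx"}

# "slæbqet" is an invented candidate containing the placeholder pair "bq".
kept = filter_candidates(["slæbet", "slæbqet"], UNLIKELY_PAIRS)
```

A check of this kind runs after the corruption rules, so the rules themselves stay simple. Note that it would not discard a form such as hårdet unless one of its letter pairs happened to be on the list.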

