r-related errors
errors of suffixation (not involving the letter "r")
silent letters
consonant doubling
letter substitution
compounding errors
errors in loan words
syllable omission and syllable repetition
other error types (apostrophe, capitalisation, etc.)
In the following sections, we shall provide examples for each category. Most of them are data quoted in the reports as naturally occurring examples. The ones that have been constructed are flagged with a "#". In each example, the misspelled word is followed by the corresponding correct word in square brackets. The misspelled word is flagged with a "*" if it is not a legal word or if the context shows that it is the wrong word.
For each error type, we shall also indicate what corruption rule could be used to generate the error from the correct form. In what follows, we use a notation where a corruption rule is a context-sensitive rule consisting of a left-hand side expressing a pattern to be matched, the symbol ">", a right-hand side expressing a sequence to be substituted for the pattern matching the left-hand side, and optional constraints enclosed in curly braces[4].
The following symbols are used in the rules:
C stands for one consonant
V stands for one vowel
& stands for one letter (either vowel or consonant)
- stands for one or more letters
. stands for beginning or end of a word
The constraints in curly braces can express either equality ("=") or inequality ("!=") between letters.
For example, the rule that deletes the final e in a word ending with re will be:
rule: -re. > r
A rule where additional constraints are specified is the following:
-re& > r& {&!=r}
The rule states that the sequence re must be replaced by r if it is followed by a letter which is different from r.
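As an illustration, the two rules above can be sketched as regular-expression substitutions. Footnote [4] notes that the actual implementation used Perl substitute statements; the Python version below is only an analogous sketch. The function names, the restriction to lowercase Danish letters, and the choice to apply the second rule to the first match only are assumptions made for the example.

```python
import re

# Assumed alphabet for the sketch: lowercase Danish letters only.
LETTER = "[a-zæøå]"
NOT_R = "[a-qs-zæøå]"  # any letter except r, i.e. the constraint {&!=r}


def delete_final_e(word: str) -> str:
    """Rule: -re. > r  (drop the e of a word-final 're')."""
    # "-" becomes a lookbehind for a preceding letter, "." becomes "$".
    return re.sub(rf"(?<={LETTER})re$", "r", word)


def delete_medial_e(word: str) -> str:
    """Rule: -re& > r& {&!=r}  (drop the e of 're' before a non-r letter)."""
    # Only the first matching site is corrupted (an assumption of this sketch).
    return re.sub(rf"(?<={LETTER})re({NOT_R})", r"r\1", word, count=1)


print(delete_final_e("flere"))     # fler
print(delete_final_e("længere"))   # længer
print(delete_medial_e("værelse"))  # værlse
```

The outputs correspond to the misspellings quoted in example (1.2).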
A question to be asked is whether any of the corruption rules described in the following sections overlap with one or more of the six operations already available to generate errors in ASCC. With the exception of letter doubling and singling, these operations are based either on a random process, or are constrained by the position of the letters on the keyboard. Most of the rules listed below, instead, are motivated by the interaction between phonetics and orthography, and predict the addition or deletion of one or more letters in certain well-specified combinations of letters. Therefore, from a general point of view, the two sets of rules are fundamentally different. Thus, although in practice some of the manipulations foreseen by our language-specific rules may also be achieved by random deletion or addition, to make sure that all the errors we are interested in testing are indeed generated, it seems reasonable to implement the rules below as separate operations.
Another general observation regards rule interaction. All the error types implemented in ASCC are single error types. In reality, however, several misspelling errors can occur simultaneously in the same word (e.g. *"selfølelig" for "selvfølgelig", En: of course). To account for this, ASCC should be able to apply more than one rule to the same input word.
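A minimal sketch of such rule chaining is given here. It assumes that each rule is a function from word to word and that rules are applied one after the other; the helper names are hypothetical, and the two rules used are the silent-letter rules -lv > l and -lg > l from this appendix.

```python
import re
from typing import Callable, Iterable

Rule = Callable[[str], str]


def make_rule(pattern: str, replacement: str) -> Rule:
    """Build a rule that corrupts the first matching site in a word."""
    return lambda word: re.sub(pattern, replacement, word, count=1)


# "-" in the notation is rendered as a lookbehind for one preceding letter.
drop_silent_v = make_rule(r"(?<=\w)lv", "l")  # -lv > l
drop_silent_g = make_rule(r"(?<=\w)lg", "l")  # -lg > l


def apply_rules(word: str, rules: Iterable[Rule]) -> str:
    """Apply several corruption rules in sequence to the same input word."""
    for rule in rules:
        word = rule(word)
    return word


print(apply_rules("selvfølgelig", [drop_silent_v, drop_silent_g]))  # selfølelig
```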
Finally, we must note that some of the rules given below overlap with each other, and have therefore been collapsed in the implementation.
Löb distinguishes five types of spelling errors involving the letter r, namely:
errors based on standard Danish pronunciation
errors related to pronunciation in regional variants of Danish
visual errors
technical errors due to wrong dictionary look-up
other errors
The last category includes the kind of idiosyncratic errors that defy automatic treatment. The penultimate category comprises errors strictly connected to the type of examination the students sat, and is therefore not generally relevant. Hence, we shall focus on the first three types.
English translations are given in parentheses: note that where two translations are indicated, the first one refers to the misspelled word, and the second to the correct one.
Examples Corruption rule
Adding/deleting e
(1.1) klør [kløer] (itches/claws) -Ver > Vr
* vidre [videre] (further) -Cere > Cre
* sneer [sner] (snows) -r > er
(1.2) # * værlse [værelse] (room) -re& > r& {&!=r}
* fler [flere] (more) -re. > r
* længer [længere] (longer)
(-re is either final or followed by a letter different from r )
-r > re
Adding/deleting er
(1.3) * kontrollørne [kontrollørerne] -rer > r
(the conductors)
-r > rer
Adding/deleting r
(1.4) bære [bærer] (carry/carries) -rer > re
flimre [flimrer] (flicker/flickers)
* kørerplan [køreplan] (time schedule)
-re > rer
(1.5) * kontrolløerne [kontrollørerne] -VrV > VV {V != V}
(the conductors)
* Panduo [Panduro] (an author's name)
* byrer [byer] (cities) -VV > VrV {V != V}
* muserum [museum] (museum)
(1.6) * hierakisk [hierarkisk] (hierarchic) &VrC > &VC {& != r}
* vudere [vurdere] (assess)
* nomal [normal] (normal)
* fasterlavn [fastelavn] (carnival) &VC > &VrC {& != r}
* absorlut [absolut] (absolute)
(the constraint makes sure the rules are different from those in 1.4)
Reversing r and e
(1.7) * flimer [flimre] (flicker) -&re > &er {&!=e}
*tuer [ture] (walks)
* byre [byer] (cities) -er > re
Reversing r and e; adding/deleting r
(1.8) * kuperrene [kupeerne] -Ver <-> Vrre
(the compartments)
#* vier [virre] -rre > er
(shake one's head)
Replacing r with g/j and vice versa
(1.9) * kontroløgerne [kontrollørerne] -rer > ger
(the conductors)
-rer > jer
(1.10) * børerne [bøgerne] -ger > rer
(the books)
-jer > rer
Replacing ar(r) with ej/eg and vice versa
(1.11) * paret [peget] -eget > aret
(pointed)
parret [peget] -eget > arret
(the couple/pointed)
-aret > eget
-arret > eget
(1.12) * naret [nejet] -ejet > aret
(curtsied)
narret [nejet] -ejet > arret
(fooled/curtsied)
-aret > ejet
-arret > ejet
Replacing år(r) with øj and vice versa
(1.13) fåret [føjet] -øjet > året
(the sheep/submitted)
*fårret [føjet] -øjet > årret
(submitted)
-året > øjet
-årret > øjet
Replacing rd with rde/rre/er and vice versa
(1.14) * færde [færd] -rd > rde
(journey)
-rde > rd
(1.15) * færre [færd] -rd > rre
(journey)
-rre > rd
(1.16) *fæer [færd] -rd > er
(journey)
-er > rd
Vowel replacement
(1.17) * prast [præst] ræ& > ra& {& != r}
(vicar)
* skrætte [skratte] ra& > ræ& {& != r}
(rattle)
(1.18) * rotebil [rutebil] ru > ro
(coach)
* prublemer [problemer] ro > ru
(problems)
(1.19) * fårmiddags [formiddags] or > år
(yesterday morning)
år > or
(1.20) * dokter [doktor] or > er
(doctor)
* bibliotekorne [bibliotekerne] er > or
(the libraries)
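Rules with an inequality constraint between two vowels, such as -VrV > VV {V != V} in (1.5), can be sketched with a backreference and a negative lookahead. The vowel inventory, the function name, and the decision to corrupt only the first matching site are assumptions of this sketch, not part of the original implementation.

```python
import re

VOWELS = "aeiouyæøå"  # assumed Danish vowel letters


def delete_intervocalic_r(word: str) -> str:
    """Rule: -VrV > VV {V != V}  (drop r between two different vowels)."""
    # (?!\1) rejects a second vowel identical to the first, which is how
    # the constraint {V != V} is rendered here.
    pattern = rf"([{VOWELS}])r(?!\1)([{VOWELS}])"
    return re.sub(pattern, r"\1\2", word, count=1)


print(delete_intervocalic_r("kontrollørerne"))  # kontrolløerne
print(delete_intervocalic_r("Panduro"))         # Panduo
print(delete_intervocalic_r("videre"))          # videre (e-r-e is excluded)
```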
Examples
dukke [dukker] (doll/dolls)
bleger [blege] (bleaches/pale)
However, Löb shows that there is considerable variation in the frequency of occurrence of the various error types depending on the region. Therefore, differentiation here could be achieved by weighting different errors accordingly.
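One way such weighting might be realised is to draw corruption rules at random with region-specific weights. The sketch below is purely illustrative: the region names and the weights are invented for the example and are not taken from Löb's data, and the function name is hypothetical.

```python
import random

# Invented weights for illustration only; NOT derived from Löb's figures.
REGIONAL_WEIGHTS = {
    "region_a": {"-rer > re": 3.0, "-re > rer": 1.0},
    "region_b": {"-rer > re": 1.0, "-re > rer": 0.0},
}


def pick_rule(region: str, rng: random.Random) -> str:
    """Choose one corruption rule for a region, proportionally to its weight."""
    rules = REGIONAL_WEIGHTS[region]
    names = list(rules)
    weights = [rules[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so that a test run is reproducible
sample = [pick_rule("region_a", rng) for _ in range(5)]
```

A rule with weight zero is never selected, so an error type that is not attested in a region can be switched off without removing the rule itself.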
(2.1) * passagerne [passagererne] -VrVr > Vr {V = V}
(the passengers)
(2.2) rare [rarere] -rVrV > rV {V = V}
(nice/nicer)
(2.3) * vinduererne [vinduerne] Vr > VrVr
(the windows)
rV > rVrV
(2.4) * starks [straks] Vr > rV {V != e}
(soon)
The past participle -et suffix for verbs is often pronounced with a soft d [ð], which sounds very much like the d in the past tense -ede suffix. In the first of the two examples given below, the participial form has been replaced by a past tense form, thus resulting in a false negative, whereas in the second example, the misspelled word is not a legal form:
(3.1) har jeg *beskæftigede [beskæftiget] -et. > ede
mig med
(I have occupied myself with)
(3.2) pengene bliver *betragted [betragtet] som -et. > ed
(money is considered as)
Sometimes the past participle -t suffix (which is required with certain verbs) is confused with the more common -et suffix:
(3.3) en lærer havde *slæbet [slæbt] ham Ct. > Cet
(a teacher had dragged him)
men også en *hvis [vis] tilfredsstillelse
(but also a certain satisfaction)
en generation *vis [hvis] forældre arbejder hårdt
(a generation whose parents work hard)
hvorfor har nogle meget *sværdt [svært] ved
(why is it quite difficult for some)
Note that the first two cases above cannot be treated by a spelling checker as the two forms ("vis" and "hvis", En: certain/whose) are homophones, whereas the misspelled word occurring in the third example (*"sværdt") is not part of the Danish vocabulary although it has probably been formed by association with the word "sværd" (En: sword).
However, most of the instances of errors in this class are not attributable to any similarity with the pronunciation of other words. The bulk of errors are due to voiced consonants (with the exception of b) in the final position of a syllable where they appear after l, m, n or r. In such a context, these consonants are either silent or greatly weakened, and are therefore often wrongly omitted in writing. It also happens that, because of the existence of silent letters in other words, the writer wrongly adds letters that are not part of the written word and are not even pronounced:
*hindanden [hinanden]
(each other)
det sociale *samværds [samværs] lille ABC
(lit: social gathering's little ABC)
de ser meget *veltilpadse [veltilpasse] ud
(they look quite satisfied)
The presence or absence of a silent letter is due to a number of factors, such as whether the word has a glottal stop ("hund/hun", En: dog/she), and if so on which letter ("find/fin", En: find/fine), whether the word comes from Latin or not ("inkludere/indtage", En: include/consume) and so on. However, these factors cannot be taken into account here, as the automatic generation of errors possible in ASCC is based on a simple pattern-matching mechanism. Therefore, some of the rules given below are not restricted enough and will also create errors that are unlikely to occur in real text. However, this seems unavoidable if we want to generate all the error types we are interested in.
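The overgeneration mentioned above can be made concrete with a small sketch that applies one rule at every site where its pattern matches, one site at a time. The rule used is the silent-d insertion rule -n > nd of this appendix; the function name and the one-site-at-a-time strategy are assumptions of the sketch.

```python
import re
from typing import List


def single_site_variants(word: str, pattern: str, replacement: str) -> List[str]:
    """Return one corrupted form per site where the pattern matches."""
    variants = []
    for match in re.finditer(pattern, word):
        corrupted = word[:match.start()] + replacement + word[match.end():]
        variants.append(corrupted)
    return variants


# -n > nd: an n preceded by at least one letter is rewritten as nd.
print(single_site_variants("hinanden", r"(?<=\w)n", "nd"))
# ['hindanden', 'hinandden', 'hinandend']
```

Only the first of the three forms is the attested misspelling; the other two are of the unlikely kind that a purely pattern-based rule cannot exclude.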
We shall now discuss each of the possibly silent letters in turn.
silent h
Examples Corruption rule
(4.1) men også en *hvis [vis] tilfredsstillelse .v- > hv
(but also a certain satisfaction)
(4.2) en generation *vis [hvis] forældre arbejder hv- > v
hårdt
(a generation whose parents work hard)
# *erverv [erhverv] (occupation)
The two examples above are naturally occurring ones. However, confusion between words beginning with hj and j may also be foreseen:
(4.3) # *jelm [hjelm] (helmet) hj- > j
(4.4) # *hjern [jern] (iron) j- > hj
# *børnejem [børnehjem] (children's home)
silent d
Examples Corruption rule
(5.1) de er til *gengæl [gengæld] godt udrustet -ld > l
(they are, on the other hand, well equipped)
Bryld slår ret *volsomt [voldsomt] ned på...
(Bryld criticises rather violently)
(5.2) *sansynligvis [sandsynligvis] -nd > n
(probably)
regler *inskærpes [indskærpes] ikke længere
(rules are not stressed any longer)
(5.3) *fær [færd] (journey) -rd > r
går [gård] (yesterday/yard)
(5.4) # *kontrold [kontrol] (control) -l > ld
(5.5) *hindanden [hinanden] -n > nd
(each other)
tjene til ferien og *lommepengende [lommepengene]
(earn for holidays and pocket money)
(5.6) det sociale *samværds [samværs] lille ABC -rC > rdC
(lit: social gathering's little ABC)
*gjordt [gjort] -r. > rd
(done) (r must not be followed by a vowel)
(5.7) # *best [bedst] Vds > Vs
(best)
(5.8) # *påvidst [påvist] Vs > Vds
(proved)
(5.9) # *hvit [hvidt] &dt > &t {& != lnr}
(white)
(5.10) # *konsonandt [konsonant] &t > &dt {& != lnr}
(consonant)
silent e
Examples Corruption rule
(6.1) # bar [bare] (carried/only) -&e. > & {& != r}
(6.2) *tydlig [tydelig] (clear) -el > l
The letter e is also silent when it occurs before m, n, and t (pronounced //). However, in these cases the letter combination which would result from deleting the e would deviate too much from the rules of Danish orthography (e.g. #* "gamml" for "gammel", En: old). Therefore, since no relevant spelling error examples are quoted in the literature we have had access to, we do not set up rules for these cases.
silent g
Examples Corruption rule
(7.1) *selfølelig [selvfølgelig] -lg > l
(of course)
(7.2) # *spurte [spurgte] -rg > r
(asked)
(7.3) # *kule [kugle] -gl > l
(ball)
Wrong addition of a g seems more unlikely than deletion of the same consonant (we have found no recorded example).
The letter g may also be silent in word-final position independently of the preceding letter. The g in the -ig suffix is a case in point:
(7.4) sjældent forlader artiklen, før den er
*færdilæst [færdiglæst] -&ig > &i
(lit: seldom leaves the article, before
it is read to the end)
Note that the position of the omitted g is final with respect to the word "færdig", which in the example above, is part of a compound expression.
silent t
Examples Corruption rule
(8.1) hvis mennesket *forsat [fortsat] skal kunne -rts > rs
(if people still must be able to)
man føler sig *nød [nødt] til det
(you feel forced to it)
The fact that the t is silent in the last example is due to an idiosyncrasy. The case cannot, therefore, be treated by a general rule.
silent v
Examples Corruption rule
(9.1) *selfølelig [selvfølgelig] -lv > l
(of course)
*fatigdom [fattigdom]
(poverty)
lige nu *sider vi og spiser [sidder]
(right now we are sitting and eating)
*uddanelse [uddannelse]
(education)
et *visent blad hos blomsterne [vissent]
(a dead leaf among the flowers)
The opposite can also be observed, where a consonant may be incorrectly doubled:
det øverste billede på venstre *sidde [side]
(the highest picture on the left side)
*erfarringer [erfaringer]
(experiences)
*væssenlig [væsentlig]
(important)
opfattelse af harmoni som *værrende [værende]
(understanding of harmony as being)
Within this category, it would seem that there is a great deal of confusion between words which are homophones or near-homophones. For example, the verb "sidde" (En: sit) is often erroneously associated with the noun "side" (En: side), in spite of the fact that the two words are not semantically close.
The factors that determine consonant doubling in Danish are specific to the grammar of the language, and consonant doubling in other languages obeys different rules. Nevertheless, it appears that spelling mistakes due to the omission of a consonant where a consonant should be doubled, or to the wrong doubling of a consonant, constitute a common source of error in general. Therefore, in TEMAA consonant doubling and singling are treated as a general type of typing error, without making reference to the specific language. They are handled by the first and the second of the six operations listed in the introduction to this Appendix.
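A language-independent doubling and singling operation of this kind might look as follows. This is a sketch under stated assumptions: the consonant inventory, the function names, and the choice to produce one variant per site are ours, and nothing is claimed about how the corresponding ASCC operations are actually coded.

```python
from typing import List

CONSONANTS = set("bcdfghjklmnpqrstvwxz")  # assumed inventory


def singling_variants(word: str) -> List[str]:
    """Reduce each doubled consonant to a single one, one site at a time."""
    return [
        word[:i] + word[i + 1:]
        for i in range(len(word) - 1)
        if word[i] == word[i + 1] and word[i] in CONSONANTS
    ]


def doubling_variants(word: str) -> List[str]:
    """Double each consonant that is not already doubled, one at a time."""
    variants = []
    for i, ch in enumerate(word):
        if ch not in CONSONANTS:
            continue
        prev_same = i > 0 and word[i - 1] == ch
        next_same = i + 1 < len(word) and word[i + 1] == ch
        if not prev_same and not next_same:
            variants.append(word[:i] + ch + word[i:])
    return variants


print(singling_variants("sidder"))  # ['sider']
print(doubling_variants("side"))    # ['sside', 'sidde']
```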
The following instances of vowel substitution are very typical. Some Danish vowels are so close to each other in terms of sound quality that the potential for error is high. This is especially the case with the pairs e/æ, a/æ, and y/ø (note that relevant examples have already been quoted in the section on r-related errors).
Examples Corruption rule
(10.1) * nysgarrigt [nysgerrigt] e > a
(curious)
*kommendere [kommandere] a > e
(command)
(10.2) * værden [verden] (world) e > æ
* sprædt [spredt] (spread)
#* portret [portræt] (portrait) æ > e
(10.3) Brians opdragelse har haft y > ø
*betødning [betydning] for
(Brian's education has had importance for)
ø > y
As for consonant substitution, the distinction voiced/non-voiced (i.e. b/p, d/t and g/k) is only active in Danish at the beginning of a word. Therefore, such consonants constitute a frequent source of error when they occur in non-word initial position.
(10.4) * trykhed [tryghed] -g > k
(security)
elever som ikke vil *magge [makke] ret -k > g
(pupils who will not behave)
(10.5) Brian kan *skruppe [skrubbe] af -b > p
(Brian can bugger off)
-p > b
(10.6) mit første * indtryg [indtryk] -k > g
(my first impression)
-g > k
(10.7) *sympol [symbol] (symbol) -b > p
*etaplere [etablere] (establish)
-p > b
Because of identical pronunciation in mid-word and word-final position g and j are also often confused:
(10.8) på deres børns *vejne [vegne] -g > j
(on their children's behalf)
-j > g
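The substitution of voiced and unvoiced consonants in non-initial position can be sketched as a single operation over the pairs b/p, d/t and g/k. The function name and the one-site-at-a-time strategy are assumptions of the sketch; the word-initial letter is skipped because the distinction is described above as active in that position.

```python
from typing import List

VOICING_PAIRS = {"b": "p", "p": "b", "d": "t", "t": "d", "g": "k", "k": "g"}


def voicing_variants(word: str) -> List[str]:
    """Swap voiced/unvoiced consonants, one non-initial site at a time."""
    return [
        word[:i] + VOICING_PAIRS[ch] + word[i + 1:]
        for i, ch in enumerate(word)
        if i > 0 and ch in VOICING_PAIRS
    ]


print(voicing_variants("tryghed"))  # ['trykhed', 'tryghet']
print(voicing_variants("symbol"))   # ['sympol']
```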
*forbrugs goder [forbrugsgoder] (consumer goods)
i *et hvert [ethvert] hjem (in every home)
*familie idee [familieide] (family idea)
*der ved [derved] ikke sagt (lit: by that not said)
*hvor imod [hvorimod] Klaus Rifbjergs digt (whereas Klaus Rifbjerg's poem)
*oprørs tendenser [oprørstendenser] (rebellion tendencies)
*leve vilkår [levevilkår] (conditions of life)
*overenskomst situationen [overenskomstsituationen] (situation of agreement)
The splitting up of a compound word into two or more of its components is a specific case of space insertion, an error which can occur in connection with simplex words, too[5]. However, it seems that spelling checkers are generally unable to handle space insertion, as they check each word in turn and report on each of them separately. Therefore, compounding errors will not be considered any further here.
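The limitation can be illustrated with a toy word-by-word checker. The lexicon below is hypothetical and assumes that both components of the compound are themselves accepted words; under that assumption the split compound passes unnoticed, whereas an ordinary non-word is flagged.

```python
from typing import List, Set

# Hypothetical toy lexicon; a real checker would use a full word list.
TOY_LEXICON = {"forbrugs", "goder", "forbrugsgoder"}


def flag_unknown_words(text: str, lexicon: Set[str]) -> List[str]:
    """Check each whitespace-separated word in turn, as described above."""
    return [token for token in text.split() if token.lower() not in lexicon]


print(flag_unknown_words("forbrugs goder", TOY_LEXICON))  # []
print(flag_unknown_words("forbrugsgodr", TOY_LEXICON))    # ['forbrugsgodr']
```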
Examples Corruption rule
(11.1) # *akselerere [accelerere] (accelerate) ce > se
# *annonsere [annoncere] (announce)
*nyanser [nyancer] (nuances)
(11.2) # *kamouflere [camouflere] (camouflage) cV > kV {V = a,o,u}
# *kreme [creme] (cream) cC > kC
(11.3) # *disipel [discipel] (disciple) sc > s
# *sene [scene] (scene)
(11.4) # *sjance [chance] (chance) ch > sj
# *sjock [chock] (shock)
(11.5) # *bensin [benzin] (petrol) z > s
# *bisar [bizar] (bizarre)
(11.6) # *sylofon [xylofon] (xylophone) x > s
(11.7) # *ekseptional [exceptional] (exceptional) xc > ks
(11.8) # *conjak [cognak] (cognac) gn > nj
(11.9) # *salong [salon] (lounge) n > ng
(11.10) # *restaurang [restaurant] (restaurant) nt > ng
(11.11) # *sjenere / *jenere [genere] (annoy) g > sj
(11.12) # *djuice [juice] (juice) j > dj
(11.13) # *vanilje [vanille] (vanilla) ll > lj
(11.14) *niveu [niveau] (level) eau > eu
(11.15) *diskution [diskussion] (discussion) ss > t
In addition to these, there is a group of loan words for which the Danish spelling was officially changed in 1986 (Retskrivningsordbogen 1986, pp. 497-506). The problem here is that the spelling has not been consistently changed from a foreign one into one that obeys Danish orthography. Sometimes the spelling that is now sanctioned has become more foreign (e.g. "kampere" -> "campere", En: camp), and this may cause some confusion. It is not clear how the errors deriving from misspelling of these words should be treated in ASCC. In many cases, the error is of a systematic nature, and would be covered by one of the rules given above (e.g. "campere" -> *"kampere"). In others, however, it is rather idiosyncratic, e.g. "chartek" -> *"charteque" (En: file), so that it could not be derived by a rule. However, a list of these words comprising their earlier and current forms could easily be constructed manually.
genitive marking of words ending in -s, e.g. Hans'.
inflection of words with a stem ending in silent consonant, e.g. pommes frites'ene.
inflection of acronyms, e.g. FDF'er.
derivations of numbers, e.g. 6'er.
Influence from English causes the following errors.
Examples Corruption rule
(12.1) på de *unge's [unges] vegne &s. > &'s
(on the youth's behalf)
*Høy's [Høys] tekst
(Høy's text)
Vi modtager gerne *check's [checks]
(We accept cheques)
(12.2) *Brians's [Brians'] væremåde s'. > s's
(Brian's manner)
Examples Corruption rule
I dagens *danmark [Danmark] ?
(In today's Denmark)
den overflod som *i har savnet [I]
(the abundance you have lacked)
*i var unge i en tid hvor [I]
(you were young in a time in which)
The rule here is simply to replace a capital letter with a small letter.
Examples Corruption rule
*alkoholdig [alkoholholdig] ?
(containing alcohol)
*meddelse [meddelelse]
(message)
It is not clear how to express this as a corruption rule, as we cannot express the concept of syllable. The closest approximation would probably be to check for combinations of two or three letters that are repeated after each other.
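The approximation suggested above can be sketched with a backreference: find a chunk of two or three letters that is immediately repeated and delete one copy. The function name and the choice to corrupt only the first such site are assumptions of the sketch, and a letter chunk is of course only a rough stand-in for a syllable.

```python
import re


def omit_repeated_chunk(word: str) -> str:
    """Delete one copy of the first 2- or 3-letter chunk that occurs twice in a row."""
    # (\w{2,3})\1 matches a chunk directly followed by an identical chunk.
    return re.sub(r"(\w{2,3})\1", r"\1", word, count=1)


print(omit_repeated_chunk("alkoholholdig"))  # alkoholdig
print(omit_repeated_chunk("meddelelse"))     # meddelse
```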
Examples Corruption rule
*rarerere [rarere] ?
(nicer)
*størrere [større]
(bigger)
Again, it is not easy to express this in our notation. The closest approximation will again be to repeat combinations of two or three letters.
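A corresponding sketch for repetition duplicates every chunk of two or three letters in turn. The function name is hypothetical, and since chunks are not syllables the operation overgenerates: the attested misspellings are only two among many variants produced.

```python
from typing import List


def repetition_variants(word: str) -> List[str]:
    """Repeat each 2- or 3-letter chunk once, one site at a time."""
    variants = []
    for size in (2, 3):
        for start in range(len(word) - size + 1):
            end = start + size
            chunk = word[start:end]
            variants.append(word[:end] + chunk + word[end:])
    return variants


print("rarerere" in repetition_variants("rarere"))  # True
print("størrere" in repetition_variants("større"))  # True
```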
[4] The notation informally described here has been designed by Gurli Rohde at Center for Sprogteknologi. The rules have been translated into Perl substitute statements in the implementation.
[5] Simple space insertion is, however, often caused by a simple typing error, whereas the insertion of a space to split the components of a compound in many cases is an intentional spelling error.