Oqaasileriffik
Postboks 980 DK-3900 Nuuk tlf: +299 327344 oqaasileriffik@gh.gl
Härkätie 371
FIN-21490 Marttila tlf: +358 (0)24846062 pl.oqaasileriffik@greennet.gl
Language Technology in Greenland
by
Per Langgaard
Oqaasileriffik
Sprogsekretariatet
1 Introduction
Greenland has always been very remote from the rest of the Nordic countries, not just in terms of language and culture, but geographically and financially as well. Even today, flight prices remain prohibitive. The mail service has been very slow, and telecommunication prices used to be almost impossible for most people to pay. However, over recent years Greenland has moved a lot closer thanks to considerable improvements in service and remarkable price reductions within telecommunication, which have led to an explosive growth in the number of users of telephone, fax and now also email and GSM. Yet even though it has become a lot easier to communicate with Greenland, the problems have not been solved overnight. Greenland is still at least 10,000 km away from the rest of the Nordic countries, and Greenlanders still - to an increasing degree, in fact - speak the polysynthetic Greenlandic language, which is so difficult for everyone other than the approximately 50,000 native Greenlanders. Because of exactly these two factors, Greenland is, despite all good intentions, still today a marginalised part of the Nordic countries, whether we like to admit it or not.
It is important to call a spade a spade, especially in the case of scientific communication. The first vice-chancellor of The University of Greenland once said quite ironically that "We Greenlanders normally call a spade issorsiut, and that is probably our biggest problem in relation to the other Nordic countries". I personally believe that he was right. Apart from the almost insurmountable travel expenses involved in even a reasonably close participation by Greenland in Nordic or other networks, the language barrier has for many reasons been allowed to impose restrictions as well. This is not just in the narrow sense that it is difficult to talk to each other when one party speaks a language totally incomprehensible to the other, such as Greenlandic.
Cooperation is also made difficult by the fact that it will always be based on the major language as the working language and be focused on the problems presented by the major language. It is, moreover, difficult to get an academic education, partly because this is always conducted in a foreign language.
Much has happened since the introduction of Home Rule in 1979. In a broad range of areas, optimism is greater than ever. The positive tendencies within education, for example, are evident to everybody, and "greenlandification" is becoming an increasingly strong current with respect to language as well as culture. But Greenland still lags behind the rest of the Nordic countries in most areas. Despite the extension of sixth form education in Greenland and a steep rise, from about 15% to 30% in the last 10 years, in the percentage of pupils from each year who continue their education at this level, there is still a long way to go before reaching the 50 to 100% who continue their education at this level in the rest of the Nordic countries. Similarly, whilst about one fourth of the adult population in the Nordic countries has completed tertiary education, only a few hundred Greenlanders are educated to this level. And amongst the few Greenlanders with an academic education there are very few people educated in the technical sciences. The connection between information technology and level of education is apparently quite unambiguous, so against that background it is not surprising that this account of language technology in Greenland will be incredibly short. There are very few activities to give an account of, and there are hardly any projects worth mentioning under way.
On an introductory note, it is therefore necessary to make clear that the statement on page 12 of the annual review of language technology, "There is already highly qualified competence available in this area in the Nordic countries; ...", needs the addition of the clause which is so common in Danish law: "This law does not apply to Greenland."
2 Definitions
The Nordic language technology research programme focuses on language technology in general, but three fields of development are mentioned specifically: (i) the development of computer supported teaching in the Nordic languages as second languages; (ii) the creation of cross-language search facilities in information bases; and (iii) possibilities of using Nordic language speech in interaction with machines. Let me first of all make clear that all of the mentioned areas are of direct interest from a Greenlandic point of view. And let me also make clear that there is no part of any programme, let alone a whole individual programme, devoted to Greenland within any of these areas. In the following, I shall therefore be using the term language technology in the broadest possible sense. However, I shall eventually return to discuss these fields of development and try to explain in what respects they are of interest to Greenland. In fact, my claim is that the need for technology in a marginal area with a unique culture such as Greenland is, if possible, even greater than in larger societies. This is not least connected with another essential fact that one needs to acknowledge in order to fully understand the situation in Greenland: Greenlandic is not some slightly exotic second language for otherwise Danish-speaking Greenlanders. Hence, it is not a language which should be guaranteed awareness and thereby survival for primarily symbolic and ethical reasons.
I emphasise this because I know that a lot of outside people tend to consider Greenland in the frame of reference created by the many sad examples of complete or partial extinction of the world's minority languages. Therefore they imagine the situation of Greenlandic to be similar to that of Welsh in Great Britain or the Inuit languages in America, to mention but a few examples, or indeed that of the Sami language in the majority of the Sápmi area; that is, a situation where a people has in fact experienced a change of language, and now for ethnic, national, cultural, symbolic or other reasons struggles to maintain or recreate a bilingual society. Greenlandic in Greenland is not a symbolic language, and Greenland is not bilingual or Danish speaking. Greenland is officially and in reality Greenlandic speaking, and this to such a degree that the lack of knowledge of the foreign languages Danish and English, especially amongst young people and children, poses a major problem to the education system. It can, therefore, not be emphasised too strongly that when Greenland, despite its very weak preconditions, wishes to join the future of language technology, it is not an expression of a symbolic act or a "trendy" way of thinking. Greenland needs technology in Greenlandic for practical communicative and financial reasons, simply because the people of Greenland speak Greenlandic and do not have strong skills in many other languages.
3 The "infrastructure" of language technology
3.1 General background
Even though the technological stage in today's Greenland is similar to the rest of the Nordic countries in many ways, and even though technological development all over the world has been explosive, it is still important to remember just how extremely explosive the development in Greenland has been.
Up until 1966, the Morse key was the only alternative to traditional mail, and even after the establishment of the UHF chain in 1970 made telephone calls possible both within Greenland and to Denmark, the prices made it impossible to use this means of communication to any large extent. I recall the price per minute for a call to Denmark being approximately 1% of the postmaster's monthly salary in the early seventies! However, with the continuous expansion of the telecommunications network, the start of digitalisation and the introduction of many new possibilities such as TELEX, DATAPAK, X28 etc., all communication prices started to fall, and from the mid-nineties they decreased dramatically. Consequently, the number of subscribers to all kinds of communication services has increased enormously, and so has the focus on technology in general. Aspects of this include young people's fascination with the possibilities of the Internet, the fact that everybody now has access to telephone and GSM, and a considerable pressure on technological education.
3.2 Computer power and access to PCs are not bottlenecks
Greenland really entered the world of computers at the end of the 70s, when what was then The Royal Greenland Trade (KGH) established a central computerised store management and invoicing unit in Nuuk. Almost simultaneously the Greenland Telecommunications Service, TELE, introduced computers for calculating charges and printing bills. KGH chose a solution based on Digital's VAX, whereas TELE chose an RC8000 solution in cooperation with the large Danish company Regnecentralen. Apart from this, Greenland saw very few computerised systems for a number of years.
The town councils bought the services in Denmark, and for a long time the Home Rule tried to standardise and centralise the IT development by only allowing a limited number of institutions to buy equipment, and by requiring all equipment to be included in the central IT committee's list of approved systems; this in reality meant terminals connected with the central system of KGH or Philips word processing systems. But the explosive development proved too strong for these intentions. From the mid-eighties a rift developed between those who believed in centrally governed systems with multiple users and those who believed in the PC concept, and this very soon resulted in private users as well as institutions choosing their own solutions. Today the IT map of Greenland is probably not too different from that of Denmark: a range of large organisations (Air Greenland, Bank of Greenland, The Home Rule, the town councils, Nukissiorfiit (the central power supply), TELE) have large centralised IT departments, and in smaller businesses, in schools and other institutions as well as in private homes, people have Windows-based PCs with Internet connection. Finally, there is a public IT service for more or less all children and young people via schools and other educational institutions, as well as free Internet access offered by libraries.
3.3 Internet access
Until recently, the charges for Internet access were prohibitive, but from the mid-nineties the prices decreased dramatically. They are now relatively low and not too dissimilar to those in other countries, with fairly low rates in the evening and at night. This has meant a dramatic rise in the number of Internet users. A survey from 1998 showed that Greenland was slightly behind Denmark with regard to the proportion of the population having Internet access.
However, it also showed that the percentage of the Greenland population planning to acquire a PC and Internet access was bigger than the corresponding percentage of the Danish population. It has not been possible to get hold of more recent figures, but it is hardly an inaccurate estimate that the relative number of people with access to the Internet is today quite similar in Denmark and Greenland.
3.4 Access to data
Access to input data is not what initially limits the development of language technology in Greenlandic, either. Written Greenlandic is easy to get hold of, as newspapers and publishers have been using digital systems for a long time now, and public documents such as legal papers, minutes of meetings in the Landsting (the local governing body) etc. exist as files. Spoken Greenlandic in sound or picture media of a technical quality suitable for research purposes is also available to a reasonable extent, as Greenland radio and TV have been at quite a high technical level for several years. The outlook is a bit more bleak when it comes to unedited data such as lexicographical or terminological databases, tagged or untagged text corpora or digitalised collections of spoken language, to mention but a few of the types of data that are not available.
3.5 Bio-ware and language
At the level of users, however, there is a serious bottleneck problem. There are by now quite a lot of Greenlanders with IT skills such as computer technicians and IT assistants, so large parts of the user and maintenance functions are in fact "greenlandified". This is especially the case for major organisations such as TELE and the IT department of the Home Rule, Qarasaasiaqarfik, where almost all IT staff are Greenlanders. However, despite the positive development in this area, the IT sector still depends on quite a large import of skilled manpower, and there is therefore still a large influx of non-Greenlandic-speaking expertise to this field.
This is exactly where the major differences between Denmark and Greenland are found. First of all, there are no highly educated Greenlandic experts in this field such as computer scientists, software engineers etc. to develop, renew and/or "greenlandify" the systems. Secondly, the interaction between users and machines takes place in Danish/English rather than the Greenlandic native language, and this is a major problem.
3.6 Greenlandic programs
Against the background of the "infrastructure" described above it is hardly surprising that there is a distinct lack of programs in Greenlandic at levels below DB applications and HTML programming. As for the latter, it should be mentioned, however, that websites in Greenlandic are now slowly beginning to appear, and most major systems now have user interfaces in Greenlandic. But there still remains a long list of programs that we do not have in Greenlandic: there have not been enough resources for an adaptation of the auxiliary programs in the most widely used word processing systems (spell and grammar checks, hyphenation, automatic transliteration between different orthographies, on-line help services, among others), there are no on-line dictionaries, and computer supported translation and voice response systems are virtually unknown. Moreover, we have only just started considering the use of computers in foreign language tuition, and absolutely nobody has dared dream about the possibilities that computers open up for the small but very marginalised groups of deaf and blind people in Greenland.
4 Language technology projects up until today
As mentioned earlier, there is a very limited number of language technology projects related to Greenlandic, even though language technology is not entirely unknown in a Greenlandic connection.
4.1 Dermot Collis' frequency analyses
As early as about 1970, five Greenlandic novels were recorded in a tagged file at what was then the Technical College of Denmark (Danmarks Tekniske Højskole) - now the Technical University of Denmark - and a number of frequency analyses were carried out on this material. Dermot Collis from Université Laval was in charge of the project. Another participant in the project was Carl Christian Olsen, who is now departmental director for the Greenlandic Department of Language. Further work on the research project took place in Canada, where the computer files might be found at Université Laval - if they still exist, that is. As far as we are aware, the project results only exist on paper today, a copy of which can be found in Nuuk at Oqaasileriffik/ Department of Language.
4.2 Ilisimatusarfik
In 1981 a large and rather ambitious dictionary project was initiated in Nuuk. For several different reasons the project was abandoned, but an internal by-product of the project was a small concordance for sample search compiled by Per Langgaard, then a university lecturer, in 1986 in cooperation with RECKU (later UNI-C) and based on approximately 1000 pages of Greenlandic literature. This concordance includes 78,203 concordance lines and exists on paper at Ilisimatusarfik, whereas the output data exists as a text file at Oqaasileriffik. A range of minor programs, which have been used in literary analysis of Greenlandic texts, stem from approximately the same time. Amongst these are some KWIC-indices of the Greenlandic hymn book and the works of some writers, as well as some static programs.
4.3 Henrik Aagesen
The only individual who has worked with Greenlandic language technology to any significant degree is Henrik Aagesen, PhD in linguistics.
For a number of years he has been interested in the use of computers in a descriptive linguistic analysis of Greenlandic, and this work has now finally resulted in quite a large collection of programs with corresponding lexica, which he calls a "Word splitter". This program is well described at the website http://qimawin.adr.dk/ and will therefore not be described in further detail here.
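For readers unfamiliar with the keyword-in-context (KWIC) indices and concordance lines mentioned in section 4.2, the general idea can be sketched in a few lines of Python. This is only an illustration of the technique, not a reconstruction of the programs from the 1980s: the function name kwic, the width parameter and the sample string are all hypothetical, and the sample is placeholder text apart from the word issorsiut quoted in the Introduction.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Hypothetical helper for illustration only. Each line shows up to
    `width` characters of left context, the keyword in brackets, and up
    to `width` characters of right context, so the keywords line up in
    a column.
    """
    # Match the keyword as a whole token, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context and left-align the right context.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right:<{width}}")
    return lines


# Placeholder sample text (an assumption, not real corpus data).
sample = "aaa issorsiut bbb ccc issorsiut ddd"
for line in kwic(sample, "issorsiut", width=8):
    print(line)
```

A full concordance of the kind described would repeat this for every word form in the corpus and sort the resulting lines, which is one reason why approximately 1000 pages of text can yield tens of thousands of concordance lines.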
4.4 Oqaasileriffik
Oqaasileriffik/ The Greenlandic Department of Language is a relatively recently established institution in Greenland. Apart from its scale, this department corresponds to Dansk Sprognævn, The Danish Language Council. It was formally established in the winter of '98/'99 but did not get permanent office space and staff until the spring of 2000. One of the main purposes of Oqaasileriffik is to register and document the Greenlandic language. The institution depends entirely on the use of IT for this work, not only during the phase of registration (word databases and subject terminologies), but also for sample searches (the establishment of a corpus). Oqaasileriffik has also attempted to create some primitive auxiliary programs, e.g. for transliteration between the old and the new orthography. Apart from this, nothing has really happened in this field. However, there is a growing acknowledgement of the need for a wide range of auxiliary programs, not just in linguistic documentation and consultancy, but also to support the Greenlandic language and make it more "user friendly" in a modern society. This can be done for example by developing spell and grammar checks, hyphenation programs etc., so that Greenlandic will have the same facilities in this respect as the major languages have already.
4.5 Inerisaavik/ The Teaching Aids Centre of Greenland
The field of development mentioned first in the Nordic language technology research programme is the development of computer assisted teaching of the Nordic languages as foreign languages. This field of development ought to receive special attention in Greenland, as the single most important question in education in Greenland is the situation of Danish and English as second languages. However, in this field as well, language technology is an almost unknown phenomenon in Greenland.
The one exception is a single Greenlandic version of a CD from the summer of 2000 with programs which were made available for the small Nordic languages by IDUN, a project of "Information technology and Computer Pedagogics in teaching" run by the Nordic Council of Ministers. A few of these programs are designed for use in the teaching of foreign languages, but as far as I am informed, they have not been overly popular in the Greenland schools. Based on a somewhat superficial review of the programs, I hardly find this surprising. I personally doubt that they meet the needs of the target group, as they seem to be based on antiquated audio-lingual language pedagogics focusing on single words and spelling and therefore excessively on nouns. For Greenlandic children the problems lie in completely different areas, for example in pronunciation, in the understanding of the actual grammatical categories (such as tense, gender and definiteness) and in syntax. When we add to this that the translation in the Greenlandic version of the program is a bit doubtful in a number of cases (for understandable reasons, as the translator has not had access to an even remotely consistent terminology), it becomes obvious why teachers have not been overly keen to use these programs. Despite my not very flattering evaluation of the programs, I would, however, like to stress here that I consider them important and very useful, partly because they are groundbreaking and partly because they can help show the future Greenlandic software developers that it is possible to use technology in teaching. The introduction of this technology will probably just have to happen less as a top-down process and more as a process based on formulated requests for help and specific needs in the schools' daily teaching.
5 On the other side
Labrador and Baffin Island are called Akilineq in Greenlandic.
According to Samuel Kleinschmidt's classic dictionary from 1871, this means "the country which is the opposite (on the other side of the fjord or similar)". Opposite is used here in a strictly spatial meaning, but if Kleinschmidt had lived today, he would probably have used the same word in its modern sense. The fact is that there are many differences between Greenland on the one side and Canada/ Alaska on the other. One essential difference is that whereas Greenland has very little language technology, it has plenty of language; the Inuit languages, on the contrary, are in a bad condition in Alaska and Canada apart from the central area of Nunavut, but there is a strong focus on technology. Until Ilisimatusarfik was established in 1983, Greenlandic only had academic support at the Eskimology Department at the University of Copenhagen, whereas the dialects on the opposite side of the fjord have received much attention from both new and established universities and other academic institutions conducting linguistic research. McGill and Laval - both from Québec - and the Alaska Native Language Center at the University of Alaska Fairbanks are probably the best known, but not the only Canadian and American universities with an interest in Inuit languages. One might think of at least ten or so universities which have well established programmes in "Arctic Studies", "Northern Studies" or "First Nations Studies". Thus, it is hardly a coincidence that the first language technology project in Greenland was carried out by a Canadian (Collis' frequency research) from Université Laval.
Iñupiaq is the name of the dialect spoken in the northernmost part of Alaska. It is under so much pressure that many researchers already consider this language extinct. Only a few old people speak the language, and there is no consistent orthography.
In an attempt to save what was left of the dialect, a project called Computer Assisted Linguistics on Alaska Native Languages was initiated several years ago at the Iñupiat University of the Arctic, Barrow, Alaska. The aim was to create a program which, based on a phonetic analysis of spoken input data, could generate digitalised phonemic output data, which could then be "translated" into some form of orthography.
A more recent American project is CATANAL, which was started in February 2000. Computer-Assisted Translation of Alaska NAtive Languages springs from an idea which was initially formulated at The Transnational Arctic and Antarctic Institute in Anchorage and later spread to a relatively large network of universities with established departments for computational linguistics in the USA and France. CATANAL is based on the theory that whilst machine translation and computer supported translation are clearly good tools for large languages, where such tools in the hands of professional experts exist and are being developed rapidly, they are, in fact, essential for the small languages, for which there are neither tools nor experts. The challenge is both to create the programs and to make sure that they have a user interface making them suitable for non-specialist use. In July 2001 the government in Nunavut hosted a workshop with the same themes from a Canadian point of view called "Inuktitut in the Digital World". As will be obvious from my conclusion in the following section, I strongly agree with the view that focused and user-oriented language technology research would have very positive effects for Greenlandic as well as all other minority languages.
In Canada's newest territory, Nunavut, the question of language is central and far from simple. Firstly, the government is willing to consider as many as four official languages with the resulting requests for interpretation and translation between the languages.
Secondly, compared to Greenland, the Canadian Inuit traditionally have a very permissive attitude to acknowledging the right of different dialects to have their own orthographies, educational material etc. Obviously this is a very demanding principle for a society, and there is a need for all possible kinds of help to overcome the problems raised by such a view of language. This is probably also an important reason why Nunavut has chosen to invest in a broad range of projects, which are all supposed to be of benefit to the minority languages. Part of this has been a great focus on the possible use of computers in the maintenance of languages. The most successful project to date is beyond doubt "Asuilaak/ The Living Dictionary", which, after beginning as just a thought in 1999, was already up and running in October 2000. Asuilaak is an interactive word database, which originally consisted of 10 existing word lists, and which has since been continuously updated with additions by users. Asuilaak can be visited at www.livingdictionary.com. The people behind Asuilaak are also currently working on The Greenlandic/ Syllabic Converter Project, a program which is meant to be able to automatically transliterate/translate between Greenlandic and Inuktitut.
6 A syllogism
6.1 First premiss
Greenland is culturally rooted in seal hunting, but is, at the same time, a modern Western society, which due to advanced intercommunication is almost as much at the centre of the world as any other country. And as I have already mentioned, Greenland is both officially and practically Greenlandic speaking. Greenlandic is the only functional language for about 30,000 Greenlanders, or about half the population. Among the rest, about two thirds are bilingual with Greenlandic as their first language, and the last third, or about 8,000 people, are Danish speaking with little or no knowledge of Greenlandic.
So the Greenlandic language must be able to carry out the thousands of functions demanded from a national language by modern society.
6.2 Second premiss
There are only very few highly educated Greenlanders, and amongst those, only a couple are second-generation academics. I assume that most readers will, by introspection, understand the enormous implications of this simple statement. Moreover, up until now it is the "soft" social and political subjects which have produced Greenlandic university graduates. During the period of ethnic mobilisation in the 70s and 80s, the technical subjects were not popular. They were almost considered imperialistic and foreign elements in Greenlandic culture, and the youth of the time were more interested in social and political subjects. This is one of the reasons why there are no Greenlandic computer scientists, and this tendency has also been detrimental to the more formal aspects of philology and linguistics. For many years there has only been one single Greenlander with a proper university degree in linguistics.
6.3 Conclusion
There is an enormous need for linguistic thinking and innovation and language maintenance in Greenland, and there is nobody to think, innovate and maintain. It is an absolutely impossible situation.
7 The needs
"Problems are solved continuously and miracles delivered to order"; this is the slogan of a large Danish company. In Greenland's language politics and in the language maintenance policy formulated in Oqaasileriffik/ The Language Department we try to do the same. Greenlandic survived on its own when it was exposed to a lot of pressure during the period of "danification", and it is today one of the most vital minority languages in the world.
But even though Greenlandic is fully functional when dealing with everyday national subjects, there are still problems within those areas traditionally dominated by Danish such as technology and economy, where Greenlandic has a lower functionality and a far lower status than Danish. Greenlandic also has problems in very small communities and subcultures. For example, it is incredibly difficult to be newly blind in Greenland, and deaf people are forced to learn Danish sign language. The effort to raise the status of Greenlandic must therefore be continued, so that Greenlandic can build up terminologies and uses in the "Danish" domains, and, hopefully, in time Greenlandic-speaking scientists will be produced for these domains.
If this objective is to be achieved, the language has to be supported in many ways. One of these is focused work in language technology, because we share the view of the people behind the above-mentioned CATANAL project: the major languages would theoretically survive without the technological support, of which they have plenty, because there are plenty of human and financial resources which could in principle cover the needs manually. And the major languages will in any case survive, simply because of the large number of users. Small languages do not have this built-in inertia, and small "strange" languages such as the polysynthetic Greenlandic language do not have it at all. Small languages normally do not have many resources either. With respect to Greenlandic there are, as mentioned, virtually no human resources. That is why Oqaasileriffik/ The Department of Language places great importance on technology. Oqaasileriffik is at the moment working to create an interactive word database as well as several subject terminologies, and we have developed the principles for a transliteration program and a primitive hyphenation program; however, we have not yet had the time to test the programs and integrate them into the major software packages.
There is still a long way to go from this sporadic work to something which even remotely meets the needs, and it is even doubtful whether we will be able to keep up the current activity level in the near future due to pressure from other activities and lack of funding. So let us return to the three areas of research I mentioned at the beginning.
Foreign language teaching in Greenland is in great need of a helping hand. In Greenland itself this is first and foremost Danish as a foreign language for Greenlandic-speaking pupils. However, English, which is not heard much in daily life in Greenland and therefore does not have a natural place in the minds of the pupils, also needs support. Apart from this, Greenlandic as a foreign language is also in a very difficult situation. As mentioned, there are no Greenlandic academics to develop and maintain this relatively new field yet, and it is virtually impossible for people who have an interest in Greenlandic but do not live in Greenland (or in major Danish cities such as Copenhagen or Aarhus) to find tuition or support in the language. Technological solutions could help to meet the needs in all these areas.
The creation of search possibilities in Greenlandic in information bases will open up possibilities for the large part of the Greenland population who are today virtually cut off from the majority of the information flow because they do not speak Danish and/or English.
The realisation of sound-based interfaces between users and machines sounds like a promising possibility for all. Greenlanders are no exception to this.
Apart from those fields mentioned specifically, language technology in the widest possible sense is also useful for Greenlandic conditions. We have a need for a great variety of things.
We would therefore like to invite broad cooperation with regard to the implementation of existing programs suited for use in Greenland after adaptation, and the development of new individual programs as well as Greenlandic modules for existing programs. I should perhaps add here that our contribution to such joint ventures will only to a small extent be in the field of computer technology. But we have wishes, ideas and needs, and more importantly, we have the descriptive knowledge about the language and the users which will make it possible to start developing programs and user interfaces for Greenland.