As developments within speech technology progress, the ways in which we can communicate with each other become increasingly varied and innovative. A speech technology tool has now finally become available in Danish, and this opens up new possibilities for the Danish market.
Your daughter is on a school trip to the theatre. You are supposed to pick her up at 4 pm. She is going to be delayed and would like to phone you from her mobile phone. This is not possible, however, as it would be rude to disturb the performance. Instead, she sends a text message to the fixed-line phone in your office. Your phone rings, and when you answer it, a male voice tells you that there is a voice-mail for you. You are told the mobile number of the sender, and the text message is read out.
This is only one of the many new possible uses for speech synthesis for Danish.
In simplified terms, speech synthesis - also known as 'text-to-speech' - is computer-generated speech, in which a software program analyses written text or instructions and converts them into speech. Synthesised speech sounds somewhat clipped and lacks the quality and rhythm of normal speech. However, the output of today's speech synthesisers is easy to follow and understand.
Have the Internet read aloud
Speech-Ware A/S, a joint venture between the Universities of Aalborg and Copenhagen, is responsible for selling and marketing the speech technology developed by Copenhagen University in cooperation with Aalborg University and TDC Tele Danmark. Speech-Ware has one product on the market, a male voice, but soon expects to have a female voice as well. The male voice is used for reading text messages in TDC's voice system.
Lise-Lotte Bjorkholt, the director of Speech-Ware, predicts that, in the future, people will be able to benefit from speech synthesis in combination with the Internet. "For example, users will be able to phone a voice portal where they can choose between several different services. They can have their horoscope read out or get information about various cultural events in exactly the same way as we gather information from different Internet portals today."
Voice portals are still under development, but some telecommunications companies have already integrated speech synthesis into their products. Both TDC Tele Danmark and Sonofon offer their customers a system whereby they can have their emails and text messages read out over the phone. "This system has an advantage if, for example, you are driving your car or you are somewhere without access to a computer," says Lise-Lotte Bjorkholt.
Listen to the news
People with a visual or reading disability have, until now, only been able to follow the news by listening to the radio or TV. With the use of speech synthesis as an 'assistive technology', this is no longer the case. For example, visually impaired and blind people can now subscribe to one of the major Danish papers, Jyllands-Posten, and have articles of their own choice from the daily paper read out. "We have developed a handicap-friendly service on the website of Jyllands-Posten, where visually impaired and blind people can have the daily news read out with speech synthesis," says Thomas Kellberg Christensen, Manager of IT development at the Danish National Library for the Blind. The user needs access to a home computer which already has synthetic speech and a screen reader installed.
The Danish National Library for the Blind is currently working towards introducing the same system with the radio and TV programme in cooperation with the news agency Ritzau. "The programme will be available early this summer," Thomas Kellberg Christensen says.
The Danish National Library for the Blind has limited the use of speech synthesis to non-fiction, news and indexing of texts. Thomas Kellberg Christensen emphasises that it is not used for fiction. "Speech synthesis is only suited for short texts. It cannot replace the live voice which we use for recording works of fiction."
Speech recognition is another exciting aspect of current development within speech technology. With speech recognition, a computer program is able to recognise spoken messages and carry out the corresponding actions. It is, in effect, the reverse of speech synthesis: the starting point is the spoken utterance and the end result is text.
There are two different approaches to speech recognition: systems which are user independent and systems which are user dependent. A user-independent system is based on the recognition of relatively few words and phrases spoken by a large number of speakers. A user-dependent system, on the other hand, is capable of recognising larger vocabularies, typically up to 500,000 words, spoken by a specific individual - hence "user dependent". In the latter system, the voice characteristics of the user are used to train the recognition system in advance. A user-dependent system is typically installed on the user’s computer together with his or her individual profile.
"The system is set up for the user's voice and your position in the room. You might say that the program is initially ‘trained’ to understand the user. This is done by recording some spoken phrases, and thereafter the program can compensate for the acoustic changes due to the surroundings. Speech sounds different in e.g. a large room with wooden floors, open windows and many people, than in a small, cluttered office with drawn curtains. The training enhances the speech recognition and reduces the possibility of errors," explains Thomas Bilgram, Business Manager at Nordisk Språkteknologi (Nordic Language Technology).
Unlike the user-independent system, a user-dependent system is based on dynamic communication, in which whole sentences and messages are converted from speech into text. User-dependent systems are not yet available for Danish, most probably because of the small size of the Danish language market.
Several Danish companies are currently experimenting with Danish user-independent systems in telecommunications. Product Manager Lene S. Bjerregaard from e-systems, an integrator of speech systems, says that speech recognition has been used in the USA for several years, but that the Danish version only became available in Denmark in August 2001. e-systems uses a speech recognition routing system in-house. The traditional receptionist, whose main job is to answer and transfer incoming calls, has been replaced by an electronic receptionist. When a customer phones e-systems and asks to speak to one of the employees, the call is automatically transferred to the right person.
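For readers curious about what such an electronic receptionist might do once the caller's words have been recognised, here is a minimal sketch in Python. It is not the e-systems product: the directory, the names and the extensions are invented for illustration, and the fuzzy matching (which tolerates small recognition errors) is simply one plausible technique, taken from Python's standard difflib module.

```python
import difflib

# Hypothetical staff directory: names and extensions are invented for illustration.
DIRECTORY = {
    "anna jensen": "101",
    "peter hansen": "102",
    "sales": "200",
}


def route_call(transcript, cutoff=0.8):
    """Return the extension that best matches the recognised text, or None.

    Every run of words in the transcript is compared with every directory
    entry of the same word length, so a slightly misrecognised name can
    still be routed.
    """
    words = transcript.lower().split()
    best_score, best_extension = 0.0, None
    for name, extension in DIRECTORY.items():
        length = len(name.split())
        # Slide a window of the same word length across the transcript.
        for start in range(len(words) - length + 1):
            candidate = " ".join(words[start:start + length])
            score = difflib.SequenceMatcher(None, candidate, name).ratio()
            if score > best_score:
                best_score, best_extension = score, extension
    # Only transfer the call when the best match is close enough.
    return best_extension if best_score >= cutoff else None


if __name__ == "__main__":
    print(route_call("I would like to speak to Anna Jensen please"))
    print(route_call("put me through to peter hanson"))
    print(route_call("what is the weather like"))
```

In this sketch a request that matches nobody returns None, at which point a real system would presumably ask the caller to repeat the name or hand over to a human.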
"The fact that you are talking rather than pressing the buttons on the phone makes it possible to get through the system faster. You can explain exactly what you want and be transferred directly to the right person, whereas with automated touch-tone services you have to work your way through an array of menus to get the information you want," says Bjerregaard.
Information about flight departures is also offered via an electronic "agent" at SAS (Scandinavian Airlines). The system is constructed in the same way as an automated touch-tone service. When you phone the SAS Speech Line, a male voice guides you through various options. Instead of pressing the buttons on the phone, you repeat the word or phrase which the voice tells you to use in order to get exactly the information you want.
At the moment the system is limited to giving various types of passenger information, but SAS wishes to extend the system so that, in the future, customers can order tickets through speech recognition.
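A prompted service of this kind can be pictured as a menu in which each step accepts only the few words the voice has just offered. The sketch below, in Python, illustrates the idea; the menu, prompts and keywords are invented and do not describe the actual SAS Speech Line.

```python
# Hypothetical menu: states, prompts and keywords are invented for illustration.
MENU = {
    "start": {
        "prompt": "Say 'departures' or 'arrivals'.",
        "options": {"departures": "departures", "arrivals": "arrivals_info"},
    },
    "departures": {
        "prompt": "Say 'today' or 'tomorrow'.",
        "options": {
            "today": "departures_today",
            "tomorrow": "departures_tomorrow",
        },
    },
    # States without options are end points where information is read out.
    "arrivals_info": {"prompt": "Arrival information follows.", "options": {}},
    "departures_today": {"prompt": "Today's departures follow.", "options": {}},
    "departures_tomorrow": {"prompt": "Tomorrow's departures follow.", "options": {}},
}


def run_dialogue(utterances, state="start"):
    """Walk through the menu, one recognised utterance per step.

    A word that was not offered at the current step is ignored, which
    corresponds to the voice repeating its prompt.
    """
    for utterance in utterances:
        options = MENU[state]["options"]
        if not options:
            break  # end point reached, nothing more to choose
        state = options.get(utterance.lower().strip(), state)
    return state


if __name__ == "__main__":
    print(run_dialogue(["departures", "today"]))
    print(run_dialogue(["hello", "arrivals"]))
```

Restricting each step to a handful of expected words is what keeps the recognition task small enough for a user-independent system.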
TDC Tele Danmark has, for several years, used speech recognition in its directory enquiry service, for example to transfer callers to the requested number.
Combination of several technologies
The next step in the development of the Danish version of speech recognition is to extend the system so that it can identify people by their voice. "Every voice is unique – more unique than a fingerprint, in fact," says Lene S. Bjerregaard. "In the USA the home-shop-network, a TV shopping service, is based on speech recognition and voice verification. The system can ensure both that the right product is being ordered and that the order is being placed by the account holder himself, without the need for customers to fumble with paper catalogues."
Speech technology will only begin to become really interesting when speech recognition is combined with speech synthesis. In the future, speech technology might be installed in every car. Combined with GPS, the system will be able to understand the driver’s questions about directions, find the shortest possible route, make adjustments for road works via information from national and local traffic reports and read out the final route description to the driver.
Gitte Willumsen is an independent journalist with a scientific background, who writes about research and technology matters. More information (in Danish) about the author is available from www.citypressekontor.dk/gitte.htm.
The editors of HLTCentral would welcome any feedback on the article.