Swedish Book Review is not responsible for the content of external websites.
Translation and Computing:
Friends or Enemies?
Peter Linton

This is a slightly updated version of an article which appeared in the 2005:1 issue.

Within living memory, the word “computer” denoted not a machine but a job description. A “computer” was a person employed to make mental calculations, for example in astronomy, artillery or surveying, in the days before such work could be mechanized. Today the word “translator” is still a job description – but for how much longer? Is translation doomed to be dehumanized and mechanized in our lifetimes? Will a “translator” be a machine, not a human?

This article looks at this issue, and at some of the computer-based tools for translators that have become available in the last 10 to 20 years, with particular emphasis on tools for Swedish-English translators. The conclusion is encouraging – there are good reasons for thinking that fully computer-based translation is still some way off, and that there are many computer tools that can enhance a human translator’s productivity without reducing quality.

Human versus machine translation

On the face of it, translation and computers are uneasy bedfellows. Translation is above all a very human activity, in which linguistic knowledge needs to be matched by non-linguistic knowledge about the world, by intuition, a sense of style, familiarity with different cultures, and above all common sense. Computers on the other hand are mere machines with none of those human attributes, least of all common sense. In fact, of all the tasks computers can be put to, handling language is about the least suitable, for exactly those reasons. Science fiction nightmares such as the misanthropic computer HAL in the film 2001: A Space Odyssey are just that – fiction, far removed from technical reality.

In the early days of computing, by contrast, translation seemed a particularly apt use for computers. The success of code-breaking during the Second World War encouraged a feeling that a foreign language was like a code that could be broken. The classic exposition of this view is this quotation from 1949:

“I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text.”  (Warren Weaver)

This was a seductive idea, and after the war a great deal of time and effort was spent on trying to develop machine translation. Sadly, this concept of a foreign language as a code turned out to be a mirage. Despite considerable work and expenditure, machine translation to this day has proved to be adequate only for what is called “gisting” – providing the gist of a text. This may in some cases be better than nothing, but in most cases the results are risible.

Machine-translated poetry

It is just as well that computers are insensitive machines, because a popular human pastime is to find ridiculous mistranslations. One favourite is a French wine website that advertised a wine called “small windfallen wood”. Francophone oenophiles may recognize that as an accurate if misleading translation of its French name “Petit Chablis”.

As an experiment, let us see what machine translation might do with a Swedish haiku by Tomas Tranströmer (quoted in an article in Svenska Dagbladet). The original comes first, followed by a machine translation:

Håll ut näktergal!
Ur djupet växer det fram -
vi är förklädda.

Hold out nightingale!
Out of the depth grows it front -
We is förklädda.

Ungrammatical, unfeeling, unliterary, unpoetic, risible, but at least we get the gist in English. The computer understands nothing, and merely does a simple search and replace – search a dictionary to find the English equivalent of the Swedish, word by word. That is pretty well the best we can hope for from machine translation – the text is more or less understandable, but what is clearly missing is the poetry. (Readers are urged to rise to the challenge and see if they can better the machine version.)
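The “search and replace” described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular commercial system works: the glossary is invented for the purpose and holds exactly one English equivalent per Swedish word, which is why “vi är” comes out as “we is”.

```python
# An invented glossary with exactly one English equivalent per Swedish word,
# as in the crudest dictionary-substitution systems.
GLOSSARY = {
    "håll": "hold", "ut": "out", "näktergal": "nightingale",
    "ur": "out of", "djupet": "the depth", "växer": "grows",
    "det": "it", "fram": "front",
    "vi": "we", "är": "is",  # "är" can mean am/is/are; the glossary knows only "is"
}


def word_for_word(line):
    """Swap each word for its glossary entry; unknown words pass through unchanged."""
    result = []
    for token in line.split():
        word = token.rstrip("!?.,-")     # detach trailing punctuation ...
        punctuation = token[len(word):]  # ... and keep it to re-attach afterwards
        result.append(GLOSSARY.get(word.lower(), word) + punctuation)
    return " ".join(result)


for line in ["Håll ut näktergal!", "Ur djupet växer det fram -", "vi är förklädda."]:
    print(word_for_word(line))
```

As in the machine version above, the unknown word “förklädda” (disguised) passes through untranslated, and nothing in the program notices that “we is” is ungrammatical.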

Why computers cannot handle human language

It is instructive to examine why computers are so unsuited to processing language. The reason is simple – a computer consists almost entirely of transistors, hundreds of millions of them. Each transistor is used to switch electricity on or off – exactly like a household switch. The apparent power of a computer comes from combining these extremely simple operations into more complex patterns, millions or even billions of times a second. But human language is highly complex, and what is worse, ambiguous. As a result, it is extremely difficult, perhaps even impossible, to program a computer to handle all the subtleties and irregularities of language.

Take a popular example of ambiguity: “Time flies like an arrow.”

Clearly this has two potential meanings – either that there is a species of fly called ‘time flies’ that has a mysterious predilection for eating an arrow, or alternatively that time goes by quickly. A human being will instantly reject the first as implausible, and go for the second. But how is a computer to do so? There is nothing in the text that gives any clue. The computer would need access to a grammar to sort out the verbs from the nouns, an entomological encyclopaedia to establish whether there are indeed time flies, and a list of the sort of food that such flies like. In short, it is almost impossible to program computers in advance with some of the translation skills mentioned at the start – non-linguistic knowledge about the world, intuition, a sense of style, cultural awareness, and common sense.

We can confidently dismiss machine translation, at least on current technology, for anything other than rather rudimentary translation. There are exceptions, particularly where there is a limited vocabulary, the best known example being the machine translation of Canadian weather forecasts between French and English. The EU are also working hard in this area, with some modest success in dealing with their voracious translation needs. But it is hard to see any serious threat to human translation, at least at the quality end of the spectrum.

Other uses of computers

If computers cannot handle language, what is left? The answer is to look at computers for what they are, mere machines, but with quite extraordinary abilities to handle and process information. In short, we need to keep them firmly in their place as slaves rather than masters of translation. This article looks at these individual areas:

  • Electronic dictionaries and works of reference
  • The Internet
  • Translation memory
  • Speech recognition

Electronic dictionaries and works of reference

There is something intellectually satisfying, even romantic and tactile, about large dictionaries or encyclopaedias on a bookshelf. In contrast, the electronic versions are flimsy CDs in cheap cardboard boxes. But it takes only a little experience with the electronic equivalents to discover their substantial advantages in size, cost and ease of use. Searching in dictionaries, both monolingual and bilingual, is enormously quicker and easier. An individual word can be found within seconds, even if it is not a headword but occurs only within other entries. It is equally easy to find phrases, or all words starting with a particular sequence of letters. In many cases, the software enables several dictionaries to be open and searchable at the same time. One particularly effective combination is provided by the Swedish publishers Norstedt, who offer their English-Swedish and Swedish-English dictionaries on CD. The monolingual Svenska Akademiens Ordlista (SAOL) is also available on CD and can be incorporated, so that all three dictionaries can be displayed together. It is also possible to add your own notes and comments to existing entries. Finally, it is easy to cut and paste words from the dictionaries into your current translation, avoiding any need to retype them.
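The three kinds of search described above (by headword, by a word buried inside other entries, and by initial letters) can be illustrated with a minimal Python sketch. The entries and function names are invented for illustration and bear no relation to the actual data formats used by Norstedt or SAOL.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    headword: str
    text: str  # everything printed under the headword: senses, phrases, examples


# A handful of invented Swedish-English entries standing in for a real dictionary.
ENTRIES = [
    Entry("hand", "hand; ta hand om: take care of; i första hand: primarily"),
    Entry("handla", "shop, trade; handla om: be about"),
    Entry("handling", "action, deed; document"),
]


def by_headword(word):
    """Classic look-up: entries whose headword is exactly the word."""
    return [e for e in ENTRIES if e.headword == word]


def full_text(phrase):
    """Headwords of all entries in which the phrase occurs anywhere in the text."""
    return [e.headword for e in ENTRIES if phrase in e.text]


def by_prefix(prefix):
    """All headwords beginning with a given sequence of letters."""
    return sorted(e.headword for e in ENTRIES if e.headword.startswith(prefix))


print(full_text("take care of"))  # ['hand'] – found inside the entry, not as a headword
print(by_prefix("handl"))         # ['handla', 'handling']
```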

Another impressive product comes from a Swedish company in Växjö called Wordfinder. They offer a range of dictionaries from various publishers between various languages, mainly but not exclusively to and from Swedish. They can all be built into a single interface similar in principle though different in design from the Norstedt offering mentioned above. Wordfinder have signed an agreement to supply their software to the EU.

Using Wordfinder is a delight because of its sheer speed and convenience. Once the text to be translated and Wordfinder are loaded side by side on the computer screen, the searching process is extraordinarily fast, and does not even require cutting and pasting. It is enough to double-click with the mouse on a source word, move the cursor to Wordfinder and click there once. The word with all its translations appears instantly. As with Norstedt, it is possible to group together different dictionaries and see all their translations at the same time. Another click will copy the selected word or phrase back into your text. Compared to carrying out the same process with paper dictionaries, the increase in speed and productivity is enormous.

Another attraction of electronic dictionaries, and Wordfinder in particular, is that it is possible to create your own dictionaries and reference works within the same software. Translators tend to hoard new and difficult words, as well as useful explanations and cross-references. Wordfinder provides the tools to create and manage such private reference works alongside the existing dictionaries.

Two other compelling advantages of these electronic versions are size and price. Some paper versions are forbidding on both counts. At the extreme end of the scale is the Oxford English Dictionary, with half a million words in 27 volumes, costing around £2,000. In contrast, the electronic version of the OED (on 2 CDs) costs little more than a tenth of that, and occupies less than 100 Mbytes of space on a computer hard disc – a trivial amount by today’s gargantuan disc standards. Several Swedish dictionaries and encyclopaedias are available in electronic form. CDs installed on a laptop provide a highly portable, convenient and comprehensive library for peripatetic translators.

The Internet

The Internet has sprung up almost out of nowhere in the last 10 years, and now provides an extraordinary resource for translators. A good example is Svenska Akademiens Ordbok (SAOB). Although still incomplete (at the time of writing it covers only A – TOJS), it is an unrivalled web-based source of information about Swedish words.

There is an ever-growing range of other information on the Internet, and many web sites and email groups geared specifically to translators. On some of these, it is possible to ask questions about difficult words or phrases, and to get an answer, sometimes a very good answer, remarkably quickly. The Internet is becoming more and more useful as a resource for terminology. One particularly valuable resource is the EU: thousands of its documents exist in multilingual versions, providing an enormous store of expertly translated documents that can be compared and contrasted, and can form the basis for terminological research.

Translation memory

Earlier, we dismissed machine translation (MT), but as this article suggests, computers can still be useful under the general heading of CAT (Computer Assisted Translation). One branch of this is called TM (Translation Memory), and that is becoming increasingly important in run-of-the-mill translation work – commercial, legal, financial etc. It is, and will probably remain, much less useful for literary translation, and no use whatsoever for translating poetry. But it is worth keeping an eye on TM for the potential benefits, even in literary translation.

TM comes into its own in translating texts with a degree of repetition. What TM does is to store every sentence you translate, and then if the same or a similar sentence occurs later on, it fetches the sentence from its memory and displays it on the screen. This enables the translator to adopt, adapt or reject the proffered translation. It is also possible to search for individual words or phrases that might already have been translated.
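The fetch-and-compare cycle described above can be sketched as follows, assuming a simple character-based similarity score from Python’s standard difflib module. Commercial TM systems use their own, more elaborate fuzzy-match scoring; the class name, the 0.75 threshold and the sample sentences are assumptions made for illustration.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: stores sentence pairs and offers the closest match."""

    def __init__(self, threshold=0.75):
        self.pairs = {}             # source sentence -> translated sentence
        self.threshold = threshold  # minimum similarity (0 to 1) worth offering

    def store(self, source, target):
        self.pairs[source] = target

    def lookup(self, source):
        """Return (score, stored_source, stored_target) for the best match, or None."""
        best = None
        for stored_source, stored_target in self.pairs.items():
            score = SequenceMatcher(None, source, stored_source).ratio()
            if best is None or score > best[0]:
                best = (score, stored_source, stored_target)
        if best is not None and best[0] >= self.threshold:
            return best
        return None

    def concordance(self, phrase):
        """List stored pairs whose source sentence contains the phrase."""
        return [(s, t) for s, t in self.pairs.items() if phrase in s]


tm = TranslationMemory()
tm.store("Avtalet gäller från den 1 januari 2005.",
         "The agreement applies from 1 January 2005.")
match = tm.lookup("Avtalet gäller från den 1 januari 2006.")
print(match)
```

A near-identical sentence (a changed year, say) scores just under 1.0, so the stored translation is offered for the translator to adapt; an unrelated sentence falls below the threshold and nothing is offered.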

Clearly, in repetitive texts such as legal documents, technical user guides and suchlike, this can be a great time-saver. It also encourages consistent translation of terminology, which is exactly why it is less suitable for literature, where variety of expression often matters more than consistency.

Another rarely-mentioned advantage of TM is a physical one. Although the various TM systems all look different, one thing they have in common is that they typically work sentence by sentence, and display both the source and the target sentence in close proximity on the screen – either one above the other, or side by side. This means much less eye and head movement than in more traditional translation methods. This attribute alone may make TM attractive to literary translators.

Speech recognition

Last, but by no means least, speech recognition software is a computer tool with perhaps the greatest potential for increasing translation productivity without affecting quality – indeed arguably both quantity and quality can be increased in parallel.

Speech recognition, also called voice recognition, is software that behaves much like an audio typist. You dictate your words to your PC through a microphone, and the words appear directly on the screen as if typed, with no other human intervention.

“Hang on,” you may be saying, “earlier we were told that computers are just stupid machines, incapable of translating. Now suddenly they understand speech. Is that not halfway to translation?” No, because speech recognition software does not understand speech or language – it merely digitizes the sounds it hears, and compares these with its library of phonemes and its dictionary of words. This sort of simple fast number-crunching is exactly what computers are best at, and over the last few years, speech recognition has made impressive progress. The software will never be as clever as a human typist, but it can nevertheless achieve remarkably high accuracy.

Another incidental advantage is that if the computer recognizes a word, it will never make spelling mistakes – only word recognition mistakes. But it can also be cleverer than expected. Take a sentence like “I wonder whether the weather will be better tomorrow.” Clearly, it is hard for the computer to distinguish “whether” from “weather”. But by correcting such sentences, the computer is able to learn and improve its accuracy – as in this case.
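How correction might help the software choose between “whether” and “weather” can be illustrated with a deliberately simple model that remembers which word followed which. Real speech recognition packages use far richer statistical language models; the homophone table, class name and method names below are hypothetical.

```python
from collections import Counter

# Hypothetical table of words that sound identical, so audio alone cannot decide.
HOMOPHONES = {
    "whether": ["whether", "weather"],
    "weather": ["whether", "weather"],
}


class ContextChooser:
    """Choose among homophones using the preceding word, learning from corrections."""

    def __init__(self):
        self.counts = Counter()  # (previous word, chosen word) -> times confirmed

    def choose(self, previous, heard):
        candidates = HOMOPHONES.get(heard, [heard])
        # max() keeps the first candidate on a tie, so an untrained model
        # simply falls back to the first spelling in the table.
        return max(candidates, key=lambda word: self.counts[(previous, word)])

    def correct(self, previous, right_word):
        """Record the user's correction for this context."""
        self.counts[(previous, right_word)] += 1


chooser = ContextChooser()
print(chooser.choose("the", "whether"))  # whether (untrained)
chooser.correct("the", "weather")        # the user fixes "the whether"
print(chooser.choose("the", "whether"))  # weather
```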

Speech recognition also means far less wear and tear on fingers and tendons – welcome news to RSI sufferers and those keen to avoid it in future. It also offers another unexpected advantage: it is much better at handling whole sentences than single words, because there is more statistical material to work on. Therefore, when dictating, it is much better to work out a whole sentence in your mind before starting to talk. The result is not only quicker than typing, but arguably produces sentences that flow better, because dictating highlights any inelegant phrasing that might be missed when keyboarding. (Years ago, when I was a journalist writing BBC Radio 4 news, we were required to dictate to typists, rather than type news items ourselves, for precisely that reason.)

But this is inevitably a very subjective consideration, and many will argue that their “thinking with the keyboard” habits are too ingrained. It has to be admitted also that speech recognition software demands a degree of computer familiarity and patience that will drive some people up the wall. But when it works, speech recognition software is magical (this article was written using speech recognition from a comfortable armchair). It also seems particularly suitable for literary translation.

The future of books

The death of books has been forecast for some decades, and like the reports of Mark Twain’s death, has proved to be greatly exaggerated. But there is now at least some writing on the wall. As described above, works of reference are not only cheaper but much easier to use in their electronic versions. Novels and poetry are still of course much more appealing in their traditional form. No one wants to curl up by the fireside with a lumpy computer screen. But the future promises lightweight tablet PCs, and one can easily visualize a time in coming decades when people will be able to download best-sellers, newspapers and magazines into a slimline tablet PC to read or listen to, rather like the way they now download music to their iPods – and like such music, at a fraction of the price we now pay for traditional paper formats.

This has interesting longer-term implications for translators. Books will increasingly be stored not as documents, but in electronic formats that allow the text to be adapted, or in today’s jargon “re-purposed”, for publishing in different ways – whether as traditional books, Web pages, tablet PCs, memory cards etc. Today’s de facto standard is Microsoft Word. Tomorrow’s is likely to be XML or PDF, and translation will increasingly require specialized computer tools. But at the same time it raises the prospect of cheaper books, thus removing some of the cost barriers to translations of foreign literature that exist today. In short, electronic books are both a threat and an opportunity for translators.

Translation methods then and now

The advent of the PC some 20 years ago opened up the possibility of increasing translation productivity – and job satisfaction – without loss of quality. Given that the threat of machine translation is still distant, but that competition from computer-enabled translators is growing, it is worth looking into this. The first step is to look at your current translation process from a time and motion perspective. What activities take the most time? If it is looking up words and phrases in paper dictionaries or encyclopaedias, then electronic versions can be explored. If there is a need for terminological research, then the Internet is becoming an indispensable resource, and there are software packages that provide excellent facilities for analysing terminology and handling bitexts (parallel source and target texts). For translation involving any repetition, Translation Memory software is worth investigating. Translation Memory and Speech Recognition may also ease the physical stress of spending long hours hunched over a keyboard.

After all that good news, there has to be some bad news, and the bad news is that all these computer solutions involve a more or less steep learning curve. Help is needed from other people higher up the learning curve. But surely that is what translator associations are for?

Some websites:

Norstedt http://www.panorstedt.se/
Wordfinder http://www.wordfinder.se/
SAOL http://spraakdata.gu.se/saol/saol.html
SAOB http://g3.spraakdata.gu.se/saob/
OED http://www.oed.com/
EU http://www.europa.eu.int/
Translation Memory several suppliers
Speech Recognition various suppliers