Tuesday, June 27, 2006

Babel Fish

The Economist describes attempts to build a Universal Translator.
This may sound fanciful, but already a system has been developed that can translate speeches or lectures from one language into another, in real time and regardless of the subject matter. The system required no programming of grammatical rules or syntax. Instead it was given a vast number of speeches, and their accurate translations (performed by humans) into a second language, for statistical analysis. One of the reasons it works so well is that these speeches came from the United Nations and the European Parliament, where a broad range of topics are discussed. “The linguistic knowledge is automatically extracted from these huge data resources,” says Dr Waibel.

Statistical translation encompasses a range of techniques, but what they all have in common is the use of statistical analysis, rather than rigid rules, to convert text from one language into another. Most systems start with a large bilingual corpus of text. By analysing the frequency with which clusters of words appear in close proximity in the two languages, it is possible to work out which words correspond to each other in the two languages. This approach offers much greater flexibility than rule-based systems, since it translates languages based on how they are actually used, rather than relying on rigid grammatical rules which may not always be observed, and often have exceptions.

Examples abound of the ridiculous results produced by rule-based systems, which are unable to cope in the face of similes, ambiguities or bad grammar. In one example, a sentence written in Arabic meaning “The White House confirmed the existence of a new bin Laden tape” was translated using a standard rule-based translator and became “Alpine white new presence tape registered for coffee confirms Laden.” So it is hardly surprising that researchers in the field have migrated towards statistical translation in the past few years, says Dr Waibel.
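The quoted description of statistical translation ("analysing the frequency with which clusters of words appear in close proximity in the two languages") can be sketched in a few lines of Python. This is a minimal illustration, not the system the article describes. It assumes whitespace tokenisation and treats "close proximity" as "appearing in the same aligned sentence pair". It scores candidate word pairs with the Dice coefficient, a simple association measure standing in for the more elaborate alignment models that real systems use. The function name `dice_alignments` and the four-sentence English/French corpus are hypothetical, made up for this example.

```python
from collections import Counter
from itertools import product


def dice_alignments(pairs):
    """Guess word correspondences from a (hypothetical) bilingual corpus.

    For every source word, return the target word with the highest Dice
    score: 2 * C(s, t) / (C(s) + C(t)), where C counts the sentence pairs
    in which a word (or both words together) appears.
    """
    src_count = Counter()   # sentence pairs containing each source word
    tgt_count = Counter()   # sentence pairs containing each target word
    pair_count = Counter()  # sentence pairs containing both words
    for src, tgt in pairs:
        # Assumption: naive whitespace tokenisation, one count per sentence.
        src_words = set(src.split())
        tgt_words = set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    best = {}  # source word -> (best target word, its score)
    for (s, t), c in pair_count.items():
        score = 2 * c / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


# Hypothetical toy corpus: each sentence is paired with a human translation.
corpus = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
    ("a blue flower", "une fleur bleue"),
]
print(sorted(dice_alignments(corpus).items()))
# [('a', 'une'), ('blue', 'bleue'), ('flower', 'fleur'), ('house', 'maison'), ('the', 'la')]
```

Even on this tiny corpus, "the" pairs with "la" rather than "maison", because the Dice score divides by how often each word appears on its own, so a frequent word is not matched to every noun it happens to sit beside. Note also that every input here is a sentence together with its human translation; given monolingual text alone, this sketch would have nothing to count.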
I don't get this bit:
The next phase of the project, says Dr Black, will be to allow portable translation devices to be trained in the field. The idea is that when a traveller encounters people speaking a new language that is unknown by the translation device, it can be trained by exposing the software to lots of chatter. In theory, once a language model has been acquired, you could just leave the device in training mode in front of the television, although it would probably be preferable to find some bilingual people and ask them to repeat set phrases containing a lot of linguistic information, says Dr Black.
Without being fed any human translations?

