PARIS: Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. But now there is a new way: numbers.
Facebook, Google and Microsoft, as well as Russia’s Yandex, China’s Baidu and others, are constantly seeking to improve their translation tools.

Up to 200 languages are currently used on Facebook, said Antoine Bordes, European co-director of fundamental AI research for the social network.

Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.

“For example, if you take the words ‘cat’ and ‘dog’, semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers.
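To make the idea of words sitting “close together” in a vector space concrete, here is a minimal sketch in Python. It assumes toy vectors with four dimensions instead of several hundred, with values invented purely for illustration, and it measures closeness with cosine similarity, a common choice for word vectors but not necessarily the one Facebook’s system uses. The helper names `cosine_similarity` and `nearest` are hypothetical.

```python
import math

# Hypothetical toy word vectors: 4 dimensions instead of several hundred.
# The values are invented for illustration, not learned from real text.
EMBEDDINGS = {
    "cat": [0.90, 0.80, 0.10, 0.00],
    "dog": [0.85, 0.75, 0.20, 0.05],
    "car": [0.10, 0.05, 0.90, 0.80],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings):
    """Return the other word whose vector is most similar to `word`'s vector."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


if __name__ == "__main__":
    # Compare "cat" against a related word and an unrelated one.
    for other in ("dog", "car"):
        score = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS[other])
        print(f"cat vs {other}: {score:.3f}")
    print("nearest to cat:", nearest("cat", EMBEDDINGS))
```

With these made-up numbers, “cat” scores roughly 0.99 against “dog” but only about 0.15 against “car”, mirroring Lample’s example. Real systems learn such vectors from large amounts of text rather than setting them by hand.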
But for the rarer language pair of English-Urdu, where Facebook’s traditional system doesn’t have many bilingual texts to reference, the word vector system is already superior, he said.

Could the approach work for languages such as those spoken by Amazonian tribes? In theory, yes, said Lample, but in practice a large body of written text is needed to map a language, something lacking in Amazonian tribal languages.
He said “translating without parallel data”, meaning without dictionaries or versions of the same documents in both languages, “is something of the Holy Grail” of machine translation.