Google's Translation Overhaul – Interview on IDF Radio
April 17, 2017, 17:48 | Posted in 15 Minutes, English, Digital World, Translation
This February, I gave an interview to Ido Kenan on Galei Tzahal (IDF Radio) about Google's upgraded Machine Translation system, including its claims that it learns an intermediary abstract language representation, an "Interlingua".
You can listen to the interview above this line, or here on Kenan's blog, where you can also read my writeup. Problem is, it's all in Hebrew! Well, what better than to use the fancy new Google Translate to render the thing into English?
Here it is, untouched. See how much you understand. (Retrieved March 16, 2017)
Abstract language behind Google Translation
Following a report by Google , technology sites reported excitedly AI behind Google Trnsliit invented language. Reality only a little less Krief: AI discovered a more efficient way to translate between languages, using interlingua – abstract language that links between the two languages.
Yuval Pinter, a doctoral student in computer science language processing at Georgia Tech, explained the significance of the innovation program. He told us in an interview ahead: "Google Translate recently deployed a new translation engine that replaced the old, gradually starting in September. The system implements research ideas out there for decades and has only recently become feasible. In November the team published a paper in which he explained how the system actually succeeds in presenting a trained human language in the abstract, that made possible to translate directly between languages that the system had never met them examples. The performance showed not perfect, but certainly there is a conceptual leap forward.
"For most of the history of machine translation, the prevailing attitude was' based-phrases. According to this view, Big arthritis source sentence into parts (phrases) that seem reasonable to translate, translate them separately and then build the sentence in the target language as much as possible try to score its syntax and logic. The knowledge of each of these stages can be built automatically: show in a lot of examples of translated sentences and grammatical sentences from one language, and it 'learns' how to break a sentence, how to translate each phrase, and how to catch up. But still there are a lot of human intervention at every stage and the aisles between them. For example, if we take the phrase 'the Prime Minister yesterday visited the power station, and ask for translation, the system will need to know, among other things: Primary government, one that translates this phrase The Prime Minister and head the Government; S'bikr, this physical sense (visited) and rhetorical (criticized); Verb translated phrase should follow the topic; That yesterday will have a comma after it; Q-Jeb, not in it at; And more. The first two rules will be studied at a reasonable level automatically, but the last three probably require human hand encode specific knowledge about Hebrew and English. These systems were common until today.
"In the new system there is a massive application of technology, which until a few years ago was largely theoretical amusement, and was made possible thanks to advances in computing power and configuration process, and aggregation of data volumes magnitudes above what was acceptable. The new algorithm, there are many rules that people have written, or at least directed on the basis of knowledge of any language, computer builds his own rules. The main difference is obtained directly translating a whole sentence complete sentence, and therefore do not need to know in advance the language as long as there is enough data.
, The last article that Google released showed not only the translation process is the same between pairs of languages, but Sctotzr effects of learning, the system builds a kind of general representation, not language-dependent, the court. We said that today a translation between two languages hung in the mere existence of millions of sentences that we know their translation. Make it easier for couples languages like English-French, for example, when the Canadian Parliament's protocol or mechanism in the EU goes multilingual uniform. But what happens when you try to build a system of translation from Korean to Swahili, Hebrew or Spanish? What they did today this translation through an intermediate language (in practice, always English), from Korean to English and Swahili. It is also much more logical application – let's say there are 100 languages, so have about 200 interfaces, where about 10,000 if you want a direct translation from any language to any language, it is not applicable. One drawback of this approach is the effect of "broken telephone" as the translations are not perfect, but there are also a matter of lost data. For example, Hebrew and Spanish – the phrase 'the cat eats the cheese' will be translated into English through the cat is eating the cheese, and losing here that the cat is female, a distinction that both Hebrew and Spanish, but no English. 'Intermediate language, abstract Google found her testimony, however, is rich enough to contain the information of grammatical gender, and that preservation GATA transition to Spanish.
"" Intermediate language "itself is, as mentioned, abstract. She lives in mathematically impossible to pronounce the words and phrases in it. Google showed how it exists? Took sentences translated from Japanese to English and Korean to English, and only a trained them. Then he said to translate sentences from Japanese to Korean, things she had not seen during training. Translations were reasonable and beautiful competed last through the English translation. Then a little 'Help' system with a relatively small amount of samples translation from Japanese to Korean, then the translations were as good as the direct model, trained many examples of Japanese-Korean.
"According to the notification of Google, the system has been deployed, this means that the system is now active as we are using Google Trnsliit. English translations for all languages and vice versa should be better than before. It seems we have not deployed the common model, but if it happens, also from Hebrew into other languages quality improves. The Hebrew they did not publish the results, I do not know if you speak Hebrew enjoying significant improvement. "