20: The Challenge of Neural MT: Part I

February 01, 2017

Renato Beninatto, Michael W. Stevens with Mike Dillinger, Marco Trombetti, John Tinsley, Chris Wendt

The New York Times is writing about it. So are The Economist and dozens of other prestigious business publications. Why is Neural MT suddenly so important, and so critical to language service providers? Tune in for the first in our three-part series on why neural MT is one of the hottest trends in the translation industry today.

Globally Speaking, sponsored by RWS and Nimdzi Insights.

Speaker	Transcript
Renato	I am Renato Beninatto.
Michael	And I am Michael Stevens. The New York Times and The Economist, two of the most prestigious publications in the world, have recently published extensive articles, not just short articles but extensive ones, about artificial intelligence, or AI for short, and neural machine translation. Renato, why is this important?
Renato	You know, I frequently share content in social media with this hashtag that translation is only news when it’s bad because that’s what really happens. You only hear about translation when something goes wrong. But, this seems to be an exception. For the first time in a long time I see the press talking about progress made in the language services space.
Michael	Yes. So, for this podcast we’ve talked to several people who are directly involved with neural machine translation around the world. We’re going to hear their voices and their opinions, which will help us understand what neural MT is doing and how it’s affecting the business of translation.
Renato	Or not!
Michael	Right. Translation is the most difficult and complicated area for artificial intelligence. It’s no surprise that Microsoft, Google, Baidu, IBM, Amazon and even Facebook use MT to brag about how technically skilled they are.
Renato	That’s the whole thing. It’s so difficult that if they can solve the language problem they’ve proved their skills. But, perhaps, we should start by talking about Google Translate which was launched in 2006 and since then it has become one of Google’s most popular products.
Michael	Just about everyone I know uses it on a daily basis.
Renato	Absolutely, yes. According to the data for January 2017, Google Translate is available in 103 languages; it serves more than 500 million users on a monthly basis, and these people translate 140 billion words per day. That’s 140 billion with a ‘B’, Michael.
Michael	Okay, so doing a little math there, this means Google alone translates more words in a day than all the translators in the world translate in a year.
Renato	That’s a very good way to put it, but the reason for starting with Google is that they have this new strategy in the company that they call AI First. So, it’s a company-wide initiative, and because of that initiative they started converting Google Translate from statistical MT to neural MT, and it hit the news that their new machine translation was indistinguishable from humans.
Michael	The news hit that it was indistinguishable from humans, but really it wasn’t. Google clarified the real message. It’s actually much better than it was before, but not as good as humans. Google went further to say that their efforts weren’t about replacing people but to help people.
Renato	Yes, and we have a lot of fear in our space, and the word was out. So, naturally, professional translators and bloggers, the pundits and the media all picked it up, and it became a hot topic for the language services industry as a whole.
Michael	Yes, it’s trending, as they say.
Renato	Yes, exactly. So, we, actually in our podcast about the predictions for 2017, we said it was going to be a hot topic. I didn’t expect it to be so hot so early.
Michael	So early. But, Renato, neural MT is the latest iteration of a technology that’s been around since 1950. So, how did we get here?
Renato	Well, before neural MT there was rules-based machine translation, SYSTRAN being the most famous, which was followed by statistical phrase-based machine translation, and this is a good time to introduce our first guest.
Mike	I’m Mike Dillinger. I’ve been working in machine translation for 20 years. I’m a former president of the Association for Machine Translation for the Americas and I’m currently managing machine translation at LinkedIn. My key area of interest is how to get machines and humans to work together most effectively so that we can produce systems that I like to call hybrid intelligence systems.
Michael	If you had to explain to an eighth grader rule-based machine translation and statistical could you give us a definition for each?
Mike	A definition, maybe an explanation. Okay, actually, I think statistical MT is easier to explain. A statistical MT system takes a lot of examples of sentences that humans have translated and counts up, for each word, each pair of words and each triple of words, the different ways that it got translated in those example sentences. When a new sentence comes in, it goes through each of the words and pairs of words and triples of words and says, “All right, let me see what was the most common way that this word or pair or triple of words showed up in the other language,” and it pieces together the translation that way, based on a big list of counts: “This word got translated as that word this many times in my examples.” I think that’s the easiest way to explain statistical MT. There’s a whole bunch of other mathematical magic that goes on to make that work better. Rule-based MT is an effort to capture the knowledge from a human and put it in the machine. So, the human says, “Aha, these words in this kind of sentence mean this and show up as this in the other language; that’s a rule. And these same words in another kind of sentence actually mean this other thing.” Rule-based systems literally take rules like that and put thousands of them into a computer, so that when a new sentence comes in you see which rules match, and then those rules tell us what the translation looks like. So, instead of counting, it’s rules.
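As a rough illustration of the counting approach Mike describes, here is a minimal Python sketch. It is a toy under stated assumptions, not how any production system works: the phrase pairs in `EXAMPLES` are invented for illustration, and the names `build_counts` and `translate` are hypothetical. It counts how often each source word or phrase was translated a given way, then translates a new sentence by trying the longest known phrase first and picking its most frequent translation.

```python
from collections import Counter, defaultdict

# Hypothetical phrase pairs, as if extracted from human-translated sentences.
# A real system would derive these from millions of example sentences.
EXAMPLES = [
    ("a", "the"), ("a", "the"), ("a", "to"),
    ("casa", "house"), ("casa", "house"), ("casa", "home"),
    ("branca", "white"),
    ("casa branca", "white house"), ("casa branca", "white house"),
]


def build_counts(examples):
    """Count how often each source phrase was translated as each target phrase."""
    counts = defaultdict(Counter)
    for source, target in examples:
        counts[source][target] += 1
    return counts


def translate(sentence, counts, max_len=3):
    """Greedy translation: longest known phrase first, most frequent translation wins."""
    words = sentence.split()
    output = []
    i = 0
    while i < len(words):
        # Try a triple of words, then a pair, then a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in counts:
                output.append(counts[phrase].most_common(1)[0][0])
                i += n
                break
        else:
            # Unknown word: copy it through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


counts = build_counts(EXAMPLES)
print(translate("a casa branca", counts))
```

Because the pair "casa branca" has its own counts, the sketch gets the English word order right without any explicit grammar rule; the "mathematical magic" Mike mentions (probabilities, language models, reordering) is left out entirely.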
Michael	We wanted to understand what neural machine translation was compared to the other two systems. So, we went to the master-mind behind MateCat.
Marco	Marco Trombetti, tech entrepreneur in the language industry for the last 15 years.
Renato	And where are you based?
Marco	Rome, Italy.
Michael	Hmm, not everyone can live in paradise!
Renato	So, if you were to formally define neural machine translation, how would you define that, concisely, for a layman, for your mother?
Marco	Yeah. So, let’s say what statistical was before. We’re taking a lot of examples, giving these examples to a machine, and that machine is learning which word is the translation of which word, and then learning which sequences are the translation of which sequences, through an algorithm that we designed. Neural is just another approach. The data structure is different because it’s trying to simulate the human brain in terms of structure of the data and structure of the learning, but also what to learn is something that the machine learns automatically. So, you don’t have to teach exactly what word alignment is or how to look for words, for example. It’s trying to look at all the correlations there are in the data. And so you don’t have to teach where to look; if there is any correlation in the data, it will learn it. So, it’s a much broader approach and obviously it’s much more complex. It takes 100 times, if you want, 100 times the cost of the previous one. But at some point it will become sustainable.
Renato	Okay. So, the idea and this is where all these terms get mixed and you told me that they are essentially the same thing, machine learning, neural machine translation, deep learning. These are all very similar concepts.
Marco	Yeah, machine learning is also statistical. Machine learning is the concept of being able to teach a machine how to do something that a human does. So, that was an approach. Now, deep learning is a much more generalized approach to that. It’s not even… I mean, this is also limited in some way and it will evolve, the algorithms will evolve a lot and will learn from other things, but it’s more generalized. But what is also incredibly interesting, and I think that’s why everybody is using it, is that we humans move forward because we’re lazy. So, what happened with neural is that the one million lines of code that were statistical are now only 280 lines of code in neural. So, from a developer perspective, this is much easier to implement. You will have the hardware, the machine, work for you a lot and it will cost a fortune, but your work as a developer is much simplified because you delegate everything to the calculation; it will work at night to try to find something that you didn’t find.
Michael	I may be trying to oversimplify it. But in statistical, we were the ones creating the algorithm, people, and then trying to get better results. With deep learning and neural networks the machine is actually creating the algorithm. Is that accurate?
Marco	This is correct. And, in fact, if you look at the evolution, in rule-based we were teaching all the rules and how to transform every single word in its position. Statistical is a bit better; it’s doing some work for you but, still, you are teaching where to look. In deep learning you don’t even teach where to look.
Michael	And so, I think, also going back to rules-based, talking about a language rule, what were we in charge of then that we’ve given up now?
Marco	Before, you were telling the rule-based system, “Okay, when there is an adjective, put it before the noun in English and reverse it.” In statistical you don’t do that anymore because it will learn by looking at the word position. In neural it’s looking at every single other relationship that there is between the words in the sentence, including words that are far apart in the sentence but still related. If it is possible to learn that two words are related, then at some point the machine will learn it.
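The adjective rule Marco mentions can be sketched in a few lines of Python. This is an illustration only: the tiny `LEXICON`, its part-of-speech tags and the function name `translate_with_rules` are assumptions made up for the example, and a real rule-based system would hold thousands of such rules. The sketch translates word by word from a hand-written dictionary, then applies one hand-written rule: a noun followed by an adjective is swapped so the adjective comes first in English.

```python
# Hypothetical hand-written lexicon: source word -> (English word, part of speech).
LEXICON = {
    "a": ("the", "DET"),
    "o": ("the", "DET"),
    "casa": ("house", "NOUN"),
    "carro": ("car", "NOUN"),
    "branca": ("white", "ADJ"),
    "vermelho": ("red", "ADJ"),
}


def translate_with_rules(sentence):
    """Word-by-word lookup followed by one reordering rule."""
    # Unknown words are passed through with an "UNK" tag.
    tagged = [LEXICON.get(word, (word, "UNK")) for word in sentence.split()]

    # Rule: NOUN followed by ADJ in the source becomes ADJ NOUN in English.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "NOUN" and tagged[i + 1][1] == "ADJ":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2  # Skip past the pair that was just swapped.
        else:
            i += 1

    return " ".join(word for word, _ in tagged)


print(translate_with_rules("a casa branca"))
print(translate_with_rules("o carro vermelho"))
```

Every behavior here had to be written down by a person, which is the work Marco says statistical and neural systems increasingly take over.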
Renato	Michael, the simplest description of a neural network is that it’s a machine that makes classifications or predictions based on its ability to discover patterns in data. So, with one layer you could find only simple patterns; with more than one you could look for patterns of patterns. This is how neural networks learned how to identify images. One little feature at a time. This is how they are addressing the language problem also.
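A classic small example of "patterns of patterns" is the exclusive-or function, which a single layer of this kind cannot compute but two layers can. The Python sketch below is an illustration under stated assumptions: the weights and thresholds are set by hand rather than learned, and the function names are hypothetical. The first layer detects two simple patterns ("at least one input is on" and "both inputs are on"); the second layer combines them into "exactly one input is on".

```python
def step(x):
    """Threshold activation: the unit fires (1) when its input is positive."""
    return 1 if x > 0 else 0


def hidden_layer(a, b):
    """First layer: two simple patterns over the raw inputs."""
    h_or = step(a + b - 0.5)   # fires when at least one input is 1
    h_and = step(a + b - 1.5)  # fires only when both inputs are 1
    return h_or, h_and


def output_layer(h_or, h_and):
    """Second layer: a pattern over the first layer's patterns."""
    return step(h_or - h_and - 0.5)  # "at least one, but not both"


def xor_network(a, b):
    return output_layer(*hidden_layer(a, b))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

In a trained network the weights would be found automatically from examples, and image or language models stack many more layers, but the layering idea is the same.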
Michael	So, we went to John Tinsley, he’s the CEO of Iconic Translation Machines, a company based in Dublin, Ireland and they help build custom-MT solutions, to ask him about other applications of neural networks.
John	The reason people are getting excited about it for translation is that it delivered orders-of-magnitude improvements in other areas like image recognition. People kind of take it for granted now, but you can go onto Google, search for “blue sky, landscape”, go to images, and it will give you thousands and thousands of pictures of blue sky landscapes. That was all done with neural image processing, and you’ve got a lot of cool companies, for example the likes of Shutterstock, who are doing things with that: just being able to put better tags on images, people being able to find images and things like that. So, there was this excitement that “this works!” Neural networks aren’t new; they’ve been around for ages, and they’ve been having success in these application areas and, I guess, Moore’s Law has caught up to a point where we can say, “Oh, okay, we have these computers now, we have these GPUs, let’s try and see if neural MT works for language.” But language is a lot more complex, so it’s not having the same breakthrough as it did in other areas.
Renato	So, you mentioned imaging, a couple of others that come to mind?
John	Speech, probably speech recognition, because that’s where statistical MT came from: all the research that was going on for speech processing, the kind of statistical models that they had. So, it’s any language application: natural language understanding, parsing, part-of-speech tagging, sentiment analysis, all that sort of stuff. This is another way to do it. They can all be done with rules. They can all be done with statistics. They can all be done with neural networks.
Michael	After Google Translate, the most popular MT in the Western world is Microsoft Translator. A cloud service that translates between more than 45 languages, it powers the translations in Microsoft products such as Microsoft Office, Skype Translator, Bing and many others. Microsoft Translator uses an automatic translation engine that employs machine learning to generate statistical translation models, and Microsoft has recently launched a tool to compare statistical MT with neural MT in eight language pairs. You can check it out at Translator.Microsoft.com/neural.
Chris	My name is Chris Wendt. I am the group program manager for machine translation at Microsoft.
Renato	I tested the comparison tool with my native Portuguese and French.
Chris	Yes. So, for English/Portuguese and English/French, the quality delta between statistical and neural is actually not that big. It is big for languages whose structure differs more strongly from English, when you translate from them into English. Try Chinese or try Arabic… actually, it’s good if you don’t speak Chinese or Arabic and you see if you actually get…
Renato	I can understand what it’s talking about. That’s a very good point.
Chris	So, the chance that you understand what this is about is much higher.
Renato	Chinese and Arabic, I’ll try that just for fun and see how that goes.
Chris	And I want you to look at one aspect again, as in what actually made you choose the one that you chose as better? Was it that it is more fluent and more understandable, or was it that it was more correct to the source, more faithful to the source?
Renato	I’m trying to remember. I just did a pair right now with some news from a Brazilian newspaper, and I chose the news because it had two names of financial institutions that by themselves are words with other meanings. So, Banco do Brasil: banco can also mean a stool, and the other one was Caixa, which also means box. So, the funny thing is that the statistical translated it as box and the neural one translated it as Caixa, but in the middle of the sentence further down it translated it as box also. So, it couldn’t disambiguate. But, overall, I felt that the tone sounded more natural, the sentences sounded more natural. I didn’t really look for accuracy. There were sections where the language was confusing, and then when I went to check in the original it was a formatting issue; it had included captions in the middle of the text when I copied and pasted it, so captions for pictures and so on. So, both of them were pretty good but the statistical in this case, it was from Portuguese into English, did a better job with handling the names.
Chris	Yes. I figured that would be the case, statistical would win on accuracy and neural wins on fluency. Statistical systems, ours and some of our competitors’, are, in that sense, more mature, as in they have gone through more scrutiny or longer scrutiny over their life, say during the last five, six, seven, eight years, which has really addressed many of these shortcomings. But we see that the statistical systems are flattening out; adding more data doesn’t help anymore. Tuning it here and there doesn’t really make too much of a difference anyway. A year ago if you went to the research conferences, there were papers on getting a quarter BLEU point improvement by this or that technique. In neural you add a layer to the network, you add something like the attention model, and you have huge, huge improvements. So, the rate of improvement on the neural systems due to, well, actual algorithmic improvements is much faster.
Renato	So, what you’re saying is that we have pretty much plateaued in the statistical and we have a new runway with neural?
Chris	Exactly, yes.
Renato	So, we have room for improvement. Neural is getting close to statistical and then it’s going to go further.
Chris	Yes, yes.
Renato	So, if we go into the concepts of singularity, have we expedited singularity for translation, or was that just one of the new steps in the path to that image of singularity? You know that Kurzweil said we would achieve singularity in machine translation by 2019. So, we are three years away.
Chris	I usually don’t answer that singularity question. I say the machines already translate 300 times more words per day than humans. So, does that mean that machines have replaced humans, like 300:1? Well, yes!
Michael	In the next episode of Globally Speaking, we’ll look at other aspects related to machine translation in general and neural machine translation in particular.
Renato	Topics like the relationship between human translators and the machine, confidentiality and the outlook for neural MT in the localization industry.
Michael	We welcome comments, questions, suggestions for podcasts and, on some level, constructive criticism.
Renato	Yes. You can say that we’re ugly too!

End of conversation

Guests

Mike Dillinger

Mike Dillinger is former President of the Association for Machine Translation for the Americas, and Manager, Taxonomy Team and Machine Translation at LinkedIn.

Marco Trombetti

Marco Trombetti is a tech entrepreneur in the language industry and co-founder of Pi Campus.

John Tinsley

Stemming from his love of cutting-edge language technology, John co-founded Iconic Translation Machines at the end of 2012. He customizes neural machine translation engines for use across a wide variety of industries and use cases, including IT, Life Sciences and automotive fields. John graduated from Dublin City University in 2009 with a PhD in Computer Science, specializing in machine translation.

Chris Wendt

Chris Wendt is responsible for the planning and design of Microsoft’s machine translation services: Microsoft Translator, Bing Translator, Skype Translator and translation features in Office, Internet Explorer and Bing, as well as the subscription service available to the public. He guided the incubation of the original internal research project in the NLP group to one of the two most widely used automatic translation services on the web.

Hosts

Renato Beninatto

CEO of Nimdzi Insights LLC

Renato Beninatto is the co-author of The General Theory of the Translation Company and leads Nimdzi Insights, a think-tank and consulting company that focuses on growth strategies for localization leaders. A former owner of an LSP, an executive in some of the leading companies in the industry and a linguist in his own right, this Brazilian-Italian-American citizen can’t shut up in Portuguese, English, Italian, French or Spanish.

Michael W. Stevens

Vice President Americas, CC of Translated

Michael has over 10 years of experience in the localization and IT industries. A well-networked entrepreneur, Michael’s main interest is in connecting and bringing people together. He not only enjoys learning about a company’s exciting ideas and developments, he also has a keen ability to add value—and fire—to new and innovative thinking.
