Translation Technology Archives – Nordic translation specialists (/tag/translation-technology/)

Making machine translation work for us – part 3 (/making-machine-translation-work-for-us-part-3/, Wed, 15 Aug 2018)

In the previous two parts of our interview with STP’s machine translation guru, Mattia Ruaro, we discussed different kinds of machine translation (MT), the way the technology is changing, and how it can and should be used in the translation industry.

In this final part, Mattia shares his thoughts on how translators can use MT as a tool – and how STP is going about it.

You mentioned that editing machine translation output is a skill all of its own for a translator. How does it differ from translation?

I’d say that machine translation post-editing is not really that different from translation these days. Of course it’s quite different from translating a text from scratch in a word processor, but I think sometimes people forget that translators very often work with translation memories (TMs) nowadays. So they don’t necessarily have a blank slate even without MT.

How does working with machine translation compare to working with translation memories?

It’s somewhat similar; essentially, you are editing matches in both cases. In the case of TM matches, a tool will suggest translations of similar sentences that have been translated before and stored in a translation memory file attached to the project.

The translator might, for example, have a 95 per cent match where only the punctuation is different to that of the sentence they are looking at – or perhaps there is just one word that is different. Translators have become used to editing TM matches. An MT match is often much less accurate, but it’s a starting point.
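
For readers curious what a "match percentage" actually measures: it is essentially an edit-distance-based similarity score. The sketch below is a simplified, hypothetical version in Python – real CAT tools apply their own proprietary weighting for tags, punctuation and casing, so treat this as an illustration of the idea only.

```python
# Toy fuzzy-match score: similarity between a new source sentence and a
# TM entry, computed from word-level edit distance.
def edit_distance(a, b):
    """Classic Levenshtein distance over two token sequences."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def fuzzy_score(new_sentence, tm_sentence):
    """Similarity as a percentage; 100 = identical."""
    a, b = new_sentence.split(), tm_sentence.split()
    dist = edit_distance(a, b)
    return round(100 * (1 - dist / max(len(a), len(b))))

# One word differs out of eight, so this scores as a high fuzzy match.
print(fuzzy_score("Press the red button to stop the machine",
                  "Press the red button to start the machine"))
```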

How does the process of post-editing differ from the process of translation? What does a translator need to know before starting this?

The biggest problem, particularly for inexperienced editors, is bearing in mind that MT output is the work of a machine, not a human. You can’t trust a machine the same way you can trust a translation memory match from a previous translator.

This seems like a fairly straightforward distinction – the clue is in the name – but many still struggle to make it.

Another thing is training, because very few training resources are available. This is why we recorded webinars for our freelancers, and all our in-house translators have received training too. We can’t give people MT output and expect them to just deal with it.

Machine translation post-editing (MTPE) is not as intuitive as people think: training, experience and knowledge are necessary. It’s really helpful to try to understand why the machine produces the output it does – but this is something that requires an understanding of technology.

From my perspective, it’s really helpful to have very specific feedback from translators, as training the engine requires precision.

You can and should be able to influence the engine quality – you can train the engine as well as the translator. If you “put yourself in the machine’s shoes”, things start to fall into place.

STP is certified in MTPE according to the ISO 18587 standard. Why is this?

It shows the amount of effort we’ve put into learning, understanding and using this kind of technology as a company. And this isn’t just the case for the technology team – our production teams have put in a lot of work as well.

Adhering to the standard is something we are doing with everyone’s best interests in mind; we’re trying to contribute to making a positive difference in the industry.

The standard is basically a set of guidelines – I would describe them as a collection of best practices. Basically, they raise the bar for everyone in the industry. Companies that care about these standards can promote them and counter the misuse of MT technology.

Do you think there is a lot of deliberate misuse of MT in the industry?

Some, certainly. There are companies trying to pass off raw MT output as translation and sending it out to vendors as regular revision projects, for example. But these agencies know what they are doing – and the revisers can spot this kind of thing a mile away.

There are some companies that lack information on the MT that they are using – or that they are expecting their vendors to use. They simply don’t know how good the MT output is, since they don’t have in-house people proficient in the relevant languages to check and provide feedback on it. STP only generates output for languages that we can check in-house. That way we know exactly what sort of quality it is.

Would you say that MTPE is faster than translation without MT?

There has been a lot of talk about MT improving productivity, but most of the research on this is done with very few people who are not working with strict deadlines. These circumstances do not really reflect the way in which translators work in the commercial world. The studies often make flawed assumptions too.

At STP, we can test the effectiveness of MT as a tool internally. We have a lot of information on our translators and they already work with deadlines and under pressure, which makes them ideal test subjects.

How do you measure something like this accurately?

We have data based on edit distance – how different the final, edited output is from the raw, unedited MT output. In general, it seems that people are more productive with MT than without, though that doesn’t necessarily mean the quality is good.
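
As an illustration of the idea – not STP's actual metric – the difference between raw MT output and the post-edited result can be approximated in a few lines of Python using the standard library's `difflib`; translation research typically uses more elaborate measures such as TER.

```python
import difflib

def pe_effort(raw_mt: str, post_edited: str) -> float:
    """Rough post-editing effort: 0.0 means the translator kept the MT
    output unchanged, 1.0 means nothing of it survived. Based on
    difflib's similarity ratio; production metrics are more involved."""
    sim = difflib.SequenceMatcher(None, raw_mt, post_edited).ratio()
    return round(1.0 - sim, 2)

# A single-character correction barely registers as effort.
print(pe_effort("The cat sat in the mat.", "The cat sat on the mat."))
```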

How does STP measure machine translation productivity?

Basically, we are making an effort to track productivity gains. We are doing this by recording how much time projects where no MT is used take compared to MTPE tasks. It’s not the perfect metric, but we need some hard data on MT and how useful it actually is.

Is the difference that MT makes reflected in STP’s translation rates?

For us, it’s really not as simple as that. In terms of efficiency, we want to be sure we know what we are actually getting.

I see a lot of nonsense numbers being thrown around. For example, MTPE is supposedly 50% more efficient than translation. Even if there are time-saving aspects to this, it’s not realistic to put it in those terms.

The productivity increase needs to be contextualised as well. There are often other aspects that slow the work down, such as special instructions that need to be read and implemented.

At STP, we want to take into account the total effort people put into a project. And, at the end of the day, you still have to do the work – the engine just provides suggestions.

Based on the feedback we’ve had from our translators, so-called “high fuzzies”, meaning TM matches that are ranked as a 75% match or higher by the CAT tool, are almost always more helpful than MT matches. So when our translators use MT, they are only using it for sentences where there are no “high fuzzies” available. So far, this has been a useful approach for us.
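
The routing rule Mattia describes can be sketched in a few lines of Python. The function names and data shapes here are illustrative, not STP's actual tooling:

```python
# Prefer a "high fuzzy" TM match (>= 75%) when one exists, and fall back
# to the MT suggestion otherwise.
HIGH_FUZZY_THRESHOLD = 75

def pick_suggestion(tm_matches, mt_suggestion):
    """tm_matches: list of (score, target_text) tuples from the CAT tool."""
    high_fuzzies = [(s, t) for s, t in tm_matches if s >= HIGH_FUZZY_THRESHOLD]
    if high_fuzzies:
        # The best TM match wins: translators reported these are almost
        # always more helpful than MT output.
        return max(high_fuzzies)[1]
    return mt_suggestion

print(pick_suggestion([(95, "Tryk på knappen"), (60, "Tryk her")], "MT output"))
```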

The one thing that is perhaps different at STP is that we have over 70 in-house translators who can help us develop our approach.

How does having a large team of in-house translators help?

They are all professionals who have been trained to post-edit MT output, and they are happy to help us develop the engines further. I can understand how a smaller company might find this harder.

At STP, we work with a small number of languages on a daily basis, so that means fewer engines to worry about than some other companies.

If people are not happy with something, we can try to improve it – or abandon it if that doesn’t help. We can go back to the drawing board.

How do you work with the in-house teams in practice?

We have one person for each target language who is our go-to person for MT development. So far, we’ve had this for all the Scandinavian languages and English. I work with these MT “power users”, or MT experts, when I need feedback.

It’s easy to do this with translators who are genuinely interested in the process and the technology. The technology would not really be worth much to us without our translator teams – their effort is crucial in all stages of the process.

Learn more about machine translation here.

Making machine translation work for us – part 2 (/making-machine-translation-work-for-us-part-2/, Thu, 02 Aug 2018)

In part 1 of our interview with Mattia Ruaro, STP’s resident machine translation specialist, we talked about machine translation (MT) in general: how it works, how it has been used at STP and what companies can do to train the MT engines they use.

In part 2 today, you can read Mattia’s thoughts on the newest development within MT technology, which has people predicting the end of translation as we know it: neural machine translation.

So, Mattia, what is neural machine translation? And what’s with the hype?

Neural machine translation (NMT) is essentially the same as statistical machine translation (SMT), but there is more of a “brain” behind it. NMT can potentially improve itself over time and learn on its own.

The vital difference is the amount of data an NMT engine needs – which is way, way more than a traditional SMT engine.

Essentially you have nodes that establish connections on several levels, such as the context and clause level. This makes NMT more flexible – it can analyse shorter bits of text, so the flow of the target output tends to be better.

We often joke that when you train an SMT engine, you’re training a machine. Neural is more like teaching a child a language – or bringing up a bilingual child! While the engine is learning, it makes plenty of mistakes along the way, of course.

How does NMT output compare to previous technologies?

The first thing is better fluency. The output from an NMT engine tends to be more idiomatic, meaning it reads more like natural language. More often than before, the engines are able to use an appropriate synonym or expression within the context of the sentence at hand.

Adapting to the immediate context helps a lot with languages like German or Danish that have complex syntax. Subclauses separated by commas are interpreted more accurately, for instance.

One key aspect of NMT is that it interprets morphology better. For example, a verb in the first person would usually be rendered as an equivalent verb in the first person. So, if the source says I write in English, the target would be j’écris in French, with the correct ending. If the engine cannot recognise the person, it will give you the next best thing, which is usually the verb in the infinitive (for example écrire). This is then easy to edit manually.

We talked about training MT engines before. How does training NMT engines differ from SMT and RBMT (rule-based machine translation) engines?

NMT needs a lot more data than SMT and RBMT. The biggest hindrance to adopting NMT in the first place is that smaller companies can’t find enough data. To get started, an NMT engine needs at least 10 million words of data.

By comparison, an SMT engine can be good as long as the data is good; you can get a decent SMT engine with as few as a million words.

So, NMT is much more about quantity over quality in this respect! Just to put this into perspective, our Finnish NMT engine has 140 million words right now.

Another thing is training the engine. NMT engines tend to resolve issues themselves based on data you add – they come up with rules. You can still add rules if you want, but sometimes this can be counterproductive – you risk doing too much, being too strict.

For example, a German to English translator at STP was wondering why the German–English engine was translating personal names. It turned out that these specific names were also all meaningful nouns (such as the surname Müller, which means “miller”). This means we had to consider the need for a new rule carefully, since the noun Müller (capitalised, like all nouns in German) might come up in a text about millers later.

In this case, leaving it alone and replacing the translated name manually each time was the easiest thing to do. It’s an easy mistake for the translator to spot. You see the error, you check the source and you fix the output accordingly. No one is expecting the output to be perfect.
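
For completeness: a common general-purpose workaround in cases like this – not the manual approach chosen here – is to protect known names with placeholders before the text reaches the engine and restore them afterwards. A hypothetical Python sketch, with an illustrative name list:

```python
import re

# Surnames that are also meaningful German nouns, so the engine may
# mistakenly translate them. This list is purely illustrative.
DO_NOT_TRANSLATE = ["Müller", "Schmied", "Vogel"]

def protect(text):
    """Replace each protected name with an opaque placeholder token."""
    mapping = {}
    for i, name in enumerate(DO_NOT_TRANSLATE):
        token = f"__NE{i}__"
        # \b keeps us from touching substrings inside longer words
        new_text = re.sub(rf"\b{re.escape(name)}\b", token, text)
        if new_text != text:
            mapping[token] = name
        text = new_text
    return text, mapping

def restore(text, mapping):
    """Put the original names back after the engine has run."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text

masked, mapping = protect("Herr Müller wohnt in Berlin.")
print(masked)  # the name is hidden from the engine
print(restore(masked.replace("wohnt in", "lives in").replace("Herr", "Mr"),
              mapping))  # Mr Müller lives in Berlin.
```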

Will NMT replace human translators?

A hundred times, no! A technology like this is only as good as the use you make of it.

I could imagine a situation where a company with several offices around the world would need internal communications, such as short messages from HR, translated very quickly. These could be run through a specialised engine the company has developed and trained for that purpose. The translation wouldn’t be high quality, but people would get the gist. But this would be internal communication and nothing customers would ever see – just for information purposes. Another example is using MT to translate large amounts of survey responses for market research purposes.

But this is not how it’s been used or how it is perceived by many. Many early adopters of machine translation have misused the technology, which has affected its reputation.

The key thing is to use MT output appropriately. Professional translators can use it as a tool. It has even been suggested that post-editing output produced by an MT engine could be a separate service one provides as a translator, as long as you know what you are doing.

Translators are not being replaced; it’s just that the way they work is changing.

Does NMT technology work differently with different language pairs?

It seems so, for some language pairs. For instance, English–Japanese is working quite well, which I find quite impressive. The Nordic languages have not received much attention, as they are smaller.

German output seems to suffer from the syntactic complexity and strictness of the language, and capitalisation is a huge issue. Romance languages seem to be working fairly well; NMT engines seem to cope with their verb paradigms and tenses.

Rather than the language pair, the issue is more the target language itself. Obviously Finnish has been a bit of a headache for us.

Why is Finnish more difficult for NMT?

I think it comes down to morphology – the grammatical complexity within words. The engine has a harder time discerning the different parts of a word.

The Finnish case system is a real challenge for the engines. Each case ending is a variable, and you need to consider this variable in every scenario. Finnish has 15 different cases and there are several possible endings for many of those cases, which means there are a lot of potential alternatives.
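
To make the combinatorics concrete, here are a few singular case forms of the Finnish noun talo (“house”); an engine has to learn patterns like these for every noun stem, on top of plural forms and stem changes. (A hand-picked illustration, not training data.)

```python
# A handful of singular case forms of Finnish "talo" ("house").
# Each ending is one of the variables the engine must get right.
talo_cases = {
    "nominative": "talo",     # house
    "genitive":   "talon",    # of the house
    "partitive":  "taloa",
    "inessive":   "talossa",  # in the house
    "elative":    "talosta",  # out of the house
    "illative":   "taloon",   # into the house
    "adessive":   "talolla",  # at/on the house
    "ablative":   "talolta",  # off/from the house
    "allative":   "talolle",  # onto/to the house
}
print(len(talo_cases), "distinct surface forms from one stem, singular only")
```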

So far, I have only heard of one company making a Finnish engine work really well in terms of morphology and fluency. And that can only be achieved by specialising in one language.

How costly is neural machine translation? Is it worth investing in NMT?

Very costly. You need powerful servers to operate the amount of data we’re talking about. If SMT is like driving a car, NMT is more like flying a jet – the fuel costs are much higher. It’s a lot more affordable now than it was before, though. More and more options are becoming available and prices are falling.

In terms of cost-efficiency, I would say that, if used correctly, MT has the potential to really speed up translation in established workflows.

How secure is MT in general and NMT specifically? How can we be sure that personal data and other data is safe?

It’s as secure as you want it to be. It depends on who deals with your engines and how. We have third-party technology, but we’ve checked their locations and their background.

We also clean the data to keep it secure so that no personal data gets used to train engines. Even Google no longer reuses the data you send back to them. For a while now, they have limited themselves to the data from Google itself rather than using the final output from the translators.

In other words, I think machine translation is very safe.

In part 3 of the interview with Mattia, we will talk to him about the practice of machine translation post-editing and how translators can learn to edit the output from MT engines.

Learn more about machine translation here.

Making machine translation work for us – part 1 (/making-machine-translation-work-for-us-part-1/, Wed, 25 Jul 2018)

It seems machine translation is not only a big trend in the translation industry, but it’s become something of a buzzword outside of the industry, too. Machine translation is not a new phenomenon; for decades, academic researchers have been looking into the possibility of using a machine to translate one language into another without human intervention.

The fact that some types of machine translation are now freely available online has changed most people’s behaviour (at least online): you can now get the gist of an article or a website written in a language you do not understand with a few clicks.

Other machine translation engines are now being used by professional translators as well. The latest development is using artificial intelligence to help make the engines more accurate, which has led some to predict that the machines will take over the translation tasks performed by humans.

We sat down with the machine translation (MT) specialist in STP’s technology team, Mattia Ruaro, to discuss MT in the industry and at STP. Mattia is a translator by training and has become a key part of STP’s technology team after starting out in a project management role.

In this first part, we’ll talk to Mattia about what machine translation is and how machine translation engines can be used – and trained.

So, Mattia, how does machine translation work?

Machine translation is the technology that allows an engine to translate from one natural language to another. Thus far, natural language has basically also meant written language. Machine translation has been around for decades, but there has been a lot of progress in the last 20 years.

There are several types of MT engines; the rule-based ones came first, then the statistical ones and after that the more recent neural machine translation. Every new type of MT has followed the same pattern: the technology has been developed, it’s been trialled and used with a lot of enthusiasm – and then people have discovered its limitations.

While there is a lot of hype about the latest technology, neural MT – some even predict it will replace human translators – it has its limitations, too. This cycle seems to hold for all the different technologies – none of them are actually quite the miracle solution they are hyped up to be at the start.

What are the differences between statistical machine translation (SMT) and rule-based machine translation (RBMT)?

In essence, rule-based machine translation does what it says on the tin; the engine operates according to a set of rules, which are inputted by the developer. Nothing apart from the rules regulates the output from the engine.

The limitations of purely rule-based machine translation were discovered quickly. You need to input all the rules manually and sometimes a long list of exceptions, which is just not viable in a commercial environment, since it takes far too long.

The only exception is when your source language and your target language are closely related. This means that the languages are very close in terms of their lexicon and the semantics of that lexicon, as well as being structurally similar. Since you don’t need to input lots of different rules, you save a lot of effort.

Statistical engines are different: they draw on data to create patterns – this is a more recent approach. It’s basically about feeding the engine as much data as possible and the engine finding patterns in that data.
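
The intuition behind “finding patterns” can be shown with a toy example: counting which source and target words co-occur across a tiny parallel corpus. This is only the first step of IBM-style word alignment – real SMT training is far more involved – and the corpus here is invented for illustration.

```python
from collections import Counter
from itertools import product

# A tiny hypothetical English-German parallel corpus.
parallel_corpus = [
    ("the house", "das Haus"),
    ("the dog",   "der Hund"),
    ("a house",   "ein Haus"),
]

# Count every source/target word pair that appears in the same segment.
cooc = Counter()
for src, tgt in parallel_corpus:
    for s, t in product(src.split(), tgt.split()):
        cooc[(s, t)] += 1

# "house" co-occurs with "Haus" more often than with any other word,
# which is the statistical signal an SMT engine builds on.
print(cooc[("house", "Haus")])
```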

At STP, which types of MT engines out of the ones you mention have been used?

All of them. We tried rule-based engines for translating between Scandinavian languages, which are closely related. So, we would use a rule-based engine to produce output to help with a text we were translating from Danish into Swedish, for example.

For the past 4–5 years, statistical engines have been more viable for us business-wise. Lately, we have been experimenting with neural machine translation. We started with only English into Finnish for neural MT, but we are now in the process of trialling it with other language pairs as well. So far, it seems to be working well in terms of the fluency of the output, but it still has some difficulties processing terminology, particularly when it comes to specialised areas. Only time – and extensive testing – will tell how much better this technology truly is.

Thus far, which languages is machine translation most successful for? What about text domains?

For us at STP, the differences have been bigger between different domains than between different language pairs. The big advantage of statistical engines over rule-based ones has been customisability. It’s all about the data you feed the engine.

If you only input data for one domain, you can get rather good results, since you are training the engine for a narrow scope of material. This has been successful for software, mechanical engineering, financial and business – the latter is a bit of a catch-all term for things like website content, newsletters, HR documentation and so on.

But MT has certainly not been successful for all domains. For example, we haven’t had much success with medical engines. Medical texts are heavily regulated, and machine-translation suggestions can become more of a hindrance than a help when you’re having to follow multiple glossaries and style guides.

Is it possible to train an engine with the help of glossaries and other resources?

Yes, with glossaries, certainly. Style guides are guidelines and they do not contain absolute rules, most of the time, so they are more difficult to implement. It also has to be said that these resources are only as useful as the client makes them.

Another issue with glossaries and resources is that they are often specific to one client – creating and training an engine for just one client is a big investment of time, effort and money. So, we need to be sure that it will be of use in the future – it’s a risky investment for a language service provider to make.

How do you train an MT engine to give you good-quality output?

By having a lot of good data to begin with. If you’re looking for material to input, make sure it’s clean, flowing text and just text. It’s much better to clean the data than to feed the engine unnecessary clutter.

Once the first batch of data has been inputted, you should start using it and get feedback from translators to see if you can tweak the engine.

Ideally, you would prepare the data to make it easier for the MT engine: you’d get rid of extra formatting and tags and make it easier for the engine to parse. MT engines will struggle with extremely long segments and fragmented content.
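
As a rough sketch of that preparation step – thresholds and names are illustrative, not STP's actual pipeline:

```python
import re

# Strip inline markup and drop segments that are empty, extremely long
# or fragmented, rather than feeding the engine clutter.
TAG_RE = re.compile(r"<[^>]+>")
MAX_TOKENS = 80   # engines struggle with very long segments
MIN_TOKENS = 2    # single tokens are usually list fragments

def clean_segment(seg):
    """Return a cleaned segment, or None if it should be discarded."""
    seg = TAG_RE.sub(" ", seg)               # remove inline tags
    seg = re.sub(r"\s+", " ", seg).strip()   # normalise whitespace
    n = len(seg.split())
    if n < MIN_TOKENS or n > MAX_TOKENS:
        return None
    return seg

raw = ["<b>Press</b> the  button.", "OK", "Close the <i>lid</i> firmly."]
print([clean_segment(s) for s in raw])  # the lone "OK" fragment is dropped
```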

If it’s possible to get feedback and train the engine based on that, I would recommend this. The cycle of preparing the input, training the engine and asking for feedback should be repeated regularly.

This practice of continuously improving MT engines is actually part of the machine translation post-editing standard ISO 18587, which STP was certified in back in March this year – you have to make sure that there’s a constant loop of feedback and improvement!

In part 2, you can read more about Mattia’s thoughts on neural machine translation and how STP has approached using machine translation as another technology to help translators in their work.

Learn more about machine translation here.
