Great service from lovely people

Expert advice you can trust

Technology to make translation easier

Top tips for getting better machine translation results

Authors

Tim Branton

PureFluent CEO


July 28, 2020

Machine Translation keeps getting better. If you’re interested in hearing more about this, take a look at our blog post “Is Neural Machine Translation good enough yet?”. In this post, I want to look at some of the practical steps you can take to get better machine translation results.

Customised engines deliver better machine translation output

Machine Translation Engines (MTEs) do a pretty good job nowadays, but they don’t know about your context, desired style and terminology. If you have a large enough Translation Memory, you can use it to train an engine just for you. You need a minimum of 10,000 sentence pairs for each language combination, and preferably more than that. This is your “training data” and it needs to be good-quality human translation – garbage in, garbage out!
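As a rough illustration of the 10,000-pair rule of thumb, here is a minimal Python sketch that counts sentence pairs per language combination. It assumes your Translation Memory has already been exported into a list of dictionaries; the field names `src_lang` and `tgt_lang` and the function names are made up for this example and do not come from any particular tool.

```python
from collections import Counter

MIN_PAIRS = 10_000  # minimum quoted in the text above; more is preferable


def pairs_per_language(units):
    """Count sentence pairs per (source, target) language combination."""
    return Counter((u["src_lang"], u["tgt_lang"]) for u in units)


def ready_for_training(units, min_pairs=MIN_PAIRS):
    """Map each language combination to True if it has enough sentence pairs."""
    counts = pairs_per_language(units)
    return {combo: n >= min_pairs for combo, n in counts.items()}
```

Running `ready_for_training(units)` on your exported data gives a quick yes/no per language combination. It says nothing about quality, only about volume.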

Improve your training data

In the real world, your training data – those translated sentences – is likely to be a bit messy. You may have variations in capitalisation, extraneous bits of punctuation, bullet points or numbered lists. Good training data meets a few key criteria:

  • Normalised text – you want the plain vanilla version of the text, without ALL CAPS for instance.
  • Sentences that aren’t too short – the best training data has more than 5 words per sentence.
  • Sentences that aren’t too long – keep each sentence below 50 words.
  • No bullets or numbered lists – these will harm the training process.
  • Delete repeating characters – for example duplicate spaces or sequences like “……”
  • Avoid tabs – particularly prevalent where people have tried to manually create a Table of Contents in Word.
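To make the criteria above concrete, here is a small Python sketch of what such a clean-up could look like. It is an illustration under simple assumptions, not a description of any existing tool: the function names are invented for this example, words are counted by splitting on spaces (which will not work for languages such as Japanese or Chinese), and ALL CAPS text is simply converted to sentence case, which loses the capitals on proper nouns.

```python
import re

# A leading bullet ("•", "-", "*") or list number ("1.", "2)") before the text.
BULLET = re.compile(r"^\s*(?:[•\-*]|\d+[.)])\s+")
# Two or more of the same punctuation character in a row, e.g. "……" or "!!!".
REPEATS = re.compile(r"([.…!?,;:_*=~-])\1+")
SPACES = re.compile(r"\s+")


def clean_segment(text):
    """Normalise one sentence according to the criteria listed above."""
    text = text.replace("\t", " ")     # no tabs
    text = BULLET.sub("", text)        # no bullets or list numbers
    text = REPEATS.sub(r"\1", text)    # no repeating characters
    text = SPACES.sub(" ", text).strip()  # no duplicate spaces
    if text.isupper():                 # ALL CAPS becomes sentence case
        text = text.capitalize()
    return text


def keep_pair(source, target, min_words=5, max_words=50):
    """Keep pairs where both sides have more than 5 and fewer than 50 words."""
    return all(min_words < len(s.split()) < max_words for s in (source, target))


def clean_pairs(pairs):
    """Clean each (source, target) pair and drop those outside the length limits."""
    for source, target in pairs:
        source, target = clean_segment(source), clean_segment(target)
        if keep_pair(source, target):
            yield source, target
```

For example, `clean_segment("  •  INSTALL THE\tDRIVER  NOW……")` returns `"Install the driver now…"`. In practice you would want to review what gets dropped before discarding it, as short segments are sometimes perfectly good translations.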

We’re working on a development right now which we’re calling “Laundry” which will “clean” your Translation Memory so that it provides higher quality training data. More news on that soon!

Incorporate your Terminology into the MT output

Some Machine Translation Engines, such as Google AutoML, allow you to incorporate custom terminology into the Machine Translation process. This is important because even with good training data, the MTE is likely to get specific items of terminology wrong and may do unhelpful things like translating brand names. The terminology process overlays the original MT output with your specific terminology preferences, reducing the work required by the translator.
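If your engine does not support terminology, you can at least flag the problems for the translator. The Python sketch below is a simple check, not the overlay step that the engines themselves perform. The function name, the glossary format (a plain dictionary of source term to preferred target term) and the list of names to leave untranslated are all assumptions made for this example.

```python
def terminology_issues(source, mt_output, glossary, do_not_translate=()):
    """Return a list of terminology problems found in one MT segment."""
    issues = []
    src_lower = source.lower()
    out_lower = mt_output.lower()

    # Glossary terms: if the source term appears, the preferred target should too.
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in src_lower and tgt_term.lower() not in out_lower:
            issues.append(f"expected '{tgt_term}' for '{src_term}'")

    # Brand names and similar: these must survive unchanged (case-sensitive).
    for name in do_not_translate:
        if name in source and name not in mt_output:
            issues.append(f"'{name}' should be left untranslated")

    return issues
```

This uses plain substring matching, so it will miss inflected forms and can match inside longer words. It is a first filter to point the translator at likely problems, nothing more.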

PEMT = MTPE = Post-Editing Machine Translation

“The translator?” I hear you cry! Yes, I’m afraid you still need a human translator for most of your translated content. Sometimes called the Post-Editor, this translator checks the MT output and corrects it where required. The goal of the steps above is to reduce the effort required by the Post-Editor as far as possible.

Some Machine Translation Engines are better than others

Let me be more specific. Some Machine Translation Engines (MTEs) have better results for specific language combinations and domains. Memsource have an interesting approach to this conundrum. They assess the performance of different engines for specific language combinations and domains and use this to suggest the optimal engine for each specific project. The bottom line is: don’t assume that one engine is “the best”. Be prepared to use multiple engines for the best overall performance. If you’re interested in this subject, I really recommend reading the latest Machine Translation Report from Memsource.
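The idea of picking an engine per project can be sketched in a few lines of Python. This is not how Memsource does it; it simply assumes you have your own quality scores (from post-editing effort, say) stored per engine, language combination and domain. The engine names and scores shown in the usage note are invented.

```python
def best_engine(scores, lang_pair, domain, default=None):
    """Pick the highest-scoring engine for a language combination and domain.

    `scores` maps (engine, lang_pair, domain) to a quality score,
    where higher means better.
    """
    candidates = {
        engine: score
        for (engine, pair, dom), score in scores.items()
        if pair == lang_pair and dom == domain
    }
    if not candidates:
        return default  # no data for this combination
    return max(candidates, key=candidates.get)
```

With scores such as `{("engine_a", "en-de", "legal"): 0.71, ("engine_b", "en-de", "legal"): 0.78}`, calling `best_engine(scores, "en-de", "legal")` returns `"engine_b"`. A combination with no scores returns the default, which is your cue to test before you commit.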

Generic, domain-specific engines can deliver better Machine Translation

If you don’t have a large enough corpus of existing translations, don’t despair. Machine Translation providers like Microsoft and ModernMT have domain-specific engines which are trained on large volumes of existing translations in domains such as legal, medical and industrial. This is a good shortcut to achieving better Machine Translation results, as the MTE is more likely to get the terminology right for that domain.

This is a big subject and can seem out of reach if you’re not a big translation buyer. The good news is that the entry point for Machine Translation is coming down all the time. If you want to discuss whether it might work for you, get in touch and we’ll be happy to explore the options with you.

About the authors

Tim Branton

Tim Branton is PureFluent's CEO and a passionate advocate for the role of technology in the language industry. He has 30 years of business experience across the chemicals, telecoms, business services and software sectors in the UK, Singapore, Japan, China and South Africa.

