Artificial intelligence Archives - Sandberg

The future of AI in translation: a revolutionary path or a self-destructive loop?

Emma Belanger — Fri, 06 Sep 2024 16:10:13 +0000

Have you felt the need to implement artificial intelligence (AI) in your organisation? There’s no denying that pressure on companies to integrate AI into their internal processes is growing increasingly intense. In today’s rapidly evolving business landscape, there’s a pervasive belief that if a company isn’t leveraging AI, it risks falling behind its competitors. In fact, the mere announcement of AI integration can cause a public company’s stock to rise dramatically, as investors and stakeholders associate AI adoption with forward-thinking and innovation.

However, the reality of implementing Generative AI is far more complex and fraught with challenges than it appears on the surface. It isn’t a perfect tool and can produce inaccurate or skewed content that then gets fed back to AI, leading to a phenomenon called “model collapse”. This article will explore the weaknesses of AI-powered solutions in multilingual communication and consider why scrutiny and critique are crucial to ensure proper implementation of this technology.

McDonald’s failed experiment

The global fast-food giant McDonald’s attempted to automate its drive-through ordering process using AI, aiming to streamline operations and enhance customer experience. But the experiment failed. The AI systems struggled to understand accents, dialects and even background noise, leading to a frustrating experience for customers. In one video, two friends can be seen laughing as the technology mistakenly orders over 200 Chicken McNuggets for them.

However, it’s not as simple as just casting AI-powered products aside because of such errors. The pressure to scale up quickly using this technology is very real, and other chains have had significant success with using AI for their drive-throughs, including Wendy’s and Panda Express. There’s no doubt that McDonald’s will explore other AI avenues in the future.

Yet this cautionary tale reminds us that we are still in the early days of AI-enhanced solutions. If McDonald’s couldn’t make AI work reliably for something as basic as ordering a hamburger, how can we expect AI to flawlessly handle the complexities of our businesses’ multilingual content? Try to ignore the hype and remain professional when you consider content creation that requires a deep understanding of context, nuance and cultural sensitivity – areas where GenAI often struggles.

AI and translation

This brings us to the topic of content generation and translation, one of the most promising yet challenging applications of AI. Advanced language models such as OpenAI’s ChatGPT have allowed businesses to streamline their content creation processes, generating marketing copy, reports, summaries, social media posts and even complex articles with the click of a button. AI-driven translation tools have been promoted to companies looking for quick, accurate translations that can be deployed across multiple channels simultaneously.

However, AI is not infallible when it comes to understanding the subtleties and intricacies of human language. While AI can produce grammatically correct texts, it struggles with idiomatic expressions, cultural references and the context-specific meanings that are crucial for effective communication.

Getting multilingual content wrong due to AI mistakes leads to individual instances of misunderstandings, not dissimilar to what happened with McDonald’s. But what happens when our content becomes so widely produced by AI that the incorrect results get fed back into the source and training materials, perpetuating and even worsening already grave cultural bias or linguistic mistakes?

What is model collapse?

One of the emerging challenges in the realm of AI is a phenomenon known as “model collapse”. This occurs when AI models are trained on data that includes AI-generated content, leading to a degradation in the quality of the output. The models start to reinforce their own errors, amplifying mistakes and generating content that lacks originality, coherence or meaning.

Model collapse is not just a danger to the quality of AI-generated content; it poses a broader risk to the entire ecosystem of AI development. As more and more content on the internet is generated by AI, there is a growing concern that the training data used by these models will become increasingly tainted, leading to a vicious cycle of declining results. Researchers have even coined terms like “Model Autophagy Disorder” and “Habsburg AI” to describe this self-destructive cycle, likening it to the way the Habsburg dynasty’s inbreeding led to the deterioration of their genetic line.

This issue extends beyond text to other forms of AI-generated content and is perhaps best illustrated by what happens to images. When AI models are trained on images that were themselves generated by AI, they can start to lose the ability to create realistic or meaningful visuals. The result is a kind of digital inbreeding, where the AI’s outputs become increasingly detached from reality and less useful for practical applications.

Source: Bohacek & Farid 2023, https://creativecommons.org/licenses/by/4.0/
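To make the feedback loop more concrete, here is a minimal toy sketch in Python. It is not taken from any of the studies mentioned above. It assumes, purely for illustration, that each new model slightly over-produces the words it already finds likely (the `sharpening` value) and never sees words rarer than a `cutoff` in its training data. Both values, the function names and the four-word vocabulary are made up for the example.

```python
def retrain(distribution, sharpening=2.0, cutoff=0.01):
    """Simulate one generation trained only on the previous model's output.

    Illustrative assumptions: generated text over-represents likely words
    (probabilities are raised to the power `sharpening`), and any word whose
    share falls below `cutoff` never reaches the next training set.
    """
    sharpened = {word: p ** sharpening for word, p in distribution.items()}
    total = sum(sharpened.values())
    kept = {w: p / total for w, p in sharpened.items() if p / total >= cutoff}
    kept_total = sum(kept.values())
    # Renormalise so the surviving probabilities sum to 1 again.
    return {w: p / kept_total for w, p in kept.items()}


def simulate(distribution, generations):
    """Return the word distribution of every generation, starting with the original."""
    history = [distribution]
    for _ in range(generations):
        history.append(retrain(history[-1]))
    return history


# Hypothetical "human-written" word frequencies.
human_text = {"the": 0.5, "cat": 0.3, "sat": 0.15, "serendipitously": 0.05}
history = simulate(human_text, generations=6)
for generation, dist in enumerate(history):
    print(generation, sorted(dist, key=dist.get, reverse=True))
```

In this toy setup the rarest word disappears first and the vocabulary keeps shrinking with each generation, which mirrors the loss of originality described above. Real models are far more complex, so treat this as an intuition aid rather than a prediction.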

Can AI provide a potential boost to human creativity?

Despite the challenges associated with AI, experts argue that the situation is not as dire as it seems. Human editors can review and refine AI-generated content, correcting errors, adding context and ensuring that the final output meets the desired quality standards. This hybrid approach, combining AI’s speed with human expertise, mitigates many of the risks associated with model collapse.

Furthermore, many believe that the rise of AI-generated content will actually increase the value of human-created content. As more content becomes automated, the unique qualities of human creativity – originality, emotional depth and cultural nuance – will become even more valuable. If AI is employed to create the bulk of the multilingual content, human translators and editors will be needed to check, refine and perfect the output, ensuring it resonates with audiences in different cultural contexts.

Integrating AI into your multilingual communication strategy

So, how should you be using GenAI? Implementing AI in content creation requires your teams to learn how to write prompts, set benchmarks for what good looks like, assess the output, consult reliable technology partners and have a clear strategy that actually saves time rather than just shifting the cost from one part of the content generation process to another. Without these, what was initially expected to be an advantage could quickly turn into a disaster, leading to operational mishaps, loss of customer trust and significant financial losses.

With the right approach – one that leverages AI’s strengths while acknowledging its limitations – you can use AI-powered tools to enhance your multilingual communication efforts. It’s interesting to examine where you save time and where you lose it, because whichever way you choose to generate your multilingual content, it must be quality controlled by a rigorous human review process.

If you do choose to implement GenAI in your organisation, talk to our language experts here at Sandberg and make them a part of your solution through post-editing services and quality assurance checks, ensuring that your content is accurate, culturally appropriate and effective in conveying the intended message.

The post The future of AI in translation: a revolutionary path or a self-destructive loop? appeared first on Sandberg.

A guide to using ChatGPT in multilingual content production

Gonzalo Fernandez — Thu, 30 May 2024 10:25:10 +0000

If you’re up to date with technological advancements, you’ve undoubtedly heard about artificial intelligence. Who hasn’t? You might even have been asked by your manager to experiment with these technologies to see if they can automate processes or enhance the efficiency of your daily tasks.

Content creation is often associated with specific roles within marketing or communications teams. However, the truth is that we all engage with content regularly, whether it’s writing an email to a supplier, preparing slides for a weekly meeting, crafting an end-of-quarter report or compiling documentation for a tender.

In the following article, we’ll demonstrate how ChatGPT 4 can be a valuable ally in producing any type of content, enhancing your productivity and safeguarding sensitive information when handling confidential materials.

Create your own ChatGPT

While you can use the regular version of ChatGPT, the first step towards a serious AI-supported content generation strategy is to create your own ChatGPT – a customised version of the same system that can be created by any user who has a paid OpenAI account.

Instead of copying and pasting text from each resource you want ChatGPT to consider when generating a response, here you can upload your own files that will make your content more personalised, consistent and in the right tone of voice.

The documents you upload aren’t shared with anyone outside the system and are used exclusively to assist you with your enquiries. ChatGPT only has access to the files you upload during your session and doesn’t retain any information once the session ends.

Pro Tip: Although the internet is a vast repository of information, not all relevant content is available online. If you have reference materials in physical form, such as academic papers, books or magazines, consider using Optical Character Recognition (OCR) software to digitise these documents. This method allows you to seamlessly integrate them into your digital database, enhancing both access and efficiency.

Create a PowerPoint presentation with ChatGPT

Once you’ve configured your own version of ChatGPT, you’ll be ready to put it to use. Let’s say that you want to use the data and references you’ve integrated to put together a PowerPoint presentation for an upcoming international conference. The next step involves crafting clear and detailed instructions to ensure that ChatGPT fully understands your objectives. The more context and information you include in your initial instructions, the better the outcome. Here’s an example of how you might structure your request:

I want my presentation to emphasise the importance of localisation in global marketing campaigns. For this reason, I have included several specific articles that address this topic, as well as another called “Marketing Myopia” from Harvard Business Review, which I consider an excellent general reference.

Once you’ve selected and uploaded the reference content, the next crucial step is to articulate the prompts you’ll provide to ChatGPT for the task. Don’t underestimate this stage; it’s pivotal for a successful outcome. To ensure a high-quality initial response, it’s essential that you equip the system with ample context and clarity regarding the desired output.

For instance:

I specialise in marketing and localisation, particularly in multilingual content. My objective is to craft a PowerPoint presentation for the “Language Matters 2024” Congress.

I’ll be delving into the significance of localisation in international marketing campaigns. I plan to have 5 slides with the following titles:

Global marketing campaigns
Addressing international audiences
The symbiotic relationship between localisation and marketing
Strategic approaches to localisation
Localisation options tailored for global marketers

Please create the slide content as specified above, ready for use. Utilise the attached reference material to craft the content. Ensure captivating titles. Vary the structure of each slide: Classic slides with title and text, bullet lists, tables and other strategies to enhance content visualisation. Each slide should have a maximum of 250 characters. The content should be written in British English. Please avoid lengthy texts.

Once I have my instructions prepared (I recommend always writing them first in a separate file), I copy and paste them into the “Create” section of my ChatGPT.

The engine gets to work and produces precisely the type of content I specified. I’ve included some screenshots below of the slides that ChatGPT created.

As we can see, the content generated by ChatGPT in just one minute serves as an excellent foundation for our presentation, although it does have some flaws:

It produced 7 slides instead of 5.
The content is rather generic and lacks a personal story.
The content wasn’t written in British English.
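Some of these flaws can be caught automatically before the human review starts. The following is a minimal Python sketch, assuming the generated slides have been copied into a list of strings. The function name `review_slides`, the spelling table and the sample data are hypothetical; the limits of 5 slides and 250 characters come from the prompt above.

```python
# A tiny, illustrative list of American spellings and their British equivalents.
AMERICAN_TO_BRITISH = {
    "localization": "localisation",
    "organization": "organisation",
}


def review_slides(slides, expected_count=5, max_chars=250):
    """Return a list of human-readable problems found in the generated slides."""
    problems = []
    if len(slides) != expected_count:
        problems.append(f"expected {expected_count} slides, got {len(slides)}")
    for number, text in enumerate(slides, start=1):
        if len(text) > max_chars:
            problems.append(
                f"slide {number} has {len(text)} characters (limit {max_chars})"
            )
        lowered = text.lower()
        for american, british in AMERICAN_TO_BRITISH.items():
            if american in lowered:
                problems.append(
                    f"slide {number}: '{american}' should be '{british}'"
                )
    return problems


# Hypothetical output: seven slides, one of them with American spelling.
generated = ["Global marketing campaigns"] * 6 + ["Why localization matters"]
for problem in review_slides(generated):
    print(problem)
```

A check like this only flags mechanical issues. Whether the content is generic or lacks a personal story still needs a human reader.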

Once you’ve reached this point, you have two options: you can manually review and edit the content yourself or you can extract parts of the result and ask the AI engine to make changes. Human review is crucial to ensure the authenticity of the content, to insert your personal expertise and to customise it as much as possible to the target audience.

When you’re happy with the text on the slides and have written your narrative, you will need to design the visuals for the presentation (there are also AI applications for design generation, but that’s a topic for another time). Once that’s done, you’re all set!

Translate a PowerPoint presentation with ChatGPT

The presentation was such a success that you’ve been invited to deliver the same talk at a conference in Denmark. While you speak Danish fluently – better than you write it – you’re not equipped to translate the entire presentation on your own. Next, we’ll demonstrate how ChatGPT can significantly aid you in this task.

It’s quite simple: you just need to organise the final content you used for the presentation and ask the system to translate it.

Like the English presentation, this one will also need a round of human editing and review. Ideally, you should seek assistance from a native Danish speaker who is an expert in marketing. If no one in your company can help, you can always turn to a professional language services provider like Sandberg.

Here is a sample of the post-editing work performed by the specialised Danish team here at Sandberg. We always provide a version with tracked changes, allowing you to easily review the modifications made.

Typically, a post-editing service has different levels, ranging from correcting grammatical errors to adapting the text to make it culturally relevant to the target audience.

As demonstrated throughout this article, artificial intelligence can be a valuable ally in content generation. However, it’s crucial to acknowledge that these engines don’t tackle every task flawlessly. To achieve optimal results, it’s essential to:

Prepare a database relevant to the topic at hand.
Craft clear and precise instructions to help the engine grasp the nature of the task.
Conduct meticulous post-editing to ensure the authenticity and accuracy of our content.

We hope this article has been helpful to you!

Generative AI: The new frontier of exponential growth?

Gonzalo Fernandez — Fri, 08 Mar 2024 11:24:20 +0000

14 March 2024 marks the anniversary of ChatGPT 4, undeniably one of the decade’s pivotal technological breakthroughs, on par with the invention of the internet or microprocessors. The introduction of this chatbot, adept at seamlessly answering questions on virtually any subject, sparked a global frenzy, making it the quickest app to amass over 100 million users.

As a result, the global business community’s embrace of artificial intelligence technology transitioned from a mere trend to a transformative power. The rapid pace left little room for debating AI’s utility, propelling us instead towards leveraging its practical applications for substantial business benefits. One report highlights this shift, noting that 64% of CEOs report continued pressure from investors, creditors and lenders to fast-track the adoption of generative AI.

This applies to small and medium-sized enterprises as well, even if they may lack a dedicated technology department. One of the groundbreaking characteristics of the new technology is that setting it up doesn’t require a team of experts. While it’s true that many businesses don’t know how to identify strategic machine-learning opportunities or how to turn them into disruptive products or services, the reality is that thousands of people launch AI engines on their computers every day, weaving the technology into tasks that range from drafting emails and compiling reports to translating texts and crafting presentations. The broadest use of the technology to date has been in content creation processes, as content marketers were among the first to go from just playing around with AI to using AI in their daily work.

Balancing innovation with the human touch

From the outset, the language services industry has been at the forefront of developing multilingual GenAI solutions that offer:

Authoring help
Content variation by target audience
Tone inconsistency checks
Content classification
Term extraction
Language quality assessment

At Sandberg, we’ve embarked on our own artificial intelligence journey. For our clients’ baseline translation needs, our machine translation engines still outperform other large language model (LLM) options, but the GenAI launch prompted us to investigate AI’s potential for content creation in our marketing team and for our own internal documentation needs.

The results to date have been nothing short of spectacular, even though every instance of the content generation process has still required substantial human involvement. Humans play a pivotal role in generating new ideas and seeking out high-quality references for content. And they have a crucial role at the end of the process, refining the machine-generated output, rearranging the text for better coherence and adjusting the language to align with the brand. For us, this final step is nothing new, thanks to our many years of experience in machine translation post-editing.

Without the human touch, maintaining Sandberg’s unique tone of voice would not have been possible. This is what brings us to our core statement: in the era of artificial intelligence-powered machines, the human element is vital to crafting content, products or services that enrich and redefine human experiences. When almost everything we know will be sourced by our inorganic colleagues, the human element stands as a key value differentiator for brands aiming to connect with their stakeholders at a deeper level.

Generative human creativity

Let’s be honest: the cost of human touch and authenticity is prohibitive for many companies. The time-saving benefits offered by AI are undeniable. Yet, the challenge isn’t in using artificial intelligence; it’s in identifying each use case and determining the most effective workflows. The indiscriminate application of artificial intelligence in content creation is precisely why our LinkedIn feeds are now saturated with posts that mirror each other. This type of content lacks the unique ability to differentiate or spark interest in a brand.

The capabilities of tools such as ChatGPT, HeyGen or Copilot are crucial in enhancing the efficiency of documentation, technical writing, legal and marketing teams. But how does a company ensure consistency, maintain specific terminology and preserve the tone of voice in its content when different teams employ different tools? And what happens when the content spans multiple languages? It’s fundamental that outputs such as corporate reports, marketing campaigns, organisational policies or legal documentation undergo human scrutiny and editing. This human-in-the-loop step upholds the brand’s consistency and ensures accuracy and adherence to ethical guidelines. This is where Sandberg comes in to offer professional help to businesses that are turning to artificial intelligence to enhance their content creation processes.

Solutions for content creators using AI

The solution we typically recommend to our clients starts with a process where they leverage an AI engine to generate content. Our team then refines the output, employing terminology databases, style guides and any documents containing instructions that offer insights into the target audience, market or the specific stage of the customer journey associated with the content.

This is a process we’ve seen successfully implemented at numerous companies. To build a seamless workflow, we organise an initial meeting between our team and the client’s teams. It’s crucial to discuss the technologies involved, to understand the brand identity and to address all the project management aspects, including reference documents and timelines.

The process can be applied to any monolingual content the client produces with the help of AI, and the output can then also be translated into further languages.

AI and translation: the new reality is here – almost!

William Hagerup — Thu, 06 Jul 2023 09:29:15 +0000

Artificial Intelligence and its potential impact on all parts of society and the businesses of tomorrow is a hotly debated topic these days. But what is the role of AI and Large Language Models in the world of translation, and, more specifically, translation in the Nordic languages? Read on to learn more about the possibilities and challenges of using AI for translation.

ChatGPT and the potential of AI

We have all heard about ChatGPT (short for Chat Generative Pre-Trained Transformer) and how it can produce human-like responses to a multitude of questions and queries. The output can even include human-level factual errors, amusingly known as “hallucinations”. If you have tried it yourself, you may have noticed this phenomenon, which occurs for a number of reasons, not least its lack of knowledge of events after September 2021 and the fact that it doesn’t learn from experience.

Still, the emergence of this engine in November 2022 has certainly changed public perception of the potential of AI and natural language processing (NLP), including fears that they will be used to write exams and essays. These concerns will only worsen with the emergence of more extensively trained models.

The Large Language Models

When using ChatGPT or GPT-4, the public is interacting with a Large Language Model (LLM), which is just one of several types of language model:

Zero-shot model: A large, generalised model trained on a generic corpus of data and able to give a reasonably accurate result for general use cases, without the need for additional training. This is what GPT-4 is regarded to be.

Fine-tuned or domain-specific model: A zero-shot model with additional training produces a fine-tuned, domain-specific model. An example of this is the OpenAI Codex, a domain-specific LLM for programming based on GPT-3.

Language representation model: This model makes use of deep learning and transformers well suited for NLP. An example is Bidirectional Encoder Representations from Transformers (BERT).

Multimodal model: While many LLMs are trained just for text, the multimodal model can handle both text and images. An example of this is GPT-4.

As the computational social scientist Bernard Koch explains: “One reason these large language models remain so remarkable is that a single model can be used for tasks including question answering, document summarization, text generation, sentence completion, translation and more.”

It is this generalist potential that is firing up the imagination of professionals in the AI industry and beyond. But how does it work, and can it really be used with confidence to produce human-quality translations?

Parameters and machine learning

Rather than being trained to do one particular task, a Large Language Model (LLM) has a general framework with billions of parameters.

One of the first LLMs, BERT, came out in 2018 with around 110 million parameters in its base version, followed by GPT-2 in 2019 with over a billion parameters. From there, the growth has been exponential, with GPT-4 reportedly reaching a trillion parameters.

What exactly are parameters in Large Language Models? There are two types: hyperparameters and model parameters. The former can be compared to rules or instructions set by the user before the model starts the “learning” process. Model parameters are then what the model “learns” from the training data fed to it. How well it functions depends on the algorithms for integrating this data and developing these parameters.

As one technology blog explains, “Parameters fundamentally equip a model with prediction capabilities.”

For non-technical readers, these parameters are like buttons and knobs that can be pushed and turned to fine-tune how the system responds to the data it is fed and how well it completes the desired tasks. Hyperparameters are the buttons and knobs put in place by the designer, whereas model parameters are the buttons and knobs that emerge as the model is trained on data.

The more parameters, the more fine-tuned the model will be in terms of producing the appropriate output when prompted.
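The difference between the two kinds of knobs is easier to see in a very small model. The sketch below is an illustration only and has nothing to do with how GPT-4 is actually built: it fits a straight line to five made-up data points. The `learning_rate` and `epochs` values are hyperparameters chosen by the designer, while `w` and `b` are the model parameters that emerge from training.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x + b with plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # model parameters: learned from the data
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up training data that follows y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

# Hyperparameters: set by the designer before training starts.
w, b = train(xs, ys, learning_rate=0.05, epochs=2000)
print(f"learned w={w:.3f}, b={b:.3f}")
```

This model has two parameters and should end up very close to the line that generated the data. An LLM works on the same principle, but with billions of learned values instead of two.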

Language Models and translation

Apart from plaguing teachers across the world with AI-generated essays, what can LLMs be used for in other industries? One area in which they may play a revolutionary role is in the field of translation.

Translation is all about text, so an AI engine built specifically for the production of general text should be perfect for translation-related work. Indeed, according to a study from May 2023, the general LLMs have shown themselves to be capable translators.

Feed me, Seymour!

For LLMs to develop billions or trillions of parameters, they need a large amount of data from which to generate these rules. For this reason, the larger the corpus (a large and structured set of texts) of a language, the better the quality of the machine translation.

This means that the system can recognise the difference between words in different contexts, such as proper nouns (names) that are the same as common nouns.

In this example, the German noun “Müller” can be a miller, but it can also be the surname Müller.

Ein Müller namens Markus Müller hat fünf Jahre Erfahrung. Sein Bruder Karl Müller ist ebenfalls Müller. Die ganze Müller-Familie sind Müller. Der Vater, Hans, ist Müller, die Mutter Helga ist Müllerin, der ältere Bruder Otto ist Müller. Alle Müllers sind Müller.

This is what ChatGPT produced when asked to translate it into English:

One miller named Markus Müller has five years of experience. His brother Karl Müller is also a miller. The entire Müller family are millers. The father, Hans, is a miller, the mother Helga is a miller, the older brother Otto is a miller. All the Müllers are millers.

Although the Müller family may not be particularly inventive in their career choices, the way the engine differentiates between common noun and proper noun (name) is quite impressive here, particularly because nouns in German are capitalised just like names.

German is one of the larger languages, but let’s take Norwegian, of which there are 5 million users, not much more than the number of people who use London Public Transport on a typical day.

Here, it’s the profession of baker and the name Baker that we set as the challenge:

Tom Baker is a master baker. He is part of the Baker family who are all bakers and work in Bakers & Co, the family firm of the Bakers. The bakers in Bakers are all highly qualified bakers. One baker, John Baker, has been a baker for 23 years. All the Bakers are bakers.

ChatGPT:

Tom Baker er en mesterbaker. Han er en del av Baker-familien, som alle er bakere og jobber i Bakers & Co, Baker-familiens firma. Bakernes i Bakers er alle svært kvalifiserte bakere. En baker ved navn John Baker har vært baker i 23 år. Alle Baker-ene er bakere.

Here it struggles a little bit with “bakers in Bakers” and “Bakers are bakers”, which are not rendered correctly. However, the result is still pretty good in terms of distinguishing between proper and common nouns, a recurring problem in machine translation.

AI and context

One reason why a Large Language Model like ChatGPT can produce fairly accurate output, even with tricky texts such as the above, is that it mimics a process carried out by human translators.

A professional translator will usually take several initial steps when preparing to translate something, such as researching the overall topic, considering the context, looking at the company in question, as well as previous translations, term lists, etc.

A recent study examined how this approach can be systematically used to further improve LLM translation. They called it the MAPS framework (Multi-Aspect Prompting and Selection).

The process consists of three stages: “(1) Knowledge Mining: the LLM analyzes the source sentence and generates three aspects of knowledge useful for translation: keywords, topics and relevant demonstration. (2) Knowledge Integration: guided by the different types of knowledge separately, the LLM generates multiple translation candidates. (3) Knowledge Selection: the candidate with the highest QE (reference-free quality estimation) score is selected as the final translation.”

Perfecting this functionality can improve the usefulness of AI, as this approach reduced “… up to 59% of hallucination mistakes in translation.”
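The three stages quoted above can be sketched in a few lines of Python. This is a simplified reading of the quoted description, not the researchers’ implementation. The `llm` and `qe_score` arguments are placeholders for whatever language model and quality estimation tool you have access to, and the prompt wording is invented for the example.

```python
from typing import Callable

# The three aspects of knowledge named in the quoted description.
ASPECTS = ("keywords", "topics", "demonstration")


def maps_translate(
    source: str,
    target_lang: str,
    llm: Callable[[str], str],              # placeholder: prompt in, text out
    qe_score: Callable[[str, str], float],  # placeholder: (source, candidate) -> score
) -> str:
    # Stage 1, knowledge mining: ask the model for each aspect of knowledge.
    knowledge = {
        aspect: llm(
            f"List the {aspect} that would help render this sentence "
            f"in {target_lang}: {source}"
        )
        for aspect in ASPECTS
    }
    # Stage 2, knowledge integration: one candidate per aspect, each guided separately.
    candidates = [
        llm(
            f"Translate into {target_lang}, guided by these {aspect}: {hint}\n"
            f"Sentence: {source}"
        )
        for aspect, hint in knowledge.items()
    ]
    # Stage 3, knowledge selection: keep the candidate with the highest QE score.
    return max(candidates, key=lambda candidate: qe_score(source, candidate))
```

Because the model and the scorer are passed in as plain functions, the same skeleton can be tried with different engines, or with simple stand-ins while testing the workflow.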

What about rare words?

As mentioned above, smaller languages yield smaller corpora to train the engine on, and the same is true of infrequently used words in any language. Rarely used words can end up being mistranslated or not translated at all. In a paper published in February 2023, three researchers came up with a solution to this challenge: DiPMT, or Dictionary-based Prompting for Machine Translation.

In essence, this approach involves training the engine on dual-language dictionaries in addition to general corpus text, including domain-specific dictionaries, such as financial or medical dictionaries. DiPMT then uses prompts to indicate how a word is used in a particular context.

The authors claim that DiPMT improves translation significantly both for low-resource and out-of-domain translation. They go on to discuss how having enough data is crucial for an LLM to function well. This is often not the case for smaller, specialist domains or for languages with smaller text corpora, as mentioned previously. Training the model on dictionaries with the addition of human feedback as to the correct term for the context, the DiPMT approach, has proven an efficient remedy for this weakness.
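The prompting part of this idea can be illustrated with a short Python sketch. It only shows how dictionary entries might be attached to a translation request as hints. The function name, the prompt wording and the one-entry dictionary are assumptions made for the example and are not taken from the paper.

```python
import re


def build_dictionary_prompt(source, dictionary, source_lang, target_lang):
    """Build a translation prompt that includes dictionary hints for known words."""
    hints = []
    seen = set()
    for word in re.findall(r"\w+", source.lower()):
        if word in dictionary and word not in seen:
            seen.add(word)
            hints.append((word, dictionary[word]))

    lines = [f"Translate the following text from {source_lang} into {target_lang}."]
    for word, translations in hints:
        options = " or ".join(translations)
        lines.append(f'In this context, the word "{word}" means "{options}".')
    lines.append(f"Text: {source}")
    return "\n".join(lines)


# Hypothetical domain dictionary with a single entry, reusing the Müller example.
trade_dictionary = {"müller": ["miller"]}
prompt = build_dictionary_prompt(
    "Sein Bruder ist ebenfalls Müller.", trade_dictionary, "German", "English"
)
print(prompt)
```

In practice the dictionary would be a financial, medical or other domain-specific glossary, and a human would still confirm that the suggested term fits the context.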

What about the Nordics?

As the languages of the Nordic region are less widely spoken, they present a unique challenge to these LLMs. Smaller languages lack the large quantities of data that train these models to produce high-quality output. In these languages, is it possible to achieve the same quality or usefulness?

AI Sweden has been working to answer just that question over the past few years.

Due to the enormous costs and resources needed, AI Sweden joined forces with supporting organisations, including the Research Institutes of Sweden (RISE) and the Wallenberg AI, Autonomous Systems and Software programme, to build a home-grown Swedish model.

Dr. Magnus Sahlgren, Head of Research for Natural Language Understanding at AI Sweden, explains that the main challenge was the limited amount of training data, as Swedish is much less widely spoken than other languages.

But the researchers solved this problem by taking advantage of the fact that Swedish is typologically similar to the other languages in the North-Germanic language family. By combining data from Swedish, Norwegian, Danish and Icelandic, as well as English and code, they gained access to far greater amounts of data. They called this .

A particular advantage of building a new Swedish model based on this group of Nordic languages is that it can later be used as a starting point for one of the other languages, such as Icelandic.

Practical limitations with Large Language Models

There are several reasons why professional translation companies such as Sandberg cannot easily take advantage of LLMs in their current form.

Firstly, we deal with proprietary texts that our clients own, so we cannot feed them into an external AI bot owned by another company or government agency.

We make sure that our clients’ texts are treated carefully, according to the clients’ privacy requirements, following all relevant laws and regulations and abiding by high ethical standards.

Secondly, the cost threshold for any business to build its own LLM is also significant, to say the least. Microsoft invested . , and the training consumes enormous amounts of electricity, computing power and storage capacity.

This is why the Swedish developers had to build a broad coalition of partners, even though they were backed by the Swedish government.

Nevertheless, at Sandberg we follow the development of any language-related or translation technology very closely. We have been leaders in using and developing internal high-quality Nordic neural machine translation systems that ensure higher efficiency within a quality-assured and secure process.

Megan Hancock, for many years a Translation Project Manager and now Specialist Project Manager, explains the Sandberg approach to machine translation:

“We’re equipped with over 40 neural machine translation engines which are trained on a carefully curated set of data that enables us to apply it on the full range of domains and text types that we handle in our day-to-day work, from mechanical engineering to marketing or business-oriented content. All Sandberg’s MT engines translating in either direction between the Scandinavian languages and English have an edit distance of less than 20%. We are also fully compliant with the ISO 18587 certification for the post-editing of machine translation output.”

What’s edit distance?

Edit distance is often used as a metric to evaluate the quality of machine translation output. Given the MT output and the final human-edited target text, the edit distance between them measures how dissimilar the two texts are, expressed as the number of changes needed to turn one into the other. However, edit distance is not a perfect measure of the quality of the MT engine’s output. It only measures the keystrokes performed by the human linguist during editing, not the time the linguist takes to figure out what needs editing and how.
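As a rough illustration, the sketch below computes a character-level Levenshtein distance between an MT output and its post-edited version and expresses it as a percentage. This is one common way of calculating edit distance, not necessarily the exact formula behind the figure quoted above. Normalising by the length of the longer text, and the function names used here, are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def edit_distance_pct(mt_output: str, post_edited: str) -> float:
    """Edit distance as a percentage of the longer text (assumed normalisation)."""
    longest = max(len(mt_output), len(post_edited))
    if longest == 0:
        return 0.0
    return 100.0 * levenshtein(mt_output, post_edited) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))   # 3
    print(edit_distance_pct("abcd", "abcf"))  # 25.0
```

A lower percentage means the linguist changed less of the raw output. Word-level variants of the same calculation also exist and will give different numbers for the same pair of texts.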

To give a practical example, we have written a source text inspired by the text types that we often translate. Let’s see how our engines deal with it.

A typical marketing text may go something like this:

Comfort, style and power! Drive the new XXXX from as little as £XX,XXX and start saving on fuel from day one. This new hybrid SUV will give you the space you need for your family, the power you need to take you where you want to go and the comfort to enjoy the ride.

Our engines gave us the following results for these Nordic languages:

If you are familiar with any of these Nordic languages, you will see that the translations are far from perfect. However, they do provide a human translator with a rough translation that they can tweak into a fully polished text, with less effort and time spent than translating it from scratch.

Will AI take over translation?

Will AI take over completely? It doesn’t seem like we are getting anywhere near that point just yet. Most language service providers in the world are already equipped with machine translation engines that provide rough translations that professional linguists then post-edit. While AI output can also be edited by humans in the same way as MT, both kinds of technologies can fall dangerously short when dealing with culturally sensitive content like marketing campaigns.

Still, investing in new AI technologies and further optimising translation workflows will become more frequent as soon as AI generates consistently better outputs than MT across languages and domains. For the moment, research suggests that AI models achieve competitive translation quality only for high-resource languages (where there are vast amounts of data), while having limited capabilities for low-resource languages.

Degeneration danger

Another reason why artificial intelligence is not ready to take over is a phenomenon known as model collapse. This is when AI trains on AI-generated content and degenerates as a result, a little like the increasingly bad quality you would get if you took a photocopy of a photocopy of a photocopy, etc.

The first LLMs were all trained on existing written material, almost all of it produced by humans. But as the amount of AI-produced text grows, it becomes part of the data used to train AI engines, reinforcing errors and leading to a deterioration in output quality.

One of the authors of a study into this, Ilia Shumailov, said in an interview with VentureBeat: “We were surprised to observe how quickly model collapse happens: Models can rapidly forget most of the original data from which they initially learned.”
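The photocopy effect can be illustrated with a toy simulation. The sketch below is not the method used in the study. It is a deliberately simple word-frequency "model" that is retrained, generation after generation, only on text sampled from the previous generation. The vocabulary size, sample size and function names are assumptions chosen for this example.

```python
import random
from collections import Counter


def next_generation(probs, sample_size, rng):
    """Sample text from the current model, then re-estimate word
    probabilities from that sample alone."""
    words = list(probs)
    weights = [probs[w] for w in words]
    sample = rng.choices(words, weights=weights, k=sample_size)
    counts = Counter(sample)
    # Words that were never sampled get no probability in the new model.
    return {word: count / sample_size for word, count in counts.items()}


def simulate(generations=10, vocab_size=50, sample_size=200, seed=0):
    """Return the number of distinct words the model knows per generation."""
    rng = random.Random(seed)
    # Generation 0: a Zipf-like distribution standing in for human-written text.
    raw = [1 / rank for rank in range(1, vocab_size + 1)]
    total = sum(raw)
    probs = {f"w{rank}": value / total for rank, value in enumerate(raw, start=1)}

    sizes = [len(probs)]
    for _ in range(generations):
        probs = next_generation(probs, sample_size, rng)
        sizes.append(len(probs))
    return sizes


if __name__ == "__main__":
    print(simulate())
```

Once a rare word fails to appear in a generation's sample, its probability drops to zero and it can never return, so the vocabulary can only stay the same or shrink. Real LLMs are vastly more complex, but the same mechanism of rare content being forgotten first is what the photocopy analogy describes.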

This is not all bad – it means that human content creators will still be needed in the future, even if it’s just to produce high-quality content to train AI.

There is no doubt that LLMs will play a significant role in the future of translation. Exactly how we will balance the need for data protection, copyright and confidentiality with the efficiency that LLMs offer remains to be seen. But for now, there is still a clear need for the human creator, editor or translator to play their role in the process.

The post AI and translation: the new reality is here – almost!�� appeared first on sa��ʴ�ý.
