Internationalization: concepts and implementations

Submitted by vkozlovs on

Introduction

We live in an era where a great deal can be achieved on the web with just a few clicks and some quick typing. This simple and fast exchange of information has made the web a standard form of communication. It removes physical boundaries and allows people of different nationalities and mother tongues to interact, so it has become very important that all parties understand the information flowing through this channel. To provide the same experience for speakers of every language, it has become essential to design websites with internationalization (hereafter "i18n") in mind. The goal of this post is to give a better understanding of i18n and of the concepts behind the different framework implementations.

1. Life cycle of translation

Once you introduce i18n to your project, it becomes a living and changing part of your application. The reasons are simple. New features usually come with new texts. Sometimes the translation is not right the first time, and a translated text can even trigger a change to the original text because it describes the content better. You can therefore plan translation as a regularly recurring, simple task shared between developers and translators, run as an agile collaboration. For end users the process is invisible: they only see the final results.

Designing an application with i18n is not rocket science, but it does require some architectural planning. The key point to keep in mind is that translation should not hold back development. In other words, developers should not have to wait for a translation to be ready before building new features. Introducing placeholders and linking them to resource files gives you this flexibility, because translators can work on the texts in parallel while developers focus on new features. In practice the translator receives the translation files in packs. When a pack is ready, it is handed back to the developer, who applies the changes.
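The placeholder idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `TRANSLATIONS`, `DEFAULT_LANGUAGE` and `translate` are hypothetical, and the in-memory dictionary stands in for real resource files. A key that has no translation yet falls back to the default language, so a new feature can ship before its translation pack comes back.

```python
# Hypothetical in-memory stand-in for per-language resource files.
TRANSLATIONS = {
    "en": {"GREETING": "Welcome", "NEW_FEATURE_TITLE": "Export report"},
    "fr": {"GREETING": "Bienvenue"},  # NEW_FEATURE_TITLE is not translated yet
}
DEFAULT_LANGUAGE = "en"  # assumption: English is the source language


def translate(key, language):
    """Resolve a placeholder key, falling back to the default language."""
    for lang in (language, DEFAULT_LANGUAGE):
        text = TRANSLATIONS.get(lang, {}).get(key)
        if text is not None:
            return text
    # Last resort: show the key itself so the gap is visible during testing.
    return key
```

With this fallback, `translate("NEW_FEATURE_TITLE", "fr")` shows the English text until the French pack arrives, at which point only the resource content changes and not the code.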

Note that a translation can break custom styling, which the developer then has to fix. Languages do not translate one to one: some things can be expressed more briefly in one language and need more words in another. It can therefore take several improvement cycles before the translation is right.

Translation content group types 

When we talk about translation, we can divide the content into three major groups:

  • Static, non-changing content that helps the user interact with the website, such as labels, descriptive texts about site usage, button descriptions, etc.
  • Dynamically changing selection content, such as drop-down menus, radio button lists, checkbox lists, etc., where it is important to be able to add, remove, or modify elements quickly.
  • User-triggered content whose main goal is to give end users feedback (error messages and validations). Based on this feedback, users can easily understand and take the steps needed to finish the process.
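User-triggered content often needs values that are only known at runtime, such as a field name or a limit. A minimal sketch of that idea, assuming a hypothetical `MESSAGES` catalogue with named placeholders and a hypothetical helper `validation_message` (none of these names come from a specific framework):

```python
# Hypothetical message catalogue; keys and texts are invented for illustration.
MESSAGES = {
    "en": {"ERROR_MIN_LENGTH": "{field} must be at least {minimum} characters long."},
    "de": {"ERROR_MIN_LENGTH": "{field} muss mindestens {minimum} Zeichen lang sein."},
}


def validation_message(key, language, **params):
    """Look up the message template and fill in the runtime values."""
    template = MESSAGES[language][key]
    return template.format(**params)
```

Named placeholders leave the translator free to reorder the values, which matters because word order differs between languages.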

2. Categorization of the i18n possibilities

Different technologies have their own implementations for solving the same problem.

Based on the source of truth for the translations, we can group the implementations as follows:

  • Resource file based. Here the translation content lives in a translation file, where we work with pairs of reference IDs and translated texts. This means one file for each target language we want to support. Some file-based approaches allow comments, which help the translator better understand the context of the text to be translated. For example:
    • In the C# world, Visual Studio has a built-in graphical, column-based editor where we can make modifications manually and add comments. When we save, the content is stored as an XML file. From this editor we can easily export and import content.
    • In the Java world we have “.properties” files, where each line represents one translation entry. Something like this:
      • INTRODUCTION_TEXT=This is the value which needs to be displayed
    • With the Angular front-end framework you add attributes to the HTML tags. This approach also allows comments on translations, which are very practical because they give translators a more detailed context for their work. With this approach you use the Angular CLI to export and generate the resource files. The resource files can be in different XML formats such as xlf, xlf2 and xmb.
    • In the library-based resource file approach you have to follow the format supported by the selected software library (such as JSON, etc.), and to use it you have to import the library and the resource files manually into your project. Frameworks like React and Vue.js use this type of approach.
  • Service provided. Here the translation is provided by a service such as a database (query, stored procedure, stored function, etc.), an API, or similar. Like a black box, you pour your text in and get results out. Implementing this on the database layer can be complex and inefficient if you use it for your whole project. This is also the approach used by portals and Content Management Systems (CMS) like Liferay, Alfresco, Drupal, etc.
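To make the "reference ID and translated text pair" idea concrete, here is a rough Python sketch that reads lines in the style of the “.properties” example above. It is deliberately simplified and is an assumption, not a full implementation of the Java format: it only handles `KEY=value` lines and comment lines starting with `#` or `!`, and ignores escapes and line continuations. The function name `parse_properties` and the `SAMPLE` content are hypothetical.

```python
def parse_properties(text):
    """Parse simple KEY=value lines into a dictionary (simplified sketch)."""
    entries = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and translator comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator:  # ignore lines without a key/value separator
            entries[key.strip()] = value.strip()
    return entries


# Hypothetical file content; the comment gives the translator some context.
SAMPLE = """\
# Shown at the top of the landing page
INTRODUCTION_TEXT=This is the value which needs to be displayed
"""

translations = parse_properties(SAMPLE)
```

One such dictionary would be loaded per target language, matching the one-file-per-language layout described above.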

We can also differentiate the implementations according to the stage of the application at which the translation happens. Here we distinguish two main categories:

  • Translation at build time. For Angular, the common way is to generate one application per translation. This requires more space (instead of one built application you will have several), but it means less computation at runtime.
  • Translation at runtime. The majority of resource-file based implementations translate at runtime. This takes less space, but parsing the resource files adds latency.
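The trade-off between the two stages can be illustrated with a small Python sketch. It is not how Angular or any other framework actually builds its output; `PAGE_TEMPLATE`, `CATALOGUES`, `build_all_languages` and `render_at_runtime` are hypothetical names chosen for the example.

```python
from string import Template

# Hypothetical page template with two translatable placeholders.
PAGE_TEMPLATE = Template("<h1>$TITLE</h1><p>$INTRO</p>")

# Hypothetical per-language catalogues.
CATALOGUES = {
    "en": {"TITLE": "Welcome", "INTRO": "Please sign in."},
    "fr": {"TITLE": "Bienvenue", "INTRO": "Veuillez vous connecter."},
}


def build_all_languages():
    """Build time: render one finished page per language, once."""
    return {lang: PAGE_TEMPLATE.substitute(texts) for lang, texts in CATALOGUES.items()}


def render_at_runtime(language):
    """Runtime: resolve the texts again on every request."""
    return PAGE_TEMPLATE.substitute(CATALOGUES[language])
```

The build-time variant stores one output per language (more space, no lookup per request), while the runtime variant keeps a single template and pays the substitution cost on each call.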

 

Depending on the developers’ knowledge and on architectural preferences, we can have the following designs:

  • Separated layer design. With this approach you split your translations between the different layers (front-end application, back-end application, database). At first glance this seems like a complex solution, but it also brings great flexibility, and it works very well if you have a good architectural design. The benefit is that you only have to modify one component in the system while the others remain “unchanged”.
  • Combined design. With this approach you simplify your architecture by combining the different layers. The combination demands more from the developers, because they need knowledge of each layer. This is a more centrally maintained solution. With this design you cannot avoid rebuilding the whole combined element each time a translation changes.

3. Defining the language preferences

The common way to define the language preference is to ask the user through the interface. Once this information has been provided, it is up to the application design how it is remembered. It can be stored in the user’s profile, or it can be temporary, in which case the user has to select it at the beginning of every interaction.

In both cases the user-selected language preference should be maintained during the whole interaction. This can be accomplished in the following ways:

  • The user’s client (most often a browser) keeps track of the selected language and sends it to the server with every request. Implementations differ, but most of the time this is done via cookies or by setting the “Accept-Language” header (or a custom header) directly on each request. For stateless applications this is the preferred way to go.
  • The server keeps track of the preference and stores it in the user’s session. In this case the client only needs to store the session ID and send it to the server with every request. For session-based applications this is the way to go.
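For the stateless variant, the server has to work out the language from each incoming request. The following Python sketch shows one possible order of precedence, under stated assumptions: the cookie name `lang`, the supported language list, and the function names are hypothetical, requests are represented as plain dictionaries, and the header lookup is case-sensitive for simplicity. An explicit user choice in the cookie wins over the “Accept-Language” header.

```python
SUPPORTED_LANGUAGES = ("en", "fr", "de")  # assumption for this example
DEFAULT_LANGUAGE = "en"


def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without a q-value has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        weighted.append((quality, tag))
    # sorted() is stable, so tags with equal weight keep their header order.
    ranked = sorted(weighted, key=lambda item: -item[0])
    return [tag for quality, tag in ranked if quality > 0]


def resolve_language(cookies, headers):
    """Pick the language for one request: cookie first, then the header."""
    candidates = []
    if "lang" in cookies:  # hypothetical cookie set when the user chose a language
        candidates.append(cookies["lang"].lower())
    candidates.extend(parse_accept_language(headers.get("Accept-Language", "")))
    for tag in candidates:
        primary = tag.split("-")[0]  # "fr-ch" is served with the "fr" texts
        if primary in SUPPORTED_LANGUAGES:
            return primary
    return DEFAULT_LANGUAGE
```

In the session-based variant the same resolution would run once, and the result would be stored in the session instead of being recomputed for every request.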

4. How and when to introduce internationalization

The technologies you select for your application determine your options. What is not predetermined is how you distribute the translations between the different layers. The best way to decide is to look at the main purpose of the application and analyse its content based on the translation content group types. This categorization shows how and where you should put each kind of content, so you end up with a good architectural design that is flexible for development and easy to maintain.

The question of when to introduce internationalization is tricky. Our experience is that the earlier, the better. Early introduction is important for the following reasons:

  • It adds another code design requirement: using references instead of raw text. It is better to use references from the beginning than to go through the whole finished application and change every hardcoded text to a reference.
  • Translation is not one to one, so you sometimes have to fix styling once you receive the translation. This is bearable for a small portion, but doing it for the whole application can be challenging.
  • When you introduce i18n to your projects for the first time, you need to give your translators time to adapt and get comfortable with it.

Disclaimer

The views expressed in this blog are those of the authors and cannot be regarded as representing CERN’s official position.
