Randa Meetings 2016 Part III: Translation System Troubles

[[Sic, 2016. This post about the results of studying the situation with the translation systems in software by KDE during the Randa event in 2016 had been a draft all the time, due to being a complicated matter and partly only lamenting about broken things instead of reporting world improvements. Still, the lamenting might provide information unknown to some, so time to simply dump the still valid bits (quite some bits happily have been fixed since and are omitted, same for those notes I could no longer make sense of), even without a proper story thread. So if you ever wondered about the magic happening behind the scenes to get translations to appear in software based on Qt & KF, sit down and read on.]]

IMHO new technology should adapt by default to our cultures, not our cultures to technology, especially if the technology is enforced on us by law or other pressures. Technology should instead help to enhance cultures, extending options at best, but never limiting or disabling them.

One motivation to create technology like Free/Libre Software is the idea that every human should be able to take part in the (world) society. A challenge here is to enrich life for people, not to just make it more complex. Nor should it force them to give up parts of their culture for access to new technology. Code is becoming the new law maker: if your existing cultural artifacts are not mapped in the ontology of the computer systems, they do not exist by their definition. One has to be happy already if there is at least some “Other” option to file away with.

Sure, e.g. for producers targeting more than their local home culture it would be so nice-because-simple if everyone used the same language and other cultural artifacts (think measurement units). But who is to decide what should be the norm, and how should they know what is best?

When it comes to software, the so-called “Internationalization” technologies are here to help us as human beings: they add variability to the user interface to allow adapting it to the respective user’s culture (so-called “Localization”).
Just, for some little irony, there is also more than one “culture” among Internationalization technologies. And getting those into synchronized cooperation is another challenge, sadly one which is currently not completely mastered when it comes to the software by the KDE community.

Multiple translation systems in one application

Gettext

On Linux (and surely other *nixoid systems), traditionally gettext is used, whose translation lookup code is either part of glibc or in a separate LibIntl. For a running executable the localization to choose is controlled by the environment variables “LANGUAGE” (GNU gettext specific), “LC_*” and “LANG” (see the GNU gettext utilities documentation). Strings to be translated are grouped in so-called domains. There is one file per domain and per language, a so-called “catalog”. A catalog comes in two variants: the “Portable Object” (PO) format intended for direct editing by humans and the “Machine Object” (MO) format intended for processing by software. For each domain, optionally a separate directory can be set below which all the language catalogs belonging to that domain can be found.
On a call to the gettext API like dgettext("domain", msgid) when “LANG” is set to the id “locale” (and the other variables are not set), the translation will be taken from the file in the sub-path locale/LC_MESSAGES/domain.mo (or some less specific variant of the id “locale”, until there is such a file) in the directory set for the domain.
So a library using gettext for translations has to install the catalog files in a certain directory at deploy time and, unless using the default, register that directory for its domain at execution start (using bindtextdomain(...)). An executable linking to such a library has nothing else to do to assist the library with locating and using the catalogs at run-time. The same holds for the setup of translations in the code of the program itself with gettext.
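
In code the pattern looks roughly like this (a minimal sketch; the domain “mylib” and the install directory are just placeholder examples, not an actual KDE domain):

    #include <libintl.h>
    #include <clocale>
    #include <cstdio>

    int main()
    {
        // Pick up the locale from LANGUAGE / LC_* / LANG.
        std::setlocale(LC_ALL, "");

        // Register the directory below which the catalogs of the "mylib"
        // domain were installed; with LANG=de_DE the lookup then goes to
        // /usr/share/locale/de_DE/LC_MESSAGES/mylib.mo (or a less specific
        // variant of the locale id).
        bindtextdomain("mylib", "/usr/share/locale");

        // Falls back to the passed-in msgid if no translation is found.
        std::printf("%s\n", dgettext("mylib", "Open file"));
        return 0;
    }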

Qt’s QTranslator

Qt uses another approach: one registers a set of handlers of type QTranslator which are queried one after the other whether they can resolve the string to be translated. This is done by the central equivalent to the gettext API, QCoreApplication::translate(const char *context, const char *sourceText, const char *disambiguation, int n), which, if no handler could resolve the string, simply returns the same string as passed in. That method is invoked indirectly from the tr(...) calls, which are class methods added to QObject subclasses via the Q_OBJECT macro, using the class name as the context.
With the Qt way, usually the caller of the translation invocation has to know which locale should be used and make sure the proper catalog handlers are registered before doing that invocation. The Qt libraries themselves do not do that registration; it is the duty of the application linking to the Qt libraries to do so.
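
A minimal sketch of that duty on the application side (the file name, directory and context are made-up examples):

    #include <QCoreApplication>
    #include <QDebug>
    #include <QLocale>
    #include <QTranslator>

    int main(int argc, char *argv[])
    {
        QCoreApplication app(argc, argv);

        QTranslator translator;
        // Tries e.g. myapp_de_DE.qm, myapp_de.qm, ... in the given directory.
        if (translator.load(QLocale::system(), QStringLiteral("myapp"), QStringLiteral("_"),
                            QStringLiteral("/usr/share/myapp/translations"))) {
            QCoreApplication::installTranslator(&translator);
        }

        // What a tr() call generated via Q_OBJECT boils down to; "MainWindow"
        // is just an illustrative class name used as the context.
        qDebug() << QCoreApplication::translate("MainWindow", "Open file");
        return 0;
    }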

The Qt translation approach is not only used by all the Qt modules, but also by many tier 1 modules of the KDE Frameworks, because the KDE Frameworks module KI18n, which provides a convenience & utility wrapper around gettext, is not available to them, being a tier 1 module itself.

Automagic setup of QTranslator-based translations

The classical application from the KDE spheres is traditionally developed with gettext and KI18n in mind, and thus is not used to caring about that registration of Qt translation handlers. To allow them to stay that innocent, all libraries done by KDE using the Qt translation approach trigger the creation and registration of the handler with their catalog themselves during loading of the library, picking a catalog matching the current QLocale::system(). They use the hook Q_COREAPP_STARTUP_FUNCTION, which evaluates to the definition of a global static instance of a custom structure; its constructor, invoked after library load as global static instances are initialized then, registers the given function as a startup function for the QCoreApplication (or subclass) instance or, if such an instance already exists, calls the function directly. To spare the libraries’ authors writing the respective automatic loading code, KDE’s Extra CMake Modules provides the module ECMPoQmTools to have that code generated and added to the library build, by the CMake macro ecm_create_qm_loader(...).
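
What such a generated loader amounts to can be sketched roughly like this (a simplification, not the actual ECMPoQmTools output; the catalog name “mylib_qt” and the lookup via QStandardPaths are assumptions for illustration):

    #include <QCoreApplication>
    #include <QLocale>
    #include <QStandardPaths>
    #include <QTranslator>

    static void loadMyLibTranslations()
    {
        // Parented to the application object so it lives as long as needed.
        auto *translator = new QTranslator(QCoreApplication::instance());

        const QString path = QStandardPaths::locate(QStandardPaths::GenericDataLocation,
            QStringLiteral("locale/") + QLocale::system().name()
            + QStringLiteral("/LC_MESSAGES/mylib_qt.qm"));
        if (!path.isEmpty() && translator->load(path)) {
            QCoreApplication::installTranslator(translator);
        } else {
            delete translator;
        }
    }

    // Runs the function right after QCoreApplication construction, or
    // immediately if the application object already exists.
    Q_COREAPP_STARTUP_FUNCTION(loadMyLibTranslations)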

One issue: currently the documentation of ECMPoQmTools fails to mention that generation of the handler is ensured to be done only in the main thread. In case the library is loaded in another thread, the generation code is triggered (and thus delayed) in the main thread via a timer event. This can result in a race condition if other code run after loading the library in the other thread already relies on the translation handler being present.
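
The marshalling pattern behind that can be sketched like this, only to illustrate the race; this is not the actual ECMPoQmTools code:

    #include <QCoreApplication>
    #include <QThread>
    #include <QTimer>

    static void createAndInstallTranslator()
    {
        // ... create the QTranslator and install it, as in the sketch above ...
    }

    static void loadTranslationsThreadSafe()
    {
        QCoreApplication *app = QCoreApplication::instance();
        if (QThread::currentThread() == app->thread()) {
            // Already in the main thread: the handler is available immediately.
            createAndInstallTranslator();
        } else {
            // Defer to the main thread via a timer event: code running in the
            // loading thread right after this cannot rely on the handler yet.
            QTimer::singleShot(0, app, createAndInstallTranslator);
        }
    }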

KI18n: doing automagic setup even for the Qt libraries’ translations

The thoughtful reader may now wonder, given that the KDE Frameworks modules using the Qt translation system do so with the help of automatic loading of catalogs, whether something similar holds for the Qt libraries themselves when it comes to programs from KDE. The answer is: it depends 🙂
If the program links to the KDE Frameworks module KI18n, directly or indirectly, and thus loads it, that library has code using Q_COREAPP_STARTUP_FUNCTION as well, to automatically trigger the creation and deployment of the handler of the translations for the Qt libraries (see src/main.cpp). For which Qt libraries that is, see below. Otherwise, as explained before, the program has to do it explicitly.
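
The idea of that hook can be sketched like this (simplified; the real src/main.cpp differs in the details and the covered catalogs):

    #include <QCoreApplication>
    #include <QLibraryInfo>
    #include <QLocale>
    #include <QTranslator>

    static void loadQtTranslations()
    {
        auto *translator = new QTranslator(QCoreApplication::instance());
        // QLibraryInfo::TranslationsPath is e.g. /usr/share/qt5/translations.
        if (translator->load(QLocale::system(), QStringLiteral("qt"), QStringLiteral("_"),
                             QLibraryInfo::location(QLibraryInfo::TranslationsPath))) {
            QCoreApplication::installTranslator(translator);
        } else {
            delete translator;
        }
    }

    Q_COREAPP_STARTUP_FUNCTION(loadQtTranslations)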

So this is why the developers of the typical application done in the KDE community do not have to write explicit code to initiate any loading of translation catalogs.

Does it blend?

Just, the above also means that there are still two separate translation systems with different principles and rules in the same application process (if not more, from 3rd-party libraries, which though usually use gettext). And that brings a set of issues, like a potentially inconsistently localized UI, due to different libraries having different sets of localizations available or following different environment variables or internal flags to decide which localization to use (also for things like number formatting). Add to that different teams with different guidelines doing the translations for the different libraries and programs from different organizations.

KI18n: too helpful with the Qt libraries’ translations sometimes

The automatic generation and deployment of the handler for the translations of the Qt libraries when the KI18n library is linked to and thus loaded (as described above) is not really expected in Qt-only, non-KF-using programs. Yet, when such programs load plugins linking directly or indirectly to KI18n, these programs will be confronted with getting the KI18n-generated handler deployed on top (and thus overriding any previously installed handler from the program itself). At best this means only duplicated handlers, but it can also mean changing the locale, as the KI18n code picks the catalog locale to use from what QLocale::system() is at the time of being run.
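
The “on top” part follows from Qt’s lookup order: QCoreApplication::translate() consults the most recently installed translator first. A toy illustration of that ordering, with two stand-in subclasses instead of real catalogs:

    #include <QCoreApplication>
    #include <QDebug>
    #include <QTranslator>

    // Returns a fixed string for every lookup, standing in for a real catalog.
    class FixedTranslator : public QTranslator
    {
    public:
        explicit FixedTranslator(const QString &result) : m_result(result) {}
        QString translate(const char *, const char *, const char *, int) const override
        { return m_result; }
    private:
        QString m_result;
    };

    int main(int argc, char *argv[])
    {
        QCoreApplication app(argc, argv);

        FixedTranslator programsOwn(QStringLiteral("from the program's own handler"));
        FixedTranslator fromLoadedLibrary(QStringLiteral("from the later-installed handler"));

        QCoreApplication::installTranslator(&programsOwn);
        QCoreApplication::installTranslator(&fromLoadedLibrary); // e.g. done by a loaded KI18n

        // Prints "from the later-installed handler": the most recently
        // installed translator is queried first and shadows the earlier one.
        qDebug() << QCoreApplication::translate("AnyContext", "Open file");
        return 0;
    }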

And such a plugin can simply be the Qt platform integration plugin, which in the case of the Plasma platform integration has KI18n in the set of linked and thus loaded libraries. This issue is currently reported indirectly via Bug 215837 – Nativ KDE QFileDialog changes translation.

When it comes to the Qt platform integration plugin, the use of Q_COREAPP_STARTUP_FUNCTION when invoked via such a plugin also exposes an issue in the design of the Qt startup phase, resulting in the registered function being called twice, in this case resulting in duplicated creation of translation handlers (reported as QTBUG-54479).

Qt5: no longer one single catalog for all Qt modules

It seems in Qt4 times there was one single catalog per language for all that made up the Qt framework. In Qt5 this is no longer true though: for each language a separate catalog file is now used per Qt module. There is some backward compatibility though which has hidden this from most eyes so far, so-called meta catalogs (see the Linguist docs). The meta catalog qt_ does not have translations itself, but links to the catalogs qtbase_, qtscript_, qtquick1_, qtmultimedia_ and qtxmlpatterns_ (see for yourself and open /usr/share/qt5/translations/qt_ll.qm, with ll being your language code, e.g. de, in your favorite hex editor).

So applications which use further Qt modules, directly or indirectly, need to make sure themselves that the respective catalogs get loaded and used. This gets complicated for modules used indirectly (via plugins, or indirectly linked as implementation detail of another non-Qt lib), as there seems to be no way to know which catalogs are loaded already.
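
What “make sure themselves” means in practice can be sketched like this (the module list is only an example, and which catalogs actually exist depends on the Qt installation):

    #include <QCoreApplication>
    #include <QLibraryInfo>
    #include <QLocale>
    #include <QTranslator>

    // Load one catalog per (directly or indirectly) used Qt module for the
    // system locale; "qt" is the meta catalog described above, the rest is
    // an illustrative selection.
    static void loadQtModuleCatalogs(QCoreApplication &app)
    {
        const QString dir = QLibraryInfo::location(QLibraryInfo::TranslationsPath);
        const auto catalogs = { QStringLiteral("qt"), QStringLiteral("qtdeclarative") };
        for (const QString &catalog : catalogs) {
            auto *translator = new QTranslator(&app);
            if (translator->load(QLocale::system(), catalog, QStringLiteral("_"), dir))
                QCoreApplication::installTranslator(translator);
            else
                delete translator;
        }
    }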

This might be important for programs from KDE using QtQuick Controls, given there exist qtquickcontrols2_*.qm files with some strings. It is yet to be investigated whether those catalogs are loaded via some QML mechanism perhaps, or whether some handling is needed.

Juggling with catalogs at release time

The catalogs with the string translations for KDE software are maintained and developed by the translators in a database separate from the actual sources, a Subversion repository, partially for historic reasons.
When doing a release of KDE software, the scripts used to generate the source tarballs then do both a checkout of the sources to package and an iteration over the translation database to download and add the matching catalogs.
KDE Frameworks extends this scheme by adding the snapshot of the translations in a commit to the source repository, using a tagged git commit off the main branch.

Issues seen:

  • which catalogs to fetch exactly is determined by a fragile system and not precisely defined
  • which version of the database the fetched catalogs are from is not noted, so tarball reproducibility from the VCS is not easily possible (solved for KF)
  • the script accidentally added catalog files used internally by KDE’s translation system for in-source translations (fixed meanwhile)
  • the script accidentally added a by-product of the translation statistics (fixed meanwhile)