A few days ago I asked for help how to design the parser used in Marble to try to turn a text string into some geographical coordinates in such a way that it also can parse localized writings of such coordinates.
For those curious what came out of this, read on:
While I then forgot to mention the use-cases I was thinking of, like copying&pasting some pretty-printed string from some text, or entering the coordinates somebody tells you on the phone, still the broad feedback I got, both by those commenting on the blog post as well as those replying on the mailinglist of KDE’s translators (thanks to all of you), helped to accumulate quite some testing data, on which based the parser could be designed.
Well, it was done in a short rush, just in time for the translation soft freeze for the upcoming release of KDE Apps 4.8, but non-the-less can parse all of the testing data 🙂 Which means basically the parser can parse coordinates in these patterns, for all languages:
- with latitude or longitude given as first
- with directions given as prefix or postfix
- with values given just as pure degree, as degree and minutes, or as degree, minutes and seconds
- the last subvalue can optionally be in floating point precision
- the first subvalue can be negative or positive, optionally with the + prefix
- latitude and longitude can be separated by the “,” or the “;” character or simply white space
- symbols for degree, minutes and seconds can be given or left out, if directions are given
The parser first tries to parse using the localized direction terms, than falls back to try the English ones. For the degree, minutes and seconds symbols it is quite tolerant, tries to understand all kind of character (combinations) people might try to express the symbols, using what their keyboard layout offers. Basic rule: if a human would get what is meant, so should the parser (e.g. two consecutive LEFT SINGLE QUOTATION MARKs
‘‘ would be accepted as seconds symbol). Additionally accepted would the localized variant of the symbol, like obviously typically found in non-latin based languages.
Additionally, for a very short notation, two pure floating point values, separated as described above, are parsed as first latitude degree and second longitude degree.
So also the Shorthand DMS notation, as hinted to by Kjetil Kilhavn, should be parsed without a problem. (Support for UTM is only worked on in Marble, no promises on when this might emerge)
For parsing of localized strings to work, the translators are asked to provide in the translation catalogs the information about additional local symbols for the degree, minutes and seconds, as well as how the directions are written. Per direction and symbol there can be multiple variants noted, so languages with variants of the direction terms are also covered. There is a short guideline for the translators, also linked from the strings in the code.
But things are not perfect yet:
only today I learned that Thai has a pattern which is not matching any of those described above, in that it also uses prefixes “longitude” and resp. “langitude” (in Thai) before those parts. Which is also the same pattern as with a sample I got for Icelandic, but I somehow had ignored that, only now found that entry marked as TODO.
So there are things left to do for the Marble version after the upcoming one. But then we all know, software is never finished 🙂
Perhaps the geographic coordinates localisation should also be added/moved to the general regional system settings, like Date & Time are, as coordinates become more and more used, due to mobile devices. So all programs dealing with coordinates would benefit and also be consistent (think KStars or Digikam).