Selecting text in the Calligra-powered Okular plugin for ODT, DOC, DOCX & WPD

Traveling 8+ hours on the train all the way through Germany to Geneva in Switzerland, home of CERN (whose area actually spreads across the border into France) and the place of the WikiToLearn-Plasma-VDG-TechbaseOverhaul meeting, was also a good chance to spend some time again on a sprint-unrelated, but still KDE-related item: adding support for text selection to the Calligra-powered Okular plugin for text documents.

The first prototyping results are satisfying: the selection highlight is rendered at the correct positions and the correct characters are copied.
[Screenshot: Calligra Okular plugin first text selection support]

Polishing this up is left for the trip back home on Sunday. It looks promising so far: hopefully you will soon be able to select text in the DOCX, DOC, WPD and ODT files you view in Okular using the plugins from Calligra (once both Okular based on KF5/Qt5 and Calligra 3.0 are released, dates TBD).

One thing learned on the train: forcing your SchuKo-style power plug into the Swiss train’s on-board Swiss-style power socket might work, yes (SchuKo connectors have a slightly bigger diameter).
But: it takes some time to get it out again without destroying the socket, so it is best to start some time before the train reaches the station where you have to change, not just a few seconds before, which runs the risk of not getting off in time. Or of having to leave the power supply behind, which is better avoided as well 🙂


8 thoughts on “Selecting text in the Calligra-powered Okular plugin for ODT, DOC, DOCX & WPD”

  1. Awesome, thanks!

    Is there any chance of a “better” text-selection for PDF documents as well? Currently, if you’re selecting text over multiple lines, you’ll actually get a line-break on every new line, even though the text probably only goes to a new line because there is no more space on the previous line – not because the author of the PDF actually wanted to have a line break at that exact position.

    • I doubt that you will ever see that, as text in PDF files is always stored one line at a time. There is no automatic line break; think of a PDF as a photograph of the original document. So without second-guessing the author’s intent (and getting it wrong the other way around from time to time) there is no way it can be done.

      • I’m aware that there’s a difference between ODT and PDF files. However, since it (at least sometimes) works as expected in Adobe Acrobat Reader, I’d have hoped that there actually is some way around this, but I’m no expert in PDFs.
        I’ve not tested the text selection of AAR in depth and cannot comment on how often it “guesses wrong”; so far I can’t remember a case where it went horribly wrong. Of course this is for “simple” documents only, the issue is probably way more complicated if tables, images etc. are in the document.

        I’ve just searched around for a bit and saw that many people have the same question. The “best” answers to the problems usually are:
        a) Use AAR
        b) Use find and replace
        c) Write a macro that removes every line break whose preceding character is not a full stop/period.

        So it seems I’ll stick with option b) for the time being.
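
        For reference, option c) could be sketched roughly as below in Python. This is only an illustration under assumptions: the function name `unwrap_lines` is made up, it works on text already copied out of the viewer, and the "line ends with a full stop" rule is a plain heuristic that will guess wrong for some documents.

```python
def unwrap_lines(text: str, enders: tuple = (".",)) -> str:
    """Join lines that look soft-wrapped.

    Hypothetical helper, not part of Okular or any PDF library.
    A line break is kept only after a line whose last character is
    one of `enders`; blank lines are treated as paragraph boundaries.
    """
    paragraphs = []
    current = ""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # Blank line: close whatever has been collected so far.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        # Glue this line onto the pending text with a single space.
        current = f"{current} {stripped}" if current else stripped
        if stripped.endswith(enders):
            # Line ends with a full stop: assume the break is intended.
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n".join(paragraphs)


copied = "This is a line\nthat wraps.\nNew sentence here\ncontinues."
print(unwrap_lines(copied))
```

        Note that a break is also kept when a sentence merely happens to end at the right margin in the middle of a paragraph, so the result still needs a manual look.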

  2. “once both Okular based on KF5/Qt5 and Calligra 3.0 are released”

    And do you happen to know any approximate date for that, especially for Calligra’s release? I know its developers do their best and that they are very few people, but how long ago did we hear all those whistles about the Calligra 3.0 Alpha, and how close we were to finally having a native KF5 office suite? Half a year? If I recall correctly they said that 2.9.9 would be the last KDE4 version, and we are at 2.9.11.
    Since I got rid of all KDE4 and Qt4 stuff some weeks ago and enjoy my rapid and agile Plasma 5 setup, I’m not going to “pollute” it with all those KDE4 dependencies, so I don’t have Calligra anymore and “survive” using Google Docs. But I’d like to know if somebody could tell us when they estimate that Calligra 3 will be ready, just an approximation: a couple of months, in the 3rd quarter of this year, the 4th, not till 2017? I don’t want to hurry them, as I say I understand they work hard and can’t do much more, but some news about Calligra’s development and what we users can expect would be welcome.


    • I think Calligra has been abandoned. They split off Krita and Kexi, which are the ones that are still maintained and healthily developed. If you need a solid and actively developed office suite I’d recommend LibreOffice. Calligra, as you say, has been announcing its version 3 since last summer or so; we are now approaching May without news, and if you have a look at their page it’s like a desert, they don’t even answer the users’ comments like they used to do. So it looks like it has become abandonware, like so many nice projects which don’t get enough support.
      Maybe it’s a good thing: in Linux we have a lot of redundant projects that scatter manpower for nothing. If it resulted in really working and complete projects, it would be great, more choice for the users. But no, the fact is that the great majority of those redundant projects that do the same things are all incomplete, buggy and of poor quality. I think it’s good for the community that those projects die and hopefully their developers join bigger and better projects; in this case I’d like to read that the few Calligra developers have joined LibreOffice to make it even better.

      So, don’t expect the announced Calligra 3.0 soon, if ever. If you need serious document editing, LibreOffice is the way to go; it is the de facto standard in open source office suites and never crashes, something one can’t say about Calligra, LOL! Oh, and since it’s GTK it won’t “pollute” your KDE5 desktop with old Qt4 dependencies, just with GTK ones, but I guess you use Firefox for browsing the web, so you already have them installed. 🙂

  3. Because PDF is a hardcopy representation, which means that what is shown in the page layout is supposed to be there, I think it is hard for code to determine that two discrete sentences belong to the same paragraph. The idea given by ripper17 may not work well when the last sentence of a paragraph has its full stop/period near the end of the line.

    • As an easy example, the \LaTeX macro generates a small A which is vertically offset above the baseline of the line, and text selection usually puts that A on a separate line just before the line containing the other four characters.

  4. Nice, but please make it really ODT compatible. My ODTs made in LibreOffice Writer aren’t rendered correctly: all the footnotes are missing and font sizes are totally screwed. Please, first things first; I think the first thing should be to ensure Okular shows documents well.
