As a user…
Have you ever copied some text from e.g. Okular, KMail or LibreOffice to Plasma KRunner to invoke some service on it, ideally based on auto-recognition of the data? And wished you could have found the service you were going for right in the context menu of the selected text?
Or have you looked in the context menu of an image in a PDF, a website in Firefox or a database in Kexi and wondered why it does not show at least the “Send to” services from the Kipi plugins?
As a developer…
Have you ever written a parser for plain text which detects certain things like URLs or telephone numbers, then tags those text parts so it can highlight them and offer certain actions on them? Only to find out that other programs are better at detection, detect more things, and offer more or other services on those? Or that the other program does so in its new release, just when you had aligned yours with its old one?
If so, then we share some frustration. And an itch to scratch :)
Workspace-wide services on non-file objects
So what I would like to propose and do is a workspace-wide service system. Actually two.
The first system would make potentially all services on objects available everywhere, based on the mimetypes the program can support on export (e.g. the ones it would offer for the object to the clipboard on copy). It would also allow 3rd-parties to add new services without touching any existing programs.
The second system would make all object recognition logic available to all programs. It would be extendable by 3rd-parties as well, again without touching existing programs.
Because, why only deal with objects in the filesystem (blobs of bytes commonly called “files” ;) ) in a generic way? Why not also with objects in the composed object structures the programs have made up at runtime in the working memory and which the user can clearly address as objects in the UI?
Of course this needs to be done properly, so we do not end up with crowded menus that surely could be improved (e.g. like IMHO the “Send to…” menu in KSnapshot). For that I am happy that in the next days at Akademy the good people from the Visual Design Group are willing to offer their input on what people come to them with… you will find me queueing up for them :)
Data recognition system
Often data is not completely enriched with all possible semantics; the final enrichment is done only by a human looking at the presentation of the data. E.g.
- items in a picture (like a cat, a flower or a QR code)
- items in some plain text (like a phone number or the name of a person)
- items in some partially enriched text (like an email address in a comment in source code)
Or think about items in a sound: while not typically presented in a spatial way on a screen, there is still data recognition going on there as well, like a spoken word, barking or a speaker (or a dog, if you are into dogs :) ).
Some programs have a hardcoded data recognition system, e.g. Digikam for faces of humans, Konsole for URLs in console output, KMail for URLs and email addresses. Their code is not shared with other programs, so everyone else has to reimplement it. Kate and Okteta would have to write their own URL detection code, and so would Rekonq, Okular and Calligra for text not yet marked up as URL. And Gwenview would have to do its own thing for face detection.
So I imagine a set of globally installed data recognition plugins which can be called on some given data and would report where they detected which objects. They would also mark each object with a state, like just a guess or a sure thing, and tell whether there are one or multiple options for the semantics (e.g. for non-unique names of contacts matched in the addressbook).
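To make the idea of such plugins a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the names `Detection`, `Certainty`, `recognize` and the plugin functions are hypothetical, the regular expressions are deliberately naive, and the addressbook is just a dict mapping a name to contact ids. Treating a unique name match as a sure thing is a simplification as well.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Certainty(Enum):
    GUESS = "guess"
    SURE = "sure"


@dataclass(frozen=True)
class Detection:
    start: int            # offset of the first character of the object
    end: int              # offset one past its last character
    kind: str             # what was recognized, e.g. "url"
    certainty: Certainty  # just a guess or a sure thing
    candidates: tuple     # one or multiple options for the meaning


# Deliberately naive patterns, good enough for a sketch only.
URL_RE = re.compile(r"https?://[^\s<>\"]+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def detect_urls(text):
    return [Detection(m.start(), m.end(), "url", Certainty.SURE, (m.group(),))
            for m in URL_RE.finditer(text)]


def detect_emails(text):
    return [Detection(m.start(), m.end(), "email", Certainty.SURE, (m.group(),))
            for m in EMAIL_RE.finditer(text)]


def make_contact_detector(addressbook):
    """Build a plugin from a (hypothetical) addressbook: name -> contact ids."""
    def detect(text):
        found = []
        for name, contact_ids in addressbook.items():
            for m in re.finditer(re.escape(name), text):
                # A non-unique name yields several candidates and only a guess.
                certainty = (Certainty.SURE if len(contact_ids) == 1
                             else Certainty.GUESS)
                found.append(Detection(m.start(), m.end(), "person",
                                       certainty, tuple(contact_ids)))
        return found
    return detect


def recognize(text, plugins):
    """Call every plugin on the text and return all detections in text order."""
    detections = []
    for plugin in plugins:
        detections.extend(plugin(text))
    return sorted(detections, key=lambda d: (d.start, d.end))
```

A program would call `recognize(text, plugins)` with whatever plugins are installed and then use the returned offsets to highlight the objects and to look up services for each `kind`.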
For text, here is a list of things that could be detected in plain text and for which you surely can imagine some services: geocoordinates, date, time, phone number, url, email address, irc/chat nickname, irc channel, name of person, calculation, currency amount, value with physical unit, RGB value, abbreviation, identifying names of objects (like cities, countries, buildings, satellites), program name, you-name-it…
For many of these there are already recognition parsers in Plasma KRunners (even for geocoordinates with the Marble Plasma Runner). Time to share them with the whole system!
Many of the services I think of are those you can already find offered by the Plasma KRunners: doing some action based on some data provided.
Now the system should be able to do more than that. I would like to have these four kinds of services:
* action based on data (read-only with regard to the original data)
* manipulating action based on data (returning a substitute for the original data)
* action based on data combined with other data (e.g. triggered by drag’n’drop)
* manipulating action based on data combined with other data
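The four kinds above boil down to two independent properties: whether a service manipulates the data, and whether it needs a second piece of data. A rough sketch of that, in Python, with all names (`Service`, `invoke`, the four example services) being hypothetical and only there for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Service:
    name: str
    manipulating: bool  # True: run() returns a substitute for the original data
    combining: bool     # True: run() needs a second piece of data (e.g. a drop)
    run: Callable[..., Optional[str]]


def invoke(service, data, other=None):
    """Run a service and return the data as it should be afterwards."""
    if service.combining and other is None:
        raise ValueError(f"{service.name} needs a second piece of data")
    args = (data, other) if service.combining else (data,)
    result = service.run(*args)
    # Read-only services never change the original, whatever they return.
    return result if service.manipulating else data


log = []  # stands in for visible side effects such as opening a window

# One made-up example per kind of service.
SHOW_INFO = Service("Show info", False, False, lambda d: log.append(f"info: {d}"))
TO_UPPER = Service("Upper-case", True, False, lambda d: d.upper())
SEND_TO = Service("Send to", False, True, lambda d, o: log.append(f"send {d} to {o}"))
APPEND = Service("Append", True, True, lambda d, o: f"{d} {o}")
```

With this split the calling program only needs to know one thing: after `invoke()` it either keeps its original data or replaces it with what came back.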
When querying for services, the possible mimetypes of the data should be passed (like with the clipboard). For some of the things mentioned above this will mean newly invented mimetypes (e.g. for irc nickname or value with physical unit), but this seems okay. Some services will want to inspect the actual data to see if they support it. Context and some metadata information (like the container) will be helpful as well (e.g. for a translation service). Some services are cheap/okay to be queried for support or run as often as wanted, some are not (e.g. public web services run by private persons). Some services can be data-risky (profiling based on the data they see, or risking leaks of private info). All that should be accounted for in some way.
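How such a query could look is sketched below in Python. This is only one possible shape under my own assumptions: `ServiceInfo`, `query_services` and the flags `expensive` and `data_risky` are invented names, and `application/x-value-with-unit` is a made-up mimetype standing in for one of the newly invented ones.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ServiceInfo:
    name: str
    mimetypes: frozenset
    expensive: bool = False   # e.g. a web service that should not be hit often
    data_risky: bool = False  # may profile or leak the data it sees
    accepts: Optional[Callable[[bytes], bool]] = None  # optional look at the data


def query_services(registry, offered_mimetypes, data=None,
                   allow_expensive=False, allow_data_risky=False):
    """Return the services matching any of the mimetypes the program offers."""
    offered = set(offered_mimetypes)
    matches = []
    for service in registry:
        if not (service.mimetypes & offered):
            continue
        if service.expensive and not allow_expensive:
            continue
        if service.data_risky and not allow_data_risky:
            continue
        # Without the actual data a service cannot inspect it, so it stays in.
        if (service.accepts is not None and data is not None
                and not service.accepts(data)):
            continue
        matches.append(service)
    return matches


# Hypothetical registry; "application/x-value-with-unit" is an invented mimetype.
REGISTRY = [
    ServiceInfo("Open in browser", frozenset({"text/uri-list"})),
    ServiceInfo("Offer translation", frozenset({"text/plain"}),
                expensive=True, data_risky=True),
    ServiceInfo("Convert to other unit",
                frozenset({"application/x-value-with-unit"}),
                accepts=lambda d: d.strip().endswith(b"km")),
]
```

In this sketch the expensive and data-risky services are left out unless the caller explicitly allows them, which is just one way of accounting for cost and privacy. Context and container metadata are not modelled here at all.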
Some semantics of the services will be needed, to assist in presentation in the UI (e.g. “send copy of data somewhere”, “show info about data”, etc.)
Programs would install context files, which could be used to configure when to offer which services (done by whitelist/blacklist of services). The UI should offer typically used services in quickly accessible/discoverable ways (like direct items in the context menu).
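As a rough illustration of the context files, the following Python sketch filters a list of service names by whitelist/blacklist and then splits the rest into direct menu items and a submenu. The format of a context file is not decided at all yet, so the dict with `whitelist` and `blacklist` keys, the usage counts and the `direct_slots` parameter are pure assumptions of this sketch.

```python
def services_for_context(service_names, context, usage_counts, direct_slots=3):
    """Split the allowed services into direct menu items and a submenu.

    context is a (hypothetical) parsed context file: a dict with an optional
    "whitelist" and an optional "blacklist" of service names.
    """
    whitelist = context.get("whitelist")  # None means "no whitelist given"
    blacklist = set(context.get("blacklist", ()))
    allowed = [name for name in service_names
               if name not in blacklist
               and (whitelist is None or name in whitelist)]
    # Most used services first; sorted() is stable, so ties keep their order.
    ranked = sorted(allowed, key=lambda name: -usage_counts.get(name, 0))
    return ranked[:direct_slots], ranked[direct_slots:]
```

The blacklist wins over the whitelist here, and ranking by usage count is only one guess at what "typically used" could mean.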
Perhaps there is even a fifth kind of service possible, something that feeds the tooltip or some infobox with data about the object (like a business card for a person from the addressbook or a map for a location).
All this should allow services like “Offer translation”, “Alternative word proposal”, “Correction proposals”, “Look up in Wikipedia/knowledge db and show mini info card”, “Do calculation” (on data of type formula), “Convert to other unit” (on data of type value with unit), “Start program”, “Open file”, “Show color”, “Look for offers in internet shop”, you-get-the-idea.
This service system might be similar to something done in NeXTSTEP, at least I remember having read about that one day. And Android also possibly features something similar, from what I understood. If you have pointers to details about those, and other similar systems, please post them in the comments, so the concepts can be looked at and learned from as well. I still need to do any research on pre-existing concepts, as I am currently still busy with designing this proposal itself some more.
Ideally these systems are done with cross-desktop orientation in mind. At least for the services, that should be doable, as service registration and service execution could be done via the abstraction layers of D-Bus, so the actual implementation does not matter. For the data recognition system I am not so sure yet, as passing full copies of the data to multiple plugins, each doing its special recognition on it, sounds rather heavy. No idea how shared memory would help here without introducing other problems?
Please give your input in the comments below, I am interested in what you think of this.
I hope to also find a place for a BoF here at Akademy, for some proper feedback on the plan and hopefully implementation helpers :)