Workspace-wide services on non-file objects

As a user…

Have you ever copied some text from e.g. Okular, KMail or LibreOffice into Plasma KRunner to invoke some service on it, ideally based on auto-recognition of the data? And wished the service you were going for had simply been there already, in the context menu of the selected text?
Or have you looked at the context menu of an image in a PDF, a website in Firefox or a database in Kexi and wondered why it does not show at least the “Send to” services from the Kipi plugins?

As a developer…

Have you ever written a parser for plain text which detects certain things like URLs or telephone numbers and then tags those text parts, so that you can highlight them and offer certain actions on them? Only to find out that other programs are better at detection, detect more things, and offer more or other services on them? Or at least that the other program does so in its new release, just when you had aligned yours with its old one?

If so, then we share some frustration. And an itch to scratch 🙂

Workspace-wide services on non-file objects

So what I would like to propose and do is a workspace-wide service system. Actually two.

The first system would make potentially all services on objects available everywhere, based on the mimetypes the program can support on export (e.g. the ones it would offer for the object to the clipboard on copy). It would also allow third parties to add new services without touching any existing programs.

The second system would make all object recognition logic available to all programs. It would also be extensible by third parties, again without touching existing programs.

Because why deal only with objects in the filesystem (blobs of bytes commonly called “files” 😉) in a generic way? Why not also with objects in the composed object structures which programs build up at runtime in working memory and which the user can clearly address as objects in the UI?

Of course this needs to be done properly, so we do not end up with crowded menus that clearly could be improved (like, IMHO, the “Send to…” menu in KSnapshot). So I am happy that in the next days at Akademy the good people from the Visual Design Group are willing to offer their input on whatever people come to them with… you will find me queueing up for them 🙂


Data recognition system

Often data is not completely enriched with all possible semantics; the final enrichment is done only by a human looking at the presentation of the data. E.g.:

  • items in a picture (like a cat, a flower or a QR code)
  • items in some plain text (like a phone number or the name of a person)
  • items in some partially enriched text (like an email address in a comment in source code)

Or think about items in a sound: while sound is not typically presented spatially on a screen, data recognition is going on there as well, like for a spoken word, barking or a speaker (or a dog, if you are into dogs 🙂).

Some programs have a hardcoded data recognition system, e.g. Digikam for faces of humans, Konsole for URLs in console output, KMail for URLs and email addresses. Their code is not shared with other programs, so everyone else has to reimplement it. Kate and Okteta would have to write their own URL detection code, and so would Rekonq, Okular and Calligra for text not yet marked up as a URL. And Gwenview would have to do its own thing for face detection.

So I imagine a set of globally installed data recognition plugins which can be called on some given data and would report where they detected which objects. They would also mark objects with a state, like “just a guess” or “sure thing”, and note whether there is one option or several for the meaning (e.g. for non-unique names of contacts matched in the addressbook).

For text, here is a list of things that could be detected in plain text and for which you can surely imagine some services: geocoordinates, date, time, phone number, URL, email address, IRC/chat nickname, IRC channel, name of person, calculation, currency amount, value with physical unit, RGB value, abbreviation, identifying names of objects (like cities, countries, buildings, satellites), program name, you-name-it…

For many of these there are already recognition parsers in Plasma KRunners (even for geocoordinates with the Marble Plasma Runner). Time to share them with the whole system!
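
To make the idea a bit more concrete, here is a minimal sketch in Python of what such recognition plugins could report. Everything in it is an assumption made up for illustration: the names `Detection`, `Confidence`, `RegexRecognizer` and `recognize_all`, the type labels, and the two deliberately simplistic regular expressions. It is not taken from any existing KRunner code.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    GUESS = "guess"   # "just a guess"
    SURE = "sure"     # "sure thing"


@dataclass
class Detection:
    start: int               # offset of the first matched character
    end: int                 # offset one past the last matched character
    kind: str                # hypothetical type label, e.g. "url"
    confidence: Confidence
    candidates: list         # possible meanings; several if ambiguous


class RegexRecognizer:
    """Hypothetical plugin: reports every match of one pattern."""

    def __init__(self, kind, pattern, confidence):
        self.kind = kind
        self.pattern = re.compile(pattern)
        self.confidence = confidence

    def recognize(self, text):
        return [
            Detection(m.start(), m.end(), self.kind, self.confidence, [m.group()])
            for m in self.pattern.finditer(text)
        ]


# Stand-in for the set of globally installed plugins (patterns are simplistic).
RECOGNIZERS = [
    RegexRecognizer("url", r"https?://[^\s<>\"]+", Confidence.SURE),
    RegexRecognizer("email", r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", Confidence.GUESS),
]


def recognize_all(text):
    """Run all plugins on the text and return detections ordered by position."""
    found = []
    for recognizer in RECOGNIZERS:
        found.extend(recognizer.recognize(text))
    return sorted(found, key=lambda d: d.start)
```

A program like Kate or Konsole would then only call `recognize_all()` on its text and highlight the returned ranges, instead of carrying its own parser.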

Services system

Many of the services I think of are those you can already find offered by the Plasma KRunners: doing some action based on some data provided.
Now the system should be able to do more than that. I would like to have these four kinds of services:
* action based on data (read-only with regard to the original data)
* manipulating action based on data (returning a substitute for the original data)
* action based on data combined with other data (e.g. triggered by drag’n’drop)
* manipulating action based on data combined with other data
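
The four kinds above boil down to two yes/no properties: does the service return a substitute for the data, and does it need a second object? A small Python sketch of that reading follows. The names `Service` and `invoke` are hypothetical, and this is only one possible way to model it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Service:
    name: str
    manipulating: bool   # True: the result replaces the original data
    needs_other: bool    # True: combines the data with a second object
    run: Callable


def invoke(service, data, other=None):
    """Run a service and return the data the caller should keep afterwards."""
    if service.needs_other and other is None:
        raise ValueError(f"{service.name} needs a second object")
    args = (data, other) if service.needs_other else (data,)
    result = service.run(*args)
    # Read-only services leave the original data untouched.
    return result if service.manipulating else data
```

With this, a read-only “show info” service and a manipulating “convert to upper case” service go through the same call, and only the second one changes what the program holds afterwards.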

When querying for services, the possible mimetypes of the data should be passed (like with the clipboard). For some of the things mentioned above this will mean newly invented mimetypes (e.g. for IRC nickname or value with physical unit), but this seems okay. Some services will want to inspect the actual data to see if they support it. Context and some metadata (like the container) will be helpful as well (e.g. for a translation service). Some services are cheap enough to be queried for support, or run, as often as wanted; some are not (e.g. public web services run by private individuals). Some services can be risky for the data (they may do profiling based on the data they see, or risk leaking private info). All that should be accounted for in some way.

Some semantics of the services will be needed, to assist in presentation in the UI (e.g. “send copy of data somewhere”, “show info about data”, etc.).
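
As a rough sketch of such a query, assuming a simple in-process registry: services are matched by mimetype, services that send data elsewhere are only offered on request, and the check on the actual data is only run for services marked as cheap. `ServiceInfo`, `query_services` and the example entries are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServiceInfo:
    name: str
    mimetypes: frozenset                  # types the service accepts
    cheap: bool = True                    # False: do not probe it speculatively
    shares_data: bool = False             # True: data leaves the machine
    accepts: Optional[Callable] = None    # optional check on the actual data


def query_services(registry, offered_mimetypes, data=None, allow_sharing=False):
    """Return the names of services applicable to the offered mimetypes."""
    offered = set(offered_mimetypes)
    matches = []
    for service in registry:
        if not service.mimetypes & offered:
            continue
        if service.shares_data and not allow_sharing:
            continue
        # Inspect the actual data only for services that are cheap to ask.
        if (service.cheap and service.accepts is not None
                and data is not None and not service.accepts(data)):
            continue
        matches.append(service.name)
    return matches


# Hypothetical registry entries.
REGISTRY = [
    ServiceInfo("Open in browser", frozenset({"text/uri-list"})),
    ServiceInfo("Translate online", frozenset({"text/plain"}),
                cheap=False, shares_data=True),
    ServiceInfo("Do calculation", frozenset({"text/plain"}),
                accepts=lambda d: any(c.isdigit() for c in d)),
]
```

Calling `query_services(REGISTRY, ["text/plain"], data="2+2")` would offer only the calculation here, because the online translation is held back until the user has allowed data sharing.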

Programs would install context files, which could be used to configure when to offer which services (done by whitelist/blacklist of services). The UI should offer typically used services in quickly accessible/discoverable ways (like direct items in the context menu).
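
How such a context file could look is completely open. The sketch below simply assumes an INI-style file with `Whitelist` and `Blacklist` keys holding semicolon-separated service ids; the file layout, the key names, the service ids and both functions are made up for this example. An empty whitelist is taken to mean “no restriction”.

```python
import configparser

# Hypothetical context file a program might install.
EXAMPLE_CONTEXT = """
[Context]
Name=okular-text-selection
Whitelist=translate;lookup-wikipedia;send-by-mail
Blacklist=send-by-mail
"""


def _split(value):
    """Split a semicolon-separated list, dropping empty entries."""
    return [item for item in value.split(";") if item]


def load_context(text):
    """Parse the assumed INI layout into whitelist and blacklist."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["Context"]
    return {
        "whitelist": _split(section.get("Whitelist", "")),
        "blacklist": _split(section.get("Blacklist", "")),
    }


def visible_services(available, context):
    """Keep whitelisted services (if a whitelist is set) minus blacklisted ones."""
    allowed = context["whitelist"]
    blocked = context["blacklist"]
    return [
        s for s in available
        if (not allowed or s in allowed) and s not in blocked
    ]
```

The blacklist wins over the whitelist in this sketch, which seems the safer default, but that is a design choice still to be made.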

Perhaps there is even a fifth kind of service possible, something that feeds the tooltip or some infobox with data about the object (like a business card for a person from the addressbook or a map for a location).

All this should allow services like “Offer translation”, “Alternative word proposal”, “Correction proposals”, “Look up in Wikipedia/knowledge db and show mini info card”, “Do calculation” (on data of type formula), “Convert to other unit” (on data of type value with unit), “Start program”, “Open file”, “Show color”, “Look for offers in internet shop”, you-get-the-idea.

This service system might be similar to something done in NeXTSTEP; at least I remember having read about that one day. And Android possibly features something similar, from what I understood. If you have pointers to details about those, or about other similar systems, please post them in the comments, so the concepts can be looked at and learned from as well. I still need to do any research on pre-existing concepts, as I am currently still busy with designing this proposal itself some more.

Ideally these systems are done with cross-desktop orientation in mind. At least for the services that should be doable, as service registration and service execution could be done via the abstraction layers of D-Bus, so the actual implementation does not matter. For the data recognition system I am not so sure yet, as multiple plugins all getting full data copies passed to do their special recognition on sounds rather heavy. No idea how shared memory would help here without introducing other problems?

Please give your input in the comments below; I am interested in what you think of this.
I hope to also find a place for a BoF here at Akademy, for some proper feedback on the plan and hopefully implementation helpers 🙂


10 thoughts on “Workspace-wide services on non-file objects”

  1. There’s times I’ve wished for something like this. This sounds like a worthy experiment to attempt, and I hope to hear more about your progress.

  2. Yes, yes, a thousand times yes! 😀

    From the end-user perspective I only see the problem of UI implementation, so it isn’t e.g. just a huge list of text in a menu.

    On a related note, I’m planning to suggest a global solution for user-friendly RegExp at some point.

    • Yes, I also see that problem, hook. See my comment about KSnapshot’s long list for “Send to”. On Tuesday afternoon I should get good advice from the VDG on what could be done.

  3. Check out the xdg list archives for several long discussions about expanding the MIME associations spec to also cover something like Android “intents” — David Faure was heavily involved.

  4. Instead of mimetypes, I think classes from the Nepomuk ontology could be used, which already describes most of the bits of information you listed.
    Each service can publish which classes it accepts as input and which classes it outputs (e.g. input a mail address, output a list of photos where the person associated with the address is tagged), leveraging the semantic relations already indexed and kept by the (always underused) desktop indexers.
    Some time ago I started a proof-of-concept implementation (never completed :-\) based on Tracker, centered on drag’n’drop as the main interaction method.

    • Interesting idea. But this might cover more than what I am looking for here for now.
      So far my plan is that the code which queries for the (action) services does not actually care about the semantics of the actions, other than for a useful presentation in the UI. And for now any data that services can return should only be useful for replacing the initial data, so it should be of the same (mime)type. Because to yield other data types from the services, the code dealing with that needs some intent of what should be done with the new data type, right?
      And this idea so far was not about (hardcoded) intents of the code, but about those of the human who sees those objects and all of a sudden wants to do something related to them.
      Indeed, it is very related, so it should possibly be solved by one system. So food for thought, thanks.

      • Meanwhile I perhaps saw your point: actually one would like a chain of object recognizers. E.g. for a face detected in an image, one expects it to be tagged not just as a face, but also linked to a contact in your contact database, if available. The services would then be offered on that contact, and less on the result of the first-level data recognition.
        Or take a string recognized in an image or a QR code: the recognized string would ideally be passed on again to the string recognizers, so a URL, JSON code, IRC nickname etc. would be detected as such, and the services for those would be offered as well.
        And the same goes for a text recognized as an email address in the first round: services could also be offered on linked things such as the related contact in the database. Such an email address could rather be seen as the identifier of a given person in the email system, so services around the person should be offered even on an email address.
        I see, this will need some more iterations of thought 🙂
