Rich T's blog

Thu, Nov 17th

Optimizing the design of Web Intents

So Web Intents as a concept is awesome. By that I mean that the concept of being able to select an abstract intent and then, as a user, choose which of my favorite web apps gets to handle that intent is game-changing. It’s going to rule the web of connected things. Such an API could also be applied beyond the browser to invoke native apps. It could even be used to invoke apps running on different devices. Hell, what’s not to like about that?

Current State of the Art

We currently have a proposal from Google and it fits a number of use cases really well. But we also already have registerProtocolHandler() and registerContentHandler(), which fulfill a remarkably similar purpose. Both of these methods have drawbacks, as explained in the Web Intents FAQ, hence the new proposal from Google. The summary of that discussion is that the rPH method implies an intent/action but carries no data, while the rCH method provides data but no intent/action. In a nutshell, they are similar-but-slightly-different-enough to warrant their own methods. And then Web Intents also builds on similar ground…

Design issues

So we’ve got a design problem: both of these methods provide most of the general functionality of Web Intents but don’t unify it into a single invocable method. The question we then come to is: why couldn’t we fix up rPH and rCH by combining them into a single handler invocation method and a single handler registration method (or make that registration a declarative HTML element if required)? We could then define rules that let the result act as a protocol handler, with the intent client able to upgrade the connection to a content handler if or when that becomes necessary.

'Upgradable' Intent Handlers

I wanted to explore whether we can ‘fix’ register*Handler instead of needing to create an entirely new API for this stuff. Instead of three similar methods couldn’t we just end up with one? That method would combine all of the same functionality of the previous three leaving us with one method for ‘invoking and communicating with other services on the client-side web’. Here’s my proposal.

For the purposes of this discussion, I’m co-adopting the current use cases from the Google Proposal but adding even more functionality in the following cases:

Use Case: Search +the ability to obtain autocomplete suggestions as a user types their search query.

Use Case: Edit Image in Photo Service +allow ongoing tweaks to be made to an edited image once the initial response is received (i.e. I want to further tweak the brightness/contrast settings).

Below is a slight variation on the best alternative proposal I’ve seen to date, which was put forward by Ian Hickson. Here’s how we could invoke an intent and pass the invoked web page some data:

navigator.handleIntent(
  "http://example.com/sendto", 
  "mailto:bar@baz.com", 
  function(port) {
     port.onmessage = function (event) { 
        processIntentResponse(event.data);
     };
     port.postMessage({ 
         'action': 'sendImage', 
         'msg': "here's some image data"
       }, 
       { 
         'img': getImageData() 
       }
     ); 
  }
);

Note: the third argument to handleIntent is optional and provides an implicit MessagePort object linked to the window object of the invoked handler. The callback would be triggered once the Intent Handler has been loaded and is ready to receive messages. 

Naturally, this requires handlers for the requested type to be registered somehow. As you browse the web the browser would pick up intent registrations from visited web pages. Since there’s now only one intent type (a hybrid of all the other proposals) we only need one intent registration method:

navigator.registerIntent(
  "http://example.com/sendto", 
  "mailto", 
  "http://foo.com/handler.html"
);

Or we could just do that registration in a similar way declaratively, pointing to a single resource (handler.html) denoting the intent endpoint:

<link rel="intent" href="handler.html"/>

Then in handler.html define the parameters of the intent handler itself:

<meta name="intent-action" 
      content="http://example.com/sendto"/>
<meta name="intent-type" content="mailto"/>
<meta name="intent-disposition" content="inline"/>

So what’s happening here?

Well, I could invoke an intent URL with handleIntent(), as I did in the example above, that matches the ‘mailto:*’ registration provided by a handler page. The user would select a handler page and that page would then be loaded. Initially it would act as a ‘protocol handler’ and do some stuff on load with the invocation URL it was given, e.g.

window.onintent = function(e) {
  console.log(e.url); // logs 'mailto:bar@baz.com'
};

There is no reason why you couldn’t pass a complex object as a data URI in the same mechanism, thereby making the whole process a one-shot request/response without needing to exchange any additional messages. An intent provider could provide the following URL in the handleIntent call for example - thereby achieving the same functionality as provided by Google’s current proposal:

"data:text/x-vcard;charset=utf-8,...vcard_data_here..."

You could even pass objects by reference!

navigator.handleIntent(
  "http://example.com/sendto", 
  "http://goo.com/snapshot.png"
);

If the client page had any object data to hand to the invoked handler then the client page could upgrade that protocol handler invocation into a ‘content handler’ whenever it liked or needed to via its implicit MessagePort interface.

port.postMessage("here's some data, intent handler!");

or it could completely transfer objects, such as Blobs, to the handler if it wanted to:

port.postMessage(null, myLargeUnwieldyBlobObj);

In short, I could pass any serializable or transferable data I want through this API whether that’s by data URI, by reference or by object.
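To make the data: URI route a bit more concrete, here’s a minimal sketch of how a client might pack a vCard into such a URL and how a handler might unpack it again. The helper names toDataUri and parseDataUri are made up for illustration and are not part of any proposal, and the parser only copes with the simple non-base64 form used in this post.

```javascript
// Hypothetical helpers for illustration only.

// Build a data: URI from a media type and a text payload.
function toDataUri(mediaType, text) {
  return 'data:' + mediaType + ',' + encodeURIComponent(text);
}

// Split a simple (non-base64) data: URI back into its parts.
function parseDataUri(uri) {
  if (uri.slice(0, 5) !== 'data:') {
    throw new Error('not a data: URI');
  }
  var comma = uri.indexOf(',');
  if (comma === -1) {
    throw new Error('data: URI has no payload');
  }
  return {
    mediaType: uri.slice(5, comma),
    text: decodeURIComponent(uri.slice(comma + 1))
  };
}

// Client side: this string is what would be passed as the url argument.
var vcard = 'BEGIN:VCARD\r\nVERSION:3.0\r\nFN:Bar Baz\r\nEMAIL:bar@baz.com\r\nEND:VCARD';
var url = toDataUri('text/x-vcard;charset=utf-8', vcard);

// Handler side: this is what could be done with e.url inside onintent.
var parsed = parseDataUri(url);
```

The handler gets the media type and the payload back from the single URL it was loaded with, so no follow-up message is needed for this case.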

The invoked intent handler could intercept messages at the window-object level and send responses to the originating client as follows:

window.onintent = function(e) {
  // onmessage is an event handler attribute, so assign to it
  e.onmessage = function(msg) {
    e.postMessage(msg.data); // echo message to sender
  };
};

When the Intent Provider is closed the client could always detect that the Intent Provider is no longer active as follows:

port.onclose = function(e) {
  console.log("Channel closed. No more messages");
};

Vice-versa, the Intent Provider can detect the closing of a client page in a similar way.
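To see how this persistent channel could cover the Search use case from earlier, here’s a rough sketch that runs outside a browser. Everything in it is an assumption made for illustration: createPortPair is a synchronous in-memory stand-in for a pair of connected MessagePorts (real ports deliver messages asynchronously), and the 'suggest' message shape, the TERMS list and attachSearchHandler are invented names.

```javascript
// Synchronous, in-memory stand-in for two connected ports.
// Only for illustration; real MessagePorts are asynchronous.
function createPortPair() {
  function makePort() {
    return {
      onmessage: null,
      onclose: null,
      peer: null,
      closed: false,
      postMessage: function (data) {
        if (this.closed || !this.peer || this.peer.closed) return;
        if (this.peer.onmessage) this.peer.onmessage({ data: data });
      },
      close: function () {
        if (this.closed) return;
        this.closed = true;
        // Tell the other side that no more messages will arrive.
        if (this.peer && this.peer.onclose) this.peer.onclose({});
      }
    };
  }
  var a = makePort();
  var b = makePort();
  a.peer = b;
  b.peer = a;
  return [a, b];
}

// Handler side: answer every 'suggest' message with matching terms.
var TERMS = ['web intents', 'web workers', 'webidl', 'websockets'];

function attachSearchHandler(port) {
  port.onmessage = function (event) {
    var msg = event.data;
    if (msg.action !== 'suggest') return;
    var prefix = msg.query.toLowerCase();
    port.postMessage({
      query: msg.query,
      suggestions: TERMS.filter(function (t) { return t.indexOf(prefix) === 0; })
    });
  };
}

var ports = createPortPair();
var clientPort = ports[0];
var handlerPort = ports[1];
attachSearchHandler(handlerPort);

// Client side: collect replies and watch for the channel closing.
var received = [];
var channelClosed = false;
clientPort.onmessage = function (event) { received.push(event.data); };
clientPort.onclose = function () { channelClosed = true; };

// One message per keystroke, all over the same channel.
['w', 'web ', 'web i'].forEach(function (typed) {
  clientPort.postMessage({ action: 'suggest', query: typed });
});

handlerPort.close(); // e.g. the user closes the handler page
```

The client sends as many queries as it likes after the initial invocation and the handler keeps answering until one side goes away, at which point the other side’s onclose fires.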

Implicit vs. Explicit parameterization

Neither the client nor the handling server should need to worry about content-types. For example, when we usually pass around image data within a web page it’s just binary ImageData. We’ve also got a ton of work around Blobs and data: URIs that could come in handy here too of course. I can pass an ImageData object to a canvas and pull it back out as a data: URI if that turns out to be the best way to send image data. If we really wanted to maintain content type information then, as a service provider I could require that from clients in the passed JSON messaging API. That information can become an implicit part of the proposal rather than explicit.

But let’s say for a minute that an Intent Provider actually requires a specific content type, for example a rich media bookmarking service. The use case is for sending an image, video or audio file to an Intent Provider for post-processing before it is returned to the calling web page.

In this case a web page would register its intent action and handled intent types as follows:

<meta name="intent-action" content="share"/>
<meta name="intent-type" 
      content="data:image/, data:audio/, data:video/"/>

Any other page on the web could then invoke this action by invoking an intent with matching parameters. e.g.

navigator.handleIntent(
  "share", 
  "data:image/jpeg,__image_data__"
);

or like this…

navigator.handleIntent(
  "share", 
  "http://foo.com/video.mp4"
);

or - if the ‘share’ action allows it - like this:

navigator.handleIntent(
  "share", 
  null,
  function(port) {
    port.postMessage(null, myAudioBlob);
  }
);
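How the user agent decides that an invocation ‘matches’ a registered intent-type isn’t pinned down here, so the following is only a sketch under one assumption: the content of the intent-type meta element is a comma-separated list of patterns, and a URL matches if it starts with one of them. The function names parseIntentTypes and matchesIntentType are invented for illustration.

```javascript
// Split the content of an intent-type <meta> element into patterns.
function parseIntentTypes(content) {
  return content.split(',')
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

// Assumed rule: a URL matches when it starts with one of the patterns.
function matchesIntentType(patterns, url) {
  return patterns.some(function (p) { return url.indexOf(p) === 0; });
}

var shareTypes = parseIntentTypes('data:image/, data:audio/, data:video/');

var results = {
  jpeg: matchesIntentType(shareTypes, 'data:image/jpeg,__image_data__'),
  remoteVideo: matchesIntentType(shareTypes, 'http://foo.com/video.mp4'),
  text: matchesIntentType(shareTypes, 'data:text/plain,hello')
};
```

Under this naive prefix rule the inline JPEG matches but the by-reference video URL does not, so a real matching algorithm would need something more for that case (looking at the referenced resource’s content type, perhaps). That’s exactly the kind of detail a proper processing algorithm would have to settle.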

Supporting non-API Protocol and Content Handling

So now let’s assume a page contains a bunch of links to non-HTTP resources such as the following:

<a href="mailto:foo@bar.com?subject=Test">Email me</a>
<a href="tel:+473424342">Call me</a>

When I click one of these links the user agent can calculate and present to the user intent handlers that have indicated support for the target protocol. The important characteristic here is that these links do not have an action. Only Intent Providers that have not specified an action in their intent registration will be invoked. So assuming an Intent Provider added an intent-type to the <head> of their web page but not any other header intent information:

<meta name="intent-type" content="tel:, mailto:, sms:"/>

…then this page becomes a general handler for tel, mailto and sms URI schemes invoked where only URLs are provided.

I click on the ‘Email me’ link and I’ll be presented with a selection of ‘*’-action handlers to choose from. When I select one of these providers, the intents process kicks in as normal and the handler page is loaded, the onintent listener is invoked and the url can be retrieved and processed by the intent handler web page according to the standard flow.

Equally, let’s presume a web page contains a link as follows:

<a href="data:image/png,__image_data__">Our logo</a>

And when I click this I get the exact same user experience. Intent Handlers that have registered for types such as ‘data’ or more explicitly, ‘data:image/png’ would be presented, I could choose my preferred provider and the standard intents communication process begins.
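Here’s a small sketch of how a user agent might work out which handlers to offer, covering both plain links and handleIntent() calls. It assumes registrations are kept as simple records, that the action has to match exactly (with ‘no action’ only matching ‘no action’), and that types are matched by prefix. The record layout, candidateHandlers, and the sendto.html and viewer.html targets are all hypothetical.

```javascript
// Hypothetical registration records, as a user agent might store them.
var registrations = [
  { action: null,
    types: ['tel:', 'mailto:', 'sms:'],
    target: 'http://foo.com/handler.html' },
  { action: 'http://example.com/sendto',
    types: ['mailto:'],
    target: 'http://foo.com/sendto.html' },
  { action: null,
    types: ['data:image/'],
    target: 'http://foo.com/viewer.html' }
];

// Return the registrations to show in the handler picker.
// A plain link click is modelled as an invocation with action === null.
function candidateHandlers(regs, action, url) {
  return regs.filter(function (reg) {
    var sameAction = (reg.action || null) === (action || null);
    var typeMatches = reg.types.some(function (t) {
      return url.indexOf(t) === 0;
    });
    return sameAction && typeMatches;
  });
}

// Clicking the 'Email me' link: no action, only the URL.
var fromLink = candidateHandlers(
  registrations, null, 'mailto:foo@bar.com?subject=Test');

// Clicking the 'Our logo' link: again no action.
var fromDataLink = candidateHandlers(
  registrations, null, 'data:image/png,__image_data__');

// An explicit API call with an action.
var fromApi = candidateHandlers(
  registrations, 'http://example.com/sendto', 'mailto:bar@baz.com');
```

With these records the ‘Email me’ link only offers the action-less handler.html, even though sendto.html also handles mailto:, because that registration named an action.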

Formal Interface Design Notation

When developing interface proposals for web browsers we tend to use a formal interface definition notation called WebIDL to describe exactly how the interface should behave. Here is the sum of all the JavaScript APIs that have been discussed above in WebIDL format:

/** SERVER-SIDE INTENT INTERFACE: **/

[NoInterfaceObject]
interface WindowIntent {
  attribute Function? onintent;

  // 'intent' events fire with a single 
  // parameter of type: IntentEvent
};

// + window.onintent
Window implements WindowIntent;

[Constructor(DOMString url, 
  optional MessagePort remotePort)]
interface IntentEvent : Event {
  readonly attribute DOMString? url;
};

// + event.postMessage
// + event.onmessage
IntentEvent implements MessagePort;

/** CLIENT-SIDE INTENT INTERFACE: **/

[NoInterfaceObject]
interface NavigatorIntent {
  void handleIntent ( 
     [TreatUndefinedAs=Null] in DOMString? action, 
     [TreatUndefinedAs=Null] in DOMString? url, 
     in optional Function? callback 
  );
    

  // 'callback' fires when an Intent Provider has 
  // been selected and its DOM has been loaded.
  // 'callback' is invoked with a single argument
  // of type: MessagePort

  void registerIntent (
     [TreatUndefinedAs=Null] in DOMString? action, 
     [TreatUndefinedAs=Null] in DOMString? url_pattern, 
     in DOMString target  
  );
};

// + window.navigator.handleIntent
// + window.navigator.registerIntent
Navigator implements NavigatorIntent;

Future work

There’s a whole bunch of stuff that still needs documenting and isn’t discussed here (though a lot of work needs to happen in the Google proposal too). For example, intent providers should be able to de-register themselves if or when they stop providing intent handling services. Making that de-registration a single API call (instead of 3 separate de-registration methods) is an extra bonus of having one unifying method. There may be some kinks in this design that need to be worked out. Detailed processing algorithms and development experimentation will help to tease those out. Perhaps I’ll discuss this and other aspects in future blog posts.

Summary

Web Intents is going to change the way we interact across services on the web. A whole bunch of work has gone into designing registerProtocolHandler, registerContentHandler and Web Intents to date but there may still be scope to unify this functionality in an intuitive way for developers.

What are the main benefits of the approach presented here? Besides needing two fewer methods to achieve equivalent functionality, we also get a few other high-value benefits for free:

  • a persistent full-duplex messaging channel that we can use to soft-upgrade the connection from a ‘protocol handler’ to a ‘content handler’, passing any kind of data whenever and for whatever we want between the handler client page and handler server page.
  • the ability for a single handler page to support multiple intent connections.
  • we get everything in registerProtocolHandler, registerContentHandler and the Google proposal in an intuitive upgradable-depending-on-purpose API.

Hopefully we’ll get more chances to discuss the direction of this work in the ongoing W3C Web Intents Task Force. I hope people are still willing to experiment and play around with different ideas. We have a pressing opportunity to optimize the concept of Intents on the web and to make a solution that fits a wide range of real developer needs beyond the restrictions of current proposals.

If you have any feedback please feel free to leave some comments here or join the discussion over in the W3C Web Intents Task Force.

Edit: I’m new to Tumblr. No comments can be made here so please direct them to me on Twitter @richtibbett or with the hashtag #webintents.
