Wednesday, December 30, 2015

EdgeHTML on Past and Future Promises

This entire post is going to be about how EdgeHTML schedule ES 6 Promises, why we made the decisions we did and work we have scheduled for the future to correct the interop differences that we've created. If you thought it was about secret features then you will be disappointed.

The starting point of this article and a lot of really cool black box investigation was done by Jake Archibald when he wanted to chat about micro-tasks in the browser. What he found was that at least one implementation, the one supplied by Chakra and EdgeHTML, didn't adhere to the latest reading of the specifications. I highly recommend reading the article first to get an understanding of some of the basic concepts presented in what I think is a very approachable form. Especially cool are the live visualizations that you can run in different browsers. My post, sadly, won't have live visualizations. I'll earmark every bit of time not spent writing visualizations to fixing bugs related to the HTML 5 event loop in EdgeHTML instead, deal?

Why Are Promises in EdgeHTML not Micro-tasks?

When we were spec'ing Promises in Chakra and EdgeHTML we were doing so very early. The Chakra team is constantly contributing to the various Ecmascript specifications and so we had a very early version of the spec for Promises from the working group. We wanted to get something working really fast, perhaps a prototype of it running (at least one meeting was before IE 11 shipped and another meeting right after it shipped when we were considering maybe adding some extra features) so we could give feedback. While this never came to be, it locked our development design specs in pretty early with something we thought was pretty solid.

When we first started our conversations were around what a Job was. This is how ES 6 defines to execute the callbacks associated with a Promise. You can view spec language here (Promise Jobs) and here (Jobs and Job Queues) if you want to try and figure it out yourself. What you'll come to, is probably the same conclusion we did. There isn't a clear relationship between the Ecmascript spec and the HTML 5 spec, per say.

This meant our first round of thinking was whether or not the JavaScript engine would have its own event loop and task queuing system. We know, and have experience with, too many schedulers running on the same thread. We felt this was bad and that it would lead to various types of starvation activity having to coordinate another event loop dependency across a major component boundary. While Chakra and EdgeHTML are very close, we still like to keep our components separated enough that we don't sacrifice agility, without which ChakraCore might not exist today...

In our second meeting we mostly discussed that HTML 5 had some concepts here. There was this HTML 5 event loop thing and it was proposing tasks queues and task sources and all kinds of coolness. However, it wasn't well defined. For instance, it only generically lists task sources and doesn't talk explicitly about how many task queues there are. There is a bit of text that even insinuates that user input could be given priority over others tasks "three quarters of the time". When you are trying to build an interoperable browser in conjunction with a several other huge companies, this kind of ambiguity is really not helpful.

We decided that a Promise callback was close enough to a setTimeout(0) and that we liked the priority of that model enough, that we merged our Promise Job queue with our setTimeout "Task Queue". In reality, EdgeHTML has only dipped a toe into the HTML 5 event loop itself, and even timeouts are not really in their own task queue, but I'll get to that more a bit later.

This was enough to complete out spec writing. Jobs == Task Queues and Promise Jobs == Set Timeouts. This would be the interface on which the Chakra engine would register work for us to then properly interlace in with the rest of the work the system had to do.

How are Promises actually Timeouts?

There is a very real trend in the browser industry to create more and more new features by building on top of the foundations that already exist. When a new feature is just too fresh, then we can implement it using a poly-fill. A poly-fill can also be used to implement an older feature which we don't plan on updating that has low overall market usage, but is critical to some segment, like we did for XPath support. So please don't be surprised by the following line of code.

Okay, its not quite that. We don't actually execute code such as that every time we want to register a Promise callback. If we did it would be a nightmare, since the page could try to intercept the calls and do bad things, or simply break itself without knowing why. Instead, we share the implementation of setTimeout with the current instance of the Chakra script engine that was created for a given document. This got us close enough to the concept of an event loop scheduler function that we were happy. And yes, they literally call that function with a Function object (your callback, whether it be your resolve or reject callback) and the value of 0.

Well, as you might be able to tell now, this is a discoverable implementation of the feature. In fact, Jake in his article was able to pretty accurately describe what we were doing even though he didn't have access to the code. Simply schedule a 0 timeout yourself and then resolve a Promise and see which callback you get first. Since all 0 timeouts get properly serialized, the Promise, as a 0 timeout, will get serialized as well.

We could have gone further and hidden some of this behavior by making Promise callbacks fire before all other 0 timeouts, but doing that work wouldn't have gotten us close enough to the necessary and now spec'ed micro-task behavior that we would need to be truly interoperable. Sadly it would have fixed some sites and that is generally good enough reason, but it might have also made it easier for the web to become dependent on our broken behavior.

There you go, in EdgeHTML Promise callbacks really are setTimeouts, they really go through the same internal code paths that existing window.setTimeout calls go through as well and there is no special magic that allows us to group them together, so they get interlaced with setTimeouts that are being registered from the page as well. Clearly a MUST FIX ;-)

Promises towards a Brighter Future

This particular situation has helped us to really re-think our existing event loop situation. The specifications are getting a lot better, defining things more clearly and simply obeying them in spirit is starting to not deliver the expected end user experience that we want. While we've gotten this far using a COM STA loop with an ad-hoc task scheduler that has no concept of task sources, task queues or similar origin browsing contexts, this situation really can't last. If the web is really the OS for the next generation of applications and hopes to supplant existing OS-centric application models then things like the threading model and scheduling become part of its domain and must be well defined.

Too deep? Yeah, I'm thinking so too ;-) I'll get into more details on the HTML 5 event loop in some future posts when I dig in really deep on hosting models, COM and Win32. For now, let's just fix Promises!

It turns out the bright future for our Promise implementation isn't far off nor is it that much of a departure from the architectures we already have in place. We already have a micro-task queue which we use for Mutation Observers. We also have a communication channel on which Chakra gets our setTimeout Function implementation. Our immediate goals will be to rewire our channel with Chakra to instead allow them to submit Jobs to us, as the host environment and that will then give us control to route them wherever we want.

Since we have a micro-task queue in place fixing the bug should be a matter of routing to that queue. Nothing is every easy though, and we'll have to consider the ramifications of executing Promise calbacks in that code and the interplay with Mutation Observers. We'll also be looking at how the other browser's interleave micro-tasks. For instance, do mutation observers and promises interlace (unified queue) or do they get split into their own queues? The current specifications only have one task source defined for the micro-task queue and that is the microtask task source, so our tests will hopefully validate the unified queue behavior and we'll be able to deliver an interoperable native Promise implementation in the very near future!

4 comments:

  1. Good stuff. Just remember that mutation observers are a bit wonky with their microtasks since they use the "compound microtask" concept. So a single compound microtask goes into the queue at a time, alongside multiple promise microtasks.

    ReplyDelete
    Replies
    1. This will be part of the work we have to do. We ONLY have a concept for micro-tasks that execute in response to Mutation Observers. There is a bit of factoring to do in order to enable the "solitary callback microtasks".

      Delete
  2. Also it'll be worth testing against WebKit nightly, not just released Safari, since they have recently gotten more in line with the spec. https://github.com/WebKit/webkit/commit/440ba0d38ef128b7d4c8420bf007f45801718012

    ReplyDelete
    Replies
    1. Jake noted FireFox 43 had maybe fixed to spec as well. So I'm thinking Chrome, Canary and FireFox 43 as baselines. If anyone disagrees I'll send up signal flares ;-)

      Delete