Sunday, February 21, 2016

Dissecting the HTML 5 Event Loop - Loops, Task Queues and Tasks

I'm hoping that in this article I can introduce the general audience of web developers to the HTML 5.1 concept of Event Loops (see the WHATWG living standard and the older W3C standard).

When taken in isolation, event loops seem fairly natural. They are just a series of tasks, perhaps categorized into groups, executed in order. Many modern languages and task frameworks now have these concepts baked in deeply, often with some mixture of the ideas built into HTML 5 event loops.
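As a rough illustration of that "series of tasks executed in order" idea, here is a toy event loop with a single first-in, first-out queue. It is a hypothetical teaching model (the ToyEventLoop class is invented for this sketch), not how any browser actually structures its loop.

```javascript
// A toy event loop: one FIFO queue of tasks, run strictly in order.
// Hypothetical teaching model, not any browser's implementation.
class ToyEventLoop {
  constructor() {
    this.queue = [];
  }

  // Append a task (a plain function) to the end of the queue.
  enqueue(task) {
    this.queue.push(task);
  }

  // Run tasks until the queue is empty. A task may enqueue more work,
  // which simply lands behind everything already waiting.
  run() {
    while (this.queue.length > 0) {
      const task = this.queue.shift();
      task(this);
    }
  }
}

const log = [];
const loop = new ToyEventLoop();
loop.enqueue(() => log.push("A"));
loop.enqueue((l) => {
  log.push("B");
  l.enqueue(() => log.push("D")); // queued while B runs, so it runs after C
});
loop.enqueue(() => log.push("C"));
loop.run();
console.log(log.join(" ")); // A B C D
```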

Let's start with a basic overview of the event loops section and then see if we can draw out some diagrams that roughly match what we see in the specification.

The first part of the specification tells us how integral event loops are: they are as intrinsic to the web platform as threads and processes are to the Windows operating system.

To coordinate events, user interaction, scripts, rendering, networking, ... must use event loops ...

I like the choice of words here. First, note the word coordinate. That doesn't mean execute arbitrarily, or even in order. It means exactly what it says: coordinate. This implies we may do some prioritization or reordering. This is actually key to browser performance, but it is also one of the reasons browsers differ from one another, and why even the same browser can behave differently under different timing conditions.

We also get an idea of the types of tasks we'll be coordinating: events, user interactions, scripts, etc. All of these are tasks in the system, and the specification clearly labels them as tasks. Well, mostly: in ES 6 we instead call them jobs, and they come in some variety, which we'll get into later.

The final key bit is the "must use" at the end. This is a hard requirement, because browsers won't agree if each of them goes trying to decipher this section of the specification and comes up with its own ideas. I will note there are numerous differences between the browsers today, mostly because this spec and its wording didn't even exist when the core event loops were written. I'm sure all 3 of Chrome, Firefox and Microsoft Edge had to bolt HTML 5 event loops onto their engines well before making them the core of those engines.

There are two kinds of event loops: those for browsing contexts, and those for workers.

Next we learn there is more than one kind of event loop. This is important. These two types of event loops likely share some basic concepts and then diverge, just a little bit, from one another in the more specific details.

The two kinds of event loops are for browsing contexts and for workers. In our case a browsing context is a document or page. I'll avoid the term from now on; just think of it as your HTML content. The reference to workers covers normal dedicated workers, shared workers (deprecated, kinda) and service workers.

... at least one browsing context event loop ... at most one per unit of related similar-origin browsing contexts

This tells us there is at least one event loop, and that an event loop can be shared across browsing contexts of similar origin. I actually believe most browsers substitute "thread" for "similar origin", so that all of your iframes, even those from different origins, share the same event loop. This is at least how Microsoft Edge and EdgeHTML work.
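A minimal sketch of that sharing, assuming a hypothetical registry that hands out one event loop per unit key. Keying by thread models the behavior I believe Edge has; keying by origin is a simplification of the spec's "similar-origin" grouping. All class and field names are made up for illustration.

```javascript
// Hypothetical model: event loops handed out per "unit" of browsing contexts.
class EventLoop {
  constructor(unitKey) {
    this.unitKey = unitKey;
    this.queue = [];
  }
  enqueue(task) {
    this.queue.push(task);
  }
  run() {
    while (this.queue.length > 0) this.queue.shift()();
  }
}

class EventLoopRegistry {
  constructor(keyFn) {
    this.keyFn = keyFn; // maps a browsing context to its unit key
    this.loops = new Map();
  }
  // Return the loop for this context, creating it on first use.
  loopFor(context) {
    const key = this.keyFn(context);
    if (!this.loops.has(key)) this.loops.set(key, new EventLoop(key));
    return this.loops.get(key);
  }
}

// A top-level page and a cross-origin iframe running on the same thread.
const page = { origin: "https://a.example", thread: 1 };
const frame = { origin: "https://b.example", thread: 1 };

// Keyed by thread: both contexts share one loop, so their tasks interleave
// in a single order.
const byThread = new EventLoopRegistry((ctx) => ctx.thread);
const log = [];
byThread.loopFor(page).enqueue(() => log.push("page task 1"));
byThread.loopFor(frame).enqueue(() => log.push("iframe task"));
byThread.loopFor(page).enqueue(() => log.push("page task 2"));
byThread.loopFor(page).run();
console.log(log.join(" | ")); // page task 1 | iframe task | page task 2

// Keyed by (simplified) origin: each context gets its own loop.
const byOrigin = new EventLoopRegistry((ctx) => ctx.origin);
console.log(byOrigin.loopFor(page) === byOrigin.loopFor(frame)); // false
```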

A browsing context event loop always has at least one browsing context. ... browsing contexts all go away, ... event loop goes away ...

This tells us that the lifetime of the event loop is tied to its list of browsing contexts. The event loop holds onto tasks, and tasks hold onto other memory, so this part is sensible technical advice on how to shut down and clean up memory when all of the pages the user is viewing go away, for example when the user closes the tab.
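To make the lifetime rule concrete, here is a toy model (all names hypothetical) in which a loop tracks its browsing contexts and, once the last one is removed, drops its pending tasks and refuses new ones.

```javascript
// Toy model: a browsing-context event loop that lives only as long as it has
// at least one browsing context. Names are illustrative, not from any engine.
class BrowsingContextEventLoop {
  constructor(firstContext) {
    this.contexts = new Set([firstContext]); // always at least one
    this.queue = [];
    this.alive = true;
  }
  addContext(ctx) {
    this.contexts.add(ctx);
  }
  removeContext(ctx) {
    this.contexts.delete(ctx);
    if (this.contexts.size === 0) {
      // Last context gone: drop pending tasks so the memory they hold
      // (closures, references to page objects) can be reclaimed.
      this.queue.length = 0;
      this.alive = false;
    }
  }
  enqueue(task) {
    if (!this.alive) throw new Error("event loop has gone away");
    this.queue.push(task);
  }
}

const loop = new BrowsingContextEventLoop("tab");
loop.addContext("iframe");
loop.enqueue(() => {});
loop.removeContext("iframe");
console.log(loop.alive); // true, the tab is still open
loop.removeContext("tab");
console.log(loop.alive, loop.queue.length); // false 0
```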

The next bit is on workers, which we'll skip for now. Because workers don't have a one-to-many relationship between the event loop and browsing contexts, they can use a more thread-like lifetime model.

An event loop has one or more task queues. A task queue is an ordered list of tasks, which are algorithms that are responsible for such work as ...

Here we learn about another concept, the task queue, and that an event loop has one or more of them. It turns out the event loop does not hold tasks directly; it relies on its task queues for that and takes the next task to run from one of them. We learn that each task queue is ordered, but we've not yet gotten a hint as to what that means. For now, let's assume insertion order, since the specification has not said otherwise.

We also learn that a task is an algorithm responsible for some unit of work. The next bit of the specification simply lists the types of work and the recommended task queues. I'll introduce this instead through a diagram and then fall back to the spec text to further describe these units of work.

The HTML 5 specification tries to describe at least 5 task queues and units of work. It does leave off one critical piece which I'll add into the list as #6. Let's step through each and see how closely we can sync the spec to something you'd likely deal with every day.

1. Events - An event might be a message event sent via postMessage. I generally refer to these as async events, and Edge implements them using an async event queue style approach.
2. Parsing - These are hidden from the web developer for now, except where these actions fire other synchronous events, or in later specifications where the web developer can be part of the HTML parsing stack.
3. Callbacks - A callback is generally a setTimeout, setInterval or setImmediate callback that is dispatched from the event loop when its timer is ready. requestAnimationFrame is also a callback, but it executes as part of rendering, not as a core task.
4. Using a Resource - These are generally download callbacks. At least that is how I read the specification here. This would be your progress, onload and onreadystatechange types of events. The spec refers to fetch here, meaning the Fetch Standard's fetch algorithm that underlies resource loading in general, not the newer Promise-based fetch() API.
5. DOM Manipulation - This task queue probably relates to DOM mutation events such as DOMAttrModified. I think most browsers fire these synchronously (not as tasks). Also, these are events, so I believe that in the case of Microsoft Edge they will fire in task queue 1.
6. Input - This is the task queue I'm adding. Input must be delivered in order, so it belongs in its own queue. Also, the specification allows input to be prioritized over all other task queues to prevent input starvation.
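A sketch of how task sources like the six above might be routed to separate task queues. The round-robin choice between queues is an assumption made purely for illustration, since the spec leaves that choice to the browser. Note how tasks within one queue keep their insertion order while the overall run order differs from the enqueue order, which is the "coordinate" latitude discussed earlier.

```javascript
// Hypothetical routing of task sources to task queues. The round-robin
// policy between queues is an assumption, not something the spec mandates.
const SOURCES = ["events", "parsing", "callbacks", "resources", "dom", "input"];

class MultiQueueLoop {
  constructor() {
    this.queues = new Map(SOURCES.map((s) => [s, []]));
    this.next = 0; // index of the queue to try first on the next turn
  }
  queueTask(source, task) {
    const q = this.queues.get(source);
    if (!q) throw new Error(`unknown task source: ${source}`);
    q.push(task); // insertion order within a queue
  }
  // One turn: starting at this.next, find the first non-empty queue and run
  // its oldest task. Returns false when every queue is empty.
  runOnce() {
    for (let i = 0; i < SOURCES.length; i++) {
      const idx = (this.next + i) % SOURCES.length;
      const q = this.queues.get(SOURCES[idx]);
      if (q.length > 0) {
        this.next = (idx + 1) % SOURCES.length;
        q.shift()();
        return true;
      }
    }
    return false;
  }
  run() {
    while (this.runOnce()) {}
  }
}

const log = [];
const loop = new MultiQueueLoop();
loop.queueTask("input", () => log.push("keydown"));
loop.queueTask("callbacks", () => log.push("timeout 1"));
loop.queueTask("input", () => log.push("keyup"));
loop.queueTask("callbacks", () => log.push("timeout 2"));
loop.run();
console.log(log.join(", ")); // timeout 1, keydown, timeout 2, keyup
```

Input still arrives in order (keydown before keyup) because it lives in its own queue, even though the loop interleaves it with timer callbacks.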

One thing to note is that the specification is very loose. While it starts strong with a bunch of musts and requirements for how browsers implement the loop, it then gets very weak, recommending "at least one" task queue. It then describes a set of task queues that doesn't really map to the full range of tasks a browser has to deal with. I think this is a spec limitation we should remedy, since as a browser vendor and implementer it prevents me from implementing new features that are immediately interoperable with other browsers.

I'm going to end the dissection here and continue later with details on how a browser is supposed to insert a task, what task sources are, and what data is associated with a task. That will probably walk through the full execution of such a task, and so will also include the definition and execution of micro-tasks and micro-task checkpoints. Fun stuff, I hope you are as excited as I am ;-)

Wednesday, December 30, 2015

EdgeHTML on Past and Future Promises

This entire post is going to be about how EdgeHTML schedules ES 6 Promises, why we made the decisions we did, and the work we have scheduled for the future to correct the interop differences we've created. If you thought it was about secret features, then you will be disappointed.

The starting point for this article is a lot of really cool black-box investigation that Jake Archibald did when he wanted to chat about micro-tasks in the browser. What he found was that at least one implementation, the one supplied by Chakra and EdgeHTML, didn't adhere to the latest reading of the specifications. I highly recommend reading his article first to get an understanding of some of the basic concepts, presented in what I think is a very approachable form. Especially cool are the live visualizations that you can run in different browsers. My post, sadly, won't have live visualizations. I'll put every bit of time not spent writing visualizations toward fixing bugs related to the HTML 5 event loop in EdgeHTML instead, deal?

Why Are Promises in EdgeHTML not Micro-tasks?

When we were spec'ing Promises in Chakra and EdgeHTML, we were doing so very early. The Chakra team is constantly contributing to the various ECMAScript specifications, so we had a very early version of the Promises spec from the working group. We wanted to get something working really fast, perhaps a running prototype, so we could give feedback (at least one meeting was before IE 11 shipped, and another right after it shipped, when we were considering adding some extra features). While that never came to be, it locked our development design specs in pretty early with something we thought was pretty solid.

When we first started, our conversations were about what a Job was. Jobs are how ES 6 defines the execution of the callbacks associated with a Promise. You can view the spec language here (Promise Jobs) and here (Jobs and Job Queues) if you want to try to figure it out yourself. You'll probably come to the same conclusion we did: there isn't a clear relationship between the ECMAScript spec and the HTML 5 spec, per se.

This meant our first round of thinking was about whether the JavaScript engine would have its own event loop and task queuing system. We know from experience what happens when too many schedulers run on the same thread. We felt this was bad, and that having to coordinate another event loop dependency across a major component boundary would lead to various kinds of starvation. While Chakra and EdgeHTML are very close, we still like to keep our components separated enough that we don't sacrifice agility, without which ChakraCore might not exist today...

In our second meeting we mostly discussed the concepts HTML 5 already had here. There was this HTML 5 event loop thing, and it proposed task queues and task sources and all kinds of coolness. However, it wasn't well defined. For instance, it only generically lists task sources and doesn't say explicitly how many task queues there are. There is even a bit of text suggesting that user input could be given priority over other tasks "three quarters of the time". When you are trying to build an interoperable browser in conjunction with several other huge companies, this kind of ambiguity is really not helpful.
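For a concrete feel of that "three quarters of the time" wording, here is one way such a policy could look: a hypothetical loop with an input queue and an everything-else queue that prefers input on three turns out of every four, falling back to whichever queue has work. The counter scheme is an assumption; the spec does not prescribe one.

```javascript
// Hypothetical weighted scheduler: prefer the input queue 3 turns out of 4,
// so input stays responsive without starving other work.
class WeightedLoop {
  constructor() {
    this.input = [];
    this.other = [];
    this.turn = 0;
  }
  // On turns 0, 1, 2 (mod 4) prefer input; on turn 3 prefer other work.
  // Fall back to whichever queue is non-empty; null means all queues are empty.
  pickQueue() {
    const preferInput = this.turn % 4 !== 3;
    const [first, second] = preferInput
      ? [this.input, this.other]
      : [this.other, this.input];
    if (first.length > 0) return first;
    if (second.length > 0) return second;
    return null;
  }
  run() {
    let q;
    while ((q = this.pickQueue()) !== null) {
      q.shift()();
      this.turn++;
    }
  }
}

const log = [];
const loop = new WeightedLoop();
for (let i = 1; i <= 6; i++) loop.input.push(() => log.push(`input${i}`));
for (let i = 1; i <= 2; i++) loop.other.push(() => log.push(`other${i}`));
loop.run();
console.log(log.join(" "));
// input1 input2 input3 other1 input4 input5 input6 other2
```

Two implementations could both claim to follow the note while using different counters or window sizes, which is exactly the kind of ambiguity that makes interop hard.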

We decided that a Promise callback was close enough to a setTimeout(0), and that we liked the priority of that model enough, to merge our Promise Job queue with our setTimeout "Task Queue". In reality, EdgeHTML has only dipped a toe into the HTML 5 event loop itself, and even timeouts are not really in their own task queue, but I'll get to that a bit later.

This was enough to complete our spec writing: Jobs == Task Queues and Promise Jobs == setTimeouts. This would be the interface through which the Chakra engine would register work for us to then properly interlace with the rest of the work the system had to do.

How are Promises actually Timeouts?

There is a very real trend in the browser industry to create more and more new features by building on top of foundations that already exist. When a new feature is just too fresh, we can implement it using a polyfill. A polyfill can also be used for an older feature that has low overall market usage but is critical to some segment, and that we don't plan on updating, as we did for XPath support. So please don't be surprised by the following line of code.

Okay, it's not quite that. We don't actually execute code like that every time we want to register a Promise callback. If we did, it would be a nightmare, since the page could try to intercept the calls and do bad things, or simply break itself without knowing why. Instead, we share the implementation of setTimeout with the instance of the Chakra script engine that was created for a given document. This got us close enough to the concept of an event loop scheduler function that we were happy. And yes, Chakra literally calls that function with a Function object (your callback, whether it is your fulfillment or your rejection handler) and a delay of 0.
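A hypothetical model of that arrangement, not EdgeHTML or Chakra code: the host owns a queue of 0ms timers, and the engine is handed the host's internal scheduling function, which it calls with the reaction callback and 0. Every name here (Host, ToyEngine, scheduleTimeout, enqueuePromiseReaction) is invented for the sketch.

```javascript
// Hypothetical model: the engine shares the host's internal setTimeout
// implementation and uses it for every promise reaction.
class Host {
  constructor() {
    this.timerQueue = []; // 0ms timers only, kept in registration order
  }
  // Stand-in for the host's internal setTimeout. Every timer in this model
  // is a 0ms timer, so the delay argument is accepted but not used.
  scheduleTimeout(callback, delay) {
    this.timerQueue.push(callback);
  }
  runTimers() {
    while (this.timerQueue.length > 0) this.timerQueue.shift()();
  }
}

class ToyEngine {
  constructor(scheduleTimeout) {
    this.scheduleTimeout = scheduleTimeout; // shared with the host
  }
  // When a promise reaction is ready, pass the settled value to the
  // fulfillment or rejection handler via a 0ms "timeout".
  enqueuePromiseReaction(handler, value) {
    this.scheduleTimeout(() => handler(value), 0);
  }
}

const host = new Host();
const engine = new ToyEngine((cb, ms) => host.scheduleTimeout(cb, ms));
const log = [];
host.scheduleTimeout(() => log.push("page timeout"), 0);
engine.enqueuePromiseReaction((v) => log.push(`then(${v})`), 42);
host.runTimers();
console.log(log.join(", ")); // page timeout, then(42)
```

Because both kinds of work land in the same queue, a page timeout registered first always runs before the promise reaction.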

Well, as you might be able to tell by now, this is a discoverable implementation of the feature. In fact, Jake was able to describe pretty accurately in his article what we were doing, even though he didn't have access to the code. Simply schedule a 0 timeout yourself, then resolve a Promise, and see which callback you get first. Since all 0 timeouts are properly serialized, the Promise callback, being a 0 timeout itself, gets serialized along with them.
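The probe described above is only a few lines of JavaScript. This sketch uses nothing but the standard setTimeout and Promise, so it runs in a browser console or in Node; the 10ms delay on the reporting timeout is an arbitrary choice to let both callbacks run first.

```javascript
// Queue a 0ms timeout, then resolve a promise, and record which runs first.
const order = [];

setTimeout(() => order.push("timeout"), 0);
Promise.resolve().then(() => order.push("promise"));

// Report once both callbacks have had a chance to run.
setTimeout(() => {
  console.log(order.join(" -> "));
  // Promise callbacks as micro-tasks:  promise -> timeout
  // Promise callbacks as 0ms timeouts: timeout -> promise
}, 10);
```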

We could have gone further and hidden some of this behavior by making Promise callbacks fire before all other 0 timeouts, but that work wouldn't have gotten us close enough to the necessary, and now spec'ed, micro-task behavior we need to be truly interoperable. Admittedly, it would have fixed some sites, and that is generally reason enough, but it might also have made it easier for the web to become dependent on our broken behavior.

There you go: in EdgeHTML, Promise callbacks really are setTimeouts. They really go through the same internal code paths as existing window.setTimeout calls, and there is no special magic that groups them together, so they get interlaced with setTimeouts registered by the page. Clearly a MUST FIX ;-)
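To see why the interlacing matters, here is a small simulation contrasting the two models on the same script: a two-step promise chain is started, then the page sets a timeout. In "timeout" mode a promise reaction is just another 0ms timer; in "microtask" mode reactions go to a queue that is drained completely after the script and after each task. It is a simplified model, not either engine's actual code.

```javascript
// Simplified comparison of "promise reaction = 0ms timer" with
// "promise reaction = micro-task".
function simulate(mode) {
  const log = [];
  const timers = []; // the page's 0ms timers, in registration order
  const microtasks = []; // only used in "microtask" mode
  const queueReaction = (fn) =>
    mode === "timeout" ? timers.push(fn) : microtasks.push(fn);
  const checkpoint = () => {
    while (microtasks.length > 0) microtasks.shift()();
  };

  // The script: start a promise chain, then set a page timeout.
  queueReaction(() => {
    log.push("then 1");
    queueReaction(() => log.push("then 2")); // the chained .then()
  });
  timers.push(() => log.push("page timeout"));

  checkpoint(); // end of script
  while (timers.length > 0) {
    timers.shift()();
    checkpoint(); // after every task
  }
  return log;
}

console.log(simulate("timeout").join(", ")); // then 1, page timeout, then 2
console.log(simulate("microtask").join(", ")); // then 1, then 2, page timeout
```

In the timeout model the page's timer lands between the two steps of the chain; in the micro-task model the whole chain finishes before any timer runs.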

Promises towards a Brighter Future

This particular situation has helped us really rethink our existing event loop. The specifications are getting a lot better and defining things more clearly, and simply obeying them in spirit is starting to fall short of the end user experience we want. While we've gotten this far using a COM STA loop with an ad-hoc task scheduler that has no concept of task sources, task queues or similar-origin browsing contexts, this situation really can't last. If the web is really the OS for the next generation of applications and hopes to supplant existing OS-centric application models, then things like the threading model and scheduling become part of its domain and must be well defined.

Too deep? Yeah, I'm thinking so too ;-) I'll get into more details on the HTML 5 event loop in some future posts when I dig in really deep on hosting models, COM and Win32. For now, let's just fix Promises!

It turns out the bright future for our Promise implementation isn't far off, nor is it much of a departure from the architecture we already have in place. We already have a micro-task queue, which we use for Mutation Observers. We also have a communication channel through which Chakra gets our setTimeout Function implementation. Our immediate goal is to rewire that channel so Chakra can instead submit Jobs to us, as the host environment, which will then give us control to route them wherever we want.
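A rough sketch of that rewiring, under stated assumptions: the engine hands each Job to the host along with a queue name, and the host routes "PromiseJobs" (the queue name ES 6 uses for promise reactions) to its micro-task queue, draining it after every task. The Host class and its methods are hypothetical.

```javascript
// Hypothetical host that receives Jobs from the engine and routes them.
class Host {
  constructor() {
    this.tasks = [];
    this.microtasks = [];
  }
  // Entry point the engine would call for each Job.
  enqueueJob(queueName, job) {
    if (queueName === "PromiseJobs") {
      this.microtasks.push(job);
    } else {
      this.tasks.push(job); // anything else: treat as an ordinary task
    }
  }
  queueTask(task) {
    this.tasks.push(task);
  }
  // Micro-task checkpoint: run until empty, including jobs queued meanwhile.
  performMicrotaskCheckpoint() {
    while (this.microtasks.length > 0) this.microtasks.shift()();
  }
  run() {
    while (this.tasks.length > 0) {
      this.tasks.shift()();
      this.performMicrotaskCheckpoint();
    }
  }
}

const host = new Host();
const log = [];
host.queueTask(() => {
  log.push("task A");
  // What the engine would do when a promise settles inside task A.
  host.enqueueJob("PromiseJobs", () => {
    log.push("promise job 1");
    host.enqueueJob("PromiseJobs", () => log.push("promise job 2"));
  });
});
host.queueTask(() => log.push("task B"));
host.run();
console.log(log.join(", "));
// task A, promise job 1, promise job 2, task B
```

Keeping the routing decision in the host is what lets the same engine work with a different scheduling policy without changes on its side.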

Since we have a micro-task queue in place, fixing the bug should be a matter of routing Promise Jobs to that queue. Nothing is ever easy though, and we'll have to consider the ramifications of executing Promise callbacks in that code and the interplay with Mutation Observers. We'll also be looking at how the other browsers interleave micro-tasks. For instance, do mutation observers and promises interlace (a unified queue) or do they get split into their own queues? The current specifications define only one task source for the micro-task queue, the microtask task source, so our tests will hopefully validate the unified queue behavior, and we'll be able to deliver an interoperable native Promise implementation in the very near future!
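One way to frame that question is to model both policies and compare their output. In this hypothetical sketch, "unified" puts mutation-observer notifications and promise jobs in one micro-task queue in enqueue order, while "split" gives each kind its own queue and (arbitrarily) drains mutation notifications first. As in the DOM Standard, back-to-back mutations coalesce into a single pending notification. Neither policy is claimed to match a particular browser.

```javascript
// Hypothetical model comparing a unified micro-task queue with split queues.
function runCheckpoint(policy) {
  const log = [];
  const promiseQueue = [];
  // In the unified policy both kinds of work share the same array.
  const mutationQueue = policy === "unified" ? promiseQueue : [];
  let notifyQueued = false;
  let pendingRecords = [];

  const queuePromiseJob = (name) => promiseQueue.push(() => log.push(name));

  // Record a mutation; only one notification micro-task is pending at a
  // time, so consecutive mutations are delivered together.
  const recordMutation = (what) => {
    pendingRecords.push(what);
    if (notifyQueued) return;
    notifyQueued = true;
    mutationQueue.push(() => {
      notifyQueued = false;
      log.push(`observer(${pendingRecords.join("+")})`);
      pendingRecords = [];
    });
  };

  // The "script": a promise job, two DOM mutations, another promise job.
  queuePromiseJob("promise 1");
  recordMutation("attr");
  recordMutation("child");
  queuePromiseJob("promise 2");

  // Micro-task checkpoint: keep going until both queues are empty. In the
  // split policy the mutation queue is always checked first.
  while (mutationQueue.length > 0 || promiseQueue.length > 0) {
    const queue = mutationQueue.length > 0 ? mutationQueue : promiseQueue;
    queue.shift()();
  }
  return log;
}

console.log(runCheckpoint("unified").join(", "));
// promise 1, observer(attr+child), promise 2
console.log(runCheckpoint("split").join(", "));
// observer(attr+child), promise 1, promise 2
```

A browser test for the real thing would do the same with a MutationObserver and a couple of Promise.resolve().then calls, and check which of these two orders comes out.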