Continuous Integration: Dissecting the HTML 5 Event Loop

In this article I hope to introduce the general audience of web developers to the concept that HTML 5.1 calls event loops (WHATWG living standard, W3C old standard).

Taken in isolation, event loops seem fairly natural. They are just a series of tasks, maybe categorized into groups, and then executed in order. Many modern languages and task frameworks now have these concepts baked in deeply, offering some mixture of the concepts built into the HTML 5 event loops.

Let's start with a basic overview of the event loops section and then see if we can draw out some diagrams that roughly match what we see in the specification.

The first part of the specification tells us how integral event loops are. They are as important or as intrinsic to the system as are threads and processes to the Windows operating system.

To coordinate events, user interaction, scripts, rendering, networking, ... must use event loops ...

I like the choice of words here. First, note coordinate. That doesn't mean execute arbitrarily, or even in order. It means exactly what it says: coordinate. This implies we may do some prioritization or reordering. That is actually key to browser performance, but it is also one of the reasons for differences between browsers, or even within the same browser under different timing conditions.

We also get an idea of the types of tasks we'll be coordinating: events, user interactions, scripts, and so on. All of these are tasks in the system, and the specification clearly labels them as tasks. Well, mostly. Sometimes, as in ES 6, we call them jobs instead, and they can take on some variety, which we'll get into later.

The final key bit is the must use piece at the end. This generally means that browsers won't agree with one another if they each try to decipher this section of the specification and come up with their own ideas. I will note there are numerous differences between the browsers today, mostly because this spec and its wording did not exist when the core event loops were written. I'm sure all 3 of Chrome, Firefox and Microsoft Edge had to bolt on HTML 5 event loops well before making them the core of their engine.

There are two kinds of event loops: those for browsing contexts, and those for workers.

Next we learn there is more than one kind of event loop. This is important. Likely these two types of event loops are going to share some basic concepts and then diverge, just a little bit, from one another in the more specific details.

The two kinds of event loops are for browsing contexts and workers. In our case browsing contexts are documents or pages. I'll avoid the term from now on. Just think of it as your HTML content. The reference to workers covers normal dedicated workers, shared workers (deprecated, kinda) and service workers.

... at least one browsing context event loop ... at most one per unit of related similar-origin browsing contexts

This just tells us there is always at least one event loop, and that a single event loop is shared across related browsing contexts of similar origin. I actually believe that most browsers substitute thread for similar origin, such that all of your iframes, even those in different origins, are in the same event loop. This is at least how Microsoft Edge and EdgeHTML work.
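To make the grouping rule concrete, here is a minimal sketch of how contexts might be assigned to event loops. Every name in it (BrowsingContextInfo, SharedLoop, assignLoops, the unitKey and threadId fields) is my own invention for illustration, not something from the specification or from any browser. It only contrasts grouping by similar-origin unit with grouping by thread, which, as I read the "at most one" wording, is still allowed since no unit ends up split across loops.

```typescript
// Hypothetical model: none of these names come from the specification.
interface BrowsingContextInfo {
  name: string;
  unitKey: string;  // stands in for "unit of related similar-origin browsing contexts"
  threadId: number; // stands in for the thread hosting the context
}

class SharedLoop {
  readonly contexts: string[] = [];
}

// Groups contexts into loops using whatever key the caller picks.
function assignLoops(
  contexts: BrowsingContextInfo[],
  keyOf: (context: BrowsingContextInfo) => string,
): Map<string, SharedLoop> {
  const loops = new Map<string, SharedLoop>();
  for (const context of contexts) {
    const key = keyOf(context);
    let loop = loops.get(key);
    if (loop === undefined) {
      loop = new SharedLoop();
      loops.set(key, loop);
    }
    loop.contexts.push(context.name);
  }
  return loops;
}

const pageContexts: BrowsingContextInfo[] = [
  { name: "top page", unitKey: "unit-a", threadId: 1 },
  { name: "same-origin iframe", unitKey: "unit-a", threadId: 1 },
  { name: "cross-origin iframe", unitKey: "unit-b", threadId: 1 },
];

// One loop per similar-origin unit: two loops here.
const bySpecUnit = assignLoops(pageContexts, (context) => context.unitKey);
// One loop per thread: every iframe lands in the same loop.
const byThread = assignLoops(pageContexts, (context) => `thread-${context.threadId}`);

console.log(bySpecUnit.size, byThread.size);
```

In the per-thread grouping the cross-origin iframe shares a loop with the top page, which is the behavior I described above for EdgeHTML.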

A browsing context event loop always has at least one browsing context. ... browsing contexts all go away, ... event loop goes away ...

This tells us that the lifetime of the event loop is tied to its list of browsing contexts. The event loop holds onto tasks and tasks hold onto other memory so this part is a bit of rational technical advice on how to shut down and clean up memory when all of the pages the user is viewing go away, aka closing the tab.
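A rough sketch of that lifetime rule might look like the following. The class ContextBoundLoop and its methods are hypothetical names of mine, and a real engine has far more to tear down, but it shows the loop going away, and dropping its pending tasks, once the last browsing context detaches.

```typescript
// Hypothetical model; the class and method names are illustrative, not from the spec.
class ContextBoundLoop {
  private contexts = new Set<string>();
  private pending: Array<() => void> = [];
  private alive = true;

  // An event loop always starts with at least one browsing context.
  constructor(firstContextId: string) {
    this.contexts.add(firstContextId);
  }

  attach(contextId: string): void {
    if (!this.alive) throw new Error("event loop has already gone away");
    this.contexts.add(contextId);
  }

  // Detaching the last context tears the loop down and drops its tasks,
  // releasing whatever memory those tasks were holding onto.
  detach(contextId: string): void {
    this.contexts.delete(contextId);
    if (this.contexts.size === 0) {
      this.pending = [];
      this.alive = false;
    }
  }

  // Returns false when the loop is gone and the task was not accepted.
  enqueue(task: () => void): boolean {
    if (!this.alive) return false;
    this.pending.push(task);
    return true;
  }

  get isAlive(): boolean {
    return this.alive;
  }

  get pendingCount(): number {
    return this.pending.length;
  }
}

const tabLoop = new ContextBoundLoop("top page");
tabLoop.attach("iframe");
tabLoop.enqueue(() => console.log("never runs in this sketch"));

tabLoop.detach("iframe");
const aliveWithOneContext = tabLoop.isAlive; // the top page is still attached
tabLoop.detach("top page"); // the user closes the tab
const acceptedAfterClose = tabLoop.enqueue(() => {});

console.log(aliveWithOneContext, tabLoop.isAlive, tabLoop.pendingCount, acceptedAfterClose);
```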

The next bit is on workers, which we'll skip for now. Because workers don't have a one-to-many relationship between the event loop and the browsing contexts, they can use a more thread-like lifetime model.

An event loop has one or more task queues. A task queue is an ordered list of tasks, which are algorithms that are responsible for such work as ...

Here we meet another concept, the task queue, one or more of which hang off of the event loop. It turns out the event loop does not manage tasks directly; it relies on its task queues to hold and order them. We learn that each task queue is ordered, but we've not yet gotten a hint as to what this means. For now, let's assume insertion order is in play here since the specification has not said otherwise.
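Under that insertion-order assumption, the relationship between a loop and its queues can be sketched in a few lines. FifoTaskQueue and MiniEventLoop are made-up names, and the rule for choosing which queue goes next (first non-empty queue in the list) is an arbitrary choice of mine; the specification leaves that choice to the browser.

```typescript
// A minimal sketch, assuming insertion order within each queue.
type SimpleTask = () => void;

class FifoTaskQueue {
  private tasks: SimpleTask[] = [];

  enqueue(task: SimpleTask): void {
    this.tasks.push(task);
  }

  // Removes and returns the oldest task, or undefined when the queue is empty.
  dequeueOldest(): SimpleTask | undefined {
    return this.tasks.shift();
  }
}

class MiniEventLoop {
  private queues: FifoTaskQueue[];

  constructor(queues: FifoTaskQueue[]) {
    this.queues = queues;
  }

  // One turn of the loop: run the oldest task of the first non-empty queue.
  runOneTask(): boolean {
    for (const queue of this.queues) {
      const task = queue.dequeueOldest();
      if (task !== undefined) {
        task();
        return true;
      }
    }
    return false;
  }

  runUntilIdle(): void {
    while (this.runOneTask()) {
      // keep taking turns until every queue is empty
    }
  }
}

const executionLog: string[] = [];
const timerQueue = new FifoTaskQueue();
const networkQueue = new FifoTaskQueue();
const miniLoop = new MiniEventLoop([timerQueue, networkQueue]);

// The network task is posted first, but it lives in a different queue.
networkQueue.enqueue(() => { executionLog.push("load 1"); });
timerQueue.enqueue(() => { executionLog.push("timer 1"); });
timerQueue.enqueue(() => { executionLog.push("timer 2"); });

miniLoop.runUntilIdle();
console.log(executionLog.join(", ")); // timer 1, timer 2, load 1
```

Order holds inside each queue, yet the task that was posted first overall runs last. That is the kind of reordering the word coordinate leaves room for.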

We also learn that a task is an algorithm responsible for work. The next bit of the specification simply begins listing the types of work and the recommended task queues. I'll introduce this instead through a diagram and then fall back to the spec text to further describe these units of work.

The HTML 5 specification tries to describe at least 5 task queues and units of work. It does leave off one critical piece which I'll add into the list as #6. Let's step through each and see how closely we can sync the spec to something you'd likely deal with every day.

1. Events - An event might be a message event sent via postMessage. I generally refer to these as async events and Edge implements this using an async event queue style approach.
2. Parsing - These are hidden to the web developer for now except where these actions fire other synchronous events or in later specifications where the web developer can be part of the HTML parsing stack.
3. Callbacks - A callback is generally a setTimeout, setInterval or setImmediate that is then dispatched from the event loop when its timer is due. requestAnimationFrame is also a callback, but executes as part of rendering, not as a core task.
4. Using a Resource - These are generally download callbacks. At least that is how I read the specification here. This would be your progress, onload, onreadystatechange types of events. The spec refers to fetch here, which now uses Promises, so this may be a bug in the specification.
5. DOM Manipulation - This task queue probably relates to DOM mutation events such as DOMAttrModified. I think most browsers fire these synchronously (not as tasks). Also, these are events, so I believe that in the case of Microsoft Edge these will fire in task queue 1.
6. Input - This is now a task queue that I'm adding in. Input must be delivered in order so it belongs to its own queue. Also, the specification allows for prioritizing input over all other task queues to prevent input starvation.
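The six queues above, with input given preference, can be sketched as follows. The queue names, the PrioritizingLoop class and the strict "input always wins" policy are all assumptions of mine for illustration. A real browser would bound how long input can jump the line so that the other queues are not starved in turn.

```typescript
// Hypothetical queue names mirroring the list above; not taken from the spec.
type QueueName = "events" | "parsing" | "callbacks" | "resources" | "dom" | "input";

class PrioritizingLoop {
  private priority: QueueName[];
  private queues = new Map<QueueName, Array<() => void>>();

  // `priority` lists the queues from most to least preferred.
  constructor(priority: QueueName[]) {
    this.priority = priority;
    for (const queueName of priority) {
      this.queues.set(queueName, []);
    }
  }

  post(queueName: QueueName, task: () => void): void {
    const queue = this.queues.get(queueName);
    if (queue === undefined) throw new Error(`unknown queue: ${queueName}`);
    queue.push(task);
  }

  // Each turn runs the oldest task of the most preferred non-empty queue.
  runUntilIdle(): void {
    for (;;) {
      const next = this.priority
        .map((queueName) => this.queues.get(queueName))
        .find((queue) => queue !== undefined && queue.length > 0);
      if (next === undefined) return;
      const task = next.shift();
      if (task !== undefined) task();
    }
  }
}

const runOrder: string[] = [];
const sketchLoop = new PrioritizingLoop([
  "input", "events", "parsing", "callbacks", "resources", "dom",
]);

sketchLoop.post("callbacks", () => { runOrder.push("setTimeout callback"); });
sketchLoop.post("events", () => { runOrder.push("message event"); });
sketchLoop.post("input", () => { runOrder.push("keydown"); });
sketchLoop.post("input", () => { runOrder.push("keyup"); });

sketchLoop.runUntilIdle();
console.log(runOrder.join(" -> ")); // keydown -> keyup -> message event -> setTimeout callback
```

Both input tasks run first and stay in their original order, even though they were posted last.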

One thing to note is that the specification is very loose. While it started strong with a bunch of musts and requirements for how browsers implement the loop, it then gets very weak, recommending "at least one" task queue. It then describes a set of task queues which really doesn't map to the full range of existing tasks that a browser has to deal with. I think this is a spec limitation that we should remedy since, as a browser vendor and implementer, it prevents me from implementing new features that are immediately interoperable with other browsers.

I'm going to end the dissection here and then continue later with details on how a browser is supposed to insert a task, what task sources are, and what data is associated with a task. This will probably dive through the full execution of such a task and so will also include the definition of and execution for micro-tasks and micro-task checkpoints. Fun stuff, I hope you are as excited as I am ;-)

Sunday, February 21, 2016

Dissecting the HTML 5 Event Loop - Loops, Task Queues and Tasks
