Sunday, May 3, 2015

JavaScript Type System Evolution from MSHTML to EdgeHTML

In new Microsoft fashion I left it up to the community to decide what kind of article I would write next. A quick poll with options like how we build our type system or how to write a real-time game loop was put up. After a day, the results seem to be 15 of 27 votes for information on how the EdgeHTML Type System is built. You can view the set of questions and the full results here, http://www.strawpoll.me/4262678/r.

Now, to be clear, this is still my personal blog. The content will mostly be rooted in standards and inspection of the browser, but it will be backed by an intimate knowledge of the architecture. So expect little but be surprised ;-)

Evolution of Prototypes in the DOM

In IE 7 and older modes, IE was a purely COM extensible architecture. The binding from script to the DOM was all through this COM binding, specifically IDispatchEx. This was a complicated interface but its abstraction allowed for some very complicated scenarios. If IDispatchEx were the only binding method then this story might have unfolded differently. Perhaps with an IDispatchEx2 or similar, but it wasn't all there was, and in fact, most of IE was simply an IDispatchEx wrapper around real COM interfaces. COM interfaces favor static binding over dynamic binding and thus most of our DOM was statically, tightly bound to legacy while we wanted to move forward, progressively improving. Something had to give.

Well, IE 8 would not be the release for dramatic improvement on this front. With over 3000 APIs already bound to COM, some sort of stopgap needed to be inserted. It would come in the form of Document Modes and those in turn would drive OM Versions. With 8 we introduced a compatibility mode and an IE 8 mode and, with some tricks played on top of IDispatchEx, we could get any script callers who did dynamic binding to dynamically bind to an alternate version of the interface based on their document mode. This seemed fairly clever at the time and in fact, any system with two versions scales pretty well. The fault would come when we started adding more modes.

The trick was simple, have the script engine know its OM Version and have it pass this information when querying for the interface mapping. You could even get clever and have the same document pretend to be 8 and compat at the same time, since the caller was in charge of noting their version. This also allowed for some clever scenarios. We would eventually rip this out in favor of version locking and callee based versioning in IE 9, but this in turn caused a slightly different set of problems.

But what about prototypes? Well, IE 8 was released with MDP or Mutable DOM Prototypes. I can find almost no public information about it, but it is out there, and we did write MSDN documentation for it. It allowed you to use some of the later ES 5 methods like defineProperty to reconfigure IE. This was quite a bit of work, was built on top of IDispatchEx, and obeyed all of the OM versioning logic. For the first time, IE would push constructors into the window global type system, allow you to add your own property getters/setters/methods, and those objects would be included in name resolution when script was trying to figure out what to execute on a given instance. We built a minimal inheritance support (so you could derive types), we allowed wrapping of our own native methods so you could do composition, the delete operator started working more reliably. If it weren't completely bolted on to a legacy system whose performance was suffering, with early versions of Chrome demonstrating much faster script execution, it might have stuck.

But it couldn't stick. With slow DOM performance compared to other browsers IE needed an overhaul. We also began to find that not all of the features in ES 5 were going to be possible to implement. And the final straw was the versioning semantics that we had been employing. As we added more and more COM interfaces to provide versioned characteristics of the same API (some APIs have 4 different implementations) the composition problem started to get really bad. COM could undercut your API replacement. If you wanted to log all calls to setAttribute, for instance, there were some callers that could avoid you by going through older internal methods without being redirected properly.
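To make the logging scenario concrete, here is a minimal sketch of wrapping a prototype method so that every call is recorded before being forwarded to the original. `FakeElement` and its `setAttribute` are stand-ins invented for this example so that it runs outside a browser; in a browser the same pattern would target `Element.prototype.setAttribute`. It does not model the older COM paths that could slip past such a wrapper.

```javascript
// Stand-in for a DOM type (hypothetical; lets the sketch run without a browser).
function FakeElement() {
  this.attributes = {};
}
FakeElement.prototype.setAttribute = function (name, value) {
  this.attributes[name] = String(value);
};

// Keep a reference to the original, then replace it with a logging wrapper.
const calls = [];
const originalSetAttribute = FakeElement.prototype.setAttribute;
Object.defineProperty(FakeElement.prototype, "setAttribute", {
  value: function (name, value) {
    calls.push([name, value]);                          // record the call
    return originalSetAttribute.apply(this, arguments); // forward to the original
  },
  writable: true,
  enumerable: true,
  configurable: true
});

const el = new FakeElement();
el.setAttribute("id", "main");
```

The wrapper only sees callers that resolve `setAttribute` through the prototype. Any caller that reaches the underlying implementation some other way is invisible to it, which is the composition problem described above.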

I apologize for all of the history, take a short breather, and we'll move into the modern and stop talking about COM.

Enter IE 9 and FastDOM

IE 9 was when we introduced a brand new script engine which everyone now knows as Chakra. The primary goals behind Chakra were to improve native JavaScript execution speed and garbage collection, and to improve overall browser performance. Thankfully, we all knew that the existing COM binding was not going to be fast enough for us to build on, since it entailed crossing component boundaries. At the same time Chakra had this insanely fast built-in type system model, with JIT support for type system lookups and all of the new ES 5 bells and whistles you could think of. So it was natural that we would create our new type system natively in the script engine. That would solve the lookup problems, but we'd be left with the binding problems. We also needed a way to describe the type system and bindings.

Well, that got to be my full-time project for a release and is what we called FastDOM. FastDOM is a set of compile-time and run-time components for interfacing native JavaScript code and its type system with instances of browser components. It's the boundary between the managed JavaScript and the native browser. Most importantly, due to prioritizing JavaScript over COM, it also meant the latter could be much thinner. JavaScript got the fast route and COM got to be the wrapper, not the other way around.

Don't get me wrong, this transformation doesn't happen overnight. We had, at the time, over 3400 properties and methods to convert. So there were a lot of "go to the COM" redirection bindings. For critical APIs though, we removed this redirection and made super lightweight bindings instead. I'll get more into the details of this a bit later.

So FastDOM was first and foremost a compiler. It took in a modified version of WebIDL which describes the prototype hierarchy, methods, properties, constants and any specialized configurations (read-only, not enumerable, and so on; the stuff you can do with defineProperty). From there it builds out the code for configuring a full type system of constructors and prototypes. Finally it sets up the engine so that it can then bind instances to the rest of the type system. This is already getting deep, so if you are unfamiliar with some of the concepts, look at Figure 1. It details how all of these objects interact with one another and thus helps to explain the glue code that we have to create to combine them.

Figure 1: Relationships of a Browser Type System

The constructors all become fields, based on their name, on the Global object itself. This had already been there in IE 8 with MDP in the same way. When interrogating the window.HTMLElement, there was something there. A Function object, whose prototype in turn pointed to the HTMLElementPrototype. And that in turn was referred to by any HTMLElement instances to discover its methods and properties.
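The relationships just described can be reproduced in plain JavaScript. The sketch below is only an illustration: `FakeNode`, `FakeElement` and `FakeHTMLElement` are made-up stand-ins for the real DOM types so that it runs outside a browser, and it says nothing about how the engine builds these objects internally.

```javascript
// Hypothetical stand-ins for the Node -> Element -> HTMLElement hierarchy.
function FakeNode() {}
function FakeElement() {}
function FakeHTMLElement() {}

// Chain the prototype objects so each type inherits from its parent.
FakeElement.prototype = Object.create(FakeNode.prototype, {
  constructor: { value: FakeElement, writable: true, configurable: true }
});
FakeHTMLElement.prototype = Object.create(FakeElement.prototype, {
  constructor: { value: FakeHTMLElement, writable: true, configurable: true }
});

// Constructors become fields on the global object, keyed by name.
globalThis.FakeHTMLElement = FakeHTMLElement;

// An instance refers to its prototype to discover methods and properties.
const instance = Object.create(FakeHTMLElement.prototype);

const checks = {
  constructorIsFunction: typeof globalThis.FakeHTMLElement === "function",
  instanceUsesPrototype:
    Object.getPrototypeOf(instance) === FakeHTMLElement.prototype,
  inheritsFromNode: instance instanceof FakeNode
};
```

In a browser the equivalent checks would be made against `window.HTMLElement` and an actual element, for example `Object.getPrototypeOf(HTMLElement.prototype) === Element.prototype`.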

Figure 1 has its limitations. It doesn't get into the details of the prototype itself, which is important to this discussion. To finish the type system, we have to put all of the type's methods and properties onto each prototype. Further, we have to make sure that those properties and methods bind to some appropriate behavior. A setter probably needs to change some state and a method probably needs to run some algorithm. In Figure 2 we add in this missing state for the Node prototype. The edges represent field names that you'd see when writing some JavaScript while the boxes represent different types of objects based on their colors. I've also broken out what is within the script engine and what appears outside of it.

Figure 2: Methods, Properties and Constants, Oh My!!!

So now we can build the final piece of the compiler. For every method and/or property, it has to build a native binding in the green boxes and forward that native binding to the instance itself in the gold box nearer the bottom. Our native instance is represented in the script engine as an actual JavaScript object, a proxy of sorts, to the real deal. This is where browsers are really blurring the lines though. Some objects that we create are now purely native to the script engine and have no gold or green boxes. They are implemented purely in JavaScript. For other objects, known as Foreign Objects, we still have a proxy instance, but that proxy has no built-in type system. Instead, we query the object dynamically, asking which properties and methods it supports. Complicated indeed, but hopefully with another 3 months of blogging I can explain it all ;-)

If you don't take anything else away from the article, I do want you to take this. With IE 9 on, IE's type system is entirely in the script engine, with the exception of those foreign objects. That means it is completely interrogatable and it obeys all of the features of ES 5 (actually ES 5.1) and with future installments, it gets ES 6 as well with minimal work, since the objects are implemented as true native JavaScript objects. This is the current evolution you see happening in EdgeHTML as we speak as the binding gets ever thinner. 

This means that every function instance is a true Function and so anything you can do in JavaScript with one of those you can do to our functions as well. Our constants are just fields, with actual native JavaScript values in them. So ELEMENT_NODE is just a variable entry with the value of 1. There is nothing on the native side servicing that. Anything we can put into the script engine directly we can and do, for maximum configurability and maximum performance.
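A small sketch of what "constants are just fields" looks like from script. `FakeNode` is a stand-in invented for this example; the property attributes used (read-only, enumerable, not configurable) are the ones WebIDL specifies for constants, and the `describe` method is hypothetical.

```javascript
// Hypothetical stand-in for the Node constructor and its prototype.
function FakeNode() {}

// A constant is a plain data property holding a JavaScript number.
const constantDescriptor = {
  value: 1,
  writable: false,
  enumerable: true,
  configurable: false
};
Object.defineProperty(FakeNode, "ELEMENT_NODE", constantDescriptor);
Object.defineProperty(FakeNode.prototype, "ELEMENT_NODE", constantDescriptor);

// The value can be inspected like any other property.
const described = Object.getOwnPropertyDescriptor(
  FakeNode.prototype,
  "ELEMENT_NODE"
);

// Methods are ordinary Function objects, so call/apply/bind all work.
FakeNode.prototype.describe = function () {
  return "nodeType constant is " + this.ELEMENT_NODE;
};
const bound = FakeNode.prototype.describe.bind(
  Object.create(FakeNode.prototype)
);
const message = bound();
```

In a browser, `Object.getOwnPropertyDescriptor(Node.prototype, "ELEMENT_NODE")` can be used to inspect the real constant in the same way.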

The Bindings of IE

So those green boxes are looking interesting right about now. Why do we have them? What do they do? What else do they enable? Well, I'll answer all of that ;-)

We have bindings because at the end of the day, there is still script engine code written in JavaScript and Browser code written in C++. We aren't yet at the cross-roads of a browser written purely in JavaScript where the entire Browser can be JIT'ed and optimized based on usage, etc... That would be super cool, but we simply aren't quite there yet. The C++ code in turn doesn't know about loosely typed JavaScript objects and thus there is some conversion, as specified by the strongly typed language in WebIDL about how JavaScript values get converted into native values. If you've never read the WebIDL binding semantics I highly recommend it. It has a lot of details on how and why APIs work the way they do and can be hugely informative. For instance, the old null becomes "null" conversions when passed to a DOMString method and how to avoid them, etc...
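The null to "null" conversion is easy to see with a simplified version of the DOMString conversion. This is a sketch only, not the full WebIDL algorithm: `toDOMString` is a name invented here, and the `treatNullAsEmptyString` flag is a hypothetical stand-in for the IDL annotation that opts a parameter out of the default null handling.

```javascript
// Simplified sketch of converting a JavaScript value to a DOMString.
// String() gives the same result as the spec's ToString for the
// non-Symbol inputs used here.
function toDOMString(value, treatNullAsEmptyString) {
  if (value === null && treatNullAsEmptyString) {
    return "";
  }
  return String(value);
}

const fromNull = toDOMString(null);            // the surprising case
const fromUndefined = toDOMString(undefined);
const fromNumber = toDOMString(42);
const fromNullOptOut = toDOMString(null, true);

// Callers can sidestep the surprise by normalizing before the call.
function safeText(value) {
  return value === null || value === undefined ? "" : value;
}
const normalized = toDOMString(safeText(null));
```

Here `fromNull` is the four-character string "null", while `fromNullOptOut` and `normalized` are both empty strings.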

So bindings allow us to convert from JavaScript types to internal types. That also means they let us convert in the other direction. We can, for instance, turn a native C++ array into a TypedArray of floats. This is pretty critical if you are WebGL and you need to do it efficiently. The bindings allow for this. Native code may also discover an error and need to notify the script engine. So the binding layers also translate between the native code error handling system and the exception-based mechanism used by the script engine.
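Both directions of that translation can be sketched in a few lines. Everything named here is an assumption made for illustration: `nativeGetSamples` plays the role of a C++ routine that fills a buffer and reports failure through a return code, the `OK` and `ERR_INDEX_SIZE` codes are invented, and `getSamples` plays the role of the binding.

```javascript
// Hypothetical native-style result codes; real engines use their own values.
const OK = 0;
const ERR_INDEX_SIZE = 1;

// Stand-in for a native routine that reports failure through a return code.
function nativeGetSamples(count, out) {
  if (count < 0) {
    return ERR_INDEX_SIZE;
  }
  for (let i = 0; i < count; i++) {
    out.push(i * 0.5);
  }
  return OK;
}

// The binding: converts the result into a JavaScript type and turns
// failure codes into thrown exceptions.
function getSamples(count) {
  const nativeBuffer = [];
  const code = nativeGetSamples(count, nativeBuffer);
  if (code !== OK) {
    throw new RangeError("getSamples: count must not be negative");
  }
  return Float32Array.from(nativeBuffer);
}

const samples = getSamples(3);

let thrown = null;
try {
  getSamples(-1);
} catch (e) {
  thrown = e;
}
```

The script caller never sees a result code. It either receives a `Float32Array` or catches an exception.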

Bindings also do browser important things that might go unnoticed by web developers. Have you ever wondered why evaluating a property on a prototype object throws an exception? Okay, maybe you've never tried that so let me give an example:
> Node.prototype.nextSibling
Error: Illegal Invocation
Okay, so this happens because you've executed the nextSibling "getter" with the this pointer of the prototype object. The prototype object is not a Node instance, and so it can't possibly know how to return a nextSibling. While most JavaScript methods are duck typed, most browser methods are not. They require that the this pointer be of an appropriate type. So the bindings are also in place to validate the native type and make sure it matches the native type of the binding. Consequently we can enforce other constraints as well, such as disallowing construct semantics (don't call my method with the new operator ;-) or ensuring that all required parameters are passed. So yes, we enforce those as well.
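Here is one way such checks could be modeled in plain JavaScript. It is a sketch under stated assumptions, not how the browser implements its bindings: a `WeakMap` stands in for the link between the script-side proxy and its native instance, and `FakeNode`, `createFakeNode`, `unwrap` and the error messages are all invented for the example.

```javascript
// Private table from script-side proxy objects to their "native" state.
const nativeState = new WeakMap();

function FakeNode() {
  // Script cannot construct instances directly in this sketch.
  throw new TypeError("Illegal constructor");
}

// Hypothetical factory standing in for the browser creating an instance.
function createFakeNode(nextSibling) {
  const proxy = Object.create(FakeNode.prototype);
  nativeState.set(proxy, { nextSibling: nextSibling });
  return proxy;
}

// The type check: `this` must be an object the factory created.
function unwrap(thisValue) {
  const state = nativeState.get(thisValue);
  if (state === undefined) {
    throw new TypeError("Illegal invocation");
  }
  return state;
}

Object.defineProperty(FakeNode.prototype, "nextSibling", {
  get: function () {
    return unwrap(this).nextSibling;
  },
  enumerable: true,
  configurable: true
});

FakeNode.prototype.isSameNode = function isSameNode(other) {
  if (new.target !== undefined) {
    throw new TypeError("isSameNode is not a constructor"); // no construct semantics
  }
  unwrap(this);                                             // validate `this`
  if (arguments.length < 1) {
    throw new TypeError("isSameNode requires 1 argument");  // required parameter
  }
  return this === other;
};

const last = createFakeNode(null);
const first = createFakeNode(last);

let caught = null;
try {
  FakeNode.prototype.nextSibling; // getter runs with the prototype as `this`
} catch (e) {
  caught = e;
}
```

Reading `first.nextSibling` succeeds because `first` has an entry in the table. Reading the same property off `FakeNode.prototype` throws, because the prototype object was never registered as an instance.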

Type System in a Nutshell

So that is the evolution of the type system from IE 7 through to current. I'll be expanding on these areas over the next few weeks. I'll be relying on WebIDL and the WebIDL annotations to demonstrate how browsers achieve interesting semantics. For instance, what is [Unforgeable] and how does that impact what you can do in the global scope of a Window or Web Worker? Or how about [Replaceable] and the odd semantics of a read-only property that isn't really read-only ;-)

There is a lot of ground to cover and I'm open to using questions as a way to guide progress. So let me know what you think. It's probably easiest to grab me on Twitter, and the resulting Tweetfests should be fun for others as well, @JustrogDigiTec.
