Thursday, December 31, 2015

Retrospective for 2015: The year of Blogging, Mentoring and EdgeHTML

Almost a year ago, on January 3rd, I made a commitment to myself to start sharing my experiences in software, mentoring and self-improvement through gamification here on my blog. You can review that first post to see how my year started!

It was a lofty set of goals at the time. I remember the desire to write a blog entry per week and starting out very strong. I remember the feeling of ending my vacation having just re-read Jane McGonigal's Reality is Broken and really wanting to change my workplace through principles of gamification. Most importantly I remember how my mentees were supercharging my insights into how to find and fix problems in your personal and work life.

To that end this retrospective is going to be about Career, Blogging, Mentoring and other Random Stuff. Let's combine everything I learned to self-reflect and hopefully come up with a new, improved strategy for the year to come. My first post in the New Year will then be about the year ahead, and this, my last post of the year, will be about what I've accomplished.

Blogging


I actually started the year with 3 blogs. Two were meant to be purely technical and share technical shorts: one for HTML 5 gaming and one, a parallel to the Perl Black Book, that I was calling the HTML 5 Black Book. However, everything I wanted to write about fit into my personal blog instead, Continuous Integration, the one you are reading here. While I did contribute an article or two to those other blogs, I don't consider them launched yet. I still like the concepts they represent and I hope to improve my contributions to them in the coming year.

For my primary blog, including this posting, I was able to push out 33 articles over the course of the year. Nothing tells the story like a graph, so you can see the months where I did very well and the months where I fell short, even disappearing for some time. I'll talk a bit about those, because they represent real opportunities for improvement in the new year.


First thing to note is that trend line. I lack consistency for sure and the trend line would be even worse if I hadn't pulled it together in May and then again in November/December. So what happened during those extended periods of time?

March and April both coincide with some major bug fixing work to get Windows 10 ready for general release. This relaxed for a bit in May as we were doing more planning work for the next Windows 10 Update Release which most people should have by now and mostly got pushed out in November.

Late June through the end of August covered the major planning and coding milestones for the update release. If a feature wasn't done during that period, it probably didn't make it into the November update. Then September and October were bug fixing for that release.

Are you seeing the problem? I sure am. I can't chew bubble gum and walk at the same time. That's not quite true, I'm actually quite good at context switching and balancing things. But the reality of faster releases of Windows is that everyone needed to figure out HOW to do it. I think I have the right recipes figured out to fix this so we'll see how next year goes since we are likely to make this release cycle and cadence the norm.

We can also pivot by our topics or tags. I didn't start my blog with tagging and I didn't go back and tag everything properly, so some things are underrepresented. However, the tags I was hoping to see were Gamification and Mentoring. While I did some of that through my blog, it doesn't show up in my tags as you'll see. Another great thing to fix!


What I did really well on was an area I wasn't even thinking about when I started the blog. I've been able to build an information channel for the IE and now Microsoft Edge product. A channel which is maybe a bit more raw than it should be, but so far so good. More importantly a channel which can help the developers gain a voice when talking to the community and customers about problems, features and directions for the EdgeHTML component and the Microsoft Edge browser. I'm pretty proud of that.

I've also gotten a chance to really show how we are building our script engine integration in the FastDOM posts. That has been a great series and it drills into the intricate details of JavaScript and how the "Host" part of the specification works, a place where only browser vendors tend to dabble.

Starting in August I've also been ramping up on how our telemetry works in EdgeHTML, and on telemetry and data gathering in general. There is no reason all of these findings have to be specific to our use cases. In these posts I've started to find vital connections between how the engineering system (data used to create the browser) is used to build and validate the telemetry, which is used to build and validate future features, and finally used to build tests and perform security analysis. I'll certainly talk more about how this positive feedback system works in the coming year.

A year ago blogging was something I had done in the past. I remembered it being fun; in fact I wrote somewhere between 330-370 articles on .NET features and performance on the old weblogs.asp.net system. That ended when I joined Microsoft in 2005. It then took me 10 years to regain my writing itch, and it's one of the most positive improvements I made this year!

Mentoring


I've already written this part of the retrospective as a break-out. That's how important mentoring is to me. It really does drive a lot of my more major initiatives. Even the blog writing was the outcome of some mentoring suggestions I had provided to others. I was trying to get them to use it as an outlet for improving communication, building a network and generally putting into writing their mastery of some subject.

When mentoring, be very careful recommending something you aren't currently doing. Next thing you know you'll be doing it. It's hard to sell advice you yourself are not taking. In my case I felt it was a great idea for others, but that I didn't have the time. Well, I found that time and it turned out great. That means one great passion, mentoring, led to the revival of a previous passion, blogging!

One story not in my mentoring article though, which deserves to be called out, is the mentoring I've been able to provide for my wife, Ning, who also worked at Microsoft until quite recently. It is through our daily conversations that I get to learn the differences between being male versus female in tech. I also get to learn how many things I take for granted, or how many strategies I'm able to use which simply don't work for her. There is no reason they shouldn't work, but they don't. So I get to see firsthand some of the imbalance and some of the gender bias that our industry holds.

But this is a positive story, about career mentoring, not a sad story about inequality. Over the course of several months I worked with her to help navigate her career at Microsoft, but at some point it became very clear that the glass ceiling for her was different than my own. Instead of looking purely internally we started to look externally, at other opportunities.

This is where I'm really on my a-game. When it comes to interviewing and programming problems, I find them more fun than stressful so we also spent quite a bit of time in coffee shops on weekends preparing and studying the various technologies that you get rusty on as a line programmer at a major company.

I won't go into the details, and it took a few months since we both had day jobs, but at the end she catapulted her career and is now happy as a Noogler (how long can I call her that?). I usually get a chance to congratulate at least one mentee per year on a career promotion, but I don't always get a chance to watch my wife overcome all opposition and adversity and claim the career she wants. I'm super proud of her and couldn't be happier.

Career


What about my own career? How do I feel about the "New Microsoft?" Am I good with my 10 years on the IE team or am I ready for a change? What does my team think of me? Well, these are all very critical questions. Some are also very personal and, if answered in a public forum, might make some of my co-workers uncomfortable. This is my retrospective though, so I'll try to address what I can while still ensuring the comfort and privacy of the people I work with.

First, what do I think about the "New Microsoft?" Let me start by saying I've always viewed Microsoft as the largest, most capable software engineering machine on the planet. There is so much capability in the people I work with. You point them at hard problems and you get great solutions, on time and on budget. You DO have to throw them at the right hard problems and you DO have to avoid putting too much process in their way.

The "New Microsoft" is basically executing to change these two components of how the company does business. It is changing the problems and picking problems that are both business savvy, marketable, engaging to customers, etc... That means we have a customer for all of our great solutions rather than trying to make markets from thin air. We are also working very hard to get out of the way of the engineers and improve the engineering process.

I'm liking the "New Microsoft". I definitely like the stock price. I like even more that they need people like me to help execute the change and that they are engaging me to figure out how I want to work and then enabling me to work the way that I want. It's really increasing the velocity with which we can solve problems, and I appreciate even more the renewed focus on problems that matter.

Microsoft is huge though, so how am I feeling about the IE team and my 10 years on the same team? After working on the team long enough I stopped working for IE and started working for the Web. A year ago I might not have been very comfortable saying that out loud. Now, with the new directions that Microsoft as a company is going, I'm completely comfortable saying that. I would be comfortable working at Mozilla or Google as well, as long as I'm moving the Web itself forward and there isn't a monopoly developing. A monopoly on the web, IMO, would be pretty damaging and I think we learned that from the early 2000s. We had to reinvent a lot of the same technologies all over again because the IE 5-6 monopoly didn't create the right result.

Since the IE team is really focusing on interoperability and standards, it's quite aligned with my goals of moving the web forward, and the team is very supportive of helping me do just that. I wouldn't take change off the table, and there are some technologies I'm really interested in such as VR, but overall it would take a pretty convincing argument to make me move.

Does my team still want me around and what do they think of me? This is always a blind spot. I spend hours and hours helping my mentees determine this information by trying to provide objective, alternate viewpoints. They tell me what they think and then I ask them questions that hopefully allow them to change their viewpoints or be a little bit more accepting of how their team looks at them.

I've been on the team for a LOOOONG time and they are still giving me great reviews and great rewards. I think those speak for themselves. Sometimes my ideas are a bit too progressive and progressive ideas are scary. They are different from what you are used to, and in the case of having to deliver on them, it might put you into an uncomfortable position where you don't know if you'll be able to succeed. My projects tend to succeed, they tend to get a lot of visibility, and the people who get put on my projects also get rewarded well. Overall, I think team interactions are going great.

I can't currently find any major flaws in my career. If anything, I'd like to increase my impact, responsibility and accountability. As an IC or Individual Contributor at Microsoft, this part of the equation can be difficult. I don't command resources and I've exceeded the levels wherein I can accomplish everything myself. To this end having Microsoft be able to more directly acknowledge the role of a technical lead would be an excellent improvement. A technical lead is someone who does not manage people, but directs technical projects that have people working on them. At Microsoft we combine the lead/manager roles together and the technical lead is more informal. I know other companies have a different viewpoint and maintain technical leads. Overall I think technical leads just increase the number of simultaneous projects you can do as well as improve the agility of switching between different projects.

My next career jump is a challenging one at Microsoft. Just saying that probably gives it away for people familiar with the company's level system. I'm ready for the challenge though and I have a lot of support. I've had a string of great bosses, but my current boss, Rico Mariani, is truly exceptional. My direct team, the performance team (though we do much more, as you can tell from all the random things I tweet about fixing), is full of some of the strongest technical talent in the company, every person being a self-motivated bug fixing and feature producing machine. The broader team, the Windows organization and the company as a whole are laser focused on the web as a platform for application development. I'm poised to make some great things happen in the coming year ;-)

Random Stuff


Let's start with my work accomplishments since those are now top of mind after thinking through the state of my career. This year was the year of Windows 10 and Windows 10 Update 1. I'm sure the code names have leaked by now so we called these things Threshold and Threshold 2. That brought to the world one of my major features, the EdgeHTML split. A brand new DLL without the compatibility burden of the legacy MSHTML which we would continue to service for business customers.

Along with this new DLL was a heightened focus on interoperability between Edge and the mobile web. People might say we were chasing the WebKit web, but that isn't fair nor is it true. We were chasing mobile interoperability and site compat primarily. While doing so, we wound up finding corners of the web which had only been tested in iOS or Chrome for Android, and so we had to come up with solutions and adapt. It was a major effort and we fixed thousands of bugs (many of which were the size of features).

On my personal development, I was able to keep my interview skills fresh while helping my wife prepare for her new career adventures. A couple of rounds through LeetCode.com is a great bit of preparation if you haven't interviewed in a few years. It's also a great way to get fresh content for interviewing others. In 2014 I was able to do an interview training course for a group of employees who were let go in the layoffs, and I was using LeetCode.com at that time as a source of inspiration. I went so far as to take an interview back in September and even got an offer!

I started the year on a tri-club and I'm sad to say I fell off the bike around May and didn't get back on. I also started running with a few co-workers and was doing well for about 3 months, then stopped due to a foot injury. This exercise thing is proving to be tough and is probably one of my low points for the year. There is a pretty strong link between exercise, stress and productivity, and I'm ending this year feeling like I've given up a bit of comfort and productivity by not concentrating more on my exercise regime.

Well, if I don't wrap this up now it won't make it in this year (I'm using Seattle's New Year as a time frame even though I'm in Hawaii so I could technically claim 2 more hours). This should be enough reflection on the previous year for my next post in the New Year on what my next set of accomplishments should be. See you next year!

Wednesday, December 30, 2015

EdgeHTML on Past and Future Promises

This entire post is going to be about how EdgeHTML schedules ES 6 Promises, why we made the decisions we did, and the work we have scheduled for the future to correct the interop differences we've created. If you thought it was about secret features then you will be disappointed.

The starting point of this article, and a lot of really cool black-box investigation, was the work done by Jake Archibald when he wanted to chat about micro-tasks in the browser. What he found was that at least one implementation, the one supplied by Chakra and EdgeHTML, didn't adhere to the latest reading of the specifications. I highly recommend reading his article first to get an understanding of some of the basic concepts presented in what I think is a very approachable form. Especially cool are the live visualizations that you can run in different browsers. My post, sadly, won't have live visualizations. I'll earmark every bit of time not spent writing visualizations to fixing bugs related to the HTML 5 event loop in EdgeHTML instead, deal?

Why Are Promises in EdgeHTML not Micro-tasks?

When we were spec'ing Promises in Chakra and EdgeHTML we were doing so very early. The Chakra team is constantly contributing to the various Ecmascript specifications and so we had a very early version of the spec for Promises from the working group. We wanted to get something working really fast, perhaps a prototype of it running (at least one meeting was before IE 11 shipped and another meeting right after it shipped when we were considering maybe adding some extra features) so we could give feedback. While this never came to be, it locked our development design specs in pretty early with something we thought was pretty solid.

When we first started, our conversations were around what a Job was. This is how ES 6 defines the execution of the callbacks associated with a Promise. You can view the spec language here (Promise Jobs) and here (Jobs and Job Queues) if you want to try and figure it out yourself. What you'll come to is probably the same conclusion we did: there isn't a clear relationship between the Ecmascript spec and the HTML 5 spec, per se.

This meant our first round of thinking was about whether or not the JavaScript engine would have its own event loop and task queuing system. We know, from experience, what happens when too many schedulers run on the same thread. We felt this was bad and that it would lead to various types of starvation, with yet another event loop dependency having to be coordinated across a major component boundary. While Chakra and EdgeHTML are very close, we still like to keep our components separated enough that we don't sacrifice agility, without which ChakraCore might not exist today...

In our second meeting we mostly discussed the fact that HTML 5 had some concepts here. There was this HTML 5 event loop thing and it was proposing task queues and task sources and all kinds of coolness. However, it wasn't well defined. For instance, it only generically lists task sources and doesn't talk explicitly about how many task queues there are. There is a bit of text that even insinuates that user input could be given priority over other tasks "three quarters of the time". When you are trying to build an interoperable browser in conjunction with several other huge companies, this kind of ambiguity is really not helpful.

We decided that a Promise callback was close enough to a setTimeout(0) and that we liked the priority of that model enough, that we merged our Promise Job queue with our setTimeout "Task Queue". In reality, EdgeHTML has only dipped a toe into the HTML 5 event loop itself, and even timeouts are not really in their own task queue, but I'll get to that more a bit later.

This was enough to complete our spec writing. Jobs == Task Queues and Promise Jobs == Set Timeouts. This would be the interface on which the Chakra engine would register work for us to then properly interlace with the rest of the work the system had to do.

How are Promises actually Timeouts?

There is a very real trend in the browser industry to create more and more new features by building on top of the foundations that already exist. When a new feature is just too fresh, we can implement it using a polyfill. A polyfill can also be used to implement an older feature that we don't plan on updating, one with low overall market usage that is still critical to some segment, like we did for XPath support. So please don't be surprised by the following line of code.
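Something along these lines, where resolveCallback is just a stand-in name for whichever Promise reaction needs to run:

```javascript
// Illustrative sketch only: the Promise reaction is handed to the page's
// timer machinery as a 0ms timeout.
window.setTimeout(resolveCallback, 0);
```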

Okay, it's not quite that. We don't actually execute code like that every time we want to register a Promise callback. If we did, it would be a nightmare, since the page could try to intercept the calls and do bad things, or simply break itself without knowing why. Instead, we share the implementation of setTimeout with the current instance of the Chakra script engine that was created for a given document. This got us close enough to the concept of an event loop scheduler function that we were happy. And yes, they literally call that function with a Function object (your callback, whether it be your resolve or reject callback) and the value of 0.

Well, as you might be able to tell now, this is a discoverable implementation of the feature. In fact, Jake in his article was able to pretty accurately describe what we were doing even though he didn't have access to the code. Simply schedule a 0 timeout yourself and then resolve a Promise and see which callback you get first. Since all 0 timeouts get properly serialized, the Promise, as a 0 timeout, will get serialized as well.
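A quick probe of that, if you want to try it yourself, looks something like this:

```javascript
// Register a 0ms timeout first, then resolve a promise.
setTimeout(function () { console.log('timeout'); }, 0);
Promise.resolve().then(function () { console.log('promise'); });

// With proper micro-task semantics the promise callback wins:
//   promise, timeout
// With a setTimeout-backed implementation the two are serialized in
// registration order:
//   timeout, promise
```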

We could have gone further and hidden some of this behavior by making Promise callbacks fire before all other 0 timeouts, but doing that work wouldn't have gotten us close enough to the necessary and now spec'ed micro-task behavior that we would need to be truly interoperable. Sadly it would have fixed some sites and that is generally good enough reason, but it might have also made it easier for the web to become dependent on our broken behavior.

There you go: in EdgeHTML, Promise callbacks really are setTimeouts. They go through the same internal code paths that existing window.setTimeout calls go through, and there is no special magic that allows us to group them together, so they get interlaced with the setTimeouts being registered from the page as well. Clearly a MUST FIX ;-)

Promises towards a Brighter Future

This particular situation has helped us to really re-think our existing event loop situation. The specifications are getting a lot better, defining things more clearly and simply obeying them in spirit is starting to not deliver the expected end user experience that we want. While we've gotten this far using a COM STA loop with an ad-hoc task scheduler that has no concept of task sources, task queues or similar origin browsing contexts, this situation really can't last. If the web is really the OS for the next generation of applications and hopes to supplant existing OS-centric application models then things like the threading model and scheduling become part of its domain and must be well defined.

Too deep? Yeah, I'm thinking so too ;-) I'll get into more details on the HTML 5 event loop in some future posts when I dig in really deep on hosting models, COM and Win32. For now, let's just fix Promises!

It turns out the bright future for our Promise implementation isn't far off, nor is it that much of a departure from the architectures we already have in place. We already have a micro-task queue which we use for Mutation Observers. We also have a communication channel on which Chakra gets our setTimeout Function implementation. Our immediate goal will be to rewire our channel with Chakra to instead allow them to submit Jobs to us as the host environment, and that will then give us control to route them wherever we want.

Since we have a micro-task queue in place, fixing the bug should be a matter of routing to that queue. Nothing is ever easy though, and we'll have to consider the ramifications of executing Promise callbacks in that code and the interplay with Mutation Observers. We'll also be looking at how the other browsers interleave micro-tasks. For instance, do mutation observers and promises interlace (unified queue) or do they get split into their own queues? The current specifications only have one task source defined for the micro-task queue, the microtask task source, so our tests will hopefully validate the unified queue behavior and we'll be able to deliver an interoperable native Promise implementation in the very near future!
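As a rough sketch of the kind of interleaving test I mean (the logging is mine, not part of any suite):

```javascript
// Probe whether promise reactions and mutation observer callbacks share a
// single micro-task queue. A unified FIFO queue should produce:
//   promise 1, mutation, promise 2
var target = document.createElement('div');
new MutationObserver(function () {
  console.log('mutation');
}).observe(target, { attributes: true });

Promise.resolve().then(function () { console.log('promise 1'); });
target.setAttribute('data-probe', '1'); // queues the mutation record
Promise.resolve().then(function () { console.log('promise 2'); });
```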

Tuesday, December 29, 2015

Progress Towards a Fully Tested Web API Surface Area

Back in September I was amazed by the lack of comprehensive testing present for the Web API surface area and as a result I proposed something akin to a basic surface level API suite that could be employed to make sure every API had coverage. Since that time a lot of forward progress has occurred, we've collected a bunch of additional data and we've come up with facilities for better ensuring such a suite is complete.

So let's start with the data again and figure out what we missed last time!

Moar Data and MOAR Test Suites

Adding more test suites is hopefully going to improve coverage in some way. It may not improve your API surface area coverage, but it may improve the deep usage of a given API. We originally pulled data from the following sources:
  1. Top 10k sites - This gives us a baseline of what the web believes is important.
  2. The EdgeHTML Regression Test Suite - By far the most comprehensive suite available at the time, this tested ~2500 API entry points well. It did hit more APIs, but we excluded tests which only enumerated and executed DOM dynamically.
  3. WebDriver Enabled Test Suites - At the time, we had somewhere between 18-20 different suites provided by the web community at large. This hit ~2200 APIs.
  4. CSS 2.1 Test Suite - Mostly not an OM test suite, so it only hit ~70 APIs
Since then we've added or improved the sources:
  1. Top 100k sites - Not much changed by adding sites.
  2. Web API Telemetry in EdgeHTML - This gave us a much larger set of APIs used by the web. It grew into the 3k+ range!! But still, only about 50% of the APIs we export are used by the Web, making for a very large, unused surface area.
  3. DOM TS - An internal test suite built during IE 9 to stand up more Standards based testing. This suite has comprehensive depth on some APIs not tested by our other measures.
  4. WPT (Web Platform Tests) - We found that the full WPT might not be being run under our harnesses, so we targeted it explicitly. Unfortunately, it didn't provide additional coverage over the other suites we were already running. It did end up becoming part of a longer term solution to web testing as a whole.
And thanks to one of our data scientists, Eric Olson, we have a nice Venn diagram that demonstrates the intersection of many of these test suites. Note that I'm not including the split-out WPT tests here, but if there is enough interest I can see whether we can build a different Venn diagram that includes more components, or rework this one and pull out an existing pivot.


Since this is so well commented already, I won't go into too much detail, but I'll point out some key data points. The EdgeHTML DRTs have a lot of coverage not present in any public suite. That is stuff that is either vendor-prefixed, MS-specific, or that we need to get into a public test suite. It likely requires some work, such as converting the tests to test-harness.js, before that happens, but we are very likely to contribute some things back to the WPT suite in the future. Merry Christmas!?!

We next found that the DOM TS had enough coverage that we would keep it alive. A little bit of data science here was the difference between deleting the suite and spending the development resources to bring it back and make it part of our Protractor runs (Protractor is our WebDriver enabled harness for running public and private test suites that follow the test-harness.js pattern).

The final observation to have is that there are still thousands of untested APIs even after we've added in all of the coverage we can throw together. This helped us to further reinforce the need for our Web API test suite and to try and dedicate the resources over the past few months to get it up and running.

WPT - Web Platform Test Suite

In my original article I had left out specific discussions of the WPT. While this was a joint effort amongst browsers, the layout of the suite and many aspects of its maintenance were questionable. At the time, for instance, there were tons of open issues, many pull requests, and the frequency of updates wasn't that great. More recently there appears to be a lot of new activity though so maybe this deserves to be revisited as one of the core suites.

The WPT is generally classified as suite-based testing. It is designed to be as comprehensive as possible. It is organized by specification, which arguably means nothing to web developers, but does mean something to browser vendors. For this reason, many of the ad-hoc and suite-based tests present in the DRTs, if upgraded to test-harness.js, could slot right in. I'm hopeful that sometime after our next release we are also able to accompany it with an update for WPT that includes many of our private tests, so that everyone can take advantage of the collateral we've built up over the years.

Enhancing the WPT with this backlog of tests, and potentially increasing coverage by up to ~800 APIs, will be a great improvement I think. I'm also super happy to see so many recent commits from Mozilla and so many merge requests making it back into the suite!

Web API Suite

We still need to fix the API gap though and so for the past couple of months we've (mostly the work of Jesse Mohrland, I take no credit here) been working on a design which could take our type system information and automatically generate some set of tests. This has been an excellent process because we've now started to understand where more automatically generated tests can be created and that we can do much more than we originally thought without manual input. We've also discovered where the manual input would be required. Let me walk through some of our basic findings.

Instances are a real pain when it comes to the web API suite. We have about 500-600 types that we need to generate instances of. Some may have many different ways to create the instances, and those can result in differences of behavior as well. Certainly creating some elements will result in differences in their tagName, but they may be of the same type. Since we are an API suite, we don't want to force each element to have its own suite of tests; instead we focus on the DOM type, and thus we just want to test one instance generically and then run some other set of tests on all instances.
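To make that concrete, a generated surface-level test might boil down to something like the following test-harness.js sketch. This is illustrative only; surfaceTest and the per-type factory are made-up names, not our generated output.

```javascript
// Hypothetical shape of a generated surface test. createInstance is the
// per-type factory; the member list comes from the type system description.
function surfaceTest(typeName, members, createInstance) {
  test(function () {
    var instance = createInstance();
    assert_true(instance instanceof window[typeName],
                'factory produces a ' + typeName);
    members.forEach(function (member) {
      assert_true(member in instance, member + ' is exposed on ' + typeName);
    });
  }, typeName + ' surface area');
}

// One instance tested generically, no matter how many tag names map to the type.
surfaceTest('HTMLCanvasElement',
            ['width', 'height', 'getContext', 'toDataURL'],
            function () { return document.createElement('canvas'); });
```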

We are not doing the web any service by only having EdgeHTML based APIs in our list. Since our dataset is our type system description, we had to find a way to add unimplemented stuff to our list. This was fairly trivial, but hasn't yet been patched into the primary type system. This has so many benefits though. Enough that I'll enumerate them in a list ;-)

  1. We can have a test score that represents even the things we are missing. So instead of only having tests for things that exist, we have a score against things we haven't implemented yet. This is really key to having a test suite that is not just useful to EdgeHTML but also to other vendors.
  2. True TDD (Test Driven Development) can ensue. By having a small ready-made basic suite of tests for any new APIs that we add, the developer can check in with higher confidence. The earlier you have tests available the higher quality your feature generally ends up being.
  3. This feeds into our other data collection. Since our type system has a representation of the DOM we don't support, we can also enable things like our crawler based Web API telemetry to gather details on sites that support APIs we don't yet implement.
  4. We can track status on APIs and suites within our data by annotating what things we are or are not working on. This can further be used to export to sites like status.modern.ie. We don't currently do this, nor do we have any immediate plans to change how that works, but it would be possible.
Many of these benefits are about getting your data closer to the source. Data that is used to build the product is always going to be higher quality than, say, data that is disconnected from it. Think about documentation, for instance, which is built and shipped out of a content management system. If there isn't a data feed from the product to the CMS, then you end up with out-of-date articles for features from multiple releases prior, invalid documentation pages that aren't tracking the latest and greatest, and even missing documentation for new APIs (or documentation that never gets removed for dead APIs).

Another learning is that we want the suite to be auto-generated for as many things as possible. Initial plans had us sucking in the tests themselves, gleaning user generated content out of them, regenerating and putting back the user generated content (think custom tests written by the user). The more we looked at this, the more we wanted to avoid such an approach. For the foreseeable future we want to stop at the point where our data doesn't allow us to continue auto-generation. And when that happens, we'll update the data further and continue regenerating.

That left us with pretty much a completed suite. As of now, we have a smallish suite with around 16k tests (only a couple of tests per API for now) that is able to run using test-harness.js and thus it will execute within our Protractor harness. It can trivially then be run by anyone else through WebDriver. While I still think we have a few months to bake on this guy I'm also hoping to release it publicly within the next year.

Next Steps

We are going to continue building this suite. It will be much more auto-generated than originally planned. Its goal will be to test the thousands of APIs which go untested today by more comprehensive suites such as WPT. It should test many more thousands of unimplemented APIs (at least by our standards) and also some APIs which are only present in specific device modes (WebKitPoint on Phone emulation mode). I'll report back on the effort as we make progress and also hope to announce a future date for the suite to go public. That, for me, will be an exciting day when all of this work is made real.

Also, look out for WPT updates coming in from some of the EdgeHTML developers. While our larger test suite may not get the resources to push to WPT until after our next release, I'm still hopeful that some of our smaller suites can be submitted earlier than that. One can always dream ;-)

Friday, December 25, 2015

Web API and Feature Usage - From Hackathon to Production

I wanted to provide some details on how a short 3-day data science excursion has led to increased insights for myself, my team, and eventually the web itself.

While my original article focused on hardships we faced along the way, this article will focus more on two different topics that take much longer than 3 days to answer. The first topic is around how you take your telemetry and deliver it at web scale and production quality. You can see older articles from Ars Technica that had Windows 10 at 110 million installs back in October. That is a LOT of scale. The second topic I want to discuss is the insights we can gather after the data has been stripped of PII (personally identifiable information).

I'll start with a quick review of things we had prior to Windows 10, things we released with Windows 10, and then of course how we released DOM API profiling in our most recent release. This latter bit is the really interesting part for me since it is the final form of my Hackathon project (though in full transparency, the planning for the DOM telemetry project preceded my hackathon by a couple of months ;-)

Telemetry over the Years

The concept of gathering telemetry to determine how you are doing in the wild is nothing new, and web browsers, operating systems and many other applications have been doing it for a long time. The largest scale telemetry effort (and probably the oldest) on Windows is likely still Watson. We leverage Watson to gain insights into application and OS reliability and to focus our efforts on finding and fixing newly introduced crashing and memory related bugs.

For the browser space, Chrome has been doing something with use counters for a while. These are great, lightweight boolean flags that get set on a page and then recorded as a roll-up. This can tell you, across some set of navigations to various pages, whether or not certain APIs are hit. An API, property or feature may or may not be hit depending on user interaction, flighting, the user navigating early, which ads load, etc... So you have to rely on large scale statistics for smoothing, but overall this is pretty cool stuff that you can view on the chrome status webpage.
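The mechanism is roughly this (an illustrative sketch, not Chrome's actual code):

```javascript
// One boolean per feature per page load: flipping it twice changes nothing,
// so the roll-up answers "was this feature hit at all on this navigation?"
var useCounters = Object.create(null);
function countFeature(name) {
  useCounters[name] = true;
}

// A (hypothetical) instrumented entry point would call countFeature('Geolocation');
// on teardown only the set of flipped flags is reported, never raw call counts.
```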

FireFox has recently built something similar and while I don't know exactly what they present, you can view it for yourself on their telemetry tracking page as well.

For Microsoft Edge, our telemetry over the years has been very nuanced. We started with a feature called SQM that allowed us to aggregate user information if they opted into our privacy policies. This let us figure out how many tabs you use on average, which UI features were getting clicks and a small set of other features. These streams were very limited in the amount of data we could send and so we were careful not to send up too much.

With Windows 10 we started to lean more on a new telemetry system based on ETW (Event Tracing for Windows), which gives us a very powerful platform that we were already familiar with for logging events not just in our application, but across the system. The major improvements made here were how we extended our existing page load and navigation timings so that we could detect very quickly whether or not we had a performance problem on the web, without having to wait for users to file a bug for a site and then up-vote it using our various bug reporting portals.

Just doing something we already had in our back pockets from previous releases would have been boring though, so we decided that a boolean flag based structure, logged per navigation, would also give us a lot of extra data that we could use to determine feature popularity within the browser itself. While annotating every DOM API would be overkill for such an effort, given there are 6300 of them in our case, of which nearly 3000 are in general usage on the web, we instead saved this for larger feature areas and for exploring APIs in depth. This functionality shipped in our initial release of Windows 10, and we've been steadily adding more and more telemetry points to this tracker, many of which are mirrored in the Chrome data, but many of which are more specific to our own operations or to features that we might want to try and optimize in the future.

This puts the web in a great place. At any given point in time you have hundreds if not thousands of active telemetry points and metrics being logged by all of the major browser vendors, aggregating and gaining insight across the entire web (not just a single site and not just what is available to the web site analytics scripts) and being shared and used in the standards process to help us better design and build features.

Building a Web Scale Telemetry System

I don't think a lot of people understand web scale. In general we have trouble, as humans, with large numbers. Increasingly larger sequences tend to scale much more slowly in our minds than in reality. My favorite book on the subject currently escapes my mind, but once I'm back at my desk at home I'll dig it out and share it with everyone.

So what does web scale mean? Well, imagine that Facebook is serving up 400 million users worth of information a day and imagine that they account for say, 5% of the web traffic. These are completely made up numbers. I could go look up real numbers, but let's just hand wave. Now, imagine that Internet Explorer and Microsoft Edge have about 50% of the desktop market share (again made up, please don't bust me!) and that accounts for about 1.2 billion users.

So Facebook's problem is that they have to scale to deliver data to 400 million users (far richer data than our telemetry, admittedly), and they account for 5% of all navigations. Let's play with these numbers a bit and see how they compare to the browser itself. Instead of 400 million users, let's say we are at 600 million (half of that 1.2 billion person market). Instead of 5% of page navigations we are logging 100% of them. That's 1.5x the users and 20x the navigations, which would put us at roughly 30x more telemetry data points having to be managed than Facebook managing a day's worth of responses to their entire user base. This is just the beginning of a web scale feature. We don't get the luxury of millions of sites distributing the load; instead all of the data from all of those users has to slowly hit our endpoints, and it's very unique, uncacheable data.

Needless to say, we don't upload all of it, nor can we. You can opt out of data collection and many people do. We will also restrict our sampling groups for various reasons to limit the data feeds to an amount that can be managed. But imagine there is a rogue actor in the system, maybe a new web browser that doesn't have years of refinement on its telemetry points. You could imagine such a browser over-logging. In fact, to be effective in the early days when your browser is being rolled out, you have to log more, and more often, to get enough data points while you build your user base up. You want the browser to be able to log a LOT of data, and then for that data to be restricted by policies, based on the amount and types of data being received, before going up to the cloud.

Cool, so that is a very long-winded way to introduce my telemetry quality or LOD meter. Basically, when you are writing telemetry, how much thought/work should you put into it based on the audience it will be going to and the size of that audience? As you scale up and want to roll out to, say, every Windows 10 user, something as innocuous as a string-formatting swprintf call might have to be rethought and reconsidered. The following figure shows, for my team, what our considerations are when we think about who we will be tapping to provide our data points for us ;-)


I can also accompany this with a simple table that maps the target audience to the various points that change as you slide from the left of the scale to the right. From left to right the columns are:

  • Audience - From the figure above who is going to be running the bits and collecting data.
  • Code Quality - How much thought has to be put into the code quality?
  • Code Performance - How much performance impact can the code have?
  • Output Formatting - How verbose can the data be and how can it be logged?
  • Logging Data Size - How much data can be logged?
  • Logging Frequency - Can you log every bit of data or do you have to start sampling?

Note many of these are interrelated. By changing your log data size, you might be able to increase your frequency. In production, you tend to find that the answer is, "yes, optimize all of these" in order to meet the desired performance and business requirements. Also note that as you get to release, the columns are somewhat additive. You would do all of the Beta oriented enhancements for Release as well as those specified for the Release code. Here is the table with some of my own off the cuff figures.

Audience  | Quality           | Performance | Output            | Log Size | Log Frequency
Developer | Hack              | Debug Perf  | Strings and Files | GBs      | Every Call
Team      | Doesn't Crash     | Debug Perf  | Log Files         | GBs      | Every Call
Internal  | Reviewed          | 1.2-1.5x    | Log Files/CSV     | GBs      | Aggregated
Beta      | Reviewed & Tested | 1.1x        | String Telemetry  | <MBs     | Aggregated + Compressed
Release   | Optimized         | ~1x         | Binary Telemetry  | <5KB     | Aggregated + Sampled + Compressed

Hopefully it is clear from the table that to build a developer hack into your system you can get away with murder. You can use debug builds with debug performance (some debug builds of some products can be greater than 2x-5x slower than their retail counterparts) using debug string formatting and output, maybe write it to a file, but maybe just use OutputDebugString. You can log gigs of data and you can most importantly log everything, every call, full fidelity. In this mode you'll do data aggregation later. 

The next interesting stage is internal releases. This might be to a broader team of individuals and it may also include using web crawlers, performance labs, test labs, etc... to exercise the code in question. Here we have to be more mindful of performance, the code needs to have a review on it to find stupid mistakes, and you really need to start collecting data in a well formatted manner. At this point, raw logs start to become normalized CSVs and data tends to be aggregated by the code before writing to the logs to save a bit more on the output size. You can still log gigs of data or more at this point though, assuming you can process all of it. You also probably want to only enable the logging when requested, for instance via an environment variable or by turning on a trace logging provider (ETW again, if you didn't follow that link you should, ETW is our preferred way of building these architectures into Windows).

Depending on your scale, Beta and Release may have the same requirements. For us they tend to, since our beta size is in the millions of users and most beta users tend to enable our telemetry so they can give us that early feedback we need. Some companies ship debug builds to beta users though, so at this point you are just trying to be respectful of the end user's machine itself. You don't want to store gigs of log data, and you don't want to upload uncompressed data. You may choose not to upload binary data at this point though. In fact, having it in a viewable format for the end user to see can be a good thing. Some users appreciate that. Others don't, but hey, you can't make everyone happy.

Finally when you release, you have to focus on highly optimized code. At this point your telemetry should have as close to 0 as possible on the performance marks for your application. Telemetry is very important to a product, but so is performance, so finding your balance is important. In a browser, we have no spare time to go collecting data, so we optimize both at the memory level and the CPU level to make sure we are sipping at the machine's resources and leaving the rest to the web page. You'll generally want to upload binary telemetry, in small packets, highly compressed. You'd be surprised what you can do with 5KB for instance. We can upload an entire page's worth of sampled DOM profiler information on most pages. More complex pages will take more, but that is where the sampling can be further refined. I'll talk a bit about some of these considerations now.

DOM Telemetry

Okay, so for DOM information, how can we turn the knobs, what is available, and what can we log? We decided that a sampled profiler would be best. This instruments the code so that some small set of calls of our choosing will have timing information taken as part of the call. The check for whether or not we should log needs to be cheap, as does the overhead of logging the call. We also want some aggregation, since we know there are going to be thousands of samples and we only want to upload them if we aren't abusing the telemetry pipeline.

A solution that used a circular logging buffer + a sampled API count with initial random offset was sufficient for our cases. I apologize for slamming you with almost ten optimizations in that one sentence, but I didn't feel going through the entire decision tree that we did would be useful. This is the kind of feature that can take longer to design than code ;-)

Let's start with sampling. We built this into the accumulator itself. This meant that any CPU overhead from logging could be eliminated whenever we weren't sampling (rather than having sampling accept or reject data at a higher level). Our sampling rate was a simple counter, something like 1 in every 1000 or 1 in every 10000. By tweaking the number to 1, we could log every call if we wanted to, making it a call-attributed profiler. For my hack I did build a call-attributed profiler instead, since I wanted more complete data and my collection size was a small set of machines. The outcome of that effort, though, showed that to do it right you would need to aggregate, which we aren't doing in this model. Aggregation can cost CPU, and we can defer that cost to the cloud in our case!

With a simple counter and mod check we now know whether we have to log. To avoid a bias against the first n-1 samples, we start our counter with a random offset. That means we might log the first call, the 50th call, whatever, but from there it is spaced by our sampling interval. These are some of the tricks you have to use with sampling; otherwise you might miss things like bootstrapping code if you ALWAYS skip the first sampling interval of values.
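Sketched in JavaScript (the real thing lives in native code inside the accumulator), the scheme is roughly:

```javascript
// Sample 1 in every `interval` calls, starting the counter at a random
// offset so the very first interval isn't systematically skipped.
function createSampler(interval) {
  var counter = Math.floor(Math.random() * interval); // random initial offset
  return function shouldSample() {
    counter++;
    if (counter >= interval) {
      counter = 0;
      return true;
    }
    return false;
  };
}

var shouldSample = createSampler(1000); // 1 in every 1000 calls
// Setting interval to 1 logs every call, i.e. a call-attributed profiler.
```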

When logging, we take the start and end QPCs (QueryPerformanceCounter values), do the math and then log the values. On 64-bit, we can submit a function pointer (this is the DOM function), the QPC delta and a few flag bits to the circular buffer and continue on. We don't even bother decoding the function pointers until we are in the cloud, where we marry the data with symbols. I can't recall for sure, but we also decided at some point to send the value of QueryPerformanceFrequency down in the payload so we could do that math in the cloud as well. We might have decided against that in the end, but you can clearly see the lengths we go to when thinking about how much CPU we use on the client's machine.

The next knob we have is the circular buffer size and the logging frequency. We allow ourselves to log the first buffer during a navigation and then 1 more buffer every minute. If the buffer isn't full, we log a partial buffer. If the buffer overflows, then we simply lose samples. We never lose samples in the initial navigation buffer, since we always commit it when it's ready and then put ourselves on a future logging diet.
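A sketch of that policy (the buffer, the one-per-minute throttle and the upload hook are all illustrative names, not our actual code):

```javascript
// The buffer committed during navigation always goes up; after that at most
// one buffer per minute, and overflow in between simply loses samples.
function createSampleBuffer(capacity, upload) {
  var samples = [];
  var committedNavigationBuffer = false;
  var lastFlush = 0;

  function tryFlush(now) {
    var allowed = !committedNavigationBuffer || now - lastFlush >= 60000;
    if (allowed && samples.length > 0) {
      upload(samples);                   // partial buffers are fine too
      committedNavigationBuffer = true;
      lastFlush = now;
      samples = [];
    }
  }

  return {
    add: function (sample) {
      if (samples.length < capacity) {
        samples.push(sample);
      }
      // else: overflow, the sample is simply lost
      if (samples.length === capacity) {
        tryFlush(Date.now());
      }
    },
    flushPartial: function () {          // e.g. driven by a minute timer
      tryFlush(Date.now());
    }
  };
}
```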

Once this data hits the Windows telemetry service, it gets to decide whether this user is opted into this type of logging. So we MIGHT in some cases be tracking things that don't make it up to us. We do try to detect this beforehand, but we can't always do so. There are also things like throttling that decide whether a buffer should go up or not. Once we hit production, which we did back in our first Windows 10 update release, scale kicks in and you don't even concern yourself with the missing data because you have WAY TOO MUCH data to deal with already!

The Windows telemetry pipeline also controls for many other variables which I'm not tuned into. There is an entire data science team which knows how to classify the machines, the users, the locale, and a bunch of other information from each Windows machine and then those become pivots that we can sometimes get in our data. We can certainly get details on domains and URLs once we have enough samples (to anonymize the data there must be a sufficient number of samples, otherwise we can end up seeing PII without realizing it).

Okay, I'm starting to get into the data itself so let's take a look at some of the insights this effort has brought to our attention!

Web API Insights

There are two schools of thought in data science. The first is to ask a specific question and then attempt to answer it with some telemetry. This is a very focused approach; it often yields results, but it rarely creates new questions or allows for deep insight. For the second approach, when we think about "big data" as opposed to "data science", we start to think about how our raw data has deeply buried patterns and insights for us to go glean. It's rarely this clean, but there are indeed patterns in that raw data, and if you have enough of it, you definitely start asking more questions ;-) The second school of thought, then, wants to add telemetry to many things with no specific question, then process the data and see if anything pops out.

Our Web API telemetry design is both. First, we did have some very specific questions and our questions were around things like, "What are the top 10 DOM APIs by usage?" and "What are the top 10 DOM APIs by total exclusive time?". These are usage and performance questions. We didn't start by thinking about other questions though like, "What APIs are the top 10 websites using today that is different from 3 months ago?" How could we ask a time oriented question requiring many data points without having the first data point? Well, by collecting more raw data that didn't have specific questions in mind just yet, we can ask some of those questions later, historically if you will, and we can run algorithms to find patterns once we have additional data.

One of our biggest outcomes from the hackathon data was using a clustering algorithm to group the sites into 10 categories based on their API usage. Would you have guessed that 700 websites out of the top 1000 would be categorized together and appear similar to one another? I wouldn't have.

Here are some insights that we were able to derive. I'm, unfortunately, anonymizing this a little bit but hopefully in the future we'll be able to broadly share the data similar to how Chrome and FireFox are doing through their telemetry sites.

Insight #1: Upon initial release of our feature, we found that our numbers were heavily skewed towards URLs in a specific country that we didn't expect to rank so high. We found, using this method, an indirect correlation between upgrade cadence and country. After a couple of weeks this completely evaporated from our data and we started to see the site distribution we more traditionally expected.

Insight #2: Our crawler data only had about a 60-70% overlap with our live data. This meant that what people do on the web changes quite a bit between their initial navigations and when they start to interact with the page. Our crawler was blind to big sites where people spend a lot of time and interact a lot. All of those interactive scenarios were only "marginally" hit by the crawler.

This means that some APIs not on our performance optimization list started to jump up the list and became important for our team. We also started to extrapolate use cases from the data we were seeing. As an immediate example, APIs like setTimeout started to show up more since that is how dynamic pages are written. requestAnimationFrame was the same. All of the scheduling APIs moved up the list a bit when we considered the live data and presented differently than the crawler did. This was great news.

Insight #3: Even though I just talked down the crawler, it turns out, it isn't THAT far off. Since we know its shortcomings we can also account for them. We use the crawler to validate the live data (does it make sense?) and we use the live data to validate the crawler (is it still representative of the real world). Having two different ways to get the same data to cross validate is a huge bonus when doing any sort of data science projects.

Insight #4: The web really needs to think about deprecation of APIs moving forward. The power of the web is becoming the ability of the run-time and language to adapt to new programming trends in months rather than years. This has the downside of leading to a bloated API set. When APIs are no longer used by the web, we could try to work towards their deprecation and eventual removal. Given the use trackers of Chrome, Firefox and Microsoft Edge this can become more than just a hope. If we consider that Internet Explorer is supporting the legacy web on Windows platforms, filling that niche role of keeping the old web working, I see even more hope.

What we classically find is that only something like half of the web API surface is used at all. Removing APIs would improve perf, shrink the browser footprint and make space for newer APIs that do what web developers actually want them to do.

Insight #5: My final insight is one that we are only beginning to realize. We are collecting data over time. Firefox has an evolution dashboard on their site, and here I'm linking one where they explore, I think, UI lags in the event loop and how that changes over time.

Why do over-time metrics matter for the browser? Well, by watching for usage trends we can allocate resources towards the API surface area that will need it most in the future. Or we can focus more on specifications that extend the areas where people are broadly using the API set. A great example would be monitoring adoption of something like MSE (Media Source Extensions) and whether or not the browser is supporting the media APIs necessary to deliver high quality experiences.
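
As a concrete example of how cheap that kind of signal is to gather, detecting whether MSE is even available (and whether a common codec profile is supported) is a one-liner you could log alongside the API counters. This is just a feature-detection sketch; `logTelemetry` is a made-up placeholder, not our real hook:

```javascript
// Feature-detect Media Source Extensions and a common H.264/AAC profile,
// the sort of adoption signal you might fold into over-time telemetry.
const hasMSE = typeof window.MediaSource === "function";
const supportsH264 = hasMSE &&
  MediaSource.isTypeSupported('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');

// Hypothetical logging call; the real instrumentation looks nothing like this.
logTelemetry("media.mse", { hasMSE, supportsH264 });
```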

We can also determine if architectural changes have materially impacted performance either to the positive or negative. We've been able to "see" this in some of our data though the results are inconclusive since we have too few data points currently. By logging API failures we can take this a step further and even find functional regressions if the number of failures increases dramatically say between two releases. We don't yet have an example of this, but it will be really cool when it happens.

Conclusions

After re-reading, there is a LOT of information to digest in this article. Just the section on the telemetry LOD could be its own article. The Web API data, I'm sure, will be many more articles to come as well. We should be able to make this available for standards discussions in the near future if we haven't already been using it in that capacity.

The most stand-out thought for me as a developer was that going from Hackathon to Production was a long process, but not nearly as long as I thought it would be. I won't discount the amount of work that everyone had to put in to make it happen, but we are talking about a dev month or two, not dev years. The outcome from the project will certainly drive many dev years worth of improvements, so in terms of cost/benefit it is definitely a positive feature.

Contrasting this with work that I did to instrument a few APIs with a use tracker before we had this profiler available, I would say the general solution came out to be much, much cheaper. That doesn't mean everything can be generally solved. In fact, my use tracker for the APIs does more than just log timing information. It also handles the parameters passed in to give us more insight into how the API is being used.
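
For readers who haven't hand-rolled a use tracker before, here is a minimal script-level sketch of the idea: wrap an API, count calls, time the call itself, and capture something about the parameters. The real instrumentation lives in native code inside the engine; this is only an illustration of the shape of the data you end up with.

```javascript
// Wrap window.setTimeout to count calls, time the registration call
// itself (roughly the "exclusive time" idea), and record the delay
// parameter so we can see how the API is actually used.
const usage = { count: 0, totalTime: 0, delays: [] };
const originalSetTimeout = window.setTimeout;

window.setTimeout = function (callback, delay, ...args) {
  const start = performance.now();
  usage.count++;
  usage.delays.push(delay || 0);
  const id = originalSetTimeout.call(window, callback, delay, ...args);
  usage.totalTime += performance.now() - start;
  return id;
};
```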

In both cases adding telemetry was pretty easy though. And that is the key to telemetry in your product. It should be easy to add, easy to remove and developers should be aware of it. If you have systems in place from the beginning to collect this data, then your developers will use it. If you don't have the facilities then developers may or may not write it themselves, and can certainly write it very poorly. As your product grows you will experience telemetry growing pains. You'll certainly wish you had designed telemetry in from the start ;-) Hopefully some of the insights here can help you figure out what level of optimization, logging, etc... would be right for your project.

Credits

I'd like to provide credit here to the many people who ended up helping with my efforts in this space. I'll simply list first names, but I will contact each of them individually and backfill their Twitter accounts or a link to a blog if they want.

Credit for the original design and driving the feature goes to my PM Todd Reifsteck (@ToddReifsteck) and our expert in Chakra who built the logging system, Arjun.

Credit for all of the work to mine the data goes mainly to one of our new team members, Brandon. After a seamless hand-off of the data stream from Chakra, we then merged it with many other data streams to come up with the reports we are able to use now to drive all of the insights above.

Tuesday, December 22, 2015

A 2014 WebVR Challenge and Review

I almost can't believe it was nearly 2 years ago when I started to think about WebVR as a serious and quite new presentation platform for the web. The WebGL implementation in Internet Explorer 11 was state of the art and you could do some pretty amazing things with it. While there were still a few minor holes, customers were generally happy and the performance was great.

A good friend, Reza Nourai, had his DK1 at the time and was playing with a bunch of experimental optimizations. He was working on the DX12 team, so you can imagine he knew his stuff and could find performance problems both in the way we built games/apps for VR and in the way the hardware serviced all of the requests. In fact, it wasn't long after our Hackathon that he got a job at Oculus and gained his own bit of power over the direction that VR was heading ;-) For the short time where I had access to the GPU mad scientist, we decided that if Microsoft was going to give us two days for the first ever OneHack, then we'd play around with developing an implementation of this spec we kept hearing about, WebVR, and attaching it to the DK1 we had access to.

This became our challenge. Over the course of 2 days, implement the critical parts of the WebVR specification (that was my job, I'm the OM expert for the IE team), get libVR running in our build environment and finally attach the business end of a WebGLRenderingContext to the device itself so we could hopefully spin some cubes around. The TLDR is that we both succeeded and failed. We spent more time redesigning the API and crafting it into something useful than simply trying to put something on the screen. So in the end, we wound up with an implementation that blew up the debug runtime and rendered a blue texture. This eventually fixed itself with a new version of libVR, but that was many months later. We never did track down why we hit this snag, nor was it important. We had already done what we set out to do: integrate and build an API set so we had something to play around with. Something to find all of the holes and issues and things we wanted to make better. It's from this end point that many ideas and understandings were had, and I hope to share these with you now.

Getting Devices

Finding and initializing devices, or hooking up a protected process to a device, is not an easy task. You might have to ask the user's permission (blocking) or any number of other things. At the time the WebVR implementation did not have a concept for how long this would take, and it did not return a Promise or have any other asynchronous completion object (a callback, for instance) that would let you continue to run your application code and respond to user input, such as trying to navigate away. This is just a bad design. The browser needs APIs that get out of your way as quickly as possible when long running tasks could be serviced on a background thread.

We redesigned this portion and passed in a callback. We resolved the callback in the requestAnimationFrame queue and gave you access at this point to a VRDevice. Obviously this wasn't the best approach, but our feedback, had we had the foresight at the time to approach the WebVR group, would have been, "Make this a Promise or Callback". At the time a Promise was not a browser intrinsic so we probably would have ended up using a callback, and then later, moving to a Promise instead. I'm very happy to find the current specification does make this a Promise.

This still comes with trade-offs. Promises are micro-tasks and serviced aggressively. Do you really want to service this request aggressively, or wait until idle time or some other time? You can always post your own idle task once you get the callback to process later. The returned value is a sequence, so it is fixed and unchanging.

The next trade-off comes when the user has a lot of VR devices. Do you init them all? Our API would let you get back device descriptors and then you could acquire them. This had two advantages. First, we could cache the previous state and return it more quickly without having to acquire the devices themselves. Second, you could get information about the devices without having to pay seconds of cost or have to immediately ask the user for permission. You might say, what is a lot? Well, imagine that I have 6 or 7 positional devices that I use along with 2 HMDs. And let's not forget positional audio, which is completely missing from current specifications.
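
Roughly, the descriptor-then-acquire shape looks like the sketch below. Every name here is made up for illustration; it matches neither our hack nor the published spec, it just shows cheap enumeration up front and expensive acquisition on demand.

```javascript
// Hypothetical two-step enumeration: cached, cheap descriptors first;
// permission prompts and device init only when a device is acquired.
function pickHMD() {
  return navigator.vr.getDeviceDescriptors().then((descriptors) => {
    const hmdInfo = descriptors.find((d) => d.kind === "hmd");
    if (!hmdInfo) return null;
    // Only now do we pay the multi-second cost (permissions, device spin-up).
    return hmdInfo.acquire();
  });
}
```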

The APIs we build for this first step will likely be some of the most important we build for the entire experience. Right now the APIs cater towards the interested developer who has everything connected, is actively trying to build something with the API and is willing to work around poor user experience. Future APIs and experiences will have to be seamless and allow normal users easy access to our virtual worlds.

Using and Exposing Device Information

Having played with the concept of using the hardware device ID to tie multiple devices together, I find that arrangement very similar to how we got devices. While an enterprising developer can make sure that their environment is set up properly, we can't assert the same for the common user. For now, we should probably assume that the way to tie devices together is sufficient and that an average user would only have one set of hardware. But then, if that is the case, why would we separate the positional tracking from the HMD itself? We are, after all, mostly tracking the position of the HMD itself in 3D space. For this reason, we didn't implement a positional VR device at all. We simply provided the positional information directly from the HMD through a series of properties.

Let's think about how the physical devices then map to the existing web object model. For the HMD we definitely need some concept of WebVR and the ability to get a device which comprises a rendering target and some positional/tracking information. This is all a single device, so having a single device expose the information makes the API much simpler to understand from a developer perspective.

What about those wicked hand controllers? We didn't have any, but we did have some gamepads. The Gamepad API is much more natural for this purpose. Of course it needs a position/orientation on the pad so that you can determine where it is. This is an easy addition that we hadn't made. It will also need a reset so you can zero out the values and set a zero position. Current VR input hardware tends to need this constantly, if for no other reason than user psychology.
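
A sketch of what polling a tracked controller through the Gamepad API might look like. `navigator.getGamepads()` is real; the `pose` and `resetPose` members are the hypothetical additions discussed above, and `updateHandModel` is an app-defined stand-in.

```javascript
// Poll gamepads each frame; position/orientation and "zero out" reset
// are the hypothetical extensions, not shipped API at the time.
function pollControllers() {
  for (const pad of Array.from(navigator.getGamepads())) {
    if (!pad) continue;
    if (pad.pose) {                                   // hypothetical extension
      const { position, orientation } = pad.pose;
      updateHandModel(pad.index, position, orientation); // app-defined
    }
    if (pad.buttons[0] && pad.buttons[0].pressed && pad.resetPose) {
      pad.resetPose();                                // hypothetical reset call
    }
  }
  requestAnimationFrame(pollControllers);
}
pollControllers();
```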

Since we also didn't have WebAudio and positional audio in the device at the time we couldn't really have come up with a good solution then. Exposing a set of endpoints from the audio output device is likely the best way to make this work. Assuming that you can play audio out of the PC speakers directly is likely to fail miserably. While you could achieve some 3D sound you aren't really taking advantage of the speakers in the HMD itself. More than likely you'll want to send music and ambient audio to the PC speakers and you'll want to send positional audio information, like gunshots, etc... to the HMD speakers. WebAudio, fortunately, allows us to construct and bind our audio graphs however we want making this quite achievable with existing specifications.
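 
To make that concrete, here is a minimal WebAudio sketch that splits ambient audio from positional cues. The graph construction is real API; the part existing specifications don't cleanly give you is routing the positional branch specifically to the HMD's own speakers rather than the default output.

```javascript
// Ambient music goes straight to the default destination; a positional
// cue (a gunshot, say) gets a PannerNode so it is placed in 3D space
// relative to the listener.
const audioCtx = new AudioContext();

function playAmbient(buffer) {
  const src = audioCtx.createBufferSource();
  src.buffer = buffer;
  src.connect(audioCtx.destination);
  src.start();
}

function playAt(buffer, x, y, z) {
  const src = audioCtx.createBufferSource();
  const panner = audioCtx.createPanner();
  panner.setPosition(x, y, z);            // position in the 3D scene
  src.buffer = buffer;
  src.connect(panner);
  panner.connect(audioCtx.destination);   // ideally the HMD endpoint instead
  src.start();
}
```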

The rest of WebVR was "okay" for our needs for accessing device state. We didn't see the purpose of defining new types for things like rects and points. For instance DOMPoint is overkill (and I believe it was named something different before it took on the current name). There is nothing of value in defining getter/setter pairs for a type which could just be a dictionary (a generic JavaScript object). Further, it bakes in a concept like x, y, z, w natively that shouldn't be there at all and seems only to make adoption more difficult. To be fair to the linked specification it seems to agree that other options, based solely on static methods and dictionary types is possible.

Rendering Fast Enough

The VR sweet spot for fast enough is around 120fps. You can achieve results that don't induce simulation sickness (or virtual reality sickness) at a lower FPS, but you really can't miss a single frame and you have to have a very fast and responsive IMU (this is the unit that tracks your head movement). What we found when using canvas and window.requestAnimationFrame is that we couldn't even get 60fps, let alone more. The reason is the browser tends to lock to the monitor refresh rate. At the time, we also had 1 frame to commit the scene and one more frame to compose the final desktop output. That added 2 more frames of latency. That latency will dramatically impact the simulation quality.

But we could do better. First, we could disable browser frame commits and take over the device entirely. By limiting the browser frames we could get faster intermediate outputs from the canvas. We also had to either layer or flip the canvas itself (technical details) and so we chose to layer it since flipping an overly large full-screen window was a waste of pixels. We didn't need that many, so we could get away with a much smaller canvas entirely.

We found we didn't want a browser requestAnimationFrame at all. That entailed giving up time to the browser's layout engine, it entailed sharing the composed surface and in the end it meant sharing the refresh rate. We knew that devices were going to get faster. We knew 75fps was on the way and that 90 or 105 or 120 was just a year or so away. Obviously browsers were going to crush FPS to the lowest number possible for achieving performance while balancing the time to do layout and rendering. 60fps is almost "too fast" for the web and most pages only run at 1-3 "changed" frames per second while browsers do tricks behind the scenes to make other things like user interactivity and scrolling appear to run at a much faster rate.

We decided to add requestAnimationFrame to the VRDevice instead. Now we gained a bunch of capabilities, but they aren't all obvious so I'll point them out. First, we now hit the native speeds of the devices, we sync to the device v-sync and we don't wait for the layout and rendering sub-systems of the browser to complete. We give you the full v-sync if we can. This is huge. Second, we are unbound from the monitor refresh rate and backing browser refresh rate, so if we want to run at 120fps while the browser does 60fps, we can. An unexpected win is that we could move the VRDevice off-thread entirely and into a web worker. So long as the web worker had all of the functionality we needed, or it existed on the device or other WebVR interfaces, we could do it off-thread now! You might point out that WebGL wasn't available off-thread generally in browsers, but to be honest, that wasn't a blocker for us. We could get a minimal amount of rendering available in the web worker. We began experimenting here, but never completed the transition. It would have been a couple of weeks, rather than days, of work to make the entire rendering stack work in the web worker thread at the time.
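
A sketch of the device-driven loop we ended up with. `vrDevice`, its `requestAnimationFrame`, and `getPose` are our experimental additions (not standard API), and `renderScene` is an app-defined stand-in.

```javascript
// Drive rendering off the HMD's own v-sync instead of the browser's.
// vrDevice.requestAnimationFrame is the experimental API from our hack.
function onDeviceFrame(timestamp) {
  const pose = vrDevice.getPose();              // hypothetical: latest IMU sample
  renderScene(pose, timestamp);                 // app-defined WebGL rendering
  vrDevice.requestAnimationFrame(onDeviceFrame); // 75/90/120Hz, device-paced
}
vrDevice.requestAnimationFrame(onDeviceFrame);
```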

So we found that as long as you treat VR like another presentation mechanism with unique capabilities, you can achieve fast and fluid frame commits. You have to really break the browser's normal model to get there though. For this reason I felt that WebVR in general should focus more on how to drive content to this alternate device using a device specific API rather than on piggybacking existing browser features such as canvas, full-screen APIs, and just treating the HMD like another monitor.

Improving Rendering Quality

When we were focusing on our hack, WebVR had some really poor rendering. I'm not even sure we had proper time warp available to us in the SDK we chose, and certainly the DK1 we had was rendering at a low frame rate, low resolution, etc... But even with this low resolution you still really had to think about the rendering quality. You still wanted to render a larger than necessary texture to take advantage of deformation. You still wanted to get a high quality deformation mesh that matched the user's optics profile. And you still wanted to hit the 60fps framerate. Initial WebGL implementations with naive programming practices did not make this easy. Fortunately, we weren't naive and we owned the entire stack, so when we hit a stall we knew exactly where and how to work around it. That makes this next section very interesting, because we were able to achieve a lot without the restrictions of building against a black box system.

The default WebVR concept at the time was to take a canvas element, size it to the full size of the screen and then full screen it onto the HMD. In this mode the HMD is visualizing the browser window. With the original Oculus SDK you even had to go move the window into an area of the virtual desktop that showed up on the HMD. This was definitely an easy way to get something working. You simply needed to render into a window, and move that onto the HMD desktop and disable all of the basic features like deformation etc... (doing them yourself) to get things going. But this wasn't the state of the art, even at that time. So we went a step further.

We started by hooking our graphics stack directly into the Oculus SDK's initialization. This allowed us to set up all of the appropriate swap chains and render targets, while also giving us the ability to turn on and off Oculus specific features. We chose to use the Oculus deformation meshes, for instance, rather than our own since it offloaded one more phase of our graphics pipeline that could be done on another thread in the background without us having to pay the cost.

That got us pretty far, but we still had a concept of using a canvas in order to get the WebGLRenderingContext back. We then told the device about this canvas and it effectively swapped over to "VR" mode. Again, this was way different than the existing frameworks that relied on using the final textures from the canvas to present to the HMD. This extra step seemed unnecessary so we got rid of it and had the device give us back the WebGLRenderingContext instead. This made a LOT of sense. This also allowed the later movement off to the web worker thread ;-) So we killed two birds with one stone. We admitted that the HMD itself was a device with all of the associated graphics context, we gave it its own rendering context, textures and a bunch of other state and simply decoupled that from the browser itself. At this point you could almost render headless (no monitor) directly to the HMD. This is not easy to debug on the screen though, but fortunately Oculus had a readback texture that would give you back the final image presented to the HMD, so we could use that texture and make it available, on demand, off of the device so we only paid the perf cost if you requested it.
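
In spirit, acquiring the context from the device itself looked like the sketch below. Every call on `vrDevice` is illustrative only, and `drawEyeViews` is an app-defined stand-in; the point is that the HMD owns the context, applies deformation, and only pays the readback cost when you ask for it.

```javascript
// The device hands us its own context instead of us handing it a canvas.
const gl = vrDevice.getRenderingContext();      // a WebGLRenderingContext owned by the HMD
drawEyeViews(gl, vrDevice.getEyeParameters());  // app-defined stereo rendering
vrDevice.commit();                              // device applies deformation and presents

// Debug path: read back the final presented image only on demand,
// so the readback cost is paid only when requested.
const debugPixels = vrDevice.getPresentedTexture();
```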

At the time, this was the best we could do. We were basically using WebGL to render, but we were using it in a way that made us look a lot more like an Oculus demo, leaning heavily on the SDK. The rendering quality was as good as we could get at the time, without us going into software level tweaks. I'll talk about some of those ideas (in later posts), which have now been implemented I believe by Oculus demos and some industry teams, so they won't be anything new, but can give you a better idea of why the WebVR API has to allow for innovation and can't simply be a minimal extension of existing DOM APIs and concepts if it wants to be successful.

Improvements Since the Hackathon

We were doing this for a Microsoft Hackathon and our target was the Internet Explorer 11 browser. You might notice IE and later Microsoft Edge don't have any support for WebVR. This is partly due to the infancy of the technology, but also due to there not being a truly compelling API. Providing easy access to VR for the masses sounds great, but VR requires a very high fidelity rendering capability and great performance if we want users to adopt it. I've seen many times where users will try VR for the first time, break out the sick bag, and not go back. Even if the devices are good enough, if the APIs are not good then it will hold back the adoption rates for our mainstream users. While great for developers, WebVR simply doesn't set the web up, IMO, for great VR experiences. This is a place where we just have to do better, a lot better, and fortunately we can.

The concept of the HMD as its own rendering device seems pretty critical to me. Making it have its own event loop and making it available on a web worker thread also goes a long way to helping the overall experience and achieving 120fps rendering sometime in the next two years. But we can go even further. We do want, for instance, to be able to render both 3D content and 2D content in the same frame. A HUD is a perfect example. We want the devices to compose, where possible, these things together. We want to use time warp when we can't hit the 120fps boundaries so that there is a frame that the user can see that has been moved and shifted. Let's examine how a pre-deformed, pre-composed HUD system would look using our existing WebVR interfaces today if we turned on time warp.

We can use something like Babylon.js or Three.js for this and we can turn on their default WebVR presentation modes. By doing so, we get a canonical deformation applied for us when we render the scene. We overlay the HUD using HTML 5, probably by layering it over top of the canvas. The browser then snapshots this and presents it to the HMD. The HUD itself is now "stuck" and occluding critical pixels from the 3D scene that would be nice to have. If you turned on time warp you'd smear the pixels in weird ways and it just wouldn't look as good as if you had submitted the two textures separately.

Let's redo this simulation using the WebGLRenderingContext on the device itself and having it get a little bit more knowledge about the various textures involved. We can instead render the 3D scene in full fidelity and commit that directly to the device. It now has all of the pixels. Further, it is NOT deformed, so the device is going to do that for us. Maybe the user has a custom deformation mesh that helps correct an optical abnormality for them; we'll use that instead of a stock choice. Next we tell the device its base element for the HUD. The browser properly layers this HUD and commits that as a separate texture to the device. The underlying SDK is now capable of time warping this content for multiple frames until we are ready to commit another update, and this can respond to the user as they move their head in real-time with the highest fidelity.
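
Sketched as code, the layered commit described above might look something like this. Every call here is hypothetical; it exists only to show the scene and the HUD being handed to the device as separate layers so the SDK can deform and time-warp them independently.

```javascript
// Commit the undeformed 3D scene and the 2D HUD as separate layers
// (hypothetical API). The device applies the user's deformation mesh
// and can time-warp the scene layer without smearing the HUD.
vrDevice.commitLayer("scene", sceneTexture, { allowTimeWarp: true });
vrDevice.commitLayer("hud", hudElement, { billboard: true, opacity: 0.85 });
vrDevice.present();
```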

You might say, but if you can render at 120fps then you are good, right? No, not really. That still means up to 8ms of latency between the IMU reading and your rendering. The device can compensate for this by time warping with a much smaller latency, sampling the IMU when it goes to compose the final scene in the hardware. Also, since we decomposed the HTML overlay into its own texture, we can billboard it into the final scene, partially transparent, or however we might want to show it. The 3D scene can shine through, or we can even see new pixels from "around" the HUD since we didn't flatten it into the scene.

Conclusion

Since our hack, the devices have changed dramatically. They are offering services, either in software or in the hardware, that couldn't have been predicted. Treating WebVR like a do-it-all shop that then splashes onto a flat screen seems like it's not going to be able to take advantage of the features in the hardware itself. An API that instead gets more information from the device and allows the device to advertise features that can be turned on and off might end up being a better approach. We move from an API that is device agnostic to one that embraces the devices themselves. No matter what we build, compatibility with older devices and having something "just work" on a bunch of VR devices is likely not going to happen. There is simply too much innovation happening and the API really has to allow for this innovation without getting in the way.

Our current WebVR specifications are pretty much the same now as they were when we did our hack in 2014. It's been almost 2 years now and the biggest improvement I've seen is the usage of the Promise capability. I don't know what advances a specification such as WebVR, but I'm betting the commercial kit from Oculus coming out in 2016 will do the trick. With a real device gaining broad adoption, there will likely be a bigger push to get something into all of the major browsers.

Saturday, December 12, 2015

Retrospective: 2015 as a Tech and Career Mentor

2015 was a pretty great year mentoring for me. After several years of offering advice to others, I now have a small, consistent group of individuals that I work with. Each in a different way, important to them, but probably more important to me.

My Background and Thoughts On Mentoring


I generally mentor people at Microsoft who are experiencing some issue in their career growth. Sometimes the issues are things they can change, sometimes the environment needs to change, sometimes it is technical, sometimes it relates to missing skill-sets and sometimes it's personal issues. No matter what the cause, as an external observer, I can either directly help them or I can find someone else who can. I can be supportive, uplifting or give them the hard talk about something they need to change. Whatever they need. As a mentor I'm adaptable to their situation.

Most of my mentoring sessions consist of drawing parallels between road-blocks that I've faced and how I overcame them. By explaining through a real scenario it usually becomes much clearer to others that they have options. Increasingly though, I find that my experiences aren't the same as theirs. This was a great mentoring realization for me: people have unique problems, and they have unique strengths and weaknesses. My "solution" isn't their "solution". Instead I need to offer many viewpoints, using the combined experiences of all of my mentees, not just my own. Intelligent people tend to solve their own problems once they start to realize how many options they have. Persistent problems tend to remain so long as your options are limited.

Helping someone find their options isn't limited to those with wisdom and experience either. This is my pitch to those who think they are too young to mentor others. Mentoring is just as much about listening, learning, problem solving and being supportive as it is about drawing purely on your own experience. If you are giving someone the solutions to their problems, it isn't really a mentoring session anymore. You are trying to make them better and help them find their own solutions, not demonstrate how easily and intelligently you could resolve their issue.

Mentoring doesn't have to be formal. It could be one friend to another. It can be long or short: a single meeting or many years' worth of them, both outcomes are acceptable. It could be a regular weekly meeting or it could be a couple of random days in the year. It doesn't have to be scheduled at all; for instance, getting feedback on a potentially career limiting email from your mentor can often be the difference between success and failure.

We need more mentors thinking about diversity. Rico Mariani recently started an inspiring series of thoughts and threads (well, more than that, he started a scholarship) focused on women in tech. He immediately reached out to everyone on his team and made it very clear: minorities in tech, regardless of which minority, need equal opportunities and access to strong mentors. So I'm going to do some more thinking on this. I think it is very hard for someone to reach out and accept mentoring, and the bigger the differences which exist (level, race, gender, social status, personalities) the harder it is for someone to take that first step. It's probably time to reach out to potential mentees a lot more often, and accept the rejections, in favor of helping the diversity imbalances.

My Mentees for 2015


This won't focus solely on 2015, since one of my most important roles in mentoring occurred during 2014, just following the large Microsoft lay-offs. Here is my list of mentees (anonymized) and how we spend our time:

  1. The Entrepreneur - I once owned my own company and was independently successful for a few years during my 20's. This experience taught me quite a bit. One of my mentees is at a point in his life where he wants to develop a small company for himself and so we have sessions on how to transition from your day job to being a founder.
  2. The Classic - Before the Microsoft mentoring site went down I got my last mentee referral for what would become my most normal mentoring setup. We meet on his schedule and the topics are around how to manage his career development, communications, timing and positioning. Sorry, I'll expand more on that in a bit ;-)
  3. The Switcher - Within the past few months I've started working with someone who is unhappy in their position on their current team. They had already made a decision that improving their situation was not the right move, that they needed a change. For this person, I'm helping them improve their technical interviewing skills.
  4. The LayOffs - When the large scale Microsoft lay-offs happened I realized quickly that many intelligent and capable people were going to start having a very hard time in the job market. It is easy to have your skills atrophy when you work in a mature company like Microsoft. For this group, I got to learn the challenges of mentoring in a group setting.
  5. The Boxer - Everyone needs change and growth opportunities. This person had been working in the same area for too long and was feeling constrained, as if they had been put in a box for safe keeping. When in this position, having options and knowing what you can and should be able to ask for to improve your situation is key.
With the exception of the group training I did when the lay-offs occurred, my diversity scorecard has me gender biased (zero female mentees) but doing okay otherwise (I have mentees of many races, though they are all male). During the lay-offs, I was able to help multiple female engineers reset their skills and I'm happy that a majority of them were able to quickly find a position that was as good or better than what they left.

You Get More than you Give


My mentees are consistently conscious of my time. They don't want to waste it, since they view my time as important, and they view me as someone who maximizes the value of my own time. Guess what, they are right! I do value my own time and I can say hands down the time I spend mentoring is far more important than anything else I could be doing. So the secret to mentoring isn't that you spend time doing it, it's that you can help yourself and others at the same time, in ways that will help you in the future.

After all, how do you become better at anything? You PRACTICE it. Mentoring gives you the opportunity to be a great listener and a strong communicator. It helps you to organize your experiences and arguments through self-reflection that you are able to share with someone else. Your mentee can sometimes reverse the experience on you and potentially point out something you missed when you dealt with a situation that you are using as an example. These moments, for me, have always led to improvements in my own career.

Your mentees will also have experiences that differ from your own. This will help you increase your own understanding of diversity and help provide you with insights to improve how you integrate and communicate with your co-workers. Perhaps a certain cultural bias prevents someone from speaking up; if you are able to experience this through your mentee, it can make you more sensitive to this issue.

The last bit for me is around building your moral standards. If I'm not helping people, I'm not happy. There is a certain amount of human interaction and empathy that goes into the entire process that leaves me feeling like I've done something bigger than myself. I've reached out and helped someone else. You can get this many ways, through teaching, mentoring or volunteering, so I challenge you to do all 3 to maximize your own benefits ;-)

Reaching Out


You've read everything above and you are thinking, "Sign me up!" So how do you start? I don't have 40 years of experience doing this, but I do have enough to know one thing. Not everyone realizes they need a mentor and pointing it out often isn't enough to convince them. Mentoring is a two-way street and even though I have time to give my advice it doesn't mean that others have the time to hear it. It is an exceptional case when someone walks up to you and ASKS you to mentor them. This almost never happens. 

So you have to put yourself out there and let people know you are ready to help. When approaching a new mentee with an offer, do so privately. Not everyone sees having mentors as a strength, and some see it as a sign of weakness. A powerful mentor relationship can also be seen as a threat or even favoritism. These are normally irrational and unhealthy thoughts on the part of others, but it is something to be aware of.

By the numbers, the number of people that I will approach is quite small. I'm evaluating a lot of different outcomes and consequences. As a mentor I should do a MUCH better job of approaching people.

Once approached, the acceptance rate is probably 50/50. So be prepared to be rejected. This is key, never take someone else's rejection of your mentoring offer as anything against you. There are a lot of factors they are evaluating as well, including whether or not they even feel comfortable talking to you. They may not see you as a mentor at all. This is normal. When you get a rejection, kindly let them know that you are available in the future if things change, and move on.

Even out of those who accept, there is a likelihood you will never have your first mentoring session. If you do, there is a high likelihood it will be your last or next to last. Being mentored is not for everyone and maybe the session made them feel awkward. I bet psychologists have a similar track record with new clients. Also, not every problem is one to be fixed. Your mentee may find that it isn't the right time for them to think about and correct the issues impacting them. This is especially true in long term career mentoring.

Once you've gotten started, be prepared to adapt yourself to many different types of people and expectations. If they want you to set up a schedule and manage it, then do that. Otherwise, if they want to set it up instead, then let them do that. The majority of the improvement they make will be when they are not interacting directly with you, so figure out how to follow up with them. If they like goals, set them up. The more you mentor the more you'll figure this out, but being prepared will hopefully make your first few mentoring encounters much more fruitful.

If in doubt, get a mentor yourself and ask them about mentoring. It's the ultimate mentoring meta ;-)

Diversifying


Diversifying your mentees can be extremely difficult. First, people are more comfortable around those more like them. For this reason, in a work environment, people of similar ethnic backgrounds are likely to stick together. When a team has good diversity this doesn't happen as much, but then the team itself can stick together and becomes a blocker for technical diversity. Since minorities, by definition, are fewer within the groups, trying to find a mentor that matches your minority can be hard.

This leaves our minority populations in tech with a hard problem. First, they can't find high level mentors within their own minority. Second, they may find it hard or challenging to find high level mentors at all. And this all assumes that they reach out; if they don't, it's even less likely that an appropriate mentor will find them.

This type of diversity bias is what we have to overcome. It isn't really fixed by taking a "fair" approach to the problem. You can't sit back, and expect the odds to play themselves out here. While you may think that by accepting anyone and everyone you'll achieve diversity, you won't. You have to break the "fairness" and make a point of being unfair in your selection of mentees and how you reach out. Make a point of prioritizing for ethnic and gender diversity when you fill up your time initially and then worry about filling out the majority slots later. Trust me, they won't be nearly as hard to fill.

For my part, I'm going to commit to filling out my diversity card in the upcoming year of 2016. I've already tweeted that I have some mentoring capacity. I'm reserving that space for women and other minorities in tech. I will be proactive and start reaching out to potential mentees and work with other mentors in my network in case they know of people actively looking. I am going to increase my efforts towards STEM education for minorities though I don't yet have a plan for how I'm going to do that. If anyone has ideas let me know ;-) I'll make sure to be more vocal about my efforts as well in case it can inspire or inform others. Bringing awareness is a key step in solving any problem.

Finally, a challenge to all you would-be mentors. If you aren't sure, and you want to talk to someone about becoming a mentor, I hear you can call anyone on this thing called the Internet. I'd be happy to talk with you about my experiences and answer any questions you might have before you get started.