
Sunday, September 18, 2016

WebVR - 95% Web and 5% VR - That's a GOOD thing

When a new technology comes out, there is rarely a singular influence. Technologies today are highly connected, almost always derivative, and rarely kept private. These separate influences are good for the development of a diverse technology that is general enough to solve large-scale problems and powerful enough to create experiences that are truly unique. I think WebVR is one of these truly diverse technologies, but only if we take full advantage of everything we've already developed and let it influence the many new technologies yet to be developed for the web.

The marriage of Web and VR tends to be very one-sided. Most of the bits and pieces you need to build a modern VR application already exist in the web platform. Compare and contrast this with VR technology stacks, where people are still building their own event loops, threading models, memory models, etc. The VR space is the wild west, and that means there is a lot of room for cowboys, experimentation, and plenty of wheel reinvention, and that's just the tip of the iceberg.

Leading the way in this round are the entertainment and gaming applications. This is where you tend to get highly specific experiences (tell me the last time the UX on your media player was considered innovative and didn't look like a 1990s VCR) or uniquely creative experiences that scratch your gaming itch. In this world nothing is uniform and everything is a new assault on your senses. A small and very influential group of people love this space and they become our pioneering group. They start to build and design the future while the calm majority sits back and waits for the technology to come to them.

The Web is this calm majority. A highly stable, yet quickly moving substrate of APIs and technologies, spread across billions of devices, with majority stakes already placed in many excellent and well-established design principles and practices. Basically, you can get stuff done on the web. It's a productive place. It's usable by the masses, but configurable to the needs of the experts. It can do almost anything. Almost. What it can't do is blast you in the amazeballs with pixels of light that transport you to a far away world. Yet.

Given the existing influences there are two ways to get to the goal. The first is to design it all again and make people move to the VR space. A sort of manifest destiny approach to technology. The second is to embrace the existing web for all of its great capabilities and even many of its terrible ones. Evolve the technology and let everyone adopt at their own pace.

The latter has a much better chance of working. Not because it's the best approach for VR, but because it allows us to retain as much of the power of the existing web as we can without resetting the entire ecosystem. Think back just 5 years and look at the rapid adoption of new and pervasive technologies on the web to understand its power and importance. Here are a few that I was around to watch blossom.
  • Canvas - Yeah, it was basically just entering adoption 5 years ago and is now a premier API for 2D bitmapped graphics creation.
  • WebGL - Allowing for truly customized graphical experiences in 2D, 3D, and image processing, and it's the basis for WebVR (without WebGL there is no WebVR).
  • WebAudio - Giving full access to sound effects and mixing for the first time on the web.
  • Web Workers - Halfway between threads and tasks, giving access to extra cores.
  • Media Capture - Camera, microphone, video.
  • WebRTC - Adaptive media transfer for person-to-person interactions.
  • Fetch - A real networking primitive for highly configurable requests.
  • Service Workers - A brilliant empowerment of the network stack without breaking the existing networking expectations of the long-tail web.
All of these web technologies were driven by a collective set of requirements from a highly diverse group of interested parties. Each technology did not have to incubate and grow on its own. It could instead live side by side with many others. Used in isolation, a technology like Canvas would have lost to the many better solutions already in the market. But because it was augmentative to your average web page, it had a place.
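As a toy illustration of that augmentative quality (the element, colors and text here are made up, not from any particular app), a canvas is just another element dropped into ordinary markup and driven by a few lines of script:

  // A canvas is just another element living alongside normal HTML content.
  const canvas = document.createElement('canvas');
  canvas.width = 300;
  canvas.height = 100;
  document.body.appendChild(canvas);

  // A few lines of script draw pixels the surrounding page couldn't otherwise produce.
  const ctx = canvas.getContext('2d');
  ctx.fillStyle = '#3367d6';
  ctx.fillRect(10, 10, 120, 60);
  ctx.fillText('drawn by script, next to regular markup', 10, 90);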

Those other, better, non-web technologies live in isolation. They don't understand JavaScript, garbage collection, browser event loops, browser render loops and how to peacefully co-exist in a broad technology stack. Maybe they are faster and you can do more 2D rendering in them than you can in Canvas, but those incremental differences aren't as powerful as being deployed to billions of devices, over the network, compiled from text to the screen and then printed (yeah, printing is a stretch, but bear with me!) alongside any arbitrary piece of HTML content.

The capabilities of the web are now increasing at a rate faster than any other platform. This means its ability to provide value is leaps and bounds beyond most other systems. When compared on a single axis the web will likely lose. You can always build a better, more specific widget catering to a specific use case, but the power of the web is its Swiss Army Knife approach. We don't buy a Swiss Army Knife to be the best at any given thing; we buy it to be good enough at many things and hopefully fill in missing gaps in our tool chest. It's an astoundingly effective convenience, and it takes convenience to enable modern developers to achieve the content depth and breadth that is needed to appease the masses.

Looking forward there are both VR-specific technologies and general web technologies that will make VR a success. Personally, some of the most interesting are being driven not with VR in mind, but instead by extending the web to new levels of capability that help it exceed app frameworks and runtimes and put it on par with existing operating systems. A few VR-critical, web-inspired technologies on my short list are...
  • Service Workers - VR applications are resource intensive and the first line of defense is going to be prioritization of resource downloads, caching, and working around the inefficiencies that exist in legacy file formats (see the sketch just after this list).
  • WebGL - New performance improvements to WebGL and extensions like multi-view or stereo rendering are likely to land with or without WebVR being a consumer since they have general applications outside of VR as well.
  • HTML to Texture - If this existed today, it would likely be the most used API in WebGL applications. The pre-existing power we have in the DOM, HTML/CSS layout, and all of our existing content is something we don't just want to use, we NEED to use. We have to fill time in VR, and new 3D assets alone are just not compelling enough.
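To make the Service Workers item concrete, here is a minimal cache-first sketch; the cache name and asset paths are placeholders I made up, not a recommendation for any particular VR app:

  // sw.js - minimal cache-first Service Worker for large, rarely changing VR assets.
  // Cache name and asset list are hypothetical placeholders.
  const CACHE = 'vr-assets-v1';
  const ASSETS = ['/models/scene.bin', '/textures/environment.ktx'];

  self.addEventListener('install', event => {
    // Pre-cache the heavy assets once, at install time.
    event.waitUntil(caches.open(CACHE).then(cache => cache.addAll(ASSETS)));
  });

  self.addEventListener('fetch', event => {
    // Serve from the cache when we can; fall back to the network otherwise.
    event.respondWith(
      caches.match(event.request).then(hit => hit || fetch(event.request))
    );
  });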
VR will reinvent the web, but it won't happen in a chaotic upswell and massive conversion to fully 3D experiences overnight. It can't. There is too much value in our existing base. VR will reinvent the web in subtle ways, such as spearheading better graphics APIs, faster JavaScript runtimes tuned to our frame-based patterns, and parameterized GC behavior that avoids frame jank. VR will push hard on solutions to the HTML to Texture problem so we can bring our web content into our 3D worlds. VR will redefine what security means when it comes to both new and existing APIs. Most importantly, VR will define a small subset of new APIs (arguably the 5% in my title is too high) that are critically important to the medium itself. Those APIs will work seamlessly with the existing web platform, enabling our iterative transformation from the existing 2D information-packed web to a mixed 3D/2D model that lets developers pick the presentation mechanisms of their choice.
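For a sense of how small that new subset is, here is roughly what a render loop looked like against the WebVR 1.1 proposal as it stood in 2016 (a sketch, not a complete app); everything except the VRDisplay calls is plain WebGL and ordinary canvas plumbing:

  // Roughly the WebVR 1.1 shape: only the VRDisplay calls are "the 5%".
  const canvas = document.querySelector('canvas');
  const gl = canvas.getContext('webgl');

  navigator.getVRDisplays().then(([display]) => {
    if (!display) return;                 // no headset: keep rendering the ordinary 2D page
    const frameData = new VRFrameData();
    // In practice requestPresent must be called from a user gesture (e.g. a click).
    display.requestPresent([{ source: canvas }]).then(() => {
      const onFrame = () => {
        display.requestAnimationFrame(onFrame);
        display.getFrameData(frameData);  // per-eye view and projection matrices
        gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
        // ...draw the scene twice using frameData.leftViewMatrix / rightViewMatrix...
        display.submitFrame();            // hand the finished frame to the compositor
      };
      display.requestAnimationFrame(onFrame);
    });
  });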

I hope that VR won't become a parallel stack to the web. If it does, then I think we will have failed hard. There is no need to reinvent the past 20 years of productivity and capability gains we've gotten from the web. There is no reason to relive all of the same mistakes and relearn all of the best practices. More importantly, there is no reason the existing web can't continue to evolve and get better as a collaborative partner, or more accurately an elder sibling, allowing WebVR to shine in the ways it is best designed for, rather than spending endless cycles trying to replace all of the web experiences we already have with VR versions of the same.

Sunday, August 7, 2016

How I Used the Career Triforce to Change my Job

When I was leaving Microsoft, on my last day, I got a lot of questions about why I was leaving. I was a linchpin of sorts on the team. I was positioned very well to get impactful work. Highly networked. Very happy with my day to day. Reporting to one of the best managers/mentors on the planet.

I was on a career trajectory at Microsoft that was almost unreal: averaging better than a promotion every two years, with no slowdown even as I crossed bands, eventually ending up at the top of the Principal ladder. Compensation was great when compared across other MS employees (I won't be discussing this further). Everything seemed to be going amazingly well.

What did I say to everyone? Well, I said there are three areas (these are not my own, but taken from a book on career advice that I found exceptionally relevant) on which you should judge your current career. You should start by looking at your Job. Do you love it? Are you able to make an impact? Are you passionate about what you are doing? All of my answers here were positive. The first part of my TriForce was complete.

Next you look at your Manager. This is the singular individual who has the most control over your happiness and your career path in most companies. Ask yourself questions like: are you aligned with your boss? Does your boss support you when you are about to fail? Does your boss accentuate your good qualities and help you improve on your bad qualities? Can your boss act as your manager, your friend, a leader and a mentor? Well, #FML, it turns out I had just found the second part of my TriForce.

Lastly I said you look at your Team. For me, at my level, this meant looking at my immediate team, the entire Edge WPT team, and then finally up to the Windows organization as a whole. Those are the scales at which I had impact at Microsoft. When looking at all levels of the team you ask questions like: do I like working with these people? Are the politics manageable or are they over the top? Does the team exercise trust? Does the team exercise transparency? As I worked from my local team up to Windows, the third component of my TriForce started to crack a little bit; maybe it had a little less luster.

However, when I consider the most stress I faced while making my decision to change jobs, it came down to the people. I loved the people and I felt like we had created an extended-family-like support system for one another. I wasn't concerned about my projects that wouldn't get done if I left. Instead I was worried about the people I worked with on a daily basis, whom I could see growing and becoming amazing engineers in their own right. I was worried there wouldn't be enough people left infusing positive energy into the team on a daily basis to keep morale up. I was worried that I was failing my team by leaving. That's when you realize, yeah, you have a great team. There may be some scuffs on that TriForce shard, but it's still shining just as brightly as the other two. My TriForce was complete.

My Answer

Okay, so if I already had the TriForce, what kind of answer could I give everyone? Why was I leaving? This is when I relearned something I had discovered earlier in my career; it just took another 11 and a half years to rediscover it. Once you've built a TriForce there isn't as much exponential growth in your future, and mostly you just end up making incremental improvements. You spend more time doing the things you know, rather than learning new things. Your awesomeness starts to atrophy. You rarely feel the stress of a complicated and new situation. You rarely push your boundaries.

That isn't to say there aren't still moments like that. There certainly are. They just aren't as frequent, and so growth tends to become linear and to plateau more and more often.

You also don't know if you have the skills to build another TriForce. I spend a lot of time mentoring and I often reach out for new mentees. My dream is that they too can achieve their TriForce and that I'm an enabler for that. I provide experience and strategies for working through difficult situations and for figuring out why some aspect of their career is not shining or working well with the rest. Are my recommendations good? Do I have enough experience to offer the types of career advice that they need? If I put myself in their shoes, with their knowledge, and took on their risk, would I be able to replicate my experience?

That is an important question for me. Doing something once can be dumb luck. It doesn't mean you can make it happen again. It means it happened and perhaps it had something to do with you. But perhaps you are unaware of the actual forces of nature that brought it into being and it turns out it had nothing to do with you. That is a scary thought. Am I successful because of me? Or am I successful because of a random set of circumstances that I only manipulated superficially?

This led me to my answer to the team. Paraphrasing a bit, I finally said, "When you make a career change you should look at your job, boss and team. If they are all great then you are probably on the right track. When I look at myself, I have a TriForce in these three areas. Everything is amazing. So I had to use other measures to figure out my future. Specifically, to follow my passions in VR and to see if I can build my second TriForce."

Maybe everyone thinks that is bullshit and will point to other factors in my decision making. I had a lot. Compensation, family, location and friends were all additional complications. However, I can say that after tons of cross-comparison Excel tables, almost everything zeroed out between Oculus and Microsoft. I was only left with a very real and pressing question, one that Brendan Iribe asked me during my process. Do you want to think about VR all day, every day? That was his pitch to me. An offer to work on a technology that would change the future with all of my insight and passion. And when my answer to that simpler question is, "Fuck Yeah!" you can see how my explanation to my former team was given in honesty.

Passion

Passion isn't on the TriForce, but it is part of how you feel about your Job, how you are supported by your Boss (does he let you run with your wacky ideas?) and how your Team adapts to a changing society and marketplace. That makes it an integral component of all of them. When you are passionate you'll find that you can't sleep because you are still solving problems. You spring out of bed every morning to rush to work. You let everyone know what you are working on and why they should care. You see clearly how what you are doing is going to change the future, improve lives, connect you more closely to your friends and family and make the world a better place for everyone to live.

When I saw the opportunity to lend my passion and devote all of my ability to launching the VR revolution, I couldn't pass it up. VR has the potential to change the way we think about education, jobs and entertainment. It literally allows us to redefine space itself and transform a living room into an anything room. I didn't jump ship to VR in the beginning because my expertise wasn't needed yet. But now is the time to scale and build platforms for VR that extend to millions. This is where I thrive as a developer. This is where the web thrives as a platform for scale and accessibility. This is the time to deeply invest my time and effort and build my second career TriForce. With news like the HTC VR alliance offering $10 billion in VC capital for the development of VR content and experiences, I think I'm in good company thinking this way.

My Final Advice

Most people, I find, are working on some aspect of building their TriForce, probably for the first time. I know because I mentor some amazing developers, and almost always they have some sort of hang-up in one of these areas and haven't yet figured out how to self-diagnose when things are going wrong.

For this reason I think evaluating your job, boss and team is a great way to figure out two things. The first: do you need to improve something in your current career in order to elevate yourself to the next level? You may find that your job sucks for some reason, but that it is within your control to make it not suck. You should do that. The easiest thing to change is yourself.

The second: if you evaluate these and find that there are things outside of your control that you don't see changing, then you can use that to figure out how you are going to change your career. Not everything is within your control, and oftentimes your happiness or passion requires an environmental change. Perhaps your current job would get you there, but in a time period that is longer than you would like. I always recommend being open and honest during this period just in case you've misread your situation. If you have, then making your situation apparent to your team can sometimes result in the change that you were going to switch jobs for.

If you are sitting on your TriForce, though, and you are happy with all three, you shouldn't close your eyes to the opportunities that might present themselves. Maintain your marketability and interview skills. From time to time, reach out and do an interview or two and see what else is available, both in terms of unique job roles and life-changing compensation. When an opportunity comes along and you do have to make the big decision, know that it will be stressful. Then calm down, evaluate everything objectively, and if it looks like another opportunity to build your next TriForce then perhaps you should go for it!

Tuesday, July 19, 2016

In True Chef Style - PokemonGO three ways

When Pokemon GO was released on July 6th I naturally thought everyone would just get it. Pokemon has always been a huge franchise, location-aware mobile applications are at the top of the charts, and anything social takes off with the millennials. There are numerous location-aware and social applications that are pervasive, that we use nearly every day, but they all have a time of use. You use Yelp when you are hungry, Flixster when you want to watch a movie, Google Maps when you are lost. Now, instead of a time-specific, activity-specific application, we get a game. A game that has no time limits, nothing you have to buy to play more, nothing to control your natural addictive impulses to keep playing. You keep playing because you can and also because you, "Gotta Catch 'em All!"

Not everyone gets it though. I hear a lot of complaints from people about how players are crowding into areas they previously felt were somehow their own. That kids are out "WALKING AROUND" too much now, shouldn't they be doing something else? That the game is too addictive and people spend too much time playing it. Of course all of this is true, in both the negative and positive sense.

Kids ARE walking around, socializing, getting exercise. It's summer, where else would you want them to be? At home playing video games or watching television? But it isn't just kids. It's teenagers, adults, seniors, pretty much everyone getting in on it. They are walking their environment, making unsafe places safe with sheer numbers. I learned about more than 150 things in New York that even New Yorkers don't know about. I found Central Perk! I found great graffiti art areas. I spent hours in Central Park in places that I'm told were previously unsafe to be in. All the while logging 2-3 times more miles per day on average than I had in the previous year.

What about those personal places though? Can't I have those back? You know, the ones that are now crowded with people playing Pokemon GO? The ones that EVERYONE's tax dollars pay for and that are now getting used the way they should be? Well, you can't have those back, and everyone should be excited that those places are packed now. It's more likely they'll continue to be funded and improved if the public uses them. They are probably safer too with more people around.

Businesses love it I hear. They get increased foot traffic and sales on high profit items like bottles of water. I bet every street vendor in New York wanted to be on the corners of Central Park with over 500 players per block all getting hungry, needing food, water and battery. The Apple store probably sold out of lightning cables and power bricks.

That addictive quality though, that has to be bad, right? Let's flip that for a minute. If you found something that got people addicted to diet and exercise and improved overall public health, would that be bad? Sure, if everyone became unhealthily skinny it would be terrible. But otherwise you'd be a medical genius that saved the unhealthy and obese US from itself. Hearing kids talk about needing to keep moving to hatch their 5k/10k eggs is great. They could sit next to a couple of lures, but instead they are compelled to walk around to get even more riches. The Egg mechanic is a stroke of genius only tarnished by Niantic's terrible distance tracking.

If you didn't get it before, do you get it now? It's a good thing for people to find a reason to get outside, exercise, socialize and have fun. Everyone is discovering their cities, parks, and new shops, and helping businesses. You don't even have to play in order to get it, but you have to accept it for all of its positive values.

Okay, so how am I going to break down Pokemon GO three ways? Over the next three days I'm going to write three articles. Each will look at Pokemon GO from a completely different point of view: as a player, as a technology and gamification expert, and finally as a developer. To keep you interested, here is a short abstract of each.

Pokemon GO - A Player's Perspective


As a player and someone who excels at just about any game they play I'm going to share some tips and tricks that many people don't realize will help them get more out of the game. Simple tricks like turning off AR mode for less distracting play, how to quick collect pokestops and even how to best spend your money should you choose to do so. This article will provide many statistics and calculations relevant to people who want to power level or maximize their time/value of play.

This article is now live at Pokemon GO - A Player's Perspective

Pokemon GO - Gamification for Social Change


For days I've wandered around asking questions, taking photos and listening to Pokemon GO players. I've done so from Redmond, Kirkland, Bellevue (WA state) and also from the hustle and bustle of New York City, NY. I've watched as a shared vocabulary is created and even helped transport some words between the coasts as I share my experiences with others. I've watched it break down social barriers between people who would probably never share a word, yet when prompted for information about Pokemon GO they freely share it (sometimes even without the prompting). I've eavesdropped on conversations where even kids themselves acknowledge that if they weren't out catching Pokemon they'd be at home by themselves doing something else, not nearly as exciting or social.

I hope to distill these experiences into something that we can take action on as a community of gamification experts. While Pokemon GO is doing nothing out of the ordinary and offers no unique and new gamification concepts, it does prove out a large set of techniques that have not been shown to work at this large scale before.

Pokemon GO - As an Engineer I'm Dying Inside


When breaking down Pokemon GO as an engineer who builds games, works on large software projects, scalable server systems and even mobile apps from time to time, I literally pitch a fit every time the game hangs. The game is the thinnest of facades on top of a synchronous networking stack, using the most simplistic of tricks to hide this reality. Every time the game hangs downloading the same image I've downloaded before, because it wasn't cached, I curse the developer who decided to invoke Knuth's ruling on premature optimization when just the simplest of caches would have done wonders.
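The game itself is a native mobile app, so this is only a browser-flavored sketch of what "the simplest of caches" means here: memoize downloads by URL, including the in-flight ones, so the same sprite is never fetched twice.

  // Hypothetical sketch, not the game's code: the simplest possible asset cache.
  const assetCache = new Map();  // url -> Promise resolving to the downloaded bytes

  function getAsset(url) {
    if (!assetCache.has(url)) {
      assetCache.set(url, fetch(url).then(resp => {
        if (!resp.ok) {
          assetCache.delete(url);  // don't memoize failures
          throw new Error('download failed: ' + resp.status);
        }
        return resp.blob();
      }));
    }
    return assetCache.get(url);    // repeat requests for the same image are free
  }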

This game was clearly an MVP (minimum viable product) and it proved its worth quickly. Then the developers decided that on such a fragile and failing system they'd roll out to more countries (who does that?). I can only guess they wanted to set some world record on number of installs and DAUs (daily active users). Somehow, even through all of this failure, they put enough duct tape on the servers to keep people happy enough to keep playing. They taught an entire generation of users how to swipe up to kill an app in the process, but people keep coming back for more so they clearly have a winner and hopefully will figure out how to redesign and scale the back end.

Not knowing the immense loads this game was really under, I won't claim that all of this could have been avoided. Most likely there are some aspects of even a great architecture that would have failed. There are lessons to be learned though and maybe I can distill some by analyzing how the game itself responds and recovers from failure (or fails to recover from failure).

Tuesday, December 29, 2015

Progress Towards a Fully Tested Web API Surface Area

Back in September I was amazed by the lack of comprehensive testing for the Web API surface area, and as a result I proposed something akin to a basic surface-level API suite that could be employed to make sure every API had coverage. Since that time a lot of forward progress has been made: we've collected a bunch of additional data and we've come up with facilities for better ensuring such a suite is complete.

So let's start with the data again and figure out what we missed last time!

Moar Data and MOAR Test Suites

Adding more test suites is hopefully going to improve coverage in some way. It may not improve your API surface area coverage, but it may improve the depth of testing for a given API. We originally pulled data from the following sources:
  1. Top 10k sites - This gives us a baseline of what the web believes is important.
  2. The EdgeHTML Regression Test Suite - By far the most comprehensive suite available at the time, this tested ~2500 API entry points well. It did hit more APIs, but we excluded tests which only enumerated and executed DOM dynamically.
  3. WebDriver Enabled Test Suites - At the time, we had somewhere between 18-20 different suites provided by the web community at large. This hit ~2200 APIs.
  4. CSS 2.1 Test Suite - Mostly not an OM test suite, so it only hit ~70 APIs.
Since then we've added or improved the sources:
  1. Top 100k sites - Not much changed by adding more sites.
  2. Web API Telemetry in EdgeHTML - This gave us a much larger set of APIs used by the web. It grew into the 3k+ range!! But still only about 50% of the APIs we export are used by the web, making for a very large, unused surface area.
  3. DOM TS - An internal test suite built during IE 9 to stand up more standards-based testing. This suite has comprehensive depth on some APIs not tested by our other measures.
  4. WPT (Web Platform Tests) - We found that the full WPT might not have been running under our harnesses, so we targeted it explicitly. Unfortunately, it didn't provide additional coverage over the other suites we were already running. It did end up becoming part of a longer term solution to web testing as a whole.
And thanks to one of our data scientists, Eric Olson, we have a nice Venn diagram that demonstrates the intersection of many of these test suites. Note, I'm not including the split-out WPT tests here, but if there is enough interest I can see if we can build a different Venn diagram that includes more components, or rework this one and pull out an existing pivot.


Since this is so well commented already, I won't go into too much detail, but I'll point out some key data points. The EdgeHTML DRTs have a lot of coverage not present in any public suites. That is stuff that is either vendor-prefixed, MS-specific, or something we need to get into a public test suite. It likely requires that we do some work, such as converting the tests to testharness.js, before that happens, but we are very likely to contribute some things back to the WPT suite in the future. Merry Christmas!?!

We next found that the DOM TS had enough coverage that we would keep it alive. A little bit of data science here was the difference between deleting the suite and spending the development resources to bring it back and make it part of our Protractor runs (Protractor is our WebDriver-enabled harness for running public and private test suites that follow the testharness.js pattern).

The final observation to have is that there are still thousands of untested APIs even after we've added in all of the coverage we can throw together. This helped us to further reinforce the need for our Web API test suite and to try and dedicate the resources over the past few months to get it up and running.

WPT - Web Platform Test Suite

In my original article I had left out specific discussion of the WPT. While this was a joint effort amongst browsers, the layout of the suite and many aspects of its maintenance were questionable. At the time, for instance, there were tons of open issues, a backlog of pull requests, and the frequency of updates wasn't that great. More recently there appears to be a lot of new activity, though, so maybe it deserves to be revisited as one of the core suites.

The WPT is generally classified as suite-based testing. It is designed to be as comprehensive as possible. It is organized by specification, which arguably means nothing to web developers, but does mean something to browser vendors. For this reason, much of the ad-hoc and suite-based testing present in the DRTs, if upgraded to testharness.js, could slot right in. I'm hopeful that sometime after our next release we are also able to accompany it with an update for WPT that includes many of our private tests so that everyone can take advantage of the collateral we've built up over the years.

Enhancing the WPT with this backlog of tests, and potentially increasing coverage by up to ~800 APIs, will be a great improvement I think. I'm also super happy to see so many recent commits from Mozilla and so many merge requests making it back into the suite!

Web API Suite

We still need to close the API gap, though, and so for the past couple of months we've been working (mostly the work of Jesse Mohrland, I take no credit here) on a design that can take our type system information and automatically generate a set of tests. This has been an excellent process because we've now started to understand where more automatically generated tests can be created, and that we can do much more than we originally thought without manual input. We've also discovered where manual input would be required. Let me walk through some of our basic findings.

Instances are a real pain when it comes to the web API suite. We have about 500-600 types that we need to generate instances of. Some may have many different ways to create instances, which can result in differences of behavior as well. Certainly creating some elements will result in differences in their tagName, but they may be of the same type. Since this is an API suite we don't want to force each element to have its own suite of tests; instead we focus on the DOM type, test one instance generically, and then run some other set of tests on all instances.
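The real generator works off the EdgeHTML type system description, so the table below is only an illustration of the idea; the type names and factories are examples I picked, not the suite's actual data.

  // Illustrative only: the instance problem boils down to a factory per type,
  // plus one generic check that the factory really yields that DOM type.
  const instanceFactories = {
    'HTMLDivElement':   () => document.createElement('div'),
    'HTMLImageElement': () => document.createElement('img'),  // different tagName, same pattern
    'Text':             () => document.createTextNode('x'),
    'XMLHttpRequest':   () => new XMLHttpRequest(),
    'DOMParser':        () => new DOMParser()
  };

  Object.keys(instanceFactories).forEach(typeName => {
    test(() => {
      const instance = instanceFactories[typeName]();
      assert_true(instance instanceof self[typeName], 'factory yields a ' + typeName);
    }, typeName + ': instance can be created');
  });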

We are not doing the web any service by only having EdgeHTML-based APIs in our list. Since our dataset is our type system description, we had to find a way to add unimplemented stuff to our list. This was fairly trivial, but hasn't yet been patched into the primary type system. It has so many benefits, though, that I'll enumerate them in a list ;-)

  1. We can have a test score that represents even the things we are missing. So instead of only having tests for things that exist, we have a score against things we haven't implemented yet. This is really key to having a test suite that is useful not just to EdgeHTML but also to other vendors.
  2. True TDD (Test Driven Development) can ensue. By having a small ready-made basic suite of tests for any new APIs that we add, the developer can check in with higher confidence. The earlier you have tests available the higher quality your feature generally ends up being.
  3. This feeds into our other data collection. Since our type system has a representation of the DOM we don't support, we can also enable things like our crawler based Web API telemetry to gather details on sites that support APIs we don't yet implement.
  4. We can track status on APIs and suites within our data by annotating what things we are or are not working on. This can further be used to export to sites like status.modern.ie. We don't currently do this, nor do we have any immediate plans to change how that works, but it would be possible.
Many of these benefits are about getting your data closer to the source. Data that is used to build the product is always going to be higher quality than, say, data that was disconnected from it. Think about documentation, for instance, which is built and shipped out of a content management system. If there isn't a data feed from the product to the CMS then you end up with out-of-date articles for features from multiple releases prior, invalid documentation pages that aren't tracking the latest and greatest, and even missing documentation for new APIs (or documentation that never gets removed for dead APIs).

Another learning is that we want the suite to be auto-generated for as many things as possible. Initial plans had us sucking in the tests themselves, gleaning the user-generated content out of them, regenerating, and putting the user-generated content back (think custom tests written by the user). The more we looked at this, the more we wanted to avoid such an approach. For the foreseeable future we want to stop at the point where our data doesn't allow us to continue auto-generation. And when that happens, we'll update the data further and continue regenerating.

That left us with pretty much a completed suite. As of now, we have a smallish suite with around 16k tests (only a couple of tests per API for now) that runs using testharness.js and thus executes within our Protractor harness. It can then be trivially run by anyone else through WebDriver. While I still think we have a few months to let this bake, I'm also hoping to release it publicly within the next year.
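To give a feel for what "a couple of tests per API" means, the generated checks are intentionally shallow, something in the spirit of the sketch below; the helper name and the example entries are mine, not the generator's actual output.

  // Hypothetical shape of a generated surface check: does the interface exist,
  // is the member where the type system says it is, and does its basic shape match.
  function surfaceTests(interfaceName, memberName, argCount) {
    test(() => {
      assert_true(interfaceName in self, interfaceName + ' is exposed');
    }, interfaceName + ': interface exists');

    test(() => {
      const proto = self[interfaceName].prototype;
      assert_true(memberName in proto, memberName + ' is on the prototype');
      assert_equals(typeof proto[memberName], 'function', 'member is callable');
      assert_equals(proto[memberName].length, argCount, 'required argument count');
    }, interfaceName + '.' + memberName + ': basic surface shape');
  }

  surfaceTests('Node', 'appendChild', 1);
  surfaceTests('EventTarget', 'addEventListener', 2);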

Next Steps

We are going to continue building this suite. It will be much more auto-generated than originally planned. Its goal will be to test the thousands of APIs which go untested today by more comprehensive suites such as WPT. It should test many more thousands of unimplemented APIs (at least by our standards) and also some APIs which are only present in specific device modes (WebKitPoint on Phone emulation mode). I'll report back on the effort as we make progress and also hope to announce a future date for the suite to go public. That, for me, will be an exciting day when all of this work is made real.

Also, look out for WPT updates coming in from some of the EdgeHTML developers. While our larger test suite may not get the resources to push to WPT until after our next release, I'm still hopeful that some of our smaller suites can be submitted earlier than that. One can always dream ;-)

Saturday, December 12, 2015

Retrospective: 2015 as a Tech and Career Mentor

2015 was a pretty great year of mentoring for me. After several years of offering advice to others, I now have a small, consistent group of individuals that I work with. Each relationship is different in its own way, important to them, but probably more important to me.

My Background and Thoughts On Mentoring


I generally mentor people at Microsoft who are experiencing some issue in their career growth. Sometimes the issues are things they can change, sometimes the environment needs to change, sometimes it is technical, sometimes it relates to missing skill-sets and sometimes it's personal issues. No matter the cause, as an external observer I can either directly help them or find someone else who can. I can be supportive, uplifting or give them the hard talk about something they need to change. Whatever they need. As a mentor I'm adaptable to their situation.

Most of my mentoring sessions consist of drawing parallels between road-blocks that I've faced and how I overcame them. By explaining through a real scenario it usually becomes much clearer to others that they have options. Increasingly, though, I find that my experiences aren't the same as theirs. This was a great mentoring realization for me: people have unique problems, and they have unique strengths and weaknesses. My "solution" isn't their "solution". Instead I need to offer many viewpoints, using the combined experiences of all of my mentees, not just my own. Intelligent people tend to solve their own problems once they start to realize how many options they have. Persistent problems tend to remain so long as your options are limited.

Helping someone find their options isn't limited to those with wisdom and experience either. This is my pitch to those who think they are too young to mentor others. Mentoring is just as much about listening, learning, problem solving and being supportive as it is about sharing what you've learned purely from experience. If you are giving someone the solutions to their problems, it isn't really a mentoring session anymore. You are trying to make them better and help them find their own solutions, not demonstrate how easily and intelligently you could resolve their issue.

Mentoring doesn't have to be formal. It could be one friend to another. It doesn't have to be long or short. A single meeting or many years' worth of them, both outcomes are acceptable. It could be a regular weekly meeting or it could be a couple of random days in the year. It doesn't have to be scheduled at all; for instance, getting feedback from your mentor on a potentially career-limiting email can often be the difference between success and failure.

We need more mentors thinking about diversity. Rico Mariani recently started an inspiring series of thoughts and threads (well, more than that, he started a scholarship) focused on women in tech. He immediately reached out to everyone on his team and made it very clear: minorities in tech, regardless of which minority, need equal opportunities and access to strong mentors. So I'm going to do some more thinking on this. I think it is very hard for someone to reach out and accept mentoring, and the bigger the differences which exist (level, race, gender, social status, personalities) the harder it is for someone to take that first step. It's probably time to reach out to potential mentees a lot more often, and accept the rejections, in favor of helping correct the diversity imbalances.

My Mentees for 2015


This won't focus solely on 2015, since one of my most important roles in mentoring occurred during 2014, just following the large Microsoft lay-offs. Here is my list of mentees (anonymized) and how we spend our time.

  1. The Entrepreneur - I once owned my own company and was independently successful for a few years during my 20's. This experience taught me quite a bit. One of my mentees is at a point in his life where he wants to develop a small company for himself and so we have sessions on how to transition from your day job to being a founder.
  2. The Classic - Before the Microsoft mentoring site went down I got my last mentee referral for what would become my most normal mentoring setup. We meet on his schedule and the topics are around how to manage his career development, communications, timing and positioning. Sorry, I'll expand more on that in a bit ;-)
  3. The Switcher - Within the past few months I've started working with someone who is unhappy in their position on their current team. They had already made a decision that improving their situation was not the right move, that they needed a change. For this person, I'm helping them improve their technical interviewing skills.
  4. The LayOffs - When the large scale Microsoft lay-offs happened I realized quickly that many intelligent and capable people were going to start having a very hard time in the job market. It is easy to have your skills atrophy when you work in a mature company like Microsoft. For this group, I got to learn the challenges of mentoring in a group setting.
  5. The Boxer - Everyone needs change and growth opportunities. This person had been working in the same area for too long and was feeling constrained, as if they had been put in a box for safe keeping. When in this position, having options and knowing what you can and should be able to ask for to improve your situation is key.
With the exception of the group training I did when the lay-offs occurred, my diversity scorecard has me gender biased (zero female mentees) but doing okay otherwise (I have mentees of many races, though they are all male). During the lay-offs, I was able to help multiple female engineers reset their skills and I'm happy that a majority of them were able to quickly find a position that was as good or better than what they left.

You Get More than you Give


My mentees are consistently conscious of my time. They don't want to waste it, since they view my time as important, and they view me as someone who maximizes the value of my own time. Guess what, they are right! I do value my own time and I can say hands down the time I spend mentoring is far more important than anything else I could be doing. So the secret to mentoring isn't that you spend time doing it; it's that you can help yourself and others at the same time, in ways that will help you in the future.

After all, how do you become better at anything? You PRACTICE it. Mentoring gives you the opportunity to be a great listener and a strong communicator. It helps you to organize your experiences and arguments through self-reflection that you are able to share with someone else. Your mentee can sometimes reverse the experience on you and potentially point out something you missed when you dealt with a situation that you are using as an example. These moments, for me, have always led to improvements in my own career.

Your mentees will also have experiences that differ from your own. This will help you increase your own understanding of diversity and provide you with insights to improve how you integrate and communicate with your co-workers. If a certain cultural bias prevents someone from speaking up, and you are able to experience this through your mentee, it can make you more sensitive to that issue.

The last bit for me is around building your moral standards. If I'm not helping people, I'm not happy. There is a certain amount of human interaction and empathy that goes into the entire process that leaves me feeling like I've done something bigger than myself. I've reached out and helped someone else. You can get this many ways, through teaching, mentoring or volunteering, so I challenge you to do all three to maximize your own benefits ;-)

Reaching Out


You've read everything above and you are thinking, "Sign me up!" So how do you start? I don't have 40 years of experience doing this, but I do have enough to know one thing. Not everyone realizes they need a mentor and pointing it out often isn't enough to convince them. Mentoring is a two-way street and even though I have time to give my advice it doesn't mean that others have the time to hear it. It is an exceptional case when someone walks up to you and ASKS you to mentor them. This almost never happens. 

So you have to put yourself out there and let people know you are ready to help. When approaching a new mentee with an offer, do so privately. Not everyone sees having mentors as a strength and some see it as a sign of weakness. A powerful mentor relationship can also be seen as a threat or even favoritism. These are normally irrational and unhealthy thoughts on the part of others, but it is something to be aware of.

By the numbers, the group of people that I will approach is quite small. I'm evaluating a lot of different outcomes and consequences. As a mentor I should do a MUCH better job of approaching people.

Once approached, the acceptance rate is probably 50/50. So be prepared to be rejected. This is key, never take someone else's rejection of your mentoring offer as anything against you. There are a lot of factors they are evaluating as well, including whether or not they even feel comfortable talking to you. They may not see you as a mentor at all. This is normal. When you get a rejection, kindly let them know that you are available in the future if things change, and move on.

Even out of those who accept, there is a likelihood you will never have your first mentoring session. If you do, there is a high likelihood it will be your last or next to last. Being mentored is not for everyone and maybe the session made them feel awkward. I bet psychologists have a similar track record with new clients. Also, not every problem is one to be fixed. Your mentee may find that it isn't the right time for them to think about and correct the issues impacting them. This is especially true in long term career mentoring.

Once you've gotten started, be prepared to adapt yourself to many different types of people and expectations. If they want you to set up a schedule and manage it, then do that. Otherwise, if they want to set it up instead, then let them do that. The majority of the improvement they make will happen when they are not interacting directly with you, so figure out how to follow up with them. If they like goals, set them up. The more you mentor the more you'll figure this out, but being prepared will hopefully make your first few mentoring encounters much more fruitful.

If in doubt, get a mentor yourself and ask them about mentoring. It's the ultimate mentoring meta ;-)

Diversifying


Diversifying your mentees can be extremely difficult. First, people are more comfortable around those more like them. For this reason, in a work environment, people of similar ethnic backgrounds are likely to stick together. When a team has good diversity this doesn't happen as much, but then the team itself can stick together and become a blocker for technical diversity. Since minorities, by definition, are fewer in number within the group, trying to find a mentor that matches your minority can be hard.

This leaves our minority populations in tech with a hard problem. First, they can't find high-level mentors within their own minority. Second, they may find it hard or challenging to find high-level mentors at all. This all assumes that they reach out. If they don't, then it's unlikely that an appropriate mentor will find them.

This type of diversity bias is what we have to overcome. It isn't really fixed by taking a "fair" approach to the problem. You can't sit back and expect the odds to play themselves out here. While you may think that by accepting anyone and everyone you'll achieve diversity, you won't. You have to break the "fairness" and make a point of being unfair in your selection of mentees and how you reach out. Make a point of prioritizing ethnic and gender diversity when you fill up your time initially and then worry about filling out the majority slots later. Trust me, they won't be nearly as hard to fill.

For my part, I'm going to commit to filling out my diversity card in the upcoming year of 2016. I've already tweeted that I have some mentoring capacity. I'm reserving that space for women and other minorities in tech. I will be proactive and start reaching out to potential mentees and work with other mentors in my network in case they know of people actively looking. I am going to increase my efforts towards STEM education for minorities though I don't yet have a plan for how I'm going to do that. If anyone has ideas let me know ;-) I'll make sure to be more vocal about my efforts as well in case it can inspire or inform others. Bringing awareness is a key step in solving any problem.

Finally, a challenge to all you would-be mentors. If you aren't sure, and you want to talk to someone about becoming a mentor, I hear you can call anyone on this thing called the Internet. I'd be happy to talk with you about my experiences and answer any questions you might have before you get started.

Sunday, November 29, 2015

A Collection of Principles for Fail-Fast

Previously I blogged about how EdgeHTML has adopted a fail-fast model, identifying hard-to-recover-from situations and faulting rather than trying to roll back or otherwise proceed. There we covered a lot of details on the what and even the how. At that time I didn't establish principles, and since writing that article I've received a lot of questions from my own developers around when to use fail-fast. So here they are, the principles of fail-fast.

Principle #1 All memory allocations shall be checked, and the process shall fail fast on failure.

Follow the KISS principle and just assume that all failed memory conditions (including stack overflows) lead to a situation in which, even if recovered from, first-, second- or third-party code will not run correctly.

Exceptions:

Exploratory allocations may be recoverable. Textures are a commonly used resource and are limited in availability, so some systems may have a recovery story for when they can't allocate. However, even these systems likely have some required memory, such as the primary texture, and that should be demanded.

Principle #2 Flow control is ONLY for known conditions. Fail-fast on the unknown.

When writing new code favor fail-fast over continuing on unexpected conditions. You can always use failure telemetry to find common conditions and fix them. Telemetry will not tell you about logic bugs caused by continuing on the unexpected path.

A prime example of this is using enumerations in a switch. It's common practice to put in a non-functional default with an Assert. This is way too nice and doesn't do anything in retail code. Instead, fail fast on all unexpected flow control situations. If the default case handles a set of conditions, then put in some code to validate that ONLY those conditions are being handled.

Third party code is not an excuse. It is even more important that you use fail-fast to help you establish contracts with your third party code. An example is a COM component that returns E_OUTOFMEMORY. This is not a SUCCESS or S_OK condition. It's NOT expected. Using fail-fast on this boundary will provide the same value as using fail-fast in your own memory allocator.

Exceptions:

None. If there is a condition that should be recovered then it is a KNOWN condition and you should have a test case for it. For example, if you are writing code for the tree mutations caused by JavaScript operations on the Browser DOM, then there are known error recovery models that MUST be followed. No fail-fast there because the behavior is spec'ed and failure must leave the tree in a valid state. Maybe not an expected state for the developer using the API, but at least spec'ed and consistent.

Principle #3 Use fail-fast to enforce contracts and invariants consistently

Contracts are about your public/protected code. If you expect a non-null input, then enforce that with a fail-fast check (not much different from the allocation check). Or, as before with enumerations, if you expect a certain range then fail fast on the out-of-bounds conditions as well. When transitioning from your public to your private code you can use a more judicious approach, since oftentimes parameters have been fully vetted through your public interface. Still, obey the flow control principles.

For variable manipulation within your component, rely on checks for your invariants. For instance, if your component cannot store a value larger than a short, then ensure that down-casts aren't truncating and fail if they do. This classically becomes a problem between 32- and 64-bit code, when all of a sudden arbitrary code can manipulate values larger than originally designed for.

While a sprinkling of fail-fast around your code will eventually catch even missed invariant checks, the more consistently you use them, the closer your telemetry will be able to point you to the sources of failure.

Exceptions:

None. Again, if you find a condition hits too often, then you'll be forced to understand and supply a fix for it. Most likely a localized fix that has little or no impact on propagating errors to other surrounding code. For instance, truncation or clamping can be a designed (and perfectly acceptable) part of the component depending on its use case.

Principle #4 If you are unsure whether or not to use fail-fast, use fail-fast

This is the back-stop principle. If you find yourself unable to determine how a component will behave or what it might return (this can happen with black-box APIs or even well-documented but closed APIs), then resort to fail-fast until you get positive confirmation of the possibilities.

As an example some COM APIs will return a plethora of COM error codes and you should not arbitrarily try to recover from the various failures or figure out which codes can and can't be returned. By using fail-fast and your telemetry pipeline you'll be able to find and resolve the sets of conditions that are important to your application and you'll have confidence that your solutions fix real world problems seen by your users.

Oddly, this is even more critical when working on pre-release operating systems, services or APIs. Often the introduction of a new error code or the increase in a specific set of error codes is indicative of an OS level bug. By tightening the expectations of your application on a specific API surface area you become one of the pinning tests for that API. While APIs do change, having unexpected or new behavior propagate through your application in unexpected and unintended ways is a bug waiting to happen. Better to crash and fix than to proceed incorrectly.

Exceptions:

Yes, of the fail-fast variety please ;-)

Sunday, November 22, 2015

From State of the Art to the State of Decay

I'm constantly in meetings where we are discussing 10+ year old code that "smart people wrote," so it must be fairly good. It seems people take offense when you call 10 year old code bad names, especially if they were somehow connected to it.

Let me start with myself. I've been working on the Internet Explorer code base for over 10 years, having started just before IE 7 Beta 1 shipped. Instead of referring to other people's terrible code, I'll refer to my own terrible code. I'll also talk about our thinking on "State of the Art" at the time and how that leads to now, where the same code that was the bee's knees is just a skeleton of its former self, having suffered through the "State of Decay".

State of the Art in 2005

We had a legacy code base that had been dusted off after many years of neglect (IE 6 had been shuttered and IE 7 was the rebirth). The ultimate decay: many years of advancements in computer science, none of which were applied to one of the largest and most complex code bases you could conceive of. We had some new tools at the time, static analysis tools, which were very good at finding null pointer dereferences alongside actual security bugs. We had tens of thousands of findings (maybe even hundreds of thousands) across all of Windows that needed to be investigated, understood and fixed.

Our idea of state of the art at this time was that the code shouldn't crash. Ever. Reliability in our minds was error recovery, checking nulls all over the place, and protecting against the most malicious of external code, since after being shuttered we had no idea who or how many consumers of our code there were. Fixing any single line of code presented us with a challenge. Did we break someone? In this state every line of code could take hours to days to understand the ramifications of adding even a single null check and bail-out. And spend hours and days we did.

So much so that I personally introduced hundreds, perhaps thousands, of null pointer checks into the code. After all, this was the state of the art at the time. We wanted stability and reliability, and we wanted most of all to shut down those rare cases where the tool was pointing out an actual security hole as well. Of course I thought I was doing the right thing. My code reviewers did too. We all slept well at night getting paid to put in null pointer checks to work around the fact that our invariants were being broken constantly. By doing all of this work we were forever giving up our ability to use the crashes we were fixing to find actual product bugs. "State of the Art" indeed. So how should those conversations about 10+ year old code be going? Should I really be attached to the decisions I made 10 years ago, or should I evolve to fix the state of decay and improve the state of the art moving forward?

State of Decay in 2015

Those decisions 10 years ago have led to a current state of decay in the code. Of those thousands of null pointer checks, how many remain? When new developers look at the code, what should they glean from the many if-checks for conditions that may or may not be possible? The cognitive load it took me to put one in was hours and even sometimes days. What about the cognitive load to take one out?

It is now obvious that the code is no longer state of the art, but bringing it up to a quality similar to new code is an extreme amount of work. It is technical debt. To justify the technical debt we talk about how long that code has been executing "without causing problems", and we bring up those "smart people" from the past. Yes, we were smart at the time, and we made a locally optimal decision to address a business need. That doesn't mean maintaining that decision is the best way to meet future business needs (in fact it is rarely the case that leaving anything in a state of disrepair is a good decision, since the cost of deferring the repair of decay is non-linear).

I gave a simple example of null pointer checks to address reliability. There were many more decisions made that increased our technical debt versus the current industry state of the art. Things like increasing our hard dependencies on our underlying operating system to the extent that we failed to provide good abstractions to give us agility in our future technology decisions (I'm being nice, since we also failed to provide abstractions good enough to even allow for proper unit testing). We executed on a set of design decisions that ensured only new code or heavily rewritten code would even try to establish better abstractions and patterns for the future. This further left behind our oldest code and it meant that our worst decisions in the longest state of decay were the ones that continued accruing debt in the system.

Now there are real business reasons for some of this. Older code is riskier to update. It's more costly to update. It requires some of your most expert developers to deal with, developers you'd like to put on the new shiny. Not everyone's name starts with Mike and ends with Rowe, so you risk losing top talent by deploying them on these dirtiest of jobs. Those are valid reasons. Invalid reasons are that smart people wrote the code 10+ years ago, or that the code was based on state-of-the-art thinking when it was written. As we've noted, the state of the art changes, very quickly, and the code has to adapt.

State of the Art in 2015

It's useful then to look at how many "bad decisions" we made in the past, since if we could have foretold the future, we could have made the 2015 decision back then and been done with all of this talk. Our code could be state of the art and our legacy could be eliminated. So what are some of those major changes that we didn't foresee, and therefore have to pay a cost now to adapt to?

Well, reliability went from trying to never crash to crashing as soon as something comes up that you don't have an answer for. Don't recover from memory failures. Don't recover from stack overflow exceptions. Don't recover from broken invariants. Crash instead. I recently posted on our adoption of the fail-fast model in place of our error-recovery model and the immense value we've already gotten from this approach. You can read about that here: "Improving Reliability by Crashing".

Security has seen an uptick in the number of exploits driven by use-after-free conditions. Now, years ago a model was introduced to fix this problem, and it was called Smart Pointers, at least in the Windows code base where COM was prevalent. Even now in the C++ world you can get some automatic memory management using unique_ptr and shared_ptr. So how did that play out? Well, a majority of the code in the district of decay was still written using raw pointers and raw AddRef/Release semantics. So even when the state of the art became Smart Pointers, it wasn't good enough. It wasn't deployed to fix the decay, so the decay and rot remained. After all, having a half-and-half system just means you use the broken 50% of the code to exploit the working 50%.
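To illustrate the difference, here is a hypothetical sketch (an invented IWidget interface and a hand-rolled holder standing in for something like Microsoft::WRL::ComPtr) of raw AddRef/Release next to the RAII equivalent.

// A hypothetical COM-ish interface, just for illustration.
struct IWidget
{
    virtual void AddRef()  = 0;
    virtual void Release() = 0;
    virtual void Draw()    = 0;
};

void RawStyle(IWidget* widget)
{
    widget->AddRef();
    widget->Draw();             // any early return or exception here leaks...
    widget->Release();          // ...and releasing too early is a use-after-free
}

// Minimal RAII holder: Release happens exactly once, when the scope ends.
class WidgetPtr
{
public:
    explicit WidgetPtr(IWidget* p) : p_(p) { if (p_) p_->AddRef(); }
    ~WidgetPtr()                           { if (p_) p_->Release(); }
    WidgetPtr(const WidgetPtr&) = delete;
    WidgetPtr& operator=(const WidgetPtr&) = delete;
    IWidget* operator->() const { return p_; }
private:
    IWidget* p_;
};

void SmartStyle(IWidget* widget)
{
    WidgetPtr holder(widget);
    holder->Draw();             // no manual Release to forget or double-call
}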

So basically "smart pointers" written by "smart people" turned out to not be the final conclusion of the state of the art when it came to memory lifetime and use after free. We have much, much better models if we just upgrade all of the code to remove the existing decay and adopt them. You can read about our evolution in thinking as we advanced through "Memory Protector" and on to "MemGC". If you read through both of those articles one commonality you'll find is that they are defense in depth approaches. They do completely eliminate some classes of issues, but they don't provide the same guarantees as an explicitly reference counted system that is perfect in its reference counting. They instead rely on the GC to control only the lifetime of the memory itself but not the lifetime of its objects.

So there is yet another evolution in the state of the art still to come and that is to move the run-time to an actual garbage collector while removing the reference counting semantics that still remain to control object lifetime. Now you can see how rapidly the state of the art evolves.

We've also evolved our state of the art with regard to web interoperability and compatibility. In IE 8 we introduced document modes for the first time, after we created a huge mess for the web in IE 7, where our CSS strict mode changed so much from IE 6 that we literally "broke the web". Document modes allowed us to evolve more quickly while letting legacy sites on legacy modes continue running in our most modern platform. This persisted all the way until IE 11, where we had 5 different versions of the web platform living within the same binary and the number of versioning decisions was reaching into the tens of thousands. One aspect of this model was our Browser OM, some aspects of which I've recently documented in "JavaScript Type System Evolution", if you are interested in going deeper.

So the state of the art now is EdgeHTML and an evergreen web platform. This in turn allows us to delete some of the decay, kind of like extracting a cavity, but even this carries so much risk that the process is slow and tedious. The business value of removing inaccessible features (or hard-to-access features) becomes dependent on what we achieve as a result. So even the evergreen platform can suffer from decay if we aren't careful, and still there are systems that cause much head scratching as we determine whether or not they are necessary. But at least the future here is bright, and this leads me to my final state of the art design change.

The industry has moved to a much more agile place of components, open source code, portable code and platform independent code. Components that don't have proper abstractions lack agility and are unable to keep pace with the rest of the industry. There are many state of the art improvements that could apply to this space, but I'm going to call out one that I believe in more than others, and that is replacing OS dependencies, abstractions and interfaces with the principles of Capabilities. And I mean Capabilities in the sense Joe Duffy uses in his recent blogging on the Midori project. I plan on doing a deep session on Capabilities and applying them to the Web Platform in a future article, so I won't dive in further here. This is also a space where, unlike the above 3 state of the art advancements, I don't have much progress I can point to. So I hope to provide insights on how this will evolve the platform moving forward rather than how the platform has already adopted the state of the art.

Accepting the Future

When it comes to legacy code we really have to stop digging in our heels. I often find people with the mindset that a given bit of code is correct until it is proven incorrect. They take the current code as gospel and often copy/paste it all over the place, using the existing code as a template for every bit of new code that they write. They often ignore the state of the art, since there aren't any examples of it in the decaying code they are working on. It is often easier to simply accept the current state of that code than it is to try and upgrade it.

I can't blame them. The costs are high and the team has to be willing to accept the cost. The cost to make the changes. The cost to take the risks. The costs to code review a much larger change due to the transformation. The costs to add or update testing. They have to be supported in this process and not told how smart developers 10 years ago wrote that so it must have been for a reason. If those developers were so smart, they might have thought about writing a comment explaining why they did something that looks absolutely crazy or that a future developer can't understand by simply reading it.

You could talk about how they were following patterns and at the time everyone understood those patterns. Well, then I should have documents on those patterns and I should be able to review those every year to determine if they are still state of the art. But the decay has already kicked in. The documents are gone, the developers who wrote the code don't remember themselves what they wrote or what those patterns mean anymore. Everyone has moved on, but the code hasn't been given the same option. Your costs for that code are increasing exponentially and there is no better time than now to understand it and evolve it to avoid larger costs in the future.

We should also all agree that we are now better than ourselves 10 years ago and it's time to let the past go. Assume that we made a mistake back then and see if it needs correcting. If so, correct it. If not, then good job, that bit of code has lived another year.

If you are afraid of the code, then that is even more reason to go after it. I often hear things like, "We don't know what might break." Really? That is an acceptable reason? Lack of understanding is too often accepted in our industry as a reason to put yellow tape around an area and declare it off limits. Nope. Not a valid reason.

The next argument in this series is that the code is "going away". If it's "going away", then who is doing that? Because code doesn't go anywhere by itself, and deleting code is still an action that a developer has to perform. This is almost always another manifestation of fear of the code. By noting it will "go away" you defer the decision of what to do with that code. And in software, since teams change all the time, maybe you won't even own the code when the decision has to be made.

Conclusion

When it comes to evolving your code to the state of the art, don't be a blocker. Accept that there are better ways to do whatever you did in the past, and be truthful when computing the value proposition any time you decide not to upgrade a piece of code. Document that decision well and include the reasoning around what additional conditions would have swung your decision (you owe it to your future self). Schedule a reevaluation of the decision for some time in the future and see if the conclusion has changed. The worst case is that you lose the analysis and the cost of analysis becomes exponential as well. If that happens you will surely cement that decay into place for a long time.

Try to avoid having your ego, or the egos of others, make decisions about the quality of code from 10, 5 or even 2 years ago. Tech is one of the fastest evolving industries and it is a self-reinforcing process. As individuals it is hard to keep up with this evolution, and so we let our own biases and our own history slow advancement when the best thing we can do is get out of the way of progress. Our experience is still extremely valuable, but we have to apply it to the right problems. Ensuring that old code stays old is not one of them.

Cleaning up the decay can be part of the culture of your team, it can be part of the process, it can be rewarded or it can just be a dirty job that nobody wants to do. How much decay you have in your system will be entirely dependent on how well you articulate the importance of cleaning it up. The distance between state of the art and state of decay can be measured in months so it can't be ignored. You probably have more decay than you think and if you don't have a process in place to identify it and allocate resources to fix it, it is probably costing you far more than you realize.

Sunday, September 20, 2015

Fixing Web Interoperability with Testing

For an old and mature API surface area such as HTML 5 you would think that it would be relatively well tested. We've put years of effort into writing tests, either as browser vendors, or maybe just as sites or frameworks to make sure that our stuff works. But rarely do these efforts scale to the true size of the API set, its complications, its standardization, its interop across browsers and all of the quirks that exist that the web is reliant on.

However, as judged by the numerous differences between browsers, it is pretty clear there are no canonical tests. Tests for the web platform, if you assumed all browsers were correct in their own way, would each have tens of acceptable results. Developers would have to be aware of these different results and try to code in a way that allowed for all such possibilities. Alas, we don't have tests like this, and developers only write to one given implementation.

What does Interop Look Like Today?

Currently interop between browsers is decided by site reports and bug fixes against those site reports. It means reducing millions of lines of JavaScript, HTML and CSS to figure out what might be causing the problem for a given site.

Depending on which browser vendor is trying to match another, those bug fixes may or may not be good for the web. Arguably implementing a bunch of webkit prefixes is great to make sites work, and users might even be happy, but it reduces the uptake of the standards versions of those same APIs. To the extent that a relatively well positioned webkit API might actually need to be added to the standard and marked as an alias for the standard itself. I'm not picking on webkit prefixes here either; it just so happens that they were more popular and so became the mindshare leaders.

So browser vendors rely on site reporting and mass per site testing, analysis of complicated scripts and reductions and potentially silent bug fixes to arrive at some sort of interoperable middle ground where most existing sites just work, most newly written sites are probably portable, and the web has multiple different ways of doing the same thing. This creates a complicated, sparsely tested API, with many developer pitfalls. Current web interop really isn't looking so good.

But, are tests really going to make a difference?

Web Testing Statistics

I am about to throw out numbers. These numbers are based on real testing, aggregation, and profiling, however, they are not precise. We have not exhaustively tested our own collection methodology to ensure that we aren't missing something. But given the size of the numbers we believe these to be fairly statistically accurate. I'll apply an assumption that I have a +/- 5% error in the data I'm about to present.

Also, these are about tests that we run against EdgeHTML. Which also means the data collected is about how these tests work in EdgeHTML. It is possible that numbers swing and change as features are detected in different browsers. Okay, you probably just want the numbers ;-)

Test Set                        Unique APIs   Comments
Top 10000 WebSites              ~2300         Site crawler live data, not testing
EdgeHTML Regression Suite       ~2500         Filtered, incidental usage not counted
Top WebDriver Enabled Suites    ~2200         WebDriver enabled, any browser can run
CSS 2.1 Suite                   ~70           Not an API testing suite

But these numbers don't mean anything on their own; how many APIs are there? I can't give the number for every browser, but I can tell you that EdgeHTML is actually on the low side due to how our type system works. We adopted an HTML5-compliant type system earlier than some of the other browsers. We've also worked hard to reduce vendor prefixes. For this reason Safari may have over 9k APIs detectable in the DOM. FireFox, which has a high level of standards compliance, will often have many moz-prefixed properties, and therefore their numbers will be elevated as well.

But for EdgeHTML the statistics are that we have ~4300 APIs and ~6200 entry points. The entry-point count is important since it represents things that aren't necessarily visible to the web developer: the way we do callable objects (objects that behave like functions), for instance, or the fact that read-write properties have both a getter and a setter, each of which must be tested.

The numbers we collected on testing have been aggregated to APIs, but we also keep the data on entry points. The entry-point data always results in a lower percentage of testing, since not testing a getter or setter can only take away from your percentage; it can't add to it.

So what if we added up all of the unique coverage? If we do that across the suites, then we end up with ~3400 APIs covered. That's not bad; it starts to suggest that the "used web" and the tested web might heavily intersect, right? Okay, let's run a few more numbers then.

What about APIs used by the top 10000 web sites but not hit by any of those suites? Turns out there are ~600 APIs in that group. Ouch, that could be an interop nightmare. If we instead count only publicly available testing (and remove our proprietary testing), the number jumps significantly: we are now at over 1200 APIs that are hit by live sites but not tested by public test suites... Ugh.

Confidence in Existing Tests

So we've talked a lot about the testing that does and does not exist, but what about the confidence we apply to the existing testing. This is going to be less data driven since this data is very hard for us to mine properly. It is, in fact, a sad truth that our tracking capabilities are currently lacking in this area and it creates a nasty blind spot. I'm not aware of this state for other browser vendors, but I do know that the Blink project does a lot of bug tagging and management that I would argue is superior to our own. So kudos to them ;-)

So notice that the EdgeHTML test suite is considered a regression suite. This is an evolution of testing that started likely 20 years ago. While we have added some comprehensive testing for new features, many older features, the old DOM if you will, only have basic testing, which is mostly focused on verifying the API basically works, that it most certainly doesn't crash, and often includes some set of tests for bugs in the API that we eventually fixed and created a "regression" test for. In IE 9, which is where this legacy DOM suite forks into a legacy and a modern suite, we carried over about half of our entry points. I have numbers from this period that range between 3100 and 3400 entry points depending on the document mode.

Given we mostly have regression suites, we also find that many APIs are only hit a small number of times. In fact, our initial test numbers were quite high, around 60% entry-point coverage, until we factored out incidental hits. Once the filtering was employed we were back down at 30%, and even that is likely generous, since there is no perfect way to filter out incidental usage.

This all combines to make my confidence in our regression suite at about 50/50. We know that many of the tests are high value because they represent actual bugs we found on the web in how websites are using our APIs. However, many of them are now more than 5, 10 or even 15 years old.

What about the public web-test suites that we covered? Well, that I'm going to also call 50/50, because we aren't really seeing that those suites touch the same APIs or test the APIs that the web actually uses. I mean, 1200 APIs not hit by those suites is pretty telling. Note, we do add more public test suites to our runs all the time, and the quality of a suite is not purely about how many APIs it tests. There is a lot of value, for instance, in a suite that only tests one thing extremely well. This requires that you grade a suite's quality through inspection. This is a step we have not done. We rely on jQuery and other major frameworks to vet their own suites and provide us with a rough approximation of something that matches their requirements.

So in all, given that our existing suites aren't very exhaustive, they often test very old functionality, and there are a lot of missing APIs in the suites that are used by top sites, I'd subjectively rank the current test suites in my low confidence category. How can we get a medium to high confidence suite, that is cross vendor/device capable, and if we could what would it look like?

Tools for a Public Suite

We already have some public suites available. What makes a suite public? What allows it to be run against multiple browsers? Let's look at some of the tools we have available for this.

First and foremost we have WebDriver, a public standard API set for launching and manipulating a browser session. The WebDriver protocol has commands for things like synthetic input, executing script in the target and getting information back from the remote page. This is a great start for automating browsers in the most basic ways. It is not yet ready to test more complicated things that are more OS specific. For instance, synthetic input is not the same as OS-level input, and even OS-level synthetic input is not the same as input from a driver within the OS. Oftentimes we have to test deeply to verify that our dependency stacks work. However, the more we do this, the less likely the test is going to work in multiple browsers anyway. Expect advancements in this area over the coming months and years as web testing gets increasingly more sophisticated and WebDriver evolves as a result.

But WebDriver can really test anything in an abstract way. In fact, it could test an application like Notepad if there were a server for Notepad that could create sessions and launch instances. Something that generic doesn't sound like a proper tool, in fact it is just part of a toolchain. To make WebDriver more valuable we need to specify tests in ways that anyone's WebDriver implementation can pick them up, run them, and report results.

For this we have another technology called TestHarness.js. This is a set of functionality that, when imported into your HTML pages, allows them to define suites of tests. By swapping in a reporting script, the automation can then change the reporting to suit its own needs. In this way you write to a test API by obeying the TestHarness.js contract, but you expect that someone running your test will likely replace these scripts with their own versions so that they can capture the results.

The final bit of the puzzle is making tests broadly available and usable. Thankfully there is a great, public, open source solution to this problem in GitHub. By placing your tests in a public repository and using TestHarness.js, you can rest assured that a browser vendor can clone, run and report on your test suites. If your test suite is valuable enough it can be run by a browser vendor against their builds, not only protecting them from breaking the web, but protecting you from being broken. For large frameworks in broad usage across the web this is probably the best protection you could have to ensure your libraries continue to work as the web evolves.

The Vendor Suite

Hopefully the previous section has replaced any despair that I might have stirred up at the beginning of the article with some hope instead. Because I'm not here to bum anyone out. I'm here to fix the problem and accelerate the convergence of interoperability.

To this end, my proposal is what I call the vendor suite. And note, I call it this, but it doesn't matter whether a vendor releases this suite or someone in the community does; it has to be informed by the vendors and use data from the actual browsers in order to achieve a level of completeness the likes of which we have never seen. Also, these are my own opinions, not those of my entire team, which increases the likelihood that a large undertaking such as this may be something driven by the community itself and in turn supported by the vendors.

The qualities of the vendor suite are that it accomplishes the following immediate goals...

  1. We define tests in 3 levels
    1. API - Individual API tests, where each API has some traits and some amount of automated and manually created testing is involved. The level of testing is at a single API, and the number of tests is dependent on the complexity of the API itself.
    2. Suite - A more exhaustive set of tests which focus on a specification or feature area. These suites could accompany a specification and be part of their contribution to the standards process.
    3. Ad Hoc - Tests which represent obscure, but important to test behaviors. Perhaps these behaviors are only relevant to a given device or legacy browser. Potentially their goal is to "fail" on a compliant and interoperable browser.
  2. Tests should be written to verify the interoperable behavior until all browsers agree to transition to the standards behavior if there is a discrepancy. Interoperability is the primary goal. Do websites work? Not, does a browser adhere to a spec for which no web developer is pining.
  3. As much testing should be driven by traits as possible.
    1. Attributes - At an API level the behavior of a property - attribute pair will be generically tested. Examples of common behavior are how the property and attribute reflect one another, and how MutationObservers are fired.
    2. Due to traits based testing we can verify down to the ES 6 level the configuration of the property. This allows auto-generation of testing for read-only, "use strict" adherence, along with other flags such as enumerability and configurability.
    3. Tests should be able to be read and rewritten while maintaining any hand-coded or manually added value. Things like throwing exceptions are often edge cases and would need someone to specifically test for those behaviors.
  4. No vendor will pass the suite. It is simply not possible. For example:
    1. Vendor prefixed APIs will be tested for.
    2. Hundreds if not thousands of vendor specific APIs, device configurations, scenario configurations, etc... may be present.
  5. To this end, the baselines and cross comparisons between browsers are what is being tested for. Vendors can use the results to help drive convergence and start talks on what we want the final behavior to be. As we align to those behaviors the tests can be updated to reflect our progress.
Having such a large scale, combined effort is really how we start converging the web. This is how we all start quickly working through obscure differences with a publicly visible suite, publicly contributed to, and to be honest, prioritized by you, the community. With such obscure differences out of the way it frees up our resources to deliver new web specifications and further achieve convergence on new APIs much more quickly. 

Final Thoughts

So what do you think? Crazy idea? Tried a thousand times? Have examples of existing collateral that you think accomplishes this goal? I'm open to all forms of comment and criticism. Any improvement to the web that lets me get out of the business of reducing web pages, and that finds issues earlier, is going to be a huge win for me regardless of how we accomplish it. I want out of the business of trying to understand how a site is constructed. There are far more web developers than there are browser developers, so the current solutions simply don't scale in my opinion.

As another thought, what about this article is interesting to you? For instance, are the numbers surprising or about what you thought in terms of how tested the web is? Would you be interested in understanding or seeing these numbers presented in a more reliable and public way?

Sunday, June 14, 2015

Improving Reliability by Crashing

When we describe the reliability of a piece of software we probably apply traits like: never crashes, never hangs, is always responsive and doesn't use a lot of memory. About the only pieces of software that meet all of these criteria are simple utilities, with years of development poured into fixing bugs and little or no improvement to the feature set. Maybe Notepad or Calc would come to mind.

We don't tend to spend much time in that software though. Software has to have a lot of functionality and features before we really allow ourselves to spend enough time there to really stress it out. But we do spend our time in large, complex programs. In fact, you may be one of the people who spends around 70% of your time in your web browser, whether it be Chrome, Safari, FireFox or Internet Explorer. Likely nobody considers these programs to be very reliable. Yet we conduct business in them, file our taxes, pay our bills, connect to our banks. We complain about how often they crash, how much memory they consume, but fail to recognize how many complex behaviors they accomplish for us before going down in a ball of flames.

So this article is about web browsers, and more specifically Internet Explorer. It's about a past of expectations and assumptions. And it's about a future where crashing more means crashing less. Hopefully you find that as intriguing as I do ;-)

The Past - Reliability == 100% Up-Time

The components that make up Internet Explorer run in some pretty extreme environments. For instance, WinInet and Urlmon, our networking stack, run in hundreds of thousands of third party applications. They also power a large portion of the world's networking stacks. And due to their history they are heavily extensible.

Having so many consumers, you'd imagine that every crash we fix and every bit of error recovery we put in to recover from every possible situation would lead to code which is highly robust to failure. That this robustness would mean all of those hundreds of thousands of applications would have the same robustness that we do. That they listen to every error code we return and take immediate and responsible actions to provide a crashless and infallible user experience. And here I interject that you should vote on the <sarcasm> tag for HTML 5.1 so we can properly wrap comments like those I just made.

The reality is they themselves are not robust. No error checking, no null checking, continuing through errors, catching exceptions and continuing, so many bad things that we can't even detail them all, though gurus like Raymond Chen have tried. But, at least we didn't crash, and this made the world a better place. We provide the unused ability to be robust, at great expense to our own code.

To build a robust component you can either start with principles that enable this such as two-phase commit or go through increasingly more expensive iterations of code hardening. Let's talk about each of these.

Two-Phase Commit

To keep the definition simple here, you first acquire all the resources you'll need for the operation. If that succeeds you commit the transaction. If it fails, you roll back the operation. You can imagine a lot of software and algorithms aren't really built with this idealistic view in mind. However, the model is proven and used extensively in databases, financial software, distributed systems and networking.

It is a huge tax though. And it only works if you can reasonably implement a commit request phase. In this phase you'll ask all sub-systems to allocate all of the memory, stack or network resources that they might need to complete the task. If any of them fails, then you don't perform the operation. Further, you have to implement a roll-back mechanism to give back any resources that were successfully acquired.
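As a toy illustration of the shape of the idea (entirely hypothetical types, nowhere near browser scale): acquire everything up front, and only then commit with operations that cannot fail.

#include <cstddef>
#include <new>
#include <vector>

struct Buffer { std::vector<char> bytes; };

bool PerformOperation(std::vector<Buffer>& committed, std::size_t count, std::size_t size)
{
    std::vector<Buffer> staged;
    try
    {
        // Phase 1: acquire every resource we might need, including room in the
        // destination, without touching any user-visible state.
        committed.reserve(committed.size() + count);
        staged.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
        {
            Buffer b;
            b.bytes.resize(size);
            staged.push_back(std::move(b));
        }
    }
    catch (const std::bad_alloc&)
    {
        return false;            // roll back: nothing was committed, staged memory is freed
    }

    // Phase 2: commit. Moves into already-reserved space cannot fail.
    for (auto& b : staged)
        committed.push_back(std::move(b));
    return true;
}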

In a browser with large sub-systems like layout and rendering, alongside author supplied programs in the form of scripts and libraries, such a system is prohibitively expensive. While some aspects of the browser could be ascribed to the two phase commit model, how could you orchestrate the intersections and compositions of all the possible ways those sub-systems could come together to create transactions? Good thing we have another model which might work ;-)

Hardening

Hardening is the detection of a state that will cause an unrecoverable failure in your code, and employing a corrective measure to avoid that state. In the simplest form, a state that can cause your program to hit an unrecoverable failure would be an OOM, or Out Of Memory, which in turn propagates a null pointer back through your code. The allocation could also throw, in which case your model would change to RAII and some form of exception handling instead of error state propagation.

With this type of programming the hardening is done through propagation of error codes. In COM, we use HRESULTs for this. When memory fails to allocate we use E_OUTOFMEMORY, and so we have to translate memory allocator failures into that code. But in addition you have to initialize objects, so you end up with allocator methods that both return pointers and can return more than one error code, something other than just E_OUTOFMEMORY. Also, once the error codes are introduced they propagate through your function definitions, and many functions must all of a sudden change their signatures. I've coded what I think is the most BASIC form of this, which handles just a couple of the initial failures that you would run into, and it is still almost 100 lines of code.
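A much-shortened sketch of the shape of that style, with hypothetical Widget and Child types and portable stand-ins for the COM error codes (nowhere near the full 100 lines):

#include <cstdint>
#include <new>

using HRESULT = std::int32_t;                            // portable stand-ins for the sketch
constexpr HRESULT S_OK          = 0;
constexpr HRESULT E_POINTER     = static_cast<HRESULT>(0x80004003U);
constexpr HRESULT E_OUTOFMEMORY = static_cast<HRESULT>(0x8007000EU);
inline bool Failed(HRESULT hr) { return hr < 0; }

struct Child { int value = 0; };

struct Widget
{
    Child* child = nullptr;

    HRESULT Initialize()
    {
        child = new (std::nothrow) Child();
        if (!child)
            return E_OUTOFMEMORY;                        // allocator failure becomes an HRESULT
        return S_OK;
    }
    ~Widget() { delete child; }
};

// Every function in the chain grows an HRESULT return and an out-parameter.
HRESULT CreateWidget(Widget** result)
{
    if (!result)
        return E_POINTER;
    *result = nullptr;

    Widget* widget = new (std::nothrow) Widget();
    if (!widget)
        return E_OUTOFMEMORY;

    HRESULT hr = widget->Initialize();
    if (Failed(hr))
    {
        delete widget;                                   // hand-written roll-back on every failure path
        return hr;
    }

    *result = widget;
    return S_OK;
}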


Can you make this better? Well yes, with some structure you can. You can use a pattern known as RAII to handle the clean-up cases more elegantly and automatically. You can also use exceptions with RAII to keep sub-trees of code from having to be turned into error propagators. You have to augment this with actual exceptions that can be thrown for each error case, but that is rather trivial.
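Here is the same hypothetical Widget sketched with RAII plus exceptions (throwing new and unique_ptr); the hand-written roll-back paths simply fall away.

#include <memory>

struct Child { int value = 0; };

struct Widget
{
    std::unique_ptr<Child> child = std::make_unique<Child>();  // throws std::bad_alloc on OOM
};

std::unique_ptr<Widget> CreateWidget()
{
    // Failure propagates as an exception; partially constructed state cleans
    // itself up, so there is no manual roll-back and no HRESULT plumbing.
    return std::make_unique<Widget>();
}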

In terms of the callers, you'll need to ensure that there is always someone to catch your thrown exception. Part of our goal in using RAII + exceptions is to avoid code that handles errors. If we find that we are often introducing try/catch blocks then the amount of code increases and we find that we are spending much of our time still implementing error handling and recovery logic. 

At some point the argument of cleanliness or readability comes to bear in these cases, and whether you use method signatures or exceptions becomes a matter of style or preference. Suffice it to say, having looked at very large codebases that have employed all forms and variations, the percentage of code you write specifically to harden and recover is about the same either way.

Stress

How do we know what and when to harden? Well, we commit acts of violence against our own code in the form of stress. We randomly fail allocations at arbitrary locations in the code. We spin up threads that take locks at various times to change the timing of the threads which actually need them. When 1 of something would suffice we do 10 of them instead. We do things in parallel that are normally done synchronously. All of these contribute to inducing failures that a user may or may not see during their actual usage, but since stress is one of the few mechanisms for exercising millions of lines of error recovery code, we have to fix each and every instance to unblock the next one.
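The allocation-failure half of that can be as crude as a wrapper that fails on purpose. A rough sketch (FaultyAlloc and its rate are made up; real stress infrastructure hooks the allocator much lower in the stack):

#include <cstdlib>
#include <random>

// Randomly fail a small fraction of allocations to exercise recovery (or
// fail-fast) paths that normal runs never hit.
void* FaultyAlloc(std::size_t size, double failureRate = 0.001)
{
    static thread_local std::mt19937 rng{std::random_device{}()};
    std::bernoulli_distribution shouldFail(failureRate);
    if (shouldFail(rng))
        return nullptr;              // simulated OOM
    return std::malloc(size);
}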

Just like the static analysis tools, stress can produce more bugs than a team could reasonably handle. You can also create states that are impossible to debug and track back, depending on when the original failure occurred that led to the final crash. The more you harden your code, the less likely you are to be able to discover the path that led to the failure you are seeing. After all, many hours of run time could mean hundreds of thousands of recovered errors led you to where you are, any of which could have induced the state that caused your ultimate failure. Pretty tricky huh?

Once the product is released, then the users will also stress it and you'll get more real-world usage. Those crashes will be even less actionable than your stress crashes, since you won't be able to log or record how you got somewhere, and a user could have hit the condition after many days of use. You also can't always upload the entire crash state, depending on the user's preferences.

Basically hardening and stress demonstrate a curve that very quickly stops paying off for the development team. This can actually be good for a waterfall design approach though since you will get the most benefit and find the most impactful bugs early in your stress runs and then they'll get increasingly less important the closer you get to shipping. Any truly "need to fix" bugs will still bump up via hit-counts and can be correlated with user data. As a developer, this drives me crazy as I stare at the endless ocean of crashes that I know are issues but will never fix due to their hit-count.

Finding an Alternate Failure Model

So we know that two-phase commit is probably too expensive or even impossible in some large software projects. We also know that hardening and stress testing to increase reliability has its limits as well. It eventually leads to code which has few invariant conditions and simply checks for every possible thing. This is the type of code where you find a now infamous line that will leave you scratching your head every time:
if (this == nullptr) return; // Don't crash if we get passed a bad pointer
That is one of the final incarnations of hardening in your code. People call instance methods on null pointers, and you protect all callers because you can't be bothered to go fix them all.

This brings us to a juncture where we can begin to explore another failure model. It needs to improve on some of the deficiencies of hardening while at the same time helping us achieve equal or greater reliability numbers. So what do we want to get rid of?

  1. Reduce lines of code dedicated to checking and propagating errors. We found that upwards of 10% of the lines of code in a project could be just the failure recovery. We also found that these lines of code were often not covered with our existing test cases and you can't use stress to determine code coverage as the cost is prohibitive.
  2. Allow for developers to establish invariant conditions in the code. If you can't even say your this pointer is guaranteed to be non-null, then you have some problems. But why stop there? Why not also be able to state that certain members are initialized (hardening could leave partial initialization at play) or that a component you are dependent on ONLY returns you valid values in the configurations you care about?
  3. Stress bugs are more about product issues than extreme environmental conditions. Bugs are immediately actionable because failure occurs when failure is first seen. Stress and user reported crashes can be strongly correlated.
The failure model that meets these requirements is fail-fast. It is almost the opposite of hardening. You have almost no error recovery code. You can supply almost no guarantees to your host that you won't crash them. The past, 100% up-time, is gone. Users and stress alike crash fast, crash early, and if those crashes are prominent they get fixed by improving our understanding of the code. Our unit tests exercise all code paths possible because there are fewer of them. Our code has invariant conditions, so when others use us incorrectly they are immediately alerted.

Seems fun, but won't that lead to LESS reliability, MORE crashes, and MORE unhappy customers?

Crash Less by Crashing More

The principles behind fail-fast are that you will crash less once you fix your first few waves of induced crashes. The following steps are a guide to implementing fail-fast on an existing code base. It's a refactoring tutorial if you will. And the fall-out from each step is also clearly explained.
  1. Create a mechanism for terminating your application reliably which maintains the state necessary for you to debug and fix the reason for termination. For now I will call this abandonment, since it is a term we use. We also use the verb induce to describe the process by which a component requests abandonment; a minimal sketch of such a mechanism follows this list. [This stage creates only opportunity, and no fall-out]
  2. Upgrade your memory allocation primitives to induce abandonment. This is most likely your worst offender of error recovery code, bar none. And all of those null checks spread all over your code are definitely not helping you. If you are running with throwing new you might be in better shape ;-) [This stage will likely be painful. You'll find places where your system was getting back null stuff but had plenty of memory. You'll find places where you allocated things the wrong size because of bad math. Fix them!]
  3. Work from the leaves and remove error recovery initiators in favor of new abandonment cases. You can introduce new mechanisms of abandonment to collect the right information so you again have enough to debug. [This stage will be less painful. For every 100 conversions you'll find 1 or 2 gross bugs where things were failing and recovering but were creating customer facing bugs. Now they create crashes you can fix instead.]
  4. Work your way outward and fix all of the error propagators. If they have no errors to propagate then this is easy. If there are still error propagators that they call, then you can induce abandonment for unexpected error codes. This can help you quickly understand whether or not entire code regions are ready for improvement, since if a root function never receives the propagated error, then it likely means all of the children never really generate them. [By this stage you should already be generating fewer crashes during stress than you did while hardening. It seems counter-intuitive, but simpler code with fewer conditions, heavily tested, is just more reliable.]
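Here is a minimal sketch of steps 1 and 2 (the Abandon and FailFastAlloc names are my own shorthand; on Windows the termination would typically go through a fail-fast style API rather than std::abort):

#include <cstdlib>
#include <new>

enum class AbandonReason : unsigned int
{
    OutOfMemory   = 1,
    InvariantFail = 2,
};

[[noreturn]] void Abandon(AbandonReason reason)
{
    volatile unsigned int captured = static_cast<unsigned int>(reason);
    (void)captured;                   // keep the reason easy to find in the dump
    std::abort();                     // terminate reliably; no unwinding, no recovery
}

// Step 2: allocation primitives induce abandonment instead of returning null.
void* FailFastAlloc(std::size_t size)
{
    void* p = ::operator new(size, std::nothrow);
    if (!p)
        Abandon(AbandonReason::OutOfMemory);
    return p;
}

// Invariants become one-liners rather than recovery code.
#define FAIL_FAST_IF(cond) do { if (cond) Abandon(AbandonReason::InvariantFail); } while (0)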
I work on a huge code base, and our experience with fail-fast, in just a single release, has yielded an EdgeHTML which is nearly twice as reliable as its counterpart, MSHTML. That is pretty impressive and is based on data from our stress infrastructure. We have other telemetry which paints a similar story for the user facing experience.

For end users, they may actually see more crashes up front, while we get a handle on those things that stress has missed. We had over 15 years of hardening to account for and so we are in the infancy of reintroducing our invariant conditions and converting code through stages 3 and 4 above. Each crash we get from a user will be a deep insight into an invariant condition to be understood and fixed in a way that further improves the system. In the old world that crash would have been a serpentine analysis of logic and code flow through multiple robust functions all gracefully handling the error condition until we find the one that doesn't, patching it, creating a new serpentine path for the next crash.

I'm converting the first snippet into the equivalent fail-fast code to show you the differences. It also gives some insight into how much code and commenting gets to disappear with the model. Note, we didn't have any real control flow in our example, but fail-fast doesn't mean that control flow disappears. Functions that return different states continue to do so. Those that only return errors on extreme failure cases move to inducing abandonment.
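As an illustration of where that ends up, here is what the fail-fast equivalent of the earlier hypothetical CreateWidget collapses to (again a sketch, assuming allocation is routed through an abandoning allocator like the one above):

struct Child { int value = 0; };

struct Widget
{
    Child child;                 // invariant: a constructed Widget always has a valid Child
};

Widget* CreateWidget()
{
    // OOM abandons the process inside the allocator, so there is no error code
    // to return, no null to check, and no roll-back path to maintain.
    return new Widget();
}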