
Thursday, November 10, 2016

W3C VR Workshop - Building a WebVR Content Pipeline

On October 19th and 20th the W3C held its very first VR Workshop. It was our first chance to look back at what we had accomplished with the early revisions of the WebVR specification, which was working its way through a W3C Community Group, while also looking forward to the potentially large set of future standards we had yet to write.

During the workshop I presented on a few topics, but mostly on accessibility of the Web in VR and on performance. In the following few posts I'll be taking one of my lightning talks and expanding on it, line by line. I weighed this against recording the lightning talks, taking my time, and hitting all of the points, but it seems better to get this into writing than to force everyone to put up with me 5-10 minutes at a time through a YouTube video!

My first performance talk was on the WebVR content pipeline. This is an area I'm supremely passionate about because it is wide open right now, with so much potential for improvement. The complicated, multi-tool build pipelines that exist in the games industry are a good indication of what is to come, and that is where my thinking is headed. If you want to see the original slides you can find them here. Otherwise, continue reading; I'll cover everything from the slides anyway.

The WebVR Content Pipeline Challenge


Not an official "fix this" challenge, but rather a challenging area to be fixed. I started this slide by reflecting on the foreign nature of build pipelines on the web. Web content has traditionally been a cobbled-together mess of documents, scripts and graphics resources, all copy-deployed to a remote site. Only in the past few years, maybe five at most, has the concept of using transpilers, optimizers and resource packers become almost commonplace for web developers. Whereas from 2000-2010 you might have been rated on your CSS mastery and skillset, from 2010 onward we started talking more about toolchains. This is good news, because it means some of the infrastructure is already in place to enable our future build environments for WebVR.

My next reflection was on the complicated nature of VR content. It has a lot of interrelated components, connected by links or programmatically through code dependencies: meshes, textures, skins, shaders, animations and many other concepts. We also have a lot of experience in this area, coming from the games industry, so we know what kinds of tools are required. Unfortunately, even in games, the solutions to many of the content pipeline issues are highly specific to a given programming language, run-time or underlying graphics API. Hardware requirements make them even more custom.

My final reflection was on the graphic below. This is a content pipeline example where you start with assets on the left and they get processed into various formats before they finally get loaded on the client. The top of the graph represents what a common VR or Game pipeline might look like with many rounds of optimization and packaging. On the bottom of the graph we see what an initial web solution would be (or rather what it would lack). The difference in overall efficiency and sophistication is pretty clear. The web will need some more tools, distribution mechanisms and packaging formats if it wants to transmit VR efficiently, while retaining all of the amazing properties of deploying to the web.


Developer - Build, Optimize, Package


The graphic shows three major stages in the pipeline. First we start with a developer who is trying to create an experience. What steps in the pipeline can they complete before sending things on to the server? How much optimization or knowledge of the target device can be had at this level? Remember, deploying to the web is not the same as deploying to an application store where you know almost exactly what your target device will be capable of. While fragmentation in the phone market does mean some progressive enhancement might be necessary, the web effectively guarantees it.

My first reflection is that existing build technologies for the web are heavily focused on solving problems with large 2D websites. These tools care mostly about the resources that are currently scaling faster than the rest, which mostly means script and images. Some of the leading tools in this space are webpack and browserify. Since some set of tools does exist, plugins are a potential short-term solution.

My second reflection on this slide was that the game industry solution of packaging was also likely not the right solution. This breaks two principles of the web that we like. The first is that there is no installation required. Experiences are transient as you navigate from site to site. Even though these sites are presenting you a full application, they aren't asking for permission to install and take up permanent space on your disk. Instead the browser cache manages them. If they want more capability, then they have to ask for it. This might come in the form of specialized device access or the ability to store more persistent data using IndexedDB or Service Workers. The second principle that packaging breaks is iterative development. We like our ability to change something and have it immediately available when we hit F5.

My third reflection is around leveraging existing tools. There are many tools for 3D content creation, optimization and transformation. Most of them need to be repackaged with the web in mind. Maybe they need to support more web-accessible formats, or they need to ship with a library that allows their output to be used efficiently. In the future I'll be talking about SDF fonts and how to optimize those for the web. You may be surprised to find that traditional solutions for game engines aren't nearly as good for the web as they are for the traditional packaging model.

Another option is for these tools to have proper export options for the web. Embedded meta-data that is invisible to your user but still consumes their bandwidth has been a recent focus of web influencers like Eric Lawrence. Adobe has supported export for the web for years, but people often wouldn't use the option and would instead ship an image that was 99% meta-data. Many 3D creation tools have never had the web in mind, so they often emit this extra data as well, or export in uncompressed or unoptimized formats closer to raw. Upgrading these tools to target WebP, JPEG and PNG as image output formats, or any of the common texture compression formats supported by WebGL, would be a great start.

My final reflection for this slide was on glTF. I think this format could be the much-needed common ground for sharing and transferring 3D content along with all of its dependencies. Its well-structured format means that many tools could use it as both an import and export target. Optimization tools will find it easy to consume and rewrite. Finally, on the client side, the format is JavaScript-friendly, so you can transform, explore and render it however you want. I'll be keeping a close eye on this format and contributing to the GitHub repository from time to time. I encourage you to check it out.
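To make that concrete, here is a minimal sketch of poking at a glTF file directly from JavaScript; the asset URL is hypothetical, and the normalization covers both the id-keyed layout of glTF 1.0 and the array layout of 2.0:

    // Fetch a glTF file, which is plain JSON, and walk its structure.
    fetch('https://example.com/assets/robot.gltf')   // hypothetical asset
      .then(response => response.json())
      .then(gltf => {
        // glTF separates structure (JSON) from bulk data (binary buffers),
        // so the top-level keys describe everything the asset depends on.
        console.log('Top-level sections:', Object.keys(gltf));

        // Meshes are keyed by id in glTF 1.0 and stored as an array in 2.0.
        const meshes = Array.isArray(gltf.meshes)
          ? gltf.meshes
          : Object.values(gltf.meshes || {});
        meshes.forEach(mesh => console.log('Mesh:', mesh.name));

        // External dependencies (textures here) are just URIs that a tool
        // could rewrite, prefetch or optimize before rendering.
        const images = Array.isArray(gltf.images)
          ? gltf.images
          : Object.values(gltf.images || {});
        images.forEach(img => console.log('Image dependency:', img.uri));
      });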

Server - CDNs, CORS, Beyond URLs


The second stage in our build pipeline is the server. While you can precompile everything possible and then deploy it, there will be cases where this simply isn't feasible. We should rely on the server to ingest and optimize content as well, perhaps based on usage and access. It's also a tiered approach for the developer, who might be working on the least powerful machine in their architecture. Having the server offload the optimization work means that iteration can happen much more quickly.

For the server, my first reflection is that we need smarter servers and CDNs that move many types of optimization off the developer's machine and build environment and into the cloud where they belong. As an example, most developers don't produce 15 different streaming formats for their video today and upload each individually. We instead rely on our video distribution servers to cut apart the video, optimize, compress, resample and otherwise deliver the right content to the right place without us having to think about the myriad devices connecting to us. For VR the equivalent would be the creation of power-of-two textures, precomputing high quality mipmaps or even doing more obscure optimizations based on the requesting device.
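As a rough illustration of the kind of ingest step such a server could run, here is a sketch assuming Node and the sharp image library; the file paths are hypothetical:

    // Resize an uploaded texture to power-of-two dimensions so clients get
    // mipmap-friendly data without the developer preparing it by hand.
    const sharp = require('sharp');

    function nextPowerOfTwo(n) {
      return Math.pow(2, Math.ceil(Math.log2(n)));
    }

    async function makePowerOfTwo(inputPath, outputPath) {
      const { width, height } = await sharp(inputPath).metadata();
      await sharp(inputPath)
        .resize(nextPowerOfTwo(width), nextPowerOfTwo(height), { fit: 'fill' })
        .toFile(outputPath);
    }

    makePowerOfTwo('uploads/brick.png', 'optimized/brick_pot.png');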

For device specific tuning we can look towards compression. Each device will have a set of texture compression extensions that it supports and not all devices support every possible compression type. Computing and delivering these via the server can allow for current and future device adaptation without the developer having to think about redeploying their entire project with new formats.
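Here is a sketch of the client side of that negotiation; the WebGL extension names are real, but the endpoint and query parameter are made up for illustration:

    // Ask WebGL which compressed texture formats this device supports.
    const gl = document.createElement('canvas').getContext('webgl');
    const supported = !gl ? [] : [
      'WEBGL_compressed_texture_s3tc',   // common on desktop GPUs
      'WEBGL_compressed_texture_etc1',   // common on Android
      'WEBGL_compressed_texture_pvrtc',  // PowerVR-class mobile hardware
      'WEBGL_compressed_texture_astc'    // newer mobile GPUs
    ].filter(name => gl.getExtension(name) !== null);

    // Hypothetical endpoint: the server picks DXT, ETC1, PVRTC or ASTC data
    // for the same logical texture based on what the client reports.
    const textureUrl =
      '/textures/brick?formats=' + encodeURIComponent(supported.join(','));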

The second reflection is on open worlds. For this we need high quality, highly available content. A large number of content pieces will be pretty common and uniform. While you could generate all of the possible cubes, spherical maps and other common shapes on the client, you could also just make them widely available in a common interchange format, served over CORS.

For those not familiar, CORS stands for Cross-Origin Resource Sharing and is a way to get access to content on other domains and be able to inspect it fully. Consider an image hosted on a server that happens to contain your password; it would not be served via CORS. While you could retrieve that image and display it in the browser, you would not be able to read its pixels or use it with WebGL. On the other hand, if you had another image that was a simple brick texture, you might want to use it on any site in WebGL. For this you would return the resource with CORS headers from the server, which would allow anyone to request and use the texture without worry of information leakage.
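A minimal sketch of consuming such a shared texture; the asset host is hypothetical, and the upload only succeeds because the remote server replies with the appropriate Access-Control-Allow-Origin header:

    const image = new Image();
    image.crossOrigin = 'anonymous';   // request the image with CORS, no credentials
    image.src = 'https://textures.example.com/brick.png';
    image.onload = () => {
      const gl = document.querySelector('canvas').getContext('webgl'); // assumes a canvas on the page
      const texture = gl.createTexture();
      gl.bindTexture(gl.TEXTURE_2D, texture);
      // Without valid CORS headers the browser would block this upload to
      // prevent the page from reading pixels it shouldn't see.
      gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);
    };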

I suspect a huge repository of millions or even billions of objects to be available in this way within the next few years. If you are working on such a thing, reach out to me and let's talk!

My last real reflection is that accessing content via URL is inefficient. It doesn't allow the distribution of the same resources from different locations with the caching we need to make the VR web work. We need efficient interchange of similar, reusable resources across large numbers of sites without putting the burden on a single CDN just so that caching works the way it does in browsers today. There are some interesting standards proposed that could solve this, or maybe we need a new one.

Not even a reflection, but rather a note that HTTP/2 pushing content down the pipe as browsers request specific resources will be key. Enlightening the stack to glTF, for instance, to allow for server push of any external textures will be a huge win. But this comes with a lot of need for hinting: am I fetching the glTF to know the cost of an object, or to render it? I'll try to dedicate a future post just to HTTP/2, content analysis on the server, and things we might build to make a future VR serving stack world- and position-aware. I don't think these concepts are new; they are probably in heavy use in some modern MMORPGs. If you happen to be an expert at some company and want to have a conversation with me on the subject, I would love to pick your brain!

Client - Progressive Enhancement


The last stage in our pipeline is the client. This is where progressive enhancement starts to be our bread-and-butter technique. Our target environment is underpowered, and at the high end of that spectrum will be LTE-enabled mobile phones. Think Gear VR, Daydream and even Cardboard. Listening to Clay Bavor talk about the future, it is clear Google is still behind Cardboard; there are millions of units out there already, with many more millions to come. Many people have their first and only experiences of VR on Cardboard.

Installation of content is no longer a crutch we can rely on. The VR Web is a no-install, no-long-download, no-wait environment. Nate Mitchell alluded to this in his OC3 talk when he announced that Oculus would be focusing on some key projects to help move the VR Web forward. I still consider Oculus a team that delivers the pinnacle of VR, so taking on the challenge of adapting the best VR experiences possible to this rather harsh set of mobile and web requirements is pretty epic. That is what the rest of this slide covers.

My first reflection after noting the requirements is progressive texture loading, with fallback all the way to vertex colors when textures aren't available yet. The goal of VR is to get people into an immersive environment as fast as possible; a long loading screen breaks this immersion. The power of the web is the ability to go from site to site without breaking your immersion the way you do today when navigating between applications (or when you have to download a new application to serve a purpose you just discovered). We also aspire to have VR-to-VR jumps work like they do in sci-fi literature or in movies: a beautifully transparent experience as you switch from world to world.

We can achieve this with good design, and since the VR Web is just starting we have the opportunity to design it right. My only contribution for now: load your geometry with simple vertex colors, follow up with lightweight textures and finally, once the full texture is loaded and ready, upgrade to the highest quality experience. But don't block rendering or significantly lower your framerate to jump up to that highest quality if doing so would be disruptive to your user. This will require some great libraries and quite a bit of experimentation to find all of the best practices. Expect more from me on these subjects in future articles as well.
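To sketch what that upgrade path might look like, here is a rough example assuming a three.js scene; the geometry helper and texture URLs are placeholders:

    // Stage 1: show the mesh immediately using only vertex colors.
    const geometry = buildGeometryWithVertexColors();          // assumed helper
    const mesh = new THREE.Mesh(
      geometry,
      new THREE.MeshBasicMaterial({ vertexColors: true })
    );
    scene.add(mesh);

    // Stage 2: swap in a lightweight texture as soon as it arrives.
    const loader = new THREE.TextureLoader();
    loader.load('/textures/wall_small.jpg', lowTex => {
      mesh.material = new THREE.MeshBasicMaterial({ map: lowTex });

      // Stage 3: upgrade to the full-quality texture when it is ready.
      loader.load('/textures/wall_full.jpg', fullTex => {
        mesh.material.map = fullTex;
        mesh.material.needsUpdate = true;
      });
    });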

My second reflection is on the importance of Service Workers. A feature designed to make the web work offline can also be a powerful catalyst in helping the VR Web load instantly and become fully progressive. The key near-term capability is implementing prediction and prefetching for things like glTF resources. As the Service Worker intercedes, it can fire off the requisite requests for all of the related textures and cache them for later. We can also build progressive texture loading into the Service Worker and have it optimize across many different variables to deliver the best experience. It's basically like having the server of the future that we all want, but on the client and under our control.
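Here is a rough Service Worker sketch of that prefetching idea; the cache name is arbitrary and the URL handling is illustrative only:

    // When a glTF file is fetched, parse a clone of it in the background and
    // warm the cache with the textures it references.
    self.addEventListener('fetch', event => {
      if (!event.request.url.endsWith('.gltf')) return;

      event.respondWith(
        fetch(event.request).then(response => {
          const prefetch = response.clone().json().then(gltf => {
            const images = Array.isArray(gltf.images)
              ? gltf.images
              : Object.values(gltf.images || {});
            const urls = images
              .filter(img => img.uri)
              .map(img => new URL(img.uri, event.request.url).href);
            return caches.open('vr-prefetch').then(cache => cache.addAll(urls));
          });
          event.waitUntil(prefetch);   // keep the worker alive while caching
          return response;             // the page still gets the glTF untouched
        })
      );
    });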

Another feature of the Service Worker is understanding the entire world available to the experience and optimizing based on the current location and orientation information. This information could also fit into the future VR server, but we can test and validate the concepts in the Service Worker long before then.

My last reflection on the client is that there are some well-defined app types for which we could deliver the best possible experiences using common code available to everyone, perhaps augmented by capabilities in the browser. This is highly controversial, since everyone points out that while there is indeed a base experience, it is insufficient and customization is a must. I disagree, at least for now. It's like arguing against default transport controls in the HTML5 video player. Why? 99% of web developers will use them to provide the 20% of scenarios in the long tail of the web. Sure, there is a 1% that develops top sites and accounts for 80% of the experiences, and they'll surely want to add their own customization, but they are also in the best position to create the world-class optimization and delivery services needed to make that happen.

To this end, I think it's important that we build some best-of-class models for 360 photos and videos and optimize them into the browser core while VR is still standing up. These may only last for a couple of years, but these are the formative bootstrapping years where we need these experiences to amaze and entice people to buy into a VR Web that doesn't quite exist yet.

Bonus Slides


I won't go into details on these. They are more conversation starters and major areas where I have some internal documentation started on what we might propose from the Oculus point of view. Rather, I'll list them here with a sentence describing some basic thoughts.

  1. Web Application Manifests - If these are useful for describing web pages as applications they can be used to describe additional 3D meta-data as well.
  2. Texture/Image Optimization - Decoding image types, compressing textures on the device, etc... Maybe the realm of Web Assembly, but who knows.
  3. glTF - Server-enlightened data type with HTTP/2 server push, order optimized, LOD optimized and fully predicted based on position and orientation.
  4. Position Aware Cube Maps - Load only visible faces, with proper LOD, perhaps even data-uri encode for the initial scene load.
  5. Meta Tags/HTTP Headers for Content Hinting - Initially this was only position and orientation so that the server could optimize but has since grown.


What's Next?


If you find things here interesting you can contact me at justrog@oculus.com and we can talk about what opportunities might exist. I'll also be reaching out to you as I find and discover experts. I like to constantly grow my understanding of the state of the art and the challenges that exist. It's my primary reason for joining Oculus: so I could focus on VR problems all day, every day!

I have three more talks that I delivered at the conference, each prepared as a blog post similar to this one. I'll finish the perf series first and then I'll end with my rather odd talk on Browser UX in VR. Why was it odd? Stick around and I'll tell you all about it at the beginning of that post.

Saturday, October 8, 2016

Progressive Enhancement for the VR Web

Modern VR developers are wizards of the platform. Andrew Mo commented that they are the pioneers, the ones who survived dysentery on the Oregon Trail. It was meant as a joke, but every well timed joke carries more weight when it reflects a bit of the reality and gravity of the situation. Modern VR developers really are thriving in the ecosystem against all odds in tinier markets than their Mobile app and gaming counterparts while meeting a performance curve that requires deep knowledge of every limitation in the computing platform.

John Carmack, in his keynote, said that the challenge present in Mobile VR development is like dropping everyone a level. The AAA developers become indie devs, the indie devs are hobbyists and the hobbyists have just failed.

If VR is dominated by these early pioneers, then where does the web fit in? These VR pioneering teams aren't web engineers. They don't know JavaScript. While WebGL seems familiar due to years of working with OpenGL, the language, performance and build/packaging/deployment characteristics are all quite different from those of a VR application in the store. Many new skills have to be employed to be successful on the VR Web.

There is a shining light in the dark here. Most people when they hear about WebVR immediately think about games or experiences coded entirely in web technologies, in JavaScript and WebGL. It’s a natural tendency to think of the final form a technology will take or even just draw parallels with what VR is today. Since today, VR is dominated by completely 3D immersive experiences, always inside of a headset, it can be hard to imagine another, smaller step that we could take.

Let's start imagining. What does a smaller step look like? How do we progressively evolve and enhance the web rather than assuming that we have to take the same major leaps that the VR pioneers have made to date? How do we reduce our risk and increase the chance of reward? How do we increase our target market size so that it greatly exceeds the constraint of people with consumer grade VR already available?

VR Augmentations and Metadata

Our first goal has to be that existing websites continue to serve billions of users. We need to progressively update sites to have some VR functionality, but not make that a requirement for interaction. Just like a favicon can be used by a site to slightly customize the browsing experience and make their site stand out in history, favorites or the bookmark bar, a VR ready site could supply a model, photosphere, 360 video or even an entire scene. This extra content would be hidden from the vast majority of users, but would be sniffed out by the next generation of VR ready browsers and then used to improve the user experience.

One of the most compelling options is to have your website provide an environment that can be rendered around your page. In this way you can set up a simple scene, decide where your content would get projected to and the browser would handle the rest through open web standards such as glTF. This isn't even a stretch of the imagination as a recent partnership between OTOY and Samsung is working on the right standards. I was able to sync up with Jules at OC3 and I have to say, there is a pretty big future in this technology and I'm happy to add it to the list of simple things existing website developers can do without having to spend years learning and working in VR and 3D graphics. Stick a meta or link tag in your head, or push it down as an http header (this is why meta+http-equiv is probably the best approach here) and you'll get more mileage out of users with a VR headset.
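As a sketch only (these rel and http-equiv values are hypothetical, not a shipped standard), the hint could look something like this; a VR-ready browser would sniff it and load the glTF environment while every other browser ignores it:

    <!-- In the document head: -->
    <link rel="vr-environment" href="/assets/lobby.gltf" type="model/gltf+json">

    <!-- Or the http-equiv form, so the same hint can also be delivered as an
         HTTP header straight from the server: -->
    <meta http-equiv="VR-Environment" content="/assets/lobby.gltf">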

It doesn't stop here though. While this changes the environment your page runs in, it doesn't allow you to really customize the iconography of your site the way a simple, 3D model would be able to. Another glTF opportunity is in delivering favicon models that a browser can use to represent sites in collections like the tab list, most recently visited sites and favorites. A beautifully rendered and potentially animated 3D model could go a long way to putting your website on the mantle of everyone's future VR home space.

I think there is more value to be had in the Web Application Manifest specification too. For instance, why can't we specify all of our screenshots, videos and icons for an installable store page? A VR Browser would now be able to offer an install capability for your website that looks beautiful and rivals any existing app-store. Or if you like the app-store then you can specify the correct linkage and "prefer" your native experience. The browser in this case would redirect to the platform store, install your native application and off you go. I see this as exceptionally valuable for an existing VR centric developer who wants to augment their discovery through the web.
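A sketch of what such a manifest might look like; the app, icon and screenshot entries are made up, and the "oculus" platform value is purely hypothetical:

    {
      "name": "Space Explorer VR",
      "short_name": "SpaceVR",
      "start_url": "/index.html",
      "display": "fullscreen",
      "icons": [
        { "src": "/icons/app-192.png", "sizes": "192x192", "type": "image/png" }
      ],
      "screenshots": [
        { "src": "/store/shot-1.jpg", "sizes": "1280x720", "type": "image/jpeg" }
      ],
      "prefer_related_applications": true,
      "related_applications": [
        { "platform": "oculus", "id": "com.example.spaceexplorer" }
      ]
    }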

Immersive Hybrid Experiences

Our next goal is to start incrementally nudging our users into the VR space. While augmentations work for existing VR users and are ways to provide more VR like experiences on the existing web, we can do even better. Instead of a VR only experience we can build hybrid, WebGL+WebVR applications that are available to anyone with a modern browser.

How does this work? Well, we start with the commonality. Everyone has a browser today capable of some basic WebGL. This means any experiences you build can be presented to that user in 3D through the canvas tag in a kind of magic window. You can even go full screen and deliver something pretty immersive.

To improve this further, we can abstract input libraries that work across a large set of devices. Touchpad, touch, mouse, keyboard, device orientation, gamepad and the list continues to grow each day. By having a unified model for handling each type of input you can have a great experience that the user can navigate with mouse and keyboard or spin around in their chair and see through the window of their mobile phone. Many of these experiences start bordering on the realism and power of VR without yet delivering VR.
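A rough sketch of that unified-input idea, feeding mouse, device orientation and gamepad into a single look-direction state that the render loop consumes; the scaling constants are arbitrary:

    const look = { yaw: 0, pitch: 0 };

    // Mouse drag on desktop.
    let dragging = false;
    window.addEventListener('pointerdown', () => { dragging = true; });
    window.addEventListener('pointerup', () => { dragging = false; });
    window.addEventListener('pointermove', e => {
      if (!dragging) return;
      look.yaw   += e.movementX * 0.005;
      look.pitch += e.movementY * 0.005;
    });

    // Device orientation for the phone "magic window".
    window.addEventListener('deviceorientation', e => {
      if (e.alpha === null) return;                  // unsupported devices fire nulls
      look.yaw   = e.alpha * Math.PI / 180;
      look.pitch = (e.beta - 90) * Math.PI / 180;
    });

    // Gamepad thumbstick, polled once per rendered frame.
    function pollGamepad() {
      const pad = navigator.getGamepads()[0];
      if (!pad) return;
      look.yaw   += pad.axes[0] * 0.03;
      look.pitch += pad.axes[1] * 0.03;
    }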

The last thing we do to nail this hybrid experience is detect the presence of WebVR. We've landed a property in the spec on the navigator object, vrEnabled. This returns true if we think there is a chance the user could use the device in VR. While such a simple property has some usability issues that will probably result in browser UX for turning VR on and off, it is a great start.
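The detection step itself is tiny. This sketch uses the vrEnabled property as described above, though the final spec shape may differ, and both helpers are assumed:

    if (navigator.vrEnabled) {
      showEnterVrButton();      // assumed helper: offer the headset experience
    } else {
      startMagicWindowMode();   // assumed helper: plain WebGL canvas fallback
    }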

This is the next level of write once, run anywhere, but it's built on the concept of progressive enhancement. Don't limit your user reach; instead, scale your experience up as you detect the required capabilities. I've recently stated that this is a fundamental belief in our design of WebVR, and I truly believe in maintaining this approach as long as there are users who can benefit from these choices.

I wanted to give an example of one of these experiences and so here you can see a 360 tour we've linked from our Oculus WebVR Developer portal. There will be many more examples to come, but this will run in any browser and progressively enable features as it detects them. My favorite is simply going there in your phone and looking around using device motion.

I can't speak highly enough of the value in building experiences like this. For this reason we at Oculus and Facebook will be releasing numerous options for building these experiences in minutes to hours rather than days or more. Both content generation and viewing need to be made easier. We need great photosphere export from pretty much any application (any developer should be able to emit and share a photosphere from anywhere in their game/application/experience), and we need to optimize how we transmit and display photospheres, with potential improvements to streaming them in, all through simple-to-use libraries. It has to be just as easy to extend this to 360 video; even a simple looping 360 video could provide your user with AMAZING bits of animation and value. Finally, we need to extend this to make it interactive with text and buttons. You can go further if you have the skills to do so, but basic libraries such as React VR will allow anyone to create great experiences with the above characteristics.

Getting out of the Way

Once we've extended existing websites, gotten some simple libraries in place and people start to build VR content, the final step is to get the hell out of the way. This is easier said than done. There is a large contingent of web technology naysayers that have pointed out certain flaws in the web platform that make it non-ideal for VR. I'll give some examples so it is clear that I do listen and try to resolve those issues.

  1. JavaScript is too slow. To which I respond, maybe when it comes to new experiences. JavaScript is a tuned language. It is tuned by browser vendors based on the content out there on the web today. Tomorrow, they could tune it differently. There are so many knobs in the run-time. C++ is no different. There are knobs you can tune when you pipe your code into cl.exe and it WILL result in differences in the run-time behavior and speed of your application. The browser is running in a chosen configuration to maximize the potential for a set of sites determined by usage.
  2. GC sucks. To which I reply, yes, the general model of GC that is present in your average web runtime does indeed suck. However, this is again, only one such configuration of a GC. Most GC's are heuristic in nature and so we can tune those heuristics to work amazingly well for VR or hybrid scenarios. I won't go into details, but let's just say I have a list of work ready for when I hire my GC expert into the Carmel team ;-)
  3. Binding overhead is too high. To which I respond, "Why?" There is no requirement that there be any overhead. An enlightened JIT with appropriate binding meta-data can do quite a bit here. While there are some restrictions in WebGL binding that slow things down by-design I have high hopes that we can arrive at some solutions to fix even that.

That list is not exhaustive. When you have full control over the environment and aren't part of a multi-tenant, collaborative runtime like a web browser, then you can be insanely clever. But insanely clever is only useful for the top 1% of applications once we enable anyone with a browser to build for VR. We need to get out of the way and make the common cases for the remaining 99% of the content not suck. That’s our space, that's our opportunity, that’s our goal.

Beyond the standard arguments against VR and the web, I think there are more practical issues staring us in the face. The biggest one is the prevalence of full VR browsers. The first fully featured VR browser is going to be a deep integration of the web platform, OS, device and shell. There is simply too much going on today for a web application to be able to run with zero hiccups, and an overall lack of measurement and feedback between the various sub-systems. Tuning everything from the async event loop to thread priorities to which sub-systems are running when in and out of VR is a very important set of next steps. Just as important is combining these enhancements with UX and input models that give people enough trust in their browser that they could interact with a merchant in virtual space, whatever form that might take.

Right now we are experiencing the power of VR shells that plug into basic browser navigation models and do simple content analysis/extraction. This is an early days approach that only achieves a small fraction of the final vision.

For the graphics stack we need some basics like HTML-to-Texture support in order to bring the full power of the 2D web into our 3D environments. I've been referring to 2D UX as having a "density of information" that can't be easily achieved in 3D environments. We need this in order for VR experiences to enable deep productivity and learning. Think about how often, in our "real" 3D world, you break out your phone, laptop, book and other 2D-format materials to quickly digest information. I don't think VR is any different in this regard. John Carmack, during his keynote, also noted that there is huge value in 2D UX because we've had so many years of experience developing it. I believe this to be true and think that HTML-to-Texture will broaden the use cases for WebVR dramatically.

We also need to enable full access to the pipeline. More and more I'm hearing about the necessity of compute shaders and access to GL extensions like multi-view. Even native is insufficient for delivering super-high-quality VR, which is driving advances in hardware, OS and software. The web needs to provide access to these capabilities quickly and safely. This may mean restrictions on how we compose and host content, but we need to experiment and find those restrictions now rather than holding the web back as a core VR graphics platform.

To close out, note how this section details yet more progressive enhancement. Each of the new capabilities and improvements we make will still take time to filter out to the ecosystem. This is no different from the native world, where extensions like multi-view, defined for a couple of years now, are still not uniformly distributed. So producing experiences that reach the largest possible audience and gradually increase capability based on the device and browser features you detect is going to be key.

Over the next few months I'll be providing some tutorials talking about strategies and libraries you can use to enable VR progressive enhancement for your existing sites. You can also check out more of our demos and sign up for more information about our libraries like React VR at the Oculus Web VR Developer Portal.

Sunday, January 24, 2016

Browser VR Experiences – Let’s not VRML!

Virtual Reality is a hot topic right now, and so there is huge interest in APIs and standards related to it across a broad range of programming languages and platforms. One of the most interesting platforms right now, due to its pervasiveness, is the web platform, or Browser. Every major desktop and mobile platform now has access to a great Browser that is performant, highly interoperable, offers thousands of APIs and has general capabilities for composing the web platform into existing applications. The large set of developers who program for this platform are always itching for new and relevant technologies to differentiate themselves and create rich new experiences. This means the race is on: how long before the web platform has a set of standards and specifications to help guide the evolution of VR? Whatever is produced will be the software equivalent of the Oculus Rift and Gear VR. It'll put VR capability in the hands of all developers and consumers as soon as the devices hit the market, leading to pervasive VR.

But it’s not something you want to rush. Let’s not VRML! To be clear VRML (spec) was never really VR, but rather standard 3D projection onto a 2D plane we all experience today every time we play games. VRML, in other words, was way ahead of its time in more ways than one. As a technology it sort of replaced the existing HTML experience with a 3D experience through markup that more closely resembled SVG. It had concepts like link navigation that were necessary to allow switching views either within or between different scenes. It was like another top level document and so you could even switch back and forth between HTML and VRML quite easily. Since the experience wasn’t true VR, this wasn’t terrible. It wasn’t jarring for the user as they moved between 2D and 3D content.

Now let’s fast forward. The HTML world we live in has evolved immensely. By itself, HTML can render 3D content using WebGL. Since this maps to our current state of the art for game development it is a much better experience than say a limited markup language. It is more complex for sure, but various libraries intend to make it more approachable with Unity, three.js, Babylon.js and A-Frame being some really strong contenders. We also have true VR devices now and that means stereoscopic rendering and projection to the user. We’ve added rich media in the form of Audio and Video tags. The world is much more complicated and so any standard or specification has to now navigate existing Browser UI, user experience, the existence of new APIs and understand how developers want to code to the platform.

What I hope to present in the rest of the article is mostly the set of experiences I see existing. What are some of the challenges we face in user interface and API design to make those approachable? Where is the line between Browser and Web Page, and how do we bridge that gap in a way that enables developers AND users to have great experiences? Finally, how do we integrate the future 3D web without losing the power we have already established in the existing 2D web? These questions keep me up at night (literally, I was up last night until 3AM because I hadn't written this article yet ;-) and so I hope in the coming weeks and months to put as many of them to rest as possible.

Presentation Experiences for Browsers


Browsers currently have to deal with a number of different presentation formats. We then provide information about the current presentation (screen size, orientation, etc…) to the page using a number of technologies so that the content within can adapt. The overall experience is a combination of the Browser UX or Browser Chrome (not to be confused with Chrome the browser) and the content.

Screens and Phones


Recent trends have gone towards the browser being very light on Chrome, with thin edges, no status bars and minimal title areas. This is mainly because users want the majority of the browser's space dedicated to the content they are viewing. They still want critical UX elements like tabs, since that has become a de-facto standard for organizing your browsing experience. Finally, over time the average number of tabs a user is comfortable managing has increased. That gives you some basic design criteria for building a good Browser UX.

For the screens on which we present, the trends have also changed. The average visible page area when I started working on IE back in 2005 was 800x600. It grew to 1024x768. Right now I imagine it is more fragmented, and I haven't looked in a long time. During that golden age the device pixel to CSS pixel ratio was 1:1. This meant that every pixel you put on the screen was rendered as a physical pixel as well. With high-DPI screens this is no longer true. Phones now have 4k screens, but couldn't possibly render 4k worth of CSS pixels. Instead we simply use more than one screen pixel to represent the same CSS pixel, but the effective screen real-estate is pretty much the same (actually less, since the number of CSS pixels on a phone screen might only be 400x800).

For desktop, the average screen size is increasing, as is the number of effective CSS pixels you can take advantage of. Very recently desktops have been balancing increased screen size with high DPI; a 28" 4K display is an example. 96DPI 4K is available at around 42", which is starting to get too big for a usable desktop (neck strain, head tilt, etc…). For phones, the number of device pixels (DPI) is increasing while screen size and effective CSS pixels stay approximately the same. With new casting technologies your phone could also be your desktop, so you have to be prepared for that. I recommend reading the MSDN article on Windows 10 Continuum since it shows the API shape for how developers might deal with these types of devices.
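As a quick illustration of the CSS-pixel versus device-pixel split, this is how a canvas is typically sized so WebGL renders at physical resolution while layout stays in CSS pixels:

    const canvas = document.querySelector('canvas');
    const cssWidth = 400, cssHeight = 800;        // effective CSS pixels on a phone
    const ratio = window.devicePixelRatio || 1;   // 3-4 on a high-DPI phone screen

    canvas.style.width  = cssWidth + 'px';        // layout size in CSS pixels
    canvas.style.height = cssHeight + 'px';
    canvas.width  = cssWidth * ratio;             // backing store in device pixels
    canvas.height = cssHeight * ratio;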

Virtual Reality Presentations and Devices


I'll start with the existing types of VR devices that we have. There are two basic forms. The first form is the integrated screen and device, as demonstrated by Gear VR or Google Cardboard. The user experience of these devices is kind of odd because you have to navigate content designed both with and without VR in mind. The second form is the attached peripheral. This is the Oculus Rift and many other emerging contenders. Instead of using the same screen as the device displaying 2D content, they have a secondary head mounted display (HMD).

Integrated Devices – GearVR and Google Cardboard


Let’s think through a common workflow. You are in your mail app, not 3D, so you are currently not in stereoscopic mode. If you put the device into your VR headset at this time, your brain would be super confused. Each eye would see a different half of the screen and it couldn’t do anything to combine this information into a visual. Some people would get sick, others would feel uncomfortable and probably 1 in half a billion people would be able to independently read each screen (Gecko People!).

Instead you click on a link and go to your web browser. All of a sudden, your screen shows two deformed images side by side. They look similar, but slightly different. Again, your mind can’t really do much since the deformation is too extreme for the image to look much like the original. It’s like looking at the world through coke bottles. This workflow is proving to be very confusing and uncomfortable.

You do put the device into a headset, though, and you are immersed in brilliant 3D. You spend the average 30 seconds on the web page and then you navigate back to your mail app, and yet again you are confronted with sickness. You eventually realize the limitations of your device and manage to work around these transitions, but you are still surprised EVERY time some content throws you into VR mode, and you wish for a better experience.

Peripheral Devices – Oculus Rift, Morpheus, Vive


Again, we can think through a workflow here. You spend most of your day staring at your screen, navigating through your 2D applications, quite productive and happy. You launch a mail link and you are confronted with a blank page. It turns out this page has detected your Oculus and decided to use it. You probably notice your CPU fan noise just increased to the level of a Boeing 747 jet engine.

You quickly don your Rift and now the experience is obvious. You don’t quite like this application so you decide to exit. Again, you are now confronted with the Oculus shell and not any form of familiar Browser UX. You take off the Oculus and decide to get back to real work.

Alternately you could have clicked on some form of navigation primitive in the virtual space. Your desktop browser is going crazy navigating links, but you don’t see this occurring. Clearly the experience between the peripheral and the page wasn’t thought through very clearly. You wonder to yourself if this is the fault of the browser, the web page author or even the VR device itself.

Integrating Devices into the Web Experience


Adapting for Virtual Reality experiences requires that we understand the previously detailed experiences. That is what an experience might look like today if we implement the wrong set of APIs, or no APIs at all. We certainly need to take into account multiple types of devices and, most importantly, the different types of presentation they are capable of.

The browser, up to now, has only had to deal with rendering to a 2D surface, a single version of our image. No matter how complicated the end user’s setup was, this was abstracted from the Browser. They could be spanning us across multiple displays, or on a curved display or even changing the aspect ratio. To the Browser this doesn’t matter since it doesn’t change the geometry of our outputs.

But why does the geometry change? Why does a change in geometry matter? To answer these questions first look at your standard 2D browsing experience. In the 3D world in which you live, if replicated using existing 3D technologies, your computer monitor would be a quad. 2 triangles, oriented in space to form a square. Your camera would be each of your eyes and your eyes would be focused at a very specific location on the quad, the area of focus. While you are working the quad remains in a very narrow portion of your FoV. While you can see things to your left, your right, you can probably see your feet even, the monitor is not positioned in those locations.

Changing the position of the monitor, increasing its size, placing it nearer or further away all become uncomfortable operations to you, the user. Why? Well, the content is optimized for its default presentation. That of being visible in a square, placed in the center of our FoV. The geometry, in 3D space is that of a projected quad, at some depth, which makes the text and images the right size for us to comfortably read everything. I’m going to try and coin terms for each of these default experiences.

VR Experience #1 – Standard Web Browser, 2D Content Billboard Projection

Basically the same experience you get on your laptop should be the same experience you get on a VR device. This default experience will be fairly comfortable. First, you won’t be rotating your head up/down or left/right to try and navigate content. The same navigation primitives you use today will suffice. This will make the overall experience very accessible as well since it will limit the number of new things a user has to master before finding the space usable.

The Browser UX would probably take on the form of a phone browser. It would be lightweight and get out of your way, focusing mostly on the content. The content developers will be very happy. They spent millions on designing their experience and you’ve basically maintained that experience well.

Proposal #1 – The basic VR experience is NOT a standard and is not part of WebVR

This experience is how the browser allows the user to stay inside of the headset. It has nothing to do with APIs available to the page. It’s about the entire browser experience from soup to nuts and is therefore something to be tweaked by each vendor.

If you are not familiar with “Theater Mode”, then press F11 in your browser right now! Did you see how this provided a browser based full-screen experience? This is separate from the full-screen APIs. For the same reason that theater mode is a browser provided experience I believe that WebVR (the API set) should not intrude here. A browser, should be able to provide a VR experience for browsing that is independent of the content.

VR Experience #2 – Web Browser Features in 3D Space

My next experience is about the web browser having non-billboard 3D content. This is where the experience wants to utilize 3D space rather than simply maintain the status quo of a 2D screen in front of your eyes while the headset is on.

The goal here would be to achieve minority report like user interfaces for things like tabs, favorites, history and other content such as the recent news rivers present in Microsoft Edge and other browsers. This experience now only makes sense if the user is ready to be in a VR mode. They should show some intent for this experience because as we’ve seen before, they may need to prepare their device. They may need to either put on their peripheral HMD or they may need to drop their phone into their HMD harness. Either way, they are likely now deciding that they don’t want to continuously jump between a non-3D and a 3D experience. This is where our VR Experience #1 shines as well. When the browser does navigate to 2D content it has to do an amazing job of showing it in 3D space.

Proposal #2 – A Browser should have a VR Immersive Mode

Proposal #3 – A VR Immersive Mode is again the realm of the Browser not Standards APIs

With the browser in immersive mode, you can now display your tabs page as a 3D sphere of tiles and the user is likely to be happy. They can use head rotations to navigate around, as well as other inputs such as gamepad, mouse or keyboard (or even more obscure capabilities like hand gestures).

Still, this experience is completely outside of the API set necessary for a standards-based API. Intruding only where necessary is always one of my goals in standards. We find more and more reasons to intrude as time goes on, but that's fine.

Theoretical Work-flows for VR Experiences #1 and #2


We’ve just turned basic VR into a button. When a browser detects that it can provide a VR like experience it shows intent in the UI. Further, you can always go into a VR mode, since any display can output a stereoscopic rendering that can then be consumed by an increasingly varied number of approaches. You could simply render to a 24” monitor, stereoscopically, mount a largish moving box to it to blank out light, have a sufficiently long flume attached to the user head with some corrective lenses and that is just as much a VR experience as say Google Cardboard.

If your browser is set up to use a peripheral it moves your entire browsing experience there using our 2D billboard approach. When it finds opportunities to use the VR space for 3D content it asks the user for permissions to assault their senses and then goes into that mode. Perhaps for browser experiences it does this automatically, but for page experiences it does so with user consent.

If the browser is not set up for a peripheral but rather an integrated device then it begins rendering its entire visual set stereoscopically.

Link navigation works great. You go from Tabs (3D) to Facebook (2D quad properly positioned for optimal viewing) to a WebGL site with VR capabilities (2D initially with a pop out to 3D, then back to 2D as you navigate away) and then you ask to pop out of VR mode and away you go.

You can start to see inspirations for additional VR experiences. Let’s complete an examination of Browser built-in experiences.

Browser VR Media Experiences


There is a lot of contention about which VR experiences are going to be the most popular. Right now, media is king. It delivers the highest quality immersive 3D experiences because it relies on the realism provided by, well, real film. It breaks down only due to focal point issues, where the mind expects to refocus. This is overcome by amazing directing that forces your eye to follow the focal points intended by the director. But let's assume that 3D video and 3D images are going to be very important.

The problem here is that there are no standards for 3D video and images. Further, a great standard may not evolve for some time. Most 3D media also starts out as much less than 3D and is then composed into 3D. You can, for instance, take a pano and some pictures and stitch them into a great 3D media experience. However, there is no media format that does that for you which a browser can load. For that you are going to need actual WebVR. I'll define this experience now, but will talk about it later when we get more into the WebVR-applicable arenas. I will say that I think Carmack's concept of VR script, converted to be based on a subset of JavaScript with a minimal API, might be an even better solution than WebVR.

VR Experience #3 – Non-3D Media Composed into 3D Presentation

But this leads us to some other, more realistic experiences. Just as a video can have a full-screen button, if the browser can detect the presence of 3D video then we should also be able to go into VR mode on just that video. This integrates 2D to 3D very well. Some people have expressed further interest in having the 3D video break out and display in front of the page. This is much more challenging: 3D experiences are designed to be immersive, not to be an island of 3D within your focused FoV.

VR Experience #4 – 3D Videos get a VR Mode Button

We don’t have to stop at videos. If we can also get a basic 3D image format, we could also do 3D imaging. There will be more than one type of this sort of image though. First, you’ll have a basic stereoscopic quad that is presented more on a 2D billboard. Second you’ll have more immersive images taken with new cameras that capture fisheye views of the world.

I think that for the first experience we could get away with rendering a left/right version of the image in-line with the page. I haven’t done this and so I don’t know if the user experience is good, but I could imagine it not being terrible.

VR Experience #5 – Stereoscopic Inline Image Rendering

For the second experience we probably want to again take over the peripheral vision and break out of the billboard in front of the user’s face. The user can rotate and tilt their head and be fully immersed in the scene.

VR Experience #6 – Immersive Fisheye Image Rendering

Notice how so far none of the media experiences we've described, barring #3, require any APIs in the web page yet. They do potentially require some image formats, and maybe some new attributes, etc… If that is the case, then those APIs should be defined by some sort of standards specification, most likely a VR extension to HTML 5 (whereas I'd argue WebVR is more like WebGL, and is thus its own specification not directly tied to the HTML 5 spec).

All of the above experiences are either accomplished by the VR mode of the browser, inline or through a VR breakout button. Just like a very large set of full-screen experiences that you enjoy today are browser provided with no additional work having to be done by the page author, I argue that many VR experiences will fall into the same. That will make VR much more consistent and approachable for end users and developers alike. Because next we are about to talk about the hard experiences, and yes, VR is hard.

Author Provided VR Experiences


You have users running their browser in VR mode and now you want to provide some VR light-up experiences. As an author, the browser will be in one of many states, but the most likely state is this: "The user does NOT HAVE any form of VR." The next most likely is: "The user has a phone but cannot use, does not want, or does not have VR."

There are also some web truths. The number of WebGL experiences is very, very low on the web today. At least relative to the number of sessions, navigations, pages, etc… The number of WebGL experiences that would look good in VR is lower than that still. Finally, the added cost of providing WebVR like experiences will be a barrier of entry to many.

My goal is not to make VR a gimmick, but to make it an institution of the web itself. My goal is for WebVR to provide an experience that drives VR adoption rather than another reason for naysayers to claim that VR is not yet ready for the consumer.

For the author to be able to provide a WebVR experience, I think they need to be able to detect first a VR capable browser and then second, that the WebVR APIs needed to submit WebGL content to a device actually exist. Currently the WebVR specification focuses on methods for enumerating devices. So we can build in page experiences, but there isn’t a browser that has a VR mode that itself supports a VR experience. My prior experiences, especially VR Experience #1, are pre-requisites.

Proposal #4 – Browsers Commit to VR Experiences before VR APIs

To the extent that I can influence this, I find this to be one of the more important points of this article. With this in place an author will be able to detect that the browser has its own VR experience and whether or not it is enabled.

VR Experience #7 – Page Authors can Detect VR Capable Browsers

That doesn’t get us very far. Maybe the author would try to use our other built-in experiences, but they probably want more. This is where WebVR starts for me. If you haven’t read the WebVR spec you should. You can also read my previous article talking about its limitations in its current form (which is being actively worked on, so I suspect to have a much more positive article covering the specification in the future).

A TL;DR version of current capabilities:
  1. Enumerate and Retrieve VR Devices
  2. VR Devices are outputs (HMDs) or inputs (Position Sensors)
  3. Retrieve Stereoscopic Rendering Properties (FoV, eye positions, etc...)
  4. Commit Visuals to the Display
However, do we really need this information if we are in a Gear VR or Google Cardboard scenario? Mostly not. So we need some lighter weight APIs for the more common scenarios. Since Gear VR and Google Cardboard are the most prominent, it seems targeting them first would be a good idea.

The experience here is that they want to use the entire screen of the device. They want to full screen a Canvas element that is displaying some WebGL content which is being stereoscopically rendered. They can effectively do this today, but they do it at the expense of the user. Since the user doesn’t have a Browser controllable way of being in or out of VR mode, the author can simply display some stereoscopic, full-screen content and destroy the end user’s experience. Further, the browser has no proper way to overlay their own protected UX.

VR Experience #8 – Cooperative Full-Screen Stereoscopic Rendering for Integrated VR Devices

I think the solution here is really, really easy. So easy that we can overthink it. I’d prefer a solution for these devices that doesn’t require me to enumerate devices (unless I want the position sensor for Gear VR for instance) and doesn’t have me requesting unnecessary rendering properties (unless we want to provide some defaults supplied by the user).

Proposal #5 – Build Simple, Separate APIs for Integrated VR Device Scenarios

Once we get away from Gear VR and start getting into Oculus, my simple experiences break down and no longer work. You are probably wondering why an Oculus is so much different from, say, a Gear VR. They are both VR experiences, so why can't they use the same solutions? The answer is simple: the approaches to VR are very different, as are their target audiences and experiences.

Gear VR is great at 3D movies and simple 3D experiences, but since it uses your phone, and only the power of your phone, the complexity of the scenes can’t be at the same level as an Oculus. Refresh rates tend to suffer on phones (many phones ran at 50hz rather than 60hz just to save a bit more on battery) and the Oculus is going to ship with 90hz and hopefully get up to the 120hz range in the next year or two. You can start to see how it would be hard to keep up with the Oculus if you only had the power of a phone at your disposal.

The Oculus is a high end consumer device that is focused on achieving very high frame rates using powerful desktop class computing and GPU hardware. People were surprised at the amount of power it needs in fact and many, I’m sure, decided not to pre-order when they found their machines weren’t up to the challenge ;-) The Oculus is trying to show you the best of VR with 90hz+ high quality rendering, a ton of SDK features to help developers achieve this quality (time warping and view bypass) and it presents itself as a separate device so it doesn’t have the limitations of say your current display.

For this experience, we don’t know what the final API will look like. Oculus will continue getting better and changing their own APIs. These will in turn inform the APIs needed in WebVR to be able to control all of the things that can be configured in the Oculus. Since VR is so new, it is hard to find even a couple of things that are similar across all devices. Contrast this with WebGL, which is based on OpenGL and where the three primary GPU providers (Intel, nVidia and ATI) are all willing to build things that plug into the standard. We don’t often have many device-specific differences to contend with in WebGL to provide a unified experience. Matching an API to WebVR will not be easy.

We do have some basic ideas though. So I’ll build the experiences off of that.

VR Experience #9 – WebGL Presentation to a Dedicated VR Device at Native Refresh Rates

With integrated devices we output to the screen. With dedicated VR devices we have to ship some textures to the device after the fact. For this reason, we need an API which takes this into account. You could just ship over the texture from a Canvas, but that turns out to be non-ideal. You probably want to ship several textures. Also, you want to know that the device is dedicated and whether or not it supports other features such as time warping and deformation.

It may also be that the device needs separate textures for the HUD texture versus the 3D scene texture.

I believe that WebGL provides the primitives for this already, so the goal is to make sure that we can combine these primitives with those of the WebVR device in a way that makes sense. We also need to make sure that the dedicated device can have appropriate features turned on and off so that the author can do the bits they want to do. For instance, turning off time warp and deformation may allow the author to experiment with other deformation methods on their own texture prior to committing the scene to the device.

Since dedicated devices have their own refresh rate, it isn’t sufficient to use the 60Hz requestAnimationFrame cadence to render your scene. You’ll have to rely on the device to fill in at least 30 of your 90 frames every second using time warp or by simply repeating frames. At 60Hz people can get simulation sickness, so it isn’t the greatest target. We should make sure the API can support the native refresh rate of the device.
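In the WebVR 1.1 shape this roughly maps onto requestPresent plus the display’s own requestAnimationFrame, which ticks at the device’s native rate rather than the monitor’s. A sketch, where glCanvas and drawScene are assumed to be your own WebGL canvas and render function, and the feature toggles argued for above appear only as hypothetical comments:

    navigator.getVRDisplays().then(function (displays) {
      var display = displays[0];
      // Present a WebGL canvas as the source layer for the dedicated device.
      return display.requestPresent([{ source: glCanvas }]).then(function () {
        var frameData = new VRFrameData();

        function onVRFrame() {
          display.requestAnimationFrame(onVRFrame); // native cadence (e.g. 90Hz), not the monitor's 60Hz
          display.getFrameData(frameData);          // freshest pose for this frame
          drawScene(frameData);                     // app-defined rendering into glCanvas
          display.submitFrame();                    // hand the texture off to the compositor
        }
        display.requestAnimationFrame(onVRFrame);

        // Hypothetical: per-device feature toggles like the ones discussed above.
        // display.setFeature('timewarp', false);
        // display.setFeature('deformation', false);
      });
    });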

Proposal #6 – Build Device Oriented APIs for Dedicated Devices into WebVR

Proposals 5 and 6 are where I hope we spend most of our time when coming up with WebVR. We could spend our time figuring out many more complex scenarios, but the technology is just too young for that. We could also spend time trying to standardize the browser experiences, but that isn’t in our interest either. There is still a ton of work in the author-provided experiences, though, and that is where the most complicated APIs will exist.

Conclusions


It turns out there are a lot of VR experiences to focus on. One might think that this is a simple space with only a single solution, but the reality seems to be far more nuanced. While these are my opinions, I did think about them for quite some time (since I started really thinking about VR in 2014). I focused mostly on the user experiences and what my expectations are. As a browser vendor, I get an opportunity to maybe impact the space more than others. But I will also be one of the very early adopters, browsing the web with early builds and implementations of these features.

I think the balance between what we leave for the browser vendors to innovate on and what ends up in the API under the control of the author is an important one for us to get right. If we can deliver a strong default browsing experience for VR, then that will be compounded by good VR content created by authors. If VR instead turns out to be a dumping ground of bad experiences, with the browser left to lead the way, then we are likely to see slow uptake and high dissatisfaction from our users.

I’ve written this because it covers more than can fit in a conversation. I can maybe fit a couple of these experiences or concepts into a half-hour or hour-long meeting, but never all of them. This is my foundation. I hope that it can help you build your own foundational understanding of the subject matter as well. I hope that this starts some conversations, some arguments or maybe even a company or two. As always, you can catch me on Twitter @JustRogDigiTec or leave comments.

I’m going to treat this like an open article as well. If there are glaring mistakes or corrections I’ll probably supply edits over time. I’m sure I’m not covering all experiences and I’m sure 6 months from now there will be some other presentation/device that doesn’t quite fit into the two existing categories I’ve defined. I also intend to add some illustrations, but I’m terrible at them so they look great in my head and terrible on paper. Once I’ve corrected that I’ll upload them ;-)

Tuesday, December 22, 2015

A 2014 WebVR Challenge and Review

I almost can't believe it was nearly 2 years ago when I started to think about WebVR as a serious and quite new presentation platform for the web. The WebGL implementation in Internet Explorer 11 was state of the art and you could do some pretty amazing things with it. While there were still a few minor holes, customers were generally happy and the performance was great.

A good friend, Reza Nourai, had his DK1 at the time and was experimenting with a bunch of optimizations. He was working on the DX12 team, so you can imagine he knew his stuff and could find performance problems both in the way we built games/apps for VR and in the way the hardware serviced all of the requests. In fact, it wasn't long after our Hackathon that he got a job at Oculus and gained his own bit of power over the direction that VR was heading ;-) For the short time that I had access to the GPU mad scientist, we decided that if Microsoft was going to give us two days for the first-ever OneHack, then we'd play around with developing an implementation of this spec we kept hearing about, WebVR, and attach it to the DK1 we had access to.

This became our challenge. Over the course of 2 days, implement the critical parts of the WebVR specification (that was my job; I'm the OM expert for the IE team), get libVR running in our build environment and finally attach the business end of a WebGLRenderingContext to the device itself so we could hopefully spin some cubes around. The TL;DR is that we both succeeded and failed. We spent more time redesigning the API and crafting it into something useful than simply trying to put something on the screen. So in the end, we wound up with an implementation that blew up the debug runtime and rendered a blue texture. This eventually fixed itself with a new version of libVR, but that was many months later. We never did track down why we hit this snag, nor was it important. We had already done what we set out to do: integrate and build an API set so we had something to play around with, something to find all of the holes and issues and things we wanted to make better. It's from this end point that many ideas and understandings were had, and I hope to share these with you now.

Getting Devices

Finding and initializing devices, or hooking up a protected process to a device, is not an easy task. You might have to ask the user's permission (blocking) or any number of other things. At the time, the WebVR implementation did not have a concept for how long this would take, and it did not return a Promise or have any other asynchronous completion object (a callback, for instance) that would let you continue to run your application code and respond to user input, such as trying to navigate away. This is just a bad design. The browser needs APIs that get out of your way as quickly as possible when long-running tasks could be serviced on a background thread.

We redesigned this portion and passed in a callback. We resolved the callback in the requestAnimationFrame queue and gave you access at this point to a VRDevice. Obviously this wasn't the best approach, but our feedback, had we had the foresight at the time to approach the WebVR group, would have been, "Make this a Promise or Callback". At the time a Promise was not a browser intrinsic so we probably would have ended up using a callback, and then later, moving to a Promise instead. I'm very happy to find the current specification does make this a Promise.

This still comes with trade-offs. Promises are micro-tasks and serviced aggressively. Do you really want to service this request aggressively, or wait until idle time or some other point? You can always post your own idle task once you get the callback, to process later. The returned value is a sequence, so it is fixed and unchanging.
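For example, one way to keep the aggressive micro-task servicing out of your critical path with APIs that exist today is to let the promise resolve but defer the heavy per-device work to idle time (getVRDisplays is the current spec's name; inspectDisplay is an assumed app-defined function):

    navigator.getVRDisplays().then(function (displays) {
      // The promise resolves as a micro-task; don't do expensive setup here.
      requestIdleCallback(function () {
        displays.forEach(function (display) {
          // Pay the expensive inspection/initialization cost at idle time instead.
          inspectDisplay(display);
        });
      });
    });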

The next trade-off comes when the user has a lot of VR devices. Do you init them all? Our API would let you get back device descriptors, and then you could acquire them. This had two advantages. First, we could cache the previous state and return it more quickly without having to acquire the devices themselves. Second, you could get information about the devices without having to pay seconds of cost or immediately ask the user for permission. You might say, what is a lot? Well, imagine that I have 6 or 7 positional devices that I use along with 2 HMDs. And let's not forget positional audio, which is completely missing from current specifications.
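Our descriptor/acquire split looked roughly like the sketch below; these names are from our hack and are not part of any shipping spec:

    // Hypothetical shape from our hack: cheap, cached descriptors first...
    navigator.getVRDeviceDescriptors().then(function (descriptors) {
      var hmds = descriptors.filter(function (d) { return d.type === 'hmd'; });

      // ...then acquire only the device you actually intend to use. This is the
      // point where permission prompts and multi-second init costs are paid.
      return hmds[0].acquire();
    }).then(function (device) {
      // device is now a fully initialized VRDevice.
    });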

The APIs we build for this first step will likely be some of the most important we build for the entire experience. Right now the APIs cater to the interested developer who has everything connected, is actively trying to build something with the API and is willing to work around a poor user experience. Future APIs and experiences will have to be seamless and allow normal users easy access to our virtual worlds.

Using and Exposing Device Information

Having played with the concept of using the hardware device ID to tie multiple devices together, I find the arrangement very similar to how we got devices. While an enterprising developer can make sure that their environment is set up properly, we can't assert the same for the common user. For now, though, we should probably assume that the way to tie devices together is sufficient and that an average user would only have one set of hardware. But then, if that is the case, why would we separate the positional tracking from the HMD itself? We are, after all, mostly tracking the position of the HMD itself in 3D space. For this reason, we didn't implement a positional VR device at all. We simply provided the positional information directly from the HMD through a series of properties.

Let's think about how the physical devices then map to the existing web object model. For the HMD we definitely need some concept of WebVR and the ability to get a device which comprises a rendering target and some positional/tracking information. This is all a single device, so having a single device expose the information makes the API much simpler to understand from a developer perspective.

What about those wicked hand controllers? We didn't have any, but we did have some gamepads. The Gamepad API is much more natural for this purpose. Of course it needs a position/orientation on the pad so that you can determine where it is. This is an easy addition that we hadn't made. It will also need a reset so you can zero out the values and set a zero position. Current VR input hardware tends to need this constantly, if for no other reason than user psychology.
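The later Gamepad Extensions draft added roughly this shape, as I understand it: a pose hung off the gamepad itself. The reset call below is hypothetical:

    var pads = navigator.getGamepads();
    for (var i = 0; i < pads.length; i++) {
      var pad = pads[i];
      if (!pad || !pad.pose) continue;          // a pose only exists on tracked controllers
      var position = pad.pose.position;         // Float32Array [x, y, z], may be null
      var orientation = pad.pose.orientation;   // Float32Array quaternion [x, y, z, w]
      // Hypothetical: re-zero the controller, which real VR input needs surprisingly often.
      // pad.pose.reset();
    }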

Since we also didn't have WebAudio and positional audio in the device at the time, we couldn't really have come up with a good solution then. Exposing a set of endpoints from the audio output device is likely the best way to make this work. Assuming that you can play audio out of the PC speakers directly is likely to fail miserably: while you could achieve some 3D sound, you aren't really taking advantage of the speakers in the HMD itself. More than likely you'll want to send music and ambient audio to the PC speakers and send positional audio, like gunshots, etc... to the HMD speakers. WebAudio, fortunately, allows us to construct and bind our audio graphs however we want, making this quite achievable with existing specifications.
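With WebAudio plus setSinkId on a media element, that split is already expressible today. A sketch, where musicSource and gunshotSource are assumed audio nodes you created elsewhere and hmdAudioDeviceId is assumed to come from enumerateDevices():

    var ctx = new AudioContext();

    // Ambient music goes straight to the default output (the PC speakers).
    musicSource.connect(ctx.destination);

    // Positional effects go through a panner, then out to the HMD's own speakers.
    var panner = ctx.createPanner();
    gunshotSource.connect(panner);

    var hmdOut = ctx.createMediaStreamDestination();
    panner.connect(hmdOut);

    var el = new Audio();
    el.srcObject = hmdOut.stream;
    el.setSinkId(hmdAudioDeviceId).then(function () { el.play(); });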

The rest of WebVR was "okay" for our needs for accessing device state. We didn't see the purpose of defining new types for things like rects and points. For instance, DOMPoint is overkill (and I believe it was named something different before it took on the current name). There is nothing of value in defining getter/setter pairs for a type which could just be a dictionary (a generic JavaScript object). Further, it bakes in a concept like x, y, z, w natively that shouldn't be there at all and seems only to make adoption more difficult. To be fair to the linked specification, it seems to agree that other options, based solely on static methods and dictionary types, are possible.
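The difference in practice is small but telling; a plain dictionary carries the same data with no new interface to adopt:

    // A generic JavaScript object is all most of this data needs:
    var position = { x: 0, y: 1.6, z: 0 };

    // Versus baking x/y/z/w into a platform type with getter/setter pairs:
    var point = new DOMPoint(0, 1.6, 0, 1);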

Rendering Fast Enough

The VR sweet spot for fast enough is around 120fps. You can achieve results that don't induce simulation sickness (or virtual reality sickness) at lower FPS, but you really can't miss a single frame and you have to have a very fast and responsive IMU (this is the unit that tracks your head movement). What we found when using canvas and window.requestAnimationFrame is that we couldn't even get 60fps, let alone more. The reason is the browser tends to lock to the monitor refresh rate. At the time, we also had 1 frame to commit the scene and one more frame to compose the final desktop output. That added 2 more frames of latency, and that latency will dramatically impact the simulation quality.

But we could do better. First, we could disable browser frame commits and take over the device entirely. By limiting the browser frames we could get faster intermediate outputs from the canvas. We also had to either layer or flip the canvas itself (technical details) and so we chose to layer it since flipping an overly large full-screen window was a waste of pixels. We didn't need that many, so we could get away with a much smaller canvas entirely.

We found we didn't want a browser requestAnimationFrame at all. That entailed giving up time to the browser's layout engine, it entailed sharing the composed surface, and in the end it meant sharing the refresh rate. We knew that devices were going to get faster. We knew 75fps was on the way and that 90 or 105 or 120 was just a year or so away. Obviously browsers were going to crush FPS to the lowest number possible for achieving performance while balancing the time to do layout and rendering. 60fps is almost "too fast" for the web; most pages only run at 1-3 "changed" frames per second while browsers do tricks behind the scenes to make other things, like user interactivity and scrolling, appear to run at a much faster rate.

We decided to add requestAnimationFrame to the VRDevice instead. Now we gained a bunch of capabilities, but they aren't all obvious so I'll point them out. First, we now hit the native speeds of the devices, we sync to the device v-sync and we don't wait for the layout and rendering sub-systems of the browser to complete. We give you the full v-sync if we can. This is huge. Second, we are unbound from the monitor refresh rate and backing browser refresh rate, so if we want to run at 120fps while the browser does 60fps, we can. An unexpected win is that we could move the VRDevice off-thread entirely and into a web worker. So long as the web worker had all of the functionality we needed, or it existed on the device or other WebVR interfaces, we could do it off-thread now! You might point out that WebGL wasn't generally available off-thread in browsers, but to be honest, that wasn't a blocker for us. We could get a minimal amount of rendering available in the web worker. We began experimenting here, but never completed the transition. It would have been a couple of weeks, rather than days, of work to make the entire rendering stack work in the web worker thread at the time.

So we found that, as long as you treat VR like another presentation mechanism with unique capabilities, you can achieve fast and fluid frame commits. You have to really break the browser's normal model to get there, though. For this reason I felt that WebVR in general should focus more on how to drive content to this alternate device using a device-specific API rather than on piggybacking existing browser features such as canvas and full-screen APIs and treating the HMD like just another monitor.

Improving Rendering Quality

When we were focusing on our hack, WebVR had some really poor rendering. I'm not even sure we had proper time warp available to us in the SDK we chose, and certainly the DK1 we had was rendering at a low frame rate, low resolution, etc... But even with this low resolution you still really had to think about the rendering quality. You still wanted to render a larger-than-necessary texture to take advantage of deformation. You still wanted a high-quality deformation mesh that matched the user's optics profile. And you still wanted to hit the 60fps framerate. Initial WebGL implementations with naive programming practices did not make this easy. Fortunately, we weren't naive and we owned the entire stack, so when we hit a stall we knew exactly where and how to work around it. That makes this next section very interesting, because we were able to achieve a lot without the restrictions of building against a black-box system.

The default WebVR concept at the time was to take a canvas element, size it to the full size of the screen and then full-screen it onto the HMD. In this mode the HMD is visualizing the browser window. With the original Oculus SDK you even had to move the window into an area of the virtual desktop that showed up on the HMD. This was definitely an easy way to get something working. You simply needed to render into a window, move it onto the HMD's desktop and disable all of the basic features like deformation, etc... (doing them yourself) to get things going. But this wasn't the state of the art, even at that time. So we went a step further.

We started by hooking our graphics stack directly into the Oculus SDK's initialization. This allowed us to set up all of the appropriate swap chains and render targets, while also giving us the ability to turn Oculus-specific features on and off. We chose to use the Oculus deformation meshes, for instance, rather than our own, since it offloaded one more phase of our graphics pipeline that could be done on another thread in the background without us having to pay the cost.

That got us pretty far, but we still had a concept of using a canvas in order to get the WebGLRenderingContext back. We then told the device about this canvas and it effectively swapped over to "VR" mode. Again, this was way different from the existing frameworks that relied on using the final textures from the canvas to present to the HMD. This extra step seemed unnecessary, so we got rid of it and had the device give us back the WebGLRenderingContext instead. This made a LOT of sense. This also allowed the later movement off to the web worker thread ;-) So we killed two birds with one stone. We admitted that the HMD itself was a device with all of the associated graphics context, we gave it its own rendering context, textures and a bunch of other state, and we simply decoupled that from the browser itself. At this point you could almost render headless (no monitor) directly to the HMD. This is not easy to debug on a screen, though; fortunately Oculus had a readback texture that would give you back the final image presented to the HMD, so we could use that texture and make it available, on demand, off of the device, paying the perf cost only if you requested it.
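In code, the difference we landed on is roughly the sketch below; the device-side getContext call is our invention from the hack, not a shipping API:

    // Canvas-centric (what the early spec implied):
    var gl = someCanvas.getContext('webgl');
    // ...render into the canvas, then the browser snapshots it for the HMD.

    // Device-centric (our hack): the HMD owns the context, swap chains and render targets.
    var gl2 = hmdDevice.getContext('webgl');   // hypothetical
    // ...render with gl2 and commit directly to the device, no browser window in the loop,
    // which is also what later let the whole render loop move to a web worker.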

At the time, this was the best we could do. We were basically using WebGL to render, but we were using it in a way that made us look a lot more like an Oculus demo, leaning heavily on the SDK. The rendering quality was as good as we could get at the time, without us going into software level tweaks. I'll talk about some of those ideas (in later posts), which have now been implemented I believe by Oculus demos and some industry teams, so they won't be anything new, but can give you a better idea of why the WebVR API has to allow for innovation and can't simply be a minimal extension of existing DOM APIs and concepts if it wants to be successful.

Improvements Since the Hackathon

We were doing this for a Microsoft Hackathon and our target was the Internet Explorer 11 browser. You might notice that IE, and later Microsoft Edge, doesn't have any support for WebVR. This is due both to the infancy of the technology and to there not being a truly compelling API. Providing easy access to VR for the masses sounds great, but VR requires very high-fidelity rendering and great performance if we want users to adopt it. I've seen many times where users will try VR for the first time, break out the sick bag, and not go back. Even if the devices are good enough, if the APIs are not good then they will hold back adoption rates for our mainstream users. While great for developers, WebVR simply doesn't, IMO, set the web up for great VR experiences. This is a place where we just have to do better, a lot better, and fortunately we can.

The concept of the HMD as its own rendering device seems pretty critical to me. Giving it its own event loop and making it available on a web worker thread also go a long way to helping the overall experience and achieving 120fps rendering sometime in the next two years. But we can go even further. We want, for instance, to be able to render both 3D content and 2D content in the same frame. A HUD is a perfect example. We want the devices to compose, where possible, these things together. We want to use time warp when we can't hit the 120fps boundaries so that there is a frame the user can see that has been moved and shifted. Let's examine how a pre-deformed, pre-composed HUD system would look using our existing WebVR interfaces today if we turned on time warp.

We can use something like Babylon.js or Three.js for this and turn on their default WebVR presentation modes. By doing so, we get a canonical deformation applied for us when we render the scene. We overlay the HUD using HTML5, probably by layering it on top of the canvas. The browser then snapshots this and presents it to the HMD. The HUD itself is now "stuck" and occludes critical pixels from the 3D scene that would be nice to have. If you turned on time warp you'd smear the pixels in weird ways, and it just wouldn't look as good as if you had submitted the two textures separately.

Let's redo this simulation using the WebGLRenderingContext on the device itself and giving it a little more knowledge about the various textures involved. We can instead render the 3D scene in full fidelity and commit that directly to the device. It now has all of the pixels. Further, it is NOT deformed, so the device is going to do that for us. Maybe the user has a custom deformation mesh that helps correct an optical abnormality for them; we'll use that instead of a stock choice. Next we tell the device its base element for the HUD. The browser properly layers this HUD and commits it as a separate texture to the device. The underlying SDK is now capable of time warping this content for multiple frames until we are ready to commit another update, and it can respond to the user as they move their head in real time with the highest fidelity.
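As a purely hypothetical sketch of that flow (none of these commit calls exist; they just name the steps described above):

    // Hypothetical layered commit, letting the device deform and time warp each layer.
    hmdDevice.commitSceneTexture(sceneTexture, { deform: true, timewarp: true });
    hmdDevice.commitHUDTexture(hudTexture, { billboard: true, opacity: 0.85 });
    // The SDK can now re-warp the last committed scene for several v-syncs,
    // sampling the IMU at compose time, while the HUD stays legible and unsmeared.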

You might say: but if you can render at 120fps then you are good, right? No, not really. That still means up to 8ms of latency between the IMU reading and your rendering. The device can compensate for this by time warping with a much smaller latency, sampling the IMU when it goes to compose the final scene in the hardware. Also, since we decomposed the HTML overlay into its own texture, we can billboard that into the final scene, partially transparent, or however we might want to show it. The 3D scene can shine through, or we can even see new pixels from "around" the HUD since we didn't bake them away.

Conclusion

Since our hack, the devices have changed dramatically. They are offering services, either in software or in hardware, that couldn't have been predicted. Treating WebVR like a do-it-all shop that then splashes onto a flat screen seems like it won't be able to take advantage of the features in the hardware itself. An API that instead gets more information from the device and allows the device to advertise features that can be turned on and off might end up being a better approach. We move from an API that is device agnostic to one that embraces the devices themselves. No matter what we build, compatibility with older devices and having something "just work" on a bunch of VR devices is likely not going to happen. There is simply too much innovation happening, and the API really has to allow for this innovation without getting in the way.

Our current WebVR specifications are pretty much the same now as they were when we did our hack in 2014. It's been almost 2 years, and the biggest improvement I've seen is the use of a Promise. I don't know what advances a specification such as WebVR, but I'm betting the commercial kit from Oculus coming out in 2016 will do the trick. With a real device gaining broad adoption, there will likely be a bigger push to get something into all of the major browsers.