Sunday, January 24, 2016

Browser VR Experiences – Let’s not VRML!

Virtual Reality is a hot topic right now, and so there is huge interest in APIs and standards related to it across a broad range of programming languages and platforms. One of the most interesting platforms right now, due to its pervasiveness, is the web platform, or Browser. Every major desktop and mobile platform now has access to a great Browser that is performant and highly interoperable, with thousands of APIs and some general capabilities for composing the web platform into existing applications. The large set of developers who program for this platform are always itching for new and relevant technologies to differentiate themselves and create rich new experiences. This means the race is on. How long before the web platform has a set of standards and specifications to help guide the evolution of VR? Whatever is produced will be the software equivalent of the Oculus Rift and Gear VR. It will put VR capability in the hands of all developers and consumers as soon as the devices hit the market, leading to pervasive VR.

But it’s not something you want to rush. Let’s not VRML! To be clear, VRML (spec) was never really VR, but rather the standard 3D projection onto a 2D plane that we all experience today every time we play games. VRML, in other words, was way ahead of its time in more ways than one. As a technology it sort of replaced the existing HTML experience with a 3D experience, through markup that more closely resembled SVG. It had concepts like link navigation that were necessary to allow switching views either within or between different scenes. It was like another top-level document, so you could even switch back and forth between HTML and VRML quite easily. Since the experience wasn’t true VR, this wasn’t terrible. It wasn’t jarring for the user to move between 2D and 3D content.

Now let’s fast forward. The HTML world we live in has evolved immensely. By itself, HTML can render 3D content using WebGL. Since this maps to the current state of the art in game development, it is a much better experience than, say, a limited markup language. It is more complex for sure, but various libraries and tools aim to make it more approachable, with Unity, three.js, Babylon.js and A-Frame being some really strong contenders. We also have true VR devices now, and that means stereoscopic rendering and projection to the user. We’ve added rich media in the form of audio and video tags. The world is much more complicated, and so any standard or specification now has to navigate existing Browser UI and user experience, account for new APIs, and understand how developers want to code to the platform.

What I hope to present in the rest of the article is mostly a set of experiences that I see today. What are some of the challenges we face in user experience, interface and API design to make those approachable? Where is the line between Browser and web page, and how do we bridge that gap in a way that enables developers AND users to have great experiences? Finally, how do we integrate the future 3D web without losing the power we have already established in the existing 2D web? These questions keep me up at night (literally, I was up last night until 3AM because I hadn’t written this article yet ;-) and so I hope in the coming weeks and months to put as many of them to rest as possible.

Presentation Experiences for Browsers


Browsers currently have to deal with a number of different presentation formats. We then provide information about the current presentation (screen size, orientation, etc…) to the page using a number of technologies so that the content within can adapt. The overall experience is a combination of the Browser UX or Browser Chrome (not to be confused with Chrome the browser) and the content.

Screens and Phones


Recently, trends have moved toward browsers being very light on the Chrome, with thin edges, no status bars and minimal title areas. This is mainly because users want the majority of the browser’s space dedicated to the content they are viewing. They still want critical UX elements like tabs, since tabs have become the de facto standard for organizing your browsing experience. Finally, over time the average number of tabs a user is comfortable managing has increased. That gives you some basic design criteria for building a good Browser UX.

The screens on which we present have also changed. The average visible page area when I started working on IE back in 2005 was 800x600. It grew to 1024x768. Right now I imagine it is much more fragmented, though I haven’t looked in a long time. During that golden age the ratio of device pixels to CSS pixels was 1:1. This meant that every CSS pixel you put on the screen was rendered as exactly one physical pixel. With high-DPI screens this is no longer true. Some phones now have 4K screens, but they couldn’t possibly render 4K worth of CSS pixels. Instead we simply use more than one screen pixel to represent each CSS pixel, so the effective screen real estate is pretty much the same (actually less, since the number of CSS pixels on a phone screen might only be 400x800).
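To make the CSS-pixel arithmetic concrete, here is a small TypeScript sketch. The 1440x2560 panel and the ratio of 4 are illustrative numbers rather than measurements of a specific phone; in a real page the ratio comes from `window.devicePixelRatio`.

```typescript
// Convert a physical (device) pixel resolution into the CSS pixel viewport a
// page sees, given the device-pixel-to-CSS-pixel ratio.
interface Size {
  width: number;
  height: number;
}

function cssViewport(devicePixels: Size, devicePixelRatio: number): Size {
  if (devicePixelRatio <= 0) {
    throw new Error("devicePixelRatio must be positive");
  }
  return {
    width: Math.round(devicePixels.width / devicePixelRatio),
    height: Math.round(devicePixels.height / devicePixelRatio),
  };
}

// 2005-era desktop: one device pixel per CSS pixel.
const desktop2005 = cssViewport({ width: 1024, height: 768 }, 1);
// Illustrative high-DPI phone: many more physical pixels, fewer CSS pixels.
const phone = cssViewport({ width: 1440, height: 2560 }, 4);

console.log(desktop2005); // { width: 1024, height: 768 }
console.log(phone); // { width: 360, height: 640 }
```

The same division is why a phone with far more physical pixels than a 2005 desktop can still expose fewer CSS pixels to the page.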

On the desktop, the average screen size is increasing, as is the number of effective CSS pixels you can take advantage of. Very recently, desktops have been balancing increased screen size with high DPI; a 28” 4K display is an example. 4K at roughly 96 DPI only arrives at around 42” and up, which is starting to get too big for a usable desktop (neck strain, head tilt, etc…). On phones, pixel density (DPI) is increasing, while screen size and effective CSS pixels stay approximately the same. With new casting technologies your phone could also be your desktop, so you have to be prepared for that. I recommend reading the MSDN article on Windows 10 Continuum, since it shows the API shape for how developers might deal with these types of devices.
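The density numbers follow from dividing the panel's diagonal in pixels by its diagonal in inches. This sketch just does that division for a few example panels. By this arithmetic, hitting exactly 96 PPI at 4K takes a panel of about 46”, so treat the 42” figure above as a ballpark.

```typescript
// Pixel density = diagonal length in pixels / diagonal length in inches.
function diagonalPixels(widthPx: number, heightPx: number): number {
  return Math.sqrt(widthPx * widthPx + heightPx * heightPx);
}

function pixelsPerInch(widthPx: number, heightPx: number, diagonalInches: number): number {
  return diagonalPixels(widthPx, heightPx) / diagonalInches;
}

// Inverse: how large a panel must be to land on a target density.
function diagonalForDensity(widthPx: number, heightPx: number, targetPpi: number): number {
  return diagonalPixels(widthPx, heightPx) / targetPpi;
}

const uhd28 = pixelsPerInch(3840, 2160, 28);
const uhd42 = pixelsPerInch(3840, 2160, 42);
const uhdAt96 = diagonalForDensity(3840, 2160, 96);

console.log(uhd28.toFixed(1)); // 157.4 PPI on a 28" 4K panel
console.log(uhd42.toFixed(1)); // 104.9 PPI on a 42" 4K panel
console.log(uhdAt96.toFixed(1)); // 45.9 inches for 4K at 96 PPI
```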

Virtual Reality Presentations and Devices


I’ll start with the existing types of VR devices that we have. There are two basic forms. The first form is the integrated screen and device. This is demonstrated by a Gear VR or Google Cardboard. The user experience of these devices is kind of odd because you have to navigate content designed both with and without VR in mind. The second form is the attached peripheral. This is the Oculus Rift and many other emerging contenders. Instead of sharing the device’s screen with 2D content, they use a secondary head-mounted display (HMD).

Integrated Devices – GearVR and Google Cardboard


Let’s think through a common workflow. You are in your mail app, which is not 3D, so you are currently not in stereoscopic mode. If you put the device into your VR headset now, your brain would be super confused. Each eye would see a different half of the screen and it couldn’t do anything to combine this information into a visual. Some people would get sick, others would feel uncomfortable, and probably 1 in half a billion people would be able to independently read each screen (Gecko People!).

Instead, you click on a link and go to your web browser. All of a sudden, your screen shows two deformed images side by side. They look similar, but slightly different. Again, your mind can’t really do much, since the deformation is too extreme for the image to look much like the original. It’s like looking at the world through coke bottles. This workflow is proving to be very confusing and uncomfortable.

You do put the device into a headset, though, and you are immersed in brilliant 3D. You spend the average 30 seconds on the web page, then navigate back to your mail app, and yet again you are confronted with sickness. You eventually realize the limitations of your device and manage to work around these transitions, but you are still surprised EVERY time some content throws you into VR mode, and you wish for a better experience.

Peripheral Devices – Oculus Rift, Morpheus, Vive


Again, we can think through a workflow here. You spend most of your day staring at your screen, navigating through your 2D applications, quite productive and happy. You open a link from your mail and are confronted with a blank page. It turns out this page has detected your Oculus and decided to use it. You probably also notice that your CPU fan noise just increased to the level of a Boeing 747 jet engine.

You quickly don your Rift and now the experience is obvious. You don’t quite like this application so you decide to exit. Again, you are now confronted with the Oculus shell and not any form of familiar Browser UX. You take off the Oculus and decide to get back to real work.

Alternatively, you could have clicked on some form of navigation primitive in the virtual space. Your desktop browser is going crazy navigating links, but you don’t see this occurring. Clearly the interaction between the peripheral and the page wasn’t thought through. You wonder to yourself whether this is the fault of the browser, the web page author or even the VR device itself.

Integrating Devices into the Web Experience


Adapting for Virtual Reality experiences requires that we understand the previously detailed experiences. They show what an experience might look like today if we implement the wrong set of APIs, or no APIs at all. We certainly need to take into account multiple types of devices and, most importantly, the different types of presentation they are capable of.

The browser, up to now, has only had to deal with rendering to a 2D surface, a single version of our image. No matter how complicated the end user’s setup was, this was abstracted from the Browser. They could be spanning us across multiple displays, or on a curved display or even changing the aspect ratio. To the Browser this doesn’t matter since it doesn’t change the geometry of our outputs.

But why does the geometry change? And why does a change in geometry matter? To answer these questions, first look at your standard 2D browsing experience. If the 3D world in which you live were replicated using existing 3D technologies, your computer monitor would be a quad: two triangles, oriented in space to form a rectangle. Your camera would be each of your eyes, and your eyes would be focused at a very specific location on the quad, the area of focus. While you are working, the quad remains in a very narrow portion of your field of view (FoV). You can see things to your left and your right, and you can probably even see your feet, but the monitor is not positioned in those locations.
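For readers who haven't built one, this is roughly what that monitor-as-a-quad looks like in code. It is a sketch, assuming a camera at the origin looking down -Z (the usual WebGL convention) and an illustrative 1 m viewing distance and 60° horizontal span; the width follows from width = 2 · depth · tan(angle / 2).

```typescript
// A monitor as a "billboard": two triangles forming a rectangle centered in
// front of the viewer, sized so its width spans a chosen horizontal angle.
type Vec3 = [number, number, number];

interface Quad {
  vertices: Vec3[]; // four corners
  indices: number[]; // two triangles, three vertex indices each
}

function billboardQuad(depth: number, horizontalFovDeg: number, aspect: number): Quad {
  const halfAngle = (horizontalFovDeg * Math.PI) / 180 / 2;
  const halfWidth = depth * Math.tan(halfAngle); // width = 2 * depth * tan(angle / 2)
  const halfHeight = halfWidth / aspect; // aspect = width / height
  const z = -depth; // camera at the origin looking down -Z
  return {
    vertices: [
      [-halfWidth, -halfHeight, z], // 0: bottom-left
      [halfWidth, -halfHeight, z], // 1: bottom-right
      [halfWidth, halfHeight, z], // 2: top-right
      [-halfWidth, halfHeight, z], // 3: top-left
    ],
    indices: [0, 1, 2, 0, 2, 3], // counter-clockwise, so the face points at the viewer
  };
}

// A 16:9 "monitor" 1 m away spanning 60 degrees horizontally.
const monitor = billboardQuad(1.0, 60, 16 / 9);
console.log((monitor.vertices[1][0] * 2).toFixed(3)); // 1.155 (meters wide)
```

Moving the quad closer, farther, or off-center changes how large the text lands on your retina, which is exactly the discomfort described next.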

Changing the position of the monitor, increasing its size, or placing it nearer or further away all become uncomfortable operations for you, the user. Why? Well, the content is optimized for its default presentation: being visible in a rectangle placed in the center of your FoV. The geometry in 3D space is that of a projected quad, at some depth, which makes the text and images the right size for us to comfortably read everything. I’m going to try to coin terms for each of these default experiences.

VR Experience #1 – Standard Web Browser, 2D Content Billboard Projection

Basically, the experience you get on your laptop should be the same experience you get on a VR device. This default experience will be fairly comfortable. First, you won’t be rotating your head up/down or left/right to try and navigate content. The same navigation primitives you use today will suffice. This will also make the overall experience very accessible, since it limits the number of new things a user has to master before finding the space usable.

The Browser UX would probably take on the form of a phone browser. It would be lightweight and get out of your way, focusing mostly on the content. The content developers will be very happy. They spent millions on designing their experience and you’ve basically maintained that experience well.

Proposal #1 – The basic VR experience is NOT a standard and is not part of WebVR

This experience is how the browser allows the user to stay inside of the headset. It has nothing to do with APIs available to the page. It’s about the entire browser experience from soup to nuts and is therefore something to be tweaked by each vendor.

If you are not familiar with “Theater Mode”, then press F11 in your browser right now! Did you see how this provided a browser-based full-screen experience? This is separate from the full-screen APIs. For the same reason that theater mode is a browser-provided experience, I believe that WebVR (the API set) should not intrude here. A browser should be able to provide a VR experience for browsing that is independent of the content.

VR Experience #2 – Web Browser Features in 3D Space

My next experience is about the web browser having non-billboard 3D content. This is where the experience wants to utilize 3D space rather than simply maintain the status quo of a 2D screen in front of your eyes while the headset is on.

The goal here would be to achieve Minority Report-like user interfaces for things like tabs, favorites, history and other content, such as the recent news rivers present in Microsoft Edge and other browsers. This experience only makes sense if the user is ready to be in VR mode. They should show some intent for this experience because, as we’ve seen before, they may need to prepare their device: they may need to put on their peripheral HMD or drop their phone into their HMD harness. Either way, they are now likely deciding that they don’t want to continuously jump between a non-3D and a 3D experience. This is where VR Experience #1 shines as well. When the browser does navigate to 2D content, it has to do an amazing job of showing it in 3D space.

Proposal #2 – A Browser should have a VR Immersive Mode

Proposal #3 – A VR Immersive Mode is again the realm of the Browser not Standards APIs

With the browser in immersive mode, you can now display your tabs page as a 3D sphere of tiles, and the user is likely to be happy. They can use head rotations to navigate around, as well as other inputs such as Gamepad, Mouse or Keyboard (or even more obscure capabilities like hand gestures).

Still, this experience is completely outside the scope of a standards-based API set. One of my standing goals in standards is to intrude only where necessary. We find more and more reasons to intrude as time goes on, but that’s fine.

Theoretical Work-flows for VR Experiences #1 and #2


We’ve just turned basic VR into a button. When a browser detects that it can provide a VR-like experience, it advertises that in the UI. Further, you can always go into a VR mode, since any display can output a stereoscopic rendering that can then be consumed by an increasingly varied number of approaches. You could simply render stereoscopically to a 24” monitor, mount a largish moving box to it to blank out light, attach a sufficiently long tube to the user’s head with some corrective lenses, and that is just as much a VR experience as, say, Google Cardboard.

If your browser is set up to use a peripheral, it moves your entire browsing experience there using our 2D billboard approach. When it finds opportunities to use the VR space for 3D content, it asks the user for permission to assault their senses and then goes into that mode. Perhaps for browser experiences it does this automatically, but for page experiences it does so with user consent.

If the browser is not set up for a peripheral but rather for an integrated device, then it begins rendering its entire visual set stereoscopically.
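Rendering "stereoscopically" boils down to drawing the scene twice, from two camera positions separated by the interpupillary distance (IPD). A minimal sketch, assuming a head position and a unit "right" vector are already known; the 0.064 m IPD is a commonly quoted adult average, used here only as an example value.

```typescript
// Compute the two eye (camera) positions for stereoscopic rendering.
type Point3 = [number, number, number];

interface EyeCameras {
  left: Point3;
  right: Point3;
}

// head: midpoint between the eyes; right: unit vector pointing to the viewer's right.
function eyePositions(head: Point3, right: Point3, ipdMeters: number): EyeCameras {
  const half = ipdMeters / 2; // each eye sits half the IPD from the center
  return {
    left: [head[0] - right[0] * half, head[1] - right[1] * half, head[2] - right[2] * half],
    right: [head[0] + right[0] * half, head[1] + right[1] * half, head[2] + right[2] * half],
  };
}

// Standing head at 1.6 m, facing -Z, so "right" is +X.
const eyes = eyePositions([0, 1.6, 0], [1, 0, 0], 0.064);
console.log(eyes.left); // left eye 3.2 cm to the left of center
console.log(eyes.right); // right eye 3.2 cm to the right of center
```

Each eye's view is then rendered from its own position, which is all a 24” monitor and a moving box would need to become a crude VR display.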

Link navigation works great. You go from Tabs (3D) to Facebook (2D quad properly positioned for optimal viewing) to a WebGL site with VR capabilities (2D initially with a pop out to 3D, then back to 2D as you navigate away) and then you ask to pop out of VR mode and away you go.
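The workflow above can be summarized as a small decision function. Everything in it is hypothetical (the mode names, the consent callback); it only encodes the rules proposed here: browser UI may go 3D on its own, page content needs user consent, and 2D pages fall back to the billboard.

```typescript
// Hypothetical model of the VR browsing flow; none of these names are real APIs.
type Presentation = "2d-desktop" | "vr-billboard" | "vr-browser-3d" | "vr-page-3d";

interface NavigationTarget {
  kind: "browser-ui" | "page"; // e.g. the tabs page vs. a web site
  wants3D: boolean; // e.g. a 3D tabs sphere, or a WebGL page offering VR
}

function nextPresentation(
  vrModeOn: boolean,
  target: NavigationTarget,
  userConsents: () => boolean,
): Presentation {
  if (!vrModeOn) return "2d-desktop";
  if (!target.wants3D) return "vr-billboard"; // 2D content on a well-placed quad
  if (target.kind === "browser-ui") return "vr-browser-3d"; // browser UI goes 3D on its own
  return userConsents() ? "vr-page-3d" : "vr-billboard"; // pages need consent
}

// Tabs (3D) -> Facebook (2D) -> WebGL page with VR -> leave VR mode.
const steps: Array<[boolean, NavigationTarget]> = [
  [true, { kind: "browser-ui", wants3D: true }],
  [true, { kind: "page", wants3D: false }],
  [true, { kind: "page", wants3D: true }],
  [false, { kind: "page", wants3D: false }],
];
const modes = steps.map(([vrOn, target]) => nextPresentation(vrOn, target, () => true));
console.log(modes.join(" -> "));
// vr-browser-3d -> vr-billboard -> vr-page-3d -> 2d-desktop
```

If the user declines, the WebGL page simply stays on its 2D billboard, which keeps the consent decision in the browser's hands rather than the page's.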

You can start to see inspirations for additional VR experiences. Let’s complete an examination of Browser built-in experiences.

Browser VR Media Experiences


There is a lot of contention over which VR experiences are going to be the most popular. Right now, media is king. It delivers the highest-quality immersive 3D experiences because it relies on the realism provided by, well, real film. It breaks down only due to focal point issues, where the mind expects to refocus. This is overcome by amazing directing that forces your eye to follow the focal points intended by the director. But let’s assume that 3D video and 3D images are going to be very important.

The problem here is that there are no standards for 3D video and images, and there may not be a great standard to evolve for some time. Moreover, most 3D media starts out as much less than 3D and is then composed into 3D. You can, for instance, take a pano and some pictures and stitch them into a great 3D media experience. However, there is no media format that a browser can load that does that for you. For that you are going to need actual WebVR. I’ll define this experience now, but will talk about it later when we get more into the WebVR applicable arenas. I will say that I think Carmack’s concept of VR script, converted to be based on a subset of JavaScript with a minimal API, might be an even better solution than WebVR.

VR Experience #3 – Non-3D Media Composed into 3D Presentation

But this leads us to some other, more realistic experiences. Just as a video can have a full-screen button, if the browser can detect the presence of 3D video, then we should also be able to go into VR mode on just that video. This integrates 2D and 3D very well. Some people have expressed further interest in having the 3D video break out and display in front of the page. This is much more challenging. 3D experiences are designed to be immersive, not to be an island of 3D within your focused FoV.

VR Experience #4 – 3D Videos get a VR Mode Button
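A VR Mode button for video needs to know how the two eyes are packed into each frame. Side-by-side and top-bottom packing are common conventions for stereo video, but since there is no standard way for a browser to discover the layout, the `layout` argument below stands in for hypothetical metadata, and assigning the top half to the left eye is an assumed convention.

```typescript
// Work out which part of a decoded video frame each eye should see.
type StereoLayout = "mono" | "side-by-side" | "top-bottom";

interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function eyeSourceRects(layout: StereoLayout, frameW: number, frameH: number): { left: Rect; right: Rect } {
  switch (layout) {
    case "side-by-side": {
      // Left half of the frame for the left eye, right half for the right eye.
      const w = frameW / 2;
      return {
        left: { x: 0, y: 0, width: w, height: frameH },
        right: { x: w, y: 0, width: w, height: frameH },
      };
    }
    case "top-bottom": {
      // Assumed convention: top half is the left eye.
      const h = frameH / 2;
      return {
        left: { x: 0, y: 0, width: frameW, height: h },
        right: { x: 0, y: h, width: frameW, height: h },
      };
    }
    default: {
      // Mono video: both eyes see the same full frame (no depth).
      const full = { x: 0, y: 0, width: frameW, height: frameH };
      return { left: full, right: full };
    }
  }
}

console.log(eyeSourceRects("side-by-side", 3840, 1080).right); // starts at x = 1920
```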

We don’t have to stop at videos. If we can also get a basic 3D image format, we could also do 3D imaging. There will be more than one type of this sort of image, though. First, you’ll have a basic stereoscopic quad that is presented more like a 2D billboard. Second, you’ll have more immersive images taken with new cameras that capture fisheye views of the world.

I think that for the first experience we could get away with rendering a left/right version of the image in-line with the page. I haven’t done this and so I don’t know if the user experience is good, but I could imagine it not being terrible.

VR Experience #5 – Stereoscopic Inline Image Rendering

For the second experience we probably want to again take over the peripheral vision and break out of the billboard in front of the user’s face. The user can rotate and tilt their head and be fully immersed in the scene.

VR Experience #6 – Immersive Fisheye Image Rendering
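Stitched 360° images are commonly stored in an equirectangular projection rather than as raw fisheye captures. Assuming that format (a swap from the fisheye capture described above), the core of the immersive viewer is mapping where the user is looking to a spot in the image. Which way increasing yaw moves across the image is an assumed convention in this sketch.

```typescript
// Map where the user is looking (head yaw/pitch in degrees) to texture
// coordinates in an equirectangular 360-degree image.
interface UV {
  u: number; // 0..1 across the image width
  v: number; // 0..1 down the image height
}

function equirectUV(yawDeg: number, pitchDeg: number): UV {
  // Wrap yaw into [-180, 180) so any amount of head rotation lands in the image.
  const yaw = ((((yawDeg + 180) % 360) + 360) % 360) - 180;
  // Pitch cannot go past straight up (90) or straight down (-90).
  const pitch = Math.max(-90, Math.min(90, pitchDeg));
  return {
    u: (yaw + 180) / 360, // 0.5 straight ahead; 0 and 1 meet directly behind
    v: (90 - pitch) / 180, // 0 at straight up, 1 at straight down
  };
}

console.log(equirectUV(0, 0)); // { u: 0.5, v: 0.5 }
console.log(equirectUV(270, 0).u); // 0.25, same as turning 90 degrees the other way
```

A real viewer would run this per pixel in a shader, but the mapping is the same: head rotation selects a region of one large image, which is why this can live entirely in the browser.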

Notice how so far none of the media experiences we’ve described, barring #3, require any APIs in the web page. They do potentially require some image formats, maybe some new attributes, etc… If that is the case, then those should be defined by some sort of standards specification, most likely a VR extension to HTML5 (whereas I’d argue WebVR is more like WebGL, and is thus its own specification not directly tied to the HTML5 spec).

All of the above experiences are accomplished by the browser’s VR mode, inline, or through a VR breakout button. Just as a very large set of the full-screen experiences you enjoy today are browser-provided, with no additional work required of the page author, I argue that many VR experiences will fall into the same category. That will make VR much more consistent and approachable for end users and developers alike. This matters, because next we are going to talk about the hard experiences, and yes, VR is hard.

Author Provided VR Experiences


You have users running their browser in VR mode and now you want to provide some VR light-up experiences. As an author, you will find the browser in one of many states, but the most likely state is this: “The user does NOT HAVE any form of VR.” The next most likely is, “The user has a phone but does not have, want, or cannot use VR.”

There are also some web truths. The number of WebGL experiences on the web today is very, very low, at least relative to the number of sessions, navigations, pages, etc… The number of WebGL experiences that would look good in VR is lower still. Finally, the added cost of providing WebVR-like experiences will be a barrier to entry for many.

My goal is not to make VR a gimmick, but to make it an institution of the web itself. My goal is for WebVR to provide an experience that drives VR adoption, rather than another reason for naysayers to claim that VR is not yet ready for the consumer.

For the author to be able to provide a WebVR experience, I think they need to be able to detect, first, a VR-capable browser and, second, that the WebVR APIs needed to submit WebGL content to a device actually exist. Currently the WebVR specification focuses on methods for enumerating devices. So we can build in-page experiences, but there isn’t a browser that has a VR mode that itself supports a VR experience. The browser experiences I described earlier, especially VR Experience #1, are prerequisites.

Proposal #4 – Browsers Commit to VR Experiences before VR APIs

To the extent that I can influence this, I find this to be one of the more important points of this article. With this in place an author will be able to detect that the browser has its own VR experience and whether or not it is enabled.

VR Experience #7 – Page Authors can Detect VR Capable Browsers
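Here is a sketch of what that detection might look like from the author's side. The `vrMode` property is purely hypothetical, a stand-in for the browser-level VR mode argued for in Proposal #4. `getVRDevices` is the device-enumeration entry point as I understand the current WebVR drafts, but treat the name as an assumption and check the spec.

```typescript
// Hypothetical: `vrMode` is NOT an existing API. It stands in for the
// browser-level VR mode argued for in Proposal #4.
interface BrowserVRMode {
  available: boolean; // the browser can provide its own VR experience
  enabled: boolean; // the user currently has that experience turned on
}

interface NavigatorLike {
  vrMode?: BrowserVRMode; // hypothetical
  getVRDevices?: () => unknown; // WebVR draft device enumeration (name assumed)
}

type VRSupport = "none" | "browser-vr-only" | "browser-vr-and-webvr";

function detectVRSupport(nav: NavigatorLike): { level: VRSupport; enabled: boolean } {
  const mode = nav.vrMode;
  if (!mode || !mode.available) {
    return { level: "none", enabled: false }; // by far the most likely case
  }
  // Only now is it worth asking whether device-level WebVR APIs exist too.
  const hasWebVR = typeof nav.getVRDevices === "function";
  return {
    level: hasWebVR ? "browser-vr-and-webvr" : "browser-vr-only",
    enabled: mode.enabled,
  };
}

// In a page this would be called as detectVRSupport(navigator as NavigatorLike).
console.log(detectVRSupport({}).level); // none
```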

That doesn’t get us very far. Maybe the author would try to use our other built-in experiences, but they probably want more. This is where WebVR starts for me. If you haven’t read the WebVR spec, you should. You can also read my previous article talking about its limitations in its current form (which is being actively worked on, so I expect to write a much more positive article covering the specification in the future).

A TL;DR version of current capabilities:
  1. Enumerate and Retrieve VR Devices
  2. VR Devices are outputs (HMDs) or inputs (Position Sensors)
  3. Retrieve Stereoscopic Rendering Properties (FoV, eye positions, etc...)
  4. Commit Visuals to the Display
However, do we really need this information if we are in a Gear VR or Google Cardboard scenario? Mostly not. So we need some lighter weight APIs for the more common scenarios. Since Gear VR and Google Cardboard are the most prominent, it seems targeting them first would be a good idea.
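The four capabilities listed above can be modeled roughly like this. The interface and method names are illustrative, not the draft's actual IDL; the point is the split between output devices (HMDs) and input devices (position sensors), and how much an author has to handle before drawing a single frame.

```typescript
// Illustrative model of the four capabilities; all names are hypothetical.
interface EyeParameters {
  fovDegrees: { up: number; down: number; left: number; right: number };
  eyeOffsetMeters: number; // horizontal offset of this eye from the head center
}

// (2) Output devices: head-mounted displays.
interface HMDOutput {
  kind: "hmd";
  name: string;
  eyeParameters(eye: "left" | "right"): EyeParameters; // (3) stereo rendering properties
  submitFrame(): void; // (4) commit the rendered visuals to the display
}

// (2) Input devices: position sensors.
interface PositionSensorInput {
  kind: "position-sensor";
  name: string;
  orientation(): [number, number, number, number]; // quaternion (x, y, z, w)
}

type VRDeviceLike = HMDOutput | PositionSensorInput;

// (1) After enumeration, split devices into outputs and inputs.
function partitionDevices(devices: VRDeviceLike[]): { outputs: HMDOutput[]; inputs: PositionSensorInput[] } {
  const outputs: HMDOutput[] = [];
  const inputs: PositionSensorInput[] = [];
  for (const device of devices) {
    if (device.kind === "hmd") {
      outputs.push(device);
    } else {
      inputs.push(device);
    }
  }
  return { outputs, inputs };
}

// Mock devices standing in for what enumeration would return.
const mockHmd: HMDOutput = {
  kind: "hmd",
  name: "mock-hmd",
  eyeParameters: (eye) => ({
    fovDegrees: { up: 45, down: 45, left: 45, right: 45 },
    eyeOffsetMeters: eye === "left" ? -0.032 : 0.032,
  }),
  submitFrame: () => {},
};
const mockSensor: PositionSensorInput = {
  kind: "position-sensor",
  name: "mock-sensor",
  orientation: () => [0, 0, 0, 1],
};

const split = partitionDevices([mockSensor, mockHmd]);
console.log(split.outputs.length, split.inputs.length); // 1 1
```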

Here, authors want to use the entire screen of the device. They want to full-screen a Canvas element that is displaying some WebGL content which is being stereoscopically rendered. They can effectively do this today, but they do it at the expense of the user. Since the user doesn’t have a browser-controllable way of being in or out of VR mode, the author can simply display some stereoscopic, full-screen content and destroy the end user’s experience. Further, the browser has no proper way to overlay its own protected UX.

VR Experience #8 – Cooperative Full-Screen Stereoscopic Rendering for Integrated VR Devices
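On an integrated device the core job is splitting one full-screen canvas in half, one half per eye. In WebGL each half is selected with `gl.viewport(x, y, width, height)` before drawing; this sketch keeps the WebGL context out of it so it runs anywhere, and `drawScene` is a hypothetical stand-in for the page's own renderer. Lens distortion correction is omitted.

```typescript
// Split a full-screen drawing buffer into left-eye and right-eye viewports.
interface Viewport {
  x: number;
  y: number;
  width: number;
  height: number;
}

function splitScreenViewports(bufferWidth: number, bufferHeight: number): { left: Viewport; right: Viewport } {
  const half = Math.floor(bufferWidth / 2);
  return {
    left: { x: 0, y: 0, width: half, height: bufferHeight },
    // The right eye takes whatever is left, so odd widths still cover every pixel.
    right: { x: half, y: 0, width: bufferWidth - half, height: bufferHeight },
  };
}

// One stereo frame: set the viewport for each eye, then draw that eye's view.
function renderStereo(
  setViewport: (v: Viewport) => void,
  drawScene: (eye: "left" | "right") => void,
  width: number,
  height: number,
): void {
  const { left, right } = splitScreenViewports(width, height);
  setViewport(left);
  drawScene("left");
  setViewport(right);
  drawScene("right");
}

// A landscape phone with a 2560x1440 drawing buffer: 1280 pixels per eye.
console.log(splitScreenViewports(2560, 1440).right.x); // 1280
```

In a page, `setViewport` would be `v => gl.viewport(v.x, v.y, v.width, v.height)`. Nothing here requires enumerating devices or querying rendering properties, which is why a lighter-weight API seems sufficient for these devices.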

I think the solution here is really, really easy. So easy that we can overthink it. I’d prefer a solution for these devices that doesn’t require me to enumerate devices (unless I want the position sensor for Gear VR for instance) and doesn’t have me requesting unnecessary rendering properties (unless we want to provide some defaults supplied by the user).

Proposal #5 – Build Simple, Separate APIs for Integrated VR Device Scenarios

Once we get away from Gear VR and start getting into Oculus, my simple experiences break down. You are probably wondering why an Oculus is so different from, say, a Gear VR. They are both VR experiences, so why can’t they use the same solutions? The answer is simple: the approaches to VR are very different, as are their target audiences and experiences.

Gear VR is great at 3D movies and simple 3D experiences, but since it uses your phone, and only the power of your phone, the complexity of the scenes can’t be at the same level as on an Oculus. Refresh rates tend to suffer on phones (many phones ran at 50Hz rather than 60Hz just to save a bit more battery), while the Oculus is going to ship at 90Hz and will hopefully get up to the 120Hz range in the next year or two. You can start to see how it would be hard to keep up with the Oculus if you only had the power of a phone at your disposal.
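The refresh-rate gap is easier to feel as a per-frame time budget, which is simply 1000 ms divided by the refresh rate. In VR that budget has to cover rendering the scene twice, once per eye.

```typescript
// Time available to produce each frame at a given refresh rate.
function frameBudgetMs(refreshHz: number): number {
  return 1000 / refreshHz;
}

for (const hz of [50, 60, 90, 120]) {
  console.log(`${hz} Hz -> ${frameBudgetMs(hz).toFixed(2)} ms per frame`);
}
// 50 Hz -> 20.00 ms per frame
// 60 Hz -> 16.67 ms per frame
// 90 Hz -> 11.11 ms per frame
// 120 Hz -> 8.33 ms per frame
```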

The Oculus is a high-end consumer device focused on achieving very high frame rates using powerful desktop-class computing and GPU hardware. In fact, people were surprised at the amount of power it needs, and many, I’m sure, decided not to pre-order when they found their machines weren’t up to the challenge ;-) The Oculus is trying to show you the best of VR with 90Hz+ high-quality rendering and a ton of SDK features to help developers achieve this quality (time warping and view bypass), and it presents itself as a separate device, so it doesn’t have the limitations of, say, your current display.

For this experience, we don’t know what the final API will look like. Oculus will continue improving and changing its own APIs. These will in turn inform the APIs needed in WebVR to control all of the things that can be configured in the Oculus. Since VR is so new, it is hard to find even a couple of things common to all devices. Contrast this with WebGL, which is based on OpenGL, where the three primary GPU providers (Intel, NVIDIA and AMD, formerly ATI) are all willing to build things that plug into the standard. In WebGL we don’t often have many device-specific differences to contend with to provide a unified experience. Matching an API to WebVR will not be easy.

We do have some basic ideas though. So I’ll build the experiences off of that.

VR Experience #9 – WebGL Presentation to a Dedicated VR Device at Native Refresh Rates

With integrated devices we output to the screen. With dedicated VR devices we have to ship some textures to the device after the fact. For this reason, we need an API which takes this into account. You could just ship over the texture from a Canvas, but that turns out to be non-ideal. You probably want to ship several textures. Also, you want to know that the device is dedicated and whether or not it supports other features such as time warping and deformation.

It may also be that the device needs separate textures for the HUD texture versus the 3D scene texture.

I believe that WebGL provides the primitives for this already, so the goal is to make sure that we can combine these primitives with those of the WebVR device in a way that makes sense. We also need to make sure that the dedicated device can have appropriate features turned on and off so that the author can do the bits they want to do. For instance, turning off time warp and deformation may allow the author to experiment with other deformation methods on their own texture prior to committing the scene to the device.
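To make the layering and feature toggles concrete, here is a hypothetical description of one frame handed to a dedicated device. None of these names come from the WebVR spec or the Oculus SDK; it is a sketch of the kind of information such an API would need to carry, plus a check that the requested features are actually supported.

```typescript
// Hypothetical frame description for a dedicated VR device. None of these
// names come from WebVR or any vendor SDK.
interface TextureHandle {
  id: number;
  width: number;
  height: number;
}

interface DeviceFeatures {
  timeWarp: boolean; // device reprojects the frame using the latest head pose
  lensDistortion: boolean; // device applies its own lens deformation
}

interface FrameSubmission {
  scene: { left: TextureHandle; right: TextureHandle }; // 3D scene, one texture per eye
  hud?: TextureHandle; // optional separate HUD layer
  features: DeviceFeatures; // what the author wants the device to do
}

// Returns a list of problems; an empty list means the frame can be submitted.
function validateSubmission(frame: FrameSubmission, supported: DeviceFeatures): string[] {
  const problems: string[] = [];
  if (frame.features.timeWarp && !supported.timeWarp) {
    problems.push("time warp requested but not supported");
  }
  if (frame.features.lensDistortion && !supported.lensDistortion) {
    problems.push("lens distortion requested but not supported");
  }
  const { left, right } = frame.scene;
  if (left.width !== right.width || left.height !== right.height) {
    problems.push("eye textures differ in size");
  }
  return problems;
}

// An author doing their own deformation turns the device's version off.
const customDistortionFrame: FrameSubmission = {
  scene: { left: { id: 1, width: 1024, height: 1024 }, right: { id: 2, width: 1024, height: 1024 } },
  hud: { id: 3, width: 512, height: 256 },
  features: { timeWarp: false, lensDistortion: false },
};
console.log(validateSubmission(customDistortionFrame, { timeWarp: true, lensDistortion: true })); // []
```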

Since dedicated devices have their own refresh rate, it isn’t sufficient to use the 60Hz requestAnimationFrame cadence to render your scene. If you did, you would have to rely on the device to fill in at least 30 of its 90 frames every second, using time warp or by simply repeating frames. At 60Hz people can get simulation sickness, so 60Hz isn’t the greatest target. We should make sure the API can support the native refresh rate of the device.
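The 30-of-90 figure falls out of a simple cadence count. This sketch assumes an idealized pipeline where app frames finish exactly on their schedule; real frame timing is messier.

```typescript
// Count how many display refreshes get a freshly rendered frame versus one the
// device must fill in (repeat or time warp) when the app renders at appHz.
function cadence(appHz: number, displayHz: number): { fresh: number; filled: number } {
  let fresh = 0;
  let lastFrame = -1;
  for (let vsync = 0; vsync < displayHz; vsync++) {
    // Index of the most recent app frame finished by this vsync
    // (integer math avoids floating-point drift).
    const latest = Math.floor((vsync * appHz) / displayHz);
    if (latest !== lastFrame) {
      fresh++;
      lastFrame = latest;
    }
  }
  return { fresh, filled: displayHz - fresh };
}

console.log(cadence(60, 90)); // { fresh: 60, filled: 30 }
console.log(cadence(90, 90)); // { fresh: 90, filled: 0 }
```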

Proposal #6 – Build Device Oriented APIs for Dedicated Devices into WebVR

Proposals 5 and 6 are where I hope we spend most of our time when coming up with WebVR. We could spend our time figuring out many more complex scenarios, but the technology is just too young for that. We could also spend time trying to standardize the Browser experiences, but that isn’t in our interest either. There is still a ton of work in the author-provided experiences, though, and that is where the most complicated APIs will exist.

Conclusions


It turns out there are a lot of VR experiences to focus on. One might think that this is a simple space with only a single solution, but the reality seems to be far more nuanced. While these are my opinions, I did think about them for quite some time (since I started really thinking about VR in 2014). I focused mostly on the user experiences and what my expectations are. As a browser vendor, I get an opportunity to maybe impact the space more than others. But I will also be one of the very early adopters browsing the web from my awesome with early builds and implementations of these features.

I think the balance between what we leave for the browser vendors to innovate on and what ends up the API under the control of the author is an important one for us to deliver. If we can deliver a strong default browsing experience for VR then that will be compounded by good VR content created by authors. If VR turns out to be a dumping ground of bad experiences and the browser leads the way, then we are likely to see slow uptake and high dissatisfaction from our users.

I’ve written this because I find this is more than a conversation. I can maybe fit a couple of these experiences or concepts into a half-hour or hour-long meeting, but never all of them. This is my foundation. I hope that it can help you build your own foundational understanding of the subject matter as well. I hope that this starts some conversations, some arguments or maybe even a company or two. As always you can catch me on Twitter @JustRogDigiTec or leave comments.

I’m going to treat this like an open article as well. If there are glaring mistakes or corrections I’ll probably supply edits over time. I’m sure I’m not covering all experiences and I’m sure 6 months from now there will be some other presentation/device that doesn’t quite fit into the two existing categories I’ve defined. I also intend to add some illustrations, but I’m terrible at them so they look great in my head and terrible on paper. Once I’ve corrected that I’ll upload them ;-)
