Saturday, November 14, 2015

Improving Web Debugging with insights from the Native Debugger

Modern web browsers provide some very capable debugging tools for JavaScript. The ease of use of many of the features matches the ease of use of the language. Most features are accessible and controllable with intuitive UI. However, for the power user coming from native languages, there are some very useful yet missing features. Given the features of the JavaScript language there is the possibility of making even better hybrid use cases, some of which I would like to explore in this post.

My focus will be on breakpoints. Breakpoints in JavaScript debuggers come from either the debugger statement or from selecting lines of code in your script from the UI. Since JavaScript can perform many operations in a single line of code, it has taken a while for debuggers to be able to break at the proper time in the evaluation of a statement. Usually the debugger breaks at the beginning of the line and you can step through to get to the point you want. But if you are evaluating many hits of the same breakpoint this can be tedious. An advanced user might use conditional breakpoints, but depending on when the break happens you might find you are too early or too late to evaluate your condition appropriately. All of this aside, we find our way through the muck and manage to make it work.

In native, breakpoints are about code addresses coming from the symbolic debug information emitted with the program. Your debugger will either allow you to set breakpoints based on a direct address, or you can try to evaluate a symbol instead and obtain an address. Many debugging sessions in C++ start with a script or two to set up a bunch of breakpoints for a set of key functions. This is woefully missing from the current set of JavaScript based debuggers.

Another type of native breakpoint that is very handy is the memory breakpoint. You can set a breakpoint for when a value in memory is read from, written to, or both. These are also based on addresses, so the breakpoints are per instance. When you have thousands of instances this can be extremely handy.

So that is our list. We want symbolic breakpoints and memory breakpoints. We probably also want an easier way to run "debugger" scripts than typing into the console. Now, how native works and how JavaScript works are quite different, so we can improve on some of these facilities as we describe them. Each of the following sections will be broken down into a basic description, a set of challenges and a set of solutions/improvements. Not every challenge will have a solution provided unless I'm exceptionally on point today.

Symbolic Breakpoints

We want the ability to, given a name of a function and potentially the name of a script or HTML page (module), set a breakpoint on that function when we enter it. This is the same as in the native debugger where I might use a command like:
bp foobarbaz!MyClass::MyFunction
The native debugger also allows for things like deferred breakpoints (bu) which match modules not yet loaded. We can set multiple breakpoints using a wildcard match and the symbolic breakpoint command (bm). Additionally, with some clever scripting we can choose to debug the call in or the return value by using some debugger commands to step out of my function and consult, generally, the @eax/@rax registers.
bp foobarbaz!MyClass::MyFunction "gu; r @rax;"

Challenges

That is a pretty cool feature list. So what are some of the challenges? Well, JavaScript doesn't have symbols. It has function objects and function objects have code. Many function objects don't even have names. So how do you set breakpoints on those things? Interesting problem, we'll get to that in solutions.

Forgetting that many things aren't named, many things are. Every function has a "name" property and so you could imagine the debugger could set a breakpoint on anything with a name. It could also act like the native debugger if more than one exists and give you some sort of handle that you can set the breakpoint on in the case of say 15 functions with exactly the same name.
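To make the naming problem concrete, here is a minimal sketch in plain JavaScript. The helper groupByName and the sample functions are hypothetical and not part of any debugger; the sketch only shows which functions end up with a usable name and how same-named functions could be bucketed so that each one gets its own handle.

```javascript
// Sample functions; every name here is made up for illustration.
function render() {}
const update = function () {};               // name inferred from the variable (ES2015)
const handlers = [function () {}, () => {}]; // nothing to infer from: name is ""
const other = { render() {} };               // an unrelated function also named "render"

// Hypothetical helper: bucket function objects by their name property so a
// debugger could hand back one handle per function when names collide.
function groupByName(functions) {
  const groups = new Map();
  for (const fn of functions) {
    const key = fn.name || "(anonymous)";
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(fn);
  }
  return groups;
}

const groups = groupByName([render, update, other.render, ...handlers]);
for (const [name, fns] of groups) {
  console.log(name, fns.length);
}
```

The two "render" entries are distinct function objects, which is exactly the case where a name alone is not enough and a per-function handle is needed.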

Modules are rough as well. For instance, you have multiple scripts in a single HTML file or you have dynamically generated names on the server etc... So how to match modules will be an interesting issue.

Speaking of modules, what about script contexts? Isn't that like multiple processes? Have you ever done multi-process debugging in something like WinDbg or VS? Not fun, I can say. You set breakpoints in only the current process, and when you break in and out of the debugger you have to constantly figure out your context to know what is going on and whether or not you are setting breakpoints in the right space. With things like iframes kind of acting like processes (or maybe physically being in another process in the case of cross-domain iframe isolation) this problem is basically something that has to be solved.

Solutions

Thankfully most of these have some immediate solutions. I'm going to tackle the symbolic issues first. For this to work you have to treat functions like objects. Functions are themselves breakpoint handles so I should be able to set a breakpoint based on a function object itself and this will be a function instance breakpoint.

I shouldn't stop there though. I also want symbolic names to work and even have deferred breakpoints for functions that don't exist yet. So for that, anytime a function is evaluated I will compare its name against my list of symbol expressions. A function might even change its name (ES2015 makes the name property configurable, so Object.defineProperty can already do this), so while I only have to evaluate these breakpoints against a given function once, if the name changes, I'll have to invalidate the cache and reevaluate on next execution.
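A rough sketch of that matching and caching logic follows. Everything in it is an assumption made for illustration: SymbolicBreakpoints, instrument and hits are hypothetical names, and wrapping a function stands in for the call hook a real engine would provide. Instead of pausing, a "break" is just recorded.

```javascript
// Hypothetical registry of symbolic breakpoints. Not a real debugger API.
class SymbolicBreakpoints {
  constructor() {
    this.patterns = [];         // strings or RegExp objects; may predate the function
    this.cache = new WeakMap(); // fn -> { name, matched }
    this.hits = [];             // recorded breaks, in place of pausing
  }

  add(pattern) {
    this.patterns.push(pattern);
    this.cache = new WeakMap(); // a new pattern makes every cached verdict stale
  }

  matches(fn) {
    const cached = this.cache.get(fn);
    if (cached && cached.name === fn.name) return cached.matched;
    // First call, or the name changed since last time: evaluate every pattern.
    const matched = this.patterns.some((p) =>
      p instanceof RegExp ? p.test(fn.name) : p === fn.name
    );
    this.cache.set(fn, { name: fn.name, matched });
    return matched;
  }

  // Simulates "function entry"; a real debugger would break here.
  instrument(fn) {
    const registry = this;
    return function (...args) {
      if (registry.matches(fn)) registry.hits.push(fn.name);
      return fn.apply(this, args);
    };
  }
}

const breakpoints = new SymbolicBreakpoints();
breakpoints.add(/^load/);                // deferred: no such function exists yet

function loadUser(id) { return { id }; } // defined after the breakpoint
const tracedLoadUser = breakpoints.instrument(loadUser);

tracedLoadUser(1);                       // name matches, hit recorded
Object.defineProperty(loadUser, "name", { value: "fetchUser" });
tracedLoadUser(2);                       // renamed: verdict re-evaluated, no hit
console.log(breakpoints.hits);
```

Patterns should not use the g flag here, since RegExp.prototype.test is stateful for global expressions.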

This in turn gives us the equivalent of bp, bu and even bm. You could imagine that bm would take a JavaScript RegExp object or string and so I could write some quite complicated function matching breakpoints. This would be pretty cool. To extend this we allow for each command to run on either the execution or return of the function. And of course, we supply appropriate UI so that on the return you can review the return value in a special variable window in a super easy way. Also, if you want to write conditional breakpoints we'd surface the return value in a special syntax so you can query it. Maybe, you want to only break when function foo returns true, the breakpoint/conditional for that might be...
debugger.breakPointReturn(null, /^foo$/, "@retVal === true");
The solution for modules I really don't have a good answer for. What I think it means is that you want the debugger to tell you about these modules, but then you want to either have positive or negative module matching as part of the breakpoint condition. For this reason I would actually make my breakpoint API take both module and symbol as separate values. This is different from native in that it only takes the full string and uses the ! character to split on. Since JavaScript has different identifier characters, building something like this would be somewhat foolish. Also, with WebAssembly in the future, we probably can't make guesses as to what is and is not valid for identifiers.
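Here is a minimal userland approximation of a return breakpoint that takes module and symbol as separate values. It is only a sketch under several assumptions: the object is called dbg because debugger is a reserved word and cannot be used as an identifier, register is a hypothetical stand-in for the engine noticing a function from a given script, and the condition is a predicate function rather than the "@retVal" string syntax imagined above.

```javascript
// Hypothetical stand-in for the proposed API; none of this exists in a browser.
const dbg = {
  returnBreakpoints: [],
  breaks: [], // what a real debugger would pause on

  // module: null (any), a string, or a RegExp; symbol: a string or a RegExp.
  breakPointReturn(module, symbol, condition) {
    this.returnBreakpoints.push({ module, symbol, condition });
  },

  // Stand-in for the engine knowing which script a function came from.
  register(moduleName, fn) {
    const self = this;
    return function (...args) {
      const retVal = fn.apply(this, args);
      for (const bp of self.returnBreakpoints) {
        if (
          matches(bp.module, moduleName) &&
          matches(bp.symbol, fn.name) &&
          bp.condition(retVal)
        ) {
          self.breaks.push({ module: moduleName, symbol: fn.name, retVal });
        }
      }
      return retVal;
    };
  },
};

function matches(pattern, value) {
  if (pattern === null) return true; // null means "match anything"
  return pattern instanceof RegExp ? pattern.test(value) : pattern === value;
}

// Break only when foo, from any module, returns true.
dbg.breakPointReturn(null, /^foo$/, (retVal) => retVal === true);

function foo(x) { return x > 10; }
const tracedFoo = dbg.register("app.js", foo);
tracedFoo(5);  // returns false, no break
tracedFoo(50); // returns true, break recorded
console.log(dbg.breaks);
```

Keeping module and symbol as two arguments means neither needs an escaping rule for a separator character.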

Cool, so the last solution we need is how to handle this problem of multiple global scopes. This is where the debugger can shine. I should be able to set breakpoints constrained to a specific global scope, constrained to a specific "module" as defined by the html or script name of the document where the source came from, or even set breakpoints generally across the entire environment. I think that in most cases users would be happy with breaking on anything, anywhere given a specific name.

Earlier we allowed you to set a breakpoint on a given function object. That is highly constrained and doesn't have the multiple global scopes problem so long as you tagged the right function.

Finally we get OM function breakpoints out of this as well. Since the OM is defined as a series of JS functions in all browsers now you can use them, or their symbolic names, to match things. We could even allow the constructor name + member name syntax to be very precise, and this would alleviate problems with minified scripts where the names are encoded up to the point that the function is executed or when function objects are used instead of function names. All of that obfuscation is now "debuggable". That would be super cool.
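The constructor name + member name idea can be approximated today by patching a prototype, which is a useful way to see what the feature would cover. In this sketch FakeElement is a hypothetical stand-in for a DOM interface such as Element (so the example runs outside a browser), and breakOnMember is an assumed helper name. A real debugger breakpoint would sit on the function object itself and would not need to replace anything.

```javascript
// FakeElement is not a real browser object. Like many OM members,
// className is a getter/setter pair that lives on the prototype.
class FakeElement {
  constructor() { this._cls = ""; }
  get className() { return this._cls; }
  set className(v) { this._cls = String(v); }
  setAttribute(name, value) { this[`_attr_${name}`] = value; }
}

const omHits = [];

// Hypothetical helper: "break" on Constructor.prototype[member], whether it
// is a method or an accessor. Returns a function that removes the breakpoint.
function breakOnMember(ctor, member, onHit) {
  const proto = ctor.prototype;
  const original = Object.getOwnPropertyDescriptor(proto, member);
  if (!original) throw new Error(`${ctor.name}.prototype.${member} not found`);

  const wrap = (fn, kind) =>
    function (...args) {
      onHit({ target: `${ctor.name}.${member}`, kind, args });
      return fn.apply(this, args);
    };

  const patched = { ...original };
  if (typeof original.value === "function") patched.value = wrap(original.value, "call");
  if (original.get) patched.get = wrap(original.get, "get");
  if (original.set) patched.set = wrap(original.set, "set");
  Object.defineProperty(proto, member, patched);

  return () => Object.defineProperty(proto, member, original); // restore
}

const remove = breakOnMember(FakeElement, "className", (hit) => omHits.push(hit));
const el = new FakeElement();
el.className = "active"; // hits the setter
el.className;            // hits the getter
remove();
el.className = "idle";   // breakpoint removed, no hit
console.log(omHits.map((h) => h.kind));
```

The weakness of this userland version is that a reference captured before the patch bypasses it, which is one reason to want the debugger to do this natively.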

I'm pretty much freaking out this coffee shop with all of my excitement at this point. I really want these features. Just these breakpoints alone would simplify tons and tons of different debugging experiences that I deal with on a daily basis.

Memory Breakpoints

There is already a useful bit of this in the platform today in Object.observe. This is a proposed ES7(?) API that allows you to be notified when changes to an object are made. I believe that the callbacks run in micro-task checkpoints, but don't quote me on this, I could be completely wrong.

Memory breakpoints are a bit more intrusive. They break on read or write depending on how they are configured. They also occur at a time when you have both the old value and the new value available, so you can choose to "stop" the store or "update" the store if you'd like. This can be used to, say, tweak a scenario where you believe a wrong value is breaking your site. Simply set a memory breakpoint, set the corrected value, and let the page continue on to success.
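The stop/update behaviour can be sketched in userland by swapping a plain field for an accessor. This is only an approximation under stated assumptions: watchField and the STOP sentinel are hypothetical names, and a real memory breakpoint would not need to change the shape of the object.

```javascript
const STOP = Symbol("stop the store"); // hypothetical sentinel meaning "veto this write"

// Hypothetical helper: replace a data property with an accessor so every
// write passes through onWrite(oldValue, newValue) before it lands.
function watchField(obj, key, onWrite) {
  let stored = obj[key];
  Object.defineProperty(obj, key, {
    configurable: true,
    enumerable: true,
    get() { return stored; },
    set(newValue) {
      const decision = onWrite(stored, newValue);
      if (decision === STOP) return; // veto: keep the old value
      stored = decision;             // accept the write, or a substituted value
    },
  });
}

const cart = { total: 10 };
watchField(cart, "total", (oldValue, newValue) => {
  if (Number.isNaN(newValue)) return STOP; // refuse the bad write
  if (newValue < 0) return 0;              // correct the value in flight
  return newValue;                         // let everything else through
});

cart.total = 25;  // stored as written
cart.total = NaN; // stopped, the field keeps its old value
cart.total = -5;  // corrected before it is stored
console.log(cart.total);
```

A read breakpoint would hook the getter in the same way.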

We probably also want to allow instance based or Constructor based syntax for this. Having all constructed instances of a given type have the breakpoint on automatically would be more useful than having to pick out specific instances. This is something lacking in the native debugger unless you do something really clever with multiple breakpoints.
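For the Constructor based form, a Proxy with a construct trap gives a reasonable feel for the behaviour. This is a sketch, not how an engine would do it: watchConstructor is a hypothetical helper, and because instances are only wrapped after construction finishes, writes made inside the constructor itself are not reported.

```javascript
// Hypothetical helper: returns a stand-in constructor whose instances report
// reads and writes of the listed fields.
function watchConstructor(Ctor, fields, onAccess) {
  const instanceHandler = {
    get(target, key, receiver) {
      if (fields.includes(key)) {
        onAccess({ type: "read", key, value: target[key] });
      }
      return Reflect.get(target, key, receiver);
    },
    set(target, key, value, receiver) {
      if (fields.includes(key)) {
        onAccess({ type: "write", key, oldValue: target[key], newValue: value });
      }
      return Reflect.set(target, key, value, receiver);
    },
  };
  return new Proxy(Ctor, {
    // Every `new` on the watched constructor yields an instrumented instance.
    construct(target, args, newTarget) {
      const instance = Reflect.construct(target, args, newTarget);
      return new Proxy(instance, instanceHandler);
    },
  });
}

class Point {
  constructor(x, y) { this.x = x; this.y = y; }
}

const accessLog = [];
const WatchedPoint = watchConstructor(Point, ["x"], (e) => accessLog.push(e));

const a = new WatchedPoint(1, 2);
const b = new WatchedPoint(3, 4);
a.x = 10; // write on one instance
b.x;      // read on another instance
b.y = 0;  // y is not watched
console.log(accessLog);
```

One registration covers every instance, which is the part that takes multiple breakpoints and some cleverness in the native debugger.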

Thankfully the feature list for this one is shorter than the last, but we still have some key challenges to overcome.

Challenges

Memory breakpoints are going to offer mostly performance challenges, but also some challenges to JIT'ed code. While accessing storage locations can be easily dealt with in the interpreter and even some not so well optimized JIT code, once something becomes a simple mov instruction in the assembly with no other overhead, evaluating the memory breakpoint can be challenging. So then the question becomes, do you have to immediately disable a ton of optimizations the moment this is turned on? I think the answer is yes, which is unfortunate. Nothing like super slowing your code when you are trying to repro a bug.

Beyond this, JavaScript has fields and properties, and properties are a bit harder to manage. They do indicate a "slot" on the object that is accessed, but they may also represent a "slot" in your prototype chain instead. Should memory breakpoints only operate on instance fields, or should they extend to property getter/setter pairs? I think since you can set function breakpoints, restricting to fields might be okay, though this does reduce the effectiveness of the feature.
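The field versus property distinction is visible through property descriptors. The sketch below uses a hypothetical helper, describeSlot, to report whether a key is a plain data slot or a getter/setter pair and how far up the prototype chain it lives. That is roughly the information a debugger would need before deciding whether a memory breakpoint applies.

```javascript
// Hypothetical helper: walk the prototype chain and report where `key` lives
// and whether it is a plain field (data slot) or an accessor pair.
function describeSlot(obj, key) {
  let depth = 0;
  for (let owner = obj; owner !== null; owner = Object.getPrototypeOf(owner)) {
    const desc = Object.getOwnPropertyDescriptor(owner, key);
    if (desc) {
      const kind = "value" in desc ? "field" : "accessor";
      return { kind, onInstance: depth === 0, depth };
    }
    depth++;
  }
  return null; // not found anywhere on the chain
}

class Widget {
  constructor() { this.width = 100; }             // instance field
  get area() { return this.width * this.width; }  // accessor on the prototype
}
Widget.prototype.theme = "light";                 // data slot shared via the prototype

const w = new Widget();
console.log(describeSlot(w, "width"));
console.log(describeSlot(w, "area"));
console.log(describeSlot(w, "theme"));
console.log(describeSlot(w, "missing"));
```

Under the fields-only restriction, only the first case (a field on the instance itself) would be an obvious candidate; the prototype data slot is the awkward middle ground.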

Finally, the Constructor approach can be challenging. Every Global has its own set of Constructors. So if you set a Constructor memory breakpoint it would only apply to instances in the current global. This is similar to using function objects versus symbolic names. You could overcome this by trying to set a symbolic name memory breakpoint on EVERY instance, but that overhead seems way too large as well. Ultimately the use cases would define the requirements for this feature and whether or not the costs on runtime are justified by the debugging efficiency gained by the web developers.
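The per-global constructor problem is easy to reproduce. The sketch below uses Node's built-in vm module as a stand-in for an iframe, since each vm context has its own set of built-ins; that substitution is an assumption made so the example runs outside a browser. The helpers matchesByIdentity and matchesByName are hypothetical and mirror the function object versus symbolic name distinction.

```javascript
const vm = require("vm"); // Node built-in; a separate context stands in for an iframe

// Both values must come from the same foreign context to belong together.
const context = vm.createContext({});
const foreignArray = vm.runInContext("[1, 2, 3]", context);
const ForeignArray = vm.runInContext("Array", context);

// Identity match: like a breakpoint tied to one specific constructor object.
function matchesByIdentity(obj, Ctor) {
  return obj instanceof Ctor;
}

// Symbolic match: compares the constructor's name, so it spans globals, at
// the cost of a check on every candidate instance.
function matchesByName(obj, ctorName) {
  const ctor = obj.constructor;
  return typeof ctor === "function" && ctor.name === ctorName;
}

console.log(matchesByIdentity(foreignArray, Array));        // false: other global's Array
console.log(matchesByIdentity(foreignArray, ForeignArray)); // true: same global
console.log(matchesByName(foreignArray, "Array"));          // true: the name matches anywhere
```

In a browser the same experiment works with an array created inside an iframe and the parent window's Array.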

Solutions

We inlined most of the solutions, but let's start again. First, we realize that memory breakpoints would likely disable some of the JIT'ed functional execution. This is generally okay since most engines are set up for this type of fallback already. They can fall back to less optimized code or even back to interpretation if necessary. We'd have some target for execution speed, such as no slower than executing at interpreted speed.

We'd restrict to fields since properties are "functions" and not really fields anyway. A property might also end up setting or retrieving from a field so you could still see that access if you wanted to. This would have an impact on OM breakpoints where properties are the way something like Element.className works, disallowing you from watching that property. You could use OM symbolic breakpoints with conditional logic to overcome this.

I think tagging a constructor so that any object returned as a result of new on that function is automatically breakpoint'ed is just super cool. I don't think there are many challenges there once the general infrastructure is in place. I'm curious if others would find that useful though. For browser built-ins the multiple Globals could probably be overcome as well if that proved to be an issue.

Conclusion

Curious if these enhancements would be useful to web developers or not. This could be a case of me solving my own problems. I often use native debugging in order to quickly figure out website problems and reduce the causes for broken websites. In these cases I don't have access to the original source nor can I easily make changes. But also, it's a reflection of how we work in native debugging in general. The tools are so powerful that we can basically rewrite the code in place, even down to the assembler, so putting in logging and other types of solutions then rebuilding to try again is completely unnecessary.

If you have any debugging scenarios or stories where these features would help, leave a comment or reach me on Twitter @JustRogDigiTec. I'd love to hear that these facilities might be useful beyond my own use cases.

2 comments:

  1. What are OM function breakpoints? Object Model?

    1. Yes. The browser has an object model of approximately 6000 functions (some of which are simple property getters and setters). Since these names can even be aliased it would be useful to put breakpoints directly on them. They are also native code, so you can't simply put debugger statements into your sources for them.
