Turbo-charging JavaScript - Trace Trees and V8

By Kurt Cagle
September 21, 2008 | Comments: 3

JavaScript, like the late Rodney Dangerfield, just gets no respect. While at any given point there are far more JavaScript programs running than C, C++, Java, PHP and Perl applications combined, JavaScript has generally been considered a second-class option for writing "real" applications, in great part because the language is interpreted rather than compiled, meaning that a JavaScript program will typically run an order of magnitude or more slower than an equivalent program in a language such as C++.

However, a number of recent advances in the way that JavaScript engines interpret (and compile) code look set to herald a major increase in the speed and performance of JavaScript libraries - so much so that within the next year web applications may begin to approach (and in some cases even beat) traditionally compiled applications on these terms.

Two different approaches are emerging in this effort. The first, proposed in a paper called Incremental Dynamic Code Generation with Trace Trees by Andreas Gal and Michael Franz of the University of California, Irvine, establishes how trace trees can be used to locate and compile looped content within an interpreted language as the program is running, in essence compiling and optimizing the program in real time. This algorithm (with some variations to handle edge cases not initially covered by Gal and Franz) is now used in the new TraceMonkey engine, which grew out of Mozilla's Tamarin work and is being incorporated into the current Firefox 3.1 builds.

The second approach, developed by the V8 team for Google's new Chrome browser, differs in that it performs a compilation process prior to execution, then optimizes the compiled code with hooks for evaluations and imported code. There are some indications that it may in fact utilize a variation of the Trace Trees algorithm as well.


In both cases, however, the results even now are fairly stunning. For instance, when TraceMonkey-enabled code was tested against the SunSpider uBench suite, it ran 22.5 times faster than the equivalent pre-trace-tree SpiderMonkey code. Image manipulation code ran 6.5 times faster, matrix manipulation 6.3 times. These aren't percentage changes compared to existing code - this code is easily an order of magnitude faster across the board. Google's V8 code is testing at similar levels (indeed, the irony here is that the speed improvements for both systems are now within a few percentage points of one another).


Just-In-Time Compilation

While the specifics vary between the implementations, both sets of code work on some fairly radical assumptions about the way that people actually use JavaScript. One of the most critical of these is that even when type is not declared, programmers are generally good about ensuring that the implicit type of a given variable does not change over the course of execution.

Put another way, in most cases, if a programmer initially assigns a string to a variable, it is unlikely, though not completely unheard of, for that programmer to reassign the variable to a number or date. Even with numbers, it's generally fairly easy to determine whether a variable is in fact an integer or a floating-point number, and to use that information to set the variable to that type explicitly within compiled code - until some evidence comes along to disprove that assertion.
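As a rough illustration of the pattern these engines are betting on, consider the following sketch (the prices array and variable names are purely hypothetical):

    // Assume "prices" is an array of numbers defined elsewhere.
    // Type-stable: "total" starts as a number and stays a number, so an
    // optimizing engine can treat it as a native numeric value.
    var total = 0;
    for (var i = 0; i < prices.length; i++) {
        total += prices[i];
    }

    // Type-unstable: reassigning a different type forces the engine back to
    // a more general (and slower) representation.
    var value = 42;              // starts life as a number...
    value = "forty-two";         // ...and is later reassigned a string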

Many of the fundamental precepts of compiler design are rooted in philosophies that emerged at a time when computers themselves were considerably slower, and hence much less sophisticated. They used a very deterministic model in which the worst-case scenarios about type and performance were the ones coded for, unless the programmer specifically supplied optimization hints (such as explicitly declaring type). The idea that you could improve performance by optimizing code with stochastic methods - looking at the most likely scenarios rather than the worst-case ones - was in general considered expensive and inefficient, since it required that the interpreter itself run optimizations on the fly (with the associated performance hits and memory overhead).

Recently, however, this attitude has been changing. Emerging "interpreters" now keep track of how often a given set of instructions is called, or how often the implicit type of a variable changes, and when it becomes evident that a pattern is in fact emerging, they attempt to compile and optimize the sequence or data assignment in question.

The sequences being watched in the Mozilla case are called "trace trees", where a tree in this case is a directed graph (or block of commands) within a loop. This works fine if the sequence in question is just one set of instructions after another, but if a conditional expression (an if-then-else statement, for instance) occurs and both the true and false branches of that expression end up being taken, the trace tree is essentially rebuilt with the new assumptions in place, and the trace engine goes back to watching the process. Similarly, if a variable does get reassigned to a different data type on the fly, the engine rebuilds the appropriate trace with the assumption that the variable in question is of a more generalized type.
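To make that concrete, here is a hedged sketch of the kind of loop a tracing engine watches (the items array is an assumption for illustration):

    // Hypothetical hot loop: "items" is an assumed array of objects with
    // price and discounted properties. If discounted is false for the first
    // several hundred iterations, only the else-branch gets recorded in the
    // trace; the first time the true-branch runs, the trace tree has to be
    // extended (or rebuilt) to cover it.
    var sum = 0;
    for (var i = 0; i < items.length; i++) {
        if (items[i].discounted) {
            sum += items[i].price * 0.9;
        } else {
            sum += items[i].price;
        }
    }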

You might expect that this process of tearing down and rebuilding trace trees would prove hideously slow and have a major impact upon performance, but as it turns out this is where the stochastic nature of programming comes into play. Such events are comparative outliers - a trace tree compiler may have to perform such actions nine or ten times over the course of running the script, but each time it can take advantage of the existing optimizations to make the process faster and more streamlined.

The trace tree approach works especially well with loops within loops, in essence by optimizing upon the inner loop while simultaneously holding enough state in the outer loop to switch to it as an optimization path if necessary. This technique is also used in order to better support recursive calls, which can be computationally expensive.
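A classic case is the nested loop at the heart of matrix code (one of the benchmark areas cited above). A minimal sketch, assuming square arrays-of-arrays:

    // Naive matrix multiply over arrays-of-arrays. The innermost k loop is
    // the hottest trace; the engine optimizes it first while keeping enough
    // state about the j and i loops to promote them as well.
    function multiply(a, b, n) {
        var c = [];
        for (var i = 0; i < n; i++) {
            c[i] = [];
            for (var j = 0; j < n; j++) {
                var sum = 0;
                for (var k = 0; k < n; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }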

The trace tree engine also keeps track of potentially overlong or complex trees, and when such a condition emerges, the engine "unwinds" the tree at that stage and rebuilds it with the current state conditions. This keeps the trace trees manageable in memory, at the cost of a fairly small performance hit. Finally, the technique is also able to identify those particular programmatic constructs that just shouldn't be optimized, and keeps those blocks separate - they run at the interpreted speed, not the compiled speed.
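The article doesn't enumerate which constructs land in that "don't optimize" bucket, but dynamic features such as eval are plausible examples, since the code they introduce can't be known when the trace is recorded. A hedged sketch:

    // The arithmetic loop is an obvious candidate for tracing; the eval()
    // call constructs code at runtime, which is the sort of construct an
    // engine is likely to leave running at interpreted speed.
    var squares = [];
    for (var i = 0; i < 1000; i++) {
        squares[i] = i * i;
    }
    var expr = "3 + 4 * 2";      // hypothetical dynamically-built expression
    var result = eval(expr);     // evaluates to 11, but can't be traced ahead of time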

The approach taken by Google Chrome's V8 compiler differs in a number of key ways. While TraceMonkey is a JIT JavaScript compiler targeting an intermediate virtual machine, V8 compiles JavaScript directly into native machine code before execution, and as such is currently optimized only for Intel and ARM chips. It should be pointed out here, however, that each browser tab in Chrome runs in its own process, effectively as its own virtual machine, so this distinction is not as clear-cut as it may appear on the surface.

For all these differences, both V8 and TraceMonkey - along with WebKit's SquirrelFish JavaScript VM, which will likely benefit from similar techniques moving forward - have also revolutionized their approaches to garbage collection. Garbage collection is one of the biggest headaches of JavaScript development, in great part because it is fairly difficult, using current techniques, to completely eliminate leaked closures: cases where a block of code declares a variable that then, for one reason or another, remains in scope even as the web page that held it changes. This usually arises from event handlers that are never properly removed.

Most of the next generation of compilers are far better at tracking and tracing such closure variables, and during garbage collection are much better at identifying them when it is time to clean up the allocated memory. Not only does this mean that the applications themselves take up somewhat less memory (and are less likely to crash due to a memory overrun), it also makes it easier to create larger contiguous memory blocks, reducing the overhead necessary to process data held within those blocks.
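The leak pattern described above looks, in practice, something like this sketch (the element id and the cache array are hypothetical):

    // The click handler closes over "cache". If the panel is later removed
    // from the document but the handler is never unregistered, both the
    // handler and "cache" can stay reachable, and older collectors have
    // struggled to reclaim them.
    function wirePanel() {
        var panel = document.getElementById("report-panel");  // hypothetical id
        var cache = new Array(10000);                          // large structure kept alive
        panel.addEventListener("click", function () {
            alert("cached items: " + cache.length);
        }, false);
        // Cleanup would mean calling panel.removeEventListener with the same
        // handler (and dropping "cache") before the panel is discarded.
    }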


Building Web Super Apps

There has long been a debate in web development circles between those who feel that browser applications will end up becoming the dominant application platform and those who argue that browsers lack the speed, performance and underlying capabilities to handle this role, especially for "editing" applications (graphics, video and audio in particular) and games. The dramatic improvements now opening up in JavaScript will likely pose a serious challenge to the latter view.

One of the first applications likely to benefit from compiled JavaScript is the browser itself. The Mozilla Firefox browser is well known for being, in large part, a JavaScript application, combining XUL components with JavaScript glue code. The reality of this development process, however, is that much of the core architecture of Firefox is written in C++, while the JavaScript-bound components are typically relegated either to establishing user-interface bindings or to the add-on system. The comparatively slow performance of add-ons (as well as their frequently hideous memory profiles) is one of the big reasons why Firefox running with more than perhaps a dozen add-ons can slow to a crawl in a very short time.

With JIT JavaScript, however, add-on developers get performance that is on par with (or perhaps even better than) native C++ code, get much better managed garbage collection, and can do so in a language that is, in general, much easier for a novice developer to pick up than C++.

Yet this same philosophy is also being pushed within the Mozilla development process itself. According to an interview I had with Brendan Eich back in June 2008, it is very likely that Firefox 3.1 will be the proving ground for a JIT JavaScript architecture, and Firefox 4.0 in turn will likely be rebuilt from the ground up using JavaScript, with C++ relegated to only the most critical core components.

One upshot of this is that there should be a four- to six-fold increase in graphics API speed, for both raster and vector operations. While this likely won't put Firefox at a point where it can replace Adobe Photoshop for sheer processing power, it is very likely to give Adobe Flash a run for its money (though to make matters even more interesting, Adobe is also looking at incorporating many of the same algorithms into its ActionScript engine).

That this has obvious implications for Scalable Vector Graphics (SVG) and the Canvas component goes without saying, but it's also likely that the graphical improvements deriving from compiled JavaScript will make interfacing with graphics chips' 3D libraries (through a wrapper layer around OpenGL or Direct3D) much more feasible as well. It's worth noting that one of the biggest limitations in getting the SVG animation layer built has been the sluggishness of the JavaScript implementations, rather than the performance of the Cairo graphics layer per se (which has in general been quite impressive); with improvements being made here, it's far more likely that this final piece of SVG can be completed.

Another arena where advanced JavaScript capabilities will make a big difference is the speed of DOM manipulation. AJAX components are all predicated upon the speed of JavaScript, and especially when you get pages with large numbers of such components interacting with one another, performance can prove truly frustrating. This is especially noticeable with components such as the XForms add-on, which varies in speed from marginally acceptable to hideously slow depending upon the complexity of the model and the number of components bound to that model. RDF manipulation similarly suffers from poor performance that will likely see considerable improvement with the newer JavaScript model (as will most things involving XML manipulation).
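The sort of DOM-heavy work that benefits is not exotic; even routine code like the following sketch (the element id is hypothetical) spends most of its time in the JavaScript engine and the DOM bindings:

    // Rebuild a long list from a data array: hundreds or thousands of
    // createElement/appendChild calls per update is typical of data-bound
    // AJAX components such as the XForms add-on mentioned above.
    function renderList(rows) {
        var list = document.getElementById("results");   // hypothetical element id
        list.innerHTML = "";                              // clear existing entries
        for (var i = 0; i < rows.length; i++) {
            var li = document.createElement("li");
            li.appendChild(document.createTextNode(rows[i]));
            list.appendChild(li);
        }
    }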

Finally, all of these things should be seen in the context of other innovations going on in these browsers, especially the incorporation of offline databases such as Google Gears and Firefox's SQLite store. These make it possible to persist information between sessions more effectively than cookies can, and to offload potentially large data structures safely.
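As a rough sketch of what that offline persistence looks like with Gears (assuming the gears_init.js bootstrap script has been loaded, and using a made-up database and table name):

    // Open (or create) a local database and persist a value between sessions.
    var db = google.gears.factory.create("beta.database");
    db.open("session-store");
    db.execute("create table if not exists Settings (name text, value text)");
    db.execute("insert into Settings values (?, ?)", ["lastQuery", "trace trees"]);

    var rs = db.execute("select value from Settings where name = ?", ["lastQuery"]);
    while (rs.isValidRow()) {
        // rs.field(0) holds the persisted value
        rs.next();
    }
    rs.close();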

You can play with the new TraceMonkey capabilities in the nightly Mozilla builds for Firefox 3.1. Because the trace tree engine is still a little buggy, it is currently disabled by default, but you can enable it once you install Firefox 3.1 by typing about:config in the location bar and then setting javascript.options.jit.content to true. You can also try out Google Chrome, which ships with V8 enabled.

Persistence, performance, rich APIs and increasing broadband connectivity are all likely to make a huge difference for this latest generation of browsers, and the quantum improvement of JavaScript capabilities due to Trace Trees and precompiled JavaScript will likely play a major part in that evolution.


Kurt Cagle is Online Editor for O'Reilly Media. He lives in Victoria, British Columbia.




