#+TITLE: UglifyJS -- a JavaScript parser/compressor/beautifier
#+KEYWORDS: javascript, js, parser, compiler, compressor, mangle, minify, minifier
#+DESCRIPTION: a JavaScript parser/compressor/beautifier in JavaScript
#+STYLE:
#+AUTHOR: Mihai Bazon
#+EMAIL: mihai.bazon@gmail.com
* UglifyJS -- a JavaScript parser/compressor/beautifier

This package implements a general-purpose JavaScript
parser/compressor/beautifier toolkit. It is developed on [[http://nodejs.org/][NodeJS]], but with
minimal changes it should work on any JavaScript platform (the only
Node-specific parts are the use of =exports= and =JSON.stringify=, although
the latter is quite portable now).

The tokenizer/parser generates an abstract syntax tree (AST) from JS code. You
can then traverse the AST to learn more about the code, or do various
manipulations on it. This part is implemented in [[../lib/parse-js.js][parse-js.js]] and is a
port to JavaScript of the excellent [[http://marijn.haverbeke.nl/parse-js/][parse-js]] Common Lisp library by [[http://marijn.haverbeke.nl/][Marijn
Haverbeke]].

(See [[http://github.com/mishoo/cl-uglify-js][cl-uglify-js]] if you're looking for the Common Lisp version of
UglifyJS.)

The second part of this package, implemented in [[../lib/process.js][process.js]], inspects and
manipulates the AST generated by the parser to provide the following:

- ability to re-generate JavaScript code from the AST. The output can
  optionally be indented, so you can use this to “beautify” a program that
  has been compressed and inspect its source. You can also run our code
  generator to print out an AST without any whitespace, which achieves
  compression as well.

- shorten variable names (usually to single characters). Our mangler will
  analyze the code and generate proper variable names, depending on scope
  and usage, and is smart enough to deal with globals defined elsewhere, or
  with =eval()= calls or =with{}= statements. In short, if =eval()= or
  =with{}= is used in some scope, then all variables in that scope and any
  variables in the parent scopes will remain unmangled, and any references
  to such variables remain unmangled as well.

- various small optimizations that may lead to faster code but certainly
  lead to smaller code. Where possible, we do the following:

  - foo["bar"]  ==>  foo.bar

  - remove block brackets ={}=

  - join consecutive var declarations:
    var a = 10; var b = 20; ==> var a=10,b=20;

  - resolve simple constant expressions: 1 + 2 * 3 ==> 7. We only do the
    replacement if the result occupies fewer bytes; for example 1/3 would
    translate to 0.333333333333, so in this case we don't replace it.

  - consecutive statements in blocks are merged into a sequence; in many
    cases, this leaves blocks with a single statement, so then we can remove
    the block brackets.

  - various optimizations for IF statements:

    - if (foo) bar(); else baz(); ==> foo?bar():baz();
    - if (!foo) bar(); else baz(); ==> foo?baz():bar();
    - if (foo) bar(); ==> foo&&bar();
    - if (!foo) bar(); ==> foo||bar();
    - if (foo) return bar(); else return baz(); ==> return foo?bar():baz();
    - if (foo) return bar(); else something(); ==> {if(foo)return bar();something()}

  - remove some unreachable code and warn about it (code that follows a
    =return=, =throw=, =break= or =continue= statement, except
    function/variable declarations).

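
The size check behind the constant-expression rule can be sketched in a few
lines. This only illustrates the rule as stated above and is not the code
UglifyJS runs; =foldIfShorter= is a hypothetical helper that receives the
source text of a constant expression together with its already computed
value.

#+BEGIN_SRC espresso
// Illustrative only: a hypothetical helper, not a function of UglifyJS.
// `source` is the text of a constant expression, `value` its computed result.
function foldIfShorter(source, value) {
    var folded = String(value);
    // Replace only when the printed result takes fewer characters.
    return folded.length < source.length ? folded : source;
}

foldIfShorter("1+2*3", 1 + 2 * 3); // "7": one character instead of five
foldIfShorter("1/3", 1 / 3);       // "1/3": the decimal expansion is longer
#+END_SRC
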
** Usage

The helper script =bin/uglifyjs= uses the library to compress a script
with the maximum compression settings. Synopsis:

#+BEGIN_SRC sh
uglifyjs [ options... ] [ filename ]
#+END_SRC

=filename= should be the last argument and should name the file from which
to read the JavaScript code. If you don't specify it, the code is read from
STDIN.

Supported options:

- =-b= or =--beautify= -- output indented code; when passed, additional
  options control the beautifier:

  - =-i N= or =--indent N= -- indentation level (number of spaces)

  - =-q= or =--quote-keys= -- quote keys in literal objects (by default,
    only keys that cannot be identifier names will be quoted).

- =-nm= or =--no-mangle= -- don't mangle variable names

- =-ns= or =--no-squeeze= -- don't call =ast_squeeze()= (which does various
  optimizations that result in smaller, less readable code).

- =-mt= or =--mangle-toplevel= -- mangle names in the toplevel scope too
  (by default we don't do this).

- =--no-seqs= -- when =ast_squeeze()= is called (that is, unless you pass
  =--no-squeeze=) it will reduce consecutive statements in blocks into a
  sequence. For example, "a = 10; b = 20; foo();" will be written as
  "a=10,b=20,foo();". In many cases, this allows us to discard the
  block brackets (since the block becomes a single statement). This
  optimization is on by default because it seems safe and saves a few
  hundred bytes on some libs that I tested it on; pass =--no-seqs= to
  disable it.

- =--no-dead-code= -- by default, UglifyJS will remove code that is
  obviously unreachable (code that follows a =return=, =throw=, =break= or
  =continue= statement and is not a function/variable declaration). Pass
  this option to disable this optimization.

- =-nc= or =--no-copyright= -- by default, =uglifyjs= will keep the initial
  comment tokens in the generated code (assumed to be copyright information
  etc.). If you pass this option, they are discarded.

- =-o filename= or =--output filename= -- put the result in =filename=. If
  this isn't given, the result goes to standard output (or see
  =--overwrite= below).

- =--overwrite= -- if the code is read from a file (not from STDIN) and you
  pass =--overwrite=, then the output will be written to the same file.

- =--ast= -- pass this if you want to get the Abstract Syntax Tree instead
  of JavaScript as output. Useful for debugging or learning more about the
  internals.

- =-v= or =--verbose= -- output some notes on STDERR (for now just how long
  each operation takes).

- =--extra= -- enable additional optimizations that have not yet been
  extensively tested. These might, or might not, break your code. If you
  find a bug using this option, please report a test case.

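
The effect of the sequence transformation that =--no-seqs= disables can be
checked by hand. The two functions below are hand-written stand-ins (their
names are made up, and the second one is not actual UglifyJS output) for a
block before and after its statements are merged. Once the block holds a
single expression statement, its brackets are no longer needed.

#+BEGIN_SRC espresso
// Hand-written "before" and "after" versions for illustration only.
function before(flag) {
    var a = 0, b = 0, calls = 0;
    function foo() { calls++; }
    if (flag) {
        a = 10;
        b = 20;
        foo();
    }
    return [a, b, calls];
}

function after(flag) {
    var a = 0, b = 0, calls = 0;
    function foo() { calls++; }
    // Three statements became one comma expression, so no brackets are needed.
    if (flag) a = 10, b = 20, foo();
    return [a, b, calls];
}
#+END_SRC

Both versions return the same values for any =flag=, which is why the
rewrite is considered safe.
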
*** API

Symlink the =lib= directory as =~/.node_libraries/uglifyjs=, so that the
=require= calls in the following sample will work:

#+BEGIN_SRC espresso
var jsp = require("uglifyjs/parse-js");
var pro = require("uglifyjs/process");
var orig_code = "... JS code here";
var ast = jsp.parse(orig_code); // parse code and get the initial AST
ast = pro.ast_mangle(ast); // get a new AST with mangled names
ast = pro.ast_squeeze(ast); // get an AST with compression optimizations
var final_code = pro.gen_code(ast); // compressed code here
#+END_SRC

The above performs the full compression that is currently possible. As you
can see, there is a sequence of steps that you can apply selectively. For
example, if you want compressed output but for some reason don't want to
mangle variable names, simply skip the line that calls
=pro.ast_mangle(ast)=.

Some of these functions take optional arguments. Here's a description:

- =jsp.parse(code, strict_semicolons)= -- parses JS code and returns an AST.
  =strict_semicolons= is optional and defaults to =false=. If you pass
  =true= then the parser will throw an error when it expects a semicolon and
  doesn't find one. For most JS code you don't want that, but it's useful
  if you want to strictly sanitize your code.

- =pro.ast_mangle(ast, do_toplevel)= -- generates a new AST containing mangled
  (compressed) variable and function names. By default it doesn't touch the
  names defined in the toplevel scope, but if you pass =true= as the second
  argument it will compress them as well.

- =pro.ast_squeeze(ast, options)= -- employs further optimizations designed
  to reduce the size of the code that =gen_code= would generate from the
  AST. Returns a new AST. =options= can be a hash; the supported options
  are:

- =pro.gen_code(ast, beautify)= -- generates JS code from the AST. By
  default it's minified, but if you pass =true= for the second argument it
  will be nicely formatted and indented. Additionally, you can control the
  behavior by passing a hash for =beautify=, where the following options are
  supported (below you can see the default values):

  - =indent_start: 0= -- initial indentation in spaces
  - =indent_level: 4= -- indentation level, in spaces (pass an even number)
  - =quote_keys: false= -- if you pass =true= it will quote all keys in
    literal objects

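
The optional arguments can be combined into small wrappers. The sketch
below uses only the calls documented in this section. The wrapper names
=squeezeOnly= and =beautify= are hypothetical, and =jsp= and =pro= are
passed in as arguments so that the functions do not depend on how the
library was installed.

#+BEGIN_SRC espresso
// Hypothetical wrappers; jsp and pro are the modules loaded in the sample
// above (require("uglifyjs/parse-js") and require("uglifyjs/process")).

// Compress without renaming anything: ast_mangle is simply not called.
function squeezeOnly(jsp, pro, code) {
    var ast = jsp.parse(code);   // strict_semicolons defaults to false
    ast = pro.ast_squeeze(ast);  // compression optimizations only
    return pro.gen_code(ast);    // minified output
}

// Re-indent code with 2 spaces per level and quote all object keys.
function beautify(jsp, pro, code) {
    var ast = jsp.parse(code);
    return pro.gen_code(ast, { indent_level: 2, quote_keys: true });
}
#+END_SRC

With the symlink described above in place, calling =squeezeOnly= with the
two required modules and a string of source code should return compressed
code in which the original variable names are kept.
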
*** Beautifier shortcoming -- no more comments

The beautifier can be used as a general-purpose indentation tool. It's
useful when you want to make a minified file readable. One limitation,
though, is that it discards all comments, so you don't really want to use it
to reformat your code unless you don't have, or don't care about, comments.

In fact it's not the beautifier that discards comments: they are dropped at
the parsing stage, when we build the initial AST. Comments don't really
make sense in the AST, and while we could add nodes for them, it would be
inconvenient because we'd have to add special rules to ignore them at all
the processing stages.

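
To see how comments can disappear before any later stage runs, consider a
toy tokenizer. This is a deliberately simplified sketch and not parse-js:
the name =toyTokens= is made up, and string and regular-expression literals
are ignored. It skips comments in the same loop that skips whitespace, so
nothing built from its output can know that the comments existed.

#+BEGIN_SRC espresso
// Toy tokenizer for illustration; it does NOT handle string or regex literals.
function toyTokens(code) {
    var tokens = [];
    // line comment | block comment | whitespace | identifier | number | any other char
    var re = /\/\/[^\n]*|\/\*[\s\S]*?\*\/|\s+|[A-Za-z_$][\w$]*|\d+|[^\s\w]/g;
    var m;
    while ((m = re.exec(code)) !== null) {
        var text = m[0];
        // Whitespace and comments are dropped here and never become tokens.
        if (/^\s/.test(text) || /^\/[\/*]/.test(text)) continue;
        tokens.push(text);
    }
    return tokens;
}

toyTokens("var a = 1; // note\n/* block */ a++;");
// returns ["var", "a", "=", "1", ";", "a", "+", "+", ";"]
#+END_SRC
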
** Compression -- how good is it?

(XXX: this is somewhat outdated. On the jQuery source code we beat Closure
by 168 bytes (560 after gzip) and by many seconds.)

There are a few popular JS minifiers nowadays -- the two best known
being the Google Closure compiler (GCL) and the YUI compressor. For some
reason they are both written in Java. I didn't really hope to beat either of
them, but finally I did -- UglifyJS compresses better than the YUI
compressor, and more safely than Google Closure.

I tested it on two big libraries. [[http://www.dynarchlib.com/][DynarchLIB]] is my own, and it's big enough
to contain probably all the JavaScript tricks known to mankind. [[http://jquery.com/][jQuery]] is
definitely the most popular JavaScript library (to some people, it's
synonymous with JavaScript itself).

I cannot swear that there are no bugs in the generated code, but it
appears to work fine.

Compression results:

| Library | Orig. size | UglifyJS | YUI | GCL |
|------------+------------+----------+----------------+------------------------|
| DynarchLIB | 636896 | 241441 | 246452 (+5011) | 240439 (-1002) (buggy) |
| jQuery | 163855 | 72006 | 79702 (+7696) | 71858 (-148) |
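
The parenthesized figures in the table are byte differences relative to the
UglifyJS column. A short script (the variable and function names are
illustrative) reproduces them from the sizes in the table and can also
express each result as a fraction of the original size:

#+BEGIN_SRC espresso
// Sizes in bytes, copied from the table above.
var results = {
    DynarchLIB: { orig: 636896, uglify: 241441, yui: 246452, gcl: 240439 },
    jQuery:     { orig: 163855, uglify: 72006,  yui: 79702,  gcl: 71858 }
};

// Difference of another tool's output against UglifyJS (positive = larger).
function delta(lib, tool) {
    return results[lib][tool] - results[lib].uglify;
}

// Output size as a fraction of the original source size.
function ratio(lib, tool) {
    return results[lib][tool] / results[lib].orig;
}

delta("DynarchLIB", "yui"); // 5011
delta("jQuery", "gcl");     // -148
#+END_SRC
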

UglifyJS is the fastest to run. On my laptop UglifyJS takes 1.35s for
DynarchLIB, while YUI takes 2.7s and GCL takes 6.5s.

Google Closure does a lot of smart-ass optimizations. I had to strive really
hard to get close to it. It should be possible to even beat it, but then
again, GCL has a gazillion lines of code and runs terribly slowly, so I'm not
sure it's worth spending the effort to save a few bytes. Also, GCL doesn't
cope with =eval()= or =with{}= -- it just dumps a warning and proceeds to
mangle names anyway; my DynarchLIB compiled with it is buggy because of
this.

UglifyJS consists of about 1100 lines of code for the tokenizer/parser, and
about 1100 lines for the compressor and code generator. That should make it
very maintainable and easily extensible, so I would say it has a good place
in this field and it's bound to become the de-facto standard JS minifier.
And I shall rule the world. :-) Use it, and spread the word!

** Bugs?

Unfortunately, for the time being there is no automated test suite. But I
ran the compressor manually on non-trivial code, and then I tested that the
generated code works as expected. A few hundred times.

DynarchLIB was started at a time when there was no good JS minifier.
Therefore I was quite religious about trying to write short code manually,
and as such DL contains a lot of syntactic hacks[1] such as “foo == bar ? a
= 10 : b = 20”, though the more readable version would clearly be to use
“if/else”.

Since the parser/compressor runs fine on DL and jQuery, I'm quite confident
that it's solid enough for production use. If you can identify any bugs,
I'd love to hear about them ([[http://groups.google.com/group/uglifyjs][use the Google Group]] or email me directly).

[1] I even reported a few bugs and suggested some fixes in the original
[[http://marijn.haverbeke.nl/parse-js/][parse-js]] library, and Marijn pushed fixes literally in minutes.

** License
UglifyJS is released under a ZLIB-like license:
#+BEGIN_EXAMPLE
Copyright 2010 (c) Mihai Bazon mihai.bazon@gmail.com
Parser based on parse-js (http://marijn.haverbeke.nl/parse-js/).

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.

Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must
   not claim that you wrote the original software. If you use this
   software in a product, an acknowledgment in the product
   documentation would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and must
   not be misrepresented as being the original software.

3. This notice may not be removed or altered from any source
   distribution.
#+END_EXAMPLE