Oops, got totally carried away. This will crunch and obfuscate a javascript. It has
been tested on one 25k javascript and did not break it but... that's not what you'd
call intensive QA. If anyone wants to test it and offer some feedback (i.e. broke script
or didn't break script), that would be useful. This probably has some real utility,
as it condensed a script I use a lot from 25k to 8k, which is a way faster download. I was
amazed that comments/whitespace/function and var names make up fully 2/3 of the code.
Note that the script lists the old and new function names so you can
modify the HTML as required. You could automate this pretty easily
with HTML::Parser.
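A rough sketch of that automation, for what it's worth. It does not use HTML::Parser: it is a plain regex pass, which is my own simplification to keep the example dependency-free, so it will rewrite anything that looks like a call to a renamed function whether or not it sits in an event handler. The helper names parse_rename_table and rename_calls are hypothetical, and the sample report and HTML are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse report lines such as "validateForm()\t=>\ta()" into an old => new map.
sub parse_rename_table {
    my ($report) = @_;
    my %map;
    while ( $report =~ /^(\w+)\(\)\s*=>\s*(\w+)\(\)\s*$/mg ) {
        $map{$1} = $2;
    }
    return %map;
}

# Rewrite call sites of the form "oldname(" in a chunk of HTML.
# Assumption: a regex is good enough here; a real parser would be safer.
sub rename_calls {
    my ( $html, %map ) = @_;
    return $html unless %map;
    my $names = join '|', map { quotemeta($_) } keys %map;
    $html =~ s/\b($names)(?=\s*\()/$map{$1}/g;
    return $html;
}

my $report = "validateForm()\t=>\ta()\nshowHelp()\t=>\tb()\n";
my %map    = parse_rename_table($report);
my $html   = q{<form onsubmit="return validateForm(this)"><a href="#" onclick="showHelp()">?</a></form>};
print rename_calls( $html, %map ), "\n";
```

The lookahead only matches a name that is followed by an opening parenthesis, so plain text that happens to contain a function name is left alone.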
Update 1
Modified original code to deal with quoted strings appropriately and added a hint of documentation
Update 2
Modified to put each function on its own line as intended.
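On the quoted-string handling from Update 1: the chunk() sub in the script walks the source one character at a time and treats the next matching quote as the end of the string, so an escaped quote such as \" ends the chunk early. A minimal sketch of a regex-based alternative that honours backslash escapes is below. The name chunk_escaped is hypothetical, and the sketch still assumes that quotes appear only in string literals, not in comments or javascript regex literals.

```perl
use strict;
use warnings;

# Split source into alternating code and string-literal chunks.
# A backslash inside a string escapes the character that follows it.
sub chunk_escaped {
    my ($src) = @_;
    my @chunks = $src =~ m{
        (
            "(?:[^"\\]|\\.)*"      # double-quoted string
          | '(?:[^'\\]|\\.)*'      # single-quoted string
          | [^"']+                 # run of unquoted code
          | ["']                   # stray quote with no partner
        )
    }gxs;
    return @chunks;
}

my @chunks = chunk_escaped(q{a = "x \" y" + 'b'});
print "[$_]\n" for @chunks;
```

As with chunk(), joining the returned chunks gives back the original text, so it could be dropped into the same "skip chunks that start with a quote" loops.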
#!/usr/bin/perl -w
use strict;
my ( $data, %funcs, %globals );
open JS, '<', $ARGV[0] or die "Usage: $0 <file>\nCan't open '$ARGV[0]': $!\n";
local $/;
$data = <JS>;
close JS;
my $was = length $data;
$data =~ s|//.*\n|\n|g; # strip single line comments
$data =~ s|/\*.*?\*/||gs; # strip multi-line comments
$data =~ s|^\s*\n||gm; # strip blank lines
$data =~ s|^\s+||gm; # strip leading whitespace
$data =~ s|\s+$||gm; # strip trailing whitespace
# do the magic and split into functions
my @functions = split /(?=\bfunction\s)/, $data;
# map function names to new names starting with 'a'
my $name = 'a';
for (@functions) {
$funcs{$1} = $name++ if m/^function\s+(\w+)/;
}
# modify all the function names
my $funcs = join '|', keys %funcs;
my $func_sub = qr/\b($funcs)\b/;
s|$func_sub|$funcs{$1}|g for @functions;
# modify all the global vars
# as we have split on function keyword these should
# be in the first element of our function array
unless ($functions[0] =~ /^function/) {
my @globals = $functions[0] =~ m/var\s+(\w+)/g;
if (@globals) {
$globals{$_} = $name++ for @globals;
my $globals = join '|', keys %globals;
my $global_sub = qr/\b($globals)\b/;
for my $func (@functions) {
my @chunks = chunk($func);
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s|$global_sub|$globals{$1}|g;
}
$func = join '', @chunks;
}
}
}
# modify all the scoped vars continuing var names on from func/global names
my $end_globals = $name;
for my $func (@functions) {
next unless $func =~ m/^function/;
my ( @locals, %locals );
$name = $end_globals; # each function can use the same local names
@locals = $func =~ m/var\s+(\w+)/g;
my ($local) = $func =~ m/function\s+\w+\s*\(([^\)]+)/;
if ($local) {
$local =~ s/\n|\s//g;
push @locals, split ',', $local;
}
for my $var (@locals) {
next unless $var;
$locals{$var} = $name++;
}
next unless keys %locals;
my $locals = join '|', keys %locals;
my $local_sub = qr/\b($locals)\b/;
my @chunks = chunk($func);
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s|$local_sub|$locals{$1}|g;
}
$func = join '', @chunks;
}
# do some initial condensation around curlies
for (@functions) {
s/\n\{/{/g;
s/\n\}/}/g;
}
# now every exposed line ending should end in a ; { or } if we are to safely
# condense this down by removing newlines - we add the ; if it is missing
for my $func (@functions) {
my @lines = split "\n", $func;
for (@lines) {
$_.= ";" unless m/(?:}|{|;)$/;
}
$func = join '', @lines;
$func .= "\n";
}
# remove whitespace around all operators
my @operators = qw# + - * / = == != < > <= >= ( ) [ ] { } ? ; : #;
push @operators, ','; # need to do it this way to avoid warnings
$_ = quotemeta $_ for @operators;
my $operator_sub = join '|', @operators;
$operator_sub = qr/($operator_sub)/;
for my $func (@functions) {
my @chunks = chunk($func);
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s#[ \t]+$operator_sub#$1#g;
s#$operator_sub[ \t]+#$1#g;
}
$func = join '', @chunks;
}
# ta da time to print out the results
# first display a list of modified function names.
# Note: any function called in the html will have to be modified accordingly!
print "New function names are:\nwas called\t=>\tis now called\n";
print "$_()\t=>\t$funcs{$_}()\n" for keys %funcs;
print "\n";
# print out the modified code
print @functions;
# a few stats just for the hell of it
print "\n\nLength change:\n";
my $is = length join '', @functions;
printf "Originally %d bytes now %d bytes or %2d%% of original size\n",
$was, $is, ($is/$was)*100;
exit;
# this sub splits a function into quoted and unquoted chunks
sub chunk {
my $func = shift;
my @chunks;
my $chunk = 0;
my $found_quote = '';
for (split //, $func) {
# look for opening quote
if (/'|"/ and ! $found_quote) {
$found_quote = $_;
$chunk++;
$chunks[$chunk] = $_;
next;
}
# look for corresponding closing quote
if ( $found_quote and /$found_quote/ ) {
$found_quote = '';
$chunks[$chunk] .= $_;
$chunk++;
next;
}
# no quotes so just add to current chunk
$chunks[$chunk] .= $_;
}
# strip whitespace from unquoted chunks;
for (@chunks) {
next if m/^(?:"|')/; # leave quoted strings alone
s/^[ \t]+|[ \t]+$//g;
}
return @chunks;
}
=head1 NAME

javastrip.pl - a Perl script to obfuscate and condense javascript code

Version 0.0000001

=head1 SYNOPSIS

    javastrip.pl <file>

where file is the raw javascript only, not HTML.

Output is to STDOUT so send it wherever you want with a > redirect:

    javastrip.pl infile.js > outfile.js

Make a backup of the original file first. Keep the backup. The process is
irreversible.

=head1 DESCRIPTION

This script is primarily designed to munge .js files. It will process any
pure javascript but was not designed to process javascript embedded in
HTML.

It processes a javascript in several stages. The first stage is to remove
all comments, blank lines and leading/trailing whitespace. This is a fairly
safe thing to do and should not break scripts.

The next stage is rather more dangerous. All the functions are renamed. The
first function found will be renamed a(), the next b() and so on. All global
vars are similarly renamed to single (if not all used up) letter names that
follow in sequence from the function names. Finally all local function vars
are renamed starting with the letter immediately after the last global.

The net result is that all the functions and variables will now have 1-2
letter meaningless names. There is plenty of scope for disaster here but I
do not have enough javascript on hand to detect any. The algorithm works OK
on my style of javascripting.

The final stage is to condense the script down. Each function is written to
a single line. All excess whitespace around operators is stripped out. This
is a fairly safe stage too. Literal newlines in strings will be stripped if
you are using them. "\n" is just fine.

If it breaks a script you can comment out different sections to see which
process is to blame.

=head2 BUGS

Bound to be. This script was knocked up over a couple of hours and has had
minimal testing. The algorithm works OK on my style of javascripting but my
javascript looks a lot like Perl. Email me any scripts that break when you
javastrip them and I'll see if I can patch it.

=head1 AUTHOR

tachyon aka Dr James Freeman E<lt>jfreeman@tassie.net.auE<gt>

=head1 LICENSE

This package is free software; you can redistribute it and/or modify it under
the terms of the "GNU General Public License".

=head1 DISCLAIMER

This script is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the "GNU General Public License" for more details.

=cut
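One caveat on the first stage: the two comment-stripping substitutions in the script also fire inside string literals, so a line containing a string such as "http://x.com" loses everything from the // to the end of the line. A minimal sketch of a string-aware variant follows. The name strip_comments is hypothetical, and the sketch assumes the source has no javascript regex literals containing quotes or //, which it would still get wrong.

```perl
use strict;
use warnings;

# Remove // and /* */ comments in a single pass. String literals are
# matched first and put back unchanged, so comment markers inside them survive.
sub strip_comments {
    my ($src) = @_;
    $src =~ s{
        ( "(?:[^"\\\n]|\\.)*" | '(?:[^'\\\n]|\\.)*' )   # string literal, kept
      | //[^\n]*                                        # line comment, dropped
      | /\*.*?\*/                                       # block comment, dropped
    }{ defined $1 ? $1 : '' }gxse;
    return $src;
}

my $js = qq{var u = "http://x.com"; // note\n/* c */x = 1;};
print strip_comments($js), "\n";
```

Because the string alternative comes first, the regex engine consumes a whole literal before it ever gets a chance to see the // inside it.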
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print