http://qs321.pair.com?node_id=742814

anxara has asked for the wisdom of the Perl Monks concerning the following question:

I have a Perl script that is given HTML (say, in $html) and needs to convert it into a form suitable for a .js (JavaScript) file that will be loaded via a <script type="text/javascript" src="myfile.js"> tag. My problem is that if $html contains JavaScript code instead of plain HTML, that JavaScript code does NOT need to be converted (since it is already JavaScript). For instance, to display a Google AdSense ad via this .js file, my script would have to parse the following (contained in $html):

<script type="text/javascript"><!--
google_ad_client = "0123456789";
google_alternate_color = "FFFFFF";
google_ad_width = 120;
google_ad_height = 90;
google_ad_format = "120x90_0ads_al_s";
//2007-05-16: adsense4u
google_ad_channel = "1328300801";
google_color_border = "FFFFFF";
google_color_bg = "FFFFFF";
google_color_link = "000000";
google_color_text = "000000";
google_color_url = "78B749";
//-->
</script>
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>

I want my script to completely remove the first <script> and </script> tags (but not the code in between), since what they enclose is pure JavaScript and should be kept intact and unparsed. The second <script> tag, however, has a "src" attribute, so I want that tag converted into a form suitable for a document.write() JS statement. After parsing, the result should look something like this:

<!--
google_ad_client = "0123456789";
google_alternate_color = "FFFFFF";
google_ad_width = 120;
google_ad_height = 90;
google_ad_format = "120x90_0ads_al_s";
//2007-05-16: adsense4u
google_ad_channel = "1328300801";
google_color_border = "FFFFFF";
google_color_bg = "FFFFFF";
google_color_link = "000000";
google_color_text = "000000";
google_color_url = "78B749";
//-->
document.write('<'+'script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">');
document.write('<'+'/script>');

In the end, my goal is to take the original HTML/JavaScript mix and convert it, via Perl, into a form suitable for a .js file. I am not sure where to start and would appreciate any help. Thanks!
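
The transformation described above could be sketched roughly as follows. This is only a sketch, assuming a regex-based scan is good enough for small, well-formed snippets like the AdSense example; a real parser such as HTML::Parser from CPAN would be more robust against unusual markup. The names html_to_js and js_quote are hypothetical and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: turn text into a single-quoted JavaScript string
# literal, splitting every "<" so no literal "<script" or "</script"
# appears in the generated file.
sub js_quote {
    my ($text) = @_;
    $text =~ s/\\/\\\\/g;      # escape backslashes first
    $text =~ s/'/\\'/g;        # then single quotes
    $text =~ s/\r?\n/\\n/g;    # raw newlines are not allowed in a JS string
    $text =~ s/</<'+'/g;       # "<script" becomes "<'+'script"
    return "'$text'";
}

# Hypothetical converter: inline <script> bodies pass through untouched,
# <script src=...> tags and plain HTML become document.write() calls.
sub html_to_js {
    my ($html) = @_;
    my $js = '';

    # The capture group makes split() return the script elements as well
    # as the text between them.
    my @parts = split m{(<script\b[^>]*>.*?</script\s*>)}is, $html;

    for my $part (@parts) {
        next if $part !~ /\S/;    # skip empty or whitespace-only pieces

        if ($part =~ m{\A<script\b([^>]*)>(.*?)</script\s*>\z}is) {
            my ($attrs, $body) = ($1, $2);
            if ($attrs =~ /\bsrc\s*=/i) {
                # External script: re-emit the tag through document.write().
                $js .= 'document.write(' . js_quote("<script$attrs>") . ");\n";
                $js .= 'document.write(' . js_quote('</script>') . ");\n";
            }
            else {
                # Inline script: drop only the tags, keep the code verbatim.
                $body =~ s/\A\s+|\s+\z//g;
                $js .= "$body\n";
            }
        }
        else {
            # Plain HTML between script elements.
            $part =~ s/\A\s+|\s+\z//g;
            $js .= 'document.write(' . js_quote($part) . ");\n";
        }
    }
    return $js;
}

# Example input, shortened from the snippet in the question.
my $html = <<'END_HTML';
<script type="text/javascript"><!--
google_ad_width = 120;
//-->
</script>
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
END_HTML

print html_to_js($html);
```

Splitting on whole <script> elements first is what lets the inline block be treated differently from the src one; the escaping step is then applied only to the pieces that end up inside document.write().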

Re: Parsing HTML code for formatting into JavaScript
by jasonk (Parson) on Feb 10, 2009 at 16:48 UTC

    If it were me, I'd start with Jemplate instead of doing whatever it is you are trying to do here...


    www.jasonkohles.com
    We're not surrounded, we're in a target-rich environment!
Re: Parsing HTML code for formatting into JavaScript
by moritz (Cardinal) on Feb 10, 2009 at 16:51 UTC
    What have you tried so far?

    Also, the last time I looked, it violated Google's terms of service to use anything but the unmodified AdSense code, and they can be very picky when you violate their TOS (i.e. they can, and probably will, block your account).

      This is what I have tried so far, but it can only format the second <script> tag into something that can be put into a document.write() statement:

      sub html_to_js_var {
          my($vars) = shift;

          # Declare
          my @html = split(/\n/, $vars->{'html'});
          my $var = $vars->{'js_var'};
          # A value of either 'single' or 'double' for ' or " respectively
          my $in_quotes = $vars->{'in_quotes'};

          # Format
          foreach my $line (@html) {
              if ($in_quotes eq "single") {
                  $line =~ s/\'/\\\'/g;
                  $line =~ s/\</\<\'\+\'/g;
              }
              elsif ($in_quotes eq "double") {
                  $line =~ s/\"/\\\"/g;
                  $line =~ s/\</\<\"\+\"/g;
              }
          }

          return @html;
      }

      It doesn't actually isolate the first <script> section but would instead parse everything passed to it. Btw, it would be called like so:

      my @js_code = html_to_js_var({
          "html"      => $html,
          "in_quotes" => "single"   # Could be "double"
      });

      What I am trying to do is display Google AdSense ads (or other ads) on my website and others, in a rotation with other ads. I know this is somehow possible (without violating Google's rules) b/c the PHP program OpenX ad server does it.
        What I am trying to do is display Google AdSense ads (or other ads) on my website and others, in a rotation with other ads. I know this is somehow possible (without violating Google's rules) b/c the PHP program OpenX ad server does it.

        It is possible with server-side scripts: for example, the server could keep a list of five ad snippets and randomly deliver one of them on each request. That way the client always sees the unmodified HTML+JS that Google gave you.

        It's also a bit simpler to implement, because it doesn't imply parsing HTML.
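
The server-side rotation suggested above might look roughly like this in Perl. It is a sketch under assumptions, not a tested ad server: it assumes a plain CGI script whose output is embedded in the page (for instance through an iframe or a server-side include), and the snippet list and the name pick_ad are hypothetical placeholders. Each snippet would be pasted in exactly as the ad network supplied it.

```perl
use strict;
use warnings;

# Hypothetical snippet list; each entry stands in for the unmodified
# HTML+JS handed out by an ad network.
my @ad_snippets = (
    '<!-- ad snippet 1, pasted unmodified -->',
    '<!-- ad snippet 2, pasted unmodified -->',
    '<!-- ad snippet 3, pasted unmodified -->',
);

# Return one snippet chosen uniformly at random, or undef for an empty list.
sub pick_ad {
    my (@snippets) = @_;
    return undef unless @snippets;
    return $snippets[ int rand @snippets ];
}

# Deliver the chosen snippet as-is; nothing is parsed or rewritten.
print "Content-Type: text/html\n\n";
print pick_ad(@ad_snippets), "\n";
```

Because the snippet is sent untouched, no HTML or JavaScript parsing is involved at all, which is the simplification the reply points out.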