
Perlmonks Code Proxy

by Ovid (Cardinal)
on Aug 19, 2000 at 00:22 UTC

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

If I want to copy some code from Perlmonks, I find that I cannot cut and paste directly from the Web page without losing formatting. What I usually do is view source, copy the code to a file, and run a script I wrote that converts that HTML into proper Perl code. What I would prefer, however, is to have a "Download this code" link after code snippets that are properly posted with <CODE></CODE> tags.

Since this feature is not available (or if it is, I'm not aware of it), I thought it would make a nice project to write a script that would do this for me. I don't know much about proxy servers or Web automation, so this is a learning experience for me.

The following is the first stab at the code (kind of a "proof of concept"). All this code is supposed to do is display a Web page with a (currently) non-functional download link after each CODE posting. Also, all HREF links are pointed back to this code. The problem lies in the regex and the while loop that it is in. When I run the code, it simply hangs. While running it through a debugger, it seems to identify matches in a random, non-sequential order, thus not permitting the while loop to end.

#!/usr/bin/perl -w
use strict;
use CGI;
use LWP::Simple;

my $query = new CGI;
my $basename = 'http://www.perlmonks.org/';
my $script = 'http://www.someserver.com/path/to/script.cgi';

# I track the actual URL as a hidden field in the HTML
my $url = defined $query->param('url') ? $query->param('url') : $basename;

# Default to $basename if no $url exists
my $content = get (defined $url ? $url : $basename);

# Add a hidden field with actual URL after <BODY> tag
$content =~ s!(<BODY[^>]*>)!$1<INPUT TYPE="hidden" NAME="basename" VALUE="$url">!;

# Have absolute paths go through this script
$content =~ s!href="$basename!href="$script?url=$basename!gi;

# Have relative paths go through this script
$content =~ s!href\s*=\s*"/!href="$script?url=$basename!gi;

# In the following regex, note the following:
# Code tags are translated as
#   <PRE><TT><font size="-1">...</font></TT></PRE>
#
# <font size=...> and </font> are optional. This is turned off if we use "Large font code"
# Quotes around -1 in the font tag are optional. They don't always exist.
# I discovered that in examining source for "Death to Dot Star!"
my $code_regex = '<PRE><TT>(?:<font size="?-1"?>)?([^<]+)(?:</font>)?</TT></PRE>';

# These will be used to create the download link
my $href1 = '<P><A HREF="' . $script . '?process=download&code=';
my $href2 = '&url=' . $url . '">Download this code</A><P>';

my $i = 0;
while ($content =~ m!($code_regex)!go) {
    my $match = $1;
    $content =~ s!$match!$match$href1$i$href2!;
    $i++;
}
print $query->header;
print $content;

I know this is probably something ridiculously simple that I have missed, but I am pulling my hair out over this. Any help would be appreciated.

Cheers,
Ovid

Incidentally, some of the regexes and code above work only because of the layout of Perlmonks. This should not be viewed as any sort of general purpose script.

Replies are listed 'Best First'.
Re: Perlmonks Code Proxy
by tye (Sage) on Aug 19, 2000 at 00:40 UTC
    while ($content =~ m!($code_regex)!go) {
        my $match = $1;
        $content =~ s!$match!$match$href1$i$href2!;
        $i++;
    }

    You can't use m//g in a scalar context when you are modifying the string you are matching against between scalar m//g's. I bet the $content =~ s!!! is resetting pos($content) to 0 each time, causing the infinite loop. The random order of matching is probably the debugger trying but not succeeding to restore complex regex context. In this case, debug with print statements. ):

            - tye (but my friends call me "Tye")
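
For anyone who wants to see the effect tye describes without a debugger, here is a minimal sketch. The string and variable names are made up for illustration. It records pos() after a scalar-context m//g, runs a successful s/// on the same string, and checks pos() again. It also shows one possible workaround: collect the matches in list context first, and modify the string afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test string, not taken from the original script.
my $string = "a1b2c3";

# A scalar-context m//g remembers where it stopped; pos() reports that offset.
$string =~ /\d/g;
my $pos_after_match = pos($string);    # offset just past the first digit

# A successful s/// modifies the string, which discards the saved position.
$string =~ s/1/1/;
my $pos_after_subst = pos($string);    # no longer defined

print "pos after m//g: $pos_after_match\n";
print "pos after s///: ",
    (defined $pos_after_subst ? $pos_after_subst : 'undef'), "\n";

# One way around it: grab every match up front in list context,
# then do any modification once the matching is finished.
my @digits = $string =~ /(\d)/g;
print "digits: @digits\n";
```

Because the next scalar m//g starts again from offset 0 once pos() is gone, a loop that matches and substitutes on the same string keeps finding the first match forever.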
Re: Perlmonks Code Proxy
by Zebu (Novice) on Aug 19, 2000 at 04:23 UTC
    Always be cautious when using variables in a regexp: if a variable isn't protected with the \Q...\E sequence, you may be in big trouble. Your $basename var contains dots, which are treated as regexp metacharacters rather than as literal dots (Gee... If you understand what I say, you're good ;-)

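    $basename = 'http://www.perlmonks.org/'; $content =~ s!href="$basename!...
    It will be interpreted as  $content=~ s!href="http://www.perl..., where each dot matches any character.
    Use s!href="\Q$basename\E!...! instead. The \Q...\E belongs on the pattern side only; in the replacement it would insert literal backslashes.
    So the pattern will be translated to s!href="http\:\/\/www\.perl...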
      Sheesh. That's what I get for posting code too quickly. Fortunately, this hasn't caused a problem as I have used exclamation points as delimiters (and avoided the problem with the slashes) and the dot metacharacters match the actual dot characters. I got lucky! Thanks for pointing that out.

      This has actually caused me a problem with the $match variable as this will often contain characters that will have special meaning in a regex, but I can't simply wrap them in \Q and \E because of the problems with $ and @, so I'm going to write a short snippet that will handle that substitution for me, but this is turning out to be just a more difficult problem than I imagined!

      Cheers,
      Ovid

        If you want the variable's contents to be interpreted as a string literal rather than as regex chars, the quotemeta function (see perlfunc) is useful.
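
A small sketch of the quotemeta suggestion, using a made-up $match value that contains $, @ and other metacharacters. It is meant to show that \Q...\E (or quotemeta) quotes the value stored in the variable, and that the $ and @ inside that value are not interpolated a second time. The `<LINK>` marker is just a placeholder for whatever gets appended.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical matched snippet, full of regex metacharacters.
my $match   = 'my @list = ($x + 2) * $y[0];';
my $content = "before $match after";

# \Q...\E quotes the *value* of $match before the pattern is compiled.
(my $linked = $content) =~ s/\Q$match\E/${match}<LINK>/;

# quotemeta() is the function form of the same thing.
my $quoted = quotemeta $match;
my $found  = $content =~ /$quoted/ ? 1 : 0;

print "$linked\n";
print "found with quotemeta: $found\n";
```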
Re: Perlmonks Code Proxy
by Boogman (Scribe) on Aug 19, 2000 at 00:40 UTC
    I might be completely off base here, but would something like  $content =~ s!($code_regex)! $1.$href1.$i++.$href2!ge; work? I think it's probably because you're substituting stuff into $content, making the string different when it tries to do a second match, that was causing it to do strange things. Just a thought - I haven't actually tried it out or anything.
      I tried something like s///ge at one point, but it doesn't work in this case. The /e causes the right hand side of s/// to be handled as code to eval. Well, since we are deliberately matching Perl code with this regex, this solution tends to choke quite spectacularly.

      I think it's probably because you're substituting stuff into $content, making the string different when it tries to do a second match...

      Oh!!!!! Good point. Didn't think Perl would bomb on that. Need to go check it out. Thanks.

      Cheers,
      Ovid

      Update: Oops. Just noticed that tye pointed out the same problem and I've verified that it's the bug. Here's a bit of sample code that can reproduce it (don't do this at home, kids):

      #!/usr/bin/perl
      $string = "1";
      # Infinite loop caused by modifying the string we are matching against in while statement
      while ($string =~ /(\d)/g) {
          $match = $1;
          $string =~ s/$match/$match/;
      }
      print $string;

      Update 2: After reading through Boogman and tilly's comments below, I'll have to see what I can do to reproduce the /e error. It was rather frustrating.

        The /e causes the right hand side of s/// to be handled as code to eval. Well, since we are deliberately matching Perl code with this regex, this solution tends to choke quite spectacularly.
        Hmmm... That's strange. I decided to fool around with it and tried this out:
        my $i = 1;
        my $string = 'print "hello";';
        my $content = 'print "hello";$x + 2; print "hello";print "hello";print "hello";';
        $content =~ s/($string)/$1.$i++."\n"/ge;
        print "$content\n";
        eval $string;
        and it printed out
        print "hello";1
        $x + 2; print "hello";2
        print "hello";3
        print "hello";4
        hello
        It doesn't seem that it executed the print statement that was matched, even though if we hand the string directly to eval, it does print out hello.
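
Boogman's result fits the way /e is documented: it evaluates the replacement expression only, and whatever ends up in $1 is plain data. A doubled /ee would be needed to eval the result a second time. Here is a rough sketch of the single-pass version applied to the original problem. The page fragment and the shortened link strings are placeholders, not the real Perlmonks markup or the real script URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page fragment: two code blocks in the rendered layout.
my $content = '<PRE><TT>print "one";</TT></PRE> text '
            . '<PRE><TT>die "two";</TT></PRE>';

my $code_regex = '<PRE><TT>(?:<font size="?-1"?>)?([^<]+)(?:</font>)?</TT></PRE>';

# Placeholder link pieces; the real script builds these from $script and $url.
my $href1 = '<A HREF="?process=download&code=';
my $href2 = '">Download this code</A>';

my $i = 0;
# /g walks every block in a single pass, so no outer while loop is needed.
# /e evaluates only the replacement expression; the matched code in $1
# (including the die) is concatenated as a string, never run.
$content =~ s!($code_regex)!$1 . $href1 . $i++ . $href2!ge;

print "$content\n";
print "blocks linked: $i\n";
```

Since the substitution operator does its own bookkeeping while it scans, the pos() reset that broke the while loop never comes into play here.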
RE: Perlmonks Code Proxy
by Anonymous Monk on Aug 19, 2000 at 01:39 UTC
    Hello, I've had the same problem of copying code and pasting it directly into my Perl editor. However, I found that if I copy the code directly from the "browser" view (not source view) and then paste it into WordPad using Paste Special > Unformatted Text, it works just fine... it looks just like the "browser" view. From there, you can copy the script from the WordPad document and paste it normally into just about any editor (I'm using PerlBuilder). Hope this helps. Mike S.
Re: Perlmonks Code Proxy
by nate (Monk) on Aug 19, 2000 at 19:51 UTC
    It should work now if you click on the d/l code link on the bottom of any node that contains a <CODE> block.

    Everything allows you to create different "htmlpages" for types, so we created a "document downloadcode page", with no nested containers around it:

    [%
      my $text = $$NODE{doctext};
      my $str;
      while($text=~/<CODE>((.|\n)*?)<\/CODE>/ig) {
        my $code = $1;
        $str.=$code."\n\n\n";
      }
      $str;
    %]

    All that was left was to hack in a change of content type to "application/octet" in the core, and voilà!


    Okay, merlyn's right. "text/plain" it is...
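
The extraction loop in that htmlpage can be tried outside of Everything with something like the sketch below. The sample $text is invented; in the real page it comes from $$NODE{doctext}. This variant uses the /s modifier so that "." matches newlines, in place of the (.|\n) alternation. Scalar-context m//g is safe here because the loop only reads $text and never modifies it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical node text standing in for $$NODE{doctext}.
my $text = "Intro <CODE>print 1;\nprint 2;</CODE> middle "
         . "<code>print 3;</code> end";

my $str = '';
# /i accepts <code> as well as <CODE>; /s lets "." span newlines;
# the non-greedy .*? stops at the nearest closing tag.
while ($text =~ m{<CODE>(.*?)</CODE>}gis) {
    $str .= $1 . "\n\n\n";
}

print $str;
```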
      Uh, what the heck is application/octet? Why not send it as it really is, text/plain, and that way it gets the right download type on every machine, and let me decide whether I want to see it in my browser, or dictate to my browser that I want to save it on disk?

      Yours for a better web,

      -- Randal L. Schwartz, Perl hacker

      yeah that should have been application/octet-stream but text/plain works too

      vroom | Tim Vroom | vroom@cs.hope.edu
        Cool, now I can view it in my browser and cut/paste there, or hit option-click and it downloads as a nice Mac-OK text file! Thanks!

        -- Randal L. Schwartz, Perl hacker
