Perlmonks Code Proxy

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

If I want to copy some code from Perlmonks, I find that I cannot cut and paste directly from the Web page without losing formatting. What I usually do is view source, copy the code to a file, and run a script I wrote that converts that HTML into proper Perl code. What I would prefer, however, is to have a "Download this code" link after code snippets that are properly posted with <CODE></CODE> tags.
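Ovid's HTML-to-Perl conversion script isn't shown. As a hypothetical sketch, most of that conversion is undoing the HTML escaping inside a code block. The helper name `html_to_code` and the short list of entities it handles are assumptions; a fuller script might use a module such as HTML::Entities instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not Ovid's actual script): turn the HTML source of a
# code block back into plain Perl text.
sub html_to_code {
    my ($html) = @_;
    $html =~ s/<[^>]+>//g;            # drop markup such as <font size="-1">
    $html =~ s/&lt;/</g;
    $html =~ s/&gt;/>/g;
    $html =~ s/&quot;/"/g;
    $html =~ s/&#(\d+);/chr($1)/ge;   # numeric entities, e.g. &#91; for [
    $html =~ s/&amp;/&/g;             # last, so "&amp;lt;" ends up as "&lt;"
    return $html;
}

print html_to_code('print &quot;a &lt; b&quot; if $x &amp;&amp; $y;'), "\n";
```

Decoding `&amp;` last matters: doing it first would turn the literal text `&amp;lt;` into `<` instead of `&lt;`.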

Since this feature is not available (or if it is, I'm not aware of it), I thought it would make a nice project to write a script that would do this for me. I don't know much about proxy servers or Web automation, so this is a learning experience for me.

The following is the first stab at the code (kind of a "proof of concept"). All this code is supposed to do is display a Web page with a (currently) non-functional download link after each CODE posting. Also, all HREF links are pointed back to this code. The problem lies in the regex and the while loop that it is in. When I run the code, it simply hangs. While running it through a debugger, it seems to identify matches in a random, non-sequential order, thus not permitting the while loop to end.

#!/usr/bin/perl -w
use strict;

use CGI;
use LWP::Simple;

my $query    = new CGI;
my $basename = 'http://www.perlmonks.org/';
my $script   = 'http://www.someserver.com/path/to/script.cgi';

# I track the actual URL as a hidden field in the HTML
my $url      = defined $query->param('url') ? $query->param('url') : $basename;

# Default to $basename if no $url exists
my $content  = get (defined $url ? $url : $basename);

# Add a hidden field with actual URL after <BODY> tag
$content =~ s!(<BODY[^>]*>)!$1<INPUT TYPE="hidden" NAME="basename" VALUE="$url">!;

# Have absolute paths go through this script
$content =~ s!href="$basename!href="$script?url=$basename!gi;

# Have relative paths go through this script
$content =~ s!href\s*=\s*"/!href="$script?url=$basename!gi;

# In the following regex, note the following:
#   Code tags are translated as
#   <PRE><TT><font size="-1">...</font></TT></PRE>
#
#   <font size=...> and </font> are optional.  This is turned off if we use "Large font code"
#   Quotes around -1 in the font tag are optional.  They don't always exist.
#   I discovered that in examining source for "Death to Dot Star!"

my $code_regex = '<PRE><TT>(?:<font size="?-1"?>)?([^<]+)(?:</font>)?</TT></PRE>';

# These will be used to create the download link
my $href1 = '<P><A HREF="' . $script . '?process=download&code='; 
my $href2 = '&url=' . $url . '">Download this code</A><P>';

my $i = 0;

while ($content =~ m!($code_regex)!go) {
    my $match = $1;
    $content =~ s!$match!$match$href1$i$href2!;
    $i++;
}
    
print $query->header;
print $content;

I know this is probably something ridiculously simple that I have missed, but I am pulling my hair out over this. Any help would be appreciated.

Cheers,
Ovid

Incidentally, some of the regexes and code above work only because of the layout of Perlmonks. This should not be viewed as any sort of general purpose script.
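The comments in the script describe three layouts for a CODE block. A quick way to check the pattern against each layout is to run it over hand-made samples. The sample strings below are invented for illustration, not copied from a real page, and the pattern is compiled with `qr//` rather than kept as a string.

```perl
use strict;
use warnings;

# Ovid's pattern, compiled once with qr// (the original keeps it as a string).
my $code_regex = qr{<PRE><TT>(?:<font size="?-1"?>)?([^<]+)(?:</font>)?</TT></PRE>};

# Invented samples of the three layouts described in the comments.
my @samples = (
    '<PRE><TT><font size="-1">print 1;</font></TT></PRE>',   # quoted -1
    '<PRE><TT><font size=-1>print 2;</font></TT></PRE>',     # unquoted -1
    '<PRE><TT>print 3;</TT></PRE>',                          # "Large font code"
);

my @found;
for my $html (@samples) {
    push @found, $1 if $html =~ $code_regex;
}
print "$_\n" for @found;
```

Because the capture is `[^<]+`, any tag that appeared inside a code block would stop the pattern from matching that block.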


Re: Perlmonks Code Proxy by tye (Sage) on Aug 19, 2000 at 00:40 UTC
while ($content =~ m!($code_regex)!go) {
    my $match = $1;
    $content =~ s!$match!$match$href1$i$href2!;
    $i++;
}

You can't use m//g in a scalar context when you are modifying the string you are matching against between scalar m//g's. I bet the `$content =~ s!!!` is resetting `pos($content)` to 0 each time, causing the infinite loop.

The random order of matching is probably the debugger trying, but not succeeding, to restore complex regex context. In this case, debug with `print` statements.

- tye (but my friends call me "Tye")
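tye's diagnosis can be checked without the debugger by printing `pos()` around a substitution. A minimal sketch with a made-up string: strictly speaking, the successful `s///` sets `pos()` to undef, which makes the next scalar m//g start again from the beginning of the string.

```perl
use strict;
use warnings;

my $s = "1 2 3";

$s =~ /\d/g;              # scalar-context m//g remembers where it stopped
my $before = pos($s);     # just past the first digit

$s =~ s/1/1/;             # a successful s/// modifies $s (even to the same text)
my $after = pos($s);      # ... and pos() is reset to undef

print "before: $before\n";
print "after:  ", (defined $after ? $after : 'undef'), "\n";
```

In Ovid's loop, that reset means every pass finds the first code block again, so the loop never ends.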
Re: Perlmonks Code Proxy by Zebu (Novice) on Aug 19, 2000 at 04:23 UTC
Always be cautious when using variables in a regexp: if a variable isn't protected with the \Q...\E sequence, you may be in big trouble. Your $basename var contains dots, which are treated as regexp metacharacters (Gee... If you understand what I say, you're good ;-)

$basename = 'http://www.perlmonks.org/';
$content =~ s!href="$basename!...

It will be interpreted as `$content =~ s!href="http://www.perl...`, where each dot matches any character. Use `s!\Qhref="$basename\E!...!` instead (the replacement side needs no \Q), so it will be translated to something like `s!href\=\"http\:\/\/www\.perl...`.
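Zebu's point, sketched with `$basename` and `$script` from the original script. The decoy URL is invented purely to show a false match. `\Q...\E` goes on the pattern side only; in the replacement it would put literal backslashes into the output.

```perl
use strict;
use warnings;

my $basename = 'http://www.perlmonks.org/';
my $script   = 'http://www.someserver.com/path/to/script.cgi';

# Unquoted, each "." in $basename matches any character:
my $decoy = 'href="http://wwwXperlmonksYorg/"';
my $loose = ($decoy =~ /href="$basename/) ? 1 : 0;        # false positive

# \Q...\E quotes the interpolated value, so the dots are literal.
my $strict = ($decoy =~ /href="\Q$basename\E/) ? 1 : 0;   # no match

# Quote only the pattern side of the substitution.
my $content = '<a href="http://www.perlmonks.org/index.pl">';
$content =~ s!href="\Q$basename\E!href="$script?url=$basename!gi;

print "loose=$loose strict=$strict\n$content\n";
```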
(Ovid) RE(2): Perlmonks Code Proxy by Ovid (Cardinal) on Aug 19, 2000 at 07:57 UTC
Sheesh. That's what I get for posting code too quickly. Fortunately, this hasn't caused a problem: I used exclamation points as delimiters (which avoided the problem with the slashes), and the dot metacharacters also match the actual dot characters. I got lucky! Thanks for pointing that out.

This has actually caused me a problem with the `$match` variable, as it will often contain characters that have special meaning in a regex, but I can't simply wrap it in `\Q` and `\E` because of the problems with `$` and `@`. So I'm going to write a short snippet that will handle that substitution for me. This is turning out to be a more difficult problem than I imagined!

Cheers,
Ovid
RE: (Ovid) RE(2): Perlmonks Code Proxy by jplindstrom (Monsignor) on Aug 19, 2000 at 18:52 UTC
If you want the variable's contents to be interpreted as a literal string rather than as regex metacharacters, the perlfunc:quotemeta function is useful.
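The worry about `$` and `@` can be tested directly. The variable is interpolated exactly once, before quoting, so sigils inside its *value* are never re-interpolated; `\Q` (or `quotemeta`) then escapes them. A sketch, where the snippet and the marker string are made up:

```perl
use strict;
use warnings;

# A made-up matched snippet full of regex metacharacters and sigils.
my $match   = 'my @list = ($x + 1) * 2; print "$list[0]\n";';
my $content = "before $match after";
my $marker  = '<!-- download link -->';   # stand-in for the real link

# $match is interpolated once; \Q...\E then backslashes every non-word
# character of its value, so $, @, (, +, * and [ are matched literally.
$content =~ s/\Q$match\E/$match$marker/;

# quotemeta() is the function form of the same escaping.
my $quoted = quotemeta $match;
print "$content\n";
print "quotemeta agrees with \\Q\n" if $quoted eq "\Q$match\E";
```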
Re: Perlmonks Code Proxy by Boogman (Scribe) on Aug 19, 2000 at 00:40 UTC
I might be completely off base here, but couldn't you just do something like `$content =~ s!($code_regex)!$1.$href1.$i++.$href2!ge;`? I think it's probably because you're substituting stuff into content, making the string different when it tries to do a second match, that it was doing strange things. Just a thought - I haven't actually tried it out or anything.
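Boogman's suggestion in runnable form, with a simplified pattern and page invented for the example. The whole job happens in a single `s///g` pass, so no later match runs against a string that was modified behind its back. The `!` delimiter is used because the replacement contains `/`.

```perl
use strict;
use warnings;

# Simplified pattern and page, invented for the example.
my $code_regex = qr{<PRE><TT>([^<]+)</TT></PRE>};
my $content    = '<PRE><TT>print "a";</TT></PRE><P>text<PRE><TT>1 + 1</TT></PRE>';

my $i = 0;
# One s///g pass: /e evaluates the replacement *expression* each time,
# while the matched Perl code in $1 is only ever treated as a string.
$content =~ s!($code_regex)!$1 . qq{<A HREF="?code=} . $i++ . qq{">Download</A>}!ge;

print "$content\n";
```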
(Ovid) RE(2): Perlmonks Code Proxy by Ovid (Cardinal) on Aug 19, 2000 at 00:46 UTC
I tried something like `s///ge` at one point, but it doesn't work in this case. The `/e` causes the right-hand side of `s///` to be handled as code to eval. Well, since we are deliberately matching Perl code with this regex, this solution tends to choke quite spectacularly.

"I think it's probably because you're substituting stuff into content, making the string different when it tries to do a second match..."

Oh!!!!! Good point. Didn't think Perl would bomb on that. Need to go check it out. Thanks.

Cheers,
Ovid

Update: Oops. Just noticed that tye pointed out the same problem, and I've verified that it's the bug. Here's a bit of sample code that can reproduce it (don't do this at home, kids):

#!/usr/bin/perl
$string = "1";
# Infinite loop caused by modifying the string we are matching against in while statement
while ($string =~ /(\d)/g) {
    $match = $1;
    $string =~ s/$match/$match/;
}
print $string;

Update 2: After reading through Boogman's and tilly's comments below, I'll have to see what I can do to reproduce the `/e` error. It was rather frustrating.
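If you want to keep the loop-and-substitute structure of the reproduction above, one alternative (a different technique from `s///ge`) is to collect every match in list context first, so the loop no longer depends on `pos()`. A sketch with a made-up string:

```perl
use strict;
use warnings;

my $string = "1 2 3";

# m//g in list context returns all captures at once, so the loop below
# can modify $string without disturbing an in-progress scalar m//g.
my @matches = $string =~ /(\d)/g;

for my $match (@matches) {
    # s/// still hits the *first* occurrence of $match in the string,
    # which is only correct while each matched text is unique.
    $string =~ s/\Q$match\E/[$match]/;
}
print "$string\n";
```

With repeated text (for example "1 1") the second substitution would wrap the first occurrence again, which is one reason the single-pass `s///ge` form is usually the better fit.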
RE: RE: Re: Perlmonks Code Proxy by Boogman (Scribe) on Aug 19, 2000 at 01:24 UTC
"The /e causes the right hand side of s/// to be handled as code to eval. Well, since we are deliberately matching Perl code with this regex, this solution tends to choke quite spectacularly."

Hmmm... That's strange. I decided to fool around with it and tried this out:

my $i = 1;
my $string = 'print "hello";';
my $content = 'print "hello";$x + 2; print "hello";print "hello";print "hello";';
$content =~ s/($string)/$1.$i++."\n"/ge;
print "$content\n";
eval $string;

and it printed out

print "hello";1
$x + 2; print "hello";2
print "hello";3
print "hello";4

hello

It doesn't seem that it executed the print statement that was matched, even though if we hand the string directly to eval, it does print out hello.
RE (tilly) 4: Perlmonks Code Proxy by tilly (Archbishop) on Aug 19, 2000 at 01:28 UTC
RE: RE (tilly) 4: Perlmonks Code Proxy by Boogman (Scribe) on Aug 19, 2000 at 01:49 UTC
RE: Perlmonks Code Proxy by Anonymous Monk on Aug 19, 2000 at 01:39 UTC
Hello, I've had the same problem copying code and pasting it directly into my Perl editor. However, I found that if I copy the code directly from the "browser" view (not the source view) and then paste it into WordPad using Paste Special > Unformatted Text, it works just fine; it looks just like the "browser" view. From there, you can copy the script from the WordPad document and paste it normally into just about any editor (I'm using PerlBuilder). Hope this helps.

Mike S.
Re: Perlmonks Code Proxy by nate (Monk) on Aug 19, 2000 at 19:51 UTC
It should work now if you click on the d/l code link at the bottom of any node that contains a <CODE> block. Everything allows you to create different "htmlpages" for types, so we created a "document downloadcode page", with no nested containers around it:

[%
my $text = $$NODE{doctext};
my $str;
while ($text =~ /<CODE>((.|\n)*?)<\/CODE>/ig) {
    my $code = $1;
    $str .= $code."\n\n\n";
}
$str;
%]

All that was left was to hack in a change of content type to "application/octet" in the core, and voila*!

(* Okay, merlyn's right. "text/plain" it is...)
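Outside Everything, the same extraction is more commonly written with the `/s` modifier, which lets `.` match newlines, instead of the `(.|\n)` alternation. A standalone sketch; `$text` is an invented stand-in for the node's doctext. Note that this loop never modifies `$text`, so the scalar m//g is safe here.

```perl
use strict;
use warnings;

# Invented stand-in for $$NODE{doctext}.
my $text = "Intro\n<code>print 1;\nprint 2;</code> and <CODE>exit;</CODE>";

my $str = '';
# /s lets . match newlines, so (.*?) replaces ((.|\n)*?);
# /i catches <code> and <CODE>; /g walks through every block.
while ($text =~ /<CODE>(.*?)<\/CODE>/gis) {
    $str .= $1 . "\n\n\n";
}
print $str;
```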
RE: Re: Perlmonks Code Proxy by merlyn (Sage) on Aug 19, 2000 at 19:58 UTC
Uh, what the heck is `application/octet`? Why not send it as it really is, `text/plain`, and that way it gets the right download type on every machine, and let me decide whether I want to see it in my browser, or dictate to my browser that I want to save it on disk?

Yours for a better web,
-- Randal L. Schwartz, Perl hacker
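merlyn's suggestion applied to the proxy's still non-functional `process=download` link from the original question: a hypothetical handler that returns the n-th extracted code block as `text/plain`. The function name `download_response` and the array of blocks are assumptions; query parsing and input validation are left out.

```perl
use strict;
use warnings;

# Hypothetical download branch for the proxy: given the code blocks
# already extracted from a page and the requested index, build a CGI
# response sent as text/plain.
sub download_response {
    my ($blocks, $n) = @_;
    my $body = defined $blocks->[$n] ? $blocks->[$n] : '';
    return "Content-Type: text/plain\r\n\r\n" . $body;
}

my @blocks = ("print 1;\n", "print 2;\n");
print download_response(\@blocks, 1);
```

With CGI.pm, as in the original script, the header line would come from `$query->header(-type => 'text/plain')` instead of being built by hand.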
RE: Re: Perlmonks Code Proxy by vroom (His Eminence) on Aug 19, 2000 at 22:06 UTC
Yeah, that should have been application/octet-stream, but text/plain works too.

vroom | Tim Vroom | vroom@cs.hope.edu
RE: RE: Re: Perlmonks Code Proxy by merlyn (Sage) on Aug 19, 2000 at 23:00 UTC
Cool, now I can view it in my browser and cut/paste there, or hit option-click and it downloads as a nice Mac-OK text file! Thanks!

-- Randal L. Schwartz, Perl hacker

