http://qs321.pair.com?node_id=185455
Category: PerlMonks Related Scripts
Author/Contact Info Graciliano M. P.
Description: The '+' markers used to wrap long lines inside CODE blocks are very useful when reading in the browser, but not when you want to copy the code and test it! Just point this script at a node and it will return all the code blocks on the page without the '+'.
#####################################
# PLAY WITH THE PERLMONKS SITE (1). #
#####################################

use LWP::Simple ;

$|=1;

my $node = '185131' ;
my $url = "http://www.perlmonks.org/index.pl?node_id=$node" ;
if ($node =~ /^http:\/\//) { $url = $node ;}

print "Getting node $node...\n" ;
print "$url\n" ;

my $html = get($url);
die "Could not fetch $url\n" unless defined $html ;

my $lng = length($html) ;
print "$lng bytes.\n\n" ;

$html =~ s/\r\n?/\n/gs ;

my (@codes) = ( $html =~ /<pre><tt><font.*?>(.*?)<\/font><\/tt><\/pre>/gsi );

foreach my $code ( @codes ) {
  $code =~ s/\n<font.*?>\+<\/font>//gi ;
  $code = filter_from_html($code) ;
  print "# CODE #################################################\n"
    if ($#codes > 0) ;
  print "$code\n" ;
}

####################
# FILTER_FROM_HTML #
####################

sub filter_from_html {
  my ( $code ) = @_ ;

  my %SYMBOLS_html = (
  'acute' => 'aeiouAEIOU#áéíóúÁÉÍÓÚ' ,
  'grave' => 'aeiouAEIOU#àèìòùÀÈÌÒÙ' ,
  'circ'  => 'aeiouAEIOU#âêîôûÂÊÎÔÛ' ,
  'uml'   => 'aeiouAEIOU#äëïöüÄËÏÖÜ' ,
  'tilde' => 'aoAO#ãõÃÕ' ,
  'cedil' => 'cC#çÇ' ,
  'lt'    => '#<' ,
  'gt'    => '#>' ,
  'quot'  => '#"' ,
  ) ;
  
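  # Numeric entities for single-byte characters, e.g. &#91; becomes [
  $code =~ s/&#(\d{1,3});/pack("C",$1)/eg;

  $code =~ s/&nbsp;?/ /gsi ;
  
  my ($start,$end,@letras1,@letras2,$max);

  foreach my $Key ( keys %SYMBOLS_html ) {
    ($start , $end) = split('#' , $SYMBOLS_html{$Key}) ;
    @letras1 = split('' , $start) ;
    @letras2 = split('' , $end) ;
    
    $max = $#letras1 ;
    if ($#letras2 > $max) { $max = $#letras2 ;}
    
    for(0..$max) {
      # Keys such as 'lt' have no prefix letter, so fall back to an empty string.
      my $letra = defined $letras1[$_] ? $letras1[$_] : '' ;
      $code =~ s/\&$letra(?i:$Key);?/$letras2[$_]/g ;
    }
  }

  # Decode &amp; after the named entities, so that e.g. &amp;lt;
  # yields the text "&lt;" rather than "<".
  $code =~ s/&amp;?/&/gsi ;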

  return( $code ) ;
}

#######
# END #
#######

# Send your feedback!
#
# "The creativity is the expression of the liberty".
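
For readers who only want the two core steps of the script above (dropping the wrap markers and decoding entities), here is a minimal standalone sketch. It is not the author's code: the names unwrap_code and %ENTITY and the sample markup are assumptions made up for illustration, it handles only a handful of entities, and it decodes everything in a single pass instead of the several passes used in filter_from_html.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: only a few named entities, for illustration.
my %ENTITY = ( lt => '<', gt => '>', quot => '"', amp => '&', nbsp => ' ' );

sub unwrap_code {
    my ($code) = @_;

    # Drop the newline plus the <font ...>+</font> marker that flags a wrapped line.
    $code =~ s/\n<font[^>]*>\+<\/font>//gi;

    # Decode numeric and named entities in one pass, so that text produced
    # by one replacement is never decoded a second time.
    $code =~ s{&(?:#(\d{1,3})|([A-Za-z]+));}{
        defined $1 && $1 < 256 ? chr($1)
      : defined $1             ? '&#' . $1 . ';'
      : exists $ENTITY{lc $2}  ? $ENTITY{lc $2}
      :                          '&' . $2 . ';'
    }ge;

    return $code;
}

# Made-up sample that mimics one wrapped line of a CODE block.
my $sample = qq{print &quot;a &lt; b&quot;,\n<font color="red">+</font> &quot;\\n&quot;;};
print unwrap_code($sample), "\n";
```

Unknown entity names and numeric values above 255 are left untouched in this sketch, so extending %ENTITY is the only change needed to cover more characters.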
Replies are listed 'Best First'.
Re: Playing with PerlMonks site (1) - Copy a CODE without the '+'
by Anonymous Monk on Jul 26, 2002 at 08:01 UTC
    Or, just use the "d/l code" link right below the 'comment on' link and realize you just solved a non-problem.
Re: Playing with PerlMonks site (1) - Copy a CODE without the '+'
by Corion (Patriarch) on Jul 26, 2002 at 08:01 UTC

    Of course, the d/l code link does the same :-))

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
    $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
    ($c = $d->accept())->get_request(); $c->send_response( new #in the
    HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: Playing with PerlMonks site (1) - Copy a CODE without the '+'
by gmpassos (Priest) on Jul 26, 2002 at 08:05 UTC
    Thanks! I didn't know about that. I'm a new user...

    But with the d/l link I still have to save the file first, since I'm using IE...

    Still, this code can be used for educational purposes, especially &filter_from_html.

    "The creativity is the expression of the liberty".