From time to time, while hacking in the terminal, I need a break. One of the things I love to do then is read a random Darwin Award.

The text seems to be pretty well hidden in the HTML tree, so I decided on an empirical approach that filters out the surrounding material.
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;

my $agent = WWW::Mechanize->new( autocheck => 1 );
$agent->get('');
my $content = $agent->content( format => "text" );
my $cr = chr 169;

$content =~ s/.*\d\d\s+Urban Legend//s;
$content =~ s/.*\d\d\s+Personal Account//s;
$content =~ s/.*Reader Submission\s+Pending Acceptance//s;
$content =~ s/\s*DarwinAwards\.com\s*$cr.*//s;
$content =~ s/.*?\([^\)]*?\d{2}[^\)]*\) //s;
$content =~ s/.*Darwin\s?Award\s?Nominee//si;
$content =~ s/.*Confirmed \S+\s?by Darwin//si;
$content =~ s/.*Honorable Mentions//s;
$content =~ s/submitted by.*//si;
$content =~ s/109876543210.*//s;
$content =~ s/^\s+//;

print $content;
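The filtering idea can be tried offline, without fetching anything. What follows is only a minimal sketch: the helper name strip_surroundings and the sample text are made up for illustration, and it uses just two of the substitutions from the script. It shows how a greedy `.*` combined with the /s modifier cuts away everything before a marker, and how `marker.*` cuts away everything after one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): removes the text
# before a leading marker and the text from a trailing marker onward.
sub strip_surroundings {
    my ($text) = @_;

    # With /s the dot also matches newlines, so the greedy .* swallows
    # everything up to and including the last occurrence of the marker.
    $text =~ s/.*Darwin\s?Award\s?Nominee//si;

    # Drop the trailer, from the first "submitted by" to the end.
    $text =~ s/submitted by.*//si;

    # Trim leading and trailing whitespace.
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;

    return $text;
}

# Made-up sample standing in for $agent->content( format => "text" ).
my $sample = "Home | Archive\n2006 Darwin Award Nominee\n"
           . "Story line one.\nStory line two.\n"
           . "Submitted by: someone\nFooter links\n";

print strip_surroundings($sample), "\n";
```

Because the leading patterns are greedy, a marker that also appears inside the story itself would cut the story short, which is one reason the approach stays empirical.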

Re: Random Darwin Award in plain text
by wazoox (Prior) on Mar 23, 2006 at 13:19 UTC
    A valuable replacement for your usual fortune cookies :)
      fortune cookies? who needs fortune cookies ;-)
      perl -MLWP::Simple -e '@_ = split/\%\n/, get(q( +fun/signatures.txt));print splice @_, @_*rand,1'
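For readers who want to see what the one-liner does, here is a rough offline sketch. It assumes the signature file uses the fortune-style layout the split pattern suggests, with entries separated by a line containing only `%`. The helper name random_entry and the inline data are invented for illustration and replace the download.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: picks one entry at random from fortune-style text,
# where entries are separated by a line holding a single '%'.
sub random_entry {
    my ($text) = @_;
    my @entries = split /%\n/, $text;

    # rand @entries yields a float in [0, number of entries); using it as
    # an array index truncates it to an integer.
    return $entries[ rand @entries ];
}

# Made-up data standing in for the downloaded signatures file.
my $data = "First signature\n%\nSecond signature\n%\nThird signature\n";

print random_entry($data);
```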
Re: Random Darwin Award in plain text
by willyyam (Priest) on Mar 29, 2006 at 16:35 UTC

    I quite like this, thank you. I find that not all entries have a trailing newline, so I added this to the code before the print statement: $content = $content . "\n";

    Is there a way to run fmt -72 or something similar on this text block to have it break the lines neatly?

      Ah, good idea. About the formatting:
      perl | fmt -72
      would do it.

      A Perl-only solution could use Text::Wrap:
      #!/usr/bin/perl -w
      use strict;
      use WWW::Mechanize;
      use Data::Dumper;
      use Text::Wrap qw(wrap);

      my $agent = WWW::Mechanize->new( autocheck => 1 );
      $agent->get('');
      my $content = $agent->content( format => "text" );
      my $cr = chr 169;

      $content =~ s/.*\d\d\s+Urban Legend//s;
      $content =~ s/.*\d\d\s+Personal Account//s;
      $content =~ s/.*Reader Submission\s+Pending Acceptance//s;
      $content =~ s/\s*DarwinAwards\.com\s*$cr.*//s;
      $content =~ s/.*?\([^\)]*?\d{2}[^\)]*\) //s;
      $content =~ s/.*Darwin\s?Award\s?Nominee//si;
      $content =~ s/.*Confirmed \S+\s?by Darwin//si;
      $content =~ s/.*Honorable Mentions//s;
      $content =~ s/submitted by.*//si;
      $content =~ s/109876543210.*//s;
      $content =~ s/^\s+//;

      print wrap("\t", "", "$content\n");
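To get closer to what fmt -72 produces, the wrap width can be set explicitly. This is a small sketch, not a drop-in replacement: the helper name format_text and the sample text are made up, and it also covers the trailing-newline point raised earlier in the thread by ending the output with exactly one newline. Text::Wrap documents that wrapped lines are at most $Text::Wrap::columns - 1 characters long, hence the +1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: wraps $text so that no line is longer than $width
# characters and the result ends with exactly one newline.
sub format_text {
    my ($text, $width) = @_;

    # Lines come out at most $columns - 1 characters long.
    local $Text::Wrap::columns = $width + 1;

    # Keep plain spaces instead of converting runs of spaces to tabs.
    local $Text::Wrap::unexpand = 0;

    # Empty strings: no indent on the first or the following lines.
    my $wrapped = wrap("", "", $text);

    # Replace any trailing whitespace (or nothing) with a single newline.
    $wrapped =~ s/\s*\z/\n/;

    return $wrapped;
}

# Made-up sample text standing in for $content.
my $sample = join " ", ("lorem") x 40;

print format_text($sample, 72);
```

Passing "\t" as the first argument to wrap, as in the script above, would indent the first line instead.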