How am I supposed to use your program?
-
This clobbers any existing listurls.txt, gives me two copies of the data and puts a useless status message in preferedname.txt:
linkextractor http://www.blah.com/ > preferedname.txt
-
This clobbers any existing listurls.txt and puts a useless status message in preferedname.txt:
linkextractor http://www.blah.com/ > preferedname.txt & del listurls.txt
-
This clobbers any existing listurls.txt and loses any error status message:
linkextractor http://www.example.com/ > nul & move listurls.txt preferedname.txt
Suggestions:
- Don't say it's OK when it isn't. Use the correct message.
- Don't say it's OK when it is. Only send the URIs to STDOUT.
- Send error messages (including non-200 status messages) to STDERR.
- Convert the URIs to absolute URIs.
- Remove duplicate URIs.
- Replace my $url = <@ARGV>; with my ($url) = @ARGV;.
- The domain www.example.com (among others) was set aside for examples. It's better to use that than www.blah.com, a real live domain.
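To illustrate the my ($url) = @ARGV; suggestion: <@ARGV> is not a plain copy of the arguments. It is parsed as glob("@ARGV"), so any glob metacharacters in the URL are treated as a pattern. The following is only a small sketch of the difference; the URL with braces is a made-up argument chosen to make the effect visible, not something from the original program.

```perl
use strict;
use warnings;

# Hypothetical command-line argument containing glob metacharacters.
@ARGV = ('http://www.example.com/{a,b}');

# <@ARGV> is glob("@ARGV"): the braces are expanded as a pattern,
# so one argument turns into two strings.
my @globbed = <@ARGV>;

# List assignment copies the first argument unchanged.
my ($url) = @ARGV;

print "glob: $_\n" for @globbed;
print "list: $url\n";
```

With the glob form, the single argument comes back as http://www.example.com/a and http://www.example.com/b, while the list assignment keeps http://www.example.com/{a,b} exactly as typed.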
Suggestions applied:
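use strict;
use warnings;

use List::MoreUtils qw( uniq );
use WWW::Mechanize  qw( );

# usage: linkextractor http://www.example.com/ > listurls.txt

my ($url) = @ARGV;

# autocheck is disabled so that a failed request is reported
# by the explicit check below rather than by WWW::Mechanize itself.
my $mech = WWW::Mechanize->new( autocheck => 0 );

my $response = $mech->get($url);
$response->is_success()
   or die($response->status_line() . "\n");

# Print the unique absolute URIs, sorted, one per line.
print map { "$_\n" }
      sort { $a cmp $b }
      uniq
      map { $_->url_abs() }
      $mech->links();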
Update: At first, I didn't realize it was outputting to STDOUT in addition to listurls.txt, so I recommended that the output be sent to STDOUT. This is a rewrite.