How am I supposed to use your program?
- This clobbers any existing listurls.txt, gives me two copies of the data, and puts a useless status message in preferedname.txt:
linkextractor http://www.blah.com/ > preferedname.txt
- This clobbers any existing listurls.txt and puts a useless status message in preferedname.txt:
linkextractor http://www.blah.com/ > preferedname.txt & del listurls.txt
- This clobbers any existing listurls.txt and loses any error status message:
linkextractor http://www.example.com/ > nul & move listurls.txt preferedname.txt
Suggestions:
- Don't say it's OK when it isn't. Use the correct message.
- Don't say it's OK when it is. Only send the URIs to STDOUT.
- Send error messages (including non-200 status messages) to STDERR.
- Convert the URIs to absolute URIs.
- Remove duplicate URIs.
- Replace my $url = <@ARGV>; with my ($url) = @ARGV;.
- The domain www.example.com (among others) was set aside for examples. It's better to use that than www.blah.com, a real live domain.
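On the <@ARGV> suggestion: that construct is not a read from a filehandle. Perl parses it as glob("@ARGV"), so the URL is treated as a filename pattern. A minimal sketch of the difference, assuming a made-up command line whose URL contains a glob wildcard (the "?"); the URL and variable names here are only for illustration:

```perl
use strict;
use warnings;

# Hypothetical command line, set here so the sketch is self-contained.
@ARGV = ('http://www.example.com/?page=1');

# <@ARGV> is parsed as glob("@ARGV"): the arguments are joined with
# spaces and treated as filename patterns. The "?" is a wildcard, so
# unless a matching file happens to exist, nothing comes back.
my $globbed = <@ARGV>;

# List assignment simply copies the first argument, untouched.
my ($plain) = @ARGV;

print "glob:  ", ( defined($globbed) ? $globbed : "(undef)" ), "\n";
print "plain: $plain\n";
```

A URL with no wildcard characters usually survives the glob unchanged, which is why the problem can go unnoticed for a long time.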
Suggestions applied:
use strict;
use warnings;

use List::MoreUtils qw( uniq );
use WWW::Mechanize  qw( );

# usage: linkextractor http://www.example.com/ > listurls.txt

my ($url) = @ARGV;

my $mech = WWW::Mechanize->new();

# Errors (including non-200 responses) go to STDERR via die.
my $response = $mech->get($url);
$response->is_success()
   or die($response->status_line() . "\n");

# Only the URIs go to STDOUT: absolute, de-duplicated, sorted.
print map { "$_\n" }
      sort { $a cmp $b }
      uniq
      map { $_->url_abs() }
      $mech->links();
Update: At first, I didn't realize it was outputting to STDOUT in addition to listurls.txt. I had recommended that the output be sent to STDOUT. This is a rewrite.