Others have suggested HTML::LinkExtor. Here is a way to do it using
HTML::TreeBuilder::XPath, which is handy if you also need to extract other information from the file.
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("/path/to/file.html");
$tree->eof;

# Select every <a> element anywhere in the document
my @links = $tree->findnodes('//a');
for my $link (@links) {
    print $link->attr('href'), "\n";
}
That prints the href of every link in the file. If you only want the links that sit directly inside table cells, narrow the XPath expression:
# Select only <a> elements that are direct children of a <td>
my @links = $tree->findnodes('//td/a');
for my $link (@links) {
    print $link->attr('href'), "\n";
}
Output:
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0001.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0002.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0003.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0004.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0005.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0006.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0007.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0008.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0010.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365.txt
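One caveat: an anchor without an href (a named anchor such as <a name="top">, say) makes attr('href') return undef, which triggers a warning under "use warnings". A minimal sketch of one way around that is to add an [@href] predicate to the XPath so such anchors are never selected. The markup and paths below are made up for illustration, since the original file is not shown here, and the string is parsed with parse() only to keep the example self-contained.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup (an assumption; not the file from the question).
my $html = <<'HTML';
<html><body>
<a name="top">no href here</a>
<table>
<tr><td><a href="/docs/one.txt">one</a></td></tr>
<tr><td><a name="anchor-only">skipped</a></td></tr>
<tr><td><a href="/docs/two.txt">two</a></td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);   # parse from a string instead of a file
$tree->eof;            # required after parse() to finish building the tree

# The [@href] predicate keeps only anchors that carry an href attribute,
# so attr('href') is always defined for the selected nodes.
my @hrefs = map { $_->attr('href') } $tree->findnodes('//td/a[@href]');
print "$_\n" for @hrefs;

$tree->delete;         # release the tree's memory when done
```

For a real file, swapping the parse()/eof pair back to parse_file("/path/to/file.html") should be all that changes.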