Others have suggested HTML::LinkExtor. Here is another way to do it, using <a href="https://metacpan.org/pod/HTML::TreeBuilder::XPath">HTML::TreeBuilder::XPath</a>. It is very handy if you also need to extract other information from the file, since the same parsed tree can be queried with any XPath expression.
<code>
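use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Build the tree from the HTML file on disk
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("/path/to/file.html");
$tree->eof;

# Every anchor that has an href (skips named anchors, whose href is undef)
my @links = $tree->findnodes('//a[@href]');
for my $link (@links) {
    print $link->attr('href'), "\n";
}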
</code>
That will print every link in the file. If you only want the links that sit inside table cells, narrow the XPath expression:
<code>
# Only anchors that are direct children of a table cell
my @links = $tree->findnodes('//td/a[@href]');
for my $link (@links) {
    print $link->attr('href'), "\n";
}
</code>
Output:
<code>
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0001.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0002.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0003.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0004.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0005.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0006.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0007.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0008.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0010.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365.txt
</code>
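If the URLs are all you need, the attribute can also be selected in the XPath expression itself, so there is no loop over element objects. This is a minimal sketch, assuming the <c>findvalues</c> method of HTML::TreeBuilder::XPath. The inline sample markup and its paths are made up for illustration and are not taken from the original file.
<code>
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup standing in for the real file
my $html = <<'HTML';
<html><body>
<a name="top">no href here</a>
<table>
<tr><td><a href="/docs/one.txt">one</a></td></tr>
<tr><td><a href="/docs/two.txt">two</a></td></tr>
</table>
<a href="/elsewhere.html">outside the table</a>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Select the href attributes directly; anchors without an href never match
my @hrefs = $tree->findvalues('//td/a/@href');
print "$_\n" for @hrefs;

$tree->delete;    # release the parsed tree
</code>
With this sample, only the two links inside the table cells should be printed. For a real file, replace the <c>parse</c> call with <c>parse_file</c> as above.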