You want to keep two structures: a queue of URLs still to be crawled, and the set of URLs you have already visited. The visited set is checked on every iteration, so make it a hash for constant-time lookups. The logic is:
use strict;
use warnings;
use LWP::UserAgent;    # one common CPAN choice for the HTTP client
use HTML::LinkExtor;   # and for pulling links out of the fetched HTML

sub crawl {
    my @queue = @_;
    my %visited;
    my $ua = LWP::UserAgent->new;
    while (my $url = shift @queue) {
        next if $visited{$url};   # already fetched (or attempted) - skip it
        $visited{$url} = 1;       # mark first so a failing URL is never retried forever
        my $response = $ua->get($url);
        next unless $response->is_success;
        my $content = $response->decoded_content;
        # do useful things with $content
        my $extractor = HTML::LinkExtor->new(undef, $url);  # base URL makes extracted links absolute
        $extractor->parse($content);
        push @queue, grep { /^https?:/ }
                     map  { my ($tag, %attrs) = @$_; values %attrs } $extractor->links;
    }
}
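To try it out, seed it with one or more starting URLs (example.com below is just a placeholder):

crawl('http://example.com/');

In practice you will also want to constrain which extracted links get queued (for example, keep only links on the starting host), otherwise the loop will happily wander off across the whole web.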
That's all. When size and efficiency start to really matter, you can move the visited set out of memory into something like Cache::Cache or Berkeley DB.
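As a sketch of the Berkeley DB route: the standard DB_File module can tie the hash to an on-disk database, so %visited survives restarts and no longer has to fit in memory (the file name visited.db is just an example). The crawl loop itself doesn't change:

use DB_File;
use Fcntl;   # O_CREAT / O_RDWR flags

# replace the plain 'my %visited' with a hash tied to a Berkeley DB file
tie my %visited, 'DB_File', 'visited.db', O_CREAT | O_RDWR, 0644, $DB_HASH
    or die "Cannot open visited.db: $!";

Cache::Cache works similarly but adds expiry semantics; for a plain seen-it set, a tied hash is usually enough.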