If you want a thorough discussion of writing spiders and parsing HTML, you may want to look at the new O'Reilly tome Perl and LWP. It includes many examples of mining information from websites, ranging from using a few regexps to pull out the information, to rebuilding the HTML in tree form and spitting it back out again, to spidering entire sites in the correct manner.
I recently had to write a spider for work, and whilst I'd got it working and doing what we needed, this book pointed out a few things I'd overlooked, allowing me to tighten things up and cut down the chances of it falling to pieces. Well recommended.