HTML::TokeParser help - parsing headlines
by perleager (Pilgrim)
on Mar 07, 2004 at 00:33 UTC
perleager has asked for the wisdom of the Perl Monks concerning the following question:
I've just decided to embark on learning everything about LWP :)
I went out and bought the Perl & LWP book, planning to start by learning some parsing: extracting headlines from a given news site. The chapter on using tokens to extract headlines uses the BBC News site as its example. That example no longer works because the HTML around each headline has changed, so I decided to use Reuters news headlines instead (the Reuters business section). I'm having a bit of trouble with my code. The problem is that it prints nothing, so I figure I'm not doing the tokenizing part right (I do have all the modules installed).
First, I looked for the headlines in the page source. I found that the pattern goes like this:
Here's my code to extract the headlines using HTML::TokeParser:
The code looks for the <td class="earlyHeadline"> tag, and the next portion looks for the "a href" part. But the line that should print the URL prints nothing =(. Can anyone point out what I'm doing wrong? Am I even on the right track?
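
For reference, here is a minimal sketch of the approach described above. It assumes each headline is an <a href="..."> link inside a <td class="earlyHeadline"> cell. The sample HTML, the URLs, and the helper name extract_headlines are hypothetical and only stand in for the real Reuters page, so treat this as an illustration rather than a fix for the actual script.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical markup standing in for the Reuters page source.
my $html = <<'HTML';
<table>
<tr><td class="earlyHeadline"><a href="/news/story1.html">Stocks rise on earnings</a></td></tr>
<tr><td class="other"><a href="/ads/x.html">Not a headline</a></td></tr>
<tr><td class="earlyHeadline"><a href="/news/story2.html">Dollar slips against euro</a></td></tr>
</table>
HTML

# Return a list of [url, text] pairs, one per headline cell.
sub extract_headlines {
    my ($doc) = @_;

    # Passing a scalar reference makes TokeParser parse the string itself
    # instead of treating it as a filename.
    my $stream = HTML::TokeParser->new(\$doc)
        or die "Cannot create parser";

    my @found;
    while (my $td = $stream->get_tag('td')) {
        # For a start tag, get_tag returns [$tag, \%attr, \@attrseq, $text],
        # so the attribute hash is at index 1.
        my $class = $td->[1]{class};
        next unless defined $class && $class eq 'earlyHeadline';

        # Skip ahead to the next <a> start tag after the matching cell.
        my $link = $stream->get_tag('a') or last;
        my $url = $link->[1]{href};
        next unless defined $url;

        # Collect the link text up to the closing </a>.
        my $text = $stream->get_trimmed_text('/a');
        push @found, [$url, $text];
    }
    return @found;
}

my @headlines = extract_headlines($html);
print "$_->[1]\n  $_->[0]\n" for @headlines;
```

One thing worth checking when a script like this prints nothing: get_tag and get_token return differently shaped arrays. With get_token, a start tag comes back as ["S", $tag, \%attr, ...], so the attributes sit at index 2, while get_tag drops the leading "S" and puts them at index 1. Reading the wrong index gives an undefined href and empty output. Whether that is what happens in the script above can't be told from the post. Note also that attribute values are compared exactly, so class="earlyHeadline" must match in case.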