PerlMonks  

Pulling info out of html pages

by Bishma (Beadle)
on Apr 08, 2002 at 19:29 UTC ( [id://157525]=perlquestion )

Bishma has asked for the wisdom of the Perl Monks concerning the following question:

Chances are something similar to this has been asked before, but after an hour of searching I decided just to ask, since I'm apparently not using the right search criteria.

I am in the process of writing new message boards for my website and I'd like to convert the old threads to the new system. The problem is that the old threads are stored as HTML files, and I need to pull a few specific pieces of info out to format for my db files (poster's name, date, IP, and message).

The coder of my original message boards was nice enough to comment each element, so what I need to be able to do is get all the characters between certain tags.
Something like this:
<!--Poster--><B>Name</B> <!--Date--><B>4/4/02</B> etc...
So everything between the comment and the </B> should be saved and the rest discarded.

I hope that's enough info.
Thanks in advance

Replies are listed 'Best First'.
Re: Pulling info out of html pages
by Kanji (Parson) on Apr 08, 2002 at 19:37 UTC
      Excellent, thank you.
      TokeParser will do everything I need.
Re: Pulling info out of html pages
by Ovid (Cardinal) on Apr 08, 2002 at 19:49 UTC

    Here's the shell of a script that uses HTML::TokeParser to print the data you are looking for.

    use strict;
    use warnings;
    use HTML::TokeParser;

    my @sections = qw/ Poster Date /;
    my $p = HTML::TokeParser->new( 'test.html' );

    while ( my $token = $p->get_token ) {
        my ( $type, $text ) = @$token;
        if ( $type eq 'C' ) { # we have an HTML comment
            foreach my $section ( @sections ) {
                if ( $text =~ /$section/ ) {
                    $p->get_tag( "b" );
                    my $data = $p->get_trimmed_text;
                    print "$data\n";
                    last;
                }
            }
        }
    }

    Cheers,
    Ovid
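For converting threads into db records, it may be handier to collect the values by name than to print them. The following is only a sketch along the same lines as the script above, not something posted in the thread: the helper name `extract_sections` is hypothetical, the sample markup is the line from the question, and it assumes each comment marker is followed by a `<B>` element holding the value.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the thread): returns a hash ref mapping each
# section name to the text of the first <b> element after its comment marker.
sub extract_sections {
    my ( $html, @sections ) = @_;

    # A scalar reference tells HTML::TokeParser to parse the string itself
    # instead of treating the argument as a file name.
    my $p = HTML::TokeParser->new( \$html )
        or die "Cannot create parser: $!";

    my %found;
    while ( my $token = $p->get_token ) {
        my ( $type, $text ) = @$token;
        next unless $type eq 'C';    # only comment tokens act as markers

        foreach my $section (@sections) {
            # Word boundaries keep "Date" from matching e.g. "UpdateDate";
            # \Q...\E treats the section name as literal text.
            next unless $text =~ /\b\Q$section\E\b/;
            $p->get_tag('b');                           # skip to the next <b> start tag
            $found{$section} = $p->get_trimmed_text;    # text up to the next tag
            last;
        }
    }
    return \%found;
}

# Sample line from the question; real threads would be read from the old files.
my $html = '<!--Poster--><B>Name</B> <!--Date--><B>4/4/02</B>';
my $post = extract_sections( $html, qw/ Poster Date / );
print "$_: $post->{$_}\n" for sort keys %$post;
```

If a thread file holds several posts, the same loop could push a new hash each time the first marker reappears; that depends on how the old boards laid out their files, which the question does not say.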

