PerlMonks  

Well, even though this wasn't addressed to me:

Mine will extract all of the above information; just change the following lines:

    print "($depth)$monkname posted '$title' on $date\n";
    $hashref->{$monkname}->{$node_id} = { date => $date, title => $title, depth => $depth };
Then you can extract whatever you want.
    $VAR1 = {
        'demerphq' => {
            '110238' => { 'depth' => '13', 'title' => 'Corions Name Space', 'date' => 'Sep 05, 2001 at 01:04' },
            'Home'   => '108447',
            '110195' => { 'depth' => '12', 'title' => 'Re: Name Space', 'date' => 'Sep 04, 2001 at 15:46' }
        },
        'George_Sherston' => {
            'Home'       => '103111',
            '124767'     => { 'depth' => '13', 'title' => 'Re: Re: Name Space', 'date' => 'Nov 11, 2001 at 22:33' },
            'Name Space' => { 'depth' => '9', 'title' => 'Name Space', 'date' => 'Sep 04, 2001 at 13:33' },
            '121046'     => { 'depth' => '14', 'title' => 'Re: Re: Re: Name Space', 'date' => 'Oct 24, 2001 at 01:21' },
            '117665'     => { 'depth' => '13', 'title' => 'Re: TheOrbTwo\'s Name Space', 'date' => 'Oct 09, 2001 at 00:05' },
            '117303'     => { 'depth' => '13', 'title' => 'Re: Re: Name Space', 'date' => 'Oct 07, 2001 at 03:57' },
            '110244'     => { 'depth' => '13', 'title' => 'Re: Re: Name Space', 'date' => 'Sep 05, 2001 at 01:58' },
            '122854'     => { 'depth' => '13', 'title' => 'Re: Re: Name Space', 'date' => 'Nov 02, 2001 at 08:07' }
        },
    };
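To make "extract whatever you want" a bit more concrete, here is a minimal sketch of building and walking that hash. The scraping step is left out: the records in @parsed are hypothetical stand-ins for what the parsing loop produces, with values copied from the dump above. The 'Home' entry is skipped during the walk because it holds a plain node id rather than a post.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-ins for what the scraping loop yields per post:
# [ monk name, node id, title, date, depth ]. Values copied from the dump.
my @parsed = (
    [ 'demerphq',        '110195', 'Re: Name Space',         'Sep 04, 2001 at 15:46', 12 ],
    [ 'demerphq',        '110238', 'Corions Name Space',     'Sep 05, 2001 at 01:04', 13 ],
    [ 'George_Sherston', '110244', 'Re: Re: Name Space',     'Sep 05, 2001 at 01:58', 13 ],
    [ 'George_Sherston', '121046', 'Re: Re: Re: Name Space', 'Oct 24, 2001 at 01:21', 14 ],
);

my $hashref = {};
for my $rec (@parsed) {
    my ( $monkname, $node_id, $title, $date, $depth ) = @$rec;
    # Same shape as in the post: monk name -> node id -> post details.
    $hashref->{$monkname}->{$node_id} = { date => $date, title => $title, depth => $depth };
}
$hashref->{demerphq}->{Home} = '108447';    # home node id, stored alongside the posts

# Example extraction: every post per monk, skipping non-post keys like 'Home'.
for my $monk ( sort keys %$hashref ) {
    my $posts = $hashref->{$monk};
    for my $id ( sort keys %$posts ) {
        next unless ref( $posts->{$id} ) eq 'HASH';
        printf "%s %s (depth %d): %s\n", $monk, $id, $posts->{$id}{depth}, $posts->{$id}{title};
    }
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper($hashref);
```

Keeping everything keyed by monk and then by node id means per-monk questions (how many posts, earliest date, deepest reply) are just a loop over one inner hash.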
Note that the depths are as follows: 9 is the root node, 12 a reply, 13 a reply to a reply, and so on.
But a thought: you don't want the posts from just a fixed depth in the parse tree. That would, for instance, eliminate you from the list (you don't have a reply to yourself), as well as anyone who explained their name in a reply to another person's explanation. merphq would be an example, but I believe there are others as well.
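Here's a rough sketch of the difference, assuming the monk -> node id -> details hash from above (trimmed to a few entries from the dump). monks_at_depth and shallowest_post are hypothetical helpers for illustration: filtering on depth 12 alone keeps demerphq but drops George_Sherston, whose posts sit at depths 9 and 13, while taking each monk's shallowest post keeps everyone.

```perl
use strict;
use warnings;

# Same monk -> node id -> details shape as above; values trimmed from the dump.
my %posts = (
    demerphq => {
        Home     => '108447',
        '110195' => { depth => 12, title => 'Re: Name Space' },
        '110238' => { depth => 13, title => 'Corions Name Space' },
    },
    George_Sherston => {
        Home         => '103111',
        'Name Space' => { depth => 9,  title => 'Name Space' },
        '110244'     => { depth => 13, title => 'Re: Re: Name Space' },
    },
);

# Fixed-depth filter (hypothetical helper): monks with a post at exactly $depth.
sub monks_at_depth {
    my ( $data, $depth ) = @_;
    my @found;
    for my $monk ( sort keys %$data ) {
        my @hits = grep { ref($_) eq 'HASH' && $_->{depth} == $depth }
            values %{ $data->{$monk} };
        push @found, $monk if @hits;
    }
    return @found;
}

# Depth-independent alternative (hypothetical helper): each monk's
# shallowest post, wherever it sits in the thread.
sub shallowest_post {
    my ($data) = @_;
    my %best;
    for my $monk ( keys %$data ) {
        for my $post ( grep { ref($_) eq 'HASH' } values %{ $data->{$monk} } ) {
            $best{$monk} = $post
                if !$best{$monk} || $post->{depth} < $best{$monk}{depth};
        }
    }
    return \%best;
}

my @fixed = monks_at_depth( \%posts, 12 );
my $best  = shallowest_post( \%posts );
print "depth 12 only: @fixed\n";
printf "%s: %s (depth %d)\n", $_, $best->{$_}{title}, $best->{$_}{depth}
    for sort keys %$best;
```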

Actually, one of the more interesting issues with this thread was accurately picking up all names from all levels: <UL> tags have an annoying habit of messing up the pattern, and the main post is marked up differently from the replies.
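One way around the <UL> problem, sketched under assumptions: instead of one pattern that expects a fixed amount of list markup, walk the page tag by tag, count how many <ul> elements are open, and record each author link with the current nesting level. The HTML below is a hypothetical, simplified thread, not the real PerlMonks markup, and the "by <a ...>name</a>" author pattern is likewise an assumption. Because that pattern ignores the surrounding markup, the root post (in a <p> here) is caught by the same rule as the replies. The levels are relative nesting counts, not the depth numbers in the dump, and for real pages a proper parser such as HTML::TokeParser will be more robust than regexes.

```perl
use strict;
use warnings;

# Hypothetical, simplified thread markup: replies nest inside <ul> lists.
my $html = <<'HTML';
<p>Name Space by <a href="?node=George_Sherston">George_Sherston</a></p>
<ul>
  <li>Re: Name Space by <a href="?node=demerphq">demerphq</a>
    <ul>
      <li>Re: Re: Name Space by <a href="?node=George_Sherston">George_Sherston</a></li>
    </ul>
  </li>
</ul>
HTML

# Scan for either a <ul>/</ul> tag or an author link, in document order.
# $level counts currently open <ul> elements.
my ( $level, @authors ) = (0);
while ( $html =~ m{<(/?)ul\b[^>]*>|by\s+<a\s[^>]*>([^<]+)</a>}gi ) {
    if ( defined $2 ) {
        push @authors, [ $2, $level ];    # author link at current nesting
    }
    elsif ( $1 eq '/' ) {
        $level-- if $level > 0;           # closing </ul>
    }
    else {
        $level++;                         # opening <ul>
    }
}

printf "%-16s level %d\n", @$_ for @authors;
```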

Anyway, I'll revisit this a bit later. :-)

Yves / DeMerphq
--
Have you registered your Name Space?


In reply to Re: Re: (crazyinsomniac) Re: Extract info from HTML by demerphq
in thread Extract info from HTML by George_Sherston
