Hello fellow Monks!
I need to write a simple notifier to stay up to date with replies in certain threads.
I am aware of WebService::Vichan, but I have already started with an HTML::TokeParser::Simple approach, which I don't want to abandon yet.
I believe that by using simpler tools I will learn more, and the whole code will be more efficient than using Super::Duper::Module->do_everything(\$data);
Given the partial document below:
<div class="post reply body-not-empty" id="reply_8735435">
(cut out for visibility)
<p class="body-line ltr ">The first 3 lines were 15% bait power, but then it fell to mere 5% and the last lines are literally 0%, try again in a few days.</p>
</div>
(cut out for visibility)
<div class="post reply body-not-empty" id="reply_8735439">
(cut out for visibility)
<div class="body" >
<p class="body-line ltr ">
<a onclick="highlightReply('8735417', event);" href="/b/res/8735417.html#8735417">>>8735417</a>
</p>
<p class="body-line ltr quote">>Reddit is a great place for discourse and there are many active subreddits where field professionals regularly answer questions on issues of health, science, engineering, etc</p>
<p class="body-line ltr ">Yeah, as far as content goes, Reddit kicks 8chan's ass. They have some great boards for serious academic discussion.</p>
<p class="body-line empty ">
I want to iterate over the divs whose id is "reply_xxx" and, once one is found, descend into it to rip out the whole div with class "body".
Then proceed to the next reply-like div, until EOF.
Simple? Nope :P
The issue I am running into is the existence of TokeParser's cursor thingie, a state indicator which internally "knows" where in the document the parser currently is.
Using this:
my $parser = HTML::TokeParser::Simple->new(\$data);
while (my $div = $parser->get_tag('div', '/div')) {
    my $id = $div->get_attr('id');
    next unless (defined $id and $id =~ /reply/);
    # here the cursor is inside the reply tag,
    # so iterate deeper
    while (my $inner_div = $parser->get_tag('div', '/div')) {
        my $inner_class = $inner_div->get_attr('class');
        next unless (defined $inner_class and $inner_class eq 'body');
        #~ # print "div.$id > div.$inner_class \n";
        my $text = $parser->get_text;
        print "$id: '$text' \n";
        #~ # print $id ." ";
    }
}
gives a result where only the first ID is matched, and the inner while loop iterates over the bodies of all replies until EOF.
Obviously that's not what I am after :)
My first thought was to isolate the rest of the HTML document after matching a "reply" id, run the inner while until the first closing div, then feed the outer while with the not-yet-consumed part of the document, and repeat until the actual EOF.
As you can see, this seems inefficient even at first glance.
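An alternative that avoids re-feeding the document might be a single pass over the tokens that tracks div nesting depth, so the code knows when it has left a reply div. This is only a rough sketch under assumptions: the sub name reply_bodies and all variable names are made up for illustration, div tags are assumed to be balanced, and div.body is assumed to be nested inside the reply div (which the cut-down sample above does not fully show).

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: returns a hashref { reply_id => text inside div.body }.
sub reply_bodies {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new(\$html);

    my %body_of;
    my $depth = 0;             # number of <div>s currently open
    my ($id, $reply_depth);    # current reply id and the depth it opened at
    my $body_depth;            # depth at which div.body opened, if inside one

    while (my $token = $parser->get_token) {
        if ($token->is_start_tag('div')) {
            $depth++;
            my $tag_id = $token->get_attr('id');
            my $class  = $token->get_attr('class');
            if (!defined $id && defined $tag_id && $tag_id =~ /^reply_/) {
                # entered a reply div
                ($id, $reply_depth) = ($tag_id, $depth);
                $body_of{$id} = '';
            }
            elsif (defined $id && !defined $body_depth
                   && defined $class && $class eq 'body') {
                # entered the body div of the current reply
                $body_depth = $depth;
            }
        }
        elsif ($token->is_end_tag('div')) {
            # leaving the body div?
            undef $body_depth if defined $body_depth && $depth == $body_depth;
            # leaving the reply div itself?
            if (defined $reply_depth && $depth == $reply_depth) {
                undef $id;
                undef $reply_depth;
                undef $body_depth;
            }
            $depth-- if $depth > 0;
        }
        elsif ($token->is_text && defined $body_depth) {
            # raw text (entities not decoded) from any tag nested in div.body
            $body_of{$id} .= $token->as_is;
        }
    }
    return \%body_of;
}
```

It could then be used along the lines of: my $bodies = reply_bodies($data); print "$_: '$bodies->{$_}'\n" for sort keys %$bodies;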
How would Monks handle this task? By "rewinding" the internal cursor using the unget_token method? TokeParser is not a must and I am open to other solutions, but it is welcome.