Dear Monks,
I need to extract text from repeated HTML patterns. A good example is the site homeandfamilynetwork dot com.
A sample of its HTML is given below.
The repeating pattern is probably this:
<p class="entry-summary"> t e x t <div
Once I have this portion, I can easily strip the remaining HTML and clean up the text with a filter.
I know that, in general, this is not an easy task, and someone may recommend a module such as HTML::ContentExtractor. But if somebody has already done something similar, I would appreciate it if you could share the code with me.
<div class="article box medium">
<div class="header"><a class="category" href="/home-improvement/" rel="tag">Home Improvement</a><h2 class="entry-title first-item"><a href="http://www.homeandfamilynetwork.com/home-improvement/organization/get-that-closet-organized/740">Get That Closet Organized</a></h2> </div>
<p class="entry-summary">
<p>Cluttered closet? Can't find that coat you have because it's hidden under mountains of junk? We have the solution.<div class="footer"><a href="http://www.homeandfamilynetwork.com/blog/better-homes-and-gardens-decorating/161">Better Homes and Gardens ...</a> On <abbr title="January 13, 2011">January 13, 2011</abbr> </div> </div> <li>
<div class="article box medium">
<div class="header"><a class="category" href="/fitness/" rel="tag">Fitness</a><h2 class="entry-title"><a href="http://www.homeandfamilynetwork.com/fitness/weight-loss/8-diet-rules-you-should-be-breaking/741">8 Diet Rules You Should Be Breaking</a></h2> </div>
<p class="entry-summary">Trying to lose weight and listening to everything the internet or your friends say? Maybe it's time to stop listening and start eating. <div class="footer"><a href="http://www.homeandfamilynetwork.com/blog/fitnessmagazinecom/168">FitnessMagazine.com</a> On <abbr title="January 13, 2011">January 13, 2011</abbr> </div> </div> <li>