Hi, I am trying to parse HTML data using a regex. Below is the HTML code:

 <td class="body3" valign="top"><p style="margin-top:1ex; margin-bottom:1ex;">The purpose of this study is to compare two types of care - standard <span class="hit_org">oncology</span> care and standard <span class="hit_org">oncology</span> care with early palliative care (started soon after diagnosis) to see which is better for improving the experience of patients and families with advanced lung and non-colorectal GI cancer.  The study will use questionnaires to measure patients' and caregivers' quality of life, mood, coping and understanding of their illness.</p></td>

I tried to extract the text using the code below: ($bs) = $pre_bs =~ m/\>(.*)\</; With this, only the first tag is removed, not all of them. So I also tried $bt =~ s/<.*>//gi; but that does not work either: everything is removed in this case. I want to remove all tags in a line, no matter how many there are. I have tried multiple combinations, but nothing works. Thanks
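
One likely cause, sketched here rather than confirmed against the real data: .* is greedy, so s/<.*>//g matches from the first "<" to the last ">" on the line and deletes everything in between. The capture m/\>(.*)\</ has the mirror-image problem: it keeps everything from the first ">" to the last "<", inner tags included. A negated character class stops each match at the first ">" after a "<". The sample below uses a shortened copy of the line from the post, and the variable names $html, $greedy and $text are made up for illustration.

```perl
use strict;
use warnings;

# Shortened copy of the line from the post (assumed to be a single line).
my $html = '<td class="body3" valign="top"><p style="margin-top:1ex;">'
         . 'standard <span class="hit_org">oncology</span> care</p></td>';

# Greedy: .* runs from the first "<" to the LAST ">", so the whole line goes.
(my $greedy = $html) =~ s/<.*>//g;

# Negated character class: each match ends at the first ">" after its "<".
(my $text = $html) =~ s/<[^>]*>//g;

print "greedy: '$greedy'\n";   # prints: greedy: ''
print "text:   '$text'\n";     # prints: text:   'standard oncology care'
```

This only holds up for simple markup like the sample. A ">" inside an attribute value, a comment, or a tag split across lines will still confuse it, so for arbitrary pages a real parser from CPAN (HTML::Parser or HTML::TreeBuilder, for example) is probably the safer route.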

In reply to Removing text between HTML tags by perll



The stupid question is the question not asked
	PerlMonks