http://qs321.pair.com?node_id=1100530


in reply to Removing text between HTML tags

The substitution below works for the sample provided, but it is the wrong way to do it. I've assumed that this is very simple HTML with nothing that would break a simple-minded substitution (e.g., what would happen with a button whose alternative text is "Next >"?). There is a famous response to this on another site, but the Perl-specific answer is to use an HTML parsing module such as HTML::TokeParser::Simple, which helpfully has extracting the content from an HTML file as its first example.
s/<[^>]+>//g;
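
To make the "Next >" caveat concrete, here is a minimal sketch using only core Perl. The helper name strip_tags and both sample strings are made up for illustration and are not from the original question. It applies the same substitution to one well-behaved input and to one where a ">" sits inside an attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: remove anything that looks like a tag, i.e. '<',
# one or more characters that are not '>', then '>'.
sub strip_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]+>//g;
    return $text;
}

# Illustrative inputs (assumptions, not the poster's real pages).
my $simple = '<p>Hello <b>world</b></p>';
my $tricky = '<p>Go on <img src="next.png" alt="Next >"> please</p>';

print strip_tags($simple), "\n";   # tags removed cleanly
print strip_tags($tricky), "\n";   # match ends at the '>' inside alt="..."
```

On the second input the match stops at the ">" inside the alt text, so the tail of the tag (the closing quote and the real ">") is left behind in the output. A real HTML parser tokenizes attributes properly and does not have this problem.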

print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

Re^2: Removing text between HTML tags
by perll (Novice) on Sep 14, 2014 at 15:03 UTC
    Thanks, I know about HTML::TokeParser::Simple, but I am working on my office laptop and the firewall blocks CPAN :( It will take time for me to get that module. Anyway, it is a known set of HTML and will be the same for all pages. Thank you.
      Is Metacpan blocked?

      print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

      "Without any further ado," talk with your boss and ask him or her to arrange for you to have access to CPAN. (You can, if necessary, install all of the modules that you need locally to just your own account and machine, so there are no system-integrity risks.) There is zero doubt in my mind that there is really no other business-justifiable way to get this job done. (And there undoubtedly will be more business cases like this one. You must have the Right Tools For The Job.)

        Thanks for the reply. My company sucks: we have access only to the intranet, and to access the internet we have a separate bay. This is a new project still in POC, and I am working off the record to show the director I can do something :) I got HTML::TokeParser and HTML::TreeBuilder and am planning to rewrite the code.