PerlMonks  


emilianenko:

Here's a quick bit of code to get you started:

    use strict;
    use warnings;

    $/ = undef;
    while (my $line = <DATA>) {
        for ($line =~ m/<a[^>]*>(.*?)<\/a>/gs) {
            print "Name '$_'\n";
        }
    }

    __DATA__
    <a href="foo">Jon.Martinez</a><li>gabba, gabba, hey!</li><a href=bar>Mary
    Jones</a><p>Gazebo!</p><a href="baz">Rob Oticus</a><a>Joe Blow</a>

Note that we slurp the whole file in at once ($/ = undef); otherwise we couldn't find names spread over two lines (like Mary Jones). We also need the 's' switch on the regular expression so that '.' matches newlines (again, to pick up Mary Jones!).
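If you want to see why both pieces matter, here is a minimal sketch that runs the same pattern three ways. The sample string and the helper name link_texts are made up for illustration; they are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical sample data: the second name is split across two lines.
my $html = qq{<a href="foo">Jon.Martinez</a><a href=bar>Mary\nJones</a>\n};

# Return the text inside every <a>...</a> pair in $text.
# $dot_matches_newline chooses whether the 's' switch is applied.
sub link_texts {
    my ($text, $dot_matches_newline) = @_;
    my $re = $dot_matches_newline
        ? qr/<a[^>]*>(.*?)<\/a>/s
        : qr/<a[^>]*>(.*?)<\/a>/;
    my @found = $text =~ /$re/g;
    return @found;
}

# 1. One line at a time: "Mary\nJones" never sits in a single chunk.
my @by_line = map { link_texts($_, 1) } split /^/, $html;

# 2. Whole string but no 's': '.' stops at the newline.
my @no_s = link_texts($html, 0);

# 3. Whole string with 's': both names are found.
my @with_s = link_texts($html, 1);

printf "line by line: %d, slurped without s: %d, slurped with s: %d\n",
    scalar @by_line, scalar @no_s, scalar @with_s;
```

Only the third variant picks up the name that spans the line break; the first two find Jon.Martinez and nothing else.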

Running it gives you:

    $ perl foo.pl 1
    Name 'Jon.Martinez'
    Name 'Mary
    Jones'
    Name 'Rob Oticus'
    Name 'Joe Blow'

Now, having said all that: remember to review perlre and perlop. Also, you may want to use a real HTML parser instead of hacking away with regular expressions; otherwise you may run into trouble with unexpected formatting.
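As a rough idea of what the parser route could look like, here is a sketch using HTML::TokeParser. It assumes the HTML-Parser distribution is installed from CPAN (it is not a core module), and the helper name anchor_names is made up for this example. Treat it as one possible starting point, not the only way to do it.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # assumption: HTML-Parser is installed from CPAN

# Same sample markup as the regex version, with one name split over two lines.
my $html = <<'HTML';
<a href="foo">Jon.Martinez</a><li>gabba, gabba, hey!</li><a href=bar>Mary
Jones</a><p>Gazebo!</p><a href="baz">Rob Oticus</a><a>Joe Blow</a>
HTML

# Hypothetical helper: collect the text of every <a> element in $doc.
sub anchor_names {
    my ($doc) = @_;
    my $parser = HTML::TokeParser->new(\$doc)
        or die "Cannot create parser";
    my @names;
    # get_tag('a') skips ahead to the next <a ...> start tag.
    while ($parser->get_tag('a')) {
        # get_trimmed_text reads up to </a> and collapses runs of whitespace.
        push @names, $parser->get_trimmed_text('/a');
    }
    return @names;
}

print "Name '$_'\n" for anchor_names($html);
```

Because get_trimmed_text collapses whitespace, the name split over two lines comes back as 'Mary Jones' on a single line, and there is no need to think about $/ or the 's' switch.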

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Update: changed 'e' to 's' (thanks for catching that, hbm!)


In reply to Re: A regex question by roboticus
in thread A regex question by emelianenko
