Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"

Re: A regex question

by roboticus (Chancellor)
on Oct 28, 2011 at 20:22 UTC ( #934502=note: print w/replies, xml ) Need Help??

in reply to A regex question


Here's a quick bit of code to get you started:

use strict; use warnings; $/=undef; while (my $line = <DATA>) { for ($line =~ m/<a[^>]*>(.*?)<\/a>/gs) { print "Name '$_'\n"; } } __DATA__ <a href="foo">Jon.Martinez</a><li>gabba, gabba, hey!</li><a href=bar>Mary Jones</a><p>Gazebo!</p><a href="baz">Rob Oticus</a><a>Joe Blow</a>

Note that we slurp all the file in at once ($/=undef) otherwise we can't find names spread over two lines (like Mary Jones). We also need to use the 's' switch on the regular expression to let '.' match newlines (again to pick up Mary Jones!.

Running it gives you:

$ perl 1 Name 'Jon.Martinez' Name 'Mary Jones' Name 'Rob Oticus' Name 'Joe Blow'

Now, having said all that: Remember to review perlre and perlop. Also, you may want to use a real HTML parser instead of hacking away with regular expressions. Otherwise you can find some difficulties with unexpected formatting.


When your only tool is a hammer, all problems look like your thumb.

Update: changed 'e' to 's' (thanks for catching that, hbm!)

Replies are listed 'Best First'.
Re^2: A regex question
by emelianenko (Initiate) on Oct 28, 2011 at 20:58 UTC
    Thank you whole heartedly. I am going to study this that you wrote. Definitively I want to incorporate Perl into my bagage but I am finishing C now. Right after that I will because I am fanatic about managing information. thank you again best regards

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://934502]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (3)
As of 2023-02-05 13:28 GMT
Find Nodes?
    Voting Booth?
    I prefer not to run the latest version of Perl because:

    Results (31 votes). Check out past polls.