Take a look at the documentation for regex quantifiers and capture groups.

Your code will match numbers whose digits come in multiples of 4. For example, something-1234.html will match, and so will something-12341234.html. To match only 4 digits, your pattern can be simplified to:

$url=~/(\d{4})\.htm/i;
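
To see what the + changes, here is a small sketch that prints the text each pattern actually matches. I am assuming your original pattern had the + right after the capture group, as in (\d{4})+; the helper name matched_text and the sample URLs are only for illustration.

```perl
use strict;
use warnings;

# Assumed form of the original pattern: "+" repeats the 4-digit group.
my $with_plus    = qr/(\d{4})+\.htm/i;
# Simplified pattern: exactly one group of 4 digits before ".htm".
my $without_plus = qr/(\d{4})\.htm/i;

# Return the full text matched by $re in $url, or 'no match'.
# @- and @+ hold the start and end offsets of the last successful match.
sub matched_text {
    my ($url, $re) = @_;
    return $url =~ $re ? substr($url, $-[0], $+[0] - $-[0]) : 'no match';
}

for my $url (qw(something-1234.html something-12345678.html something-123.html)) {
    printf "%-26s with +: %-14s without +: %s\n",
        $url, matched_text($url, $with_plus), matched_text($url, $without_plus);
}
```

For something-12345678.html the pattern with + consumes all eight digits, while the pattern without it only consumes the last four (5678.htm).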

Note that the + has been removed from your regex. Also, as your code is written, $num will not contain the number; it will contain the whole URL. To get just the number, you need the value of the first capture group:

$num = $1;
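
Putting the match and the assignment together might look like the sketch below. The sample URL and the variable name $num_list are just placeholders. It is worth checking that the match succeeded before reading $1, because after a failed match $1 can still hold a value from an earlier successful one.

```perl
use strict;
use warnings;

my $url = 'something-1234.html';   # sample URL for illustration

# Only read $1 after a successful match.
my $num;
if ($url =~ /(\d{4})\.htm/i) {
    $num = $1;
}

# Alternative idiom: in list context the match returns its captures,
# so the first capture group lands directly in the variable.
my ($num_list) = $url =~ /(\d{4})\.htm/i;

print "$num $num_list\n";
```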

To allow for 4 or more digits, use the following:

$url=~/(\d{4,})\.htm/i;

To allow for only 4 or 5 digits, use the following:

$url=~/(\d{4,5})\.htm/i;
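
One caveat with these patterns: they are not anchored on the left, so a URL with a longer run of digits still matches on its last few digits. If that matters for your data, a negative lookbehind is one way to rule it out. The sketch below compares the two; the stricter pattern, the helper name digits, and the sample URLs are my own additions, not something from your code.

```perl
use strict;
use warnings;

# Return the captured digits, or 'no match'.
sub digits {
    my ($url, $re) = @_;
    return $url =~ $re ? $1 : 'no match';
}

# Pattern from above: 4 or 5 digits right before ".htm".
my $loose  = qr/(\d{4,5})\.htm/i;
# Stricter variant: (?<!\d) refuses a digit just before the group,
# so the whole run of digits must be 4 or 5 long.
my $strict = qr/(?<!\d)(\d{4,5})\.htm/i;

for my $url (qw(something-1234.html something-12345.html something-123456.html)) {
    printf "%-24s loose: %-9s strict: %s\n",
        $url, digits($url, $loose), digits($url, $strict);
}
```

With something-123456.html the loose pattern captures 23456, while the strict one does not match at all.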

UPDATE: I really like the named capture groups feature that comes with Perl 5.10 and later. Named groups can be overkill when you are only dealing with one or two groups, but they can make the code much clearer when you are dealing with multiple capture groups.

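#!/usr/bin/env perl

use strict;
use warnings;
use v5.10;

my $url = 'something-12345.html';

# %+ holds the named captures of the last successful match,
# so only read it when the match succeeded.
if ($url =~ /(?<num>\d{4,5})\.htm/i) {
    my $num = $+{num};
    print "$num\n";
}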
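
To show where named groups start to pay off, here is a sketch with two groups. The URL layout (a 4-digit year followed by a 4 or 5 digit id) and the group names year and id are made up for the example.

```perl
use strict;
use warnings;
use v5.10;

my $url = 'archive-2015-12345.html';   # hypothetical URL layout

my %part;
if ($url =~ /(?<year>\d{4})-(?<id>\d{4,5})\.htm/i) {
    # Copy %+ right away, since it changes on the next successful match.
    %part = %+;
    say "year=$part{year} id=$part{id}";
}
```

With numbered groups you would have to remember that $1 is the year and $2 is the id; with names, the pattern documents itself.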

In reply to Re: Grabbing numbers from a URL by kevbot
in thread Grabbing numbers from a URL by htmanning
