Re^2: Matching a pattern in Regex

If we care about variations in the input data, even your regex is not enough: e.g. it won't match <figr  n="2">Figure 2</figr>, as it contains more than one space between figr and n. IMHO the best solution is to process XML-like data as XML. But the input string might not be well-formed...
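To make the whitespace point concrete, here is a minimal sketch. Both patterns are hypothetical and made up for illustration; neither is the regex from the parent node.

```perl
use strict;
use warnings;

# Hypothetical patterns, for illustration only.
my $strict  = qr/<figr n="(\d+)">/;        # exactly one literal space
my $relaxed = qr/<figr\s+n="(\d+)"\s*>/;   # any run of whitespace, newlines included

my @tags = ('<figr n="1">', '<figr  n="2">', "<figr\nn=\"3\">");

my (@strict_hits, @relaxed_hits);
for my $tag (@tags) {
    push @strict_hits,  $1 if $tag =~ $strict;
    push @relaxed_hits, $1 if $tag =~ $relaxed;
}

print "strict:  @strict_hits\n";    # strict:  1
print "relaxed: @relaxed_hits\n";   # relaxed: 1 2 3
```

Even the relaxed pattern would still miss single-quoted attribute values or a tag carrying additional attributes, which is why I'd rather hand the job to a parser.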

I wrote a simple example of how the job could be done, if the input data is part of a well-formed XML document:

#!/usr/bin/perl

use warnings;
use strict;
use XML::Twig;

my $twig = XML::Twig->new(
    twig_handlers => {
        figr => sub {
            my $fnum = $_->att('n');
            $_->del_att('n');
            $_->set_tag('FIGIND');
            $_->set_att(NUM => $fnum, ID => sprintf('FG.%03d', $fnum));
        }
    }
);

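# Slurp the whole DATA section into one string.
my $str;
{
    local $/ = undef;
    $str = <DATA>;
}

# Wrap the fragment in a dummy root element so it parses as one document,
# then strip the wrapper again after processing.
$str = "<dummy>$str</dummy>";
$twig->parse($str);
$str = $twig->sprint;
$str =~ s!</?dummy>!!g;
print $str;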

__DATA__
Nerve cells come in many shapes and sizes,
but they all have a number of identifiable parts. 
A typical nerve cell is shown in <figr n="1">Figure
1</figr>. Like all other cells in the body, 
it has a nucleus that contains genetic information.
<figr  n="2">Figure 2</figr>. The cell is covered by a 
membrane and is filled with a fluid.

It prints:

Nerve cells come in many shapes and sizes,
but they all have a number of identifiable parts.
A typical nerve cell is shown in <FIGIND ID="FG.001" NUM="1">Figure
1</FIGIND>. Like all other cells in the body,
it has a nucleus that contains genetic information.
<FIGIND ID="FG.002" NUM="2">Figure 2</FIGIND>. The cell is covered by a
membrane and is filled with a fluid.
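As for input that might not be well-formed: one possible way to cope is to use XML::Twig's safe_parse, which wraps the parse in an eval and returns false on failure instead of dying. The sketch below is only an illustration; the rewrite_fragment helper and the two sample fragments are my own invention, and the handler is the same one as in the script above.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns the rewritten text, or undef when the
# wrapped fragment is not well-formed XML.
sub rewrite_fragment {
    my ($fragment) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            figr => sub {
                my $fnum = $_->att('n');
                $_->del_att('n');
                $_->set_tag('FIGIND');
                $_->set_att(NUM => $fnum, ID => sprintf('FG.%03d', $fnum));
            },
        },
    );

    # safe_parse returns a false value on a parse error (the message is in $@).
    $twig->safe_parse("<dummy>$fragment</dummy>") or return;

    my $out = $twig->sprint;
    $out =~ s!</?dummy>!!g;
    return $out;
}

# Made-up sample input: the second fragment lacks its closing tag.
for my $fragment ('see <figr n="1">Figure 1</figr>', 'see <figr n="1">Figure 1') {
    my $out = rewrite_fragment($fragment);
    print defined $out ? "ok: $out\n" : "not well-formed: $fragment\n";
}
```

What to do with a fragment that fails to parse (skip it, log it, or fall back to a regex) depends on the data, so the sketch just reports it.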

s;;Just-me-not-h-Ni-m-P-Ni-lm-I-ar-O-Ni;;tr?IerONim-?HAcker ?d;print
