If we care about variations in the input data, even your regex is not enough: e.g. it won't match
<figr   n="2">Figure 2</figr>, as it contains more than one space between figr and n.
IMHO the best solution is to process XML-like data as XML. But the input string might not be well-formed...
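To make the whitespace problem concrete, here is a minimal sketch. The "strict" pattern is only a hypothetical stand-in for a regex that expects exactly one space after the tag name (it is not the regex from the original question), and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical pattern: expects exactly one space between "figr" and "n".
my $strict   = qr/<figr n="(\d+)">/;
# More tolerant variant: any run of whitespace, including a newline.
my $tolerant = qr/<figr\s+n="(\d+)">/;

# Made-up sample inputs with different spacing after the tag name.
my @samples = (
    '<figr n="1">Figure 1</figr>',
    '<figr   n="2">Figure 2</figr>',
    qq{<figr\nn="3">Figure 3</figr>},
);

# Count how many of the given strings match the pattern.
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}

printf "strict:   %d of %d\n", count_matches($strict,   @samples), scalar @samples;
printf "tolerant: %d of %d\n", count_matches($tolerant, @samples), scalar @samples;
```

Even the tolerant pattern would still miss single-quoted attribute values or additional attributes, which is why parsing the data as XML is the sturdier route.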
I wrote a simple example of how the job could be done if the input data is part of a well-formed XML document:
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Turn each <figr n="N"> into <FIGIND NUM="N" ID="FG.00N">.
        figr => sub {
            my $fnum = $_->att('n');
            $_->del_att('n');
            $_->set_tag('FIGIND');
            $_->set_att(NUM => $fnum, ID => sprintf('FG.%03d', $fnum));
        }
    }
);
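# Slurp the whole DATA section into one string.
my $str;
{
    local $/ = undef;
    $str = <DATA>;
}
# The fragment has no single root element, so wrap it in a dummy one for parsing.
$str = "<dummy>$str</dummy>";
$twig->parse($str);
$str = $twig->sprint;
# Strip the dummy wrapper again.
$str =~ s!</?dummy>!!g;
print $str;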
__DATA__
Nerve cells come in many shapes and sizes,
but they all have a number of identifiable parts.
A typical nerve cell is shown in <figr n="1">Figure
1</figr>. Like all other cells in the body,
it has a nucleus that contains genetic information.
<figr n="2">Figure 2</figr>. The cell is covered by a
membrane and is filled with a fluid.
It prints:
Nerve cells come in many shapes and sizes,
but they all have a number of identifiable parts.
A typical nerve cell is shown in <FIGIND ID="FG.001" NUM="1">Figure
1</FIGIND>. Like all other cells in the body,
it has a nucleus that contains genetic information.
<FIGIND ID="FG.002" NUM="2">Figure 2</FIGIND>. The cell is covered by a
membrane and is filled with a fluid.
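Since the input might not be well-formed, it may be worth checking that before transforming it. The following is only a sketch under the same assumption as above (the text is a fragment that becomes a document once wrapped in a dummy root). The helper name is_well_formed_fragment is made up for illustration; safe_parse is the XML::Twig method that wraps parsing in an eval and returns false on failure, leaving the error in $@:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: true if the fragment parses once wrapped in a dummy root.
sub is_well_formed_fragment {
    my ($fragment) = @_;
    my $twig = XML::Twig->new;
    # safe_parse does not die on a parse error; it returns a false value instead.
    return $twig->safe_parse("<dummy>$fragment</dummy>") ? 1 : 0;
}

for my $fragment (
    'shown in <figr n="1">Figure 1</figr>.',   # closed properly
    'shown in <figr n="1">Figure 1.',          # missing </figr>
) {
    printf "%-45s %s\n", $fragment,
        is_well_formed_fragment($fragment) ? 'well-formed' : 'NOT well-formed';
}
```

If the check fails, falling back to a regex (or fixing the input first) is probably the only option, since an XML parser will refuse the data.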