PerlMonks  

Capture a string

by pop18 (Novice)
on Feb 28, 2008 at 10:49 UTC ( [id://670882]=perlquestion )

pop18 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have a string like

<citref idrefs="cit">5c,5d</citref>

and want to get the output as

<citref idrefs="cit5c cit5d">5<it>c</it>,5<it>d</it></citref>

Please help!
Thanks,
POP

Replies are listed 'Best First'.
Re: Capture a string
by olus (Curate) on Feb 28, 2008 at 12:12 UTC
    use strict;
    use warnings;

    my $s1 = '<citref idrefs="cit">5c,5d,5Es</citref>';
    my $s2 = '<qwerty alrefs="xad">5c,5d,5Es</qwerty>';

    print parse($s1)."\n";
    print parse($s2)."\n";

    sub parse {
        my $orig = shift;
        my ($cit, $idrefs, $values, @values);

        $orig =~ m/.*?>(.*)<\//;
        $values = $1;
        @values = split ',', $values;

        $orig =~ m/.*?"(.*)">/;
        $cit = $1;

        $idrefs .= "$cit$_ " for @values;
        chop $idrefs;

        $orig =~ s/(.*?=").*(">.*)/$1$idrefs$2/;

        @values = map{ s/([a-z]+)/<it>$1<\/it>/i; $_;} @values;
        $values = join ',', @values;

        $orig =~ s/>.*?</'>'.$values.'<'/es;
        return $orig;
    }
    And the output:

    <citref idrefs="cit5c cit5d cit5Es">5<it>c</it>,5<it>d</it>,5<it>Es</it></citref>
    <qwerty alrefs="xad5c xad5d xad5Es">5<it>c</it>,5<it>d</it>,5<it>Es</it></qwerty>
Re: Capture a string
by Punitha (Priest) on Feb 28, 2008 at 11:53 UTC

    Hi,

    I tried to reproduce your exact output from the same input, and came up with this code:

    use strict;

    while(<DATA>){
        chomp;
        $_=~s/(<citref idrefs=\")([^"]*)(\">)((?:(?!<\/citref>).)*)(<\/citref>)/$1.idgen($2,$4).$3.citeref($4).$5/sgie;
        print "$_\n";
    }

    sub idgen{
        my ($id,$idcon) = @_;
        if($idcon =~/,/){
            $idcon=~s/([^,]+)(?=,|$)/$id$1/gi;
            $idcon=~s/,/ /gi;
        }
        else{
            $idcon=$id.$idcon;
        }
        return($idcon);
    }

    sub citeref{
        my ($con) = @_;
        if($con =~/,/){
            my (@con) = split/,/,$con;
            map{s/[a-z]+/<it>$&<\/it>/i} @con;
            $con =join(',',@con);
        }
        else{
            $con =~s/[a-z]+/<it>$&<\/it>/gi;
        }
        return($con);
    }

    __DATA__
    <citref idrefs="cit">5c,5d</citref>
    <citref idrefs="cit">5d</citref>

    Please tell us a bit more so that we can offer a better solution, for example:

      1. Will the content of the 'citref' tag always contain a comma, or not?
      2. Should the idrefs generated in the output be prefixed with the idrefs value from the input, or will the prefix always be 'cit'?

    Punitha
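
    On the first question above: the comma and no-comma cases may not need separate branches, because split returns a one-element list when the string contains no comma. The following is only a sketch under assumptions the original poster did not confirm: the tag is always 'citref', the prefix is whatever the input's idrefs attribute holds, and only the first run of letters in each item is wrapped. The names rewrite_citref and mark_citref are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the replacement tag from the attribute
# prefix and the comma-separated body.
sub rewrite_citref {
    my ($prefix, $body) = @_;
    my @items = split /,/, $body;              # one element if there is no comma
    my $ids   = join ' ', map { $prefix . $_ } @items;
    my @marked = @items;                       # work on copies
    s{([A-Za-z]+)}{<it>$1</it>} for @marked;   # wrap the first run of letters
    return sprintf '<citref idrefs="%s">%s</citref>', $ids, join(',', @marked);
}

# Hypothetical wrapper: apply the rewrite to every citref tag in a string.
sub mark_citref {
    my ($text) = @_;
    $text =~ s{<citref idrefs="([^"]*)">([^<]*)</citref>}{rewrite_citref($1, $2)}ge;
    return $text;
}

print mark_citref('<citref idrefs="cit">5c,5d</citref>'), "\n";
print mark_citref('<citref idrefs="cit">5d</citref>'), "\n";
```

    For the two sample lines this prints the requested form, <citref idrefs="cit5c cit5d">5<it>c</it>,5<it>d</it></citref>, and the single-item equivalent.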

Re: Capture a string
by grizzley (Chaplain) on Feb 28, 2008 at 11:58 UTC
    With a single regexp, or with a piece of code that contains regexps? What exactly is the format of the idrefs attribute (only letters a-z)? What is the format of the value inside the tags? Is it <digit><letter>,<digit><letter> (2 data strings) or <digit><letter>,<digit><letter>,... (2 or more) or maybe <any char><any char>,...?
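
    Until the format is pinned down, one cautious option is to check the tag content against an assumed grammar before transforming it. The sketch below assumes each item is one or more digits followed by one or more letters, separated by commas; that grammar is a guess based on the samples in this thread, and is_wellformed_body is a made-up name.

```perl
use strict;
use warnings;

# Assumed item format (not confirmed by the original poster):
# one or more digits followed by one or more letters.
my $ITEM = qr/\d+[A-Za-z]+/;

# Hypothetical check: returns 1 if the body is a comma-separated
# list of such items, 0 otherwise.
sub is_wellformed_body {
    my ($body) = @_;
    return $body =~ /\A$ITEM(?:,$ITEM)*\z/ ? 1 : 0;
}

for my $body ('5c,5d', '5c,5d,5Es', '5c;5d', 'c5') {
    printf "%-10s %s\n", $body,
        is_wellformed_body($body) ? 'ok' : 'unexpected format';
}
```

    Content that fails the check can be left untouched or reported, rather than being rewritten into something unintended.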

Node Type: perlquestion [id://670882]
Approved by moritz