http://qs321.pair.com?node_id=11117402

corfuitl has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I am trying to implement a function that splits a sentence containing XML tags into three parts. To be clearer: I want to separate out the tags that wrap the sentence at its start and end, so that the middle part no longer contains them. The input sentences are:

<d id="43">Text </d> here <a id="33"/>
<b id="33"/> Text <d id="43">text</d> here
<d id="43">text here</d>
<d id="43">text here</d> <d id="44">text here</d>

The output should be:

start: "", middle: "<d id="43">Text </d> here", end: " <a id="33"/>"
start: "<b id="33"/> ", middle: "Text <d id="43">text</d> here", end: ""
start: "<d id="43">", middle: "text here", end: "</d>"
start: "", middle: "<d id="43">text here</d> <d id="44">text here</d>", end: ""

I have started on the code below, but I don't think it's efficient. Any suggestions? The a, b and c tags are self-closing, while d always comes as an opening/closing pair.
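Because a, b and c are always self-closing, the leading and trailing wrapper part can be captured with one anchored regex per side instead of a match-then-s/// loop. This is a minimal sketch that assumes every such tag has the form <x id="..."/> as in the examples. The names $self_closing and peel_self_closing are hypothetical, and d pairs are not handled here.

```perl
use strict;
use warnings;

# One self-closing a/b/c tag, e.g. <a id="33"/>; [^"]* stops at the closing quote
# (assumption: the only attribute is id, as in the examples)
my $self_closing = qr/<[abc]\s+id="[^"]*"\s*\/>/;

# Split a segment into (leading tags/whitespace, rest, trailing tags/whitespace)
sub peel_self_closing {
    my ($segment) = @_;
    my ($start, $middle, $end) = ('', $segment, '');

    # Longest prefix made only of whitespace and self-closing tags
    $start = $1 if $middle =~ s/\A((?:\s|$self_closing)+)//;

    # Longest suffix made only of whitespace and self-closing tags
    $end = $1 if $middle =~ s/((?:\s|$self_closing)+)\z//;

    return ($start, $middle, $end);
}

for my $segment (
    '<d id="43">Text </d> here <a id="33"/>',
    '<b id="33"/> Text <d id="43">text</d> here',
) {
    my ($start, $middle, $end) = peel_self_closing($segment);
    print qq{start: "$start", middle: "$middle", end: "$end"\n};
}
```

For these two inputs the printed lines match the first two expected results. The suffix regex finds the leftmost position from which only whitespace and self-closing tags remain, which is the longest such suffix.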

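my $segment = $_;
my $start  = "";
my $end    = "";
my $middle = "";

# Move leading whitespace and self-closing a/b/c tags into $start.
# [abc] is a character class, so no "|" is needed; [^"]* keeps the
# match from running past the closing quote of the id.
while ($segment =~ /^(<[abc] id="[^"]*"\/>)/ || $segment =~ /^(\s+)/) {
    $start .= $1;
    $segment =~ s/^\Q$1\E//;
}

# Move trailing whitespace and self-closing a/b/c tags into $end.
while ($segment =~ /(\s+)$/ || $segment =~ /(<[abc] id="[^"]*"\/>)$/) {
    $end = "$1$end";
    $segment =~ s/\Q$1\E$//;
}

# Not written yet: if one <d id="..."> ... </d> pair wraps the whole
# remaining segment, move its opening tag to $start and "</d>" to $end.

$middle = $segment;
print "start: \"$start\", middle: \"$middle\", end: \"$end\"\n";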
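The unfinished d step has to decide whether the leading <d ...> is closed by the very last </d>. A regex like /^(<d id=".*?">).*?(<\/d>)/ cannot tell this: in the last example the first </d> closes only the first d. This is a minimal sketch that counts nesting depth instead, assuming d tags are always properly paired (as stated) and possibly nested. The name strip_wrapping_d is hypothetical.

```perl
use strict;
use warnings;

# Returns ($open_tag, $inner, '</d>') when the leading <d ...> is closed by
# the very last </d> of the segment, otherwise ('', $segment, '').
# Assumption: d tags are always paired and may nest.
sub strip_wrapping_d {
    my ($segment) = @_;
    my $close = '</d>';

    my ($open) = $segment =~ /\A(<d\b[^>]*>)/;
    return ('', $segment, '')
        unless defined $open && $segment =~ /<\/d>\z/;

    # Scan every d tag after the opening one, tracking nesting depth
    my $depth = 1;
    pos($segment) = length($open);
    while ($segment =~ /(<d\b[^>]*>|<\/d>)/g) {
        $depth += $1 eq $close ? -1 : 1;
        next if $depth > 0;
        # The leading <d> closes here; it wraps everything only at the very end
        last if pos($segment) != length($segment);
        return ($open, substr($segment, length($open), -length($close)), $close);
    }
    return ('', $segment, '');
}

for my $middle (
    '<d id="43">text here</d>',
    '<d id="43">text here</d> <d id="44">text here</d>',
) {
    my ($open, $inner, $close) = strip_wrapping_d($middle);
    print qq{start: "$open", middle: "$inner", end: "$close"\n};
}
```

To combine it with the two peeling loops in the code above, call it on $middle afterwards and then do `$start .= $open; $end = $close . $end;`. Whitespace or a/b/c tags just inside the d pair stay in the middle part; the question does not say how those should be treated.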