Hi
I am trying to implement a function that splits a sentence containing XML tags into 3 parts. To be clearer, I want to move any tags that wrap the sentence into separate start and end parts, so that the middle part no longer has wrapping tags. The input sentences are:
<d id="43">Text </d> here <a id="33"/>
<b id="33"/> Text <d id="43">text</d> here
<d id="43">text here</d>
<d id="43">text here</d> <d id="44">text here</d>
The output should be:
start: "", middle: "<d id="43">Text </d> here", end: " <a id="33"/>"
start: "<b id="33"/> ", middle: "Text <d id="43">text</d> here", end: ""
start: "<d id="43">", middle: "text here", end: "</d>"
start: "", middle: "<d id="43">text here</d> <d id="44">text here</d>", end: ""
I have started on the code, but I don't think it is efficient. Any suggestions? The a, b and c tags are self-closing, while d is always paired.
my $segment = $_;
my $start ="";
my $end ="";
my $middle ="";
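# Move leading self-closing tags and whitespace into $start.
# Note: [abc] rather than [a|b|c], which would also match a literal "|".
while ($segment =~ /^(<[abc] id=\".*?\"\/>)/ || $segment =~ /^(\s+)/){
    $start .= $1;
    $segment =~ s/^\Q$1\E//;
}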
# Move trailing whitespace and self-closing tags into $end.
while ($segment =~ /(\s+)$/ || $segment =~ /(<[abc] id=\".*?\"\/>)$/){
    $end = "$1$end";
    $segment =~ s/\Q$1\E$//;
}
while ($segment =~ /^(<d id=\".*?\">).*?(<\/d>)/){
----
}
print "start: \"$start\", middle: \"$middle\", end: \"$end\"\n";
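
For illustration, here is a minimal sketch of how the three steps might be folded into one function. It is only one possible approach, not a definitive one. The name split_segment is hypothetical, and the sketch assumes that d tags never nest and that every tag carries exactly one id attribute, as in the samples above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the list (start, middle, end).
# Assumptions: <d> tags never nest; tags look like the samples above.
sub split_segment {
    my ($segment) = @_;
    my ($start, $end) = ('', '');

    # A self-closing a/b/c tag, e.g. <a id="33"/>
    my $self_closing = qr{<[abc] id="[^"]*"/>};

    # Peel leading whitespace and self-closing tags off the front.
    while ($segment =~ s{^(\s+|$self_closing)}{}) {
        $start .= $1;
    }

    # Peel trailing whitespace and self-closing tags off the back.
    # \z (not $) so a final newline is not skipped over.
    while ($segment =~ s{(\s+|$self_closing)\z}{}) {
        $end = $1 . $end;
    }

    # Unwrap a <d> pair only if it encloses the whole remainder,
    # i.e. no other <d ...> or </d> appears between open and close.
    if ($segment =~ m{^(<d id="[^"]*">)((?:(?!</?d[ >]).)*)(</d>)\z}s) {
        my ($open, $inner, $close) = ($1, $2, $3);
        $start  .= $open;
        $segment = $inner;
        $end     = $close . $end;
    }

    return ($start, $segment, $end);
}

# Usage with the four sample sentences from the question.
for my $line (
    '<d id="43">Text </d> here <a id="33"/>',
    '<b id="33"/> Text <d id="43">text</d> here',
    '<d id="43">text here</d>',
    '<d id="43">text here</d> <d id="44">text here</d>',
) {
    my ($s, $m, $e) = split_segment($line);
    print "start: \"$s\", middle: \"$m\", end: \"$e\"\n";
}
```

Doing the match and the removal in a single s/// avoids matching each piece twice, which is where the original loops spend extra work. If d tags can nest, the last step would need a balanced match instead of the simple lookahead used here.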