PerlMonks  

Extract a string from the line

by kulua (Initiate)
on May 29, 2021 at 08:16 UTC (#11133249=perlquestion)

kulua has asked for the wisdom of the Perl Monks concerning the following question:

I have a string in an XML file.

I want to extract the number 72 from this file.

Where is the mistake in the code?

The string is:

<stlib:mem_bit>72</stlib:mem_bit>

use strict;
use warnings;

my $data = <<'EOD';
<stlib:mem_bit>72</stlib:mem_bit>
EOD

open my $fh, '<', \$data;
while (my $line = <$fh>) {
    while ( $line =~ /.*st_mem_bit.*(\d+)/g ) {
        my $a= $1;
    }
}
print $a;
close $fh;

Replies are listed 'Best First'.
Re: Extract a string from the line
by choroba (Archbishop) on May 29, 2021 at 09:32 UTC
    Use an XML-aware library to handle XML.
    use XML::LibXML;

    my $dom = 'XML::LibXML'->load_xml(location => 'file.xml');
    my $digits = $dom->findvalue('//stlib:mem_bit');
    print $digits, "\n";

    To do it correctly, you should register the namespace, but you haven't provided its definition, so I can't show you exactly how to do it. Let's say the whole XML file looks like this:

    <r xmlns:stlib="http://stlib.xml.ns">
      <ch>
        <stlib:mem_bit>72</stlib:mem_bit>
      </ch>
    </r>

    The correct XML namespace handling code would be

    #!/usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML;

    my $dom = 'XML::LibXML'->load_xml(location => '1.xml');
    my $xpc = 'XML::LibXML::XPathContext'->new;
    $xpc->registerNs(s => 'http://stlib.xml.ns');
    my $digits = $xpc->findvalue('//s:mem_bit', $dom);
    print $digits, "\n"

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Extract a string from the line
by LanX (Sage) on May 29, 2021 at 08:57 UTC
    Please ask yourself:

    What's the difference between

    >  stlib:mem_bit

    and

    >  st_mem_bit

    ?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    PS: Parsing XML with a Regex is in most cases not a good idea...
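
For illustration only, here is a minimal sketch of what happens once the tag name in the question's regex is corrected. It assumes the input is exactly the single line shown in the question; the variable names `$greedy` and `$anchored` are made up for this example. Fixing the name alone is not enough, because the greedy `.*` in front of `(\d+)` swallows the first digit.

```perl
use strict;
use warnings;

my $line = '<stlib:mem_bit>72</stlib:mem_bit>';

# The question's pattern with only the tag name corrected.
# The greedy .* before (\d+) consumes ">7", so only the last digit is captured.
my ($greedy) = $line =~ /.*stlib:mem_bit.*(\d+)/;

# Pinning the digits between the opening and closing tag avoids that.
my ($anchored) = $line =~ m{<stlib:mem_bit>\s*(\d+)\s*</stlib:mem_bit>};

print "greedy:   $greedy\n";      # prints "greedy:   2"
print "anchored: $anchored\n";    # prints "anchored: 72"
```

This only works for input as rigid as the sample line; attributes, namespaces bound to a different prefix, or tags split across lines need a real parser.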

Re: Extract a string from the line
by Tux (Canon) on May 29, 2021 at 09:30 UTC

    What LanX said. Do not do it with regex.

    Your regex is wrong (besides what LanX noted) in that it is extremely unsafe and misses some restrictions.

    my @numbers = m{
        <stlib:mem_bit      # Opening tag
        [^>]*               # Optional attributes
        >                   # End of opening tag
        \s*                 # Optional whitespace
        ([0-9]+)            # The number you want
        \s*                 # Optional whitespace
        </stlib:mem_bit>    # Closing tag
        }gx;                # I want all of them in this line

    Enjoy, Have FUN! H.Merijn
      > What LanX said. Do not do it with regex.

      Yeah, but for clarification, I said "in most cases". :)

      Sometimes the XML is just so static and restricted that using a full parser would be overkill.

      The output of pdftohtml -xml is one example of that.

      PS: if you want to allow optional whitespace, you might also want to add an /s modifier to match newlines too.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: Extract a string from the line
by hippo (Bishop) on May 29, 2021 at 11:20 UTC
    Where is the mistake in the code ?

    In addition to the incorrect regex you have a scoping problem with your $a. This is hidden from you because of your choice of variable name. Never use $a or $b as a general variable name in Perl: they are special package variables used by sort routines, so strict does not complain about them. If you replace your $a with $num, for example, then your code won't compile:

    $ cat scoping.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $data = <<'EOD';
    <stlib:mem_bit>72</stlib:mem_bit>
    EOD

    open my $fh, '<', \$data;
    while (my $line = <$fh>) {
        while ( $line =~ /.*st_mem_bit.*(\d+)/g ) {
            my $num= $1;
        }
    }
    print $num;
    close $fh;
    $ ./scoping.pl
    Global symbol "$num" requires explicit package name at ./scoping.pl line 22.
    Execution of ./scoping.pl aborted due to compilation errors.
    $

    See Coping with scoping for why this is.
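
As a rough sketch of one way to resolve the scoping error (not the only one), the variable can be declared before the loop so that it is still in scope at the print. The regex used here is a simplified stand-in that assumes the tag and its number sit on one line with no attributes, as in the sample data.

```perl
use strict;
use warnings;

my $data = <<'EOD';
<stlib:mem_bit>72</stlib:mem_bit>
EOD

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $num;    # declared outside the loop, so it is still visible after it
while (my $line = <$fh>) {
    if ($line =~ m{<stlib:mem_bit>\s*(\d+)\s*</stlib:mem_bit>}) {
        $num = $1;
        last;    # stop at the first match
    }
}
close $fh;

print defined $num ? "$num\n" : "no match\n";
```

Checking `defined $num` before printing avoids an "uninitialized value" warning when nothing matches.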

    FWIW, I'm happy to add my voice to those suggesting that an XML parser is almost certainly a better approach for this task.


    🦛
