PerlMonks  

How to print data between tags from file sequentially?

by TonyNY (Beadle)
on Jun 22, 2018 at 03:15 UTC ( [id://1217156]=perlquestion )

TonyNY has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a text file that contains the following, which I want to print one line at a time. Is it possible to print only the data between > < sequentially?

<Answer type="string">ServerName</Answer>
<Answer type="string">10.10.10.11</Answer>
<Answer type="string">Windows Server 2012</Answer>

output would be something like:

Computer: ServerName
IP Address: 10.10.10.11
OS: Windows Server 2012

This is what I have so far but it only prints out the entire lines at the same time:

#!/usr/bin/perl

# open file
open(FILE, "data.txt") or die("Unable to open file");

# read file into an array
@data = <FILE>;

# close file
close(FILE);

# print file contents
foreach $line (@data)
{
    print $line;
}

Thanks

Replies are listed 'Best First'.
Re: How to print data between tags from file sequentially?
by haukex (Archbishop) on Jun 22, 2018 at 08:09 UTC
    I have a text file

    Are you sure it's a text file? It looks like XML (or maybe HTML?) to me. Even if just a section of a longer text file contains XML data, I would strongly recommend extracting that section of text and using a proper module to parse it (see e.g. Parsing HTML/XML with Regular Expressions for why). Mojo::DOM has a nice interface, and it handles even this case of the XML not having a root node:

    use warnings;
    use strict;
    use Mojo::DOM;

    my $html = <<'END_HTML';
    <Answer type="string">ServerName</Answer>
    <Answer type="string">10.10.10.11</Answer>
    <Answer type="string">Windows Server 2012</Answer>
    END_HTML

    # note: ->xml(1) turns on case sensitivity
    my $dom = Mojo::DOM->new->xml(1)->parse($html);
    my $values = $dom->find('Answer')->map('all_text');
    print "Computer: $values->[0]\n";
    print "IP Address: $values->[1]\n";
    print "OS: $values->[2]\n";

    Minor edits

      Good advice, use a proper parser, and I find Mojo::DOM makes things like this trivial, even for odd data.

      Actually it is an XML file, but I redirected the output to a text file. Unfortunately the environment where I work is too strict for me to install any modules, but thanks.
Re: How to print data between tags from file sequentially?
by AnomalousMonk (Archbishop) on Jun 22, 2018 at 04:30 UTC

    Quick and dirty:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my @data = (
       q{<Answer type='string'>ServerName</Answer>},
       q{<Answer type='string'>10.10.10.11</Answer>},
       q{<Answer type='string'>Windows Server 2012</Answer>},
       );
     ;;
     for my $datum (@data) {
         $datum =~ s{ \A .*? (?<= >) ([^<]*) .* }{$1}xms;
         print qq{'$datum'};
         }
    "
    'ServerName'
    '10.10.10.11'
    'Windows Server 2012'
    (Update: However, this solution is fragile; see Parsing HTML/XML with Regular Expressions (thanks to haukex).) The prepended labels I leave to you.

    Update: The following works as well and may be preferable to a  s/// substitution as it is simpler:
        ($datum) = $datum =~ m{ (?<= >) [^<]* }xmsg;
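
For anyone wondering why the list-context match works: with /g in list context, a match returns every match (or every capture, if the pattern has capture groups). Below is a minimal sketch of the same idea with an explicit capture group. The helper name text_between_tags is made up for this illustration, and the sketch assumes one simple element per string with no nested tags, so it is just as fragile as the one-liners above.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): return the text
# found between '>' and '<' in a string. Assumes one simple element per
# string, with no nested tags, comments or CDATA sections.
sub text_between_tags {
    my ($string) = @_;
    # In list context, m//g returns the captures from every match.
    return $string =~ m{ > ([^<>]+) < }xmsg;
}

my @data = (
    q{<Answer type="string">ServerName</Answer>},
    q{<Answer type="string">10.10.10.11</Answer>},
    q{<Answer type="string">Windows Server 2012</Answer>},
);

for my $datum (@data) {
    # The parentheses around $text force list context on the call.
    my ($text) = text_between_tags($datum);
    print "'$text'\n";
}
```

If several elements share one line, the whitespace between them would also be captured, so this is only suitable for input shaped like the sample.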


    Give a man a fish:  <%-{-{-{-<

      Thanks
Re: How to print data between tags from file sequentially?
by usemodperl (Beadle) on Jun 22, 2018 at 07:25 UTC
    #!/usr/bin/perl
    #
    # This is an answer, in comments and code, to the question:
    # How to print data between tags from file sequentially?
    # URL: https://perlmonks.org/index.pl?node_id=1217156
    #
    # LET'S EMBED THE FILE IN THE SCRIPT TO MAKE THIS EASY!
    # YOUR FILE IS NOW ANYTHING AFTER __DATA__ AT THE END:
    #
    # open file
    # open(FILE, "data.txt") or die("Unable to open file");
    #
    # ALWAYS START WITH THESE TWO LINES, FOR HELPFUL ERROR MESSAGES:
    use strict;
    use warnings;

    # read file into an array
    # PUT my BEFORE ALL VARIABLES TO PREVENT TYPOS LATER ON:
    my @data = <DATA>;

    # close file
    # NOT NECESSARY ANYMORE:
    # close(FILE);

    # print file contents
    foreach my $line (@data)
    {
        # THIS PRINTS EACH LINE, NOT WHAT YOU WANT:
        # print $line;

        # MODIFY EACH LINE WITH A REGEX* FOR CUSTOM PRINT!
        # (*Short for "Regular Expression")
        #
        # Regex allows you to search and replace!
        # s is the command and // are the quotes:
        # s / FIND THIS TEXT / REPLACE WITH THIS /

        $line      # For each line
        =~         # Do something like
        s/         # Search for
        <          # a < character
        [^>]+      # and one or more characters: [ ]+
                   # that are not a > character: ^>
        >          # Followed by a > character
                   # (So anything like <whatever>)
        //gx;      # Replace it with nothing: //
                   # g means replace all of them (global)
                   # x allows these comments because it is
                   # usually put on one line like this:
                   #
                   # $line =~ s/<[^>]+>//g; # COOL!

        # NOW THE LINE IS CHANGED SO YOU CAN LOOK AT EACH LINE
        # WITH =~ TO FIND OUT WHICH TEXT TO PRINT:

        # SIMPLE:
        if ($line =~ /ServerName/)
        {
            print "Computer: $line";
        }
        # SOMETHING MORE COMPLICATED:
        elsif ($line =~ /^   # IF LINE BEGINS WITH
                         \d  # A NUMBER (DIGIT)
                         /x  # (END REGEX, x JUST ALLOWS THESE COMMENTS)
              ){
            # IF THE ELSIF DECISION IS YES BECAUSE WE
            # SAW A NUMBER, THEN DO THIS:
            print "IP Address: $line";
        }
        # FIND WINDOWS OR LINUX OR MAC:
        # NOTE: i at the end of the next regex means
        # "case insensitive" so an "a" or "A" are equal.
        elsif ($line =~ /Windows|Linux|Mac/i)
        {
            print "OS: $line";
        }
        # PRINT UNKNOWN LINES TOO SO YOU CAN SEE HOW
        # TO ADD MORE DECISIONS ABOVE AND MAKE IT
        # PRINT WHAT YOU WANT:
        else
        {
            print "UNKNOWN: $line";
        }
    }

    # THE EMBEDDED FILE:
    __DATA__
    <Answer type="string">ServerName</Answer>
    <Answer type="string">10.10.10.11</Answer>
    <Answer type="string">Windows Server 2012</Answer>
    STOP REINVENTING WHEELS, START BUILDING SPACE ROCKETS!CPAN 🐫

      Hi usemodperl,

      First of all I want to say thanks so much for this tutorial type solution!

      After changing the following line from

      # PUT my BEFORE ALL VARIABLES TO PREVENT TYPOS LATER ON:

      my @data = <DATA>;

      to:

      my @data = <FILE>;

      here are my results after parsing the actual file:

      results:

      UNKNOWN:
      UNKNOWN:
      UNKNOWN:
      UNKNOWN:
      UNKNOWN:
      UNKNOWN: ServerName
      UNKNOWN: 10.10.10.11
      UNKNOWN: bfRootServer (0)
      OS: Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)
      UNKNOWN: Fri, 22 Jun 2018 10:26:53 -0500
      UNKNOWN: 9.2.1.48
      UNKNOWN: SUpportGroup1
      UNKNOWN:
      UNKNOWN:
      UNKNOWN:
      UNKNOWN: 34.402ms
      UNKNOWN: Plural
      UNKNOWN:
      UNKNOWN:
      UNKNOWN:

      contents of the source text/xml file excluding the actual infrastructure names:

      <?xml version="1.0" encoding="UTF-8"?>
      <BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
          <Query Resource="(names of it, ip addresses of it, root server of it, operating systems of it, last report time of it, agent versions of it, values of results from (BES Property &quot;_SupportGroup&quot;) of it) of bes computers whose ( name of it as lowercase starts with &quot;ServerName&quot;)">
              <Result>
                  <Tuple>
                      <Answer type="string">ServerName</Answer>
                      <Answer type="string">10.10.10.1`</Answer>
                      <Answer type="string">bfRootServer (0)</Answer>
                      <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                      <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                      <Answer type="string">9.2.1.48</Answer>
                      <Answer type="string">SupportGroup1</Answer>
                  </Tuple>
              </Result>
              <Evaluation>
                  <Time>34.402ms</Time>
                  <Plurality>Plural</Plurality>
              </Evaluation>
          </Query>
      </BESAPI>

      So to summarize what my desired output after parsing the file is:

      Computer: ServerName
      IP Address: 10.10.10.11
      Root Server: bfRootServer
      OS: Windows Server 2012
      Last Report Time: Fri, 22 Jun 2018 10:26:53 -0500
      BES Agent Version: 9.2.1.48
      Support Group: SupportGroup1

      Thanks again!

        #!/usr/bin/perl -l
        #
        # This is an answer, in comments and code, to the question:
        # How to print data between tags from file sequentially?
        # URL: https://perlmonks.org/index.pl?node_id=1217156
        #

        =cut

        NOTE: THIS IS NOT MODERN PERL BEST PRACTICES!

        THIS IS THE SWISS ARMY CHAINSAW GETTING IT
        DONE THE OLD FASHIONED WAY:

        Perlmonks are technically correct about the best
        way to do it with modules but since OP can't install
        modules then perl's built in bag of tricks can save the
        day. This sort of thing is very well known to be a bad
        solution to a worse problem but sometimes you have to do
        what works instead of what is best. This is why perl can
        work miracles and also why people complain about
        unmaintainable code. To be good code this entire script
        would have to be rewritten using appropriate CPAN modules.

        Techniques and comments are geared entirely towards
        comprehension by the OP, still learning basics.

        =cut

        #
        # LET'S EMBED THE FILE IN THE SCRIPT TO MAKE THIS EASY!
        # YOUR FILE IS NOW ANYTHING AFTER __DATA__ AT THE END:
        #
        # open file
        # open(FILE, "data.txt") or die("Unable to open file");
        #
        # OPENING FILES THE RIGHT WAY:
        # use autodie; # SO YOU DON'T HAVE TO CHECK
        #
        # OPEN LIKE THIS TO READ FILE:
        # open my $FILE, "<", "data.txt";
        #
        # THEN YOU CAN DO:
        # my @data = <$FILE>;
        #
        # close $FILE;
        #
        # ALWAYS START WITH THESE TWO LINES, FOR HELPFUL ERROR MESSAGES:
        use strict;
        use warnings;
        # THIS MAKES ERROR MESSAGES EVEN BETTER BUT
        # SHOULD BE REMOVED WHEN DONE HACKING:
        use diagnostics;
        # THIS MODULE LETS YOU SEE DATA:
        use Data::Dumper;

        # read file into an array
        # PUT my BEFORE ALL VARIABLES TO PREVENT TYPOS LATER ON:
        chomp(my @data = <DATA>); # CHOMP REMOVES END OF LINES: \n

        # LOOK AT DATA:
        print 'Input data: ';
        print Dumper @data;
        print 'That was @data (which now contains DATA).';
        print 'Let\'s remove empty lines.';
        print 'Press return to continue...';
        <STDIN>; # PAUSE

        # GET RID OF EMPTY LINES:
        # \S+ means one or more characters that are not space.
        @data = grep /\S+/, @data;

        # LOOK AT DATA:
        print Dumper @data;
        print 'Empty lines removed.';
        print 'Let\'s remove extra space.';
        print 'Press return to continue...';
        <STDIN>;

        # REMOVE LEADING SPACE FROM ALL LINES:
        foreach my $line (@data)
        {
            $line =~ s/^\s+//;
        }

        # LOOK AT DATA:
        print Dumper @data;
        print 'Extra space removed.';
        print 'Let\'s make array @data into string $data.';
        print 'Press return to continue...';
        <STDIN>;

        # PUT THE ARRAY INTO A STRING:
        my $data = join "\n", @data;

        # LOOK AT DATA:
        print Dumper $data;
        print 'Made string $data from array @data.';
        print 'Let\'s find Tuples in $data and put them in @dat2.';
        print 'Press return to continue...';
        <STDIN>;

        # PUT ALL THE TUPLES INTO A NEW ARRAY:
        my @dat2 = ($data =~ /<Tuple>(.*?)<\/Tuple>/sg);

        # LOOK AT DATA:
        print Dumper @dat2;
        print 'Found Tuples in $data and put them in @dat2.';
        print 'Let\'s split @dat2 back into lines.';
        print 'Press return to continue...';
        <STDIN>;

        # SPLIT SELF BACK TO LINES
        @dat2 = map { split /\n/ } @dat2;

        # LOOK AT DATA:
        print Dumper @dat2;
        print 'Split @dat2. Let\'s remove empty lines.';
        print 'Press return to continue...';
        <STDIN>;

        # GET RID OF EMPTY LINES:
        @dat2 = grep /\S+/, @dat2;

        # LOOK AT DATA:
        print Dumper @dat2;
        print 'Removed empty lines.';
        print 'Let\'s remove the tags.';
        print 'Press return to continue...';
        <STDIN>;

        foreach my $line (@dat2)
        {
            # REMOVE THE TAGS:
            $line =~ s/<[^>]+>//g; # COOL!

            # ALSO REMOVE THAT TRAILING ` FROM ANY LINE (IP):
            $line =~ s/\`$//;
        }

        # LOOK AT DATA:
        print Dumper @dat2;
        print 'Removed the tags.';
        print 'Let\'s use our @labels and print formatted data!.';
        print 'Press return to continue...';
        <STDIN>;

        # SETUP A COUNTER TO KEEP TRACK OF LINES. SINCE WE KNOW
        # THERE ARE 7 FOR EACH RECORD, PRINT A SEPARATOR EVERY
        # 7 LINES. THIS IS BRITTLE: IF THE DATA CHANGES IT WILL
        # BREAK BUT IF THE DATA FORMAT IS STATIC THIS WILL WORK
        # TILL THE END OF TIME, OR TILL SOMEONE ELSE BREAKS IT
        # BY "UPGRADING" THE CODE THIS CODE RELIES ON. BRITTLE!
        my $count = 0;

        # DEFINE LABELS FOR EACH LINE OF DATA:
        my @labels = (
            "Computer",
            "IP Address",
            "Root Server",
            "OS",
            "Last Report Time",
            "BES Agent Version",
            "Support Group",
        );

        # GET THE NUMBER OF LABELS:
        my $size = scalar @labels;

        foreach my $line (@dat2)
        {
            chomp $line; # REMOVE LINE ENDING: \n
            print "$labels[$count]: $line"; # PRINT LABEL AND DATA (THIS BREAKS EASY)
            if ($count == ($size - 1)) # BECAUSE...
            {
                $count = -1; # COMPUTERS START COUNTING AT 0
                print ""; # PRINT BLANK LINE TO SEPARATE RECORDS
            }
            $count++; # INCREMENT COUNT BY 1
        }

        # THE EMBEDDED FILE:
        __DATA__
        <?xml version="1.0" encoding="UTF-8"?>
        <BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
            <Query Resource="(names of it, ip addresses of it, root server of it, operating systems of it, last report time of it, agent versions of it, values of results from (BES Property &quot;_IRS_ServerResponsibilityGroup&quot;) of it) of bes computers whose ( name of it as lowercase starts with &quot;vtjaa42vl006052&quot;)">
                <Result>
                    <Tuple>
                        <Answer type="string">ServerName</Answer>
                        <Answer type="string">10.10.10.1`</Answer>
                        <Answer type="string">bfRootServer (0)</Answer>
                        <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                        <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                        <Answer type="string">9.2.1.48</Answer>
                        <Answer type="string">SupportGroup1</Answer>
                    </Tuple>
                </Result>
                <Evaluation>
                    <Time>34.402ms</Time>
                    <Plurality>Plural</Plurality>
                </Evaluation>
            </Query>
        </BESAPI>
        <?xml version="1.0" encoding="UTF-8"?>
        <BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
            <Query Resource="(names of it, ip addresses of it, root server of it, operating systems of it, last report time of it, agent versions of it, values of results from (BES Property &quot;_IRS_ServerResponsibilityGroup&quot;) of it) of bes computers whose ( name of it as lowercase starts with &quot;vtjaa42vl006052&quot;)">
                <Result>
                    <Tuple>
                        <Answer type="string">ServerName</Answer>
                        <Answer type="string">10.10.10.1`</Answer>
                        <Answer type="string">bfRootServer (0)</Answer>
                        <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                        <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                        <Answer type="string">9.2.1.48</Answer>
                        <Answer type="string">SupportGroup1</Answer>
                    </Tuple>
                </Result>
                <Evaluation>
                    <Time>34.402ms</Time>
                    <Plurality>Plural</Plurality>
                </Evaluation>
            </Query>
        </BESAPI>
        <?xml version="1.0" encoding="UTF-8"?>
        <BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
            <Query Resource="(names of it, ip addresses of it, root server of it, operating systems of it, last report time of it, agent versions of it, values of results from (BES Property &quot;_IRS_ServerResponsibilityGroup&quot;) of it) of bes computers whose ( name of it as lowercase starts with &quot;vtjaa42vl006052&quot;)">
                <Result>
                    <Tuple>
                        <Answer type="string">ServerName</Answer>
                        <Answer type="string">10.10.10.1`</Answer>
                        <Answer type="string">bfRootServer (0)</Answer>
                        <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                        <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                        <Answer type="string">9.2.1.48</Answer>
                        <Answer type="string">SupportGroup1</Answer>
                    </Tuple>
                </Result>
                <Result>
                    <Tuple>
                        <Answer type="string">ServerName</Answer>
                        <Answer type="string">10.10.10.1`</Answer>
                        <Answer type="string">bfRootServer (0)</Answer>
                        <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                        <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                        <Answer type="string">9.2.1.48</Answer>
                        <Answer type="string">SupportGroup1</Answer>
                    </Tuple>
                </Result>
                <Evaluation>
                    <Time>34.402ms</Time>
                    <Plurality>Plural</Plurality>
                </Evaluation>
            </Query>
        </BESAPI>
        STOP REINVENTING WHEELS, START BUILDING SPACE ROCKETS!CPAN 🐪
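
Once the step-by-step version makes sense, the same pipeline (find each Tuple, pull out its Answer values, pair them with labels) can be condensed. What follows is only a rough core-Perl sketch, not a replacement for a real XML parser: the helper name parse_tuples is invented here, the label order is taken from the desired output posted above, and it assumes Tuple and Answer elements are never nested or commented out. Unlike the counter approach, it warns when a Tuple does not contain the expected number of answers.

```perl
use strict;
use warnings;

# Labels in the order the Answer elements appear inside each Tuple
# (order assumed from the desired output earlier in the thread).
my @labels = (
    'Computer', 'IP Address', 'Root Server', 'OS',
    'Last Report Time', 'BES Agent Version', 'Support Group',
);

# Hypothetical helper: turn XML text into a list of records, each an
# array ref of [label, value] pairs. Regex based, so it assumes Tuple
# and Answer elements are never nested or commented out.
sub parse_tuples {
    my ($xml) = @_;
    my @records;
    for my $tuple ( $xml =~ m{<Tuple>(.*?)</Tuple>}sg ) {
        my @values = $tuple =~ m{<Answer\b[^>]*>(.*?)</Answer>}sg;
        s/^\s+|\s+$//g for @values;    # trim surrounding whitespace
        warn 'Expected ' . @labels . ' answers, got ' . @values . "\n"
            if @values != @labels;
        my @pairs;
        for my $i ( 0 .. $#values ) {
            # Fall back to a generic label if there are extra values.
            my $label = defined $labels[$i] ? $labels[$i] : "Field $i";
            push @pairs, [ $label, $values[$i] ];
        }
        push @records, \@pairs;
    }
    return @records;
}

# Sample input modelled on the file posted above.
my $xml = <<'END_XML';
<Result>
  <Tuple>
    <Answer type="string">ServerName</Answer>
    <Answer type="string">10.10.10.11</Answer>
    <Answer type="string">bfRootServer (0)</Answer>
    <Answer type="string">Linux Red Hat Enterprise Server 6.9</Answer>
    <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
    <Answer type="string">9.2.1.48</Answer>
    <Answer type="string">SupportGroup1</Answer>
  </Tuple>
</Result>
END_XML

for my $record ( parse_tuples($xml) ) {
    print "$_->[0]: $_->[1]\n" for @$record;
    print "\n";    # blank line between records
}
```

To run it against a real file, read the whole file into one string first (for example by setting `local $/;` before reading from the filehandle) and pass that string to the helper.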
Re: How to print data between tags from file sequentially?
by jbodoni (Monk) on Jun 24, 2018 at 14:49 UTC

    Bearing in mind that you asked to print only the data between > < sequentially, here's one way to do it:

    #!/usr/bin/perl
    use strict;use warnings;

    while ( <> ) {
        print "$1\n" if /<Answer type="string">(.+?)<\/Answer>/;
    }
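
One thing to watch with the pattern above: it only matches type="string", so the type="time" line in the real file posted earlier would be skipped. A possible variation that accepts any attributes on the Answer tag is sketched below. The helper name answers_from_fh is invented for this example, and it still assumes each Answer element sits on a single line.

```perl
use strict;
use warnings;

# Hypothetical helper: read a filehandle line by line and return the
# text of every <Answer ...>...</Answer> element, whatever its type.
# Assumes each Answer element sits on a single line.
sub answers_from_fh {
    my ($fh) = @_;
    my @answers;
    while ( my $line = <$fh> ) {
        push @answers, $1 if $line =~ m{<Answer\b[^>]*>(.*?)</Answer>};
    }
    return @answers;
}

# Demo on an in-memory filehandle. For a real file, open it with e.g.
#   open my $fh, '<', 'data.txt' or die "Cannot open data.txt: $!";
my $sample = join "\n",
    '<Answer type="string">ServerName</Answer>',
    '<Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>',
    '<Time>34.402ms</Time>',
    '';
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for answers_from_fh($fh);
close $fh;
```

The Time line is ignored because only Answer elements are matched, which also avoids the UNKNOWN lines seen earlier in the thread.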