How to print data between tags from file sequentially?

TonyNY has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How to print data between tags from file sequentially? by haukex (Archbishop) on Jun 22, 2018 at 08:09 UTC
"I have a text file" Are you sure it's a text file? It looks like XML (or maybe HTML?) to me. Even if just a section of a longer text file contains XML data, I would strongly recommend extracting that section of text and using a proper module to parse it (see e.g. Parsing HTML/XML with Regular Expressions for why). Mojo::DOM has a nice interface, and it handles even this case of the XML not having a root node:
`use warnings;
use strict;
use Mojo::DOM;

my $html = <<'END_HTML';
<Answer type="string">ServerName</Answer>
<Answer type="string">10.10.10.11</Answer>
<Answer type="string">Windows Server 2012</Answer>
END_HTML

# note: ->xml(1) turns on case sensitivity
my $dom = Mojo::DOM->new->xml(1)->parse($html);
my $values = $dom->find('Answer')->map('all_text');
print "Computer: $values->[0]\n";
print "IP Address: $values->[1]\n";
print "OS: $values->[2]\n";`
Minor edits
Re^2: How to print data between tags from file sequentially? by marto (Cardinal) on Jun 22, 2018 at 09:17 UTC
Good advice: use a proper parser. I find Mojo::DOM makes things like this trivial, even for odd data.
Re^2: How to print data between tags from file sequentially? by TonyNY (Beadle) on Jun 22, 2018 at 16:54 UTC
Actually it is an XML file, but I redirected the output to a text file. Unfortunately the environment where I work is too strict for me to install any modules, but thanks.
Re^3: How to print data between tags from file sequentially? by haukex (Archbishop) on Jun 22, 2018 at 17:42 UTC
"the environment where I work is too strict for me to install any modules" Really, not even with local::lib? Also have a look at Yes, even you can use CPAN.
Re^4: How to print data between tags from file sequentially? by TonyNY (Beadle) on Jun 22, 2018 at 18:46 UTC
Re: How to print data between tags from file sequentially? by AnomalousMonk (Archbishop) on Jun 22, 2018 at 04:30 UTC
Quick and dirty:
`c:\@Work\Perl\monks>perl -wMstrict -le
"my @data = (
   q{<Answer type='string'>ServerName</Answer>},
   q{<Answer type='string'>10.10.10.11</Answer>},
   q{<Answer type='string'>Windows Server 2012</Answer>},
   );
 ;;
 for my $datum (@data) {
   $datum =~ s{ \A .*? (?<= >) ([^<]*) .* }{$1}xms;
   print qq{'$datum'};
   }
 "
'ServerName'
'10.10.10.11'
'Windows Server 2012'`
(Update: However, this solution is fragile; see Parsing HTML/XML with Regular Expressions (thanks to haukex).) The prepended labels I leave to you.
Update: The following works as well and may be preferable to a `s///` substitution as it is simpler: `($datum) = $datum =~ m{ (?<= >) [^<]* }xmsg;`
Give a man a fish: `<%-{-{-{-<`
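A minimal sketch of the same capture idea applied to a whole string at once, using only core Perl. It assumes the `<Answer>` elements are never nested and contain no markup of their own; `extract_answers` is a hypothetical helper name, not something from this thread. In list context, a `/g` match with one capture group returns every captured value, which is what the helper relies on. It is still regex-on-XML, so the fragility noted above applies.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text content of
# every <Answer ...>...</Answer> element found in the given string.
# Assumes Answer elements are not nested and hold plain text only.
sub extract_answers {
    my ($text) = @_;
    # List context + /g + one capture group => list of all captures.
    return $text =~ m{<Answer\b[^>]*>([^<]*)</Answer>}g;
}

my $sample = join "\n",
    q{<Answer type="string">ServerName</Answer>},
    q{<Answer type="string">10.10.10.11</Answer>},
    q{<Answer type="string">Windows Server 2012</Answer>};

# Print each extracted value in single quotes, one per line.
print "'$_'\n" for extract_answers($sample);
```

Call the helper in list context (as in the `for` loop); in scalar context the match would only report success or failure.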
Re^2: How to print data between tags from file sequentially? by TonyNY (Beadle) on Jun 22, 2018 at 16:56 UTC
Thanks
Re: How to print data between tags from file sequentially? by usemodperl (Beadle) on Jun 22, 2018 at 07:25 UTC
#!/usr/bin/perl
#
# This is an answer, in comments and code, to the question:
# How to print data between tags from file sequentially?
# URL: https://perlmonks.org/index.pl?node_id=1217156
#
# LET'S EMBED THE FILE IN THE SCRIPT TO MAKE THIS EASY!
# YOUR FILE IS NOW ANYTHING AFTER __DATA__ AT THE END:
#
# open file
# open(FILE, "data.txt") or die("Unable to open file");
#
# ALWAYS START WITH THESE TWO LINES, FOR HELPFUL ERROR MESSAGES:
use strict;
use warnings;

# read file into an array
# PUT my BEFORE ALL VARIABLES TO PREVENT TYPOS LATER ON:
my @data = <DATA>;

# close file
# NOT NECESSARY ANYMORE:
# close(FILE);

# print file contents
foreach my $line (@data)
{
    # THIS PRINTS EACH LINE, NOT WHAT YOU WANT:
    # print $line;

    # MODIFY EACH LINE WITH A REGEX* FOR CUSTOM PRINT!
    # (*Short for "Regular Expression")
    #
    # Regex allows you to search and replace!
    # s is the command and // are the quotes:
    # s / FIND THIS TEXT / REPLACE WITH THIS /

    $line       # For each line
    =~          # Do something like
    s/          # Search for
      <         # a < character
      [^>]+     # and one or more characters: [ ]+
                # that are not a > character: ^>
      >         # Followed by a > character
                # (So anything like <whatever>)
    //gx;       # Replace it with nothing: //
                # g means replace all of them (global)
                # x allows these comments because it is
                # usually put on one line like this:
                #
                # $line =~ s/<[^>]+>//g;

    # COOL!
    # NOW THE LINE IS CHANGED SO YOU CAN LOOK AT EACH LINE
    # WITH =~ TO FIND OUT WHICH TEXT TO PRINT:

    # SIMPLE:
    if ($line =~ /ServerName/) {
        print "Computer: $line";
    }
    # SOMETHING MORE COMPLICATED:
    elsif ($line =~ /^   # IF LINE BEGINS WITH
                     \d  # A NUMBER (DIGIT)
                    /x   # (END REGEX, x JUST ALLOWS THESE COMMENTS)
    ){
        # IF THE ELSIF DECISION IS YES BECAUSE WE
        # SAW A NUMBER, THEN DO THIS:
        print "IP Address: $line";
    }
    # FIND WINDOWS OR LINUX OR MAC:
    # NOTE: i at the end of the next regex means
    # "case insensitive" so an "a" or "A" are equal.
    elsif ($line =~ /Windows|Linux|Mac/i) {
        print "OS: $line";
    }
    # PRINT UNKNOWN LINES TOO SO YOU CAN SEE HOW
    # TO ADD MORE DECISIONS ABOVE AND MAKE IT
    # PRINT WHAT YOU WANT:
    else {
        print "UNKNOWN: $line";
    }
}

# THE EMBEDDED FILE:
__DATA__
<Answer type="string">ServerName</Answer>
<Answer type="string">10.10.10.11</Answer>
<Answer type="string">Windows Server 2012</Answer>

STOP REINVENTING WHEELS, START BUILDING SPACE ROCKETS! - CPAN 🐫
Re^2: How to print data between tags from file sequentially? by TonyNY (Beadle) on Jun 22, 2018 at 16:34 UTC
Hi usemodperl, First of all I want to say thanks so much for this tutorial type solution! After changing the following line from
# PUT my BEFORE ALL VARIABLES TO PREVENT TYPOS LATER ON:
my @data = <DATA>;
to:
my @data = <FILE>;
here are my results after parsing the actual file:
results:
`UNKNOWN:
UNKNOWN:
UNKNOWN:
UNKNOWN:
UNKNOWN:
UNKNOWN: ServerName
UNKNOWN: 10.10.10.11
UNKNOWN: bfRootServer (0)
OS: Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)
UNKNOWN: Fri, 22 Jun 2018 10:26:53 -0500
UNKNOWN: 9.2.1.48
UNKNOWN: SUpportGroup1
UNKNOWN:
UNKNOWN:
UNKNOWN:
UNKNOWN: 34.402ms
UNKNOWN: Plural
UNKNOWN:
UNKNOWN:
UNKNOWN:`
contents of the source text/xml file excluding the actual infrastructure names:
<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
    <Query Resource="(names of it, ip addresses of it, root server of it, operating systems of it, last report time of it, agent versions of it, values of results from (BES Property &quot;_SupportGroup&quot;) of it) of bes computers whose ( name of it as lowercase starts with &quot;ServerName&quot;)">
        <Result>
            <Tuple>
                <Answer type="string">ServerName</Answer>
                <Answer type="string">10.10.10.1`</Answer>
                <Answer type="string">bfRootServer (0)</Answer>
                <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                <Answer type="string">9.2.1.48</Answer>
                <Answer type="string">SupportGroup1</Answer>
            </Tuple>
        </Result>
        <Evaluation>
            <Time>34.402ms</Time>
            <Plurality>Plural</Plurality>
        </Evaluation>
    </Query>
</BESAPI>
So to summarize, my desired output after parsing the file is:
`Computer: ServerName
IP Address: 10.10.10.11
Root Server: bfRootServer
OS: Windows Server 2012
Last Report Time: Fri, 22 Jun 2018 10:26:53 -0500
BES Agent Version: 9.2.1.48
Support Group: SupportGroup1`
Thanks again!
Re^3: How to print data between tags from file sequentially? by usemodperl (Beadle) on Jun 22, 2018 at 21:06 UTC
#!/usr/bin/perl -l
#
# This is an answer, in comments and code, to the question:
# How to print data between tags from file sequentially?
# URL: https://perlmonks.org/index.pl?node_id=1217156
#
=cut

NOTE: THIS IS NOT MODERN PERL BEST PRACTICES!
THIS IS THE SWISS ARMY CHAINSAW GETTING IT DONE THE OLD FASHIONED WAY:
Perlmonks are technically correct about the best way to do it with
modules but since OP can't install modules then perl's built in bag
of tricks can save the day. This sort of thing is very well known to
be a bad solution to a worse problem but sometimes you have to do what
works instead of what is best. This is why perl can work miracles and
also why people complain about unmaintainable code. To be good code
this entire script would have to be rewritten using appropriate CPAN
modules. Techniques and comments are geared entirely towards
comprehension by the OP, still learning basics.

=cut
#
# LET'S EMBED THE FILE IN THE SCRIPT TO MAKE THIS EASY!
# YOUR FILE IS NOW ANYTHING AFTER __DATA__ AT THE END:
#
# open file
# open(FILE, "data.txt") or die("Unable to open file");
#
# OPENING FILES THE RIGHT WAY:
# use autodie; # SO YOU DON'T HAVE TO CHECK
#
# OPEN LIKE THIS TO READ FILE:
# open my $FILE, "<", "data.txt";
#
# THEN YOU CAN DO:
# my @data = <$FILE>;
#
# close $FILE;
#
# ALWAYS START WITH THESE TWO LINES, FOR HELPFUL ERROR MESSAGES:
use strict;
use warnings;

# THIS MAKES ERROR MESSAGES EVEN BETTER BUT
# SHOULD BE REMOVED WHEN DONE HACKING:
use diagnostics;

# THIS MODULE LETS YOU SEE DATA:
use Data::Dumper;

# read file into an array
# PUT my BEFORE ALL VARIABLES TO PREVENT TYPOS LATER ON:
chomp(my @data = <DATA>); # CHOMP REMOVES END OF LINES: \n

# LOOK AT DATA:
print 'Input data: ';
print Dumper @data;
print 'That was @data (which now contains DATA).';
print 'Let\'s remove empty lines.';
print 'Press return to continue...';
<STDIN>; # PAUSE

# GET RID OF EMPTY LINES:
# \S+ means one or more characters that are not space.
@data = grep /\S+/, @data;

# LOOK AT DATA:
print Dumper @data;
print 'Empty lines removed.';
print 'Let\'s remove extra space.';
print 'Press return to continue...';
<STDIN>;

# REMOVE LEADING SPACE FROM ALL LINES:
foreach my $line (@data) {
    $line =~ s/^\s+//;
}

# LOOK AT DATA:
print Dumper @data;
print 'Extra space removed.';
print 'Let\'s make array @data into string $data.';
print 'Press return to continue...';
<STDIN>;

# PUT THE ARRAY INTO A STRING:
my $data = join "\n", @data;

# LOOK AT DATA:
print Dumper $data;
print 'Made string $data from array @data.';
print 'Let\'s find Tuples in $data and put them in @dat2.';
print 'Press return to continue...';
<STDIN>;

# PUT ALL THE TUPLES INTO A NEW ARRAY:
my @dat2 = ($data =~ /<Tuple>(.*?)<\/Tuple>/sg);

# LOOK AT DATA:
print Dumper @dat2;
print 'Found Tuples in $data and put them in @dat2.';
print 'Let\'s split @dat2 back into lines.';
print 'Press return to continue...';
<STDIN>;

# SPLIT SELF BACK TO LINES
@dat2 = map { split /\n/ } @dat2;

# LOOK AT DATA:
print Dumper @dat2;
print 'Split @dat2. Let\'s remove empty lines.';
print 'Press return to continue...';
<STDIN>;

# GET RID OF EMPTY LINES:
@dat2 = grep /\S+/, @dat2;

# LOOK AT DATA:
print Dumper @dat2;
print 'Removed empty lines.';
print 'Let\'s remove the tags.';
print 'Press return to continue...';
<STDIN>;

foreach my $line (@dat2) {
    # REMOVE THE TAGS:
    $line =~ s/<[^>]+>//g; # COOL!
    # ALSO REMOVE THAT TRAILING ` FROM ANY LINE (IP):
    $line =~ s/\`$//;
}

# LOOK AT DATA:
print Dumper @dat2;
print 'Removed the tags.';
print 'Let\'s use our @labels and print formatted data!.';
print 'Press return to continue...';
<STDIN>;

# SETUP A COUNTER TO KEEP TRACK OF LINES. SINCE WE KNOW
# THERE ARE 7 FOR EACH RECORD, PRINT A SEPARATOR EVERY
# 7 LINES. THIS IS BRITTLE: IF THE DATA CHANGES IT WILL
# BREAK BUT IF THE DATA FORMAT IS STATIC THIS WILL WORK
# TILL THE END OF TIME, OR TILL SOMEONE ELSE BREAKS IT
# BY "UPGRADING" THE CODE THIS CODE RELIES ON. BRITTLE!
my $count = 0;

# DEFINE LABELS FOR EACH LINE OF DATA:
my @labels = (
    "Computer",
    "IP Address",
    "Root Server",
    "OS",
    "Last Report Time",
    "BES Agent Version",
    "Support Group",
);

# GET THE NUMBER OF LABELS:
my $size = scalar @labels;

foreach my $line (@dat2) {
    chomp $line;                     # REMOVE LINE ENDING: \n
    print "$labels[$count]: $line";  # PRINT LABEL AND DATA (THIS BREAKS EASY)
    if ($count == ($size - 1)) {     # BECAUSE...
        $count = -1;                 # COMPUTERS START COUNTING AT 0
        print "";                    # PRINT BLANK LINE TO SEPARATE RECORDS
    }
    $count++;                        # INCREMENT COUNT BY 1
}

# THE EMBEDDED FILE:
__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
    <Query Resource="(names of it, ip addresses of it, root server of it, operating systems of it, last report time of it, agent versions of it, values of results from (BES Property &quot;_IRS_ServerResponsibilityGroup&quot;) of it) of bes computers whose ( name of it as lowercase starts with &quot;vtjaa42vl006052&quot;)">
        <Result>
            <Tuple>
                <Answer type="string">ServerName</Answer>
                <Answer type="string">10.10.10.1`</Answer>
                <Answer type="string">bfRootServer (0)</Answer>
                <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                <Answer type="string">9.2.1.48</Answer>
                <Answer type="string">SupportGroup1</Answer>
            </Tuple>
        </Result>
        <Evaluation>
            <Time>34.402ms</Time>
            <Plurality>Plural</Plurality>
        </Evaluation>
    </Query>
</BESAPI>
<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
    <Query Resource="(names of it, ip addresses of it, root server of it, operating systems of it, last report time of it, agent versions of it, values of results from (BES Property &quot;_IRS_ServerResponsibilityGroup&quot;) of it) of bes computers whose ( name of it as lowercase starts with &quot;vtjaa42vl006052&quot;)">
        <Result>
            <Tuple>
                <Answer type="string">ServerName</Answer>
                <Answer type="string">10.10.10.1`</Answer>
                <Answer type="string">bfRootServer (0)</Answer>
                <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                <Answer type="string">9.2.1.48</Answer>
                <Answer type="string">SupportGroup1</Answer>
            </Tuple>
        </Result>
        <Evaluation>
            <Time>34.402ms</Time>
            <Plurality>Plural</Plurality>
        </Evaluation>
    </Query>
</BESAPI>
<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
    <Query Resource="(names of it, ip addresses of it, root server of it, operating systems of it, last report time of it, agent versions of it, values of results from (BES Property &quot;_IRS_ServerResponsibilityGroup&quot;) of it) of bes computers whose ( name of it as lowercase starts with &quot;vtjaa42vl006052&quot;)">
        <Result>
            <Tuple>
                <Answer type="string">ServerName</Answer>
                <Answer type="string">10.10.10.1`</Answer>
                <Answer type="string">bfRootServer (0)</Answer>
                <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                <Answer type="string">9.2.1.48</Answer>
                <Answer type="string">SupportGroup1</Answer>
            </Tuple>
        </Result>
        <Result>
            <Tuple>
                <Answer type="string">ServerName</Answer>
                <Answer type="string">10.10.10.1`</Answer>
                <Answer type="string">bfRootServer (0)</Answer>
                <Answer type="string">Linux Red Hat Enterprise Server 6.9 (2.6.32-696.23.1.el6.x86_64)</Answer>
                <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
                <Answer type="string">9.2.1.48</Answer>
                <Answer type="string">SupportGroup1</Answer>
            </Tuple>
        </Result>
        <Evaluation>
            <Time>34.402ms</Time>
            <Plurality>Plural</Plurality>
        </Evaluation>
    </Query>
</BESAPI>

STOP REINVENTING WHEELS, START BUILDING SPACE ROCKETS! - CPAN 🐪
Re: How to print data between tags from file sequentially? by jbodoni (Monk) on Jun 24, 2018 at 14:49 UTC
Bearing in mind that you asked to print only the data between > < sequentially, here's one way to do it:
`#!/usr/bin/perl
use strict;
use warnings;

while ( <> ) {
    print "$1\n" if /<Answer type="string">(.+?)<\/Answer>/;
}`
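Note that the pattern above only matches `type="string"`, so the `type="time"` answer in the posted file would be skipped. A possible core-Perl variant, sketched under assumptions: it accepts any attributes on `<Answer>`, groups values per `<Tuple>`, and prefixes them with the labels from the desired output posted earlier in the thread. `format_tuples` and `$sample` are hypothetical names, and the sketch assumes each `<Tuple>` lists its answers in the same order as the labels. It shares the usual fragility of parsing XML with regular expressions.

```perl
use strict;
use warnings;

# Labels in the order of the desired output shown earlier in the thread.
my @labels = (
    'Computer', 'IP Address', 'Root Server', 'OS',
    'Last Report Time', 'BES Agent Version', 'Support Group',
);

# Hypothetical helper: return "Label: value" lines for every <Tuple>.
# Assumes the <Answer> elements inside a <Tuple> appear in label order.
sub format_tuples {
    my ($xml) = @_;
    my @lines;
    for my $tuple ($xml =~ m{<Tuple>(.*?)</Tuple>}sg) {
        # Any attributes are accepted; surrounding whitespace is trimmed.
        my @values = $tuple =~ m{<Answer\b[^>]*>\s*(.*?)\s*</Answer>}sg;
        for my $i (0 .. $#values) {
            my $label = $i < @labels ? $labels[$i] : 'UNKNOWN';
            push @lines, "$label: $values[$i]";
        }
    }
    return @lines;
}

# Small made-up sample in the shape of the posted file.
my $sample = <<'END_XML';
<Result>
  <Tuple>
    <Answer type="string">ServerName</Answer>
    <Answer type="string">10.10.10.11</Answer>
    <Answer type="string">bfRootServer (0)</Answer>
    <Answer type="string">Windows Server 2012</Answer>
    <Answer type="time">Fri, 22 Jun 2018 10:26:53 -0500</Answer>
  </Tuple>
</Result>
END_XML

print "$_\n" for format_tuples($sample);
```

To run it against a real file instead of the sample, the whole file could be read into one string first, for example with `my $xml = do { local $/; <> };`, and passed to the helper.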

