perl parsing

cbtshare has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: perl parsing by Laurent_R (Canon) on Oct 04, 2017 at 06:27 UTC
You've been given a solution that presumably works fine, but I would like to add a side note about this line:

    my @file = `cat text.txt`;

Calling the system or shell to read a file is really poor practice in Perl (except possibly for command-line one-liners). Perl offers all the tools to do that, with much better control over what happens and over what to do if something goes wrong. Look at the way poj opens and reads the file in pure Perl; that's much better.
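As a rough illustration of the point above, here is a minimal sketch of reading a file in pure Perl with an explicit error check. The helper name `read_lines` is hypothetical and not from the thread; only core Perl is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# read_lines() is a hypothetical helper name, not something from the thread.
sub read_lines {
    my ($path) = @_;

    # Three-argument open with a lexical filehandle; $! holds the OS error.
    open my $fh, '<', $path
        or die "Could not open '$path' for reading: $!";

    my @lines = <$fh>;    # list context: one element per line
    chomp @lines;         # drop the trailing newlines

    close $fh
        or die "Could not close '$path': $!";

    return @lines;
}

# Usage, assuming a file named text.txt exists in the current directory:
#   my @file = read_lines('text.txt');
```

Unlike the backtick version, a missing or unreadable file stops the program with a message that names the file and the reason.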
Re^2: perl parsing by AppleFritter (Vicar) on Oct 04, 2017 at 12:11 UTC
There's also File::Slurp, which is quite useful if you want to read an entire file in one go: `use File::Slurp; my @file = read_file('text.txt');`
Re^3: perl parsing by choroba (Cardinal) on Oct 04, 2017 at 12:14 UTC
Please, don't recommend broken modules.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: perl parsing by AnomalousMonk (Archbishop) on Oct 04, 2017 at 05:48 UTC
"I can parse for the name ..."

You could parse for the name if the code compiled, but it doesn't. After some fixes, you can get the following, but there seems to be another problem.

    c:\@Work\Perl\monks\cbtshare>perl -wMstrict -le
    "my @file = `cat text.txt`;
     ;;
     foreach my $line (@file) {
       while ($line =~ /name \s+(.*?) \s+(.*?)/mgx) {
         my $name = $1;
         print qq{name '$name' other '$2'};
         }
       }
    "
    name 'Brian' other ''
    name 'Andrew' other ''
    name 'ryan' other ''

Why is `$2` always empty?

Update: Also, is there any point to the `/g` modifier in the `/name \s+(.*?) \s+(.*?)/mgx` match?

Give a man a fish: `<%-{-{-{-<`
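One way to explore the question above is to run the pattern on a single line. This is only a sketch, not AnomalousMonk's own answer; the sample line and the variable names are assumptions. A lazy `(.*?)` at the very end of a pattern has nothing after it that forces it to consume characters, so it is satisfied by the empty string. And since each element of `@file` presumably holds one line with at most one name, the `/g` loop would find at most one match per line.

```perl
use strict;
use warnings;

# Sample input line; assumed to resemble one line of text.txt.
my $line = "name Brian\n";

# Pattern from the post: the final lazy group settles for the empty string.
my ($lazy_name, $lazy_other) = $line =~ /name \s+ (.*?) \s+ (.*?)/x;

# One possible alternative: capture a run of non-whitespace explicitly.
my ($name) = $line =~ /^ name \s+ (\S+)/x;

print "lazy:     name '$lazy_name' other '$lazy_other'\n";
print "explicit: name '$name'\n";
```

Both patterns find `Brian`; the difference is that the second one says exactly what a name looks like instead of relying on where a lazy match happens to stop.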
Re: perl parsing by poj (Abbot) on Oct 04, 2017 at 06:02 UTC
"how to attribute all the devices to the names"

I guess you want to build a Hash of Arrays (HoA):

    #!/usr/bin/perl
    use strict;
    use Data::Dumper;

    my $infile = 'text.txt';
    open IN,'<',$infile or die "Could not open $infile : $!";

    my $name;
    my %hash = ();
    while (<IN>){
      s/^\s+|\s+$//g; # trim leading/trailing spaces
      my ($col1,$col2) = split /\s+/,$_,2;
      if ($col1 eq 'name'){
        $name = $col2;
      } elsif ($col1 eq 'device') {
        push @{$hash{$name}},$col2;
      } else {
        # skip line
      }
    }
    close IN;
    print Dumper \%hash;

poj
Re: perl parsing by Marshall (Canon) on Oct 04, 2017 at 08:33 UTC
A rather strange-looking solution, but with an approach that can be extended to many such situations (and no, I don't think this is the "best" solution).

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $line;
    while ( defined ($line = <DATA>))
    {
       if ($line =~ /^name/)
       {
          $line = process_record ($line);
          redo if defined $line;   # another name line
       }
    }

    sub process_record
    {
       my $line = shift;
       (my $name) = $line =~ /^name\s+(\w+)/;
       my %devices;
       while (defined ($line = <DATA>) and $line !~ /^name/)
       {
          if ( (my $device) = $line =~ /^device\s+(\w+\s+\w+)/)
          {
             $device =~ s/(\w+)\s+(\w+)/$1 $2/;
             $devices{$device}=1;
          }
       }
       print "$name:\n";
       print "   device $_\n" foreach keys %devices;
       return $line;
    }

    =PRINTS:
    Brian:
       device ipad 2001
    Andrew:
       device ipad 2009
    ryan:
       device ipad 2005
       device cell 2009
    =cut

    __DATA__
    socks something
    name Brian
    shirt yellow
    socks black
    device ipad 2001
    device ipad 2001
    device ipad 2001
    tag no
    tag 0
    name Andrew
    shirt orange
    socks black
    device ipad 2009
    tag no
    tag 0
    name ryan
    shirt blue
    socks black
    device ipad 2005
    device cell 2009
    tag yes
    tag 1
Re^2: perl parsing by cbtshare (Monk) on Oct 04, 2017 at 20:50 UTC
Thank you all!!
Re^2: perl parsing by cbtshare (Monk) on Oct 04, 2017 at 22:44 UTC
One issue is that Brian has 3 devices, but your code prints only one:

    device ipad 2001
    device ipad 2001
    device ipad 2001
Re^3: perl parsing by AnomalousMonk (Archbishop) on Oct 05, 2017 at 00:11 UTC
How could you change Marshall's solution, or perhaps poj's, to give you the results you want?

Give a man a fish: `<%-{-{-{-<`
Re^3: perl parsing by Marshall (Canon) on Oct 05, 2017 at 17:50 UTC
Well, I figured that these were "dupes". Consider what would happen if `$devices{$device}=1;` were changed to `$devices{$device}++;` and what that would mean for adapting the printout of the hash to show the number of identical devices.
Re: perl parsing by kcott (Archbishop) on Oct 05, 2017 at 03:11 UTC
G'day cbtshare,

Here's the technique I might have used for this task:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    use constant {
        IN_FILE => 'pm_1200636_text.txt',
        HEADER => 0,
        KEY => 1,
        VALUE => 2,
    };

    my %parsed;

    {
        open my $fh, '<', IN_FILE;
        my $name;

        while (<$fh>) {
            my @fields = split;

            if ($fields[HEADER] eq 'name') {
                $name = $fields[KEY];
                next;
            }

            if ($fields[HEADER] eq 'device') {
                push @{$parsed{$name}{$fields[KEY]}}, $fields[VALUE];
                next;
            }
        }
    }

    # For testing only
    use Data::Dump;
    dd \%parsed;

This only reads a record at a time, so there should be no memory issues that might occur when slurping entire files. The only data that persists after the anonymous block is `%parsed`: process that as necessary. Also note that as `$fh` goes out of scope at the end of the anonymous block, Perl automatically closes this for you (there's no need for a `close` statement in this instance).

I used the same data as you posted:

    $ cat pm_1200636_text.txt
    name Brian
    shirt yellow
    socks black
    device ipad 2001
    device ipad 2001
    device ipad 2001
    tag no
    tag 0
    name Andrew
    shirt orange
    socks black
    device ipad 2009
    tag no
    tag 0
    name ryan
    shirt blue
    socks black
    device ipad 2005
    device cell 2009
    tag yes
    tag 1

Output from a sample run:

    {
      Andrew => { ipad => [2009] },
      Brian  => { ipad => [2001, 2001, 2001] },
      ryan   => { cell => [2009], ipad => [2005] },
    }

See also: "perldsc - Perl Data Structures Cookbook"; autodie; open; and Data::Dump. Everything else is very straightforward and basic Perl, but feel free to ask if anything is unclear.

-- Ken
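To illustrate what "process that as necessary" might look like, here is a minimal sketch that walks a structure shaped like the sample output above (name, then device type, then a list of values). The helper name `format_report` and the report layout are assumptions; the thread does not state the required output format.

```perl
use strict;
use warnings;

# Same shape as the sample output: name => { device type => [ values ] }
my %parsed = (
    Andrew => { ipad => [2009] },
    Brian  => { ipad => [2001, 2001, 2001] },
    ryan   => { cell => [2009], ipad => [2005] },
);

# format_report() is a hypothetical helper; the layout it produces is an
# assumption, since the required output format is not stated in the thread.
sub format_report {
    my ($data) = @_;
    my $out = '';
    for my $name (sort keys %$data) {                     # outer hash: names
        $out .= "$name\n";
        for my $type (sort keys %{ $data->{$name} }) {    # inner hash: types
            for my $value (@{ $data->{$name}{$type} }) {  # array: values
                $out .= "  device $type $value\n";
            }
        }
    }
    return $out;
}

print format_report(\%parsed);
```

Each level of the structure gets its own loop: two hash levels need `keys`, and the innermost array is dereferenced with `@{ ... }`.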
Re^2: perl parsing by cbtshare (Monk) on Oct 05, 2017 at 19:27 UTC
Thank you very much. Your solution is quite similar to that of poj. I will attempt to explain what is being done, and then how I used what I understood to try to arrive at the solution I need.

    while (<IN>){
      #remove spaces from the beginning or the end of the file
      s/^\s+|\s+$//g;
      # splits the files based on columns based on space and limit the amount split by 4
      my ($col1,$col2,$col3) = split /\s+/,$_,4;
      #checks to see if the word name is matched to get the variable next over which would be the actual name , then put it in variable $name
      if ($col1 eq 'name'){
        $name = $col2;
      #checks to see if the word device is matched to get the variable next over which would be the actual type, then next over is another attribute(not on the example)
      } elsif ($col1 eq 'device') {
        ##Here the push name, device type and other variable into a hash
        push @{$hash{$name}},$col2, $col3;
      } else {
        # skip line
      }
    }
    close IN;

    #prints everything
    print Dumper \%hash

My issue now comes when I need to print out the content in a structured way, or into a file:

    name
    device $col3
    device $col3

I can loop through the hash and get the name only, not all the other attributes. But why? I put them all into the hash, right?

    foreach my $line(keys %hash) { print $line }

I believe you are doing something similar:

    ##defining the fields you want including the file, HEADER would be the first field and if name or device then KEY is the next value over and VALUE the next
    use constant {
        IN_FILE => 'pm_1200636_text.txt',
        HEADER => 0,
        KEY => 1,
        VALUE => 2,
    };

    my %parsed;

    {
        open my $fh, '<', IN_FILE;
        my $name;

        while (<$fh>) {
            my @fields = split;

            if ($fields[HEADER] eq 'name') {
                $name = $fields[KEY];
                next;
            }

This is the part that gives me issues, since I need to print the values in a specified format, so Data::Dumper wouldn't work. Any help, please?

            if ($fields[HEADER] eq 'device') {
                push @{$parsed{$name}{$fields[KEY]}}, $fields[VALUE];
                next;
            }
        }
    }
Re^3: perl parsing by kcott (Archbishop) on Oct 05, 2017 at 22:32 UTC
Your analysis of what the code is doing is mostly correct. In places, you indicate that operations are being performed on "files"; both solutions are reading the files line-by-line, and those operations are being performed on "records". Consider these corrections:

    #remove spaces from both the beginning ~~or~~and the end of the ~~file~~record
    # splits the ~~files~~records based on ...

You also appear to have misunderstood the LIMIT argument of split: you've used a value of `4` in two places, which doesn't make much sense as the maximum number of fields of any record is `3`. Further reading of that documentation will explain why "`@fields = split;`" needs no arguments nor any preprocessing to trim whitespace.

The data structures produced by the two solutions are different: an HoA and an HoHoA. We both provided a link to perldsc: perhaps you need to read, reread or study it in more detail.

The part that seems to elude you, in both cases, is how to translate the information in the data structures to whatever output format you need. You wrote (at the end of each of those analyses, respectively):

"My issue now comes when I need to print out the content in a structure way, ..."

"This is the part that gives me issues since I need to print the values in a specified format, ..."

Without any knowledge of the required output format, there's no way we can help. Again, the perldsc documentation has several sections on accessing the data in complex structures: the answer probably lies therein.

There are a few other areas where it looks like you really don't understand certain fundamentals. For instance, using the name `$line` for the variable that holds a key in:

    foreach my $line(keys %hash) { print $line }

would seem to indicate that you don't know what keys does. I would recommend that you bookmark perlintro and refer to it often. Make sure you understand the very basic information it presents, then follow links to related functions, in-depth documentation, tutorials, advanced topics, and so on, as necessary. For instance, the section on Hashes has links to keys and values (I half suspect that, in the code previously mentioned, "`values %hash`" was probably closer to what you wanted, instead of "`keys %hash`"); you'll also find many others such as perldata (fuller details), perlreftut (tutorial), and even perldsc (advanced topic already mentioned). Do note that those are just some of the links in one of many sections: the entire document is like that and I think you'll find it a most useful resource.

-- Ken
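The points about split can be checked with a small experiment. This is a sketch only; the sample record and the variable names are assumptions. It compares split with no arguments, split on `/\s+/` without trimming first, and split with a LIMIT.

```perl
use strict;
use warnings;

# A record with leading and trailing whitespace (assumed sample data).
my $record = "  device   ipad    2001  \n";

# split with no arguments works on $_, splits on runs of whitespace and
# skips leading whitespace; trailing empty fields are dropped as well.
my @fields;
for ($record) {
    @fields = split;
}
print scalar(@fields), " fields: @fields\n";

# Splitting on /\s+/ without trimming first keeps an empty leading field.
my @regex = split /\s+/, $record;
print scalar(@regex), " fields, first one is '$regex[0]'\n";

# A LIMIT of 2 stops after the first separator, so the rest of the record,
# inner spacing and newline included, stays in the last field.
my @limited = split ' ', $record, 2;
print scalar(@limited), " fields, last one is '$limited[1]'\n";
```

A LIMIT only caps the number of fields, so a value larger than the number of fields a record can contain never comes into play.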
Re^4: perl parsing by cbtshare (Monk) on Oct 06, 2017 at 01:57 UTC
Re^5: perl parsing by kcott (Archbishop) on Oct 06, 2017 at 05:35 UTC
Re: perl parsing by Marshall (Canon) on Oct 06, 2017 at 02:34 UTC
I saw your question about accounting for Brian having more than one of the same device. Here is yet another solution...

I didn't use a HoH in my first solution partly because that can be a difficult concept for beginners. In general, I don't recommend approaches that require reading the entire input file into memory and then parsing that in-memory copy, because that often means the data is being "handled" more than once and can take a lot of memory in the process. Some of the files that I work with can get quite large.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %devices;  # a HOH Hash of Hash {name}{device}
    my $current_name;

    while ( my $line = <DATA>)
    {
       $current_name = $1 if ($line =~ m/^name\s+(\w+)\s+/);
       if ( (my $device) = $line =~ /^device\s+([\w\s]+)\n/)
       {
          $device =~ s/[ ]+/ /g;  # multiple-space to a single space
          $devices{$current_name}{$device}++;
       }
    }

    # print the %devices hash - requires 2 loops
    foreach my $name (sort keys %devices)
    {
       print "$name:\n";
       foreach my $device (keys %{$devices{$name}})
       {
          print "   $devices{$name}{$device}\t$device\n";
       }
    }

    =Prints
    Andrew:
       1    ipad 2009
    Brian:
       3    ipad 2001
    ryan:
       1    ipad 2005
       1    cell 2009
    =cut

    __DATA__
    socks something
    name Brian
    shirt yellow
    socks black
    device ipad 2001
    device ipad 2001
    device ipad 2001
    tag no
    tag 0
    name Andrew
    shirt orange
    socks black
    device ipad 2009
    tag no
    tag 0
    name ryan
    shirt blue
    socks black
    device ipad 2005
    device cell 2009
    tag yes
    tag 1
Re^2: perl parsing by cbtshare (Monk) on Oct 06, 2017 at 04:47 UTC
Thanks, I am understanding a bit better now. I tried your solution, mainly the HoH printing, but I am getting an error: "Not a HASH reference" at line 42.

    foreach my $line(keys %hash)
    {
      print "$line\n"; ##This works and prints the names
      foreach my $sit (keys %{$hash{$line}}) #### <--line 42 line throwing error
      {
        print "$hash{$line}{$sit}\n";
      }
    }
Re^3: perl parsing by Marshall (Canon) on Oct 06, 2017 at 05:13 UTC
Show your complete code. I can't tell what your problem is from this snippet. I avoided a HoH (Hash of Hash) in my first code post partly because, as I suspected, beginners have problems with this. You are proving me right. I suggest that you perhaps use my first code, which doesn't use any complicated data structures. That will be easier for you to work with.

This code took me some minutes to write. It very well could be that it will take you literally hours to understand it. You will not learn if you don't put in the effort.
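Without the complete code the cause can only be guessed at. One plausible explanation, assuming `%hash` was still being filled by the HoA-style `push @{$hash{$name}}, ...` code from earlier in the thread, is that each value is an array reference, and dereferencing an array reference with `%{ ... }` dies with "Not a HASH reference". The sketch below uses assumed sample data and a hypothetical helper named `describe` to show how `ref` reveals what each value holds.

```perl
use strict;
use warnings;

# Assumed sample: a Hash of Arrays, as built by push @{$hash{$name}}, ...
my %hash = (
    Brian => [ 'ipad 2001', 'ipad 2001', 'ipad 2001' ],
    ryan  => [ 'ipad 2005', 'cell 2009' ],
);

# describe() is a hypothetical helper that picks the dereference
# matching whatever kind of reference is stored under each name.
sub describe {
    my ($data) = @_;
    my @lines;
    for my $name (sort keys %$data) {
        my $value = $data->{$name};
        if (ref $value eq 'ARRAY') {
            push @lines, "$name: $_" for @$value;
        }
        elsif (ref $value eq 'HASH') {
            push @lines, "$name: $_ => $value->{$_}" for sort keys %$value;
        }
        else {
            push @lines, "$name: $value";
        }
    }
    return @lines;
}

print "$_\n" for describe(\%hash);

# Treating an array reference as a hash raises the error; eval traps it.
my $ok = eval { my @k = keys %{ $hash{Brian} }; 1 };
print "error: $@" unless $ok;
```

If the values turn out to be array references, the inner loop needs `@{ $hash{$line} }` rather than `keys %{ $hash{$line} }`.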
Re^4: perl parsing by cbtshare (Monk) on Oct 06, 2017 at 15:29 UTC
Re^5: perl parsing by poj (Abbot) on Oct 06, 2017 at 16:59 UTC
Some notes below your chosen depth have not been shown here