Re: perl parsing
by Laurent_R (Canon) on Oct 04, 2017 at 06:27 UTC
|
You've been given a solution that presumably works fine, but I would like to comment with a side note.
my @file = `cat text.txt`;
Calling the system or the shell to read a file is really poor practice in Perl (except possibly for command-line one-liners). Perl offers all the tools to do that, with much better control over what happens and over what to do if something goes wrong.
Look at the way poj opens and reads the file in pure Perl; that's much better.
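As a rough sketch of the pure-Perl approach (the temporary file and its two sample lines are stand-ins invented here so that the example runs on its own; they are not the original text.txt), a checked three-argument open replaces the backtick call:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for text.txt: a temporary file so the sketch is self-contained.
my ($tmp_fh, $infile) = tempfile(UNLINK => 1);
print {$tmp_fh} "name Brian\ndevice ipad 2001\n";
close $tmp_fh or die "Could not close $infile: $!";

# Three-argument open with a lexical filehandle; a failure is reported
# together with the reason held in $! instead of passing unnoticed.
open my $fh, '<', $infile
    or die "Could not open $infile: $!";
my @file = <$fh>;      # list context: one element per line
close $fh;

chomp @file;
print "$_\n" for @file;
```

No external process is started, and a missing or unreadable file stops the script with a clear message.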
| [reply] [d/l] |
|
use File::Slurp;
my @file = read_file('text.txt');
| [reply] [d/l] |
|
Please, don't recommend broken modules.
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
| [reply] [d/l] |
Re: perl parsing
by AnomalousMonk (Archbishop) on Oct 04, 2017 at 05:48 UTC
|
I can parse for the name ...
You could parse for the name if the code compiled, but it doesn't. After some fixes, you can get the following, but there seems to be another problem.
c:\@Work\Perl\monks\cbtshare>perl -wMstrict -le
"my @file = `cat text.txt`;
;;
foreach my $line (@file) {
while ($line =~ /name \s+(.*?) \s+(.*?)/mgx) {
my $name = $1;
print qq{name '$name' other '$2'};
}
}
"
name 'Brian' other ''
name 'Andrew' other ''
name 'ryan' other ''
Why is $2 always empty?
Update: Also, is there any point to the /g modifier in the /name \s+(.*?) \s+(.*?)/mgx match?
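For anyone who wants the fish anyway, here is a small sketch of what is probably going on (the sample line is made up): a lazy (.*?) at the very end of a pattern has nothing after it that forces it to grow, so it is satisfied by the empty string.

```perl
use strict;
use warnings;

my $line = "name Brian 2001\n";   # made-up sample line

# Lazy quantifiers: the trailing (.*?) may match nothing at all, and
# since the pattern ends there, it does.
my ($lazy_name, $lazy_other) = $line =~ /name \s+ (.*?) \s+ (.*?)/x;
print "lazy:  name '$lazy_name' other '$lazy_other'\n";

# \S+ must consume at least one non-space character, so the second
# capture picks up the next field.
my ($name, $other) = $line =~ /name \s+ (\S+) \s+ (\S+)/x;
print "fixed: name '$name' other '$other'\n";
```

As for /g, it only matters if a single line could hold more than one name entry; with one entry per line the while loop body runs at most once per line.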
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |
Re: perl parsing
by poj (Abbot) on Oct 04, 2017 at 06:02 UTC
|
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $infile = 'text.txt';
open IN,'<',$infile
or die "Could not open $infile : $!";
my $name;
my %hash = ();
while (<IN>){
s/^\s+|\s+$//g; # trim leading/trailing spaces
my ($col1,$col2) = split /\s+/,$_,2;
if ($col1 eq 'name'){
$name = $col2;
} elsif ($col1 eq 'device') {
push @{$hash{$name}},$col2;
} else {
# skip line
}
}
close IN;
print Dumper \%hash;
poj | [reply] [d/l] |
Re: perl parsing
by Marshall (Canon) on Oct 04, 2017 at 08:33 UTC
|
A rather strange looking solution, but with an approach that can be extended to many such situations: (and no I don't think this is the "best" solution).
#!/usr/bin/perl
use strict;
use warnings;
my $line;
while ( defined ($line = <DATA>))
{
if ($line =~ /^name/)
{
$line = process_record ($line);
redo if defined $line; # another name line
}
}
sub process_record
{
my $line = shift;
(my $name) = $line =~ /^name\s+(\w+)/;
my %devices;
while (defined ($line = <DATA>) and $line !~ /^name/)
{
if ( (my $device) = $line =~ /^device\s+(\w+\s+\w+)/)
{
$device =~ s/(\w+)\s+(\w+)/$1 $2/;
$devices{$device}=1;
}
}
print "$name:\n";
print " device $_\n" foreach keys %devices;
return $line;
}
=PRINTS:
Brian:
device ipad 2001
Andrew:
device ipad 2009
ryan:
device ipad 2005
device cell 2009
=cut
__DATA__
socks something
name Brian
shirt yellow
socks black
device ipad 2001
device ipad 2001
device ipad 2001
tag no
tag 0
name Andrew
shirt orange
socks black
device ipad 2009
tag no
tag 0
name ryan
shirt blue
socks black
device ipad 2005
device cell 2009
tag yes
tag 1
| [reply] [d/l] |
|
Well I figured that these were "dupes". Consider what would happen if $devices{$device}=1; was changed to $devices{$device}++; and what that would mean for adapting the printout of the hash to show the number of identical devices.
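One way to read that suggestion, sketched with made-up device strings rather than the thread's input file:

```perl
use strict;
use warnings;

# Made-up device entries for one person, including repeats.
my @seen = ('ipad 2001', 'ipad 2001', 'ipad 2001', 'cell 2009');

my %devices;
$devices{$_}++ for @seen;     # count occurrences instead of storing 1

# Report each distinct device together with how often it appeared.
for my $device (sort keys %devices) {
    print " device $device (x$devices{$device})\n";
}
```

The hash value now carries the number of identical devices, so the printout only has to include it.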
| [reply] [d/l] [select] |
Re: perl parsing
by kcott (Archbishop) on Oct 05, 2017 at 03:11 UTC
|
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use constant {
IN_FILE => 'pm_1200636_text.txt',
HEADER => 0,
KEY => 1,
VALUE => 2,
};
my %parsed;
{
open my $fh, '<', IN_FILE;
my $name;
while (<$fh>) {
my @fields = split;
if ($fields[HEADER] eq 'name') {
$name = $fields[KEY];
next;
}
if ($fields[HEADER] eq 'device') {
push @{$parsed{$name}{$fields[KEY]}}, $fields[VALUE];
next;
}
}
}
# For testing only
use Data::Dump;
dd \%parsed;
This only reads a record at a time, so there should be no memory issues that might occur when slurping entire files.
The only data that persists after the anonymous block is %parsed:
process that as necessary.
Also note that as $fh goes out of scope at the end of the anonymous block,
Perl automatically closes this for you (there's no need for a close statement in this instance).
I used the same data as you posted (see the spoiler).
Output from a sample run:
{
Andrew => { ipad => [2009] },
Brian => { ipad => [2001, 2001, 2001] },
ryan => { cell => [2009], ipad => [2005] },
}
See also:
"perldsc - Perl Data Structures Cookbook";
autodie;
open; and,
Data::Dump.
Everything else is very straightforward and basic Perl, but feel free to ask if anything is unclear.
| [reply] [d/l] [select] |
|
while (<IN>){
#remove spaces from the beginning or the end of the file
s/^\s+|\s+$//g;
# splits the files based on columns based on space and limit the amount split by 4
my ($col1,$col2,$col3) = split /\s+/,$_,4;
#checks to see if the word name is matched to get the variable next over which would be the actual name , then put it in variable $name
if ($col1 eq 'name'){
$name = $col2;
#checks to see if the word device is matched to get the variable next over which would be the actual type, then next over is another attribute(not on the example)
} elsif ($col1 eq 'device')
{
##Here the push name, device type and other variable into a hash
push @{$hash{$name}},$col2, $col3;
}
else
{
# skip line
}
}
close IN;
#prints everything
print Dumper \%hash
My issue now comes when I need to print out the content in a structure way, or into a file
name
device $col3
device $col3
I can sort through the hash and get the name only, not all the other attributes. But why? I put them all into the hash, right?
foreach my $line(keys %hash)
{
print $line
}
I believe you are doing something similar
##defining the fields you want including the file, HEADER would be the first field and if name or device then KEY is the next value over and VALUE the next
use constant {
IN_FILE => 'pm_1200636_text.txt',
HEADER => 0,
KEY => 1,
VALUE => 2,
};
my %parsed;
{
open my $fh, '<', IN_FILE;
my $name;
while (<$fh>) {
my @fields = split;
if ($fields[HEADER] eq 'name') {
$name = $fields[KEY];
next;
}
This is the part that gives me issues since I need to print the values in a specified format, so Data::Dumper wouldn't work. Any help, please?
if ($fields[HEADER] eq 'device') {
push @{$parsed{$name}{$fields[KEY]}}, $fields[VALUE];
next;
}
}
}
| [reply] [d/l] [select] |
|
Your analysis of what the code is doing is mostly correct.
In places, you indicate that operations are being performed on "files";
both solutions are reading the files line-by-line, and those operations are being performed on "records".
Consider these corrections:
#remove spaces from both the beginning and the end of the record
# splits the records based on ...
You also appear to have misunderstood the LIMIT argument
of split:
you've used a value of 4 in two places,
which doesn't make much sense as the maximum number of fields of any record is 3.
Further reading of that documentation will explain why "@fields = split;"
needs no arguments nor any preprocessing to trim whitespace.
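A short sketch of both points, using made-up records (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $trimmed = 'device ipad 2001 spare';   # made-up four-field record

# LIMIT caps the number of fields; the last one keeps the remainder.
my @limited = split /\s+/, $trimmed, 2;   # ('device', 'ipad 2001 spare')

# A LIMIT larger than the number of fields present changes nothing.
my @unaffected = split /\s+/, 'device ipad 2001', 4;   # three fields

# Bare "split" means split ' ', $_: leading whitespace is skipped and
# trailing empty fields are dropped, so no trimming is needed first.
my $raw  = "  device   ipad 2001 \n";
my @bare = split ' ', $raw;               # ('device', 'ipad', '2001')

print join('|', @limited), "\n";
print join('|', @unaffected), "\n";
print join('|', @bare), "\n";
```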
The data structures produced by the two solutions are different: an HoA and an HoHoA.
We both provided a link to perldsc:
perhaps you need to read, reread or study in more detail.
The part that seems to elude you, in both cases, is how to translate the information in the data structures
to whatever output format you need.
You wrote (at the end of each of those analyses, respectively):
"My issue now comes when I need to print out the content in a structure way, ..."
"This is the part that gives me issues since I need to print the values in a specified format, ..."
Without any knowledge of the required output format, there's no way we can help.
Again, the perldsc documentation has several sections on accessing
the data in complex structures: the answer probably lies therein.
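Purely as an illustration, and assuming a layout of a name line followed by indented device lines (the actual required format was never stated), both structures can be walked with nested loops. The two hashes below are hand-built copies of part of the sample data, not the output of either script:

```perl
use strict;
use warnings;

my %hoa = (                       # hash of arrays
    Brian => ['ipad 2001', 'ipad 2001', 'ipad 2001'],
    ryan  => ['ipad 2005', 'cell 2009'],
);
my %hohoa = (                     # hash of hashes of arrays
    Brian => { ipad => [2001, 2001, 2001] },
    ryan  => { cell => [2009], ipad => [2005] },
);

my @report;

# HoA: each value is an array reference, so dereference with @{ ... }.
for my $name (sort keys %hoa) {
    push @report, $name;
    push @report, "  device $_" for @{ $hoa{$name} };
}

# HoHoA: one extra level of keys before reaching the array reference.
for my $name (sort keys %hohoa) {
    push @report, $name;
    for my $type (sort keys %{ $hohoa{$name} }) {
        push @report, "  device $type $_" for @{ $hohoa{$name}{$type} };
    }
}

print "$_\n" for @report;
```

Swapping the print for a print to a filehandle would send the same lines to a file.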
There are a few other areas where it looks like you really don't understand certain fundamentals.
For instance, using the name $line for the variable that holds a key in:
foreach my $line(keys %hash)
{
print $line
}
would seem to indicate that you don't know what keys does.
I would recommend that you bookmark perlintro and refer to it often.
Make sure you understand the very basic information it presents, then follow links to related functions,
in-depth documentation, tutorials, advanced topics, and so on, as necessary.
For instance, the section on Hashes has links to keys
and values
(I half suspect that, in the code previously mentioned, "values %hash" was probably closer to what you wanted,
instead of "keys %hash");
you'll also find many others such as perldata (fuller details),
perlreftut (tutorial), and even
perldsc (advanced topic already mentioned).
Do note that's just some of the links in one of many sections:
the entire document is like that and I think you'll find it a most useful resource.
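To make the keys/values distinction concrete, here is a minimal sketch on a small hand-built hash (the data is made up to resemble the thread's HoA):

```perl
use strict;
use warnings;

my %hash = (
    Brian  => ['ipad 2001'],
    Andrew => ['ipad 2009'],
);

# keys returns only the names; the stored data is not included.
my @names = sort keys %hash;

# values returns the stored data (here, array references) without the names.
my @refs = values %hash;

# To pair each name with its own data, look the value up by key.
for my $name (@names) {
    print "$name: @{ $hash{$name} }\n";
}
```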
| [reply] [d/l] [select] |
|
|
Re: perl parsing
by Marshall (Canon) on Oct 06, 2017 at 02:34 UTC
|
I saw your question about accounting for Brian having more than one of the same device. Here is yet another solution... I didn't use a HoH in my first solution partly because that can be a difficult concept for beginners.
In general I don't recommend approaches that require reading the entire input file into memory and then parsing that memory copy of the file because that often essentially means that the data is being "handled" in some way more than once and can take a lot of memory in the process. Some of the files that I work with can get quite large.
#!/usr/bin/perl
use strict;
use warnings;
my %devices; # a HOH Hash of Hash {name}{device}
my $current_name;
while ( my $line = <DATA>)
{
$current_name = $1 if ($line =~ m/^name\s+(\w+)\s+/);
if ( (my $device) = $line =~ /^device\s+([\w\s]+)\n/)
{
$device =~ s/[ ]+/ /g; # multiple-space to a single space
$devices{$current_name}{$device}++;
}
}
# print the %devices hash - requires 2 loops
foreach my $name (sort keys %devices)
{
print "$name:\n";
foreach my $device (keys %{$devices{$name}})
{
print " $devices{$name}{$device}\t$device\n";
}
}
=Prints
Andrew:
1 ipad 2009
Brian:
3 ipad 2001
ryan:
1 ipad 2005
1 cell 2009
=cut
__DATA__
socks something
name Brian
shirt yellow
socks black
device ipad 2001
device ipad 2001
device ipad 2001
tag no
tag 0
name Andrew
shirt orange
socks black
device ipad 2009
tag no
tag 0
name ryan
shirt blue
socks black
device ipad 2005
device cell 2009
tag yes
tag 1
| [reply] [d/l] |
|
foreach my $line(keys %hash)
{
print "$line\n"; ##This works and prints the names
foreach my $sit (keys %{$hash{$line}}) #### <--line 42 line throwing error
{
print "$hash{$line}{$sit}\n";
}
}
| [reply] [d/l] |
|
Show your complete code. I can't tell what your problem is from this snippet.
I avoided a HoH (Hash of Hash) in my first code post partly because, as I suspected, beginners have problems with this. You are proving me right.
I suggest that you perhaps use my first code, which doesn't use any complicated data structures. That may be easier for you to work with.
This code took me some minutes to write. It very well could be that it will take you literally hours to understand it. You will not learn if you don't put in the effort.
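Without the full code this is only a guess, but if %hash was filled with a push @{$hash{$name}}, ... line like the one earlier in the thread, each value is an array reference, and dereferencing it as a hash would fail. A sketch under that assumption, with made-up data:

```perl
use strict;
use warnings;

# Assumption: the values are ARRAY references, as push @{...} creates.
my %hash = ( Brian => ['ipad', '2001'] );

my @lines;
for my $name (sort keys %hash) {
    my $entry = $hash{$name};
    if (ref $entry eq 'ARRAY') {
        push @lines, "$name: @$entry";        # array ref: use @{ ... }
    }
    elsif (ref $entry eq 'HASH') {
        push @lines, "$name: $_ => $entry->{$_}" for sort keys %$entry;
    }
}
print "$_\n" for @lines;

# Treating the array reference as a hash dies at run time.
my $ok  = eval { my @k = keys %{ $hash{Brian} }; 1 };
my $err = $ok ? '' : $@;
print "hash dereference failed: $err" unless $ok;
```

Checking ref on the value, or dumping the structure, shows which kind of reference is really there before choosing between @{ ... } and %{ ... }.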
| [reply] |