PerlMonks  

Re: creating array of hashes from input file

by Corion (Patriarch)
on Mar 07, 2017 at 13:49 UTC


in reply to creating array of hashes from input file

My approach when parsing lists depends on the nature of the list.

If I can find out from looking at a single line what kind it is, then I use regular expressions to fill out a hash and flush the record whenever a new set starts.
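A minimal sketch of this first case, not tied to your report. The "Key: value" line format, the sample lines and the sub name parse_records are all made up for illustration; the only real assumption is that a "Name:" line always starts a new record.

```perl
#!perl
use strict;
use warnings;

# Hypothetical input: every line says what it is via a "Key: value" prefix,
# and a "Name:" line always starts a new record.
my @lines = (
    "Name: Alice",
    "Phone: 555-0100",
    "Name: Bob",
    "Phone: 555-0101",
    "City: Springfield",
);

sub parse_records {
    my @input = @_;
    my( @records, %record );
    for my $line (@input) {
        # Skip anything that does not look like "Key: value"
        next unless $line =~ m!^(\w+):\s*(.*?)\s*$!;
        my( $key, $value ) = ( lc($1), $2 );
        # A "Name" line means the previous record is complete: flush it
        if( $key eq 'name' and %record ) {
            push @records, { %record };
            %record = ();
        };
        $record{ $key } = $value;
    };
    # Flush the last record, which no later "Name" line terminates
    push @records, { %record } if %record;
    return @records;
};

my @records = parse_records( @lines );
printf "%d records\n", scalar @records;
```

The flush after the loop is the part that is easiest to forget, because the last record is never followed by a line that starts a new one.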

If I can't find out from looking at a single line what kind it is, I use counters or flags to know what line I am on.
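A rough sketch of the counter variant, assuming (and this is only an assumption for the example) that every record is exactly three lines in a fixed order. The sub name parse_by_position is made up; the sample values are borrowed from the report data.

```perl
#!perl
use strict;
use warnings;

# Assumption: records are always three lines, in this order
my @fields = qw( date address description );

my @lines = (
    "06/01/2012", "26166 ROYMOR DR", "Upgrade panel from 100 amp to 200 amp",
    "06/05/2012", "4273 VICASA DR",  "Construct 339 SF Covered Loggia",
);

sub parse_by_position {
    my @input = @_;
    my( @records, %record );
    my $count = 0;
    for my $line (@input) {
        # The counter, not the content of the line, decides the field
        $record{ $fields[ $count % @fields ] } = $line;
        $count++;
        # After the last field of a record, store it and start over
        if( $count % @fields == 0 ) {
            push @records, { %record };
            %record = ();
        };
    };
    return @records;
};

my @records = parse_by_position( @lines );
printf "%d records\n", scalar @records;
```

This breaks as soon as a field spans a variable number of lines, which is why it only fits input with a rigid layout.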

In your case, it looks to me as if you basically have a report with some header data followed by three payload items per record: the issuance date, the address and the description. One ugly thing seems to be that the address and the description can span multiple lines, but in the unrepresentative example you have posted, each item seems to be separated from the previous item by a blank line.

Going from these assumptions, my approach would be something like the following (untested):

#!perl -w
use strict;
use Data::Dumper;

# Output a row of information, then reset the per-record fields
sub flush {
    my( $record ) = @_;
    if( $record->{permit} ) {
        print Dumper $record;
    };
    delete @{$record}{qw(permit address description)};
};

# This will collect all information for one entry:
my %info;
my $last_page;
my $expected_pages;
my $record_kind;
my %next_record = (
    address     => 'description',
    description => undef,
);

while(<DATA>) {
    if( m!^Page (\d+) of (\d+)! ) {
        $last_page = $1;
        $expected_pages ||= $2;
        next;
    };
    if( m!^(Jan|Feb|...|Jun|...) (19\d\d|20\d\d)! ) {
        $info{ report_date } = "$2-$1";
        next;
    };
    # ... more code to skip the header left for the reader
    next if( m!MONTHLY EXTERNAL MODIFICATIONS PERMITS REPORT! );
    next if( m!^Permit Issued! );

    if( m!^(\d\d)/(\d\d)/((?:19|20)\d\d)$! ) {
        flush(\%info);
        # The dates are MM/DD/YYYY, so build YYYY-MM-DD
        $info{ permit } = "$3-$1-$2";
        $record_kind = 'address';
        <DATA>; # skip empty line
        next
    };

    # Collect the current line and all following lines up to the next blank line
    if( $record_kind ) {
        while( defined $_ and !/^\s*$/ ) {
            s!\s*$!!;
            $info{ $record_kind } .= ( defined $info{ $record_kind } ? " " : "" ) . $_;
            $_ = <DATA>;
        };
        $record_kind = $next_record{ $record_kind };
    } else {
        die "Unknown line [$_] on line $.";
    };
};
flush(\%info); # don't lose the last record

warn "Uhoh - expected $expected_pages but only read up to $last_page"
    if( $expected_pages != $last_page );

__DATA__
Page 1 of 3
100 Civic Center Way
Calabasas, California 91302
7/12/2012 9:21:02AM
MONTHLY EXTERNAL MODIFICATIONS PERMITS REPORT
Jun 2012
Permit Issued Address Description
06/01/2012

26166 ROYMOR DR

Upgrade panel from 100 amp to 200 amp

06/04/2012

24956 NORMANS WAY

(6) light fixtures @ patio; (3) branch circuits; (4) electric heaters

06/05/2012

4273 VICASA DR

Construct 339 SF Covered Loggia

Replies are listed 'Best First'.
Re^2: creating array of hashes from input file
by chimiXchanga (Novice) on Mar 08, 2017 at 06:38 UTC
    That looks beautiful! May I kindly ask, would this layout be easier to parse? And if so, how would you approach joining the data on the next line to the proper column?
    100 Civic Center Way                                                  Page 1 of 3
    Calabasas, California 91302                                           7/12/2012 9:21:02AM
    MONTHLY EXTERNAL MODIFICATIONS PERMITS REPORT
    Jun 2012
    Permit Issued   Address                               Description
    06/01/2012      26166 ROYMOR                          Upgrade panel from 100 amp to 200 amp
                    DR
    06/04/2012      24956 NORMANS                         (6) light fixtures @ patio; (3) branch circuits; (4) electric heaters
                    WAY
    06/05/2012      4273 VICASA                           Construct 339 SF Covered Loggia
                    DR
    06/07/2012      26011 ALIZIA CANYON                   R/R (1) <100K BTU Furnace in garage; () <100K BTU condenser outside (NO DUCTS TO BE CHANGED OUT)
                    DR E
    06/07/2012      4240 LOST HILLS                       R/R (7) windows (like for like) AT LEAST ONE PANE MUST BE TEMPERED
                    RD 1503
    06/08/2012      3574 ELM                              Construct Retaining Wall in front of (E) retaining wall: 4 1/2' average x 32 LF = approx. 144 SF
                    DR
    06/13/2012      4026 TOWHEE                           Construct a 460 SF Pool/49 SF Spa
                    DR
      would this layout be easier to parse

      Maybe; it depends on what pages 2 and 3 are like. Are you parsing many reports that are all formatted the same way, or are these 3 pages just a small sample of the real report?

          #!perl
          use strict;
          use Data::Dumper;

          my $infile = 'report1.txt';

          # date address description
          my $fmt = "A16 A38 A*";
          my @data = ();
          my $recno = -1;
          my $flag = 0;

          open IN,'<',$infile or die "$infile $!";
          while (<IN>){
            chomp;
            next unless /\S/;
            $flag = 0 if /Page \d of \d/;
            my ($date,$addr,$desc) = unpack $fmt,$_;
            if ( $date =~ /\d\d.\d\d.20\d\d/ ){
              $flag = 1;
              ++$recno;
              $data[$recno] = [ $date,$addr,$desc ];
            } elsif ($flag) {
              $data[$recno][1] .= ' '.$addr if $addr;
              $data[$recno][2] .= ' '.$desc if $desc;
            }
          }
          close IN;

          print scalar(@data)." records read\n";
          print Dumper \@data;
      poj

      Here, instead of a regex I would likely use unpack with the appropriate template(s).
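      To illustrate what unpack does with one fixed-width line, here is a small sketch. The column widths (16, 38, rest of line) are the ones used in the reply above and are only an assumption about the real report; the sample lines are built with sprintf so the columns line up.

```perl
#!perl
use strict;
use warnings;

# Assumed column widths: 16 for the date, 38 for the address,
# the rest of the line for the description
my $template = "A16 A38 A*";

# A record line and a continuation line that only carries address text
my $line = sprintf "%-16s%-38s%s",
    "06/05/2012", "4273 VICASA", "Construct 339 SF Covered Loggia";
my $continuation = ( " " x 16 ) . "DR";

# "A" strips trailing spaces, so the fields come back already trimmed
my( $date, $addr, $desc ) = unpack $template, $line;
print "[$date] [$addr] [$desc]\n";

# On the continuation line, the date and description come back empty
my( $date2, $addr2, $desc2 ) = unpack $template, $continuation;
print "[$date2] [$addr2] [$desc2]\n";
```

      An empty date field is then the signal that the line continues the previous record rather than starting a new one.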

      Append to %info until you encounter an empty line, then flush.
