Re: creating array of hashes from input file

My approach when parsing lists depends on the nature of the list.

If I can find out from looking at a single line what kind it is, then I use regular expressions to fill out a hash and flush the record whenever a new set starts.

If I can't find out from looking at a single line what kind it is, I use counters or flags to know what line I am on.
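
As a minimal sketch of the counter approach: assume, purely for illustration, a list in which every record is exactly three lines (name, street, city) and nothing in a line tells you which of the three it is. The field names, the sample values and the `parse_by_position` helper are all made up for this example and are not from your report.

```perl
use strict;
use warnings;

# Hypothetical layout: every record is exactly three lines, in this order
my @fields = qw( name street city );

sub parse_by_position {
    my( @lines ) = @_;
    my( @records, %record );
    my $counter = 0;                  # which line of the current record we are on
    for my $line (@lines) {
        chomp( my $value = $line );
        $record{ $fields[$counter] } = $value;
        $counter++;
        if( $counter == @fields ) {   # record complete: store a copy, start over
            push @records, { %record };
            %record  = ();
            $counter = 0;
        };
    };
    return \@records;
};

my $records = parse_by_position(
    "Alice\n", "1 Main St\n", "Springfield\n",
    "Bob\n",   "2 Side Rd\n", "Shelbyville\n",
);
print scalar(@$records), " records\n";
```

The counter is the only thing that decides which field a line belongs to, so this only works while the number of lines per record never changes.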

In your case, it looks to me as if you basically have a report with some header data and then three payload fields: the issuance date, the address and the description. One ugly thing seems to be that the address and the description can span multiple lines, but from the unrepresentative example you have posted, each item seems to be separated from the previous item by a blank line.

Going from these assumptions, my approach would be something like the following (untested):

#!perl -w
use strict;
use Data::Dumper;

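# Output a row of information, then clear the per-permit fields
# so that nothing leaks into the next record
sub flush {
    my( $record ) = @_;
    if( $record->{permit} ) {
        print Dumper $record;
    };
    delete @{$record}{qw( permit address description )};
};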



# This will collect all information for one entry:
my %info;
my $last_page;
my $expected_pages;
my $record_kind;

my %next_record = (
    address => 'description',
    description => undef,
);

while(<DATA>) {
    if( m!^Page (\d+) of (\d+)! ) {
        $last_page = $1;
        $expected_pages ||= $2;
        next;
    };
    if( m!^(Jan|Feb|...|Jun|...) (19\d\d|20\d\d)! ) {
        $info{ report_date } = "$2-$1";
        next;
    };

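    # Skip blank lines and the header lines seen in the sample data
    # ... more code to skip other headers left for the reader
    next if( m!^\s*$! );
    next if( m!MONTHLY EXTERNAL MODIFICATIONS PERMITS REPORT! );
    next if( m!^(?:Permit Issued|Address|Description)\s*$! );
    next if( m!^\d+/\d+/\d{4} \d+:\d\d:\d\d[AP]M! );                 # report timestamp
    next if( m!^(?:100 Civic Center Way|Calabasas, California)! );   # office address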

    # The sample dates are MM/DD/YYYY (all of them belong to the "Jun 2012" report)
    if( m!^(\d\d)/(\d\d)/((?:19|20)\d\d)$! ) {
        flush(\%info);
        $info{ permit } = "$3-$1-$2";
        $record_kind = 'address';
        <DATA>; # skip empty line
        next
    };

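    # Collect the current line and everything up to the next blank line
    if( $record_kind ) {
        my @lines = $_;
        while( defined( my $line = <DATA> )) {
            last if $line =~ /^\s*$/;
            push @lines, $line;
        };
        s!\s*$!! for @lines;
        $info{ $record_kind } = join " ", @lines;
        $record_kind = $next_record{ $record_kind };
    } else {
        die "Unknown line [$_] on line $.";
    };
};

# Output the last record, which has no following date line to trigger the flush
flush(\%info);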

warn "Uhoh - expected $expected_pages but only read up to $last_page"
    if( $expected_pages != $last_page );

__DATA__
Page 1 of 3

100 Civic Center Way
Calabasas, California 91302

7/12/2012 9:21:02AM

MONTHLY EXTERNAL MODIFICATIONS PERMITS REPORT
Jun 2012

Permit Issued

Address

Description

06/01/2012

26166 ROYMOR
DR

Upgrade panel from 100 amp to 200 amp

06/04/2012

24956 NORMANS
WAY

(6) light fixtures @ patio; (3) branch circuits; (4) electric
heaters

06/05/2012

4273 VICASA
DR

Construct 339 SF Covered Loggia

Re^2: creating array of hashes from input file by chimiXchanga (Novice) on Mar 08, 2017 at 06:38 UTC
That looks beautiful! May I kindly ask, would this layout be an easier approach to parse? And if so, how would you approach joining the data on the next line to the proper column?

100 Civic Center Way                                                  Page 1 of 3
Calabasas, California 91302                                           7/12/2012 9:21:02AM
MONTHLY EXTERNAL MODIFICATIONS PERMITS REPORT
Jun 2012
Permit Issued   Address                               Description
06/01/2012      26166 ROYMOR                          Upgrade panel from 100 amp to 200 amp
                DR
06/04/2012      24956 NORMANS                         (6) light fixtures @ patio; (3) branch circuits; (4) electric heaters
                WAY
06/05/2012      4273 VICASA                           Construct 339 SF Covered Loggia
                DR
06/07/2012      26011 ALIZIA CANYON                   R/R (1) <100K BTU Furnace in garage; () <100K BTU condenser outside (NO DUCTS TO BE CHANGED OUT)
                DR E
06/07/2012      4240 LOST HILLS                       R/R (7) windows (like for like) AT LEAST ONE PANE MUST BE TEMPERED
                RD 1503
06/08/2012      3574 ELM                              Construct Retaining Wall in front of (E) retaining wall: 4 1/2' average x 32 LF = approx. 144 SF
                DR
06/13/2012      4026 TOWHEE                           Construct a 460 SF Pool/49 SF Spa
                DR
Re^3: creating array of hashes from input file by poj (Abbot) on Mar 08, 2017 at 07:46 UTC
"would this layout be an easier approach to parse"

Maybe, depends on what pages 2 and 3 are like. Are you parsing many reports, all formatted the same? Or are the 3 pages just a small sample of the real report?

#!perl
use strict;
use Data::Dumper;

my $infile = 'report1.txt';

# date address description
my $fmt = "A16 A38 A*";
my @data = ();
my $recno = -1;
my $flag = 0;

open IN,'<',$infile or die "$infile $!";
while (<IN>){
  chomp;
  next unless /\S/;
  # a page header ends the current record
  $flag = 0 if /Page \d of \d/;

  my ($date,$addr,$desc) = unpack $fmt,$_;
  if ( $date =~ /\d\d.\d\d.20\d\d/ ){
    # a date in the first column starts a new record
    $flag = 1;
    ++$recno;
    $data[$recno] = [ $date,$addr,$desc ];
  } elsif ($flag) {
    # continuation line: append to the columns of the current record
    $data[$recno][1] .= ' '.$addr if $addr;
    $data[$recno][2] .= ' '.$desc if $desc;
  }
}
close IN;

print scalar(@data)." records read\n";
print Dumper \@data;

poj
Re^3: creating array of hashes from input file by Corion (Patriarch) on Mar 08, 2017 at 08:49 UTC
Here, instead of a regex, I would likely use unpack with the appropriate template(s). Append to `%info` until you encounter an empty line, then flush.
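
A rough sketch of that idea, which also ends up with the array of hashes from the thread title. It rests on assumptions: the column widths are simply borrowed from poj's `A16 A38 A*` template, records are assumed to be separated by empty lines, and `parse_report` and the sample rows (built with sprintf so the columns line up) are made up for illustration.

```perl
use strict;
use warnings;

# Assumed column widths: date, address, description (taken from poj's template)
my $template = "A16 A38 A*";

sub parse_report {
    my( @lines ) = @_;
    my( @permits, %info );

    # Store the collected record (if any) and start a fresh one
    my $flush = sub {
        push @permits, { %info } if $info{permit};
        %info = ();
    };

    for my $line (@lines) {
        chomp( my $text = $line );
        if( $text !~ /\S/ ) {             # empty line: the record is complete
            $flush->();
            next;
        };
        my( $date, $addr, $desc ) = unpack $template, $text;
        if( $date =~ m!^\d\d/\d\d/\d{4}$! ) {
            $flush->();                   # in case the empty line was missing
            $info{permit} = $date;
        };
        next unless $info{permit};        # still in the page header
        for my $pair ( [ address => $addr ], [ description => $desc ] ) {
            my( $key, $value ) = @$pair;
            next unless defined $value and length $value;
            $info{$key} = defined $info{$key} ? "$info{$key} $value" : $value;
        };
    };
    $flush->();                           # the last record has no trailing empty line
    return \@permits;
};

# Made-up sample in the assumed layout
my $row = "%-16s%-38s%s\n";
my @sample = (
    sprintf( $row, "Permit Issued", "Address", "Description" ),
    sprintf( $row, "06/01/2012", "26166 ROYMOR", "Upgrade panel from 100 amp to 200 amp" ),
    sprintf( $row, "", "DR", "" ),
    "\n",
    sprintf( $row, "06/05/2012", "4273 VICASA", "Construct 339 SF Covered Loggia" ),
    sprintf( $row, "", "DR", "" ),
);

my $permits = parse_report( @sample );
printf "%s | %s | %s\n", @{$_}{qw( permit address description )} for @$permits;
```

Continuation lines have an empty date column, so whatever sits in the address or description column is appended to the record that is currently open.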

