http://qs321.pair.com?node_id=11150119


in reply to Re: Multiple file handles
in thread Multiple file handles

If there were enough data to warrant an external application then I would be looking to use a database.

I agree. It's easy enough. It's Friday, i got my lunchbreak, so why not give the OP an example (using PostgreSQL, since that's what i have installed). Since OP didn't give us example files, let's just create some data out of thin air (zipcodes and census data) for our lovely tropical island. The censusdata is missing the zipcodes (because our friend Penultimo doesn't understand them) but it has valid city names.

#!/usr/bin/env perl use strict; use warnings; use Crypt::Digest::SHA256 qw[sha256_hex]; use Carp; # Make a fake zipcode list print "Writing ZIP codes\n"; open(my $zipfh, '>', 'zipcodes.txt') or croak($!); for(my $i = 10_000; $i <= 99_999; $i++) { # Fake a city name ;-) my $cityname = sha256_hex($i); print $zipfh "$i;$cityname\n"; } close $zipfh; # Make a fake data file print "Writing data\n"; open(my $datafh, '>', 'census.txt') or croak($!); for(my $i = 0; $i < 1_000_000; $i++) { my $zip = int(rand(89_999)) + 10_000; my $cityname = sha256_hex($zip); my $population = int(rand(1_000_000)); print $datafh "$cityname;$population\n"; } close $datafh; print "Done\n";

Next step is to log in to our database, create some tables and insert the data from the files. The last step is to add the missing zipcodes into the census data table using the city names.

#!/usr/bin/env perl use strict; use warnings; use Carp; use DBI; my $dbh = DBI->connect("dbi:Pg:dbname=TropicoDB;host=/var/run/postgres +ql", 'ElPresidente', 'WorldDomination', {AutoCommit => 0}) or croak($!); my @stmts = ( "CREATE TABLE IF NOT EXISTS zipcodes ( zipcode text NOT NULL, cityname text NOT NULL, CONSTRAINT zipcodes_pk PRIMARY KEY(zipcode) )", "CREATE TABLE IF NOT EXISTS censusdata ( census_id bigserial, zipcode text, cityname text NOT NULL, peoplecount bigint NOT NULL, CONSTRAINT census_pk PRIMARY KEY(census_id) )", # Empty tables if they already have data "TRUNCATE zipcodes", "TRUNCATE censusdata", ); foreach my $stmt (@stmts) { print "Running $stmt...\n"; $dbh->do($stmt) or croak($dbh->errstr); } print "Loading zipcodes into database...\n"; my $zipsth = $dbh->prepare("INSERT INTO zipcodes (zipcode, cityname) V +ALUES (?, ?)") or croak($dbh->errstr); open(my $zipfh, '<', 'zipcodes.txt') or croak($!); while((my $line = <$zipfh>)) { chomp $line; my ($zipcode, $cityname) = split/\;/, $line; if(!$zipsth->execute($zipcode, $cityname)) { croak($dbh->errstr); } } close $zipfh; print "Loading census data into database...\n"; my $censussth = $dbh->prepare("INSERT INTO censusdata (cityname, peopl +ecount) VALUES (?, ?)") or croak($dbh->errstr); open(my $censusfh, '<', 'census.txt') or croak($!); while((my $line = <$censusfh>)) { chomp $line; my ($cityname, $population) = split/\;/, $line; if(!$censussth->execute($cityname, $population)) { croak($dbh->errstr); } } close $censusfh; print "Adding missing zipcodes to census data...\n"; $dbh->do("UPDATE censusdata c SET zipcode = z.zipcode FROM zipcodes z +WHERE c.cityname = z.cityname") or croak($dbh->errstr); $dbh->commit; print "Done\n";

Now that we have the data in the database, it's very simple to run some reports. El Presidente needs to know which areas of his island are overpopulated and where to move those people. He also wants to know if any areas are still missing census reports.

#!/usr/bin/env perl use strict; use warnings; use Carp; use DBI; my $dbh = DBI->connect("dbi:Pg:dbname=TropicoDB;host=/var/run/postgres +ql", 'ElPresidente', 'WorldDomination', {AutoCommit => 0}) or croak($!); generateReport($dbh, "Most populated zip codes", "SELECT zipcode, sum( +peoplecount) FROM censusdata GROUP BY zipcode ORDER BY sum(peoplec +ount) desc LIMIT 10"); generateReport($dbh, "Least populated zip codes", "SELECT zipcode, sum +(peoplecount) FROM censusdata GROUP BY zipcode ORDER BY sum(peoplec +ount) LIMIT 10"); generateReport($dbh, "Zip codes without census", "SELECT z.zipcode, z. +cityname FROM zipcodes z WHERE NOT EXISTS ( SELECT 1 FROM cens +usdata c WHERE c.zipcode = +z.zipcode )"); $dbh->commit; sub generateReport { my ($dbh, $title, $stmt) = @_; print $title, "\n"; print "=" x length($title), "\n"; my $sth = $dbh->prepare($stmt) or croak($dbh->errstr); $sth->execute or croak($dbh->errstr); while((my @line = $sth->fetchrow_array)) { print join(';', @line), "\n"; } $sth->finish; print "\n\n"; }

Results:

Most populated zip codes ======================== 15864;16121159 50274;15581665 90418;15213432 47146;14784133 79867;14738692 66935;14603419 25969;14572689 29868;14531580 63780;14452431 95050;14350914 Least populated zip codes ========================= 26940;110605 10822;158808 11776;170321 43384;187501 61853;211522 34028;227782 21339;228705 61363;319850 22065;322939 42693;329595 Zip codes without census ======================== 99999;fd5f56b40a79a385708428e7b32ab996a681080a166a2206e750eb4819186145

PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP

Replies are listed 'Best First'.
Re^3: Multiple file handles
by MoodyDreams999 (Beadle) on Apr 07, 2023 at 21:26 UTC
    Wow, thats a great way to do it, definitely gives me some ideas and you connecting to the DB helps for reference because ive been struggling with that due to my company's current script using a bunch of regex off of a config file to connect. I'll have to get better at SSCCE so I don't end up bogging you guys down or giving you too little info. Thank you so much!