Here's one more (regex-only) approach, which no one seems to have tried yet. I'm basing this on the assumption that the second column (which holds your intended hash key) is always separated from the first column by at least three spaces, whereas words within the first column are always separated by single spaces:
#!/usr/bin/perl
use strict;
use warnings;

my %hash;
while (<DATA>) {
    s/^\s+//;    # remove leading whitespace
    # description (non-greedy), then 3+ whitespace chars, then the key
    if ( /(.*?)\s{3,}(\S+)/ ) {
        my ($val, $key) = ($1, $2);
        $hash{$key} = $val;
    }
}
print map { "$_ : $hash{$_}\n" } sort keys %hash;

# Lines such as "+ 321" contain no run of 3+ spaces, so they are skipped.
__DATA__
Total index                                     B50001
Crude processing (capacity)                     B5610C
Primary & semifinished processing (capacity)    B562A3C
Finished processing (capacity)                  B5640C
Manufacturing ("SIC")                           B00004
Manufacturing (NAICS)                           GMF
Durable manufacturing (NAICS)                   GMFD
Wood product                                    G321
+ 321
Nonmetallic mineral product                     G327
+ 327
Primary metal                                   G331
+ 331
Iron and steel products                         G3311A2
+ 3311,2
Fabricated metal product                        G332
+ 332
Machinery                                       G333
+ 333
Output:
B00004 : Manufacturing ("SIC")
B50001 : Total index
B5610C : Crude processing (capacity)
B562A3C : Primary & semifinished processing (capacity)
B5640C : Finished processing (capacity)
G321 : Wood product
G327 : Nonmetallic mineral product
G331 : Primary metal
G3311A2 : Iron and steel products
G332 : Fabricated metal product
G333 : Machinery
GMF : Manufacturing (NAICS)
GMFD : Durable manufacturing (NAICS)
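To see why the leading-whitespace strip matters, here is a small sketch that applies the same two steps to individual lines. It assumes (my guess, since the indentation may have been lost when posting) that the "+ ..." continuation lines are indented in the real file; `parse_line` is a hypothetical helper name, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the same two steps as the loop body above,
# applied to one line. Returns (key, value), or an empty list on no match.
sub parse_line {
    my ($line) = @_;
    $line =~ s/^\s+//;    # strip leading whitespace first
    return $line =~ /(.*?)\s{3,}(\S+)/ ? ($2, $1) : ();
}

# Sample lines (assumed layout): the continuation line is indented here.
for my $line ("Wood product        G321",
              "          + 321",
              "Iron and steel products   G3311A2") {
    if (my @kv = parse_line($line)) {
        print "$kv[0] : $kv[1]\n";
    } else {
        print "skipped: '$line'\n";
    }
}

# Without the strip, the indentation alone satisfies \s{3,}, so the
# continuation line would be stored under the bogus key "+".
my $raw = "          + 321";
if ($raw =~ /(.*?)\s{3,}(\S+)/) {
    print "unstripped: key='$2' value='$1'\n";
}
```

The last line it prints is `unstripped: key='+' value=''`, which is exactly the kind of junk entry the `s/^\s+//` prevents.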
I have to confess, I've been pretty slow to get comfortable with unpack() myself. It is certainly one of the more difficult functions to grasp (and its description in perlfunc is still a bit hard to follow).
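For comparison, a minimal sketch of the fixed-width unpack() approach. It assumes, hypothetically, that the description column is exactly 48 characters wide; the real width isn't known here, and `split_fixed` and `$width` are illustrative names only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a fixed-width line into (description, code).
# "A$width" grabs exactly $width characters and strips trailing spaces;
# "A*" grabs the rest of the line (trailing whitespace also stripped).
sub split_fixed {
    my ($line, $width) = @_;
    return unpack("A$width A*", $line);
}

# Assumed layout: description padded to 48 columns, code right after.
my $width = 48;
for my $line (sprintf("%-*s%s", $width, "Wood product",  "G321"),
              sprintf("%-*s%s", $width, "Primary metal", "G331")) {
    my ($val, $key) = split_fixed($line, $width);
    print "$key : $val\n";
}
```

The trade-off is that unpack() depends on exact column positions, whereas the regex only needs the "three or more spaces" gap.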