Re: Re: Re: Help me write a good reg-exp for this text

by shenme (Priest)
on Sep 05, 2003 at 17:32 UTC

in reply to Re: Re: Help me write a good reg-exp for this text
in thread Help me write a good reg-exp for this text

So if the data format really _is_ fixed-width columns then something like dragonchild's code would work, using
my @column_widths = (57, 17, '*');
for the widths (check these against the real column widths). To remove the leading _and_ trailing spaces from each piece, though, I'd do something like:
my ($desc, $code, $other_thingy) = unpack $unpack_spec, $_;
foreach my $piece ($desc, $code, $other_thingy) {
    $piece =~ s/^\s+//;
    $piece =~ s/\s+$//;
}
(I think that's right, hmmm, testing with dragonchild's modified code ....)
# Change these to the actual column widths.
# Use a star at the end to get the rest.
my @column_widths = ( 57, 17, '*');
my $unpack_spec = join ' ', map { "A$_" } @column_widths;

my %codes;
while (<DATA>) {
    chomp;
    my ($desc, $code, $other_thingy) = unpack $unpack_spec, $_;
    foreach my $piece ($desc, $code, $other_thingy) {
        $piece =~ s/^\s+//;
        $piece =~ s/\s+$//;
    }
    $codes{$code} = {
        Description => $desc,
        Other_Thing => $other_thingy,
    };
}

my $choice = 'GMF';
print "$choice: $codes{$choice}{Description}\n";
$choice = 'G3311A2';
print "$choice: $codes{$choice}{Description}\n";

__DATA__
Total index                                              B50001
Crude processing (capacity)                              B5610C
Primary & semifinished processing (capacity)             B562A3C
Finished processing (capacity)                           B5640C
Manufacturing ("SIC")                                    B00004
Manufacturing (NAICS)                                    GMF
Durable manufacturing (NAICS)                            GMFD
Wood product                                             G321             321
Nonmetallic mineral product                              G327             327
Primary metal                                            G331             331
Iron and steel products                                  G3311A2          3311,2
Fabricated metal product                                 G332             332
Machinery                                                G333             333

__OUTPUT__
GMF: Manufacturing (NAICS)
G3311A2: Iron and steel products
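For anyone who wants to check their own widths before pointing this at the real file, here is a minimal standalone sketch of the same unpack-and-trim step. The helper name split_fixed_width is made up for illustration (it is not from dragonchild's code), and the sample line is built with sprintf so that the columns are exactly 57 and 17 wide, which is an assumption about the data rather than something verified against the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one fixed-width line
# into fields, then trim whitespace from both ends of each field.
sub split_fixed_width {
    my ($line, @widths) = @_;

    # (57, 17, '*') becomes the unpack template "A57 A17 A*".
    my $spec   = join ' ', map { "A$_" } @widths;
    my @fields = unpack $spec, $line;

    for my $field (@fields) {
        $field = '' unless defined $field;   # defensive: treat a missing field as empty
        $field =~ s/^\s+//;                  # leading whitespace
        $field =~ s/\s+$//;                  # trailing whitespace
    }
    return @fields;
}

# Sample line padded by sprintf; the leading spaces in the description
# stand in for indentation that the real data may or may not have.
my $line = sprintf '%-57s%-17s%s', '  Iron and steel products', 'G3311A2', '3311,2';

my ($desc, $code, $other) = split_fixed_width($line, 57, 17, '*');
print "[$desc] [$code] [$other]\n";
# prints: [Iron and steel products] [G3311A2] [3311,2]
```

If the brackets in the printed line show a field cut short or carrying part of its neighbour, the widths passed in do not match the data and need adjusting.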

