PerlMonks  

Help Parsing a Text File

by PrimeLord (Pilgrim)
on Mar 23, 2004 at 18:49 UTC ( [id://339113]=perlquestion )

PrimeLord has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks! I am hoping you can help me parse information out of a data file. The file is plain text and its format looks something like the following:

    !Data.
    $
    1-11-ABC22 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    2-15-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo abr Foo bar Foo Bar Foo Bar Foo Bar
    --
    2-15-ABC45 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    3-33-ABC15 (12:12) ABC 12
    Foo Bar Foo Bar
    !Data *

Each entry I am trying to parse out starts with the d-dd-ABCdd line and I basically need the lines below it until the next entry starts. So examples of what I am trying to parse out are:

    1-11-ABC22 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

And...

    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo abr Foo bar Foo Bar Foo Bar Foo Bar
    --

etc. It would be easy enough to just split on lines that start with a \n or a --, but the -- is part of the entry and I need to retain it. Any ideas on a good way to parse out this information? I appreciate any help you can offer. Thanks!

Prime
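
One way to keep the -- lines while still using split is a zero-width lookahead on the entry header. This is only a minimal sketch, assuming every header matches d-dd-ABCdd at the start of a line; the helper name split_entries and the sample text are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into entries without consuming any characters.
sub split_entries {
    my ($text) = @_;
    # The lookahead matches just before each header line, so nothing is removed
    # and any -- line stays attached to the entry it follows.
    my @chunks = split /^(?=\d-\d\d-ABC\d\d)/m, $text;
    # Keep only chunks that begin with a header (drops a leading preamble).
    return grep { /\A\d-\d\d-ABC\d\d/ } @chunks;
}

# Illustrative sample, much shorter than the real file.
my $sample = "\$\n1-11-ABC22 (12:12) ABC 12\nFoo Bar\n\n"
           . "2-15-ABC33 (12:12) ABC 12\nFoo Bar\n--\n";
print "Entry:\n$_" for split_entries($sample);
```

Note that anything after the last header, such as a closing !Data * line, would stay attached to the final entry and would need to be trimmed separately.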

Replies are listed 'Best First'.
Re: Help Parsing a Text File
by tcf22 (Priest) on Mar 23, 2004 at 19:02 UTC
    Perhaps something like this is what you need

    my @entries;
    my $entry;
    my $start_parsing = 0;

    while (<DATA>) {
        # A header line such as 1-11-ABC22 starts a new entry
        if (/\d-\d\d-ABC\d\d/) {
            push(@entries, $entry) if (defined $entry);
            $entry = '';
            $start_parsing = 1;
        }
        next unless ($start_parsing);   # ignore lines before the first entry
        $entry .= $_;
    }
    push(@entries, $entry) if (defined $entry && length($entry) > 0);   # Last entry

    my $i = 0;
    foreach (@entries) {
        print "Entry #$i:\n$_\n";
        $i++;
    }

    __DATA__
    1-11-ABC22 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    2-15-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo abr Foo bar Foo Bar Foo Bar Foo Bar
    --
    2-15-ABC45 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    3-33-ABC15 (12:12) ABC 12
    Foo Bar Foo Bar

    -----------------------

    Output is

    Entry #0:
    1-11-ABC22 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar


    Entry #1:
    2-15-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar
    --

    Entry #2:
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo bar Foo Bar
    --

    Entry #3:
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo abr Foo bar Foo Bar Foo Bar Foo Bar
    --

    Entry #4:
    2-15-ABC45 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar


    Entry #5:
    3-33-ABC15 (12:12) ABC 12
    Foo Bar Foo Bar

    Update: Took into account leading lines to be ignored.

    - Tom

Re: Help Parsing a Text File
by YuckFoo (Abbot) on Mar 23, 2004 at 19:10 UTC
    Prime,

    I think you want an Array of Arrays (AoA). Read about them in perldsc and perlref.

    YuckFoo

    Here is one way to go:

    #!/usr/bin/perl

    use strict;
    use Data::Dumper;

    my @parts = ([]);

    while (my $line = <DATA>) {
        chomp $line;
        # Start a new inner array at each header line
        if ($line =~ /^\d-\d\d-ABC\d\d/) {
            push (@parts, []);
        }
        # Append the line to the current (last) inner array
        push (@{$parts[-1]}, $line)
    }

    print Dumper \@parts;

    __DATA__
    1-11-ABC22 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    2-15-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo abr Foo bar Foo Bar Foo Bar Foo Bar
    --
    2-15-ABC45 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    3-33-ABC15 (12:12) ABC 12
    Foo Bar Foo Bar

Re: Help Parsing a Text File
by Roy Johnson (Monsignor) on Mar 23, 2004 at 19:20 UTC
    Something like this?

    my %hash;
    my $key;

    while (<DATA>) {
        if (/^(\d-\d\d-.*)/) {
            $key = $1;                          # the header line becomes the hash key
        }
        else {
            push(@{$hash{$key}}, $_) if $key;   # lines below the header
        }
    }

    while (my ($k,$v) = each %hash) {
        print "K=$k\n";
        print " V=$_" for @$v;
    }

    __DATA__
    $
    1-11-ABC22 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    2-15-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo abr Foo bar Foo Bar Foo Bar Foo Bar
    --
    2-15-ABC45 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    3-33-ABC15 (12:12) ABC 12
    Foo Bar Foo Bar

    The PerlMonk tr/// Advocate
Re: Help Parsing a Text File
by tachyon (Chancellor) on Mar 23, 2004 at 19:16 UTC

    You have to love data files with inconsistent record separators. If you had "\n\n", i.e. a blank line, as the record separator, you could just set $/ = "\n\n" to read a record at a time. You could fix the file format with this:

    perl -pi.bak -e 's/--\n/\n/' inconsitent.txt

    You could then just set the input record separator to two newlines and read the data in one record at a time. But you say that the -- is an important part so you need to do something like this:

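    my @recs;
    my $data = '';

    while (<DATA>) {
        # Skip lines until an entry header has been seen
        next unless $data or m/\d+\-\d+/;
        $data .= $_;
        # A blank line or a -- line ends the current record
        if ( m/^(?:\n|\-\-\n)$/ ) {
            push @recs, $data;
            $data = '';
        }
    }
    # Keep a final record that has no trailing separator
    push @recs, $data if length $data;

    print "$_\n\n\n" for @recs;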

    so that you don't lose the -- parts.

    cheers

    tachyon

      That is exactly what I was looking for, thanks!!!

      And thanks to everyone for your responses!

      -Prime
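
The record-separator idea from tachyon's reply can be sketched roughly as follows. This is a minimal illustration, assuming the -- lines have already been replaced by blank lines (so the -- markers are gone, which is exactly why the line-by-line loop is needed for the original file). The helper name read_records and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated records from a string.
sub read_records {
    my ($text) = @_;
    # Open an in-memory filehandle so the sketch needs no file on disk.
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    local $/ = "\n\n";          # a record now ends at a blank line
    my @recs;
    while (my $rec = <$fh>) {
        $rec =~ s/\n+\z//;      # strip the trailing separator
        push @recs, $rec if length $rec;
    }
    close $fh;
    return @recs;
}

# Illustrative sample in the "fixed" format (blank lines only).
my $sample = "1-11-ABC22 (12:12) ABC 12\nFoo Bar Foo Bar\n\n"
           . "2-15-ABC33 (12:12) ABC 12\nFoo Bar Foo Bar\n";
print "[$_]\n" for read_records($sample);
```

Localizing $/ inside the helper keeps the change from leaking into other reads in the same program.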
Re: Help Parsing a Text File
by Anonymous Monk on Mar 23, 2004 at 19:43 UTC
    #!/usr/bin/perl

    use strict;
    use warnings;

    my @data;

    while (<DATA>) {
        chomp;
        # The range operator is true from the $ line through the closing !Data line
        if ((my $first = /^\$/) .. (my $last = /^!Data/)) {
            push @data, $_ unless $first || $last;   # drop the two marker lines
        }
    }

    print join "\n", @data;

    __DATA__
    !Data.
    $
    1-11-ABC22 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    2-15-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo bar Foo Bar
    --
    1-11-ABC33 (12:12) ABC 12
    Foo Bar Foo Bar Foo abr Foo bar Foo Bar Foo Bar Foo Bar
    --
    2-15-ABC45 (12:12) ABC 12
    Foo Bar Foo Bar Foo Bar Foo Bar

    3-33-ABC15 (12:12) ABC 12
    Foo Bar Foo Bar
    !Data *

Node Type: perlquestion [id://339113]
Approved by tcf22