PerlMonks  

Separating records

by TStanley (Canon)
on Sep 22, 2010 at 18:03 UTC ( [id://861330]=perlquestion )

TStanley has asked for the wisdom of the Perl Monks concerning the following question:

I have the following file:
00210 SHIFT PAY PRIV SYSOU
00211 SV-PROG OS SAVE
00215 OS MIGRATE SAVE
00217 DEM OS SUPER SAVE1
00219 DEM OS SUPER SAVE2
00221 DEM OS SUPER SAVE3
00901 DSDFIL
01401 PERISH STORE INV
(V) INV810 BOB FERRANTE (EDIT REPORT IF 05309 PRSH INV)
EXTRACT WAS RUN)
(V) INV820 BOB FERRANTE (PERISHABLE INVENTORY REPORT)
(VTD8) D:\DEPT\ACCT\MAIL\INV820.DAT
(V) INV820C DIANE CALLAHAN (PERISHABLE INVENTORY-BAKERY)
(QUARTERLY RUN ONLY)
01402 PERSH INV BOOKS
(V) INV805 58 COPIES - BOB FERRANTE
(V) INV805 2 COPIES - JIM MIAMIS
(V) INV805A XTRA COPIES-SAVE IN COMPUTER ROOM
ANNUAL STORE INVENTORY ONLY:
(V) INV805 58 COPIES - USER
(V) INV805 2 COPIES - USER
(V) INV805A XTRA COPIES-SAVE IN COMPUTER ROOM
01403 BAKERY INV BOOKS
(V) INV805 35 COPIES - USER
(V) INV805A 5 COPIES - SAVE IN COMPUTER ROOM
ANNUAL STORE INVENTORY ONLY:
(J) INV805 35 COPIES - USER
(J) INV805A 5 COPIES - SAVE IN COMPUTER ROOM
01405 PRSH INV. EXTRACT
(V) MSI000 OPERATIONS DOCUMENTATION (MSI DUMP LISTING)
01501 INV PRICE GUIDE
01502 INV SLOTBOOK
(V) CIO102 OPERATIONS SUPERVISOR (2 COPIES)
(2 COPIES-IN BINDERS AND LEAVE WITH CODERS)
PRICE GUIDES
01503 INV-DUPS
THE FOLLOWING OUTPUT WILL ONLY BE PRODUCED IF DUPLICATE SLOTS ARE FOUND.
(V) INV900 USER
(V) INV969 USER

The lines that start with five digits contain the run number and job name for each job, and mark the beginning of a record. The other lines are the output distribution for that job, if it produces any output. I am trying to get just the jobs that actually produce output and skip over the ones that do not. I did a quick search of the site, but nothing stood out. Any suggestions would be welcome.
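A minimal sketch of the record structure described above, assuming a job header is any line that begins with five digits followed by whitespace and that distribution lines never do: collect the lines of the current job and keep them only when there is more than one. The sub name records_with_output and the sample text are hypothetical, and any lines that appear before the first header are treated as a record of their own.

```perl
use strict;
use warnings;

# Hypothetical helper: returns only the records (header line plus its
# distribution lines) for jobs that have at least one distribution line.
# Assumes a header is a line starting with five digits and whitespace.
sub records_with_output {
    my @lines = @_;
    my ( @kept, @current );

    for my $line (@lines) {
        if ( $line =~ /^\d{5}\s/ ) {
            # A new job starts: keep the previous one only if it had output.
            push @kept, join( '', @current ) if @current > 1;
            @current = ();
        }
        push @current, $line;
    }

    # Flush the final record at end of input.
    push @kept, join( '', @current ) if @current > 1;
    return @kept;
}

my $sample = <<'END';
00210 SHIFT PAY PRIV SYSOU
00211 SV-PROG OS SAVE
01405 PRSH INV. EXTRACT
(V) MSI000 OPERATIONS DOCUMENTATION (MSI DUMP LISTING)
01501 INV PRICE GUIDE
END

# split /^/ breaks a string into lines, keeping each trailing newline.
print records_with_output( split /^/, $sample );
```

In this sample only job 01405 has a distribution line, so only its two lines are printed. Only one job is held in memory at a time, so the same loop could read a large file line by line.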


TStanley
--------
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell

Re: Separating records
by toolic (Bishop) on Sep 22, 2010 at 18:26 UTC
    Here's one way which I believe works for your data (you should have shown your desired output for me to be sure):
    use strict;
    use warnings;

    my $prev;
    while (<DATA>) {
        # Print the previous line unless it and the current line are both
        # job headers, i.e. the previous job produced no output.
        print $prev if defined $prev and !( /^\d{5}\s/ and $prev =~ /^\d{5}\s/ );
        $prev = $_;
    }
    # Print the final line unless it is a header with no output after it.
    print $prev if defined $prev and $prev !~ /^\d{5}\s/;

    __DATA__
    00210 SHIFT PAY PRIV SYSOU
    00211 SV-PROG OS SAVE
    00215 OS MIGRATE SAVE
    00217 DEM OS SUPER SAVE1
    00219 DEM OS SUPER SAVE2
    00221 DEM OS SUPER SAVE3
    00901 DSDFIL
    01401 PERISH STORE INV
    (V) INV810 BOB FERRANTE (EDIT REPORT IF 05309 PRSH INV)
    EXTRACT WAS RUN)
    (V) INV820 BOB FERRANTE (PERISHABLE INVENTORY REPORT)
    (VTD8) D:\DEPT\ACCT\MAIL\INV820.DAT
    (V) INV820C DIANE CALLAHAN (PERISHABLE INVENTORY-BAKERY)
    (QUARTERLY RUN ONLY)
    01402 PERSH INV BOOKS
    (V) INV805 58 COPIES - BOB FERRANTE
    (V) INV805 2 COPIES - JIM MIAMIS
    (V) INV805A XTRA COPIES-SAVE IN COMPUTER ROOM
    ANNUAL STORE INVENTORY ONLY:
    (V) INV805 58 COPIES - USER
    (V) INV805 2 COPIES - USER
    (V) INV805A XTRA COPIES-SAVE IN COMPUTER ROOM
    01403 BAKERY INV BOOKS
    (V) INV805 35 COPIES - USER
    (V) INV805A 5 COPIES - SAVE IN COMPUTER ROOM
    ANNUAL STORE INVENTORY ONLY:
    (J) INV805 35 COPIES - USER
    (J) INV805A 5 COPIES - SAVE IN COMPUTER ROOM
    01405 PRSH INV. EXTRACT
    (V) MSI000 OPERATIONS DOCUMENTATION (MSI DUMP LISTING)
    01501 INV PRICE GUIDE
    01502 INV SLOTBOOK
    (V) CIO102 OPERATIONS SUPERVISOR (2 COPIES)
    (2 COPIES-IN BINDERS AND LEAVE WITH CODERS)
    PRICE GUIDES
    01503 INV-DUPS
    THE FOLLOWING OUTPUT WILL ONLY BE PRODUCED IF DUPLICATE SLOTS ARE FOUND.
    (V) INV900 USER
    (V) INV969 USER
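The loop above hinges on the header test /^\d{5}\s/. A small sketch with made-up sample lines shows which ones it treats as job headers: the run number 05309 inside the INV810 line is not at the start of the line, and the hypothetical six-digit prefix fails because its sixth character is a digit rather than whitespace.

```perl
use strict;
use warnings;

# The header test: five digits at the start of the line, then whitespace.
my $is_header = qr/^\d{5}\s/;

for my $line (
    "01401 PERISH STORE INV\n",
    "(V) INV810 BOB FERRANTE (EDIT REPORT IF 05309 PRSH INV)\n",
    "123456 SIX DIGITS\n",    # hypothetical line, not from the real file
  )
{
    printf "%-5s %s", ( $line =~ $is_header ? 'yes' : 'no' ), $line;
}
```

Only the first sample line is reported as a header.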
Re: Separating records
by johngg (Canon) on Sep 22, 2010 at 22:08 UTC

    An alternative approach (if the data file is not too large) would be to slurp the whole file into a scalar in memory, then use split to break it into records at each point between a newline and five digits followed by a non-digit. You can then use grep and tr (see Transliteration in Quote and Quote-like Operators) to keep only those records that span more than one line, i.e. where tr counts more than one newline.
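To see the split-and-count idea in isolation, here is a small sketch on a hypothetical string holding three jobs, only one of which has distribution lines. It uses the same split pattern and the same tr count as the script that follows, and prints how many lines each record spans.

```perl
use strict;
use warnings;

# Hypothetical sample: a job without output, one with two output lines,
# and another without output.
my $data = "00210 A\n01401 B\n(V) X\n(V) Y\n01501 C\n";

# (?<=\n) and (?=\d{5}\D) are both zero-width, so split consumes no
# characters: it only cuts where a newline is followed by five digits and
# a non-digit, and each newline stays with the record before the cut.
my @records = split m{(?<=\n)(?=\d{5}\D)}, $data;

for my $rec (@records) {
    # tr with an empty replacement list counts newlines without changing $rec.
    my $newlines = ( $rec =~ tr{\n}{} );
    my ($job) = $rec =~ /^(\d{5})/;
    print "$job: $newlines line(s)\n";
}
```

This prints 1 line for 00210, 3 lines for 01401 and 1 line for 01501, so a grep for a count greater than one keeps only job 01401.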

    use strict;
    use warnings;

    # Slurp the whole file by undefining the input record separator.
    my $data = do { local $/; <DATA>; };

    # Zero-width split: cut after a newline, before five digits and a non-digit.
    my @records = split m{(?<=\n)(?=\d{5}\D)}, $data;

    # Keep only records containing more than one newline (header + output).
    my @goodRecords = grep { tr{\n}{} > 1 } @records;
    print @goodRecords;

    __END__
    00210 SHIFT PAY PRIV SYSOU
    00211 SV-PROG OS SAVE
    00215 OS MIGRATE SAVE
    00217 DEM OS SUPER SAVE1
    00219 DEM OS SUPER SAVE2
    00221 DEM OS SUPER SAVE3
    00901 DSDFIL
    01401 PERISH STORE INV
    (V) INV810 BOB FERRANTE (EDIT REPORT IF 05309 PRSH INV)
    EXTRACT WAS RUN)
    (V) INV820 BOB FERRANTE (PERISHABLE INVENTORY REPORT)
    (VTD8) D:\DEPT\ACCT\MAIL\INV820.DAT
    (V) INV820C DIANE CALLAHAN (PERISHABLE INVENTORY-BAKERY)
    (QUARTERLY RUN ONLY)
    01402 PERSH INV BOOKS
    (V) INV805 58 COPIES - BOB FERRANTE
    (V) INV805 2 COPIES - JIM MIAMIS
    (V) INV805A XTRA COPIES-SAVE IN COMPUTER ROOM
    ANNUAL STORE INVENTORY ONLY:
    (V) INV805 58 COPIES - USER
    (V) INV805 2 COPIES - USER
    (V) INV805A XTRA COPIES-SAVE IN COMPUTER ROOM
    01403 BAKERY INV BOOKS
    (V) INV805 35 COPIES - USER
    (V) INV805A 5 COPIES - SAVE IN COMPUTER ROOM
    ANNUAL STORE INVENTORY ONLY:
    (J) INV805 35 COPIES - USER
    (J) INV805A 5 COPIES - SAVE IN COMPUTER ROOM
    01405 PRSH INV. EXTRACT
    (V) MSI000 OPERATIONS DOCUMENTATION (MSI DUMP LISTING)
    01501 INV PRICE GUIDE
    01502 INV SLOTBOOK
    (V) CIO102 OPERATIONS SUPERVISOR (2 COPIES)
    (2 COPIES-IN BINDERS AND LEAVE WITH CODERS)
    PRICE GUIDES
    01503 INV-DUPS
    THE FOLLOWING OUTPUT WILL ONLY BE PRODUCED IF DUPLICATE SLOTS ARE FOUND.
    (V) INV900 USER
    (V) INV969 USER

    The output.

    I hope this is useful.

    Cheers,

    JohnGG

      EXCELLENT!! Thank you very much!

      TStanley

Node Type: perlquestion [id://861330]
Approved by Corion