PerlMonks  

Categorize files in directory

by kris2000 (Initiate)
on Oct 16, 2003 at 18:28 UTC (#299834=perlquestion)

kris2000 has asked for the wisdom of the Perl Monks concerning the following question:

I have a bunch of files in a directory. Characteristics of the files: header: FULL; tail: ENDS. But some files have more than one header (FULL) before the tail. I am trying to group them into two categories: files with one header in one, and the rest in the other. I'd appreciate it if you can help me! Thanks, Kris
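Since the question started out as a regular-expression question, here is the one idiom most of the answers below rely on: counting matches rather than just testing for one. A minimal sketch; the sample string is made up for illustration and is not taken from the real files:

```perl
use strict;
use warnings;

# Made-up contents with three FULL headers.
my $text = "FULL1 a FULL2 b FULL3 c &&";

# In scalar context a match only tells you whether FULL occurs at all.
my $found = ( $text =~ /FULL/ ) ? 1 : 0;

# Assigning a //g match to an empty list, and taking that assignment in
# scalar context, yields the number of matches.
my $count = () = $text =~ /FULL/g;

print "found=$found count=$count\n";    # found=1 count=3
```

With the count in hand, "one header" versus "more than one header" is just `$count > 1`.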

20031022 Edit by jeffa: Changed title from 'Regular Expressions! '

Re: Categorize files in directory
by l2kashe (Deacon) on Oct 16, 2003 at 18:31 UTC

    Can you give sample files? They don't need to contain your real data, as long as they are in the right format. There are many ways to accomplish what you would like to do, but unfortunately there is no way to help without more information.

    use perl;

      Here are a couple of sample files.

      File 1: FULL occurs 3 times (vs. the second file, where it occurs only once).

      FULL322418809544200 444FM15852298FP02 1033019970623200307072003EICHIN BEDFORD MA01731 484BB02785SUN MID-ATLANTI 919 EAST MAIN ST. HDQ 4309 RICHMOND VA23219 FULL322418809544200 TX79902 OA910UT17124QWEST PHOENIX RHPS 20 E THOMAS ST 5TH FLOOR PHOENIX AZ85012 FULL322418809544200 444FM15852298FJ02 1033019970707200307072003EICHIN ST. HDQ 4309 RICHMOND VA23212 &&

      File 2: FULL occurs only once.

      FULL322418214114900 444FM15852013FI02 1120619930326200307012003FARAGO 458ON08815FLEET 8022667304690 NAAMANS RD CLAYMONT DE19703 &&
      thanks!

      Each file ends with "&&".
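Because each file ends with "&&", one option is to count only the headers that appear before that terminator. A minimal sketch, assuming "&&" marks the end of the data and anything after it should be ignored; `headers_before_tail` is a hypothetical helper name, not something from the thread:

```perl
use strict;
use warnings;

# Count FULL headers that appear before the "&&" terminator.
# Assumption: "&&" ends the data; if it is missing, count the whole string.
sub headers_before_tail {
    my ($content) = @_;
    my ($body) = $content =~ /\A(.*?)&&/s;    # text up to the first "&&"
    $body = $content unless defined $body;    # no terminator found
    my $count = () = $body =~ /FULL/g;
    return $count;
}

print headers_before_tail("FULL1 a FULL2 b && FULL3 junk"), "\n";    # 2
print headers_before_tail("FULL1 a"), "\n";                          # 1
```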

      Edit, BazB added code tags.

        You haven't really told us what you want to do with the files. "Separate"... does that mean put them into different directories? Hold their filenames in different arrays? Break the ones with multiple headers into multiple files? We only know what you tell us.
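If "separate" just means holding the filenames in two different arrays, a hash of arrays keeps both groups together. A sketch under the assumption that any file with more than one FULL counts as "multiple"; the file contents are inline strings for demonstration only (in practice you would read each file), and `count_headers` is a hypothetical helper:

```perl
use strict;
use warnings;

# Count FULL headers in a string of file contents.
sub count_headers {
    my ($content) = @_;
    my $count = () = $content =~ /FULL/g;
    return $count;
}

# For demonstration only: filename => contents.
my %contents = (
    'one.txt'   => "FULL111 data &&",
    'three.txt' => "FULL1 a FULL2 b FULL3 c &&",
    'two.txt'   => "FULL1 a FULL2 b &&",
);

# Hold the filenames in two arrays, keyed by category.
my %group = ( single => [], multiple => [] );
for my $name ( sort keys %contents ) {
    my $key = count_headers( $contents{$name} ) > 1 ? 'multiple' : 'single';
    push @{ $group{$key} }, $name;
}

print "single:   @{ $group{single} }\n";      # single:   one.txt
print "multiple: @{ $group{multiple} }\n";    # multiple: three.txt two.txt
```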

        Once you've opened the directory and gotten a list of files, you could use this snippet to proceed.

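        FILE_LIST: foreach my $filename ( @files ) {
            open FILE, "<", $filename or die "Can't open $filename: $!\n";
            my $headers = 0;
            while ( my $line = <FILE> ) {
                # Count every FULL on the line. In the sample data FULL runs
                # straight into digits ("FULL3224..."), so /\bFULL\b/ would
                # never match, and one line may hold more than one header.
                $headers += () = $line =~ /FULL/g;
                if ( $headers > 1 ) {
                    # do whatever it is you intend to do with
                    # multi-header'ed files.
                    next FILE_LIST;
                }
            }
            # Do whatever it is you intended to do with
            # single-header files.
        }
        continue {
            close FILE;    # runs after every file, including after next FILE_LIST
        }
        # Now you're done.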
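The snippet expects @files to be filled in already. One way to get that list is opendir/readdir; this is a sketch only, `list_files` is a hypothetical helper, and the demo creates a throwaway directory with File::Temp so it can run anywhere:

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Hypothetical helper: return the plain files in $dir as full paths.
# Subdirectories (and the "." and ".." entries) are dropped by the -f test.
sub list_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't open directory $dir: $!\n";
    my @files = grep { -f $_ } map { catfile( $dir, $_ ) } readdir $dh;
    closedir $dh;
    return @files;
}

# Demo: a temporary directory with two files and one subdirectory.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', catfile( $dir, $name ) or die "Can't create $name: $!";
    print {$fh} "FULL1 data &&\n";
    close $fh;
}
mkdir catfile( $dir, 'sub' ) or die "Can't create subdirectory: $!";

my @files = list_files($dir);
print scalar(@files), " files\n";    # 2 files
```

Because the paths include the directory, they can be passed straight to open.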

        As you can see, Regular Expressions! are only a very small part of making this thing work for you.
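One regex detail worth checking against the sample data: the headers look like FULL322418809544200, with FULL running straight into digits. A word-boundary pattern such as /\bFULL\b/ does not match there, while /^FULL/ does. A small demonstration:

```perl
use strict;
use warnings;

# First field of the sample data: the word FULL runs straight into digits.
my $header = "FULL322418809544200";

# \b needs a word character on one side and a non-word character on the
# other; between "L" and "3" both are word characters, so no match.
my $with_boundary = $header =~ /\bFULL\b/ ? 1 : 0;

# Anchoring at the start of the string does match.
my $anchored = $header =~ /^FULL/ ? 1 : 0;

print "with \\b: $with_boundary, anchored: $anchored\n";    # with \b: 0, anchored: 1
```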


        Dave


        "If I had my life to do over again, I'd be a plumber." -- Albert Einstein

        Unless I'm mistaken, this snippet should go a good ways toward solving your problem.

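        use strict;
        use warnings;
        use File::Spec::Functions;    # exports catfile
        use File::Copy;               # exports move

        our @files = glob "*.txt";
        mkdir for qw( multiple single );    # mkdir with no argument uses $_

        for my $file ( @files ) {
            # More than one FULL header goes to "multiple",
            # everything else to "single".
            if ( number_of_headers( $file ) > 1 ) {
                move( $file, catfile( "multiple", $file ) )
                    or warn "Couldn't move $file: $!";
            }
            else {
                move( $file, catfile( "single", $file ) )
                    or warn "Couldn't move $file: $!";
            }
        }

        # Count every occurrence of FULL in the whole file.
        sub number_of_headers {
            my $file  = shift;
            my $count = () = slurp( $file ) =~ /FULL/g;
            return $count;
        }

        # Read an entire file into one string.
        sub slurp {
            my $file = shift;
            local $/;    # undef record separator: read everything at once
            local *SLURP;
            open SLURP, "<", $file or die "Couldn't open $file for reading: $!";
            my $content = <SLURP>;
            close SLURP or warn "Couldn't close $file: $!";
            return $content;
        }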
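One caveat with the move calls: catfile( "single", $file ) only works while $file is a bare name in the current directory. If the glob includes a directory (say "data/*.txt"), the destination becomes single/data/... instead. A sketch that keeps only the base name; `move_into` is a hypothetical helper, and the demo works inside a temporary directory:

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(move);
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Move $path into $target_dir, keeping only its base name, so that
# "data/x.txt" ends up at "<target_dir>/x.txt".
sub move_into {
    my ( $path, $target_dir ) = @_;
    my $dest = catfile( $target_dir, basename($path) );
    move( $path, $dest ) or die "Couldn't move $path to $dest: $!";
    return $dest;
}

# Demo: create a file and a "single" directory under a temp root.
my $root = tempdir( CLEANUP => 1 );
my $src  = catfile( $root, 'x.txt' );
open my $fh, '>', $src or die "Can't create $src: $!";
print {$fh} "FULL1 data &&\n";
close $fh;

my $target = catfile( $root, 'single' );
mkdir $target or die "Can't create $target: $!";

my $dest = move_into( $src, $target );
print( ( -e $dest ) ? "moved\n" : "missing\n" );    # moved
```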
        OK, so let's get this straight: you want to READ the files and, depending on the number of "FULL" lines in them, sort the files into different directories, or just produce a file with a count of how many of each there are?
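For the second option, a file with a count of how many of each there are, here is a minimal sketch. The per-file header counts are hard-coded placeholders (in practice they would come from reading each file), and the report goes to an in-memory filehandle so the example has no side effects; opening a real path instead would produce the file:

```perl
use strict;
use warnings;

# Placeholder header counts per file.
my %headers = ( 'a.txt' => 1, 'b.txt' => 3, 'c.txt' => 1 );

# Tally files per category.
my %tally;
$tally{ $headers{$_} > 1 ? 'multiple' : 'single' }++ for keys %headers;

# Write the report to an in-memory file.
my $report = '';
open my $out, '>', \$report or die "Can't open in-memory file: $!";
printf {$out} "%-8s %d\n", $_, $tally{$_} // 0 for qw(single multiple);
close $out;

print $report;
```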

Node Type: perlquestion [id://299834]
Approved by jdtoronto