PerlMonks  

Clearing lines in a file based on an array containing the lines

by rycher (Acolyte)
on Apr 22, 2009 at 20:16 UTC

rycher has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I'm trying to simplify a subroutine that opens an LDIF text file, reads its contents into an array, and then prints them to a new file, excluding the lines that match entries in another array.

I can currently do this for one field name at a time, but of course this leads to bloated code because I have to reopen and close the file for each removal.

Here is what I have so far:

    sub GROOMLDIF {
        # Delete LDAP-internal fields
        my @erasethese = qw(structuralObjectClass entryUUID creatorsName
                            modifiersName createTimestamp modifyTimestamp entryCSN);

        open(FILE,"< data/all.ldif");
        my @LINES = <FILE>;
        close(FILE);

        open(FILE,"> data/groomed.ldif");
        foreach my $LINE (@LINES) {
            my @array = split(/\:/,$LINE);
            print FILE $LINE unless ($array[0] eq "$erasethese[0]");
        }
        close(FILE);

        open(FILE,"< data/groomed.ldif");
        @LINES = <FILE>;
        close(FILE);

        open(FILE,"> data/groomed.ldif");
        foreach my $LINE (@LINES) {
            my @array = split(/\:/,$LINE);
            print FILE $LINE unless ($array[0] eq "$erasethese[1]");
        }
        close(FILE);
    }

I would like to combine all of them into one open/close. I've tried a nested for-loop that incremented with no luck.

Any help would be appreciated.

Replies are listed 'Best First'.
Re: Clearing lines in a file based on an array containing the lines
by almut (Canon) on Apr 22, 2009 at 20:32 UTC

    When you put the names of the fields in a hash

    my %erasethese = map { $_ => 1 } qw(structuralObjectClass entryUUID creatorsName modifiersName createTimestamp modifyTimestamp entryCSN);

    you can then write

    print FILE $LINE unless exists $erasethese{$array[0]};

    which would test if $array[0] is any of the keywords (which - if I've understood you correctly - is what you want to achieve).

      Thank you Almut... that was awesome. I was 'this' close (*holding fingers very close together.*)

      Thank you monks for your assistance. Much appreciated.
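
A minimal sketch of almut's hash lookup folded into a single pass over the file, so that each file is opened only once. The sub name groom_ldif and its three parameters are hypothetical (they do not appear in the thread); the sketch assumes the field name is everything before the first colon on a line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): copy $in to $out, skipping every
# line whose field name (the text before the first colon) is listed in @$fields.
sub groom_ldif {
    my ($in, $out, $fields) = @_;

    # Hash of names to erase, as almut suggested, for constant-time lookups.
    my %erase = map { $_ => 1 } @$fields;

    open my $fh_in,  '<', $in  or die "Cannot open '$in': $!";
    open my $fh_out, '>', $out or die "Cannot open '$out': $!";

    while (my $line = <$fh_in>) {
        # Limit of 2: only the first colon separates the name from the value.
        my ($name) = split /:/, $line, 2;
        print {$fh_out} $line unless defined $name && exists $erase{$name};
    }

    close $fh_in;
    close $fh_out or die "Cannot close '$out': $!";
    return;
}
```

It could be called as groom_ldif('data/all.ldif', 'data/groomed.ldif', \@erasethese). Note that this sketch does not handle LDIF values folded onto continuation lines; a folded value of an erased field would leave its continuation lines behind.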
Re: Clearing lines in a file based on an array containing the lines
by ramrod (Curate) on Apr 22, 2009 at 20:27 UTC
    You can eliminate the second "grooming" by comparing against your elements at the same time:

    print FILE $LINE unless ($array[0] eq "$erasethese[0]" || $array[0] eq "$erasethese[1]" );
    As far as only opening the file once, you can try opening it for both reading and writing (+<)
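
One way the read/write mode (+<) mentioned by ramrod might be used is sketched below: read everything, seek back to the start, write the kept lines, and truncate the leftover tail. The sub name groom_in_place and its parameters are hypothetical, and the sketch assumes the file is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rewrite $path in place, dropping
# every line whose field name (text before the first colon) is in @$fields.
sub groom_in_place {
    my ($path, $fields) = @_;
    my %erase = map { $_ => 1 } @$fields;

    # '+<' opens an existing file for both reading and writing.
    open my $fh, '+<', $path or die "Cannot open '$path': $!";

    # Read all lines and keep only those whose field name is not listed.
    my @keep = grep {
        my ($name) = split /:/, $_, 2;
        !exists $erase{$name};
    } <$fh>;

    # Rewind, overwrite from the start, then cut off whatever remains
    # of the old, longer content.
    seek($fh, 0, 0)           or die "Cannot seek in '$path': $!";
    print {$fh} @keep;
    truncate($fh, tell($fh))  or die "Cannot truncate '$path': $!";
    close $fh                 or die "Cannot close '$path': $!";
    return;
}
```

Because the original is overwritten, an error partway through the write could leave the file damaged; writing to a separate output file, as the other replies do, is the safer choice when the input must be preserved.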
Re: Clearing lines in a file based on an array containing the lines
by toolic (Bishop) on Apr 22, 2009 at 20:33 UTC
    I think grep might help you out, especially if you need to check against all of the @erasethese items. Since you did not provide a small sample of your input file, I will only offer this untested code (which does check against all @erasethese items):
        sub GROOMLDIF {
            # Delete LDAP-internal fields
            my @erasethese = qw(
                structuralObjectClass entryUUID creatorsName modifiersName
                createTimestamp modifyTimestamp entryCSN
            );
            open my $fh_in , '<', "data/all.ldif"     or die "can not open data/all.ldif: $!";
            open my $fh_out, '>', "data/groomed.ldif" or die "can not open data/groomed.ldif: $!";
            while (<$fh_in>) {
                my $line  = $_;
                # Field name: the text before the first colon
                my $thing = (split /:/)[0];
                print $fh_out $line unless (grep {$thing eq $_} @erasethese);
            }
        }
Re: Clearing lines in a file based on an array containing the lines
by jwkrahn (Abbot) on Apr 22, 2009 at 23:33 UTC

    Perhaps this will work better for you (UNTESTED):

        sub GROOMLDIF {
            # Delete LDAP-internal fields
            my $erasethese = qr/\A(?:structuralObjectClass|entryUUID|creatorsName|modifiersName|createTimestamp|modifyTimestamp|entryCSN):/;
            open my $IN,  '<', 'data/all.ldif'     or die "Cannot open 'data/all.ldif' $!";
            open my $OUT, '>', 'data/groomed.ldif' or die "Cannot open 'data/groomed.ldif' $!";
            while ( <$IN> ) {
                print $OUT $_ unless /$erasethese/;
            }
        }
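
If the field names are kept in an array, as in the original post, a pattern like jwkrahn's could be built from that array instead of being typed out by hand. This is only a sketch; the sub name field_regex is hypothetical, and quotemeta is used so that any regex metacharacters in a name are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): compile a regex that matches a
# line starting with any of the given field names followed by a colon.
sub field_regex {
    my @fields = @_;

    # Escape each name, then join the names into one alternation.
    my $alternation = join '|', map { quotemeta($_) } @fields;

    return qr/\A(?:$alternation):/;
}
```

The result can be used as in jwkrahn's loop, for example my $erasethese = field_regex(@erasethese); followed by print $OUT $_ unless /$erasethese/;. The list is assumed to be non-empty.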
Re: Clearing lines in a file based on an array containing the lines
by NiJo (Friar) on Apr 23, 2009 at 18:12 UTC
    my $command = join '', map { " | grep -v $_" } @erase_these;
    system "cat $in_file" . $command . " > $out_file";
    Least amount of your code, C speed, use of multiple cores for free

    Limitation: maximum command length on shell
