PerlMonks  

Learning to use the hash effectively

by Stamp_Guy (Monk)
on Jun 25, 2001 at 02:28 UTC ( [id://91134]=perlquestion )

Stamp_Guy has asked for the wisdom of the Perl Monks concerning the following question:

I've been trying to teach myself hashes and have come up with the following block of code to change a line in a pipe-delimited flat file. I was wondering if my fellow monks here could give me some advice on how to make this code simpler, more efficient, etc. I'm also wondering if there are better ways to do the same thing. All suggestions, comments, etc. would be appreciated! Thanks!
Stamp_Guy
#!/usr/bin/perl -w
use strict;

# Predeclare variables.
my %test;
my @fileData;
my @sorted;

# Initialize the variable that will hold the data to be changed.
my $change = "this is cool";

# Open the file
open(TEST, "test.txt") || die "File couldn't be opened for reading: $!";

# Create a hash
while (<TEST>) {
    my ($key,@fileData) = split /\|/;
    chomp @fileData;
    $test{$key} = \@fileData;
}

# Change the data
$test{mykey}->[1] = "$change";

# Place all the hash data into an array of pipe-seperated values for sorting.
foreach my $key (keys %test) {
    push (@sorted, "$key|$test{$key}->[0]|$test{$key}->[1]|$test{$key}->[2]|$test{$key}->[3]");
}

# Sort the data by the number in the last part of the array.
@sorted = map $_->[1],
          sort { $a->[0] <=> $b->[0] }
          map [ substr($_,rindex($_,'|')+1), $_ ], @sorted;

# Put the line breaks back in.
for (@sorted) {
    $_ = "$_\n";
}

open(TEST, ">test.txt") || die "File couldn't be opened for writing: $!";
print TEST @sorted;
close(TEST)

Replies are listed 'Best First'.
Re: Learning to use the hash effectively
by btrott (Parson) on Jun 25, 2001 at 02:36 UTC
    Two things I will recommend:
    • File locking. You have a very big race condition in your code: you read in the contents of the file, alter them, then write them back out. Between the time that you read the file and the time you write it, someone else (i.e. another process) could have changed that file. You would then overwrite that change when you write the file with your own contents.

      One way to fix this is to open the file in read/append mode, flock it, seek to the beginning of the file, read from it, alter the file contents in memory, seek back to the beginning, truncate it, then rewrite the contents from memory. The flock will prevent the race condition (at least w/ another version of your program that uses flock).

      Another way to fix the problem is to use a semaphore file, like in tilly's Simple Locking. Here, you flock a semaphore file when you want to enter the "critical section" of your program, and then other processes of your program cannot enter that critical section until you have released the lock.

      I would recommend the second approach.

    • My second suggestion is, you could just use a DBM file for this, particularly since you already have the notion of keys mapping to values. In particular, you could use MLDBM to serialize the data structure into the DBM format of your choice.
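As a rough illustration of the semaphore-file approach described above, here is a minimal sketch. It is not the code from tilly's Simple Locking; the helper name with_lock and the file names are hypothetical, and it assumes a platform where flock is implemented. The lock is advisory, so it only protects against other processes that take the same lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical helper: run $code while holding an exclusive lock on a
# separate semaphore file, rather than on the data file itself.
sub with_lock {
    my ($lockfile, $code) = @_;
    open(my $sem, '>>', $lockfile)
        or die "Can't open lock file $lockfile: $!";
    flock($sem, LOCK_EX)
        or die "Can't lock $lockfile: $!";
    my @result = $code->();          # the "critical section"
    close($sem)                      # closing the handle releases the lock
        or die "Can't close $lockfile: $!";
    return @result;
}

# Demo in a scratch directory so no real file is touched.
my $dir  = tempdir(CLEANUP => 1);
my $data = "$dir/test.txt";

with_lock("$data.lock", sub {
    # A real program would read, modify, and rewrite the data file here.
    open(my $out, '>', $data) or die "Can't write $data: $!";
    print {$out} "mykey|a|this is cool|c|4\n";
    close($out) or die "Can't close $data: $!";
});
```

Because the whole read-modify-write cycle happens inside the callback, a second copy of the program calling with_lock on the same semaphore file blocks until the first one has finished.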

      btrott
      Thanks for your suggestions. I normally use the second method of file locking; however, when I am testing on my Win98 box, I leave it off because it causes errors.

      Stamp_Guy

Re: Learning to use the hash effectively
by suaveant (Parson) on Jun 25, 2001 at 03:26 UTC
    First of all change
    while (<TEST>) {
        my ($key,@fileData) = split /\|/;
        chomp @fileData;
        $test{$key} = \@fileData;
    }
    #to
    while (<TEST>) {
        chomp;
        my ($key,@fileData) = split /\|/;
        $test{$key} = \@fileData;
    }
    Otherwise you are chomping every item in the array, when you know there can only be a newline at the end of the line.
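To make the chomp-first version concrete, here is a small sketch run against a single made-up record. The sample line and its field values are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample record; the field values are made up.
my $line = "mykey|red|this is old|blue|42\n";

my %test;
for ($line) {                # stands in for: while (<TEST>) { ... }
    chomp;                   # strip the single trailing newline first
    my ($key, @fields) = split /\|/;
    $test{$key} = \@fields;  # store a reference to this record's fields
}

print "last field: [$test{mykey}[-1]]\n";
```

One difference worth knowing: split drops trailing empty fields, so if the last column can legitimately be empty, passing a limit of -1 (split /\|/, $_, -1) keeps it.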

    I would change

    foreach my $key (keys %test) {
        push (@sorted, "$key|$test{$key}->[0]|$test{$key}->[1]|$test{$key}->[2]|$test{$key}->[3]");
    }
    #to
    foreach my $key (keys %test) {
        push @sorted, join('|', $key, @{$test{$key}});
    }
    except that even that is overkill. Since you can't have multiple entries with the same key, you don't need to sort on the whole line, just the key. I would do...
    open(TEST, ">test.txt") || die "File couldn't be opened for writing: $!";
    foreach my $key (sort keys %test) {
        print TEST join('|', $key, @{$test{$key}});
        print TEST "\n";
    }
    close(TEST);
    Instead of the whole @sorted thing

    Update: Sorry, you wanted to sort by the last item in the array of data... change

    foreach my $key (sort keys %test) {
    #to
    foreach my $key (sort { $test{$a}[-1] <=> $test{$b}[-1] } keys %test) {
    The $test{$a}[-1] <=> $test{$b}[-1] comparison sorts numerically on the final item in the data array at each key.
    Untested, but I believe it all works fine
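Putting the pieces above together, the sort-by-last-field idea can be sketched as follows. The hash contents are made-up sample data, and the variable names @order and @lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records: each key maps to four fields, the last one numeric.
my %test = (
    apple => [ 'a', 'b', 'c', 30 ],
    mykey => [ 'd', 'e', 'f', 4  ],
    zebra => [ 'g', 'h', 'i', 12 ],
);

# Change one field, as in the original post.
$test{mykey}[1] = 'this is cool';

# Sort the keys numerically by the last element of each record.
my @order = sort { $test{$a}[-1] <=> $test{$b}[-1] } keys %test;

# Rebuild the pipe-separated lines in that order.
my @lines = map { join('|', $_, @{ $test{$_} }) . "\n" } @order;

print @lines;
```

The hash itself stays unordered; only the list of keys is sorted, and each key is then used to look its record up again.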

                    - Ant

      Suaveant, thanks for your excellent suggestions! They work quite well. I had thought that, since hashes are by nature unsorted, I could only sort them by ASCII value. This is much more compact and efficient.
        You can't sort hashes, but you can sort their keys :)

                        - Ant

Re: Learning to use the hash effectively
by particle (Vicar) on Jun 25, 2001 at 03:38 UTC
    this bit looks overly complex (my formatting)~
    # Place all the hash data into an array of pipe-seperated values for sorting
    foreach my $key (keys %test) {
        push @sorted, "$key|$test{$key}->[0]|$test{$key}->[1]|" .
                      "$test{$key}->[2]|$test{$key}->[3]";
    }

    # Sort the data by the number in the last part of the array.
    @sorted = map $_->[1],
              sort { $a->[0] <=> $b->[0] }
              map [ substr($_,rindex($_,'|')+1), $_ ], @sorted;

    # Put the line breaks back in.
    for (@sorted) {
        $_ = "$_\n";
    }
    you build an array, break it down to sort, then add to it again. how about using element 3 from the array in the hash already for the compare...
    why not try something like (untested for errors)~
    # Sort the data by the number in the last part of the array
    @sorted =
        # return just the hash->key
        map  { $_->[1] }
        # compare hash->key->value[3]'s
        sort { $a->[0] <=> $b->[0] }
        # anon array w/ hash->key->value[3], hash->key
        # ($test{$_}[3] replaces %test->{$_}->[3], which current perls reject)
        map  { [ $test{$_}[3], $_ ] }
        # keys from hash
        keys %test;

    # rebuild array of pipe-seperated values, and put the line breaks back in
    $_ .= "|$test{$_}[0]|$test{$_}[1]|$test{$_}[2]|$test{$_}[3]\n" for (@sorted);

    ~Particle

Re: Learning to use the hash effectively
by bikeNomad (Priest) on Jun 25, 2001 at 05:13 UTC
    You might want to look at DBD::CSV, which will let you deal with your pipe separated file as if it were a real database.
