PerlMonks  

Merging specific data from 2 files into a third.

by sheasbys (Initiate)
on Nov 10, 2006 at 16:48 UTC ( [id://583360]=perlquestion )

sheasbys has asked for the wisdom of the Perl Monks concerning the following question:

I have 2 files that need to have certain data merged and other data discarded. Each line of each file is 210 characters long and has no separators. Here is the gist of what I am attempting to do:

1. From file_1 select specific character positions and match this in file_2.
2. From the matched characters in file_2 select the whole line and change certain character positions to some predefined values.
3. Output this to file_3.
4. If you encounter a line in file_1 that starts with the number 2, this is a special line that must be copied to file_3 unchanged and in the order it was encountered in file_1.
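
Steps 1 and 2 above come down to reading and overwriting fixed-width fields. A minimal sketch of those two building blocks, assuming 0-based offsets as used by substr; the sample record, its offsets, and the helper name overwrite_field are made up for illustration and are not taken from the real files:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $record with $length characters
# starting at 0-based $offset replaced by $value.
sub overwrite_field {
    my ( $record, $offset, $length, $value ) = @_;
    die "value must be exactly $length characters\n"
      unless length($value) == $length;
    substr( $record, $offset, $length ) = $value;    # lvalue substr edits the copy
    return $record;
}

# Made-up 20-character record (the real ones are 210 characters long).
my $record = 'AAAA01061110BBBBBBBB';

my $date    = substr( $record, 6, 6 );                   # read a field
my $updated = overwrite_field( $record, 4, 2, '25' );    # set a field

print "$date\n$updated\n";
```

Overwriting a field in place keeps the record length fixed, which is easier to check than rebuilding the whole line from many substr pieces.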

In my code I do not want to load file_2 into memory, as these files will have a few hundred thousand lines. I have tried to use Tie::File to read it one line at a time, but I am encountering the following error:

Can't modify negation (-) in predecrement (--) at ./SS7Merge line 62, near "$output_line ="
Execution of ./SS7Merge aborted due to compilation errors.

Here is my code:

use File::Copy;
use Tie::File;

my($input_file1) = $ARGV[0];
my($input2) = $ARGV[1];
my($output_file) = $ARGV[2];

if ( !defined($input_file1) || !defined($input2) || !defined($output_file) ) {
    print "Error: usage: ./SS7Merge input_file1 input_file2 output_file\n";
}
else {
    # -----Backup the input files in case of error-----
    copy( $input_file1, $input_file1 . ".bak" )
      or die "Could not backup file 1 $input_file1 to $input_file1.bak: $!\n";
    copy( $input2, $input2 . ".bak" )
      or die "Could not backup file 2 $input_file2 to $input_file2.bak: $!\n";

    # -----Attempt to open all of the files-----
    open( INFILE1, $input_file1 ) || die( "Could not read input file 1 ($input_file1): $!" );
    open( OUTPUT, "> " . $output_file ) || die( "Could not open output file ($output_file): $!" );

    # We are going to read file2 into an array. The file will not be loaded
    # into memory which will improve processing of large files.
    tie @input2, 'Tie::File', \*FH, or die "Problem tying file $input2: $!";

    while (<INFILE1>) {
        my $line = $_;
        chomp($line);

        # -----A line starting with a '2' is a header and is left unchanged
        if ( $line !~ m/^2/ ) {
            foreach $line2 (@input2) {
                $date = substr( $line, 6, 6 );
                $number_dialed = substr( $line, 29, 10 );
                if ( index( $line2, $date ) != -1 and index( $line2, $number_dialed ) != -1 ) {
                    $record_type = substr( $line, 5, 2 );
                    # -----From File2-----
                    $carrier_info = substr( $line2, 44, 5 );
                    $destination_number = substr( $line2, 122, 10 );
                    $connect_time = substr( $line, 54, 6 );
                    $send_to_OCN = substr( $line, 186, 4 );
                    $record_type = "25";
                    $send_to_OCN = "2604";
                    -----Generate the output string-----
                    $output_line = substr( $line, 0, 4 ) . $record_type . $date
                      . substr( $line, 12, 17 ) . $number_dialed
                      . substr( $line, 39, 5 ) . $carrier_info
                      . substr( $line, 49, 5 ) . $connect_time
                      . substr( $line, 60, 62 ) . $destination_number
                      . substr( $line, 132, 54 ) . $send_to_OCN
                      . substr( $line, 190, 20 ) . "\n";

                    # -----Debug code. Add in if you are experiencing problems-----
                    # print OUTPUT $output_line;
                    # print STDOUT "Output " . ++$outputcount . "\n";
                    last;
                }
            }
        }
        else {
            print OUTPUT $line . "\n";
        }
    }

    # Untie the array before closing the file use
    untie @input2;

    # -----Close all of the files-----
    close( INFILE1 );
    close( OUTPUT );
}

Replies are listed 'Best First'.
Re: Merging specific data from 2 files into a third.
by liverpole (Monsignor) on Nov 10, 2006 at 16:53 UTC
    Hi sheasbys,

    I'm afraid you're going to wince when I tell you this...

    You simply forgot a comment character:

    -----Generate the output string-----

    but it should be:

    #-----Generate the output string-----

    That's why you're getting the "Can't modify negation (-) in predecrement (--) ..." message ;-)
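
For the curious, the message comes from how Perl tokenizes a run of dashes at the start of a statement: as -- operators followed by a unary minus, so it ends up trying to decrement the result of a negation. A small sketch that reproduces this from a string eval; the variable $x and the one-line snippet are made up for illustration:

```perl
use strict;
use warnings;

# Five dashes before a variable parse as: -- ( -- ( - $x ) ).
# The inner -- is asked to modify the result of a negation, which is not
# something Perl can assign to, so compiling the string fails.
my $code = 'my $x = 5; -----$x;';

my $ok    = eval "$code; 1";
my $error = $ok ? '' : $@;

print "Compile error: $error" if $error;
```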

    Update:  By the way, with Perl programming you can make things easier for yourself by adding two lines near the beginning of the script:

    use strict;
    use warnings;

    Granted, you will now get lots of error messages:

    Global symbol "$line2" requires explicit package name at SS7Merge.pl line 37.
    Global symbol "$date" requires explicit package name at SS7Merge.pl line 38.
    Global symbol "$number_dialed" requires explicit package name at SS7Merge.pl line 39.
    Global symbol "$line2" requires explicit package name at SS7Merge.pl line 41.
    Global symbol "$date" requires explicit package name at SS7Merge.pl line 41.
    Global symbol "$line2" requires explicit package name at SS7Merge.pl line 41.
    Global symbol "$number_dialed" requires explicit package name at SS7Merge.pl line 41.
    Global symbol "$record_type" requires explicit package name at SS7Merge.pl line 42.
    Global symbol "$carrier_info" requires explicit package name at SS7Merge.pl line 45
    ...

    but in the long run this will help more than hinder, as your mistakes will be caught early and often.
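
As a rough illustration of "caught early", the sketch below compiles the same snippet with and without strict. The snippet is made up; $input2 stands in for a misspelling of $input_file2, the kind of mix-up strict is meant to flag:

```perl
use strict;
use warnings;

# $input2 is never declared; it is a typo for $input_file2.
my $snippet = 'my $input_file2 = "b.txt"; my $copy = $input2; 1';

# Without strict the typo silently becomes an empty global variable.
my $without_strict = eval "no strict; no warnings; $snippet";

# With strict the snippet does not even compile.
my $with_strict  = eval "use strict; $snippet";
my $strict_error = $@;

print "without strict: ", ( $without_strict ? "compiled\n" : "failed\n" );
print "with strict:    ", ( $with_strict ? "compiled\n" : "failed: $strict_error" );
```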


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      Thanks. This is my first shot at Perl and all I can say to that is DOH! Unfortunately when I run this I now get this error:

      Name "main::FH" used only once: possible typo at ./SS7Merge line 35.
      Name "main::output_line" used only once: possible typo at ./SS7Merge line 62.
      seek() on unopened filehandle FH at /System/Library/Perl/5.8.6/Tie/File.pm line 93.
      Tie::File: your filehandle does not appear to be seekable at ./SS7Merge line 35

      Could my usage of Tie::File be incorrect? Is there another way to search file_2 for the matching string from file_1 one line at a time, as opposed to using something like:

      @input2=<infile2>

      Thank you kind monks for your assistance,
      Stephen
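
One way to search file_2 a line at a time without Tie::File or slurping is an ordinary filehandle that is rewound with seek before each search. The sketch below is only an illustration under stated assumptions: find_first_match is a hypothetical helper, the temporary file stands in for file_2, and the records are made up. Like the original script, it matches with index anywhere in the line rather than at fixed positions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: scan the already-open handle $fh from the top and
# return the first line containing every string in @needles, or undef.
sub find_first_match {
    my ( $fh, @needles ) = @_;
    seek( $fh, 0, 0 ) or die "Cannot rewind: $!";
  LINE:
    while ( my $line = <$fh> ) {
        chomp $line;
        for my $needle (@needles) {
            next LINE if index( $line, $needle ) == -1;
        }
        return $line;
    }
    return;
}

# Made-up stand-in for file_2.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "xx061110yy5551234567\n", "xx061111yy5559876543\n";
close $tmp or die "Cannot close $path: $!";

open( my $in2, '<', $path ) or die "Could not read $path: $!";
my $hit  = find_first_match( $in2, '061111', '5559876543' );
my $miss = find_first_match( $in2, '061111', '0000000000' );
close $in2;

print defined $hit ? "found: $hit\n" : "no match\n";
```

Only one line of file_2 is held in memory at a time, but every lookup rescans the file from the top, so the total work grows with the product of the two line counts.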

        Well, as I said above, you'd best "use strict;" and "use warnings;"

        Then you'll notice that in some places you've used $input2 and some places $input_file2 (I'm guessing you probably want the latter, to match the other variables).

        Here's what I've done to get rid of all the warnings, in most cases merely adding my to your variables to make them lexically scoped:

        use File::Copy;
        use Tie::File;
        use strict;
        use warnings;

        my($input_file1) = $ARGV[0];
        my($input_file2) = $ARGV[1];
        my($output_file) = $ARGV[2];

        if ( !defined($input_file1) || !defined($input_file2) || !defined($output_file) ) {
            print "Error: usage: ./SS7Merge input_file1 input_file2 output_file\n";
        }
        else {
            # -----Backup the input files in case of error-----
            copy( $input_file1, $input_file1 . ".bak" )
              or die "Could not backup file 1 $input_file1 to $input_file1.bak: $!\n";
            copy( $input_file2, $input_file2 . ".bak" )
              or die "Could not backup file 2 $input_file2 to $input_file2.bak: $!\n";

            # -----Attempt to open all of the files-----
            open( INFILE1, $input_file1 ) || die( "Could not read input file 1 ($input_file1): $!" );
            open( OUTPUT, "> " . $output_file ) || die( "Could not open output file ($output_file): $!" );

            # We are going to read file2 into an array. The file will not be loaded
            # into memory which will improve processing of large files.
            use FileHandle;
            my $fh = new FileHandle;
            my @input_file2;
            tie @input_file2, 'Tie::File', \$fh, or die "Problem tying file $input_file2: $!";

            while (<INFILE1>) {
                my $line = $_;
                chomp($line);

                # -----A line starting with a '2' is a header and is left unchanged
                if ( $line !~ m/^2/ ) {
                    foreach my $line2 (@input_file2) {
                        my $date = substr( $line, 6, 6 );
                        my $number_dialed = substr( $line, 29, 10 );
                        if ( index( $line2, $date ) != -1 and index( $line2, $number_dialed ) != -1 ) {
                            my $record_type = substr( $line, 5, 2 );
                            # -----From File2-----
                            my $carrier_info = substr( $line2, 44, 5 );
                            my $destination_number = substr( $line2, 122, 10 );
                            my $connect_time = substr( $line, 54, 6 );
                            my $send_to_OCN = substr( $line, 186, 4 );
                            $record_type = "25";
                            $send_to_OCN = "2604";
                            #-----Generate the output string-----
                            my $output_line = substr( $line, 0, 4 ) . $record_type . $date
                              . substr( $line, 12, 17 ) . $number_dialed
                              . substr( $line, 39, 5 ) . $carrier_info
                              . substr( $line, 49, 5 ) . $connect_time
                              . substr( $line, 60, 62 ) . $destination_number
                              . substr( $line, 132, 54 ) . $send_to_OCN
                              . substr( $line, 190, 20 ) . "\n";

                            # -----Debug code. Add in if you are experiencing problems-----
                            # print OUTPUT $output_line;
                            # print STDOUT "Output " . ++$outputcount . "\n";
                            last;
                        }
                    }
                }
                else {
                    print OUTPUT $line . "\n";
                }
            }

            # Untie the array before closing the file use
            untie @input_file2;

            # -----Close all of the files-----
            close( INFILE1 );
            close( OUTPUT );
        }

        Hopefully that will get you farther...
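
The "unopened filehandle" and "does not appear to be seekable" messages earlier in the thread suggest that Tie::File was handed a handle that had never been opened. A minimal sketch of the more common form, tying by file name so that Tie::File opens the file itself; the temporary file and its three records are made up, and the read-only mode is an assumption that file_2 only needs to be searched:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Temp qw(tempfile);
use Tie::File;

# Made-up stand-in for file_2.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "first record\n", "second record\n", "third record\n";
close $tmp or die "Cannot close $path: $!";

# Pass the file NAME; Tie::File opens the file itself.
# Read-only mode is used here because the file is only searched.
tie my @input_file2, 'Tie::File', $path, mode => O_RDONLY
  or die "Problem tying file $path: $!";

my $count  = scalar @input_file2;    # number of records in the file
my $second = $input_file2[1];        # records come back without the newline

untie @input_file2;

print "$count records; second is '$second'\n";
```

Tie::File fetches records on demand rather than loading the whole file, but looping over the tied array for every line of file_1 still rereads file_2 each time, so it may be slow on a few hundred thousand lines.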


        s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

Node Type: perlquestion [id://583360]
Approved by chargrill