Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Text Manipulation

by johnirl (Monk)
on Aug 22, 2002 at 10:16 UTC ( [id://191971]=perlquestion: print w/replies, xml ) Need Help??

johnirl has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks
I have the following file

--,--,--,--,1.280000e+2,9.930000e+0 --,--,--,--,1.920000e+2,9.950000e+0 --,--,--,--,2.560000e+2,1.013000e+1 --,--,--,--,2.000000e+0,4.370000e+0 --,--,--,--,4.000000e+0,5.300000e+0 --,--,--,--,8.000000e+0,6.590000e+0 --,--,--,--,1.600000e+1,7.830000e+0 --,--,--,--,2.400000e+1,8.710000e+0 --,--,--,--,3.200000e+1,9.160000e+0 --,--,--,--,6.400000e+1,9.510000e+0 --,2.000000e+0,6.500000e+0,--,--,-- --,2.000000e+0,6.450000e+0,--,--,-- --,4.000000e+0,6.650000e+0,--,--,-- --,4.000000e+0,6.570000e+0,--,--,-- --,8.000000e+0,6.550000e+0,--,--,-- --,8.000000e+0,6.600000e+0,--,--,-- --,1.600000e+1,6.570000e+0,--,--,-- --,1.600000e+1,6.570000e+0,--,--,-- --,2.400000e+1,6.650000e+0,--,--,-- --,2.400000e+1,6.680000e+0,--,--,-- --,2.400000e+1,6.640000e+0,--,--,-- --,3.200000e+1,6.720000e+0,--,--,--

What I am trying to do is match up anything from the first half whose fifth value is equal to anything in the second halfs second value. Confused?
Imagine that the rows begining with --, --, --, -- are the first half and the rest the second. So now we have two seperate sets. Now ignore all "--". This leaves you with two sets of two columns. What I want to do is match the first in each of these sets. i.e. above the fourth row in the first half matched the first row in the second half. But then I don't want it to match the second. No value should be repeated.
I also need to maintain the order in the first half. And all unmatched values need to be printed at the end.

The desired order from above would be

2.000000e+0,4.370000e+0,2.000000e+0,6.500000e+0 4.000000e+0,5.300000e+0,4.000000e+0,6.650000e+0 8.000000e+0,6.590000e+0,8.000000e+0,6.550000e+0 1.600000e+1,7.830000e+0,1.600000e+1,6.570000e+0 2.400000e+1,8.710000e+0,2.400000e+1,6.650000e+0 3.200000e+1,9.160000e+0,3.200000e+1,6.720000e+0 1.280000e+2,9.930000e+0,--,-- 1.920000e+2,9.950000e+0,--,-- 2.560000e+2,1.013000e+1,--,-- 6.400000e+1,9.510000e+0,--,-- --,--,2.000000e+0,6.450000e+0 --,--,4.000000e+0,6.570000e+0 --,--,8.000000e+0,6.600000e+0 --,--,1.600000e+1,6.570000e+0 --,--,2.400000e+1,6.680000e+0 --,--,3.200000e+1,6.720000e+0

The code I have at the moment is the following but I am having dificulty making it work. If you would rether create your own code than work on mine please fell free. A solution is solution.

#!/usr/bin/perl -w use strict; my $file = "< SqlResults_full"; my @leftarray; my @rightarray; my @temp; my @left; my @right; my $j = 0; my $k = 0; open(DATA, $file) or die "Can\'t open " . $file . " for output : $!"; while(<DATA>){ if (/^--,.*,--,--,--$/){ chomp; my $leftline = $_; my @lefttemp = (split/,/, $leftline); my $leftcut = $lefttemp[0] . ", " . $lefttemp[1] . ", + " . $lefttemp[2]; push(@leftarray, $leftcut); } elsif (/^--,--,--,--,.*$/){ chomp; my $rightline = $_; my @righttemp = (split/,/, $rightline); my $rightcut = $righttemp[3] . ", " . $righttemp[4] . + ", " . $righttemp[5]; push(@rightarray, $rightcut); } } my $leftsize = @leftarray; print "Left side is " . $leftsize . "\n"; print "Right side is ". $rightsize . "\n"; for (my $rightcount=0;$rightcount < $rightsize;$rightcount++){ my @right = split (/,/, $rightarray[$rightcount]); for (my $leftcount=0;$leftcount < $leftsize;$leftcount++){ my @left = split (/,/, $leftarray[$leftcount]); if ($left[1] eq $right[1]){ print $leftarray[$leftcount] . ", " . $rightarray[$rightcount] +. "\n"; splice(@rightarray, $rightcount, 1); $rightcount--; $rightsize--; splice(@leftarray, $leftcount, 1); $leftcount--; $leftsize--; }#if }#while print "\n"; }#while

j o h n i r l .

Sum day soon I'Il lern how 2 spelI (nad tYpe)

Replies are listed 'Best First'.
Re: Text Manipulation
by davorg (Chancellor) on Aug 22, 2002 at 10:50 UTC

    I'd do something like this:

    #!/usr/bin/perl -w use strict; my %data; my @missing; while (<DATA>) { chomp; my @line = split /,/; if ($line[1] eq '--') { # first half $data{$line[4]} = \@line; } else { # second half if (exists $data{$line[1]}) { print join ',', @{$data{$line[1]}}[4,5], @line[1,2]; print "\n"; delete $data{$line[1]}; } else { push @missing, \@line; } } } foreach (sort keys %data) { print join ',', @{$data{$_}}[4,5,0,1]; print "\n"; } foreach (@missing) { print join ',', @{$_}[3,4,1,2]; print "\n"; } __END__ --,--,--,--,1.280000e+2,9.930000e+0 --,--,--,--,1.920000e+2,9.950000e+0 --,--,--,--,2.560000e+2,1.013000e+1 --,--,--,--,2.000000e+0,4.370000e+0 --,--,--,--,4.000000e+0,5.300000e+0 --,--,--,--,8.000000e+0,6.590000e+0 --,--,--,--,1.600000e+1,7.830000e+0 --,--,--,--,2.400000e+1,8.710000e+0 --,--,--,--,3.200000e+1,9.160000e+0 --,--,--,--,6.400000e+1,9.510000e+0 --,2.000000e+0,6.500000e+0,--,--,-- --,2.000000e+0,6.450000e+0,--,--,-- --,4.000000e+0,6.650000e+0,--,--,-- --,4.000000e+0,6.570000e+0,--,--,-- --,8.000000e+0,6.550000e+0,--,--,-- --,8.000000e+0,6.600000e+0,--,--,-- --,1.600000e+1,6.570000e+0,--,--,-- --,1.600000e+1,6.570000e+0,--,--,-- --,2.400000e+1,6.650000e+0,--,--,-- --,2.400000e+1,6.680000e+0,--,--,-- --,2.400000e+1,6.640000e+0,--,--,-- --,3.200000e+1,6.720000e+0,--,--,--

    Which seems to give the output you wanted.

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Text Manipulation
by crazyinsomniac (Prior) on Aug 22, 2002 at 11:49 UTC
    You know, I hate seing stuff like
    my $rightcut = $righttemp[3] . ", " . $righttemp[4] . ", " . $rightte +mp[5];
    It's abuse of the "." operator. You need to abuse whitespace instead.
    my $rightcut = $righttemp[3] . ", " . $righttemp[4] . ", " . $righttemp[5]; my $rightcut = "$righttemp[3], $righttemp[4], $righttemp[5]"; my $rightcut = join ',', @righttemp[3..5]; ## perldoc perldata
    #!/usr/bin/perl -w my @L = (); my @R = (); while(<DATA>) { # BUILD @LIST chomp; next unless $_; my @Line = split ',', $_; if($Line[4] ne '--') { # 5th value real? push @R, \@Line; } elsif($Line[1] ne '--') { #2nd value real? push @L, \@Line; } } $\="\n"; print "R ".scalar(@R); print "L ".scalar(@L); print "TOTAL LINES ".( @R + @L ); use Data::Dumper; COMPARE(\@L,\@R); sub COMPARE { my( $L, $R ) = @_; my @Ret = (); my %L = map { $_ => $_; } 0..$#$L; my %R = map { $_ => $_; } 0..$#$R; for my $I(0..$#$R ) { for my $J(0..$#$L ) { if($R->[$I]->[4] eq $L->[$J]->[1]) { next unless exists $L{$J}; delete $L{$J}; delete $R{$I}; print join ',', @{ $R->[$I] }[4,5,], @{ $L->[$J] }[1,2 +]; last; } } } print join ',', @{ $R->[$_] }[4,5,0,0] for keys %R; print join ',', @{ $L->[$_] }[0,0,1,2] for keys %L; } __DATA__ --,--,--,--,1.280000e+2,9.930000e+0 --,--,--,--,1.920000e+2,9.950000e+0 --,--,--,--,2.560000e+2,1.013000e+1 --,--,--,--,2.000000e+0,4.370000e+0 --,--,--,--,4.000000e+0,5.300000e+0 --,--,--,--,8.000000e+0,6.590000e+0 --,--,--,--,1.600000e+1,7.830000e+0 --,--,--,--,2.400000e+1,8.710000e+0 --,--,--,--,3.200000e+1,9.160000e+0 --,--,--,--,6.400000e+1,9.510000e+0 --,2.000000e+0,6.500000e+0,--,--,-- --,2.000000e+0,6.450000e+0,--,--,-- --,4.000000e+0,6.650000e+0,--,--,-- --,4.000000e+0,6.570000e+0,--,--,-- --,8.000000e+0,6.550000e+0,--,--,-- --,8.000000e+0,6.600000e+0,--,--,-- --,1.600000e+1,6.570000e+0,--,--,-- --,1.600000e+1,6.570000e+0,--,--,-- --,2.400000e+1,6.650000e+0,--,--,-- --,2.400000e+1,6.680000e+0,--,--,-- --,2.400000e+1,6.640000e+0,--,--,-- --,3.200000e+1,6.720000e+0,--,--,--

     
    ______crazyinsomniac_____________________________
    Of all the things I've lost, I miss my mind the most.
    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://191971]
Approved by smitz
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (7)
As of 2024-04-23 08:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found