PerlMonks  

joined records from multiple csv

by sheolikar (Initiate)
on Jul 07, 2015 at 16:46 UTC ( [id://1133572]=perlquestion )

sheolikar has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I'm new to Perl. I would like to extract columns from multiple CSV files and join the data based on some common fields. For that I am calling another subroutine and using the value it returns, but it is very slow and not working. Any help? The code I have written, for example:

    my $file = 'tbl_Chemchar.csv';
    my @items;
    open(my $data, '<', $file) or die "Could not open '$file' $!\n";
    $i=1;
    while (my $line = <$data>) {
        my $sap_id = '';
        my $catalog_num = '';
        my $note_text = '';
        my($catalog_num,$sap_id,$note_text) = split ',', $line;
        if($sap_id ne '' && $i>1) {
            chomp($sap_id);
            chomp($note_text);
            $packreportid = processxref($sap_id);
            push @items, { id => $sap_id , packreporid => $packreportid, note_text => $note_text};
        }
        $i++;
    }

    sub processxref {
        my ($sapval_id) = shift;
        my $filename = 'exportchemchar.csv';
        open(my $data2, '<', $filename) or die "Could not open '$filename' $!\n";
        while (my $line2 = <$data2>) {
            chomp $line2;
            @fields = split ",",$line2;
            $packrecordval = $fields[0];
            $saprecord = $fields[2];
            if($saprecord == $sapval_id) {
                return $packrecordval;
                exit;
            }
            # print "$cmpval \n";
        }
        #close $data2;
    }

Replies are listed 'Best First'.
Re: joined records from multiple csv
by vinoth.ree (Monsignor) on Jul 07, 2015 at 17:10 UTC
    but speed is very slow

    Of course it's slow. You are scanning the entire second file for every line in the first file. This means that your execution time grows with the product of the two file sizes, which is roughly the square of the file size when the files are of similar length.

    You could have searched here first, but that's OK. I found some nodes where this has already been discussed; have a read of them:

    comparing csv files in perl

    compare records in two csv files


    All is well. I learn by answering your questions...
Re: joined records from multiple csv
by Loops (Curate) on Jul 07, 2015 at 17:11 UTC
    You're reading and parsing the second file for every record of the first. To speed things up, consider reading that file just once at the start. Put its contents into a hash keyed on the field you wish to join on, i.e. sap_id. Then, as you read each record of the first file, you can very quickly access the data without going back out to the disk file. As an aside, you might want to consider some of the CPAN modules for handling CSV files.
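A minimal sketch of the read-once-into-a-hash approach. It assumes the column layout implied by the original code (pack record ID in field 0 and SAP ID in field 2 of the cross-reference file; catalog number, SAP ID, and note text in the main file). The sample rows are made up, and in-memory strings stand in for the two files so that the sketch runs on its own; plain split is used only to keep it short.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the two files (assumption: both
# have a header row and the column order used in the question).
my $xref_csv = "pack_id,other,sap_id\nP100,x,1001\nP200,y,1002\n";
my $chem_csv = "catalog_num,sap_id,note_text\n"
             . "C1,1001,first note\nC2,1002,second note\nC3,9999,no match\n";

# Pass 1: read the cross-reference data once and index it by SAP ID.
my %pack_id_for;
open(my $xref_fh, '<', \$xref_csv) or die "Could not open xref data: $!\n";
my $xref_header = <$xref_fh>;    # skip the header row
while (my $line = <$xref_fh>) {
    chomp $line;
    my @fields = split /,/, $line;
    $pack_id_for{ $fields[2] } = $fields[0];
}
close $xref_fh;

# Pass 2: read the main data once; each join is now a hash lookup.
my @items;
open(my $chem_fh, '<', \$chem_csv) or die "Could not open main data: $!\n";
my $chem_header = <$chem_fh>;    # skip the header row
while (my $line = <$chem_fh>) {
    chomp $line;
    my ($catalog_num, $sap_id, $note_text) = split /,/, $line;
    next unless defined $sap_id && $sap_id ne '';
    push @items, {
        id           => $sap_id,
        packreportid => $pack_id_for{$sap_id},    # undef when there is no match
        note_text    => $note_text,
    };
}
close $chem_fh;

printf "%s => %s\n", $_->{id}, $_->{packreportid} // '(none)' for @items;
```

Each file is now read exactly once, so the work grows with the sum of the file sizes rather than their product. Note that a hash lookup compares keys as strings, whereas the original code compared with ==; if the IDs can differ in formatting (leading zeros, for instance), they would need normalising before being used as keys.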
Re: joined records from multiple csv
by kevbot (Vicar) on Jul 08, 2015 at 04:08 UTC
Re: joined records from multiple csv
by kcott (Archbishop) on Jul 08, 2015 at 12:06 UTC

    G'day sheolikar,

    Welcome to the Monastery.

    "... speed is very slow ..."

    I already see replies addressing the primary issue: you're parsing exportchemchar.csv multiple times. Follow the advice given and only parse it once.

    There's a secondary issue: you're writing your own code to parse your CSV files. This is one wheel that you shouldn't be attempting to reinvent. This is a non-trivial task, despite the fact that it often appears trivial at first glance. All the work has already been done for you: see Text::CSV. Furthermore, the underlying Text::CSV_XS is likely to be substantially faster than anything you write yourself: an added bonus addressing your slow speed problem.
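A short sketch of reading records with Text::CSV, assuming the module is installed from CPAN. The sample row is made up and held in an in-memory string; it contains a quoted field with an embedded comma, which a plain split ',' would break into two fields.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module; uses Text::CSV_XS automatically when installed

# Made-up sample data (assumption: same column order as tbl_Chemchar.csv
# in the question, with a header row).
my $sample = qq{catalog_num,sap_id,note_text\nC1,1001,"store cool, dry"\n};

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

open(my $fh, '<', \$sample) or die "Could not open sample data: $!\n";
my $header = $csv->getline($fh);    # array ref holding the column names
my @rows;
while (my $row = $csv->getline($fh)) {
    # Each $row is an array ref of already-unquoted field values.
    my ($catalog_num, $sap_id, $note_text) = @$row;
    push @rows, {
        catalog_num => $catalog_num,
        id          => $sap_id,
        note_text   => $note_text,
    };
}
close $fh;

print "$_->{id}: $_->{note_text}\n" for @rows;
```

With a real file, the only change would be opening the file name instead of the string reference.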

    "... and not working ..."

    Unfortunately, that sort of error report (if you can call it that) is pretty much worthless. Sorry if that sounds harsh, but just think about it. You've provided: no sample input; no actual output; no expected output; no warning or error messages; and no description of how it's not working. In short, nothing at all that indicates what problem you'd like us to help you with.

    [For future reference, the guidelines in "How do I post a question effectively?" can help with all of this.]

    Having said that, there are some issues with your code which may have some bearing on whatever your problem might be.

    I see a number of undeclared variables. The strict pragma will alert you to this type of problem. I strongly recommend you use it in all your code.

    I also see a number of duplicate declarations of variables. The warnings pragma will alert you to this type of problem. I strongly recommend you use it in all your code, as well.
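To illustrate what the two pragmas report, here is a small sketch that compiles two fragments resembling the posted code inside string evals, so that the diagnostics can be captured and printed rather than stopping the script. The fragments are simplified stand-ins, not the original code, and the exact message wording may vary between Perl versions.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go straight to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# strict: assigning to an undeclared variable is a compile-time error,
# so the eval fails and the reason is left in $@.
my $ok = eval q{ use strict; $i = 1; 1 };
my $strict_error = $@;
print "strict says: $strict_error" unless $ok;

# warnings: a second 'my' for the same name in the same scope is reported.
eval q{ use warnings; my $sap_id = ''; my ($sap_id) = split ',', 'a,b'; 1 };
print "warnings says: $_" for @warnings;
```

In ordinary code, the two use lines simply go at the top of every script; the evals here exist only to make the messages visible in one run.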

    In sub processxref, you have an exit that will never be reached because it's preceded by a return:

    return $packrecordval; exit;

    You have two potential issues with the key/value pair 'packreporid => $packreportid'. The key is probably a misspelling of packreportid (note the missing 't'); and neither that key (in either spelling) nor its value appears to be used anywhere in your code.

    Your overall code layout isn't too bad but could be improved. A consistent indentation style, and removal of extraneous whitespace, will make the code easier to read. You'll find the logic is clearer and, accordingly, errors therein will be easier to spot. perlstyle and perltidy may help in this regard.

    -- Ken

Node Type: perlquestion [id://1133572]
Approved by herveus