Re^3: Speed up file write taking weeks

by Marshall (Canon)
on Jul 02, 2019 at 00:43 UTC


in reply to Re^2: Speed up file write taking weeks
in thread Speed up file write taking weeks

You wrote: "The input files contain 65 million and 72 million records. The output file has 1.7 trillion records." And then, "We guess that the 1.7 trillion records will generate around 100 million unique records".

If the final result is the generation of these 100 million "unique records" (whatever that means), what is your plan for producing them from this humongous flat file of 1.7 trillion records? Going from 1.7 trillion rows down to 100 million is a reduction factor of about 17,000!

It is plausible to have an SQL DB with 65 + 72 million records. If those two tables combine to produce a smaller table of 100 million rows (fewer than the sum of the input rows), I suspect there is a much more efficient algorithm to get there directly. However, I just don't know enough about what you are doing! My gosh, what will you do with this 1.7 trillion record file after you generate it? How will you arrive at the 100 million unique records?
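
For illustration, here is a minimal sketch of the kind of approach I mean, assuming the two inputs are loaded into a small SQL engine such as SQLite via DBI. The database file, table names, and column names below are all invented, since we don't know the OP's actual schema. The point is that letting the engine join and aggregate in one pass means the 1.7-trillion-row intermediate never has to exist as a file:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical SQLite DB holding the two input tables; names are
    # invented for illustration, not the OP's actual layout.
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=records.db', '', '',
        { RaiseError => 1 } );

    # Let the DB engine join and aggregate in one pass. GROUP BY means
    # the multi-trillion-row cross product is never written anywhere.
    my $sth = $dbh->prepare(q{
        SELECT a.key_field, b.other_field, COUNT(*) AS n
        FROM   table_a AS a
        JOIN   table_b AS b ON a.join_field = b.join_field
        GROUP  BY a.key_field, b.other_field
    });
    $sth->execute;

    while ( my @row = $sth->fetchrow_array ) {
        print join( "\t", @row ), "\n";
    }
    $dbh->disconnect;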

Replies are listed 'Best First'.
Re^4: Speed up file write taking weeks
by Sanjay (Sexton) on Nov 22, 2019 at 16:00 UTC

    Summarise based on another field

      It is plausible to have two tables, one with 65 million records and one with 72 million, and to generate a resulting table of 100 million records without having to build a multi-trillion-line intermediate table.
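
      As a minimal, runnable sketch of that idea (the toy data and field layout are invented, and at full scale ~100 million distinct keys would strain an in-memory Perl hash, so a DB table or a sort-based external rollup may be needed instead): the generation loop aggregates on the summary field rather than printing every pairing, so the intermediate file is never created.

          #!/usr/bin/perl
          use strict;
          use warnings;

          # Toy stand-ins for the two input files; real data would be
          # streamed from disk. Record layout is invented.
          my @table_a = ( 'x,1', 'y,2', 'x,3' );   # stand-in for the 65M records
          my @table_b = ( 'p,9', 'q,8' );          # stand-in for the 72M records

          my %seen;                                # summary key => occurrence count
          for my $a_rec (@table_a) {
              my ($a_key) = split /,/, $a_rec;     # the "another field" to summarise on
              for my $b_rec (@table_b) {
                  my ($b_key) = split /,/, $b_rec;
                  $seen{"$a_key:$b_key"}++;        # aggregate instead of writing a line
              }
          }

          printf "%s\t%d\n", $_, $seen{$_} for sort keys %seen;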
