PerlMonks
Hi,

I need a fast way to delete duplicate entries from very large plain-text files (> 2 GB).

To clarify, this is the structure of the file:

30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0005655,00|||+0000000000000,00
30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0000000000000,00|||+0000000000000,00
30xx|4150010003502043|CARDS|20081031|MP415001|00000024265698|01|F|1804|00|||+0000000000000,00|||+0000000000000,00

The key is formed by the first 7 fields (the delimiter is the pipe), and I want to print or delete only the duplicates.

I tried all the usual methods (awk, sort, uniq, sed, grep), but they always ended with the same result: out of memory! I'm using large HP-UX servers.

I'm very new to Perl, but I read somewhere that the Tie::File module can handle very large files. I tried it but could not get the code right.

Any advice will be very welcome. Thank you in advance.

Regards

PS: I do not want to split the files.

In reply to Huge files manipulation by klashxx
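
To make the key in the post concrete, here is a minimal sketch run on the three sample records only. It assumes a "duplicate" means any record whose first seven pipe-delimited fields match an earlier record. The function name `print_dups` is hypothetical. The sketch keeps every distinct key in memory, so it only illustrates the key; it is not offered as a fix for the 2 GB files.

```shell
# Hypothetical helper: print each record whose key (fields 1-7) was already seen.
# Reads the named files, or standard input when no file is given.
print_dups() {
    awk -F'|' '
        # Build the key from the first seven fields, joined by the delimiter.
        { key = $1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7 }
        # Pattern with no action: print the record from the second
        # occurrence of its key onward.
        seen[key]++
    ' "$@"
}

# The three sample records quoted in the post.
print_dups <<'EOF'
30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0005655,00|||+0000000000000,00
30xx|000009925000194653|00000000000000|20081031|02510|00000005445363|01|F|0207|00|||+0000000000000,00|||+0000000000000,00
30xx|4150010003502043|CARDS|20081031|MP415001|00000024265698|01|F|1804|00|||+0000000000000,00|||+0000000000000,00
EOF
```

On this sample only the second record is printed, because it shares its first seven fields with the first record. Changing the pattern to `!seen[key]++` would instead keep the first record of each key and drop the later ones.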