I have multiple fastq files in the following format giving the reads and the number of times each read occur in a file, separated by tab:
data1.txt
@NS500278
AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTC
+
=CCGGGCGGG1GGJJCGJJCJJJCJJGGGJJGJGJJJCG8JGJJJJ1JGG8=JGCJGG$G 1 :da
+ta1.txt
@NS500278
CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGC
+
CCCGGGGGGGGGGJGJJJJJJJJJJJJJGJG$JJJJ$GGJ>JJJGGG8$CGJJGGCJ8JJ 3 :da
+ta1.txt
@NS500278
TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGG
+
CCCGGGGGGGGGGJGJJJJJJJJJJJJJGJG$JJJJ$GGJJJJJGGG8$CGJJGGCJ8JJ 2 :da
+ta1.txt
data2.txt
@NS500278
AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTC
+
AAAAA#EEEEEEEEEEEEEEEE6EEEEEAEEEAE/AEEEEEEEAE<EEEEA</AE<EE 1 :data
+2.txt
@NS500278
CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGC
+
AAAAA#E/<EEEEEEEEEEAEEEEEEEEA/EAAEEEEEEEEEEEE/EEEE/A6<E<EEE 2 :dat
+a2.txt
@NS500278
TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGG
+
AAAAA#EEEEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEE/EEEAE6AE<EAEEAE 2 :da
+ta2.txt
I want to sum the occurrence of read in all the files if the second line of each read matches i.e the output for the above two files should come like this:
@NS500278
AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTC
+
=CCGGGCGGG1GGJJCGJJCJJJCJJGGGJJGJGJJJCG8JGJJJJ1JGG8=JGCJGG$G 1 :da
+ta1.txt 1 :data2.txt count:2
@NS500278
CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGC
+
CCCGGGGGGGGGGJGJJJJJJJJJJJJJGJG$JJJJ$GGJ>JJJGGG8$CGJJGGCJ8JJ 3 :da
+ta1.txt 2 :data2.txt count:5
@NS500278
TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGG
+
CCCGGGGGGGGGGJGJJJJJJJJJJJJJGJG$JJJJ$GGJJJJJGGG8$CGJJGGCJ8JJ 2 :da
+ta1.txt 2 :data2.txt count:4
My code is working with file upto 10GB but if the file exceed this size it hangs. I want my script to run any size of file. Any help will be appreciated.