A fine question. I'm uncertain what assumptions I can make about the data files, as I don't control the code which generates them. Each individual data file looks like:
# time data
0.000000 99.537
1.000000 100.273
2.000000 98.169
3.000000 105.835
4.000000 93.013
5.000000 96.145
6.000000 87.040
7.000000 97.764
8.000000 97.811
I have to join the data files on the time point. I can probably assume that the time points are ordered and identical in every file, and that the columns have a fixed width which, worst case, I can calculate per file. I cannot assume the time points will always be integers.
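Under those assumptions, the join itself is simple. This is only a sketch of one way to do it (the function names are mine, not from any existing code): read each file into a time column and a value column, check that the time columns really match rather than silently mis-joining, and zip the value columns together.

```python
def read_series(path):
    """Read one '# time data' file into parallel lists of strings.

    Assumes whitespace-separated columns and '#' comment lines.
    Times are kept as strings so non-integer time points compare
    exactly, with no float round-trip.
    """
    times, values = [], []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            t, v = line.split()
            times.append(t)
            values.append(v)
    return times, values


def join_files(paths):
    """Join several data files column-wise on the time point.

    Assumes every file has the same, ordered time points; that
    assumption is verified, and a mismatch raises instead of
    producing a misaligned join.
    """
    all_times = None
    columns = []
    for path in paths:
        times, values = read_series(path)
        if all_times is None:
            all_times = times
        elif times != all_times:
            raise ValueError(f"{path}: time points differ from first file")
        columns.append(values)
    return [(t, *row) for t, row in zip(all_times, zip(*columns))]
```

If the "same time points everywhere" guarantee falls through, the equality check is the place that would grow into a real merge.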
I was being conservative when I wrote this, but it now seems to be my bottleneck. I am emailing the other developer to ask what guarantees we can work out.
Making some assumptions and writing about 20 lines of custom import code made the whole program roughly 3x faster (about 5 minutes per set of data files on a 900 MHz Athlon).
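The actual 20 lines aren't shown, but the kind of shortcut described above can be sketched. Assuming the fixed column width mentioned earlier, you can compute the offset of the value column once from the first data line and then slice every subsequent line at that offset, instead of re-splitting each line generically. The helper names here are hypothetical:

```python
def column_offset(line):
    """Find where the second column starts: the first non-space
    character after the first run of spaces."""
    i = line.index(" ")
    while line[i] == " ":
        i += 1
    return i


def fast_values(path):
    """Trusting loader: one '# time data' header, then fixed-width
    'time value' lines. Computes the value-column offset once and
    slices, skipping per-line validation entirely."""
    with open(path) as fh:
        next(fh)                 # skip the single header line
        first = next(fh)
        off = column_offset(first)
        vals = [float(first[off:])]
        vals.extend(float(line[off:]) for line in fh)
    return vals
```

The trade-off is exactly the one being negotiated by email: this breaks silently if the time column's width ever varies within a file, so it only makes sense once that guarantee is agreed on.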