Re: Reducing Memory Usage
by knoebi (Friar) on Jul 16, 2004 at 07:26 UTC
I don't know exactly what you need to do with your file beyond sorting, but you could give Tie::File a try. It lets you access every single line of the file without loading the whole file into memory.
ciao knoebi
I have looked at that option briefly, but since I have had problems running the substr function with it, I believe it is not a practical alternative.
I need to compare the 3rd to 11th characters (the location) of each line with every other line; if these characters are the same, I need to compare a time (characters 12 to 15) and sort all of the lines according to that time. I also have to convert the time, which is in a strange format, every time I read it, so this data preparation is quite time consuming, and I do not want to run it every single time I need the value.
Ciao PerlingTheUK
my %index;
my $line = 0;
while (<FILE>) {
    my ($location, $time) = /^..(.{9})(.{4})/;
    push @{ $index{$location} }, [ $time, $line ];
    $line++;
}
This results in a hash keyed on the location, with each value a reference to an array containing the info you need to sort the lines. This seems to be the minimum amount of information needed to determine the sort order.
The next step is to sort the arrays by the time values you've stored, and fetch the lines in order from the file:
(untested code again)
for my $location (keys %index) {
    my @sorted = sort { $a->[0] <=> $b->[0] } @{ $index{$location} };
    for my $entry (@sorted) {
        seek FILE, 81 * $entry->[1], 0;    # 80 chars plus newline per record
        read FILE, my $line, 80;
        print $line, "\n";
    }
}
This method should be very memory efficient, I think, and not too slow either; the biggest slowdown is probably the seeking around in the file.
This method works because we know the length of the records. If we don't, we could use the tell function before we read each line, and store the line's exact start position in the index instead.
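For the variable-length case, that tell-based index could look like this sketch (the demo data and the field layout are assumptions taken from the thread; real code would open the actual data file):

```perl
use strict;
use warnings;

# Demo data in an in-memory filehandle, standing in for the real file.
my $data = "xxSTATION_A0100 ...\nxxSTATION_B0200 ...\n";
open my $fh, '<', \$data or die $!;

my %index;
my $offset = tell $fh;                 # position of the line about to be read
while ( my $line = <$fh> ) {
    if ( my ($location, $time) = $line =~ /^..(.{9})(.{4})/ ) {
        push @{ $index{$location} }, [ $time, $offset ];
    }
    $offset = tell $fh;                # start of the next line
}

# Fetch a record later by seeking back to its stored start position:
my $pos = $index{STATION_A}[0][1];
seek $fh, $pos, 0;
my $record = <$fh>;
print $record;
```

The key point is calling tell before the read, so the stored offset points at the start of the line rather than its end.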
Re: Reducing Memory Usage (under 10%)
by BrowserUk (Patriarch) on Jul 16, 2004 at 09:52 UTC
Okay. This is just a skeleton, but it creates 50,000 Bus objects and gives each of them 33 x 80-byte timetables. All are individually gettable and settable. All fully OO (externally).
Total data: 50,000 * 33 * 80 = 125 MB.
Total process memory consumed: 140 MB.
Adding methods to manipulate the data is just a case of each method calling the get() routine and then splitting the data into its constituent bits to manipulate. Trading a little time for memory.
Or, if you need text-key access to your buses, using a hash pushes it to 150 MB.
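BrowserUk doesn't show the code, but the numbers are consistent with the classic trick of packing each bus's timetables into one flat string and slicing it with substr. A minimal sketch of that idea (my own reconstruction, so the class name, layout, and accessors are assumptions, not his implementation):

```perl
use strict;
use warnings;

package Bus;

my $RECLEN = 80;    # each timetable entry is a fixed 80 bytes

# One flat string per bus instead of an array of 33 scalars: this avoids
# the per-scalar overhead that inflates memory usage.
sub new {
    my ( $class, $slots ) = @_;
    my $data = ' ' x ( $slots * $RECLEN );
    return bless \$data, $class;
}

sub get {
    my ( $self, $i ) = @_;
    return substr $$self, $i * $RECLEN, $RECLEN;
}

sub set {
    my ( $self, $i, $value ) = @_;
    # pack pads or truncates the value to exactly one record length.
    substr( $$self, $i * $RECLEN, $RECLEN ) = pack "A$RECLEN", $value;
    return;
}

package main;

my $bus = Bus->new(33);
$bus->set( 4, 'route 66 departs 0815' );
print $bus->get(4), "\n";
```

Each manipulating method then unpacks the slice it needs, does its work, and packs it back, which is exactly the time-for-memory trade described above.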
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
Re: Reducing Memory Usage
by mhi (Friar) on Jul 16, 2004 at 09:21 UTC
Since you say that the file size has increased from 5 to 125 MB, I'll just guess it won't stop there... So yes, a database would be the way to go.
If that is not feasible, you might want to create a sort-file from your original data: put the sorting criteria in a directly (ASCII-)sortable fixed-length format at the beginning of each line, followed by a delimiter and then the original data.
This file can then be sorted by any simple sort program. (If you're on a Unix box or have Cygwin available, 'sort' should do the job easily, and you can tweak the buffer size it uses for optimum performance on your box. After all, sorting files is exactly what it was written for!)
After sorting, just filter out the sorting info and the delimiter again, and you have your sorted data.
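Since the lines here are fixed-length anyway, the recipe can even be shortened: GNU sort can key directly on character positions, so no prefix or delimiter is needed. A sketch, with the column positions taken from the thread and a tiny demo file standing in for the real data (the -S buffer option is GNU-specific, and this assumes the time field has already been rewritten into an ASCII-sortable form):

```shell
# Demo data: fixed-layout lines, location in columns 3-11,
# time in columns 12-15.
printf 'xxSTATION_B0200 ...\nxxSTATION_A0100 ...\nxxSTATION_A0050 ...\n' > timetable.txt

# Key on character positions directly: first by location, then by time.
# -S tunes the in-memory sort buffer for the box.
sort -S 64M -k1.3,1.11 -k1.12,1.15 -o sorted.txt timetable.txt
cat sorted.txt
```

The whole sort runs outside the Perl process, so Perl's memory usage stays flat regardless of file size.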
That sounds interesting, but I believe I will definitely go the database way.
The size is likely to top out at 150 to 175 MB.
Thank you all for your answers. I am aware that Perl tends to be somewhat thriftless when it comes to memory usage. Nevertheless, I would like to know whether there are any techniques in Perl for reducing memory usage (apart from those that help avoid memory leaks).
Does anyone know of any links, documentation, or books about this and closely related problems?
Your selected algorithm is the best way to control Perl's memory usage.
First, I might suggest that you decode the "weird" date in your file just ONE time: go through the large file once and rewrite it to a new file with the date in a proper format.
Second, if your Perl program is mainly a sorting job (or that is at least a major function of it), and the problem is big enough, purchasing a dedicated, specialized sort program for your OS might be a better investment. Syncsort is such a product that may fit your needs; there are versions for Windows and for most important flavors of UNIX.
I completely agree with you (mhi). Imagine, a few months later, loading a 200, 300 or 400 MB file into memory... It's crazy!
There are so many free databases, like MySQL. You should think carefully about it.
-DBC
Re: Reducing Memory Usage
by Jonathan (Curate) on Jul 16, 2004 at 08:43 UTC
Have you thought of using DBD::SQLite? It comes as a self-contained database and is said to be rather fast. Might be what you need.
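To make that concrete, a sketch of what the SQLite route might look like (this assumes DBD::SQLite from CPAN is installed; the table name, file name, and demo lines are mine, with the field layout taken from earlier in the thread):

```perl
use strict;
use warnings;
use DBI;    # DBD::SQLite must be installed from CPAN

# A single-file database: no server process, no administration.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=timetable.db', '', '',
    { RaiseError => 1, AutoCommit => 0 } );

$dbh->do('CREATE TABLE IF NOT EXISTS records (location TEXT, time TEXT, line TEXT)');

# Demo lines using the 80-char layout described in the thread;
# real code would read them from the data file instead.
my @lines = (
    'xxSTATION_B0200' . ( 'x' x 65 ),
    'xxSTATION_A0100' . ( 'x' x 65 ),
);

my $ins = $dbh->prepare('INSERT INTO records VALUES (?, ?, ?)');
for my $line (@lines) {
    my ( $location, $time ) = $line =~ /^..(.{9})(.{4})/ or next;
    $ins->execute( $location, $time, $line );
}
$dbh->commit;

# The sort happens inside SQLite, not in Perl's memory.
my $rows = $dbh->selectall_arrayref(
    'SELECT line FROM records ORDER BY location, time');
print "$_->[0]\n" for @$rows;
```

With an index on (location, time), the sort would not even need to touch most of the data.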
Re: Reducing Memory Usage
by Anonymous Monk on Jul 16, 2004 at 07:26 UTC
Don't put everything into memory.
You need a database.
Yep, I know that would be exactly what I want, but my company does not like that, as it is believed (correctly) to take a lot of time to administer.
If your company is unwilling to use the right tool for the job, then you're out of luck. As for administration: no, it doesn't take much, if the database is dedicated to this program of yours and doesn't allow remote access.
And even if it does take "a lot of time", you need to weigh that against the cost of continuing as you are, and against the cost when your dataset grows even further.
If your company doesn't want to use a database because it's too expensive, then they should have no problem deciding on the cheaper option of putting 2 GB of RAM in the machine so that it can do the job required.
2 GB of RAM can't cost much, can it? Not when you compare it to the time and expense of administering a database.
How much time does it take to administer a 125 MB text file?
Sorry if I'm missing something, but it's EASY AND FREE to set up a MySQL database. I did it on my laptop in under half an hour. Then you can create indexes and sort efficiently and yada.. yada.. yada. Besides, SQL is a helluvalot easier to learn than Perl.
Re: Reducing Memory Usage
by BrowserUk (Patriarch) on Jul 16, 2004 at 07:42 UTC
What's the average length of the strings, and how many are there in your 125 MB file?
All lines are 80 chars, adding up to about 1.6 to 1.8 million lines.
Re: Reducing Memory Usage
by bunnyman (Hermit) on Jul 16, 2004 at 15:19 UTC
Anything that makes me understand how memory is allocated in scalars/arrays/hashes?
Devel::Size
Re: Reducing Memory Usage
by Gilimanjaro (Hermit) on Jul 16, 2004 at 14:19 UTC
Another approach:
(untested code follows)
my @objects;
while (<FILE>) {
    my $offset = tell(FILE) - length($_);   # tell() is now past the line; back up to its start
    my ($location, $time) = /^..(.{9})(.{4})/;
    push @objects, bless [
        $location,
        $time,
        $location . $time,
        $offset,
    ], 'MyObject';
}
package MyObject;
use overload 'cmp' => sub {
    my $r = $_[0][2] cmp $_[1][2];
    $_[2] ? -$r : $r;    # honour overload's swapped-operand flag
};

sub location { return shift->[0] }
sub time     { return shift->[1] }

sub record {
    my $self = shift;
    seek FILE, $self->[3], 0;
    read FILE, my $buf, 80;
    return $buf;
}
The overload allows plain old sort to work on the array, and it should be pretty fast since the keys to sort on are already stored.
The time conversion could possibly be done by a function that stores previously converted values in a hash, so you do a cheap hash lookup instead of an expensive conversion for values you've already seen.
You'll need to make sure the filehandle stays open, possibly in the MyObject package, so the records can be retrieved when they're actually needed.
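The caching idea for the time conversion could look something like this; the decode routine is a made-up stand-in (a hypothetical hhmm-to-minutes conversion), since the thread only describes the real format as "strange":

```perl
use strict;
use warnings;

# Cache each distinct raw time value so the expensive conversion runs
# at most once per value seen.
my %time_cache;

sub convert_time {
    my ($raw) = @_;
    $time_cache{$raw} = expensive_decode($raw)
        unless exists $time_cache{$raw};
    return $time_cache{$raw};
}

# Placeholder for the real conversion; here, hhmm to minutes past midnight.
sub expensive_decode {
    my ($raw) = @_;
    my ( $h, $m ) = unpack 'A2 A2', $raw;
    return $h * 60 + $m;
}

print convert_time('0130'), "\n";    # decoded once, then served from cache
```

With only a handful of distinct times across 1.6 million lines, the cache stays tiny while eliminating nearly all of the conversion work.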
Re: Reducing Memory Usage
by periapt (Hermit) on Jul 20, 2004 at 12:22 UTC