0xbeef has asked for the wisdom of the Perl Monks concerning the following question:
I read a data file into a scalar $data using a variation of fastslurp. I'd like to refer to the actual data record (mostly text, but maybe binary as well in the future) in another variable. Here is a rather simplified representation of the scalar $data:
-----------offset----
| record1 | 0 |
| record2 | 10 |
| record3 | 87 |
---------------------
Seeing that $data already contains the actual data, is there a way to map a new variable $rec1 (or maybe an array) to the actual data in $data if I know the start/end position for each record? i.e. $rec1 should magically map to record1 using only pointers to data in $data. At all costs, it should avoid making a _copy_ of the actual data.
Sorry this strikes me as a laughable question but I just cannot think of the correct method and I need an efficient implementation!
Niel
Re: slurped scalar map
by ikegami (Patriarch) on Jun 20, 2006 at 14:33 UTC
If I understand correctly, tie or captures will be useful. Here's an example of the latter:
sub new_accessor {
    my $start    = $_[0];
    my $end      = $_[1];
    my $data_ref = \$_[2]; # Avoid making a copy.
    return sub {
        return substr($$data_ref, $start, $end - $start);
    };
}
{
    my $data   = ...;
    my $start1 = ...; # Calculate start from map.
    my $end1   = ...; # Calculate end from map.
    my $rec1   = new_accessor($start1, $end1, $data);
    print($rec1->(), "\n");
}
tie would allow you to do the same, but you'd use
print($rec1, "\n");
instead of
print($rec1->(), "\n");
Untested.
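For illustration, a minimal sketch of that tie alternative (the class name RecordView and the sample data here are made up, not from the thread):

```perl
package RecordView;

# TIESCALAR receives start, end, and a reference to the big buffer,
# so no copy of the data is made at tie time.
sub TIESCALAR {
    my ($class, $start, $end, $data_ref) = @_;
    return bless { start => $start, end => $end, data => $data_ref }, $class;
}

# FETCH extracts the record on demand; the copy happens only on read.
sub FETCH {
    my $self = shift;
    return substr( ${ $self->{data} }, $self->{start},
                   $self->{end} - $self->{start} );
}

package main;
use strict;

my $data   = "record1...record2...";
my $start1 = 0;
my $end1   = 7;
tie my $rec1, 'RecordView', $start1, $end1, \$data;
print $rec1, "\n";   # prints "record1"
```

The tied scalar reads like a plain variable, at the cost of a method dispatch per access.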
Thanks, this is spot on for my requirement. ++Regards,
Niel
I should have mentioned that substr makes a copy when $rec1->() is called. That's unavoidable. You can't extract a string from another without making the new string. However, the copy is only done when $rec1->() is executed.
If that's a problem, you could extract small chunks at a time. With the following, only $blk_size chars (100 by default) are duplicated at any given time.
sub new_callback_accessor {
    my $start    = $_[0];
    my $end      = $_[1];
    my $data_ref = \$_[2]; # Avoid making a copy.
    return sub {
        my ($callback, $blk_size) = @_;
        local *_;
        $blk_size = 100 unless defined $blk_size;
        $blk_size = $end - $start unless $blk_size;
        my $ofs = $start;
        my $len = $end - $start;
        while ($len) {
            $blk_size = $len if $blk_size > $len;
            $_ = substr($$data_ref, $ofs, $blk_size);
            $callback->();
            $ofs += $blk_size; # Advance by the chunk just processed.
            $len -= $blk_size;
        }
    };
}
{
    my $data   = ...;
    my $start1 = ...; # Calculate start from map.
    my $end1   = ...; # Calculate end from map.
    my $rec1   = new_callback_accessor($start1, $end1, $data);
    $rec1->(sub { print });
    print("\n");
}
Untested.
Re: slurped scalar map
by Zaxo (Archbishop) on Jun 20, 2006 at 14:46 UTC
If you can rely on the fixed offset of the data in a line, unpack or substr/regex matching will get you the data. It will be easier if you split the file into an array of lines, or else slurp it that way in the first place:
my @lines = <$handle>;
my %record;
for (@lines) {
    next unless /^\| (\w+) \| (\d+)/;
    $record{$1} = $2;
}
That does more than assign one value to one variable named after another piece of the data; it associates all of the record names with their offsets.
You wind up with a more useful and easier-to-manage representation of the data in your file.
If you're stuck with that scalar variable, you can use the same regex globally (adding /m so ^ matches at each embedded line start),
my %record = $data =~ /^\| (\w+) \| (\d+)/mg;
That looks simpler, but it is, IMO, more fragile.
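As for the unpack route mentioned above, here is a small sketch with made-up fixed-width fields (the 10-character widths are illustrative, not from the OP's data):

```perl
use strict;

# Hypothetical fixed-width layout: two 10-character fields.
my $data = "record1   record2   ";

# The A template extracts an ASCII field and strips trailing spaces.
my ($rec1, $rec2) = unpack 'A10 A10', $data;

print "$rec1\n";   # prints "record1"
print "$rec2\n";   # prints "record2"
```

For many records of one fixed size, a repeat count such as 'A10' x $n (or '(A10)*') does the whole buffer in one call.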
To get exactly what you asked for, knowing the offset and length of the field,
my $rec1ref = \substr $data, $offset, $len;
$$rec1ref = $newval;
If length($newval) != $len, the offsets to subsequent data will be disturbed and the data seen in $$rec1ref will be truncated or augmented.
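A quick demonstration of that caveat with a made-up buffer: replacing a 3-byte field with a 5-byte value splices the string and shifts everything after it.

```perl
use strict;

my $data = "aaabbbccc";
my $ref  = \substr $data, 3, 3;   # lvalue ref to the "bbb" field

$$ref = "XXXXX";                  # longer replacement splices the buffer

print "$data\n";   # prints "aaaXXXXXccc" - "ccc" has shifted from offset 6 to 8
```

Any other offsets computed against the original layout are now stale, which is why same-length replacement is the only safe in-place edit.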
Re: slurped scalar map
by BrowserUk (Patriarch) on Jun 20, 2006 at 14:54 UTC
#! perl -slw
use strict;
my $data = <<EOD;
record 1
record 2 is a bit longer
record 3 is just this length
EOD
my $p = 0;
my @refs;
while ( my $o = 1 + index $data, "\n", $p ) {
    push @refs, \substr $data, $p, $o - $p;
    $p = $o;
}
print $$_ for @refs;
which works okay (from 5.8.4 (maybe 5.8.3 I forget) onwards), but don't try assigning to them unless your replacements are exactly the same length as the originals.
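For completeness, a small sketch (sample data made up) of the safe case: a same-length replacement through one of the refs edits the buffer in place without disturbing the other records.

```perl
use strict;

my $data = "abc\ndef\n";
my $p    = 0;
my @refs;

# Same loop as above: one lvalue substr ref per newline-terminated record.
while ( my $o = 1 + index $data, "\n", $p ) {
    push @refs, \substr $data, $p, $o - $p;
    $p = $o;
}

# Same-length replacement: offsets of later records stay valid.
${ $refs[0] } = "ABC\n";

print $data;   # prints "ABC\ndef\n"
```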
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Thanks, this elaborates on your analysis, as johngg mentioned. I do not have fixed record separators, but I do have records of variable length/content, so I rely on offset and size. I intend to use your example, calculating each record's end offset:
#!/usr/bin/perl -slw
use strict;
my $data = <<EOD;
record 1 is 20 bytesrecord 2 is 20 bytesrecord 3 is longer at 30 bytes
EOD
my $p = 0;
my @refs;
# endpos would be calculated based on recsize - this is simplified:
my @endpos = ( 20, 40, 70 );
for (@endpos) {
    push @refs, \substr $data, $p, $_ - $p;
    $p = $_;
}
print '[', $$_, ']' for @refs;
Hope I got that right. Niel
Just be aware that even this method will only save you space where your records are longer than (from memory) 12 characters. And if you have to store another array containing the record lengths, then you have to factor the size of that array into the argument as well, unless you replace each record length with the lvalue ref of the record as you go. The space consumed storing the lengths will depend upon whether the numeric values are loaded and stored as IVs or PVs: around 20 bytes per length for the former and approx. 50 for the latter, in addition to that used to store the lvalue ref.
Also, it only makes sense to build an array of refs if you are going to randomly access each (or some) record more than once. Otherwise, it would be better to simply generate and use an lvalue ref for each record as you need it. The trade-off between replacing the lengths with lvalue refs and generating them on the fly will depend upon the number of records, the frequency with which you re-access them through the life of the program, how the lengths are loaded, etc.
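A sketch of that generate-on-the-fly approach (names and sample data are illustrative): keep only the end offsets and build each lvalue ref on demand, rather than storing one ref per record up front.

```perl
use strict;

my $data   = "record 1 is 20 bytesrecord 2 is 20 bytes";
my @endpos = (20, 40);   # end offset of each record, as in the parent node

# Build an lvalue substr ref for record $i only when it is requested.
sub rec_ref {
    my ($i) = @_;
    my $start = $i ? $endpos[ $i - 1 ] : 0;
    return \substr $data, $start, $endpos[$i] - $start;
}

print ${ rec_ref(1) }, "\n";   # prints "record 2 is 20 bytes"
```

Nothing is stored per record beyond the end offset, at the cost of rebuilding the ref on each access.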
Re: slurped scalar map
by dragonchild (Archbishop) on Jun 20, 2006 at 14:29 UTC
Optimize for correctness, first. Parse that into a hash and get it working. Then, if it's not fast enough (and I highly doubt that will be a problem), then come back and ask a question with a working implementation.
My criteria for good software:
- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
I am already past the "working" phase and in the "optimisation" phase. I'm curious about efficiency in terms of best programming practice. The program (too large to post) creates a file consisting of N records, with an index at the end containing key info like fpos markers. (The records consist of the stdout/stderr of several o/s commands and files => 30-50Mb/server for almost 100 servers.) The program currently reads the index first, then processes & reads each record as it requires it while processing the data file. I'm trying to find a faster solution, i.e. performing larger sequential reads upfront. Of course, it may have extra considerations, such as a max. slurp size. This exercise will be worth it (in my mind at least) if I can understand the margin by which <sequential slurp><process><process><process> operations are faster than <slurp 1 record><process><slurp next record><process> ... Hope this makes sense. Niel
The OS already does that for you. When you read from a file, you're not actually reading from the disk itself. You read from a buffer that the disk manager creates for you. So, slurp-process-slurp-process is going to be nearly as fast as (or faster than) slurp-process-process-process.
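If you want to measure that margin rather than guess, a rough sketch using the core Benchmark and File::Temp modules on a throwaway file (the record count and size are made up):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Build a throwaway file of 1000 fixed-size 100-byte records.
my ($out, $file) = tempfile();
print {$out} 'x' x 100 for 1 .. 1000;
close $out or die $!;

cmpthese( -1, {
    # One big sequential read of the whole file.
    slurp_once => sub {
        open my $in, '<', $file or die $!;
        my $data = do { local $/; <$in> };
    },
    # Read one 100-byte record per <> call.
    per_record => sub {
        open my $in, '<', $file or die $!;
        local $/ = \100;
        while ( my $rec = <$in> ) { }
    },
} );
```

With the file warm in the OS cache, the gap between the two is usually much smaller than the raw read-count difference would suggest, which is dragonchild's point.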
Re: slurped scalar map
by johngg (Canon) on Jun 20, 2006 at 14:48 UTC