Re: How to capture data length from the record ?

by gone2015 (Deacon)
on Feb 13, 2009 at 10:29 UTC


in reply to How to capture data length from the record ?

I suggest making friends with pack and unpack -- there's the tutorial.

A smattering of things which may not be obvious:

  • the pack/unpack TEMPLATE string is evaluated at run-time, so it can be dynamic -- you can, for example, interpolate repeat counts into one, or build a string depending on what you are packing/unpacking.

  • C is the way to handle bytes as individual integers (0..255) -- but watch out if you have 'utf8' ("wide character") strings to unpack.

  • a is the way to handle sections of arbitrary byte values.

  • in unpack you can use  x to skip past bytes, and you can use  @ to step to particular positions in the input.

  • the construct  C/a*, and all its cousins, is very useful where you have data prefixed by its length in bytes

  • you can bracket stuff in pack/unpack templates, and apply repeat counts to bracketed sections (and nest bracketed sections). So:  '(C/(NnnN2)(a4)2)*' unpacks as many items as it can ( '(.....)*'), where each item comprises: a one-byte count followed by that many  NnnN2 groups, followed by 2 elements each of 4 bytes.
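
The points above can be tried out in a few lines. This is only a minimal sketch: the input strings and variable names are made up for illustration and have nothing to do with the original poster's file.

```perl
use strict;
use warnings;

# Dynamic template: the repeat count is interpolated at run time.
my $n     = 3;
my @bytes = unpack "C$n", "\x01\x02\xFFrest";          # (1, 2, 255)

# 'a' keeps arbitrary bytes as they are; 'x' skips bytes;
# '@' jumps to an absolute position in the input.
my ($head, $tail) = unpack 'a2 x3 a2', 'ABxxxCD';      # ('AB', 'CD')
my ($at5)         = unpack '@5 a2',    'ABxxxCD';      # 'CD'

# C/a*: one length byte, then that many bytes of data.
my $packed  = pack 'C/a* C/a*', 'hello', 'hi';         # "\x05hello\x02hi"
my @strings = unpack '(C/a*)*', $packed;               # ('hello', 'hi')

# A counted group: one count byte, then that many (n n) pairs.
# The input is assembled by hand so only unpack's group handling is relied on.
my $counted = pack('C', 2) . pack('n n', 1, 2) . pack('n n', 3, 4);
my @pairs   = unpack 'C/(n n)', $counted;              # (1, 2, 3, 4)

print "@bytes | $head $tail $at5 | @strings | @pairs\n";
```

Note that the count consumed by `/` is not returned as a separate value; only the items it governs come back.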

Re^2: How to capture data length from the record ?
by bh_perl (Monk) on Feb 16, 2009 at 03:17 UTC
    hi..

    This is my program, but it does not go on to process the next record; it captures the first record only. There might be a mistake in my looping. Could somebody help me?

    open (DATA, "$inputdir/$inputfile");
    while (<DATA>) {
        my ($filehdr, $data) = m|(.{40})(.*)|;
        ($len = substr($data, $offset, 4)) =~ s/^0//g;

        foreach $val ($data) {
            $cdata = substr($val, $offset, $len);
            print "$cdata\n";
        }
    }
    close(DATA);

      Well... I can offer a few observations on this code:

       1: open (DATA, "$inputdir/$inputfile");
       2: while (<DATA>) {
       3:     my ($filehdr, $data) = m|(.{40})(.*)|;
       4:     ($len = substr($data, $offset, 4)) =~ s/^0//g;
       5:
       6:     foreach $val ($data) {
       7:         $cdata = substr($val, $offset, $len);
       8:         print "$cdata\n";
       9:     }
      10: }
      11: close(DATA);

      • first: no use strict or use warnings... so the fact that you never give $offset a value is not pointed out to you.

      • two of your variables are declared "my", but not the others... which isn't encouraging, either.

      • line 1: the DATA file handle has a particular use. There is no need to use it for your input file, and doing so makes things less clear than they might be.

      • also line 1: adding at least an or die "$!" after the open is recommended -- there is no point in proceeding if the file open fails, and if it does, it will be helpful to know why it fails.

      • line 2: while (<DATA>) -- what do you expect this to do ? What I expect it to do is to read the file, line by line -- where $/ specifies the current line-ending. I see nothing in your description of the input file that suggests that the "records" are separated by "\n". Indeed the description suggests that it is a continuous stream of hex characters ([0-9A-F]), in which case <DATA> will read the entire file in one gulp (assuming $/ has been left at its default of "\n"). In which case, why bother with the while ?

      • line 3: this sets $filehdr to be the first 40 bytes of the current input "line", and $data to be the rest. I'd be tempted to do this with substr, being more direct and probably faster. But what you have will work.

      • line 4: the $offset value is never set, and will be treated as zero. I assume that it is supposed to be the offset within $data of the next "record". If so, then the fact that it's not set to anything will be a big part of why your code only does something with the first "record".

      • also line 4: it appears that the length is a decimal value. Since the rest appears to be hex, that bothers me...

      • line 6: what do you expect foreach $val ($data) to do ? foreach works its way through a LIST -- the list here is exactly one element long... so not much looping involved.

      • line 7: again uses $offset which has no value.
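
Several of these observations can be checked directly. The following is a minimal sketch, assuming an in-memory string stands in for the input file; the sample contents and the names `$contents`, `$in` and `$chunk` are invented for illustration. It uses strict and warnings, a lexical filehandle instead of DATA, and `or die` on the open, and it counts how often each loop body runs.

```perl
use strict;
use warnings;

# Stand-in for the input file: an in-memory "file" with no newlines at all.
my $contents = '0008AAAA0006BB0005C';

# Lexical filehandle, three-argument open, and a reason if the open fails.
open(my $in, '<', \$contents) or die "Cannot open input: $!";

my $reads = 0;
my $chunk;
while (my $line = <$in>) {      # reads up to the next "\n", or to end of file
    $reads++;
    $chunk = $line;
}
close($in);

# foreach over a one-element list runs its body exactly once.
my $passes = 0;
foreach my $val ($chunk) { $passes++ }

print "reads=$reads passes=$passes length=", length($chunk), "\n";
# prints: reads=1 passes=1 length=19
```

With no "\n" in the input, the while loop reads everything in a single pass, and the foreach adds no looping of its own.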

      But do not despair... the solution is close. Consider: what $offset is doing; how you should update it after each "record"; and then how to recast your loop to work your way along the $data string. You could think about an initial value for $offset, which could eliminate the need to split the input into $filehdr and $data.

      As it happens, I would still use unpack for this, but substr will also get the job done.
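
To make the hint about $offset concrete, here is one possible shape for such a loop, using substr. It is a sketch under a loudly stated assumption: every record starts with a 4-digit decimal length that counts the whole record, length field included. That layout is a guess made for illustration, not something confirmed for the real file, and `split_records` is a hypothetical helper name.

```perl
use strict;
use warnings;

# ASSUMED layout (not taken from the real file): each record begins with a
# 4-digit decimal length that counts the whole record, length field included.
sub split_records {
    my ($data) = @_;
    my @records;
    my $offset = 0;                                   # start of the current record
    while ($offset + 4 <= length $data) {
        my $len = substr($data, $offset, 4);
        last unless $len =~ /^\d{4}$/ && $len >= 4;   # not a plausible length: stop
        last if $offset + $len > length $data;        # truncated record: stop
        push @records, substr($data, $offset + 4, $len - 4);
        $offset += $len;                              # step to the next record
    }
    return @records;
}

my @records = split_records('0008AAAA0006BB0005C');
print join(',', @records), "\n";   # prints: AAAA,BB,C
```

Starting $offset at the size of the file header (40 in the posted code) instead of 0 would remove the need to split the input into $filehdr and $data first.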


        hi..

        Thank you very much for your help, but I am still stuck on reading the file format.

        Right now, I am able to open the file and read each 2048-byte block. But I do not have any idea how to read each record in the block based on the total record length. The end of each block is padded with filler to bring the block size up to 2048.

        The file format description as below:-
        The Ericsson AIN records are stored in an input file in 2048 byte blocks.
        Every block is made of variable length records padded to make a 2048 byte blocks.
        Every record is made of a fixed 180 byte part and records may contain additional tagged fields making the total length variable.
        Total length of the record including fix length and variable length can be determined based on field RECORD_LENGTH.

        This is my coding:-
        #!/usr/bin/perl -w
        use Cwd;
        use warnings;
        use strict;
        use Getopt::Long;
        use constant BLOCKLEN => 2048;
        use constant FIXRECORD => 180;

        my ($trace, $help, $infile);
        my $swap = '';
        my $indir = getcwd;
        my $outdir = getcwd;

        GetOptions (
            "h|help" => \$help,
            "filename|f=s" => \$infile,
            "swap|s" => \$swap,
            "input|i=s" => \$indir,
            "output|o=s" => \$outdir,
            "trace|t" => \$trace
        ) or usage();

        my $template = "A8 A1 A2 A2 A5 A3 A20 A20 A2 A3 A2 A2 A28 A2 A3 A5 A1 A1 A1 A6 A6 A6 A6 A6 A1 A6 A3 A15 A7 A7 H*";
        #my @fldsize = (A8 A1 A2 A2 A5 A3 A20 A20 A2 A3 A2 A2 A28 A2 A3 A5 A1 A1 A1 A6 A6 A6 A6 A6 A1 A6 A3 A15 A7 A7 H*);
        my @fldname = (
            "Call Identification Number", "Cause for Output", "Record Type",
            "Record Number", "Record Sequence Number", "Record Size",
            "X Number", "A Sub Number", "A Category", "A Sub Number Type",
            "A Sub Numbering Plan Ind", "A Sub Type",
            "B Sub Number", "B Category", "B Sub Number Type",
            "Fault Code", "Call Status", "Force Disconnection Info",
            "Abnormal Call Release Ind", "Start Date", "Start Time",
            "End Date", "End Time", "Duration", "Pulse Charging Ind",
            "Number of Meter Pulses", "Tariff Class",
            "Exchange ID", "Outgoing Route", "Incoming Route",
            "Additional TAG"
        );

        sub usage {
            print ("USAGE: $0 -i <input_folder> -o <output_folder> -f <input_filename>\n\n");
            exit;
        }

        my $outfile = $infile;
        my ($data, $cdr, $offset, $recLen);
        my ($callid,$cause,$recType,$recNum,$recSeq);
        my @rec = ();

        if ($infile) {
            open (OUTPUT, ">$outdir/$outfile") or die ("Can't open $outdir/$outfile\n");
            open (DATA, "$indir/$infile") or die ("Can't open $indir/$infile\n");
            binmode DATA;
            while (read(DATA, $data,2048)) {
                $recLen = 0;
                $offset = 0;
                foreach my $val ($data) {
                    ($callid,$cause,$recType,$recNum,$recSeq,$recLen) = unpack "A8 A1 A2 A2 A5 A3", $val;
                    #if ($recLen == 0x81) {
                    #$recLen = unpack "C3", substr $data, 18, 3, '';
                    #}
                    #$cdr = substr($data, $offset, $recLen);
                    #@rec = unpack $template, $cdr;
                    print "RECLen : $callid,$cause,$recType,$recNum,$recSeq,$recLen,$offset\n";
                    $offset += $recLen;
                }
            }
            print "~~~~~~~~~~~~~~~~~~~~~~~~~~~ NEXT BLOCK ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";
            close(DATA);
            close(OUTPUT);
        }

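
For comparison, one way the inner loop could be recast so that it steps through a block record by record is sketched below. It rests on assumptions that the format description does not confirm: that "Record Size" (3 characters at offset 18, per the template above) is a decimal count of the whole record, and that anything which does not look like a valid size marks the start of the filler. `records_in_block` and `fake_record` are hypothetical helper names, and the demo block is synthetic rather than real data.

```perl
use strict;
use warnings;

use constant BLOCKLEN  => 2048;
use constant FIXRECORD => 180;

# Split one block into records.
# ASSUMPTIONS: the 3-digit "Record Size" at offset 18 counts the whole record,
# and a field that is not a valid size means the filler has been reached.
sub records_in_block {
    my ($block) = @_;
    my @records;
    my $offset = 0;
    while (length($block) - $offset >= FIXRECORD) {
        # Jump to the record start, skip 18 bytes, take the 3-character size.
        my ($size) = unpack "\@$offset x18 A3", $block;
        last unless $size =~ /^\d{3}$/ && $size >= FIXRECORD;
        last if $offset + $size > length($block);     # record would overrun the block
        push @records, substr($block, $offset, $size);
        $offset += $size;
    }
    return @records;
}

# Demo on a synthetic block: a 180-byte record, a 184-byte record, then filler.
sub fake_record {
    my ($size) = @_;
    my $rec = ('0' x 18) . sprintf('%03d', $size);    # 18 header bytes + Record Size
    return $rec . ('x' x ($size - length($rec)));
}

my $block = fake_record(180) . fake_record(184);
$block .= "\0" x (BLOCKLEN - length($block));         # the filler byte is a guess

my @lengths = map { length($_) } records_in_block($block);
print "@lengths\n";   # prints: 180 184
```

In the posted program, a call such as records_in_block($data) after each read would take the place of the foreach over $data, and each returned record could then be handed to unpack with $template.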
        This is the real file data:-
        30 30 32 30 33 34 37 30 30 30 32 30 31 32 36 34 30 38 32 34 34 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 31 37 32 35 37 31 34 32 32 20 20 20 20 20 20 20 20 20 20 31 20 30 30 34 30 31 30 31 30 33 32 37 38 36 35 31 30 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 31 30 30 30 33 20 20 20 20 20 30 30 30 31 30 30 34 32 33 30 33 32 32 31 37 31 30 30 34 32 33 30 33 32 34 34 30 30 30 30 32 32 34 30 20 20 20 20 20 20 30 30 30 47 4c 4d 41 2f 4d 59 35 34 43 4e 41 31 2f 30 49 4e 42 47 4d 4f 20 49 41 42 31 49 20 20 9f a1 0b 00 9f be 00 02 00 16 9f be 01 81 0f 01 55 01 f0 83 18 02 20 10 00 10 00 00 00 01 9f be 19 81 09 00 01 10 60 13 00 13 13 00 9f be 1b 08 02 01 10 ff ff ff ff ff a1 06 82 04 04 00 00 00 30 30 32 30 33 34 39 36 30 30 32 30 31 32 36 34 30 39 31 38 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 31 32 32 37 33 38 36 39 37 20 20 20 20 20 20 20 20 20 20 31 20 30 30 34 30 31 30 31 30 33 32 37 31 30 31 38 38 33 46 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 30 30 34 20 20 20 20 20 30 30 30 31 30 30 34 32 33 30 33 32 34 34 37 31 30 30 34 32 33 30 33 32 35 30 31 30 30 30 30 31 34 31 30 30 30 30 31 33 30 30 34 47 4c 4d 41 2f 4d 59 35 34 43 4e 41 31 2f 30 4e 41 54 47 4d 4f 20 4e 41 54 47 57 49 20 30 30 32 30 33 34 37 33 30 30 32 30 31 32 36 34 31 30 32 30 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 31 37 36 39 38 37 35 35 36 20 20 20 20 20 20 20 20 20 20 31 20 30 30 34 30 31 30 31 31 33 30 30 31 33 31 33 30 30 46 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 31 30 30 30 34 20 20 20 20 20 30 30 30 31 30 30 34 32 33 30 33 32 32 33 34 31 30 30 34 32 33 30 33 32 35 31 33 30 30 30 32 34 30 31 30 30 30 33 31 38 30 30 34 47 4c 4d 41 2f 4d 59 35 34 43 4e 41 31 2f 30 53 53 46 44 43 46 4f 4e 41 54 47 57 49 20 9f a1 0b 00 9f a0 01 81 03 03 1a d2 a1 06 82 04 04 00 00 00 30 30 32 30 33 34 37 34 34 30 32 30 31 32 36 34 31 31 32 35 32 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 
20 30 31 37 36 39 38 37 35 35 36 20 20 20 20 20 20 20 20 20 20 31 20 30 30 34 30 31 30 31 30 33 32 37 38 36 35 31 30 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 31 30 30 30 33 20 20 20 20 20 30 30 30 31 30 30 34 32 33 30 33 32 32 33 34 31 30 30 34 32 33 30 33 32 35 31 33 30 30 30 32 34 30 31 30 30 30 31 35 39 30 31 31 47 4c 4d 41 2f 4d 59 35 34 43 4e 41 31 2f 30 49 4e 42 47 4d 4f 20 49 41 42 32 49 20 20 9f a1 0b 00 9f a0 01 81 03 03 1a d1 9f be 00 02 00 16 9f be 01 81 0f 01 55 01 f0 83 19 02 20 10 00 10 00 00 00 01 9f be 19 81 09 00 03 10 60 13 00 13 13 00 9f be 1b 08 02 01 10 ff ff ff ff ff a1 06 82 04 04 00 00 00 30 30 32 30 33 34 39 37 30 30 30 30 31 32 36 34 31 32 31 38 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 37 34 31 35 36 38 39 31 20 20 20 20 20 20 20 20 20 20 20 31 20 30 30 34 30 31 30 30 32 37 33 30 35 35 32 30 46 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 30 30 33 20 20 20 20 20 30 30 30 31 30 30 34 32 33 30 33 32 34 34 33 31 30 30 34 32 33 30 33 32 35 31 35 30 30 30 30 33 33 31 30 30 30 30 33 32 30 30 34 47 4c 4d 41 2f 4d 59 35 34 43 4e 41 31 2f 30 4e 41 54 47 4c 4f 20 54 4d 4b 4c 47 49 20 30 30 32 30 33 35 30 30 30 30 30 30 31 32 36 34 31 33 31 38 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 33 33 31 36 38 31 37 32 33 20 20 20 20 20 20 20 20 20 20 31 20 30 30 34 30 31 30 30 32 37 33 30 35 35 31 35 46 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 30 30 33 20 20 20 20 20 30 30 30 31 30 30 34 32 33 30 33 32 34 34 34 31 30 30 34 32 33 30 33 32 35 31 36 30 30 30 30 33 33 31 30 30 30 30 33 32 30 30 34 47 4c 4d 41 2f 4d 59 35 34 43 4e 41 31 2f 30 4e 41 54 47 4c 4f 20 54 4d 4b 4c 4a 49 20 30 30 32 30 33 34 39 39 30 30 30 30 31 32 36 34 31 34 31 38 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 33 35 36 33 35 32 33 39 38 20 20 20 20 20 20 20 20 20 20 31 20 30 30 34 30 31 30 30 32 37 33 30 35 35 31 35 46 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 
20 20 20 20 30 30 30 30 33 20 20 20 20 20 30 30 30 31 30 30 34 32 33 30 33 32 34 34 34 31 30 30 34 32 33 30 33 32 35 31 36 30 30 30 30 33 33 31 30 30 30 30 33 32 30 30 34 47 4c 4d 41 2f 4d 59 35 34 43 4e 41 31 2f 30 4e 41 54 47 4c 4f 20 54 4d 4b 4c 47 49 20 30 30 32 30 33 34 39 38 30 30 30 30 31 32 36 34 31 35 31 38 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 33 37 39 38 31 32 32 37 34 20 20 20 20 20 20 20 20 20 20 31 20 30 30 34 30 31 30 30 32 37 33 30 35 35 31 35 46 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 30 30 33 20 20 20 20 20 30 30 30 31 30 30 34 32 33 30 33 32 34 34 33 31 30 30 34 32 33 30 33 32 35 33 32 30 30 30 30 34 39 31 30 30 30 30 34 38 30 30 34 47 4c 4d 41 2f 4d 59 35 34 43 4e 41 31 2f 30 4e 41 54 47 4c 4f 20 54 4d 4b 4c