PerlMonks  

Perl binary file reading

by kepler (Scribe)
on May 02, 2016 at 19:22 UTC (id://1162033)

kepler has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm having some trouble reading a binary file. The file has several records of 36 bytes each, with about 4 fields of different lengths. First I opened the file, set binmode, and tried to read all the data into a variable. Then I tried to isolate the fields with substr, and replaced the \x00 characters with spaces... The problem is that the file is about 15 MB, and I can only read about 2 KB. So some character, which might be needed (some ASCII values correspond to integer numbers), is messing things up... Any suggestions? Regards, Kepler

Re: Perl binary file reading
by choroba (Cardinal) on May 02, 2016 at 19:32 UTC
    Without seeing the code, we can't help you. If you wrote it the way you should, it should have worked.

    Note that unpack might be more suitable to extract the values than substr.
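    A minimal sketch of the unpack approach, assuming the record layout implied by the substr offsets in the code the OP posts further down (a 28-byte name, then three one-byte numbers, then 5 bytes not used there). The helper name decode_record and the sample record are made up for illustration:

```perl
use strict;
use warnings;

# Assumed record layout, inferred from the substr() offsets in the question:
#   bytes 0..27   name, padded with NUL bytes
#   byte  28      book (stored zero-based, hence the + 1)
#   byte  29      v1
#   byte  30      v2
#   bytes 31..35  unknown, ignored here
sub decode_record {    # hypothetical helper name
    my ($rec) = @_;
    # 'A28' strips trailing NULs and spaces from the name;
    # 'C3' reads three unsigned bytes as numbers 0..255.
    my ($name, $book, $v1, $v2) = unpack 'A28 C3', $rec;
    return ($name, $book + 1, $v1 + 1, $v2 + 1);
}

# Sample record built with pack: 'a28' pads with NULs, 'x5' appends 5 NULs.
my $rec = pack 'a28 C3 x5', 'example', 0, 2, 4;
my ($name, $book, $v1, $v2) = decode_record($rec);
print "$name - ", join(':', $book, $v1, $v2), "\n";    # example - 1:3:5
```

    To walk a whole buffer, unpack '(a36)*', $data returns the list of 36-byte strings, each of which can be handed to decode_record.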

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Hi, Sorry. Here's the code:
      my $filename = "Words.txt";
      open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
      binmode $fh;
      my $data = <$fh>;
      $data =~ s/\x00/\ /gi;
      close $fh;
      my $length = length($data);
      my $count = 0;
      for(my $i=0;$i < $length/36;$i++){
          my $name = substr($data,$i*36,28);
          my $book = ord(substr($data,$i*36 + 28,1)) + 1;
          my $v1 = ord(substr($data,$i*36 + 29,1)) + 1;
          my $v2 = ord(substr($data,$i*36 + 30,1)) + 1;
          print $name . " - " . "$book\:$v1\:$v2\n";
          $count += 1;
      }
      It stops after 38 records, but it should go up to several thousand... The length says it's only about 1300 characters; it should also be several thousand... Regards, Kepler
        > my $data = <$fh>;

        Do you know what implements the diamond operator? readline! You're reading up to the first newline only, and binmode doesn't change this. Use read instead.
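        A sketch of reading the whole file with read instead of <$fh>: the four-argument form of read appends each chunk at the end of the buffer, and no byte value (newline included) ends the loop; only a return value of 0, meaning end of file, does. slurp_binary is a hypothetical helper name, and the demo uses an in-memory file in place of Words.txt:

```perl
use strict;
use warnings;

# Read everything from a handle with read(), which ignores newlines
# entirely (unlike readline / <$fh>). Hypothetical helper name.
sub slurp_binary {
    my ($fh) = @_;
    binmode $fh;
    my $data = '';
    while (1) {
        # 4-arg read: append up to 64 KB at offset length($data)
        my $got = read($fh, $data, 65536, length($data));
        die "read failed: $!\n" unless defined $got;
        last if $got == 0;    # 0 means end of file
    }
    return $data;
}

# Demo: three 36-byte records; the first contains a newline byte.
my $bytes = ("A" x 10) . "\n" . ("B" x 25) . ("C" x 72);

open my $fh, '<', \$bytes or die "open: $!";
my $data = slurp_binary($fh);
close $fh;

# For comparison, a plain <$fh> stops right after the first "\n".
open my $fh2, '<', \$bytes or die "open: $!";
binmode $fh2;
my $line_len = length(scalar <$fh2>);
close $fh2;

printf "read: %d bytes (%d records), readline: %d bytes\n",
    length($data), length($data) / 36, $line_len;
```

        With the real file, that would be open(my $fh, '<', $filename) or die "Could not open file '$filename' $!"; followed by my $data = slurp_binary($fh);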

        What do you get if you try this on the file?
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $fname = "Words.txt";
        my $fsize = -s $fname;
        open( my $fh, '<', $fname ) or die "$fname: $!\n";
        binmode $fh;
        $/ = undef;                            # undefined $/ makes <$fh> return the whole file
        my $data = <$fh>;
        my $length = length( $data );
        my $nnulls = ( $data =~ tr/\x00/ / );  # tr/// on $data (not $_) returns the number of NULs replaced
        print "Read $length of $fsize bytes, changed $nnulls nulls to spaces\n";
        That will just report the basic numbers to see whether you've read the entire file, and how many null bytes it contains.
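        One more check along the same lines, as a hypothetical helper (first_newline is not from the thread): once the whole file is in $data, the position of the first newline byte (0x0A) shows where a plain <$fh> would have stopped, and which 36-byte record contains it.

```perl
use strict;
use warnings;

# Return (byte offset, 0-based record index) of the first "\n" byte,
# or an empty list if the buffer has none. Hypothetical helper.
sub first_newline {
    my ($data, $reclen) = @_;
    my $pos = index($data, "\n");
    return if $pos < 0;
    return ($pos, int($pos / $reclen));
}

# Demo buffer: 108 bytes = 3 records of 36, newline at offset 40.
my $data = ("a" x 40) . "\n" . ("b" x 67);
my ($pos, $rec) = first_newline($data, 36);
print "first newline at byte $pos, in record $rec\n";
```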

        As for the stuff you're doing with ord(substr(...)) + 1; ... that looks like the sort of thing that should be done while you still have the null bytes in the data (i.e. without converting the nulls to spaces first: a zero byte turned into a space makes ord return 32 instead of 0). And it looks like the sort of thing you should be doing with unpack, as others have mentioned.
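        A small demonstration of why the order matters, using a made-up record with the layout assumed from the question's substr offsets (the book number stored as a zero byte at offset 28):

```perl
use strict;
use warnings;

# Hypothetical raw record: 28-byte NUL-padded name, then book, v1, v2 = 0.
my $raw = pack 'a28 C3 x5', 'example', 0, 0, 0;

# Converting nulls to spaces first also rewrites the numeric bytes.
(my $spaced = $raw) =~ tr/\x00/ /;
my $book_wrong = ord(substr($spaced, 28, 1)) + 1;   # space is byte 32, so 33

# Decoding from the untouched bytes keeps the number intact.
my $book_right = ord(substr($raw, 28, 1)) + 1;      # 0 + 1 = 1

# Only the text field needs the cosmetic null-to-space treatment.
(my $name = substr($raw, 0, 28)) =~ tr/\x00/ /;

print "wrong: $book_wrong, right: $book_right, name: '$name'\n";
```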

Re: Perl binary file reading
by talexb (Chancellor) on May 02, 2016 at 19:53 UTC

    If it's binary data, it might have a ^D, which indicates the end of a stream of data. That's on Linux .. and I think a ^Z is used for Windows. I'm not positive about either of those, but it's somewhere to start looking.

    If this is a file of records, hopefully the records are of a fixed length. If not, that makes things a little more challenging.
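    If the records really are fixed length, the file size should be an exact multiple of the record size, which gives a cheap sanity check before parsing anything. A sketch, with record_count as a hypothetical helper and 36 taken from the question:

```perl
use strict;
use warnings;

# How many whole records fit in $size bytes. Dies if the size is not an
# exact multiple, which would suggest the record length guess is wrong.
sub record_count {
    my ($size, $reclen) = @_;
    die "record length must be positive\n" unless $reclen > 0;
    my $extra = $size % $reclen;
    die "size $size is not a multiple of $reclen ($extra bytes left over)\n"
        if $extra;
    return $size / $reclen;
}

# With a real file (name taken from the thread):
#   my $n = record_count(-s 'Words.txt', 36);
print record_count(36 * 1000, 36), "\n";    # 1000
```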

    Some code would be helpful -- there are many ways to solve this, but we can offer much more useful solutions if we know where you're starting from. PS And please use code tags around the code. :)

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      If using a console text input stream, a ^D on Unix or a ^Z on Windows will indeed result in an EOF for the user program.

      With a file on a hard disk, there is no such thing, because every file on disk is just a sequence of bytes. The file system reports "end of file" to the reading function once all the bytes of the file have been consumed; it always determines EOF from the number of bytes in the file. There is no ^D or ^Z stored at the "end of the file". Such a marker is only needed for a text input stream like the console, because the number of bytes (characters) such a stream will deliver is not known in advance.

      Using read with a big buffer, like 50 MB, could be OK on a modern computer. Reading in smaller chunks is fine too, but more complicated.
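      A sketch of the smaller-chunks variant, reading one 36-byte record per read call (each_record and its callback interface are assumptions made for illustration). The demo records consist entirely of ^D, ^Z and newline bytes, and all three come through, since with binmode and read these are just byte values:

```perl
use strict;
use warnings;

my $RECLEN = 36;    # record size described in the question

# Call $callback once per complete record; warn about a short tail.
# Hypothetical helper; returns the number of records seen.
sub each_record {
    my ($fh, $callback) = @_;
    binmode $fh;
    my $n = 0;
    while (1) {
        my $got = read($fh, my $rec, $RECLEN);
        die "read failed: $!\n" unless defined $got;
        last if $got == 0;                  # clean end of file
        if ($got < $RECLEN) {
            warn "ignoring trailing $got-byte fragment\n";
            last;
        }
        $callback->($rec, $n++);
    }
    return $n;
}

# Demo: records made of ^D (\x04), ^Z (\x1A) and newline bytes.
my $bytes = ("\x04" x 36) . ("\x1A" x 36) . ("\n" x 36);
open my $fh, '<', \$bytes or die "open: $!";
my @first_bytes;
my $count = each_record($fh, sub { push @first_bytes, ord substr($_[0], 0, 1) });
close $fh;
print "$count records, first bytes: @first_bytes\n";    # 3 records, first bytes: 4 26 10
```

      Only one record is held in memory at a time, which is the trade-off against the single big buffer.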

      > If it's binary data, it might have a ^D, which indicates the end of a stream of data. That's on Linux .. and I think a ^Z is used for Windows.

      Should be no problem with binmode.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        Right -- that was my point. If they're using readline, they're subject to the rules of reading text (break the chunks into lines, stop at the 'end of file'). If they're using read, then they'll just get chunks of the file up to the file size.
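        readline can also be told to return fixed-size records instead of lines: perlvar documents that setting $/ to a reference to an integer makes <$fh> read records of (at most) that many bytes. A sketch, with read_fixed_records as a hypothetical name and 36 taken from the question:

```perl
use strict;
use warnings;

# Return all fixed-size records from $fh using readline with $/ = \N.
sub read_fixed_records {
    my ($fh, $reclen) = @_;
    binmode $fh;
    local $/ = \$reclen;    # record mode; restored when the sub returns
    my @records;
    while (defined(my $rec = <$fh>)) {
        push @records, $rec;
    }
    return @records;
}

# Demo: a newline inside record 0 no longer cuts the read short.
my $bytes = ("x" x 35) . "\n" . ("y" x 36);
open my $fh, '<', \$bytes or die "open: $!";
my @recs = read_fixed_records($fh, 36);
close $fh;
print scalar(@recs), " records, first is ", length($recs[0]), " bytes\n";
```

        Using local keeps the change to $/ from leaking into other code that reads lines. If the file size is not an exact multiple of 36, the last record comes back short.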

        Alex / talexb / Toronto


Re: Perl binary file reading
by GotToBTru (Prior) on May 02, 2016 at 19:33 UTC

    You've obviously got code .. why not share it with us? Just what is necessary to see how you are getting data from the file, and how you are trying to extract the fields from it.

    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)
