Re^5: Error binmode() on unopened filehandle

by Marshall (Canon)
on May 03, 2020 at 15:41 UTC


in reply to Re^4: Error binmode() on unopened filehandle
in thread Error binmode() on unopened filehandle

Ok, spent some more time fiddling with this. I now see what you mean (code attached).
I have never read a binary file with the <> angle operator. That idea would never have occurred to me. I have always used read(), specifying the number of bytes to read, as shown below. This implies that although this "angle read" appeared to work in my .jpg example, there could be some CRLF sequence in the data that would cause this .jpg read to fail. I find that interesting to know. Good point.

I guess bottom line: Don't use <> bracket when reading binary files!

#!/usr/bin/perl
use strict;
use warnings;

my $data = <<EOF;
first
second
EOF

print "data var in text mode - this works...\n";
print "$data\n";
print "----\n";

open my $fh, '<', \$data;
binmode $fh;
my $num_bytes = read ($fh, my $buf, 20000);
print "read () binary doesn't completely work..the normal way to read binary\n";
print "this is Windows machine and I don't see both CR and LF characters\n";
print "but I think that is due to Perl translation of line endings\n";
print "bytes read = $num_bytes\n";
print '',$buf;
print "----\n";

print "using angle operator for binary read doesn't work\n";
print "I've never tried this before and I'm not sure why\n";
print "this doesn't work - need explanation of the angle <>op.\n";
close $fh;
open $fh, '<', \$data or die "$!";
binmode $fh;
my $bdata = <$fh>;
print '',$bdata;
__END__
data var in text mode - this works...
first
second

----
read () binary doesn't completely work..the normal way to read binary
this is Windows machine and I don't see both CR and LF characters
but I think that is due to Perl translation of line endings
bytes read = 13
first
second
----
using angle operator for binary read doesn't work
I've never tried this before and I'm not sure why
this doesn't work - need explanation of the angle <>op.
first

Replies are listed 'Best First'.
Re^6: Error binmode() on unopened filehandle
by haukex (Archbishop) on May 03, 2020 at 16:05 UTC
    Once you go to BINMODE on a file handle, a record separator makes no sense.

    I think I understand where you're coming from: when reading a binary file, it often makes more sense to use read instead of readline (aka <>), and I personally would probably use the former.

    However, I also see several incorrect statements mixed in your nodes, like "Use of the DATA file handle is "special". Your initial premise that you could read binary data from the DATA file handle is wrong. That data will be in a character format." - this is wrong, see my node here.

    I guess bottom line: Don't use <> bracket when reading binary files!

    DATA is just another filehandle, and readline is not that magical, it can be used to read any filehandle (whether DATA, a binary file, etc.), as long as you pay attention to $/. For example, you can set $/ to a reference to an integer, and then readline will read "records" from the file, very much like read does.
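
To make the record-mode behaviour concrete, here is a minimal sketch, assuming an in-memory filehandle and a made-up 10-byte string (`$bytes`) in place of a real binary file. The helper name `read_records` is hypothetical. With `$/` set to a reference to an integer, readline returns fixed-size chunks and ignores the LF and CR bytes in the data.

```perl
use strict;
use warnings;

# Hypothetical sample data: 10 bytes, including LF (0x0A) and CR (0x0D)
# bytes that would end a "line" under the default record separator.
my $bytes = "\x00\x0A\x01\x0D\x0A\x02\xFF\x0A\x03\x04";

# Read fixed-size records with readline by pointing $/ at an integer.
# Returns the list of records read from an in-memory handle.
sub read_records {
    my ($data, $size) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    binmode $fh;
    local $/ = \$size;      # a record length, not a separator string
    my @records = <$fh>;    # each element holds up to $size bytes
    close $fh;
    return @records;
}

my @records = read_records($bytes, 4);
printf "%d records, lengths: %s\n",
    scalar @records, join ',', map { length $_ } @records;
```

With a record size of 4 this yields three records of 4, 4 and 2 bytes, and joining them gives back the original string unchanged.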

    Update:

    print "this is Windows machine and I don't see both CR and LF characters\n";
    print "but I think that is due to Perl translation of line endings\n";

    binmode turns off the CRLF to LF conversion, so if you're not seeing CRLF line endings (not sure how you determined that?) then that means the source file has only LF instead of CRLF line endings. Update 2: Hmm, see replies.

    Minor edits for clarity.

      That is interesting.

      I also would use read() for reading a binary file.

      binmode turns off the CRLF to LF conversion, so if you're not seeing CRLF line endings (not sure how you determined that?) then that means the source file has only LF instead of CRLF line endings.
      The Perl source file was written on a Windows machine with CRLF line endings.
      $num_bytes is 13, which is 2 short.
      I am a bit perplexed about that.
      This:

      my $data = <<EOF;
      first
      second
      EOF
      evidently deletes the <CR> characters.

      Update:

      use strict;
      use warnings;

      open (my $out, '>', "test_endings.txt") or die "$!";
      print $out "first\n";
      print $out "second\n";
      close $out;

      open (my $in, '<', "test_endings.txt") or die "$!";
      binmode $in;
      my $num_bytes = read ($in, my $buf, 20000);
      print "bytes read = $num_bytes\n";   ## prints 15. The <CR>'s are there in bin mode
        This:
        my $data = <<EOF;
        first
        second
        EOF
        evidently deletes the <CR> characters.

        Hmm, I'm quite surprised by that, and I'm still looking for the place where that's documented. Even trying to turn off the default :crlf layer on Windows doesn't seem to restore the CRLFs in $data. In addition, even on *NIX, eval "<<BAR\r\nx\r\ny\r\nBAR" causes the returned value to have only \n's, so it appears to be something to do with how heredocs are parsed. In fact, I've reported a bug.

            my $data = <<EOF;
            ...
            EOF

        evidently deletes the <CR> characters.

        As I understand it, in this particular case the CRs (carriage returns) are never there (in $data) to begin with. A here-doc is just another way to compose a string, in this case with double-quote interpolation (but that has no bearing here). Each line ends in a single  \n (newline) character.

        Writing such a line to a Windoze "text"-mode (i.e., non-binmode-ed) file causes CRs to be added. This can be seen with an "ordinary" string containing newlines that is written in "text" mode and then read back binmode-ed:

        c:\@Work\Perl\monks\Marshall>perl -wMstrict -e
        "use autodie;
         ;;
         use Data::Dump qw(dd);
         ;;
         my $s = qq{first\nsecond\n};
         dd 's:', $s;
         print 'length: ', length $s, qq{\n};
         ;;
         { open my $fh, '>', 'junque'; print $fh $s; close $fh; }
         ;;
         { open my $fh, '<', 'junque';
           binmode $fh;
           my $t = do { local $/; <$fh>; };
           dd 't:', $t;
           print 'length: ', length $t, qq{\n};
           close $fh;
         }
         "
        ("s:", "first\nsecond\n")
        length: 13
        ("t:", "first\r\nsecond\r\n")
        length: 15


        Give a man a fish:  <%-{-{-{-<
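
For anyone without a Windows box to hand, the same effect can probably be reproduced on any platform by asking for the translation explicitly. This is only a sketch: the explicit `:crlf` and `:raw` layers and the File::Temp scratch file are my assumptions, not something used in the thread. The string in memory has 13 characters; the CRs only appear when it passes through the `:crlf` layer on output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The string itself holds only LF line endings: 13 characters.
my $s = "first\nsecond\n";

# Scratch file; File::Temp picks the name and removes it at exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write through an explicit :crlf layer, which is what a default
# text-mode handle on Windows applies implicitly.
open my $out, '>:crlf', $path or die "Cannot write $path: $!";
print {$out} $s;
close $out or die "Cannot close $path: $!";

# Read back raw, so no translation happens on the way in.
open my $in, '<:raw', $path or die "Cannot read $path: $!";
my $t = do { local $/; <$in> };
close $in;

printf "in memory: %d chars, on disk: %d bytes\n", length($s), length($t);
```

The two extra bytes on disk are the CRs added by the output layer, one per line.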

Re^6: Error binmode() on unopened filehandle
by jo37 (Deacon) on May 03, 2020 at 15:52 UTC
    I guess bottom line: Don't use <> bracket when reading binary files!

    I don't see a problem with this, as long as you use binmode and local $/

    Greetings,
    -jo

    $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
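
A small sketch of the difference `local $/` makes, assuming a made-up 8-byte blob (`$blob`) with embedded LF bytes in place of a real image file. The helper `read_once` is hypothetical. With the default separator, `<$fh>` returns only the bytes up to and including the first LF; with `$/` undefined it returns everything.

```perl
use strict;
use warnings;

# Hypothetical stand-in for binary file contents; note the embedded LF bytes.
my $blob = "\xFF\xD8\x0A\x00\x10\x0A\xFF\xD9";

# Do a single readline on a fresh in-memory handle, optionally slurping.
sub read_once {
    my ($data, $slurp) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    binmode $fh;
    local $/ = $slurp ? undef : "\n";   # undef means "no record separator"
    my $got = <$fh>;
    close $fh;
    return $got;
}

my $line = read_once($blob, 0);   # stops after the first LF byte
my $all  = read_once($blob, 1);   # returns every byte

printf "default \$/: %d bytes, undef \$/: %d bytes\n", length($line), length($all);
```

Here the first call returns 3 bytes and the second all 8, so binmode alone is not enough: the record separator matters too.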
Re^6: Error binmode() on unopened filehandle
by ikegami (Patriarch) on May 04, 2020 at 03:33 UTC

    Naw, you just need to set $/ appropriately. There is no real difference between

    binmode $fh; read($fh, my $buf, 20000);
    and
    binmode $fh; local $/ = \20000; my $buf = <$fh>;

    Of course, both are junk. Why would you only read the first 20,000 bytes? The following make more sense:

    binmode $fh; my $file = ''; 1 while read($fh, $file, 8*1024, length($file));
    or
    binmode $fh; local $/; my $file = <$fh>;
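
The read loop can be spelled out a little more defensively. This is a sketch only: the sub name `slurp_with_read`, the explicit error check, and the 20,000-byte in-memory test string are my additions. It uses the same four-argument form of read, where the last argument is the offset at which to append.

```perl
use strict;
use warnings;

# Slurp a whole handle with read(), appending each chunk at the current
# end of the buffer via the four-argument form (OFFSET = length so far).
sub slurp_with_read {
    my ($fh) = @_;
    binmode $fh;
    my $file = '';
    while (1) {
        my $n = read($fh, $file, 8 * 1024, length $file);
        die "read failed: $!" unless defined $n;   # undef signals an error
        last if $n == 0;                           # 0 signals end of file
    }
    return $file;
}

# Hypothetical test data: 20,000 bytes, i.e. more than two 8 KiB chunks.
my $data = join '', map { chr($_ % 256) } 1 .. 20_000;
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $copy = slurp_with_read($fh);
close $fh;

print "read ", length($copy), " bytes\n";
```

Unlike the `1 while read(...)` one-liner, this version distinguishes a read error (undef) from end of file (0).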
      Of course, both are junk. Why would you only read the first 20,000 bytes?

      There are a lot of scenarios where you might want to read the first part of a file without reading the whole file. I think there are some Unix file commands that read the first 1-2K of a file to determine if the file is text or binary? Perhaps I want to concatenate some big .WAV files together. There is some header info at the beginning of these files that needs to be interpreted. In the OP's question, this is a single .jpg and there is no reason to read the file in "hunks" because the image has to be processed as a single unit. However, other scenarios do exist.

      I do commend you for the choice of 8*1024 as buf size. That is a very good number with most file systems. Certain byte boundaries are important for the file system to work efficiently.

        Re "There are a lot of scenarios": maybe, but the discussion at hand is about reading the entire file.

        I used 8*1024 because read reads in 8 KiB chunks anyway.

        $ perl -e'print "x" x 100_000' \ | strace perl -e'read(\*STDIN, my $buf, 100_000)' 2>&1 \ | grep -P 'read\(0,' read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192 read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 1696

        But the parameter refers to the number of characters to return, which could differ from the number of bytes read if an :encoding layer is used. So really, the number I picked is nothing to praise. If you want efficiency, it's probably best to use sysread with a very large number and decode afterwards.
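
A rough sketch of the "sysread, then decode" approach, under several assumptions of mine: the sub name `slurp_bytes` is hypothetical, 1 MiB stands in for the "very large number", the sample file is a File::Temp scratch file holding a short UTF-8 string, and Encode's `decode` does the decoding. No benchmark backs this; it only shows the shape of the code and that byte and character counts differ.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Read all bytes of a file with sysread, bypassing PerlIO buffering.
sub slurp_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = '';
    while (1) {
        # 1 MiB per call is an arbitrary "very large number" for this sketch.
        my $n = sysread($fh, $bytes, 1024 * 1024, length $bytes);
        die "sysread failed: $!" unless defined $n;   # undef signals an error
        last if $n == 0;                              # 0 signals end of file
    }
    close $fh;
    return $bytes;
}

# Hypothetical sample: UTF-8 text where character and byte counts differ.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "caf\xC3\xA9\n";   # "cafe" with e-acute, as UTF-8: 6 bytes
close $tmp_fh or die "Cannot close $path: $!";

my $bytes = slurp_bytes($path);
my $text  = decode('UTF-8', $bytes);   # decode once, after all I/O is done

printf "%d bytes, %d characters\n", length($bytes), length($text);
```

For this sample the file is 6 bytes but decodes to 5 characters, which is the byte/character gap described above.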
