Re^5: Error binmode() on unopened filehandle

Ok, spent some more time fiddling with this. I now see what you mean (code attached).
I have never read a binary file with the <> angle operator. That idea would never have occurred to me. I have always used read(), specifying the number of bytes to read, as shown below. This implies that although this "angle read" appeared to work in my .jpg example, there could be some CRLF sequence in the data that would cause this .jpg read to fail. I find that interesting to know. Good point.

I guess bottom line: Don't use <> bracket when reading binary files!

#!/usr/bin/perl

use strict;
use warnings;

my $data = <<EOF;
first
second
EOF

print "data var in text mode - this works...\n";
print "$data\n";
print "----\n";

open my $fh, '<', \$data;
binmode $fh;
my $num_bytes = read ($fh, my $buf, 20000);
print "read () binary doesn't completely work..the normal way to read binary\n";
print "this is Windows machine and I don't see both CR and LF characters\n";
print "but I think that is due to Perl translation of line endings\n";
print "bytes read = $num_bytes\n"; 
print '',$buf;

print "----\n";
print "using angle operator for binary read doesn't work\n";
print "I've never tried this before and I'm not sure why\n";
print "this doesn't work - need explanation of the angle <>op.\n";
close $fh;
open $fh, '<', \$data or die "$!";
binmode $fh;
my $bdata = <$fh>;
print '',$bdata;

__END__
data var in text mode - this works...
first
second

----
read () binary doesn't completely work..the normal way to read binary
this is Windows machine and I don't see both CR and LF characters
but I think that is due to Perl translation of line endings
bytes read = 13
first
second
----
using angle operator for binary read doesn't work
I've never tried this before and I'm not sure why
this doesn't work - need explanation of the angle <>op.
first
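The output above shows `<$fh>` returning only "first". A minimal sketch of why, assuming `$/` still has its default value of "\n": in scalar context readline returns a single record, up to and including the next `$/`, and `binmode` does not change that. Undefining `$/` locally makes the same operator return everything that is left. The in-memory string here is only a stand-in for a real file.

```perl
use strict;
use warnings;

my $data = "first\nsecond\n";

# Scalar context: readline returns one record, ending at the first "\n".
open my $fh, '<', \$data or die "open failed: $!";
binmode $fh;
my $first = <$fh>;
close $fh;

# With $/ undefined, the same operator returns everything that is left.
open $fh, '<', \$data or die "open failed: $!";
binmode $fh;
my $all = do { local $/; <$fh> };
close $fh;

printf "one record: %d bytes, slurp: %d bytes\n", length $first, length $all;
```

So the second read in the script above did not fail; it returned exactly one "line" of the data.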


Replies are listed 'Best First'.
Re^6: Error binmode() on unopened filehandle by haukex (Archbishop) on May 03, 2020 at 16:05 UTC
"Once you go to BINMODE on a file handle, a record separator makes no sense."

I think I understand where you're coming from: when reading a binary file, it often makes more sense to use read instead of readline (aka `<>`), and I personally would probably use the former. However, I also see several incorrect statements mixed into your nodes, like "Use of the DATA file handle is "special". Your initial premise that you could read binary data from the DATA file handle is wrong. That data will be in a character format." - this is wrong, see my node here.

"I guess bottom line: Don't use <> bracket when reading binary files!"

`DATA` is just another filehandle, and readline is not that magical: it can be used to read any filehandle (whether `DATA`, a binary file, etc.), as long as you pay attention to $/. For example, you can set $/ to a reference to an integer, and then readline will read "records" from the file, very much like read does.

Update:

print "this is Windows machine and I don't see both CR and LF characters\n";
print "but I think that is due to Perl translation of line endings\n";

binmode turns off the CRLF to LF conversion, so if you're not seeing CRLF line endings (not sure how you determined that?) ~~then that means the source file has only LF instead of CRLF line endings~~.

Update 2: Hmm, see replies. Minor edits for clarity.
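The point about setting $/ to a reference to an integer can be sketched as follows. This is only an illustration: the ten-byte payload and the record size of 4 are made up, and an in-memory filehandle stands in for a real binary file.

```perl
use strict;
use warnings;

# Hypothetical binary payload: ten bytes, including LF and CR bytes.
my $data = "\x00\x0A\x0D\x0A\xFF\x01\x02\x03\x04\x05";

open my $fh, '<', \$data or die "open failed: $!";
binmode $fh;

my @records;
{
    local $/ = \4;    # readline now returns records of at most 4 bytes
    while (defined(my $rec = <$fh>)) {
        push @records, $rec;
    }
}
close $fh;

# Record lengths: two full records and one short final record.
print join(', ', map { length $_ } @records), "\n";
```

The LF and CR bytes inside the payload have no effect on where the records end, because $/ no longer names a separator string.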
Re^7: Error binmode() on unopened filehandle by Marshall (Canon) on May 03, 2020 at 21:17 UTC
That is interesting. I also would use read() for reading a binary file.

"binmode turns off the CRLF to LF conversion, so if you're not seeing CRLF line endings (not sure how you determined that?) then that means the source file has only LF instead of CRLF line endings."

The Perl source file is written on a Windows machine with CRLF line endings. $num_bytes is 13, which is 2 short. I am a bit perplexed about that. This:

my $data = <<EOF;
first
second
EOF

evidently deletes the <CR> characters.

Update:

use strict;
use warnings;

open (my $out, '>', "test_endings.txt") or die "$!";
print $out "first\n";
print $out "second\n";
close $out;

open (my $in, '<', "test_endings.txt") or die "$!";
binmode $in;
my $num_bytes = read ($in, my $buf, 20000);
print "bytes read = $num_bytes\n"; ## prints 15. The <CR>'s are there in bin mode
Re^8: Error binmode() on unopened filehandle by haukex (Archbishop) on May 03, 2020 at 22:52 UTC
"This: `my $data = <<EOF; first second EOF` evidently deletes the <CR> characters."

Hmm, I'm quite surprised by that, and I'm still looking for the place where that's documented. Even trying to turn off the default `:crlf` layer on Windows doesn't seem to restore the CRLFs in `$data`. In addition, even on *NIX, `eval "<<BAR\r\nx\r\ny\r\nBAR"` causes the returned value to have only `\n`'s, so it appears to be something to do with how heredocs are parsed. In fact, I've reported a bug.
Re^9: Error binmode() on unopened filehandle by Marshall (Canon) on May 07, 2020 at 04:05 UTC
Re^8: Error binmode() on unopened filehandle by AnomalousMonk (Archbishop) on May 04, 2020 at 00:38 UTC
"`my $data = <<EOF;` `...` `EOF` evidently deletes the <CR> characters."

As I understand it, in this particular case the CRs (carriage returns) are never there (in `$data`) to begin with. A here-doc is just another way to compose a string, in this case with double-quote interpolation (but that has no bearing here). Each line ends in a single `\n` (newline) character. Writing such a line to a Windoze "text"-mode (i.e., non-binmode-ed) file causes CRs to be added. This can be seen with an "ordinary" string containing newlines that is written in "text" mode and then read back `binmode`-ed:

c:\@Work\Perl\monks\Marshall>perl -wMstrict -e
"use autodie;
 ;;
 use Data::Dump qw(dd);
 ;;
 my $s = qq{first\nsecond\n};
 dd 's:', $s;
 print 'length: ', length $s, qq{\n};
 ;;
 { open my $fh, '>', 'junque';  print $fh $s;  close $fh; }
 ;;
 { open my $fh, '<', 'junque';
   binmode $fh;
   my $t = do { local $/;  <$fh>; };
   dd 't:', $t;
   print 'length: ', length $t, qq{\n};
   close $fh;
 }
 "
("s:", "first\nsecond\n")
length: 13
("t:", "first\r\nsecond\r\n")
length: 15

Give a man a fish: `<%-{-{-{-<`
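The same round trip can be reproduced on a non-Windows machine. This is a sketch under one assumption: an explicit `:crlf` layer on the output handle stands in for the Windows text-mode default. The temporary file comes from the core File::Temp module and is not part of the thread's code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $s = "first\nsecond\n";    # 13 characters, LF only

# Write through an explicit :crlf layer, which turns each "\n" into "\r\n".
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;
open my $out, '>:crlf', $path or die "open failed: $!";
print {$out} $s;
close $out or die "close failed: $!";

# Read the raw bytes back, with no layer translating them.
open my $in, '<:raw', $path or die "open failed: $!";
my $t = do { local $/; <$in> };
close $in;

printf "in memory: %d, on disk: %d\n", length $s, length $t;
```

The two extra bytes on disk are the CRs added on output; the string in memory never contained them.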
Re^9: Error binmode() on unopened filehandle by Marshall (Canon) on May 07, 2020 at 03:53 UTC
Re^6: Error binmode() on unopened filehandle by jo37 (Deacon) on May 03, 2020 at 15:52 UTC
"I guess bottom line: Don't use <> bracket when reading binary files!"

I don't see a problem with this, as long as you use `binmode` and `local $/`.

Greetings,
-jo

`$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$`
Re^6: Error binmode() on unopened filehandle by ikegami (Patriarch) on May 04, 2020 at 03:33 UTC
Naw, you just need to set `$/` appropriately. There is no real difference between

binmode $fh;
read($fh, my $buf, 20000);

and

binmode $fh;
local $/ = \20000;
my $buf = <$fh>;

Of course, both are junk. Why would you only read the first 20,000 bytes? The following make more sense:

binmode $fh;
my $file = '';
1 while read($fh, $file, 8*1024, length($file));

or

binmode $fh;
local $/;
my $file = <$fh>;
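The two whole-file idioms can be checked against each other. This is a minimal sketch in which a generated 20,000-byte string (an assumption, chosen only so that LF and CR bytes occur throughout) stands in for a real binary file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a binary file: 20,000 bytes cycling through 0..255.
my $data = join '', map { chr($_ % 256) } 0 .. 19_999;

# Chunked read: the fourth argument appends each chunk at the current end of $file.
open my $fh, '<', \$data or die "open failed: $!";
binmode $fh;
my $file = '';
1 while read($fh, $file, 8 * 1024, length $file);
close $fh;

# Slurp with readline after undefining $/.
open $fh, '<', \$data or die "open failed: $!";
binmode $fh;
my $slurp = do { local $/; <$fh> };
close $fh;

print "identical\n" if $file eq $slurp && $file eq $data;
```

The loop ends when read returns 0 at end of file; an undef return (a read error) also ends it, so production code may want to check for that separately.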
Re^7: Error binmode() on unopened filehandle by Marshall (Canon) on May 07, 2020 at 00:09 UTC
"Of course, both are junk. Why would you only read the first 20,000 bytes?"

There are a lot of scenarios where you might want to read the first part of a file without reading the whole file. I think there are some Unix file commands that read the first 1-2K of a file to determine if the file is text or binary? Perhaps I want to concatenate some big .WAV files together. There is some header info at the beginning of these files that needs to be interpreted.

In the OP's question, this is a single .jpg and there is no reason to read the file in "hunks" because the image has to be processed as a single unit. However, other scenarios do exist.

I do commend you for the choice of 8*1024 as buf size. That is a very good number with most file systems. Certain byte boundaries are important for the file system to work efficiently.
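The header-only case can be sketched like this. Everything concrete here is an assumption for illustration: the fake file contents are generated in memory, and the layout used is the usual 12-byte RIFF/WAVE preamble (a 4-byte tag, a 32-bit little-endian size, a 4-byte format tag).

```perl
use strict;
use warnings;

# Hypothetical stand-in for a large .WAV file: a 12-byte RIFF header followed
# by filler bytes. A real file would be opened by name instead.
my $wav = 'RIFF' . pack('V', 36 + 1000) . 'WAVE' . ("\0" x 1024);

open my $fh, '<', \$wav or die "open failed: $!";
binmode $fh;

# Read only the first 12 bytes; the rest is never pulled into $header.
my $got = read($fh, my $header, 12);
(defined $got and $got == 12) or die "short read";
close $fh;

# 'A4' = 4-byte tag, 'V' = unsigned 32-bit little-endian integer.
my ($riff, $size, $format) = unpack 'A4 V A4', $header;
print "$riff $format, chunk size $size\n";
```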
Re^8: Error binmode() on unopened filehandle by ikegami (Patriarch) on May 07, 2020 at 18:57 UTC
Re "There are a lot of scenarios": maybe, but the discussion at hand is about reading the entire file.

I used 8*1024 because `read` reads in 8 KiB chunks anyway.

$ perl -e'print "x" x 100_000' \
   | strace perl -e'read(\*STDIN, my $buf, 100_000)' 2>&1 \
   | grep -P 'read\(0,'
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
read(0, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 1696

But the parameter refers to the number of characters to return, which could be different than the number of bytes read if an :encoding layer is used. So really, the number I picked is nothing to praise. If you want efficiency, it's probably best to use `sysread` with a very large number and decode afterwards.
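The "`sysread` with a very large number and decode afterwards" suggestion might look roughly like this. It is a sketch under assumptions: the file is UTF-8 text written to a temporary file by the example itself, the 1 MiB request size is arbitrary, and Encode and File::Temp are core modules not mentioned in the thread.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Hypothetical test file: 1,000 copies of a 2-character string whose second
# character takes two bytes in UTF-8 (3 bytes per copy on disk).
my $text = "x\x{E9}" x 1_000;
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} encode('UTF-8', $text);
close $tmp or die "close failed: $!";

# sysread bypasses PerlIO buffering, so the handle is opened raw and yields bytes.
open my $fh, '<:raw', $path or die "open failed: $!";
my $bytes = '';
while (1) {
    my $n = sysread($fh, $bytes, 1024 * 1024, length $bytes);
    die "sysread failed: $!" unless defined $n;
    last if $n == 0;    # end of file
}
close $fh;
my $byte_count = length $bytes;

# Decode once, after all bytes are in memory.
my $chars = decode('UTF-8', $bytes);
printf "%d bytes, %d characters\n", $byte_count, length $chars;
```

The byte count and the character count differ here, which is the distinction drawn above between what `read` is asked to return and what is actually read from disk.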
Re^9: Error binmode() on unopened filehandle by Marshall (Canon) on May 08, 2020 at 22:22 UTC
Re^10: Error binmode() on unopened filehandle by ikegami (Patriarch) on May 09, 2020 at 08:27 UTC
Some notes below your chosen depth have not been shown here


	PerlMonks