PerlMonks  

sysread and null characters

by ggg (Scribe)
on Mar 24, 2005 at 04:15 UTC

ggg has asked for the wisdom of the Perl Monks concerning the following question:

The goal here is two-fold:
A) to see what the distribution of digits is in the value of PI and
B) to play with some parts of Perl that are new to me.

This is an attempt to read in the digits of PI stored in an external file. I wanted to look at how the distribution of digits in PI varies with increasingly large numbers of digits. There are some web sites that have PI to a billion places.
In keeping with goal B, I want to read in the digits one at a time, incrementing a hash as I go. The only commands I could find to read in characters individually were getc and sysread; the latter looked like the best bet. The sysread syntax is shown in line 7 and the command is in line 16, followed by 5 lines of diagnostics.
The problem is that, even though I re-declare the scalar, $digit, in each pass through the while loop, $digit is accumulating null characters: one more for each pass. The nulls come first, and the digit that was read sits at the end of the string of nulls.
Most of the code at this point is for trying to figure out what's going on. Lines 17 & 21 print the length of $digit and its contents before and after a crude attempt to remove anything from the variable that isn't a digit. The output listing shows that I haven't altered the length at all.

So, I've got two questions: Where are the nulls coming from and why can't I regex them out of the variable (at line 19)?

#!/opt/bin/perl -w

my %distr;
my $i = -1;
my $c = 1;                                       # line 5

# sysread Filehandle, Scalar, Length, Offset
open DF, "<Pi-00009" or die "File, DATA, is missing: $!";

print "# Before_Trim\tAfter_Trim\n";             # line 10
print "# Len\tdigit\tLen\tdigit\n";
# print "\tOffset\tLength\tScalar\n";

while ($i<=7){
    my $digit;                                   # line 15
    sysread DF, $digit, $c, ++$i;
    print "# ",length($digit),"\t$digit\t";
    $_ = $digit;
    s/(\d)/$1/;
    $digit = $_;                                 # line 20
    print length($digit),"\t$digit\n";
    if ($digit =~ /[\d]/){
#       print "\t",$i,"\t",$c,"\t",$digit,"\n";
        $distr{$digit} += 1;
    };                                           # line 25
}
close DF;

__OUTPUT__
# Before_Trim   After_Tri;                       # line 30
# Len   digit   Len     digit
# 1     3       1       3
# 2     .       2       .
# 3     1       3       1
# 4     4       4       4
# 5     1       5       1
# 6     5       6       5
# 7             7
# 8     9       8       9
# 9     2       9       2

My eyes are glazing over and my mind is mush. (typical programming session for me :-} ) I'll revisit this tomorrow.

TIA

ggg

Replies are listed 'Best First'.
Re: sysread and null characters
by Tanktalus (Canon) on Mar 24, 2005 at 04:26 UTC

    I'm trying to figure out why you want the "++$i" on your sysread line. Eliminate that - you'll probably be much closer to what you want.

    What's happening is that you're reading into an offset of $digit. Perl sees that $digit is empty, and prepends all the nulls to that point.

    Then you won't need to do any funkiness. Such as the next bit, which I'll pretend is still needed (it shouldn't be):

    $_ = $digit; s/(\d)/$1/; $digit = $_;
    You're assigning $digit to the global $_, then changing the digit in it to ... itself (and not changing anything else), then reassigning back. Choices:
    #1
    $_ = $digit;
    s/.*(\d).*/$1/;                      # change everything to the captured digit
    $digit = $_;
    #2
    $_ = $digit;
    m/(\d)/ and $digit = $1;             # change $digit to be the captured digit
    #3
    $digit =~ m/(\d)/ and $digit = $1;
    #4
    $digit =~ s/.*(\d).*/$1/;
    (Note the low-precedence "and" in #2 and #3: with "&&" the assignment would not parse as intended.)
    Hope that helps!
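
To see the padding described above in isolation, here is a minimal sketch. The temporary file and its contents ("314") are stand-ins invented for illustration, not the original Pi-00009 file. It reads one byte with a buffer offset of 2 into an empty scalar and prints the resulting bytes in hex.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data standing in for the Pi file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "314";
close $tmp or die "close failed: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";

# Read 1 byte and store it at offset 2 of the (empty) buffer.
# Perl pads the gap in front of it with "\0" bytes.
my $digit = '';
my $got = sysread($fh, $digit, 1, 2);
die "sysread failed: $!" unless defined $got;
close $fh;

# Show the length and each byte of the buffer in hex.
my $hex = join ' ', map { sprintf '%02x', ord } split //, $digit;
print length($digit), ": $hex\n";    # prints "3: 00 00 33"
```

The two leading 00 bytes are the nulls from the question; the final 33 is the character "3" that was actually read.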

      Your example #4 is what I had tried first, but without the ".*" parts. I'm still not sure why they're needed. What I thought I was asking for was a match with any digit no matter where it was in the string. My bad.

      At this point, after adding the ".*", it's doing just what I expected. Hurray!

      As to the need for ++$i; don't I need to change the offset each time I read the file in order to step further into the file to get the next digit? You seem to be implying that that's unneeded. Does sysread auto-increment its own offset when it's used in a loop? If I just remove the ++, I get a runaway loop, so I guess not. What did you have in mind, please?

      I could leave it just the way it is, but I'd rather understand sysread a little better.

      ggg

        Ok, maybe I should have been a wee bit more explicit on the sysread part. What you want is:

        sysread DF, $digit, $c;
        The fourth parameter to sysread is called "offset". But it's not the offset into the file, it's the offset into the buffer (in your case, $digit). All file-reading functions read from the "current file position". Always. You can change the "current file position" on physical files (using seek), but not on all filehandles (e.g., a pipe from another process, or a socket). The act of reading from (or writing to) a file handle implicitly advances the position.

        The purpose of the offset in sysread, then, is to automatically concatenate multiple reads in a single buffer. This is not what you're doing.
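
A short sketch of both behaviours, under the assumption of a small temporary file ("3.14159") invented for the example: the three-argument form steps through the file by itself because each read advances the file position, and the four-argument form appends successive reads to one buffer. The variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data standing in for the Pi file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "3.14159";
close $tmp or die "close failed: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";

# Three-argument form: every call overwrites $char with the next byte,
# because reading advances the current file position.
my @chars;
while (sysread($fh, my $char, 1)) {
    push @chars, $char;
}
print join('|', @chars), "\n";       # prints "3|.|1|4|1|5|9"

# Four-argument form: the offset is a position in the buffer, so
# successive reads are appended to what is already there.
sysseek($fh, 0, 0) or die "sysseek failed: $!";
my $buf    = '';
my $offset = 0;
while (my $n = sysread($fh, $buf, 3, $offset)) {
    $offset += $n;
}
close $fh;
print "$buf\n";                      # prints "3.14159"
```

Note that sysread returns 0 at end of file, which is what ends both loops here; a robust version would also check for an undef return, which signals an error.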

        The reason for the .* parts in a substitution (s///) operator is to have the regular expression match the whole string. This way you're replacing the whole string rather than just replacing the digit (with itself). You may want to peruse the Regular expression tutorial and/or the regular expression reference for more info here.
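
The difference can be checked on a hand-built string. This is a sketch: the padded value below is constructed by hand rather than read from a file, and the /s modifier and the tr/// variant are additions of this example, not part of the replies above.

```perl
use strict;
use warnings;

# Three null bytes followed by one digit, like $digit in the question.
my $padded = "\0\0\0" . "7";

# Matches only the digit and replaces it with itself: nothing changes.
(my $unchanged = $padded) =~ s/(\d)/$1/;

# Matches the whole string and replaces it with the captured digit.
# /s lets "." match a newline as well (an assumption added here).
(my $whole = $padded) =~ s/.*(\d).*/$1/s;

# Alternative: delete every character that is not 0-9.
(my $stripped = $padded) =~ tr/0-9//cd;

printf "%d %d %d\n", length $unchanged, length $whole, length $stripped;
# prints "4 1 1"
```

Because the leading .* is greedy, the s/// form keeps the last digit when the string holds more than one, while tr/// keeps all of them.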

Re: sysread and null characters
by BUU (Prior) on Mar 24, 2005 at 06:37 UTC
    Because I just couldn't resist:
    perl -le'$/=\1024; while(<>){ $x[$_]++ for split//}' /path/to/pi
    Note that I set $/ to a reference to 1024, which causes the <> operator to read chunks of 1024 characters at a time. Also note that I used an array to store the values instead of a hash; with only ten possible digits, an array should be faster and more space-efficient.

    Update: as frodo72 pointed out, it was indeed missing a closing curly bracket. Fixed.
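
The same idea written out as a script, as a sketch only: the temporary file and its contents are made up for the example, and the grep that skips non-digits (the decimal point and the trailing newline) is an addition of this example rather than part of the one-liner.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data standing in for /path/to/pi.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "3.14159265358979\n";
close $tmp or die "close failed: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";

my @count = (0) x 10;                # one counter per digit 0..9
{
    # A reference to an integer makes <$fh> return fixed-size records.
    local $/ = \1024;
    while (my $chunk = <$fh>) {
        # Count only the characters that are single digits.
        $count[$_]++ for grep { /\A[0-9]\z/ } split //, $chunk;
    }
}
close $fh;

print "$_: $count[$_]\n" for 0 .. 9;
```

Filtering matters because a non-numeric character such as "." would otherwise be treated as index 0 and inflate the count of zeros.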
      Some corrections and new ideas; you want to be able to output your results, right?
      perl -F// -lane 'BEGIN{$/=\1024} $x[$_]++for@F; END{print $x[$_]for 0..9}' /path/to/pi
      Too bad the -0 command switch can't take \1024. That's a cool trick.

      --
      [ e d @ h a l l e y . c c ]
