PerlMonks  

Re: =~ matches non-existent symbols

by Anonymous Monk
on Nov 17, 2014 at 20:57 UTC


in reply to =~ matches non-existent symbols

Okay, so, I slightly corrected my original code, and now it works fine.

#!/usr/bin/perl -w
use strict;

open (INPUT_FILE, "$ARGV[0]") || die "can't open file: $!";
if ( <INPUT_FILE> =~ m/[^actgACTG\s]/ ) {
    print "File contains something besides actg sequence.\n";
}
else {
    print "good!\n";
}
close INPUT_FILE;

Are there any potential problems with this code?

Re^2: =~ matches non-existent symbols
by graff (Chancellor) on Nov 17, 2014 at 22:30 UTC
    Apart from what Laurent_R mentioned, this latest version won't tell you anything about what sort of unexpected stuff is showing up in the data (my version will do that). Maybe that's not important to you in this particular process, but when I have to work with defective or unreliable input, I find that it's very helpful to be able to see what's wrong with the data.

    BTW, in case my last reply wasn't clear, here's what I was talking about:

    #!/usr/bin/perl
    use strict;
    use warnings;

    $/ = undef;  # slurp-mode for input, just in case

    while ( <> ) {          # reads stdin or all file names in ARGV
        s/\s+//g;           # remove whitespace
        my $content = $_;   # keep a working copy
        tr/ACGTacgt//d;     # remove all acgt
        if ( length() ) {   # anything left?
            print "$ARGV bad content: $_\n";
            do_something_with_bad_data( $ARGV, $content );
        }
        else {
            print "$ARGV all clean!\n";
            do_something_with_good_data( $ARGV, $content );
        }
    }

    sub do_something_with_bad_data {
        my ( $filename, $data ) = @_;
        # . . . fix it? report it to someone?
    }

    sub do_something_with_good_data {
        my ( $filename, $data ) = @_;
        # . . . whatever you want to do
    }
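    A variation on graff's idea, offered as a sketch rather than taken from the thread: instead of printing whatever is left over, report each offending character together with its offset, which can make problems easier to locate in a long sequence. The helper name report_bad_chars is hypothetical, and whitespace is treated as acceptable, as in the original test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one "offset:char"
# string for every character that is neither a nucleotide letter nor
# whitespace. Offsets are 0-based positions in the input string.
sub report_bad_chars {
    my ($seq) = @_;
    my @bad;
    while ( $seq =~ /([^ACGTacgt\s])/g ) {
        # $-[0] holds the start offset of the most recent successful match
        push @bad, $-[0] . ':' . $1;
    }
    return @bad;
}

# Example input: 'N' at offset 4 and 'x' at offset 9 (the "\n" counts too)
my @problems = report_bad_chars("ACGTNAC\nGxT");
print "bad character (offset:char) $_\n" for @problems;
```

    Because whitespace is skipped rather than removed first, the offsets refer to the data as it was read, newlines included.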
Re^2: =~ matches non-existent symbols
by Laurent_R (Canon) on Nov 17, 2014 at 22:02 UTC
    It looks like it will work, but only insofar as your file has just one long line. If your file comes with more than one line, you're in trouble, because <INPUT_FILE> in scalar context reads only the first line. I would use a loop or some other mechanism to make sure it will still work fine the day I get two or more lines. Below, I localized $/ (the input record separator) so that the whole file will be slurped into the scalar.

    As a side note, there are some commonly agreed best practices in the Perl community. Among them:

    • Use the use warnings; pragma rather than the -w flag
    • Use lexical filehandles rather than bareword filehandles
    • Use the three-argument syntax for the open function
    Putting all this together, this is a possible (untested) rewrite of your script:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $infile = shift;
    open my $INPUT_FILE, "<", $infile or die "can't open $infile: $!";
    local $/;    # the whole file will be slurped, even if it has several lines
    my $dna = <$INPUT_FILE>;
    if ( $dna =~ m/[^actg\s]/i ) {
        print "File contains something besides actg sequence.\n";
    }
    else {
        print "good!\n";
    }
    close $INPUT_FILE;
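    A quick way to see the single-line problem Laurent_R describes, sketched under the assumption that an in-memory filehandle (open on a scalar reference) is an acceptable stand-in for a real file. Both helper names are hypothetical: the first mirrors the original scalar-context read, the second slurps with a localized $/ in the same way as the rewrite.

```perl
use strict;
use warnings;

# Hypothetical helper: checks only the first record, like the original
# `<INPUT_FILE> =~ ...`, because readline in scalar context returns one line.
sub first_line_is_clean {
    my ($fh) = @_;
    my $line = <$fh>;
    $line //= '';                       # treat an empty handle as clean
    return $line !~ /[^actg\s]/i;
}

# Hypothetical helper: slurps everything by localizing $/ (undef means
# "no record separator", so one read returns the rest of the handle).
sub whole_file_is_clean {
    my ($fh) = @_;
    my $dna = do { local $/; <$fh> };
    $dna //= '';
    return $dna !~ /[^actg\s]/i;
}

my $data = "ACGTACGT\nACGXACGT\n";    # the bad 'X' is on the second line

open my $fh1, '<', \$data or die "in-memory open failed: $!";
print 'first-line check: ', ( first_line_is_clean($fh1) ? 'good' : 'bad' ), "\n";

open my $fh2, '<', \$data or die "in-memory open failed: $!";
print 'slurp check:      ', ( whole_file_is_clean($fh2) ? 'good' : 'bad' ), "\n";
```

    The first check reports "good" because it never reads the second line; the slurping check sees the whole input and reports "bad".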

Re^2: =~ matches non-existent symbols
by AnomalousMonk (Archbishop) on Nov 17, 2014 at 22:27 UTC
    if ( <INPUT_FILE> =~ m/[^actgACTG\s]/ ) {
        ...

    Are there any potential problems with this code?

    The character class \s includes ' ' (space, 0x20) and, IIRC, \t \n \r \f and other whitespace characters. Your test allows the string read from the file to contain any number of these characters, in any combination. Please see perlrecharclass.
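    To make the point concrete, here is a sketch comparing the original test with a stricter one (helper names are hypothetical). What the stricter version accepts, one or more nucleotide letters after dropping a single trailing LF or CRLF, is an assumption about the intended input format, not something stated in the thread.

```perl
use strict;
use warnings;

# The test from the original post: fails only if some character is
# neither a nucleotide letter nor whitespace.
sub passes_original_test {
    my ($s) = @_;
    return $s !~ m/[^actgACTG\s]/;
}

# Hypothetical stricter test: after dropping one trailing LF or CRLF,
# the string must consist of one or more nucleotide letters and nothing else.
sub is_strict_sequence {
    my ($s) = @_;
    $s =~ s/\r?\n\z//;
    return $s =~ m/\A[ACGTacgt]+\z/;
}

my @samples = (
    [ 'plain line',            "ACGT\n" ],
    [ 'embedded space/tab/CR', "AC GT\t\r\n" ],
    [ 'whitespace only',       "   \n" ],
    [ 'contains N',            "ACGN\n" ],
);

for my $sample (@samples) {
    my ( $label, $s ) = @$sample;
    printf "%-22s original: %-5s strict: %s\n", $label,
        ( passes_original_test($s) ? 'pass' : 'fail' ),
        ( is_strict_sequence($s)   ? 'pass' : 'fail' );
}
```

    The original test passes the embedded-whitespace and whitespace-only samples; the stricter test rejects both.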

    I must say that I don't understand your desperate, last-ditch efforts to avoid the use of chomp, for it seems very likely that the line you're reading from your file is newline-terminated (whatever a newline happens to be on your OS). Here's how I might handle the file-read-and-validate portion of your program (untested):

    use warnings;
    use strict;

    die "no filename given" unless @ARGV;
    my $filename = $ARGV[0];

    open my $fh_input, '<', $filename or die "opening '$filename': $!";
    my @lines = <$fh_input>;
    die "no lines read from '$filename': $!" unless @lines;
    close $fh_input or die "closing '$filename': $!";

    chomp @lines;
    die "more than one line in '$filename'" unless @lines == 1;
    my $line = $lines[0];

    die "'$filename' contains something other than ACTG sequence"
        if $line =~ m{ [^actgACTG] }xms;

    my $result = do_something_with($line);
    print "result is: '$result'\n";

    exit;

    sub do_something_with { ... }
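    One caveat related to the "whatever a newline happens to be" remark: chomp removes only the current value of $/ ("\n" by default). A file with Windows-style CRLF line endings read on a Unix-like system therefore keeps its "\r" after chomp, and a check like m{ [^actgACTG] }xms would then reject the line. The sketch below (helper names hypothetical) contrasts chomp with a regex that strips all trailing whitespace; which behaviour you want depends on how strict you mean to be about the input.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp removes a trailing $/ ("\n" by default) and nothing else
sub chomped {
    my ($s) = @_;
    chomp $s;
    return $s;
}

# Hypothetical helper: strips any trailing whitespace, including a stray "\r"
sub trimmed {
    my ($s) = @_;
    $s =~ s/\s+\z//;
    return $s;
}

my $line = "ACGTACGT\r\n";    # a CRLF-terminated line as seen on a Unix-like system

printf "after chomp: %d chars, after trim: %d chars\n",
    length( chomped($line) ), length( trimmed($line) );
```

    For the sample line this prints 9 characters after chomp (the "\r" survives) and 8 after the regex trim.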
