Re: Argument "" isn't numeric in numeric le (<=)

by gone2015 (Deacon)
on Aug 24, 2009 at 15:41 UTC

in reply to Argument "" isn't numeric in numeric le (<=)

I do not understand how the line consisting of dashes enters the if in which I (should only) have two numeric values.

The problem is that the regular expression in:

if($line2=~ m/(\d*)\s*([A-Z]*)\s*(\d*)-(\d*)/)
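accepts anything that has at least one '-' in it. Remember that \d* accepts zero or more digits, \s* accepts zero or more white-space characters, and so on. So this regex will match zero or more digits, followed by zero or more white-space characters, followed by zero or more [A-Z] characters, etc. The only thing that must be present for a match is a '-'. On a line of dashes every capture group matches the empty string, so any numeric comparison on the captured values (for example $3 <= $4) compares "" with "", and under warnings that is exactly what produces Argument "" isn't numeric in numeric le (<=).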
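To see this concretely, here is a small sketch that runs the original pattern over two invented sample lines (the real input file is not shown in the thread, and parse_loose is a hypothetical helper name). The dashed line matches, and all four captures come back as empty strings.

```perl
use strict;
use warnings;

# Loose pattern from the original post: every group uses '*', so each may be empty.
# Returns the list of captures, or an empty list if the line does not match.
sub parse_loose {
    my ($line) = @_;
    return $line =~ m/(\d*)\s*([A-Z]*)\s*(\d*)-(\d*)/;
}

# Hypothetical sample lines: one data line and one separator line.
for my $line ("12 ABC 100-200", "----------") {
    my @f = parse_loose($line);
    print "'$line' -> ", (@f ? join(', ', map { "'$_'" } @f) : 'no match'), "\n";
}
```

The first line yields '12', 'ABC', '100', '200'; the dashed line yields four '' values, which is what later trips the numeric comparison.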

I suspect that you would be perfectly happy with:

if($line2=~ m/(\d+)\s*([A-Z]+)\s*(\d+)-(\d+)/)
where you are using the regex to do two things at once: (a) recognise the lines that contain the information you wish to process further, and (b) parse those lines to extract that information.
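A minimal sketch of the stricter pattern doing both jobs, again on invented sample lines; parse_strict and the field names $id, $name, $start and $end are hypothetical, chosen only for illustration. The match either fails (the line is not recognised) or hands back the fields (the line is parsed).

```perl
use strict;
use warnings;

# Stricter pattern: '+' requires at least one digit or letter in each group.
# Returns the four captures for a data line, or an empty list otherwise.
sub parse_strict {
    my ($line) = @_;
    my ($id, $name, $start, $end) = $line =~ m/(\d+)\s*([A-Z]+)\s*(\d+)-(\d+)/
        or return;                        # (a) recognise: not a data line
    return ($id, $name, $start, $end);    # (b) parse: the extracted fields
}

# Hypothetical sample lines.
for my $line ("12 ABC 100-200", "----------") {
    my @f = parse_strict($line);
    print "'$line' -> ", (@f ? join(', ', @f) : 'rejected'), "\n";
}
```

With '+' the dashed line no longer matches, so it never reaches the numeric comparison.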

If you are supremely confident that (a) the file you are processing always contains correctly formed lines, and (b) that your code recognises those correctly formed lines, then all will be well. In general I think it is wise to check the lines that are being rejected by the regex and warn about any whose format is not recognised. It's extra work to start with, but can save your bacon if some huge file at some future date contains broken data or stuff in a form you haven't catered for.
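One way to act on that advice is sketched below. It assumes a hypothetical input layout ("Gene:" headers, data lines, "---" separators), reads it from an in-memory string so the example is self-contained, skips the line types it knows about, and warns with the line number ($.) about anything else instead of silently dropping it.

```perl
use strict;
use warnings;

# Hypothetical input: the real file format is not shown in the thread.
my $data = <<'END';
Gene: ABC1
12 ABC 100-200
13 XYZ 300-350
---
12 abc 100 to 200
END

my (@records, @rejected);
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    # Known non-data lines: blank, separator, gene header.
    next if $line =~ /^\s*$/ or $line =~ /^---/ or $line =~ /^Gene:/;
    if (my @f = $line =~ m/(\d+)\s*([A-Z]+)\s*(\d+)-(\d+)/) {
        push @records, [ @f ];                              # recognised and parsed
    }
    else {
        warn "line $.: unrecognised format: '$line'\n";    # report, don't silently skip
        push @rejected, $line;
    }
}
close $fh;
```

Here the last line is reported as unrecognised (lower-case name, "to" instead of '-'), which is the kind of surprise that is much cheaper to catch early.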

Re^2: Argument "" isn't numeric in numeric le (<=)
by hotel (Beadle) on Aug 24, 2009 at 17:53 UTC
    Thank you for your comments and replies. I fixed the code after the first message by changing the * to +, and it works fine.

    PS: oshall, thank you for your advice. I try to follow most of it when I'm dealing with large files.

    But I still do not understand why Perl issues this warning for the dashed lines, which should not even get into the loop where the comparison takes place, rather than pointing to the lines that actually cause the problem.

      That's a different question...

      ...the while(substr($line2, 0 , 3) ne "---") loop will certainly stop when $line2 is dashed. However, the condition is tested before the body runs: you enter the loop with $line2 set to the "Gene:" line you just processed, and the next line is read at the top of the body. So a dashed line is read and processed inside the loop (which is where the warning comes from), and only then, when the condition is tested again, does it bring the loop to a halt.
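The sketch below reconstructs only the loop shape described above, not the original poster's full code: a lexical handle $geneREAD stands in for the bareword geneREAD, and the sample data is invented. It records every line the body handles, which shows the dashed line being processed before the condition stops the loop.

```perl
use strict;
use warnings;

# Invented sample data: one data line, the dashed separator, then the next gene.
my $data = "12 ABC 100-200\n---------\nGene: NEXT\n";
open my $geneREAD, '<', \$data or die "cannot open in-memory file: $!";

my @processed;                              # every line the loop body handled
my $line2 = "Gene: ABC1";                   # the "Gene:" line the outer code just handled
while (substr($line2, 0, 3) ne "---") {     # condition is tested BEFORE the next read
    defined($line2 = <$geneREAD>) or last;  # stop at end of input
    chomp $line2;
    push @processed, $line2;                # the body runs on the dashed line too
}
close $geneREAD;

print "processed: $_\n" for @processed;
```

Both "12 ABC 100-200" and "---------" end up in @processed; in the real code, the second one is the line that reached the comparison.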

      The inner loop could be recast:

      while ($line2 = <geneREAD>) {
          chomp $line2;
          if (substr($line2, 0, 3) eq "---") { last; }   # Perl uses 'last', not 'break', to leave a loop
          # .... process $line2 ....
      }
      ...mind you, you might want to check that the "Gene:" line is actually followed by some data lines? But that is part of the general problem of verifying the input.
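A possible sketch of that check, combining the read-then-test inner loop with a count of data lines per gene. The input layout, the gene names and the %count hash are all hypothetical; the point is only that a "Gene:" header immediately followed by the separator gets reported.

```perl
use strict;
use warnings;

# Hypothetical input: the second gene has no data lines before its separator.
my $data = <<'END';
Gene: ABC1
12 ABC 100-200
---
Gene: XYZ2
---
END

open my $geneREAD, '<', \$data or die "cannot open in-memory file: $!";
my %count;                                       # data lines seen per gene
while (my $line = <$geneREAD>) {
    chomp $line;
    next unless $line =~ /^Gene:\s*(\S+)/;       # outer loop: look for a "Gene:" header
    my $gene = $1;
    $count{$gene} = 0;
    while (my $line2 = <$geneREAD>) {            # inner loop: read first, then test
        chomp $line2;
        last if substr($line2, 0, 3) eq '---';   # separator ends this gene's block
        $count{$gene}++;                         # ... process $line2 here ...
    }
    warn "Gene $gene has no data lines\n" if $count{$gene} == 0;
}
close $geneREAD;
```

Because the separator is tested right after it is read, it is never handed to the processing code, and the empty XYZ2 block produces a warning instead of passing unnoticed.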