parsing file/regex question

smackdab has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: parsing file/regex question by tadman (Prior) on Oct 23, 2003 at 20:40 UTC
Are those `\n` characters supposed to be newlines? Try this: `while (<DATA>) { s/\\n/\n/g; print "yep\n" if /$PRE($VALID1+)$PST/; }` Two yeps.
Bit more help on this? Re: Re: parsing file/regex question by smackdab (Pilgrim) on Oct 23, 2003 at 21:31 UTC
Thanks, that does make sense, but it does "break" the data-driven approach I am trying to come up with... I have expanded the example and maybe someone will come up with a different idea... if not, I'll do it the way suggested ;-)

    $PRE    = '\[\s(';
    $VALID1 = '[-a-zA-Z0-9_. \t\n]';
    $VALID2 = '[-a-z0-9_.\n]';
    $VALID3 = '[a-zA-Z]';
    $VALID4 = '[-a-zA-Z0-9]';
    $PST    = ')\s\]';

    while (<DATA>) {
        s/\\n/\n/g; #Are these harmless if
        s/\\t/\t/g; #not needed???
        print "yep\n" if m/$PRE($VALID1+)$PST\s*
                           $PRE($VALID2+)$PST\s*
                           $PRE($VALID3+)$PST\s*
                           $PRE($VALID4+)$PST\s*
                          /x;
    }
    __DATA__
    [TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST\tDATA ]\n
    [TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST DATA ]\n
    [TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST\tDATA ]\n
One more q: on: parsing file/regex question by smackdab (Pilgrim) on Oct 23, 2003 at 23:25 UTC
Thanks for all of the help on this so far... I took the suggestions and expanded the sample program to see if that makes a difference. I am hoping to get this as data driven as possible to reduce errors (especially when I cut-n-paste ;-) I am looking to process some lines in a file and validate text (I am not yet using Taint, but will at some point ;) My problem is how to validate `\n` or `\t`, as sometimes it is allowed in the text field. The following code should work, but I just want to make sure that the `s/\\t/\t/g;` (and the other ones that I might need) is the best way to go. Thanks again for any help!!!!

    $PRE    = '\[\s';
    $VALID1 = '[-a-zA-Z0-9_. \t\n]';
    $VALID2 = '[-a-z0-9_.\n]';
    $VALID3 = '[a-zA-Z]';
    $VALID4 = '[-a-zA-Z0-9]';
    $PST    = '\s\]';

    while (<DATA>) {
        s/\\n/\n/g; #Are these harmless if
        s/\\t/\t/g; #not needed???
        print "yep\n" if m/$PRE($VALID1+)$PST
                           $PRE($VALID2+)$PST
                           $PRE($VALID3+)$PST
                           $PRE($VALID4+)$PST
                          /ox;
    }
    __DATA__
    [TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST DATA ]\n
    [TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST\tDATA ]\n
    [TEST \n DATA] [ TEST DATA ] [ 2345423 ] [ TEST DATA ]\n
Re: One more q: on: parsing file/regex question by graff (Chancellor) on Oct 24, 2003 at 03:35 UTC
You didn't say which (if any) of the three data records is supposed to yield "yep"... it looks like none of them will, because $VALID3 specifies letters only, and all three data lines have only digits in the third field. Also, for any of them to match, $PST should include "\s" after the close bracket, as well as before it (or maybe this should be added before the open bracket in $PRE).

You do have the right notion for converting a literal (two character) '\n' or '\t' into the corresponding whitespace character. Note that some portions of your regexes can be simplified: `[a-zA-Z0-9_]` is really just "\w", and if you want to match space, newline and tab, you might as well just use "\s".

Are $VALID1 and $VALID2 really supposed to accept periods and hyphens, as well as alphanumerics and whitespace? (Just checking... sometimes people tend to make the mistake of putting "." inside of square brackets when they really have something else in mind.)
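A minimal sketch of the data-driven idea with the adjustments above folded in: one character class per field, `\s*` after each closing bracket, and `\w` in place of `a-zA-Z0-9_`. The `is_valid` helper and the sample records are hypothetical; they were chosen so that the first record satisfies every class (including a space after `[` and before `]`) and the second fails on the letters-only third field.

```perl
use strict;
use warnings;

# One character class per field, in record order. These mirror the thread's
# $VALID1..$VALID4, with \w standing in for a-zA-Z0-9_ in the first class.
my @classes = (
    '[-\w. \t\n]',
    '[-a-z0-9_.\n]',
    '[a-zA-Z]',
    '[-a-zA-Z0-9]',
);

# Each field looks like "[ value ]"; the trailing \s* absorbs whatever
# whitespace separates it from the next field.
my $record    = join '', map { '\[\s(' . $_ . '+)\s\]\s*' } @classes;
my $record_re = qr/\A$record\z/;

# Hypothetical helper: unescape the two-character sequences, then validate.
sub is_valid {
    my ($line) = @_;
    $line =~ s/\\n/\n/g;
    $line =~ s/\\t/\t/g;
    return $line =~ $record_re ? 1 : 0;
}

# Hypothetical records (single-quoted, so \n and \t stay two characters).
my @records = (
    '[ TEST \n DATA ] [ test.data ] [ abc ] [ X-1 ]\n',
    '[ TEST DATA ] [ test ] [ 2345423 ] [ X ]',
);
print is_valid($_) ? "yep\n" : "nope\n" for @records;
```

Adding or reordering a field then only means editing `@classes`, which is the kind of cut-n-paste safety the question was after.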
Re: parsing file/regex question by tcf22 (Priest) on Oct 23, 2003 at 20:49 UTC
I'm assuming that the '\n' in DATA are actually new lines. Maybe something like this is what you want: `my $re = qr/\[\s([-a-zA-Z0-9_.\s]+)\s*\]/; my ($last); while (<DATA>) { chomp; $_ = $last.$_ if($last); if (/$re/){ print "yep\n"; $last = ''; }elsif(substr($_, -1, 1) ne ']'){ $last = $_; }else{ $last = ''; } } __DATA__ [TEST DATA] [ TEST DATA ]` - Tom
Re: parsing file/regex question by TomDLux (Vicar) on Oct 23, 2003 at 21:05 UTC
The regular expression doesn't change, so you should notify Perl that it's safe to compile it once, rather than each time that line is encountered. For two invocations, it obviously doesn't matter, but small scripts have a habit of growing, so you might as well get off to a good start right away: `if /$PRE($VALID1+)$PST/o` You parenthesize `$VALID1` twice: once with the last character of `$PRE` and the first character of `$PST`, then explicitly when you string them together: `/$PRE($VALID1+)$PST/`. -- `TTTATCGGTCGTTATATAGATGTTTGCA`
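A small sketch of the double parenthesization, assuming `$PRE` and `$PST` carry their own parentheses as in the expanded example in this thread. The `captures` helper is hypothetical; it only returns the two capture groups so that the duplication is visible.

```perl
use strict;
use warnings;

# $PRE ends with "(" and $PST starts with ")", so wrapping $VALID1+ in
# another pair of parentheses nests one capture group inside the other.
my $PRE    = '\[\s(';
my $VALID1 = '[-a-zA-Z0-9_. \t\n]';
my $PST    = ')\s\]';

# Hypothetical helper: return both capture groups, or an empty list.
sub captures {
    my ($text) = @_;
    return $text =~ /$PRE($VALID1+)$PST/ ? ($1, $2) : ();
}

my @got = captures('[ TEST DATA ]');
print "1=<$got[0]> 2=<$got[1]>\n";   # both groups hold the same text
```

Dropping either pair of parentheses leaves a single capture in `$1`.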
Re: Re: parsing file/regex question by Anonymous Monk on Oct 24, 2003 at 08:36 UTC
/o must go (look around for many threads about the /o bug ...)
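One common alternative to `/o` is to build the pattern once with `qr//` and reuse the compiled object. A minimal sketch, assuming the un-parenthesized `$PRE`/`$PST` from the later post; the sample lines and the `check` helper are hypothetical.

```perl
use strict;
use warnings;

my $PRE    = '\[\s';
my $VALID1 = '[-a-zA-Z0-9_. \t\n]';
my $PST    = '\s\]';

# Compiled once, outside the loop; no /o modifier involved.
my $field_re = qr/$PRE($VALID1+)$PST/;

# Hypothetical helper: unescape a literal \n, then test the line.
sub check {
    my ($line) = @_;
    $line =~ s/\\n/\n/g;
    return $line =~ $field_re ? "yep" : "nope";
}

# Hypothetical sample lines (single-quoted, so \n stays two characters).
print check($_), "\n" for '[ TEST \n DATA ]', '[ TEST DATA ]', '[TEST DATA]';
```

Unlike `/o`, the `qr//` object can simply be rebuilt if `$PRE`, `$VALID1`, or `$PST` ever needs to change.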

