PerlMonks
RegEx to match at least one non-adjacent term
by Cefu (Beadle) on Dec 07, 2007 at 15:44 UTC ( [id://655675]=perlquestion )
Cefu has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I apologize in advance for this being a RegEx rather than a Perl-specific question.

I'm trying to clean up a list of data entered in a free-text field (who needs validation anyway). For the most part the data consists of one or more numbers (which I want to keep) and sometimes color "words" which might appear before, after or between the numbers. I want to discard some specific colors but not other colors or other text.

I'm trying to craft a regex to match the following and remove it:

So my current regex substitution looks something like:

    s/\s*\(?\s*re?d?\s*\)?\s*//gi
This seemed to be working flawlessly until my spot checks revealed the following humorous example:
    12345 Gray 6789 Red => 12345 Gay 6789

To avoid workplace embarrassment I thought it best to make sure that the bit I was removing occurred either just before or just after a number rather than in the middle of other text. So my thought is to modify the regex to something like:
    s/(\d?)\s*\(?\s*re?d?\s*\)?\s*(\d?)//gi

The problem is that I can't leave both digits optional (as shown) or I'm still in the same boat. I also can't make either one mandatory or I'm dictating a before-number-only or after-number-only match. What I really want is one or the other (or both) but not neither.

As you might guess from the parens around the digits, I also considered checking what matched in the second part and substituting back in the original if I didn't see a digit. However, I ran into various problems ($1 being undefined, the A?B:C syntax not working inside the regex, etc.)

So, is there some nice way to do this in a single regex? Can I somehow ask that the regex match one or more of two disjointed parts?

Thanks,

Update: Found a solution that almost matches my requirements and is actually better for what I needed. I'd downvote myself for not thinking this through first if I could. :)

    s/(^|\d)\s*\(?\s*re?d?\s*\)?\s*($|\d)/$1$2/gi

While messing around with getting a conditional to work on the right side of the substitution I noticed that it happily substituted no characters when $1 or $2 were undefined. It does the same, without complaint, when they match the anchors ^ and $. So, rather than shoot for some weird hybrid of optional and mandatory, I decided to make them both mandatory but with palatable alternatives (the anchors).
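For anyone who wants to check the difference, the original substitution and the updated one can be run side by side. This is only a minimal sketch: the sub names `strip_naive` and `strip_anchored` are made up for illustration, and the regexes are copied from the post unchanged.

```perl
use strict;
use warnings;

# Original substitution: removes "r", "re", "rd" or "red" (any case)
# wherever it appears, even inside another word.
sub strip_naive {
    my ($s) = @_;
    $s =~ s/\s*\(?\s*re?d?\s*\)?\s*//gi;
    return $s;
}

# Updated substitution: the colour must sit between a digit or the start
# of the string on the left and a digit or the end of the string on the
# right. Whatever the two groups captured is put back.
sub strip_anchored {
    my ($s) = @_;
    $s =~ s/(^|\d)\s*\(?\s*re?d?\s*\)?\s*($|\d)/$1$2/gi;
    return $s;
}

my $input = '12345 Gray 6789 Red';
print strip_naive($input),    "\n";   # 12345 Gay 6789
print strip_anchored($input), "\n";   # 12345 Gray 6789
```

Both capture groups always participate in a successful match (the anchors match the empty string), so `$1` and `$2` are defined and no "uninitialized value" warning is raised under `use warnings`.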
Where this differs from my requirements is that it will not match a Red-like word if there is a digit on one side and more text (rather than the beginning or end of the string) on the other. So, for example, with my new regex:

    (Red) 123 Reddish-Orange 456 Orange 789 => 123 Reddish-Orange 456 Orange 789

Whereas if someone had come up with a way to get the behavior I asked for above it would have done this:

    (Red) 123 Reddish-Orange 456 Orange 789 => 123 dish-Orange 456 Orange 789

The results of my number-or-edge-of-string before and after suit me better. Requirements translate into code or code translates into requirements..... I never can seem to remember how that's supposed to go. :)
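For comparison, the behavior originally asked for (a digit on at least one side, not necessarily both) could probably be written as an alternation of a lookbehind and a lookahead. The sketch below is one possible approach and not something taken from the thread. The sub name `strip_near_digit`, the choice to replace each match with a single space, and the final trim are all assumptions made here, so that numbers on either side of a removed color do not run together.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a Red-like word when a digit precedes it
# OR follows it (optional whitespace and parentheses in between).
sub strip_near_digit {
    my ($s) = @_;

    # Same colour pattern as in the post, compiled once, case-insensitive.
    my $color = qr/\s*\(?\s*re?d?\s*\)?\s*/i;

    # First branch: a digit immediately before the match.
    # Second branch: a digit immediately after the match.
    # Lookarounds do not consume the digits, so nothing is put back.
    $s =~ s/(?<=\d)$color|$color(?=\d)/ /g;

    # Assumption: tidy up the space left at either end of the string.
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

print strip_near_digit('(Red) 123 Reddish-Orange 456 Orange 789'), "\n";
# 123 dish-Orange 456 Orange 789
print strip_near_digit('12345 Red 6789'), "\n";
# 12345 6789
```

This reproduces the "dish-Orange" result above, which shows why the number-or-edge-of-string version is the safer fit for this data: any word that begins or ends with an "r" next to a number would be clipped as well.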