PerlMonks  

Regexp - match if not between [ ]

by Anonymous Monk
on May 30, 2011 at 13:51 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there all,

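I need to split a string at dots (.) that are not between square braces, or not immediately after two patterns ( Cf. or \w\d+.\d+ pattern, like A432.23 ). In other words, a dot should be a split point only if it is outside square brackets and is not part of Cf. or of such a reference.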

Given the sample 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better', the split should happen only at the '23. The' substring. This sample covers the typical cases I have encountered.

Any idea is appreciated.

salmonix

Re: Regexp - match if not between [ ]
by moritz (Cardinal) on May 30, 2011 at 14:02 UTC
    I need to split a string at dots (.) that are not between square braces

    Sounds like a task for Text::CSV, with brackets as delimiters and the dot as the separator.

    or not immediately after two patterns ( Cf. or \w\d+.\d+ pattern, like A432.23 ).

    Post-process the output from Text::CSV, and join two adjacent columns if the first of them ends in one of the patterns.
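A minimal sketch of this two-step idea in plain Perl. As far as I know, Text::CSV opens and closes a quoted field with the same quote_char, so the asymmetric [ and ] don't map onto it directly; the first step below is therefore a hand-rolled character scan instead. Both sub names (split_outside_brackets, join_refs) are hypothetical, and the sketch assumes brackets are balanced and never nested.

```perl
use strict;
use warnings;

# Step 1 (hypothetical helper): split at dots that are not inside [ ... ].
# Assumes brackets are balanced and never nested.
sub split_outside_brackets {
    my ($s) = @_;
    my @fields = ('');
    my $depth  = 0;
    for my $ch (split //, $s) {
        if    ($ch eq '[') { $depth++ }
        elsif ($ch eq ']') { $depth-- if $depth > 0 }
        if ($ch eq '.' && $depth == 0) {
            push @fields, '';        # an unbracketed dot starts a new field
        }
        else {
            $fields[-1] .= $ch;      # everything else, brackets included, is kept
        }
    }
    return @fields;
}

# Step 2 (hypothetical helper): glue a field back onto the previous one
# when the dot between them belonged to "Cf." or to a reference like A423.23.
sub join_refs {
    my @in  = @_;
    my @out = (shift @in);
    for my $f (@in) {
        if ($out[-1] =~ /\bCf\z/i
            or ($out[-1] =~ /[[:alpha:]]\d+\z/ and $f =~ /\A\d/)) {
            $out[-1] .= ".$f";       # put the dot back
        }
        else {
            push @out, $f;
        }
    }
    return @out;
}

my $s = 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better';
print "'$_'\n" for join_refs(split_outside_brackets($s));
```

On the sample this yields the same two pieces as the regex answers: everything up to A423.23, then ' The swallow was even better'.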

Re: Regexp - match if not between [ ]
by JavaFan (Canon) on May 30, 2011 at 14:08 UTC
    Something like this (untested):
    my @chunks = /[^C\[.]*(?:(?:Cf\.|C(?!f)|\[[^]]*\])[^C\[.]*)*/g;

      Exactly, thanks.
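JavaFan's pattern protects bracketed groups and Cf. but not references such as A423.23, and because every part of it is optional it can also return empty strings. A minimal sketch of the same match-the-pieces idea, assuming brackets never nest and that a reference is a letter, some digits, a dot and more digits; $atom is just an illustrative name.

```perl
use strict;
use warnings;

my $s = 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better';

# A chunk is a run of "atoms". Protected things are atoms as a whole,
# so their inner dots are consumed; a bare dot is never an atom and
# therefore ends the chunk.
my $atom = qr{
      \[ [^\]]* \]                    # a bracketed group, not nested
    | \bCf\.                          # the abbreviation Cf.
    | [[:alpha:]] \d+ (?: \. \d+ )+   # a reference like A423.23
    | [^.]                            # any single non-dot character
}x;

my @chunks = $s =~ /((?:$atom)+)/g;
print "'$_'\n" for @chunks;
```

Because (?:$atom)+ needs at least one atom, every returned chunk is non-empty, and the unprotected dots simply fall between matches.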

Re: Regexp - match if not between [ ]
by AnomalousMonk (Archbishop) on May 30, 2011 at 15:34 UTC

    Using Text::CSV would probably be better. However, this regex approach, while more verbose, is perhaps more maintainable. It needs Perl 5.10+ for the special backtracking control verbs (*SKIP) and (*FAIL). (I've made a guess at the proper regex for an A423.23 thingy.)

    >perl -wMstrict -le "my $s = 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The ' . 'swallow was even better'; print qq{''$s''}; ;; my $parens = qr{ \[ [^]]* \] }xms; my $cf = qr{ (?i) cf \. }xms; my $ref = qr{ [[:alpha:]]+ \d+ (?: \. \d+)+ }xms; my $splitter = qr{ (?: $parens | $cf | $ref) (*SKIP)(*FAIL) | \. }xms; ;; my @ra = split $splitter, $s; print qq{'$_'} for @ra; "
    ''The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better''
    'The fox did it[ at 12.23 ] well, Cf. 23 A423.23'
    ' The swallow was even better'

      Thanks, all. Very refreshing.

      salmonix
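AnomalousMonk's splitter generalizes nicely if the protected patterns are kept in a list. A sketch assuming Perl 5.10+; split_unprotected is a hypothetical helper name, not a library function, and it assumes at least one protected pattern is passed in.

```perl
use strict;
use warnings;
use 5.010;    # (*SKIP) and (*FAIL) need Perl 5.10 or later

# Hypothetical helper: split $s at dots, except where a dot falls inside
# a match of any of the @protected patterns (at least one is required).
sub split_unprotected {
    my ($s, @protected) = @_;
    my $skip = join '|', @protected;
    # A protected chunk is matched, then (*SKIP)(*FAIL) rejects the match
    # and restarts the search just after that chunk, so no dot inside it
    # is ever tried as a split point. A bare \. is the real separator.
    my $splitter = qr{ (?: $skip ) (*SKIP)(*FAIL) | \. }x;
    return split $splitter, $s;
}

my @protected = (
    qr{ \[ [^\]]* \] }x,                      # [ ... ], not nested
    qr{ (?i) cf \. }x,                        # Cf. in any case
    qr{ [[:alpha:]]+ \d+ (?: \. \d+ )+ }x,    # A423.23 and the like
);

my $s = 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better';
print "'$_'\n" for split_unprotected($s, @protected);
```

Adding another exception, such as a further abbreviation, then only means adding one more qr// to @protected.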

Re: Regexp - match if not between [ ]
by BrowserUk (Patriarch) on May 30, 2011 at 14:18 UTC

    If your \w\d+ pattern can be replaced by \w\d{3}, then this seems to work:

    $s = 'The fox did it[ at 12.23 ] well, Cf. 23 A423.23. The swallow was even better,';;
    print for split '(?<!Cf)(?<!\w\d{3})\.(?![^\]]+])', $s;;
    The fox did it[ at 12.23 ] well, Cf. 23 A423.23
     The swallow was even better,

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Look-behind is problematic because the number of digits etc. is not fixed. But thanks; I was really thinking in the wrong direction.

        Look-behind is problematic because the number of digits etc. is not fixed.

        Look-behinds can still handle the task, but it gets pretty unwieldy if the width varies by more than a few characters:

        print for split m[ (?<! Cf ) (?: (?<! \w\d\d\d ) | (?<! \w\d\d ) | (?<! \w\d ) ) \. (?! [^\]]+ \] ) ]x, $s;;
        The fox did it[ at 12.23 ] well, Cf. 23 A423.23
         The swallow was even better,

        But it sounds like you've settled on a solution.
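Two notes on the look-around versions, offered as a sketch rather than a drop-in replacement. First, the lookahead (?![^\]]+]) only asks whether a ] appears later with no other ] in between, so a dot that comes before a later [ ... ] group is treated as bracketed too; stopping the scan at [ as well avoids that, assuming brackets never nest. Second, the variable-width look-behind can be sidestepped by looking ahead instead: the dot inside a reference like A423.23 is followed by a digit, while the dot ending the sentence is not. That assumption also protects any other digit.digit such as 3.14, which may or may not be wanted. The "Intro." and B12345.6 test text is made up for illustration.

```perl
use strict;
use warnings;

# Split at a dot unless it:
#   - ends "Cf" (the abbreviation Cf.),
#   - is directly followed by a digit (the dot inside A423.23 or A4.5),
#   - or sits inside [ ... ]: a ] follows with no [ before it.
my $splitter = qr{
    (?<! Cf )
    \.
    (?! \d )
    (?! [^\[\]]* \] )
}x;

my @pieces = split $splitter,
    'Intro. The fox did it[ at 12.23 ] well, Cf. 23 A4.5 and B12345.6. The swallow was even better';
print "'$_'\n" for @pieces;

# For comparison: with the lookahead as posted, the ] of the later
# [ ... ] group keeps "Intro." from being split.
my @posted = split /(?<!Cf)(?<!\w\d{3})\.(?![^\]]+])/,
    'Intro. The fox did it[ at 12.23 ] well.';
print "'$_'\n" for @posted;
```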


Node Type: perlquestion [id://907316]
Approved by philipbailey