PerlMonks  

Re^2: How to enforce match priority irrespective of string position

by Polyglot (Chaplain)
on Mar 07, 2021 at 12:24 UTC ( [id://11129259]=note )


in reply to Re: How to enforce match priority irrespective of string position
in thread How to enforce match priority irrespective of string position

I am, of course, dealing with some exceptions in a body of text. The text has some irregularities, but it could be parsed correctly if only I could impose a strict ordering of match priority. It isn't an issue of quotes, nor is nesting involved; it's actually an issue of some potential "false positives" that must initially be skipped in favor of a better match, unless that better match cannot be found, in which case the "false positive" might be the correct match. Does this make sense?

Blessings,

~Polyglot~

Replies are listed 'Best First'.
Re^3: How to enforce match priority irrespective of string position
by haukex (Archbishop) on Mar 07, 2021 at 12:59 UTC

      I sure was hoping someone would be able to suggest a regexp secret that I had not yet learned. I was hoping there would be some way of doing this. I may have to just pre-parse looking for the false positives, and exchange them temporarily for a marker of some sort before parsing a second time. I'm not even sure if that would work. I'll have to ponder that some more. I need to be able to reorder the sentences following a specific ruleset and in a specific order, by order of appearance in the sentence.

      Sigh. Too bad regex can't do everything!

      Blessings,

      ~Polyglot~
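The pre-parse idea described above (swap the false positives for a temporary marker, parse, then swap them back) could look roughly like the following. This is only a minimal sketch under stated assumptions: the thread never says what the real false positives are, so abbreviations stand in for them here, and `split_with_placeholders`, `@ABBREVIATIONS`, and the NUL-delimited marker are all hypothetical names and choices made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the "false positives": abbreviations whose
# periods must not be taken as sentence ends.
my @ABBREVIATIONS = ( 'Dr.', 'e.g.', 'i.e.' );
my $ABBREV_RE     = join '|', map { quotemeta($_) } @ABBREVIATIONS;

sub split_with_placeholders {
    my ($text) = @_;
    my @protected;

    # Pass 1: swap each false positive for a NUL-delimited index marker.
    # Assumes the input never contains NUL characters.
    $text =~ s{\b($ABBREV_RE)}{
        push @protected, $1;
        "\x00" . $#protected . "\x00";
    }ge;

    # Pass 2: parse the masked text; only real sentence ends remain visible.
    my @sentences = split /(?<=\.)\s+/, $text;

    # Pass 3: put the original text back in place of each marker.
    s/\x00(\d+)\x00/$protected[$1]/g for @sentences;

    return @sentences;
}
```

Called on `'Dr. Smith arrived. He brought tools, e.g. a hammer. Done.'`, the sketch splits only at the three real sentence ends. The same three-pass shape should carry over to other kinds of false positive, as long as a marker can be chosen that never occurs in the input.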

        I was hoping there would be some way of doing this. ... Sigh. Too bad regex can't do everything!

        Be aware of the "if all you have is a hammer, everything looks like a nail" effect. Doing everything in a single regex is nice, but shouldn't be a requirement - sometimes, things can be expressed much more cleanly with a few regexes and some code. And be aware of premature optimization as well - sure, oftentimes a single regex is faster than multiple, but usually it's better to get things working first instead of trying to bend over backwards and trying to wrap your head around a complex regex. Especially in the case you describe, IMHO the brainpower is much better spent on writing up test cases first!

        use warnings;
        use strict;
        use Test::More;

        sub my_sentence_splitter {
            my $input = shift;
            my @output;
            # ... magic ...
            return \@output;
        }

        is_deeply my_sentence_splitter(<<END),
        I'm looking for the end of a sentence, where possible. However, in some cases, I'll need to go with a non-conventional "end" to it, such as:
        "Here's a quote by a famous person which is supposed to exceed forty words and is therefore required to be set apart as a separate, indented paragraph per APA style." (Famous, 1999)
        Note that the regex needs to look for the full end of the sentence, if it exists: it cannot simply stop at the colon unless there is no further part to the sentence provided in that paragraph.
        END
        [
            q#I'm looking for the end of a sentence, where possible.#,
            q#However, in some cases, I'll need to go with a non-conventional "end" to it, such as:#,
            q#"Here's a quote by a famous person which is supposed to exceed forty words and is therefore required to be set apart as a separate, indented paragraph per APA style."#,
            q#(Famous, 1999)#,
            q#Note that the regex needs to look for the full end of the sentence, if it exists: it cannot simply stop at the colon unless there is no further part to the sentence provided in that paragraph.#,
        ];

        # TODO: Many more test cases here!

        done_testing;

        Could you perhaps match repeatedly within the same string, in a loop, and then manually select what you consider to be the most appropriate match?
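One way to read the suggestion above (match repeatedly in a loop, then pick the most appropriate match) is sketched below. It is an illustration under assumptions, not a solution to the actual data: the priority table, the choice of `.` and `:` as candidates, and the name `best_terminator` are all hypothetical, loosely modeled on the colon-versus-period case in the test text.

```perl
use strict;
use warnings;

# Hypothetical priority table: the lower number wins. A period is the
# preferred sentence end; a colon is accepted only if no period is found.
my %PRIORITY = ( '.' => 1, ':' => 2 );

sub best_terminator {
    my ($text) = @_;
    my @candidates;

    # Collect every candidate together with its offset in the string.
    while ( $text =~ /([.:])/g ) {
        push @candidates, { char => $1, pos => $-[1] };
    }
    return unless @candidates;

    # Rank by priority first and by position second, so an earlier
    # low-priority match cannot shadow a later high-priority one.
    my ($best) = sort {
        $PRIORITY{ $a->{char} } <=> $PRIORITY{ $b->{char} }
            or $a->{pos} <=> $b->{pos}
    } @candidates;

    return $best;
}
```

For `'such as: a quote. More'` the sketch picks the period even though the colon comes first; for `'such as:'` it falls back to the colon. Separating "find all candidates" from "choose one" keeps the priority rules in ordinary code, where they are easier to test than inside a single regex.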
Re^3: How to enforce match priority irrespective of string position
by Takeshi Kovacs (Beadle) on Mar 07, 2021 at 12:36 UTC
    I'd say use Hippo's SSCCE template (see "Re: Matching a string in a parenthesized block (regex help)") to write some tests for
    • what you want and
    • what you don't want.
    This would certainly be beneficial for you too.

    Other than that, an alternation (|) whose first branch swallows a whole area, such as a "quoted" one, gives that area priority over the later branches. Demo:

    DB<132> $_ = 'phrase. "phrase1.phrase2" phrase. phrase'
    0  'phrase. "phrase1.phrase2" phrase. phrase'
    DB<133> split /(".*?"|\.)/
    0  'phrase'
    1  '.'
    2  ' '
    3  '"phrase1.phrase2"'
    4  ' phrase'
    5  '.'
    6  ' phrase'
    DB<134>
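The debugger session above can be turned into a small script that reassembles the split pieces into sentences. This is a sketch only: `split_outside_quotes` is a hypothetical helper name, and it assumes that a bare `.` piece always ends a sentence while any quoted piece is kept intact.

```perl
use strict;
use warnings;

sub split_outside_quotes {
    my ($text) = @_;
    my @sentences = ('');

    # The capturing group makes split return the separators as well, so
    # quoted spans and bare periods arrive as pieces of their own.
    for my $piece ( split /(".*?"|\.)/, $text ) {
        $sentences[-1] .= $piece;

        # Only a bare period closes the current sentence; a period inside
        # a quoted span was already swallowed by the first branch.
        push @sentences, '' if $piece eq '.';
    }

    # Trim surrounding whitespace and drop empty leftovers.
    for (@sentences) { s/^\s+//; s/\s+$//; }
    return grep { length } @sentences;
}

my @result = split_outside_quotes('phrase. "phrase1.phrase2" phrase. phrase');
print "$_\n" for @result;
```

With the demo string, the period between `phrase1` and `phrase2` stays inside its quoted span and does not start a new sentence.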
