PerlMonks  

Re^2: using substitution and pattern matching

by Anonymous Monk
on Dec 18, 2004 at 16:20 UTC


in reply to Re: using substitution and pattern matching
in thread using substitution and pattern matching

great!
I knew there was a simple answer

thanks

Re^3: using substitution and pattern matching
by graff (Chancellor) on Dec 18, 2004 at 17:28 UTC
    It's a simple start, at least.

    Non-alphabetic marks in text (in English, at least) will always pose some boundary cases that are really hard or basically impossible to handle with a straightforward, procedural algorithm (and on top of that, people who create text tend to make mistakes or ignore the "rules" of style).

    For the current task, there's the problem of the possessive apostrophe without a following "s" (because the word ends in "s"), and sometimes punctuation will follow a close-quote (even though style manuals say it shouldn't). Here's a worst case for you:

    'You've got to talk to Miles' brother', she said.

    Easy for humans, hard for programs. There is a regex that will treat this one correctly:

    s/ '(.*)'(\W)/ "$1"$2/; # note the greedy use of ".*"
    but it will go wrong on other cases that need a non-greedy match, like:

    When he said 'kiss the sky,' I heard 'kiss this guy.'

    You just have to guess which sort of mistake will happen less often (and hope your data isn't really this bad).
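
    To make the trade-off concrete, here is a minimal sketch that runs both sample sentences through the greedy pattern and through a non-greedy variant. The sub names are made up for illustration, the non-greedy version (with /g) is my assumption about what the alternative would look like, and each sample is written with a space at both ends so the pattern has something to anchor on:

```perl
use strict;
use warnings;

# Hypothetical helper names; the two patterns differ only in ".*" vs ".*?".
sub quote_greedy {
    my ($text) = @_;
    $text =~ s/ '(.*)'(\W)/ "$1"$2/;      # greedy: runs to the LAST ' followed by a non-word char
    return $text;
}

sub quote_nongreedy {
    my ($text) = @_;
    $text =~ s/ '(.*?)'(\W)/ "$1"$2/g;    # non-greedy: stops at the FIRST ' followed by a non-word char
    return $text;
}

# Each sample carries a leading and trailing space (assumed padding).
my @samples = (
    q{ 'You've got to talk to Miles' brother', she said. },
    q{ When he said 'kiss the sky,' I heard 'kiss this guy.' },
);

for my $s (@samples) {
    print "input:      $s\n";
    print "greedy:     ", quote_greedy($s), "\n";
    print "non-greedy: ", quote_nongreedy($s), "\n\n";
}
```

    The greedy form gets the first sentence right and turns the second into one long quotation; the non-greedy form gets the second right and closes the quote after "Miles" in the first. Neither handles both.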

    One other hint: for stuff like this, where initial and final positions in the string might make things more complicated, it's okay to "cheat" a little: add a space or some other "safe" character at the beginning and end of the string before working on the quotes, so that the edge cases can be treated just like the non-edge cases. You can take the edge padding off when you're done.
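
    The padding trick might look something like this. It is only a sketch: the sub name requote is hypothetical, a single space is assumed to be a "safe" pad character for your data, and the greedy pattern is reused unchanged:

```perl
use strict;
use warnings;

# Hypothetical helper: pad both ends, substitute, then strip the padding.
sub requote {
    my ($text) = @_;
    $text = " $text ";                    # add edge padding (assumes a space is "safe")
    $text =~ s/ '(.*)'(\W)/ "$1"$2/;      # same greedy pattern as above
    $text =~ s/\A //;                     # take the leading pad off again
    $text =~ s/ \z//;                     # take the trailing pad off again
    return $text;
}

print requote(q{'kiss this guy'}), "\n";
print requote(q{'You've got to talk to Miles' brother', she said.}), "\n";
```

    Without the padding, a string that starts with the open-quote or ends with the close-quote would not match at all, because the pattern wants a space before the first quote and a non-word character after the last one.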
