PerlMonks  

Re^2: Parsing/regex help required

by Anonymous Monk
on Sep 27, 2021 at 13:49 UTC (id://11137039)


in reply to Re: Parsing/regex help required
in thread Parsing/regex help required

Each paragraph's text is captured using Mojo's all_text, so that part is fine. Running that code:
    my $entry = "123. The Quick brown fox – jumped over";
    my( $num, $text1, $text2 )= $entry =~ m{^ (\d+) \. \s+ (.*?) \s+-\s+ (.*?) $}x;
    say "$num|$text1|$text2";
gives
    Use of uninitialized value $num in concatenation (.) or string at ./test.pl line 10.
    Use of uninitialized value $text1 in concatenation (.) or string at ./test.pl line 10.
    Use of uninitialized value $text2 in concatenation (.) or string at ./test.pl line 10.
    ||
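A match in list context returns an empty list when it fails, so every capture variable is left undef and `say` emits exactly the three warnings above. A minimal sketch that checks the match before using the captures (the sub name `parse_entry` is hypothetical, not from the thread; the en dash is written as `"\x{2013}"` so the file needs no `use utf8`):

```perl
use strict;
use warnings;
use feature 'say';

binmode STDOUT, ':encoding(UTF-8)';    # the failing sample contains a non-ASCII character

# Hypothetical helper: returns ($num, $text1, $text2) on success and an
# empty list when the pattern does not match (list-context match).
sub parse_entry {
    my ($entry) = @_;
    return $entry =~ m{^ (\d+) \. \s+ (.*?) \s+-\s+ (.*?) $}x;
}

# "\x{2013}" is the EN DASH from the posted sample; "-" is a plain hyphen-minus.
for my $entry ( "123. The Quick brown fox \x{2013} jumped over",
                "123. The Quick brown fox - jumped over" ) {
    # A list assignment in boolean context is true only if the match
    # returned at least one element, i.e. only if it succeeded.
    if ( my ( $num, $text1, $text2 ) = parse_entry($entry) ) {
        say "$num|$text1|$text2";
    }
    else {
        say "no match: $entry";
    }
}
```

The en dash sample reports `no match` instead of producing three warnings, while the hyphen sample prints `123|The Quick brown fox|jumped over`.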

Re^3: Parsing/regex help required
by Fletch (Bishop) on Sep 27, 2021 at 19:50 UTC

    The problem is that your dash is a fancy Unicode en dash (U+2013), not a plain ASCII "-" (hyphen-minus), so my naïve pattern doesn't match. I had to do some monkeying with Encode after cutting and pasting your sample (which I don't think you'd need with Mojo when you're actually fetching your real results), but then I was able to get this to match.
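The two characters look alike but are different code points. A small sketch that prints them (it assumes Perl 5.16 or later, where `\N{...}` character names work without an explicit `use charnames`; the sub name `code_point` is hypothetical):

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: format one character as its Unicode code point.
sub code_point {
    my ($ch) = @_;
    return sprintf 'U+%04X', ord $ch;
}

say code_point('-');              # U+002D, hyphen-minus: what \s+-\s+ matches
say code_point("\N{EN DASH}");    # U+2013, the character in the posted sample
say code_point("\N{EM DASH}");    # U+2014
```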

        ## I set $_ to your sample string cut-n-pasted, then ran it through decode
        DB<33> $_ = Encode::decode( q{UTF-8}, $_ )

        ## Afterwards this worked (U+2013 is EN DASH); if you're not interested in what
        ## the separator was you can of course change that bit to non-capturing
        DB<38> x m{ ^ (\d+) \. \s+ (.*?) \s+(-|\N{EN DASH}|\N{EM DASH})\s+ (.*?) $}x
        0  123
        1  'The Quick brown fox'
        2  '\x{2013}'
        3  'jumped over'
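Outside the debugger, the same steps might look like this sketch. It assumes the input arrives as raw UTF-8 bytes (simulated here with `encode`), decodes it with `Encode::decode`, and makes the separator group non-capturing as suggested above. The sub name `split_entry` is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Encode qw(decode encode);

# Hypothetical helper: accepts a hyphen-minus, EN DASH or EM DASH as the
# separator; the separator itself is not captured.
sub split_entry {
    my ($entry) = @_;
    return $entry =~ m{ ^ (\d+) \. \s+ (.*?) \s+ (?:-|\N{EN DASH}|\N{EM DASH}) \s+ (.*?) $ }x;
}

# Simulate undecoded input: the sample as raw UTF-8 bytes, where the
# EN DASH is three bytes rather than one character.
my $bytes = encode( 'UTF-8', "123. The Quick brown fox \N{EN DASH} jumped over" );

# Decode to a character string so \N{EN DASH} in the pattern can match it.
my $text = decode( 'UTF-8', $bytes );

my ( $num, $text1, $text2 ) = split_entry($text) or die "no match\n";
say "$num|$text1|$text2";
```

This prints `123|The Quick brown fox|jumped over`. Passing `$bytes` directly to `split_entry` would not match, which is why the decode step matters for pasted samples.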

    The cake is a lie.

Re^3: Parsing/regex help required
by AnomalousMonk (Archbishop) on Sep 27, 2021 at 20:01 UTC

    This is what I get:

        Win8 Strawberry 5.30.3.1 (64) Mon 09/27/2021 15:56:45
        C:\@Work\Perl\monks
        >perl -Mstrict -Mwarnings -Mfeature=say
        my $entry = "123. The Quick brown fox - jumped over";
        my( $num, $text1, $text2 )= $entry =~ m{^ (\d+) \. \s+ (.*?) \s+-\s+ (.*?) $}x;
        say "$num|$text1|$text2";
        ^Z
        123|The Quick brown fox|jumped over
    Are you sure the code you posted is really the code you're running?
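When a posted sample and the running code disagree like this, one quick check is to list any non-ASCII characters hiding in the input. A minimal sketch, assuming the string has already been decoded to characters (the sub name `non_ascii_report` is hypothetical):

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: report each non-ASCII character with its offset
# and Unicode code point. Expects an already-decoded character string.
sub non_ascii_report {
    my ($str) = @_;
    my @report;
    while ( $str =~ /([^\x00-\x7F])/g ) {
        # pos() points just past the match, so subtract 1 for the offset.
        push @report, sprintf 'offset %d: U+%04X', pos($str) - 1, ord $1;
    }
    return @report;
}

say for non_ascii_report("123. The Quick brown fox \N{EN DASH} jumped over");
say for non_ascii_report("123. The Quick brown fox - jumped over");    # prints nothing
```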


    Give a man a fish:  <%-{-{-{-<
