http://qs321.pair.com?node_id=11137039


in reply to Re: Parsing/regex help required
in thread Parsing/regex help required

Each paragraph's text is captured using Mojo's all_text, so that part is fine. Running that code:

    my $entry = "123. The Quick brown fox – jumped over";
    my( $num, $text1, $text2 )= $entry =~ m{^ (\d+) \. \s+ (.*?) \s+-\s+ (.*?) $}x;
    say "$num|$text1|$text2";
gives
    Use of uninitialized value $num in concatenation (.) or string at ./test.pl line 10.
    Use of uninitialized value $text1 in concatenation (.) or string at ./test.pl line 10.
    Use of uninitialized value $text2 in concatenation (.) or string at ./test.pl line 10.
    ||
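When a regex that looks right refuses to match text pulled from a web page, a quick check is to print the code point of every character outside printable ASCII. The sketch below is illustrative only: it builds the sample string with an explicit \x{2013} escape (EN DASH) so the script itself stays pure ASCII, and the @report variable is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Sample built with an explicit EN DASH (U+2013) escape so this file is pure ASCII.
my $entry = "123. The Quick brown fox \x{2013} jumped over";

# Record the offset and code point of every character outside printable ASCII.
my @report;
for my $i ( 0 .. length($entry) - 1 ) {
    my $ch = substr $entry, $i, 1;
    next if $ch =~ /[\x20-\x7E]/;    # skip space through tilde
    push @report, sprintf 'offset %d: U+%04X', $i, ord $ch;
}

say for @report;
```

Run against the sample above, this reports a single non-ASCII character, U+2013 at offset 25, which is exactly the separator the \s+-\s+ part of the pattern expects to be a plain hyphen.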

Re^3: Parsing/regex help required
by Fletch (Bishop) on Sep 27, 2021 at 19:50 UTC

    The problem is that your dash is a fancy Unicode en dash, not a plain "-" character, so my naïve attempt doesn't match. I had to do some monkeying with Encode after cutting and pasting your sample (which I don't think you'd need with Mojo when you're actually fetching your real results), but then I was able to get this to match.

        ## I set $_ to your sample string cut-n-pasted, then ran it through decode
        DB<33> $_ = Encode::decode( q{UTF-8}, $_ )

        ## Afterwards this worked (U+2013 is EN DASH); if you're not interested in what
        ## the separator was you can of course change that bit to non-capturing
        DB<38> x m{ ^ (\d+) \. \s+ (.*?) \s+(-|\N{EN DASH}|\N{EM DASH})\s+ (.*?) $}x
        0  123
        1  'The Quick brown fox'
        2  '\x{2013}'
        3  'jumped over'
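A minimal sketch of why the Encode::decode step matters, assuming the pasted text arrives as raw UTF-8 bytes (as it would in a script without "use utf8", or from any undecoded input). Encoded, the en dash is three bytes rather than one character, so \N{EN DASH} cannot match until the string is decoded. It assumes Perl 5.16 or later, where \N{NAME} works without an explicit "use charnames"; the separator group is made non-capturing, as suggested above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Encode qw(decode encode);

# Simulate undecoded input: the EN DASH becomes the three bytes E2 80 93.
my $bytes = encode( 'UTF-8', "123. The Quick brown fox \x{2013} jumped over" );

# Decoding turns those bytes back into a Perl character string.
my $chars = decode( 'UTF-8', $bytes );

# Same idea as the debugger session, with a non-capturing separator group.
my $re = qr{ ^ (\d+) \. \s+ (.*?) \s+ (?:-|\N{EN DASH}|\N{EM DASH}) \s+ (.*?) $ }x;

say 'bytes: length ', length($bytes), ', match: ', ( $bytes =~ $re ? 'yes' : 'no' );
say 'chars: length ', length($chars), ', match: ', ( $chars =~ $re ? 'yes' : 'no' );
```

The byte string is 40 bytes long and does not match. The decoded string is 38 characters long and does match.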

    The cake is a lie.

Re^3: Parsing/regex help required
by AnomalousMonk (Archbishop) on Sep 27, 2021 at 20:01 UTC

    This is what I get:

        Win8 Strawberry 5.30.3.1 (64) Mon 09/27/2021 15:56:45
        C:\@Work\Perl\monks
        >perl -Mstrict -Mwarnings -Mfeature=say
        my $entry = "123. The Quick brown fox - jumped over";
        my( $num, $text1, $text2 )= $entry =~ m{^ (\d+) \. \s+ (.*?) \s+-\s+ (.*?) $}x;
        say "$num|$text1|$text2";
        ^Z
        123|The Quick brown fox|jumped over

    Are you sure the code you posted is really the code you're running?
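Since the pattern matches with a plain ASCII hyphen but not with the en dash from the original post, one option is to accept any of the likely separators. The following is only a sketch. parse_entry, $entry_re and @parsed are hypothetical names, and it assumes the input has already been decoded to characters (as Mojo normally does). The sample strings use \x{...} escapes so the file stays ASCII. If you type literal dashes into the source instead, add "use utf8".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use open qw(:std :encoding(UTF-8));    # encode output so wide characters print cleanly

# Separator: ASCII hyphen-minus, EN DASH (U+2013) or EM DASH (U+2014), not captured.
my $entry_re = qr{ ^ (\d+) \. \s+ (.*?) \s+ [-\x{2013}\x{2014}] \s+ (.*?) $ }x;

# Hypothetical helper: returns (number, before, after), or an empty list on no match.
sub parse_entry {
    my ($entry) = @_;
    return $entry =~ $entry_re;
}

my @samples = (
    "123. The Quick brown fox - jumped over",
    "123. The Quick brown fox \x{2013} jumped over",
    "124. Lazy dog \x{2014} slept",
);

my @parsed;
for my $entry (@samples) {
    my @fields = parse_entry($entry);
    unless (@fields) {
        warn "no match: $entry\n";
        next;
    }
    push @parsed, join '|', @fields;
    say $parsed[-1];
}
```

An explicit character class keeps the accepted separators obvious. Perl's \p{Dash} Unicode property is a broader alternative if other dash characters might also turn up.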


    Give a man a fish:  <%-{-{-{-<