http://qs321.pair.com?node_id=1186770

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks who are always smarter than I am. I have a regex to return lines with names and it's working but I'm also getting lines back with my regex that look like names but aren't. Things like "750 New Jobs". I'm looking for Regex that will find names but omit things with numbers. Here's what I have but it's not working:

if ($line =~ m/\b-|b(^[A-Z][a-z]+\s[A-Z][a-z]+$+?!0-9)\|b/) {print "$line";}

Thanks in advance for your help. I'm a noob and regex is not my strong suit at all as you can tell.

Replies are listed 'Best First'.
Re: regex to return line with name but not if it has a number
by kcott (Archbishop) on Apr 03, 2017 at 10:14 UTC

    As others have already stated, you've posted insufficient information for us to provide a solution. It sounds like you probably want code like this:

    for my $line (@lines) {
        next if $line =~ /\d/;
        print $line if $line =~ $regex_that_is_working;
    }

    The regex you posted (with the vague "it's not working") has a lot of potential problems:

    • You have a boundary assertion (\b) in one place and two plain 'b' letters elsewhere: are they also meant to be assertions?
    • You have one alternation (with |) and later an escaped pipe ('\|'): was that supposed to be another alternation?
    • You're using a capture but don't access what is captured. What's the intent here?
    • You have '0-9' which I suspect should be '[0-9]'; I'm even wondering if '!0-9' should be '[^0-9]'.
    • You have what appear to be anchor assertions ('^' and '$') in strange positions; there is no 'm' modifier to indicate a multiline string.
    • There are other bits I just gave up trying to guess at.
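
Several of the points above are easy to check directly. The following is a minimal sketch, using test strings picked only for illustration, of how the escaped and unescaped forms behave differently. The `matches` helper is a hypothetical name, not something from the original post.

```perl
use strict;
use warnings;

# Each row: description, test string, pattern.
# The strings are illustrative assumptions, not the OP's real data.
my @cases = (
    [ 'bare b needs a literal "b"',       'John Smith',   qr/b/         ],
    [ '\b is a zero-width word boundary', 'John Smith',   qr/\bJohn\b/  ],
    [ '0-9 is the literal text "0-9"',    '750 New Jobs', qr/0-9/       ],
    [ '[0-9] is any single digit',        '750 New Jobs', qr/[0-9]/     ],
    [ '\| is a literal pipe character',   'John Smith',   qr/\|/        ],
    [ '| is alternation',                 'John Smith',   qr/Jane|John/ ],
);

# Hypothetical helper: 1 if the string matches the pattern, 0 otherwise.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

for my $case (@cases) {
    my ($description, $string, $pattern) = @$case;
    printf "%-36s %-16s %s\n",
        $description, qq{"$string"},
        matches($string, $pattern) ? 'matches' : 'no match';
}
```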

    Take a look at "perlintro: Regular expressions". That page has introductory information, along with many links to more detailed documentation on other pages. In addition, search for /regular expression/ on the main "perldoc - perl" page.

    The "How do I post a question effectively?" page, on this site, will give you an idea of what you need to post to get better answers from us.

    — Ken

Re: regex to return line with name but not if it has a number
by hippo (Bishop) on Apr 03, 2017 at 07:58 UTC
    I'm looking for Regex that will find names but omit things with numbers.

    "names" and "numbers" may mean something to you and something different to someone else. I highly recommend that you have a read of How to ask better questions using Test::More and sample data and provide such a test script here in the absence of a detailed spec.

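A test script of the kind recommended above might look roughly like this. It is only a sketch: the pattern in `$name_re` is a placeholder assumption (two capitalised words and nothing else), `is_name` is a hypothetical helper, and of the two sample strings only "750 New Jobs" comes from the question.

```perl
use strict;
use warnings;
use Test::More;

# Placeholder pattern: two capitalised words and nothing else.
# This is an assumption for illustration, not the OP's working regex.
my $name_re = qr/^[A-Z][a-z]+\s[A-Z][a-z]+$/;

# Hypothetical helper: 1 if the line looks like a name, 0 otherwise.
sub is_name {
    my ($line) = @_;
    return $line =~ $name_re ? 1 : 0;
}

# Sample data: the input string and whether it should be accepted.
my @samples = (
    [ 'John Smith'   => 1 ],    # made-up name
    [ '750 New Jobs' => 0 ],    # from the question
);

for my $sample (@samples) {
    my ($line, $expected) = @$sample;
    is( is_name($line), $expected,
        qq{"$line" } . ( $expected ? 'is a name' : 'is not a name' ) );
}

done_testing();
```

Posting something in this shape lets anyone answering add their own candidate pattern and rerun the same samples.
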
Re: regex to return line with name but not if it has a number
by shmem (Chancellor) on Apr 03, 2017 at 13:50 UTC
    Here's what I have but it's not working:
    if ($line =~ m/\b-|b(^[A-Z][a-z]+\s[A-Z][a-z]+$+?!0-9)\|b/) {print "$line";}

    It helps if you go through the elements of the regexp telling in plain words what it is trying to match, or writing it as an extended regexp and comment it:

    if ( $line =~ m/   # match
        \b-            # a word boundary followed by a -
        |              # or
        (              # (beginning of capture)
        b              # the letter "b"
        ^[A-Z]         # a capital letter anchored at start of string
                       # ...wait, what? at the beginning?

    I'll just stop here. Matching a literal b and then something anchored at the start of the string never succeeds. All wrong. Start over. What are you trying to match?

    This regexp looks like a working one that has been tinkered with by someone who hasn't read perlre and tried to understand as much of it as possible. Do that now.

    Post the original regex, samples and expected output.
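
For comparison, here is what a complete commented regexp in extended (`/x`) form can look like. It is a sketch built on a guess at the intent (a line consisting of exactly two capitalised words). The variable name and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# One guess at the intent: a line that is exactly two capitalised words.
my $two_capitalised_words = qr/
    ^               # start of string
    [A-Z][a-z]+     # a capital letter followed by lowercase letters
    \s              # one whitespace character
    [A-Z][a-z]+     # a second capitalised word
    $               # end of string (or just before a trailing newline)
/x;

# Only the first of these sample strings satisfies the pattern.
for my $line ('John Smith', '750 New Jobs') {
    print "$line\n" if $line =~ $two_capitalised_words;
}
```

With `/x`, whitespace inside the pattern is ignored and `#` starts a comment, so each piece can be explained on its own line.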

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: regex to return line with name but not if it has a number
by Laurent_R (Canon) on Apr 03, 2017 at 07:38 UTC
    Please show some examples of lines that you want to match and lines that you don't want to match.

    It seems to me that your regex probably has quite a few problems, but I can't tell for sure if I don't know what you want to match; and, more than that, it is not possible to suggest a correction.

Re: regex to return line with name but not if it has a number
by Anonymous Monk on Apr 03, 2017 at 06:23 UTC
    Use two separate match operations: one to match the stuff you want, and one to reject the stuff you don't want.
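
A minimal sketch of that idea, assuming a deliberately loose "want" pattern (two capitalised words anywhere in the line) and a "reject" pattern of any digit. The patterns, the `keep_names` helper and the sample strings are illustrative assumptions.

```perl
use strict;
use warnings;

# Assumed "want" pattern: two capitalised words anywhere in the line.
# On its own this also accepts "750 New Jobs", because of "New Jobs".
my $want = qr/[A-Z][a-z]+\s[A-Z][a-z]+/;

# "Reject" pattern: any digit anywhere in the line.
my $reject = qr/\d/;

# Hypothetical helper: keep lines that match $want and do not match $reject.
sub keep_names {
    my @lines = @_;
    return grep { $_ =~ $want && $_ !~ $reject } @lines;
}

print "$_\n" for keep_names('John Smith', '750 New Jobs', 'Karen Flower');
```

Keeping the two tests apart means neither pattern has to encode "and not a number" itself, which is the part that tends to go wrong in a single regex.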
Re: regex to return line with name but not if it has a number
by Anonymous Monk on Apr 03, 2017 at 13:46 UTC

    Here's some more info on what I'm trying to match and not.

    Here is an example of what I am trying to remove with the regex: "73 Dental Assistant" or "124 Dental Technician".

    Here is an example of what I am trying to match: "John Smith" or "Dr. John Smith" or "John O'Smith".

    The lines start with "-" and the string to match ends with "|". I currently get returned a mix of:

    John Smith
    73 Dental Assistants
    Karen Flower
    18 Dental Technician

    Thanks in advance for the help. -A

      Thank you for clarifying your situation. To better understand it, can you please post some concrete (yet anonymized!) lines together with the indication of whether they should match or not?

      If you give us the concrete input you have and the relevant code you have, we can much more easily pinpoint where your approach goes wrong and provide relevant documentation and help.

      From your description, I would think that you could simply reject all lines that match /^-.*\d.*\|/, but your original approach seems to be much longer.
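
As a sketch of that suggestion, assuming each line has the shape "- text|" as described (the exact sample lines and the `without_numbers` helper are assumptions made for illustration):

```perl
use strict;
use warnings;

# Assumed line format, based on the description: "- <text>|"
my @lines = (
    '- John Smith|',
    '- 73 Dental Assistant|',
    "- John O'Smith|",
    '- 124 Dental Technician|',
);

# Hypothetical helper: drop every line that starts with "-" and has a
# digit somewhere before a later "|".
sub without_numbers {
    return grep { !/^-.*\d.*\|/ } @_;
}

print "$_\n" for without_numbers(@lines);
```

Because `.*` is greedy, a digit anywhere before the last "|" on the line triggers the rejection. If only the first field should be checked, something like /^-[^|]*\d/ may be closer to what is wanted.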

        Here's an example of the data that I'm currently getting:

        - Anne Green| Manufacturing Engineer at Emerson … ·
        - 3,600+ Dental Lab Jobs in Burlingame, CA | ... Director (47) Executive (10) See more. ... Lab jobs in Burlingame, CA.
        - 85 Clinical Laboratory Scientist Jobs in Mountain View, CA ... › Dental Scientist Jobs 85 Clinical Laboratory Scientist jobs in Mountain View, CA on LinkedIn. Leverage your professional network, ... Burlingame, California (4) Dublin, California (2)
        - 152 Laboratory Technician Jobs in South San Francisco, CA ... Jobs › Dental Technician Jobs LinkedIn has 152 Dental Technician jobs in South San Francisco, ... 1-2 years of previous academic or commercial lab experience a plus. ... Burlingame, CA, US. …
        - 130 Quality Control Director Jobs in Mountain View, CA ... Jobs › Quality Control Director Jobs New Quality Control Director jobs added daily. LinkedIn. Sign in; ... Burlingame, California (1) Company. Relypsa (7) Aramark (3) Think Surgical (3) Cushman ...
        I'd like to have line one returned, as it's a name, but not the others. Thanks.
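
One possible approach for data shaped like this, offered only as a sketch: capture the text between the leading "-" and the first "|", and accept it only if it contains no digits. It assumes each record sits on its own line, uses abbreviated copies of the first three sample lines, and introduces a hypothetical helper named `extract_name`.

```perl
use strict;
use warnings;

# Abbreviated copies of the first three sample lines.
my @lines = (
    '- Anne Green| Manufacturing Engineer at Emerson',
    '- 3,600+ Dental Lab Jobs in Burlingame, CA | ... Director (47)',
    '- 85 Clinical Laboratory Scientist Jobs in Mountain View, CA ...',
);

# Hypothetical helper: return the text between the leading "-" and the
# first "|", or undef if there is no such field or it contains a digit.
sub extract_name {
    my ($line) = @_;
    return undef unless $line =~ /^-\s*([^|\d]+?)\s*\|/;
    return $1;
}

for my $line (@lines) {
    my $name = extract_name($line);
    print "$name\n" if defined $name;
}
```

Note that this only filters out digits. A digit-free first field such as a job title would still be accepted, so a stricter name pattern may be needed depending on the real data.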