PerlMonks  

Regex help

by jayto (Acolyte)
on Jun 21, 2012 at 13:26 UTC ( [id://977636]=perlquestion )

jayto has asked for the wisdom of the Perl Monks concerning the following question:

I need to write a few regular expressions to parse the columns in this table:

Port 1 Database Assignments

Region    Data Type    # Records
GLOBAL    --
LOCAL     --
BUF       --
D1        Unused
D2        Unused
D3        Unused
D4        Unused
D5        Unused
D6        Unused
D7        Unused
D8        Unused
A1        Unused
A2        Unused
A3        Unused
USER      Unused

I need to ignore all the stuff at the top and identify each column and grab the data from the row. So if someone could show how to parse one column, I would really appreciate it!! Thanks Monks!

Replies are listed 'Best First'.
Re: Regex help
by Utilitarian (Vicar) on Jun 21, 2012 at 14:36 UTC
    So you need to
    • open the file
    • skip the first two lines
    • while there is more data in the file:
      • split each line into its fields: a unique region identifier, a data type and possibly a record count
      • store the last two values in a hash of hashes keyed on region and column name
    • Iterate over the keys and do whatever it is you want to do with them

    Have a go at coding that up. If it doesn't work for you, bring your issue to the forum and we'll do our best, but we are a teaching monastery, not a code-writing service.
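For readers who arrive at this thread later, the steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample is read from an in-memory string rather than a real file, exactly two header lines are assumed, and the sub name parse_report and the hash keys are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input copied from the question. A real script would open the
# file the device output was saved to instead of this in-memory string.
my $report = <<'EOT';
Port 1 Database Assignments
Region    Data Type    # Records
GLOBAL    --
LOCAL     --
BUF       --
D1        Unused
D2        Unused
D3        Unused
D4        Unused
D5        Unused
D6        Unused
D7        Unused
D8        Unused
A1        Unused
A2        Unused
A3        Unused
USER      Unused
EOT

# parse_report is a hypothetical helper name, not part of any module.
sub parse_report {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory report: $!";

    my %assignments;
    while (my $line = <$fh>) {
        next if $. <= 2;              # assumed: title line plus header line
        next unless $line =~ /\S/;    # ignore blank lines

        # split ' ' skips leading whitespace and splits on whitespace runs.
        my ($region, $type, $records) = split ' ', $line;

        # Hash of hashes keyed on region, then on column name.
        $assignments{$region} = {
            'Data Type' => $type,
            '# Records' => $records,  # undef when the column is empty
        };
    }
    close $fh;
    return \%assignments;
}

my $assignments = parse_report($report);
for my $region (sort keys %$assignments) {
    my $row = $assignments->{$region};
    printf "%-8s %-8s %s\n", $region, $row->{'Data Type'},
        defined $row->{'# Records'} ? $row->{'# Records'} : '(none)';
}
```

If the real output has a blank line between the title and the column headers, the line-number test would need adjusting, for example by skipping everything up to and including the line that starts with "Region".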

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
Re: Regex help
by kennethk (Abbot) on Jun 21, 2012 at 13:49 UTC
    What have you tried? What didn't work? What resources are you using? This smells a lot like homework. I would say that split is probably a more appropriate tool than regular expressions. For some great resources on learning Perl, see http://learn.perl.org/. And if you post some code as per How do I post a question effectively?, we'll be happy to help you debug.

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      It's not homework, it's for my job. I'm trying to parse data output from a sel2030. I've never used regexes before and I am having a hard time figuring out how to read down a list and get the parser to start at the correct place in the list. Is there any way besides using split? I am passing these regular expressions into XML attributes, so if there is a way to do it without using Perl commands, that would be optimal. Any ideas, great Monks?
        There are a number of ways to do it with straight regular expressions, but the technology you are feeding this into (and thus the necessary input-output mapping) is unfamiliar to me. Parsing one of your lines may be as easy as /^\s*(\S+)\s+(\S+)\s*$/, but maybe this needs the m modifier depending on context. This particular expression will fail on all lines other than your data lines, since all it does is (thanks to YAPE::Regex::Explain):
        The regular expression:

        (?m-isx:^\s*(\S+)\s+(\S+)\s*$)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?m-isx:                 group, but do not capture (with ^ and $
                                 matching start and end of line) (case-
                                 sensitive) (with . not matching \n)
                                 (matching whitespace and # normally):
        ----------------------------------------------------------------------
          ^                        the beginning of a "line"
        ----------------------------------------------------------------------
          \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                                   more times (matching the most amount
                                   possible))
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            \S+                      non-whitespace (all but \n, \r, \t, \f,
                                     and " ") (1 or more times (matching the
                                     most amount possible))
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
          \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                   more times (matching the most amount
                                   possible))
        ----------------------------------------------------------------------
          (                        group and capture to \2:
        ----------------------------------------------------------------------
            \S+                      non-whitespace (all but \n, \r, \t, \f,
                                     and " ") (1 or more times (matching the
                                     most amount possible))
        ----------------------------------------------------------------------
          )                        end of \2
        ----------------------------------------------------------------------
          \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                                   more times (matching the most amount
                                   possible))
        ----------------------------------------------------------------------
          $                        before an optional \n, and the end of a
                                   "line"
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------
        and can be shown to work on this example with
        #!/usr/bin/perl -w
        use strict;
        use Data::Dumper;

        $_ = <<'EOT';
        Port 1 Database Assignments
        Region    Data Type    # Records
        GLOBAL    --
        LOCAL     --
        BUF       --
        D1        Unused
        D2        Unused
        D3        Unused
        D4        Unused
        D5        Unused
        D6        Unused
        D7        Unused
        D8        Unused
        A1        Unused
        A2        Unused
        A3        Unused
        USER      Unused
        EOT

        my %hash;
        while (/^\s*(\S+)\s+(\S+)\s*$/mg) {
            $hash{$1} = $2;
        }

        print Dumper \%hash;
        However, it'll break pretty quickly if your input is not representative; e.g. if your Region or Data Type values contain whitespace (this looks fixed width to me) or if # Records is not null.
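One way the pattern might be loosened to tolerate a populated # Records column is an optional third capture. This is a sketch under assumptions, not something tested against real device output: the row with "Analog" and a count of 12 is invented, the count is assumed to be a plain integer, and \h (horizontal whitespace) needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical sample: the "Analog 12" row is invented for illustration,
# since the table in the thread has an empty "# Records" column.
my $text = <<'EOT';
Port 1 Database Assignments
Region    Data Type    # Records
GLOBAL    --
D1        Unused
D2        Analog       12
EOT

my %rows;

# \h matches spaces and tabs but not newlines, so a match can never run
# across two lines. The third group is optional and stays undef when the
# record count is missing.
while ($text =~ /^\h*(\S+)\h+(\S+)(?:\h+(\d+))?\h*$/mg) {
    $rows{$1} = { type => $2, records => $3 };
}

for my $region (sort keys %rows) {
    my $records = defined $rows{$region}{records}
                ? $rows{$region}{records}
                : '(none)';
    print "$region: $rows{$region}{type}, records $records\n";
}
```

The two header lines still fail to match because each has more than two words and no trailing integer. A Data Type containing spaces would still defeat this pattern.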

        #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      I would say that split is probably a more appropriate tool than regular expressions.

      Not trying to start an argument, but doesn't split use a regular expression ("pattern") as its first parameter?

        From a pedantic perspective, yes, split uses a regular expression to determine how to break up the string. However in my experience, "using regular expressions" in common usage is generally taken to mean using the bare expression for matching, capturing or substitution.

        So you are technically correct - the best kind of correct.
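To make the distinction concrete, here is a small sketch that pulls the same two fields out of one data line three ways: split with a pattern, split with the special ' ' argument, and a bare match. The sample line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = "  D1        Unused\n";

# A regular expression as split's first argument: fields are whatever
# lies between runs of whitespace. Leading whitespace produces an empty
# first field; trailing empty fields are dropped.
my @by_pattern = split /\s+/, $line;

# The special-case string ' ' also splits on whitespace runs, but skips
# any leading whitespace first.
my @by_space = split ' ', $line;

# The same fields captured with a bare match in list context.
my @by_match = $line =~ /^\s*(\S+)\s+(\S+)\s*$/;

print scalar(@by_pattern), " fields via split /\\s+/\n";
print scalar(@by_space),   " fields via split ' '\n";
print scalar(@by_match),   " fields via match\n";
```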


        #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Regex help
by Anonymous Monk on Jun 21, 2012 at 13:48 UTC
Re: Regex help
by rovf (Priest) on Jun 21, 2012 at 14:29 UTC
    How do you define what's a column? Are the columns tab-delimited, or fixed width, or ....?

    -- 
    Ronald Fischer <ynnor@mm.st>
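The answer to that question changes the tool. As a rough sketch, assuming invented column widths of 10 and 13 characters and an invented row whose Data Type contains a space, fixed-width data could be cut with unpack and tab-delimited data with split on the tab:

```perl
use strict;
use warnings;

# Hypothetical rows: the widths (10 and 13 characters) and the values
# "Fast Meter" and 4 are invented for illustration.
my $fixed_line = sprintf '%-10s%-13s%s', 'D2', 'Fast Meter', '4';
my $tab_line   = join "\t", 'D2', 'Fast Meter', '4';

# Fixed width: unpack cuts at character positions, and the 'A' format
# strips trailing padding spaces from each field.
my @fixed_fields = unpack 'A10 A13 A*', $fixed_line;

# Tab-delimited: split on the delimiter itself.
my @tab_fields = split /\t/, $tab_line;

# Plain whitespace splitting breaks the two-word value apart.
my @naive_fields = split ' ', $fixed_line;

print join('|', @fixed_fields), "\n";
print join('|', @tab_fields),   "\n";
print join('|', @naive_fields), "\n";
```

Only the first two approaches keep a value with embedded whitespace in one piece, which is why knowing how the columns are delimited matters before choosing a pattern.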

Node Type: perlquestion [id://977636]
Approved by rovf