PerlMonks  

split, unpack, and regexes are all ways to parse a given line of data. Each is useful in different circumstances. For example:
  • split is most useful for delimited lines, such as tab- or comma-delimited data. (For delimited text, though, a module like Text::CSV is better, because of lines like "abcd,'Smith, John', blah", where the comma inside the quotes is part of the item, not a delimiter.) You could use a regex here instead, but the regex is harder to understand and even harder to get right.
    my @items = split /\Q$delim\E/, $line;

    # vs. a regex (and I know this will make mistakes: it only works for a
    # one-character $delim, and /g also returns an extra empty field at the end):
    my @items = $line =~ /([^\Q$delim\E]*)(?:\Q$delim\E|$)/g;
  • unpack (if you understand how to use it!) is really good with fixed-width data, where the first field occupies so many columns, the second so many more, and so on. Such data often comes from a mainframe.

    Again, you can use a regex here, but you have to build it programmatically for it to be maintainable. (I'd put an unpack example here if I were comfortable enough with it.)

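    my @columns = ( 20, 10, 25, 5, 2, 2, 20 );    # field widths

    # Build "(.{20})(.{10})...". The join is required: map alone, assigned
    # to a scalar, would just return the number of columns.
    my $regex = join '', map { "(.{$_})" } @columns;
    $regex = qr/^${regex}$/;

    my @items = $line =~ /$regex/;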
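The bullet above leaves out an unpack example, so here is a minimal sketch. It assumes the fields are space-padded text and reuses the widths from the regex version; the sample record and its values are made up for illustration.

```perl
use strict;
use warnings;

# Field widths from the regex example (an assumed record layout).
my @columns = ( 20, 10, 25, 5, 2, 2, 20 );

# Build the template "A20 A10 A25 A5 A2 A2 A20".
# "A" means space-padded text; unpack strips the trailing spaces.
# Use "a" instead if you need to keep the padding.
my $template = join ' ', map { "A$_" } @columns;

# A hypothetical fixed-width record, built with the matching pack template.
my $line = pack $template,
    'Smith, John', 'ACCT0042', 'Springfield', '12345', 'IL', 'US', 'Widget order';

my @items = unpack $template, $line;
print join( '|', @items ), "\n";
```

Unlike the regex version, which captures each field with its padding intact, the "A" template hands back the fields already trimmed, and the widths live in one list that drives both approaches.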
For every parsing need in these examples, there is a module on CPAN that does it better, faster, and safer. I personally would never hand-parse data in production. Heck, you can use CGI.pm to parse query strings without even having an HTTP server!
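To make the quoted-comma problem from the split bullet concrete, here is a minimal sketch that uses Text::ParseWords instead of Text::CSV. The swap is deliberate: Text::ParseWords ships with core Perl, and its parse_line function honours both single and double quotes. Text::CSV, by contrast, treats only double quotes as quotes unless you change its quote_char attribute.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = q{abcd,'Smith, John', blah};

# Naive split: the comma inside the quotes is treated as a delimiter,
# so the name is broken into two fields (4 fields total).
my @naive = split /,/, $line;

# parse_line(DELIMITER_REGEX, KEEP, LINE); KEEP = 0 strips the quotes.
my @fields = parse_line( ',', 0, $line );

printf "split:      %d fields\n", scalar @naive;
printf "parse_line: %d fields\n", scalar @fields;
print "  [$_]\n" for @fields;
```

Note that parse_line keeps the space in front of "blah", so trim the fields afterwards if that matters for your data.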

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.

Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.


In reply to Re3: Bottom-Up Data Mining with Perl by dragonchild
in thread Bottom-Up Data Mining with Perl by rje
