PerlMonks  

Re^2: Problem with a regex?

by Jim (Curate)
on Jul 15, 2011 at 17:52 UTC ( [id://914674]=note )


in reply to Re: Problem with a regex?
in thread Problem with a regex?

In my experience parsing and transforming printer files ("report scraping"), I've used regular expression pattern matching more often than substr or unpack. Why? Because there's no guarantee the report data will be consistently aligned in column positions. In practice, items tend to drift left and right a bit, especially over the lifetime of a report that changes occasionally. Maybe the date was in column positions 33 through 42 for a few years, then somebody modified the report; thereafter, the date was in column positions 23 through 32. Obviously, there could be other variations over time besides the shifting left or right of report items, but this is precisely why, in general, I've found it better to start with regular expression pattern matching right out of the chute. It's more adaptable in the face of variation.
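To make the drifting-column point concrete, here is a minimal sketch. The two report lines, the YYYY-MM-DD date format, and the function names date_by_position and date_by_pattern are all made up for illustration; a real report will need its own pattern.

```perl
use strict;
use warnings;

# Two hypothetical report lines carrying the same kind of data.
# Old layout: date in columns 33 through 42. New layout: columns 23 through 32.
my $old_line = 'INV-0001' . (' ' x 24) . '2011-07-15' . '   125.00';
my $new_line = 'INV-0002' . (' ' x 14) . '2011-07-16' . '   980.50';

# Fixed-position extraction: columns 33-42 are offset 32, length 10.
sub date_by_position {
    my ($line) = @_;
    return substr($line, 32, 10);
}

# Pattern-based extraction: take the first thing that looks like a date
# (assumed format YYYY-MM-DD), wherever it sits on the line.
sub date_by_pattern {
    my ($line) = @_;
    my ($date) = $line =~ /\b(\d{4}-\d{2}-\d{2})\b/;
    return $date;    # undef if nothing on the line looks like a date
}

for my $line ($old_line, $new_line) {
    printf "substr: [%s]  regex: [%s]\n",
        date_by_position($line), date_by_pattern($line);
}
```

On the old layout both approaches agree. On the new layout the substr version silently returns whatever now sits at offset 32 (here, the amount), while the pattern still finds the date.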

I've also found it better (more understandable, more maintainable, etc.) to parse the report into pages or records first, and then to scrape the data from each page or record in a separate step, typically using a function that returns a list or hash of the parsed data.
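A rough sketch of that two-step approach follows. It assumes pages are separated by form feeds and that each page carries labeled Customer, Date, and Total lines; the sample report and the names split_pages and parse_page are hypothetical, not taken from any real report.

```perl
use strict;
use warnings;

# Hypothetical report: two pages separated by a form feed ("\f").
# Note the items are not aligned the same way on both pages.
my $report = join "\f",
    "Customer: Acme Corp\nDate:   2011-07-15\nTotal:      125.00\n",
    "Customer: Widgets Ltd\n  Date: 2011-07-16\nTotal:   980.50\n";

# Step 1: break the report into pages, dropping any blank ones.
sub split_pages {
    my ($text) = @_;
    return grep { /\S/ } split /\f/, $text;
}

# Step 2: scrape a single page and return its fields as a hash.
# A field that is not found ends up undef.
sub parse_page {
    my ($page) = @_;
    my %record;
    ($record{customer}) = $page =~ /^\s*Customer:\s*(.+?)\s*$/m;
    ($record{date})     = $page =~ /^\s*Date:\s*(\d{4}-\d{2}-\d{2})/m;
    ($record{total})    = $page =~ /^\s*Total:\s*([\d.,]+)/m;
    return %record;
}

# One hash reference per page.
my @records = map { +{ parse_page($_) } } split_pages($report);

for my $r (@records) {
    print join(' | ', map { $_ // '(missing)' } @{$r}{qw(customer date total)}), "\n";
}
```

Keeping the page splitting apart from the field scraping means a change in the page separator touches only split_pages, and a change in page layout touches only parse_page.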

Jim
