Re^2: Problem with a regex?

In my experience parsing and transforming printer files ("report scraping"), I've used regular expression pattern matching more often than substr or unpack. Why? Because there's no guarantee the report data will stay aligned in the same column positions. In practice, items tend to drift left and right a bit, especially over the lifetime of a report that changes occasionally. Maybe the date was in column positions 33 through 42 for a few years, then somebody modified the report; thereafter, the date was in column positions 23 through 32. Items shifting left or right is not the only kind of variation you'll see over time, which is precisely why, in general, I've found it better to start with regular expression pattern matching right out of the chute. It's more adaptable in the face of variation.
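To make the column-drift point concrete, here is a minimal sketch. The report lines, the date format (YYYY-MM-DD), and the function names date_by_column and date_by_regex are all made up for illustration; only the column positions come from the example above.

```perl
use strict;
use warnings;

# Fixed-position extraction. Assumes the date occupies columns 33-42,
# i.e. zero-based offset 32, length 10 (the "old" layout).
sub date_by_column {
    my ($line) = @_;
    return substr($line, 32, 10);
}

# Pattern-based extraction. Assumes a YYYY-MM-DD date (a hypothetical
# format) and finds it wherever it sits on the line.
sub date_by_regex {
    my ($line) = @_;
    my ($date) = $line =~ /\b(\d{4}-\d{2}-\d{2})\b/;
    return $date;
}

# Two made-up header lines: the date starts in column 33 in the old
# layout and in column 23 after the report was modified.
my $old_layout = sprintf('%-32s%s   PAGE 1', 'INVOICE REGISTER', '2004-03-15');
my $new_layout = sprintf('%-22s%s   PAGE 1', 'INVOICE REGISTER', '2004-03-15');

for my $line ($old_layout, $new_layout) {
    printf "substr: [%s]  regex: [%s]\n",
        date_by_column($line), date_by_regex($line);
}
```

The substr version only returns the date for the old layout; the regex version returns it for both, because it keys on what the date looks like rather than where it is.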

I've also found it better (more understandable, more maintainable, etc.) to parse the report into pages or records first, and then to scrape the data from each page or record in a separate step, typically using a function that returns a list or hash of the parsed data.
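A rough sketch of that two-step approach follows. It assumes pages are separated by form feeds, and the field labels (Date, Account, Total) and the names split_pages and parse_page are hypothetical; a real report will need its own page delimiter and patterns.

```perl
use strict;
use warnings;

# Step 1: split the raw report text into pages.
# Assumption: pages are separated by a form feed character (\f).
sub split_pages {
    my ($report) = @_;
    return grep { /\S/ } split /\f/, $report;   # drop blank pages
}

# Step 2: scrape one page and return the fields as a hash.
# A field that is not found on the page comes back as undef.
sub parse_page {
    my ($page) = @_;
    my %rec;
    ($rec{date})    = $page =~ /Date:\s*(\d{4}-\d{2}-\d{2})/;
    ($rec{account}) = $page =~ /Account:\s*(\S+)/;
    ($rec{total})   = $page =~ /Total:\s*([\d,]+\.\d{2})/;
    return %rec;
}

# Made-up two-page report; note the fields sit at different positions
# on each page.
my $report = join "\f",
    "Date: 2004-03-15   Account: A-1001\n  Total:   1,250.00\n",
    "  Date: 2004-03-16 Account: B-2002\nTotal: 75.50\n";

for my $page (split_pages($report)) {
    my %rec = parse_page($page);
    print join(', ', map { "$_=$rec{$_}" } sort keys %rec), "\n";
}
```

Keeping the page splitting apart from the field scraping means a change to the page layout only touches parse_page, and a change to how pages are delimited only touches split_pages.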

Jim



	PerlMonks