PerlMonks  

Re^3: Suggestion for regular expression speed improvement.

by dsheroh (Monsignor)
on Jun 15, 2009 at 16:48 UTC [id://771716]


in reply to Re^2: Suggestion for regular expression speed improvement.
in thread Suggestion for regular expression speed improvement.

I do not understand this response. A regex such as the one you described is less flexible than split, not more: it will only match lines containing at least 25 tab-separated fields. If there are fewer fields, it fails to match and returns no data. If there are more, some fields will not be separated from each other and will be returned as a single field.[1] split, on the other hand, works with any number of tab-separated fields right out of the box.
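A minimal sketch of the difference, using a hypothetical 3-field pattern as a stand-in for the 25-field one (the original regex isn't quoted in this reply). The -1 limit on split keeps trailing empty fields instead of silently dropping them; parse_both is an illustrative name only.

```perl
use strict;
use warnings;

# Hypothetical 3-field stand-in for the 25-field regex. Greedy (.+) can
# also match tabs, so extra fields get lumped into one capture.
my $three_fields = qr/^(.+)\t(.+)\t(.+)$/;

# Returns (fields from split, captures from the fixed regex).
sub parse_both {
    my ($line) = @_;
    # -1 keeps trailing empty fields instead of dropping them.
    my @by_split = split /\t/, $line, -1;
    my @by_regex = $line =~ $three_fields;   # empty list on no match
    return (\@by_split, \@by_regex);
}

for my $line ("a\tb", "a\tb\tc", "a\tb\tc\td") {
    my ($split, $regex) = parse_both($line);
    printf "split: %d fields; regex: %s\n", scalar @$split,
        @$regex ? join('|', map { my $f = $_; $f =~ s/\t/<TAB>/g; $f } @$regex)
                : 'no match';
}
```

Run as is, the two-field line doesn't match the regex at all, and the four-field line comes back as a<TAB>b|c|d, with two fields fused into the first capture, while split reports 2, 3 and 4 fields respectively.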

If you go beyond split to a proper CSV-handling module, you will not only be able to read an arbitrary number of tab-separated columns, but also recognize quoted fields, so that a field can contain embedded tabs without causing false field separations. Accomplishing this with regexes is messy at best.
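The usual CPAN choice here is Text::CSV, configured with sep_char => "\t". To keep this sketch free of dependencies it swaps in the core module Text::ParseWords, whose parse_line splits on a delimiter pattern while honoring quotes. It is a lighter tool than a real CSV parser (it also treats single quotes as quote characters, so stray apostrophes in the data can trip it up), and split_quoted_tsv is a hypothetical wrapper name.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module, no CPAN install needed

# Split one tab-separated line while honoring double-quoted fields.
# parse_line(DELIM, KEEP, LINE): KEEP = 0 strips the quotes from the result.
sub split_quoted_tsv {
    my ($line) = @_;
    return parse_line("\t", 0, $line);
}

my $line   = qq{alpha\t"beta\twith tab"\tgamma};
my @fields = split_quoted_tsv($line);
print scalar(@fields), " fields\n";    # 3, where a plain split /\t/ gives 4
print "[$_]\n" for @fields;
```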

[1] ...unless you switch from (.+) to ([^\t]+), in which case (provided the pattern is anchored with ^ and $) it will only match lines containing exactly 25 fields.
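A small sketch of the footnote's point, using a hypothetical helper exact_fields_re that builds the anchored ([^\t]+) pattern for any field count (3 here, to keep the demo readable):

```perl
use strict;
use warnings;

# Build an anchored pattern matching exactly $n tab-separated, non-empty fields.
# [^\t]+ cannot cross a tab, so each capture is exactly one field.
sub exact_fields_re {
    my ($n) = @_;
    my $field = '([^\t]+)';
    my $body  = join '\t', ($field) x $n;
    return qr/^$body$/;
}

my $re3 = exact_fields_re(3);
for my $line ("a\tb", "a\tb\tc", "a\tb\tc\td") {
    my @f = $line =~ $re3;
    print @f ? "matched: @f\n" : "no match\n";
}
```

Only the three-field line matches. Drop the ^ and $ and the four-field line matches too, with its last field silently ignored. Note also that [^\t]+ rejects empty fields, which split accepts.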

Re^4: Suggestion for regular expression speed improvement.
by bala.linux (Novice) on Jun 15, 2009 at 17:20 UTC
    I just want to give you an example. The logs that I need to parse will not have a single definite separator such as a comma or tab, but my question used a simple tab-separated format. I would be parsing lines of this format:
    A=XX;Testing of YY;ZZ;Criticality:WW
    In the above line, I may need to extract XX, YY, ZZ and WW. By allowing regular expressions, I would be able to achieve that with capture groups.
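A minimal sketch of the grouping approach, assuming "A=", "Testing of " and "Criticality:" are literal text on every line and that ';' never occurs inside a value (both are assumptions; the full log format isn't shown). The pattern and the sub name parse_log_line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pattern for lines shaped like: A=XX;Testing of YY;ZZ;Criticality:WW
# Assumes ';' never appears inside XX, YY or ZZ.
my $log_re = qr/^A=([^;]*);Testing of ([^;]*);([^;]*);Criticality:(.*)$/;

# Returns (XX, YY, ZZ, WW), or an empty list if the line has a different shape.
sub parse_log_line {
    my ($line) = @_;
    if (my @parts = $line =~ $log_re) {
        return @parts;
    }
    return;
}

my ($xx, $yy, $zz, $ww) = parse_log_line('A=XX;Testing of YY;ZZ;Criticality:WW');
print "$xx $yy $zz $ww\n";   # XX YY ZZ WW
```

Since ';' still separates the four parts, split /;/ followed by stripping the fixed prefixes would also work; the single anchored regex has the advantage of rejecting lines that don't fit the expected shape.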
