Re: Suggestion for regular expression speed improvement.

Re^2: Suggestion for regular expression speed improvement. by bala.linux (Novice) on Jun 15, 2009 at 12:32 UTC
Thanks. Your suggestion works well for properly delimited log files such as CSV. But I want my code to work with regular expressions so that I can parse any log format. I hope you understand my problem. So, unfortunately, I cannot use split or CSV modules :(
Re^3: Suggestion for regular expression speed improvement. by dsheroh (Monsignor) on Jun 15, 2009 at 16:48 UTC
I do not understand this response. Using a regex such as you described is less flexible than using `split`, not more flexible: the regex will only match lines containing at least 25 tab-separated fields. If there are fewer fields, it will fail to match and return no data. If there are more, then some fields will not be separated from each other and will be returned as a single field¹. `split` will work with any number of tab-separated fields right out of the box.

Going beyond `split` to a proper CSV-handling module, you will not only be able to read arbitrary numbers of tab-separated columns, but also to recognize quoted fields, so that they can contain embedded tabs without causing false field separations. Accomplishing this with regexes is messy, at best.

¹ ...unless you switch from `(.+)` to `([^\t]+)`, in which case it will only match lines containing exactly 25 fields.
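To make the comparison concrete, here is a minimal sketch. It assumes a three-field pattern standing in for the 25-field one, anchored with `^` and `$`, and the helper names `fields_by_split` and `fields_by_regex` are made up for illustration.

```perl
use strict;
use warnings;

# A regex hard-wired for exactly three fields (three stands in for 25).
# The anchors are an assumption; the original pattern was not shown with them.
my $three_fields = qr/^([^\t]+)\t([^\t]+)\t([^\t]+)$/;

# Hypothetical helper: split on tabs. The -1 limit keeps trailing empty fields.
sub fields_by_split {
    my ($line) = @_;
    return split /\t/, $line, -1;
}

# Hypothetical helper: in list context, returns the captures on success
# and an empty list when the line does not have exactly three fields.
sub fields_by_regex {
    my ($line) = @_;
    return $line =~ $three_fields;
}

for my $line ("a\tb\tc", "a\tb", "a\tb\tc\td") {
    my @by_split = fields_by_split($line);
    my @by_regex = fields_by_regex($line);
    printf "split: %d field(s), regex: %s\n",
        scalar @by_split,
        @by_regex ? scalar(@by_regex) . " field(s)" : "no match";
}
```

Only the first sample line satisfies the fixed-width regex, while `split` returns whatever number of fields each line happens to have.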
Re^4: Suggestion for regular expression speed improvement. by bala.linux (Novice) on Jun 15, 2009 at 17:20 UTC
I just want to give you an example. The logs that I need to parse will not have a single definite separator like a comma or tab, although my question used a simple tab-separated format. I would be parsing lines of this format:

A=XX;Testing of YY;ZZ;Criticality:WW

In the above line, I may need to extract XX, YY, ZZ and WW. So, by allowing regular expressions, I would be able to achieve that with grouping.
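For illustration, one way the grouping could look for that sample line. The pattern below is an assumption built only from the single example above, and `parse_entry` is a hypothetical helper name; real logs may need different literals.

```perl
use strict;
use warnings;

# Assumed pattern for the sample format: fixed literals with capture
# groups around the variable parts (XX, YY, ZZ, WW).
my $entry = qr/^A=([^;]+);Testing of ([^;]+);([^;]+);Criticality:(.+)$/;

# Hypothetical helper: called in list context, it returns the four
# captured groups, or an empty list if the line does not match.
sub parse_entry {
    my ($line) = @_;
    return $line =~ $entry;
}

my @got = parse_entry('A=XX;Testing of YY;ZZ;Criticality:WW');
print join(', ', @got), "\n";    # prints "XX, YY, ZZ, WW"
```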
Re^3: Suggestion for regular expression speed improvement. by pKai (Priest) on Jun 15, 2009 at 12:55 UTC
"…so that I can parse any format of logs."

Can you elaborate how you hope to handle "any format" with regular expressions?
Re^4: Suggestion for regular expression speed improvement. by bala.linux (Novice) on Jun 15, 2009 at 15:30 UTC
By "any format", I meant single-line entries in various formats, which users can match with their own regex, using groups to indicate which parts they need. We will then process only the captured groups. It is not meant for logs that need multiple lines to describe one event, such as qmail's mail-delivery logs :)
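A rough sketch of that idea, assuming the user's pattern arrives as a plain string. `compile_pattern` and `extract_groups` are hypothetical names, not part of any module. Compiling once with `qr//` keeps validation and compilation out of the per-line loop, though how much time that saves will depend on the pattern and the data.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern once, up front,
# and fail early with a readable message if it is not a valid regex.
sub compile_pattern {
    my ($source) = @_;
    my $re = eval { qr/$source/ };
    die "Invalid pattern '$source': $@" unless defined $re;
    return $re;
}

# Hypothetical helper: return one array ref of captured groups per
# matching line. Note that a pattern without any groups yields (1) on
# a successful match, so callers should supply at least one group.
sub extract_groups {
    my ($re, @lines) = @_;
    my @rows;
    for my $line (@lines) {
        my @groups = $line =~ $re;
        push @rows, \@groups if @groups;
    }
    return @rows;
}

my $re = compile_pattern('^A=([^;]+);.*Criticality:(\w+)$');
my @input = ('A=XX;Testing of YY;ZZ;Criticality:WW', 'unrelated line');
for my $row (extract_groups($re, @input)) {
    print join(' | ', @$row), "\n";    # prints "XX | WW"
}
```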

