Please comment on my regular expression.

Gyatso has asked for the wisdom of the Perl Monks concerning the following question:

Venerable Monks,

I have written a small regular expression that I would like to have verified. It works, but I wanted to check whether there is a better way of picking the values out of the text string. Here it is...

Test string:

TAG001 LOCATION UPDATE 2009-05-08 14:30:50 TAG002 32300045985 11580;;543;MARIANO ESCOBEDO;OF 406 C 323 2 ACTIVE 2008-07-25 18:07:11 ERO007 2009-05-08 14:30:50 JTE015 1T61FJWLB97R2 MX TAG005 WENDA MARIANO ESCOBEDO 543 OF 406 COL RINCON DEL BOSQUE 11580 MEXICO DF DF MEXICO WNE051125MS8 10 Online PARTIAL 2 ZEXC 37 ES 0000000000094 TAG029 32300045984 1 TAG029

My objective is to pick two values from the above test string:

1. The value appearing after TAG002, i.e. 32300045985, in $1

2. Everything appearing between TAG005 and the very first TAG029, in $2

The regular expression I wrote is:

.+TAG002 (\d+).+TAG005 (.+?)TAG029

The regular expression works fine, but please comment if you think there is a better way to write it.

NOTE: The test string has special characters.
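
For reference, here is a minimal sketch of how the regular expression above behaves when run against the test string exactly as it is printed in this post. It assumes the string is plain text with single spaces (any special characters in the real data are not visible here), and the variable names $text, $id and $details are made up for illustration.

```perl
use strict;
use warnings;

# The test string as printed in the post (special characters, if any,
# are not visible there, so this copy contains plain spaces only).
my $text = 'TAG001 LOCATION UPDATE 2009-05-08 14:30:50 TAG002 32300045985 11580;;543;MARIANO ESCOBEDO;OF 406 C 323 2 ACTIVE 2008-07-25 18:07:11 ERO007 2009-05-08 14:30:50 JTE015 1T61FJWLB97R2 MX TAG005 WENDA MARIANO ESCOBEDO 543 OF 406 COL RINCON DEL BOSQUE 11580 MEXICO DF DF MEXICO WNE051125MS8 10 Online PARTIAL 2 ZEXC 37 ES 0000000000094 TAG029 32300045984 1 TAG029';

# The regular expression exactly as posted. In list context the match
# returns the two captures ($1 and $2).
my ( $id, $details ) = $text =~ /.+TAG002 (\d+).+TAG005 (.+?)TAG029/;

if ( defined $id ) {
    # Brackets make leading or trailing whitespace visible.
    print "\$1 = [$id]\n";
    print "\$2 = [$details]\n";
}
```

With this copy of the string, $2 ends with the space that precedes the first TAG029, because the space sits inside the (.+?) capture.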

Thanks & Regards,

Gyatso

Update:

Thanks Utilitarian and Suaveant for your comments and suggestions.

Regards,

Gyatso

Replies are listed 'Best First'.
Re: Please comment on my regular expression. by Utilitarian (Vicar) on Jun 24, 2009 at 14:04 UTC
Hi Gyatso, you could improve readability with `use charnames qw(:full);`. This would allow you to specify the special characters in a legible way. Other than that, your regex seems sane.
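
A minimal sketch of the charnames idea follows. The actual special character in the data is not visible in the post, so SECTION SIGN (U+00A7) is used here purely as a hypothetical stand-in, and the short $record string is invented for illustration.

```perl
use strict;
use warnings;
use charnames qw(:full);

# Hypothetical record: SECTION SIGN stands in for the real special
# character, which cannot be seen in the post.
my $record = "TAG002\N{SECTION SIGN}32300045985\N{SECTION SIGN}TAG005";

# \N{...} names the delimiter instead of embedding a raw character
# or a bare hex escape in the pattern.
my ($id) = $record =~ /TAG002\N{SECTION SIGN}(\d+)\N{SECTION SIGN}/;

print "id = $id\n" if defined $id;
```

The named form tells the reader which character is meant, which a raw byte or a hex escape in the middle of a pattern does not.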
Re: Please comment on my regular expression. by suaveant (Parson) on Jun 24, 2009 at 14:35 UTC
The only thing I might suggest is specifying the delimiter that is shown before each TAG as well, just to help make sure you don't match anything in the middle of the text (though it may be the case that with this data that is impossible...). You might as well make that other .+ a .+? too, just to save the regexp engine unnecessary work.
- Ant
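
Both suggestions can be sketched roughly as follows. The real delimiter is not visible in the post, so this assumes the tags are separated by plain whitespace and uses \b and \s as stand-ins; it also drops the leading .+, which an unanchored match does not need. The names $text, $re, $id and $details are illustrative only.

```perl
use strict;
use warnings;

# The test string as printed in the post (plain spaces assumed).
my $text = 'TAG001 LOCATION UPDATE 2009-05-08 14:30:50 TAG002 32300045985 11580;;543;MARIANO ESCOBEDO;OF 406 C 323 2 ACTIVE 2008-07-25 18:07:11 ERO007 2009-05-08 14:30:50 JTE015 1T61FJWLB97R2 MX TAG005 WENDA MARIANO ESCOBEDO 543 OF 406 COL RINCON DEL BOSQUE 11580 MEXICO DF DF MEXICO WNE051125MS8 10 Online PARTIAL 2 ZEXC 37 ES 0000000000094 TAG029 32300045984 1 TAG029';

# The x flag allows whitespace and comments inside the pattern.
my $re = qr{
    \b TAG002 \s (\d+) \b   # capture 1: digits right after TAG002
    .+?                     # lazily skip ahead to the next tag
    \b TAG005 \s (.+?)      # capture 2: text following TAG005 ...
    \s* \b TAG029 \b        # ... up to the first TAG029
}x;

my ( $id, $details ) = $text =~ $re;

if ( defined $id ) {
    print "id      = [$id]\n";
    print "details = [$details]\n";
}
```

The \s* in front of TAG029 keeps the trailing space out of the second capture. If the real delimiter is some other character, \b and \s would need to be replaced with it.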

