
I have put together some code to parse a colon-delimited data file. The regular expression I have built captures only some of the colon-delimited values, so I would like some pointers on the pattern. Ideally it would match anything between the colons, and also handle records that contain fewer fields, as can be seen below:

foreach (<DATA>) {
    

if($_ =~ m/(\d{1,2})?\:?(\w\d)?\:?(\b\w..*\b)?\:?(.*|N\/A)?\:?(\d{1,2}.+)?\:?(\d{2}\s?GREEN|RED|XX)?\:?(.*)?\:?(.*)?\:?(\bsquare\b)?/) {


#if($_ =~ $_ =~ m/(\d{1,2})\:?(\w\d)?\:?(\b\w..*\b)?\:?(.*|N\/A)?\|\|?(\d{1,2}.+)?\:?(\d{2}\s?GREEN|RED|XX)?\:?(.*)?\:?(.*)?\:?(\bYELLOW\b)?/) {


if (defined $1) {
    
    $count=$1;
    
}

else {
    
    $count="nothing";
}

if (defined $2) {
    #code
    $grade=$2;
}

else {
    
    $grade="nothing";
}


if (defined $3) {
    #code
    $pos=$3;
}

else {
    
    $pos="nothing";
}




if (defined $4) {
    #code
    $name=$4;
}

else {
    
    $name="nothing";
}


if (defined $5) {
    #code
    $country=$5;
}

else {
    
    $country="nothing";
}


if (defined $6) {
    #code
    $date=$6;
}

else {
    
    $date="nothing";
}


if (defined $7) {
    #code
    $age=$7;
}

else {
    
    $age="nothing";
}


if (defined $8) {
    #code
    $vacant=$8;
}

else {
    
    $vacant="nothing";
}

if (defined $9) {
    #code
    $square=$9;
}

else {
    
    $count="nothing";
}








#print "We have a match!\n";
print join " ",$count,$grade,$pos,$name,$date,$country,$age,$vacant,"\n";

}

}

__DATA__
1:D2:DIRECTOR:D. Green:4/15/1953:61 XX:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
1:D1:DEPUTY DIRECTOR:D. Green::6/20/1964:50:TUNISIA REPUBLIC OF::::
1:P5:SENIOR POLICY OFFICER:D. Green::7/7/1954:60 GREEN:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
9:P5:SENIOR ECONOMIST:D. Green::7/23/1958:56:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
D. Green::10/29/1953:60 GREEN:PERU REPUBLIC OF:*:::
D. Green::10/26/1955:58:SPAIN KINGDOM OF:*:::
D. Green::5/15/1967:47:FRENCH REPUBLIC::::
D. Green:g:12/6/1954:59:FIJI REPUBLIC OF::::
D. Green::6/8/1967:47:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
D. Green::9/16/1960:54:UNITED STATES OF AMERICA::::
N/A::Vacant:UNASSIGNED::YELLOW::

Output from above:

nothing D2 DIRECTOR:D. Green:4/15/1953:61 XX:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing D1 DEPUTY DIRECTOR:D. Green::6/20/1964:50:TUNISIA REPUBLIC OF ::: nothing nothing
nothing P5 SENIOR POLICY OFFICER:D. Green::7/7/1954:60 GREEN:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing P5 SENIOR ECONOMIST:D. Green::7/23/1958:56:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing nothing D. Green::10/29/1953:60 GREEN:PERU REPUBLIC OF *::: nothing nothing
nothing nothing D. Green::10/26/1955:58:SPAIN KINGDOM OF *::: nothing nothing
nothing nothing D. Green::5/15/1967:47:FRENCH REPUBLIC ::: nothing nothing
nothing nothing D. Green:g:12/6/1954:59:FIJI REPUBLIC OF ::: nothing nothing
nothing nothing D. Green::6/8/1967:47:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing nothing D. Green::9/16/1960:54:UNITED STATES OF AMERICA ::: nothing nothing
nothing nothing N/A::Vacant:UNASSIGNED::YELLOW : nothing nothing   
nothing nothing nothing  nothing nothing
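For the all-purpose case of grabbing anything between the colons, a plain split may be simpler than one long pattern. What follows is only a minimal sketch, assuming no field ever contains a literal colon. The name parse_record is a hypothetical helper introduced for illustration, and the three sample lines are copied from the data section. Because the records carry different numbers of fields, mapping positions to names (grade, name, country and so on) would still need a rule per record length, which this sketch does not attempt.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the posted code): split one record
# on ':' and return every field, with empty fields replaced by the
# placeholder string "nothing".
sub parse_record {
    my ($line) = @_;
    chomp $line;

    # A limit of -1 makes split keep trailing empty fields, which it
    # would otherwise discard.
    my @fields = split /:/, $line, -1;

    return map { length($_) ? $_ : 'nothing' } @fields;
}

# Sample records taken from the data section of the post.
for my $line (
    '1:D2:DIRECTOR:D. Green:4/15/1953:61 XX:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::',
    'D. Green::10/29/1953:60 GREEN:PERU REPUBLIC OF:*:::',
    'N/A::Vacant:UNASSIGNED::YELLOW::',
) {
    print join(' | ', parse_record($line)), "\n";
}
```

The field count returned for each line could then be used to decide which layout a record follows before assigning names to positions.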

Many thanks

In reply to Regular Expression to Extract Anything from Colon Delimited String by GuiPerl



There's more than one way to do things
	PerlMonks