I have put together some code to parse a colon-delimited data file. The regular expression I have built captures only some of the colon-delimited values, so I would like some pointers on the regexp pattern. Ideally, it would match whatever lies between the colons in general, and also cope with lines that contain fewer data values, as can be seen below:
foreach (<DATA>) {
if($_ =~ m/(\d{1,2})?\:?(\w\d)?\:?(\b\w..*\b)?\:?(.*|N\/A)?\:?(\d{1,2}.+)?\:?(\d{2}\s?GREEN|RED|XX)?\:?(.*)?\:?(.*)?\:?(\bsquare\b)?/) {
#if($_ =~ $_ =~ m/(\d{1,2})\:?(\w\d)?\:?(\b\w..*\b)?\:?(.*|N\/A)?\|\|?(\d{1,2}.+)?\:?(\d{2}\s?GREEN|RED|XX)?\:?(.*)?\:?(.*)?\:?(\bYELLOW\b)?/) {
if (defined $1) {
$count=$1;
}
else {
$count="nothing";
}
if (defined $2) {
#code
$grade=$2;
}
else {
$grade="nothing";
}
if (defined $3) {
#code
$pos=$3;
}
else {
$pos="nothing";
}
if (defined $4) {
#code
$name=$4;
}
else {
$name="nothing";
}
if (defined $5) {
#code
$country=$5;
}
else {
$country="nothing";
}
if (defined $6) {
#code
$date=$6;
}
else {
$date="nothing";
}
if (defined $7) {
#code
$age=$7;
}
else {
$age="nothing";
}
if (defined $8) {
#code
$vacant=$8;
}
else {
$vacant="nothing";
}
if (defined $9) {
#code
$square=$9;
}
else {
$count="nothing";
}
#print "We have a match!\n";
print join " ",$count,$grade,$pos,$name,$date,$country,$age,$vacant,"\n";
}
}
__DATA__
1:D2:DIRECTOR:D. Green:4/15/1953:61 XX:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
1:D1:DEPUTY DIRECTOR:D. Green::6/20/1964:50:TUNISIA REPUBLIC OF::::
1:P5:SENIOR POLICY OFFICER:D. Green::7/7/1954:60 GREEN:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
9:P5:SENIOR ECONOMIST:D. Green::7/23/1958:56:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
D. Green::10/29/1953:60 GREEN:PERU REPUBLIC OF:*:::
D. Green::10/26/1955:58:SPAIN KINGDOM OF:*:::
D. Green::5/15/1967:47:FRENCH REPUBLIC::::
D. Green:g:12/6/1954:59:FIJI REPUBLIC OF::::
D. Green::6/8/1967:47:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
D. Green::9/16/1960:54:UNITED STATES OF AMERICA::::
N/A::Vacant:UNASSIGNED::YELLOW::
Output from above:
nothing D2 DIRECTOR:D. Green:4/15/1953:61 XX:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing D1 DEPUTY DIRECTOR:D. Green::6/20/1964:50:TUNISIA REPUBLIC OF ::: nothing nothing
nothing P5 SENIOR POLICY OFFICER:D. Green::7/7/1954:60 GREEN:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing P5 SENIOR ECONOMIST:D. Green::7/23/1958:56:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing nothing D. Green::10/29/1953:60 GREEN:PERU REPUBLIC OF *::: nothing nothing
nothing nothing D. Green::10/26/1955:58:SPAIN KINGDOM OF *::: nothing nothing
nothing nothing D. Green::5/15/1967:47:FRENCH REPUBLIC ::: nothing nothing
nothing nothing D. Green:g:12/6/1954:59:FIJI REPUBLIC OF ::: nothing nothing
nothing nothing D. Green::6/8/1967:47:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing nothing D. Green::9/16/1960:54:UNITED STATES OF AMERICA ::: nothing nothing
nothing nothing N/A::Vacant:UNASSIGNED::YELLOW : nothing nothing
nothing nothing nothing nothing nothing
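To make the "all purpose" matching I am after more concrete, here is a minimal sketch of the behaviour I have in mind, using split rather than one large regexp. This is only an illustration under assumptions: parse_record is a hypothetical helper name, empty fields are reported as "nothing" as in my code above, and I have not decided how the shorter records should be mapped onto named fields.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on ':' and keep every field.
# The -1 limit stops split from discarding trailing empty fields.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @fields = split /:/, $line, -1;
    # Report empty fields as "nothing", mirroring the code above.
    return map { length($_) ? $_ : 'nothing' } @fields;
}

# Two records copied from the sample data: one full, one short.
my @sample = (
    '1:D1:DEPUTY DIRECTOR:D. Green::6/20/1964:50:TUNISIA REPUBLIC OF::::',
    'D. Green::9/16/1960:54:UNITED STATES OF AMERICA::::',
);

for my $record (@sample) {
    my @fields = parse_record($record);
    print scalar(@fields), " fields: ", join('|', @fields), "\n";
}
```

With this approach the full record yields 12 fields and the short one 9, so the field count could perhaps be used to tell the two layouts apart.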
Many thanks