I'm still working on my survey loading program, but now I've moved on to parsing the survey data files. This may be trivial to some, but I've never really done file parsing before.
The SEG-P1 format specifies that every header line should start with 'H', i.e. be matched by the regex /^H/. Unfortunately, not all survey companies adhere to this; some put the 'H' only at the start of the first header line. Header length also varies: some companies write 20-line headers, while others write 22-line headers.
That gives me two problems, but a single approach may be able to solve both. My question is this: how can I parse out the header block correctly every time, regardless of its length or formatting? Below is one example of each type of header (ignore the number of lines here).
First the format-specified version:
HLINE NUMBER : ABCDE
HPROJECT ID :
HGROUP :
HAREA NAME : *********
HOPERATOR : *********
HCONTRACTOR : ENERTEC
HSURVEY AUDITOR : ACCU-AUDIT
HSURVEY DATE : *********
HUTM ZONE : 11
HSURVEY QUALITY : ASCM,1
HCOMMENTS : *********
H :
H :
H :
HLINE LENGTH (Km): 2.65
HGRID VERSION : ATS 2.6
HDATUM : NAD 27
HAUDIT DATE : *********
H<....IDENTIFICATION....> <...GEOGRAPHICS...><.....UTMS.....>
H<.....LINE.....><..SP..>I<..LAT..><..LONG..><.EAST.><.NORT.><ELV><COMMENT>
Now the variant version:
HLINE NUMBER : ABCDE
PROJECT ID :
GROUP :
AREA NAME : *********
OPERATOR : *********
CONTRACTOR : ENERTEC
SURVEY AUDITOR : ACCU-AUDIT
SURVEY DATE : *********
UTM ZONE : 11
SURVEY QUALITY : ASCM,1
COMMENTS : *********
:
:
:
LINE LENGTH (Km): 2.65
GRID VERSION : ATS 2.6
DATUM : NAD 27
AUDIT DATE : *********
<....IDENTIFICATION....> <...GEOGRAPHICS...><.....UTMS.....>
<.....LINE.....><..SP..>I<..LAT..><..LONG..><.EAST.><.NORT.><ELV><COMMENT>
The actual survey data (point coordinates) start on the line immediately after the last header line shown above.
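One thing both samples share is that the header ends with column-ruler lines built from <...> fields, so the ruler could mark where the header stops, whatever its length. The following is only a sketch of that idea: is_ruler and header_length are hypothetical helper names, the data records are placeholders rather than real SEG-P1 records, and it assumes each header record sits on one physical line and that no data record begins with a <...> field.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a line looks like a column ruler such as
# "H<....IDENTIFICATION....> <...GEOGRAPHICS...>" (the leading 'H' is optional).
sub is_ruler {
    my ($line) = @_;
    return $line =~ /^H?\s*<[^>]*>/ ? 1 : 0;
}

# Hypothetical helper: number of header lines at the top of @lines, i.e.
# everything up to and including the last line of the first run of ruler
# lines. Returns 0 if no ruler line is found at all.
sub header_length {
    my @lines = @_;
    my $seen_ruler = 0;
    for my $i (0 .. $#lines) {
        if (is_ruler($lines[$i])) {
            $seen_ruler = 1;
        }
        elsif ($seen_ruler) {
            return $i;    # first non-ruler line after the ruler block
        }
    }
    return $seen_ruler ? scalar @lines : 0;
}

# Shortened variant-style header followed by placeholder data records.
my @sample = (
    "HLINE NUMBER : ABCDE",
    "PROJECT ID :",
    "DATUM : NAD 27",
    "<....IDENTIFICATION....> <...GEOGRAPHICS...><.....UTMS.....>",
    "<.....LINE.....><..SP..>I<..LAT..><..LONG..><.EAST.><.NORT.><ELV><COMMENT>",
    "first data record (placeholder)",
    "second data record (placeholder)",
);

printf "header lines: %d\n", header_length(@sample);    # prints "header lines: 5"
```

Because the count comes from the ruler position rather than from a fixed number, the same check would cover 20-line and 22-line headers alike, provided the ruler assumption holds for the files in question.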
Here's the code I have for reading the first (I'll call it "proper") version. For some reason I can't see, chomping wouldn't work, but a plain push works well enough for me:
while (<IN>) {
    if (/^H/) {    ## Assumes all header lines start with 'H'
        push(@hdr, $_);
        next;      ## skip to next (possibly header) line
    }
    ##
    ## Capture each line of data in file
    ##
}
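On the chomping remark: the failing code isn't shown, so this is only a guess at the cause. A common trip-up is that chomp modifies its argument in place and returns the number of characters removed, so pushing chomp's return value stores a count rather than the line. A minimal illustration, with made-up variable names:

```perl
use strict;
use warnings;

my $line = "HDATUM : NAD 27\n";

# chomp edits its argument in place; the return value is the number of
# characters removed, not the trimmed string.
my $removed = chomp(my $copy = $line);

my @wrong;
push @wrong, chomp(my $tmp = $line);    # pushes the count (1), not the text

my @right;
chomp(my $clean = $line);               # trim first ...
push @right, $clean;                    # ... then push the trimmed line

print "removed=$removed wrong=$wrong[0] right=$right[0]\n";
# prints "removed=1 wrong=1 right=HDATUM : NAD 27"
```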
What can I do to make this work for both kinds of headers?
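One way the loop might be adapted, offered as a sketch rather than a definitive answer: stop relying on the leading 'H' and instead treat everything up to the end of the <...> ruler block as header. read_survey is a hypothetical name, the records in the usage part are placeholders, and the approach assumes every file has at least one ruler line and one header record per physical line. A file with no ruler line would end up entirely in the header list.

```perl
use strict;
use warnings;

# Hypothetical helper: split a survey file into header and data lines.
# Assumes the header ends with one or more ruler lines made of <...> fields.
sub read_survey {
    my ($fh) = @_;
    my (@hdr, @data);
    my $seen_ruler = 0;    # set once a <...> ruler line has been read
    my $in_header  = 1;    # cleared at the first line after the ruler block

    while (my $line = <$fh>) {
        chomp $line;
        if ($in_header) {
            my $is_ruler = $line =~ /^H?\s*<[^>]*>/;
            if ($is_ruler || !$seen_ruler) {
                $seen_ruler ||= $is_ruler;
                push @hdr, $line;
                next;
            }
            $in_header = 0;    # fall through: this is the first data record
        }
        push @data, $line;
    }
    return (\@hdr, \@data);
}

# Usage with an in-memory file holding a shortened variant-style header.
my $text = join "", map { "$_\n" } (
    "HLINE NUMBER : ABCDE",
    "DATUM : NAD 27",
    "<.....LINE.....><..SP..>I<..LAT..><..LONG..>",
    "record 1 (placeholder)",
    "record 2 (placeholder)",
);
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my ($hdr, $data) = read_survey($fh);
close $fh;

printf "%d header lines, %d data lines\n", scalar @$hdr, scalar @$data;
# prints "3 header lines, 2 data lines"
```

Since the ruler check allows an optional leading 'H', the same loop accepts both the format-specified header and the variant where only the first line carries the 'H'.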
Update: Here's one more sample header:
H CLIENT : **********
H PROSPECT : *******
H CONTRACTOR : ***** LINE NAME : *******
H SURVEY CO. : ************ UNIQUE ID : *******
H SURVEY DATE : DEC 1977 ORIG.LINE NAME : *******
H SURVEYOR : _N/A ENERGY SOURCE : DYNAMITE
H ------------------------------------------------------------------------------
H PRODUCED BY : DIVESTCO GEOMATICS FIRST SP : 101
H WEBSITE : ********************** LAST SP : 222
H EMAIL : ********************** LINE LENGTH : 8.003 KM
H DATE : ************ PROJECT NUMBER :
H JOB NUMBER : ************ AFE NUMBER : ************
H FILE NAME : ******** CLIENT REFERENCE : *******
H MAPSHEET : ************* DATUM : NAD 1983 - Canada
H ZONE : Z11N : 117W SOURCE INT.: *** F STN INT.:*** F
H GRID REF. : ATS 4.1 HTKO :
H UNITS : Decimeters VTKO :
H ELLIPSOID : GRS 1980 SURVEY QUALITY CODE : ***********
H DATA QUALITY : Transcription 2D
H<LINE NAME ><POINT >< LAT >< LONG >< EAST ><NORTH ><ELE><>< ><>