I don't know what else my data could be; the example data I supplied is what it is.
The crux of my code is:
open(INFILE, "$ew_sites_file") || die "Can't open $ew_sites_file for reading: $!\n";
while ($line = <INFILE>) {
    chomp $line;
    ($unused, $site, $date, $city, $state, $facility, $unused, $ddurl) = split(/\t/, $line); # tab-delimited file
    $ddrul =~ s/"//g;
    $ddurl =~ s/\r//; # strip the stray carriage return that chomp leaves behind
    # build the registration site lookup flatfile: state and javascript URL
    open(OUTFILE, ">>$ew_regist_parse_file") || die "Can't open $ew_regist_parse_file: $!\n";
    print OUTFILE "$state|$start_tag$site$inbetween$city$inbetween$state$inbetween$date$inbetween$ddurl$semicolon$city, $state : $date$endtag\n";
    close OUTFILE;
    # build the locations flatfile
    open(OUT2FILE, ">>$ew_locate_parse_file") || die "Can't open $ew_locate_parse_file: $!\n";
    print OUT2FILE "$state|$city|$facility|$date|$ddurl\n";
    close OUT2FILE;
}
close INFILE;
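For comparison, one possible tidier shape for this loop, offered only as a sketch: it opens each output file once, before the loop, with lexical filehandles and three-argument open, instead of re-opening in append mode for every record. It assumes the same eight-column layout and shows only the locations flatfile. The sub name write_locations is hypothetical, and trimming surrounding whitespace from the URL is an extra step that the original code does not do.

```perl
use strict;
use warnings;

# Hypothetical helper: copy records from $in_fh to the locations flatfile
# on $loc_fh. Both handles are opened once by the caller, not per record.
sub write_locations {
    my ($in_fh, $loc_fh) = @_;
    while (my $line = <$in_fh>) {
        chomp $line;    # removes "\n"; a Windows "\r" may remain
        # undef placeholders discard the two ignored columns.
        my (undef, $site, $date, $city, $state, $facility, undef, $ddurl)
            = split /\t/, $line;
        next unless defined $ddurl;    # skip blank or short records
        $ddurl =~ tr/"\r//d;           # delete every double quote and carriage return
        $ddurl =~ s/^\s+|\s+$//g;      # assumption: surrounding spaces are unwanted
        print {$loc_fh} "$state|$city|$facility|$date|$ddurl\n";
    }
    return;
}

# Command-line use (hypothetical): perl script.pl sites.txt locations.txt
if (@ARGV == 2) {
    my ($sites_file, $locate_file) = @ARGV;
    open(my $in,  '<',  $sites_file)
        or die "Can't open $sites_file for reading: $!\n";
    open(my $loc, '>>', $locate_file)
        or die "Can't open $locate_file for appending: $!\n";
    write_locations($in, $loc);
    close $in;
    close $loc or die "Can't close $locate_file: $!\n";
}
```

Passing filehandles into the sub also makes it easy to try on an in-memory string (open my $fh, '<', \$text) before pointing it at the real files.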
The warnings say that the line $ddrul =~ s/"//g; uses an uninitialized value, which I don't understand, because it is initialized. The script produces two output files; here is an example of the simpler one:
TX|Austin|University of Texas at Austin|11-Oct| http://www.utexas.edu/cee/tcc/forms/tcclargemap.pdf
TX|Houston|University of Houston - Clear Lake|19-Oct|"http://prtl.uhcl.edu/portal/page?_pageid=328,217631,328_217645&_dad=portal&_schema=PORTALP"
TX|El Paso|University of Texas - El Paso|25-Oct| http://www.utep.edu/search/campusmaplarge.html
The extra quotes are still there.
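The leftover quotes and the uninitialized-value warning are consistent with a typo in the code rather than a problem with the data: the substitution is applied to $ddrul, but split() fills $ddurl. Without use strict, Perl quietly treats $ddrul as a new, undefined package variable, so s/"//g runs on undef (hence the warning) and $ddurl is never touched. Changing that line to $ddurl =~ s/"//g; should remove the quotes. The sketch below shows that use strict turns this kind of typo into a compile-time error; the string eval and the sample URL are only there so the error can be caught and printed.

```perl
use strict;
use warnings;

# String eval so the compile error can be caught instead of aborting the script.
my $compiled = eval q{
    use strict;
    my $ddurl = '"http://example.edu/map.html"';
    $ddrul =~ s/"//g;    # misspelled: split() filled $ddurl, not $ddrul
    1;
};

# Under strict, eval returns undef and $@ holds a message beginning
# 'Global symbol "$ddrul" requires explicit package name'.
print $compiled ? "compiled\n" : "rejected: $@";
```

Putting use strict; and use warnings; at the top of the real script should flag the misspelled name before anything runs.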