I've a file that I need to parse. What I am trying to do, essentially, is this:
#!/usr/bin/perl
use strict;
use warnings;

my $file = "spg.txt";
open(my $spg, '<', $file) or die "Couldn't open $file: $!\n";
while (my $line = <$spg>) {
    # split ' ' skips leading whitespace and splits on runs of whitespace
    my ($title, $start_date, $start_time, $end_date, $end_time,
        $status, $prixit) = split ' ', $line;
    print "$title $status\n";
}
close($spg);
That would be pretty easy, but some lines in the file are "broken", so to speak: the title ends up on one line and the rest of the record on the next. Take a look at the "spg-risk-ln_cdo_leg_synthetic" line below, for instance:
[vxp@vxp ~]$ cat spg.txt
spg-risk-ln-box 06/24/2009 21:14 06/24/2009 22:01 IN 3969
+3696/0
spg-risk-Fixed_Sterling 06/24/2009 21:14 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-aaeml 06/24/2009 21:14 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_abs_credit 06/24/2009 21:14 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_abs_fixed 06/24/2009 21:14 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_abs_fixed2 06/24/2009 21:14 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_abs_flow 06/24/2009 21:14 06/24/2009 21:14 IN 3969
+3696/1
spg-risk-ln_aol_abs 06/24/2009 21:14 06/24/2009 21:16 IN 3969
+3696/1
spg-risk-ln_apcms 06/24/2009 21:14 06/24/2009 21:14 IN 3969
+3696/1
spg-risk-ln_bouwfonds 06/24/2009 21:14 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_caprub 06/24/2009 21:43 06/24/2009 21:45 IN 3969
+3696/2
spg-risk-ln_capusd 06/24/2009 21:14 06/24/2009 21:16 IN 3969
+3696/1
spg-risk-ln_cdo 06/24/2009 21:14 06/24/2009 22:00 IN 3969
+3696/0
spg-risk-ln_cdo_leg_synthetic
06/24/2009 21:14 06/24/2009 21:18 IN 3969
+3696/0
spg-risk-ln_cdo_legacy 06/24/2009 21:14 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_cmbs 06/24/2009 21:15 06/24/2009 21:16 IN 3969
+3696/1
spg-risk-ln_cmbx 06/24/2009 21:15 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_credit_fixed 06/24/2009 21:15 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_credit_frn 06/24/2009 21:15 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_euresi 06/24/2009 21:15 06/24/2009 21:17 IN 3969
+3696/1
spg-risk-ln_fonspa 06/24/2009 21:15 06/24/2009 21:21 IN 3969
+3696/1
spg-risk-ln_hyloans 06/24/2009 21:15 06/24/2009 21:15 IN 3969
+3696/1
spg-risk-ln_ni 06/24/2009 21:15 06/24/2009 21:16 IN 3969
+3696/1
spg-risk-ln_resid_rmbs 06/24/2009 21:15 06/24/2009 21:17 IN 3969
+3696/1
spg-risk-ln_rmbs 06/24/2009 21:15 06/24/2009 21:17 IN 3969
+3696/1
spg-risk-ln_swaps 06/24/2009 21:15 06/24/2009 21:22 IN 3969
+3696/1
spg-risk-ln_synresi 06/24/2009 21:15 06/24/2009 21:16 IN 3969
+3696/1
spg-risk-ln_synthetics 06/24/2009 21:15 06/24/2009 21:17 IN 3969
+3696/1
spg-risk-ln_trefs 06/24/2009 21:15 06/24/2009 21:20 IN 3969
+3696/1
spg-risk-ln_ukpurch 06/24/2009 21:15 06/24/2009 21:19 IN 3969
+3696/0
spg-risk-ln_warehouse 06/24/2009 21:15 06/24/2009 21:16 IN 3969
+3696/1
spg-risk-ln_abs_frn 06/24/2009 21:15 06/24/2009 21:17 IN 3969
+3696/1
spg-risk-lnliq 06/24/2009 21:14 06/24/2009 21:18 IN 3969
+3696/1
[vxp@vxp ~]$
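One way to find records like this without knowing their titles in advance is to count the whitespace-separated fields on each line. The following Perl sketch assumes that a well-formed record has exactly 7 fields (the same 7 the script above unpacks) and reports every non-blank line that has fewer. The helper `short_lines` is hypothetical, and the sample lines are copied from the listing above without the "+" continuation lines, with the continuation of the broken record indented on the assumption that it starts with whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a well-formed record has 7 whitespace-separated fields on one line.
my $expected = 7;

# Hypothetical helper: return [line number, field count] for every
# non-blank line that has fewer than $want fields.
sub short_lines {
    my ($fh, $want) = @_;
    my @hits;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;   # skips leading whitespace
        my $n = @fields;                 # number of fields on this line
        push @hits, [ $., $n ] if $n > 0 && $n < $want;
    }
    return @hits;
}

# Sample lines taken from the listing (continuation lines omitted).
my $sample = <<'END';
spg-risk-ln_cdo 06/24/2009 21:14 06/24/2009 22:00 IN 3969
spg-risk-ln_cdo_leg_synthetic
    06/24/2009 21:14 06/24/2009 21:18 IN 3969
END

# Read the sample through an in-memory filehandle; use a real path for real files.
open(my $in, '<', \$sample) or die "Can't open in-memory sample: $!\n";
for my $hit (short_lines($in, $expected)) {
    my ($lineno, $count) = @$hit;
    print "line $lineno: only $count field(s)\n";
}
close($in);
```

On the sample it flags line 2 (1 field) and line 3 (6 fields), which is exactly the broken record.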
Any ideas on what's needed to "fix" the file?
I can't just write a regex that matches the line starting with "spg-risk-ln_cdo_leg_synthetic" and "stitches" it to the next line. That would involve checking whether there is any data after the first column; if there isn't, storing the first column as a hash key, then checking the next line and, if it starts with a space, assigning that line as the key's value. That can be done technically, but it won't work for me: I have thousands and thousands of these little files to parse, and I can't possibly find all of these lines and write thousands and thousands of regexes. That's why I'm asking here for a (possibly) "universal" solution to this problem.
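The stitching idea does not need a regex per title; it can key on the field count instead. Here is a minimal sketch, assuming every complete record has exactly 7 whitespace-separated fields and that a broken record is simply split across consecutive lines. Fields are collected line by line until at least 7 have been seen, then emitted as one record. The helper `read_records` and the sample data are hypothetical. A record that ends up with more than 7 fields (for example, a truncated record followed by a complete one) is warned about rather than silently re-split.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a complete record has exactly 7 whitespace-separated fields
# (title, start date, start time, end date, end time, status, prixit).
use constant NFIELDS => 7;

# Hypothetical helper: read lines from $fh and return one array ref of
# fields per logical record, joining physical lines until NFIELDS are seen.
sub read_records {
    my ($fh) = @_;
    my (@records, @pending);
    while (my $line = <$fh>) {
        push @pending, split ' ', $line;   # skips leading whitespace
        next if @pending < NFIELDS;        # record continues on the next line
        warn "Record has ", scalar(@pending), " fields: @pending\n"
            if @pending > NFIELDS;
        push @records, [@pending];         # copy, then start a new record
        @pending = ();
    }
    warn "Incomplete trailing record: @pending\n" if @pending;
    return @records;
}

# Sample lines taken from the listing (continuation lines omitted).
my $sample = <<'END';
spg-risk-ln_cdo 06/24/2009 21:14 06/24/2009 22:00 IN 3969
spg-risk-ln_cdo_leg_synthetic
    06/24/2009 21:14 06/24/2009 21:18 IN 3969
END

open(my $in, '<', \$sample) or die "Can't open in-memory sample: $!\n";
for my $rec (read_records($in)) {
    my ($title, $start_date, $start_time, $end_date, $end_time,
        $status, $prixit) = @$rec;
    print "$title $status\n";
}
close($in);
```

On the sample this prints `spg-risk-ln_cdo IN` and `spg-risk-ln_cdo_leg_synthetic IN`. Because the join is driven only by the field count, the same loop should handle any title that wraps, in any of the files, as long as the 7-field assumption holds; for real files, open the path with `open(my $in, '<', $file)` instead of the in-memory handle.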
Thanks in advance! :)